I recently read your paper on Funseq, and I am very interested in using it to address some of my questions about cortex plasticity. However, I'm not very familiar with the Linux/UNIX environment this software runs in, and all I have is a Mac laptop. Could you tell me how I could use this software on a Mac, or where I could find instructions for doing so?
You should be able to download this software and use it on a Mac.
You can download it from funseq.gersteinlab.org.
I read "Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics" and "A systematic survey of loss-of-function variants in human protein-coding genes", and I am interested in the list of genes in the ‘LoF-tolerant’ category. I would appreciate it if you could provide it.
Please see below the list of LoF-tolerant genes from the Science paper.
This list is based on the data from Phase 1 of the 1000 Genomes project.
We are trying to implement the scores of Funseq2 (running locally).
However, we would like to have a score for each variation in the
input-vcf: this is not the case if we look at the Output.vcf.
Can I conclude from this output, that the missing variants in
Output.vcf have a score of zero?
The somatic variants that overlap 1000 Genomes variants are filtered out.
Those might be the variants being removed from your output vcf.
You can check one or two manually and you should be able to confirm that.
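The manual check described above can also be scripted. The following is only a minimal sketch: it assumes both files use the standard tab-delimited VCF column order (CHROM, POS, ID, REF, ALT) and the same chromosome naming, and the helper names `vcf_keys` and `missing_variants` are made up for illustration. It lists the input variants that are absent from the output, which you can then look up in the 1000 Genomes data.

```python
def vcf_keys(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF lines, skipping headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header/meta line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def missing_variants(input_lines, output_lines):
    """Return input variants that do not appear in the output, sorted."""
    return sorted(vcf_keys(input_lines) - vcf_keys(output_lines))


# Toy data standing in for the real input and output files.
input_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr1\t200\t.\tC\tT\n",
    "chr2\t300\t.\tG\tA\n",
]
output_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
]

missing = missing_variants(input_vcf, output_vcf)
for variant in missing:
    print(variant)
```

Because the functions only iterate over lines, open file handles for the real input and output files can be passed in place of the toy lists.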
I read with great interest your exciting paper on "Interpretation of genomic variants using a unified biological network approach".
In the last section of the Results, you describe the validation of your logistic regression model using a list of 140 LoF-tolerant genes (MacArthur et al. 2012) and a list of 115 essential genes (Liao et al. 2008). Even though I read both papers, I couldn’t find the gene lists mentioned above (e.g. the supplementary table of Liao’s essential genes lists 120 genes, not 115).
So, I was wondering if you’d be so kind as to share the list of 140 LoF-tolerant genes and the list of 115 essential genes.
In Supplementary Table S8 of our PLoS Comp Bio paper, the genes with significance_score=0 (second column) are LoF-tolerant genes and the genes with significance_score=3 are essential genes. This file contains 140 LoF-tolerant and 115 essential genes.
I think Liao et al. report 120 essential genes, but we lost 5 of them during gene ID conversion.
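If it helps, the two lists can be pulled out of the table with a few lines of Python. This is a rough sketch under assumptions: it treats the table as tab-delimited with the gene identifier in the first column (only the position of significance_score, the second column, is stated above), and the gene names in the toy data are placeholders, not real entries from Table S8.

```python
import csv
import io


def split_by_score(handle, gene_col=0, score_col=1):
    """Split genes into LoF-tolerant (score 0) and essential (score 3)."""
    lof_tolerant, essential = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= max(gene_col, score_col):
            continue  # skip short or empty rows
        try:
            score = float(row[score_col])
        except ValueError:
            continue  # skip the header row or non-numeric entries
        if score == 0:
            lof_tolerant.append(row[gene_col])
        elif score == 3:
            essential.append(row[gene_col])
    return lof_tolerant, essential


# Toy table; the gene names are placeholders.
toy_table = io.StringIO(
    "gene\tsignificance_score\n"
    "GENE_A\t0\n"
    "GENE_B\t3\n"
    "GENE_C\t1\n"
    "GENE_D\t0\n"
)

lof_genes, essential_genes = split_by_score(toy_table)
print(len(lof_genes), "LoF-tolerant;", len(essential_genes), "essential")
```

On the real file, the two counts should come out as 140 and 115 if the column assumptions hold.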
I read your paper entitled “Integrative annotation of variants from 1092 humans: application to cancer genomics” in Science from Oct. 4, 2013. Since mutations in the so-called ultra-sensitive regions play an important role in cancer development, I wonder whether it is possible to find out where those mutations lie within the ultra-sensitive regions and what mutations they are. I can’t find them in the paper, although they are mentioned.
Is there somewhere I can go to find the mutations?
Thanks for your interest in our paper.
You can find the genomic coordinates of sensitive and ultra-sensitive regions in Data File S3 provided with the supplement of the paper. For the cancer samples we analyzed, you will find the coordinates and detailed information for candidate drivers in Data File S6; this file also lists whether the mutations are in sensitive or ultra-sensitive regions.
Congrats on a very nice paper in Science (Khurana et al., 2013). I am particularly interested in how you are able to score variants in transcription factor binding sites. According to the supplementary methods: "An SNV that breaks a motif is defined as a mutation that decreases the motif-matching score of the TF-binding site to the position weight matrix (PWM) of the motif (relative to the ancestral allele) (8). Conversely, an SNV that conserves a motif is defined as a mutation that increases the motif-matching score of the TF-binding site to the PWM of the motif."
This makes perfect sense to me. But how do you define the TF-binding site in the first place? I would guess that you apply a threshold on the motif-matching score here (to reduce the fraction of false positives), and that you then define disruption/conservation by the variant relative to this score. As far as I can see, the paper gives no details on this aspect.
You refer to Mu et al. (NAR, 2011), but I cannot see any further details there.
I would very much appreciate an explanation of how you find the TF binding sites and if you use any PWM-score thresholds in this respect.
The motifs we used in the two papers are the set of TF motifs officially released by the ENCODE project, which was also used in the main ENCODE publication in 2012. The algorithm to detect the motifs was developed by Pouya at MIT. More detail is available here: http://compbio.mit.edu/encode-motifs/
In our paper, we took these motif coordinates and categorized SNVs by the functional effects you described.
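To make the break/conserve definition concrete, here is a small illustrative sketch. It is not the code used in the papers: the 4-position PWM is a made-up toy rather than an ENCODE motif, the log-odds scoring against a uniform background is an assumed (though common) choice, and the function names are hypothetical. The binding site is taken as given, as with the ENCODE motif coordinates, and the SNV is classified by whether the derived allele lowers or raises the score relative to the ancestral allele.

```python
import math

# Toy PWM: one dict of base probabilities per motif position (made up).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def pwm_score(seq, pwm, background=0.25):
    """Log-odds motif-matching score of seq against the PWM."""
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))


def classify_snv(site, offset, ancestral, derived, pwm):
    """Classify an SNV at `offset` within a known binding site."""
    anc_seq = site[:offset] + ancestral + site[offset + 1:]
    der_seq = site[:offset] + derived + site[offset + 1:]
    delta = pwm_score(der_seq, pwm) - pwm_score(anc_seq, pwm)
    if delta < 0:
        return "motif-breaking"    # derived allele lowers the match score
    if delta > 0:
        return "motif-conserving"  # derived allele raises the match score
    return "neutral"               # score unchanged


breaking = classify_snv("ACGT", 0, "A", "C", TOY_PWM)
conserving = classify_snv("CCGT", 0, "C", "A", TOY_PWM)
print(breaking, conserving)
```

In this toy motif the last position is uninformative, so any change there leaves the score unchanged and is reported as neutral.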