FunSeq2 data context download problem

Q:
After reading your recent paper about the FunSeq2 tool, which I enjoyed very much, I wanted to take a closer look at your data. Unfortunately, I am not able to download the data context from http://funseq2.gersteinlab.org/data/ . The server always drops the connection after I have downloaded about 1 GB of the compressed file, and I cannot access some of the individual files at all, e.g. human_ancestor_GRCh37_e59.fa . Could you help me solve this problem?

A:
We have added an alternative download link. In addition to http://funseq2.gersteinlab.org/data/ , you can now download the files from http://archive.gersteinlab.org/funseq2_data/
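If the connection still drops partway through a large file, resuming the download from where it stopped avoids starting over. The following is a minimal Python sketch using only the standard library. It assumes the server honours HTTP `Range` requests, which is not confirmed here, and the function names `build_resume_request` and `resume_download` are hypothetical.

```python
import os
import shutil
import urllib.request


def build_resume_request(url, dest):
    """Build a request for the bytes not yet saved in dest (hypothetical helper)."""
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if start:
        # Ask only for the remainder of the file, starting at byte `start`.
        req.add_header("Range", f"bytes={start}-")
    return req, start


def resume_download(url, dest, chunk=1 << 20):
    """Download url into dest, appending to a partial file when possible."""
    req, start = build_resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured the Range header, so append.
        # Any other success status means the whole file is being sent: overwrite.
        mode = "ab" if start and resp.status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(resp, out, chunk)
```

After each dropped connection, call `resume_download(url, dest)` again with the full URL of the file you need. Once the file is complete, a further range request is likely to be answered with HTTP 416, which `urlopen` raises as an `HTTPError`.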

“Results not accessible” on the FunSeq page

Q:

Could you please check your servers? The "FunSeq" results page does not give any output. It just returns "Page not found" after the tool runs on my uploaded VCF file.

A:
We should definitely improve the server. It deletes results once per week, so unfortunately I can no longer see your results. I have just checked the server, and it does return results.

There are several possible reasons for a ‘Page not found’ message:
1. The file format is incorrect.
2. No variants are left after filtering against the 1000 Genomes data. In this case, please set a different MAF value (for example, 1).
3. FunSeq does not analyze indels, so any indels are filtered out. If no variants remain after this, an error is returned.
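Before uploading, it may help to check which of these cases applies. The sketch below is a hypothetical pre-check in Python, not part of FunSeq: it assumes a standard tab-separated VCF (CHROM, POS, ID, REF, ALT, ...) and counts lines that look like SNVs versus indels or other variant types. It cannot reproduce the 1000 Genomes MAF filter, which needs the FunSeq data files. The names `classify_vcf_line` and `summarize` are illustrative only.

```python
import sys
from collections import Counter

BASES = set("ACGT")


def classify_vcf_line(line):
    """Return 'header', 'malformed', 'snv' or 'other' for one VCF line."""
    if line.startswith("#"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5 or not fields[1].isdigit():
        return "malformed"
    ref = fields[3].upper()
    alts = fields[4].upper().split(",")
    # An SNV here means a single-base REF and only single-base ALT alleles.
    if ref in BASES and all(alt in BASES for alt in alts):
        return "snv"
    return "other"  # indels, multi-base substitutions, symbolic alleles


def summarize(lines):
    """Count line categories, ignoring blank lines."""
    return Counter(classify_vcf_line(line) for line in lines if line.strip())


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        counts = summarize(handle)
    print(dict(counts))
    if counts["snv"] == 0:
        print("No SNVs found: the file may be malformed or contain only indels.")
```

Run it as `python check_vcf.py your_file.vcf` (both names are placeholders). A count of zero SNVs points to reason 1 or 3; a healthy SNV count suggests the MAF filter (reason 2) is the more likely cause.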

If you still experience the same problem, feel free to contact us. Please include the job ID so that I can check it for you.

Mutations in sensitive and ultra-sensitive regions

Q:
I read your paper entitled “Integrative annotation of variants from 1092 humans: application to cancer genomics” in Science from Oct. 4, 2013. Since mutations in the so-called ultra-sensitive regions play an important role in cancer development, I wonder whether it is possible to find out where these mutations lie within the ultra-sensitive regions and what they are. They are mentioned in the paper, but I cannot find them listed.
Is there somewhere I can find these mutations?

A:
Thanks for your interest in our paper.
You can find the genomic coordinates of sensitive and ultra-sensitive regions in Data File S3 provided with the supplement of the paper. For the cancer samples we analyzed, you will find the coordinates and detailed information for candidate drivers in Data File S6; this file also lists whether the mutations are in sensitive or ultra-sensitive regions.

Annotation of SNPs as breaking or conserving TF motifs

Q:

Congratulations on a very nice paper in Science (Khurana et al., 2013). I am particularly interested in how you score variants in transcription factor binding sites. According to the supplementary methods: "An SNV that breaks a motif is defined as a mutation that decreases the motif-matching score of the TF-binding site to the position weight matrix (PWM) of the motif (relative to the ancestral allele) (8). Conversely, an SNV that conserves a motif is defined as a mutation that increases the motif-matching score of the TF-binding site to the PWM of the motif."

This makes perfect sense to me. But how do you define the TF-binding site in the first place? I would guess that you apply a threshold to the motif-matching score (to reduce the fraction of false positives) and then define disruption or conservation by the variant relative to this score. As far as I can see, the paper gives no details on this point.

You refer to Mu et al. (NAR, 2011), but I cannot find any further details there either.

I would very much appreciate an explanation of how you find the TF binding sites and if you use any PWM-score thresholds in this respect.

A:

The motifs we used in both papers are the set of TF motifs officially released by the ENCODE project, which was also used in the main ENCODE publication in 2012. The algorithm used to detect the motifs was developed by Pouya at MIT. More details are available here:
http://compbio.mit.edu/encode-motifs/

In our paper, we took these motif coordinates and categorized SNVs based on the functional effects you described.
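The break/conserve definition quoted in the question can be illustrated with a short Python sketch. It is not FunSeq code. The 4-position matrix `TOY_PWM` is invented for illustration. Scores are log2 odds against a uniform 0.25 background with a small pseudocount, and only the forward strand is scored. The motif site is assumed to be given already (for example, from released motif coordinates), so no score threshold is applied. The helper names are hypothetical.

```python
import math

# Hypothetical 4-position motif (consensus ACGT): per-position base
# probabilities. Real analyses would use published motif PWMs instead.
TOY_PWM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.03, "C": 0.03, "G": 0.90, "T": 0.04},
    {"A": 0.20, "C": 0.10, "G": 0.10, "T": 0.60},
]


def site_score(site, pwm, background=0.25, pseudocount=1e-3):
    """Log2-odds motif-matching score of a site, forward strand only."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(
        math.log2((column[base] + pseudocount) / background)
        for column, base in zip(pwm, site.upper())
    )


def classify_snv(site, offset, ancestral, derived, pwm):
    """Label an SNV inside a motif site as 'break', 'conserve' or 'neutral'.

    The score of the site carrying the derived allele is compared with the
    score of the same site carrying the ancestral allele.
    """
    if site[offset].upper() != ancestral.upper():
        raise ValueError("site does not carry the ancestral allele at offset")
    derived_site = site[:offset] + derived + site[offset + 1:]
    delta = site_score(derived_site, pwm) - site_score(site, pwm)
    if delta < 0:
        return "break", delta
    if delta > 0:
        return "conserve", delta
    return "neutral", delta
```

For example, `classify_snv("ACGT", 2, "G", "A", TOY_PWM)` returns a tuple whose first element is `"break"`, because A is much rarer than G at that position of the toy matrix. The reverse change, from ancestral A to derived G, is labeled `"conserve"`.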