Questions about FunSeq2

Recently, I used FunSeq2 to identify non-coding regulatory variants in my
bladder cancer research. While analyzing promoters, I found that the original
file, gencode.v19.promoter.bed, which I downloaded from, contains promoter
regions whose lengths range from 1 to 8979 bp. This is inconsistent with the
definition in your article (promoters defined as -2.5 kb from transcription
start sites).

So I checked the processing script, which I downloaded from and which was
written for gencode.v16. This script generated gencode.v16.promoter.bed, and
in that BED file the 3rd column minus the 2nd column equals 2500 (2.5 kb) for
every entry. Comparing the two files, I noticed that the latest
gencode.v19.promoter.bed has an additional 5th column, so I concluded that the
script had been re-edited, but I didn't find the latest version online.
Therefore, I wonder whether the latest version redefines the promoter. If it
does, could I get a copy of this script?
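The length check described above (3rd column minus 2nd column) is easy to reproduce. Below is a minimal Python sketch, assuming a tab-separated BED file with the 0-based start in the 2nd column and the end in the 3rd; the file name in the usage comment is simply the one mentioned in the question.

```python
import sys
from collections import Counter
from typing import Iterable


def bed_interval_lengths(lines: Iterable[str]) -> Counter:
    """Count interval lengths (end - start, i.e. 3rd minus 2nd column) in BED lines."""
    lengths: Counter = Counter()
    for line in lines:
        # Skip blank lines and comment/track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        lengths[end - start] += 1
    return lengths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bed_lengths.py gencode.v19.promoter.bed
    with open(sys.argv[1]) as handle:
        counts = bed_interval_lengths(handle)
    print("min length:", min(counts), "max length:", max(counts))
    print("all intervals are 2500 bp:", set(counts) == {2500})
```

Running it on both the v16 and v19 files shows at a glance whether every interval has the same length or whether lengths vary.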

The promoter file was derived from the PCAWG promoter set, which may take ChromHMM segmentation information into account. Yao updated this in v2.1.2, and I kept it in the latest version. Users can replace it with a file based on their own definition of promoters.

The promoter file included in Funseq 2.1.2 is based on the PCAWG consortium’s definition, which takes ChromHMM segmentation information into account, so the promoters are not exactly 2 kb or 2.5 kb upstream of the TSS.
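For users who want to substitute their own definition, the fixed-window variant can be derived from a GENCODE GTF. This is a minimal Python sketch, not the script used to build the shipped file. It assumes promoters are the 2,500 bp immediately upstream of each transcript's TSS (TSS base excluded) and that `gene_name` appears in the attribute column. The output columns (chrom, start, end, name, strand) are a hypothetical choice; the layout FunSeq2 expects, including the extra 5th column in gencode.v19.promoter.bed, should be checked against the shipped file before swapping it in.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical default window: 2.5 kb upstream of each TSS.
UPSTREAM_BP = 2500
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def promoters_from_gtf(
    lines: Iterable[str], upstream: int = UPSTREAM_BP
) -> Iterator[Tuple[str, int, int, str, str]]:
    """Yield (chrom, start, end, name, strand) BED intervals upstream of each transcript TSS.

    GTF coordinates are 1-based inclusive; the yielded intervals are 0-based
    half-open, so end - start == upstream unless clipped at position 0.
    Minus-strand intervals are not clipped at the chromosome end (that would
    need chromosome sizes).
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, strand = fields[0], fields[6]
        gtf_start, gtf_end = int(fields[3]), int(fields[4])
        match = GENE_NAME_RE.search(fields[8])
        name = match.group(1) if match else "."
        if strand == "+":
            tss0 = gtf_start - 1  # 0-based TSS on the plus strand
            start, end = max(0, tss0 - upstream), tss0
        else:
            tss0 = gtf_end - 1  # on the minus strand the TSS is the transcript end
            start, end = tss0 + 1, tss0 + 1 + upstream
        if start < end:
            yield chrom, start, end, name, strand
```

Writing each tuple as `"\t".join(map(str, row))` produces a BED file whose 3rd minus 2nd column is 2500 for every interval not clipped at position 0.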

Using LARVA and FunSeq2 for variant analysis

I have read your articles describing FunSeq2 and LARVA. I find these two
frameworks to be the most complete and well-adapted, so I am very interested
in using them for my analysis. I have installed both tools and started
running them following the instructions in the documentation, but I am still
encountering a few problems.

First, I ran the web-based version of FunSeq2 on several of my VCF files and
it appears to return the expected result, with 10,000+ entries for each
sample. However, when I run the tool on the same files from the command line
(with the -nc option), I get a different result, with no significant entries
returned.

The output returned is:

… Input format check : vcf …
… Format ok …
… Start filtering SNVs with minor allele frequency = 0 …
Warning: sample Sample1 – no SNVs left after filtering against natrual variations …

I get a similar result when I run the program on multiple files at once
(both from the command line and on the web).

I am also trying to use LARVA on these files; I have managed to install the
tool and I am currently testing it using the example-variants-1.txt file
from the regression suite as the variant file, but the program returns
“Segmentation Fault: 11” with no other error message.

Therefore, I would like to know whether you have encountered these errors
before and, if so, what steps I can try to correct them.

I’m glad to hear that you’ve decided to use LARVA for your analyses. I looked into the LARVA codebase to try to figure out what might be causing the segmentation fault. One thing I found is that one of the helper programs (bigWigAverageOverBed) is provided only as a 64-bit Linux binary, so if you run LARVA on a different type of system (e.g. a Mac), it won’t work. There are versions for other operating systems here (at the end of the page), but for simplicity we only provided the 64-bit Linux version. If that doesn’t fix the issue, could you please tell me everything you can about the environment in which you’re running LARVA (CPU, RAM, operating system, etc.) and the command-line parameters you used?
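One quick way to test the platform explanation is to compare the binary's header with the host. Below is a minimal Python sketch that reads the first 20 bytes of a file and checks for a 64-bit little-endian x86-64 ELF binary (ELF magic, EI_CLASS = 2, EI_DATA = 1, e_machine = 62), then checks whether the current host is 64-bit x86 Linux. The function names are hypothetical, and the path passed to `check_binary` is an assumption about where bigWigAverageOverBed sits in your LARVA checkout.

```python
import platform

ELF_MAGIC = b"\x7fELF"
EM_X86_64 = 62  # ELF e_machine value for AMD x86-64


def is_linux_x86_64_elf(header: bytes) -> bool:
    """Return True if the header bytes belong to a 64-bit little-endian x86-64 ELF binary."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False  # e.g. a macOS Mach-O binary or a text script
    is_64bit = header[4] == 2  # EI_CLASS: 2 = ELFCLASS64
    little_endian = header[5] == 1  # EI_DATA: 1 = ELFDATA2LSB
    machine = int.from_bytes(header[18:20], "little")  # e_machine field
    return is_64bit and little_endian and machine == EM_X86_64


def host_is_linux_x86_64() -> bool:
    """Return True if this Python process runs on 64-bit x86 Linux."""
    return platform.system() == "Linux" and platform.machine() == "x86_64"


def check_binary(path: str) -> str:
    """Describe whether the binary at `path` can plausibly run on this host."""
    with open(path, "rb") as handle:
        header = handle.read(20)
    if not is_linux_x86_64_elf(header):
        return f"{path} is not a 64-bit x86-64 ELF binary"
    if not host_is_linux_x86_64():
        return f"{path} is a Linux x86-64 binary, but this host is {platform.system()} {platform.machine()}"
    return f"{path} matches this host"
```

For example, `print(check_binary("bigWigAverageOverBed"))` on a Mac would report that the binary is built for Linux, which is consistent with the "Segmentation Fault: 11" symptom.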

Also, for help on Funseq2, I refer you to my colleague, Shake Lou (cc’ed).

One more thing I just thought of: how are all your input files formatted?

Regarding the Funseq2 issue, here are some suggestions:

1. The Funseq web server version is obsolete; we recommend using the GitHub version.
2. The latest version, 2.1.6, fixes a bug that could cause some variants to be missing from the output.
3. Please use BED as the output format. I will update the VCF output format later.
4. You can also try, for which we have pre-calculated the score of every position in the hg19 genome. If you have a large number of variants to query, there is more good news: we are also testing a rich-format, whole-genome Funseq output file that lets you retrieve Funseq annotations directly from the command line. If you are interested in this file, we can give you a pre-release copy for testing once it passes our internal QC, which should be very soon.
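With per-position scores precomputed, querying many variants reduces to a join on position. Below is a minimal Python sketch that streams a score table once and keeps only the queried positions, so memory stays proportional to the query set rather than the genome. It assumes a hypothetical tab-separated layout (chrom, 1-based position, score); the actual layout of the pre-calculated hg19 file and of the upcoming whole-genome output is not described here, so the parsing would need to be adjusted to match.

```python
from typing import Dict, Iterable, Set, Tuple

Position = Tuple[str, int]


def lookup_scores(score_lines: Iterable[str], queries: Set[Position]) -> Dict[Position, float]:
    """Stream a per-position score table and keep only the queried positions.

    Assumes a hypothetical tab-separated layout: chrom, 1-based position, score.
    Queried positions absent from the table are simply missing from the result.
    """
    found: Dict[Position, float] = {}
    for line in score_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, score = line.rstrip("\n").split("\t")[:3]
        key = (chrom, int(pos))
        if key in queries:
            found[key] = float(score)
    return found
```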

Question about Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors

I was intrigued by your paper on classifying human genomic regions based on experimentally determined transcription factor binding sites. Could you share the genomic loci of the six types of regions identified in this paper? I would also like to know whether your analysis allowed you to conclude which regions are not tissue-specific, and whether you have done a similar analysis in other species. Finally, it would be great if you could share the scripts used to generate these results, if they are available as a program or package.


FunSeq2 encountered issues processing whole-genome data

I am attempting to use FunSeq2 to analyze whole-genome data and have unfortunately run into issues. As no contact is listed in the documentation, I thought I would contact you for help with troubleshooting. After I upload a BED file in the appropriate format, the site returns a message stating that the requested page is unavailable due to a server hiccup.

Could you send me a few lines of your input, or the ID provided by the website?

The ID provided by the website is: 201511510325290607. I’ve also included a few sample lines from my input below. Please let me know if I can provide any further information.




Your input format is different from the usual BED format. Could you separate the fields with tabs (instead of commas) and try again? The last column will be treated as the sample name.
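The conversion suggested above can be scripted. Below is a minimal Python sketch that reads comma-separated records and writes them back tab-separated, preserving field order so the last field remains the sample name. The example columns in the test (chrom, start, end, ref, alt, sample) are assumed purely for illustration, since the user's sample lines are not reproduced here.

```python
import csv
from typing import Iterable, List


def comma_bed_to_tab(lines: Iterable[str]) -> List[str]:
    """Convert comma-separated BED-like records to tab-separated lines.

    Field order is preserved, so the last field stays in the last column
    (the column FunSeq2 treats as the sample name). Blank lines are dropped
    and stray spaces around fields are stripped.
    """
    out: List[str] = []
    for row in csv.reader(lines):
        if not row or not any(field.strip() for field in row):
            continue
        out.append("\t".join(field.strip() for field in row))
    return out
```

Typical use is to read the original file with `open(...)`, pass the handle to `comma_bed_to_tab`, and write the returned lines joined with `"\n"` to a new `.bed` file before uploading it again.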

Funseq2 output: missing variants

We are trying to use the Funseq2 scores (running locally).
However, we would like to have a score for every variant in the
input VCF, and Output.vcf does not contain all of them.
Can I conclude from this output that the variants missing from
Output.vcf have a score of zero?

Somatic variants that overlap 1000 Genomes variants are filtered out.
Those are probably the variants missing from your output VCF.
You can check one or two manually to confirm this.
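To pick the variants to check by hand, the input and output VCFs can be compared directly. Below is a minimal Python sketch that collects (CHROM, POS, REF, ALT) keys from the data lines of each file and returns those present in the input but absent from the output. It assumes both files use the same chromosome naming (e.g. "chr1" in both) and the same REF/ALT representation. The returned variants can then be looked up in 1000 Genomes.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def vcf_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from the data lines of a VCF."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], fields[1], fields[3], fields[4]))
    return keys


def missing_from_output(input_lines: Iterable[str], output_lines: Iterable[str]) -> Set[VariantKey]:
    """Return variants present in the input VCF but absent from the output VCF."""
    return vcf_keys(input_lines) - vcf_keys(output_lines)
```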

FunSeq2 data context download problem

I read your recent paper about the FunSeq2 tool, which is very nice, and I would like to take a closer look at your data. Unfortunately, I am not able to download the data context from . The server always drops the connection after I have downloaded about 1 GB of the compressed file, and I cannot access some of the individual files at all, e.g. human_ancestor_GRCh37_e59.fa. Could you help me solve this problem?

We have added an alternative download link.
You can now download the files from :
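If a download still drops partway through, resuming it avoids starting the 1 GB transfer over. Command-line tools already do this (`wget -c` or `curl -C -`). Below is a minimal Python sketch of the same idea using an HTTP Range request. It only works if the server honours Range requests (HTTP 206 Partial Content), and the URL to pass in is whichever download link you are using; the function names are hypothetical.

```python
import os
import urllib.request


def resume_headers(existing_bytes: int) -> dict:
    """HTTP headers asking the server to continue a partial download."""
    return {"Range": f"bytes={existing_bytes}-"} if existing_bytes > 0 else {}


def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Fetch `url` into `dest`, resuming from the bytes already on disk.

    Call it again after a dropped connection to continue where it stopped.
    If the file is already complete, the server typically answers 416, which
    urlopen raises as urllib.error.HTTPError.
    """
    existing = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=resume_headers(existing))
    with urllib.request.urlopen(request) as response:
        if existing and response.status != 206:
            # Server ignored the Range header: rewrite the file instead of appending a duplicate.
            existing = 0
        with open(dest, "ab" if existing else "wb") as handle:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                handle.write(chunk)
```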