Problem running LARVA from the article

Q:
I am reading your article "LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations", and I am really interested in it. But when I run the source code by following the instructions, I run into some problems.

I put all the files in the right places, and the "make" command completed successfully, as shown in the attached screenshot.

A:
When you compile LARVA, the "larva" executable is created in the top level of the LARVA distribution, but that directory is NOT added to the PATH environment variable. Invoking the executable as you did would work only if "larva" were installed in a standard location like "/usr/bin" or "/usr/local/bin". Since the Makefile creates the executable in the same directory as the .cpp files, you need to invoke it with "./larva" so that the shell knows to look for the executable in the current directory. Alternatively, you can add the LARVA code directory to your PATH variable like so:

export PATH=~/larva2/code:$PATH
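To see the difference between the two invocations, here is a minimal sketch using Python's standard shutil.which, which performs the same kind of PATH lookup the shell does for a bare command name. The ~/larva2/code directory is just the example location from the export line above; find_larva is a hypothetical helper name.

```python
import os
import shutil


def find_larva(code_dir: str) -> dict:
    """Report where a 'larva' executable would be found, if anywhere.

    code_dir is an example location (e.g. ~/larva2/code); adjust as needed.
    """
    code_dir = os.path.expanduser(code_dir)
    return {
        # What typing plain "larva" resolves to: only PATH is searched.
        "on_path": shutil.which("larva"),
        # What "./larva" inside code_dir (or adding code_dir to PATH) finds.
        "in_code_dir": shutil.which("larva", path=code_dir),
    }


if __name__ == "__main__":
    print(find_larva("~/larva2/code"))
```

If "on_path" is None but "in_code_dir" is not, the build worked and only the PATH setting is missing.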

Error reports from the LARVA software

Q1:
I am using LARVA to investigate noncoding mutation hotspots, but it reported the following error:

Error: Mutation counts file example.snv.bed has too few columns on line 1. Expected at least 5, but found 4. Exiting.

The command I used: ./larva -vf example.snv.bed -af example.anno.bed -o larva.out -b

This confuses me, because the "example.snv.bed" file really does have 5 tab-separated columns, yet the error says only 4 were found. I have tried many things but still cannot figure it out. Could you please help?

The example.snv.bed file looks like this:

chrM 5650 5651 BLCA_GD blca01
chrM 8863 8864 BLCA_GD blca01
chr1 1111476 1111477 BLCA_GD blca01
chr1 1632977 1632978 BLCA_GD blca01
chr1 1657153 1657154 BLCA_GD blca01
chr1 2584370 2584371 BLCA_GD blca01

The example.anno.bed file looks like this (only a subset is shown; the fourth column is the annotation info):

I would really appreciate your help.

A1:
It looks like the variant file and annotation file excerpts you attached to your email contain the same data (based on columns 1-3). I suspect that wasn't your intended use of LARVA. Could you please send me the actual set of annotations you're using? It would be a huge help in uncovering the root cause of the error.
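When an error like "Expected at least 5, but found 4" seems to contradict what the file looks like in an editor, one common culprit is a mix of tabs and spaces, which look the same on screen. Below is a minimal checking sketch. It assumes the input is meant to be strictly tab-delimited, which is an assumption about LARVA's parser and is not stated above. check_columns is a hypothetical helper.

```python
import sys


def check_columns(lines, min_cols=5):
    """Return (line_number, tab_fields, whitespace_fields) for short lines.

    tab_fields counts fields when splitting on tabs only; whitespace_fields
    counts fields when splitting on any whitespace. If whitespace_fields is
    larger, some fields on that line are probably separated by spaces.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")  # also drops Windows line endings
        if not line.strip():
            continue  # skip blank lines
        tab_fields = len(line.split("\t"))
        ws_fields = len(line.split())
        if tab_fields < min_cols:
            problems.append((lineno, tab_fields, ws_fields))
    return problems


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        for lineno, tabs, ws in check_columns(handle):
            print(f"line {lineno}: {tabs} tab-separated, {ws} whitespace-separated fields")
```

For example, running it as `python check_columns.py example.snv.bed` would list any lines whose tab-split field count falls below five.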

Q2:
As you said, I think the input annotation file may be what is causing the error. Actually, I do not fully understand what the annotation file should be.

In your paper published in 2015, the abstract says: "We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).”

So using the highly mutated annotations you provided may be appropriate. However, this website (larva.gersteinlab.org) can no longer be accessed. I hope you can help.

Sorry to bother you with such small issues. I used the RegulomeDB annotation file as LARVA's input annotation file. The first error I reported last time disappeared, but there is a new error:

$ processing chromosomes………………….
Error: Invalid length of 0 in annotation file, line 2
Length must be greater than zero

RegulomeDB annotation file (only the first 4 columns were used): [image]

A2:
I apologize for the accessibility issues with the LARVA website. There was a recent change on the backend that messed up the IP address routing to the website. I’ve contacted our IT people about the issue, but until they fix things on their end, the LARVA website can be accessed with its raw IP address: http://54.164.95.124/

Also, concerning your RegulomeDB issue: you get an "Invalid length of 0" error because the annotation on the second line uses the same coordinate for start and end. The program computes the annotation length as (end - start), so the second annotation appears to have zero length, which doesn't really make sense. In fact, it looks like the entire file is made of single nucleotides. That would make sense for the variant file, but for the annotation file, the intention is that the annotations represent intervals on the genome that perform some function. These are typically regions like exons, promoters, enhancers, etc. The idea is to see whether these annotations are being hit by a large number of mutations. Single nucleotides don't really match that definition of an annotation.
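Before running LARVA, you can scan an annotation file for records that would trigger this error. Below is a minimal sketch that applies the length rule described above (length = end - start, so start == end gives zero). It assumes tab-separated BED-style columns and skips "#", "track" and "browser" header lines. find_bad_intervals is a hypothetical helper. Note that in this convention a single base is written with end = start + 1, as in the variant file shown earlier (e.g. chrM 5650 5651).

```python
import sys


def find_bad_intervals(lines):
    """Return (line_number, start, end) for records with end - start <= 0."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # blank or header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # not an interval record; skip
        start, end = int(fields[1]), int(fields[2])
        if end - start <= 0:  # zero or negative length
            bad.append((lineno, start, end))
    return bad


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        for lineno, start, end in find_bad_intervals(handle):
            print(f"line {lineno}: start={start} end={end} (length {end - start})")
```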

I hope this helps.

Using LARVA and FunSeq2 for variant analysis

Q:
I have read your articles describing FunSeq2 and LARVA. I find these two frameworks the most complete and best suited to my needs, so I am very interested in using them for my analysis. I have installed both tools and started running them following the instructions in the documentation, but I am still encountering a few problems.

First, I have run the web-based version of FunSeq2 on several of my VCF files, and it seems to return the expected result, with around 10,000+ entries for each sample. However, when I run the tool on the same files from the command line (with the -nc option), I get a different result, with no significant entries returned.

The output returned is:

… Input format check : vcf …
… Format ok …
… Start filtering SNVs with minor allele frequency = 0 …
Warning: sample Sample1 – no SNVs left after filtering against natrual variations …

I get a similar result when I try to run the program on multiple files at once (both from the command line and on the web).

I am also trying to use LARVA on these files. I have managed to install the tool and am currently testing it with the example-variants-1.txt file from the regression suite as the variant file, but the program returns "Segmentation Fault: 11" with no other error message.

Have you encountered these errors before? If so, please let me know about any steps I can try to correct them.

A:
I'm glad to hear that you've decided to use LARVA for your analyses. I did some investigating in the LARVA codebase to try to figure out what might be causing the segmentation fault. One thing I found is that one of the helper scripts (bigWigAverageOverBed) is provided only in its Linux (64-bit) version, so if you run LARVA on a different type of system (e.g. a Mac), the script won't work. There are versions for other operating systems here (at the end of the page), but for simplicity we only provided the 64-bit Linux version. If that doesn't fix the issue, could you please tell me everything you can about the environment in which you're running LARVA (CPU, RAM, operating system, etc.) and the command line parameters you used?
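A quick way to see whether your machine matches the bundled helper is to ask Python's standard platform module. This is a minimal sketch, assuming the bundled bigWigAverageOverBed is the x86-64 Linux build (the text above says only "Linux (64-bit)"). bundled_binary_supported is a hypothetical helper. As a side note, the "Segmentation Fault: 11" wording is how macOS shells typically report a crash, which may fit the Mac explanation.

```python
import platform


def bundled_binary_supported(system=None, machine=None):
    """Return True if the platform looks like 64-bit x86 Linux.

    Assumption: the bundled bigWigAverageOverBed is an x86-64 Linux build.
    Any other result suggests downloading the build for your own OS.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    return system == "Linux" and machine in ("x86_64", "amd64")


if __name__ == "__main__":
    print(platform.system(), platform.machine(), bundled_binary_supported())
```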

Also, for help on Funseq2, I refer you to my colleague, Shake Lou (cc’ed).

One more thing I just thought of: how are all your input files formatted?

As for the issue with Funseq2, here are some suggestions:

1. The Funseq webserver version is obsolete, and we recommend using the GitHub version.
2. The latest version (2.1.6) fixes a bug that could cause some variants to be missing from the output.
3. Please use BED as the output format. I will update the VCF output format later.
4. You can also try funseq3.gersteinlab.org, where we have pre-calculated the score of each position in the hg19 genome. If you have a large number of variants to query, there is more good news: we are also testing a rich-format, whole-genome Funseq output file that lets you retrieve Funseq annotations directly from the command line. If you are interested in this file, we can give you pre-release access once it passes our internal QC, which should be very soon.

Correlation ACT error

Q:

I am trying to run the correlation Java program, and I get the following when I run the example:

java -jar EncodeTfCor2.jar human_genome_file.txt bedlist 1000000 0
Parsing genome chromosomes and tf bindings …
parsing human_genome_file.txt…
parsing lists in bedlist
Building data matrix …
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at encodetfcor2.TfSitesDataMatrixBuilder.<init>(TfSitesDataMatrixBuilder.java:126)
at encodetfcor2.Main.main(Main.java:65)

Any idea what's going on? I have no familiarity with Java, so I am completely lost.

Well, I removed it and installed a new version, and this time I ran the SNP data as the example and it worked. I have no idea what happened. One quick question, though: I ran the example with the SNPs from the four individuals and got the following matrix:

1.000000 0.984057 0.983579 0.941439
0.984057 1.000000 0.985570 0.956917
0.983579 0.985570 1.000000 0.952203
0.941439 0.956917 0.952203 1.000000

The track_names.txt file lists the following:

chinese.sites.chr1.parsed
korean.sites.parsed.chr1
venter.sites.parsed.chr1
watson.sites.parsed.chr1

So is the labelled matrix then:

names chinese korean venter watson
chinese 1.000000 0.984057 0.983579 0.941439
korean 0.984057 1.000000 0.985570 0.956917
venter 0.983579 0.985570 1.000000 0.952203
watson 0.941439 0.956917 0.952203 1.000000

The README file isn't very clear on that. Thanks.

A:
Yes, the matrix is labelled correctly.
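Since the i-th line of track_names.txt corresponds to the i-th row and column of the matrix, the labelling can be automated. Below is a minimal sketch that builds the labelled table from the two files. The matrix file name is whatever your run produced; the usage line and label_matrix are hypothetical.

```python
import sys


def label_matrix(names, matrix_lines):
    """Return a tab-separated table with track names on rows and columns.

    Assumes the i-th track name matches the i-th row and column, as
    confirmed above, and that matrix values are whitespace-separated.
    """
    rows = [line.split() for line in matrix_lines if line.strip()]
    if len(rows) != len(names) or any(len(r) != len(names) for r in rows):
        raise ValueError("matrix shape does not match number of track names")
    header = "\t".join(["names"] + names)
    body = ["\t".join([name] + row) for name, row in zip(names, rows)]
    return "\n".join([header] + body)


if __name__ == "__main__":
    # Usage (hypothetical): python label_matrix.py track_names.txt MATRIX_FILE
    with open(sys.argv[1]) as handle:
        track_names = [line.strip() for line in handle if line.strip()]
    with open(sys.argv[2]) as handle:
        print(label_matrix(track_names, handle))
```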