AlleleSeq CNV file from CNVnator

Q:
I have enjoyed your papers on allele specificity, and I have a question about
using AlleleSeq. I understand that your time is short and valuable, and I would
very much appreciate your help. I am making CNV files in the following format:

chrm snppos rd
1 52066 0.902113
1 695745 0.909802
1 742429 0.976435

for input to the AlleleSeq pipeline.

I am using the alleleSeq_cnvScript tool to convert the output from CNVnator
v0.3 into the required CNV file, and it appears to work. However, my
problem is that it runs prohibitively slowly.

My alignment BAM is from 1000 genomes Phase III low coverage WGS, and is
24GB in size.

If I process only Chromosome 1, I have a ROOT file of 156 MB. I have six
million SNPs in a SNV file of 171 MB.

The addRD program runs using only 4% of the 16GB of RAM I have available,
but will take many weeks to complete at the current rate.

The rate at which addRD runs slows down dramatically with time. Though I am
not proficient in C++, I examined the code to see if I could identify why it
is slowing with time. I guess it is due to the search through the ROOT file
for each window around each SNP. The search restarts from the beginning for
each SNP, so as the SNP locations move further along the chromosome
this search takes longer. I imagine that a great deal of time could be saved
by initialising each search based on the previous search?
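The idea of initialising each search from the previous one can be sketched as follows. This is only an illustration and not the addRD code: it assumes the SNP positions are sorted, and that the read-depth signal has already been extracted into a sorted list of non-overlapping bins. The names window_means, bins and half_window are hypothetical.

```python
def window_means(snp_positions, bins, half_window):
    """Mean bin depth in a window around each SNP, in one forward sweep.

    snp_positions: SNP coordinates, sorted ascending.
    bins: (start, end, depth) tuples, sorted by start, non-overlapping,
          with end exclusive.
    half_window: half of the window width around each SNP.
    """
    results = []
    first = 0  # first bin that can still overlap the current or a later window
    for pos in snp_positions:
        lo, hi = pos - half_window, pos + half_window
        # Bins ending at or before lo cannot overlap this window, nor any
        # later one, because the SNP positions only increase.
        while first < len(bins) and bins[first][1] <= lo:
            first += 1
        total, count = 0.0, 0
        i = first
        while i < len(bins) and bins[i][0] < hi:
            total += bins[i][2]
            count += 1
            i += 1
        results.append((pos, total / count if count else None))
    return results


bins = [(0, 100, 1.0), (100, 200, 2.0), (200, 300, 3.0), (300, 400, 4.0)]
print(window_means([50, 250], bins, 100))
```

Because the pointer never moves backwards, the cost per SNP depends on the window size and not on how far along the chromosome the SNP lies.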

If this approach is not possible for me, please could you advise on whether
the following algorithm would be appropriate for input to AlleleSeq:

1) Divide the BAM file into windows of width W and count the number of reads
in each window.

2) Calculate the mean Read Depth (perhaps as a function of GC content): mu

3) For each SNP in my SNV file:

Use bamtools to select the reads in the window of size W centred on
the SNP location and calculate the Read Depth

(perhaps correct RD for GC content)

Calculate the normalised read depth = RD/(2 mu L/W)

output the SNP location and normalised read depth to file.
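As a rough sketch of the steps above, the following counts reads in a window centred on each SNP and divides by the mean count per window. It makes several assumptions: read start positions for one chromosome have already been extracted from the BAM file into a list, GC correction is omitted, and the normalisation is simply the window count divided by the mean window count (the factor 2 and the L in the formula above are not defined in the question, so they are not reproduced). The function name and parameters are hypothetical, and whether this output is appropriate for AlleleSeq is not established here.

```python
from bisect import bisect_left


def normalised_depth(read_starts, snp_positions, window, chrom_length):
    """Return (snp_position, normalised read depth) pairs for one chromosome.

    read_starts: start coordinates of the aligned reads (any order).
    window: window width W, centred on each SNP.
    chrom_length: chromosome length, used to count windows for the mean.
    """
    starts = sorted(read_starts)
    n_windows = max(1, chrom_length // window)
    mu = len(starts) / n_windows  # mean number of reads per window
    half = window // 2
    out = []
    for pos in snp_positions:
        # Binary search gives the number of reads starting in [pos-half, pos+half).
        lo = bisect_left(starts, pos - half)
        hi = bisect_left(starts, pos + half)
        out.append((pos, (hi - lo) / mu))
    return out


# Toy data: one read every 10 bp, plus 10 extra reads stacked at position 700.
reads = list(range(0, 1000, 10)) + [700] * 10
for pos, rd in normalised_depth(reads, [500, 700], window=100, chrom_length=1000):
    print("1", pos, round(rd, 6))  # chrm snppos rd, as in the format above
```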

A:
Have you tried using the latest version of Personal Genome Constructor?

http://alleleseq.gersteinlab.org/vcf2diploid_v0.2.6a.zip

When generating the .cnv file suitable for AlleleSeq, the pipeline uses bedtools to get read depth around each hetSNP instead of CNVnator. From my experience, this doesn’t take more than a few hours on a ~100GB WGS .bam file (single thread, all chromosomes).

A question about CNVnator result

Q:

Recently, I used CNVnator to detect CNVs in the dog genome, using resequencing data from the Illumina GA platform. I obtained a sorted, duplicate-removed .bam file using bwa and samtools, and then used the following commands to get the CNV results:

./cnvnator -genome Canis_familiaris.CanFam3.1.71.dna_rm.toplevel.fa -root GW2.root -tree GW2_sort.bam
./cnvnator -genome Canis_familiaris.CanFam3.1.71.dna_rm.toplevel.fa -root GW2.root -his 1000 -d genome_split/
./cnvnator -root GW2.root -stat 1000
./cnvnator -root GW2.root -partition 1000
./cnvnator -root GW2.root -call 1000 >GW2_result

I got a result file (GW2_result), converted it to VCF format using cnvnator2VCF.pl, and obtained the GW2_result_vcf file. The result looks somewhat weird to me (I am new to genome CNV analysis), because I find so many large duplications and indels in the genome. I think the result file needs some filtering, but I do not know how to filter it, and I could not find any filtering information or standards by searching Google. Can you help me? Thank you very much! One of my results is attached; please check it.

A:
Thanks for your interest in CNVnator.
I am not sure what you mean by many. How many?
Perhaps some of those calls are gaps in the reference genome, while the duplications are located around those gaps.
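One way to check this suggestion is to flag calls that overlap or lie next to gaps in the reference. The sketch below is an assumption-laden illustration and not part of CNVnator: it treats runs of N in the reference sequence as gaps, and it assumes each call line starts with the CNV type followed by a chr:start-end coordinate field. The function names and the flank parameter are hypothetical.

```python
import re


def gap_intervals(sequence, min_len=1):
    """Return 0-based, half-open (start, end) runs of N in a sequence."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_len]


def parse_call(line):
    """Split a call line into (cnv_type, chrom, start, end).

    Assumes the first two whitespace-separated fields are the CNV type
    and a chr:start-end coordinate string (1-based, inclusive).
    """
    fields = line.split()
    chrom, span = fields[1].rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return fields[0], chrom, start, end


def near_gap(start, end, gaps, flank=0):
    """True if the 1-based inclusive call [start, end] overlaps a gap,
    with each gap extended by `flank` bases on both sides."""
    return any(start - 1 < g_end + flank and end > g_start - flank
               for g_start, g_end in gaps)


sequence = "ACGT" * 5 + "N" * 10 + "ACGT" * 5  # toy reference with one gap
gaps = gap_intervals(sequence)
for line in ["duplication chr1:15-25 11 2.1", "deletion chr1:35-45 11 0.1"]:
    cnv_type, chrom, start, end = parse_call(line)
    print(cnv_type, chrom, start, end, near_gap(start, end, gaps, flank=5))
```

Calls flagged this way are candidates for inspection, not automatically false positives.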

Using CNVnator

Q:
CNVnator appears to be very popular software, but there seems to be no official guide or directions on how to get started with it. Could you be kind enough to provide me with one, please? Also, does your license allow commercial services to be provided based on your program?

A:
Please download the software and read the README file.

Alex Abyzov

Information in .root file

Q:
Using CNVnator, I managed to create the .root file, but I can’t go any further from there: when I try to create the histograms, it seems to be working, but it never creates any files after it’s done.
A:
New information is added to the .root file you provided on the command line.
During the next calculation step, CNVnator will extract this information from the file.

To browse the content of the .root file you can start ROOT and open browser (type “new TBrowser”).

Please see http://root.cern.ch for details.

CNVnator license

Q:
Does your license allow commercial services to be provided based on your program?

A:
Commercial services can use CNVnator for free, provided that the original software/developers/paper are credited/cited.

Alex Abyzov

***********************************************************
Department of Molecular Biophysics and Biochemistry,
Yale University, 260 Whitney ave., P.O. Box 208114,
New Haven, CT, 06520, USA
Phone: 1-(203)-432-5405
e-mail: abyzov@gersteinlab.org
URL: http://homes.gersteinlab.org/people/aabyzov
***********************************************************