I would like to use AlleleSeq with exome data but am unsure how to generate the .cnv file. If I supply a BAM from my exome sequencing experiment, won’t CNVnator report a very low average coverage, since it examines coverage across the whole genome? That would mean my exonic variants get excluded because AlleleSeq interprets them as lying in a CNV. I assume I have to supply a FASTA containing only the target regions of the exome enrichment. Is this true?
I haven’t used AlleleSeq with exome data (I will ask around the lab and will probably write to you again next week), but I would try the latest version of the pipeline. The 0.2.6a version of Personal Genome constructor (http://alleleseq.gersteinlab.org/tools.html) doesn’t use CNVnator; instead, the pipeline uses bedtools to calculate the median read depth across ±1000 bp regions around SNPs. So if all SNPs are exonic, the read depth around each SNP will be compared to the median value across all the others. This might bias SNPs located close to exon/intron junctions; if that happens, I would consider decreasing the window size.
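To make the window idea concrete, here is a rough Python sketch of one way to read that description. It is not the pipeline's actual code: the function names, the per-base depth dictionary, and the choice to divide each window median by the median across all SNPs are assumptions made purely for illustration.

```python
from statistics import median


def window_bounds(pos, half_width=1000):
    """Return the 1-based inclusive window [start, end] around a SNP position."""
    return max(1, pos - half_width), pos + half_width


def window_median_depth(per_base_depth, pos, half_width=1000):
    """Median depth over the window around pos.

    per_base_depth is a hypothetical {position: depth} dict for one chromosome;
    positions missing from it are counted as depth 0.
    """
    start, end = window_bounds(pos, half_width)
    return median(per_base_depth.get(p, 0) for p in range(start, end + 1))


def relative_depths(window_medians):
    """Divide each SNP's window median by the median across all SNPs (assumed normalisation)."""
    overall = median(window_medians.values())
    if overall == 0:
        raise ValueError("overall median depth is zero; cannot normalise")
    return {snp: depth / overall for snp, depth in window_medians.items()}
```

With exome data, a window that extends past the exon boundary picks up many near-zero positions, which pulls the window median down. That is the junction bias mentioned above, and a smaller `half_width` reduces it.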
…We discussed your question in the lab and believe it is better not to perform the CNV identification step at all: there should not be many CNVs in the exome, and skipping the step also avoids the edge effect. However, the latest version of the AlleleSeq pipeline still requires a .cnv file to run. So the easiest solution is to create a .cnv file that lists all the SNPs (as in the .snp file) with rd=1.0:
chrm snppos rd
1 52066 1.0
1 695745 1.0
1 742429 1.0
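
If it helps, a file like the one above could be generated with a few lines of Python. This is only a sketch under some assumptions: that the first two whitespace-separated columns of each .snp record are the chromosome and the position, that header or comment lines can be recognised because their second column is not a number, and that a tab is an acceptable column separator (pass sep=" " otherwise). The function name snp_to_cnv and the demo rows are made up for illustration.

```python
import io


def snp_to_cnv(snp_lines, out, rd=1.0, sep="\t"):
    """Write a 'chrm snppos rd' row with a fixed rd for every SNP record.

    Returns the number of SNP rows written.
    """
    out.write(sep.join(["chrm", "snppos", "rd"]) + "\n")
    written = 0
    for line in snp_lines:
        fields = line.split()
        # Skip blank lines, comments, and header rows (second column not numeric).
        if len(fields) < 2 or line.startswith("#") or not fields[1].isdigit():
            continue
        out.write(sep.join([fields[0], fields[1], str(rd)]) + "\n")
        written += 1
    return written


# Small in-memory demo; the columns after the position are hypothetical.
demo_snp = ["1\t52066\tT\tCC\tCT\n", "1\t695745\tG\tAG\tGG\n"]
demo_out = io.StringIO()
snp_to_cnv(demo_snp, demo_out)
print(demo_out.getvalue(), end="")
```

For real data, open the .snp file for reading and the .cnv file for writing and pass both file objects to snp_to_cnv in place of the in-memory demo.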