Q1:
I’ve used your AlleleSeq package to get NA12878 diploid genome alignment. I have two questions:
1. How can I liftover from paternal/maternal alignments to hg19 coordinates? Based on the documents, the map files generated by vcf2diploid are used to convert from hg19 to pat/mat, not the other direction. Last year I found an R script "map.to.ref.R" from your website and used that to convert from pat/mat to hg19. However I cannot find it anymore. Do you have an updated script for this liftover? If "map.to.ref.R" is still valid to use, what reference should we cite in our manuscripts?
2. I noticed that both the "MergeBowtie.py" script in AlleleSeq and the R script "map.to.ref.R" assume bowtie format in input alignment files. Do you have upgraded versions that are compatible with SAM format?
A1:
1. We use the chainSwap tool (http://hgdownload.soe.ucsc.edu/admin/exe/) to flip the maternal.chain and paternal.chain files (generated by vcf2diploid) to the other direction. Then a .bed file in the parental coordinates can be converted into a .bed file in the reference coordinates using the UCSC liftOver tool and the swapped chain files.
This approach is straightforward for any arbitrary interval, but if you are converting alignments, it may require additional scripting or using other tools. We’ve received a few issues/questions with the script you are mentioning and since we didn’t develop it and aren’t currently planning to maintain, we removed the link from the web-site for now.
2. We do not have updated released versions yet, but this is one of the things that will be introduced in future versions and thus I might be able to help. What exactly are you interested in: is it just compatibility with SAM-formatted bowtie1 (i.e. ungapped alignments) output or are you trying to use another aligner?
Q2:
Thanks for your detailed answers. They are very helpful.
We used bowtie2 or tophat2 as the aligners for DNA or RNA.
Currently we hope to liftover the pat/may alignments to hg19 for downstream analyses (e.g. HiC or DEG). We can’t compare results if they are not liftovered to the same reference.
Could I take the mapped position in sam, convert it to bed, liftover to hg19, and then chip this new position back to the sam? Do you have a better solution?
A2:
The start coordinates of the reads can be transferred this way, but I don’t think inserting them back to the sam file will produce a correct sam/bam file overall: some of the other fields (CIGAR string, mismatches, etc) will need to be adjusted as well. Depending on the further analysis this may be an issue. CrossMap http://crossmap.sourceforge.net/ seems to work with sam/bam and chain files. I have never used it though.
Maybe it is easier to do the analysis on the parental alignments? Say, for DEG in order to generate read counts table, one might consider transferring the annotation to mat and pat coordinates. Then for every exon/gene extract the reads mapped to it in both alignments and use the number of unique read-ids as the gene read count.
Where does the merge script come in your approach? Do you want to merge the alignments before transferring them to hg19 and then do further analyses (which I am guessing, do not involve looking into allele-specific counts)?