Using the current 1000 Genomes reference (Phase4 reference) with BreakSeq2 to perform SV calling

Q:
We have started a cloud-based cancer SV calling project and would like to use BreakSeq2 to perform SV calling with the current 1000 Genomes reference (Phase4 reference). Because BreakSeq2 relies on the coordinates in the breakpoint library GFF, we were hoping to obtain either an updated breakpoint library or some advice on the feasibility of using coordinate liftover (via the available hg19-to-hg38 UCSC chain files) to update the coordinates in the GFF in the latest library hosted on your lab website at:

http://sv.gersteinlab.org/phase1bkpts/breakseq2_bplib_20150129.zip

We are under a time constraint with regard to the cloud compute funding, so we would be very grateful if you could reply soon.

A:
I think the best option right now would be to lift over the coordinates to hg38. Both the GFF and the INS files need to be lifted over (you can use CrossMap, which supports GFF). After the liftover, check that the SV lengths were preserved; it might be good to ignore SVs whose lengths changed after the liftover. Note that for the INS file, you will need to write a script to lift over the coordinates in the read name. You can check out the example on the BreakSeq2 page (http://bioinform.github.io/breakseq2/) for how to run from GFF (you will need both the GFF and the INS file). Hope that helps.
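
As a rough illustration of the length check mentioned above, here is a minimal Python sketch. It is not part of BreakSeq2 or CrossMap. It assumes both files are standard 9-column tab-separated GFF, that the attribute column (column 9) is unique per record and passes through the liftover unchanged, and that SV length is simply end - start + 1. The function names are hypothetical. Only lengths are compared, since chromosome names may legitimately differ between builds (for example "1" versus "chr1").

```python
def read_gff_lengths(lines):
    """Map each record's attribute string (column 9) to its length.

    Assumption: column 9 uniquely identifies a record and is unchanged
    by the liftover, so it can be used to pair old and new records.
    """
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        start, end, attrs = int(fields[3]), int(fields[4]), fields[8]
        lengths[attrs] = end - start + 1  # GFF coordinates are 1-based, inclusive
    return lengths


def consistent_records(original_lines, lifted_lines):
    """Return the attribute keys of lifted records whose length is unchanged."""
    before = read_gff_lengths(original_lines)
    after = read_gff_lengths(lifted_lines)
    return [key for key, length in after.items()
            if key in before and before[key] == length]


def filter_lifted_gff(original_path, lifted_path, output_path):
    """Write only the length-preserving lifted records to output_path."""
    with open(original_path) as orig, open(lifted_path) as lifted:
        original_lines = orig.readlines()
        lifted_lines = lifted.readlines()
    keep = set(consistent_records(original_lines, lifted_lines))
    with open(output_path, "w") as out:
        for line in lifted_lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9 and fields[8] in keep:
                out.write(line)
```

Records that failed to lift over at all are dropped automatically, because they never appear in the lifted file. The same set of surviving identifiers could then be used to decide which entries to keep in the INS file, though how those entries are keyed depends on the INS format and is not covered here.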
