We have started a cloud-based cancer SV calling project and would like to use BreakSeq2 for SV calling against the current 1000 Genomes reference (Phase4 reference). Because BreakSeq2 relies on the coordinates in the breakpoint library GFF, we were hoping to obtain either an updated breakpoint library or some advice on the feasibility of using coordinate liftover (via the available hg19-to-hg38 UCSC chain files) to update the coordinates in the GFF in the latest library hosted on your lab website at:
We are under a time constraint with regard to the cloud compute funding, so we would be very grateful if you could reply soon.
I think the best option right now would be to lift over the coordinates to hg38. Both the GFF and the INS files need to be lifted over (you can use CrossMap, which supports GFF). After the liftover, check that the SV lengths were preserved; it is probably best to ignore any SV whose length changed during the liftover. Note that for the INS file, you will need to write a script to lift over the coordinates embedded in the read names. You can check out the example on the BreakSeq2 page (http://bioinform.github.io/breakseq2/) for how to run from GFF (you will need both the GFF and the INS file). Hope that helps.
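
The length check described above could look roughly like the following. This is only a minimal sketch, not part of BreakSeq2 or CrossMap: it assumes a tab-separated, nine-column GFF in which the attribute column (column 9) is unique for each SV and is carried through the liftover unchanged, so it can be used to match records. The function names `parse_gff` and `filter_length_preserving` are made up for illustration.

```python
def parse_gff(lines):
    """Map each record's attribute column (col 9) to its (start, end)."""
    records = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        # Assumption: column 9 uniquely identifies the SV.
        records[fields[8]] = (int(fields[3]), int(fields[4]))
    return records


def filter_length_preserving(original_lines, lifted_lines):
    """Keep lifted records whose span equals the span in the original GFF.

    Returns (kept_lines, changed_keys). Records that failed to lift over
    are simply absent from lifted_lines and so never appear in kept_lines.
    """
    original = parse_gff(original_lines)
    kept, changed = [], []
    for line in lifted_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        key = fields[8]
        if key not in original:
            continue  # cannot be matched back to an original record
        old_start, old_end = original[key]
        new_start, new_end = int(fields[3]), int(fields[4])
        if new_end - new_start == old_end - old_start:
            kept.append(line)
        else:
            changed.append(key)
    return kept, changed
```

You would read the hg19 GFF and the lifted hg38 GFF into lists of lines, pass both to `filter_length_preserving`, and write only the kept lines to the GFF given to BreakSeq2. The same set of dropped SVs should then also be excluded from the INS file so the two stay consistent.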