I read your excellent BreakSeq paper, "Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library", and now I have some whole-genome sequencing data to analyze. The breakpoint library you use (http://sv.gersteinlab.org/breakseq/) is based on human genome NCBI build 36, but I now use NCBI build 37. Should I lift over the coordinates to NCBI build 37, or should I first realign the junction sequences to NCBI build 37 myself? Or is there a pre-compiled breakpoint junction library for NCBI build 37? Also, do you have any suggestions for adding the SVs identified by the 1000 Genomes Project to the breakpoint junction library?
There are two sets of SV breakpoints that should be relevant to you:
The published 1000 Genomes pilot data in Mills et al., Nature 2010: http://www.nature.com/nature/journal/v470/n7332/extref/nature09708-s9.xls
The 1000 Genomes phase I data that is going to be published soon: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets/
The published pilot data is on NCBI build 36. Using liftover to convert the genomic coordinates to NCBI build 37 should suffice. You might want to double-check that the SV sizes and the junction sequences are consistent before and after the liftover.
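One way to do that check, sketched under some assumptions: the build 36 and build 37 coordinates of each SV are already parsed into memory (for example from the liftover input and output files), and, where available, the junction sequence has been extracted from each build's reference. The `SV` record and the `flag_inconsistent_svs` function are hypothetical names for illustration, not part of BreakSeq.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SV:
    """Hypothetical SV record: chromosome, 1-based inclusive start/end,
    and optionally the junction sequence taken from that build's reference."""
    chrom: str
    start: int
    end: int
    junction: Optional[str] = None

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def flag_inconsistent_svs(b36: Dict[str, SV], b37: Dict[str, SV]) -> List[str]:
    """Return IDs of SVs that failed to lift over, changed size,
    or whose junction sequence differs between the two builds."""
    flagged = []
    for sv_id, old in b36.items():
        new = b37.get(sv_id)
        if new is None:
            # No build 37 record: the SV did not lift over.
            flagged.append(sv_id)
            continue
        if old.size != new.size:
            # Size changed, so the region was likely split or altered between builds.
            flagged.append(sv_id)
            continue
        if old.junction is not None and new.junction is not None:
            # Compare case-insensitively, since soft-masked bases are often lowercase.
            if old.junction.upper() != new.junction.upper():
                flagged.append(sv_id)
    return flagged


b36 = {"sv1": SV("chr1", 1000, 1499, "ACGT"),
       "sv2": SV("chr2", 5000, 5999),
       "sv3": SV("chr3", 100, 199)}
b37 = {"sv1": SV("chr1", 2000, 2499, "acgt"),
       "sv2": SV("chr2", 6000, 7100)}
print(flag_inconsistent_svs(b36, b37))  # ['sv2', 'sv3']
```

SVs that get flagged this way are the ones worth realigning or rechecking by hand instead of trusting the lifted coordinates.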
The phase I data is on NCBI build 37, so you can simply extract the junction sequences at the breakpoints and add them to the library.
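As a rough illustration of taking junction sequences at the breakpoints, here is a minimal sketch for a simple deletion. It assumes the build 37 chromosome sequence is already loaded as a string and the deletion is given in 0-based, half-open coordinates; converting from the call set's own coordinate convention is left out. The function name `deletion_junctions` and the `flank` length are hypothetical, and the exact flank length and library format that BreakSeq expects are not specified here.

```python
from typing import Dict


def deletion_junctions(ref: str, start: int, end: int, flank: int = 5) -> Dict[str, str]:
    """Junction sequences for a deletion of ref[start:end] (0-based, half-open).

    Returns the reference-allele sequences spanning each breakpoint and the
    deletion-allele junction formed by joining the left and right flanks.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("deletion coordinates out of range")
    left = ref[max(0, start - flank):start]   # sequence just before the deletion
    right = ref[end:end + flank]              # sequence just after the deletion
    return {
        "ref_left": ref[max(0, start - flank):start + flank],  # spans the left breakpoint
        "ref_right": ref[max(0, end - flank):end + flank],     # spans the right breakpoint
        "deletion": left + right,                              # new junction in the deleted allele
    }


ref = "AAAAACCCCCGGGGGTTTTT"
print(deletion_junctions(ref, 5, 15))
# {'ref_left': 'AAAAACCCCC', 'ref_right': 'GGGGGTTTTT', 'deletion': 'AAAAATTTTT'}
```

Insertions and other SV types would need their own junction construction (for example using the inserted sequence), which this sketch does not cover.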