1000G enquiry – Breakpoints File Interpretation

I’m trying to interpret your breakpoints file at

Is this file the same as Supplementary Table 3 in the SV map paper?

Yes, they are the same.

What VCF should be used to interpret this file? I’m having difficulty
finding a VCF that has all the IDs accounted for.

Does the breakpoints file contain information that is meant to
override that in the VCF? So if the VCF and the breakpoints file
disagree on the position of a variant, the breakpoints file should be
considered correct?

The SV events in the VCF file are the SVs that remain after taking the union of the calls, among other steps. The breakpoints file contains only the SVs that each variant caller identified at breakpoint-level resolution. Neither file overrides the other; they should be treated as separate datasets. The breakpoints file can be regarded as giving more detailed information about the SV regions in the union call file.

It looks like the breakpoints file contains an INSSEQ column, giving
(anchored) sequences that are inserted at the same time as deletion
events. That makes the deletion into a substitution of the shorter
sequence for the longer sequence, right?

Yes. These deletions mostly contain micro-insertions (1-20 bp) at the deletion site.
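
To make the substitution reading concrete, here is a minimal Python sketch that turns a deletion with an INSSEQ value into VCF-style REF and ALT alleles. It assumes 1-based, inclusive coordinates for the deleted interval, which should be checked against the breakpoints file; the function name and the toy reference sequence are hypothetical.

```python
def deletion_with_insertion_to_alleles(ref_seq, del_start, del_end, ins_seq):
    """Return (pos, ref_allele, alt_allele) for a deletion that carries an
    inserted sequence, written as a substitution.

    Assumption: del_start and del_end are 1-based and inclusive, and
    ref_seq starts at position 1 of the contig.
    """
    if del_start < 2:
        raise ValueError("need one base before the deletion to act as anchor")
    if del_end < del_start:
        raise ValueError("del_end must not be smaller than del_start")

    anchor_pos = del_start - 1              # base just before the deleted bases
    anchor = ref_seq[anchor_pos - 1]        # convert to a 0-based index
    deleted = ref_seq[del_start - 1:del_end]

    # REF is the anchor plus the deleted bases; ALT is the anchor plus INSSEQ.
    return anchor_pos, anchor + deleted, anchor + ins_seq


# Toy example: delete positions 4-7 ("TACG") and insert "GG" in their place.
pos, ref_allele, alt_allele = deletion_with_insertion_to_alleles(
    "ACGTACGTAC", 4, 7, "GG"
)
print(pos, ref_allele, alt_allele)
```

With an empty INSSEQ the same function yields an ordinary deletion record, so both cases can share one code path.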

It would be ideal for my application if I could get a VCF containing
the information from this file. Is that already available? Have the
more precise breakpoint calls been rolled into e.g.
already? If not, do you have advice on how to cram this information
into a VCF while preserving its semantics?

I am not aware of a version of the breakpoints file in VCF format. You could start by including just the chromosome, start, end and type information.
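
As a rough sketch of that starting point, the Python below writes one VCF line per event using a symbolic ALT allele together with the SVTYPE and END INFO keys from the VCF specification. The function name is hypothetical, and the code assumes that the start value is the 1-based position of the first affected base, so that POS becomes the base before it; that assumption needs to be verified against the breakpoints file. REF is written as N because no reference sequence is looked up here.

```python
VCF_HEADER = "\n".join([
    "##fileformat=VCFv4.2",
    '##ALT=<ID=DEL,Description="Deletion">',
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
])


def breakpoint_to_vcf_line(chrom, start, end, sv_type, sv_id="."):
    """Build one tab-separated VCF data line from chromosome, start, end and type.

    Assumption: start is the 1-based first affected base, so POS is start - 1
    (the base preceding the event, as used with symbolic alleles).
    """
    pos = start - 1
    info = f"SVTYPE={sv_type};END={end}"
    fields = [chrom, str(pos), sv_id, "N", f"<{sv_type}>", ".", ".", info]
    return "\t".join(fields)


# Hypothetical event, not taken from the breakpoints file.
print(VCF_HEADER)
print(breakpoint_to_vcf_line("1", 1000, 1500, "DEL", "example_sv"))
```

Events with an INSSEQ value do not fit a purely symbolic allele, so they would need either explicit REF and ALT sequences or an extra INFO field defined in the header.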

