Inferring the ancestral state of inversions using BreakSeq

Q:
I am writing to you because I am working with the BreakSeq software, which
was developed by your team, and I am having some trouble.

Our group's work focuses on inversions, and we have recently started using
BreakSeq to annotate breakpoint features. BreakSeq seems to work fine for all
its steps except inferring the ancestral state of the inversions. I have
successfully installed BLAT on our server and opened the server connection
for the three primate genomes. I also updated the paths in the BreakSeq
configuration file so that BreakSeq runs correctly.

However, when I check the ancestral state of some validated inversions from
our database (http://invfestdb.uab.cat), which we know have different
orientations (standard or inverted) in the primate genomes, BreakSeq
annotates ALL of them as Rect "0:0:0", which I understand means that the
inversion has the same orientation in all 3 primate genomes.

Here is an example of what I mean. If I run BreakSeq to annotate the
inversion HsInv0501 (http://invfestdb.uab.cat/report.php?q=533), whose
orientation is standard in chimpanzee but inverted in orangutan and macaque,
I would expect the output Rect "0:1:1". However, BreakSeq outputs Rect
"0:0:0".

In short, my main question is: can BreakSeq predict the ancestral state of
inversions? If it can, what do you think I am doing wrong that gives me
Rect "0:0:0" as the output every time?

I am attaching the GFF input file containing the inversions that have a
different orientation in at least one of the three primates, the BreakSeq
configuration file I am using, and the output folder from the BreakSeq run.

A:
BreakSeq was not initially intended to look at inversions, but I suspect it could be used for them with some modifications. Alternatively, you could reproduce the way BreakSeq interprets alignments to primate genomes to infer ancestry.
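As a rough illustration of the second option, the Python sketch below turns BLAT hits in PSL format into a per-primate orientation string. It is not BreakSeq's own logic. It assumes you have aligned two hypothetical query sequences per inversion to each primate genome, one from a flank outside the inversion and one from inside the inverted segment, and it scores matching best-hit strands as 0 (same orientation as human) and opposite strands as 1 (inverted), following the reading of Rect "0:1:1" given in the question. The query names, genome labels, and simplified hit score are all assumptions.

```python
# Hypothetical sketch, NOT BreakSeq's method: infer per-primate inversion
# orientation from BLAT PSL hits of a flank query and an inner query.
# Flag 0 = inner and flank hit the same strand (same orientation as human),
# flag 1 = opposite strands (inverted), "?" = a query has no hit.

def parse_psl_line(line):
    """Return (query_name, strand, score) for one PSL data line (21 fields)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL fields, got {len(f)}")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    # Simplified score (assumption); UCSC's pslScore also penalises gaps.
    score = matches + rep_matches - mismatches
    return f[9], f[8], score


def best_strands(psl_lines):
    """Map each query name to the strand of its highest-scoring hit."""
    best = {}
    for line in psl_lines:
        if not line.strip() or not line[0].isdigit():
            continue  # skip blank lines and psLayout header lines
        name, strand, score = parse_psl_line(line)
        if name not in best or score > best[name][1]:
            best[name] = (strand, score)
    return {name: strand for name, (strand, _) in best.items()}


def orientation_flags(psl_by_genome, genome_order, flank_query, inner_query):
    """Return a colon-separated flag string such as '0:1:1'."""
    flags = []
    for genome in genome_order:
        strands = best_strands(psl_by_genome.get(genome, []))
        flank, inner = strands.get(flank_query), strands.get(inner_query)
        if flank is None or inner is None:
            flags.append("?")
        else:
            flags.append("0" if flank == inner else "1")
    return ":".join(flags)
```

Comparing the inner strand with the flank strand, rather than using the inner strand alone, avoids misreading a region where a primate assembly happens to be oriented opposite to the human reference.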

Help regarding co-authorship network (PubNet)

Q1:
I am trying to compare my co-author network with the consortium networks submitted to PubNet.

Could you please provide the PMIDs you used for the individual centers?

It will be easier to compare using the same input you used.

Could you also provide PubNet access for queries with more than 20 papers?

A1:
I did the work on PubNet over 10 years ago as an undergraduate student, so I am no longer in the lab.

If you have additional questions, you should direct them to Mark Gerstein. I believe the TopNet tool can be accessed here: http://networks.gersteinlab.org/ (it seems to have been renamed TYNA).

Q2:
I am trying to access TopNet from PubNet, but it’s not working.

Could you provide a working link?

I will use the PMID information from the PubNet gallery section for each center.

If I get the same network properties as in the PubNet paper, then I can say that the co-author network I created is correct.

A2:
If you click on the text label (e.g. "NESG") it will show you PMIDs for the query:

http://pubnet.gersteinlab.org/cgi-bin/view.pl?id=050605171754

You can explore further in the “text view” link near the bottom.

http://pubnet.gersteinlab.org/cgi-bin/node.pl?id=050607174130
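Once the PMIDs for a center are in hand and their author lists have been retrieved (for example from PubMed), the co-author network can be rebuilt and summarized with a few standard properties. The Python sketch below is an illustration under assumptions, not PubNet's code: authors are matched by exact name string, every pair of co-authors on a paper gets one unweighted edge, and nodes with fewer than two neighbours contribute a clustering coefficient of 0. Check the definitions used in the PubNet paper before comparing numbers.

```python
# Illustrative sketch (not PubNet's implementation): build an undirected
# co-author graph from per-paper author lists and summarise it.
from itertools import combinations


def coauthor_graph(author_lists):
    """Return adjacency sets: authors are nodes, shared papers create edges."""
    adj = {}
    for authors in author_lists:
        unique = sorted(set(authors))
        for a in unique:
            adj.setdefault(a, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_properties(adj):
    """Node count, edge count, average degree and average clustering."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    clustering = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # convention: degree < 2 counts as 0
            continue
        # Count links between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering.append(2.0 * links / (k * (k - 1)))
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2.0 * m / n if n else 0.0,
        "avg_clustering": sum(clustering) / n if n else 0.0,
    }
```

For example, two papers with authors [A, B, C] and [A, D] give 4 nodes, 4 edges and an average degree of 2.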

Funseq2 output: missing variants

Q:
We are trying to use the Funseq2 scores (running Funseq2 locally).
However, we would like to have a score for each variant in the
input VCF, and that is not the case in Output.vcf.
Can I conclude from this output that the variants missing from
Output.vcf have a score of zero?

A:
Somatic variants that overlap 1000 Genomes variants are filtered out.
Those are probably the variants missing from your output VCF.
You can check one or two manually to confirm this.
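To pick out variants to check by hand, a short Python sketch can list the input-VCF records that are absent from Output.vcf; those can then be looked up in 1000 Genomes. It matches variants on CHROM, POS, REF and ALT only and strips a leading "chr" so that "chr1" and "1" compare equal. Whether FunSeq2 rewrites any of these fields in its output is an assumption to verify on your own files, and the function names are hypothetical.

```python
# Hypothetical helper: list input-VCF variants missing from Output.vcf.
# Matching key (assumption): (CHROM without "chr", POS, REF, ALT).


def variant_keys(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        keys.add((chrom, int(pos), ref.upper(), alt.upper()))
    return keys


def missing_variants(input_lines, output_lines):
    """Sorted keys present in the input VCF but absent from the output VCF."""
    return sorted(variant_keys(input_lines) - variant_keys(output_lines))
```

Typical use is to pass two open file handles, for example the input VCF and Output.vcf opened in text mode.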

7K ncRNA gene set

Q:
We currently have in WormBase the ‘7K’ set of ncRNA genes as described in
the 2011 Integrative analysis modENCODE paper.

We have been looking at the new ENCODE/modENCODE Comparative analysis paper
in Nature.
This paper describes the supervised prediction of a set of ncRNA genes that
do not overlap existing genes.
It is not obvious where to get details of these predicted genes.

Is there a file of chromosomal locations of these genes that we can have?

Are these predicted ncRNA genes suitable for replacing the old ‘7K’ set of
ncRNA genes?

A:
Hi, yes, you can get these from encodeproject.org/comparative. I do
think these can supplement the 7K set.

I’d use the new set at encodeproject.org if you want a smaller, higher-quality and more conservative set than the one in the ’10 paper. -marK
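If the new predictions are used to supplement rather than replace the 7K set, one simple approach is to keep only the new genes that do not overlap an existing 7K gene. The Python sketch below assumes both sets are available as BED-style files with 0-based, half-open coordinates and identical chromosome names; the function names are hypothetical. The pairwise scan is quadratic per chromosome, so for large sets a sorted sweep or bedtools intersect -v is the faster choice.

```python
# Hypothetical sketch: keep new ncRNA predictions that do not overlap the
# existing set. Intervals are (chrom, start, end), BED-style half-open.


def parse_bed(lines):
    """Read the first three BED columns, skipping comments and track lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a, b):
    """Half-open intervals overlap if on the same chromosome and they intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def supplement(old_set, new_set):
    """Return intervals from new_set that overlap nothing in old_set."""
    by_chrom = {}
    for iv in old_set:
        by_chrom.setdefault(iv[0], []).append(iv)
    return [iv for iv in new_set
            if not any(overlaps(iv, o) for o in by_chrom.get(iv[0], []))]
```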

Pseudogene identification pipeline for bacterial genome

Q:
I am writing to you regarding pseudogene detection in bacterial genomes. Is there a software tool or pipeline I could use to identify pseudogenes in a bacterial genome?

A:
The best way is to use our pseudogene annotation pipeline – Pseudopipe. You can download the stand-alone version that can be easily run on your computer and does not require a cluster:
http://pseudogene.org/pseudopipe/

Pseudogene talk at ASHG

Q:
I recently attended the ASHG conference, where you gave a talk on pseudogene copy-number variation based on the 1000 Genomes Project. I looked for this study online but didn’t find anything that was obviously part of your presentation. Has this data already been published, and if so, could you let me know the name of the study?

A:
I think the studies you are looking for are:

http://www.pnas.org/content/111/37/13361.abstract
and
http://genome.cshlp.org/content/23/12/2042.full.pdf+html

The first is the latest paper from our lab on pseudogene analysis, and the second is a paper on CNVs and retroduplications based on 1000 Genomes Project data.

Size of SV in BreakSeq output

Q:
I have been using BreakSeq to identify SVs along with BreakDancer, CNVnator and Pindel. I was able to run BreakSeq and get SVs. However, while recently submitting data to dbVar, I learned that I should also provide the size of each SV. Since the BreakSeq output does not report the size of each SV, it has become a bit difficult to provide size information to dbVar. The output does include POS and END positions, though. Can I use the difference between POS and END as the size of the SV?

A:
For deletions, you can use POS and END to get the size. For insertions, the current version does not report the size. We are planning a next version that should include it. If you need it now, you can get the size from the insertion FASTA distributed with BreakSeq.
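A minimal Python sketch of both cases follows. For deletions it computes END - POS from the VCF record; under the VCF convention for symbolic alleles, POS is the padding base just before the deleted sequence, so END - POS equals the number of deleted bases (if your records instead put POS on the first deleted base, use END - POS + 1). For insertions it measures sequence lengths in a FASTA file; how those record names map onto your insertion calls is an assumption to check against the files distributed with BreakSeq. The function names are hypothetical.

```python
# Hypothetical helpers for dbVar size reporting from BreakSeq output.


def info_field(info, key):
    """Return the value of KEY=value in a VCF INFO string, or None."""
    for item in info.split(";"):
        if item.startswith(key + "="):
            return item.split("=", 1)[1]
    return None


def deletion_size(vcf_line):
    """Deletion size as END - POS (assumes POS is the padding base)."""
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])
    end = info_field(fields[7], "END")
    if end is None:
        raise ValueError("record has no END in its INFO field")
    return int(end) - pos


def fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its length."""
    lengths, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths
```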

Control parameters for the annealing process in the OrthoClust R package

Q:
I have recently been trying to use the OrthoClust R package for multiple-species network clustering. I could not find the control parameters for the annealing process described in your paper: "Standard simulated annealing was employed. Spin values were randomly assigned initially, and updated via a heat bath algorithm. The initial temperature was chosen in a way such that the flipping rate (the probability that a node changes its spin state) was higher than 1 – 1/q. The temperature was gradually decreased with a cooling factor 0.9, until the flipping rate was less than 1%." I also could not find the simulated annealing algorithm in the MATLAB file OrthoClustN.m, which appears to use a greedy algorithm instead. Could you please help me with this? Thank you for your time.

A:
The annealing procedure is very slow for practical problems. During the revision of our manuscript, we switched to a greedy algorithm (the Louvain algorithm), wrapped it up in the MATLAB code, and also implemented it in R. It’s a very well-regarded algorithm, and we strongly encourage you to try the MATLAB code for your purpose.
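To make the schedule quoted in the question concrete, here is a small Python sketch of heat-bath simulated annealing: random initial spins, heat-bath updates, a starting temperature raised until the flipping rate approaches 1 - 1/q, and cooling by a factor of 0.9 until fewer than 1% of spins flip per sweep. Two parts are assumptions. The energy is a generic Potts modularity energy (Reichardt-Bornholdt, resolution gamma) standing in for the OrthoClust Hamiltonian, which additionally couples orthologous nodes across species. And because a heat-bath update at infinite temperature flips a spin with probability exactly 1 - 1/q on average, the sketch heats until the rate reaches 90% of that value rather than exceeding it.

```python
# Illustrative heat-bath annealing sketch; the energy function is a stand-in
# (Potts/modularity), not OrthoClust's Hamiltonian.
import math
import random


def anneal(adj, q, gamma=1.0, cooling=0.9, stop_rate=0.01, seed=0):
    """adj: dict node -> set of neighbours (non-empty, symmetric).

    Returns (spins, temperatures), one temperature per cooling stage.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    deg = {v: len(adj[v]) for v in nodes}
    two_m = sum(deg.values()) or 1
    spins = {v: rng.randrange(q) for v in nodes}  # random initial spins
    state_deg = [0] * q  # total degree of nodes currently in each state
    for v in nodes:
        state_deg[spins[v]] += deg[v]

    def sweep(temp):
        """Update every spin once; return the fraction that changed."""
        flips = 0
        for v in nodes:
            old = spins[v]
            state_deg[old] -= deg[v]  # take v out of its current state
            links = [0] * q  # neighbours of v in each state
            for u in adj[v]:
                links[spins[u]] += 1
            # Energy of v in state s: -(links_s - gamma * k_v * K_s / 2m)
            energy = [-(links[s] - gamma * deg[v] * state_deg[s] / two_m)
                      for s in range(q)]
            e_min = min(energy)
            # Heat bath: sample the new state with Boltzmann weights.
            weights = [math.exp(-(e - e_min) / temp) for e in energy]
            new = rng.choices(range(q), weights=weights)[0]
            state_deg[new] += deg[v]
            spins[v] = new
            flips += new != old
        return flips / len(nodes)

    # Heat until the flipping rate nears its infinite-T limit 1 - 1/q
    # (relaxed to 90%, since heat-bath updates only reach it on average).
    temp, target = 1.0, 0.9 * (1 - 1 / q)
    while sweep(temp) < target and temp < 1e6:
        temp *= 2
    temps = [temp]
    # Cool geometrically until fewer than stop_rate of the spins flip.
    while sweep(temp) >= stop_rate and temp > 1e-9:
        temp *= cooling
        temps.append(temp)
    return spins, temps
```

Each temperature stage costs a full sweep over all nodes and edges, and many stages are needed, which illustrates why a greedy method such as Louvain is much faster in practice.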

FunSeq2 data context download problem

Q:
After reading your recent paper about the FunSeq2 tool, which is very nice, I was interested in taking a closer look at your data. Unfortunately, I don’t seem to be able to download the data context from http://funseq2.gersteinlab.org/data/ . The server always drops the connection after I have downloaded about 1 GB of the compressed file, and I also can’t access some of the individual files at all, e.g. human_ancestor_GRCh37_e59.fa . Could you help me solve this problem?

A:
We have added an alternative download link. In addition to http://funseq2.gersteinlab.org/data/, you can now download the files from: http://archive.gersteinlab.org/funseq2_data/
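If the connection still drops on large files, one option is a download that resumes from where it stopped. The Python sketch below assumes the server supports HTTP Range requests (answering 206 Partial Content); if it answers 200 instead, the file is restarted from scratch. The function names are hypothetical, and command-line tools such as wget -c or curl -C - do the same job.

```python
# Hypothetical resumable downloader using only the standard library.
import http.client
import os
import urllib.error
import urllib.request


def range_header(bytes_on_disk):
    """Request headers asking only for the bytes not yet saved."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def download_resumable(url, dest, attempts=10, chunk=1 << 20):
    """Download url to dest, resuming after dropped connections."""
    for _ in range(attempts):
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        req = urllib.request.Request(url, headers=range_header(have))
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                # 206 = server resumed at our offset; 200 = full file again.
                mode = "ab" if resp.status == 206 else "wb"
                expected = resp.headers.get("Content-Length")
                written = 0
                with open(dest, mode) as out:
                    while True:
                        block = resp.read(chunk)
                        if not block:
                            break
                        out.write(block)
                        written += len(block)
                if expected is None or written == int(expected):
                    return dest
                # Short read: loop again and resume from the new file size.
        except urllib.error.HTTPError as err:
            if err.code == 416:  # range not satisfiable: assume complete
                return dest
            raise
        except (OSError, http.client.HTTPException):
            pass  # connection dropped: retry from the new file size
    raise RuntimeError(f"giving up on {url} after {attempts} attempts")
```

The Content-Length check matters because a connection closed early can end the read loop silently rather than raising an error.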