Help regarding co-authorship network (PubNet)

Q1:
I am trying to compare my co-authorship network with the consortium networks submitted to PubNet.

Could you please provide the PMIDs for the individual centers you used?

It would be easier to compare if I used the same input you provided.

Could you also provide PubNet access for queries with more than 20 papers?

A1:
Actually, I did the work on PubNet over 10 years ago as an undergraduate student, so I am no longer in the lab.

If you have additional questions, you should direct them to Mark Gerstein. I believe the TopNet tool can be accessed here: http://networks.gersteinlab.org/ (it appears to have been renamed TYNA).

Q2:
I am trying to access TopNet from PubNet, but it's not working.

Could you provide a working link for it?

I will use the PMID information from the PubNet gallery section for each center.

If I get the same network properties as in the PubNet paper, then I can confirm that the co-authorship network I created is correct.

A2:
If you click on the text label (e.g. "NESG") it will show you PMIDs for the query:

http://pubnet.gersteinlab.org/cgi-bin/view.pl?id=050605171754

You can explore further in the “text view” link near the bottom.

http://pubnet.gersteinlab.org/cgi-bin/node.pl?id=050607174130

Funseq2 output: missing variants

Q:
We are trying to use the scores from Funseq2 (running locally).
However, we would like a score for each variant in the
input VCF, and this is not the case when we look at Output.vcf.
Can I conclude from this output that the missing variants in
Output.vcf have a score of zero?

A:
Somatic variants that overlap 1000 Genomes variants are filtered out.
Those are most likely the variants removed from your output VCF.
You can check one or two manually to confirm this.
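To check this systematically rather than one variant at a time, you can diff the two files on (CHROM, POS, REF, ALT). This is a minimal sketch (not part of Funseq2 itself) that assumes plain, uncompressed VCFs:

```python
def vcf_keys(path):
    """Collect (CHROM, POS, REF, ALT) keys from a VCF file."""
    keys = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            f = line.rstrip("\n").split("\t")
            keys.add((f[0], f[1], f[3], f[4]))
    return keys

def missing_variants(input_vcf, output_vcf):
    """Variants present in the input but absent from the output
    (candidates for the 1000 Genomes overlap filter)."""
    return sorted(vcf_keys(input_vcf) - vcf_keys(output_vcf))
```

The dropped variants can then be looked up against 1000 Genomes sites to confirm they overlap known polymorphisms.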

7K ncRNA gene set

Q:
We currently have in WormBase the ‘7K’ set of ncRNA genes as described in
the 2011 Integrative analysis modENCODE paper.

We have been looking at the new ENCODE/modENCODE Comparative analysis paper
in Nature.
This paper describes the supervised prediction of a set of ncRNA genes that
do not overlap existing genes.
It is not obvious where to get details of these predicted genes.

Is there a file of chromosomal locations of these genes that we can have?

Are these predicted ncRNA genes suitable for replacing the old ‘7K’ set of
ncRNA genes?

A:
Hi, yes, you can get these from encodeproject.org/comparative. I do
think these can supplement the 7K set.

I’d use the new set at encodeproject.org; it is a smaller, higher-quality, and more conservative set than the one in the ’10 paper. -Mark

Pseudogene identification pipeline for bacterial genome

Q:
I am writing to you regarding pseudogene detection within bacterial genomes. I was wondering whether there is a software pipeline I could use to identify pseudogenes within a bacterial genome.

A:
The best way is to use our pseudogene annotation pipeline – Pseudopipe. You can download the stand-alone version that can be easily run on your computer and does not require a cluster:
http://pseudogene.org/pseudopipe/

Pseudogene talk at ASHG

Q:
I recently attended the ASHG conference, where you gave a talk on pseudogene copy number variation based on the 1000 Genomes Project. I tried looking for this study online and didn’t find anything that was obviously part of your presentation. I was wondering whether this data has already been published and, if so, whether you would let me know the name of the study.

A:
I think the studies you are looking for are:

http://www.pnas.org/content/111/37/13361.abstract
and
http://genome.cshlp.org/content/23/12/2042.full.pdf+html

The first is the latest paper from our lab on pseudogene analysis and the second is a paper on CNVs and retroduplications based on 1000G project.

Size of SV in BreakSeq output

Q:
I have been using BreakSeq for SV identification, along with BreakDancer, CNVnator, and Pindel. I was able to run BreakSeq and get SVs. However, while recently submitting data to dbVar, I learned that I should also provide the size of each SV. Since the BreakSeq output does not report the size of each SV, it has become a bit difficult to provide this information to dbVar. However, I do find POS and END positions in the output. Can I take the difference between POS and END as the size of an SV?

A:
For deletions, you can use POS and END to compute the size. For insertions, the current version does not give you the size. We are planning a next version that should include sizes. If you need them now, you can get the insertion sizes from the insertion FASTA distributed along with BreakSeq.
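Both steps are simple to script. The sketch below is a hedged illustration, not BreakSeq code: the deletion size is just the POS/END span, and insertion sizes can be read off as sequence lengths from a FASTA file (the record IDs here are hypothetical; match them to the IDs in the FASTA shipped with BreakSeq):

```python
def deletion_size(pos, end):
    """SV size for a deletion: the span between POS and END."""
    return end - pos

def fasta_lengths(path):
    """Length of each sequence in a FASTA file, keyed by record ID
    (e.g. the insertion FASTA distributed with BreakSeq)."""
    lengths, name, n = {}, None, 0
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    lengths[name] = n
                name, n = line[1:].split()[0], 0  # ID is the first token
            else:
                n += len(line)
    if name is not None:
        lengths[name] = n
    return lengths
```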

Control parameters in the annealing process in the OrthoClust R package

Q:
Recently, I have been trying to use the OrthoClust R package for multi-species network clustering. I could not find the control parameters for the annealing process described in your paper: "Standard simulated annealing was employed. Spin values were randomly assigned initially, and updated via a heat bath algorithm. The initial temperature was chosen in a way such that the flipping rate (the probability that a node changes its spin state) was higher than 1 – 1/q. The temperature was gradually decreased with a cooling factor 0.9, until the flipping rate was less than 1%." I also could not find the simulated annealing algorithm in the Matlab file OrthoClustN.m (it appears to use a greedy algorithm instead). Please help me resolve this. Thank you for your time.

A:
The annealing procedure is very slow for practical problems. During the revision stage of our manuscript, we switched to the greedy algorithm (the Louvain algorithm), and therefore wrapped up the Matlab code and implemented it in R as well. It is a very well regarded algorithm, and we strongly encourage you to try the Matlab code for your purposes.
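For reference, the annealing procedure quoted in the question can be sketched as follows. This is a toy illustration, not the lab's code: the graph representation, the number of spin states q, and the fixed starting temperature are assumptions (the paper instead picks the initial temperature from the flipping rate):

```python
import math
import random

def potts_anneal(adj, q=4, t0=10.0, cooling=0.9, seed=0):
    """Toy heat-bath simulated annealing for Potts-model clustering.

    adj: dict mapping node -> set of neighbours (undirected graph).
    Spins start random; each sweep updates every node with a heat bath;
    the temperature is multiplied by 0.9 until the flip rate drops
    below 1%, following the procedure described in the paper.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    spin = {v: rng.randrange(q) for v in nodes}
    temp = t0  # assumed fixed start; the paper chooses it from the flip rate
    while True:
        flips = 0
        for v in nodes:
            # Heat bath: P(spin = s) proportional to exp(#same-spin neighbours / T)
            counts = [sum(1 for u in adj[v] if spin[u] == s) for s in range(q)]
            weights = [math.exp(c / temp) for c in counts]
            r = rng.random() * sum(weights)
            for s, w in enumerate(weights):
                r -= w
                if r <= 0:
                    break
            if s != spin[v]:
                flips += 1
                spin[v] = s
        if flips / len(nodes) < 0.01:  # stop once the flip rate is below 1%
            return spin
        temp *= cooling  # cooling factor 0.9
```

In practice the Louvain-style greedy optimization converges far faster on real networks, which is why the released code uses it instead.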

FunSeq2 data context download problem

Q:
After reading your recent paper about the FunSeq2 tool, which is very nice, I was interested in taking a closer look at your data. Unfortunately, it seems that I'm unable to download the data context from http://funseq2.gersteinlab.org/data/ . The server always drops the connection after I have downloaded about 1 GB of the compressed file, and I also can't access some of the individual files at all, e.g. human_ancestor_GRCh37_e59.fa . Could you help me solve this problem?

A:
We have added an alternative link for downloading the files. In addition to http://funseq2.gersteinlab.org/data/ , you can now download them from http://archive.gersteinlab.org/funseq2_data/ .

Bulk Tissue Deconv. Cell Fractions

Q:
I would like to apply the bulk-tissue deconvolution algorithm from your recent paper (Wang et al., 2018) using our own single-cell RNA-Seq data and the bulk-tissue RNA-Seq from Gandal et al., 2018. I couldn’t find code for the deconvolution steps on the Gerstein Lab GitHub page (https://github.com/gersteinlab/PsychENCODE-DSPN) or on the PsychENCODE resources page; I only found the results of the cell fraction calculations. Would you be able to point me towards how I can apply this algorithm?

A:
We used the non-negative least squares method for deconvolution and implemented it with the R function nnls (https://www.rdocumentation.org/packages/lsei/versions/1.2-0/topics/nnls). For example, nnls(C, bi) estimates the cell fractions for the ith tissue sample, where C is the cell-type gene expression matrix (rows: genes, columns: cell types) and bi is the gene expression vector for the ith tissue sample.
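An equivalent sketch in Python uses scipy.optimize.nnls in place of the R function; this is my own illustration of the setup described above, and the final normalization of the solution to sum to one is an assumption rather than something stated in the answer:

```python
import numpy as np
from scipy.optimize import nnls

def cell_fractions(C, b):
    """Estimate cell-type fractions for one bulk sample by non-negative
    least squares: minimise ||C x - b|| subject to x >= 0.

    C: genes x cell-types signature matrix; b: bulk expression vector.
    The solution is normalised to sum to 1 (an assumed convention here).
    """
    x, _ = nnls(C, b)  # returns (solution, residual norm)
    total = x.sum()
    return x / total if total > 0 else x
```

For example, if b is an exact mixture b = C @ [0.3, 0.7], cell_fractions(C, b) recovers fractions close to [0.3, 0.7].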