OrthoClust – for more than two species

Q:
I just read your recently published paper on the OrthoClust approach. It is well-grounded work from both a practical and a mathematical point of view.

I ran your R scripts on my own data and they worked perfectly. However, I am wondering how I can use the script for more than two species.

I would appreciate it if you could help me find a solution.

A:
Thanks for your interest in OrthoClust. OrthoClust definitely works on more than two species. The R script is a primitive version that illustrates the concept outlined in the paper. We understand the importance of the N-species generalization, so we have released new MATLAB code for N species. It makes use of efficient code written by Mucha and Porter that implements the Louvain algorithm for modularity optimization. The third-party code, as well as our wrapper, is now on the gersteinlab GitHub.
Apart from MATLAB, we are planning to provide a wrapper for Python or R later.
The N-species code is not exactly what we used for the paper, so if you find any bugs or have any questions, please let me know. We are trying to make a more user-friendly package anyway.
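
For readers without MATLAB, the quantity that a Louvain-style method tries to maximize can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's MATLAB code. It flattens all species into one weighted graph, joins orthologs with edges of weight `coupling` (a hypothetical parameter name), and scores a partition with plain Newman modularity. The gene and species names are toy data.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict mapping node -> label.
    """
    degree = defaultdict(float)    # weighted degree of each node
    internal = defaultdict(float)  # edge weight falling inside each community
    total = 0.0                    # total edge weight m
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        total += w
        if community[u] == community[v]:
            internal[community[u]] += w
    comm_degree = defaultdict(float)
    for node, d in degree.items():
        comm_degree[community[node]] += d
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
               for c in comm_degree)


# Toy data (assumption): three species, each with two co-expression modules.
species = ["human", "worm", "fly"]
genes = ["a1", "a2", "b1", "b2"]
coupling = 0.5  # hypothetical weight given to orthology edges

edges = []
for s in species:
    edges.append(((s, "a1"), (s, "a2"), 1.0))  # co-expression inside module a
    edges.append(((s, "b1"), (s, "b2"), 1.0))  # co-expression inside module b
    edges.append(((s, "a2"), (s, "b1"), 0.2))  # weak link between the modules
for s, t in combinations(species, 2):
    for g in genes:
        edges.append(((s, g), (t, g), coupling))  # orthology across species

nodes = [(s, g) for s in species for g in genes]
by_module = {n: n[1][0] for n in nodes}  # cross-species modules "a" and "b"
by_species = {n: n[0] for n in nodes}    # one cluster per species

q_module = modularity(edges, by_module)
q_species = modularity(edges, by_species)
print(q_module, q_species)
```

In this toy case the cross-species partition scores higher than the per-species one, which is the kind of conserved module the coupling is meant to favour. An actual optimizer would search over partitions rather than compare two fixed ones.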

question re. data in paper “Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors”

Q:
As described in your paper entitled "Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors", "we identified 13,539 potential enhancers (full list available in the Additional files), among which 50 were randomly chosen". However, the additional files contain the coordinates of only 50 enhancers. Could you please provide either the full list of all 13,539 enhancers or its source?

Many thanks in anticipation of your quick reply,

A:
see http://encodenets.gersteinlab.org/metatracks

related scripts or equations for implementing analyses in paper “Genomic analysis of the hierarchical structure of regulatory networks”

Q:
I have recently been working on constructing human regulatory networks. After reading your paper "Genomic analysis of the hierarchical structure of regulatory networks", published in PNAS, I found it very useful and potentially applicable to my study. I want to construct the hierarchical structure of transcription factors (TFs) in humans, and my data are the expression levels of these TFs and their targets, obtained by RNA sequencing. Can we use your BFS method to construct the network? As we know little about the BFS algorithm, could you please provide related scripts or equations so that we can implement it easily?

Thank you very much for taking the time to read my letter. I look forward to your guidance.

A:
Hi, see http://info.gersteinlab.org/Hierarchy
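
As a rough illustration of BFS leveling, here is a minimal Python sketch. It assumes a bottom-up reading of the method: TFs that regulate no other TF form level 1, and each remaining TF is placed one level above the lowest-level TF it regulates. This is an assumption for illustration and may differ in detail from the procedure in the paper. The input is a directed list of (regulator, target) TF pairs, which has to come from a regulatory network rather than from expression levels alone. The function name and the toy edges are hypothetical.

```python
from collections import deque


def bfs_levels(tf_edges):
    """Assign hierarchy levels bottom-up with breadth-first search.

    tf_edges: iterable of (regulator, target) pairs, both of them TFs.
    Returns a dict mapping TF -> level, where level 1 is the bottom.
    """
    regulators_of = {}  # target -> set of TFs that regulate it
    out_degree = {}     # TF -> number of other TFs it regulates
    for reg, tgt in tf_edges:
        for tf in (reg, tgt):
            regulators_of.setdefault(tf, set())
            out_degree.setdefault(tf, 0)
        # Skip auto-regulation and duplicate edges.
        if reg != tgt and reg not in regulators_of[tgt]:
            regulators_of[tgt].add(reg)
            out_degree[reg] += 1

    # Bottom level: TFs that regulate no other TF.
    level = {tf: 1 for tf, d in out_degree.items() if d == 0}
    queue = deque(sorted(level))
    while queue:
        tf = queue.popleft()
        # Walk the edges backwards, from target to regulator.
        for reg in sorted(regulators_of[tf]):
            if reg not in level:
                level[reg] = level[tf] + 1
                queue.append(reg)
    return level


# Toy network (assumption): A regulates B and C, which both regulate D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "C")]
levels = bfs_levels(toy_edges)
print(levels)
```

TFs that sit in a cycle with no path down to a bottom-level TF are left without a level in this sketch and would need separate handling.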

PEMer for commercial use

Q:
I work for Novartis Institutes For Biomedical Research Inc, in Cambridge, MA, a commercial entity.

Could I use your PEMer software? (I remember that you allowed me to try your translocation detection software in 2011, but I did not archive that email.)

A:
This is fine.

Request for data in PLoS Computational Biology paper: Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data

Q:
I am currently working on
a network science project studying properties of heterogeneous networks and am greatly intrigued by your 2011 paper in PLoS
Computational Biology:

Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data

I am planning to employ the integrated human TF-miRNA-gene regulatory network constructed in your work to verify the
utility of information-flow-based techniques in understanding the mapping between network topology and function. However,
the full network does not appear to be available in the supplementary information. I am writing to kindly ask if I could obtain
a copy of the human network data (e.g. CSV format edge list) for my research. I would be more than honored to be able to use
the original dataset in my work, and my apologies if it is against your plans to disclose it or it is available somewhere else that
I am not aware of. Thank you very much!

A:
You can certainly get the network.

The data behind it and a closely related network are available from:

http://papers.gersteinlab.org/papers/mirnet

http://papers.gersteinlab.org/papers/wormawg

(see website links)

meaning of CCA result

Q:
I just sent you an email asking how to plot the
CCA structural correlations figure. I have now produced that figure (please
see the attachment), but I cannot interpret it. I only know that the
blue triangles represent environmental factors and the red diamonds represent
metabolic pathways. I do not know which pathway a given red diamond
represents, or which factor a given triangle represents.

A:
You might find my lecture on the subject useful:
http://lectures.gersteinlab.org/summary/Networks-201102240-i0at10+kitp/

http://archive.gersteinlab.org/mark/out/log/2014/02.02/class/
from
http://www.gersteinlab.org/courses/452/

Information about program code for ENCODE paper

Q:
Over the last few days I have been reading your paper "Architecture of the human
regulatory network derived from ENCODE data".
I am doing something related and would like to perform your kind of
analysis in addition, or to merge the two ideas somehow.
For this purpose I was looking for some program code that has been
published for the analysis of your work, but so far I just found the
workflow description in the SI.
In case it is possible, I would be delighted if you could share the
relevant code with me, which would make life much easier for me and my
analysis much quicker.
I would be primarily interested in everything that allows me to infer
the hierarchy diagrams for the TF network and the TF-miRNA network.
By the way: Is there any reason why you did not include histone
modification and DNA methylation data?

A:
Some code is associated with separate papers, e.g. see:

http://papers.gersteinlab.org/papers/hier-rewiring
encodenets.gersteinlab.org

training set size used for PPI network construction with bayesian method

Q:
I am trying to construct a "gene co-phenotype" background network using the Bayesian approach described in your paper "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data" (Science, 17 October 2003).

After reading the supplementary methods for this paper, I have a question about how to set the size of the training data set.
In this paper, 8250 positive and 2691903 negative training gene pairs are used. It is usually recommended that the class balance of the training set reflect the true situation when a naive Bayesian method is used. Could you give me some guidance on how you set the sizes of the positive and negative training sets? I would be very glad to hear from you.

A:

The best I can do here is point you to:
http://papers.gersteinlab.org/papers/funcpred-goldstd
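
As general background on the naive Bayes odds formulation (not a statement of how the paper's training sets were chosen), the following Python sketch shows where the set sizes enter. Each likelihood ratio is computed from within-class frequencies, so it does not depend on the ratio of positives to negatives in the training data; the class balance enters only through the prior odds. The set sizes 8250 and 2691903 are taken from the question above. The bin counts and the prior odds are hypothetical values chosen for illustration.

```python
def likelihood_ratio(pos_in_bin, neg_in_bin, n_pos, n_neg):
    """L = P(feature bin | positive) / P(feature bin | negative)."""
    return (pos_in_bin / n_pos) / (neg_in_bin / n_neg)


def posterior_odds(prior_odds, ratios):
    """Naive Bayes: posterior odds = prior odds * product of likelihood ratios."""
    odds = prior_odds
    for ratio in ratios:
        odds *= ratio
    return odds


N_POS = 8250      # positive training pairs (from the question)
N_NEG = 2691903   # negative training pairs (from the question)

# Hypothetical counts of training pairs falling in one bin of one feature.
lr_full = likelihood_ratio(825, 26919, N_POS, N_NEG)

# Subsampling the negatives tenfold scales both negative counts together,
# so the likelihood ratio stays the same up to rounding.
lr_subsampled = likelihood_ratio(825, 26919 / 10, N_POS, N_NEG / 10)

# Hypothetical prior odds and a second hypothetical likelihood ratio.
PRIOR_ODDS = 1 / 500
odds = posterior_odds(PRIOR_ODDS, [lr_full, 20.0])
print(lr_full, lr_subsampled, odds)
```

With small bins, heavy subsampling makes the estimated frequencies noisier even though their expected ratio is unchanged.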

Qs about breakseq tool

Q:
I have just installed the BreakSeq tool developed by your lab to analyse structural variants in pancreatic cancer genomes.

All the required modules have been downloaded. However, I could not find documentation on how to run the tool.

Is there a manual or an example showing how to run the tool?

Or could I contact someone in the lab who is familiar with BreakSeq?

A:
Everything we have is at http://sv.gersteinlab.org/breakseq