Training set size used for PPI network construction with the Bayesian method

I am trying to construct a "gene co-phenotype" background network using the Bayesian approach described in your paper "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data" (Science, 17 October 2003).

After reading the supplementary methods for this paper, I have a question about how to set the size of the training data set.
In this paper, 8,250 positive and 2,691,903 negative training gene pairs are used. When using a naive Bayesian method, it is recommended that the balance of the training data set reflect the true situation. Could you give me some guidance on how you set the sizes of the positive and negative training sets? I would be very glad to hear from you.


best I can do here is point you to:

Yeast Network Hierarchy

I am very interested in your work on network rewiring. I have been working on experimental validation of network rewiring approaches, investigating how they can be used to reprogram regulatory networks to improve heterologous protein production in yeast. I am now in the process of analysing transcriptional rewiring phenotypes that I have identified in a combinatorial-library-based screen. I have noticed some very interesting enrichment patterns, with regard to network structure, in the groups of rewired promoters and open reading frames.

I was hoping to look at how these rewired components are natively arranged with regard to their network hierarchy. I would like to use the hierarchical network model you proposed in your paper ( but I have been having trouble reconstructing it from the PDF supplemental data. I am really keen on using your model to study my experimental data further; if you have any suggestions on how I could best go about this, I would be most grateful.

you might find the following links useful :
website with an earlier version of the yeast hierarchy.
information on worm & fly hierarchies
Human hierarchy
Bacterial hierarchy

I would also direct you to the wiki page:

Under the heading "Phenotypic Effects of Network Rewiring in Transcriptional Regulatory Hierarchies", this page lists all the data in a very user-friendly format that you would need to reproduce the hierarchies with all the datasets very well described/annotated.

This page has the initial regulatory network of E. coli and Yeast and it also provides you with the original breadth-first search hierarchies. In addition, it lists all the changes in the hierarchy upon deletion of each gene. There is an extensive description of what each column in each file means.

Further, so that you can better understand the algorithm/program we used, I am also attaching a lightweight Perl script that generates the hierarchy from a given network ( (it is well annotated with an explanation of each step). I am also attaching another Perl script that I used to list the changes in the hierarchy upon deletion of each gene ( Paths will be broken for input files, but it should be enough for you to get a flavor of how we quantified changes in the modified hierarchies.
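
For readers without the attached scripts, here is a rough Python sketch of what a breadth-first search hierarchy and a per-gene deletion comparison might look like. It is not the Perl code mentioned above. It assumes a bottom-up layering (nodes that regulate nothing else form level 1, and every other node sits one level above the nearest node it regulates) and it ignores self-regulation. The function names `bfs_hierarchy` and `level_changes_after_deletion` and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def bfs_hierarchy(edges, nodes=None):
    """Assign a level to each node by bottom-up breadth-first search.

    edges: iterable of (regulator, target) pairs.
    nodes: optional extra nodes to include even if they have no edges.
    Nodes that cannot reach the bottom layer (e.g. pure cycles) get no level.
    """
    regulators_of = defaultdict(set)  # target -> its regulators
    targets_of = defaultdict(set)     # regulator -> its targets
    all_nodes = set(nodes or ())
    for regulator, target in edges:
        all_nodes.update((regulator, target))
        if regulator != target:       # assumption: self-loops are ignored
            regulators_of[target].add(regulator)
            targets_of[regulator].add(target)

    # Assumption: the bottom layer is every node that regulates no other node.
    level = {n: 1 for n in all_nodes if not targets_of[n]}
    queue = deque(sorted(level))
    while queue:
        node = queue.popleft()
        for regulator in sorted(regulators_of[node]):
            if regulator not in level:
                level[regulator] = level[node] + 1
                queue.append(regulator)
    return level


def level_changes_after_deletion(edges, gene):
    """Return {node: (level_before, level_after)} for nodes whose level changes."""
    edges = list(edges)
    all_nodes = {n for edge in edges for n in edge}
    remaining = all_nodes - {gene}
    kept = [(r, t) for r, t in edges if gene not in (r, t)]
    before = bfs_hierarchy(edges)
    after = bfs_hierarchy(kept, remaining)
    return {n: (before.get(n), after.get(n))
            for n in sorted(remaining)
            if before.get(n) != after.get(n)}


if __name__ == "__main__":
    toy_network = [("A", "B"), ("B", "C")]  # hypothetical three-gene chain
    print(bfs_hierarchy(toy_network))
    print(level_changes_after_deletion(toy_network, "B"))
```

Running the deletion comparison once per gene gives one table of level changes per deletion, which is the general shape of the per-gene change lists described above.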

Data associated w/paper “Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data”

I recently read your article “Construction
and Analysis of an Integrated Regulatory Network Derived from
High-Throughput Sequencing Data”. In the last year, I measured mRNA and
miRNA expression in the different types of mouse skeletal muscle fibers to
discover the different regulatory circuits activated in fast and slow
myofibers. I designed a preliminary network using the databases of miRNA –
target mRNA and protein – protein interactions, and I have started to
include my expression data in order to understand the biological meaning. I
was wondering if it is possible to use your more accurate mouse regulatory
network for my data. Is this network free to use? In the article and in the
website of your laboratory I did not find any file or link with the complete
networks that you describe. I am not a computational biologist, but the
paper is very interesting and I think that the network that you design with
your method could be very useful for the scientific community.

I attach three files for our three mouse networks: 1) miRNAs targeting genes (this is not our calculation, but was downloaded from TargetScan),
2) TFs targeting genes, 3) TFs targeting miRNAs based on ChIP-Seq data of 12 TFs.
The files are in plain text format. The first column is the list of regulators and the second column is the list of targets. The bracket next to a gene name gives the class of the gene: TF for transcription factors, MIR for miRNAs, and X for non-TF protein-coding genes.
Thank you for your interest in our paper. I hope this information will be useful for your work.
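
For anyone loading files laid out as described above, a small Python reader might look like the following sketch. The exact delimiter and bracket style are not stated in the message, so this assumes each line holds two whitespace- or tab-separated entries of the form `name[CLASS]` or `name(CLASS)`. The names `Node` and `parse_edges` and the sample gene names are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class Node(NamedTuple):
    name: str
    gene_class: str  # "TF", "MIR", or "X"


# Assumption: a gene name followed by its class in square or round brackets.
_NODE_PATTERN = re.compile(r"([^\s\[\]()]+)\s*[\[(](TF|MIR|X)[\])]")


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[Node, Node]]:
    """Yield (regulator, target) pairs from two-column network lines."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        found = _NODE_PATTERN.findall(line)
        if len(found) != 2:
            raise ValueError(
                f"line {line_number}: expected a regulator and a target, got {line!r}"
            )
        regulator, target = (Node(name, cls) for name, cls in found)
        yield regulator, target


if __name__ == "__main__":
    sample = ["TfA[TF]\tmirB[MIR]", "TfA [TF] GeneC [X]"]  # made-up entries
    for regulator, target in parse_edges(sample):
        print(regulator.name, regulator.gene_class, "->", target.name, target.gene_class)
```

To read a real file, pass an open file handle to `parse_edges` in place of the in-memory list.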

Re: Architecture of the human regulatory network derived from ENCODE data

Hi Dr. Gerstein: This is a very nice paper and is very important to my
current study. Do you have tools/software for the TF co-association analysis (figure 1
and supplemental sections B and C) mentioned in this paper? Can I get it?

Anshul did the co-association analysis for this Networks paper. I
think he knows that part the best.

As for the co-association analysis in the ENCODE main paper, it can
be repeated using the GSC package available at the ENCODE statistics web
site ( The first thing you need to do
is to determine (manually or by other means) a segmentation of the
genome, where TF binding is assumed segment-wise stationary. If you have
no specific preference on how the segmentation should be done, you can
use the GSC Python segmentation tool to do that, which will try to
perform an automatic segmentation (the results of which would be better
if you have more data). Then you can run the GSC Python program to
perform segmented block sampling to compute pairwise p-values of your
binding data.

DREAM 3 challenge & paper “Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data”


I am interested in exploring further the work done by you and your team
members in the DREAM 3 challenge, as reported in the paper stated below. Do you
make the code/program available for the public to view? Thanks.

"Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data"

I am OK with the current software, which you said is quite tailored to the competition. Please send it to me. I really appreciate it. Thanks.

The current form of the software is quite tailored for the
competition, and we do not have a general, publicly distributable
version. I can send it to you if you think it would be useful.

Please find the version that we submitted to DREAM attached, together with some data and some script files for running it. If you have Apache Ant installed, simply issue the command "ant runall3" to run the program on the DREAM3 files. The size-10 networks are included, and the size-50 and size-100 networks can be downloaded from the DREAM web site.

Data re “Architecture of the human regulatory network derived from ENCODE data”

I am very familiar with the ENCODE TF datasets, as I’ve been applying them to various problems in my PhD. I was interested in the expression analysis across human tissues for the ((miR –> TF) –> targets) FFL. The Supplementary file (section H) cites the protein-coding expression atlas of Su et al. 2004 for the TF and protein-coding targets in this loop, but there doesn’t seem to be a reference for the corresponding miRNA expression data. I assume it would be Landgraf et al. 2007, ‘A mammalian microRNA expression atlas based on small RNA library sequencing’, since this allows matched tissues and samples with Su et al. However, it might be some other dataset. It would be helpful to be able to replicate/extend the FFL analysis using the correct data. Would you be able to forward this email to the relevant person(s) to confirm whether microRNA expression was taken from the Landgraf atlas? Many thanks for your help.

Slight correction: The FFL studied for expression pattern of
components is the other way round: ((TF –> miR) –> targets).
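
To make the loop orientation concrete, here is a small Python sketch that enumerates loops of this shape from three edge lists. It assumes that ((TF –> miR) –> targets) means a TF regulates a miRNA and both regulate a shared target. The function name `find_tf_mir_ffls` and the gene names are hypothetical, and this is not the analysis code from the paper.

```python
from collections import defaultdict


def find_tf_mir_ffls(tf_to_mir, tf_to_gene, mir_to_gene):
    """Return sorted (tf, mir, target) triples where the TF regulates the miRNA
    and both the TF and the miRNA regulate the same target."""
    genes_of_tf = defaultdict(set)
    for tf, gene in tf_to_gene:
        genes_of_tf[tf].add(gene)

    genes_of_mir = defaultdict(set)
    for mir, gene in mir_to_gene:
        genes_of_mir[mir].add(gene)

    loops = []
    for tf, mir in sorted(set(tf_to_mir)):
        # Shared targets close the loop: TF -> target and miRNA -> target.
        for target in sorted(genes_of_tf[tf] & genes_of_mir[mir]):
            loops.append((tf, mir, target))
    return loops


if __name__ == "__main__":
    # Made-up toy edges, for illustration only.
    loops = find_tf_mir_ffls(
        tf_to_mir=[("TfA", "mirB")],
        tf_to_gene=[("TfA", "GeneC"), ("TfA", "GeneD")],
        mir_to_gene=[("mirB", "GeneC")],
    )
    print(loops)
```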

the miRNA expression is actually from
Lu et al, Nature 2005

if you go to
under the heading "MicroRNA Expression Profiles Classify Human Cancers"
see files


PDB data for: Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights


Regarding your seminal paper "Relating Three-Dimensional Structures to
Protein Networks Provides Evolutionary Insights".
Amongst the supplementary data I could not find the PDB entries that were
used for each interaction in the SIN.
I would much appreciate it if you could send me this data.

info. should be on the site

Data associated with paper “Redefining Nodes and Edges: Relating 3D Structures to Protein Networks Provides Insight into their Evolution”


I’m hoping to analyse the data from your 2006 "Redefining Nodes and Edges: Relating 3D Structures to Protein Networks Provides Insight into their Evolution" paper. Do you have the full dataset including the pdb ids/chain ids relating to each interaction in the network?

have you seen assoc. paper website :

Question about paper Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data


I am very interested in Chao Cheng’s paper Construction and Analysis of an Integrated Regulatory Network
Derived from High-Throughput Sequencing Data. It is a great resource for me to
analyze the regulatory network of C. elegans.

However, I had trouble downloading Table S2 and Table S3 from
Is it possible to send me the supporting tables by email?

Thanks for your interest in our work. Please find the tables in the attached files. Let me know if you need more information.



ENCODE-Networks Source Code for Context-Specific TF Co-Association Analyses

I am interested in your paper published in Nature, 06 September 2012, “Architecture of the human regulatory network derived from ENCODE data”. In particular, we are interested in the framework of context-specific TF co-association analysis described in this paper. We would like to apply this method to our in-house datasets. It’s exciting that the code for these analyses is “Available soon” (the file “enets21.coassoc-code.tgz” on Do you know whether the code for co-association analysis in this paper is available now? If so, it might save us a lot of time. Thanks for your help!

The main machine learning method used for the analysis is RuleFit3 which is available here

Detailed instructions on preparing the input data and computing the various scores are in the supplement of the paper.

I don’t have a polished code package that is ready for use by the general public. The code that I wrote for the analyses in the paper is here . But I have to warn you that it’s not designed to work on general datasets, as it has scripts that were designed to run on our local cluster. The core functions are in . The code is reasonably commented, so hopefully it should help.