Questions about “Architecture of the human regulatory network derived from ENCODE data”

I am reading your paper and have a question about the TF-target gene network data that I downloaded. Which refGene version and gene symbols did you use when you identified the TF target genes from the ChIP-seq data? I find that some symbols are not included in the hg19 refGene file I downloaded from UCSC.

The server was down for a while, and I wasn’t sure which names you were talking about. Now I think the names are from GENCODE, but I cannot recall the exact release we used. I believe the names generally don’t change between releases. You can see all the releases here; the names should be in one of the metadata files.

Help regarding the paper “Comparative analysis of regulatory information and circuits across distant species”

Recently, I read one of your papers, titled “Comparative analysis of regulatory information and circuits across distant species”. In this paper, you wrote that you used simulated annealing to reveal the organization of regulatory factors in three layers: master regulators, intermediate regulators, and low-level regulators. However, I can’t find the program for this method or any references related to it. I want to use this method to classify the TFs in my own regulatory network. Could you kindly provide this program?

An initial version of the code is available from

The code used for the analysis can be found
More recently, our group published an updated method. The code will be released very soon.
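For readers who want a feel for the kind of procedure asked about above, here is a minimal sketch of simulated annealing that assigns TFs to three layers. It is not the code from the paper. The cost function (the number of regulatory edges that do not point strictly downward), the geometric cooling schedule, the function names, and the toy network are all assumptions made for illustration.

```python
import math
import random


def non_downward_edges(edges, level):
    """Count edges that do not point from a higher layer to a lower one.

    Layer 0 is the bottom. Self-loops are ignored (an assumption of this sketch).
    """
    return sum(1 for reg, tgt in edges if reg != tgt and level[reg] <= level[tgt])


def anneal_layers(edges, n_layers=3, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Assign each node to one of n_layers, minimising non-downward edges."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    level = {node: rng.randrange(n_layers) for node in nodes}
    cost = non_downward_edges(edges, level)
    best, best_cost = dict(level), cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        node = rng.choice(nodes)
        old = level[node]
        level[node] = rng.randrange(n_layers)
        new_cost = non_downward_edges(edges, level)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(level), cost
        else:
            level[node] = old  # reject the move
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical toy network: (regulator, target) pairs.
    toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "D")]
    layers, remaining = anneal_layers(toy_edges)
    print(layers, remaining)
```

Recomputing the full cost at every step keeps the sketch short; for a real network one would update only the edges touching the moved node.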

Information about program code for ENCODE paper

Over the last few days I have been reading your paper "Architecture of the human regulatory network derived from ENCODE data". I am working on something related and would like to perform your kind of analysis in addition, or to merge the two ideas somehow. For this purpose I was looking for program code published for the analysis in your work, but so far I have found only the workflow description in the SI.

If possible, I would be delighted if you could share the relevant code with me, which would make life much easier for me and my analysis much quicker. I would be primarily interested in everything that allows me to infer the hierarchy diagrams for the TF network and the TF-miRNA network.

By the way: is there any reason why you did not include histone modification and DNA methylation data?

Some code is associated with separate papers, e.g. see:

Question re ENCODE data on website


I’ve been incorporating the ENCODE data from your webpage in my analyses. The data is fantastic, but I have questions regarding the enets*.GM_proximal_*filtered_network.txt data.

The filtered dataset actually contains more regulators than the unfiltered dataset, making me speculate that the unfiltered data file is not complete:
[bb447@compute-8-2 TF]$ cut -f1 enets6.GM_proximal_unfiltered_network.txt | sort -u | wc -l
[bb447@compute-8-2 TF]$ cut -f1 enets8.GM_proximal_filtered_network.txt | sort -u | wc -l

Could it be possible that the file is incomplete?

The updated files have been uploaded to the site. Thanks again for pointing this out.
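The check in the question can also be done in a few lines of Python, which additionally lists the regulators that are missing from one file. This is only a sketch: it assumes, as the `cut -f1` commands above do, that the files are tab-separated with the regulator in the first column, and the function names are made up for illustration.

```python
def unique_regulators(path):
    """Return the distinct values in the first tab-separated column of a file."""
    regulators = set()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                regulators.add(line.split("\t")[0])
    return regulators


def compare_regulators(unfiltered_path, filtered_path):
    """Summarise how the regulator sets of two network files differ."""
    unfiltered = unique_regulators(unfiltered_path)
    filtered = unique_regulators(filtered_path)
    return {
        "n_unfiltered": len(unfiltered),
        "n_filtered": len(filtered),
        # Regulators that survive filtering but never appear in the unfiltered file.
        "only_in_filtered": sorted(filtered - unfiltered),
    }
```

If the unfiltered file is complete, `only_in_filtered` should be empty, since filtering can only remove edges.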


Re: Architecture of the human regulatory network derived from ENCODE data

Hi Dr. Gerstein: This is a very nice paper and is very important to my current study. Do you have tools/software for the TF co-association analysis (Figure 1 and supplemental sections B and C) mentioned in this paper? Can I get it?

Anshul did the co-association analysis for this Networks paper. I think he knows that part the best.

As for the co-association analysis in the ENCODE main paper, it can be repeated using the GSC package available at the ENCODE statistics web site. The first thing you need to do is to determine (manually or by other means) a segmentation of the genome, within which TF binding is assumed to be segment-wise stationary. If you have no specific preference on how the segmentation should be done, you can use the GSC Python segmentation tool, which will try to perform an automatic segmentation (the results will be better if you have more data). Then you can run the GSC Python program to perform segmented block sampling and compute pairwise p-values for your binding data.
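To illustrate why the segmentation matters, here is a deliberately simplified sketch of a segment-wise null model for pairwise co-association. It is not the GSC package and it does not implement segmented block sampling; it swaps in a simpler technique, a random circular shift of one track inside each segment, which likewise keeps each segment's binding density and local clustering intact. The binned 0/1 track representation, the function names, and the parameters are all assumptions.

```python
import random


def overlap(a, b):
    """Number of bins that are bound (1) in both tracks."""
    return sum(x & y for x, y in zip(a, b))


def shifted_within_segments(track, segments, rng):
    """Circularly shift a track by a random non-zero offset inside each segment."""
    out = list(track)
    for start, end in segments:
        length = end - start
        if length > 1:
            k = rng.randrange(1, length)
            out[start:end] = track[start + k:end] + track[start:start + k]
    return out


def coassociation_pvalue(a, b, segments, n_null=1000, seed=0):
    """Empirical p-value for the overlap of two binned 0/1 binding tracks.

    segments is a list of (start, end) bin ranges, end exclusive, within which
    binding is assumed to be stationary.
    """
    rng = random.Random(seed)
    observed = overlap(a, b)
    hits = sum(
        overlap(a, shifted_within_segments(b, segments, rng)) >= observed
        for _ in range(n_null)
    )
    # Add-one correction so the empirical p-value is never exactly zero.
    return (hits + 1) / (n_null + 1)
```

Because shifts never cross segment boundaries, a factor that binds densely in one segment and sparsely in another keeps that pattern in every null sample, which is the point of assuming stationarity only within segments.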

Data received – Re: Your model and input data to the “…integrative analysis of transcription factor binding data” paper

Many thanks for the excellent ENCODE papers! This is an unprecedented resource for life scientists, and we appreciate it accordingly!

Would you be so kind as to give us access to the model and input data for your random forest model that predicts gene expression from transcription factor binding?

Could you please also name the source of the TSS CAGE data? At UCSC, the only candidates we found were the Riken CAGE*TSS files, or the CSHL LongRNA and ShortRNA files.
We would like to run and adapt your model to study the extremely tight co-regulation of ribosomal protein genes. We believe that the ENCODE TFs may account for a major part of their regulation.

Naturally, we would properly cite your work (incl. Cheng & Gerstein, 2011). Should you prefer, we are open to any reasonable form of collaboration.



The human TSS CAGE data are from Roderic’s Lab.

Here is the human CAGE TSS file:

Here is a readme file:

And here are some additional explanations of how the file was made:

ENCODE-Networks Source Code for Context-Specific TF Co-Association Analyses

I am interested in your paper published in Nature on 06 September 2012, “Architecture of the human regulatory network derived from ENCODE data”. In particular, we are interested in the framework for context-specific TF co-association analysis described in this paper, and we would like to apply this method to our in-house datasets. It’s exciting that the code for these analyses is listed as “Available soon” (the file “enets21.coassoc-code.tgz”). Do you know whether the code for the co-association analysis in this paper is available now? If so, it might save us a lot of time. Thanks for your help!

The main machine learning method used for the analysis is RuleFit3, which is available here

Detailed instructions on preparing the input data and computing the various scores are in the supplement of the paper.

I don’t have a polished code package that is ready for use by the general public. The code that I wrote for the analyses in the paper is here . But I have to warn you that it’s not designed to work on general datasets, as it includes scripts that were designed to run on our local cluster. The core functions are in . The code is reasonably commented, so hopefully it should help.