I am interested in your paper published in Nature on 06 September 2012, “Architecture of the human regulatory network derived from ENCODE data”, and in particular in the framework for context-specific TF co-association analysis that it describes. We would like to apply this method to our in-house datasets. The website lists the code for these analyses as “Available soon” (the file “enets21.coassoc-code.tgz” on http://encodenets.gersteinlab.org/). Do you know whether the code for the co-association analysis is available now? If so, it might save us a lot of time. Thanks for your help!
The main machine learning method used for the analysis is RuleFit3, which is available here
Detailed instructions on preparing the input data and computing the various scores are in the supplement of the paper.
I don’t have a polished code package that is ready for use by the general public. The code that I wrote for the analyses in the paper is here: https://code.google.com/p/tf-coassociation/source/browse/#svn%2Ftrunk%2Fscripts . But I have to warn you that it’s not designed to work on general datasets, as it includes scripts that were written to run on our local cluster. The core functions are in
https://code.google.com/p/tf-coassociation/source/browse/trunk/scripts/assoc.matrix.utils.R . The code is reasonably well commented, so hopefully it will help.
Regarding the paper published in Nature, “Architecture of the human regulatory network derived from ENCODE data”: while reading the Supplementary Information, I found that reference 69 appears to have been mapped incorrectly. The quote below promises a reference to a RuleFit3 manuscript, but the corresponding entry in the reference list is instead a paper concerning transcriptional regulation in mast cells:
“The number of rules is not set a priori but is rather learned from the data itself. Details are provided in the RuleFit3 manuscript (ref 69).” (p. 14 of 271)
69 Bockamp, E. O. et al. Transcriptional regulation of the stem cell leukemia gene by PU.1 and Elf-1. J. Biol. Chem. 273, 29032-29042 (1998).
It turns out that references 69-71 in section C2 of the supplementary material were not correctly added to the reference list. References 69-71 in later sections refer to the correct articles. Below are the correct citations for refs 69-71 in section C2 of the supplement.
RuleFit3 (ref 69)
Friedman, J. H. & Popescu, B. E. Predictive Learning Via Rule Ensembles. Annals Applied Stat. 2, 916-954, doi:10.1214/07-Aoas148 (2008).
the well-known random forest algorithm (ref 70)
Breiman, L. Random forests. Mach Learn 45, 5-32, doi:10.1023/A:1010933404324 (2001). http://dx.doi.org/10.1023/A:1010933404324
the GREAT Functional Annotation server (ref 71)
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology 28, 495-501, doi:10.1038/nbt.1630 (2010). http://dx.doi.org/10.1038/nbt.1630