rulefit3 in encodenets

Q:
I read your paper on co-associations among TF binding events (Architecture of the human regulatory network derived from ENCODE data) and became interested in your original clustering algorithm. Our laboratory is developing a new clustering algorithm for large amounts of genomic data and has implemented a prototype. However, the accuracy of our algorithm is not yet satisfactory, and we need to evaluate it. We would like to use your algorithm as a baseline. Is the program available, and if so, could you tell us how to use it?

A:
In that paper we used the Rulefit3 package from Prof. Jerome Friedman; there is an R package available at the link below. Our use of the algorithm is extensively documented in Section C of the Supplementary Materials.

Rulefit3
http://dx.doi.org/10.1214/07-Aoas148
http://www-stat.stanford.edu/~jhf/r-rulefit/rulefit3/R_RuleFit3.html

Architecture of the human regulatory network derived from ENCODE data http://dx.doi.org/10.1038/Nature11245

missing citations in encodenets supplement

Q:

With regards to the paper published in Nature, Architecture of the human regulatory network derived from ENCODE data, I have been perusing the Supplementary Information and find that reference No. 69 seems, to the best of my belief, to have been mapped incorrectly. I would like to provide a quote which, in my understanding, promises a reference to a RuleFit3 manuscript but instead corresponds to a paper concerning Transcriptional Regulation in Mast Cells:

"The number of rules is not set a priori but is rather learned from the data itself. Details are provided in the RuleFit3 manuscript [69]." (p. 14/271)

69 Bockamp, E. O. et al. Transcriptional regulation of the stem cell leukemia gene by PU.1 and Elf-1. J. Biol. Chem. 273, 29032-29042 (1998).

A:

It turns out that references 69-71 in section C2 of the supplementary material were not correctly added to the reference list. References 69-71 in later sections refer to the correct articles. Below are the correct citations for refs 69-71 in section C2 of the supplement.

Rulefit3 (ref 69)
Friedman, J. H. & Popescu, B. E. Predictive Learning Via Rule Ensembles. Annals Applied Stat. 2, 916-954, doi:10.1214/07-Aoas148 (2008).
http://dx.doi.org/10.1214/07-Aoas148

the well-known random forest algorithm (ref 70)
Breiman, L. Random forests. Mach Learn 45, 5-32, doi:10.1023/A:1010933404324 (2001). http://dx.doi.org/10.1023/A:1010933404324

the GREAT Functional Annotation server (ref 71)
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology 28, 495-U155, doi:10.1038/nbt.1630 (2010). http://dx.doi.org/10.1038/nbt.1630
http://great.stanford.edu/

Multinet (Unified global network) – academic use

Q:
I read your seminal paper “Interpretation of Genomic Variants Using a Unified Biological Network Approach” recently published in PLoS Computational Biology. I have a few queries:
Is the network available for academic use?
Can we download the relevant multinet to form hypotheses and do experiments?

A:
Please find the downloadable network at
http://homes.gersteinlab.org/Khurana-PLoSCompBio-2013/

Do you need parents’ genotype data?

Q:

I am looking for a tool to detect allele-specific expression from resequencing and RNA-seq data, and AlleleSeq looks quite powerful. I noticed that the software needs the parents' genotype data as input: it requires a VCF file containing trio genotypes to create the maternal and paternal genomes. In my case I only have genotype information from a single individual. How could I use AlleleSeq?

A:

You don't have to genotype the parents. You only need the variants phased, by whatever method you choose (the vcf2diploid tool only looks at the one column with information for the individual of interest and does not consider the other columns). Sequencing a trio is an easy and probably the best way to do this.

If you have only the mother's genotype, you can still phase a good fraction of the heterozygous variants. Each unphased variant will be randomly assigned to a haplotype, so about half of them will also be correct. And, of course, all homozygous variants will be phased.
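
The single-parent logic above can be sketched in Python. This is only an illustration of the idea, not the code used by vcf2diploid or AlleleSeq; the function name `phase_with_mother` and the representation of a genotype as a pair of allele strings are assumptions made for the example.

```python
import random


def phase_with_mother(child, mother, rng=random):
    """Phase one site of the child using only the mother's genotype.

    child, mother: unordered pairs of alleles, e.g. ("A", "G").
    Returns (maternal_allele, paternal_allele, phased), where phased is
    False when the assignment had to be made at random.
    Hypothetical helper for illustration only.
    """
    a, b = child
    if a == b:
        # Homozygous site: both haplotypes carry the same allele.
        return a, b, True

    a_in_mother = a in mother
    b_in_mother = b in mother
    if a_in_mother and not b_in_mother:
        # Only allele a can have come from the mother.
        return a, b, True
    if b_in_mother and not a_in_mother:
        # Only allele b can have come from the mother.
        return b, a, True

    # Ambiguous (mother carries both alleles) or inconsistent
    # (mother carries neither): assign to a haplotype at random.
    if rng.random() < 0.5:
        a, b = b, a
    return a, b, False


# Mother is homozygous A/A, so the child's A is maternal and G is paternal.
print(phase_with_mother(("A", "G"), ("A", "A")))
```

Sites where mother and child are both heterozygous for the same two alleles stay ambiguous, which is why a random assignment is right only about half the time.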

Mismatches between the paternal and maternal chromosomes

Q:
I believe I have discovered numerous errors in the NA12878 dataset. We are working with the most recent version, NA12878_diploid_genome_may3_2011. The errors are all single-base-pair mismatches between the paternal and maternal chromosomes in regions that the accompanying .map file marks as contigs.

A:
The .map file lists continuous, gap-free equivalent blocks between the haplotypes, but these blocks do include SNPs. Heterozygous SNPs will therefore appear as base mismatches within a block.
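
As a rough illustration, the Python sketch below compares the paternal and maternal sequences of one equivalent block and lists the positions where they differ. It does not parse the .map file; the function name `block_mismatches` and the assumption that the two block sequences have already been extracted as equal-length strings are hypothetical. Under that assumption, every reported position is a candidate heterozygous SNP rather than an error.

```python
def block_mismatches(paternal_block, maternal_block):
    """List single-base differences inside one gap-free equivalent block.

    Both arguments are the sequences of the same block on the two
    haplotypes; because the block has no gaps, they must have equal
    length. Returns (offset, paternal_base, maternal_base) tuples.
    Hypothetical helper for illustration only.
    """
    if len(paternal_block) != len(maternal_block):
        raise ValueError("a gap-free block must have equal length on both haplotypes")
    return [
        (offset, p, m)
        for offset, (p, m) in enumerate(zip(paternal_block, maternal_block))
        if p.upper() != m.upper()  # ignore soft-masking (lower-case) differences
    ]


# One heterozygous SNP at offset 2 of the block (G on paternal, T on maternal).
print(block_mismatches("ACGTAC", "ACTTAC"))
```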