Q:
Many thanks for the excellent ENCODE papers! This is an unprecedented resource for life scientists, and we greatly appreciate it!
Would you be so kind as to give us access to the model and input data for your random forest model that predicts gene expression based on transcription factor binding?
Could you please also name the source of the TSS CAGE data? At UCSC, our only candidates were the Riken CAGE*TSS files or the CSHL LongRNA and ShortRNA files.
We would like to run your model and adapt it to the extremely tight co-regulation of ribosomal protein genes. We believe that the ENCODE TFs may account for a major part of their regulation.
Naturally, we would properly cite your work (incl. Cheng & Gerstein, 2011). Should you prefer, we are also open to any reasonable form of collaboration.
A:
See http://archive.gersteinlab.org/proj/chromodel
The human TSS CAGE data are from Roderic’s lab.
Here is the human CAGE TSS file:
ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev7_CAGE_TSS_clusters_June2011.gff.gz
Here is a README file:
ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev7_CAGE_TSS_clusters_June2011.txt
And here is some additional explanation of how the file was made:
ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev7_CAGE_TSS_clusters_june2011.pdf
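Once the .gff.gz file has been downloaded, a minimal sketch like the one below could be used to read it and collect TSS positions per gene (e.g. for a list of ribosomal protein genes). It assumes the file follows the standard 9-column, tab-separated GFF layout; the attribute key `gene_id` and the choice of the cluster's 5' end as the TSS position are assumptions for illustration only, so check the README and PDF above for the actual column semantics.

```python
import gzip
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Matches 'key "value"' (GFF2/GTF style) or 'key=value' (GFF3 style).
_ATTR_RE = re.compile(r"(\S+?)(?:=|\s+)(.*)")


@dataclass
class GffRecord:
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: dict = field(default_factory=dict)


def parse_attributes(text):
    """Parse GFF column 9 into a dict. Assumes ';' separates attributes and
    does not appear inside quoted values."""
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _ATTR_RE.match(part)
        if match:
            attrs[match.group(1)] = match.group(2).strip().strip('"')
        else:
            attrs[part] = ""  # flag-style attribute with no value
    return attrs


def read_gff(path):
    """Yield GffRecord objects from a plain or gzip-compressed GFF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and ## directives
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                continue  # skip malformed lines
            yield GffRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                            cols[5], cols[6], cols[7], parse_attributes(cols[8]))


def tss_by_gene(records, key="gene_id"):
    """Group TSS positions by a gene attribute (key name is an assumption).
    The TSS is taken as the cluster's 5' end: start on '+', end on '-'."""
    by_gene = defaultdict(list)
    for rec in records:
        gene = rec.attributes.get(key)
        if gene is None:
            continue
        tss = rec.end if rec.strand == "-" else rec.start
        by_gene[gene].append((rec.seqid, tss, rec.strand))
    return dict(by_gene)
```

For example, `tss_by_gene(read_gff("Gencodev7_CAGE_TSS_clusters_June2011.gff.gz"))` would return a dict mapping each gene identifier to its list of `(chromosome, position, strand)` tuples, provided the file really uses a `gene_id` attribute; pass a different `key` otherwise.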