Understanding transcriptional regulation by integrative analysis of transcription factor binding data

Q1:

The article mentions that earlier studies often depended on techniques such as microarrays and therefore could not accurately measure the expression levels of individual isoforms of some genes. It also says that this study avoids those problems because ENCODE data was used. I looked up the ENCODE project, but I am not sure why this data should be more accurate.

A1:
As we described in the paper, ENCODE generated CAGE data that measure the expression level of each TSS (transcription start site) of a gene. These data let us relate the TF binding signal near a TSS to the expression level of that TSS.

Q2: Another point I am not sure about is how this model is used. What kind of data do you have to feed into the program? Do you use transcription factor binding data, or do you just choose a transcription factor and a start site sequence, and the program tells you the probability of getting an mRNA transcript? And if the first option is true, why is it easier to get transcription factor binding data than expression data? If chromatin structure interactions are involved, the latter should be more accurate, shouldn’t it?

A2: The input to the model is the TF binding signal near each TSS (for all TFs with ChIP-seq data available from ENCODE) and the expression levels of all TSSes. Since we are using a supervised model, we randomly select 2000 TSSes for training the model and test the performance of the model on the remaining data. I think your confusion is this: since it is easier and more accurate to measure gene expression by RNA-seq or other experiments, why bother using ChIP-seq TF binding data to make predictions? The goal of our model is not to predict gene expression. The goal is to use the model to quantify the relationship between gene expression and TF binding. We want to know: How much of gene expression can be explained by the TF binding signal? Which TFs are more important? TF binding at which positions contributes more? And other questions.
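
The train/test procedure described in this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the matrix sizes (5000 TSSes, 10 TFs) are made up, and an ordinary least-squares linear regression is swapped in as a stand-in because the answer does not say which supervised model was used. Only the split of 2000 randomly chosen training TSSes comes from the answer itself. All function names are hypothetical.

```python
import numpy as np


def split_tss(n_tss, n_train, rng):
    """Randomly pick n_train TSS indices for training; the rest are held out."""
    perm = rng.permutation(n_tss)
    return perm[:n_train], perm[n_train:]


def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept (stand-in model, an assumption)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted intercept and per-TF weights to new binding signals."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Fraction of variance in expression explained by the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT ENCODE data): rows are TSSes, columns are TFs.
rng = np.random.default_rng(0)
n_tss, n_tf = 5000, 10
binding = rng.normal(size=(n_tss, n_tf))          # TF binding signal near each TSS
true_weights = rng.normal(size=n_tf)              # made-up per-TF effects
expression = binding @ true_weights + rng.normal(scale=0.5, size=n_tss)

# 2000 randomly selected TSSes for training, the remainder for testing.
train_idx, test_idx = split_tss(n_tss, 2000, rng)
coef = fit_linear(binding[train_idx], expression[train_idx])
r2 = r_squared(expression[test_idx], predict_linear(coef, binding[test_idx]))
print(f"Held-out R^2: {r2:.3f}")
```

The held-out R^2 is one way to express "how much gene expression can be explained by TF binding signal", and in a linear stand-in like this the fitted weights give a crude view of which TFs matter more.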

Q3: I am also curious whether the developed model has already been used for the more predictive transcription factors, or whether it was not intended to be used that way. If it was applied, do you know of any groups who did so? I’m quite interested in whether they could produce consistent data with this method.

A3: To my knowledge, many other groups have also tested models to study the relationship between gene expression and TF binding and/or histone modification. See, for example, the paper by Zhengqing OuYang in PNAS (PMID:19995984), the paper by Xianjun Dong in Genome Biology (PMID:22950368), and many other publications. Again, the goal is to understand the regulation conferred by TF binding and histone modifications, rather than to predict gene expression.

Data associated with paper “Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data”

Q:
Hello. I read your article in PLoS Computational Biology titled "Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data". It is a very interesting and nice study. I really love it!!

Would it be possible to obtain your data and networks that you have used in this article? I would like to add our own data to expand the networks.

I apologize for this sudden request, but I really would like to work with your networks.

Thank you very much for your time. I am looking forward to hearing from you soon.

A:
Check out modencode.org, where you can download different versions of the networks.

In addition to the link sent by Mark, which provides the most
up-to-date raw data we are using, you can also find processed regulatory
interactions here: http://archive.gersteinlab.org/proj/mirnet/

Since the data are continually being updated, the new data from the modENCODE
website might differ slightly from the data used in the paper.

Question about paper Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data

Q:

I am very interested in Chao Cheng’s paper Construction and Analysis of an Integrated Regulatory Network
Derived from High-Throughput Sequencing Data. It is a great resource for me to
analyze the regulatory network of C. elegans.

However, I had trouble downloading Table S2 and Table S3 from
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002190#cor1.
Would it be possible to send me the supporting tables by email?

A:
Thanks for your interest in our work. Please find the tables in the attached files. Let me know if you need more information.

TableS2.xls

TableS3.xls

Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells

Q:

I am very interested in your recent work "Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells". Could you please advise on the experimental datasets used in your study? I am now looking at the mouse ESC TF dataset from Chen et al., 2008, provided in their supplemental Table S3. Could you please advise whether these data refer to the mm8 or the mm9 mouse genome assembly? The raw data deposited in GEO seem to have been updated in 2012, so they were probably re-mapped to mm9. But do you know which genome build was originally reported in this paper, mm8 or mm9? (For consistency, I want to use the "original" peaks reported by Chen et al. and used in your study, rather than redoing the peak calling from their raw data.)

A:
We are pleased to hear of your interest in the paper. As for the genome assembly, we used mm8, the same as the original paper (Chen et al.).

Integrated regulatory network

Q:

I read your recent paper “Construction and Analysis of an Integrated
Regulatory Network Derived from High-Throughput Sequencing Data” in PLOS
Computational Biology with great interest. I would like to know whether the
data for your integrated regulatory networks are available, or whether you would be
willing to share them. I’m part of a group of statisticians in Evry (France)
working on probabilistic models for biological networks. Our aim is to
retrieve the groups of nodes having similar topological behaviours. The
fact that your data has three types of nodes, a hierarchical structure among
TFs and miRNAs and that you made a biological analysis of this structure
makes it very interesting for us to validate or not the methods we
developed. Would it be possible for you to send me the C. elegans network
and the corresponding hierarchical structure? Any use of it would of course
be referenced.

A:

I have uploaded the worm network data to http://archive.gersteinlab.org/proj/mirnet
It comprises 3 files:

cel_TF_Target_GID.net: TF->gene interactions
cel_TF_MIR_GID.net: TF->miR interactions
cel_miR_conservedTarget_Kris3way_GID.net: miR->gene interactions

The node type of each node is labeled as “MIR”, “TF” or “X” in brackets.
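
For readers who want to load these files, a minimal parsing sketch follows. The exact line layout of the .net files is not described here, so the sketch assumes (hypothetically) that each line holds a source node and a target node separated by whitespace, each written as an identifier immediately followed by its type in round or square brackets, for example `tfA(TF)`. The node identifiers in the sample are placeholders, not real gene IDs, and the function names are made up for illustration. Check the layout against the downloaded files before relying on it.

```python
import re
from collections import Counter

# Assumed token layout (hypothetical): <identifier>(<TYPE>) or <identifier>[<TYPE>],
# where TYPE is one of the three labels mentioned above.
NODE_RE = re.compile(r"^(?P<name>[^\s()\[\]]+)[(\[](?P<type>MIR|TF|X)[)\]]$")


def parse_node(token):
    """Split one node token into (identifier, node_type)."""
    match = NODE_RE.match(token)
    if match is None:
        raise ValueError(f"Unrecognised node token: {token!r}")
    return match.group("name"), match.group("type")


def parse_edges(lines):
    """Parse an iterable of lines into ((src, src_type), (dst, dst_type)) pairs."""
    edges = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"Expected two nodes per line, got: {raw!r}")
        edges.append((parse_node(fields[0]), parse_node(fields[1])))
    return edges


def count_edge_types(edges):
    """Count edges by (source type, target type), e.g. ('TF', 'MIR')."""
    return Counter((src_type, dst_type) for (_, src_type), (_, dst_type) in edges)


# Placeholder lines in the assumed layout; real files would be read with open(...).
sample_lines = [
    "tfA(TF)\tgeneB(X)",
    "tfA(TF)\tmirC(MIR)",
    "mirC(MIR)\tgeneB(X)",
]
edges = parse_edges(sample_lines)
for (src_type, dst_type), n in sorted(count_edge_types(edges).items()):
    print(f"{src_type}->{dst_type}: {n}")
```

Counting edges by type pair is a quick sanity check that the three files contribute the TF->gene, TF->miR and miR->gene interactions listed above.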