How to filter TF binding peaks for a plant ENCODE project

Q:
My lab is working on several plant ENCODE projects. We have done ChIP-seq for ~100 maize TFs and are now analyzing the data, largely following your 2013 paper “Architecture of the human regulatory network…”. One thing that confuses me is that we get on average ~10,000 peaks per TF (from SPP with IDR 0.01). If I associate peaks with genes based on their distance to the TSS, the resulting TF-gene and TF-TF networks are huge, and nearly everything is connected to everything else. For example, the 100 TF × 100 TF network has 5k edges, and I suspect many of them are false positives caused by weak ChIP-seq peaks. In your paper, you used TIP (Cheng et al. 2011, NAR) to further filter the interactions, and we are trying that as well. However, I don't understand how you obtained the input for TIP (500,542 promoter-associated interactions, page 3 of your paper) from the 2,948,387 promoter-proximal peaks. Is there something I missed?
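
The density problem is easy to reproduce. Below is a minimal sketch of nearest-TSS peak assignment that builds TF-gene edges and measures TF-TF density. The 1 kb window, the tuple layouts and the TF-to-gene mapping are hypothetical, not taken from any pipeline mentioned here. For scale: if the 100 × 100 TF network is directed and self-regulation counts, 5,000 edges is 5,000 / (100 × 100) = 0.5, i.e. half of all possible pairs. Note that in this sketch several peaks near the same gene collapse into a single edge, so peak counts and edge counts need not match.

```python
from bisect import bisect_left
from collections import defaultdict

# Illustrative promoter-proximal window (hypothetical; not specified in the source).
MAX_DIST = 1000  # bp


def build_tss_index(tss_records):
    """Index TSSs per chromosome. tss_records: iterable of (chrom, pos, gene)."""
    by_chrom = defaultdict(list)
    for chrom, pos, gene in tss_records:
        by_chrom[chrom].append((pos, gene))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([p for p, _ in items], [g for _, g in items])
    return index


def nearest_gene(index, chrom, summit, max_dist=MAX_DIST):
    """Gene whose TSS is closest to the peak summit, or None beyond max_dist."""
    if chrom not in index:
        return None
    positions, genes = index[chrom]
    i = bisect_left(positions, summit)
    best_gene, best_dist = None, max_dist + 1
    for j in (i - 1, i):  # only the two flanking TSSs can be the nearest one
        if 0 <= j < len(positions):
            dist = abs(positions[j] - summit)
            if dist < best_dist:
                best_gene, best_dist = genes[j], dist
    return best_gene


def build_network(peaks, index, max_dist=MAX_DIST):
    """peaks: iterable of (tf, chrom, summit). Returns a set of (tf, gene) edges;
    several peaks hitting the same gene collapse into one edge."""
    edges = set()
    for tf, chrom, summit in peaks:
        gene = nearest_gene(index, chrom, summit, max_dist)
        if gene is not None:
            edges.add((tf, gene))
    return edges


def tf_tf_density(edges, tf_gene):
    """Fraction of the n*n possible directed TF->TF edges (self-loops included).
    tf_gene maps each TF to the gene that encodes it (hypothetical mapping)."""
    gene_tf = {g: t for t, g in tf_gene.items()}
    tf_edges = {(tf, gene_tf[g]) for tf, g in edges if g in gene_tf}
    n = len(tf_gene)
    return len(tf_edges) / (n * n)


# Toy example with made-up coordinates.
index = build_tss_index([("chr1", 10_000, "g1"), ("chr1", 50_000, "g2"),
                         ("chr2", 5_000, "g3")])
peaks = [("TF_A", "chr1", 10_300), ("TF_A", "chr1", 9_800),   # both map to g1
         ("TF_B", "chr1", 49_500), ("TF_B", "chr2", 20_000)]  # second is too far
edges = build_network(peaks, index)
print(sorted(edges))                                       # [('TF_A', 'g1'), ('TF_B', 'g2')]
print(tf_tf_density(edges, {"TF_A": "g2", "TF_B": "g1"}))  # 0.5
```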

I also have a more general question about TF function. I am not sure whether we can claim that a TF binding event is “non-functional” if the TF gene shows low co-expression correlation with the target gene, or if silencing the TF gene does not affect the target gene's expression. Regulation can be complex, with multiple TFs targeting one gene. The target genes that show co-expression/correlation might be those for which the TF plays a major role, while the TF can still contribute to the expression of other target genes, accounting for only a small share while other TFs play a more dominant role. So can I say that those TF binding events have no function?
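
The situation described above can be made concrete with a toy simulation using entirely made-up numbers: a target gene driven strongly by TF A and weakly by TF B. With unit-variance regulators and noise SD 0.5, the target variance is 1 + 0.04 + 0.25 = 1.29, so corr(B, target) should be only about 0.2 / √1.29 ≈ 0.18, yet a regression that accounts for A should recover B's small effect near 0.2. This only illustrates the statistical point; it says nothing about any real maize TF.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_two(x1, x2, y):
    """Least-squares slopes for y ~ b1*x1 + b2*x2 + intercept (centered normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated expression across 500 samples (all numbers are made up).
rng = random.Random(0)
n = 500
tf_a = [rng.gauss(0, 1) for _ in range(n)]  # dominant regulator
tf_b = [rng.gauss(0, 1) for _ in range(n)]  # minor regulator
target = [1.0 * a + 0.2 * b + rng.gauss(0, 0.5) for a, b in zip(tf_a, tf_b)]

r_a, r_b = pearson(tf_a, target), pearson(tf_b, target)
b_a, b_b = ols_two(tf_a, tf_b, target)
print(f"corr(A, target) = {r_a:.2f}, corr(B, target) = {r_b:.2f}")
print(f"fitted effect of A = {b_a:.2f}, of B = {b_b:.2f}")
```

A low correlation for B here does not mean B's effect is zero; it means B explains a small share of the target's variance.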

A:
My understanding is that TIP assumes each TF has a characteristic binding profile around the TSS across the human genome. TIP estimates an empirical distribution of the signal/peaks around the TSS, converts it into weights, and calculates a score for each peak. This assumption is based on the human genome, so it may not apply directly to other genomes if there is no clear binding pattern around the TSS. Before you use the tool, please double-check the binding profile of each TF in your plant data. You can check and adapt the TIP source code on GitHub: https://github.com/gersteinlab/TIP
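
The three steps described above (empirical profile around the TSS, conversion to weights, scoring) can be sketched as follows, together with a crude check of whether a TF shows any TSS-centred pattern at all. This is written from the description in this answer, not from TIP's code: the ±5 kb window, the 100 bp bins, summing weighted peak signal per gene, the z-score step and the `tss_concentration` metric are all assumptions. Consult the GitHub repository for what TIP actually does.

```python
import math

# Assumed settings for illustration; not taken from TIP's source code.
WINDOW = 5000  # bp on each side of the TSS
BIN = 100      # histogram bin width, bp
N_BINS = 2 * WINDOW // BIN + 1


def collect_hits(genes, peaks):
    """genes: (gene, chrom, tss, strand); peaks: (chrom, summit, signal).
    Returns (gene, position relative to TSS in transcript orientation, signal).
    Brute force O(genes x peaks); fine for a sketch only."""
    hits = []
    for gene, chrom, tss, strand in genes:
        for p_chrom, summit, signal in peaks:
            d = summit - tss
            if p_chrom == chrom and abs(d) <= WINDOW:
                hits.append((gene, d if strand == "+" else -d, signal))
    return hits


def bin_of(rel):
    return (rel + WINDOW) // BIN


def empirical_profile(hits):
    """Genome-wide binding profile around TSSs, normalized to weights summing to 1."""
    counts = [0.0] * N_BINS
    for _, rel, signal in hits:
        counts[bin_of(rel)] += signal
    total = sum(counts)
    return [c / total for c in counts]


def gene_scores(hits, weights, gene_ids):
    """Each peak contributes weight(position) * signal; contributions are summed
    per gene (an assumption). Genes without peaks score 0."""
    scores = {g: 0.0 for g in gene_ids}
    for gene, rel, signal in hits:
        scores[gene] += weights[bin_of(rel)] * signal
    return scores


def z_scores(scores):
    vals = list(scores.values())
    mu = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / (len(vals) - 1))
    return {g: (s - mu) / sd if sd > 0 else 0.0 for g, s in scores.items()}


def tss_concentration(hits, core=500):
    """Share of signal within +/-core bp of the TSS, relative to a flat profile.
    Values near 1 suggest no clear TSS-centred pattern for this TF."""
    total = sum(s for _, _, s in hits)
    near = sum(s for _, r, s in hits if abs(r) <= core)
    return (near / total) / ((2 * core + 1) / (2 * WINDOW + 1))


# Toy data with made-up coordinates and signals.
genes = [("g1", "chr1", 10_000, "+"), ("g2", "chr1", 30_000, "-"),
         ("g3", "chr1", 60_000, "+"), ("g4", "chr2", 10_000, "+")]
peaks = [("chr1", 10_050, 10.0), ("chr1", 29_900, 8.0), ("chr1", 64_000, 5.0)]
hits = collect_hits(genes, peaks)
weights = empirical_profile(hits)
scores = gene_scores(hits, weights, [g for g, *_ in genes])
print(tss_concentration(hits))  # well above 1 for this toy TF
print(z_scores(scores))
```

A TF whose `tss_concentration` stays close to 1 in your maize data has no TSS-centred profile for this kind of weighting to exploit.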

For TF ChIP-seq, if the constructed regulatory network is very dense, you may try a more stringent cutoff to reduce false-positive regulatory interactions.
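
One way to choose a more stringent cutoff is to sweep it and watch how the number of TF-gene edges responds. The sketch assumes each peak has already been assigned to a gene and carries an IDR value and a signal value; the tuple layout and thresholds are illustrative. Keeping a fixed number of top peaks per TF is shown as an alternative that caps every TF at the same number of peaks.

```python
from collections import defaultdict


def edges_at_cutoff(peaks, max_idr):
    """peaks: (tf, gene, idr, signal) with genes already assigned.
    Keeps TF-gene edges supported by at least one peak with IDR <= max_idr."""
    return {(tf, gene) for tf, gene, idr, _ in peaks if idr <= max_idr}


def edges_top_n(peaks, n):
    """Alternative cutoff: keep only each TF's n strongest peaks by signal."""
    by_tf = defaultdict(list)
    for tf, gene, _, signal in peaks:
        by_tf[tf].append((signal, gene))
    edges = set()
    for tf, items in by_tf.items():
        items.sort(reverse=True)  # strongest peaks first
        edges.update((tf, gene) for _, gene in items[:n])
    return edges


# Toy peaks with made-up IDR values and signals.
peaks = [("TF_A", "g1", 0.001, 50.0), ("TF_A", "g2", 0.008, 12.0),
         ("TF_A", "g3", 0.02, 5.0), ("TF_B", "g1", 0.0005, 40.0),
         ("TF_B", "g4", 0.009, 8.0)]
for cutoff in (0.05, 0.01, 0.001):
    print(cutoff, len(edges_at_cutoff(peaks, cutoff)))
print(sorted(edges_top_n(peaks, 1)))
```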

As to whether gene co-expression reflects TF regulatory function: as you mentioned, you are already aware that the mechanism is very complex. Co-expression alone is certainly not sufficient to prove a regulatory function. Still, according to many previous studies, co-expression can support some reliable inferences. If you have multiple data sources, the result can also be refined with more advanced machine-learning techniques. You can refer to a recent paper from our lab, in which we use elastic net to refine the TF-gene network (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=30545857&dopt=Abstract).
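
The linked paper describes its own approach; the sketch below is only a generic illustration of elastic-net regression for one target gene, not that paper's method. It assumes (hypothetically) that the candidate regulators are the TFs with a ChIP-seq peak assigned to the target, and that expression of those TFs and of the target is available across samples. TFs whose coefficient is shrunk exactly to zero would be dropped from the network. In practice you would more likely use scikit-learn's `ElasticNet`, whose `alpha`/`l1_ratio` objective this coordinate-descent loop mirrors.

```python
import random


def standardize(col):
    """Center to mean 0 and scale so that (1/n) * sum(x^2) == 1."""
    n = len(col)
    mu = sum(col) / n
    sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
    return [(v - mu) / sd for v in col]


def soft_threshold(x, lam):
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def elastic_net(x_cols, y, alpha=0.2, l1_ratio=0.5, n_iter=500, tol=1e-8):
    """Coordinate descent minimizing
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio*||b||_1 + (1 - l1_ratio)/2 * ||b||_2^2).
    x_cols must be standardized columns; y is centered here (no intercept)."""
    n = len(y)
    my = sum(y) / n
    resid = [v - my for v in y]  # residual y - Xb with b = 0
    coef = [0.0] * len(x_cols)
    for _ in range(n_iter):
        max_delta = 0.0
        for j, xj in enumerate(x_cols):
            # (1/n) x_j . partial residual, using (1/n) x_j . x_j == 1
            rho = sum(x * r for x, r in zip(xj, resid)) / n + coef[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                coef[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return coef


# Hypothetical data: 200 samples, 5 candidate TFs bound near one target gene.
rng = random.Random(1)
n_samples, n_tfs = 200, 5
tfs = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_tfs)]
# In this simulation only TF0 (strong) and TF1 (weak) drive the target.
target = [1.0 * x0 + 0.3 * x1 + rng.gauss(0, 0.5) for x0, x1 in zip(tfs[0], tfs[1])]

coefs = elastic_net([standardize(c) for c in tfs], target)
kept = [j for j, c in enumerate(coefs) if c != 0.0]
print([round(c, 2) for c in coefs], "kept TFs:", kept)
```

Because the penalty acts on standardized expression, coefficients are comparable across TFs; with these settings the weak TF1 should survive the penalty while the unrelated TFs should be shrunk to or near zero.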
