Encode for cancer genomics to predict gene expression

Q:
I am just beginning my first ever project, using the extended gene definition provided in the dataset of Encode for cancer genomics to predict gene expression. I would be incredibly grateful for an explanation of the layout of the text files. I have been trying, unsuccessfully, to understand how the extended gene was used to interpret the mutations and expression changes in the published article.

A:
Thanks for your interest in the research and the extended gene annotation. We are preparing a BED-formatted extended gene annotation, which will be available soon on our project website (http://encodec.encodeproject.org/). We will keep you informed.

source code for context-specific TF co-association analysis in ‘Architecture of the human regulatory network derived from ENCODE data’

Q:
I have benefited a lot from your work entitled ‘Architecture of the human regulatory network derived from ENCODE data’ and I want to use the framework you developed for context-specific TF co-association analysis. However, I can’t find the source code at the address you gave, http://code.google.com/p/tf-co-association/. Do you have a replacement address for the source code?

A:
Is this what you are looking for?
https://code.google.com/archive/p/tf-coassociation/source/default/source

Supplementary data of Architecture of the human regulatory network derived from ENCODE data

Q:
I recently read the ENCODE paper "Architecture of the human regulatory network derived from ENCODE data", and I realized that the supplementary data would greatly help me refine my project's results, in particular the files related to K562. Unfortunately, I found that none of the supplementary data files can be downloaded, since neither of the following sites can be reached.

http://encodenets.gersteinlab.org
http://archive.gersteinlab.org/proj/encodenetsold

In particular, the second link is active, but if I try to download one of the files, it points to the first link and the download is interrupted. I am writing to ask if there are any other ways to access the files.

A:
http://encodenets.gersteinlab.org should be back up now. Let us know

how to filter TF binding peaks for a plant ENCODE project

Q:
My lab is doing a few plant ENCODE projects; we have done ChIP-Seq for ~100 maize TFs and are analyzing the data. We followed most of your 2013 paper “architecture of the human regulatory network…”. One thing that confuses me is that we have on average ~10,000 peaks for each TF (from SPP and IDR 0.01). If I associate them with genes based on distance to the TSS, we get a huge TF-gene or TF-TF network in which almost everything is interacting. For example, the 100 TF to 100 TF network has 5k edges, and I guess many of them could be false positives due to weak ChIP-seq peaks. In your paper, you used TIP (in your Cheng et al 2011 NAR) to further filter out some interactions. We are trying that as well. But I don’t understand how you got the input for TIP (500,542 promoter-associated interactions, page 3 of your paper) from 2,948,387 promoter-proximal peaks. Is there something I missed?

I also have another question about TF function in general. I am not sure whether we can claim that TF binding is “non-functional” if the TF gene itself shows low co-expression correlation with the target gene, or if silencing the TF gene does not affect the target gene’s expression. The regulation could be complex, with multiple TFs targeting one gene. The genes that do show co-expression/correlation might be the targets for which the TF plays a major role, while at other targets the TF may still contribute to expression, but only a small percentage, with other TFs playing a more dominant role. So can I say that such TF binding has no function?

A:
My understanding is that TIP assumes each TF has a characteristic binding profile around TSSs across the human genome. TIP then estimates an empirical distribution of signal/peaks around the TSS, converts it to weights, and calculates a score for a peak. This assumption is based on the human genome, so it may not apply directly to other genomes if there is no clear binding pattern around the TSS. Before you use the tool, please double-check the binding profile of each TF in plants. You can check and adapt the source code of TIP from GitHub: https://github.com/gersteinlab/TIP
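To make the idea concrete, here is a rough sketch of profile-weighted scoring. It is not the TIP code (see the repository above for that); it only illustrates the description in this answer. It assumes you have already extracted, for each gene, a fixed-length list of ChIP-seq signal values at positions relative to the TSS. The function names and the input layout are made up for this illustration.

```python
from typing import Dict, List


def binding_profile_weights(signals: Dict[str, List[float]]) -> List[float]:
    """Sum the signal at each TSS-relative position over all genes and
    normalize the totals so that the weights add up to 1."""
    n_pos = len(next(iter(signals.values())))
    totals = [0.0] * n_pos
    for track in signals.values():
        for i, value in enumerate(track):
            totals[i] += value
    grand_total = sum(totals)
    if grand_total == 0:
        # No signal anywhere: fall back to a flat profile.
        return [1.0 / n_pos] * n_pos
    return [t / grand_total for t in totals]


def score_genes(signals: Dict[str, List[float]],
                weights: List[float]) -> Dict[str, float]:
    """Score each gene as the weighted sum of its own signal, so that
    signal at positions where the TF typically binds counts more."""
    return {
        gene: sum(w * s for w, s in zip(weights, track))
        for gene, track in signals.items()
    }


if __name__ == "__main__":
    # Toy input: three genes, three positions around the TSS.
    toy = {"g1": [0.0, 4.0, 0.0], "g2": [0.0, 2.0, 2.0], "g3": [0.0, 0.0, 0.0]}
    w = binding_profile_weights(toy)
    print(w, score_genes(toy, w))
```

If the weight profile for a maize TF comes out nearly flat, that is a hint that the TSS-centered assumption may not hold for that factor.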

For TF ChIP-seq, if the constructed regulatory network is very dense, you may try a more stringent cutoff to reduce the number of false-positive regulatory interactions.

As to whether gene co-expression reflects TF regulatory function: as you mentioned, you are already aware that the mechanism is very complex. Co-expression alone definitely cannot prove a regulatory function, but according to many previous studies we can still draw some reliable inferences from it. Also, if you have multiple data sources, the result can be refined with more advanced machine learning techniques. You can refer to a recent paper from our lab, in which we use an elastic net to refine the TF-gene network (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=30545857&dopt=Abstract).

Request for the supplementary data of the ENCODE paper

Q:
My projects focus on exploring the mechanisms of gene regulation. I recently read the ENCODE paper (Architecture of the human regulatory network derived from ENCODE data, 2012) again and realized that the supplementary data will greatly help us to refine our results.

Unfortunately, I found that all the files have been archived, and neither of the following sites can be reached. I am writing to ask if there are any other ways to access the files. Thank you very much for your time. I am looking forward to hearing from you.

http://encodenets.gersteinlab.org
http://archive.gersteinlab.org/proj/encodenetsold/

A:
http://encodenets.gersteinlab.org
should be up shortly

Encode TF binding site data

Q:
I am currently trying to figure out which TFs may be regulating my
modules in the synovium of rheumatoid arthritis patients (generated using
WGCNA of microarray samples). Am I right in understanding I can use your
ENCODE data to do so on this website?

http://encodenets.gersteinlab.org/

My second question is which one of these files I should use to compare my gene
expression module genes with your TF binding gene lists? I noticed the raw
one has huge numbers of genes for each TF, so should I use the other i.e.
filtered?

enets1.Proximal_raw.txt

enets2.Proximal_filtered.txt

Is it a problem the ENCODE TF binding sites are generated from cell lines
and not in the diseased tissue I am interested in? Apologies for my naivety!

A:
quick answers below

…Am I right in understanding I can use your
ENCODE data to do so on this website?

ANSWER: yes

I noticed the raw
one has huge numbers of genes for each TF, so should I use the other i.e.
filtered?

ANSWER: I’d go w/ the filtered one (enets2) first.

Is it a problem the ENCODE TF binding sites are generated from cell lines
and not in the diseased tissue I am interested in?

ANSWER: don’t think so – they are meant to be a ref.
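For the module comparison itself, one common approach is a hypergeometric (one-sided Fisher) test of the overlap between each module and each TF's target list. The sketch below is only an illustration: it assumes the target file has been reduced to two tab-separated columns (TF, target gene), which may not match the real layout of enets2.Proximal_filtered.txt, so please check the file first. The function names and the choice of gene universe are also assumptions.

```python
from math import comb
from typing import Dict, Iterable, Set


def load_targets(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse 'TF<TAB>target' lines (an assumed layout) into TF -> target set."""
    targets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        targets.setdefault(fields[0], set()).add(fields[1])
    return targets


def hypergeom_pvalue(overlap: int, module_size: int,
                     target_count: int, universe_size: int) -> float:
    """P(X >= overlap) when module_size genes are drawn without replacement
    from a universe that contains target_count targets of the TF."""
    max_k = min(module_size, target_count)
    tail = sum(
        comb(target_count, k) * comb(universe_size - target_count, module_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / comb(universe_size, module_size)


def module_enrichment(module: Set[str], targets: Dict[str, Set[str]],
                      universe: Set[str]) -> Dict[str, float]:
    """Enrichment p-value of each TF's targets within one module.
    Genes outside the universe are ignored on both sides."""
    module = module & universe
    result = {}
    for tf, genes in targets.items():
        genes = genes & universe
        result[tf] = hypergeom_pvalue(
            len(module & genes), len(module), len(genes), len(universe)
        )
    return result
```

The universe should be the set of genes that could appear in both data sets (for example, genes on the microarray that also have a symbol in the network files). With many TFs and modules, the p-values need a multiple-testing correction before interpretation.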

ENCODE analysis with NA12878 genome

Q:
I seem to remember that you were doing some ENCODE analysis using an actual
NA12878 genome instead of the human reference GRCh37 (or whatever was the
current version at the time). Am I remembering this correctly? Was there
ever a comparison published that showed the benefits of using the actual
NA12878 genome versus the reference?

A:
yes to all above!

this was in the ENCODE phase 2 manuscript (we did the fig.)
Also, there were detailed follow up analyses in
http://papers.gersteinlab.org/papers/AlleleSeq/
http://papers.gersteinlab.org/papers/alleledb/

Architecture of the human regulatory network derived from ENCODE data

Q:
I have a question about the following excerpt from page 37 of the supp.
materials:

"In this paper, we mainly present a TF-centric analysis. We have also
analyzed other types of genomic contexts, such as gene-centric contexts, to
reveal the effect of context-specific TF co-associations to gene expression,
as well as chromatin state contexts to reveal relationships of TF
co-associations to various enrichments of chromatin marks. We plan to
present these results in a future publication.”

Were those results ever published? If so could you please point me to them.
I’m looking for updated regulatory networks based on ENCODE data.

A:
try:

http://papers.gersteinlab.org/papers/metatrack
http://papers.gersteinlab.org/papers/loregic

Questions about “Architecture of the human regulatory network derived from ENCODE data”

Q:
I am reading your paper and have a question about the TF-target gene network data downloaded from http://encodenets.gersteinlab.org/. Which refGene version and gene symbols did you use when you identified the TF target genes from the ChIP-seq data? I find that some symbols are not included in the hg19 refGene I downloaded from UCSC.

A:
The server was down for a while, and I wasn’t sure which names you were talking about. I now think the names are from GENCODE, but I cannot recall the exact release we used. I believe the names wouldn’t change in general. You can see all the releases here; the names should be in one of the metadata files.
http://www.gencodegenes.org/releases/
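If it helps, one way to find the release that matches is to collect the gene names from each GENCODE GTF and see which symbols from the network files are missing. This is a minimal sketch that relies only on the standard GTF layout (nine tab-separated columns, with a gene_name "..." entry in the attribute column); the function names are made up for this example.

```python
import re
from typing import Iterable, List, Set

# GTF attributes look like: gene_id "ENSG..."; gene_name "TP53"; ...
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gencode_gene_names(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect gene_name values from the 'gene' records of a GTF file."""
    names: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_NAME.search(fields[8])
        if match:
            names.add(match.group(1))
    return names


def missing_symbols(symbols: Iterable[str], names: Set[str]) -> List[str]:
    """Return the symbols that are absent from the annotation, sorted."""
    return sorted(set(symbols) - names)
```

A release file can be read with gzip.open(path, "rt") and passed straight to gencode_gene_names; the release with the fewest missing symbols is the most likely match.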

Help regarding the paper “Comparative analysis of regulatory information and circuits across distant species”

Q:
Recently, I read one of your papers, titled “Comparative analysis of regulatory information and circuits across distant species”. In this paper, you wrote that you used simulated annealing to reveal the organization of regulatory factors in three layers: master regulators, intermediate regulators, and low-level regulators. However, I can’t find the program for this method or the references related to it. I want to use this method to classify the TFs in my own regulatory network. Could you kindly provide this program?

A:
An initial version of the code is available from encodenets.gersteinlab.org.

The code used for the analysis can be found at:
http://encodenets.gersteinlab.org/enets16.hierarchy_levels.m
More recently, our group published an updated method. The code will be released very soon.
http://genomebiology.com/2015/16/1/63/abstract#
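For readers who want to see the general idea before opening the MATLAB file, below is a minimal Python sketch of simulated annealing for level assignment. It is not a port of enets16.hierarchy_levels.m, and the objective it uses (minimize the number of edges that do not point from a higher level to a strictly lower one) is an assumption made for illustration; the published objective and cooling schedule may differ. All names and default parameters are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def count_violations(edges: Iterable[Edge], level: Dict[str, int]) -> int:
    """Number of edges whose regulator is not on a strictly higher level."""
    return sum(1 for u, v in edges if level[u] <= level[v])


def anneal_levels(edges: Iterable[Edge], n_levels: int = 3, steps: int = 20000,
                  t_start: float = 1.0, t_end: float = 0.01,
                  seed: int = 0) -> Tuple[Dict[str, int], int]:
    """Assign each node to a level 0..n_levels-1 (highest = top regulators)."""
    rng = random.Random(seed)
    edge_list: List[Edge] = [(u, v) for u, v in edges if u != v]  # drop self-loops
    nodes = sorted({n for e in edge_list for n in e})
    if not nodes:
        return {}, 0
    level = {n: rng.randrange(n_levels) for n in nodes}
    incident: Dict[str, List[Edge]] = {n: [] for n in nodes}
    for u, v in edge_list:
        incident[u].append((u, v))
        incident[v].append((u, v))

    cost = count_violations(edge_list, level)
    best, best_cost = dict(level), cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        node = rng.choice(nodes)
        old, new = level[node], rng.randrange(n_levels)
        if new == old:
            continue
        before = count_violations(incident[node], level)
        level[node] = new
        delta = count_violations(incident[node], level) - before
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta
            if cost < best_cost:
                best, best_cost = dict(level), cost
        else:
            level[node] = old  # reject the move
    return best, best_cost
```

For example, anneal_levels([("A", "B"), ("B", "C")]) should place A on the top level, B in the middle, and C at the bottom. On a real network, several runs with different seeds give a sense of how stable each TF's level is.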