We are trying to find the co-evolving positions in a protein family of interest. I submitted a job to the co-evolution server several days ago, but I have not received a response yet. Could you please let me know the estimated completion time of my job?
There was a long queue of pending tasks, and one of them had been stuck in the queue for some time. I have removed it to let the others run. Please see if you get your results within a day. If not, please let me know and I will check the system again.
Re: Architecture of the human regulatory network derived from ENCODE data
Hi Dr. Gerstein: This is a very nice paper and it is very important to my
current study. Do you have tools/software for the TF co-association analysis
(Figure 1 and Supplemental Sections B and C) mentioned in this paper? Could I get them?
Anshul did the co-association analysis for this Networks paper. I
think he knows that part the best.
As for the co-association analysis in the ENCODE main paper, it can
be repeated using the GSC package available at the ENCODE statistics web
site (http://www.encodestatistics.org/). The first thing you need to do
is to determine (manually or by other means) a segmentation of the
genome in which TF binding is assumed to be stationary within each
segment. If you have no specific preference on how the segmentation
should be done, you can use the GSC Python segmentation tool, which
will try to perform an automatic segmentation (the more data you have,
the better the results). Then you can run the GSC Python program to
perform segmented block sampling to compute pairwise p-values of your
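To make the idea of segmented block sampling more concrete, here is a minimal sketch. It is not the GSC implementation and does not reproduce its statistic; it only illustrates the general technique under stated assumptions: two TF binding tracks are 0/1 lists over the same positions, the segments are contiguous (start, end) pairs that tile those positions, the test statistic is the number of co-bound positions, and the null distribution is built by resampling contiguous blocks of the second track within each segment. All function names and parameters (`block`, `n_samples`, `seed`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def overlap(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where both 0/1 tracks are 1 (the test statistic)."""
    return sum(x & y for x, y in zip(a, b))


def block_resample(track: Sequence[int], start: int, end: int,
                   block: int, rng: random.Random) -> List[int]:
    """Rebuild track[start:end] by concatenating randomly chosen contiguous
    blocks drawn from the same segment, so local structure is preserved."""
    seg = list(track[start:end])
    n = len(seg)
    out: List[int] = []
    while len(out) < n:
        i = rng.randrange(max(1, n - block + 1))
        out.extend(seg[i:i + block])
    return out[:n]


def segmented_block_pvalue(a: Sequence[int], b: Sequence[int],
                           segments: List[Tuple[int, int]], block: int = 10,
                           n_samples: int = 1000,
                           seed: int = 0) -> Tuple[int, float]:
    """Empirical one-sided p-value for 'a and b overlap more than expected'.
    Assumes `segments` tile the tracks as contiguous, ordered (start, end) pairs."""
    rng = random.Random(seed)
    observed = overlap(a, b)
    hits = 0
    for _ in range(n_samples):
        b_null: List[int] = []
        for start, end in segments:
            # Resample b only within each segment (segment-wise stationarity).
            b_null.extend(block_resample(b, start, end, block, rng))
        if overlap(a, b_null) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)
```

Usage, with two hand-made segments:

```python
a = [1] * 20 + [0] * 80 + [1] * 20 + [0] * 80
segs = [(0, 100), (100, 200)]
obs, p = segmented_block_pvalue(a, a, segs, block=10, n_samples=200)
print(obs, p)
```

Resampling within segments rather than across the whole genome is what keeps regions with very different binding densities from inflating the apparent association; the block length controls how much local clustering the null preserves.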
I am interested in further exploring the work done by you and your team
members in the DREAM3 challenge, as reported in the paper cited below. Do you
make the code/program publicly available? Thanks.
"Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data"
I am fine with the current version of the software, even though, as you said, it is quite tailored to the competition. Please send it to me. I really appreciate it. Thanks.
The current form of the software is quite tailored for the
competition, and we do not have a general, publicly distributable
version. I can send it to you if you think it would be useful.
Please find attached the version that we submitted to DREAM, together with some data and script files for running it. If you have Apache Ant installed, simply issue the command "ant runall3" to run the program on the DREAM3 files. The size-10 networks are included, and the size-50 and size-100 networks can be downloaded from the DREAM web site.
Regarding your seminal paper "Relating Three-Dimensional Structures to
Protein Networks Provides Evolutionary Insights".
Amongst the supplementary data I could not find the PDB entries that were
used for each interaction in the SIN.
I would much appreciate it if you could send me this data.
The information should be on the site.
I really enjoyed your paper and am looking forward to using
some of the genomic regions you published at http://metatracks.encodenets.gersteinlab.org/
in my research.
I had a couple of questions about them.
BARs–are those the regions predicted by the random forest, or are they
the training set (bins overlapped by a TF ChIP-seq peak)?
PRMs–I may have missed it, but what is the definition of a "promoter"?
I’m guessing it was -1000 to +200bp around a TSS.
(This is to clarify the sentence "bins at the TSSs of expressed genes"
at the bottom of page 17.)
Since the PRMs don’t all span the same genomic distance, I presume
that only bins predicted by the random forest classifier are included
in the files?
Finally, do you have plans to make available (or have you already made available)
the software for creating region files of BARs, DRMs and DRM targets
in other tissues?
The BARs are the output regions of the Random Forest classifier, although they overlap heavily with the input training sets.
The positive examples for learning PRMs are the 100bp bins at exactly the TSSs of expressed genes. Random Forest then learned the feature patterns of these bins, and searched for similar bins in the whole genome.
After the predictions, adjacent bins all predicted as PRMs were merged to form regions. The files available on the supplementary web site contain these regions.
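The labeling and merging steps described above can be sketched as follows. This is only an illustration under assumptions, not the authors' programs (which, as noted below, are not distributed): coordinates are 0-based on a single chromosome, bins are 100 bp wide, each TSS is assigned to the bin that contains it, and the classifier's output is assumed to be available as one boolean per bin. The function names `tss_bins` and `merge_positive_bins` are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

BIN_SIZE = 100  # bp; the bin width described above


def tss_bins(tss_positions: Iterable[int], bin_size: int = BIN_SIZE) -> Set[int]:
    """Indices of the bins that contain each TSS (candidate positive examples)."""
    return {pos // bin_size for pos in tss_positions}


def merge_positive_bins(predicted: List[bool],
                        bin_size: int = BIN_SIZE) -> List[Tuple[int, int]]:
    """Merge runs of adjacent bins predicted positive into half-open
    [start, end) regions, in base pairs."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_pos in enumerate(predicted):
        if is_pos and start is None:
            start = i  # a new run of positive bins begins
        elif not is_pos and start is not None:
            regions.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:  # close a run that reaches the last bin
        regions.append((start * bin_size, len(predicted) * bin_size))
    return regions
```

For example, `tss_bins([150, 1020, 1099])` gives `{1, 10}`, and `merge_positive_bins([False, True, True, False, True])` gives `[(100, 300), (400, 500)]`. Because runs of different lengths are merged, the resulting regions naturally span different genomic distances.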
Since the computer programs were written around the data available from ENCODE, they cannot easily be adapted to other situations. We do not currently plan to make them available.
I am reading your paper "Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors" and I have some questions.
In Figure 1, what do the colors mean?
I also could not understand the plots in Figure 4. What are the black dots, the error bars and the black lines?
I would be grateful if you could answer my questions.
In Figure 1, different colors are used for different types of regions. For each region type, one color is used as the background and another color is used to show the signal level.
Figure 4 shows standard box-and-whisker plots (http://en.wikipedia.org/wiki/Box_plot). The dots are the means of the distributions. The upper and lower lines (the "error bars") are the non-outlier maximum and minimum values, respectively, and the black lines in the middle are the medians.
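As an illustration of how the plotted quantities relate to the data, here is a minimal sketch that computes them for one distribution. The reply above does not say which outlier rule or quartile method was used in Figure 4, so two assumptions are made here: points beyond 1.5 × IQR from the quartiles count as outliers (Tukey's convention, also matplotlib's default), and quartiles use linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`). The function name `box_summary` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def box_summary(values: Sequence[float], whis: float = 1.5) -> Dict[str, float]:
    """Quantities drawn in a box-and-whisker plot with a mean marker.
    Assumes at least two values and the 1.5 x IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers end at the most extreme values that are not outliers.
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "mean": statistics.fmean(values),  # the dot
        "median": median,                  # the line inside the box
        "q1": q1,                          # bottom of the box
        "q3": q3,                          # top of the box
        "whisker_low": min(inliers),       # non-outlier minimum
        "whisker_high": max(inliers),      # non-outlier maximum
    }
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, the median is 5, the box spans 3 to 7, the whiskers end at 1 and 8, and 100 is treated as an outlier. The mean (about 15.1) lies far above the median, which is why plotting both is informative for skewed distributions.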