ENCODE TF binding site data

Q:
I am currently trying to figure out which TFs may be regulating my
modules in the synovium of rheumatoid arthritis patients (generated using
WGCNA of microarray samples). Am I right in understanding I can use your
ENCODE data to do so on this website?

http://encodenets.gersteinlab.org/

Second question is which one of these files should I use to compare my gene
expression module genes with your TF binding gene lists? I noticed the raw
one has huge numbers of genes for each TF, so should I use the other i.e.
filtered?

enets1.Proximal_raw.txt

enets2.Proximal_filtered.txt

Is it a problem the ENCODE TF binding sites are generated from cell lines
and not in the diseased tissue I am interested in? Apologies for my naivety!

A:
quick answers below

…Am I right in understanding I can use your
ENCODE data to do so on this website?

ANSWER: yes

I noticed the raw
one has huge numbers of genes for each TF, so should I use the other i.e.
filtered?

ANSWER: I’d go w/ the filtered one (enets2) first.

Is it a problem the ENCODE TF binding sites are generated from cell lines
and not in the diseased tissue I am interested in?

ANSWER: don’t think so – they are meant to be a ref.
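
One common way to compare a WGCNA module against per-TF target lists is a hypergeometric over-representation test. The sketch below is only an illustration of that idea, not a method prescribed in the answers above. It assumes each line of the filtered file holds a TF name followed by one target gene, separated by whitespace; check the actual layout of enets2.Proximal_filtered.txt before relying on it. The function names, the toy gene symbols, and the choice of background set are all hypothetical.

```python
from math import comb


def read_tf_targets(lines):
    """Parse 'TF target_gene' lines into {TF: set of target genes}.

    ASSUMPTION: two whitespace-separated columns per line (TF, target).
    Verify against the real file before use.
    """
    targets = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        targets.setdefault(fields[0], set()).add(fields[1])
    return targets


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def tf_enrichment(module_genes, tf_targets, background):
    """Return (TF, overlap, p-value) tuples sorted by p-value.

    `background` should be the genes that could have appeared in both
    data sets (e.g. genes on the array that are also in the network).
    """
    background = set(background)
    module = set(module_genes) & background
    results = []
    for tf, genes in tf_targets.items():
        genes = genes & background
        overlap = len(module & genes)
        p = hypergeom_sf(overlap, len(background), len(genes), len(module))
        results.append((tf, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical TF and gene names, for illustration only).
toy_lines = ["TFA g1", "TFA g2", "TFA g3", "TFB g9"]
toy_background = {f"g{i}" for i in range(1, 11)}
toy_module = ["g1", "g2", "g4"]
toy_results = tf_enrichment(toy_module, read_tf_targets(toy_lines), toy_background)
```

With many TFs tested, the resulting p-values would need a multiple-testing correction (for example Benjamini-Hochberg) before interpretation.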

ENCODE analysis with NA12878 genome

Q:
I seem to remember that you were doing some ENCODE analysis using an actual
NA12878 genome instead of the human reference GRCh37 (or whatever was the
current version at the time). Am I remembering this correctly? Was there
ever a comparison published that showed the benefits of using the actual
NA12878 genome versus the reference?

A:
yes to all above!

this was in the ENCODE phase 2 manuscript (we did the fig.)
Also, there were detailed follow up analyses in
http://papers.gersteinlab.org/papers/AlleleSeq/
http://papers.gersteinlab.org/papers/alleledb/

genetic map in 2010 modEncode paper

Q:
In your 2010 modEncode paper you and your collaborators showed chromatin marks against the genetic map (figure 5). We would like to look at the genetic vs the physical map also – is there somewhere we can download a detailed genetic map?

A:
I assume what you want isn’t in the supplement or on the paper’s data page (see links below):
http://www.modencode.org/publications/worm_2010pubs/index.shtml
http://science.sciencemag.org/content/sci/suppl/2010/12/20/science.1196914.DC1/Gerstein-SOM.pdf

PARE tool from a Software Engineering (SE) perspective

Q:
I’m interested in the PARE tool from the Software Engineering (SE) side, but
I can’t find any SE-related information that would help me use it in my
first research project.

Were any SE tools or skills used in creating PARE?

If information describing how PARE works (such as use cases, sequence
diagrams, deployment diagrams, performance annotations used in diagrams,
or the architecture used) is available, could you please provide it to
me?

Once I have finished, I would be glad to send you the results of my research.

A:
see
http://papers.gersteinlab.org/papers/pare/

Architecture of the human regulatory network derived from ENCODE data

Q:
I have a question about the following excerpt from page 37 of the supp.
materials:

"In this paper, we mainly present a TF-centric analysis. We have also
analyzed other types of genomic contexts, such as gene-centric contexts, to
reveal the effect of context-specific TF co-associations to gene expression,
as well as chromatin state contexts to reveal relationships of TF
co-associations to various enrichments of chromatin marks. We plan to
present these results in a future publication."

Were those results ever published? If so could you please point me to them.
I’m looking for updated regulatory networks based on ENCODE data.

A:
try:

http://papers.gersteinlab.org/papers/metatrack
http://papers.gersteinlab.org/papers/loregic

Quantification of private information leakage from phenotype-genotype data: linking attacks – do you have slides?

Q:
I am contacting you as I found your paper extremely interesting and very close to the activities I am doing. I would really like to present your work at our weekly lab meeting to my colleagues. Hence, I was wondering if perhaps you have some slides that I could use for this purpose.

A:
see
http://lectures.gersteinlab.org/summary/Genomic-Privacy-n-Individualized-RNAseq-Incompatible-or-Feasible–20161111-i0idash16/
+
other privacy tagged stuff at
http://lectures.gersteinlab.org/summary/

Macromolecular Database Question

Q:
I have come across the Macromolecular Database and I
was curious as to how the degree of motion is quantified on this site. In the
following link for one of the entries (HIV protease:
http://www.molmovdb.org/cgi-bin/motion.cgi?ID=hivprot), the third box from
the top entitled ‘Description’ says of HIV protease:

"Two large loop regions, that together comprise one quarter of the
structure, move CA atoms ~7 Angstroms"

Is this referring to an RMSD value of an ensemble of structures? Is this an
RMSD value of the whole protein, or only for the domain of those two large
loop regions?

A:
This is described in the DB paper
(http://papers.gersteinlab.org/papers/molmovdb2).
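
For readers weighing the readings raised in the question, the sketch below shows how a whole-protein RMSD, an RMSD restricted to a subset of atoms, and a maximum per-atom CA displacement can differ for the same pair of conformations. It does not state which of these the database reports; the DB paper is the reference for that. It assumes the two structures are already superimposed and that atoms are listed in the same order. The coordinates and function names are made up for illustration.

```python
from math import dist, sqrt


def displacements(coords_a, coords_b):
    """Per-atom distance between two pre-superimposed coordinate lists."""
    return [dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b, indices=None):
    """Root-mean-square deviation over all atoms, or over `indices` only."""
    d = displacements(coords_a, coords_b)
    if indices is not None:
        d = [d[i] for i in indices]
    return sqrt(sum(x * x for x in d) / len(d))


# Toy CA coordinates in Angstroms (hypothetical): atoms 0-1 form a rigid
# core, atoms 2-3 form a loop that shifts by 7 Angstroms along x.
closed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
opened = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (14.6, 0.0, 0.0), (18.4, 0.0, 0.0)]

whole_rmsd = rmsd(closed, opened)               # averaged over all four atoms
loop_rmsd = rmsd(closed, opened, [2, 3])        # loop atoms only
max_shift = max(displacements(closed, opened))  # largest single-atom move
```

In this toy case the loop-only RMSD and the maximum displacement are both about 7 Angstroms, while the whole-protein RMSD is lower because the rigid core dilutes the average.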

Data request re paper “Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data. Genome Research, 2011”

Q:
I have read your paper "Prediction and characterization of noncoding RNAs in
C. elegans by integrating conservation, secondary structure, and
high-throughput sequencing and array data. Genome Research, 2011". I am
currently doing a project to analyze lncRNAs in C. elegans, so it would
be a great help to have the coordinates of the lncRNAs discovered in your
paper. I would be grateful if you could send me the lncRNA annotation
file (GFF, GTF or GFF3) by email.

A:
try
http://papers.gersteinlab.org/papers/incrna/
linking to
http://archive.gersteinlab.org/proj/incrna/