PsychENCODE GRN questions

Q1:
I have a few questions about the Gene Regulatory Networks (GRNs) published as part of "Comprehensive functional genomic resource and integrative model for the human brain" at http://resource.psychencode.org/. Could you pass these along to whoever is best suited to address them?

First question: Which reference genome is used?

The GRN has the following format:

Transcription_Factor,Target_Gene,Enhancer_Region,Edge_Weight

And most rows look like:

BARHL2,SHC1,chr1:154869072-154870071,0.284806116416629

however, some rows just have "Promoter" in the Enhancer_Region column, like this one:

NR2F2,SHC1,Promoter,0.120934147846037

But since NR2F2 (and most other genes) have a couple of different reference haplotypes in both RefSeq and GENCODE (e.g. see NR2F2 in the UCSC Genome Browser), it is unclear to me which location "Promoter" designates.

Does there exist a version of the GRN with chromosomal coordinates substituted for "Promoter", or would you mind sending a reference to the haplotype you used as the reference when building this GRN?

To summarize: what reference genome did you use in constructing the GRN? What region does "Promoter" correspond to?

A1:
We defined the promoter regions as a window of ±1.25 kb (2.5 kb in
total) around the transcription start site (TSS) on hg19.
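
As a minimal sketch of that definition, the window can be computed from a TSS position as shown below. The function name and the inclusive, 1-based endpoints are assumptions made for illustration; the answer above does not state the exact boundary convention, and the position in the example is a placeholder, not a real TSS.

```python
def promoter_window(chrom, tss, half_width=1250):
    """Return a 'chrom:start-end' string spanning +/- half_width around the TSS.

    Assumption: 1-based coordinates with both endpoints included, clamped so
    the start never falls below position 1. The window is symmetric, so the
    strand of the gene does not change the result.
    """
    start = max(1, tss - half_width)
    end = tss + half_width
    return f"{chrom}:{start}-{end}"


# Placeholder TSS position on chr1 (illustrative only).
example_region = promoter_window("chr1", 10000)
print(example_region)
```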

Q2:
Could you send the hg19 reference genome you’re referring to?

If I go to the UCSC browser and look at RefSeq hg19 for some arbitrary gene: [[see image]]

The gene has multiple reference isoforms. Where does your GRN place the promoter for this gene? That is, at which chromosomal location does the ChIP track you integrated into your GRN place the TF? Chromosomal coordinates would be less ambiguous than stating that the TF binds the promoter. Would it be possible to produce such a network, or could you send us the reference genome you used, with a single location for each promoter (i.e. a single TSS)? How did you choose the ‘canonical’ isoform for each gene? What about the promoters upstream of the other TSSs: is there evidence of regulation at those alternate promoters?

Any chance you might be able to resolve this for us? It seems to limit the utility of this network to have this ambiguity about the chromosomal location of these transcriptional regulatory events. It would be a shame not to resolve it, I think.

A2:
I have added the promoter TSS file to our website at: http://resource.psychencode.org/Datasets/Integrative/tss.sites.codingOnly.gencode.v19.annotation.bed

It can be found at resource.psychencode.org by navigating to the section on "Integrative Analysis", and scrolling to item 3.
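
With a single TSS per gene, the "Promoter" entries in the GRN can be turned into explicit coordinates. The sketch below is one possible way to do this, not the lab's own procedure. It assumes that "Promoter" refers to the promoter of the Target_Gene, that the ±1.25 kb window from the first answer applies, and that a gene-to-TSS lookup (`tss_by_gene`, a hypothetical name) has already been built from the TSS file, whose column layout is not described here. The SHC1 position used in the example is a placeholder, not the real TSS.

```python
import csv
import io

HALF_WINDOW = 1250  # +/- 1.25 kb around the TSS, per the first answer


def resolve_promoters(grn_csv_text, tss_by_gene):
    """Replace 'Promoter' with explicit coordinates where a TSS is known.

    tss_by_gene maps gene symbol -> (chromosome, 1-based TSS position).
    Rows whose target gene is missing from the lookup are left unchanged.
    """
    reader = csv.DictReader(io.StringIO(grn_csv_text))
    resolved = []
    for row in reader:
        if row["Enhancer_Region"] == "Promoter":
            hit = tss_by_gene.get(row["Target_Gene"])
            if hit is not None:
                chrom, tss = hit
                start = max(1, tss - HALF_WINDOW)
                row["Enhancer_Region"] = f"{chrom}:{start}-{tss + HALF_WINDOW}"
        resolved.append(row)
    return resolved


grn_text = """Transcription_Factor,Target_Gene,Enhancer_Region,Edge_Weight
BARHL2,SHC1,chr1:154869072-154870071,0.284806116416629
NR2F2,SHC1,Promoter,0.120934147846037
"""

# Placeholder TSS for SHC1 (illustrative only, not taken from the TSS file).
rows = resolve_promoters(grn_text, {"SHC1": ("chr1", 154900000)})
for row in rows:
    print(row["Transcription_Factor"], row["Target_Gene"], row["Enhancer_Region"])
```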

Data from 1000-Genomes Allele-Specific Binding Paper

Q:
I came across your lab’s paper, Chen et al. (2015) in Nature Communications, on allele-specific binding and expression in 1000 Genomes Project individuals, and was hoping to integrate that data with some analysis on DHSs.

I found the data on the http://alleledb.gersteinlab.org site, including the list of SNPs with significant ASB and ASE, but was wondering whether you have the full list of SNPs queried, in a format similar to the ASB and ASE tables. If not, do you know the easiest way to assemble that list from the data available on the site?

A:
All heterozygous SNVs of the individuals were queried for ASB/ASE and are available in the VCF format from the 1000 GP site. SNVs with high enough read coverage to be able to detect ASB and ASE events (‘accessible SNVs’) are available from http://alleledb.gersteinlab.org/download/ under (3) and (4), respectively. These files are in a similar format to the tables with significant ASB/ASE events.

Request for information about FunSeq data

Q:
I have downloaded the whole-genome scores (hg19), both version 2.1.6 and version 2.1.0, from the project website you provided, http://funseq2.gersteinlab.org/downloads. There is no score when I query a coding region, yet the Whole Genome Query interface does return results, for example for chr1:11073808-11073808. I would be very grateful if you could tell me the reason for this discrepancy. Thank you for your kind consideration of this request.

A:
The genome scores you downloaded include only non-coding variants. For coding regions, the score mostly relies on VAT annotation (VAT is another tool from our lab: vat.gersteinlab.org). In addition, a whole-genome score file covering both coding and non-coding variants would be very large, over 50 GB even after compression, so we provide a query server instead: http://funseq3.gersteinlab.org/ . Thank you for pointing out this issue with the download page; we will update the webpage with clear and detailed file descriptions.

Genetic map in 2010 modENCODE paper

Q:
In your 2010 modENCODE paper, you and your collaborators showed chromatin marks against the genetic map (Figure 5). We would also like to compare the genetic map with the physical map. Is there somewhere we can download a detailed genetic map?

A:
I assume what you want isn’t in the supplement or on the paper’s data page (see links below):
http://www.modencode.org/publications/worm_2010pubs/index.shtml
http://science.sciencemag.org/content/sci/suppl/2010/12/20/science.1196914.DC1/Gerstein-SOM.pdf

Pseudogenes – PseudoPipe

Q:
I am trying to find SNPs in pseudogenes, but the SNP database is built on
different genome assemblies than the pseudogene predictions from PseudoPipe.
Do you have current pipeline pseudogene predictions for eukaryotic genomes?
Or is there a way to remap the genome assemblies used by the pipeline to a
different assembly?
If I want to use PseudoPipe, where in Ensembl can I find the input data set?

A:
Regarding your questions, there are a number of things that you can do:
* If you are interested in the human or mouse genome, pseudogene annotations for the latest assemblies (GRCh38 and GRCm38) are available from the pseudogene.org webpage; see http://www.pseudogene.org/Human/Human90.txt and http://mouse.pseudogene.org/data/Reference/Mus_musculus.GRCm38.87_pgene.txt, respectively.
* The latest annotations for the worm and fly genomes are available here:
http://pseudogene.org/psicube
* If you are interested in other eukaryotic genomes whose annotation was built on older assemblies, one option is to lift the annotation over from the old assembly to a newer one. This can easily be done with the UCSC Genome Browser LiftOver tool, https://genome.ucsc.edu/cgi-bin/hgLiftOver. However, I would strongly advise running PseudoPipe on your own machine instead, because improvements in the assembly and in the protein-coding annotation will considerably improve the resulting pseudogene annotation. You can download and run PseudoPipe as described here: http://pseudogene.org/pseudopipe/
* Also, using the “fetch file” as described at http://pseudogene.org/pseudopipe/ will automatically download all the necessary data for you from the Ensembl server.
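
For the lift-over option above, UCSC also distributes a command-line `liftOver` program that takes the same kind of input as the web tool. The sketch below only assembles and runs that command from Python; it assumes `liftOver` is installed and on the PATH, and every file name in the example is a placeholder. The helper names are hypothetical.

```python
import subprocess


def build_liftover_command(old_bed, chain_file, new_bed, unmapped_bed):
    """Assemble the argument list: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", old_bed, chain_file, new_bed, unmapped_bed]


def run_liftover(old_bed, chain_file, new_bed, unmapped_bed):
    """Run liftOver; raises CalledProcessError if the tool exits with an error."""
    command = build_liftover_command(old_bed, chain_file, new_bed, unmapped_bed)
    subprocess.run(command, check=True)


# Placeholder file names; the chain file must match the source and target assemblies.
command = build_liftover_command(
    "pseudogenes_old.bed",
    "oldToNew.over.chain.gz",
    "pseudogenes_new.bed",
    "pseudogenes_unmapped.bed",
)
print(" ".join(command))
```

Regions that cannot be mapped are written to the last file, so it is worth checking how many annotations end up there before using the lifted coordinates.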

Unitary pseudogene promoter

Q:
I am trying to find an example of a unitary pseudogene whose
promoter is known to be mutated as well and therefore the gene is
definitely non-functional. I can find articles stating there are many
examples of unitary pseudogenes in humans (e.g. Vitamin C) but none
seem to mention the promoter. Any thoughts?

A:
Our analyses compiled a number of activity features associated with pseudogenes (e.g. transcription, presence of functional Pol2 and TF binding sites in the upstream region, presence of open chromatin), and these are available online. Please see https://www.ncbi.nlm.nih.gov/pubmed/22951037 (http://pseudogene.org/psidr/ ) and https://www.ncbi.nlm.nih.gov/pubmed/25157146 (http://pseudogene.org/psicube/) for the functional characterisation of pseudogenes. In particular, the unitary pseudogenes that lack transcription, Pol2 binding sites and TF binding sites should be the ones to look at when checking whether the promoter region is conserved.

PARE tool from a Software Engineering (SE) perspective

Q:
I’m interested in the PARE tool from the Software Engineering (SE) side, but
I can’t find any SE-related information that would help me use it in my
first research project.

Were any SE tools or practices used in creating PARE?

If documentation describing how PARE works is available (such as use cases,
sequence diagrams, deployment diagrams, performance annotations used in
diagrams, or the architecture), would you please provide it to me?

Once I have finished, I would be glad to send you the results of my research.

A:
see
http://papers.gersteinlab.org/papers/pare/