PNAS paper supplement duplication

Q1:
I am reading with interest your recent paper (Kumar, Clarke, and Gerstein, PNAS), but I suspect that supplement 1 and 2 are the same, and neither has a list of 434 genes. Could you please supply the list?

A1:
Thank you very much for your interest in the paper. Supplement 1 includes hotspot communities based on the pan-cancer analysis (i.e., when we compute statistics over multiple cancer cohorts in TCGA). In contrast, supplement 2 lists putative driver genes with hotspot communities for specific cancer types. Note that column F of supplement 2 lists the names of the particular cancer cohorts.

Regarding the number of genes, 434 genes are based on the pan-cancer analysis.
For each gene, there are multiple PDB entries. For the analysis in our paper, we selected a representative structure with the highest residue coverage. However, to be exhaustive and to allow researchers to analyze proteins of interest to them, our supplement includes all PDB entries for a given gene. We have tried to explain this in our Methods section.
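As a rough illustration of the selection rule described above, the following Python sketch keeps, for each gene, the PDB entry with the highest residue coverage. The record layout (gene, PDB ID, covered residues, protein length) and the example rows are hypothetical and are not taken from the supplement; the Methods section of the paper remains the authoritative description.

```python
from collections import namedtuple

# Hypothetical record layout; the supplement's actual columns may differ.
PdbEntry = namedtuple(
    "PdbEntry", ["gene", "pdb_id", "covered_residues", "protein_length"]
)


def coverage(entry):
    """Fraction of the protein's residues resolved in this structure."""
    return entry.covered_residues / entry.protein_length


def pick_representatives(entries):
    """Return {gene: entry}, keeping the highest-coverage entry per gene."""
    best = {}
    for entry in entries:
        current = best.get(entry.gene)
        if current is None or coverage(entry) > coverage(current):
            best[entry.gene] = entry
    return best


# Made-up example rows, for illustration only.
entries = [
    PdbEntry("GENE_A", "1AAA", 120, 400),
    PdbEntry("GENE_A", "2BBB", 310, 400),
    PdbEntry("GENE_B", "3CCC", 95, 100),
]
representatives = pick_representatives(entries)
for gene, entry in sorted(representatives.items()):
    print(gene, entry.pdb_id)
```

Because the supplement lists every PDB entry per gene, collapsing it this way to one row per gene is one way to see why the row count and the gene count differ.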

Q2:
Thanks for your quick reply, but no, this does not remove my confusion. Please take a moment to check the link from your paper at PNAS. When I download pnas.1901156116.sd01.xlsx, the file has 217 lines (not 434) and includes the column F that breaks down by cancer type.

A2:
I am attaching our original tables to this email. It appears that the table has somehow been duplicated on the PNAS website. We will work with the PNAS team to get it fixed.

Supplemental_tables.xlsx

Supplementary data of Architecture of the human regulatory network derived from ENCODE data

Q:
I recently read the ENCODE paper "Architecture of the human regulatory network derived from ENCODE data", and I realized that the supplementary data would greatly help me refine my project's results, in particular the files related to K562. Unfortunately, I found that none of the supplementary data files are available to download, since neither of the following sites can be reached.

http://encodenets.gersteinlab.org
http://archive.gersteinlab.org/proj/encodenetsold

In particular, the second link is active, but if I try to download one of the files, it points to the first link and the download is interrupted. I am writing to ask if there are any other ways to access the files.

A:
http://encodenets.gersteinlab.org should be back up now. Let us know if you still have trouble accessing the files.

Funseq2 Web Server

Q:
The Funseq2 Web Server has been down recently. Will it be available in the next few days?

A:
The Funseq2 web server is up and running now. There has been some suspicious activity on the server recently, and we are continuing to monitor it.
If you are submitting your own query, please use the correct input format; otherwise the server will show a ‘service unavailable’ message.

As an alternative, you can also download the whole genome annotations for both hg19 and hg38 from funseq3.gersteinlab.org, then use tabix to query.
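The tabix query mentioned above can be sketched in Python as follows. This is a minimal sketch that assumes tabix is installed and that the downloaded annotation file is bgzip-compressed with its .tbi index alongside it. The file name below is hypothetical, and the chromosome naming ("chr1" versus "1") must match whatever the downloaded file uses.

```python
import shutil
import subprocess


def tabix_command(annotation_file, chrom, start, end):
    """Build the tabix call for a 1-based, inclusive region."""
    region = f"{chrom}:{start}-{end}"
    return ["tabix", annotation_file, region]


def query_region(annotation_file, chrom, start, end):
    """Run tabix and return the matching lines (needs tabix on PATH)."""
    if shutil.which("tabix") is None:
        raise RuntimeError("tabix not found on PATH")
    result = subprocess.run(
        tabix_command(annotation_file, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Hypothetical file name; substitute the file you actually downloaded.
cmd = tabix_command("funseq_annotations.hg19.bed.gz", "chr1", 1000000, 1000100)
print(" ".join(cmd))
```

Calling query_region with the same arguments would run the command and return one string per annotation line overlapping the region.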

Inquiry regarding PsychENCODE Datasets

Q:
We are trying to replicate some results using the bulk RNA-seq datasets available from the PsychENCODE consortium. We currently have access to the transcript RSEM count data from reads aligned to hg19. We were wondering if the same data was available for reads aligned to hg38 and if so, how we could access that data?

A:
Sorry, we currently don’t have the transcript RSEM count data from reads aligned to hg38.

Regulatory Genetic Network and DSPN

Q:
I am studying your publication in Science (Comprehensive functional genomic resource and integrative model for the human brain, Science 362, 1266 (2018)) with great interest. As a quantitative geneticist, I found it very relevant to the study of complex genetic traits. Therefore, I am writing to request your assistance in obtaining your software/algorithm for Regulatory Genetic Network modeling and the integrative deep learning model (DSPN), so that we can implement them on the NIH supercomputer system and conduct some integrative genomic modeling work in the area of brain/neuropsychiatry.

A:
Please see resource.psychencode.org. Specifically, you can find the MATLAB code under "7. Matlab code and formatted data for the DSPN" at http://resource.psychencode.org/

pseudogenes in PseudoPipe

Q:
The pseudogene databases, including Pseudofam and PseudoPipe, have been extremely helpful for a project I am working on, and I was wondering whether it is possible to compare the DNA sequence of a human gene with all the pseudogenes in the PseudoPipe resources. I am looking to identify pseudogenes that may be related to the genes I am working with. I was hoping there was a way to obtain this information by using BLAST to compare the DNA sequence of a specific gene with the sequences of all the pseudogenes in the genome, similar to the NCBI BLAST or UniProt BLAST features.

Any help or insight would be appreciated.

A:
If you have many genes to query, you could use BLAST+ (https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download) to build your own tool. Download the sequences of all pseudogenes, make a BLAST database from them (https://www.ncbi.nlm.nih.gov/books/NBK279688/), and then query that database with your gene sequences.
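The two BLAST+ steps above (build a database, then query it) can be sketched as follows. This is a minimal sketch that assumes the BLAST+ programs makeblastdb and blastn are installed. The FASTA file names, the database name, and the E-value cutoff are hypothetical placeholders and should be adapted to the downloaded pseudogene sequences.

```python
import subprocess


def makeblastdb_command(pseudogene_fasta, db_name):
    """Command that builds a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", pseudogene_fasta,
            "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_fasta, db_name, out_tsv, evalue="1e-5"):
    """Command that searches the database and writes tabular (outfmt 6) hits."""
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-evalue", evalue, "-outfmt", "6", "-out", out_tsv]


def run(cmd):
    """Execute one of the commands above; raises if the program fails."""
    subprocess.run(cmd, check=True)


# Hypothetical file and database names, for illustration only.
build_cmd = makeblastdb_command("pseudogenes.fa", "pseudogene_db")
search_cmd = blastn_command("my_genes.fa", "pseudogene_db", "hits.tsv")
print(" ".join(build_cmd))
print(" ".join(search_cmd))
```

Running run(build_cmd) once and then run(search_cmd) for each query file would produce a tab-separated table of hits, one row per gene-to-pseudogene alignment.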

ASE analysis within your article (A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals)

Q:
I am writing to you regarding the code used to analyse ASE in RNA-seq data in this article, specifically the beta diversity evaluation and application test. I was wondering if the code is available, as I would like to apply it to compare samples.

A:
The code for calling allele-specific sites is available at
https://github.com/gersteinlab/alleleDB

The specific scripts for the beta-binomial test are
alleledb_calcOverdispersion.R
alleledb_alleleseqBetabinomial.R
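
For readers unfamiliar with the idea behind a beta-binomial test for allelic imbalance, here is a generic Python sketch. It is not a port of the R scripts listed above, and its details are assumptions: a null centred on a 0.5 reference fraction, an overdispersion parameter rho defined as 1/(alpha + beta + 1), and a two-sided p-value obtained by doubling the smaller tail. Please consult the alleleDB scripts for the actual procedure.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability of k successes in n trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def ase_pvalue(ref_count, alt_count, rho):
    """Two-sided p-value under a symmetric beta-binomial null.

    Assumption: rho in (0, 1) is the overdispersion, with
    alpha = beta = (1/rho - 1) / 2 so the expected reference fraction is 0.5.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    n = ref_count + alt_count
    a = b = (1.0 / rho - 1.0) / 2.0
    lo = min(ref_count, alt_count)
    # The null is symmetric, so double the lower tail and cap at 1.
    tail = sum(betabinom_pmf(k, n, a, b) for k in range(lo + 1))
    return min(1.0, 2.0 * tail)


# Made-up read counts: the same site tested under two overdispersion values.
p_low_rho = ase_pvalue(2, 18, rho=0.01)
p_high_rho = ase_pvalue(2, 18, rho=0.3)
print(p_low_rho < p_high_rho)
```

The comparison at the end illustrates the design point: the larger the estimated overdispersion, the less surprising a skewed read count becomes, so the same counts yield a larger p-value.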