Full set of tQTLs and isoQTLs from Wang et al. 2018

Posted on December 14, 2020 by gersteinfaq

Q:
We have made great use of the publicly available PEC resources on http://resource.psychencode.org/, in particular the QTL data. However, I have not been able to locate the full set of isoQTLs and tQTLs without any p-value/FDR filtering, as is available for eQTLs. Is there somewhere I can access this easily? Or does access to the full set of tQTLs and isoQTLs require an application to Synapse?

A:
Currently we don’t provide access to the full set. The full set is very large and we need to discuss where we should share these data. I will let you know once we have any updates.

Full set of tQTLs and isoQTLs from Wang et al. 2018

Posted on July 28, 2020 by gersteinfaq

Q:
As a lab, our general interests lie in the intersection between transcriptomics, neurogenetics, and genetic diagnosis. As such, we have made great use of the publicly available PEC resources on http://resource.psychencode.org/, in particular the QTL data. However, I have not been able to locate the full set of isoQTLs and tQTLs without any p-value/FDR filtering, as is available for eQTLs. Is there somewhere I can access this easily? Or does access to the full set of tQTLs and isoQTLs require an application to Synapse?

A:
Currently we don’t provide access to the full set. The full set is very large and we need to discuss where we should share these data. I will let you know once we have any updates.

Questions regarding eqtl calls

Posted on July 28, 2020 by gersteinfaq

Q:
I am trying to reproduce the eQTL calls published here with file name: Full_hg19_cis-eQTL. I’m having some difficulty reproducing the eQTL calls and in particular the P-values, and wanted to figure out where my pipeline isn’t matching.

1) I am unsure of the earth selection process on the super covariate sets. Currently, we are trying to reproduce the covariate selection using the one-hot-encoded covariate superset mentioned in the supplementary material (page 7) of this publication. We are curious about which covariates are selected (e.g., the brain bank covariates include multiple institutes; are all of them selected, or just some of them?).

2) We are unsure which GTEx pipeline for eQTL calls was employed by the publication. We are currently using the GTEx pipeline mentioned here, but are wondering if the paper used an older version of the GTEx pipeline that was previously available.

3) Another question is which datasets are fed into the eQTL calls. We are currently working with the capstone genotype datasets and the TPM expression matrix published here with file name: DER-02_PEC_Gene_expression_matrix_TPM. We are wondering if the genotype/expression filtering was done directly on these files.

4) The last question is that when we call eQTLs using FastQTL, the nominal p-values (that have passed FDR < 0.05) are much larger than the p-values your study published here with the file name: DER-08a_hg19_eQTL.significant (so it looks like we’re incredibly underpowered). I’ve attached a figure to illustrate the nominal p-values reported in your files versus those computed by us. We have used the Capstone genotypes and expression files (as described above), and though we should be somewhat underpowered relative to your study (because we are missing the GTEx genotypes/expression files, which need separate agreements), I’m not sure that accounts for the difference in p-value magnitudes. I was wondering if you have any thoughts on which part of the pipeline we may have implemented incorrectly that could lead to such a huge difference?

A:
Here are some responses to your questions.

I am unsure of the earth selection process on the super covariate sets. Currently, we are trying to reproduce the covariate selection using the one-hot-encoded covariate superset mentioned in the supplementary material (page 7) of this publication. We are curious about which covariates are selected (e.g., the brain bank covariates include multiple institutes; are all of them selected, or just some of them?).
Here are the covariates we are using; you can also find the description in the supplemental materials of our paper (http://papers.gersteinlab.org/papers/capstone4/index.html):

Top 3 genotyping principal components
Probabilistic Estimation of Expression Residuals (PEER) factors
Genotyping array platform
Gender
Disease status
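As an illustration only, the covariate list above could be assembled into a design matrix roughly as follows. This is a minimal sketch, not the pipeline used in the paper: the sample IDs, values, array names, the single PEER factor, and the choice to drop the alphabetically first level of each categorical covariate as the reference are all assumptions made for the example.

```python
# Hypothetical per-sample covariates; every value below is made up.
samples = {
    "S1": {"PC1": 0.012, "PC2": -0.004, "PC3": 0.021, "PEER1": 0.50,
           "platform": "arrayA", "sex": "F", "diagnosis": "control"},
    "S2": {"PC1": -0.030, "PC2": 0.015, "PC3": -0.002, "PEER1": -0.20,
           "platform": "arrayB", "sex": "M", "diagnosis": "case"},
    "S3": {"PC1": 0.008, "PC2": 0.001, "PC3": -0.010, "PEER1": 0.10,
           "platform": "arrayA", "sex": "F", "diagnosis": "control"},
}

# Numeric covariates are used as-is; categorical ones are one-hot encoded.
NUMERIC = ["PC1", "PC2", "PC3", "PEER1"]   # the number of PEER factors is a placeholder
CATEGORICAL = ["platform", "sex", "diagnosis"]


def build_covariates(samples):
    """Return (sample_ids, rows), where rows maps covariate name -> values per sample."""
    ids = sorted(samples)
    rows = {}
    for name in NUMERIC:
        rows[name] = [float(samples[s][name]) for s in ids]
    for name in CATEGORICAL:
        levels = sorted({samples[s][name] for s in ids})
        # Drop the first level as the reference to avoid a redundant column.
        for level in levels[1:]:
            rows[f"{name}_{level}"] = [
                1.0 if samples[s][name] == level else 0.0 for s in ids
            ]
    return ids, rows


ids, rows = build_covariates(samples)
for name, values in rows.items():
    print(name, values)
```

The covariates-by-samples orientation is chosen only for readability here; check the input format expected by whichever QTL caller you use.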

We are unsure which GTEx pipeline for eQTL calls was employed by the publication. We are currently using the GTEx pipeline mentioned here, but are wondering if the paper used an older version of the GTEx pipeline that was previously available.
A detailed description of our eQTL pipeline can be found in Fig. S31 of our paper (http://papers.gersteinlab.org/papers/capstone4/index.html).

Another question is which datasets are fed into the eQTL calls. We are currently working with the capstone genotype datasets and the TPM expression matrix published here with file name: DER-02_PEC_Gene_expression_matrix_TPM. We are wondering if the genotype/expression filtering was done directly on these files.
You can find details in Fig. S31 of our paper (http://papers.gersteinlab.org/papers/capstone4/index.html).

The last question is that when we call eQTLs using FastQTL, the nominal p-values (that have passed FDR < 0.05) are much larger than the p-values your study published here with the file name: DER-08a_hg19_eQTL.significant (so it looks like we’re incredibly underpowered). I’ve attached a figure to illustrate the nominal p-values reported in your files versus those computed by us. We have used the Capstone genotypes and expression files (as described above), and though we should be somewhat underpowered relative to your study (because we are missing the GTEx genotypes/expression files, which need separate agreements), I’m not sure that accounts for the difference in p-value magnitudes. I was wondering if you have any thoughts on which part of the pipeline we may have implemented incorrectly that could lead to such a huge difference?
I am not sure which genotype file you are using, but we cannot share the merged genotype file since we integrated some GTEx samples into it. We are also using different covariates. Your results will differ from ours if the genotype, phenotype, and covariate inputs are not the same.

Question about the cQTL analysis in Wang et al 2018

Posted on March 9, 2020 by gersteinfaq

Q:
I am writing with a question about the cQTL analysis in Wang et al 2018. Were the 292 individuals analyzed in this analysis all of European ancestry? If not, what were the sample sizes for European vs non-European ancestry, and how did you control for ancestry in your analysis?

I apologize for writing with such a detailed question, but I could not find the answer in the main text or supplement of the paper, or on the Synapse website. (Context: I am interested in cross-population genetic analyses of psychiatric disease and wondering if PsychENCODE cQTL data is relevant.)

A:
In calculating the cQTLs, we used 173 Caucasians and 119 non-Caucasians. To control for ancestry, we used the top three genotype principal components as covariates.

Inquiry regarding PsychENCODE Datasets

Posted on August 20, 2019 by gersteinfaq

Q:
We are trying to replicate some results using the bulk RNA-seq datasets available from the PsychENCODE consortium. We currently have access to the transcript RSEM count data from reads aligned to hg19. We were wondering if the same data was available for reads aligned to hg38 and if so, how we could access that data?

A:
Sorry, we currently don’t have the transcript RSEM count data from reads aligned to hg38.

Question regarding RNA-seq data uploaded to “Synapse”

Posted on July 12, 2019 by gersteinfaq

Q:
I was referred to you by Michael Gandal for a question I have regarding your RNA-seq data from the fascinating shared article "Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder".

I know you’ve uploaded the TPM data to the PsychENCODE website. Could you tell me if the data in this file, DER-02_PEC_Gene_expression_matrix_TPM, is normalized?

A:
We didn’t run any quantile normalization on this file.

MS data in the Psychencode datasets

Posted on July 12, 2019 by gersteinfaq

Q1:
I recently met you at LMB where you gave a wonderful talk on PsychENCODE data analysis.

You mentioned that there were MS datasets in PsychENCODE, but I am unable to find them. Is it possible for you to point me to the MS data in the PsychENCODE datasets, or to someone who may know about this?

A1:
Could you please explain a little more about what dataset you need?

Q2:
I am looking for Mass Spec datasets in PsychENCODE. Mark mentioned that MS analyses were done for some samples. I wonder whether you could help me identify them?

A2:
I just checked with our DCC team and currently we don’t have any Mass Spec data available for public sharing.

Q3:
What is the DCC team? I was given to believe from the publications that this data was available along with the others for analysis; I would not have asked otherwise. Is there a way I can reach out to any group among your DCC team that has this data to see whether I can formally collaborate with them? Can you kindly let me know who may be the best person to ask for the details of the group that may have the MS datasets? I am looking for MS data (even if it is published) from any of the samples that were used in the PsychENCODE project.
I am willing to collaborate and share authorship with the scientists who generated these datasets.
Would it be possible for you to point me to anyone you know who may have this dataset (published or unpublished)?

A3:
I have contacted the group that is generating the Mass Spec data. Are you specifically interested in proteomics related to donors with neuropsychiatric disorders? We (Sage Bionetworks) also function as the data coordination center for the NIA-funded Accelerating Medicines Partnership – Alzheimer’s Disease (AMP-AD). There are a variety of studies in AMP-AD with Mass Spec proteomics on postmortem brain tissue that also have other genomic data such as WGS and RNAseq. Included in that is the Religious Orders Study and Memory and Aging Project (ROS/MAP) from the Rush Alzheimer’s Disease Center. See here for information on the cohorts. There will be TMT-labeled MS on ~400 ROS/MAP donors released this fall.

Q4:
Thank you for getting in touch with me, and thank you for your pointer. Indeed, we will be interested in the Alzheimer’s samples (all three: WGS, RNAseq and Proteomics).
I will write a separate note to you on this.
At the moment, we are looking for MS samples from donors with neuropsychiatric disorders.

A4:
Actually, my lab is doing something very similar as well: validating novel ORFs identified from our third-generation sequencing and riboseq data.
If you use other approaches that we have not used yet, or have special goals beyond just validating ORFs in the brain, I will be happy to collaborate.
I have two students/collaborators on this.

Q5:
Is it possible for me to make a quick call?

A5:
…(resolved via phone call on Jul 9, 2019)…

Question about deconvolution analysis in PsychENCODE paper

Posted on May 19, 2019 by gersteinfaq

Q:
I have a question about the deconvolution method used in the flagship PsychENCODE paper, "Comprehensive functional genomic resource and integrative model for the human brain". I would like to perform a similar analysis on my own bulk samples using the single cell expression profiles used in the paper; however, it is unclear how these profiles are formed.

Specifically, supplementary file DER-23 lists the cell type fractions for 24 cell types. These coefficients presumably came from solving the following:

B = C * W

Where B is the marker gene by samples matrix, C is the marker gene by cell type matrix, and W is the appropriate weights matrix. How do I go about obtaining or reproducing the 24 cell type profiles? From what I can tell, these profiles were not released along with the other supplemental data sets.

If you could please answer my question or forward this email on to the appropriate author(s), I would appreciate it.

A:
Sorry for the late reply. I think the profiles you want are on resource.psychencode.org
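For readers who want to experiment with the B = C * W formulation from the question once they have a signature matrix C, the weights for one bulk sample can be estimated under a non-negativity constraint. The sketch below is illustrative only and is not taken from the paper: it uses a multiplicative-update rule (in the style of Lee and Seung's NMF updates, with C held fixed) on a made-up 3-gene by 2-cell-type signature, and all names and numbers are assumptions.

```python
def estimate_weights(C, b, iters=500):
    """Estimate non-negative w minimizing ||b - C w||^2 for one sample.

    C: marker genes x cell types (non-negative), b: marker genes (non-negative).
    Uses the multiplicative update w_k <- w_k * (C^T b)_k / (C^T C w)_k.
    """
    n_genes, n_types = len(C), len(C[0])
    ctb = [sum(C[g][k] * b[g] for g in range(n_genes)) for k in range(n_types)]
    ctc = [[sum(C[g][j] * C[g][k] for g in range(n_genes))
            for k in range(n_types)] for j in range(n_types)]
    w = [1.0 / n_types] * n_types  # start from equal weights
    for _ in range(iters):
        denom = [sum(ctc[j][k] * w[k] for k in range(n_types))
                 for j in range(n_types)]
        w = [w[j] * ctb[j] / denom[j] if denom[j] > 0 else 0.0
             for j in range(n_types)]
    return w


# Toy signature matrix (hypothetical): rows are marker genes, columns cell types.
C = [[10.0, 1.0],
     [1.0, 10.0],
     [5.0, 5.0]]
true_w = [0.3, 0.7]
# Simulated bulk expression for one sample: b = C * true_w.
b = [sum(C[g][k] * true_w[k] for k in range(2)) for g in range(3)]

w = estimate_weights(C, b)
total = sum(w)
fractions = [x / total for x in w]  # rescale so the fractions sum to 1
print([round(f, 4) for f in fractions])
```

A dedicated non-negative least squares solver would normally be preferable to this hand-rolled update for real data; the point here is only to show what W represents for one column of B.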

Requesting information about cQTL and fQTL data from PsychENCODE

Posted on May 19, 2019 by gersteinfaq

Q:
I am writing with regard to the datasets posted on the PsychENCODE website. I noticed that full summary statistics for QTL maps are posted for eQTLs and isoQTLs, but cQTLs and fQTLs only have top SNP information. Is there a chance you could upload full summary stats for cQTLs and fQTLs as well?

A:
We calculated cQTLs and fQTLs differently from eQTLs and isoQTLs, so we only have top SNP information for cQTLs and fQTLs.

Inquiry regarding PsychENCODE eQTL resource

Posted on May 3, 2019 by gersteinfaq

Q1:

Were the eQTLs calculated on 1,886 unique individuals?

A1:
No, the eQTLs were calculated on 1,387 filtered adult samples with matching gene expression and genotypes.

Q2:
Fig. S34 mentions that only 1,432 individuals have been genotyped. How was the genotype information determined for the remaining 454 individuals?

A2:
We didn’t have genotype information for the remaining 454 individuals, so we didn’t include them in any QTL analysis.

Q3:
The numbers of samples with genotypes enumerated in Table S1 and Table S11 do not appear to match. For example, Table S1 reports 450 GTEx samples (97 DFC), but Table S11 reports 25 GTEx genotypes from the pre-frontal cortex. There might be some subtlety between these two tables that I have missed; could you please clarify how to properly interpret these tables?

A3:
The Genotypes column in Table S11 only includes samples that passed genotyping quality filters (for example, genotype imputation accuracy score R2 > 0.3) and have matched RNA-seq data.
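To make the counting rule in A3 concrete, here is a minimal sketch of such a filter. It is not the lab's actual code: the record layout, field names, and values are hypothetical, and treating the imputation R2 as a single per-sample score is an assumption made only for this illustration.

```python
R2_THRESHOLD = 0.3  # example threshold quoted in the answer above

# Hypothetical sample records; all fields and values are made up.
records = [
    {"sample": "A", "imputation_r2": 0.95, "has_rnaseq": True},
    {"sample": "B", "imputation_r2": 0.25, "has_rnaseq": True},   # fails the R2 filter
    {"sample": "C", "imputation_r2": 0.80, "has_rnaseq": False},  # no matched RNA-seq
    {"sample": "D", "imputation_r2": 0.31, "has_rnaseq": True},
]


def usable_samples(records, threshold=R2_THRESHOLD):
    """Keep samples that pass the quality filter AND have matched RNA-seq data."""
    return [
        r["sample"]
        for r in records
        if r["imputation_r2"] > threshold and r["has_rnaseq"]
    ]


kept = usable_samples(records)
print(len(kept), kept)
```

A count produced this way can be much smaller than the number of genotyped samples, which is the kind of difference between Table S1 and Table S11 described above.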

Gerstein Lab FAQs

Frequently Asked Questions
