PCAWG passenger mutation analysis

Posted on March 9, 2020 by gersteinfaq

Q1:
I was trying to download a subset of the data from your recent paper (https://www.cell.com/cell/fulltext/S0092-8674(20)30113-6). However, the website (http://pcawg.gersteinlab.org/) is returning a ‘not found’ error. I am especially interested in the ‘Gene list categories’. Could you please share the files listed under ‘Gene List Categories’ on the website so that I can use them in my analysis?

A1:
The website works fine for me. Are you sure it doesn’t work? … Please let me know which specific file you are trying to download.

Q2:
Thanks a lot for the reply.

I need the gene list categories listed under PCAWG-specific annotations (http://pcawg.gersteinlab.org/#Annotations):

Essential Genes
Immune Response Genes
DNA Repair Genes
Metabolic Genes
Cancer Pathway Genes
Non-Essential Genes
Cell Cycle Genes

For some reason, when I click on a link, it directly downloads an HTML file containing an error. It would be great if you could share these files.

A2:
You can download the relevant files from the link below.

http://pcawg.gersteinlab.org/Datasets/Annotations/categories/
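
The symptom described in Q2 (a click that saves an HTML error page instead of the gene list) can be guarded against when fetching the files programmatically. The following is a minimal sketch, not an official download script: the file name `essential_genes.txt` is a hypothetical placeholder, since the actual file names in the directory are not listed here, and the helper names are invented for illustration.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

# Directory given in the answer above.
BASE_URL = "http://pcawg.gersteinlab.org/Datasets/Annotations/categories/"


def category_url(filename, base=BASE_URL):
    """Build the full URL of one file inside the categories directory."""
    return urljoin(base, filename)


def looks_like_html(payload):
    """Heuristic: does the payload start like an HTML (error) page?"""
    head = payload.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def download(filename, dest_dir="."):
    """Fetch one file and refuse to save it if the server returned HTML."""
    url = category_url(filename)
    with urlopen(url, timeout=30) as response:
        content_type = response.headers.get_content_type()
        payload = response.read()
    if content_type == "text/html" or looks_like_html(payload):
        raise ValueError(f"{url} returned an HTML page, not a data file")
    dest = Path(dest_dir) / filename
    dest.write_bytes(payload)
    return dest


# Hypothetical usage (the file name is an assumption, so it is left commented out):
# download("essential_genes.txt")
```

Checking both the `Content-Type` header and the first bytes of the body is a deliberate redundancy, because some servers send error pages with a misleading content type.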

Question about the cQTL analysis in Wang et al 2018

Posted on March 9, 2020 by gersteinfaq

Q:
I am writing with a question about the cQTL analysis in Wang et al 2018. Were the 292 individuals analyzed in this analysis all of European ancestry? If not, what were the sample sizes for European vs non-European ancestry, and how did you control for ancestry in your analysis?

I apologize for writing with such a detailed question, but I could not find the answer in the main text or supplement of the paper, or on the Synapse website. (Context: I am interested in cross-population genetic analyses of psychiatric disease and wondering if PsychENCODE cQTL data is relevant.)

A:
In calculating the cQTLs, we used 173 Caucasians and 119 non-Caucasians. To control for ancestry, we used the top three genotype principal components as covariates.
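
To illustrate what using genotype principal components as covariates means in practice, here is a minimal sketch with NumPy. It is a generic ordinary-least-squares example on random toy data, not the pipeline used in the paper: the function names, the synthetic genotypes and phenotype, and the choice of a plain linear model are all assumptions made for illustration.

```python
import numpy as np


def top_genotype_pcs(genotypes, n_pcs=3):
    """Return the top `n_pcs` principal-component scores per sample.

    `genotypes` is a (samples x variants) matrix of allele dosages (0/1/2).
    """
    centered = genotypes - genotypes.mean(axis=0)  # center each variant
    # SVD of the centered matrix: the columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snp_effect_adjusted_for_pcs(phenotype, snp, pcs):
    """OLS slope of `phenotype` on `snp`, with the PCs as covariates."""
    design = np.column_stack([np.ones(len(snp)), snp, pcs])
    coefficients, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coefficients[1]  # index 0 is the intercept, index 1 is the SNP


# Toy data: 292 samples (173 + 119, as in the answer), 200 random variants.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(292, 200)).astype(float)
pcs = top_genotype_pcs(genotypes)  # top three PCs by default

snp = genotypes[:, 0]
# Synthetic phenotype: known SNP effect of 0.5 plus PC-driven structure.
phenotype = 0.5 * snp + pcs @ np.array([0.3, -0.2, 0.1]) + 1.0
effect = snp_effect_adjusted_for_pcs(phenotype, snp, pcs)
```

Because the PCs enter the design matrix next to the SNP, variation in the phenotype that tracks broad ancestry structure is absorbed by the PC coefficients rather than being attributed to the SNP.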

DTE results as described in the paper “Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder”

Posted on March 9, 2020 by gersteinfaq

Q:
I was trying to reproduce the DTE results as described in the paper "Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder". I am a registered user of Synapse but was unable to find the data mentioned below and would really appreciate your help in obtaining it.
The supplementary methods of this paper mention the different covariates used for carrying out DGE and DTE using the nlme package. Would it be possible to obtain the seqPC and SV values, particularly the seqPCs (1-3, 5-8, 10-14, 16, 18-25, 27-29) and SVs (1-4) used in the lme model?
Additionally, could I obtain the final list of sample IDs that made it into the DGE/DTE analysis?

A:
See the seqPCs we used in our analysis (attached).
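
When selecting covariates from a table of seqPCs and SVs, the index ranges quoted in the question can be expanded programmatically rather than typed out by hand. A small sketch follows; the column-name prefixes `seqPC` and `SV` are assumptions about how such a table might be labeled, not the names used in the attached file.

```python
def expand_ranges(spec):
    """Expand a spec such as "1-3, 5-8, 16" into a list of integers,
    in the order given."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            indices.extend(range(start, end + 1))  # ranges are inclusive
        else:
            indices.append(int(part))
    return indices


# Index ranges exactly as quoted in the question above.
SEQ_PC_SPEC = "1-3, 5-8, 10-14, 16, 18-25, 27-29"
SV_SPEC = "1-4"

# Hypothetical column names for a covariate table.
seq_pc_columns = [f"seqPC{i}" for i in expand_ranges(SEQ_PC_SPEC)]
sv_columns = [f"SV{i}" for i in expand_ranges(SV_SPEC)]

print(len(seq_pc_columns), "seqPCs and", len(sv_columns), "SVs selected")
```

Expanding the spec shows that 24 of the first 29 seqPCs are included, with seqPCs 4, 9, 15, 17, and 26 left out.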

Query regarding Pseudopipe

Posted on March 9, 2020 by gersteinfaq

Q:
Since I am working on pseudogene identification for my new project, I have been using your pipeline, but I am running into a few errors, which I describe below. Could you please help me resolve them? I would be very grateful.
>
> ERRORS:
> 1. On terminal:
> sudo bash pseudopipe.sh ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
> Making directories
> Copying sequences
> Fomatting the DNAs
> Preparing the blast jobs
> Skipping blast
> Processing blast output
> Skipping the processing of blast output
> Running Pseudopipe on both strands
> Working on M strand
> Finished Pseudopipe on strand M
> Working on P strand
> Finished Pseudopipe on strand P
> Generating final results
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/minus/pgenes’: No such file or directory
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/plus/pgenes’: No such file or directory
> gzip: /home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/*/pgenes/*.all.fa: No such file or directory
> Finished generating pgene full alignment
> Finished running Pseudopipe
> 2. In log file inside minus and plus folder:
> need to document overlap parameter (30) and dependency on mask array files.
> mask fields [2, 3]
> Traceback (most recent call last):
> File "/home/kashmir/SOFTWARE/pgenes/pseudopipe/core/filterEnsemblGene.py", line 60, in <module>
> maskFile = openOrFail(ExonMaskTemplate % chr, 'r')
> TypeError: not all arguments converted during string formatting
> running filterEnsemblGene.py
> failed during filterEnsemblGene.py stage.

A:
From the output, it looks like you had a couple of issues, starting with the blast job.

Could you please check the blast/output folder in your output directory and see whether it contains any split000*.Out files (where * is a number)? If you don’t see any output files, your blast job did not run. In order to run the pipeline, you need a couple of additional software packages installed and preferably added to your path. Specifically, you will need blast-2.2.13 and fasta-35.1.5. If you do not want to add them to your path, you can set the path to their location in the env.sh file, which you can find in the bin folder of PseudoPipe.

This should allow you to run the pipeline without any issues.
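
The check suggested above (looking for split000*.Out files in the blast/output folder) can also be scripted. This is a minimal sketch: the helper name `find_blast_outputs` is hypothetical, and the demo runs on a throwaway directory with made-up file names rather than on real pipeline output.

```python
from pathlib import Path
import tempfile


def find_blast_outputs(pipeline_output_dir):
    """Return the split000*.Out files under <output dir>/blast/output, sorted."""
    blast_dir = Path(pipeline_output_dir).expanduser() / "blast" / "output"
    if not blast_dir.is_dir():
        return []  # folder missing: treat as "no blast output"
    return sorted(blast_dir.glob("split000*.Out"))


# Self-contained demo on a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "blast" / "output"
    demo_dir.mkdir(parents=True)
    for name in ("split0001.Out", "split0002.Out", "notes.txt"):
        (demo_dir / name).touch()
    found_names = [p.name for p in find_blast_outputs(tmp)]
    missing = find_blast_outputs(Path(tmp) / "no_such_run")

print(found_names)  # ['split0001.Out', 'split0002.Out']
```

For a real run, the function would be pointed at the pipeline's output directory, for example the first argument passed to pseudopipe.sh in the question; an empty result suggests that the blast step produced no output.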

Question regarding list of human pseudogenes

Posted on March 9, 2020 by gersteinfaq

Q:
I am … developing an application that matches cancer patients to treatment based on the person’s genetic profile. We are looking for an updated list of human pseudogenes to use in evaluating submitted DNA variants. Can you tell me if the Pseudo Fam data files at the pseudogene.org website are still being updated? If not, perhaps you could recommend an alternate source?

A:
It is best to get an updated list of pseudogenes from pseudogene.org, which is continually updated, i.e., http://pseudogene.org/Human/. Yucheng

Gerstein Lab FAQs

Frequently Asked Questions

Daily Archives: March 9, 2020
