Request re pseudogene.org

Q:
We develop MH Guide, a genome-guided cancer treatment decision support software (https://www.molecularhealth.com/us/).

I was trying to get the current annotated pseudogene information via http://pseudogene.org/Human/. The link to GENCODE seems not to work and returns “file not found” (https://www.gencodegenes.org/releases/current.html).

Could you please direct me to the file with annotated pseudogenes?

A:
If you are looking for the current GENCODE annotation, please follow this link to the current release: https://www.gencodegenes.org/human/ . If you want to use PseudoPipe to create a custom human annotation of a genome sequence of your choice, please follow the instructions here: http://pseudogene.org/pseudopipe/ . If you are interested in the functional annotation of pseudogenes, including information on pseudogene activity, please see http://pseudogene.org/psicube/ .

Trying to execute Pseudopipe but I am running into multiple errors

Q1:
I am trying to execute Pseudopipe but I am running into multiple errors. I have downloaded the latest version from the website, and the command I am using to run it is:

/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/pseudopipe/bin/pseudopipe.sh /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chrI_exLocs 0

I keep getting the following errors:

Making directories
Copying sequences
Fomatting the DNAs
/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/pseudopipe/bin/pseudopipe.sh: line 84: /home/bp272/bin/blast-2.2.13/bin/formatdb: No such file or directory
Preparing the blast jobs
Skipping blast
Processing blast output
/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/pseudopipe/bin/pseudopipe.sh: line 114: /home/bp272/bin/Python-2.6.6/python: No such file or directory
Finished processing blast output
Running Pseudopipe on both strands
Working on M strand
/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/pseudopipe/bin/pseudopipe.sh: line 144: /home/bp272/bin/Python-2.6.6/python: No such file or directory
Finished Pseudopipe on strand M
Working on P strand
/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/pseudopipe/bin/pseudopipe.sh: line 144: /home/bp272/bin/Python-2.6.6/python: No such file or directory
Finished Pseudopipe on strand P
Generating final results
find: ‘/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/minus/pgenes’: No such file or directory
find: ‘/work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/plus/pgenes’: No such file or directory
gzip: /work/LAS/rpwise-lab/sagnik/finder/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/*/pgenes/*.all.fa: No such file or directory
Finished generating pgene full alignment
Finished running Pseudopipe

I am running this inside a conda environment. I tried executing it outside the environment, but it gave me the same errors. Could you please help?

I have posted on the website under the comments section by mistake. Please excuse my ignorance.

A1:
It seems that you did not set the environment file (env.sh) correctly. You may need to set it up as follows, adjusting the tool paths to the locations on your system, and put it in the same directory as fetchEnsemblFiles.py and processEnsemblFiles.sh:

###
#!/bin/sh
if [ ! -z "$PSEUDOPIPE_ENV" ]; then source $PSEUDOPIPE_ENV; return; fi

# Pseudopipe configuration

export PSEUDOPIPE_HOME=`cd \`dirname $0\`/../; pwd`

export pseudopipe=$PSEUDOPIPE_HOME/core/runScripts.py

export genPgeneResult=$PSEUDOPIPE_HOME/ext/genPgeneResult.sh

export genFullAln=$PSEUDOPIPE_HOME/ext/genFullAln.sh

export fastaSplitter=$PSEUDOPIPE_HOME/ext/splitFasta.py

export sqDedicated=$PSEUDOPIPE_HOME/ext/sqDedicated.py

export sqDummy=$PSEUDOPIPE_HOME/ext/sqDummy.py

export blastHandler=$PSEUDOPIPE_HOME/core/processBlastOutput.py

export extractExLoc=$PSEUDOPIPE_HOME/core/extractKPExonLocations-Aug2016.py # extractKPExonLocations-Jan2016.py

# Python configuration

export pythonExec=/bin/python2

# Alignment tools configuration

export formatDB=/ysm-gpfs/pi/gerstein/yy532/software/blast-2.2.13/bin/formatdb

export blastExec=/ysm-gpfs/pi/gerstein/yy532/software/blast-2.2.13/bin/blastall

export fastaExec=/ysm-gpfs/pi/gerstein/yy532/software/fasta-35.1.5/tfasty35
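
Before rerunning the pipeline, it can help to confirm that every tool path set in env.sh points to an executable that actually exists on your machine, since the errors in Q1 all come from paths that were not found. The following is a minimal sketch and not part of PseudoPipe: the function name check_tools is hypothetical, and it assumes the variables pythonExec, formatDB, blastExec and fastaExec from the env.sh above are already set in the current shell.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PseudoPipe): report which of the
# tool variables exported by env.sh do not point to an executable file.
check_tools() {
    missing=0
    for name in pythonExec formatDB blastExec fastaExec; do
        # Look up the value of the variable whose name is stored in $name.
        eval "path=\${$name:-}"
        if [ -z "$path" ]; then
            echo "$name is not set"
            missing=1
        elif [ ! -f "$path" ] || [ ! -x "$path" ]; then
            echo "$name points to a missing or non-executable file: $path"
            missing=1
        else
            echo "$name OK: $path"
        fi
    done
    # Exit status 0 means all four tools were found, 1 means at least one was not.
    return $missing
}
```

Calling check_tools after setting the variables prints one line per tool, so a wrong path shows up before the pipeline starts rather than as a "No such file or directory" message in the middle of a run.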

Q2:
Thank you for your reply. I am using python3 in my pipeline. Will this code work for python3?

A2:
Please use python2.
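
To confirm which major Python version an interpreter reports before pointing pythonExec at it, something like the following may help. This is a minimal sketch, not part of PseudoPipe: the function name python_major is hypothetical, and /bin/python2 is simply the path from the env.sh above, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the major version (2 or 3) reported by the
# Python interpreter given as the first argument.
python_major() {
    "$1" -c 'import sys; print(sys.version_info[0])'
}

# Example: warn if the interpreter configured for PseudoPipe is not Python 2.
# pythonExec is the variable exported in env.sh; /bin/python2 is only a fallback.
interp=${pythonExec:-/bin/python2}
if [ -x "$interp" ] && [ "$(python_major "$interp")" != "2" ]; then
    echo "Warning: $interp is not Python 2"
fi
```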

Small question about the paper “Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences”

Q:
Recently, I read a paper published in Cell, titled "Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences". Because my research topic is similar to that of this paper, I have one question about Figure 2B. In this heatmap, I see a total of 80 motifs along the bottom but only 70 rows above them, so I am a little confused: how did you know that the ETS motif matched the marked row?

A:
The rows in the figure correspond to different cancer cohorts or meta-cohorts. Information on the cancer cohort with significant differential burdening is also provided in Supplement 1 of the paper.

PCAWG passenger mutation analysis

Q1:
I was trying to download a subset of data from your recent paper (https://www.cell.com/cell/fulltext/S0092-8674(20)30113-6). However, the website is returning a ‘not found’ error (http://pcawg.gersteinlab.org/). I am especially interested in the ‘Gene list categories’. Could you therefore share the relevant files listed under ‘Gene List Categories’ on the website, so that I can use them in my analysis?

A1:
The website works fine for me. Are you sure it doesn’t work? Please let me know which specific file you are trying to download.

Q2:
Thanks a lot for the reply.

I need the gene list categories listed under PCAWG-specific annotations (http://pcawg.gersteinlab.org/#Annotations)

Essential Genes
Immune Response Genes
DNA Repair Genes
Metabolic Genes
Cancer Pathway Genes
Non-Essential Genes
Cell Cycle Genes

For some reason, when I click on a link, it directly downloads an HTML file containing an error. It would be great if you could share these files.

A2:
You can download relevant files from the link listed below.

http://pcawg.gersteinlab.org/Datasets/Annotations/categories/

Question about the cQTL analysis in Wang et al 2018

Q:
I am writing with a question about the cQTL analysis in Wang et al 2018. Were the 292 individuals analyzed in this analysis all of European ancestry? If not, what were the sample sizes for European vs non-European ancestry, and how did you control for ancestry in your analysis?

I apologize for writing with such a detailed question, but I could not find the answer in the main text or supplement of the paper, or on the Synapse website. (Context: I am interested in cross-population genetic analyses of psychiatric disease and am wondering whether the PsychENCODE cQTL data are relevant.)

A:
In calculating the cQTLs, we used 173 Caucasians and 119 non-Caucasians. To control for ancestry, we used the top three genotype principal components as covariates.

DTE results as described in the paper “Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder”

Q:
I was trying to reproduce the DTE results as described in the paper "Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder". I am a registered user of Synapse but was unable to find the data mentioned below and would really appreciate your help in obtaining them.
The supplementary methods of this paper mention the different covariates used for carrying out DGE and DTE using the nlme package. Would it be possible to obtain the seqPC and SV values, particularly the seqPCs (1-3, 5-8, 10-14, 16, 18-25, 27-29) and SVs (1-4) used in the lme model?
Additionally, could I obtain the final list of sample IDs that made it to the DGE/DTE analysis?

A:
See the seqPCs we used in our analysis (attached)

Query regarding Pseudopipe

Q:
Since I am working on pseudogene identification for my new project, I have been using your pipeline. However, I am encountering a few errors, which I list below. Could you please help me resolve them? I would be very grateful.
>
> ERRORS:
> 1. On terminal:
> sudo bash pseudopipe.sh ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
> Making directories
> Copying sequences
> Fomatting the DNAs
> Preparing the blast jobs
> Skipping blast
> Processing blast output
> Skipping the processing of blast output
> Running Pseudopipe on both strands
> Working on M strand
> Finished Pseudopipe on strand M
> Working on P strand
> Finished Pseudopipe on strand P
> Generating final results
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/minus/pgenes’: No such file or directory
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/plus/pgenes’: No such file or directory
> gzip: /home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/*/pgenes/*.all.fa: No such file or directory
> Finished generating pgene full alignment
> Finished running Pseudopipe
> 2. In log file inside minus and plus folder:
> need to document overlap parameter (30) and dependency on mask array files.
> mask fields [2, 3]
> Traceback (most recent call last):
> File "/home/kashmir/SOFTWARE/pgenes/pseudopipe/core/filterEnsemblGene.py", line 60, in <module>
> maskFile = openOrFail(ExonMaskTemplate % chr, 'r')
> TypeError: not all arguments converted during string formatting
> running filterEnsemblGene.py
> failed during filterEnsemblGene.py stage.

A:
From the output it looks like you had a couple of issues starting with the blast job.

Could you please check the blast/output folder in your output directory for any split000*.Out files (where * is a number)? If there are no such output files, your BLAST job did not run. In order to run the pipeline you need to have a couple of additional software packages installed and preferably added to the path. Specifically, you will need blast-2.2.13 and fasta-35.1.5. If you do not want to add them to the path, you can add the path to their location in the env.sh file, which you can find in the bin folder of PseudoPipe.

This should allow you to run the pipeline without any issues.
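
The check for BLAST output described above can be scripted. This is a minimal sketch, assuming the layout mentioned in the answer (a blast/output folder under the PseudoPipe output directory); the function name count_blast_outputs is hypothetical and not part of PseudoPipe.

```shell
#!/bin/sh
# Hypothetical helper: count the split*.Out files under <output dir>/blast/output.
# The pattern split*.Out is slightly broader than the split000*.Out names
# mentioned above, so that higher-numbered splits are counted as well.
count_blast_outputs() {
    outdir=$1
    # Errors (for example a missing folder) are silenced, so the count is then 0.
    find "$outdir/blast/output" -type f -name 'split*.Out' 2>/dev/null | wc -l | tr -d ' '
}
```

For example, count_blast_outputs ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a prints the number of BLAST output files for the run in the question; a result of 0 suggests that the BLAST step did not run.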

Question regarding list of human pseudogenes

Q:
I am … developing an application that matches cancer patients to treatment based on the person’s genetic profile. We are looking for an updated list of human pseudogenes to use in evaluating submitted DNA variants. Can you tell me if the Pseudo Fam data files at the pseudogene.org website are still being updated? If not, perhaps you could recommend an alternate source?

A:
It is best to get an updated list of pseudogenes from pseudogene.org, which is continually updated, i.e. http://pseudogene.org/Human/ . Yucheng

Referring to your paper: Structuring supplemental materials in support of reproducibility

Q:
I just read your paper mentioned above. I work in the area of
computational reproducibility so the paper was pretty interesting to
read. However, I stumbled a bit over one of your concluding remarks. You
are saying

"One useful tactic may be detailed sampling: perhaps it is best for the
editor to organize a system wherein, randomly, referees are asked to
review samples in greater detail to ensure the overall quality of the
supplements without quickly overwhelming the peer review system."

I am not sure whether I understood correctly how this could be
implemented. Does it mean that the editor randomly asks one of the
reviewers to look at the supplements, or do all reviewers look at
subsets of supplements? I find this idea pretty interesting and was
wondering whether you have published further articles on this topic?

A:
With respect to: "Does it mean that the editor randomly asks one of the reviewers to look at the supplements, or do all reviewers look at subsets of supplements?"
-> The former.

With respect to: "I find this idea pretty interesting and was wondering whether you have published further articles on this topic?"
-> Not exactly, but you might find the following related work useful:
http://papers.gersteinlab.org/papers/structbl
http://papers.gersteinlab.org/papers/SDA