Query regarding PseudoPipe

Q:
Since I am working on pseudogene identification for my new project, I was using your pipeline, but I am getting a few errors, which I list below. Could you please help me resolve them? I would be very grateful.
>
> ERRORS:
> 1. On terminal:
> sudo bash pseudopipe.sh ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
> Making directories
> Copying sequences
> Fomatting the DNAs
> Preparing the blast jobs
> Skipping blast
> Processing blast output
> Skipping the processing of blast output
> Running Pseudopipe on both strands
> Working on M strand
> Finished Pseudopipe on strand M
> Working on P strand
> Finished Pseudopipe on strand P
> Generating final results
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/minus/pgenes’: No such file or directory
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/plus/pgenes’: No such file or directory
> gzip: /home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/*/pgenes/*.all.fa: No such file or directory
> Finished generating pgene full alignment
> Finished running Pseudopipe
> 2. In the log files inside the minus and plus folders:
> need to document overlap parameter (30) and dependency on mask array files.
> mask fields [2, 3]
> Traceback (most recent call last):
> File "/home/kashmir/SOFTWARE/pgenes/pseudopipe/core/filterEnsemblGene.py", line 60, in <module>
> maskFile = openOrFail(ExonMaskTemplate % chr, ‘r’)
> TypeError: not all arguments converted during string formatting
> running filterEnsemblGene.py
> failed during filterEnsemblGene.py stage.

A:
From the output, it looks like you had a couple of issues, starting with the BLAST job.

Could you please check the blast/output folder in your output directory and see whether it contains any split000*.Out files (where * is a number)? If you don’t see any output files, your BLAST job did not run. To run the pipeline, you need a couple of additional software packages installed and, preferably, added to your PATH. Specifically, you will need blast-2.2.13 and fasta-35.1.5. If you do not want to add them to the PATH, you can add their locations to the env.sh file in the bin folder of PseudoPipe.
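Both checks can be scripted. The following is a minimal Python sketch, assuming the blast/output layout and split000*.Out naming described above; the function names are hypothetical, and the FASTA binary name `fasta35` is an assumption (use whichever FASTA programs your PseudoPipe setup actually calls). `blastall` and `formatdb` are the legacy BLAST programs the pipeline relies on.

```python
import shutil
import sys
from pathlib import Path


def blast_outputs(output_dir):
    """Return the split000*.Out files under <output_dir>/blast/output, sorted by name.

    Returns an empty list if the folder is missing or holds no such files.
    """
    blast_dir = Path(output_dir) / "blast" / "output"
    return sorted(blast_dir.glob("split000*.Out"))


def missing_programs(names):
    """Return the program names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_ppipe.py <PseudoPipe output directory>
    found = blast_outputs(sys.argv[1])
    if found:
        print(f"{len(found)} BLAST output file(s) found; the BLAST step ran.")
    else:
        print("No split000*.Out files found; the BLAST step probably did not run.")
    # "fasta35" is an assumed binary name; adjust it to your FASTA installation.
    missing = missing_programs(["blastall", "formatdb", "fasta35"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
```

If a program is reported missing but is installed elsewhere, that is the case where its location would go into env.sh instead.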

This should allow you to run the pipeline without any issues.

Question regarding list of human pseudogenes

Q:
I am … developing an application that matches cancer patients to treatment based on the person’s genetic profile. We are looking for an updated list of human pseudogenes to use in evaluating submitted DNA variants. Can you tell me whether the Pseudofam data files on the pseudogene.org website are still being updated? If not, perhaps you could recommend an alternative source?

A:
It is best to get an updated list of pseudogenes from pseudogene.org, which is continually updated, i.e. http://pseudogene.org/Human/.
Yucheng

Pseudogenes in PseudoPipe

Q:
The pseudogene databases, including Pseudofam and PseudoPipe, have been extremely helpful for a project I am working on, and I was wondering whether it is possible to compare the DNA sequence of a human gene with all the pseudogenes in the PseudoPipe resources. I am looking to identify pseudogenes that may be related to the genes I am working with. I was hoping to derive this information by BLAST-comparing the DNA sequence of a specific gene against the sequences of all the pseudogenes in the genome, similar to the NCBI BLAST or UniProt BLAST feature.

Any help or insight would be appreciated.

A:
If you have many genes to query, maybe you can use BLAST+ (https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download) to build your own tool. You can then download the sequences of all pseudogenes and make a BLAST database (https://www.ncbi.nlm.nih.gov/books/NBK279688/) to query against.
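A minimal Python sketch of that workflow, assuming BLAST+ is installed and on the PATH. It wraps the standard BLAST+ commands `makeblastdb` (build a nucleotide database from the downloaded pseudogene FASTA) and `blastn` (search your gene sequences against it), and parses the tabular `-outfmt 6` output. The helper names and file names are hypothetical, and the E-value cutoff is an arbitrary illustrative default.

```python
import subprocess

# The 12 standard columns written by BLAST+ with -outfmt 6.
BLAST_TAB_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                    "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def makeblastdb_cmd(pseudogene_fasta, db_name):
    """Command to build a nucleotide BLAST database from a FASTA of pseudogene sequences."""
    return ["makeblastdb", "-in", pseudogene_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta, db_name, out_file, evalue=1e-5):
    """Command to search query gene sequences against the pseudogene database."""
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-out", out_file, "-outfmt", "6", "-evalue", str(evalue)]


def parse_tabular(text):
    """Parse -outfmt 6 output into a list of dicts keyed by the standard column names."""
    hits = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            hits.append(dict(zip(BLAST_TAB_FIELDS, line.split("\t"))))
    return hits


def run(cmd):
    """Run a command, raising CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

For example, with hypothetical file names: `run(makeblastdb_cmd("pseudogenes.fa", "pgene_db"))`, then `run(blastn_cmd("my_genes.fa", "pgene_db", "hits.tsv"))`, then `parse_tabular(open("hits.tsv").read())`. Nucleotide `blastn` is used here because the query is a gene’s DNA sequence and the database holds pseudogene DNA; the database only needs to be built once and can then be queried for any number of genes.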

Asking for data used in finding processed pseudogenes in the human genome

Q:
Recently, I was reading your 2003 paper on processed pseudogenes, "Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome". I want to find processed pseudogenes in several recently released mammalian genomes, and your paper is very interesting and helpful for my work. To make sure I have understood the method correctly, I want to use your original data to redo your analysis.

However, I ran into a problem when downloading the nonredundant human proteome set from the EBI website: the set you used was released in June 2002, and I cannot download it from the EBI website. I am writing in the hope of obtaining the June 2002 nonredundant human proteome set you used. Although I know many years have passed since the paper was published and you may have lost the original data, I still want to give it a try!

A:
The data associated with the paper is here: http://pseudogene.org/human-all/index.html. You can also find the latest human pseudogene annotation here: http://pseudogene.org/Human/

Regarding obtaining pseudogene data

Q:
Can you please help me get pseudogene information for human, mouse, rat, Drosophila and C. elegans? I need separate FASTA or BED files of the pseudogene annotations for each of these five species.

A:
See pseudogene.org. For information regarding the pseudogene annotation in human, mouse, Drosophila and C. elegans, please see:
http://www.pseudogene.org/psicube/
And
http://www.pseudogene.org/Mouse/

PseudoPipe

Q:
We are interested in using PseudoPipe for
identifying pseudogenes. I downloaded the software, but the program requires
an older version of the BLAST software, including blastall and formatdb. Both
programs have been replaced by newer versions of BLAST and are no longer
available to download from the NCBI website. I am wondering if you could update
PseudoPipe to accommodate the new version of BLAST.

A:
Thank you for your suggestion. In the meantime, you can find the correct versions of FASTA and BLAST freely available online. To make this easier, we provide links to the two packages on the website http://pseudogene.org/pseudopipe/.