I am trying to find SNPs in
pseudogenes, but the SNP database is built on a different genome
assembly than the pseudogene predictions from PseudoPipe. Do you have
the current pipeline's pseudogene predictions for eukaryotic genomes? Or is there
a way to remap the genome assemblies used by the pipeline to a different assembly?
If I want to use PseudoPipe, where in Ensembl can I find the input data set?
Regarding your questions, there are a number of things you can do:
* if you are interested in the human or mouse genome, pseudogene annotations for the latest assemblies (GRCh38 and GRCm38) are available from the pseudogene.org webpage; see http://www.pseudogene.org/Human/Human90.txt for human and http://mouse.pseudogene.org/data/Reference/Mus_musculus.GRCm38.87_pgene.txt for mouse.
* the latest annotations for the worm and fly genomes are available from here :
* if you are interested in other eukaryotic genomes whose annotation was built on older assemblies, one option is to lift over the annotation from the old assembly to a newer one. This can easily be done using the UCSC Genome Browser LiftOver tool: https://genome.ucsc.edu/cgi-bin/hgLiftOver. However, I would strongly advise running PseudoPipe on your own machine instead, because improvements in the assembly and in the protein-coding annotation considerably improve the resulting pseudogene annotation. You can download and run PseudoPipe as described here: http://pseudogene.org/pseudopipe/
* also, using the “fetch file” as described here http://pseudogene.org/pseudopipe/ will automatically download all the necessary input data for you from the Ensembl server.
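
Once the pseudogene annotation and the SNPs are on the same assembly, the overlap itself is simple to compute. The following is only a minimal sketch in Python, not part of PseudoPipe: the function names, the record layout (chromosome, start, end, name) and the example IDs are all hypothetical, and it assumes 1-based inclusive coordinates, so please check the header of the annotation file you actually download. It also shows how an interval would be written as a BED line (0-based start, exclusive end), which is one of the formats the UCSC LiftOver tool accepts.

```python
from bisect import bisect_right
from collections import defaultdict


def to_bed_line(chrom, start, end, name, strand):
    """Turn a 1-based inclusive interval (assumed input convention)
    into a 0-based, half-open BED6 line suitable for liftOver."""
    return f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}"


def build_index(pseudogenes):
    """Group (chrom, start, end, name) records by chromosome.

    Returns {chrom: (sorted_starts, sorted_intervals)}.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in pseudogenes:
        by_chrom[chrom].append((start, end, name))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _, _ in intervals], intervals)
    return index


def snps_in_pseudogenes(index, snps):
    """Return (snp_id, pseudogene_name) pairs for every SNP position
    that falls inside a pseudogene interval on the same chromosome."""
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        # Only intervals starting at or before the SNP can contain it.
        upper = bisect_right(starts, pos)
        for start, end, name in intervals[:upper]:
            if end >= pos:
                hits.append((snp_id, name))
    return hits


# Hypothetical example data; real IDs and coordinates will differ.
pseudogenes = [
    ("chr1", 100, 200, "pg1"),
    ("chr1", 150, 300, "pg2"),
    ("chr2", 50, 80, "pg3"),
]
snps = [
    ("rs1", "chr1", 160),
    ("rs2", "chr1", 250),
    ("rs3", "chr2", 90),
    ("rs4", "chr3", 10),
]

hits = snps_in_pseudogenes(build_index(pseudogenes), snps)
bed = to_bed_line("chr1", 100, 200, "pg1", "+")
```

The scan over candidate intervals is linear per SNP, which is fine for a quick check. For genome-wide SNP sets, a dedicated interval tool would likely be a better choice.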