Terms in Pseudopipe output, etc

Posted on June 3, 2019 by gersteinfaq

Q:
I am looking for details of PseudoPipe’s output terms such as "frac", "ins", "del", "shift", "stop", "polya". Also, how does PseudoPipe confirm the pseudogenes in its results?

A:
frac = fraction of parent gene that matches the pseudogene
ins = number of insertions in the pseudogene compared to parent sequence
del = number of deletions in the pseudogene compared to parent sequence
shift = number of frame shifts in the pseudogene compared to parent sequence
stop = number of stop codons in the pseudogene compared to parent sequence
polya = flag indicating the presence or absence of a polyA tail
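
As a rough illustration of how these fields might be used, the Python 3 sketch below keeps records with a high "frac" and adds up frame shifts and stop codons. It assumes a tab-delimited table with a header row whose column names match the terms above; the actual PseudoPipe output layout may differ, and the function name, the sample rows and the 0.7 threshold are all hypothetical.

```python
import csv
import io

def filter_pseudogenes(handle, min_frac=0.7):
    """Yield rows whose 'frac' is at least min_frac, adding a disablement count.

    Assumes a tab-delimited table with a header row containing the columns
    frac, ins, del, shift, stop and polya (a hypothetical layout).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["frac"]) < min_frac:
            continue
        # frame shifts and stop codons both disrupt the reading frame
        row["disablements"] = int(row["shift"]) + int(row["stop"])
        yield row

# made-up example records, not real PseudoPipe output
example = io.StringIO(
    "id\tfrac\tins\tdel\tshift\tstop\tpolya\n"
    "pg1\t0.95\t1\t0\t1\t2\t1\n"
    "pg2\t0.40\t0\t3\t0\t0\t0\n"
)
kept = list(filter_pseudogenes(example))
for row in kept:
    print(row["id"], row["frac"], row["disablements"])
```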

Also see below the code for the script fetchEnsemblFiles.py, which downloads the input data for eukaryotes from the Ensembl website:

#!/usr/bin/env python

# some examples of files and locations
# pub
# lrwxrwxrwx 1 ftpuser ftpusers 30 Dec 7 16:33 current_homo_sapiens -> release-36/homo_sapiens_36_35i
#
#/pub/release-36/homo_sapiens_36_35i/data/fasta/dna
#-rw-rw-r-- 1 ftpuser ftpusers 67675771 Nov 15 14:48 Homo_sapiens.NCBI35.dec.dna.chromosome.1.fa.gz
#-rw-rw-r-- 1 ftpuser ftpusers 40802343 Nov 15 14:55 Homo_sapiens.NCBI35.dec.dna_rm.chromosome.1.fa.gz
#
#/pub/release-36/homo_sapiens_36_35i/data/fasta/pep
#-rw-rw-r-- 1 ftpuser ftpusers 3817861 Nov 15 19:46 Homo_sapiens.NCBI35.dec.pep.known.fa.gz
#
#/pub/release-36/homo_sapiens_36_35i/data/mysql/homo_sapiens_core_36_35i
#-rw-rw-r-- 1 ftpuser ftpusers 2957452 Dec 2 22:45 exon.txt.table.gz
#-rw-rw-r-- 1 ftpuser ftpusers 1747738 Dec 2 22:45 exon_stable_id.txt.table.gz
#-rw-rw-r-- 1 ftpuser ftpusers 1489045 Dec 2 22:45 exon_transcript.txt.table.gz
#-rw-rw-r-- 1 ftpuser ftpusers 4626 Dec 2 21:57 homo_sapiens_core_36_35i.mysql40_compatible.sql.gz
#-rw-rw-r-- 1 ftpuser ftpusers 4753 Dec 2 21:57 homo_sapiens_core_36_35i.sql.gz

import os, os.path, re, sys
from ftplib import FTP

class collect:
    # accumulates the lines of an FTP directory listing
    def __init__(self): self.data = []
    def more(self, l): self.data.append(l)

def maybeRetrFile(fromPath, toPath):
    # download fromPath unless toPath (or its uncompressed form) already exists
    what = 'from %s --> to %s' % (fromPath, toPath)
    if os.path.exists(toPath):
        print('skipping ' + what)
        return
    else:
        if toPath.endswith('.gz') and os.path.exists(toPath[:-3]):
            print('skipping (uncompressed) ' + what)
            return

    print(what)
    # binary mode, because retrbinary delivers raw bytes
    toFile = open(toPath, 'wb')
    ec.retrbinary('RETR ' + fromPath, toFile.write, blocksize=100000)
    toFile.close()

# first argument: species name, e.g. "Homo sapiens" -> homo_sapiens
target = sys.argv[1].strip().lower().replace(' ', '_')

release = 'current_'

# optional second argument: release number
if len(sys.argv) > 2:
    release = 'release-' + sys.argv[2] + '/'

# set up initial connection
host = 'ftp.ensemblgenomes.org'
print('Logging into ' + host)
ec = FTP(host)
ec.login()

# look for target in a listing of pub
files = collect()
where = 'pub/' + release + 'mysql'
print('Listing ' + where)
ec.dir(where, files.more)
tEntries = [l for l in files.data if target + '_core_' in l and '->' not in l]
if len(tEntries) != 1:
    print(target + ' is either missing or not unique:')
    print(tEntries)
    print('\n'.join(files.data))
    ec.close()
    sys.exit(-1)

# "parse" current link name
curPat = re.compile(re.escape(target) + r'_core_(.+)_(.+)\Z')
tPath = tEntries[0].split()[-1]
mo = curPat.match(tPath)
if not mo:
    print("don't understand release naming scheme: " + tPath)
    ec.close()
    sys.exit(-1)
[maj, min] = mo.groups()
majMin = maj + '_' + min
outDir = target + '_' + majMin

print('Release: ' + release[0:len(release)-1] + ', ' + 'tPath: ' + tPath + ', ' + 'target: ' + target + ', ' + 'maj: ' + maj + ', ' + 'majMin: ' + majMin + ', ' + 'outDir: ' + outDir)

## if os.path.exists(outDir):
##     print('up to date: ' + tPath)
##     ec.close()
##     sys.exit(0)

# need to get files. first, set up directories.
[dDir, mDir, pDir] = [outDir + d for d in ['/dna/', '/mysql/', '/pep/']]
if not os.path.exists(dDir): os.makedirs(dDir, 0o744)
if not os.path.exists(mDir): os.makedirs(mDir, 0o744)
if not os.path.exists(pDir): os.makedirs(pDir, 0o744)

# retrieve dna
dnaPat = re.compile(r'\.dna(_rm)?\.chromosome\..+\.fa\.gz\Z')
dFiles = collect()
where = 'pub/' + release + 'fasta/%s/dna' % target
print('Changing dir to ' + where)
ec.dir(where, dFiles.more)
dKeep = [l for l in dFiles.data if dnaPat.search(l)]
for f in dKeep:
    fn = f.split()[-1]
    maybeRetrFile(where + '/' + fn, dDir + fn)

# retrieve pep
where = 'pub/' + release + 'fasta/%s/pep' % target
pFiles = collect()
print('Changing dir to ' + where)
ec.dir(where, pFiles.more)
for f in pFiles.data:
    fn = f.split()[-1]
    maybeRetrFile(where + '/' + fn, pDir + fn)

# retrieve mysql
# older releases?: mFiles = ['exon.txt.table', 'exon_transcript.txt.table', 'gene_stable_id.txt.table', 'seq_region.txt.table', 'transcript.txt.table', 'translation.txt.table', 'translation_stable_id.txt.table', target+'_core_'+majMin+'.sql', target+'_core_'+majMin+'.mysql40_compatible.sql']
# older releases which have *_stable_id.txt: mFiles = ['exon.txt', 'exon_transcript.txt', 'gene_stable_id.txt', 'seq_region.txt', 'transcript.txt', 'translation.txt', 'translation_stable_id.txt', target+'_core_'+majMin+'.sql']
mFiles = ['exon.txt', 'exon_transcript.txt', 'seq_region.txt', 'transcript.txt', 'translation.txt', target + '_core_' + majMin + '.sql']

where = 'pub/' + release + 'mysql/%s_core_%s' % (target, majMin)
print('Changing dir to ' + where)
for mf in mFiles:
    maybeRetrFile(where + '/' + mf + '.gz', mDir + mf + '.gz')

# retrieve GTF
where = 'pub/' + release + 'gtf/%s' % (target)
print('Changing dir to ' + where)
gtfPat = re.compile(r'\.gtf\.gz\Z')
gFiles = collect()
ec.dir(where, gFiles.more)
gKeep = [l for l in gFiles.data if gtfPat.search(l)]
for f in gKeep:
    fn = f.split()[-1]
    maybeRetrFile(where + '/' + fn, mDir + fn)

ec.close()

print('Processing Fetched Files')
#os.system('%s/processEnsemblFiles.sh %s' % (sys.path[0], outDir))
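
The script takes the species name as its first argument and, optionally, a release number as its second. The least obvious step is how it derives the two version parts from the name of the core database directory. The sketch below isolates that step; the helper name parse_core_name is hypothetical, and the first test string is the directory name shown in the script's own comments.

```python
import re

def parse_core_name(target, name):
    """Split '<target>_core_<maj>_<min>' into (maj, min), or return None."""
    pattern = re.compile(re.escape(target) + r"_core_(.+)_(.+)\Z")
    match = pattern.match(name)
    return match.groups() if match else None

# directory name taken from the comments at the top of the script
print(parse_core_name("homo_sapiens", "homo_sapiens_core_36_35i"))  # ('36', '35i')
# a name belonging to a different species does not match
print(parse_core_name("homo_sapiens", "mus_musculus_core_36_35"))   # None
```

With these two parts the script would name its output directory homo_sapiens_36_35i and place the dna, mysql and pep folders inside it.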

Posted in pseudogenes | Tagged csds

Regarding PseudoPipe MySQL file

Posted on June 3, 2019 by gersteinfaq

Q:
I am using PseudoPipe to find pseudogenes in a query chromosome. I have a chromosome nucleotide sequence file and a protein sequence file.

I do not understand what the MySQL file is or how I can get it, and I have the same question about the masking file.

A:
PseudoPipe is configured to run on nucleotide and protein sequence files as formatted and available for download from the Ensembl server.
Regarding your issues:

1. A MySQL file is a file downloaded from a MySQL database, and thus has its own specific format. Ensembl uses this database to store exon coordinates for all the protein-coding genes, starting with an exon ID, chromosome number, start and end position, strand, etc. As such, I suggest you format your exon information accordingly. As an example, you can use the “chrI_exLocs” file located in the mysql folder of the C. elegans example that you downloaded along with PseudoPipe.

2. A masking file is a nucleotide file (in FASTA format) in which all the repeat sequences of the genome are masked. If you want to create it yourself, you should use a repeat masker and format the result like the dna_rm.fa file in the dna folder of the C. elegans example.
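
For the masking file, one possible route, assuming you already have a soft-masked FASTA file (repeats written in lowercase, which RepeatMasker can produce), is to convert it to a hard-masked one in which repeats are replaced by N, the convention used by Ensembl's dna_rm files. The sketch below shows only that conversion; the function name and the sample sequence are illustrative, and you should still compare the result against the dna_rm.fa example.

```python
def hard_mask(lines):
    """Replace lowercase (soft-masked) bases with N; leave header lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)

# made-up soft-masked input
soft_masked = [">chrI example", "ACGTacgtacGTTA", "ttttACGT"]
masked = list(hard_mask(soft_masked))
print("\n".join(masked))
```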

Posted in pseudogenes | Tagged csds

Question about a potential error with Pseudogene.org

Posted on June 3, 2019 by gersteinfaq

Q1:
I want to say great job with the Pseudogene.org site! I recently noticed a potential error and wanted to send an email to inform you, in case you haven’t already picked it up yourselves.

In the file located at the following address:

http://www.pseudogene.org/psicube/data/gencode.v10.pgene.parents.txt

The start and end chromosomal locations for the pseudogenes are the same. See below:

ENST00000344844.3 unprocessed_pseudogene chr19 + 9314984 9314984 ENSG00000237521.1 ENST00000456448.1 OR7E24 "Transcribed: 0" "Active Chromatin: GM12878=0;K562=0;Helas3=0;Hepg2=0;H1hesc=1" "Open Chromatin: GM12878=0;K562=0;Helas3=.;Hepg2=.;H1hesc=." "TFBS: GM12878=0;K562=0;Helas3=0;Hepg2=0;H1hesc=0" "Pol2: GM12878=0;K562=0;Helas3=0;Hepg2=0;H1hesc=0" "Constraint: 0"

ENST00000359901.3 unprocessed_pseudogene chr2 - 98123508 98123508 . . . "Transcribed: 0" "Active Chromatin: GM12878=1;K562=0;Helas3=0;Hepg2=0;H1hesc=1" "Open Chromatin: GM12878=0;K562=0;Helas3=.;Hepg2=.;H1hesc=." "TFBS: GM12878=1;K562=1;Helas3=1;Hepg2=1;H1hesc=0" "Pol2: GM12878=1;K562=1;Helas3=1;Hepg2=1;H1hesc=0" "Constraint: 0"

ENST00000459808.1 processed_pseudogene chr3 - 136527393 136527393 ENSG00000198075.5 ENST00000272452.2 SULT1C4 "Transcribed: 0" "Active Chromatin: GM12878=1;K562=0;Helas3=1;Hepg2=1;H1hesc=1" "Open Chromatin: GM12878=0;K562=0;Helas3=.;Hepg2=.;H1hesc=." "TFBS: GM12878=0;K562=0;Helas3=0;Hepg2=0;H1hesc=0" "Pol2: GM12878=0;K562=0;Helas3=0;Hepg2=0;H1hesc=0" "Constraint: 1"

A1:
Thanks for pointing out the problem. However, I’m a little confused about which file you are referring to. The parents file at the URL in your message (http://www.pseudogene.org/psicube/data/gencode.v10.pgene.parents.txt) does not match the contents you provided. The contents look more like they come from the file http://pseudogene.org/psidr/psiDR.v0.txt. But neither file has the chromosome coordinate issue you mentioned. Maybe you meant some other file?

Q2:
It appears you are correct; I provided the link for the GENCODEv10 pseudogene resource instead of the v7 resource by mistake. I was, however, able to go back and find the file where I had found the mistake.

I had downloaded the Pseudogene Resource psiDR from the GENCODE website ( ftp://ftp.sanger.ac.uk/pub/gencode/psidr/psiDR.v0.txt.gz ) and assumed that this file is the same as the link you provide ( http://pseudogene.org/psidr/psiDR.v0.txt ). Although it appears they are not… The link on the GENCODE website ( ftp://ftp.sanger.ac.uk/pub/gencode/psidr/psiDR.v0.txt.gz ) displays the problem that I previously described, whereas the link you provide does not.

The file with the problem I described is actually linked at this page: http://www.gencodegenes.org/psidr/
Under the link entitled:
New! Pseudogene Resource psiDR
which redirects to: ftp://ftp.sanger.ac.uk/pub/gencode/psidr/psiDR.v0.txt.gz

I am not sure whether you are part of the administration for the GENCODE site, but if you aren’t, you may want to contact them regarding the problem, since it appears to be data from your lab that is represented.

I am sorry for providing the wrong link earlier. Please let me know if you have any more trouble reproducing the problem.

A2:
I can see the problem too. I’ll contact GENCODE to have the file updated. Thanks for pointing this issue out to us!
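
A quick way to screen a file for the issue discussed in this thread is to flag every record whose start and end coordinates are identical. The Python 3 sketch below assumes whitespace-separated fields in the order seen in the excerpt in the question (transcript ID, biotype, chromosome, strand, start, end, followed by optional quoted annotations); the function name is hypothetical and the second sample record is made up for contrast.

```python
import shlex

def zero_length_records(lines):
    """Return IDs of records whose start and end coordinates are identical.

    Assumes the field order: transcript ID, biotype, chromosome, strand,
    start, end (quoted annotation fields may follow).
    """
    flagged = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = shlex.split(line)  # keeps quoted annotations as single fields
        if len(fields) < 6:
            continue
        if fields[4] == fields[5]:
            flagged.append(fields[0])
    return flagged

sample = [
    # record quoted in the question above
    'ENST00000344844.3 unprocessed_pseudogene chr19 + 9314984 9314984 '
    'ENSG00000237521.1 ENST00000456448.1 OR7E24 "Transcribed: 0"',
    # made-up record with distinct coordinates
    'ENST00000000001.1 processed_pseudogene chr1 + 100 250 . . . "Transcribed: 0"',
]
print(zero_length_records(sample))
```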

Posted in pseudogenes | Tagged bp

Having some problems while executing PseudoPipe

Posted on June 3, 2019 by gersteinfaq

Q:
I would like to use PseudoPipe, but I have been having some problems while executing it. To test the program I used the example input files (from Caenorhabditis elegans) that were provided. I modified env.sh to indicate the paths to python (python2.7), blast (blastall 2.2.25) and tfasty (tfasty36). But no pseudogenes were found when I executed the command shown below. I added a screenshot of the error as an attachment. Could you give me some guidance to solve this problem?

The command that I am using is:
./pseudopipe.sh ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.I.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chrI_exLocs 0

I downloaded PseudoPipe from:
http://www.pseudogene.org/pseudopipe/

A:
You are not getting any output because you need to use fasta-35.1.5 (tfasty35). Newer versions of FASTA (e.g. tfasty36) have a different output format that cannot be processed by the downstream programs in our pipeline. We are currently working on updating and improving the pipeline, but for the time being please use tfasty35.
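
Because the failure is silent, it may help to check the configured tfasty path before starting a long run. The sketch below is only a name-based guard under the assumption that the executable keeps its standard name; it does not run the program, and both example paths are hypothetical.

```python
import os

def is_supported_tfasty(path):
    """Return True if the executable name looks like tfasty35.

    Only the file name is inspected; the program itself is not run.
    """
    return os.path.basename(path).startswith("tfasty35")

# hypothetical install locations
for candidate in ["/usr/local/fasta-35.1.5/bin/tfasty35", "/usr/local/bin/tfasty36"]:
    status = "ok" if is_supported_tfasty(candidate) else "not supported by the current pipeline"
    print(candidate, "->", status)
```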

Posted in pseudogenes | Tagged csds

Question about fly pseudogenes Sisu et al., 2014 publication

Posted on June 3, 2019 by gersteinfaq

Q:
We here at FlyBase are reviewing the Sisu et al., 2014 publication (PMID: 25157146) to check on the state of our pseudogene annotations. The paper talks about 145 pseudogenes, but the bed file at the PsiCube site (http://pseudogene.org/psicube/) lists only 108. We’ve really poked around the paper and PsiCube to find the remaining 37, but to no avail.

Would you please point us in the right direction, or send us a full list of the 145 pseudogene calls (with coordinates and parental gene/protein calls)?

A:
Thank you for pointing this out. The error will be rectified. The full file is pasted below:

# DUP = duplicated pseudogenes
# PSSD = processed pseudogenes
# FRAG = pseudogenes with ambiguous biotype
#
#
#Chr Start End Strand PgeneID Biotype
chr2L 3162515 3163289 + FBtr0077575 DUP
chr2L 21404115 21404579 + FBtr0085889 DUP
chr2L 21404963 21405361 + FBtr0085890 DUP
chr2L 21405657 21405970 + FBtr0085891 DUP
chr2L 21542989 21543706 + FBtr0085895 DUP
chr2L 20923955 20924534 + FBtr0089857 DUP
chr2L 20928824 20929525 + FBtr0089858 DUP
chr2L 21418852 21419260 + FBtr0091806 DUP
chr2L 21428850 21429382 + FBtr0091809 DUP
chr2L 21438910 21439318 + FBtr0091815 DUP
chr2L 14781715 14783110 - FBtr0100604 DUP
chr2L 5621602 5623760 - FBtr0100853 DUP
chr2L 21589784 21590593 - FBtr0301172 DUP
chr2L 21577205 21577676 - FBtr0301174 DUP
chr2L 20639015 20639578 + FBtr0305612 DUP
chr2L 22131026 22131127 - FBtr0306298 DUP
chr2L 3694105 3694371 - FBtr0307120 DUP
chr2L 16728035 16729599 - FBtr0310391 DUP
chr2L 16700940 16703197 - FBtr0310392 DUP
chr2L 16699921 16700825 - FBtr0310393 DUP
chr2L 21282811 21284273 - FBtr0330681 DUP
chr2L 22066517 22067170 + FBtr0301969 FRAG
chr2L 15836725 15837444 - FBtr0304145 FRAG
chr2L 22340622 22341291 + FBtr0307110 FRAG
chr2L 19074992 19075412 - FBtr0081172 PSSD
chr2L 20862547 20863775 - FBtr0081448 PSSD
chr2L 22226151 22226771 + FBtr0085952 PSSD
chr2L 20901134 20901524 + FBtr0089856 PSSD
chr2L 6972243 6972796 + FBtr0305347 PSSD
chr2LHet 176023 179707 - FBtr0302459 DUP
chr2R 15649932 15650075 - FBtr0086355 DUP
chr2R 11092159 11093839 + FBtr0087364 DUP
chr2R 4456874 4457652 + FBtr0088689 DUP
chr2R 4320391 4320889 - FBtr0088762 DUP
chr2R 7754448 7755084 + FBtr0300860 DUP
chr2R 20405347 20406675 + FBtr0302916 DUP
chr2R 2887656 2889059 + FBtr0303442 DUP
chr2R 10247938 10249285 - FBtr0305617 DUP
chr2R 11261395 11262226 - FBtr0306709 DUP
chr2R 667816 671151 - FBtr0306722 DUP
chr2R 2926635 2927318 + FBtr0306743 DUP
chr2R 969389 969936 - FBtr0307111 DUP
chr2R 617944 618746 - FBtr0307119 DUP
chr2R 14289351 14289802 + FBtr0310489 DUP
chr2R 14289979 14290254 + FBtr0310490 DUP
chr2R 7586597 7587822 - FBtr0088081 FRAG
chr2R 619019 619416 + FBtr0111304 FRAG
chr2R 4044762 4045419 - FBtr0303310 FRAG
chr2R 271084 271321 + FBtr0304148 FRAG
chr2R 11262351 11263204 - FBtr0306724 FRAG
chr2RHet 2909278 2910923 - FBtr0301970 DUP
chr2RHet 334447 335080 + FBtr0302396 DUP
chr2RHet 338744 339377 + FBtr0302397 DUP
chr2RHet 1142083 1142548 + FBtr0302913 DUP
chr2RHet 2387787 2388177 - FBtr0302353 FRAG
chr2RHet 2128429 2129740 + FBtr0302915 FRAG
chr2RHet 2316691 2318206 - FBtr0302232 PSSD
chr3L 2171777 2172773 - FBtr0072914 DUP
chr3L 6141071 6141715 + FBtr0076993 DUP
chr3L 9506745 9507059 - FBtr0091689 DUP
chr3L 21952611 21953348 + FBtr0112457 DUP
chr3L 17878354 17878628 - FBtr0301175 DUP
chr3L 16593548 16595795 - FBtr0301925 DUP
chr3L 20971865 20972750 + FBtr0302444 DUP
chr3L 24539238 24540086 + FBtr0303009 DUP
chr3L 24542736 24543545 + FBtr0303010 DUP
chr3L 19417802 19418026 + FBtr0303863 DUP
chr3L 19471384 19471770 + FBtr0303926 DUP
chr3L 17861840 17863673 + FBtr0304978 DUP
chr3L 17867815 17869561 + FBtr0304979 DUP
chr3L 24527189 24528567 + FBtr0307117 DUP
chr3LHet 899 1989 + FBtr0114264 FRAG
chr3LHet 2277400 2277931 + FBtr0305903 FRAG
chr3LHet 687420 688819 + FBtr0302346 PSSD
chr3R 1094803 1095232 + FBtr0078783 DUP
chr3R 8221729 8224651 - FBtr0082602 DUP
chr3R 21094383 21095687 + FBtr0084817 DUP
chr3R 23670811 23671136 + FBtr0085225 DUP
chr3R 25684763 25685587 - FBtr0085524 DUP
chr3R 26037249 26037475 + FBtr0085613 DUP
chr3R 26038625 26038864 + FBtr0089614 DUP
chr3R 3352091 3353790 + FBtr0090038 DUP
chr3R 15211639 15212893 + FBtr0112481 DUP
chr3R 4086428 4087620 - FBtr0300631 DUP
chr3R 8719694 8720251 - FBtr0303313 DUP
chr3R 11731969 11732472 + FBtr0304144 DUP
chr3R 1674675 1675175 + FBtr0306740 DUP
chr3R 21093731 21094186 + FBtr0306845 DUP
chr3R 1853489 1854343 + FBtr0307114 DUP
chr3R 23786492 23787198 + FBtr0307118 DUP
chr3R 5887408 5888036 + FBtr0091606 FRAG
chr3R 69328 71233 + FBtr0113190 FRAG
chr3R 5079267 5080405 + FBtr0303309 PSSD
chr3R 17555224 17555617 - FBtr0304882 PSSD
chr3R 17007 21933 + FBtr0308945 PSSD
chr3RHet 412765 412948 - FBtr0302440 DUP
chr3RHet 859445 859735 - FBtr0302347 PSSD
chr4 48156 52259 - FBtr0089180 DUP
chr4 33566 45680 - FBtr0089181 DUP
chr4 26789 32391 - FBtr0089182 DUP
chrU 3448785 3449605 + FBtr0114269 DUP
chrU 8029209 8030316 - FBtr0308947 DUP
chrU 1397433 1397911 - FBtr0114236 FRAG
chrU 5607529 5607780 - FBtr0114258 FRAG
chrU 1206488 1206901 + FBtr0302912 FRAG
chrU 2072072 2074236 - FBtr0114183 PSSD
chrX 373897 375842 - FBtr0070095 DUP
chrX 371883 373342 - FBtr0070097 DUP
chrX 6255528 6256993 - FBtr0070923 DUP
chrX 6176311 6177608 - FBtr0070931 DUP
chrX 6174266 6174785 - FBtr0070932 DUP
chrX 9154730 9155365 + FBtr0071318 DUP
chrX 17792676 17793531 + FBtr0074499 DUP
chrX 20254611 20255089 - FBtr0112509 DUP
chrX 7791378 7792005 - FBtr0299927 DUP
chrX 7792469 7793210 - FBtr0300634 DUP
chrX 9153318 9154436 - FBtr0304143 DUP
chrX 15470388 15470997 + FBtr0304150 DUP
chrX 11481316 11483206 + FBtr0306144 DUP
chrX 11483409 11485299 + FBtr0306145 DUP
chrX 11485502 11487392 + FBtr0306146 DUP
chrX 11487595 11489485 + FBtr0306147 DUP
chrX 11489688 11491578 + FBtr0306148 DUP
chrX 11491781 11493671 + FBtr0306149 DUP
chrX 21026458 21027648 - FBtr0307579 DUP
chrX 19846847 19847947 - FBtr0307580 DUP
chrX 19842112 19843212 - FBtr0307581 DUP
chrX 19837392 19838482 - FBtr0307582 DUP
chrX 19832512 19833429 - FBtr0307583 DUP
chrX 19827821 19828630 - FBtr0307584 DUP
chrX 19822508 19823613 - FBtr0307585 DUP
chrX 19813674 19814773 - FBtr0307586 DUP
chrX 19808805 19809718 - FBtr0307587 DUP
chrX 19803641 19804554 - FBtr0307588 DUP
chrX 6844543 6845299 + FBtr0307391 FRAG
chrX 6846369 6846842 + FBtr0071001 PSSD
chrX 20746139 20746330 + FBtr0303307 PSSD
chrX 3691530 3693041 - FBtr0305348 PSSD
chrYHet 312456 313714 - FBtr0114243 DUP
chrYHet 319739 320997 - FBtr0114244 DUP
chrYHet 327052 328489 - FBtr0114245 DUP
chrYHet 307129 307365 - FBtr0114289 DUP
chrYHet 340030 340818 + FBtr0114241 FRAG
chrYHet 205196 205372 - FBtr0302914 FRAG
chrYHet 337134 338414 + FBtr0114242 PSSD
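
To check a saved copy of this list against the 145 calls reported in the paper, the rows can be tallied per biotype. The Python 3 sketch below does this for any whitespace-separated table with the six columns named in the header line; the function name is hypothetical, and only four rows from the list are included here to keep the example short.

```python
from collections import Counter

def count_biotypes(lines):
    """Count rows per biotype in a whitespace-separated table, skipping '#' lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, strand, pgene_id, biotype = line.split()
        counts[biotype] += 1
    return counts

# a few rows copied from the list above
rows = [
    "#Chr Start End Strand PgeneID Biotype",
    "chr2L 3162515 3163289 + FBtr0077575 DUP",
    "chr2L 22066517 22067170 + FBtr0301969 FRAG",
    "chr2L 19074992 19075412 - FBtr0081172 PSSD",
    "chr4 48156 52259 - FBtr0089180 DUP",
]
counts = count_biotypes(rows)
print(dict(counts), "total:", sum(counts.values()))
```

Replacing rows with the lines of the saved file, for example open("fly_pgenes.txt") with a file name of your choosing, gives the totals for the full list.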

Posted in pseudogenes | Tagged csds

psiDR query

Posted on June 3, 2019 by gersteinfaq

Q:
I am looking for tissue-specific transcripts for pseudogenes in psiDR. However, I could only find the details about their translation in a few cell lines. Kindly point me to the resource or file where this information might be available.

A:
Pseudogene transcription is evaluated using various human cell lines from the HumanBodyMap data. The latest information on human pseudogene transcription is available in the psiCUBE database: http://pseudogene.org/psicube/.
We used RPKM values to assess the pseudogene transcription levels, as described in http://www.pnas.org/content/111/37/13361.short.

I attach here the pseudogene transcription information with the calculated RPKM values in each cell line.
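
For readers unfamiliar with the measure, RPKM stands for reads per kilobase of transcript per million mapped reads. The sketch below shows the standard formula only; it is not the lab's actual processing code, and the numbers in the example call are made up.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (length_bp * total_mapped_reads)

# made-up numbers: 500 reads on a 2 kb pseudogene, 25 million mapped reads
print(rpkm(read_count=500, length_bp=2000, total_mapped_reads=25_000_000))  # 10.0
```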

Posted in pseudogenes | Tagged csds

pseudogene similarities to parent genes

Posted on June 3, 2019 by gersteinfaq

Q:
I am looking at your paper ("The Gencode pseudogene resource"), which appears very relevant to something I am doing right now. Specifically, I am interested in the sequence identity values between pseudogenes and their parents, which are used in figure 4. Would it be possible for you to make these available to me (or to tell me where I can download them if they are already online)?

A:
You may find the data at http://pseudogene.org/psidr/similarity.dat

Posted in pseudogenes | Tagged bp

Pseudogene identification pipeline for bacterial genome

Posted on June 3, 2019 by gersteinfaq

Q:
I am writing to you regarding pseudogene detection in bacterial genomes. I was wondering whether there is a software package or pipeline that can be used to identify pseudogenes within a bacterial genome.

A:
The best way is to use our pseudogene annotation pipeline, PseudoPipe. You can download the stand-alone version, which can easily be run on your computer and does not require a cluster:
http://pseudogene.org/pseudopipe/

Posted in pseudogenes | Tagged csds

Pseudogene talk at ASHG

Posted on June 3, 2019 by gersteinfaq

Q:
I recently attended the ASHG conference where you gave a talk on pseudogene copy number variation based on the 1000 genomes project. I tried looking for this study online and didn’t find anything that was obviously part of your presentation. I was wondering if this data has already been published, and if so if you would let me know what the name of the study was.

A:
I think the studies you are looking for are:

http://www.pnas.org/content/111/37/13361.abstract
and
http://genome.cshlp.org/content/23/12/2042.full.pdf+html

The first is the latest paper from our lab on pseudogene analysis, and the second is a paper on CNVs and retroduplications based on the 1000 Genomes Project.

Posted in pseudogenes | Tagged csds

Mouse Transcribed Pseudogene Data

Posted on May 23, 2019 by gersteinfaq

Q:
I’m currently working on how pseudogenes can act as competitive endogenous RNAs in humans, and would like to expand my study to include mice. I recently read a paper from your lab, "Comparative analysis of pseudogenes across three phyla", and in the supplementary information you mention that you identified 878 transcribed pseudogenes in the mouse genome. Is there a list of these pseudogenes, as well as their associated parent genes, available on either the pseudogene.org website or a different website?

A:
I think this draft list should be on the psiCUBE site.

Posted in pseudogenes | Tagged csds

Gerstein Lab FAQs

Frequently Asked Questions

Category Archives: pseudogenes