I am using Pseudopipe and am wondering about the different types in its output.
Looking into the script, I found several types: GENE-SINGLE, PSSD, FRAG, GENE-MULT, and DUP. Could you explain what each type means?
From what I can see, you are looking at an intermediary result file, not at the final output. The final output should contain only three biotypes: PSSD, DUP, and FRAG.
PSSD indicates a processed pseudogene, DUP indicates a duplicated pseudogene, and FRAG indicates a pseudogene locus to which we cannot assign a biotype (processed or duplicated) with certainty.
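If you want to check whether a file you are reading is the final output or an intermediary one, a quick test is whether every type label is one of the three final biotypes. The sketch below assumes you have already extracted the type labels into a list of strings (how you parse the file is up to you). The helper names are hypothetical, and the descriptions paraphrase this answer rather than PseudoPipe's own documentation.

```python
# Hypothetical helpers, not part of PseudoPipe. The three final biotypes and
# their meanings are paraphrased from the explanation above.
FINAL_BIOTYPES = {
    "PSSD": "processed pseudogene",
    "DUP": "duplicated pseudogene",
    "FRAG": "pseudogene locus whose biotype (processed or duplicated) is uncertain",
}


def looks_like_final_output(types):
    """Return True if every type label is one of the three final biotypes."""
    return all(t in FINAL_BIOTYPES for t in types)


def describe(type_label):
    """Describe a final biotype, or flag the label as intermediary."""
    return FINAL_BIOTYPES.get(type_label, "intermediary label (not a final biotype)")


print(looks_like_final_output(["PSSD", "DUP", "FRAG"]))
print(looks_like_final_output(["PSSD", "GENE-SINGLE"]))
print(describe("GENE-MULT"))
```

Any label such as GENE-SINGLE, GENE-MULT, or a composite like PSSD|GENE-SINGLE makes the check fail, which is what you would expect for an intermediary file.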
GENE-SINGLE and GENE-MULTI are intermediary biotype definitions. SINGLE means that the potential pseudogene locus contains only one exon (similar to processed pseudogenes), and MULTI means that it contains multiple exons (similar to duplicated pseudogenes).
If a proposed locus has over 95% sequence identity to the parent gene, covers over 95% of the parent gene sequence, and has no identifiable disablements, we initially label it GENE-SINGLE or GENE-MULTI, depending on whether it has one exon or several. If we find a polyA tail, you might see PSSD|GENE-SINGLE, and in that case we relabel the locus as a processed pseudogene. For very high similarity we tend to be conservative and do not label the locus as a pseudogene. If subsequent searches turn up additional evidence (e.g. a polyA tail, truncations), we relabel the locus as a pseudogene.
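To make the conservative labelling rule above concrete, here is a minimal sketch of the decision as described in this answer. It is not PseudoPipe's actual code. The `Locus` fields and function names are hypothetical, the label strings follow the spelling used in this answer (the script itself may spell the multi-exon label GENE-MULT), and attaching the PSSD prefix only to single-exon loci is an assumption based on the PSSD|GENE-SINGLE example.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    # Hypothetical fields; PseudoPipe's internal representation is not shown here.
    identity: float        # percent sequence identity to the parent gene
    coverage: float        # percent of the parent gene sequence covered
    n_exons: int           # number of exons in the candidate locus
    n_disablements: int    # e.g. frameshifts or premature stop codons
    has_polya_tail: bool


def initial_label(locus):
    """Conservatively label near-identical, intact loci as GENE-SINGLE/GENE-MULTI.

    Returns None when the locus does not meet the thresholds; ordinary
    pseudogene classification would then apply (not modelled here).
    """
    near_identical = locus.identity > 95.0 and locus.coverage > 95.0
    if not (near_identical and locus.n_disablements == 0):
        return None
    label = "GENE-SINGLE" if locus.n_exons == 1 else "GENE-MULTI"
    # Assumption: the polyA evidence is only attached to single-exon loci,
    # matching the PSSD|GENE-SINGLE example.
    if locus.n_exons == 1 and locus.has_polya_tail:
        label = "PSSD|" + label
    return label


def relabel(label):
    """Resolve a PSSD|GENE-* label to a processed pseudogene (PSSD)."""
    if label is not None and label.startswith("PSSD|"):
        return "PSSD"
    return label


candidate = Locus(identity=99.0, coverage=98.0, n_exons=1,
                  n_disablements=0, has_polya_tail=True)
print(initial_label(candidate), "->", relabel(initial_label(candidate)))
```

Keeping the intermediary label separate from the final call mirrors the conservative approach described above: a near-identical, intact locus is only promoted to a pseudogene biotype once extra evidence such as a polyA tail is found.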