I read about the recently published software for deconvoluting pervasive and autonomous retrotransposons. Could another calculation be added to the software’s output which estimates the abundance of ORF1 and ORF2, the parts of the retrotransposon which are translated into protein? I’m not experienced in this research area, so I am unsure of how feasible that is. I would like to make an approximation to the ORF1 and ORF2 protein abundances using RNA-seq.
Thanks for reaching out here and on GitHub. This is an interesting question and suggestion. Unfortunately, estimating ORF1 and ORF2 protein abundance from RNA-seq is extremely hard. There are essentially two factors that make it difficult to estimate protein abundance from transcriptome data. The first is technical: RNA-seq has a strong bias toward overrepresenting the 3′ end of transcripts, so ORF2 would most likely be overestimated. This issue is easily addressable.
The second is more biological: LINE-1 is tightly regulated at many different levels. Not only is LINE-1 transcription regulated, but there are also many post-transcriptional mechanisms that either boost or stop LINE-1 translation. This is not unique to LINE-1; in general, estimating protein abundance from RNA is a hard problem (https://www.nature.com/articles/nrg3185).
That said, I’m really interested in this question. In theory, we could use machine learning algorithms to predict ORF1 and ORF2 protein levels from RNA-seq if we had enough data. This could be an interesting follow-up to TeXP.
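To make the idea a bit more concrete, here is a minimal sketch of what such a model could look like in its simplest form, assuming paired RNA-seq and protein measurements for the same samples were available. It is not part of TeXP. The function names and the toy numbers are hypothetical, and a single-feature log-linear least-squares fit stands in for a real machine learning model, which would also have to capture the post-transcriptional regulation described above.

```python
import math


def fit_log_linear(rna_values, protein_values):
    """Least-squares fit of log2(protein + 1) = slope * log2(rna + 1) + intercept.

    Both inputs are hypothetical paired measurements, one pair per sample.
    """
    x = [math.log2(v + 1.0) for v in rna_values]
    y = [math.log2(v + 1.0) for v in protein_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form ordinary least squares for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_protein(rna_value, slope, intercept):
    """Map an RNA-level estimate back to the protein scale using the fitted line."""
    return 2 ** (slope * math.log2(rna_value + 1.0) + intercept) - 1.0


# Made-up numbers for illustration only; these are NOT real measurements.
toy_rna = [1.0, 3.0, 7.0, 15.0, 31.0]
toy_protein = [3.0, 7.0, 15.0, 31.0, 63.0]

slope, intercept = fit_log_linear(toy_rna, toy_protein)
predicted = predict_protein(63.0, slope, intercept)
```

In practice the RNA-level input would be the autonomous LINE-1 transcription estimate rather than raw read counts, and the hard part is collecting enough matched protein data to train and validate anything beyond this toy fit.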