I am examining your uORF annotations with great interest, but I am unsure how to interpret a few of the entries in the following file from the GitHub site:
Complete list of predictions (complete_uORF_predictions_hg19.zip · 35.29 MB)
If you look at these two uORF_IDs:
They are annotated with the same start and end coordinates, but different start codons (ATC / ATA).
Also, looking at the region, I cannot find either start codon in the hg19 reference.
Any idea what is going on here?
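Basically, the start codon here appears to overlie a splice site. Depending on which processed transcript you are looking at, alternative splicing leaves either an ATC or an ATA at that location (see image below). Because the codon is split across an exon-exon junction, it exists only in the spliced mRNA and not as a contiguous triplet in the hg19 genomic sequence, which is why you could not find it there. That is also why these uORFs have the same start and end coordinates but different start codons.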
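To make the splicing explanation concrete, here is a small toy sketch. It assumes plus-strand exons given as 0-based, half-open genomic intervals (an assumption for illustration, not necessarily the catalog's coordinate convention), and the genome string, exon coordinates and helper names (`genomic_to_transcript`, `spliced_codon`) are all hypothetical. It stitches together the exons of two isoforms that share a donor site but use different acceptor sites, then reads three bases starting at the same genomic position in each.

```python
# Hypothetical toy data: a 25-bp "genome" and two isoforms that share the
# first exon but use different acceptor sites for the second exon.
GENOME = "CCCCAGTCAGTCCAGTAGGGTTTAA"
ISOFORM_A = [(0, 5), (10, 25)]  # second exon begins with "TC..."
ISOFORM_B = [(0, 5), (15, 25)]  # second exon begins with "TA..."


def genomic_to_transcript(pos, exons):
    """Map a 0-based genomic position to its 0-based index in the spliced transcript."""
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start  # skip past this exon's length
    raise ValueError(f"position {pos} is not exonic in this isoform")


def spliced_codon(genome, exons, codon_start):
    """Return the 3 nt starting at genomic position codon_start in the spliced mRNA."""
    transcript = "".join(genome[start:end] for start, end in exons)
    i = genomic_to_transcript(codon_start, exons)
    codon = transcript[i:i + 3]
    if len(codon) != 3:
        raise ValueError("codon runs past the end of the transcript")
    return codon


if __name__ == "__main__":
    print(GENOME[4:7])                          # AGT (contiguous genomic triplet)
    print(spliced_codon(GENOME, ISOFORM_A, 4))  # ATC (A from exon 1, TC from exon 2a)
    print(spliced_codon(GENOME, ISOFORM_B, 4))  # ATA (A from exon 1, TA from exon 2b)
```

Neither ATC nor ATA occurs anywhere in the toy genome string, which mirrors the situation in hg19: the codon only appears after splicing. Minus-strand features would also need reverse-complementing, which this sketch leaves out.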
We wrestled a bit with the question of whether to call these two separate uORFs. However, they do have different mRNA and protein sequences, so they received separate entries in our catalog.
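The decision comes down to what counts as the identity of a uORF. The sketch below is a hypothetical illustration, not the catalog's actual pipeline: the `UORF` record fields, the `group_ids` helper, the IDs, coordinates and sequences are all placeholders. It shows that keying on coordinates alone would merge the two entries, whereas adding the mRNA sequence to the key keeps them apart.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple


class UORF(NamedTuple):
    # Hypothetical record layout; the real catalog's columns may differ.
    uorf_id: str
    chrom: str
    start: int
    end: int
    start_codon: str
    mrna_seq: str


def group_ids(records: List[UORF], key: Callable[[UORF], Hashable]) -> Dict[Hashable, List[str]]:
    """Group uORF IDs by an arbitrary key function."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec.uorf_id)
    return dict(groups)


# Placeholder records: same coordinates, different start codons and sequences.
RECORDS = [
    UORF("uORF_1", "chr1", 1000, 1200, "ATC", "ATCCAGTAG"),
    UORF("uORF_2", "chr1", 1000, 1200, "ATA", "ATAGGCTGA"),
]

by_coords = group_ids(RECORDS, lambda r: (r.chrom, r.start, r.end))
by_coords_and_seq = group_ids(RECORDS, lambda r: (r.chrom, r.start, r.end, r.mrna_seq))

print(len(by_coords))          # 1: coordinates alone merge the two uORFs
print(len(by_coords_and_seq))  # 2: including the sequence keeps them separate
```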