Recently, I used FunSeq2 to identify non-coding regulatory variants in my
bladder cancer research. During promoter analysis, I found that the original
file, gencode.v19.promoter.bed, downloaded from
http://funseq2.gersteinlab.org/static/data_context2.1.2/gencode/, contains
promoter regions ranging from 1 to 8979 bp in length, which is inconsistent
with the definition in your article (promoters defined as -2.5 kb from the
transcription start site).
So I checked the processing script 3.gencode.process.pl, which I downloaded
from http://funseq2.gersteinlab.org/scripts_dev/ and which is written for
gencode.v16. This script generated gencode.v16.promoter.bed
(http://funseq2.gersteinlab.org/static/data_context2.1.0/gencode/), and in
that BED file the 3rd column minus the 2nd column always equals 2500
(2.5 kb). After comparing the two files, I noticed that the latest
gencode.v19.promoter.bed has an additional 5th column, so I assume the
3.gencode.process.pl script has been revised, but I couldn’t find the
latest version online. I therefore wonder whether the latest
3.gencode.process.pl redefines what a promoter is. If it does, could I get
a copy of this script?
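For anyone who wants to reproduce the width comparison described above, here is a minimal Python sketch. It assumes only that the promoter files are standard BED (start in column 2, end in column 3); the function names `promoter_widths` and `summarize` are my own, and the three records are made up for illustration, not taken from the FunSeq2 files.

```python
import io
from collections import Counter


def promoter_widths(lines):
    """Yield end - start (column 3 minus column 2) for each BED record."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield int(fields[2]) - int(fields[1])


def summarize(lines):
    """Return (shortest width, longest width, Counter of all widths)."""
    widths = list(promoter_widths(lines))
    return min(widths), max(widths), Counter(widths)


# Made-up records for illustration only.
example = io.StringIO(
    "chr1\t1000\t3500\tGENE_A\n"
    "chr1\t9000\t9001\tGENE_B\n"
    "chr2\t500\t9479\tGENE_C\n"
)
lo, hi, counts = summarize(example)
print(lo, hi, counts[2500])
```

To run it on a real file, pass an open file handle instead, e.g. `with open("gencode.v19.promoter.bed") as fh: print(summarize(fh))`. A file built with a fixed 2.5 kb window should report the same value for the shortest and longest width.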
The promoter file was derived from the PCAWG promoter set, which may take chromHMM segmentation information into account. Yao updated this in v2.1.2, and I have kept it in the latest version. Users can replace the file with one based on their own definition of promoters.
The promoter file included in FunSeq2 v2.1.2 is based on the PCAWG consortium’s definition, which considers ChromHMM segmentation information, so the promoters will not be exactly 2 kb or 2.5 kb upstream of the TSS.
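If you do want to substitute your own fixed-window promoters, the following Python sketch shows one way to compute a strand-aware 2.5 kb upstream interval in BED coordinates. It is not the logic of 3.gencode.process.pl, which I have not seen in its current form. The function name `upstream_promoter`, the four-column output, and the choice to exclude the TSS base itself are all assumptions; check the column layout of the shipped file (the v19 file has a fifth column) before swapping anything in.

```python
def upstream_promoter(chrom, tss, strand, name, upstream=2500):
    """Return a BED line covering `upstream` bp immediately upstream of the TSS.

    `tss` is the 0-based position of the transcription start site.
    BED intervals are 0-based and half-open, so end - start is the width.
    The TSS base itself is excluded (an assumption of this sketch).
    """
    if strand == "+":
        # Upstream means lower coordinates; clamp at the chromosome start.
        start, end = max(0, tss - upstream), tss
    elif strand == "-":
        # Upstream means higher coordinates on the minus strand.
        start, end = tss + 1, tss + 1 + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return f"{chrom}\t{start}\t{end}\t{name}"


# Hypothetical genes, for illustration only.
plus_line = upstream_promoter("chr1", 10000, "+", "GENE_A")
minus_line = upstream_promoter("chr1", 10000, "-", "GENE_B")
print(plus_line)
print(minus_line)
```

Note that this sketch does not clamp minus-strand intervals at the chromosome end, since that requires a table of chromosome lengths.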