Running FunSeq

Q:
I recently read your paper on FunSeq, and I am interested in using it to address some of my questions about cortex plasticity. However, I’m not very familiar with the Linux/UNIX environment this software runs in, and all I have is a Mac laptop. Could you tell me how I could use this software on a Mac, or where I could find instructions for doing so?

A:
You should be able to download this software on a Mac and use it.
You can download it from funseq.gersteinlab.org.

Since you are not familiar with the Linux/UNIX environment, you could also try the online version (http://funseq.gersteinlab.org/analysis).
You can upload your file and see what you get.

List of ‘LoF-tolerant’ genes

Q:
I read “Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics” and “A systematic survey of loss-of-function variants in human protein-coding genes”, and I am interested in the list of genes in the ‘LoF-tolerant’ category. I would appreciate it if you could provide it.

A:
Please see below the list of LoF-tolerant genes from the Science paper.
This list is based on the data from Phase 1 of the 1000 Genomes project.

ABHD14B
AC002511.1
AC007342.1
AC007601.1
AC008676.1
AC009041.2
AC009113.1
AC013480.1
AC018755.11
AC020763.1
AC022148.1
AC022692.1
AC079612.1
AC083883.1
AC091435.1
AC091435.2
AC092171.2
AC092329.1
AC096920.1
AC100788.1
AC100803.1
AC111170.3
AC116447.1
AC118758.1
AC124944.1
AC129492.6
AC130686.1
AC132186.2
AC133919.6
ACSM3
ACTR3C
AF131215.4
AGAP6
AHCTF1
AKR1E2
AL022324.1
AL031587.1
AL035696.1
AL122001.2
AL139385.1
AL355102.1
AL356270.2
AL359236.1
AL359392.1
AL359878.1
AL391137.1
AL449106.1
AL596442.1
ALMS1
AP000354.1
AP001793.1
AP002962.1
APOBEC3B
AQP12B
ARID3A
ARL9
ARMS2
ATP13A5
BPHL
BTN3A2
C10orf113
C10orf68
C11orf21
C12orf60
C13orf26
C14orf180
C14orf182
C17orf107
C17orf77
C17orf97
C18orf56
C19orf71
C1orf227
C20orf185
C21orf88
C2orf57
C3orf14
C3orf49
C4orf17
C5orf27
C5orf49
C6orf123
C8orf44
C9orf43
CALHM2
CAPN11
CAPN9
CASP12
CCDC163P
CCDC7
CD200R1
CD200R1L
CD207
CEACAM4
CELA1
CENPBD1
CFHR1
CLYBL
COL16A1
COL23A1
COL6A5
COX6B2
CPN2
CPNE1
CR392000.1
CRIPAK
CST9
CWH43
CYP2A13
CYP2A7
CYP2C18
CYP2C19
CYP2D6
CYP4B1
DCLRE1A
DDIT4L
DEFB126
DEFB128
DEM1
DNAJC28
DSCR8
DSG1
EBF4
EIF3CL
ENPP7
FAM111B
FAM187B
FAM25A
FAM71D
FAM75A6
FBXL21
FMO2
FMO6P
FTHL17
FUT2
GAB4
GBAP1
GBP3
GBP7
GDPD4
GLT6D1
GPR142
GPRC6A
GRIN3B
GSTT2
GSTT2B
GUF1
H2BFM
HBM
HBP1
HSD17B13
HTN3
IDI2
IDO2
IFNE
IL34
ITIH5
JMJD1C
KRT31
KRT37
KRT77
KRTAP1-1
KRTAP13-2
KRTAP1-5
KRTAP4-8
KRTAP9-1
LAD1
LCN10
LILRA2
LILRA3
LILRB1
LIPJ
LPA
LRRC39
MAGEB16
MAGEE2
MAN2A1
MBL2
METTL7B
MEX3C
MOGAT1
MS4A12
MSR1
MST1R
NACA2
NIPA2
NOXO1
NT5C1B-RDH14
OLFM4
OR10AD1
OR10D3
OR10G7
OR10R2
OR10X1
OR11G2
OR13C2
OR13C4
OR13D1
OR1B1
OR1J2
OR2A5
OR2C1
OR2D2
OR2D3
OR2G6
OR2T11
OR2T27
OR2T4
OR2V2
OR3A1
OR4C11
OR4C16
OR4D10
OR4D6
OR4L1
OR4P4
OR4S2
OR4X1
OR4X2
OR51F1
OR51H1P
OR51I2
OR51Q1
OR51V1
OR52A1
OR52A4
OR52B4
OR52I2
OR52K2
OR52M1
OR52N4
OR5AC2
OR5AR1
OR5B17
OR5H1
OR5H15
OR5K4
OR5M1
OR5M10
OR5M11
OR6C4
OR6C74
OR6Q1
OR7G1
OR7G3
OR8B3
OR8I2
OTOP1
OXGR1
PCDHA3
PCDHGA8
PKD2L1
PKHD1L1
PLA2G4D
PLA2R1
PLEKHG5
PNLIPRP3
POM121L4P
PPEF2
PRAMEF4
PRB4
PSG9
PSORS1C2
PTCHD3
PTGDR
PTX4
PXDNL
PZP
RAI1
RESP18
RFPL1
RHD
RP11-113D6.6
RP11-297N6.4
RP11-455G16.1
RP11-481A20.11
RP11-521B24.1
RP11-542P2.1
RP11-766F14.2
SATL1
SCN8A
SDR42E1
SEC14L4
SEMA4C
SERPINA9
SERPINB3
SLC22A14
SLCO1B1
SLFN12L
SNX31
SPATA18
SPATA4
SPATA8
SPERT
SPTBN5
SPZ1
STARD6
SUMF2
TAAR2
TAS2R46
TAS2R7
TBC1D29
TCHHL1
TCP10L2
TCTEX1D1
TIGD6
TLR10
TLR5
TMEM198
TMEM82
TMPRSS7
TNK1
TRIM22
TRIM38
TRIM73
TRPM1
TSPAN19
TTC24
TXNRD3IT1
UBE2NL
UGT2B10
UGT2B28
ULBP3
UNC93A
USP50
UTS2D
VN1R1
Z82214.1
ZAN
ZFP91
ZNF28
ZNF284
ZNF417
ZNF469
ZNF474
ZNF527
ZNF681
ZNF790
ZNF80
ZNF804A
ZNF812
ZNF860

FunSeq2 output: missing variants

Q:
We are trying to use the FunSeq2 scores (running it locally).
However, we would like to have a score for each variant in the
input VCF, and that is not what we see in Output.vcf.
Can I conclude that the variants missing from
Output.vcf have a score of zero?

A:
The somatic variants that overlap 1000 Genomes variants are filtered out.
Those might be the variants missing from your output VCF.
You can check one or two manually and you should be able to confirm that.
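
To do that check in bulk, you can list the input variants that do not appear in the output. The following is a minimal Python sketch and is not part of FunSeq2. It assumes both files are plain-text VCFs with the standard CHROM, POS, ID, REF, ALT columns; the function names and the toy records are made up for illustration.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        keys.add((chrom, pos, ref, alt))
    return keys


def missing_variants(input_lines, output_lines):
    """Return variants present in the input but absent from the output."""
    return sorted(variant_keys(input_lines) - variant_keys(output_lines))


# Toy records standing in for the real files (hypothetical data).
input_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr1\t200\t.\tC\tT\t.\t.\t.\n",
    "chr2\t300\t.\tG\tA\t.\t.\t.\n",
]
output_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr2\t300\t.\tG\tA\t.\t.\t.\n",
]

missing = missing_variants(input_vcf, output_vcf)
for chrom, pos, ref, alt in missing:
    print(chrom, pos, ref, alt)
```

With real files, pass open file handles instead of the toy lists. The variants it reports are the ones to look up among the 1000 Genomes variants.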

List of LoF-tolerant genes (140) and list of essential genes (115)

Q:
I read with great interest your exciting paper on "Interpretation of genomic variants using a unified biological network approach".
In the last section of the Results, you describe the validation of your logistic regression model using a list of 140 LoF-tolerant genes (MacArthur et al. 2012) and a list of 115 essential genes (Liao et al. 2008). Even though I read both papers, I couldn’t find the gene lists mentioned above (e.g., Liao’s supplementary table of essential genes lists 120 genes, not 115).
So, I was wondering if you would be so kind as to share the list of 140 LoF-tolerant genes and the list of 115 essential genes.

A:
In Supplementary Table S8 of our PLoS Comp Bio paper, the genes with significance_score=0 (second column) are LoF-tolerant genes and the genes with significance_score=3 are essential genes. This file contains 140 LoF-tolerant and 115 essential genes.

I think Liao et al. report 120 essential genes, but we lost 5 of them during gene ID conversion.
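
If you want to pull the two lists out of the table programmatically, something like the following Python sketch may help. It assumes the table has been exported as tab-delimited text with the gene identifier in the first column (an assumption) and significance_score in the second column (as stated above). The function name and the placeholder gene names are hypothetical.

```python
import csv
import io


def split_by_score(handle, delimiter="\t"):
    """Split genes into LoF-tolerant (score 0) and essential (score 3) lists."""
    lof_tolerant, essential = [], []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        gene = row[0].strip()
        try:
            score = float(row[1])
        except ValueError:
            continue  # skip the header row or non-numeric scores
        if score == 0:
            lof_tolerant.append(gene)
        elif score == 3:
            essential.append(gene)
    return lof_tolerant, essential


# Placeholder rows standing in for the exported table (not real data).
toy_table = io.StringIO(
    "gene\tsignificance_score\n"
    "GENE_A\t0\n"
    "GENE_B\t3\n"
    "GENE_C\t1\n"
    "GENE_D\t0\n"
)

lof_genes, essential_genes = split_by_score(toy_table)
print(len(lof_genes), "LoF-tolerant;", len(essential_genes), "essential")
```

On the real table, the two counts should come out as 140 and 115 if the column assumptions hold.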

Mutations in sensitive and ultra-sensitive regions

Q:
I read your paper entitled “Integrative annotation of variants from 1092 humans: application to cancer genomics” in Science from Oct. 4, 2013. Since mutations in the so-called ultra-sensitive regions play an important role in cancer development, I wonder whether it is possible to find out where those mutations are within the ultra-sensitive regions and what mutations they are. I can’t find them in the paper, although they are mentioned.
Is there somewhere I can go to find the mutations?

A:
Thanks for your interest in our paper.
You can find the genomic coordinates of sensitive and ultra-sensitive regions in Data File S3 provided with the supplement of the paper. For the cancer samples we analyzed, you will find the coordinates and detailed information for candidate drivers in Data File S6; this file also lists whether the mutations are in sensitive or ultra-sensitive regions.
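
If you want to check your own mutations against the region coordinates, a simple interval lookup is enough. The Python sketch below is only an illustration. It assumes the regions have been loaded as (chromosome, start, end) tuples with 0-based, half-open coordinates and that regions on the same chromosome do not overlap; the actual layout of Data File S3 may differ, and the function names and toy coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end in regions:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    return index


def in_region(index, chrom, pos):
    """True if 0-based position pos falls in a [start, end) region on chrom."""
    intervals = index.get(chrom, [])
    # Find the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


# Toy coordinates, not taken from the data file.
toy_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
region_index = build_index(toy_regions)

for chrom, pos in [("chr1", 150), ("chr1", 200), ("chr2", 60)]:
    print(chrom, pos, in_region(region_index, chrom, pos))
```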

Annotation of SNPs as breaking or conserving TF motifs

Q:

Congrats on a very nice paper in Science (Khurana et al., 2013). I am particularly interested in how you are able to score variants in transcription factor binding sites. In the supplementary methods you say that: "An SNV that breaks a motif is defined as a mutation that decreases the motif-matching score of the TF-binding site to the position weight matrix (PWM) of the motif (relative to the ancestral allele) (8). Conversely, an SNV that conserves a motif is defined as a mutation that increases the motif-matching score of the TF-binding site to the PWM of the motif."

This makes perfect sense to me. But how do you define the TF-binding site in the first place? I would guess that you are applying a threshold on the motif-matching score here (to reduce the fraction of false positives), and that you then define disruption/conservation of the variant relative to this score. As far as I can see, the paper gives no details on this aspect.

You refer to Mu et al. (NAR, 2011), but I cannot see any further details there.

I would very much appreciate an explanation of how you find the TF binding sites and if you use any PWM-score thresholds in this respect.

A:

The motifs we used in the two papers are the set of TF motifs officially released by the ENCODE project; the same set was used in the main ENCODE publication in 2012. The algorithm to detect the motifs was developed by Pouya at MIT. More detail is available here:
http://compbio.mit.edu/encode-motifs/

In our paper, we took these motif coordinates and categorized SNVs based on the functional effects you described.
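
To make the breaking/conserving definition concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not the code used in the paper. It assumes a PWM given as per-position base probabilities, a log-odds score against a uniform background (one common choice, not necessarily the scoring used for the ENCODE motifs), and a site whose coordinates are already known. The function names and the toy 3-bp motif are hypothetical.

```python
import math


def pwm_score(site, pwm, background=0.25):
    """Log-odds motif-matching score: sum of log2(p / background) per position."""
    if len(site) != len(pwm):
        raise ValueError("site and PWM must have the same length")
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(site))


def classify_snv(ancestral_site, offset, derived_base, pwm):
    """Compare the derived site's score with the ancestral site's score."""
    derived_site = ancestral_site[:offset] + derived_base + ancestral_site[offset + 1:]
    delta = pwm_score(derived_site, pwm) - pwm_score(ancestral_site, pwm)
    if delta < 0:
        return "motif-breaking"    # score decreases relative to ancestral allele
    if delta > 0:
        return "motif-conserving"  # score increases relative to ancestral allele
    return "no change"


# Toy 3-bp motif with made-up probabilities (all nonzero to avoid log of zero).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

print(classify_snv("AGT", 0, "C", toy_pwm))  # A -> C at the first position
print(classify_snv("CGT", 0, "A", toy_pwm))  # C -> A at the first position
```

The first call replaces a high-probability base with a low-probability one, so the score drops; the second does the reverse.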

Multinet (Unified global network) – academic use

Q:
I read your seminal paper “Interpretation of Genomic Variants Using a Unified Biological Network Approach” recently published in PLoS Computational Biology. I have a few queries:
Is the network available for academic use?
Can we download the relevant multinet to form hypotheses and do experiments?

A:
Please find the downloadable network at
http://homes.gersteinlab.org/Khurana-PLoSCompBio-2013/