We are interested in using the HotCommics pipeline to identify hotspot
communities from our own cancer mutation data. However, we are having
difficulty running the pipeline because we could not find a
description of the input files for the snpMapping and
hotSpotCalculation steps. Could you kindly provide us with some
example input files so that we can format our input appropriately?
Thank you for your interest in our work. The input file for the SNP
mapping step is the input file for the VAT tool, which can be the VCF file
that you are working with. Alternatively, you can use a
tab-separated file with the header described below.
#CHROM hg19_pos ID Ref Alt Tumor_Sample_Barcode
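As a rough illustration of that layout, the sketch below writes a tab-separated file with the header shown above. The function name `write_mutation_table` and the example record are hypothetical placeholders, not real data, and the exact value conventions that VAT expects (for example, chromosome naming) are an assumption that should be checked against the VAT documentation.

```python
import csv
import io

# Column names taken from the header line above.
COLUMNS = ["#CHROM", "hg19_pos", "ID", "Ref", "Alt", "Tumor_Sample_Barcode"]


def write_mutation_table(rows, handle):
    """Write mutation records as a tab-separated table with the header above.

    `rows` is an iterable of 6-item sequences given in COLUMNS order.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(row)}: {row!r}"
            )
        writer.writerow(row)


# Placeholder record for illustration only (not real data).
example_rows = [("chr1", 123456, "mut_1", "C", "T", "SAMPLE-0001")]

buffer = io.StringIO()
write_mutation_table(example_rows, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `io.StringIO` buffer.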
For the hotspot community identification, you will have to run the
community identification module for each PDB structure onto which your
mutations have mapped.
Once you have generated these communities and have a list of PDBs onto
which mutations have mapped, you will need to provide that list
of PDBs for the hotspot calculation.
I’m having trouble with the multi-chain morph server. My protein includes chains that are designated as an upper case “A” and lower case “a”. When I upload the PDB file and specify all chains including A and a, the resulting Morph PDB file does not contain the lower case chain a. Is there any way to fix this problem?
Thank you for reaching out to us regarding the morph server. It may be that the server is internally case-insensitive with regard to chain identifiers. If possible, I would suggest changing chain "a" to a different letter that is not already in use in the PDB input file (e.g., changing "a" to "B" with a simple script). Then, once you get your output from the server, you can change chain "B" back to the original label "a".
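A minimal sketch of such a relabelling script follows. It assumes standard fixed-width PDB records, where the chain identifier is the 22nd character of ATOM, HETATM, ANISOU and TER lines; the function names and the toy records are hypothetical, and the replacement letter must be one that no other chain in the file already uses.

```python
# Record types that carry a chain identifier in column 22 (index 21).
CHAIN_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def rename_chain(pdb_lines, old, new):
    """Return a copy of the PDB lines with chain `old` relabelled as `new`.

    The comparison is case-sensitive, so "a" and "A" are distinct chains.
    """
    if len(old) != 1 or len(new) != 1:
        raise ValueError("chain identifiers must be single characters")
    renamed = []
    for line in pdb_lines:
        if line.startswith(CHAIN_RECORDS) and len(line) > 21 and line[21] == old:
            line = line[:21] + new + line[22:]
        renamed.append(line)
    return renamed


def demo_atom_line(serial, chain):
    """Build a truncated, toy ATOM record with the chain ID in column 22."""
    return (
        "ATOM".ljust(6)          # columns 1-6: record name
        + str(serial).rjust(5)   # columns 7-11: atom serial number
        + " "                    # column 12
        + " N  "                 # columns 13-16: atom name
        + " "                    # column 17: alternate location
        + "MET"                  # columns 18-20: residue name
        + " "                    # column 21
        + chain                  # column 22: chain identifier
        + "1".rjust(4)           # columns 23-26: residue number
    )


original = [demo_atom_line(1, "A"), demo_atom_line(2, "a"), "END"]
for_server = rename_chain(original, "a", "B")   # relabel before uploading
restored = rename_chain(for_server, "B", "a")   # undo on the server output
```

In practice the lines would come from reading the PDB file, and the second call would be applied to the morph output rather than to the input.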
I uploaded my PDB files on the multi-chain morph server, the job ID is b499364-832. It has been two days but the results page still says that the job is not yet complete. Is there any problem with my morph? I understand that my files are very large so it may take a long time to finish, but is it possible that it takes this long? I would really appreciate your assistance.
Files generated by the multi-chain server can be found in
Yours are in
Looking at it, some files seem to be missing.
For example, a complete run would look like
There is a job running and it seems to be related to yours
"/usr/bin/perl ./multi.pl b499364-832 –chains=WCBAXHGFEDJLMNOPQRYAIS
–nframes=8 –email=seiga –engine=CNS –debug"
The short of it is that your job is probably stuck since you seem to
have submitted it 6 days ago judging from the submit time. Note that
we cannot guarantee the full functionality of this service as it is
from 2005 and has not been fully maintained since. Occasionally, we
may roll back but that would mean you will need to resubmit your job.
I have carefully read your paper entitled "3V: cavity, channel and cleft volume calculator and extractor".
I’ve a question for you. In the abstract, it is written the following:"It rapidly finds internal volumes by taking the difference between two rolling-probe solvent-excluded surfaces,…", but I think you mean "two imaginary rolling-probe solvent-excluded surfaces" because after looking at your code, I haven’t seen any analytic SES formulation therein. I guess you are just using two probe spheres of distinct radii to account for cavities, not the analytic SES themselves. Am I right?
I am not certain about your use of the term "imaginary", but I would say my method is a "discrete approximation" to the SES. And because it is discrete (i.e. a 3D grid) one can simply subtract one grid from another. See attached figures.
With small grid spacings (0.2 Å), I see very little discrepancy from the analytical solution.
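To illustrate the grid-subtraction idea in isolation, here is a minimal sketch; it is not the 3V code. It assumes that the excluded volume for each probe has already been voxelized onto the same grid and is stored as a set of (i, j, k) indices. The function name and the toy grids are hypothetical.

```python
def grid_difference_volume(large_probe_voxels, small_probe_voxels, spacing):
    """Subtract one voxel grid from another and estimate the leftover volume.

    Both inputs are collections of (i, j, k) indices on the same grid.
    Returns the voxels present only in the large-probe grid, plus their
    total volume in cubic units of `spacing`.
    """
    internal = set(large_probe_voxels) - set(small_probe_voxels)
    return internal, len(internal) * spacing ** 3


# Toy grids: a 4 x 4 x 4 block filled by the large probe's excluded volume,
# and the same block with a two-voxel pocket that the small probe can enter.
large = {(i, j, k) for i in range(4) for j in range(4) for k in range(4)}
pocket = {(1, 1, 1), (1, 1, 2)}
small = large - pocket

internal, volume = grid_difference_volume(large, small, spacing=0.2)
```

Because each voxel contributes `spacing ** 3` to the total, halving the spacing gives eight times as many voxels for the same region, which is the cost of the closer agreement seen at small grid spacings.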
I am reading with interest your recent paper (Kumar, Clarke, and Gerstein, PNAS), but I suspect that supplement 1 and 2 are the same, and neither has a list of 434 genes. Could you please supply the list?
Thank you very much for your interest in the paper. Supplement 1 includes hotspot communities based on the pan-cancer analysis (i.e., when we compute statistics over multiple cancer cohorts in TCGA). In contrast, supplement 2 lists putative driver genes with hotspot communities for specific cancer types. Note that in supplement 2, column F lists the names of the particular cancer cohorts.
Regarding the number of genes, 434 genes are based on the pan-cancer analysis.
For each gene, there are multiple PDB entries. For the analysis in our paper, we selected a representative structure with the highest residue coverage. However, to be exhaustive and to allow researchers to analyze proteins of interest to them, our supplement includes all PDB entries for a given gene. We have tried to explain this in our Methods section.
Thanks for your quick reply, but no, this does not remove my confusion. Please take a moment to check the link from your paper at PNAS. When I download pnas.1901156116.sd01.xlsx, the file has 217 lines (not 434) and includes the column F that breaks down by cancer type.
I am attaching our original tables with the email. It appears that the table has been somehow duplicated on the PNAS website. We will work with the PNAS team to get it fixed.
I am reading your Hinge Atlas (2007) paper.
I searched for your dataset to study the pdb structures you used and their
hinge residues, but I could not download it from the page:
Could you please send a file if it is possible by email, or please check and
fix if there is a bug on the web page.
Please try the Hinge Atlas Gold while we investigate the webpage. This may take time.
I tried to use the 3V (http://3vee.molmovdb.org/) server developed at your lab but I was not able to connect to it. Is the server still running? Would you help me find a way to run it?
It was up but unresponsive. It should be working now.
Could you please have a look at my submission from yesterday and tell me what I did wrong that the job does not finish?
The server is running fine. Please follow the tutorial video for input file preparation. We are not responsible for users' input files that are not compatible with the server's original script.
We do not run jobs on behalf of users, given that the server is running as originally intended. There is an FAQ and a tutorial on the webpage that you should follow.
I am writing this e-mail to inquire about STRESS software.
We have learned from your paper (Structure 2016,24:826-837)
that STRESS software can be used for identifying allosteric pockets.
We are interested in using the software for our drug discovery research.
We will perform evaluation of the software for a start.
Will you allow us to use STRESS software for the purpose of our
commercial drug discovery project free of charge?
As this is an urgent project, we would highly appreciate if you could
See the license at https://sites.gersteinlab.org/permissions/
We could not access your 3V server listed in your paper below.
Link to such calculation of protein volume in the presence of probe with
Should be up now. We had to power down all our machines on Monday and
3vee went up but it was in a hung state.