Question on STRESS (allostery) tool

Q1:
I would like to ask for advice on your recently published STRESS tool. I would like to use it to identify residues that might be involved in allostery, however in the case of surface critical residues, it currently reports only up to ten residues per binding pocket. Is there a way to "hack" the identify_high_confidence_BL_sites.py script (which writes this file) to write all such residues, according to some probability cutoff? (I’m a Perl programmer, with relatively limited python experience.) Thank you in advance.

A1:
I have modified 2 of the C scripts, so you might want to re-compile using these new C scripts before running your calculation.

Now, how exactly did I modify the scripts, and how can you expect your new output to differ from what you were getting previously? I was a little unsure about your query, where you wrote "to write all such residues, according to some probability cutoff". As you know, there was an original cutoff of 10 residues. I have modified the scripts so that the new limit is 15 residues. You can change this cutoff to any value you wish, simply by changing the C scripts in the way that I’ve changed them, and then recompiling. To see the specific changes that I’ve made, you can run the Unix "diff" command to compare the old C scripts with the new ones. Doing so, you’ll see that:

In the script bindingSiteMeasures.c, I have replaced instances of "10" with "15" in lines 77, 79, 217, 219, 240, 242.

In the script surfaceProbe.c, I have replaced instances of "10" with "15" in lines 1227, 1229, 1235, 1237.

You could try running things using these 2 new scripts, and you can let me know how things go.
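
If you would rather stay out of the shell, the comparison can also be sketched in Python. This is only an illustration using the standard difflib module; the function name and the file names in the usage comment are hypothetical and not part of STRESS.

```python
import difflib


def changed_line_numbers(old_text, new_text):
    """Return 1-based line numbers of old_text that were replaced or deleted in new_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" both mean the old lines i1..i2-1 no longer appear as-is
        if tag in ("replace", "delete"):
            changed.extend(range(i1 + 1, i2 + 1))
    return changed


# Hypothetical usage (file names are assumptions):
# with open("bindingSiteMeasures_old.c") as f_old, open("bindingSiteMeasures.c") as f_new:
#     print(changed_line_numbers(f_old.read(), f_new.read()))
```

For the two modified files, the reported line numbers should correspond to the lines listed above where "10" became "15".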

Q2:
Basically I would like to use the tool to identify putative allosteric residues in a protein, i.e. get a list of them. Theoretically, one should be able to set some kind of probability cutoff that holds for most residues, i.e. that there is a certain probability that they are allosteric. After reading the paper (and I cannot say I understand everything in the methodology) it seems that there are two problems: first, that the surface and interior residues are identified by different methods (so the likelihood of being allosteric may be different for the two sets); second, that in the case of the surface sites there is a limit of 10. So I wonder whether it is somehow possible to adjust/set the internal cutoffs of the tool (Jaccard?) to get a (ideally full) list of surface and internal residues that are more or less equally likely to be allosteric. Do you think it is doable? And if not with this tool, can you suggest one?

A2:
I think I understand what you mean. The 2-fold problem here is:
(1) since the surface and interior allosteric residues are identified by very different methods, there is a sort of "apples and oranges comparison" that makes it difficult to assign a unified, consistent probability rule (of being allosteric) to both sets of residues
(2) for the case of surface residues, there is an (admittedly) arbitrary cutoff threshold of 10 residues per site.

With respect to problem (1): I agree that the two methods are very different, but the idea of assigning a numeric likelihood or probability (of being allosteric) in either case is actually not that straightforward. In both cases, these are only predictions, and we have tried to make these predictions as accurate as possible by comparing our predicted allosteric residues with known allosteric sites in proteins. The two methods are very different because the allosteric mechanisms in the interior and on the surface are so distinctly different. If you are working with just one protein, and you need greater sensitivity, I might suggest using molecular dynamics, which would give more accurate predictions. Relevant studies that use such approaches might include:

del Sol, A., Fujihashi, H., Amoros, D., and Nussinov, R. (2006). Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol. Syst. Biol. 2(1).

Ghosh, A., and Vishveshwara, S. (2008). Variations in Clique and Community Patterns in Protein Structures during Allosteric Communication: Investigation of Dynamically Equilibrated Structures of Methionyl tRNA Synthetase Complexes. Biochemistry. 47, 11398-11407.

Ming, D., and Wall, M.E. (2005). Quantifying allosteric effects in proteins. Proteins 59, 697-707.

Mitternacht, S. and Berezovsky, I.N. (2011). Binding leverage as a molecular basis for allosteric regulation. PLoS Comput. Biol. 7, e1002148.

Rousseau, F. and Schymkowitz, J. (2005). A systems biology perspective on protein structural dynamics and signal transduction. Curr. Opin. Struct. Biol. 15, 23–30.

With respect to problem (2): We have established the parameters of the surface-site identification scheme using a known set of allosteric residues. That is, our parameters were established empirically to best capture known allosteric sites. The details of all this can be found in the Supplementary Materials of the paper, specifically in the Supp section 3.1-a-iii "Defining & Applying Thresholds to Select High-Confidence Surface-Critical Sites". I would be happy to help you modify the code to use different parameters, but I would advise against changing them, since again, they were empirically optimized.

Unfortunately, I do not know of any one tool or software that would provide both surface and interior allosteric residues within one suite; we have tried as best as possible to do essentially that.

Q3:
Thank you very much. I plan to analyze a relatively large number of proteins, so MD doesn’t seem to be the right choice. I thought that STRESS was optimized empirically, so I would prefer to change it as little as possible. I made a few tests, and I would like to ask for some more help in interpreting the results. 1) Is STRESS suitable for the analysis of protein complexes? 2) Some proteins have very large ligands, while STRESS seems to use a small ligand to identify the cavities. I’m not familiar with the internals, but that may matter a lot for the results in proteins where a ligand is large, e.g. a dinucleotide or a cofactor. Is it possible to somehow adjust this for a given protein structure in the analysis?

Finally, it is not entirely clear how to use the first two columns of the table of surface critical residues, i.e. how to identify the rows that actually matter, and how to set a threshold of reported residues so that it contains all residues of a site, not just 10.

A3:
There are a few questions that you pose here, so I’ll address each one in turn:

With respect to: "I plan to analyze a relatively large number of proteins, so MD doesn’t seem to be the right choice."
–> In that case, I agree that STRESS is a good option.

With respect to: "1) is STRESS suitable for the analysis of protein complexes?"
–> The answer here really depends on the nature of the complex and the nature of the allosteric residues that you’re trying to identify. If you’re considering an obligate protein complex (ie, a complex in which the proteins must be together in order to function, such as a STAT dimer), then STRESS is a great tool. In such a case, STRESS will attempt to identify both internal and surface allosteric residues in the context of such an obligate complex. In fact, STRESS was parameterized in the context of proteins, some of which were studied in complex form (in the PDB, we studied the so-called "biological assemblies"). In addition, we studied conservation (as a type of validation) using many proteins in complex form.
However, you must be very careful — it may not be ideal to use STRESS to study transient complexes (for example, a protein kinase interacting with its target during target phosphorylation). In such a case, consider the surfaces of normally-exposed proteins — these surfaces may have biologically functional allosteric sites, but those surfaces will be occluded when the proteins are in complex form. As a result of that surface occlusion, STRESS will not have access to those surfaces in the surface-critical identification module. Also be aware that the network of interconnecting residues will have very different topological properties when the proteins are in their complex vs. monomeric forms (for instance, a given residue that shows up as a hub in the network of the complex may not be a hub in the monomeric network). Thus, again, in such a transient complex, it may not be appropriate to use STRESS to identify interior allosteric residues (unless you’re only interested in the interior allosteric residues that function when the protein is in complex form).
Having said all this, I should mention that most protein complexes that occur in the PDB are less likely to be transient (transient interactions are difficult to crystallize), so, in that regard, I’d say that STRESS is probably OK for most complexes in the PDB.

With respect to: "2) Some proteins have very large ligands, while STRESS seems to use a small ligand to identify the cavities. I’m not familiar with the internals, but that may matter a lot for the results in proteins where a ligand is large, e.g. a dinucleotide or a cofactor. Is it possible to somehow adjust this for a given protein structure in the analysis?"
–> Unfortunately, it is not possible to change the 4-atom ligand. You’re absolutely correct about this, and it is indeed a consideration that we took into account. However, the 4-atom ligand is an inherent limitation in the STRESS software. We decided to stick to just 4 atoms for 3 reasons, and it is very difficult to change that:
1) The software performs Monte Carlo (MC) simulations and needs to measure atom-ligand distances many times. Increasing the number of atoms in the ligand substantially increases the running time of the software. Increasing to just 5 atoms may increase the running time by more than 10-fold.
2) We wanted to make STRESS as general as possible, and STRESS does not assume a-priori knowledge of the specific ligands of a given protein. Thus, to provide such generality, we decided to use a 4-atom ligand (as many natural ligands may be pretty small).
3) The STRESS software was actually developed based on a code precursor written by one of the other authors a few years ago. That author hard-coded the 4-atom ligand requirement into the software’s architecture, and it was very difficult to change that setup.

With respect to: "it is not entirely clear how to use the first two columns of the table of surface critical residues, i.e. how to identify the rows that actually matter…"
–> The first column (integers) is actually meaningless to the user: it only serves as an arbitrary index for the site when the software is run, and it was used by us as an internal tracker (index) for debugging purposes. The second column (floating-point numbers) gives the actual binding leverage score for a site. High scores designate high binding leverage (ie, sites that strongly couple to the protein’s motions). Columns 3 onward give the identities of the residues within that site.
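
As a rough illustration of that column layout, the following Python sketch reads such rows and ranks the sites by binding leverage score. It assumes the columns are whitespace-delimited, which should be checked against your own output file; the function names are hypothetical, and the residue tokens are kept as opaque strings.

```python
def parse_site_line(line):
    """Split one row into (site index, binding leverage score, residue identifiers)."""
    fields = line.split()  # assumption: whitespace-delimited columns
    site_index = int(fields[0])   # column 1: arbitrary internal index
    score = float(fields[1])      # column 2: binding leverage score
    residues = fields[2:]         # columns 3 onward: residues in the site
    return site_index, score, residues


def rank_sites(lines):
    """Return all sites sorted from highest to lowest binding leverage score."""
    sites = [parse_site_line(line) for line in lines if line.strip()]
    return sorted(sites, key=lambda site: site[1], reverse=True)
```

Sorting by the second column is simply a convenient way to see the strongest sites first; it does not change which sites STRESS reports.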

With respect to: "how to set a threshold of reported residues so that it contains all residues of a site not just 10"
–> This can be done by changing the code in the way that I had detailed for those 2 other scripts (ie, where I changed the threshold from 10 residues to 15 residues). I would be happy to help you change it to another threshold if you like, but everything should work if you change the code based on my earlier changes.

Q4:
Thanks a lot. So, if I understand correctly, STRESS can handle pdb entries with multiple chains (i.e. "motions" can be transmitted between residues of different chains, and having more than one chain in an entry does not compromise its performance), and it is up to the user to decide whether it biologically makes sense or not. Distinguishing between obligate and transient complexes is far from being straightforward. Ligand size might be a bigger problem for me.

Is there a recommended cutoff for binding leverage score (i.e. a score below which the likelihood of being allosteric is negligible)? Also it would be great if one could set the number of printed residues (in the surface critical file) not by their maximum allowed number (10 or 15), but by some statistical measure that quantifies their likelihood of being allosteric. (In my tests for several sites the number of printed residues is lower than 10; I guess in those cases a cutoff like this is used.)

A4:
A few items here, so I’ll address each in turn:

With respect to: "I understand correctly, STRESS can handle pdb entries with multiple chains (i.e. "motions" can be transmitted between residues of different chains, and having more than one chain in an entry does not compromise its performance), and it is up to the user to decide whether it biologically makes sense or not"
—> Yes — your interpretation is 100% correct.

With respect to: "distinguishing between obligate and transient complexes is far from being straightforward"
–> Although it is true that it is not straightforward, I have found that it is quite reasonable to treat most proteins as stable complexes if they’ve been deposited in the PDB, for 2 reasons: 1) if they were truly very transient in nature, it would be difficult to crystallize them, and 2) if you’re studying a protein in the complex form, then it is often the case that the protein is in its biologically active state. In such cases, identifying allosteric residues in this state is the most reasonable way to go (ie, the allosteric residues within the biological state are, of course, generally what’s of interest).

With respect to: "Ligand size might be a bigger problem for me."
–> Correct: ligand size will be a problem if most of your ligands are quite large. However, STRESS is designed to find not only the known ligand-binding sites, but also "cryptic allosteric sites", that is, sites that do not serve as allosteric sites in the normal biological context, but which may function allosterically in artificial contexts (for instance, many drug-binding sites on proteins do not serve as true biological binding sites within the normal functioning of a cell, but drug binding to the site may nevertheless impart allosteric consequences).

With respect to: "Is there a recommended cutoff for binding leverage score (i.e. a score below which the likelihood of being allosteric is negligible)?"
–> There really is no such cutoff. The reason is that different proteins (being of very different sizes and topologies) exhibit very different distributions of binding leverage scores. Using a universal cutoff would be unrealistic, given the vastly different score distributions for different proteins. Thus, rather than using a ‘universal cutoff’, we instead devised a scheme (detailed in the supplement) to find a reasonable cutoff using the distribution of binding leverage scores for each protein.

With respect to: "Also it would be great if one could set the number of printed residues (in the surface critical file) not by their maximum allowed number (10 or 15), but by some statistical measure, that quantifies their likelihood of being allosteric. (In my tests for several sites the number of printed residues is lower than 10, I guess in those cases a cutoff like this is used.)"
—> I do see what you mean, but this would really not be straightforward. In order to do this, one would need to do the following two things (the first of which would be very difficult to do theoretically, and the second of which would entail a lot of extra work):
1) One would need to devise a statistically rigorous means of assigning confidence in the first place. Doing this may entail assumptions about what the distributions (be they distributions of scores or measures of confidence, etc) would be. For instance, one may need to assume that binding leverage scores are normally distributed. However, we’ve observed that normal distributions are generally not applicable. What’s more difficult is trying to justify that one single family of distributions (ex: exponential) describes the distributions of scores for all proteins universally, and this really is not the case. Thus, it would be theoretically quite difficult to devise a truly justifiable and rigorous statistical test.
2) Just from a technical point of view: we iteratively merge the sites using Jaccard scores, etc (details in the Supplement). Re-engineering the pipeline that we already have (assuming this could be justified; see note #1 above) would entail a lot of work. Plus, I wouldn’t recommend putting the code under such surgery, especially if you have limited experience with C.
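
For readers unfamiliar with the Jaccard score mentioned in point 2, here is a minimal Python sketch of the general idea: the score of two residue sets is the size of their intersection divided by the size of their union, and overlapping sets can be merged greedily. This is not the STRESS implementation; the merging order and the 0.5 threshold are assumptions chosen only for illustration, and the actual procedure is the one described in the Supplement.

```python
def jaccard(site_a, site_b):
    """Jaccard score of two residue sets: |intersection| / |union|."""
    a, b = set(site_a), set(site_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def merge_sites(sites, threshold=0.5):
    """Greedily merge sites until no pair has a Jaccard score >= threshold.

    The threshold value is a placeholder, not a STRESS parameter.
    """
    merged = [set(site) for site in sites]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if jaccard(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged[j]  # absorb site j into site i
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the list was modified
    return merged
```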

In any case, I think that’s all for now. Certainly feel free to let me know if there’s anything else with which I can help.

Help with Voronoi software

Q:
I have a protein PDB structure. I want to use the Voronoia software to find the cavity of the protein and the amino acids lining the cavity. Can you help me analyze it? I also want to use the SURFNET software to find my cavity; can you help me with that analysis? My protein PDB structure follows.

A:
My apologies for the confusion, but SURFNET was not written by our group. SURFNET was written by Laskowski. Perhaps you could reach out to that team, and they may be able to help:

https://www.ncbi.nlm.nih.gov/pubmed/8603061
SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions.
Laskowski RA.

If you’re interested in using a similar software, you may want to try our 3V server:
http://3vee.molmovdb.org/

If you experience difficulties using 3V, please do not hesitate to let us know.

ps — By the way, I just ran 3v on your pdb. Feel free to set whatever parameters you like, but I used 2 and 6 as probe radii. See attached results.

problem with server MolMovDB

Q:
I have been uploading my PDB files to the morph server for a while, but I have not received any answer. I have checked both files and the atoms are the same. What could the problem be?

Is it possible to check my uploaded jobs and tell me what is the error causing the problem? it would be very helpful to solve my pdb files problem. Here is the code of last job 755599-30494.

A:
There seems to be something wrong with your PDB files, because when we try running the single-chain server using other PDB pairs, it seems to work fine with those PDBs. For instance, when running the PDBs attached to this email, the following morph is generated (using Safari to visualize the morph):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=800501-5588

At first glance, your PDBs do not seem to have chain fields. May we ask how you produced your PDB files? Would it be possible for you to re-build the files with the chain fields set to the letter "A" (or any other arbitrary letter)? Then we will continue from there.
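
If re-building the files is inconvenient, the chain field can also be filled in after the fact. The sketch below is a hedged example, assuming the files otherwise follow the fixed-column PDB layout, in which the chain identifier sits in column 22 of ATOM and HETATM records; the function name is hypothetical.

```python
def set_chain_id(pdb_text, chain="A"):
    """Write a single-letter chain ID into column 22 of every ATOM/HETATM record."""
    if len(chain) != 1:
        raise ValueError("chain ID must be a single character")
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(22)                 # make sure column 22 exists
            line = line[:21] + chain + line[22:]  # column 22 is index 21
        out.append(line)
    return "\n".join(out) + "\n"
```

Note that this overwrites any chain ID already present, so it is only appropriate for single-chain files.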

Trouble installing libproteingeometry on Ubuntu 12.04

Q1:
I am trying to set up your software package (libproteingeometry) on a lab
desktop. I am having trouble during the Make step. After running for a while,
one of the packages is unable to link to the appropriate math header files,
causing make to exit with an error. I am able to understand some C++ coding,
but am not proficient enough to come up with a solution to this problem
independently. I would like to use your program, so is there any way that you
can put me in contact with someone in your lab who might be able to help me
finish installing this software?

A1:
Apologies again for the delay. May I ask how many structures you plan on running? Customizing libproteingeometry for your machine’s settings may be pretty tricky, especially from our end. If you’re not running too many structures, I’d highly suggest using the online version of this tool (assuming you’re calculating residue-level packing metrics), linked here:

http://www.molmovdb.org/cgi-bin/voronoi.cgi

Q2:
Unfortunately, I plan on using your program in some molecular dynamics analysis, so the number of structures would be on the order of thousands. I feel like this is too much to run on your server (unless your server has an MD analysis option that I missed, which would be great).

I can send you specific error messages I have upon trying to run the make command. Please let me know what would help you. I saw this error on a forum somewhere where it looked like the problem had been solved, but the solution was not posted.

A2:
Yes, please do send us specific error messages upon trying to run make, as well as any other error messages that you obtain. The more info we have, the better. We’ll try to sort things out. Also, what type of OS is your machine running?

Q3:
I run Ubuntu 12.04. I’m pretty sure the system is a converted Dell: i5 processor, with an NVIDIA graphics card. I’ll send the error messages in the morning.

molmovdb job 014670-22217

Q:
I submitted a job to your morph server but after 4-5 days it is not yet completed. Could you please check if there is a problem or if I made a mistake in my submission?

I just gave two PDB codes for different conformations of maltose-binding protein (MBP). The two codes were 1OMP and 3MBP. They are both monomers and have the same number of residues, but 3MBP has a ligand bound.

A:
Indeed, it appears as if the issue has to do with PDB format irregularities. We have corrected these issues, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=057837-877

Feel free to let us know if you experience any further difficulties. Also, if you like, we’d be happy to send you all of the accessory files associated with this morph.

molmovdb.org reboot?

Q:
Still getting this message after several days.

The job 072540-24927 is not yet completed

The two files were 1ohu chain A

1ty4 chain A

A:
It appears as if the issue has to do with PDB format irregularities. Specifically, the sequences within the ATOM fields do not match the residues reported in the PDB file’s SEQRES field. In any case, we have corrected these, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=056703-32536
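
For anyone who wants to check this kind of SEQRES/ATOM inconsistency before submitting a job, a rough Python sketch follows. It assumes standard fixed-column PDB records and is not the check the server itself performs. Because unresolved residues are routinely absent from ATOM records, the sketch only tests that the ATOM residues appear, in order, within the SEQRES sequence; all function names are hypothetical.

```python
def seqres_residues(pdb_text, chain):
    """Residue names listed in the SEQRES records of one chain."""
    residues = []
    for line in pdb_text.splitlines():
        # chain ID is column 12; residue names occupy columns 20-70
        if line.startswith("SEQRES") and len(line) > 11 and line[11] == chain:
            residues.extend(line[19:70].split())
    return residues


def atom_residues(pdb_text, chain):
    """Residue names in ATOM-record order for one chain (first model only)."""
    residues = []
    last_key = None
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break
        if line.startswith("ATOM") and len(line) > 26 and line[21] == chain:
            key = (line[22:26], line[26])  # residue number + insertion code
            if key != last_key:
                residues.append(line[17:20].strip())
                last_key = key
    return residues


def is_subsequence(short, full):
    """True if the items of short occur in full in the same order (gaps allowed)."""
    it = iter(full)
    return all(any(x == y for y in it) for x in short)
```

A file would be flagged when is_subsequence(atom_residues(text, "A"), seqres_residues(text, "A")) returns False.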

Questions about the STRESS method

Q1:
I thoroughly enjoyed reading your recent article regarding allosteric hotspot
detection.

I am interested in using this for my proteins and have started submitting my
queries via the provided web server.

It has been running for quite a while, since Thursday/Friday, and I wondered
whether this was normal or whether something has gone wrong.

In addition to this, as I am wanting to do this for a few more PDBs, is
there an option for a batch input?

A1:
We’re happy to hear that you enjoyed reading about our work, though we apologize about the issues you’re experiencing w/the server. We are investigating this, but it appears as if there is a load issue (all four of our backend CPUs have been running 24/7 since the paper went online). We’ll send you further updates soon.

In the meantime, however, there are two alternative options, and both would also address the need for batch input with multiple structures (the server itself does not provide an option for batch input):

1) We would be more than happy to run as many structures as you like, and we’d start running your structures as soon as you send me the relevant PDBs. We can run over 1000 structures if necessary (we’d be running them on Yale’s HPC machines, not on Amazon).

2) All of the source code is available through GitHub (github.com/gersteinlab/STRESS). If you already have MMTK installed, then everything should be ready to run on the PDB files with which you’re working.

Q2:
Thank you for your very helpful message. I am emailing off my different account as the other one is having trouble attaching documents.

It would be great if you could help run the batch query as I haven’t yet got MMTK installed on my computer. In total I have 163 structures and would be very grateful if you could run STRESS for them.

Here is the .tsv file attached which contains the PDB_ID and CHAIN_ID for my structures.

Just to note, in the file I sent you, the first PDB ID is 1e50; for some reason Google has changed that. I hope this is OK!

A2:
Most of your runs are now finished, and you can access them in the link below. There are also a few notes I should mention:

http://homes.gersteinlab.org/people/dc547/.M_Pang/

1) I noticed that some of your structures are NMR structures (as opposed to x-ray crystal structures). I should mention that I have not tested the STRESS framework on such structures, so I can’t say for sure how well it performs. Also, the big issue is that, since NMR structures are generally given as an ensemble, the question is: when running STRESS, which structure should be used? I took the first model in each NMR structure, but this is somewhat arbitrary. Thus, I would interpret the NMR structures with caution. A list of your NMR structures is pasted here:
1f5y.pdb EXPDTA SOLUTION NMR
1gd5.pdb EXPDTA SOLUTION NMR
1k1g.pdb EXPDTA SOLUTION NMR
1o4x.pdb EXPDTA SOLUTION NMR
1rmj.pdb EXPDTA SOLUTION NMR
1urf.pdb EXPDTA SOLUTION NMR
2cr4.pdb EXPDTA SOLUTION NMR
2dn4.pdb EXPDTA SOLUTION NMR
2e7b.pdb EXPDTA SOLUTION NMR
2edk.pdb EXPDTA SOLUTION NMR
2edl.pdb EXPDTA SOLUTION NMR
2js7.pdb1 EXPDTA SOLUTION NMR
2jwa.pdb EXPDTA SOLUTION NMR
2jzx.pdb1 EXPDTA SOLUTION NMR
2kn6.pdb1 EXPDTA SOLUTION NMR
2l4c.pdb1 EXPDTA SOLUTION NMR

2) In the link above, I’ve also provided a gz file of all the original PDB files used.

3) There was an error when running the surface module on 1m5o. I think that this is a result of the fact that this structure is mostly composed of nucleic acid (HETATMs in PDB records, which are removed prior to processing, since MMTK fails on HETATMs).

4) The interior-module is not yet complete for all 147 of your structures. The following 4 structures are still running (I can send you the results once they’re complete):
2ozl
2zw3
3ezz
3hn3

That’s it for now. Please take a look at the output whenever you get a free moment, and let me know what you think.
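
Regarding note 1 above, the choice of "first model only" for NMR ensembles can be reproduced with a few lines of Python. This is a sketch assuming standard MODEL/ENDMDL and EXPDTA records, not the exact preprocessing used for these runs; the function names are hypothetical.

```python
def is_nmr(pdb_text):
    """True if an EXPDTA record mentions NMR."""
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA") and "NMR" in line:
            return True
    return False


def first_model_only(pdb_text):
    """Keep header/footer records and the first MODEL; drop all later models."""
    kept = []
    seen_model = False
    skipping = False
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            if seen_model:
                skipping = True   # a second (or later) model starts here
            seen_model = True
        if not skipping:
            kept.append(line)
        elif line.startswith("ENDMDL"):
            skipping = False      # the skipped model ends here
    return "\n".join(kept) + "\n"
```

Files without MODEL records pass through unchanged, so the same function can be applied to x-ray structures in a mixed batch.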

Question about morph jobs “not yet completed” and errors

Q:
Greetings. Apologies for bothering you, but your morphing site suggests that
I contact you if a job is not finished in a day or so. The following jobs
were submitted last Friday:
– 692711-16199
– 692809-16356
– b692486-16007

For b692486-16007, at http://www.molmovdb.org/cgi-bin/morph.cgi?ID=
I get the following message: "Your request could not be processed. The
following error was detected: Morph not found in database.”

A:
An investigation into your morphs has been completed, and we have identified likely causes of these errors. In all cases, it appears as if the errors are a consequence of the PDB file formats. Details on specific morphs are given below:

For kb protein: 2qke_monomer.pdb —> fsTB_avg_min.pdb —> 2qke_monomer.pdb
At least one major issue identified was the fact that there appear to be pathological residue formats. For instance, have a look at the beginning of the ATOM records for the PDB 2qke_monomer.pdb:
ATOM 1 N MET A 1 -8.932 -20.214 2.255 1.00129.79 N
ATOM 2 CA MET A 1 -8.182 -19.102 2.920 1.00129.79 C
ATOM 3 C MET A 1 -8.441 -19.104 4.433 1.00129.79 C
ATOM 4 O MET A 1 -8.910 -18.109 4.991 1.00129.79 O
ATOM 5 CB MET A 1 -8.608 -17.750 2.326 1.00129.79 C
ATOM 6 CG MET A 1 -8.651 -17.719 0.800 1.00129.79 C
ATOM 7 SD MET A 1 -9.005 -16.077 0.094 1.00129.79 S
ATOM 8 CE MET A 1 -10.801 -15.970 0.277 1.00129.79 C

Everything appears to be perfectly fine with this MET residue. But have a look at the corresponding MET (again, the first residue in the file) within the PDB file to which we are trying to morph (ie, MET 1 in fsTB_avg_min.pdb):
ATOM 1 N MET 1 -10.344 9.596 10.785 1.00999.99
ATOM 2 HT1 MET 1 -10.167 8.615 10.490 1.00999.99
ATOM 3 HT2 MET 1 -9.599 10.204 10.388 1.00999.99
ATOM 4 HT3 MET 1 -11.263 9.902 10.406 1.00999.99
ATOM 5 CA MET 1 -10.344 9.693 12.268 1.00999.99
ATOM 6 HA MET 1 -10.330 10.740 12.539 1.00999.99
ATOM 7 CB MET 1 -11.608 9.045 12.838 1.00999.99
ATOM 8 HB1 MET 1 -11.402 8.006 13.048 1.00999.99
ATOM 9 HB2 MET 1 -12.394 9.105 12.100 1.00999.99
ATOM 10 CG MET 1 -12.105 9.700 14.117 1.00999.99
ATOM 11 HG1 MET 1 -11.283 9.762 14.816 1.00999.99
ATOM 12 HG2 MET 1 -12.888 9.087 14.538 1.00999.99
ATOM 13 SD MET 1 -12.756 11.359 13.840 1.00999.99
ATOM 14 CE MET 1 -11.572 12.350 14.748 1.00999.99
ATOM 15 HE1 MET 1 -10.811 12.710 14.072 1.00999.99
ATOM 16 HE2 MET 1 -11.114 11.749 15.519 1.00999.99
ATOM 17 HE3 MET 1 -12.078 13.190 15.200 1.00999.99
ATOM 18 C MET 1 -9.108 9.019 12.853 1.00999.99
ATOM 19 O MET 1 -8.352 9.629 13.609 1.00999.99

Of course, the morph server is trying to morph each residue into the corresponding residue of the other PDB file. However, it is very difficult to do this, most likely because the residues are represented so differently: your MET residue in fsTB_avg_min.pdb lists 19 atoms (including hydrogens) against 8 in 2qke_monomer.pdb, making it impossible to perform the morph. Notably, the MET 1 residue is not unique in this regard; it appears as if there are many other residues with completely different numbers of atoms and formats.

I would also mention that all morphs are pairwise (rather than being annotated as 3-way morphs in the way that you have this one) — thus, what we tried to generate was really the following: 2qke_monomer.pdb —> fsTB_avg_min.pdb
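
To find every residue with this kind of atom mismatch, rather than inspecting records by eye, something along the following lines may help. It is a sketch assuming fixed-column ATOM records and single-chain files (residues are matched by number and name only, ignoring the chain field, because one of the files above has no chain field); the function names are hypothetical.

```python
def atoms_per_residue(pdb_text):
    """Map (residue number, residue name) to the list of atom names in that residue."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26:
            key = (line[22:26].strip(), line[17:20].strip())
            atoms.setdefault(key, []).append(line[12:16].strip())
    return atoms


def atom_name_mismatches(pdb_a, pdb_b):
    """List residues present in both files whose atom names differ.

    Each entry is (residue key, atoms only in A, atoms only in B).
    """
    a = atoms_per_residue(pdb_a)
    b = atoms_per_residue(pdb_b)
    report = []
    for key in a:
        if key in b and set(a[key]) != set(b[key]):
            only_a = sorted(set(a[key]) - set(b[key]))
            only_b = sorted(set(b[key]) - set(a[key]))
            report.append((key, only_a, only_b))
    return report
```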

For ka protein:
truncated5c5e.pdb —> KaiA_transitionState.pdb —> kaiA_fromTernary.pdb —> KaiA_transitionState.pdb —> truncated5c5e.pdb

Again, one immediate issue here is that all morphs are pairwise. Thus, the following individual pairwise morphs are possible:
truncated5c5e.pdb —> KaiA_transitionState.pdb
KaiA_transitionState.pdb —> kaiA_fromTernary.pdb
kaiA_fromTernary.pdb —> KaiA_transitionState.pdb
KaiA_transitionState.pdb —> truncated5c5e.pdb

However, a single continuous morph through all 5 structures given (really a cycle of 4 morphs) is not possible.
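
If you have a longer chain of structures, the consecutive pairs to submit as separate jobs can be listed with a one-line Python helper. The function name is hypothetical and the sketch only builds the list of pairs; it does not submit anything to the server.

```python
def pairwise_morphs(structures):
    """Turn an ordered list of structures into consecutive (start, end) pairs."""
    return list(zip(structures, structures[1:]))
```

Applied to the five file names above, this returns the four pairs just listed.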

Secondly, if you look closely at the files kaiA_fromTernary.pdb and KaiA_transitionState.pdb, these seem to be completely different sequences (ie, the sequences of residues are very different). Some sequence differences can indeed be tolerated by the morph server, but below a certain degree of sequence similarity, morphing becomes unreliable and eventually impossible.

For kc protein: c1_from_BCcomplex.pdb —> c1_from_40om.pdb —> c1_from_BCcomplex.pdb

Here, one immediate issue (as with the first morph) is that the residue formats seem to be completely different and incompatible. For instance, have a look at VAL 19 in c1_from_BCcomplex.pdb:
ATOM 1 N VAL A 19 41.315 27.606 60.932 1.00 43.58 N
ATOM 2 CA VAL A 19 40.989 28.635 59.949 1.00 41.68 C
ATOM 3 C VAL A 19 42.235 29.400 59.515 1.00 24.69 C
ATOM 4 O VAL A 19 42.771 30.221 60.265 1.00 37.44 O
ATOM 5 CB VAL A 19 39.924 29.626 60.510 1.00 46.84 C
ATOM 6 CG1 VAL A 19 39.678 30.796 59.562 1.00 31.21 C
ATOM 7 CG2 VAL A 19 38.613 28.894 60.761 1.00 49.29 C
ATOM 8 HA VAL A 19 40.613 28.209 59.163 1.00 50.01 H
ATOM 9 HB VAL A 19 40.237 29.983 61.356 1.00 56.21 H
ATOM 10 HG11 VAL A 19 39.011 31.383 59.952 1.00 37.45 H
ATOM 11 HG12 VAL A 19 40.509 31.279 59.434 1.00 37.45 H
ATOM 12 HG13 VAL A 19 39.360 30.452 58.712 1.00 37.45 H
ATOM 13 HG21 VAL A 19 37.962 29.523 61.109 1.00 59.15 H
ATOM 14 HG22 VAL A 19 38.296 28.519 59.924 1.00 59.15 H
ATOM 15 HG23 VAL A 19 38.766 28.185 61.405 1.00 59.15 H

Now compare this to the corresponding residue VAL 19 in the file c1_from_40om.pdb:
ATOM 1 N VAL A 19 -23.156 44.101 -9.426 1.00 64.75 N
ATOM 2 CA VAL A 19 -22.022 43.812 -10.291 1.00 58.63 C
ATOM 3 C VAL A 19 -21.671 42.331 -10.275 1.00 54.58 C
ATOM 4 O VAL A 19 -21.400 41.761 -9.218 1.00 55.90 O
ATOM 5 CB VAL A 19 -20.783 44.627 -9.874 1.00 52.74 C
ATOM 6 CG1 VAL A 19 -19.630 44.361 -10.818 1.00 47.35 C
ATOM 7 CG2 VAL A 19 -21.116 46.109 -9.837 1.00 61.62 C

The VAL 19 in this second file looks good, but there seems to be something wrong with the format of VAL 19 in the first file: it carries additional hydrogen records (HA, HB, and HG11 through HG23) that have no counterpart in the second file. It is likely that the errors are a result of (a) the incompatible residue formats, and (b) the unrecognizable format in the first file. Here, again, I use VAL 19 only as an example; many other residues in your file seem to have this issue.

In sum, we advise ‘homogenizing’ the file formats and adopting conventional residue formats, if possible. For instance, you might want to run a Python script to extract the atoms that are consistent with standard formats. We cannot guarantee with 100% certainty that this will fix everything, but we can say that this is the ideal starting point for resolving these errors.
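A minimal sketch of such a script follows. It assumes standard fixed-column PDB files, keeps only ATOM records, drops hydrogens, and then keeps only the atoms that appear in both files (matched by chain, residue number, residue name, and atom name). The helper names are hypothetical, and the hydrogen test is a heuristic (element column if present, otherwise an atom name beginning with H), so please inspect the output before resubmitting.

```python
def is_hydrogen(line):
    """Guess whether an ATOM record is a hydrogen (heuristic)."""
    element = line[76:78].strip().upper()
    if element:
        return element in ("H", "D")
    # No element column: fall back to the atom name, ignoring leading digits
    # so that old-style names such as 1HG1 are also caught.
    name = line[12:16].strip().upper()
    return name.lstrip("0123456789").startswith("H")


def atom_key(line):
    """Identify an atom by chain, residue number, residue name, and atom name."""
    return (line[21:22], line[22:27].strip(),
            line[17:20].strip(), line[12:16].strip())


def heavy_atom_records(pdb_lines):
    """Keep ATOM records only, with hydrogens removed."""
    return [ln.rstrip("\n") for ln in pdb_lines
            if ln.startswith("ATOM") and not is_hydrogen(ln)]


def homogenize(lines_a, lines_b):
    """Restrict both files to the heavy atoms that occur in each of them."""
    heavy_a = heavy_atom_records(lines_a)
    heavy_b = heavy_atom_records(lines_b)
    shared = {atom_key(ln) for ln in heavy_a} & {atom_key(ln) for ln in heavy_b}
    keep_a = [ln for ln in heavy_a if atom_key(ln) in shared]
    keep_b = [ln for ln in heavy_b if atom_key(ln) in shared]
    return keep_a, keep_b
```

For the two files discussed above, one would read c1_from_BCcomplex.pdb and c1_from_40om.pdb with open(...).readlines(), call homogenize on the two lists, and write each returned list to a new file, one record per line, followed by an END line. Atom serial numbers are left as they were, so they may no longer be consecutive.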

Thank you again for using the server, and please do not hesitate to contact us if you have further questions or experience further difficulty.

Morphing TRAP1

Q1:
In the past months I tried several times to use the multi-chain morph server to create a movie of the heterodimeric protein TRAP1. I never got an e-mail back, and when I used the old version of the server, it did not seem to reach a result even after more than 24 hours. It always gives the message "not completed yet". I am wondering what the problem is.

A1:
Thank you for your query regarding the server. We’ll look into this, but may I ask why you are using the old version of our server specifically? I only ask because I ran tests on our newer multi-chain server on Sunday, and things worked very well there:

http://molmovdb.org/cgi-bin/beta.cgi

Having said that, we’ll check on things. Would you mind providing us with your Job ID (if you still have it), as well as the PDB files which you’d like to morph?

Q2:
I used the old server because I had used the new one several times before with the same job and never got an e-mail saying that it was finished. The job number is m716893-2511.

A2:
I was unable to find the directory corresponding to your morph, so our apologies for that. There are two possibilities I can think of:

1) We recently did some minor work on the server. Things were down for a short time, but when I checked over the weekend, things were back to normal. It is possible that your issue was a temporary one.

2) The second possibility is that your PDB files have formatting irregularities or some type of heteroatom which is not being processed or recognized by our server.

The best way to address the 1st possibility is for you to just re-submit your jobs on the newer server [ http://molmovdb.org/cgi-bin/beta.cgi ], and see if everything works. The easiest way to address the 2nd possible issue may be if you just send us your PDB files, and we can have a closer look at them.
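If you would like to check for the second possibility yourself before sending the files, a short script along the following lines may help. It is only a sketch that assumes standard fixed-column PDB files; the function name summarize_pdb is made up for this example, and it does not reproduce whatever checks the server performs internally. It counts residues per chain (one per CA atom) and tallies HETATM atoms by residue name.

```python
from collections import Counter


def summarize_pdb(pdb_lines):
    """Return (residues per chain, HETATM atom counts by residue name)."""
    residues_per_chain = Counter()
    hetero_atoms = Counter()
    seen = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Chain ID plus residue number and insertion code identify a residue;
            # alternate locations of the same CA atom are counted once.
            key = (line[21:22], line[22:27])
            if key not in seen:
                seen.add(key)
                residues_per_chain[line[21:22]] += 1
        elif line.startswith("HETATM"):
            hetero_atoms[line[17:20].strip()] += 1
    return dict(residues_per_chain), dict(hetero_atoms)
```

Running this on both input files and comparing the results shows at a glance whether the chains have matching residue counts and whether any heteroatoms (waters, ions, ligands) remain in either file.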

Q3:
I tried the morph again and it ran through this time, but the movie file was empty and the PDB files I could download just had "end" written in them. I also tried PDB files of single chains. I also removed heteroatoms and made sure the number of amino acids is identical in both files. The message I get from the server is that it is not yet complete. The latest job has the number 018936-29130. I do not understand what is wrong with these PDB files. Please find enclosed the template PDBs that I tried last. It would be great if you could find out what is wrong.

A3:
Your PDB files look very good. It is not clear why your previous submission failed, but we were able to successfully generate your morph. You may view it here:

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=032929-30641

You may or may not find the attached image useful, but we have also produced a structure alignment for you (blue corresponds to low-RMSD regions of the alignment, and red corresponds to regions with higher RMSD between your two structures).

Request for a SI document

Q1:
The file in the url:
http://www.nature.com/nature/journal/v489/n7414/extref/nature11245-s1.pdf
(SI of the paper "Architecture of the human regulatory network derived from
ENCODE data") is damaged and cannot be read.

Can you please send me a copy?

A1:

Q2:
Thank you for your prompt reply.

The nature11245__ALL.pdf file you generated works fine (using Adobe XI on Windows 7) but, as I mentioned, the file downloaded from Nature’s website is damaged (on Windows and Linux machines).

FYI, here is the Linux stderr (using Okular):

Error: PDF file is damaged – attempting to reconstruct xref table…

Error: Couldn’t find trailer dictionary

Error: Couldn’t read xref table

Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)

A2:
Thank you for letting us know. This is actually a more serious problem than what I had been expecting. We may need to contact them about this very soon, as other users will experience the same problem. Thanks again.

It might be a problem of fonts that Mac OS has but Windows/Linux don’t. You might want to try producing the PDF on Windows and then opening it on Mac OS and Linux; if it works, just substitute the file on the Nature site (which is not a trivial task, I guess).