molmovdb job 014670-22217

Q:
I submitted a job to your morph server but after 4-5 days it is not yet completed. Could you please check if there is a problem or if I made a mistake in my submission?

I just gave two PDB codes for different conformations of maltose-binding protein (MBP). The two codes were 1OMP and 3MBP. They are both monomers and have the same number of residues, but 3MBP has a ligand bound.

A:
Indeed, it appears that the issue was caused by PDB format irregularities. We have corrected these issues, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=057837-877

Feel free to let us know if you experience any further difficulties. Also, if you like, we’d be happy to send you all of the accessory files associated with this morph.

molmovdb.org reboot?

Q:
Still getting this message after several days.

The job 072540-24927 is not yet completed

The two files were 1ohu chain A

1ty4 chain A

A:
It appears that the issue has to do with PDB format irregularities. Specifically, the sequences within the ATOM fields do not match the residues reported in the PDB file’s SEQRES field. In any case, we have corrected these, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=056703-32536
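
For readers who want to check a file before submitting it, the SEQRES/ATOM comparison described above can be sketched as follows. This is a minimal illustration, not the check the morph server itself performs: it assumes fixed-column PDB records, reads only the first model, and treats a chain as consistent when its ATOM residues form an ordered subsequence of its SEQRES residues (so unobserved residues are tolerated). The function names are hypothetical.

```python
def seqres_sequences(pdb_lines):
    """Return {chain_id: [residue names]} collected from SEQRES records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            # Column 12 holds the chain ID; residue names start at column 20.
            chains.setdefault(line[11], []).extend(line[19:].split())
    return chains


def atom_sequences(pdb_lines):
    """Return {chain_id: [residue names]} from ATOM records, one entry per residue."""
    chains, seen = {}, set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only look at the first model
        if line.startswith("ATOM"):
            chain = line[21]
            key = (chain, line[22:27])  # residue number plus insertion code
            if key not in seen:
                seen.add(key)
                chains.setdefault(chain, []).append(line[17:20].strip())
    return chains


def is_subsequence(short, long):
    """True if the items of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def check_chains(pdb_lines):
    """Return {chain_id: bool}; False flags a chain whose ATOM residues disagree with SEQRES."""
    pdb_lines = list(pdb_lines)
    declared = seqres_sequences(pdb_lines)
    observed = atom_sequences(pdb_lines)
    return {chain: is_subsequence(residues, declared.get(chain, []))
            for chain, residues in observed.items()}
```

A chain reported as False would be a candidate for the kind of manual correction mentioned in the answer.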

Questions about the STRESS method

Q1:
I thoroughly enjoyed reading your recent article regarding allosteric hotspot
detection.

I am interested in using this for my proteins and have started submitting my
queries via the provided web server.

It has been running for quite a while, since Thursday/Friday, and I wondered
whether this was normal or whether something has gone wrong.

In addition to this, as I am wanting to do this for a few more PDBs, is
there an option for a batch input?

A1:
We’re happy to hear that you enjoyed reading about our work, though we apologize for the issues you’re experiencing with the server. We are investigating this, but it appears that there is a load issue (all four of our backend CPUs have been running 24/7 since the paper went online). We’ll send you further updates soon.

In the meantime, however, there are two alternative options, and both would also address the need for batch input with multiple structures (the server itself does not provide an option for batch input):

1) We would be more than happy to run as many structures as you like, and we’d start running your structures as soon as you send us the relevant PDBs. We can run over 1000 structures if necessary (we’d be running them on Yale’s HPC machines, not on Amazon).

2) All of the source code is available through GitHub (github.com/gersteinlab/STRESS). If you already have MMTK installed, then everything should be ready to run on the PDB files with which you’re working.

Q2:
Thank you for your very helpful message. I am emailing from a different account, as the other one is having trouble attaching documents.

It would be great if you could help run the batch query as I haven’t yet got MMTK installed on my computer. In total I have 163 structures and would be very grateful if you could run STRESS for them.

Here is the .tsv file attached which contains the PDB_ID and CHAIN_ID for my structures.

Just to note, in the file I sent you, the first PDB ID is 1e50; for some reason Google has changed that. I hope this is OK!

A2:
Most of your runs are now finished, and you can access them in the link below. There are also a few notes I should mention:

http://homes.gersteinlab.org/people/dc547/.M_Pang/

1) I noticed that some of your structures are NMR structures (as opposed to X-ray crystal structures). I should mention that I have not tested the STRESS framework on such structures, so I can’t say for sure how well it performs. A further issue is that, since NMR structures are generally given as an ensemble, it is unclear which model should be used when running STRESS. I took the first model in each NMR structure, but this is somewhat arbitrary. Thus, I would interpret the results for the NMR structures with caution. A list of your NMR structures is pasted here:
1f5y.pdb EXPDTA SOLUTION NMR
1gd5.pdb EXPDTA SOLUTION NMR
1k1g.pdb EXPDTA SOLUTION NMR
1o4x.pdb EXPDTA SOLUTION NMR
1rmj.pdb EXPDTA SOLUTION NMR
1urf.pdb EXPDTA SOLUTION NMR
2cr4.pdb EXPDTA SOLUTION NMR
2dn4.pdb EXPDTA SOLUTION NMR
2e7b.pdb EXPDTA SOLUTION NMR
2edk.pdb EXPDTA SOLUTION NMR
2edl.pdb EXPDTA SOLUTION NMR
2js7.pdb1 EXPDTA SOLUTION NMR
2jwa.pdb EXPDTA SOLUTION NMR
2jzx.pdb1 EXPDTA SOLUTION NMR
2kn6.pdb1 EXPDTA SOLUTION NMR
2l4c.pdb1 EXPDTA SOLUTION NMR
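
The two steps described in this note (spotting NMR entries by their EXPDTA record and keeping only the first model) could be sketched roughly as below. This is an illustration under the assumption of standard EXPDTA, MODEL and ENDMDL records, not the exact script used for these runs; the function names are hypothetical.

```python
def is_nmr(pdb_lines):
    """True if any EXPDTA record mentions NMR (e.g. 'EXPDTA    SOLUTION NMR')."""
    return any(line.startswith("EXPDTA") and "NMR" in line for line in pdb_lines)


def first_model(pdb_lines):
    """Keep header/trailer records and the first MODEL block; drop later models."""
    kept = []
    model_index = 0   # number of MODEL records seen so far
    inside = False    # True while between MODEL and ENDMDL
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            model_index += 1
            inside = True
        if not inside or model_index == 1:
            kept.append(line)
        if record == "ENDMDL":
            inside = False
    return kept
```

Files without MODEL records pass through `first_model` unchanged, so the same function can be applied to crystal structures and NMR ensembles alike.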

2) In the link above, I’ve also provided a gz file of all the original PDB files used.

3) There was an error when running the surface module on 1m5o. I think that this is a result of the fact that this structure is mostly composed of nucleic acid (HETATMs in PDB records), which are removed prior to processing because MMTK fails on HETATMs.

4) The interior-module is not yet complete for all 147 of your structures. The following 4 structures are still running (I can send you the results once they’re complete):
2ozl
2zw3
3ezz
3hn3

That’s it for now. Please take a look at the output whenever you get a free moment, and let me know what you think.

Question about morph jobs “not yet completed” and errors

Q:
Greetings. Apologies for bothering you, but your morphing site suggests that
I contact you if a job is not finished in a day or so. The following jobs
were submitted last Friday:
– 692711-16199
– 692809-16356
– b692486-16007

For b692486-16007, at http://www.molmovdb.org/cgi-bin/morph.cgi?ID=
I get the following message: "Your request could not be processed. The
following error was detected: Morph not found in database."

A:
An investigation into your morphs has been completed, and we have identified likely causes of these errors. In all cases, it appears as if the errors are a consequence of the PDB file formats. Details on specific morphs are given below:

For kb protein: 2qke_monomer.pdb —> fsTB_avg_min.pdb —> 2qke_monomer.pdb
At least one major issue identified was the fact that there appear to be pathological residue formats. For instance, have a look at the beginning of the ATOM records for the PDB 2qke_monomer.pdb:
ATOM 1 N MET A 1 -8.932 -20.214 2.255 1.00129.79 N
ATOM 2 CA MET A 1 -8.182 -19.102 2.920 1.00129.79 C
ATOM 3 C MET A 1 -8.441 -19.104 4.433 1.00129.79 C
ATOM 4 O MET A 1 -8.910 -18.109 4.991 1.00129.79 O
ATOM 5 CB MET A 1 -8.608 -17.750 2.326 1.00129.79 C
ATOM 6 CG MET A 1 -8.651 -17.719 0.800 1.00129.79 C
ATOM 7 SD MET A 1 -9.005 -16.077 0.094 1.00129.79 S
ATOM 8 CE MET A 1 -10.801 -15.970 0.277 1.00129.79 C

Everything appears to be perfectly fine with this MET residue. But have a look at the corresponding MET (again, the first residue in the file) within the PDB file to which we are trying to morph (ie, MET 1 in fsTB_avg_min.pdb):
ATOM 1 N MET 1 -10.344 9.596 10.785 1.00999.99
ATOM 2 HT1 MET 1 -10.167 8.615 10.490 1.00999.99
ATOM 3 HT2 MET 1 -9.599 10.204 10.388 1.00999.99
ATOM 4 HT3 MET 1 -11.263 9.902 10.406 1.00999.99
ATOM 5 CA MET 1 -10.344 9.693 12.268 1.00999.99
ATOM 6 HA MET 1 -10.330 10.740 12.539 1.00999.99
ATOM 7 CB MET 1 -11.608 9.045 12.838 1.00999.99
ATOM 8 HB1 MET 1 -11.402 8.006 13.048 1.00999.99
ATOM 9 HB2 MET 1 -12.394 9.105 12.100 1.00999.99
ATOM 10 CG MET 1 -12.105 9.700 14.117 1.00999.99
ATOM 11 HG1 MET 1 -11.283 9.762 14.816 1.00999.99
ATOM 12 HG2 MET 1 -12.888 9.087 14.538 1.00999.99
ATOM 13 SD MET 1 -12.756 11.359 13.840 1.00999.99
ATOM 14 CE MET 1 -11.572 12.350 14.748 1.00999.99
ATOM 15 HE1 MET 1 -10.811 12.710 14.072 1.00999.99
ATOM 16 HE2 MET 1 -11.114 11.749 15.519 1.00999.99
ATOM 17 HE3 MET 1 -12.078 13.190 15.200 1.00999.99
ATOM 18 C MET 1 -9.108 9.019 12.853 1.00999.99
ATOM 19 O MET 1 -8.352 9.629 13.609 1.00999.99

Of course, the morph server is trying to morph each residue into the corresponding residue of the other PDB file. However, it is very difficult to do this, most likely because the residues as given are described completely differently: your MET residue in fsTB_avg_min.pdb has 19 atoms (it includes explicit hydrogens), versus 8 in 2qke_monomer.pdb, making it impossible to perform the morph. Notably, MET 1 is not unique in this regard; it appears that many other residues have completely different numbers of atoms and formats.
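
To see how widespread such differences are, one could compare the atom names of each residue across the two files. The sketch below is only an illustration: it assumes fixed-column ATOM records, matches residues by residue number and name while ignoring chain IDs (the second file above has none), and uses hypothetical function names.

```python
from collections import defaultdict


def atoms_by_residue(pdb_lines):
    """Map (residue number, residue name) to the set of atom names found in ATOM records."""
    residues = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (int(line[22:26]), line[17:20].strip())
            residues[key].add(line[12:16].strip())
    return residues


def mismatched_residues(lines_a, lines_b):
    """Return {residue: (atoms only in A, atoms only in B)} for residues present in both files."""
    a, b = atoms_by_residue(lines_a), atoms_by_residue(lines_b)
    report = {}
    for key in sorted(set(a) & set(b)):
        if a[key] != b[key]:
            report[key] = (sorted(a[key] - b[key]), sorted(b[key] - a[key]))
    return report
```

An empty report would suggest the two files describe each shared residue with the same atoms; a long report points to the kind of incompatibility described here.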

I would also mention that all morphs are pairwise (rather than being annotated as 3-way morphs in the way that you have this one) — thus, what we tried to generate was really the following: 2qke_monomer.pdb —> fsTB_avg_min.pdb

For ka protein:
truncated5c5e.pdb —> KaiA_transitionState.pdb —> kaiA_fromTernary.pdb —> KaiA_transitionState.pdb —> truncated5c5e.pdb

Again, one immediate issue here is that all morphs are pairwise. Thus, the following individual pairwise morphs are possible
truncated5c5e.pdb —> KaiA_transitionState.pdb
KaiA_transitionState.pdb —> kaiA_fromTernary.pdb
kaiA_fromTernary.pdb —> KaiA_transitionState.pdb
KaiA_transitionState.pdb —> truncated5c5e.pdb

However, a single continuous morph through all 5 structures given (really a cycle of 4 morphs) is not possible.
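
As a small illustration of the pairwise decomposition listed above, an ordered path of structures can be split into consecutive (start, end) pairs, each of which would be submitted as its own morph. The helper name is hypothetical; the file names are the ones from this request.

```python
def pairwise_morphs(path):
    """Split an ordered list of structures into consecutive (start, end) pairs."""
    return list(zip(path, path[1:]))


path = [
    "truncated5c5e.pdb",
    "KaiA_transitionState.pdb",
    "kaiA_fromTernary.pdb",
    "KaiA_transitionState.pdb",
    "truncated5c5e.pdb",
]

# Five structures in the path give four pairwise morphs.
for start, end in pairwise_morphs(path):
    print(f"{start} -> {end}")
```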

Secondly, if you look closely at the files kaiA_fromTernary.pdb and KaiA_transitionState.pdb, these seem to have completely different sequences (i.e., the sequences of residues are very different). Some sequence differences can indeed be tolerated by the morph server, but below a certain degree of sequence similarity, morphing becomes unreliable and eventually impossible.

For kc protein: c1_from_BCcomplex.pdb —> c1_from_40om.pdb —> c1_from_BCcomplex.pdb

Here, one immediate issue (as with the first morph) is that the residue formats seem to be completely different and incompatible. For instance, have a look at VAL 19 in c1_from_BCcomplex.pdb:
ATOM 1 N VAL A 19 41.315 27.606 60.932 1.00 43.58 N
ATOM 2 CA VAL A 19 40.989 28.635 59.949 1.00 41.68 C
ATOM 3 C VAL A 19 42.235 29.400 59.515 1.00 24.69 C
ATOM 4 O VAL A 19 42.771 30.221 60.265 1.00 37.44 O
ATOM 5 CB VAL A 19 39.924 29.626 60.510 1.00 46.84 C
ATOM 6 CG1 VAL A 19 39.678 30.796 59.562 1.00 31.21 C
ATOM 7 CG2 VAL A 19 38.613 28.894 60.761 1.00 49.29 C
ATOM 8 HA VAL A 19 40.613 28.209 59.163 1.00 50.01 H
ATOM 9 HB VAL A 19 40.237 29.983 61.356 1.00 56.21 H
ATOM 10 HG11 VAL A 19 39.011 31.383 59.952 1.00 37.45 H
ATOM 11 HG12 VAL A 19 40.509 31.279 59.434 1.00 37.45 H
ATOM 12 HG13 VAL A 19 39.360 30.452 58.712 1.00 37.45 H
ATOM 13 HG21 VAL A 19 37.962 29.523 61.109 1.00 59.15 H
ATOM 14 HG22 VAL A 19 38.296 28.519 59.924 1.00 59.15 H
ATOM 15 HG23 VAL A 19 38.766 28.185 61.405 1.00 59.15 H

Now compare this to the corresponding residue VAL 19 in the file c1_from_40om.pdb:
ATOM 1 N VAL A 19 -23.156 44.101 -9.426 1.00 64.75 N
ATOM 2 CA VAL A 19 -22.022 43.812 -10.291 1.00 58.63 C
ATOM 3 C VAL A 19 -21.671 42.331 -10.275 1.00 54.58 C
ATOM 4 O VAL A 19 -21.400 41.761 -9.218 1.00 55.90 O
ATOM 5 CB VAL A 19 -20.783 44.627 -9.874 1.00 52.74 C
ATOM 6 CG1 VAL A 19 -19.630 44.361 -10.818 1.00 47.35 C
ATOM 7 CG2 VAL A 19 -21.116 46.109 -9.837 1.00 61.62 C

The VAL 19 within this second file looks good, but there seems to be something wrong with the format of the VAL19 in the first file. It is likely that the errors are a result of a) the incompatible residue formats, and b) the unrecognizable format given in the 1st file. Here, again, I just use VAL19 as an example — many other residues in your file seem to have this issue.

In sum, we advise ‘homogenizing’ the file formats and adopting conventional residue formats, if possible. You might want to run a Python script to extract the atoms that are consistent with standard formats, for instance. We cannot guarantee with 100% certainty that this will fix everything, but we can guarantee that this is the ideal starting point for resolving these errors.
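
One possible shape for such a script is sketched below. It is an assumption about what "standard" means here, not a statement of what the morph server requires: it keeps only ATOM records that belong to the 20 standard amino acids and drops hydrogens, detected via the element column when present and otherwise via the atom name. All names are hypothetical.

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def is_hydrogen(line):
    """Guess whether an ATOM record describes a hydrogen (or deuterium) atom."""
    element = line[76:78].strip().upper()  # element symbol, columns 77-78
    if element:
        return element in ("H", "D")
    # No element column: fall back to the atom name, ignoring leading digits
    # (so both 'HG11' and '1HG1' count as hydrogens).
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def heavy_atom_records(pdb_lines):
    """Return only the ATOM records for non-hydrogen atoms of standard residues."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[17:20].strip() not in STANDARD_RESIDUES:
            continue
        if is_hydrogen(line):
            continue
        kept.append(line)
    return kept
```

Running both input files through the same filter before submission should at least give each residue the same set of heavy atoms in both structures, provided the residues themselves match.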

Thank you again for using the server, and please do not hesitate to contact us if you have further questions or experience further difficulty.

Volume calculations w/3V

Q:
I would really appreciate it if you could assist me with an issue I encountered while using your online 3V software.
I am trying to compute the cavity size of a host (which I also did two years ago, see Org. Biomol. Chem., 2013, 11,
7667) but am receiving the message: "failed to create an MRC file".

Program is still running, progress is shown below
host ip address=164.107.224.25 (Tue Mar 31 17:13:16 2015)
converting pdb into xyzr (Tue Mar 31 17:13:16 2015)
completed conversion of PDB file: 2015.mar31.8dd.pdb (size: 4k) (Tue Mar 31 17:13:16 2015)
converted 128 atoms of 128 atoms (Tue Mar 31 17:13:16 2015)
found 128 atoms in pdb (Tue Mar 31 17:13:16 2015)
running 3v channel program (Tue Mar 31 17:13:16 2015)
failed to create an MRC file (Tue Mar 31 17:13:16 2015)

The program stops and does not provide me with any result.

A:
I looked at the log file and the program does not find a cavity at coordinate 0,0,0. I tried it again with a high resolution grid size.

I used the cached PDB that you uploaded and I managed to get a channel using channel finder:
http://3vee.molmovdb.org/viewResults.php?jobid=2015.apr04.e5b

It missed the channel. I think that if you use the coordinates 5,0,0, it will work better:
http://3vee.molmovdb.org/viewResults.php?jobid=2015.apr04.e64

Temp issues w/Packing-Eff

Q1:

I read your article about Packing-Eff. I found it very informative and resourceful. I would like to try using it on some protein models that I built using comparative modelling (MODELLER).

However, when I try to access the website, Packing-Eff Online, I’m afraid it is down. Can you help me with the problem?

Q2:
I am studying the packing of residues in proteins and tried the online version
of "Packing-Eff". Unfortunately, I could not find any relation between the output
amino acid numbering or total number of residues and the input PDB file (I
used PDB: 451C). Is this normal?

It would be nice of you if you help me to solve this problem.

A1 & A2:
Sorry about this. We had briefly experienced a systems failure, but have since recovered. Try again now.

Alternative to StoneHinge?

Q:
I am a research student working on protein structures using computational methods. I have used the tool StoneHinge to determine the hinge region residue and %protein rigidity for a protein. To confirm and report the significance of the putative hinge residue, I induced single residue mutations and noted the changes in the %packing rigidity. I was unable to find any significant changes. Therefore to confirm the importance of the putative hinge residue and effects of mutation on the hinge movement/rotation, is there any other tool?

A:
Did you see our "related resources" page?

http://www2.molmovdb.org/wiki/info/index.php/Related_Resources

Some of the items listed there may be of help.

SIN database, request detailed format

Q:

I am interested in the evolution of protein-protein interaction networks, and
recently became an enthusiastic user of your Structural Interaction Network
(SIN) database.

While downloading the data from the SIN website
(http://networks.gersteinlab.org/structint/), I noticed that more detailed
formats are available upon request for SIN versions 0.9, 1.0 and 2.0.
In particular, which Pfam domains are involved in each interaction, and
which yeast crystal structure (hopefully PDB identifications) the
interactions are based on.

Would it be possible to obtain this information? I would really appreciate
that. I hope to be able to use it to survey physical properties of the
interactions throughout the network, and connect it to the evolutionary
simulations I’m working on at the lab.

I have a few questions about the DynaSIN. Sorry for this long email, I tried to be as clear as possible. It would be really great if you could help me answer those questions!

Question (1) and (2) are regarding the ‘Interaction Data’ section, file ‘interface_final2.txt’:

(1) What is the significance of the order in which protein A and protein B (second and third columns, respectively) are presented? In other words, if proteins A and B are swapped, should the other entries (PDB IDs and surface residues) be calculated in a different way? I thought that swapping proteins A and B should give the same result, but I noticed that for interactions 566 and 508, swapping proteins A and B results in different PDB IDs and different surface residues for the PDB IDs they have in common:

566 HFE_HUMAN TFR1_HUMAN Permanent 1A6Z_A;1A6Z_B;26,30,49,97,122,202,204,236,243,;54,55,53,31,60,99,11,10, 1A6Z_A;1A6Z_D;; 1A6Z_C;1A6Z_B;; 1A6Z_C;1A6Z_D;26,30,49,97,122,204,236,243,;54,55,53,31,60,11,99,10, 1DE4_A;1DE4_B;30,49,121,122,204,233,236,243,;55,53,1,60,99,11,8,10, 1DE4_A;1DE4_E;; 1DE4_A;1DE4_H;; 1DE4_D;1DE4_B;; 1DE4_D;1DE4_E;30,49,97,120,122,202,204,206,207,233,236,239,243,;55,53,60,3,98,99,11,12,13,8,10, 1DE4_D;1DE4_H;; 1DE4_G;1DE4_B;; 1DE4_G;1DE4_E;; 1DE4_G;1DE4_H;30,49,97,120,121,122,202,204,233,236,;55,53,62,31,1,60,98,99,11,8,10,

508 TFR1_HUMAN HFE_HUMAN Permanent 1DE4_C;1DE4_A;629,640,;85,146, 1DE4_C;1DE4_D;; 1DE4_C;1DE4_G;; 1DE4_F;1DE4_A;; 1DE4_F;1DE4_D;629,658,;146,64, 1DE4_F;1DE4_G;; 1DE4_I;1DE4_A;; 1DE4_I;1DE4_D;; 1DE4_I;1DE4_G;629,640,;85,146,

(2) Do the surface residue numbers (column 5 and subsequent columns) correspond to their positions in the full protein sequence as defined in UniProt, or to the residue IDs in the PDB file? I assume the latter (but still wanted to make sure) because sometimes the surface residue numbers exceed the protein length. For example, in interaction 554, first PDB description:

554 CDC42_HUMAN RHG01_HUMAN Transient 1AM4_D;1AM4_A;532,561,563,564,;189,191,198,126,197,220, …

For the PDB ID 1AM4 (see ), chain D (protein CDC42) is 191 amino acids long (see http://www.uniprot.org/uniprot/P60953) and the surface residues are 532,561,563 and 564.

And (3), a more general question regarding the definition of ‘transient’ and ‘permanent’ interactions. In the Bhardwaj et al (2011) paper it was mentioned that:

"It should be noted here that the term ‘‘permanent’’ does not indicate that the relevant protein interacts with its partner in a strictly permanent fashion (i.e., it does not remain bound to the partner for the duration of its life time). This term (along with ‘‘transient’’ interaction) is based on the convention previously adopted by Kim et al".

I searched the Kim et al (Science 2006) paper for a definition, but I couldn’t find it in the main text or supporting information. Could you please let me know what is the definition, or point out where the definition is? That would be very helpful.

A:
You might want to look at dynasin.molmovdb.org

Unfortunately, the E. coli set does not include the same level of detail which
we provide for the human set on our website. Indeed, the E. coli set, though
part of our study, was not the main focus of the study that motivated the
creation of DynaSIN [ref provided below].

Having said that, however, it should be possible to parse through our E. coli
set and to download the appropriate data from biomart by searching for gene-PDB
mappings. Again, thank you for your interest in this work.

Bhardwaj et al (2011) Integration of protein motions with molecular networks
reveals different mechanisms for permanent and transient interactions. Protein
Science 20:1745-1754.

1) This is indeed a strange observation in the file. It should not be happening,
unless there’s an implicit convention of which I’m unaware. The analysis and
file compilation were performed by a previous member of our group. Since I
cannot explain what you’ve observed for interactions 508 and 566, I’ll have to
defer your question to the postdoc who managed these files. I will cc you on
the email I send to him now.

2) You are correct: the surface residues are numbered according to their
numbering in the actual PDB files, and not according to their respective
UniProt residue indices.

3) You’re correct that, in the Kim et al 2006 paper, the terms "transient" and
"permanent" are never given explicit definitions. Rather, certain implied
definitions are attached to these terms in that paper. These definitions and
the reasoning are as follows:
A "transient" interaction is one in which multiple distinct pairs of
proteins interact by using a shared interface on either protein. So, for
instance, let’s say that interface "a" on protein "A" interacts with interface
"b" on protein "B". Let’s also say that it’s possible for interface "a" on
protein "A" to interact with a completely different protein (say, protein "C").
Since both "C" and "B" need to use surface "a" on "A", it is not possible for
both proteins C and B to interact with A at the same time. That is to say, such
interactions are mutually exclusive. Assuming that both interactions are, at
some point in time, essential for biological processes, it must be the case
that there’s a transient nature to these interactions, thereby enabling B and C
to interact with A at different times.
A "permanent" interaction, on the other hand, is one in which there are
no other competing pairs. The analogy here would be if "a" on "A" is inferred
to interact ONLY with "b" on "B". In theory, the interaction between "A" and
"B" may be permanent, since no other proteins need to interact with "a" on "A".
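
The implied definitions above can be written down as a toy classifier. This is only an illustration of the reasoning in this answer, not the pipeline used by Kim et al. or for DynaSIN; the input format (a list of protein/interface tuples) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def classify_interactions(interactions):
    """Label each interaction 'transient' or 'permanent'.

    `interactions` is a list of (protein_a, interface_a, protein_b, interface_b)
    tuples. An interaction is 'transient' when either of its interfaces is also
    used with a different partner, and 'permanent' otherwise.
    """
    partners = defaultdict(set)  # (protein, interface) -> partner proteins
    for prot_a, iface_a, prot_b, iface_b in interactions:
        partners[(prot_a, iface_a)].add(prot_b)
        partners[(prot_b, iface_b)].add(prot_a)

    labels = {}
    for prot_a, iface_a, prot_b, iface_b in interactions:
        shared = len(partners[(prot_a, iface_a)]) > 1 or len(partners[(prot_b, iface_b)]) > 1
        labels[(prot_a, prot_b)] = "transient" if shared else "permanent"
    return labels


example = [
    ("A", "a", "B", "b"),  # interface "a" on A binds B ...
    ("A", "a", "C", "c"),  # ... and also C, so both interactions are transient
    ("D", "d", "E", "e"),  # no competing partner for either interface
]
print(classify_interactions(example))
```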

We’ll wait to hear back from one of the other authors of the DynaSIN paper, but
if anything I said above is unclear, or if you have any other queries, please
don’t hesitate to let us know.

Thanks for bringing it up; it’s been a while since I had a look at the code behind DynaSIN (I have moved from the Gerstein Lab). Anyway, ideally, the order of proteins should not make a difference; swapping proteins A and B should not change the contact residues. How many such cases do you see where the order of the proteins made a difference?

The good thing is that these contact residues were not used for deriving the main results of the paper; they were only provided as an additional piece of data. Plus, if you think that the list of contact residues has some issues, it’s very easy to extract interface residues yourself. That also gives you the freedom to change the distance cutoff.
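
A minimal sketch of such an extraction is given below. It is not the code behind DynaSIN: it assumes fixed-column ATOM records, a single model, and residue numbers taken directly from the PDB file, and the default cutoff of 4.5 Å is an arbitrary illustrative value rather than the one used for the database.

```python
import math


def chain_atoms(pdb_lines, chain_id):
    """Return [(residue number, (x, y, z))] for every ATOM record of one chain."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain_id:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((int(line[22:26]), xyz))
    return atoms


def interface_residues(pdb_lines, chain_a, chain_b, cutoff=4.5):
    """Return (residues of chain_a, residues of chain_b) with any atom pair within `cutoff` angstroms."""
    pdb_lines = list(pdb_lines)
    atoms_a = chain_atoms(pdb_lines, chain_a)
    atoms_b = chain_atoms(pdb_lines, chain_b)
    contacts_a, contacts_b = set(), set()
    for res_a, xyz_a in atoms_a:
        for res_b, xyz_b in atoms_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return sorted(contacts_a), sorted(contacts_b)
```

Because the distance test is symmetric, calling the function with the two chains swapped returns the same two residue lists in the opposite order, which is the behaviour one would expect when proteins A and B are exchanged.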

List of all PDB structures + chain IDs for motions database

Q:

I would like to have information on the list of PDB structures (with chain IDs) for the structure pairs used in the motions database. I saw that the zipped list file of IDs in the website did not have the chain ID. I was wondering if you have already compiled information available for this?

I was a bit confused by the format of the file available for download on MolMovDb website (List.txt.gz). Do you have a compiled list of just the motion pairs manually created and for which PDB structures are available? I am also specifically looking for the corresponding chain ID for each of the PDB structures. Any help would be appreciated!

A:
Fairly recently, we virtualized MolMovDB, and this process
may have made it difficult to obtain some of the data files for which you’re
searching. This may also have played a role in the issue you bring up about the
filters.

Cavity.exe within 3V

Q1:

I am using your program 3V for computing the volume of the active site
of a protein which is clearly external. I am using your program Cavity.exe of
3V. However, I do not know how to interpret the results of the run. It
would be nice if you could please help me in this regard.

A1:
If the site is external then you are probably looking for a channel
rather than a cavity. Cavities are completely enclosed by the structure.
My suggestion would be to use the AllChannel.exe program and play with
the probe sizes.

The output can be in several forms, but I typically use the MRC output
(-m volume.mrc) and view the results in UCSF Chimera since it is
a free program.

Q2:
Thanks for your reply. I followed your suggestion and was able to visualize the
channel. But I am wondering if I could also measure the channel. Is it
possible with your program?

More specifically, given a structure, can I compute the volume of any
active site using your program? Do you have any other program or webserver
to do that?

A2:
I know the latest subversion source code outputs both the volume and
surface area of the channels. The website shows it as well.

It is up to you to get the parameters right, so that you are only
looking at the channel and not at any extra pockets.