Batch submissions to the STRESS server

Q:
I have a large number of structures that I would like to submit to the STRESS server. Does the server offer an option for batch submissions?

A:
The STRESS server itself does not currently support batch submissions. However, we encourage users to run such jobs locally using the source code available on our GitHub page: github.com/gersteinlab/STRESS
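As a rough illustration of what a local batch run might look like, the sketch below loops over a directory of PDB files and builds one command per structure. The command template is a placeholder (an assumption, not the actual STRESS invocation); substitute whatever invocation the repository's README documents, keeping {pdb} where the input file path should go.

```python
import subprocess
from pathlib import Path


def build_commands(structure_dir, command_template, pattern="*.pdb"):
    """Build one argv list per structure file found in structure_dir.

    command_template is a list of arguments; every "{pdb}" in it is
    replaced by the path of the structure file. The template itself is
    supplied by the caller and is NOT a documented STRESS command line.
    """
    commands = []
    for pdb_path in sorted(Path(structure_dir).glob(pattern)):
        commands.append(
            [arg.replace("{pdb}", str(pdb_path)) for arg in command_template]
        )
    return commands


def run_batch(commands):
    """Run each command in turn and collect the ones that fail."""
    failures = []
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((argv, result.stderr))
    return failures


# Example (hypothetical executable name and directory):
# commands = build_commands("structures", ["./my_local_stress_build", "{pdb}"])
# failures = run_batch(commands)
```

Running the jobs one after another keeps the sketch simple; for very large sets, the same list of commands could be handed to a cluster scheduler instead.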

Dealing with system boundaries in Voronoi calculations and assigning radii to pseudo-water

Q:
I’m analyzing protein structures (specifically, I’m performing a Voronoi-based analysis) using the tools linked on the geometry page:

http://geometry.molmovdb.org/NucProt/

I understand that the bisection of distances between atoms means that the radius does not matter. However, what happens at the boundary of the system?

Also, if you add ‘pseudo-water’ to the system, then do the water atoms need to have a particular radius? If not, then is there a distance cutoff?

A:
With respect to your first question (regarding the boundary of the system, i.e., the protein surface): the Voronoi volumes of atoms at the boundary become large and potentially infinite, because no neighboring atoms close off their cells on the outward side. That’s why you need to introduce solvent. A course lecture explains this further:
http://www.gersteinlab.org/courses/452/09-spring/pdf/structure2.pdf

With respect to your second question (regarding the radius assigned to water): Yes, the water atoms most definitely need to have a particular radius. Distance cutoffs won’t work. You can probably use the normal water radius here.
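The boundary problem can be seen in a one-dimensional toy analogue, sketched below. Points on a line stand in for atoms, and each cell wall sits midway between neighbors. The coordinates and the pseudo-water positions are made up for illustration, and the sketch uses plain bisection, so it says nothing about how radii should be chosen.

```python
import math


def voronoi_cell_lengths(positions):
    """Return {position: cell length} for distinct points on a line."""
    xs = sorted(positions)
    lengths = {}
    for i, x in enumerate(xs):
        # Each cell wall sits midway between neighbouring points (bisection).
        # A point with no neighbour on one side has no wall on that side.
        left = (xs[i - 1] + x) / 2 if i > 0 else -math.inf
        right = (x + xs[i + 1]) / 2 if i < len(xs) - 1 else math.inf
        lengths[x] = right - left
    return lengths


atoms = [0.0, 1.5, 3.0]   # toy "protein" atoms
solvent = [-2.8, 5.8]     # hypothetical pseudo-water positions

bare = voronoi_cell_lengths(atoms)
solvated = voronoi_cell_lengths(atoms + solvent)

for x in atoms:
    print(f"atom at {x:4.1f}: bare = {bare[x]}, solvated = {solvated[x]:.2f}")
```

Without solvent, the two end atoms have unbounded cells; once the pseudo-water points are added, every atom cell is finite and only the solvent cells remain open.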

Change in contact areas as the radii grow larger (but remain in proportion)

Q:

I’m analyzing protein structures (specifically, I’m performing a Voronoi-based analysis) using the tools linked on the geometry page:

http://geometry.molmovdb.org/NucProt/

Is there any work you know of showing how the contact areas change as the radii grow larger but in proportion? … I managed to read in any file definition of atom radii but this has no effect on the area of polygon faces. I also tried to multiply the atom_vdw[ii] but this too had no effect. The main routine I use is "full-dump-polyhedra.main.c".

A:
I think there’s an easy answer. See the DumpAFace routine in the code linked here:

http://geometry.molmovdb.org/files/libproteingeometry/src-prog/full-dump-polyhedra.c

This prints out:
"– Face between atom %3d and neighbour %3d"
and then
"Face-Area= %9.4f Pyramid-Volume=%9.4f\n",area,FaceVolume

If you vary the radii used to parameterize the program, you can see how the contact area changes, perhaps by tabulating the value of the area variable. The effect is, as you guess, rather small but is related to the way optimal radii sets were selected in the past.
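One way to tabulate the area values is to capture the program's output for each radii set and parse it. The sketch below is only that: it assumes the output contains text matching the two format strings quoted above, and it tolerates the face header and the area appearing on the same line or on separate lines, since the exact layout is not confirmed here. The function names are made up for this example.

```python
import re

# Patterns derived from the two printf format strings quoted above.
FACE_RE = re.compile(r"Face between atom\s+(\d+) and neighbour\s+(\d+)")
AREA_RE = re.compile(
    r"Face-Area=\s*(-?\d+(?:\.\d+)?)\s+Pyramid-Volume=\s*(-?\d+(?:\.\d+)?)"
)


def tabulate_face_areas(text):
    """Map (atom, neighbour) -> face area from captured program output."""
    areas = {}
    current_pair = None
    for line in text.splitlines():
        face = FACE_RE.search(line)
        if face:
            current_pair = (int(face.group(1)), int(face.group(2)))
        area = AREA_RE.search(line)
        if area and current_pair is not None:
            areas[current_pair] = float(area.group(1))
            current_pair = None
    return areas


def area_changes(before, after):
    """Per-face area difference for faces present in both runs."""
    return {pair: after[pair] - before[pair] for pair in before if pair in after}
```

Calling tabulate_face_areas on the output of two runs (one per radii set) and passing both tables to area_changes gives the per-face change directly.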

With respect to the second part of the question, i.e.:
I also managed to read in any file definition of atom radii but this
has no effect on the area of polygon faces.
I also tried to multiply the atom_vdw[ii] but this too had no effect.
The main routine I use is "full-dump-polyhedra.main.c".

There are a number of possible reasons why this might be happening.

(1) You only have one atom type (i.e., just CA).

In this case, radii are irrelevant and you’re effectively just using bisection. The radii are only relevant when two atoms of different types come into contact, and one has to apportion the space between them.

(2) You have differently typed atoms, but you’re using the normal Voronoi bisection method and not the alternate plane positioning methods using radii.

You need to tell the program explicitly not to use the normal bisection approach via the "-method" argument. See the documentation for calc-volume in the readme file linked here:

http://geometry.molmovdb.org/files/libproteingeometry/src-prog/README

I think this argument works properly for full-dump-polyhedra:

http://geometry.molmovdb.org/files/libproteingeometry/src-prog/full-dump-polyhedra.main.c

See the code for the main() routine to see it being invoked.

(3) You have differently typed atoms & are specifying a non-bisection plane positioning method, but you’re not reading the radii properly.

Here you can verify the atoms are correctly typed by using the show-2rad-refV program, viz:

http://geometry.molmovdb.org/files/libproteingeometry/src-prog/show-2rad-refV.main.c
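Points (1) and (2) can be illustrated with a small calculation of where the dividing plane falls between two atoms. The sketch below compares plain bisection with two textbook radius-dependent rules, a simple ratio rule and the radical plane. These formulas are illustrative assumptions and are not claimed to be what the "-method" options in the program implement; the distances and radii are made-up values.

```python
import math


def bisection(d, r1, r2):
    """Plane halfway between the atoms; radii are ignored."""
    return d / 2.0


def ratio_plane(d, r1, r2):
    """Plane dividing the interatomic distance in proportion to the radii."""
    return d * r1 / (r1 + r2)


def radical_plane(d, r1, r2):
    """Plane of equal power distance to spheres of radius r1 and r2."""
    return (d * d + r1 * r1 - r2 * r2) / (2.0 * d)


if __name__ == "__main__":
    # Same atom type (e.g. CA only): every rule reduces to bisection.
    for rule in (bisection, ratio_plane, radical_plane):
        print(rule.__name__, rule(3.8, 1.9, 1.9))

    # Different atom types: the radius-dependent rules move the plane.
    for rule in (bisection, ratio_plane, radical_plane):
        print(rule.__name__, rule(3.6, 1.5, 1.9))

    # Scaling both radii by the same factor leaves the ratio rule unchanged.
    print(math.isclose(ratio_plane(3.6, 1.5, 1.9),
                       ratio_plane(3.6, 1.5 * 1.2, 1.9 * 1.2)))
```

With equal radii, each rule returns the midpoint, which is why radii have no effect for a single atom type. Under the ratio rule, multiplying all radii by a common factor also leaves the plane where it was, whereas under the radical plane rule it shifts.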

Re-parameterizing radii when performing Voronoi-based analysis on structures with only the alpha carbon atoms

Q:

I’m analyzing protein structures (specifically, I’m performing a Voronoi-based analysis) using the tools linked on the geometry page:

http://geometry.molmovdb.org/NucProt/

I’m hoping to run the calculations using just the CA atoms, and so I must change the radii accordingly. Where should I start in terms of figuring out the new radii that should be used?

A:
The most recent re-parameterization of the radii is from Neil Voss’s work about ten years ago. See

http://papers.gersteinlab.org/papers/nucprot/

The logic in this paper could be easily extended to derive a set of CA radii.

Voronoi-based analyses of very large structures using tools in NucProt

Q:

I’m analyzing protein structures (specifically, I’m performing a Voronoi-based analysis) using the tools linked on the geometry page:

http://geometry.molmovdb.org/NucProt/

I’m using this on huge systems with more than 99,999 atoms. Is this possible?

A:
I don’t think there’s any hard-coded limit on the number of atoms. Look at the read_pdb_file routine in this source file:

http://geometry.molmovdb.org/files/libproteingeometry/src-lib/readpdb.c

This "mallocs" space on demand, so in theory, if you have enough memory, I think you can accommodate more than 100K atoms. However, the PDB format itself is an issue here: its atom serial number field is only five columns wide, so it cannot represent numbers above 99,999. You can modify the PDB reading routines to accept a different format; just modify the read_record routine in the same file. However, I don’t know whether doing this with multiple "models" will work.
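The idea of sidestepping the serial number field can be sketched outside the library as follows. This is not the read_record routine itself; it is a stand-alone illustration that reads the standard fixed-column ATOM/HETATM layout, ignores the five-column serial field, and numbers atoms with its own running counter. The helper format_atom exists only to build demo lines.

```python
def parse_atom_records(lines):
    """Parse ATOM/HETATM records, numbering atoms by order of appearance."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append({
            # Columns 7-11 (the serial number) are deliberately not read.
            "index": len(atoms) + 1,
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "xyz": (
                float(line[30:38]),           # columns 31-38
                float(line[38:46]),           # columns 39-46
                float(line[46:54]),           # columns 47-54
            ),
        })
    return atoms


def format_atom(serial_text, name, res_name, chain, res_seq, x, y, z):
    """Build a demo ATOM line; serial_text may be any 5-character string."""
    return (f"ATOM  {serial_text:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


demo_lines = [
    format_atom("99999", "CA", "ALA", "A", 1, 1.0, 2.0, 3.0),
    # An overflowed or wrapped serial field does not affect the parse.
    format_atom("*****", "CA", "GLY", "A", 2, 4.5, -5.25, 6.125),
]
demo_atoms = parse_atom_records(demo_lines)
```

The same approach carries over to C: keep a counter in the reader rather than trusting the value stored in the file.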

Distinction Between Surface- and Interior-Critical Residues

Q:
What is the main difference between surface- and interior-critical residues?

A:
Allosteric surface residues play regulatory roles that are fundamentally distinct from those of allosteric residues within the interior. While surface residues may often constitute the sources or sinks of allosteric signals, interior residues act to transmit such signals. Thus, different approaches are needed for identifying these two classes of residues. Surface-critical residues are identified by finding pockets whose occlusion is likely to interfere with large-scale protein motions (see Documentation for details; see also Ming and Wall, 2005; Mitternacht and Berezovsky, 2011). Interior-critical residues are identified by finding information-flow bottlenecks within the protein structure (see Documentation and main paper for details; see also Sethi et al., 2009).
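To make the notion of an information-flow bottleneck concrete, here is a minimal sketch that computes node betweenness (the number of shortest paths passing through each node) on a toy contact graph. It is an illustration only: the graph, the residue labels, and the use of plain node betweenness are assumptions, not the procedure STRESS uses (see the Documentation for that).

```python
from collections import deque


def betweenness(adjacency):
    """Node betweenness for an undirected, unweighted graph (Brandes' method)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                                # nodes in order of discovery
        preds = {v: [] for v in adjacency}        # shortest-path predecessors
        n_paths = {v: 0 for v in adjacency}       # number of shortest paths
        dist = {v: -1 for v in adjacency}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2.0 for v, s in score.items()}


# Toy contact graph: two tightly connected clusters joined through residue D.
contacts = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F", "G"], "F": ["E", "G"], "G": ["E", "F"],
}
scores = betweenness(contacts)
bottleneck = max(scores, key=scores.get)
```

In this toy graph, every shortest path between the two clusters runs through residue D, so it receives the highest score.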

Ming, D. and Wall, M.E. (2005). Quantifying allosteric effects in proteins. Proteins 59, 697–707.

Mitternacht, S. and Berezovsky, I.N. (2011). Binding leverage as a molecular basis for allosteric regulation. PLoS Comput. Biol. 7, e1002148.

Sethi, A., Eargle, J., Black, A.A., and Luthey-Schulten, Z. (2009). Dynamical networks in tRNA:protein complexes. Proc. Natl. Acad. Sci. U. S. A. 106, 6620–5.

ENCODE-Networks Source Code for Context-Specific TF Co-Association Analyses

Q:
Hello,
I am interested in your paper published in Nature, 06 September 2012, “Architecture of the human regulatory network derived from ENCODE data”. In particular, we are interested in the framework of context-specific TF co-association analysis described in this paper. We would like to apply this method on our in-house datasets. It’s exciting that the code for these analyses is “Available soon” (the file “enets21.coassoc-code.tgz” on http://encodenets.gersteinlab.org/). Do you know whether the code for co-association analysis in this paper is available now? If so, it might save us a lot of time. Thanks for your help!

A:
The main machine learning method used for the analysis is RuleFit3, which is available here:
http://statweb.stanford.edu/~jhf/r-rulefit/rulefit3/R_RuleFit3.html

Detailed instructions on preparing the input data and computing the various scores are in the supplement of the paper.

I don’t have a polished code package that is ready for use by the general public. The code that I wrote for the analyses in the paper is here: https://code.google.com/p/tf-coassociation/source/browse/#svn%2Ftrunk%2Fscripts . But I have to warn you that it’s not designed to work on general datasets, as it includes scripts that were designed to run on our local cluster. The core functions are in
https://code.google.com/p/tf-coassociation/source/browse/trunk/scripts/assoc.matrix.utils.R . The code is reasonably well commented, so hopefully it will help.