Macromolecular Database Question

Q:
I have come across the Macromolecular Database and I
was curious as to how the degree of motion is quantified on this site. At the
following link for one of the entries (HIV protease:
http://www.molmovdb.org/cgi-bin/motion.cgi?ID=hivprot), the third box from
the top, entitled ‘Description’, says of HIV protease:

"Two large loop regions, that together comprise one quarter of the
structure, move CA atoms ~7 Angstroms"

Is this referring to an RMSD value of an ensemble of structures? Is this an
RMSD value of the whole protein, or only for the domain of those two large
loop regions?

A:
This is described in the database paper
(http://papers.gersteinlab.org/papers/molmovdb2).

Data request re paper “Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data. Genome Research. 2011”

Q:
I have read your paper "Prediction and characterization of noncoding RNAs in
C. elegans by integrating conservation, secondary structure, and
high-throughput sequencing and array data. Genome Research. 2011". I am
currently working on a project to analyze lncRNAs in C. elegans, so it would
be a great help to have the coordinates of the lncRNAs discovered in your
paper. I would be grateful if you could send me the lncRNA annotation
file (GFF, GTF, or GFF3 file) by email.

A:
Try
http://papers.gersteinlab.org/papers/incrna/
which links to
http://archive.gersteinlab.org/proj/incrna/

pseudogene.org error message

Q:
I would just like to bring to your attention the error message below, which I recently received when attempting to access data on pseudogene.org. (see image)

A:
We were not able to reproduce your error. To understand what happened and find a solution, it would be a considerable help if you could let us know the exact steps you took that resulted in this error.

molmovdb job 014670-22217

Q:
I submitted a job to your morph server but after 4-5 days it is not yet completed. Could you please check if there is a problem or if I made a mistake in my submission?

I just gave two PDB codes for different conformations of maltose-binding protein (MBP). The two codes were 1OMP and 3MBP. They are both monomers and have the same number of residues, but 3MBP has a ligand bound.

A:
Indeed, the issue appears to stem from PDB format irregularities. We have corrected these, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=057837-877

Feel free to let us know if you experience any further difficulties. Also, if you like, we’d be happy to send you all of the accessory files associated with this morph.

molmovdb.org reboot?

Q:
Still getting this message after several days.

The job 072540-24927 is not yet completed

The two files were 1ohu chain A

1ty4 chain A

A:
The issue appears to stem from PDB format irregularities. Specifically, the sequences within the ATOM records do not match the residues reported in the PDB file’s SEQRES records. In any case, we have corrected these, and your morph may be viewed by clicking the link below (please use Safari to view morphs, as Chrome and Firefox no longer support Java):

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=056703-32536
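
For anyone who wants to screen a file for this kind of mismatch before submitting, here is a minimal Python sketch. It is not the morph server's own validation; the function names are made up for illustration, and it assumes standard fixed-column PDB records. Because residues missing from the coordinates (for example, disordered loops) are normal, it only checks that the residues seen in the ATOM records of each chain appear, in order, within that chain's SEQRES sequence.

```python
def seqres_sequences(lines):
    """Collect residue names per chain from SEQRES records."""
    seqs = {}
    for line in lines:
        if line.startswith("SEQRES"):
            chain = line[11]                      # column 12: chain ID
            seqs.setdefault(chain, []).extend(line[19:70].split())
    return seqs


def atom_sequences(lines):
    """Collect residue names per chain from ATOM records (first model only)."""
    seqs, last_seen = {}, {}
    for line in lines:
        if line.startswith("ENDMDL"):
            break                                 # ignore later NMR models
        if line.startswith("ATOM  "):
            chain = line[21]                      # column 22: chain ID
            res_id = line[22:27]                  # residue number + insertion code
            if last_seen.get(chain) != res_id:    # count each residue once
                seqs.setdefault(chain, []).append(line[17:20].strip())
                last_seen[chain] = res_id
    return seqs


def is_subsequence(short, full):
    """True if every item of `short` occurs in `full` in the same order."""
    remaining = iter(full)
    return all(item in remaining for item in short)


def check_pdb(lines):
    """Return {chain: True/False}; False means ATOM disagrees with SEQRES."""
    lines = list(lines)
    seqres = seqres_sequences(lines)
    atoms = atom_sequences(lines)
    return {chain: is_subsequence(residues, seqres.get(chain, []))
            for chain, residues in atoms.items()}
```

To use it, pass the lines of a file, e.g. `check_pdb(open("my_structure.pdb"))` (the file name is a placeholder). A chain reported as False is worth inspecting by hand. Note that this sketch ignores HETATM records, so modified residues stored that way are simply treated as missing from the coordinates.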

OrthoClust questions

Q:
I am contacting you regarding the OrthoClust program that your group has on GitHub. I have a couple of questions about how to apply the program to new datasets. First, how was the co-appearance matrix calculated from the OrthoClust output? Second, is it necessary to modify the initial number of spin states (q) or the coupling constant (k) parameters that were used in the 2014 Genome Biology paper? I am not able to find options for these in the current release and am wondering where these values can be changed in the code.

A:
The current implementation on GitHub is based on a heuristic rather than the simulated annealing method used in the 2014 Genome Biology paper. The initial number of spin states, q, is no longer a parameter you have to supply; it is set to the total number of nodes in the system. As explained in the README, the coupling constant k is supplied in one of the input files (the coupling information file), as the third column. In my example (the ortho_info file found in the data folder), the third column is all 1s, meaning k=1.
For the co-appearance matrix, note that the output file is a tab-delimited file with three columns. The first and second columns are the species ID and the gene ID given by the input files, and the third column is a module ID. If there are N1 genes in species 1 and N2 genes in species 2, the co-appearance matrix has dimensions (N1+N2) by (N1+N2). One should build a map from the genes in the individual species to indices running from 1 to (N1+N2). If there are n genes in module 1, then the matrix elements corresponding to all pairwise combinations of these n genes should be set to 1, and likewise for every other module.
A single output file yields a co-appearance matrix containing only 0s and 1s. If you have multiple output files from multiple runs of the algorithm, adding the per-run matrices together gives a final co-appearance matrix like the one shown in the Genome Biology paper. Of course, to make a plot like the heat map shown in the paper, one has to further perform clustering to arrange the rows and columns.
If you use Julia, I may be able to send you a little script.
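
In the meantime, the co-appearance construction described above can be sketched in Python roughly as follows. This is only an illustration and not part of the OrthoClust release: the function names are hypothetical, it assumes the three-column tab-delimited output described above, it uses 0-based indices instead of 1 to (N1+N2), and it sets the diagonal (each gene with itself) to 1, which is a choice made here for simplicity.

```python
import numpy as np


def read_modules(lines):
    """Parse output lines of the form: species id <TAB> gene id <TAB> module id."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue                               # skip blank lines
        species, gene, module = line.split("\t")[:3]
        records.append(((species, gene), module))
    return records


def build_index(runs):
    """Map every (species, gene) pair seen in any run to a matrix index."""
    genes = sorted({key for run in runs for key, _ in run})
    return {key: i for i, key in enumerate(genes)}


def coappearance(run, index):
    """0/1 matrix for one run: 1 where two genes were assigned the same module."""
    n = len(index)
    mat = np.zeros((n, n), dtype=int)
    members = {}
    for key, module in run:
        members.setdefault(module, []).append(index[key])
    for idx in members.values():
        idx = np.array(idx)
        mat[np.ix_(idx, idx)] = 1                  # all pairs within the module
    return mat


def summed_coappearance(runs):
    """Add the per-run 0/1 matrices; entries count runs in which a pair co-clustered."""
    index = build_index(runs)
    total = np.zeros((len(index), len(index)), dtype=int)
    for run in runs:
        total += coappearance(run, index)
    return total, index
```

Each run would be loaded with something like `read_modules(open(path))` for every output file, and the list of runs passed to `summed_coappearance`. The returned index tells you which gene each row and column refers to. Ordering the rows and columns for a heat map still requires a separate clustering step, which is not shown here.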