Interaction Data set

Posted on January 31, 2012 by gersteinfaq

help of your paper “Redefining Nodes and Edges: Relating 3D Structures to Protein Networks Provides Insight into their Evolution”. Now I need to get the Pfam IDs of the proteins involved in the interactions, and also their crystal structures.
I would be very grateful if you could send me a link to a more detailed format of the SIN v0.9 data.

My understanding from your email is that you would like to know the Pfam IDs
and the corresponding crystal structures (i.e., the PDB IDs) for the
interactions in the SIN. To do this, you will have to process two separate
datasets together, but this will not be difficult. Here are the steps:

i) Access the raw SIN data (http://networks.gersteinlab.org/structint/). On
this page, click on “composite dataset” under the download column for the SIN
v0.9 data. This is a list of open reading frame IDs for each interaction (the
first and third columns), together with whether the interaction is taken
from Pfam.
ii) Open the text file I’ve attached to this email. Each row contains several
pieces of information; what you need are the PDB IDs (2nd column)
corresponding to each Ensembl Gene ID (1st column). These Ensembl Gene IDs
are the IDs taken from (i) above.
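The two steps above amount to a join on gene ID. As a minimal sketch, assuming both files are tab-delimited with no header row, that the SIN file has its ORF IDs in columns 1 and 3, and that the attached file has the Ensembl Gene ID in column 1 and PDB IDs in column 2 (as described above), the lookup could be written as follows. The function names and the comma-separated PDB cell format are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def read_tsv(path: str) -> List[List[str]]:
    """Read a tab-delimited file (assumed: no header row) into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def load_gene_to_pdb(rows: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Map each Ensembl Gene ID (column 1) to the PDB IDs given in column 2."""
    gene_to_pdb: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or short lines
        gene_id = row[0].strip()
        # Assumption: a cell holding several PDB IDs separates them with commas.
        for pdb_id in row[1].split(","):
            pdb_id = pdb_id.strip().upper()  # PDB IDs are case-insensitive
            if pdb_id:
                gene_to_pdb[gene_id].add(pdb_id)
    return dict(gene_to_pdb)


def annotate_interactions(
    sin_rows: Iterable[List[str]], gene_to_pdb: Dict[str, Set[str]]
) -> Iterator[Tuple[str, str, List[str], List[str]]]:
    """Attach PDB IDs to both partners of each SIN interaction (ORF IDs in columns 1 and 3)."""
    for row in sin_rows:
        if len(row) < 3:
            continue
        orf_a, orf_b = row[0].strip(), row[2].strip()
        yield (
            orf_a,
            orf_b,
            sorted(gene_to_pdb.get(orf_a, set())),
            sorted(gene_to_pdb.get(orf_b, set())),
        )
```

For example, `annotate_interactions(read_tsv(sin_path), load_gene_to_pdb(read_tsv(map_path)))`, where `sin_path` and `map_path` are placeholders for the downloaded composite dataset and the attached file. An empty list on either side means no PDB ID was found for that protein.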
I should mention that there are two problems with the procedure outlined
above.
First, it will not provide crystal structures for all interactions; I’m not
sure why this is the case. Second, for some interactions multiple crystal
structures are available, and it is not clear which structure was used in
Pfam. Nitin (CC’ed on this email) may know how to deal with these issues. If
you are still having difficulty after further efforts to get the data you
need, please contact Nitin or me again.
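To see how many interactions run into each of the two problems, one option is to sort them into buckets. This sketch assumes (our assumption, not something the dataset specifies) that a PDB entry counts as a structure of an interaction only if it is listed for both partners; zero shared entries corresponds to the first problem, more than one to the second. The `triage` function and bucket names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Bucketed = Dict[str, List[Tuple[str, str, List[str]]]]


def triage(interactions: Iterable[Tuple[str, str]],
           gene_to_pdb: Dict[str, Set[str]]) -> Bucketed:
    """Split interactions by how many PDB entries are shared by both partners.

    Assumption: a PDB entry counts as a structure of the interaction only if
    it is listed for both proteins.
    """
    buckets: Bucketed = {"no_structure": [], "unique": [], "ambiguous": []}
    for a, b in interactions:
        shared = sorted(gene_to_pdb.get(a, set()) & gene_to_pdb.get(b, set()))
        if not shared:
            key = "no_structure"   # first problem: no structure to report
        elif len(shared) == 1:
            key = "unique"         # a single candidate structure
        else:
            key = "ambiguous"      # second problem: several candidates
        buckets[key].append((a, b, shared))
    return buckets
```

The "ambiguous" bucket is the list of cases where a choice between structures still has to be made by hand.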

integrated regulatory network

Posted on January 31, 2012 by gersteinfaq

I read your recent paper “Construction and Analysis of an Integrated
Regulatory Network Derived from High-Throughput Sequencing Data” in PLOS
Computational Biology with great interest. I would like to know whether the
data for your integrated regulatory networks are available, or whether you
would be willing to share them. I am part of a group of statisticians in Evry
(France) working on probabilistic models for biological networks. Our aim is
to identify groups of nodes with similar topological behaviour. Because your
data have three types of nodes, a hierarchical structure among TFs and
miRNAs, and a biological analysis of this structure, they are very useful
for validating (or invalidating) the methods we have developed. Would it be
possible for you to send me the C. elegans network and the corresponding
hierarchical structure? Any use of it would, of course, be referenced.

I have uploaded the worm network data to http://archive.gersteinlab.org/proj/mirnet
It comprises 3 files:

cel_TF_Target_GID.net: TF->gene interactions
cel_TF_MIR_GID.net: TF->miR interactions
cel_miR_conservedTarget_Kris3way_GID.net: miR->gene interactions

Each node's type is labeled in brackets as “MIR”, “TF” or “X”.
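The post does not spell out the line layout of the .net files. As a minimal sketch, assuming each line starts with a source token and a target token, each written as a node name followed by its type in brackets (for example `name[TF]`; round brackets are accepted too), the three files could be loaded and summarised by node type as follows. The token layout, the function names and the skipping of `#` lines are all assumptions; check them against the actual files before use.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed token layout: NAME[TYPE] or NAME(TYPE), with TYPE one of MIR, TF, X.
NODE_RE = re.compile(r"^(?P<name>.+?)[\[(](?P<type>MIR|TF|X)[\])]$")


def parse_node(token: str) -> Tuple[str, str]:
    """Split a token such as 'name[TF]' into ('name', 'TF')."""
    m = NODE_RE.match(token.strip())
    if m is None:
        raise ValueError(f"unrecognised node token: {token!r}")
    return m.group("name"), m.group("type")


def load_edges(lines: Iterable[str]) -> Tuple[Dict[str, str], Counter]:
    """Return (node -> type, counts of edges per (source type, target type))."""
    node_types: Dict[str, str] = {}
    edge_counts: Counter = Counter()
    for line in lines:
        fields = line.split()  # any extra columns after the first two are ignored
        if len(fields) < 2 or line.lstrip().startswith("#"):
            continue
        (src, src_type), (dst, dst_type) = parse_node(fields[0]), parse_node(fields[1])
        node_types[src] = src_type
        node_types[dst] = dst_type
        edge_counts[(src_type, dst_type)] += 1
    return node_types, edge_counts
```

Calling `load_edges` on each of the three files and merging the results gives one three-type network; the edge counts show how many TF->gene, TF->miR and miR->gene edges were read.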

Request for Pseudogene

Posted on January 31, 2012 by gersteinfaq

We are looking for pseudogenes of the human proteins P53 (tumor protein 53, a tumor suppressor) and WSTF (also called BAZ1B). There is no information on them in Pseudogene.org. Could you please help us find a way to get this information?
Later I found a web service called PseudoGeneQuest; I submitted my target protein sequences and got the results shown in the forwarded emails that follow.

The results showed that there are known pseudogenes in your database; however, I couldn’t extract the data. Could you please help me do so?

I have looked at our pseudogene database and there are no pseudogenes for P53 or WSTF. I have rechecked this by redoing the homology analysis against the genome with both the P53 and WSTF sequences, and there are no other regions in the genome that are good hits to P53 or WSTF. I have also looked at the results from the other program: the matches are either to coding exons of other genes or not significant, i.e., the match lengths are very short and the e-values are not significant.

For example, the attached image shows the other regions of the genome that BLAST finds homologous to the coding sequence. The only significant matches to P53 are:
1. NT_010718.16: this corresponds to P53 itself.
2. NT_004350.19: this corresponds to P73, another gene and not a pseudogene.
3. NT_005612.16: this corresponds to P63, another gene and not a pseudogene.

The other two matches are not significant; they are homologous over only 20% of the length of P53.
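The notion of a significant match used here combines two checks: the e-value and how much of the query the alignment covers. A minimal sketch of such a filter follows; the `Hit` record and both thresholds (`max_evalue=1e-10`, `min_coverage=0.5`) are illustrative assumptions, not the criteria actually used in this analysis. A match covering only 20% of the query, like the two discussed above, would fail the coverage check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One alignment of the query against a genomic sequence (hypothetical record)."""
    subject: str   # accession of the matched sequence, e.g. a RefSeq contig
    evalue: float
    q_start: int   # 1-based start of the alignment on the query
    q_end: int     # 1-based, inclusive end of the alignment on the query


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query sequence spanned by the alignment."""
    return (abs(hit.q_end - hit.q_start) + 1) / query_length


def significant_hits(hits: List[Hit], query_length: int,
                     max_evalue: float = 1e-10,
                     min_coverage: float = 0.5) -> List[Hit]:
    """Keep hits passing both the e-value and the coverage threshold (illustrative values)."""
    return [h for h in hits
            if h.evalue <= max_evalue
            and query_coverage(h, query_length) >= min_coverage]
```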

This is the result that you obtained from the other program.

0 - QUERY:111222153038348410812
2 - KNOWN_PSEUDOGENE:ref|NT_004350.19|:NT_010755.15:3118600:3119076
2 - KNOWN_PSEUDOGENE:ref|NT_004350.19|:NT_033903.7:3114083:3118495
2 - KNOWN_PSEUDOGENE:ref|NT_010718.16|:NT_008470.18:7177265:7178188
2 - KNOWN_PSEUDOGENE:ref|NT_010718.16|:NT_023935.17:7181340:7182403
2 - KNOWN_PSEUDOGENE:ref|NT_010718.16|:NT_079573.3:7181224:7182633
3 - REAL GENE OR EXON:ref|NT_004350.19|:3122278:3122442
3 - REAL GENE OR EXON:ref|NT_005612.16|:96077137:96077361
3 - REAL GENE OR EXON:ref|NT_005612.16|:96079592:96079771
3 - REAL GENE OR EXON:ref|NT_005612.16|:96080735:96080899
3 - REAL GENE OR EXON:ref|NT_005612.16|:96081483:96081638
3 - REAL GENE OR EXON:ref|NT_010718.16|:7176274:7176414
3 - REAL GENE OR EXON:ref|NT_010718.16|:7180194:7180331
3 - REAL GENE OR EXON:ref|NT_010718.16|:7180364:7180564
3 - REAL GENE OR EXON:ref|NT_010718.16|:7180845:7181012
3 - REAL GENE OR EXON:ref|NT_010718.16|:7183182:7183316

So all the good hits are to coding exons of P53, P63 or P73, presumably because P53 is homologous to P63 and P73.
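Grouping the PseudoGeneQuest lines by the reference accession inside `ref|...|` makes this easy to check: every hit in the listing points at NT_010718.16 (P53), NT_004350.19 (P73) or NT_005612.16 (P63). The sketch below infers the line format from the listing shown here; it is not an official PseudoGeneQuest parser, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Format inferred from the listing above, e.g.
# "3 - REAL GENE OR EXON:ref|NT_005612.16|:96077137:96077361"
LINE_RE = re.compile(
    r"^\s*(?P<code>\d+) - (?P<category>[A-Z_ ]+):ref\|(?P<ref>[^|]+)\|:(?P<rest>.*)$"
)

# Accession-to-gene labels as given in the answer above.
REF_LABELS = {"NT_010718.16": "P53", "NT_004350.19": "P73", "NT_005612.16": "P63"}


def group_hits(lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group hit lines by (reference accession, category); keep the coordinate part."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # e.g. the "0 - QUERY:..." line
        groups[(m.group("ref"), m.group("category"))].append(m.group("rest"))
    return dict(groups)


def summarize(groups: Dict[Tuple[str, str], List[str]]) -> Counter:
    """Count hits per gene; accessions without a label are counted as 'other'."""
    counts: Counter = Counter()
    for (ref, _category), hits in groups.items():
        counts[REF_LABELS.get(ref, "other")] += len(hits)
    return counts
```

If `summarize` reports no "other" entries, every hit falls on one of the three known genes.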

Similarly for WSTF, the other matches are either to known genes or not significant. You can easily check this by querying your protein sequence with BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&SHOW_DEFAULTS=on&SHOW_DEFAULTS=on&BLAST_SPEC=OGP__9606__9558).

morph server question

Posted on January 31, 2012 by gersteinfaq

I am currently using your server to morph two actin structures (open and closed nucleotide cleft). Besides morphing the two actin chains, I would also like to morph the bound ATP molecules. Although I can get the two actin structures to morph, I cannot figure out how to morph the ATPs. So my question is: can bound nucleotides be morphed? How should they be defined in the PDB file, which server should I use, and should the morphs be done separately (i.e., protein and ATP as separate morphs)?

Thank you for your query, and for using MolMovDB. Which server specifically have you been using, and what error message(s) does it return? The formatting of the PDB files sometimes has to be changed manually in certain ways, and this in itself can be a little tricky. If you like, you may send the PDB files to me and I will spend some time trying to format them for the server. If all else fails, it may indeed be necessary to morph the protein and ATP separately, but combining the resulting morphs may be a challenge of its own.

Finally (and importantly), the PDB formatting may not be the issue at all; the problem may instead be that ATP is made up of ‘heteroatoms’. Dealing with heteroatoms in our server is extremely difficult, sometimes because of the way they are numbered in the input files. In addition, parameterizing heteroatoms in our interpolation software is very difficult: even if you do obtain a morph, the resulting interpolation may be very questionable, so unless the parameterization is done properly, the result should be used at your own risk.
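If the protein and ATP do have to be handled separately, one possible first step is to pull the ATP HETATM records out of each PDB file and check that both conformations list the same ATP atoms in the same order, since inconsistent numbering is one of the sources of trouble mentioned above. This minimal sketch relies only on the fixed-column PDB format (record name in columns 1-6, atom name in 13-16, residue name in 18-20); the residue name "ATP" and the function names are assumptions, and the code is not part of MolMovDB and performs no morphing itself.

```python
from typing import Iterable, List, Tuple


def split_protein_and_ligand(
    pdb_lines: Iterable[str], ligand: str = "ATP"
) -> Tuple[List[str], List[str]]:
    """Separate protein ATOM records from the HETATM records of one ligand.

    Uses the fixed PDB columns: record name in columns 1-6 and residue name
    in columns 18-20. Other HETATM records (e.g. water) are dropped.
    """
    protein: List[str] = []
    ligand_atoms: List[str] = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and line[17:20].strip() == ligand:
            ligand_atoms.append(line)
    return protein, ligand_atoms


def atom_names(records: Iterable[str]) -> List[str]:
    """Atom names (PDB columns 13-16) in file order."""
    return [line[12:16].strip() for line in records]


def same_atom_order(first: Iterable[str], second: Iterable[str]) -> bool:
    """True if both ligand copies list the same atoms in the same order."""
    return atom_names(first) == atom_names(second)
```

Running `split_protein_and_ligand` on the open and closed structures and then `same_atom_order` on the two ATP lists flags numbering mismatches before anything is sent to a server.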

Gerstein Lab FAQs

Frequently Asked Questions

Daily Archives: January 31, 2012
