Help regarding the paper “Comparative analysis of regulatory information and circuits across distant species”

Q:
Recently, I read one of your papers, titled “Comparative analysis of regulatory information and circuits across distant species”. In this paper, you wrote that you used simulated annealing to reveal the organization of regulatory factors in three layers: master regulators, intermediate regulators, and low-level regulators. However, I can’t find the program for this method or any references related to it. I want to use this method to classify the TFs in my own regulatory network. Could you kindly provide this program?

A:
An initial version of the code is available from encodenets.gersteinlab.org.

The code used for the analysis can be found at:
http://encodenets.gersteinlab.org/enets16.hierarchy_levels.m
More recently, our group published an updated method; its code will be released very soon:
http://genomebiology.com/2015/16/1/63/abstract#

1000genomes allele proves Japanese ancestry?

Q:
I’ve been messaging the
list of contributors to the 1000genomes project to help me confirm that the data
from your project is the proof I need in my search for Japanese ancestry, and
would really appreciate your view on whether what I’ve found is the
breakthrough I’ve been searching for.

Thanks to the help of a population geneticist from China who has a good set
of Japanese alleles at her disposal, she was able to find two
Japanese-specific alleles. One of these is rs184214090 AG, which I carry. If
you take a look at the Ensembl/1000genomes population frequencies, you will see
that the AG allele is only found in the Japanese sample.

http://www.ensembl.org/Homo_sapiens/Variation/Population?r=19:43873182-43874182;v=rs184214090;vdb=variation;vf=44631311

I also spoke with Mr Saitou Naruya, who is a very well-cited Japanese
geneticist, and he told me that finding an allele that is specifically
restricted to Japan is a good indication, known as a "private
polymorphism". Could you please quickly confirm whether I’ve
got this all right?

A:
I agree with what Mr Saitou said about private polymorphism.

Volume calculations w/3V

Q:
I would really appreciate it if you could assist me with an issue I have encountered using your online 3V software.
I am trying to compute the cavity size of a host (which I also did two years ago, see Org. Biomol. Chem., 2013, 11,
7667) but am receiving the message: “failed to create an MRC file”.

Program is still running, progress is shown below
host ip address=164.107.224.25 (Tue Mar 31 17:13:16 2015)
converting pdb into xyzr (Tue Mar 31 17:13:16 2015)
completed conversion of PDB file: 2015.mar31.8dd.pdb (size: 4k) (Tue Mar 31 17:13:16 2015)
converted 128 atoms of 128 atoms (Tue Mar 31 17:13:16 2015)
found 128 atoms in pdb (Tue Mar 31 17:13:16 2015)
running 3v channel program (Tue Mar 31 17:13:16 2015)
failed to create an MRC file (Tue Mar 31 17:13:16 2015)

The program stops and does not provide me with any result.

A:
I looked at the log file and the program does not find a cavity at coordinate 0,0,0. I tried it again with a high resolution grid size.

I used the cached PDB that you uploaded and I managed to get a channel using channel finder:
http://3vee.molmovdb.org/viewResults.php?jobid=2015.apr04.e5b

It missed the channel; I think it will work better if you use the coordinates 5,0,0:
http://3vee.molmovdb.org/viewResults.php?jobid=2015.apr04.e64

Question about Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors

Q:
I was intrigued by your paper about classifying the human genomic regions based on experimentally determined transcription factor binding sites. I was wondering if you can share genomic loci of the six types of regions that you were able to identify in this paper. I was also wondering if by your analysis you were able to conclude which regions are not tissue specific. I was also curious to know if you have done similar analysis on other species. It would be great if you would be able to share the scripts that you used to generate these results if they are available in some sort of a program/package.

A:
See funseq2.gersteinlab.org and metatracks.gersteinlab.org.

Question about A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data

Q:
My research focuses on measuring and predicting trust in social networks. I read your paper, “A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data”, and I am interested in this method. I may use a Bayesian networks approach to predict user-user interactions in a social network. I would therefore like to ask whether I can refer to the implementation of the experiments in this paper, especially the code and data.

A:
Yes, see http://networks.gersteinlab.org/intint/

AlleleSeq question

Q:
I would like to use AlleleSeq with exome data but am unsure how to generate the .cnv file. If I supply a BAM from my exome sequencing experiment then won’t this produce some very low average coverage if CNVnator examines coverage across the genome? This will mean that my exonic variants will be excluded because AlleleSeq interprets them as being in a CNV… I assume I have to supply a FASTA containing only the target regions of the exome enrichment – is this true?

A:
I haven’t used AlleleSeq with exome data (I will ask some people in the lab and will probably write to you again next week), but I would try the latest version of the pipeline: the 0.2.6a version of Personal Genome constructor (http://alleleseq.gersteinlab.org/tools.html) doesn’t use CNVnator. Instead, the pipeline uses bedtools and calculates the median read depth across +/-1000bp regions around SNPs. So if all SNPs are exonic, the read depth around each SNP will be compared to the median value across all the other ones. This might bias SNPs located close to exon/intron junctions; if that happens, I would consider decreasing the window size.

…We discussed your question in the lab and believe that it is better not to perform the CNV identification step at all: there should not be many CNVs in the exome, and the edge effect could be avoided as well. The latest version of the AlleleSeq pipeline still requires a .cnv file to run. So, the easiest solution is to create the .cnv file with the list of all the SNPs (as in the .snp file) with rd=1.0:

chrm snppos rd
1 52066 1.0
1 695745 1.0
1 742429 1.0
….
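The placeholder .cnv file described above could be generated with a short script. This is only a minimal sketch, not part of the AlleleSeq distribution: the function name `snp_to_cnv` is hypothetical, and it assumes that the first two whitespace-separated columns of each .snp line are the chromosome and the SNP position, and that tab-separated output is acceptable (change `sep` if your version of the pipeline expects something else).

```python
def snp_to_cnv(snp_lines, sep="\t"):
    """Yield .cnv lines that assign rd=1.0 to every SNP in a .snp file.

    Assumption: columns 1 and 2 of each .snp line are chromosome and position.
    """
    # Header taken from the example in the answer above.
    yield sep.join(["chrm", "snppos", "rd"])
    for line in snp_lines:
        fields = line.split()
        # Skip blank lines, comments, and any header line (non-numeric position).
        if len(fields) < 2 or line.startswith("#") or not fields[1].isdigit():
            continue
        yield sep.join([fields[0], fields[1], "1.0"])


def write_cnv(snp_path, cnv_path):
    """Read a .snp file and write the matching placeholder .cnv file."""
    with open(snp_path) as snp_file, open(cnv_path, "w") as cnv_file:
        for out_line in snp_to_cnv(snp_file):
            cnv_file.write(out_line + "\n")
```

Calling `write_cnv("sample.snp", "sample.cnv")` (file names are placeholders) would then produce a file in the layout shown above.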

Technical questions about local gene co-expression

Q1:
I am interested in assessing the matching
score and the relationship between expression profiles on my own data, as
you did in your Qian et al 2000 (pubmedid: 11743722) paper.
But I need some clarifications if possible.
After normalizing gene expression using z-scores, how did you eliminate
the negative expression levels? In other words, if the expression of each
gene is normalized using z-scores, each gene will have both positive and
negative normalized expression levels, so how do you treat genes having
negative expression levels?

A1:
Normalization was used to calculate the correlation coefficient. Although we will have negative values, we should not interpret them as actual gene expression levels.
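To illustrate why negative normalized values are harmless here, the following sketch z-scores two made-up profiles and computes the Pearson correlation coefficient as the mean product of their z-scores. The profiles and function names are illustrative assumptions, not data or code from the paper.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Return the z-score normalized profile (population standard deviation)."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        raise ValueError("a constant profile has no defined z-score")
    return [(x - mu) / sigma for x in profile]


def pearson_from_zscores(a, b):
    """Pearson correlation computed as the mean product of z-scores."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


# Made-up expression profiles over four time points.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]

z_a = zscore(gene_a)                      # contains negative values by construction
r = pearson_from_zscores(gene_a, gene_b)  # depends only on the shape of the profiles
```

The z-scores always sum to zero, so some are necessarily negative; they describe deviation from the gene's own mean rather than an expression level.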

Q2:
To estimate the p-value of each matching score, how did you generate the
random expression profiles? Did you switch two gene expression time points
for each gene, or did you permute the gene expression values for each gene?

A2:
We permuted the gene expression for each gene by switching two gene expression time points.
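A rough sketch of such a permutation test is given below. Several things in it are assumptions rather than details from the paper: the Pearson correlation stands in for the matching score, the number of swaps per random profile (`n_swaps`) and the number of random profiles (`n_random`) are arbitrary, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation; used here only as a stand-in for the matching score."""
    mu_a, mu_b = mean(a), mean(b)
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / len(a)
    return cov / (sd_a * sd_b)


def swap_time_points(profile, rng, n_swaps=1):
    """Return a copy of the profile with pairs of time points switched."""
    shuffled = list(profile)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(shuffled)), 2)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def empirical_p_value(a, b, n_random=1000, n_swaps=10, seed=0):
    """Fraction of randomized profile pairs scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = pearson(a, b)
    hits = 0
    for _ in range(n_random):
        random_a = swap_time_points(a, rng, n_swaps)
        random_b = swap_time_points(b, rng, n_swaps)
        if pearson(random_a, random_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)
```

With a local-alignment matching score in place of `pearson`, the same loop would give the empirical p-value for that score.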

Q3:
If I wish to determine locally co-expressed genes in different
time-series experiments, can I combine the gene expression profiles from the
different experiments in one matrix as below and apply your algorithm to
this new matrix, instead of applying the algorithm to the gene expression
profile of each experiment alone?
exp1: exp1_t1, exp1_t2, exp1_t3, exp1_t4
exp2: exp2_t1, exp2_t2, exp2_t3
combined_exp: exp1_t1, exp1_t2, exp1_t3, exp1_t4, exp2_t1, exp2_t2, exp2_t3.

A3:
Our algorithm will detect time-delayed relationships. If exp2_t1 is indeed the measurement following exp1_t4, the operation should be fine.

Morphing TRAP1

Q1:
In the past months I have tried several times to use the multi-chain morph server to create a movie of the heterodimeric protein TRAP1. I never got an e-mail back, and when I used the old version of the server it did not seem to produce a result even after more than 24 hours. It always gives the message "not completed yet". I am wondering what the problem is.

A1:
Thank you for your query regarding the server. We’ll look into this, but may I ask why you are using the old version of our server specifically? I only ask because I ran tests on our newer multi-chain server on Sunday, and things worked very well there:

http://molmovdb.org/cgi-bin/beta.cgi

Having said that, we’ll check on things. Would you mind providing us with your Job ID (if you still have it), as well as the PDB files which you’d like to morph?

Q2:
I used the old server because I had used the new one several times before with the same job and never got an e-mail saying that it was finished. The job number is m716893-2511.

A2:
I was unable to find the directory corresponding to your morph, so our apologies for that. There are two possibilities I can think of:

1) We recently did some minor work on the server. Things were down for a short time, but when I checked over the weekend, things were back to normal. It is possible that your issue was a temporary one.

2) The second possibility is that your PDB files have formatting irregularities or some type of heteroatom which is not being processed or recognized by our server.

The best way to address the 1st possibility is for you to just re-submit your jobs on the newer server [ http://molmovdb.org/cgi-bin/beta.cgi ], and see if everything works. The easiest way to address the 2nd possible issue may be if you just send us your PDB files, and we can have a closer look at them.

Q3:
I tried the morph again and it ran through this time, but the movie file was empty and the PDB files I could download just had "end" written in them. I also tried PDB files of single chains. I also removed heteroatoms and made sure the number of amino acids is identical in both files. The message I get from the server is that it is not yet complete. The latest job has the number 018936-29130. I do not understand what is wrong with these PDB files. Please find enclosed the template PDBs that I tried last. It would be great if you could find out what is wrong with them.

A3:
Your PDB files look very good. It is not clear why your previous submission failed, but we were able to successfully generate your morph. You may view it here:

http://www.molmovdb.org/cgi-bin/morph.cgi?ID=032929-30641

You may or may not find the attached image useful, but we have also produced a structure alignment for you (blue corresponds to low-RMSD regions of the alignment, and red corresponds to regions with higher RMSD between your two structures).

FunSeq2 encountered issues processing whole-genome data

Q1:
I am attempting to use FunSeq2 to complete analysis on whole-genome data, and unfortunately have encountered issues. As there is no contact listed in the documentation, I thought I would try contacting you to inquire about troubleshooting. After loading a BED file in the appropriate format, a message is returned stating that the requested page is unavailable due to a server hiccup.

A1:
Could you send me a few lines of your input, or the ID provided by the website?

Q2:
The ID provided by the website is: 201511510325290607. I’ve also included a few sample lines from my input below. Please let me know if I can provide any further information.

chr1,203214078,203214078,T,C,98-22532-1

chr1,203275292,203275292,C,G,06-14634-1

chr1,203808954,203808954,C,T,06-14634-1

A2:
Your input format is different from the usual BED format. Could you separate the fields with tabs (instead of commas) and try again? The last column will be treated as the sample name.
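The conversion can be done in any text editor or spreadsheet; as one possible sketch in Python, the hypothetical helper below turns comma-separated lines like those quoted above into tab-separated ones and drops blank lines. It assumes that no field itself contains a comma.

```python
def commas_to_tabs(lines):
    """Yield tab-separated versions of comma-separated input lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines between records
        yield "\t".join(field.strip() for field in line.split(","))


def convert_file(in_path, out_path):
    """Rewrite a comma-separated variant file as a tab-separated one."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for converted in commas_to_tabs(src):
            dst.write(converted + "\n")
```

For example, `convert_file("variants.csv", "variants.bed")` (file names are placeholders) writes a tab-separated copy that keeps the sample name in the last column.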

pseudogene similarities to parent genes

Q:
I am looking at your paper ("The Gencode pseudogene resource"), which
appears very relevant to something I am doing right now. Specifically,
I am interested in the sequence identity values between pseudogenes
and their parents, which are used in Figure 4. Would it be possible
for you to make these available to me (or to tell me where I can
download them if they are already online)?

A:
You may find the data at http://pseudogene.org/psidr/similarity.dat