Tab-delimited Hinge Atlas Gold

Q1:
I need to use the Hinge Atlas Gold for my research. I tried the link http://www.molmovdb.org/tarballs/hinge_atlas_gold/hinge_atlas_gold.txt, as given in the paper and on the site, but it contains only metadata and no actual data.

Can you please help me with this?

A1:
Does
http://www.molmovdb.org/tarballs/hinge_atlas_gold/
have what you need?

Q2:
I need the annotated hinge residue numbers and their corresponding PDB IDs/Morph IDs from Hinge Atlas Gold.

The link http://www.molmovdb.org/tarballs/hinge_atlas_gold/hinge_atlas_gold.txt
has information about the data but not the actual data.
The paper and the link both indicate that it is supposed to contain tab-delimited data, as it says:
This is a tab-delimited database of hinge predictor results and gold standard hinge annotation for the Hinge Atlas Gold dataset used in our submitted HingeMaster manuscript, also used in our BMC Bioinformatics paper, ‘FlexOracle: predicting flexible hinges by identification of stable domains’ by Flores et al.
However, the data is not present there. Please let me know where I can find it.

A2:
There was a script on the server to refresh/regenerate the mysql dump ever so many years ago. It is possible this was run in some way that led to an empty result. I looked at the Hinge Atlas Gold gallery (http://molmovdb.org/cgi-bin/movie.cgi?set=HingeAtlasGold) but it seems not to work either; at least I cannot follow it to pull up the individual morphs. This was all years ago. I don’t have access now, which is just as well since I probably don’t have time to debug.

Maybe someone in Mark’s lab can get the gallery back up?

Q3:
Thank you for the response. I read your reply and based on that, I have a suggestion:

The ‘Hinge Atlas’ (not Hinge Atlas Gold) link seems to work (http://www.molmovdb.org/tarballs/hingeatlas/hingeatlas.txt), so maybe the shell script used for Hinge Atlas (below):

echo "drop table temp; create table temp select distinct(stats.mid_) from sequence, stats where stats.mid_=sequence.mid_ and stats.nonredundant=1 and (sam_hinge or leslie_hinge); select sequence.mid_,resnum,restype,(sam_hinge or leslie_hinge) from sequence,temp where sequence.mid_=temp.mid_ order by mid_,resnum;" | mysql -u root -p molmovdb > hingeatlas.txt

will possibly work, if everything is stored in the same tables, just by replacing stats.nonredundant=1 with stats.(some field representing Hinge Atlas Gold).

Again, this is just a suggestion.
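
Once a dump like hingeatlas.txt is available, the hinge residues for each morph ID could be collected with a few lines of Python. This is only a minimal sketch. It assumes the file has one header line followed by four tab-separated columns in the order of the select statement above (morph ID, residue number, residue type, hinge flag as 0/1) and that residue numbers are plain integers. The function name and the sample rows are hypothetical and are not taken from the real dump.

```python
import csv
import io
from collections import defaultdict


def hinge_residues_by_morph(handle):
    """Map each morph ID to the sorted residue numbers flagged as hinges.

    Assumes one header line, then tab-separated rows of
    (mid_, resnum, restype, hinge_flag), with hinge_flag "1" for hinge residues.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    hinges = defaultdict(list)
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        mid, resnum, _restype, flag = row[:4]
        if flag.strip() == "1":
            hinges[mid].append(int(resnum))
    return {mid: sorted(nums) for mid, nums in hinges.items()}


# Made-up rows for illustration only; real morph IDs and residues will differ.
sample = (
    "mid_\tresnum\trestype\t(sam_hinge or leslie_hinge)\n"
    "m0001\t10\tG\t0\n"
    "m0001\t11\tA\t1\n"
    "m0001\t12\tS\t1\n"
    "m0002\t5\tL\t0\n"
)
result = hinge_residues_by_morph(io.StringIO(sample))
print(result)
```

For a real file, the same function can be called on an open file handle, for example `with open("hingeatlas.txt") as fh: hinge_residues_by_morph(fh)`.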

A3:
The numbering does not seem to quite match up with 1dv2 or 1bnc.pdb. I think maybe it has to do with some renumbering of the PDB file. Probably ff1.pdb would settle this.

Regarding obtaining data of pseudogene

Q:
Can you please help me get pseudogene information for human, mouse, rat, Drosophila and C. elegans? I need FASTA or .bed files containing only the pseudogene annotations, separately for each of these five species.

A:
See pseudogene.org. For any information regarding the pseudogene annotation in human, mouse, Drosophila and C. elegans, please see:
http://www.pseudogene.org/psicube/
and
http://www.pseudogene.org/Mouse/

Interested in Funseq2

Q:
I found your paper on Funseq2 and am quite interested in how you assign or calculate the weight for each category. From the weighted scoring scheme, I can see that different categories have different weights, but I am not sure how you decide them.

A little bit about me: I am interested in pediatric genetic diseases and am working on a birth cohort at Beijing Children’s Hospital as an assistant professor.

A:
It’s an entropy-based scheme in the paper. It’s also described in various FunSeq lectures (on lectures.gersteinlab.org).

The details of Funseq2 can be found in our paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0480-5. Briefly, in Funseq2 we first define a weighted score for each feature based on the distribution of that feature among randomly selected common variants. Discrete and continuous features are handled slightly differently (see formulas 1 and 2 in the paper).
For a discrete feature, like ‘In sensitive regions’: [see image]

If 20 out of 2,000,000 random common variants overlap with sensitive regions, Pd will be 20/2,000,000 = 0.00001; then [see image]
will be used to get the weight for ‘In sensitive regions’.

For the continuous feature, it uses:
[see image]
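
Since the formula images are not reproduced here, the following Python sketch may help make the entropy-based idea concrete. It assumes the weight is one minus the binary Shannon entropy of the feature's frequency among common variants, w = 1 + p log2(p) + (1 - p) log2(1 - p), which is our reading of formula 1. For continuous features it assumes the same expression is applied to the fraction of common variants with a value at least as large as the observed one. Please check both against formulas 1 and 2 in the paper. The function names are hypothetical.

```python
import math


def entropy_weight(p):
    """Return 1 minus the binary Shannon entropy (in bits) of probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p in (0.0, 1.0):
        return 1.0  # entropy is zero at the extremes
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def continuous_weight(background_values, v):
    """Assumed form for a continuous feature: use the fraction of background
    (common-variant) values that are at least as large as the observed value v."""
    p_ge = sum(1 for x in background_values if x >= v) / len(background_values)
    return entropy_weight(p_ge)


# Discrete example from the text: 20 of 2,000,000 common variants carry the feature.
p_sensitive = 20 / 2_000_000
w_sensitive = entropy_weight(p_sensitive)
print(p_sensitive, w_sensitive)
```

Under this form, a feature that is rare among common variants gets a weight close to 1, while a feature seen in about half of them gets a weight close to 0.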

How to filter TF binding peaks for a plant ENCODE project

Q:
My lab is doing a few plant ENCODE projects; we have done ChIP-seq for ~100 maize TFs and are analyzing the data. We mostly followed your 2013 paper “architecture of the human regulatory network…”. What confuses me a bit is that we have on average ~10,000 peaks for each TF (from SPP and IDR 0.01). If I associate them with genes based on distance to the TSS, we get a huge TF-gene or TF-TF network in which almost everything interacts. For example, the 100 TF to 100 TF network has 5k edges; I guess many of them could be false positives due to weak ChIP-seq peaks. In your paper, you used TIP (from your Cheng et al. 2011 NAR paper) to further filter out some interactions. We are trying that as well. But I don’t understand how you got the input for TIP (500,542 promoter-associated interactions, page 3 of your paper) from 2,948,387 promoter-proximal peaks. Is there something I missed?

I also have another question about TF function in general. I am not sure whether we can claim that TF binding is “non-functional” if the TF gene itself shows low co-expression correlation with the target gene, or if silencing the TF gene does not affect target gene expression. The regulation could be complex, with multiple TFs targeting one gene. The targets that show co-expression/correlation might be the genes for which the TF plays a major role, while for other targets the TF may still contribute to expression, but only a small percentage, with other TFs playing a more dominant role. So can I say that such TF binding has no function?

A:
My understanding is: TIP assumes that each TF has a specific binding profile around TSSs across the human genome. TIP then estimates an empirical distribution of signal/peaks around the TSS, converts it to weights, and calculates a score for a peak. This assumption is based on the human genome. It may not apply directly to other genomes if there is no clear pattern around the TSS. Before you use the tool, please double-check the binding profile of each TF in plants. You can check and adapt the source code of TIP from GitHub: https://github.com/gersteinlab/TIP
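
The general idea (an average binding profile around the TSS turned into position weights, then a weighted sum of signal) can be sketched roughly as follows. This is not the TIP code, and it leaves out whatever normalization or significance testing the real tool performs. The bin layout, the per-gene scoring, the function names, and the toy signals are all assumptions made for illustration.

```python
def position_weights(signals):
    """Average the signal at each bin over all genes, then scale to sum to 1."""
    n_genes = len(signals)
    n_bins = len(signals[0])
    mean_profile = [
        sum(gene[i] for gene in signals) / n_genes for i in range(n_bins)
    ]
    total = sum(mean_profile)
    if total == 0:
        raise ValueError("no signal in any bin; cannot build a profile")
    return [m / total for m in mean_profile]


def gene_scores(signals, weights):
    """Score each gene as the weighted sum of its signal across bins."""
    return [sum(w * s for w, s in zip(weights, gene)) for gene in signals]


# Toy data: rows are genes, columns are bins centered on the TSS (middle bin).
toy_signals = [
    [0.0, 2.0, 8.0, 2.0, 0.0],  # strong signal right at the TSS
    [0.0, 1.0, 4.0, 1.0, 0.0],  # weaker signal, same shape
    [3.0, 0.0, 0.0, 0.0, 3.0],  # signal only far from the TSS
]
weights = position_weights(toy_signals)
scores = gene_scores(toy_signals, weights)
print(weights)
print(scores)
```

If the averaged profile for a plant TF comes out flat, the weights carry little information, which is one way to see why the binding profile should be checked before applying the method.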

For TF ChIP-seq, if the constructed regulatory network is very dense, you may try a more stringent cutoff to reduce false-positive regulatory interactions.
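
As a rough illustration of how a stricter cutoff thins the network, the sketch below builds TF-gene edges from peaks and compares a loose setting with a strict one. The Peak fields, the thresholds, and the toy coordinates are hypothetical; a real analysis would use the peak scores from the caller and a proper interval lookup rather than this nested loop.

```python
from collections import namedtuple

# Hypothetical peak record: TF name, chromosome, summit position, signal value.
Peak = namedtuple("Peak", "tf chrom summit signal")


def build_edges(peaks, tss_by_gene, min_signal, max_dist):
    """Return the (tf, gene) pairs supported by a peak that is both
    strong enough (signal >= min_signal) and within max_dist bp of the TSS."""
    edges = set()
    for peak in peaks:
        if peak.signal < min_signal:
            continue  # drop weak peaks
        for gene, (chrom, tss) in tss_by_gene.items():
            if chrom == peak.chrom and abs(peak.summit - tss) <= max_dist:
                edges.add((peak.tf, gene))
    return edges


# Toy inputs for illustration only.
tss_by_gene = {"geneA": ("chr1", 1000), "geneB": ("chr1", 5000)}
peaks = [
    Peak("TF1", "chr1", 1100, 50.0),
    Peak("TF1", "chr1", 4900, 5.0),
    Peak("TF2", "chr1", 5200, 30.0),
]

loose = build_edges(peaks, tss_by_gene, min_signal=0.0, max_dist=500)
strict = build_edges(peaks, tss_by_gene, min_signal=20.0, max_dist=150)
print(len(loose), len(strict))
```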

As to whether gene co-expression reflects TF regulatory function: as you mentioned, you are already aware that the mechanism is very complex. Co-expression alone definitely cannot prove regulatory function. But we can still get some reliable inferences from co-expression, according to many previous studies. Also, if you have multiple data sources, the result can be refined by advanced machine learning techniques. You can refer to a recent paper from our lab, in which we use elastic net to refine the TF-gene network (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=30545857&dopt=Abstract).

Article Problem LARVA

Q:
I am reading your article “LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations”, and I am really interested in it. But when I ran the source code following the instructions, I ran into some problems.

I put all the files in the right places, and the "make" command ran successfully; see the picture that follows.

A:
When you compile LARVA, the "larva" executable is created in the top level of the LARVA distribution, but it is NOT added to the PATH environment variable. Invoking the LARVA executable as you did would work if the "larva" executable was installed in a standard location like "/usr/bin" or "/usr/local/bin", but since the Makefile creates the executable in the same directory as the .cpp files, you need to invoke it with "./larva", so the Terminal knows to look for the executable in the current directory. Alternatively, you can add the LARVA code directory to your PATH variable like so:

export PATH=~/larva2/code:$PATH
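
The effect of PATH on command lookup can also be checked from Python with the standard-library function shutil.which, which performs the same kind of search. The sketch below uses two temporary directories and a stand-in file named larva rather than the real build, and it assumes a Unix-like system.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as standard_dir, \
        tempfile.TemporaryDirectory() as code_dir:
    # standard_dir stands in for a location like /usr/bin (it stays empty);
    # code_dir stands in for the directory where "make" leaves the executable.
    exe = os.path.join(code_dir, "larva")
    with open(exe, "w") as fh:
        fh.write("#!/bin/sh\necho stand-in\n")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)

    # Lookup with only the standard location on the search path: not found.
    found_before = shutil.which("larva", path=standard_dir)

    # Lookup after prepending the code directory, as the export line does.
    search_path = code_dir + os.pathsep + standard_dir
    found_after = shutil.which("larva", path=search_path)

print(found_before)
print(found_after)
```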

Request for the supplementary data of the ENCODE paper

Q:
My projects focus on exploring the mechanisms of gene regulation. I recently read the ENCODE paper (Architecture of the human regulatory network derived from ENCODE data, 2012) again and realized that the supplementary data will greatly help us to refine our results.

Unfortunately, I found that all the files have been archived, and neither of the following sites can be reached. I am writing to ask if there are any other ways to access the files. Thank you very much for your time. I am looking forward to hearing from you.

http://encodenets.gersteinlab.org
http://archive.gersteinlab.org/proj/encodenetsold/

A:
http://encodenets.gersteinlab.org
should be up shortly

EN-TEX data

Q:
Your postdoc gave a great talk about the EN-TEX work at the ASHG meeting. The data generated from this project will benefit the community greatly. Could you please tell me when and how the data will be made available to external users?

A:
Thank you for your suggestion. In the meantime, you can find the correct versions of fasta and blast freely available online. To ease the user experience, we provide a link to the two packages on the website http://pseudogene.org/pseudopipe/.