HingeAtlas (2007)

Q:
I am reading your Hinge Atlas (2007) paper.
I searched for your dataset to study the pdb structures you used and their
hinge residues, but I could not download it from the page:
http://molmovdb.org/cgi-bin/sets.cgi.
Could you please send the file by email if possible, or check and fix the web
page if there is a bug?

A:
Please try the Hinge Atlas Gold while we investigate the webpage. This may take time.

MS data in the PsychENCODE datasets

Q1:
I recently met you at LMB where you gave a wonderful talk on PsychENCODE data analysis.

You mentioned that there were MS datasets in PsychENCODE, but I am unable to find them. Could you point me to the MS data in the PsychENCODE datasets, or to someone who may know about this?

A1:
Could you please explain a little more about what dataset you need?

Q2:
I am looking for Mass Spec datasets in PsychENCODE. Mark mentioned that MS analyses were done for some samples. I wonder whether you could help me identify them?

A2:
I just checked with our DCC team and currently we don’t have any Mass Spec data available for public sharing.

Q3:
What is the DCC team? The publications led me to believe that this data was available along with the others for analysis; I would not have asked otherwise. Is there a way I can reach out to a group within your DCC team that has this data, to see whether I can formally collaborate with them? Could you kindly let me know who would be the best person to ask for details of the group that may have the MS datasets? I am looking for MS data (even if it is published) from any of the samples that were used in the PsychENCODE project.
I am willing to collaborate and share authorship with the scientists who generated these datasets.
Would it be possible for you to point me to anyone you know who may have this dataset (published or unpublished)?

A3:
I have contacted the group that is generating the Mass Spec data. Are you specifically interested in proteomics related to donors with neuropsychiatric disorders? We (Sage Bionetworks) also function as the data coordination center for the NIA-funded Accelerating Medicines Partnership – Alzheimer’s Disease (AMP-AD). There are a variety of studies in AMP-AD with Mass Spec proteomics on post-mortem brain tissue that also have other genomic data such as WGS and RNA-seq. Included in that is the Religious Orders Study and Memory and Aging Project (ROS/MAP) from the Rush Alzheimer’s Disease Center. See here for information on the cohorts. There will be TMT-labeled MS on ~400 ROS/MAP donors released this fall.

Q4:
Thank you for getting in touch, and thank you for the pointer. Indeed, we will be interested in the Alzheimer’s samples (all three: WGS, RNA-seq, and proteomics).
I will write a separate note to you on this.
At the moment, we are looking for MS samples from donors with neuropsychiatric disorders.

A4:
Actually, my lab is doing something very similar, validating novel ORFs identified from our third-generation sequencing and Ribo-seq data.
If you use approaches that we have not tried yet, or have goals beyond just validating ORFs in brain, I would be happy to collaborate.
I have two students/collaborators on this.

Q5:
Is it possible for me to make a quick call?

A5:
…(resolved via phone call on Jul 9, 2019)…

inquiry about your publication (on papillary kidney cancer)

Q:
We are working on lncRNAs and have learned a lot from your publication
“Whole-genome analysis of papillary kidney cancer finds significant
noncoding alterations”, published in PLoS Genetics. We would like to get more
detailed information about this study from you. Could you tell us the
detailed nucleotide mutations and mutation frequencies in NEAT1 and
MALAT1?

If you could kindly offer us this information, it would be very helpful
for us.

A:
Below is the detailed mutation information in NEAT1 (BED format, hg19). The second and fifth mutations are in the same sample. The cohort size is 35.

Question about liftover from pat/mat to ref

Q1:
I’ve used your AlleleSeq package to get the NA12878 diploid genome alignment. I have two questions:

1. How can I liftover from paternal/maternal alignments to hg19 coordinates? Based on the documents, the map files generated by vcf2diploid are used to convert from hg19 to pat/mat, not the other direction. Last year I found an R script "map.to.ref.R" from your website and used that to convert from pat/mat to hg19. However I cannot find it anymore. Do you have an updated script for this liftover? If "map.to.ref.R" is still valid to use, what reference should we cite in our manuscripts?

2. I noticed that both the "MergeBowtie.py" script in AlleleSeq and the R script "map.to.ref.R" assume bowtie format in input alignment files. Do you have upgraded versions that are compatible with SAM format?

A1:
1. We use the chainSwap tool (http://hgdownload.soe.ucsc.edu/admin/exe/) to flip the maternal.chain and paternal.chain files (generated by vcf2diploid) to the other direction. Then a .bed file in the parental coordinates can be converted into a .bed file in the reference coordinates using the UCSC liftOver tool and the swapped chain files.
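Conceptually, swapping a chain just exchanges the source and target coordinates of each aligned block, after which a position in parental coordinates can be mapped back to the reference. A toy Python sketch of the idea (this is not a UCSC chain-format parser, and all block coordinates below are hypothetical):

```python
# Toy illustration of what chainSwap accomplishes: a chain is, in essence,
# a list of aligned blocks; swapping it exchanges source and target.
# Each block here is (src_start, dst_start, size) in 0-based coordinates.

def swap_chain(blocks):
    """Invert a ref->parental block map into a parental->ref map."""
    return [(dst, src, size) for (src, dst, size) in blocks]

def lift(pos, blocks):
    """Map a position through the block map; None if it falls in a gap."""
    for src, dst, size in blocks:
        if src <= pos < src + size:
            return dst + (pos - src)
    return None

# Hypothetical chain: ref 100-150 aligns to paternal 200-250, and
# ref 160-200 aligns to paternal 255-295 (an indel between the blocks).
ref_to_pat = [(100, 200, 50), (160, 255, 40)]
pat_to_ref = swap_chain(ref_to_pat)

print(lift(210, pat_to_ref))  # paternal 210 maps to ref 110
print(lift(252, pat_to_ref))  # falls between blocks -> None
```

In practice you would of course let chainSwap and liftOver do this on the real chain files; the sketch only shows why flipping the chain direction is all that is needed for interval conversion.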

This approach is straightforward for any arbitrary interval, but if you are converting alignments, it may require additional scripting or other tools. We’ve received a few issues/questions about the script you mention, and since we didn’t develop it and aren’t currently planning to maintain it, we have removed the link from the website for now.

2. We do not have updated released versions yet, but this is one of the things that will be introduced in future versions, and thus I might be able to help. What exactly are you interested in: just compatibility with SAM-formatted bowtie1 output (i.e. ungapped alignments), or are you trying to use another aligner?

Q2:
Thanks for your detailed answers. They are very helpful.
We used bowtie2 or tophat2 as the aligner for DNA or RNA, respectively.
Currently we hope to lift over the pat/mat alignments to hg19 for downstream analyses (e.g. Hi-C or DEG). We can’t compare results if they are not lifted over to the same reference.
Could I take the mapped position in the SAM file, convert it to BED, lift it over to hg19, and then patch the new position back into the SAM? Do you have a better solution?

A2:
The start coordinates of the reads can be transferred this way, but I don’t think inserting them back into the SAM file will produce a correct SAM/BAM file overall: some of the other fields (CIGAR string, mismatches, etc.) will need to be adjusted as well. Depending on the further analysis, this may be an issue. CrossMap (http://crossmap.sourceforge.net/) seems to work with SAM/BAM and chain files, though I have never used it myself.

Maybe it is easier to do the analysis on the parental alignments? Say, for DEG analysis, in order to generate a read-count table, one might consider transferring the annotation to mat and pat coordinates. Then, for every exon/gene, extract the reads mapped to it in both alignments and use the number of unique read IDs as the gene read count.
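A minimal sketch of that counting step, assuming the (read ID, gene) pairs have already been extracted from the maternal and paternal alignments (all names and data below are hypothetical):

```python
# Count a read toward a gene if it maps to that gene's exons in EITHER the
# maternal or paternal alignment; using a set of unique read IDs per gene
# ensures a read seen in both haplotypes is counted only once.
from collections import defaultdict

def gene_counts(mat_hits, pat_hits):
    """mat_hits/pat_hits: iterables of (read_id, gene) pairs."""
    reads_per_gene = defaultdict(set)
    for read_id, gene in list(mat_hits) + list(pat_hits):
        reads_per_gene[gene].add(read_id)
    return {gene: len(ids) for gene, ids in reads_per_gene.items()}

# r1 maps to GENE_A in both haplotypes but is counted once.
mat = [("r1", "GENE_A"), ("r2", "GENE_A"), ("r3", "GENE_B")]
pat = [("r1", "GENE_A"), ("r4", "GENE_B")]
print(gene_counts(mat, pat))  # {'GENE_A': 2, 'GENE_B': 2}
```

Extracting the pairs themselves would be done per alignment (e.g. by intersecting reads with the lifted annotation); the sketch only covers the deduplication logic described above.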

Where does the merge script come in, in your approach? Do you want to merge the alignments before transferring them to hg19 and then do the further analyses (which, I am guessing, do not involve looking into allele-specific counts)?

Tab-delimited Hinge Atlas Gold

Q1:
I need to use the Hinge Atlas Gold for my research. I tried the http://www.molmovdb.org/tarballs/hinge_atlas_gold/hinge_atlas_gold.txt link, as mentioned in the paper and on the page, but it doesn’t contain any data, just the metadata.

Can you please help me with this?

A1:
Does http://www.molmovdb.org/tarballs/hinge_atlas_gold/ have what you need?

Q2:
I need to know the annotated hinge residue numbers and their corresponding PDB IDs/morph IDs from the Hinge Atlas Gold.

The link: http://www.molmovdb.org/tarballs/hinge_atlas_gold/hinge_atlas_gold.txt
Has information about the data but not the actual data.
The paper and the link mention that it is supposed to contain tab-delimited data, as it says:
This is a tab-delimited database of hinge predictor results and gold standard hinge annotation for the Hinge Atlas Gold dataset used in our submitted HingeMaster manuscript, also used in our BMC Bioinformatics paper, ‘FlexOracle: predicting flexible hinges by identification of stable domains’ by Flores et al.
However, the data is not present there. Please let me know where I can find it.

A2:
There was a script on the server to refresh/regenerate the MySQL dump ever so many years ago. It is possible this was run in some way that led to an empty result. I looked at the Hinge Atlas Gold gallery (http://molmovdb.org/cgi-bin/movie.cgi?set=HingeAtlasGold) but it seems not to work either; at least I cannot follow it to pull up the individual morphs. This was all years ago, and I don’t have access now, which is just as well, since I probably don’t have time to debug.

Maybe someone in Mark’s lab can get the gallery back up?

Q3:
Thank you for the response. I read your reply and based on that, I have a suggestion:

The ‘Hinge Atlas’ (not Hinge Atlas Gold) link seems to work (http://www.molmovdb.org/tarballs/hingeatlas/hingeatlas.txt), so maybe the shell script used for the Hinge Atlas (following):

echo "drop table temp;
  create table temp
    select distinct(stats.mid_)
    from sequence, stats
    where stats.mid_=sequence.mid_
      and stats.nonredundant=1
      and (sam_hinge or leslie_hinge);
  select sequence.mid_, resnum, restype, (sam_hinge or leslie_hinge)
    from sequence, temp
    where sequence.mid_=temp.mid_
    order by mid_, resnum;" | mysql -u root -p molmovdb > hingeatlas.txt

will possibly work, if everything is stored in the same table, just by replacing stats.nonredundant=1 with the stats field representing membership in the Hinge Atlas Gold (whatever that field is called).

Again, this is just a suggestion.

A3:
The numbering does not seem to quite match up with 1dv2.pdb or 1bnc.pdb. I think maybe it has to do with some renumbering of the PDB file. Probably checking ff1.pdb would settle this.