Request for a PDF version of an article

Q:
My current research focuses on whole-genome sequencing (WGS) of Indian samples. However,
during my PhD I worked on copy-number variation in the Indian population and its
implications for health.

Could you please send me the article "The current excitement about
copy-number variation: how it relates to gene duplications and protein
families" in PDF format for my reference?

A:
Thank you for requesting copies of some of my recent
papers. Essentially all of my work is available on-line. Go to:

http://papers.gersteinlab.org

and click on the appropriate "preprint" link. You will get a
preprint or (if appropriate) journal reprint of the paper you want.
There should be NO password challenges or other barriers. Usually, the
papers are in PDF format but some are in HTML. (Other formats are
available directly from http://papers.gersteinlab.org/e-print.)

Please let me know if you have any problems with this service. If you
can’t get what you want, we can easily post you normal paper reprints.

Questions about chromatin data

Q:
I would like to compare some data that a colleague has with ChIP-chip/ChIP-seq data in the worm. We have found WIG files, but these are not very useful. Could you point us to a site with peak calls, and tell us how the peaks were called?

A:
The published worm and fly data, including peak calls, are at:
https://www.encodeproject.org/comparative

The peak calling is described in Boyle et al. and on the website, e.g.:
https://www.encodeproject.org/comparative/regulation/#Humanset6

Question about software for integrating RNA-seq, Hi-C and ChIP-seq data

Q:
I have been searching for a bioinformatics software package that can integrate different NGS analyses (RNA-seq, Hi-C, ChIP-seq) within one project.

Does software like this exist yet?

A:
I don’t know of a package that does exactly that, but in some of our previous work we’ve integrated RNA-seq and ChIP-seq data.

See links:

http://papers.gersteinlab.org/papers/worm_HM/
http://papers.gersteinlab.org/papers/tfmodel/

Code for random forest models in paper “Comparative analysis of the transcriptome across distant species”

Q:
I read your recent letter in Nature ("Comparative analysis of the transcriptome across
distant species") and would like to use your strategy to model and predict gene expression profiles from histone-modification ChIP-seq data in Eucalyptus.

Currently we have RNA-seq data for 7 tissues, plus total, stranded and small RNA-seq data for selected tissues. I’m busy generating ChIP-seq profiles of 5 histone modifications in two tissues, and I’d like to see to what degree we can predict mRNA-seq data from these. We also have DNase-seq and TF ChIP-seq experiments planned for the future.

I was wondering whether you have any workflows or scripts you would be willing to share that would help us better understand how the randomForest package was used for the modeling (I don’t have a programming background, but we have an able bioinformatics unit). Alternatively, it would be a pleasure to collaborate on a publication with your lab if your team could assist us with the modeling aspect.

A:
There are some scripts associated with:

http://papers.gersteinlab.org/papers/worm_HM/

Source code for paper “MSB: A mean shift based approach for the analysis of structural variation in the genome”

Q:
I recently read your paper "MSB: A mean shift based approach for the analysis of structural variation in the genome", but I found it hard to implement your method. Could you please send me your source code for reference? Thank you for your kind consideration.

A:
The mean-shift algorithm is very similar to that of CNVnator, for which we distribute code. I suggest you use that.
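For readers who want to see the general idea before digging into the CNVnator code, here is a minimal sketch of generic one-dimensional mean-shift smoothing of a binned read-depth signal, followed by merging of neighbouring bins into segments. It is not the MSB or CNVnator implementation; the function names, the Gaussian kernels, the bandwidths (`h_pos`, `h_val`) and the merge tolerance `tol` are illustrative assumptions.

```python
import math


def mean_shift_1d(signal, h_pos=3.0, h_val=10.0, n_iter=20):
    """Move each bin's level toward a mode of a locally weighted density.

    Illustrative sketch (not MSB/CNVnator code). For bin i the update is
        y <- sum_j w_j * x_j / sum_j w_j
    with w_j = exp(-((i - j) / h_pos)**2 / 2) * exp(-((y - x_j) / h_val)**2 / 2),
    so bins far away in position or in depth contribute little.
    """
    n = len(signal)
    radius = int(3 * h_pos)  # skip bins beyond ~3 spatial bandwidths
    levels = [float(x) for x in signal]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            num = den = 0.0
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                w = math.exp(-0.5 * ((i - j) / h_pos) ** 2
                             - 0.5 * ((levels[i] - signal[j]) / h_val) ** 2)
                num += w * signal[j]
                den += w
            updated.append(num / den)
        levels = updated
    return levels


def segment(levels, tol=1.0):
    """Merge consecutive bins whose converged levels differ by at most tol."""
    segments = []  # (start, end_exclusive, mean_level)
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or abs(levels[i] - levels[i - 1]) > tol:
            segments.append((start, i, sum(levels[start:i]) / (i - start)))
            start = i
    return segments


if __name__ == "__main__":
    # Toy read depth (hypothetical): a duplicated region at bins 20-39.
    depth = [100] * 20 + [150] * 20 + [100] * 20
    for start, end, level in segment(mean_shift_1d(depth)):
        print(start, end, round(level))  # prints 0 20 100 / 20 40 150 / 40 60 100
```

The range kernel is what keeps a sharp depth change from being smeared out: a bin at 100x gives almost no weight to neighbours at 150x, so each side converges to its own level and the boundary survives as a jump between segments.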

Question about “Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors”

Q:
I was intrigued by your paper on classifying human genomic regions based on experimentally determined transcription factor binding sites. Could you share the genomic loci of the six types of regions you identified in this paper? I was also wondering whether your analysis allowed you to conclude which regions are not tissue specific, and whether you have done a similar analysis in other species. Finally, it would be great if you could share the scripts you used to generate these results, if they are available as a program or package.

A:
See:

funseq2.gersteinlab.org
metatracks.gersteinlab.org

Question about “A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data”

Q:
My research focuses on measuring and predicting trust in social networks. I read your paper "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data" and am interested in the method; I may use a Bayesian network approach to predict user-user interactions in a social network. May I refer to the implementation of the experiments in this paper, especially the code and data?

A:
Yes, see http://networks.gersteinlab.org/intint/
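As a rough illustration of the kind of calculation involved (not the code at the link above), here is a minimal naive Bayes sketch, naive Bayes being the simplest Bayesian network structure: each feature contributes a likelihood ratio L = P(feature | interacting) / P(feature | not interacting), estimated from gold-standard positive and negative pairs, and the ratios are multiplied with the prior odds. Nothing in it is specific to proteins. The function names, feature names, counts, prior odds and the add-one pseudocount are all hypothetical assumptions for illustration.

```python
from math import prod


def likelihood_ratio(value, pos_counts, neg_counts, pseudocount=1.0):
    """L(f) = P(f | interacting) / P(f | not interacting) for one feature bin.

    pos_counts / neg_counts map each feature bin to the number of
    gold-standard positive / negative pairs in it. The pseudocount
    (add-one smoothing) is an assumption, used only to avoid division by zero.
    """
    bins = set(pos_counts) | set(neg_counts)
    n_pos = sum(pos_counts.values()) + pseudocount * len(bins)
    n_neg = sum(neg_counts.values()) + pseudocount * len(bins)
    p_pos = (pos_counts.get(value, 0) + pseudocount) / n_pos
    p_neg = (neg_counts.get(value, 0) + pseudocount) / n_neg
    return p_pos / p_neg


def posterior_odds(pair_features, tables, prior_odds):
    """Naive Bayes: O_post = O_prior * prod_i L_i(f_i).

    Multiplying the ratios assumes the features are conditionally
    independent given the true interaction status.
    """
    return prior_odds * prod(
        likelihood_ratio(value, *tables[name])
        for name, value in pair_features.items()
    )


if __name__ == "__main__":
    # Hypothetical training counts: (positive pairs, negative pairs) per bin.
    tables = {
        "coexpression": ({"high": 8, "low": 2}, {"high": 10, "low": 90}),
        "same_function": ({"yes": 6, "no": 4}, {"yes": 5, "no": 95}),
    }
    pair = {"coexpression": "high", "same_function": "yes"}
    print(round(posterior_odds(pair, tables, prior_odds=0.01), 3))  # prints 0.69
```

A pair would then be called an interaction when its posterior odds (or, equivalently, the combined likelihood ratio) exceeds a chosen cutoff; if the features are strongly correlated, the independence assumption breaks down and a richer network structure is needed.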

7K ncRNA gene set

Q:
We currently have in WormBase the ‘7K’ set of ncRNA genes as described in
the 2011 Integrative analysis modENCODE paper.

We have been looking at the new ENCODE/modENCODE Comparative analysis paper
in Nature.
This paper describes the supervised prediction of a set of ncRNA genes that
do not overlap existing genes.
It is not obvious where to get details of these predicted genes.

Is there a file with the chromosomal locations of these genes that we could use?

Are these predicted ncRNA genes suitable for replacing the old ‘7K’ set of
ncRNA genes?

A:
Hi, yes, you can get these from encodeproject.org/comparative. I do
think these can supplement the 7K set.

I’d use the new set at encodeproject.org if you want a smaller, higher-quality and more conservative set than the one in the ’10 paper. -marK

Spectral biclustering

Q:
I recently read
your 2003 paper titled "Spectral biclustering of microarray data: Coclustering
genes and conditions".

I would like to investigate implementing your approach on a GPU.
Is there any code from the paper (MATLAB? Python?) that you would be willing to share?

A:
Sorry, we’re just using simple SVD routines in MATLAB; there’s no meaningful code available. -marK
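To make the "simple SVD routines" point concrete, here is a rough pure-Python sketch of the core step, under stated assumptions: rescale a non-negative matrix by its row and column sums, find the first non-trivial singular-vector pair by power iteration, and split rows and columns by sign. It is not the code behind the 2003 paper; the function names, the iteration count and the sign-based two-way split are our own simplifications, and a fuller implementation would typically examine several singular vectors. On a GPU, the power iteration would naturally be replaced by a library SVD call.

```python
import math


def _matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def _unit(x):
    n = math.sqrt(sum(v * v for v in x))
    if n == 0:
        raise ValueError("degenerate start vector or rank-1 matrix")
    return [v / n for v in x]


def spectral_bipartition(A, n_iter=200):
    """Split the rows and the columns of a non-negative matrix A into two groups each.

    Sketch only; assumes every row and column of A has a positive sum.
    1. Rescale: B = R^(-1/2) A C^(-1/2), with R, C the diagonal row/column sums.
    2. B's top singular pair is trivial (proportional to sqrt of the row/column
       sums), so power-iterate on B^T B while projecting it out; this converges
       to the second right singular vector v.
    3. u = B v / |B v|; undo the rescaling and split on sign.
    """
    r = [sum(row) for row in A]
    c = [sum(col) for col in zip(*A)]
    B = [[a / math.sqrt(r[i] * c[j]) for j, a in enumerate(row)]
         for i, row in enumerate(A)]
    Bt = [list(col) for col in zip(*B)]
    v1 = _unit([math.sqrt(cj) for cj in c])           # trivial right singular vector
    v = _unit([float(j + 1) for j in range(len(c))])  # arbitrary deterministic start

    def deflate(x):
        # Remove the component along the trivial singular vector v1.
        d = sum(a * b for a, b in zip(x, v1))
        return [a - d * b for a, b in zip(x, v1)]

    for _ in range(n_iter):
        v = _unit(_matvec(Bt, _matvec(B, deflate(v))))
    v = _unit(deflate(v))
    u = _unit(_matvec(B, v))
    row_scores = [ui / math.sqrt(ri) for ui, ri in zip(u, r)]
    col_scores = [vj / math.sqrt(cj) for vj, cj in zip(v, c)]
    # Dividing by a positive number keeps the signs, so the rescaled scores only
    # matter if you cluster their values (e.g. with k-means) to find more blocks.
    return [int(s >= 0) for s in row_scores], [int(s >= 0) for s in col_scores]


if __name__ == "__main__":
    # Toy checkerboard (hypothetical): rows 0 and 2 are high in columns 0-1,
    # rows 1 and 3 are high in columns 2-3.
    A = [[10, 10, 1, 1],
         [1, 1, 10, 10],
         [10, 10, 1, 1],
         [1, 1, 10, 10]]
    print(spectral_bipartition(A))  # prints ([0, 1, 0, 1], [0, 0, 1, 1])
```

The labels 0/1 are arbitrary (a singular vector's sign is not fixed), so only the grouping, not which group gets which label, is meaningful.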