Request for the pdf version of the article

Q:
Currently, my research
area focuses on whole genome sequencing (WGS) of Indian samples. However,
during my PhD I worked on the study of copy number variation in the Indian
population and its implications for health.

Can you please send the following article, "The current excitement about
copy-number variation: how it relates to gene duplications and protein
families", in PDF format for my reference?

A:
Thank you for requesting copies of some of my recent
papers. Essentially all of my work is available on-line. Go to:

http://papers.gersteinlab.org

and click on the appropriate "preprint" link. You will get a
preprint or (if appropriate) journal reprint of the paper you want.
There should be NO password challenges or other barriers. Usually, the
papers are in PDF format but some are in HTML. (Other formats are
available directly from http://papers.gersteinlab.org/e-print.)

Please let me know if you have any problems with this service. If you
can’t get what you want, we can easily post you normal paper reprints.

Java chromod package request CoassociationAnalyzer.java and GSCCoassociationAnalyzer.java scripts that Kevin Yip wrote (April 14, 2011)

Q:
I’m writing to you to see if you could share with me your Java "chromod" package. I would like to use the CoassociationAnalyzer.java and GSCCoassociationAnalyzer.java scripts that Kevin Yip wrote (April 14, 2011), but they rely on the chromod package (package org.gersteinlab.chromod).

If you could share this with me, assuming it’s not a top-secret lab package, I would be hugely indebted!

A:
Please download it at http://www.cse.cuhk.edu.hk/~kevinyip/outbox/chromod.jar . Let me know if you encounter any problem when using it.

Loregic paper: binarized yeast expression data

Q:
I am writing to ask if you could kindly share with me the yeast cell cycle binarized expression data that you used in Loregic’s paper.

In our group we would like to find a method to identify the logic rules that govern cooperativity of multiple regulators, in GRNs built from differentially expressed genes.

The number of samples we will have is limited, so we will mainly rely on information from the literature, and as a first step we would like to test our method on your binarized expression data.

A:
We used BoolNet to binarize the data,
http://cran.r-project.org/web/packages/BoolNet/index.html . We also
tried ArrayBin,
http://cran.r-project.org/web/packages/ArrayBin/index.html, which gave
Loregic results very similar to those from BoolNet (see Supplemental Figure).

The yeast cell cycle data we used was the classical microarray data
published in 1998 (Spellman & Cho):
http://genome-www.stanford.edu/cellcycle/data/rawdata/
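
As a rough illustration of what binarizing an expression time series involves, here is a minimal Python sketch. It assumes a per-gene two-cluster split (k-means with k = 2), which is one of the approaches BoolNet offers; the answer above does not say which method or settings were used for the paper, and the function name binarize_two_means is hypothetical.

```python
def binarize_two_means(values, max_iter=100):
    """Binarize one gene's expression series with 1-D k-means (k = 2).

    Assumption: a simple two-cluster split per gene. This is an
    illustrative stand-in, not the exact procedure used in the paper.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant profile carries no on/off information.
        return [0] * len(values)
    for _ in range(max_iter):
        threshold = (lo + hi) / 2.0
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        # Move each cluster centre to the mean of its members.
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2.0
    return [1 if v > threshold else 0 for v in values]


# Hypothetical expression values for one gene across six time points.
print(binarize_two_means([0.1, 0.2, 0.15, 2.0, 2.2, 1.9]))  # -> [0, 0, 0, 1, 1, 1]
```

Each gene is binarized independently here, so a gene that is always highly expressed is still split relative to its own range.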

Technical questions about local gene co-expression

Q1:
I am interested in assessing the matching
score and the relationship between expression profiles, as you did in your
Qian et al 2000 (pubmedid: 11743722) paper, on my own data.
But I need some clarifications, if possible.
After normalizing gene expression using z-scores, how did you eliminate
the negative expression levels? In other words, if the expression of each
gene is normalized using z-scores, each gene has both positive and
negative normalized expression levels, so how do you define genes having
negative expression levels?

A1:
Normalization was used to calculate the correlation coefficient. Although we will have negative values, we should not interpret them as actual gene expression levels.
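
To make the point concrete, here is a small Python sketch of how z-score normalization feeds into a correlation coefficient. It assumes the population standard deviation (dividing by n), in which case the mean of the products of two z-scored profiles equals the Pearson correlation; the function names and the example values are hypothetical.

```python
import math


def zscore(profile):
    """Rescale a profile to mean 0 and standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        raise ValueError("constant profile: z-score is undefined")
    return [(x - mean) / sd for x in profile]


def correlation(a, b):
    """Pearson correlation as the mean product of z-scored values."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


gene_a = [1.0, 2.0, 3.0, 4.0]  # hypothetical raw expression levels
gene_b = [2.0, 4.0, 6.0, 8.0]
print(zscore(gene_a))          # contains negative values by construction
print(correlation(gene_a, gene_b))
```

The negative z-scores only mean "below this gene's own average"; they are intermediate quantities for the correlation, not expression levels.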

Q2:
To estimate the p-value of each matching score, how did you generate the
random expression profiles? Did you switch two gene expression time points
for each gene, or did you permute the gene expression values for each gene?

A2:
We permuted the gene expression for each gene by switching two gene expression time points.
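
A permutation test of this kind can be sketched in Python as follows. Two things here are assumptions rather than the paper's procedure: the absolute Pearson correlation stands in for the matching score, and each random profile is produced by a full shuffle of the time points (equivalent to many repeated pairwise swaps). The function names and defaults are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Empirical p-value of |r| under shuffled time points.

    Assumption: |Pearson r| is a stand-in for the matching score.
    """
    rng = random.Random(seed)
    observed = abs(pearson(a, b))
    hits = 0
    for _ in range(n_perm):
        pa, pb = list(a), list(b)
        rng.shuffle(pa)  # permute the time points of each gene
        rng.shuffle(pb)
        if abs(pearson(pa, pb)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


gene_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical profiles
gene_b = [2 * x + 1 for x in gene_a]
print(permutation_p_value(gene_a, gene_b))
```

Any other score function could be substituted for the correlation without changing the structure of the test.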

Q3:
If I wish to determine locally co-expressed genes in different
time-series experiments, can I combine the gene expression profiles from the
different experiments in one matrix as below and apply your algorithm on
this new matrix instead of applying the algorithm on the gene expression
profile of each experiment alone?
exp1: exp1_t1, exp1_t2, exp1_t3, exp1_t4
exp2: exp2_t1, exp2_t2, exp2_t3
combined_exp: exp1_t1, exp1_t2, exp1_t3, exp1_t4, exp2_t1, exp2_t2, exp2_t3.

A3:
Our algorithm will detect time-delayed relationships. If exp2_t1 is indeed the measurement following exp1_t4, the operation should be fine.
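
The reason the order of the time points matters can be seen in a simplified Python sketch. It uses a plain lagged Pearson correlation as a stand-in for the paper's local matching algorithm, which is an assumption made only for illustration; the function names and the example values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag] over the overlapping time points."""
    if lag >= 0:
        x, y = a[:len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[:len(b) + lag]
    return pearson(x, y)


def best_lag(a, b, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(a, b, k))


# Hypothetical values: exp1 has four time points, exp2 has three.
gene_a = [0.0, 1.0, 4.0, 2.0] + [0.0, 0.0, 0.0]  # exp1 followed by exp2
gene_b = [0.0, 0.0, 1.0, 4.0] + [2.0, 0.0, 0.0]  # same shape, one step later
print(best_lag(gene_a, gene_b, max_lag=2))       # -> 1
```

A shift of one time point is only meaningful across the junction if exp2_t1 really follows exp1_t4; otherwise the concatenated columns no longer form a single time axis.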

Request for a SI document

Q1:
The file at the URL:
http://www.nature.com/nature/journal/v489/n7414/extref/nature11245-s1.pdf
(SI of the paper "Architecture of the human regulatory network derived from
ENCODE data") is damaged and cannot be read.

Can you please send me a copy?

Q2:
Thank you for your prompt reply.

The nature11245__ALL.pdf file you generated works fine (using Adobe XI on Windows 7), but as I mentioned, the file downloaded from Nature’s website is damaged (on Windows and Linux machines).

FYI, here is the Linux stderr (using Okular):

Error: PDF file is damaged – attempting to reconstruct xref table…

Error: Couldn’t find trailer dictionary

Error: Couldn’t read xref table

Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)

A2:
Thank you for letting us know. This is actually a more serious problem than I had expected. We may need to contact them about this very soon, as other users will experience the same problem. Thanks again.

It might be a problem with fonts that Mac OS has but Windows/Linux don’t. You might want to try producing the PDF on Windows and opening it on Mac OS and Linux; if it works, just substitute the file on the Nature site (which is not a trivial task, I guess).

Sequence and quality scores for Brainspan MRF file

Q1:
I am contacting you regarding some questions about "mrfQuantifier" in RSEQtools.

We have cloned many novel transcripts from the whole human brain and wanted
to relate their expression to specific periods/regions using RNA-seq data
from Brainspan. We were able to download the MRF file that only contains
alignment blocks from Brainspan. We noticed that you describe in
the Bioinformatics paper that mrfQuantifier does not perform expression
quantification at the transcript level.

So, could you let me know why mrfQuantifier does not work at the
transcript level? Do you mean it does not address the problem of multireads?
Also, "mrf2sam" does not work with the MRF files we downloaded from Brainspan
and it shows the error "PROBLEM: Unknown presentColumn:
H376_V_51_A1C_L_RNASeq.mrf". Could it be that it needs the sequence
information? Also, is there any recent tool in RSEQtools that would perform
expression quantification at the transcript level?

A1:
IQseq can perform transcript-level quantification. Here is my understanding of your questions.

1) When RSEQtools was first published in 2011, it was designed to handle only gene- or exon-level quantification. Isoform quantification is challenging, partly because of multi-reads.

Then in 2012, our lab published another paper, on IQseq (https://code.google.com/p/iqseq/downloads/list). IQseq takes all the reads mapped to one gene; for those mapped to multiple isoforms, an EM algorithm is used to allocate the reads iteratively to each transcript.

2) For the second question, I am not sure what is happening. If possible, could you please send me the sample MRF files so I can check what’s going on there?

Hopefully this could help a little bit. Please let me know if you have any questions.
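
The idea of iteratively allocating multi-reads with EM, mentioned under point 1 above, can be sketched in a few lines of Python. This is a generic textbook-style EM under simplifying assumptions (it ignores transcript lengths and read quality) and is not IQseq's actual implementation; the function name and the input layout are hypothetical.

```python
def em_allocate(read_compat, n_isoforms, n_iter=200):
    """Estimate isoform proportions from read-to-isoform compatibility.

    read_compat: one list per read, holding the indices of the isoforms
    that read is compatible with (each list must be non-empty).
    Assumption: transcript lengths are ignored to keep the sketch short.
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        for isoforms in read_compat:
            # E-step: split the read among its compatible isoforms
            # in proportion to the current abundance estimates.
            total = sum(theta[i] for i in isoforms)
            for i in isoforms:
                counts[i] += theta[i] / total
        # M-step: new proportions are the normalized fractional counts.
        theta = [c / len(read_compat) for c in counts]
    return theta


# Hypothetical gene with two isoforms: 6 reads unique to isoform 0,
# 2 reads unique to isoform 1, and 4 reads compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
print(em_allocate(reads, n_isoforms=2))  # converges to about [0.75, 0.25]
```

The ambiguous reads end up shared in the same 3:1 ratio as the uniquely mapped ones, which is what the iterative allocation is meant to achieve.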

Q2:
We have our novel transcript annotations (e.g., exon boundaries), and we downloaded all the MRF-format files of RNA-seq reads of the developing brain from Brainspan to try to quantify the expression levels.
All the MRF files we downloaded are under "Supplemental Data" at http://www.brainspan.org/static/download.html . I have tried to run them with mrf2sam, which failed with an error.
My question is: would the IQseq tool work with MRF files containing only alignment blocks (genomic coordinates only) without sequence information, since the Brainspan MRF files contain no such information? Or does it still need sequence and quality score information to run? The complete set of MRF files can be obtained under "Supplemental Data" at http://www.brainspan.org/static/download.html for testing. It would be great if you could also check whether or not IQseq would work on this set of MRF files.

A2:
After consulting some of my colleagues, here is the answer.

For one of the files, if you type

mrf2sam < H376_IIA_50_LGE_L_RNASeq.mrf > H376_IIA_50_LGE_L_RNASeq.sam

it works, but if you type mrf2sam H376_IIA_50_LGE_L_RNASeq.mrf
it won’t work.

I am not the author of RSEQtools, but I guess the command line works this way because everything is meant to be piped, to avoid I/O for intermediate files.

If you still cannot figure it out, please let me know what your command is and what the error message says. Hope this helps!

PEMer for commercial use

Q:
I work for Novartis Institutes For Biomedical Research Inc, in Cambridge, MA, a commercial entity.

Could I use your PEMer software? (I remember you allowed me to try your translocation detection software in 2011, but I did not archive that email.)

A:
This is fine.