EN-TEX data

Your postdoc gave a great talk about the EN-TEX work at the ASHG meeting. The data
generated from this project will benefit the community greatly. Could you
please tell us when and how the data will be made available to external users?

Thank you for your suggestion. In the meantime, you can find the correct versions of fasta and blast freely available online. To ease the user experience, we provide a link to the two packages on the website http://pseudogene.org/pseudopipe/ .

Inquiry about STRESS

I am writing this e-mail to inquire about STRESS software.

We have learned from your paper (Structure 2016,24:826-837)
that STRESS software can be used for identifying allosteric pockets.
We are interested in using the software for our drug discovery research.
As a first step, we will evaluate the software.

Will you allow us to use STRESS software for the purpose of our
commercial drug discovery project free of charge?

As this is an urgent project, we would highly appreciate it if you could
reply soon.

See the license at https://sites.gersteinlab.org/permissions/

HiC-Spector data

We have read with much interest your article about the HiC-Spector method.
We are currently working on a method that we hope will help identify
conserved features across different HiC-maps. As the problem we are studying
and the one tackled in your article are closely related, we think it would
be useful for us to test our method using your data set as the ground truth.
We kindly ask whether you would be able to provide us with the HiC maps used
in the article for this purpose.


Indel counts for RCC WGS paper

We’ve just been reading your excellent papillary RCC WGS paper. There is a
real paucity of data on papillary cases, so many thanks for this.

Sorry if I missed it, but do you happen to know the SNV and (small scale)
indel counts across the cohort? We’re especially interested in indel
mutations in RCC, and wondered what proportion of your variants were of this

For tumor SNV counts, see the supplemental table (https://doi.org/10.1371/journal.pgen.1006685.s009). We also include SVs in the supplements. Unfortunately, we do not have indels for those tumors.

Loregic – further validation

I’ve been trying to apply the Loregic algorithm in other organisms in order to further validate the method; however, I’m finding some inconsistencies that could be related to data manipulation (choosing datasets, merging and mean-centering samples).
Furthermore, I’ve also found those inconsistencies when trying to reproduce the analysis from yeast datasets provided in your publication (probably due to the same data manipulation issues described before).

Would you be able to provide a more in-depth protocol for using Loregic with multiple datasets (how you handled the data, for example) in order to improve the consistency of the method between labs?

Yes, we normalized the yeast data. Here is how we preprocessed it:

1) got time-series yeast cell cycle data (alpha, cdc15, cdc28) from
which were logarithm values.
2) standardized 2^(data) such that each time point has mean = 0 and sigma = 1
3) binarized the standardized data using the function,
binarizeTimeSeries with ‘kmeans’ clustering in R package BoolNet.
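For readers who do not work in R, steps 2 and 3 can be sketched roughly as follows. This is only an illustration and not the code we ran: the function names are hypothetical, the input is assumed to be base-2 logarithms, the sample standard deviation is an assumption, and the small two-cluster k-means below is a stand-in for BoolNet's binarizeTimeSeries, which should be preferred for real analyses.

```python
import statistics

def standardize_columns(log_matrix):
    """Undo the (assumed base-2) log, then scale each time point
    (column) to mean 0 and standard deviation 1 across genes."""
    raw = [[2.0 ** v for v in row] for row in log_matrix]
    n_cols = len(raw[0])
    means = [statistics.fmean([row[j] for row in raw]) for j in range(n_cols)]
    # Sample standard deviation is an assumption; a constant column would
    # give sd = 0 and needs separate handling.
    sds = [statistics.stdev([row[j] for row in raw]) for j in range(n_cols)]
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in raw]

def two_means_binarize(values, max_iter=100):
    """1-D k-means with k = 2 for one gene: values in the cluster with the
    higher centroid become 1, the others 0."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # no separation possible
    for _ in range(max_iter):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        new_lo = statistics.fmean([v for v, b in zip(values, labels) if b == 0])
        new_hi = statistics.fmean([v for v, b in zip(values, labels) if b == 1])
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels

def preprocess(log_matrix):
    """Rows are genes, columns are time points."""
    standardized = standardize_columns(log_matrix)
    return [two_means_binarize(row) for row in standardized]
```

Here each row is one gene and each column one time point, so standardization runs down the columns while binarization runs along the rows.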

Request for the pdf version of the article

Currently my research
area focuses on the whole genome sequencing (WGS) of Indian samples. However,
during my PhD I worked on the study of copy number variation in the Indian
population and its implications for health.

Could you please send the following article, "The current excitement about
copy-number variation: how it relates to gene duplications and protein
families", in PDF format for my reference?

Thank you for requesting copies of some of my recent
papers. Essentially all of my work is available on-line. Go to:


and click on the appropriate "preprint" link. You will get a
preprint or (if appropriate) journal reprint of the paper you want.
There should be NO password challenges or other barriers. Usually, the
papers are in PDF format but some are in HTML. (Other formats are
available directly from http://papers.gersteinlab.org/e-print.)

Please let me know if you have any problems with this service. If you
can’t get what you want, we can easily post you normal paper reprints.

Java chromod package request CoassociationAnalyzer.java and GSCCoassociationAnalyzer.java scripts that Kevin Yip wrote (April 14, 2011)

I’m writing to you to see if you could share with me your java "chromod" package. I would like to use the CoassociationAnalyzer.java and GSCCoassociationAnalyzer.java scripts that Kevin Yip wrote (April 14, 2011), but they rely on the chromod package (package org.gersteinlab.chromod).

If you could share this with me if it’s not a top secret lab package, I would be hugely indebted!

Please download it at http://www.cse.cuhk.edu.hk/~kevinyip/outbox/chromod.jar . Let me know if you encounter any problem when using it.

Loregic paper: binarized yeast expression data

I am writing to ask if you could kindly share with me the yeast cell cycle binarized expression data that you used in Loregic’s paper.

In our group we would like to find a method to identify the logic rules that govern cooperativity of multiple regulators, in GRNs built from differentially expressed genes.

The number of samples we will have is limited, so we will be mainly relying on literature information, and as a first step we would like to test our method on your binarized expression data.

We used BoolNet to binarize data,
http://cran.r-project.org/web/packages/BoolNet/index.html . We also
tried ArrayBin,
http://cran.r-project.org/web/packages/ArrayBin/index.html, which gave
Loregic results very similar to those from BoolNet (see Supplemental Figure).

The yeast cell cycle data we used was the classical microarray data
published in 1998 (Spellman & Cho):

Technical questions about local gene co-expression

I am interested in assessing the matching
score and the relationship between expression profiles as you did in your
Qian et al 2000 (pubmedid: 11743722) paper, on my own data.
But I need some clarifications if possible.
After normalizing gene expression using z-scores, how did you eliminate
the negative expression levels? In other words, if the expression of each
gene is normalized using z-scores, each gene has both positive and
negative normalized expression levels, so how do you define genes having
negative expression levels?

Normalization was used to calculate the correlation coefficient. Although normalization produces negative values, they should not be interpreted as actual gene expression levels.
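To make the role of the z-scores concrete, here is a minimal sketch showing that the Pearson correlation coefficient is simply the average product of two z-scored profiles. The function names are hypothetical, the population standard deviation is an assumption, and this shows only the normalization step, not the matching-score algorithm from the paper.

```python
import statistics

def zscore(profile):
    """Centre a profile on 0 and scale it to unit (population) standard
    deviation. Negative outputs mean 'below this gene's own average',
    not a negative expression level."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mean) / sd for x in profile]

def pearson(profile_a, profile_b):
    """Pearson correlation as the mean product of the two z-scored profiles."""
    za, zb = zscore(profile_a), zscore(profile_b)
    return sum(a * b for a, b in zip(za, zb)) / len(za)
```

For example, pearson([1, 2, 3, 4], [2, 4, 6, 8]) is 1 up to rounding even though half of the z-scores for each profile are negative.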

To estimate the p-value of each matching score, how did you generate the
random expression profiles? Did you switch two gene expression time points
for each gene or did you permute the gene expressions for each gene?

We permuted the gene expression for each gene by switching two gene expression time points.
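A rough sketch of such a permutation test is given below. It is an illustration under stated assumptions rather than our original code: the function names, the number of swaps, the number of random profiles, and the add-one correction are all hypothetical choices, and score_fn stands for whatever matching score you compute on a pair of profiles.

```python
import random

def shuffled_profile(profile, rng, n_swaps=100):
    """Permute one gene's profile by repeatedly switching two time points.
    The two positions are drawn independently, so a draw may pick the same
    position twice and leave the profile unchanged for that step."""
    permuted = list(profile)
    n = len(permuted)
    for _ in range(n_swaps):
        i, j = rng.randrange(n), rng.randrange(n)
        permuted[i], permuted[j] = permuted[j], permuted[i]
    return permuted

def empirical_p_value(profile_a, profile_b, score_fn, n_random=1000, seed=0):
    """Fraction of randomized profile pairs scoring at least as high as the
    observed pair, with an add-one correction so the estimate is never 0."""
    rng = random.Random(seed)
    observed = score_fn(profile_a, profile_b)
    hits = 0
    for _ in range(n_random):
        random_a = shuffled_profile(profile_a, rng)
        random_b = shuffled_profile(profile_b, rng)
        if score_fn(random_a, random_b) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)
```

Any function that takes two profiles and returns a number can be passed as score_fn; larger values are assumed to mean a better match.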

If I wish to determine locally co-expressed genes in different
time-series experiments, can I combine the gene expression profiles from the
different experiments in one matrix as below and apply your algorithm to
this new matrix instead of applying the algorithm on the gene expression
profile of each experiment alone?
exp1: exp1_t1, exp1_t2, exp1_t3, exp1_t4
exp2: exp2_t1, exp2_t2, exp2_t3
combined_exp: exp1_t1, exp1_t2, exp1_t3, exp1_t4, exp2_t1, exp2_t2, exp2_t3.

Our algorithm detects time-delayed relationships. If exp2_t1 is indeed the measurement that follows exp1_t4 in time, combining the profiles this way should be fine.
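As a small illustration of that caveat, the concatenation could be written as follows. The function name, the dictionary layout (gene name mapped to a time-ordered list of values), and the exp2_follows_exp1 flag are all hypothetical; the flag is there only to force an explicit decision about whether the two series really are consecutive in time.

```python
def combine_experiments(exp1, exp2, exp2_follows_exp1):
    """Append exp2's time points to exp1's for every gene measured in both.

    The caller must confirm that exp2's first time point really follows
    exp1's last one; the code cannot check this from the numbers alone.
    """
    if not exp2_follows_exp1:
        raise ValueError(
            "Profiles are not consecutive in time; analyse each experiment separately."
        )
    shared_genes = sorted(set(exp1) & set(exp2))
    return {gene: list(exp1[gene]) + list(exp2[gene]) for gene in shared_genes}
```

Genes present in only one experiment are dropped here, which is one possible choice among several.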