Request for an SI document

Q1:
The file at the URL:
http://www.nature.com/nature/journal/v489/n7414/extref/nature11245-s1.pdf
(SI of the paper "Architecture of the human regulatory network derived from
ENCODE data") is damaged and cannot be read.

Can you please send me a copy?

Q2:
Thank you for your prompt reply.

The nature11245__ALL.pdf file you generated works fine (using Adobe XI on Windows 7), but as I mentioned, the file downloaded from Nature’s website is damaged (on both Windows and Linux machines).

FYI, here is the Linux stderr output (using Okular):

Error: PDF file is damaged - attempting to reconstruct xref table...

Error: Couldn't find trailer dictionary

Error: Couldn't read xref table

Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)

A2:
Thank you for letting us know. This is actually a more serious problem than I had expected. We may need to contact them about this very soon, as other users will experience the same problem. Thanks again.

It might be a font problem: fonts that Mac OS has but Windows/Linux don’t. You might want to try producing the PDF on Windows and opening it on Mac OS and Linux; if that works, just substitute the file on the Nature site (which is not a trivial task, I guess).
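
The errors quoted in Q2 (no trailer dictionary, unreadable xref table) concern the bookkeeping that sits at the very end of a PDF file, so one quick thing to compare between the downloaded copy and the working copy is whether the end of each file is intact. The following is only a rough sketch, not a diagnosis: the function name `pdf_tail_report` and the 2048-byte window are illustrative choices, and a file can pass this check and still be damaged in other ways.

```python
from pathlib import Path


def pdf_tail_report(path, tail_bytes=2048):
    """Report whether a file starts and ends the way a PDF normally does.

    A well-formed PDF starts with '%PDF-' and ends with a 'startxref'
    keyword, a byte offset, and an '%%EOF' marker. A truncated download
    typically lacks the last two.
    """
    data = Path(path).read_bytes()
    tail = data[-tail_bytes:]  # whole file if it is shorter than the window
    return {
        "size_bytes": len(data),
        "has_header": data.startswith(b"%PDF-"),
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


# Hypothetical usage: compare the copy from the journal site with the
# copy that opens correctly.
# print(pdf_tail_report("nature11245-s1.pdf"))
# print(pdf_tail_report("nature11245__ALL.pdf"))
```

If the downloaded file is much smaller than the working copy and lacks the end markers, an incomplete upload or download would be a more likely explanation than missing fonts.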

Sequence and quality scores for Brainspan MRF file

Q1:
I am contacting you with some questions about "mrfQuantifier" in RSEQtools.

We have cloned many novel transcripts from the whole human brain and wanted
to relate their expression to specific periods/regions using RNA-seq data
from Brainspan. We were able to download the MRF files, which contain only
alignment blocks from Brainspan. We have noticed that you have described in
the Bioinformatics paper that mrfQuantifier does not perform the expression
quantification on the transcript level.

So, could you let me know why mrfQuantifier does not work at the
transcript level? Do you mean it does not address the problem of multireads?
Also, "mrf2sam" does not work with the MRF files we downloaded from Brainspan;
it shows the error "PROBLEM: Unknown presentColumn:
H376_V_51_A1C_L_RNASeq.mrf". Could it be that it needs the sequence
information? Also, is there any recent tool in RSEQtools that performs
expression quantification at the transcript level?

A1:
IQseq can perform transcript-level quantification. Here is my understanding of your questions.

1) When RSEQtools was first published in 2011, it was designed to handle only gene- or exon-level quantification. Isoform quantification is challenging, partly because of multi-reads.

Then in 2012, our lab published another paper, on IQseq (https://code.google.com/p/iqseq/downloads/list). IQseq takes all the reads mapped to one gene; for reads that map to multiple isoforms, an EM algorithm is used to allocate them iteratively to each transcript.

2) For the second question, I am not sure what is happening. If possible, could you please send me the sample MRF files so I can check what is going on?

Hopefully this helps a little. Please let me know if you have any questions.
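
To illustrate the idea in point 1 of allocating multi-mapped reads by EM, here is a minimal generic sketch. It is not IQseq's code or its statistical model: it ignores isoform lengths and other details, and the function name `em_allocate` and the input layout (one set of compatible isoform indices per read) are assumptions made for this example.

```python
def em_allocate(compat, n_isoforms, n_iter=200, tol=1e-9):
    """Estimate isoform read fractions with a simple EM loop.

    compat: list with one entry per read; each entry is the set of
            isoform indices (0..n_isoforms-1) the read is compatible with.
    Returns (theta, counts): estimated read fraction per isoform and the
    expected number of reads assigned to each isoform.
    """
    if not compat:
        raise ValueError("no reads given")
    theta = [1.0 / n_isoforms] * n_isoforms  # start from equal fractions
    counts = [0.0] * n_isoforms
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to the current fractions.
        counts = [0.0] * n_isoforms
        for isoforms in compat:
            total = sum(theta[k] for k in isoforms)
            if total == 0.0:
                continue  # read cannot be placed under current estimates
            for k in isoforms:
                counts[k] += theta[k] / total
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read is compatible with any isoform")
        # M-step: new fractions are the normalised expected counts.
        new_theta = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, counts


# Toy gene with two isoforms: 6 reads unique to isoform 0,
# 2 reads unique to isoform 1, and 4 reads compatible with both.
reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 4
theta, counts = em_allocate(reads, 2)
```

In this toy case the shared reads end up split roughly 3:1 in favour of isoform 0, mirroring the ratio of the uniquely mapped reads, so the estimated fractions settle near 0.75 and 0.25.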

Q2:
We have our own novel transcript annotations (e.g., exon boundaries), and we downloaded all the MRF-format files of RNA-seq reads from the developing brain from Brainspan to try to quantify expression levels.
All the MRF files we downloaded are listed under "Supplemental Data" at http://www.brainspan.org/static/download.html . I tried to run them through mrf2sam, and it failed with an error.
My question is: would the IQseq tool work with MRF files that contain only alignment blocks (genomic coordinates only), without sequence information, since the Brainspan MRF files contain no such information? Or does it still need sequence and quality score information to run? The complete set of MRF files can be obtained under "Supplemental Data" at http://www.brainspan.org/static/download.html for testing. It would be great if you could also check whether IQseq works on this set of MRF files.

A2:
After consulting some of my colleagues, here is the answer.

For one of the files, if you type

mrf2sam < H376_IIA_50_LGE_L_RNASeq.mrf > H376_IIA_50_LGE_L_RNASeq.sam

it works, but if you type mrf2sam H376_IIA_50_LGE_L_RNASeq.mrf
it won’t work.

I am not the author of RSEQtools, but I guess the command line works this way so that everything can be piped, avoiding I/O for intermediate files.

If you still cannot figure it out, please let me know what your command is and what the error message says. Hope this helps!
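
For converting many files in a script, the same redirection can be done without a shell. This is a small sketch that assumes, as the answer above suggests, that mrf2sam reads MRF on standard input and writes SAM on standard output, and that the program is on your PATH. The helper name `run_stdin_filter` is made up for this example.

```python
import subprocess


def run_stdin_filter(command, input_path, output_path):
    """Run a program that reads standard input and writes standard output.

    Equivalent to the shell form:  command < input_path > output_path
    Raises subprocess.CalledProcessError if the program exits with an error.
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(command, stdin=src, stdout=dst, check=True)


# Hypothetical usage (requires mrf2sam to be installed and on PATH):
# run_stdin_filter(["mrf2sam"],
#                  "H376_IIA_50_LGE_L_RNASeq.mrf",
#                  "H376_IIA_50_LGE_L_RNASeq.sam")
```

Passing the file name as a command-line argument instead of feeding the file on standard input is the case reported above as not working.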

PEMer for commercial use

Q:
I work for Novartis Institutes For Biomedical Research Inc, in Cambridge, MA, a commercial entity.

Could I use your PEMer software? (I remember you allowed me to try your translocation detection software in 2011, but I did not archive that email.)

A:
This is fine.

Query regarding paper “Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors”

Q:
I recently read your paper "Classification of human genomic regions
based on experimentally determined binding sites of more than 100
transcription-related factors" in Genome Biology, since I am interested
in enhancers. If I understand things correctly, you identified ~13k
putative enhancers in K562 cells, but I cannot locate the list of loci
in the supplemental materials. I was wondering if you would be willing
to share that list with me?

A:
See http://encodenets.gersteinlab.org/metatracks/

Correlation ACT error

Q:

I am trying to run the correlation Java program, and I get the following when I run the example:

java -jar EncodeTfCor2.jar human_genome_file.txt bedlist 1000000 0
Parsing genome chromosomes and tf bindings …
parsing human_genome_file.txt…
parsing lists in bedlist
Building data matrix …
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at encodetfcor2.TfSitesDataMatrixBuilder.<init>(TfSitesDataMatrixBuilder.java:126)
at encodetfcor2.Main.main(Main.java:65)

Any idea what’s going on? I have zero familiarity with Java, so I am completely lost as to what is happening.

Well, I got rid of it and installed a new version, and this time I ran the SNP data as the example and it worked. I have no idea what happened. One quick question though: I ran the example with the SNPs from the four individuals and I got the following matrix:

1.000000 0.984057 0.983579 0.941439
0.984057 1.000000 0.985570 0.956917
0.983579 0.985570 1.000000 0.952203
0.941439 0.956917 0.952203 1.000000

The track_names.txt says the following:

chinese.sites.chr1.parsed
korean.sites.parsed.chr1
venter.sites.parsed.chr1
watson.sites.parsed.chr1

so is the actual matrix then:

names chinese korean venter watson
chinese 1.000000 0.984057 0.983579 0.941439
korean 0.984057 1.000000 0.985570 0.956917
venter 0.983579 0.985570 1.000000 0.952203
watson 0.941439 0.956917 0.952203 1.000000

The readme file isn’t very clear on that. Thanks.

A:
Yes, the matrix is labelled correctly.
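
Since the program prints the matrix without labels, a few lines of scripting can attach the track names. The sketch below assumes what the answer confirms, namely that row i and column i of the matrix correspond to line i of track_names.txt. The function name `label_matrix` is illustrative and is not part of the tool.

```python
def label_matrix(names, matrix_text):
    """Attach track names to a whitespace-separated square matrix.

    names: track names in the order they appear in track_names.txt.
    matrix_text: the matrix as printed, one row per line.
    Returns a nested dict so that result[row_name][col_name] is the value.
    """
    rows = [
        [float(value) for value in line.split()]
        for line in matrix_text.splitlines()
        if line.strip()
    ]
    n = len(names)
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix shape does not match the number of names")
    return {names[i]: dict(zip(names, rows[i])) for i in range(n)}


names = [
    "chinese.sites.chr1.parsed",
    "korean.sites.parsed.chr1",
    "venter.sites.parsed.chr1",
    "watson.sites.parsed.chr1",
]
matrix_text = """
1.000000 0.984057 0.983579 0.941439
0.984057 1.000000 0.985570 0.956917
0.983579 0.985570 1.000000 0.952203
0.941439 0.956917 0.952203 1.000000
"""
labelled = label_matrix(names, matrix_text)
# Example lookup: correlation between the venter and watson tracks
venter_watson = labelled["venter.sites.parsed.chr1"]["watson.sites.parsed.chr1"]
```

In practice the names would be read from track_names.txt and the matrix from wherever the program's output was saved.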

Protein sequence co-evolution software

Q:

I’m writing to you in connection with your research on the computational tools for the study of residue co-evolution in protein sequences, described in Bioinformatics (2008), http://coevolution.gersteinlab.org

We have a summer internship opportunity here at Dupont Industrial Biosciences (IB) in Palo Alto and the proposed project would involve evaluating different methods for identifying co-evolving residues, so that the suitable method or methods could be applied to proteins and protein families of interest to the company. If this approach is successful, it could help guide future protein engineering efforts here at Dupont IB.

If you happen to know a candidate who would be interested in this internship opportunity, I would welcome your recommendations. I’m in the process of interviewing a few people, but would be glad to talk to additional qualified candidates.

This internship is somewhat unusual because it is not part of a bioinformatics group, so the intern would need to make independent judgments regarding the merits and drawbacks of different approaches and regarding the technical implementation of the project.

My second question is whether there are any terms or conditions associated with using the co-evolution computational tools from your lab? Are the terms different if we were to run these programs on a local computer here within the company (rather than submitting our sequences to the remote server)? I didn’t see any indications to that effect on the coevolution.gersteinlab.org page or in the publication, but it is an important aspect to clarify before using external software within the company, so I hope you can let me know what the rules are or suggest the person I should contact.

A:
I’ll look for an intern. There are no conditions on the use of this software; it’s open source. Just cite us as described on the permissions page.