A forum for conversations about published papers

I saw your paper "Structuring supplemental materials in support of reproducibility" and appreciate your points. I would love to see a forum (like GATK’s forum or StackOverflow) where each conversation thread is dedicated to a single published paper. Everyone trying to replicate the results could post their questions, and the authors could post their answers for all to see. I think this would be much better than the current closed system of emailing the authors. I would love to see a day when papers provide a link to such a forum rather than the authors’ email addresses. Who would have the ability to get something like this started and make it catch on? Do you know if the NIH is thinking about funding a platform for something like this?

With respect to "Who would have the ability to get something like this started and make it catch on?"
Maybe PLOS.

With respect to "Do you know if the NIH is thinking about funding a platform for something like this?"
Don’t know.

Permission to use images

I have been using the Genboree exceRpt workflow and am loving it! It has saved me so much time! Your paper introduced me to it, and I would like to use one of its figures of the exceRpt pipeline (Figure 1) in my PhD thesis. Am I right to contact you to request permission, or should I be going to Cell for this?

Fine with me – just acknowledge us (see

MOAT output

I have a problem with my analysis, and I’m not sure whether I am using your software properly. I am trying to calculate the mutation burden of some of my samples (similar to the measurements performed here: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0424-2#Sec2). I ended up trying MOAT, following the second comment of this Biostars post (https://www.biostars.org/p/299549/). However, I cannot obtain the mutation burden as (number of mutations)/Mb. I am running MOAT-a with the argument “--wg_signal_mode=n”. Am I doing something wrong?

MOAT-a wasn’t meant to be used that way. The simulated variants in MOAT-a are internal data used to calculate p-value significance for elevated mutation burden on the input annotations. You can use MOAT-s to create a simulated variant set, and then calculate (number_of_mutations)/Mb from that.
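
As a rough illustration of that last step, the sketch below counts the variants that fall inside a set of regions and divides by the total region length in megabases. It assumes a simple in-memory representation chosen for this example (hypothetical, not MOAT-s's output format): variants as (chrom, pos) pairs with 0-based positions, and regions as BED-style half-open (chrom, start, end) intervals that do not overlap one another.

```python
from collections import defaultdict


def mutations_per_mb(variants, regions):
    """Return (number of variants inside regions) / (total region length in Mb).

    variants: iterable of (chrom, pos), 0-based positions (assumed format).
    regions:  iterable of (chrom, start, end), BED-style half-open intervals,
              assumed non-overlapping so that no base is counted twice.
    """
    by_chrom = defaultdict(list)
    total_bp = 0
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
        total_bp += end - start
    if total_bp == 0:
        raise ValueError("regions cover zero bases")

    hits = 0
    for chrom, pos in variants:
        # Linear scan for clarity; an interval tree scales better genome-wide.
        if any(start <= pos < end for start, end in by_chrom.get(chrom, ())):
            hits += 1
    return hits / (total_bp / 1_000_000)
```

For example, two regions of 0.5 Mb each containing three variants give a burden of 3.0 mutations/Mb. The same division applies whether the variants are observed calls or a simulated set.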

Source code for context-specific TF co-association analysis in ‘Architecture of the human regulatory network derived from ENCODE data’

I have benefited a lot from your work entitled ‘Architecture of the human regulatory network derived from ENCODE data’, and I want to use the framework you developed for context-specific TF co-association analysis. However, I can’t find the source code at the address given, http://code.google.com/p/tf-co-association/. Do you have a new address for the source code?

Is this what you are looking for?

A question about 3V

I read your paper "3V: cavity, channel and cleft volume calculator and extractor" carefully.

I have a question for you. The abstract states: "It rapidly finds internal volumes by taking the difference between two rolling-probe solvent-excluded surfaces,…", but I think you mean "two imaginary rolling-probe solvent-excluded surfaces", because after looking at your code, I haven’t seen any analytic SES formulation in it. I guess you are just using two probe spheres of different radii to account for cavities, not the analytic SESs themselves. Am I right?

I am not certain about your use of the term "imaginary", but I would say my method is a "discrete approximation" to the SES. Because it is discrete (i.e., computed on a 3D grid), one can simply subtract one grid from another. See attached figures.

With small grid spacings (0.2 Å), I see very little discrepancy from the analytical solution.
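
To make the "subtract one grid from another" idea concrete, here is a minimal pure-Python sketch of a grid-based rolling probe, following the description quoted from the abstract: the solvent-excluded volume is computed on a grid for a small and a large probe, and the voxels excluded for the large probe but reachable by the small one are taken as candidate cavity, channel, or cleft voxels. This is an illustrative toy, not the 3V source code: the function names, the brute-force loops, and the input format (atoms as (x, y, z, radius) tuples in Å on a cubic grid starting at the origin) are all assumptions, and in real use the box must be padded so the large probe can roll freely around the whole molecule.

```python
import itertools
import math


def ball_offsets(radius_vox):
    """Integer (di, dj, dk) offsets lying within radius_vox voxels of the origin."""
    r = int(math.floor(radius_vox))
    return [
        (i, j, k)
        for i, j, k in itertools.product(range(-r, r + 1), repeat=3)
        if i * i + j * j + k * k <= radius_vox * radius_vox
    ]


def solvent_excluded(atoms, probe, spacing, n):
    """Set of grid voxels (i, j, k) that a probe sphere of radius `probe` cannot touch.

    atoms: list of (x, y, z, radius) in Angstroms (hypothetical input format).
    Voxel (i, j, k) sits at (i*spacing, j*spacing, k*spacing), 0 <= i, j, k < n.
    Probe centres are restricted to grid points; this is the discrete
    approximation, and finer spacing brings it closer to the analytic SES.
    """
    points = list(itertools.product(range(n), repeat=3))
    # 1) Probe centres whose sphere does not overlap any atom sphere.
    accessible = [
        (i, j, k)
        for i, j, k in points
        if all(
            math.dist((i * spacing, j * spacing, k * spacing), (x, y, z)) >= r + probe
            for x, y, z, r in atoms
        )
    ]
    # 2) Every voxel within `probe` of an accessible centre is swept by the probe.
    offsets = ball_offsets(probe / spacing)
    reachable = set()
    for i, j, k in accessible:
        for di, dj, dk in offsets:
            reachable.add((i + di, j + dj, k + dk))
    # 3) Whatever the probe never swept is solvent-excluded (atoms plus crevices).
    return {p for p in points if p not in reachable}


def cavity_voxels(excluded_large, excluded_small):
    """Voxels excluded for the large probe but reachable by the small one."""
    return excluded_large - excluded_small


def voxel_volume(voxels, spacing):
    """Volume in cubic Angstroms covered by a set of voxels."""
    return len(voxels) * spacing ** 3


if __name__ == "__main__":
    atoms = [(6.0, 6.0, 6.0, 1.5)]  # toy single-atom "molecule"
    small = solvent_excluded(atoms, probe=1.4, spacing=1.0, n=12)
    large = solvent_excluded(atoms, probe=3.0, spacing=1.0, n=12)
    print(voxel_volume(cavity_voxels(large, small), 1.0))
```

For a single isolated atom the exact difference is empty, so any voxels this toy reports come from discretisation alone, which is the kind of error expected to shrink as the grid spacing gets smaller.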

Retrotransposon Quantitation

I read about the recently published software for deconvoluting pervasive and autonomous retrotransposons. Could another calculation be added to the software’s output which estimates the abundance of ORF1 and ORF2, the parts of the retrotransposon which are translated into protein? I’m not experienced in this research area, so I am unsure of how feasible that is. I would like to make an approximation to the ORF1 and ORF2 protein abundances using RNA-seq.

Thanks for reaching out here and on GitHub. This is an interesting question and suggestion. Unfortunately, estimating ORF1 and ORF2 protein abundance from RNA-seq is extremely hard. Two factors make it difficult to estimate protein abundance from transcriptome data. The first is technical: RNA-seq has a strong bias toward overrepresenting the 3′ end of transcripts, so ORF2 would most likely be overestimated. This issue is easily addressable.

The second is more biological: LINE-1 is tightly regulated at many different levels. Not only is LINE-1 transcription regulated, but there are also many post-transcriptional mechanisms that either boost or block LINE-1 translation. This is not unique to LINE-1; in general, estimating protein abundance from RNA is a hard problem (https://www.nature.com/articles/nrg3185).
That said, I’m really interested in this question. In theory, we could use machine learning algorithms to predict ORF1 and ORF2 protein levels from RNA-seq if we had enough data. This could be interesting follow-up work to TeXP.
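
For reference, the naive starting point for an ORF1/ORF2 estimate is a length-normalised read count per ORF, which is exactly the quantity the 3′ bias described above would skew toward ORF2 (ORF2 lies 3′ of ORF1 in LINE-1). The sketch below assumes read positions have already been mapped onto a LINE-1 consensus; the ORF coordinates are user-supplied parameters, and the ones in the example are placeholders, not real LINE-1 coordinates. It makes no correction for 3′ bias or translational regulation, so it reflects transcript coverage at best, not protein abundance.

```python
def reads_per_kb(read_positions, orfs):
    """Count reads whose position falls in each ORF, normalised per kilobase.

    read_positions: iterable of 0-based positions on the element consensus.
    orfs: dict name -> (start, end), half-open coordinates (user supplied).
    No 3'-bias or translation correction is applied.
    """
    positions = list(read_positions)
    density = {}
    for name, (start, end) in orfs.items():
        length_kb = (end - start) / 1000
        count = sum(start <= p < end for p in positions)
        density[name] = count / length_kb
    return density


# Placeholder coordinates for illustration only.
example = reads_per_kb([10, 500, 999, 1000, 2000, 4999, 5000],
                       {"ORF1": (0, 1000), "ORF2": (1000, 5000)})
```

Comparing such densities between ORF1 and ORF2 across samples is one way the positional bias would show up before any protein-level modelling is attempted.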

Running SVFX

I would like to run your new SVFX method on some structural variants. For full disclosure, I’m working on a method to assess the pathogenicity of germline SVs and would like to compare it with yours. Based on my reading of your preprint, I believe our methods are quite distinct in terms of training data. I think it’s great that you’ve already put the code on GitHub, but I’m not sure which data files are needed to run it. Could you put me in touch with one of your students to help me run SVFX locally?

Thanks for your interest in SVFX. We have reported our feature list in Supplementary Table 1.

Overall, our features are extracted from a range of genomic annotations and various functional genomics/epigenomics signal files.

You can download the signal files from the IHEC or Roadmap Epigenomics data portals. As you might have noticed, we created multiple tissue-specific models for our analysis.

For the germline model, we also built a feature matrix based on the H1-hESC cell line, which performed quite well. On the SVFX GitHub page, we have uploaded the BED files for the different annotations used in our study (under the data folder).
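
As a rough illustration of what building a feature matrix from annotation BED files can involve, the sketch below computes, for each structural variant, the fraction of its span covered by each annotation track. It is a toy under stated assumptions, not SVFX's feature-extraction code: the overlap-fraction feature, the function names, and the in-memory input format (SVs and annotations as BED-style half-open (chrom, start, end) tuples, with intervals non-overlapping within a track) are hypothetical. The features SVFX actually uses are those listed in Supplementary Table 1.

```python
def overlap_bp(start, end, intervals):
    """Bases of [start, end) covered by non-overlapping (s, e) intervals."""
    return sum(max(0, min(end, e) - max(start, s)) for s, e in intervals)


def feature_matrix(svs, tracks):
    """Return (track_names, rows): one row per SV, one column per track.

    Each value is the fraction of the SV span covered by that track.
    svs:    list of (chrom, start, end), BED-style half-open coordinates.
    tracks: dict track_name -> list of (chrom, start, end) intervals,
            assumed non-overlapping within each track.
    """
    names = sorted(tracks)
    rows = []
    for chrom, start, end in svs:
        span = end - start
        if span <= 0:
            # Zero-length SVs (e.g. insertions in BED) need a different feature.
            raise ValueError(f"SV {chrom}:{start}-{end} has no span")
        row = []
        for name in names:
            same_chrom = [(s, e) for c, s, e in tracks[name] if c == chrom]
            row.append(overlap_bp(start, end, same_chrom) / span)
        rows.append(row)
    return names, rows
```

Signal-file features (e.g. mean signal over the SV) would be added as further columns in the same row-per-SV layout; a production version would use an indexed interval lookup rather than rescanning each track per SV.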

MorphServer job

I am trying to use the Morph Server (job ID: b198308-29491).
It has been running for 5 days and has still not completed. Could you please check?

The files in your job’s folder seem to indicate that it
finished successfully. You can find the files using


So yours would be in


Unfortunately, some of the other features of the web interface need