The real cost of sequencing: higher than you think!

Q:
In support of an editorial comment I’m working on with a colleague, I just read your paper: The real cost of sequencing: higher than you think!

It is right on the money as far as the topic I’m interested in, but it is, obviously, dated (though it was helpful for me nonetheless). PubMed didn’t bring up a more recent revision of your paper, so I assume it hasn’t been updated.

Can you recommend any other, more recent papers or resources that address the issue of cost/price of sequencing? The NHGRI webpage on this topic is also helpful, but it was written in 2016 and does not address the post-sequencing costs such as variant calling, annotation, and interpretation. Do you think your illustration of the proportion of costs in 2020 due to the different components of testing still holds true?

A:
See:
http://papers.gersteinlab.org/papers/costseq2
http://papers.gersteinlab.org/papers/dsg

Types in Pseudopipe output

Q:
I am using Pseudopipe and I am wondering about the different types in its output.
I looked into the script and found that there are several types: GENE-SINGLE, PSSD, FRAG, GENE-MULT, and DUP. Could you explain the meaning of each type?

A:
From what I can see, you are looking at an intermediate result file, not the final output. The final output should contain only three biotypes: PSSD, DUP, and FRAG.
PSSD indicates processed pseudogenes, DUP indicates duplicated pseudogenes, and FRAG indicates pseudogene loci to which we cannot assign a biotype (processed or duplicated) with certainty.
GENE-SINGLE and GENE-MULTI are intermediate biotype definitions. SINGLE means that the pseudogene locus contains only one exon (similar to processed pseudogenes), and MULTI means that the potential pseudogenic locus contains multiple exons (similar to duplicated pseudogenes).
If a proposed locus has over 95% sequence identity to the parent gene, covers over 95% of the parent gene sequence, and has no identifiable disablements, we initially label it GENE-SINGLE or GENE-MULTI, respectively. If we find a polyA tail you might see PSSD|GENE-SINGLE, and in that case we relabel the locus as a processed pseudogene. For very high similarity we tend to be conservative and do not label the locus as a pseudogene. If subsequent searches turn up additional data (e.g., a polyA tail or truncations), we relabel the locus as a pseudogene.
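
The labeling rule described above can be sketched roughly as follows. This is only an illustration of the logic in this answer, not the actual Pseudopipe code: the `Locus` fields, the function names, and the handling of cases the answer does not mention (for example, a polyA tail on a multi-exon locus) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locus:
    # All field names are hypothetical; Pseudopipe's real data structures differ.
    identity: float        # fraction of sequence identity to the parent gene (0 to 1)
    coverage: float        # fraction of the parent gene sequence covered (0 to 1)
    num_exons: int         # number of exons found at the locus
    has_disablement: bool  # any identifiable disablement at the locus
    has_polya: bool        # polyA tail found


def intermediate_label(locus: Locus, threshold: float = 0.95) -> Optional[str]:
    """Return the intermediate GENE-* label, or None if the rule does not apply."""
    gene_like = (
        locus.identity > threshold
        and locus.coverage > threshold
        and not locus.has_disablement
    )
    if not gene_like:
        return None  # handled by other parts of the pipeline, not sketched here
    label = "GENE-SINGLE" if locus.num_exons == 1 else "GENE-MULTI"
    # The answer only mentions the polyA case for single-exon loci.
    if locus.has_polya and label == "GENE-SINGLE":
        return "PSSD|GENE-SINGLE"
    return label


def final_label(label: Optional[str]) -> Optional[str]:
    """Relabel PSSD|GENE-SINGLE as a processed pseudogene; leave the rest unresolved."""
    if label == "PSSD|GENE-SINGLE":
        return "PSSD"
    return None  # conservative: not called a pseudogene without further evidence


if __name__ == "__main__":
    locus = Locus(identity=0.99, coverage=0.98, num_exons=1,
                  has_disablement=False, has_polya=True)
    print(intermediate_label(locus), final_label(intermediate_label(locus)))
```

A locus that passes the similarity thresholds but shows no polyA tail or other evidence stays unlabeled in this sketch, which mirrors the conservative behavior described above.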

List of 321 high confidence SCZ-associated genes from Wang et al. 2018

Q:
I read your excellent work in Wang et al. 2018, and am wondering whether you could kindly share the list of 321 high confidence SCZ-associated genes. We are studying SCZ iPSC-derived interneurons and this information would help us understand which DE genes may be causal in our system.

A:
It should be at: http://resource.psychencode.org

Request for data from Zhang and Gerstein NAR (2003)

Q:
I recently came across your paper, "Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes."

I’m interested in the substitution rates in human pseudogenes. Figure 2A from your paper (pasted below) plots these rates. Would you be able to send me these rates as a table?

Additionally, has your group calculated the substitution rates for more families of pseudogenes? (The NAR 2003 paper only analyzed ribosomal protein pseudogene sequences.) I tried poking around psiDR, but was not able to find this type of information readily available.

These substitution rate matrices would be very helpful for my research.

A:
see: http://www.pseudogene.org/indel-nar
(via http://papers.gersteinlab.org/papers/indel-nar)

Question regarding list of human pseudogenes

Q:
I am … developing an application that matches cancer patients to treatment based on the person’s genetic profile. We are looking for an updated list of human pseudogenes to use in evaluating submitted DNA variants. Can you tell me if the Pseudofam data files at the pseudogene.org website are still being updated? If not, perhaps you could recommend an alternate source?

A:
Best to get an updated list of pseudogenes from pseudogene.org, which is continually updated, i.e., http://pseudogene.org/Human/. Yucheng

Referring to your paper: Structuring supplemental materials in support of reproducibility

Q:
I just read your paper mentioned above. I work in the area of
computational reproducibility so the paper was pretty interesting to
read. However, I stumbled a bit over one of your concluding remarks. You
are saying

"One useful tactic may be detailed sampling: perhaps it is best for the
editor to organize a system wherein, randomly, referees are asked to
review samples in greater detail to ensure the overall quality of the
supplements without quickly overwhelming the peer review system."

I am not sure whether I understood correctly how this could be
implemented. Does it mean that the editor randomly asks one of the
reviewers to look at the supplements, or do all reviewers look at
subsets of supplements? I find this idea pretty interesting and was
wondering whether you have published further articles on this topic?

A:
With respect to: "Does it mean that the editor randomly asks one of the reviewers to look at the supplements, or do all reviewers look at subsets of supplements?"
--> The former

With respect to: "I find this idea pretty interesting and was wondering whether you have published further articles on this topic?"
--> Not exactly, but you might find this related work useful:
http://papers.gersteinlab.org/papers/structbl
http://papers.gersteinlab.org/papers/SDA

A forum for conversations about published papers

Q:
I saw your paper "Structuring supplemental materials in support of reproducibility" and appreciate your points. I would love to see a forum (like GATK’s forum or StackOverflow) where each topic for a conversation thread is a single published paper. Then everyone who is trying to replicate results could post their questions and authors their answers for all to see. I think this would be much better than the current closed system of emailing the authors. I would love to see a day when a link to a forum is provided on papers, rather than the authors’ email addresses. Who would have the ability to make something like this get started and catch on? Do you know if they are thinking about funding a platform for something like this at the NIH?

A:
With respect to: "Who would have the ability to make something like this get started and catch on?"
Maybe PLOS.

With respect to: "Do you know if they are thinking about funding a platform for something like this at the NIH?"
Don’t know.

Permission to use images

Q:
I have been using the Genboree exceRpt workflow, and loving it! It has saved me so much time! Your paper got me on to it, and I would like to use one of the figures (1) of the exceRpt pipeline in my PhD thesis. Am I right to contact you to request permission? Or should I be heading to Cell for this?

A:
fine w/ me – just acknowledge us (see
https://sites.gersteinlab.org/permissions/)