Recently, I read a paper published in Cell, titled "Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences". Because my research topic is similar to that of this paper, I have one question about Figure 2B. In this heatmap, I see a total of 80 motifs along the bottom but only 70 rows above them, so I was a little confused: how did you know that the ETS motif matched the marked row?
The rows in the figure correspond to different cancer cohorts or meta-cohorts. We also provide information on the cancer cohorts with significant differential burdening in Supplement 1 of the paper.
I was trying to download a subset of data from your recent paper (https://www.cell.com/cell/fulltext/S0092-8674(20)30113-6). However, the website is returning a ‘not found’ error (http://pcawg.gersteinlab.org/). I am especially interested in ‘Gene list categories’. Therefore, I kindly request that you share the relevant files listed under ‘Gene List Categories’ on the website, so that I can use them in my analysis.
The website works fine for me. Are you sure it doesn’t work? … Please let me know which specific file you are trying to download.
Thanks a lot for the reply.
I need the gene list categories listed under PCAWG-specific annotations (http://pcawg.gersteinlab.org/#Annotations):
Immune Response Genes
DNA Repair Genes
Cancer Pathway Genes
Cell Cycle Genes
For some reason, when I click on the link, it directly downloads an HTML file containing an error. It would be great if you could share these files.
You can download relevant files from the link listed below.
I am writing with a question about the cQTL analysis in Wang et al. 2018. Were the 292 individuals analyzed in this analysis all of European ancestry? If not, what were the sample sizes for European vs non-European ancestry, and how did you control for ancestry in your analysis?
I apologize for writing with such a detailed question, but I could not find the answer in the main text or supplement of the paper, or on the Synapse website. (Context: I am interested in cross-population genetic analyses of psychiatric disease and wondering if PsychENCODE cQTL data is relevant.)
In calculating the cQTLs, we used 173 Caucasians and 119 non-Caucasians. To control for ancestry, we used the top three genotype principal components as covariates.
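For readers unfamiliar with this kind of adjustment, here is a minimal sketch of using the top three genotype principal components as covariates in a per-SNP regression. It is an illustration only, not the pipeline used in the paper: the function names, the simulated genotype matrix, the simulated phenotype, and the choice of ordinary least squares are all assumptions made for the example.

```python
import numpy as np


def top_genotype_pcs(genotypes, n_pcs=3):
    """Return principal-component scores (samples x n_pcs) of a genotype matrix."""
    # Center each SNP column so the PCs capture variation between samples.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: the columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snp_effect_with_covariates(phenotype, snp, covariates):
    """Fit phenotype ~ intercept + snp + covariates by OLS; return the SNP coefficient."""
    design = np.column_stack([np.ones(len(snp)), snp, covariates])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef[1]


# Simulated data (hypothetical): 292 samples, 50 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(292, 50)).astype(float)
pcs = top_genotype_pcs(genotypes, n_pcs=3)

# Hypothetical molecular phenotype, e.g. a normalized signal at one region.
phenotype = 0.5 * genotypes[:, 0] + rng.normal(size=292)
beta = snp_effect_with_covariates(phenotype, genotypes[:, 0], pcs)
print(f"SNP effect adjusted for 3 genotype PCs: {beta:.3f}")
```

In real data the principal components would be computed from genome-wide genotypes rather than from a handful of simulated SNPs, and a dedicated QTL tool would normally perform the regression.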
I was trying to reproduce the DTE results described in the paper "Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder". I am a registered user of Synapse but was unable to find the data mentioned below and would really appreciate your help in obtaining it.
The supplementary methods of this paper mention the different covariates used for carrying out DGE and DTE using the nlme package. Would it be possible to obtain the seqPC and SV values, particularly the seqPCs (1-3, 5-8, 10-14, 16, 18-25, 27-29) and SVs (1-4) used in the lme model?
Additionally, could I obtain the final list of sample IDs that made it to the DGE/DTE analysis?
See the seqPCs we used in our analysis (attached)
Since I am working on pseudogene identification for my new project, I have been using your pipeline. However, I am getting a few errors, which I describe below. Could you please help me resolve them? I would be very grateful.
> 1. On terminal:
> sudo bash pseudopipe.sh ~/pgenes/ppipe_output/caenorhabditis_elegans_62_220a ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa ~/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
> Making directories
> Copying sequences
> Fomatting the DNAs
> Preparing the blast jobs
> Skipping blast
> Processing blast output
> Skipping the processing of blast output
> Running Pseudopipe on both strands
> Working on M strand
> Finished Pseudopipe on strand M
> Working on P strand
> Finished Pseudopipe on strand P
> Generating final results
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/minus/pgenes’: No such file or directory
> find: ‘/home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/plus/pgenes’: No such file or directory
> gzip: /home/kashmir/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/pgenes/*/pgenes/*.all.fa: No such file or directory
> Finished generating pgene full alignment
> Finished running Pseudopipe
> 2. In log file inside minus and plus folder:
> need to document overlap parameter (30) and dependency on mask array files.
> mask fields [2, 3]
> Traceback (most recent call last):
> File "/home/kashmir/SOFTWARE/pgenes/pseudopipe/core/filterEnsemblGene.py", line 60, in <module>
> maskFile = openOrFail(ExonMaskTemplate % chr, 'r')
> TypeError: not all arguments converted during string formatting
> running filterEnsemblGene.py
> failed during filterEnsemblGene.py stage.
From the output, it looks like you had a couple of issues, starting with the blast job.
Could you please check the blast/output folder in your output directory for any split000*.Out files (where * is a number)? If you don’t see any output files, it means that your blast job did not run. In order to run the pipeline, you need to have a couple of additional software packages installed and preferably added to the path. Specifically, you will need blast-2.2.13 and fasta-35.1.5. If you do not want to add them to the path, you can add the path to their location in the env.sh file, which you can find in the bin folder of PseudoPipe.
This should allow you to run the pipeline without any issues.
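The checks described above can be scripted. The sketch below is a hypothetical helper, not part of PseudoPipe: it lists the split000*.Out files under blast/output, and it checks that a path template contains exactly one %s placeholder. In Python, the message "not all arguments converted during string formatting" is raised when a value is formatted into a string that has no placeholder left to receive it; whether that is what happened in the log above cannot be determined from the output alone.

```python
import glob
import os


def find_blast_outputs(output_dir):
    """List split000*.Out files in <output_dir>/blast/output (empty if blast did not run)."""
    pattern = os.path.join(output_dir, "blast", "output", "split000*.Out")
    return sorted(glob.glob(pattern))


def template_has_one_placeholder(template):
    """Return True if the path template contains exactly one '%s' and no other '%'."""
    return template.count("%s") == 1 and template.count("%") == 1


def fill_template(template, chromosome):
    """Substitute a chromosome name into the template, failing with a clear message."""
    if not template_has_one_placeholder(template):
        raise ValueError("template needs exactly one %%s placeholder: %r" % (template,))
    return template % chromosome
```

For example, calling `find_blast_outputs` on the output directory given as the first argument to pseudopipe.sh returns an empty list when the blast step produced nothing, and `fill_template("chr%s_exLocs", "I")` returns `"chrI_exLocs"`.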
I am … developing an application that matches cancer patients to treatment based on the person’s genetic profile. We are looking for an updated list of human pseudogenes to use in evaluating submitted DNA variants. Can you tell me if the Pseudo Fam data files at the pseudogene.org website are still being updated? If not, perhaps you could recommend an alternate source?
It is best to get an updated list of pseudogenes from pseudogene.org, which is continually updated, i.e., http://pseudogene.org/Human/. Yucheng
I just read your paper mentioned above. I work in the area of
computational reproducibility so the paper was pretty interesting to
read. However, I stumbled a bit over one of your concluding remarks. You write:
"One useful tactic may be detailed sampling: perhaps it is best for the
editor to organize a system wherein, randomly, referees are asked to
review samples in greater detail to ensure the overall quality of the
supplements without quickly overwhelming the peer review system."
I am not sure whether I understood correctly how this could be
implemented. Does it mean that the editor randomly asks one of the
reviewers to look at the supplements, or do all reviewers look at
subsets of supplements? I find this idea pretty interesting and was
wondering whether you have published further articles on this topic?
With respect to: "Does it mean that the editor randomly asks one of the reviewers to look at the supplements, or do all reviewers look at subsets of supplements?"
-> The former
With respect to: "I find this idea pretty interesting and was wondering whether you have published further articles on this topic?"
-> Not exactly, but you might find the following related work useful:
I’m in need of one of your published articles:
Role of non-coding sequence variants in cancer
I would much appreciate it if you could kindly send me a PDF copy of your published article for personal reading.
I saw your paper "Structuring supplemental materials in support of reproducibility" and appreciate your points. I would love to see a forum (like GATK’s forum or StackOverflow) where each topic for a conversation thread is a single published paper. Then everyone who is trying to replicate results could post their questions and authors their answers for all to see. I think this would be much better than the current closed system of emailing the authors. I would love to see a day when a link to a forum is provided on papers, rather than the authors’ email addresses. Who would have the ability to make something like this get started and catch on? Do you know if they are thinking about funding a platform for something like this at the NIH?
With respect to: "Who would have the ability to make something like this get started and catch on?"
With respect to: "Do you know if they are thinking about funding a platform for something like this at the NIH?"
I have been using the Genboree exceRpt workflow, and loving it! It has saved me so much time! Your paper got me on to it, and I would like to use one of the figures (1) of the exceRpt pipeline in my PhD thesis. Am I right to contact you to request permission? Or should I be heading to Cell for this?
fine w/ me – just acknowledge us (see