Question about ‘Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts’

Q:
I’ve read your paper, Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts, published in BMC Medical Genomics, and I’m very interested in your program and results. Could you share your program with me? I’m new to the blockchain field and an independent researcher, so I hope you can share the program and give me some advice.

A:
The GitHub link is in the paper, and you can also find it below:

https://github.com/gersteinlab/idash19bc

It’s freely available to everyone, so you can share it with anyone you like.

Conformation Explorer

Q:
I’m trying to use Conformation Explorer.

I submitted a job, but I seem to be getting a proxy error. Is this server still working?

Server message:
The job ce199231-3887 is not yet completed. Please check again a few minutes to a day from the time of your initial submission.

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /cgi-bin/morph.cgi.

Reason: Error reading from remote server

Here’s the URL and job ID: http://molmovdb.org/cgi-bin/morph.cgi?ID=ce199231-3887&type=choosehinge

Server message:
The job ce199231-3887 is not yet completed. Please check again a few minutes to a day from the time of your initial submission. If this job is older than a day or so, please contact the server administrator. Email: Mark.Gerstein _at_ yale.edu

I wanted to know how the conformation of a protein changes due to ligand addition.

A:
Unfortunately, the server is experiencing problems and we do not have a developer maintaining it right now. All files generated for your submission are in http://www.molmovdb.org/uploads/ce199231-3887/. The PDB movie file appears to consist of a single frame when viewed in VMD, so I do not think the server successfully produced a movie.

Running FunSeq2 through the online application

Q:
I’m trying to run FunSeq2 through the online application. I’ve tried a few times over the past few days, but always get this error message: "Sorry, but the requested page is unavailable due to a server hiccup." Can you please advise?

I am still not able to run FunSeq2 online; I’m getting the same error message as before. I also downloaded the source scripts and can run the example input; however, it is extremely slow when I try to run >1,000 input SNPs (20hr+ on cluster computing). I noticed that Supplemental Table 5 says it can process 2,000 SNPs in 2 minutes. Is this for the online version only?

Do you have suggestions for working with the online version? I’ve tried a tab-delimited .bed file and a .bed file with double spaces between the positions and tabs between the alleles (as suggested by the ARVIN method), and neither works for me.

A:
I have double-checked the webserver and done some testing, and the server works as usual. However, I noticed that some input formatting can cause the error you saw. Please follow the format we suggested, and use a <tab> delimiter, not spaces. If you still have problems, could you share your input file so we can help figure out the problem?
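As a rough illustration of the formatting check described above, the following Python sketch flags lines that are separated by spaces rather than tabs. The five-column layout (chromosome, start, end, reference allele, alternative allele), the function name `find_format_problems`, and the `min_columns` parameter are assumptions made for illustration; please check the FunSeq2 documentation for the exact format the server expects.

```python
def find_format_problems(lines, min_columns=5):
    """Return (line_number, message) pairs for lines that look malformed.

    Assumes a tab-delimited layout of chrom, start, end, ref, alt.
    min_columns is a hypothetical parameter, not a FunSeq2 setting.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < min_columns:
            # Enough whitespace-separated tokens suggests spaces were used.
            if len(line.split()) >= min_columns:
                problems.append((number, "columns separated by spaces, not tabs"))
            else:
                problems.append(
                    (number, f"expected at least {min_columns} tab-separated columns")
                )
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((number, "start and end must be integers"))
    return problems
```

Running it over the lines of an input file before uploading (for example `find_format_problems(open("input.bed"))`, with a hypothetical file name) should return an empty list for a cleanly tab-delimited file.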

However, since the webserver is based on an old version of FunSeq2, we recommend that you use our latest version. We have also prepared pre-calculated whole-genome scores for hg19 and hg38 (liftover). You just need to download the scores and query them with tabix and a BED file. For details, please refer to http://funseq2.gersteinlab.org/downloads
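A minimal sketch of such a query, assuming the downloaded score file is bgzip-compressed with a tabix index next to it and that the `tabix` executable is on the PATH. The file name `scores.bed.gz` and the helper names are hypothetical. BED intervals are 0-based and half-open, whereas tabix region strings are 1-based and inclusive, hence the `start + 1`.

```python
import subprocess


def bed_to_region(chrom, start, end):
    """Convert a 0-based, half-open BED interval to a 1-based tabix region."""
    return f"{chrom}:{start + 1}-{end}"


def tabix_command(score_file, chrom, start, end):
    """Build the argument list for: tabix <file> <region>."""
    return ["tabix", score_file, bed_to_region(chrom, start, end)]


def query_scores(score_file, chrom, start, end):
    """Run tabix and return the matching lines (requires tabix to be installed)."""
    result = subprocess.run(
        tabix_command(score_file, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, `query_scores("scores.bed.gz", "chr1", 99, 100)` would ask tabix for records overlapping 1-based position 100 on chr1. Whether chromosome names carry a `chr` prefix depends on the downloaded file.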

Peak-call table?

Q:
I enjoyed reading your recent STARRPeaker paper. However, when I tried to download Additional file 2: Supplementary Table S1 to get the peak calls, it returned the same PDF of the figure supplement as Additional file 1. I’d be grateful if you or a colleague would send me the table or point to where I might download it.

A:
Please see the Supplementary Table 1 file. We are working with Genome Biology to get it corrected. FYI, you can also download BED format peaks (same as Suppl. table S1) from the ENCODE project website (http://bit.ly/whg-starr-seq).

Genodock

Q1:
We are running some docking and would like to use Genodock for some experiments too. Is it something that we can install on our own servers? Who should we talk to for more details? I have also noticed that the server is currently down. We would even like to run it in our cloud environment for some more intensive computation. Is there a GitHub repository or any documentation for it? Or would you happen to know more about it?

A1:
The webserver is in a hung state at the moment, but it will be back up shortly.

Q2:
We are trying out Genodock on http://genodock.molmovdb.org/calculation/0. But unfortunately none of the models worked.

A2:
There is enough free disk space on the device, and nothing immediately stands out as to why the print(….) call at line 77 should fail. Nothing in the user history indicates that any other service should be running.

I restarted it and it now works. See below for details:

The error was probably caused by the ‘print’ function, which could not find a stdout handle to write to. I have restarted the server with this command:

nohup python3.6 manage.py runserver 0.0.0.0:80 >nohup.out 2>&1 </dev/null &

This should match how the server was previously started:
root 25966 25965 0 2020 ? 00:00:00 python3.6 manage.py runserver 0.0.0.0:80
root 25968 25966 2 2020 ? 1-05:19:32 /usr/bin/python3.6 manage.py runserver 0.0.0.0:80

Hopefully the combination of nohup, the redirects, and background execution (not all of which may have been in place before) will prevent this problem from happening again. You probably know this already, but just FYI: the way the server was started (and has now been restarted) may not be fully production-grade:
https://docs.djangoproject.com/en/2.2/ref/django-admin/#runserver

STARRPeaker publication in Genome Biology — missing Supplemental Table 1

Q:
I recently read your STARRPeaker publication in Genome Biology. The STARR-seq technology is very interesting to me, and I thoroughly enjoyed your paper. In considering my own experiments, I was looking through the supplemental data and noticed that the link for Supplemental Table 1 points to a copy of the PDF of the supplemental figures (S1-S13; it’s a new file but contains the same 13 supplemental figures). Thus, Supplemental Table 1 is completely missing from the Genome Biology site. Could you or one of your co-authors forward a copy of that table to me?

A:
We will contact Genome Biology to get it corrected.

The real cost of sequencing: higher than you think!

Q:
In support of an editorial comment I’m working on with a colleague, I just read your paper: The real cost of sequencing: higher than you think!

It is right on the money…as far as the topic I’m interested in but is, obviously, dated (but was helpful for me nonetheless). Pubmed didn’t bring up a more recent revision of your paper…so I assume it hasn’t been updated.

Can you recommend any other, more recent papers or resources that address the issue of cost/price of sequencing? The NHGRI webpage on this topic is also helpful, but it was written in 2016 and does not address the post sequencing costs such as variant calling, annotation, and interpretation. Do you think your illustration of the proportion of costs in 2020 due to the different components of testing still holds true?

A:
See:
http://papers.gersteinlab.org/papers/costseq2
http://papers.gersteinlab.org/papers/dsg

Hi-C contact matrix from PsychEncode

Q:
I am trying to explore Hi-C contact matrices at 10kb and 40kb resolution from here http://resource.psychencode.org/. However, I could not find any document showing the genomic coordinates of bins for both columns and rows (two matrices with the size of 303642 x 13554, 75919 x 2259).

A:
I have resolved the issue now. The decompression didn’t work properly, so re-downloading the data solved the problem.

Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts

Q1:
I found your above article quite instructive for a venture I am planning to launch that will harness the value of cryptocurrency and blockchain technology to develop an ecosystem for the community of patients, providers, and researchers concerned with stem cell therapies. I am in the process of drafting a use case for the currency I want to issue. My question is, how would I be able to actually use some variant of your smart contract for my system?

A1:
The paper is published and the code (contract) is freely available on github – if that’s what you’re asking for. If you could be a little bit more specific, we might be able to help you better.

Q2:
I have access to your journal article. My question is, how can I actually implement your fastQuery technical solution? I am not an informatics specialist by any means. The other thing is that I am interested in developing algorithms that are useful for analysing laboratory and clinical data in this stem cell field. Maybe you can shed some light on how I might achieve this aim. And I guess you mean the code for the fastQuery solution? Sorry, I didn’t catch that before.

A2:
In https://github.com/gersteinlab/idash19bc, GeneDrugRepoV2.sol is the contract for the fastQuery solution mentioned in the paper. Charlotte and I wrote a short tutorial for lay readers on how to run smart contracts here: https://thegccontent.wordpress.com/2020/04/13/a-practical-ethereum-and-multichain-blockchain-tutorial/ The GitHub page also has information on how to run the implementation. Unfortunately, I am not knowledgeable about the data collected in clinical settings related to stem cells, so perhaps someone more specialized in that field could be more helpful to you.

Types in Pseudopipe output

Q:
I am using Pseudopipe and I am wondering about the different types in its output.
I looked into the script and found there are several types: GENE-SINGLE, PSSD, FRAG, GENE-MULT, and DUP. Could you explain the meaning of each type?

A:
From what I can see, you are looking at an intermediate result file, not the final output. The final output should contain only 3 biotypes: PSSD, DUP and FRAG.
PSSD indicates processed pseudogenes, DUP indicates duplicated pseudogenes, and FRAG indicates pseudogene loci to which we cannot assign a biotype (processed or duplicated) with certainty.
GENE-SINGLE and GENE-MULTI are intermediate biotype definitions. SINGLE means that the pseudogene locus contains only one exon (similar to processed pseudogenes), and MULTI means that the potential pseudogenic locus contains multiple exons (similar to duplicated pseudogenes).
If a proposed locus has over 95% sequence identity to the parent gene, covers over 95% of the parent gene sequence, and has no identifiable disablements associated with it, we initially refer to it as GENE-SINGLE or GENE-MULTI, respectively. If we find a polyA tail you might see PSSD|GENE-SINGLE, and in that case we will relabel that locus as a processed pseudogene. For very high similarity we tend to be conservative and not label the locus as a pseudogene. If subsequent searches turn up additional evidence (e.g. a polyA tail, truncations, etc.), we will relabel the locus as a pseudogene.
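The labelling rule described above can be sketched roughly as follows. This only illustrates the description in this answer and is not Pseudopipe's actual code; the `Locus` fields and the function name `provisional_label` are hypothetical, and returning `None` stands in for all the cases this answer does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locus:
    identity: float        # fraction of sequence identity to the parent gene
    coverage: float        # fraction of the parent gene sequence covered
    exon_count: int        # number of exons found at the locus
    has_disablement: bool  # any identifiable disablement (e.g. a truncation)
    has_polya: bool        # polyA tail detected


def provisional_label(locus: Locus) -> Optional[str]:
    """Return the intermediate label, or None if this rule does not apply."""
    high_similarity = locus.identity > 0.95 and locus.coverage > 0.95
    if not high_similarity or locus.has_disablement:
        return None  # covered by other rules not described in this answer
    label = "GENE-SINGLE" if locus.exon_count == 1 else "GENE-MULTI"
    # The answer only mentions the polyA case for single-exon loci.
    if locus.has_polya and label == "GENE-SINGLE":
        return "PSSD|GENE-SINGLE"  # later relabelled as a processed pseudogene
    return label
```

For instance, a single-exon locus with 99% identity and coverage, no disablements, and a polyA tail would receive the combined label, which matches the relabelling to a processed pseudogene mentioned above.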