[DOCKTESTERS] Broad PCAWG Tokens definition?
Alexander Buchanan
buchanae at ohsu.edu
Thu Oct 20 17:18:55 EDT 2016
Great, thanks Gordon.
FYI, I just finished a successful run of the tokens WDL with a relatively small BAM file. I had to rebuild the Docker image because the one hosted on Docker Hub was missing a source file. I filed a GitHub issue for that here: https://github.com/broadinstitute/pcawg/issues/10
A rebuild of the image seemed to do the trick. I had some issues building it (now resolved), probably because my build environment was different (Mac) and/or broken (mysterious network issues with Bioconductor.org).
Is there some expected input/output pair I should be using to validate that it’s working as expected?
I don’t have any particular feedback on the internals of the Docker images (pipette, single WDL task, etc.). My first impression was mostly confusion: looking at the repo, I didn’t know where to start. The docs mention Firecloud, GCE, make, etc., none of which match my environment. But after a day of poking around I figured it out. I guess that’s to be expected when opening up a new codebase on an unfamiliar and unreleased project.
Thanks again,
Alex Buchanan
From: Gordon Saksena <gsaksena at broadinstitute.org>
Date: Thursday, October 20, 2016 at 2:03 PM
To: Alexander Buchanan <buchanae at ohsu.edu>
Cc: "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: Broad PCAWG Tokens definition?
The tokens task is part of the PoN (Panel of Normals) filter. It is a Java program that collects stats on the Normal BAM for the current donor. These stats are later aggregated with stats from the other samples (in another docker), and then used to flag certain variants in a VCF as suspect (in a third docker). The overall algorithm is in the process of being published.
It should be one of the more straightforward algorithms to test: it has very predictable CPU time and RAM usage, and its outputs should be testable via an exact binary match. Its only input is the normal BAM.
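Since the tokens outputs are meant to be checked by exact binary match, a validation step can simply compare each output file byte-for-byte against a reference copy from a known-good run. The following is a minimal sketch, assuming such a reference directory exists (the thread does not name one); the function names `sha256_of` and `outputs_match` and the directory layout are hypothetical.

```python
import filecmp
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks so large outputs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(expected_dir: Path, actual_dir: Path) -> Dict[str, bool]:
    """Compare every file under expected_dir with the same relative path under actual_dir.

    Returns a mapping of relative path -> True if the two files are byte-for-byte identical.
    A file that is missing from actual_dir counts as a mismatch.
    """
    results = {}
    for expected in sorted(p for p in expected_dir.rglob("*") if p.is_file()):
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        # shallow=False forces a full content comparison instead of trusting os.stat() signatures
        results[str(rel)] = actual.is_file() and filecmp.cmp(expected, actual, shallow=False)
    return results
```

Publishing the SHA-256 digest of each reference output alongside the test BAM would let other testers check their runs without downloading the reference files themselves.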
I'm planning for later dockers to have a similar structure, though with increased memory and core requirements. The .wdl file will continue to contain a single task, with the bulk of the pipeline wiring embedded inside the docker. The dockers will accept either the source BAMs (for callers) or VCFs (for filters) as inputs. If you have feedback, I can incorporate it into the other dockers.
Gordon
On Thu, Oct 20, 2016 at 4:08 PM, Alexander Buchanan <buchanae at ohsu.edu> wrote:
Hey Gordon,
I’m new here so maybe I missed this, but what is the tokens task? How would you describe what it does and the results it produces?
Thanks!
Alex Buchanan