[DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)

Miguel Vazquez miguel.vazquez at cnio.es
Mon Dec 12 09:51:36 EST 2016


Hi Keiran,

No, I've not filtered for PASS, so these are all variants as far as I know.

Best

M

On Mon, Dec 12, 2016 at 3:48 PM, Keiran Raine <kr2 at sanger.ac.uk> wrote:

> Additionally, are these stats based on PASSED SUB variants?  A couple of
> the missing items are clear SNPs which would be filtered.
>
> Keiran Raine
> Principal Bioinformatician
> Cancer Genome Project
> Wellcome Trust Sanger Institute
>
> kr2 at sanger.ac.uk
> Tel: +44 (0)1223 834244 Ext: 4983
> Office: H104
>
> On 12 Dec 2016, at 14:12, Brian O'Connor <Brian.OConnor at oicr.on.ca> wrote:
>
> Hi Francis,
>
> I agree with you. I think Miguel is showing what this group needs to show:
> that someone else can run the tools from Dockstore successfully, and that
> the results largely agree with previous results (or duplicate runs). A
> statement about the possibility of stochastic results in the README for
> each tool would probably be sufficient. Keiran could craft or comment on
> this for Sanger’s pipeline, since he’s in the best position for this one.
>
> Brian
>
> On Dec 12, 2016, at 7:38 AM, Francis Ouellette <francis at oicr.on.ca> wrote:
>
> I know I'm not supposed to be there (and I'm not :-), but there is one
> slippery slope I want this Dockstore testing working group to be wary of
> (and Christina, this is really directed at you, since you are chairing the
> discussion today). Lincoln's request that this group reproduce what we are
> doing is fine, but I don't think it is this working group's task to
> reproduce and explain all of the discrepancies we see. I don't think we
> ever saw that kind of data from the people who ran the original workflow.
>
> If this group can ascertain that a Dockstore container basically works, I
> think we need to call that test a success and move on to the next one.
> What Miguel is suggesting/asking below is very good, but I could see this
> becoming a very slippery slope, and I would advise us against slipping
> down it.
>
> Anyway, going off to my day off,
>
> Have a great discussion,
>
> Francis
>
> --
> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>
> On Dec 12, 2016, at 05:44, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
> Dear all,
>
> I was wondering if someone here is acquainted with the Sanger workflow
> and could help explain these discrepancies. I've skimmed through the code,
> and it seems to use EM, but I didn't find anything random in it, such as
> in the initialization, which was my initial guess. The other thing I
> thought is that when it splits the work for parallel processing, it might
> choose a different number of splits to accommodate the number of CPUs,
> and that this might affect the calculations.
>
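A minimal sketch of the split-count hypothesis above, assuming (hypothetically) that per-chunk partial results are combined with ordinary floating-point addition. It does not show that this is what happens in the Sanger pipeline; it only illustrates that floating-point addition is not associative, so regrouping the same numbers into a different number of chunks can change the total. The function `chunked_sum` and the values in `VALUES` are illustrative and are not taken from the pipeline.

```python
from typing import Sequence

# Illustrative input: one large value, several small ones, then its negation.
VALUES = [1e16, 1.0, 1.0, 1.0, 1.0, -1e16]


def chunked_sum(values: Sequence[float], chunk_size: int) -> float:
    """Sum each contiguous chunk of `values`, then add the partial sums together.

    A hypothetical stand-in for splitting work across workers: changing
    `chunk_size` changes the grouping of the additions, not the inputs.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total


if __name__ == "__main__":
    # One chunk: each 1.0 is lost when added to 1e16 (spacing between doubles there is 2).
    print(chunked_sum(VALUES, 6))
    # Chunks of two: the middle 1.0 + 1.0 survives as 2.0 before meeting 1e16.
    print(chunked_sum(VALUES, 2))
```

With a single chunk the result is 0.0, while chunks of two give 2.0, even though the inputs are identical. If a caller's per-chunk statistics fed into an iterative method such as EM, small differences of this kind could in principle propagate into borderline calls, but that remains a guess until the code is checked.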
> Is there someone here who could help shed some light? As soon as some
> other tests finish I'll run the process again, but since it takes so
> long, a little insight would help.
>
> Best regards
>
> Miguel
>
>
>
> On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
> wrote:
> Dear all,
>
> The Sanger pipeline completed for donor DO50311 after about 2 weeks of
> computing.
>
> The results are the following:
>
> Comparison for DO50311 using Sanger
> ---
> Common: 156299
> Extra: 1
>    - Example: Y:58885197:G
> Missing: 14
>    - Examples: 1:102887902:T, 1:143165228:G, 16:87047601:C
>
>
> The DKFZ results for the same donor were:
>
> Comparison for DO50311 using DKFZ
> ---
> Common: 51087
> Extra: 0
> Missing: 0
>
>
> In both cases I'm comparing against the VCF file downloaded from GNOS. I've
> updated the information here:
>
> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>
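The Common/Extra/Missing counts above amount to a set comparison between two call sets. The sketch below is an assumption about how such a comparison might be done, not the script that produced these numbers. It keys each variant as CHROM:POS:ALT (guessed from example keys such as Y:58885197:G), applies no PASS filtering (consistent with the reply at the top of this thread), and treats "Extra" as calls found only in the re-run and "Missing" as calls found only in the GNOS VCF. The names `variant_keys`, `compare`, and `report` are hypothetical; reading gzipped files, normalising indels, and reconciling chromosome naming (1 vs chr1) are left out.

```python
from typing import Iterable, Set, Tuple


def variant_keys(vcf_lines: Iterable[str]) -> Set[str]:
    """Collect CHROM:POS:ALT keys from VCF data lines, ignoring the FILTER column."""
    keys: Set[str] = set()
    for line in vcf_lines:
        # Skip blank lines, meta-information (##) lines and the #CHROM header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        # A record may carry several comma-separated ALT alleles; key each one.
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(gnos: Set[str], rerun: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (common, extra, missing) relative to the GNOS call set."""
    common = gnos & rerun
    extra = rerun - gnos    # called only by the re-run
    missing = gnos - rerun  # present in the GNOS VCF but not in the re-run
    return common, extra, missing


def report(donor: str, workflow: str, gnos: Set[str], rerun: Set[str]) -> str:
    """Format counts in the same layout as the summaries in this message."""
    common, extra, missing = compare(gnos, rerun)
    return "\n".join([
        f"Comparison for {donor} using {workflow}",
        "---",
        f"Common: {len(common)}",
        f"Extra: {len(extra)}",
        f"Missing: {len(missing)}",
    ])


if __name__ == "__main__":
    # Toy records for illustration only; they are not from DO50311.
    header = ["##fileformat=VCFv4.1", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    gnos_vcf = header + ["1\t100\t.\tA\tT\t.\tPASS\t.", "1\t200\t.\tC\tG\t.\tPASS\t."]
    rerun_vcf = header + ["1\t100\t.\tA\tT\t.\tPASS\t.", "2\t300\t.\tG\tA\t.\tPASS\t."]
    print(report("TOY", "toy-workflow", variant_keys(gnos_vcf), variant_keys(rerun_vcf)))
```

Keying on strings keeps the output directly comparable to the example keys quoted in this thread; whether REF or ALT is the base that follows the position in those keys is not stated, so the key function would need adjusting if the original comparison used REF.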
>
> Best regards
>
> Miguel
>
>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters
>
>
>
>
> -- The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a company
> registered in England with number 2742969, whose registered office is 215
> Euston Road, London, NW1 2BE.
>