[DOCKTESTERS] Help understanding the small discrepancies between the Sanger pipeline and the GNOS VCF (99.9905% accuracy)

Francis Ouellette francis at oicr.on.ca
Mon Dec 12 07:38:46 EST 2016


I know I'm not supposed to be here (and I'm not :-), but there is one slippery slope I want this Dockstore testing working group to be wary of (Christina, this is really directed at you, since you are chairing the discussion today). Lincoln's request that this group reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain every discrepancy we see. I don't think we ever saw that kind of data from the people who ran the original workflow.

If this group can ascertain that a Dockstore container basically works, I think we should call that test a success and move on to the next one. What Miguel is suggesting/asking below is very good, but I can see it turning into a very slippery slope, and I would advise us against slipping down it.

Anyway, off to my day off,

Have a great discussion,

Francis

--
B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette

On Dec 12, 2016, at 05:44, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Dear all,

I was wondering if someone here is acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems to use EM (expectation-maximization), but I didn't find anything random in it, such as a random initialization, which was my first guess. The other thing I thought of is that when it splits the work for parallel processing, it might choose a different number of splits depending on the number of CPUs, and that this might affect the calculations.
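One way a different number of splits could change results even with no randomness at all is floating-point non-associativity: adding the same numbers in differently sized chunks can round differently. The sketch below is a generic Python illustration of that hypothesis only, not code from the Sanger workflow; `chunked_sum` is a hypothetical helper and the input values are made up to make the effect visible.

```python
import math


def chunked_sum(values: list[float], n_chunks: int) -> float:
    """Sum `values` the way a naive parallel split might (hypothetical helper):
    cut the list into contiguous chunks, sum each chunk left to right,
    then add the partial sums left to right."""
    size = math.ceil(len(values) / n_chunks)
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:
            acc += v  # every addition rounds to the nearest float64
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# Made-up inputs; mathematically they sum to exactly 2.0.
values = [1.0, 1.0, 1e17, -1e17]

for n in (1, 2, 4):
    print(n, "chunk(s):", chunked_sum(values, n))
print("exactly rounded:", math.fsum(values))
```

With one or four chunks the small values are absorbed by `1e17` and the result is `0.0`; with two chunks they are summed together first and the result is `2.0`, which `math.fsum` confirms is the exact answer. If a difference of this kind reached a score sitting close to a calling threshold, a borderline variant could plausibly flip in or out, but whether that happens in the Sanger pipeline would have to be checked in its code.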

Is there someone here who could help shed some light? As soon as some other tests finish I'll run the process again, but since it takes so long, a little insight beforehand would help.

Best regards

Miguel



On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
Dear all,

The Sanger pipeline completed for donor DO50311 after about 2 weeks of computing.

The results are the following:

Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1
    - Example: Y:58885197:G
Missing: 14
    - Example: 1:102887902:T,1:143165228:G,16:87047601:C


The donor results for DKFZ yielded

Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0


In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here:

https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
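The Common/Extra/Missing counts above amount to set operations on variant identifiers. A minimal Python sketch, assuming each variant is keyed as CHROM:POS:ALT (as the examples such as `Y:58885197:G` suggest) and that both files are uncompressed VCF text; `variant_keys` and `compare_calls` are hypothetical names, not the script that produced the numbers in this thread.

```python
from typing import Iterable


def variant_keys(vcf_lines: Iterable[str]) -> set[str]:
    """Collect CHROM:POS:ALT keys from VCF data lines, skipping header lines.
    Multi-allelic ALT fields are split so each allele gets its own key."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare_calls(new: set[str], reference: set[str]) -> dict:
    """Common = called in both runs; Extra = only in the new run;
    Missing = only in the reference (e.g. the GNOS VCF)."""
    return {
        "Common": len(new & reference),
        "Extra": sorted(new - reference),
        "Missing": sorted(reference - new),
    }


# Usage with real files (file names are hypothetical):
# with open("rerun.vcf") as a, open("gnos.vcf") as b:
#     result = compare_calls(variant_keys(a), variant_keys(b))
#     print(result["Common"], len(result["Extra"]), len(result["Missing"]))
```

Keeping Extra and Missing as sorted lists rather than bare counts makes it easy to print a few example keys, as in the reports above.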


Best regards

Miguel

