[DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)

Miguel Vazquez miguel.vazquez at cnio.es
Mon Dec 12 05:44:07 EST 2016


Dear all,

I was wondering if someone here is acquainted with the Sanger workflow and
could help explain these discrepancies. I've skimmed through the code, and
it seems to use EM, but I didn't find anything random in it, such as
during initialization, which was my initial guess. The other thing I thought
of is that when it splits the work for parallel processing it might choose a
different number of splits to accommodate the number of CPUs, and that this
might affect the calculations.
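
To make the second guess concrete, here is a minimal sketch of how the number
of splits alone can change a floating-point result. It is not taken from the
Sanger code; the function name chunked_sum and the toy values are assumptions
for illustration only. It relies on the fact that floating-point addition is
not associative, so summing the same numbers in differently sized chunks can
differ in the last bit.

```python
def chunked_sum(values, n_chunks):
    """Sum `values` by splitting them into `n_chunks` contiguous chunks,
    summing each chunk separately, then adding the partial sums in order.
    Hypothetical helper: it mimics a split/merge pattern, nothing more."""
    n = len(values)
    total = 0.0
    for i in range(n_chunks):
        # Contiguous chunk boundaries, as a scheduler might assign per CPU.
        start = i * n // n_chunks
        stop = (i + 1) * n // n_chunks
        partial = 0.0
        for v in values[start:stop]:
            partial += v
        total += partial
    return total


values = [0.1, 0.2, 0.3]

one_split = chunked_sum(values, 1)   # (0.1 + 0.2) + 0.3
two_splits = chunked_sum(values, 2)  # 0.1 + (0.2 + 0.3)

print(one_split, two_splits, one_split == two_splits)
```

If an iterative method such as EM accumulates sums like this, a difference in
the last bit could in principle move a borderline call across a threshold,
which would be consistent with a handful of discrepant variants. Whether that
is what happens here is still only a guess.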

Is there someone here that could help shed some light? As soon as some
other tests finish I'll be running the process again, but since it takes so
long perhaps a little insight would help.
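
For anyone who wants to reproduce the Common/Extra/Missing counts quoted
below, the comparison can be thought of as set operations on variant keys.
This is only a sketch of that idea, not the actual comparison script: the
function names are hypothetical, and I am assuming here that a key is
chromosome:position:ALT allele read from the standard VCF columns.

```python
def variant_keys(vcf_lines):
    """Collect 'CHROM:POS:ALT' keys from VCF text lines.
    Assumption: the third key field is the ALT allele (VCF column 5)."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        for allele in alt.split(","):  # one key per ALT allele
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(reference_keys, test_keys):
    """Common: in both; Extra: only in the test run; Missing: only in the reference."""
    return {
        "common": len(reference_keys & test_keys),
        "extra": sorted(test_keys - reference_keys),
        "missing": sorted(reference_keys - test_keys),
    }
```

Usage would be along the lines of
compare(variant_keys(open(gnos_vcf)), variant_keys(open(my_vcf))), where
gnos_vcf and my_vcf are placeholder paths to uncompressed VCF files.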

Best regards

Miguel



On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
wrote:

> Dear all,
>
> The Sanger pipeline completed, after about 2 weeks of computing, for donor
> DO50311.
>
> The results are the following:
>
> Comparison for DO50311 using Sanger
> ---
> Common: 156299
> Extra: 1
>     - Example: Y:58885197:G
> Missing: 14
>     - Example: 1:102887902:T,1:143165228:G,16:87047601:C
>
>
> The donor results for DKFZ yielded
>
> Comparison for DO50311 using DKFZ
> ---
> Common: 51087
> Extra: 0
> Missing: 0
>
>
> In both cases I'm comparing against the VCF file downloaded from GNOS. I've
> updated the information here
>
> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>
>
> Best regards
>
> Miguel
>
>