[DOCKTESTERS] Summary of validation of Dockers Sanger and DKFZ(+Delly)

Junjun Zhang Junjun.Zhang at oicr.on.ca
Mon Feb 6 10:24:28 EST 2017


Hi Matthias,

Thanks for the clarification.

We discussed this donor (DO51087) during today's tech call.
Lincoln suggested asking you to help review the QC metrics of this donor;
if appropriate, this donor may be added to the exclusion list.

Just an FYI, you can find more information about this donor here:
https://docs.google.com/spreadsheets/d/126V4Dke1IvfVZqHLvZPUUeo7PO1Hi8jg8QhxiIDOFR4/edit#gid=1654136615
(search for DO51087).

Can you please help with this?

Thanks,
Junjun




On 2017-02-06, 1:28 AM, "Schlesner, Matthias"
<m.schlesner at Dkfz-Heidelberg.de> wrote:

>Hi Junjun,
>
>This sample has an extreme OxoG (8-oxoguanine) signal, which our filter
>could not remove completely. Hence a huge number of artifacts remain,
>which blow up the file size.
>
>Best,
>Matthias
>
>
>Dr. Matthias Schlesner
>Division Theoretical Bioinformatics (B080)
>Head of Computational Oncology Group
>
>German Cancer Research Center (DKFZ)
>Foundation under Public Law
>Im Neuenheimer Feld 280
>69120 Heidelberg
>Germany
>office: Berliner Str. 41 (Mathematikon), room 02.MB.116
>phone: +49 6221 42-2720
>fax:      +49 6221 42-3626
>
>m.schlesner at dkfz.de
>www.dkfz.de
>
>Management Board: Prof. Dr. Michael Baumann, Prof. Dr. Josef Puchta
>
>VAT-ID No.: DE143293537
>
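Matthias's explanation above (an extreme OxoG signal that the filter could not fully remove) suggests a quick check that can be run on any call set. OxoG (8-oxoguanine) artifacts typically appear as G>T substitutions, i.e. C>A on the opposite strand, so an unusually high share of that class among SNVs is a crude warning sign. The sketch below is a hypothetical helper, not part of the DKFZ pipeline or its OxoG filter; it assumes a standard tab-separated VCF and ignores the read-orientation (strand) bias that dedicated OxoG filters rely on.

```python
import gzip
from collections import Counter

# Base complements used to fold purine REF alleles onto the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_counts(lines):
    """Count single-base substitutions by pyrimidine-normalised class (e.g. C>A)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF meta and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a valid data line
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):
            if ref not in COMPLEMENT or alt not in COMPLEMENT:
                continue  # SNVs only; MNVs, indels and symbolic alleles are skipped
            if ref in ("A", "G"):
                # G>T on one strand is C>A on the other, so both land in "C>A"
                ref_n, alt_n = COMPLEMENT[ref], COMPLEMENT[alt]
            else:
                ref_n, alt_n = ref, alt
            counts[f"{ref_n}>{alt_n}"] += 1
    return counts


def oxog_like_fraction(counts):
    """Share of SNVs in the C>A (= G>T) class; 0.0 for an empty call set."""
    total = sum(counts.values())
    return counts["C>A"] / total if total else 0.0


def scan_vcf(path):
    """Run substitution_counts over a gzipped (or bgzipped) VCF such as *.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return substitution_counts(handle)


# Toy records (not real data): one G>T, one C>A, one C>T and one deletion.
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tG\tT\t.\tPASS\t.\n",
    "1\t200\t.\tC\tA\t.\tPASS\t.\n",
    "1\t300\t.\tC\tT\t.\tPASS\t.\n",
    "1\t400\t.\tAT\tA\t.\tPASS\t.\n",
]
counts = substitution_counts(demo)
print(dict(counts), round(oxog_like_fraction(counts), 3))
```

For DO51087 this would mean calling `scan_vcf` on the downloaded somatic SNV/MNV VCF. The thread does not say what fraction would count as abnormal, so any threshold is a judgment call.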
>From: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
>Date: Monday, February 6, 2017 at 12:52 AM
>To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>,
>Miguel Vazquez <miguel.vazquez at cnio.es>
>Cc: docktesters at lists.icgc.org, joachim.weischenfeldt at bric.ku.dk,
>"Schlesner, Matthias" <m.schlesner at Dkfz-Heidelberg.de>
>Subject: Re: [DOCKTESTERS] Summary of validation of Dockers Sanger and
>DKFZ(+Delly)
>
>Hi Miguel and Jonas,
>
>I hope the DKFZ pipeline authors (cc'd here) will be able to figure out
>the differences in the calls for DO52140.
>
>Here I have another donor: DO51087. The somatic SNV/MNV call set from DKFZ
>is surprisingly large: the gzipped VCF file is over 500 MB. You can find
>more information about the file here:
>https://dcc.icgc.org/repositories/files/FI500885. You can verify that in
>GNOS as well:
>https://gtrepo-dkfz.annailabs.com/cghub/metadata/analysisFull/e1e9062e-35e6-447d-bc61-591e76fbeee0.
>
>Matthias, can you please take a look at the VCF? Hopefully you will be
>able to spot something abnormal there.
>
>Miguel/Jonas, if you plan to test more donors for the DKFZ pipeline,
>could you please include this donor? The tumour aligned BAM is
>https://dcc.icgc.org/repositories/files/FI37278 and the normal aligned
>BAM is https://dcc.icgc.org/repositories/files/FI37277.
>
>Thanks,
>Junjun
>
>
>
>From: docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org
>on behalf of Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
>Date: Saturday, February 4, 2017 at 10:14 AM
>To: Miguel Vazquez <miguel.vazquez at cnio.es>
>Cc: docktesters at lists.icgc.org
>Subject: Re: [DOCKTESTERS] Summary of validation of Dockers Sanger and
>DKFZ(+Delly)
>
>Hi Miguel,
>
>The comparison was indeed run largely using your scripts.
>I didn't notice any missteps, but you never know, of course.
>I hope they can pinpoint the issue.
>
>Cheers,
>Jonas
>
>
>On 3 Feb 2017, at 19:06, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
>Excellent Jonas, this is very useful info.
>
>I guess you are using my own scripts for this. The possibility remains
>that they contain a misstep regarding Delly. Let's see what comes out of
>the checks by our friends at DKFZ.
>
>Best regards
>
>Have a great weekend
>
>Miguel
>
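Miguel mentions his own comparison scripts above; they are not included in this thread, but the Common/Extra/Missing counts in the reports below come from them. A minimal sketch of that kind of comparison is a plain set difference, assuming each call is keyed as CHROM:POS:REF:ALT (the format of the examples in the reports, e.g. 10:6881937:A:T), that "Extra" means present only in the Docker output and that "Missing" means present only in the original call set. All function names are hypothetical, and the "File not found" branch only mimics the message seen for the germline CNV files.

```python
import gzip
import os


def variant_keys(path):
    """Collect CHROM:POS:REF:ALT keys from a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 5:
                chrom, pos, _id, ref, alt = fields[:5]
                keys.add(f"{chrom}:{pos}:{ref}:{alt}")
    return keys


def summarize(original, docker):
    """Return (common, extra, missing) for two sets of variant keys."""
    return original & docker, docker - original, original - docker


def compare(label, original_path, docker_path, n_examples=3):
    """Print a Common/Extra/Missing report; return the three counts, or None."""
    for path in (original_path, docker_path):
        if not os.path.exists(path):
            print(f"File not found {path}")
            return None
    common, extra, missing = summarize(variant_keys(original_path),
                                       variant_keys(docker_path))
    print(f"Comparison of {label}\n---")
    print(f"Common: {len(common)}")
    for name, keys in (("Extra", extra), ("Missing", missing)):
        print(f"{name}: {len(keys)}")
        if keys:
            print("    - Example: " + ",".join(sorted(keys)[:n_examples]))
    return len(common), len(extra), len(missing)


# Toy key sets (not real data) to show the three-way split.
original = {"1:100:A:G", "1:200:C:T", "2:300:G:A"}
docker = {"1:100:A:G", "1:200:C:T", "3:400:T:C"}
print([len(s) for s in summarize(original, docker)])
```

A real run would look like `compare("somatic.snv.mnv for DO50311 using DKFZ", original_vcf, docker_vcf)`, where both paths are placeholders for the original and regenerated VCFs.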
>On Feb 3, 2017 6:10 PM, "Jonas Demeulemeester"
><Jonas.Demeulemeester at crick.ac.uk> wrote:
>Dear all,
>
>For the DKFZ (+Delly) workflow as well, I can confirm Miguel's results on
>samples DO52140 and DO50311.
>The dockerised pipelines return identical calls for SNV.MNVs and indels,
>but partly different ones for SVs and CNVs, independent of the
>infrastructure.
>
>Best regards,
>Jonas
>
>
>
>
>
>RESULTS DO52140
>---------------
>
>Comparison of somatic.sv for DO52140 using DKFZ
>---
>Common: 72
>Extra: 23
>    - Example: 10:132840774:N:<DEL>,11:38252019:N:<TRA>,11:47700673:N:<TRA>
>Missing: 61
>    - Example: 10:134749140:N:<DEL>,11:179191:N:<TRA>,11:38252005:N:<TRA>
>
>
>Comparison of germline.sv for DO52140 using DKFZ
>---
>Common: 1108
>Extra: 1116
>    - Example: 10:102158308:N:<DEL>,10:104645247:N:<DEL>,10:105097522:N:<DEL>
>Missing: 2908
>    - Example: 10:100107032:N:<TRA>,10:100107151:N:<TRA>,10:102158345:N:<DEL>
>
>
>Comparison of somatic.snv.mnv for DO52140 using DKFZ
>---
>Common: 37160
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.snv.mnv for DO52140 using DKFZ
>---
>Common: 3833896
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.indel for DO52140 using DKFZ
>---
>Common: 19347
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.indel for DO52140 using DKFZ
>---
>Common: 706572
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.cnv for DO52140 using DKFZ
>---
>Common: 275
>Extra: 94
>    - Example: 1:106505931:N:<LOH>,1:109068899:N:<DEL>,1:109359995:N:<DEL>
>Missing: 286
>    - Example: 10:88653561:N:<LOH>,11:179192:N:<LOH>,11:38252006:N:<LOH>
>
>
>
>
>RESULTS DO50311
>---------------
>
>Comparison of somatic.sv for DO50311 using DKFZ
>---
>Common: 231
>Extra: 44
>    - Example: 10:20596800:N:<TRA>,10:56066821:N:<TRA>,11:16776092:N:<TRA>
>Missing: 48
>    - Example: 10:119704959:N:<INV>,10:13116322:N:<TRA>,10:47063485:N:<TRA>
>
>
>Comparison of germline.sv for DO50311 using DKFZ
>---
>Common: 1393
>Extra: 231
>    - Example: 10:134319313:N:<DEL>,10:134948976:N:<DEL>,10:19996638:N:<DEL>
>Missing: 615
>    - Example: 10:101851839:N:<TRA>,10:101851884:N:<TRA>,10:10745225:N:<DUP>
>
>
>Comparison of somatic.snv.mnv for DO50311 using DKFZ
>---
>Common: 51087
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.snv.mnv for DO50311 using DKFZ
>---
>Common: 3850992
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.indel for DO50311 using DKFZ
>---
>Common: 26469
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.indel for DO50311 using DKFZ
>---
>Common: 709060
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.cnv for DO50311 using DKFZ
>---
>Common: 731
>Extra: 213
>    - Example: 10:132510034:N:<DEL>,10:20596801:N:<NEUTRAL>,10:47674883:N:<NEUTRAL>
>Missing: 190
>    - Example: 10:100891940:N:<NEUTRAL>,10:104975905:N:<NEUTRAL>,10:119704960:N:<NEUTRAL>
>
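Miguel's remark above that the scripts might contain a misstep regarding Delly can be probed with the DO52140 somatic SV examples above: the Extra example 11:38252019:N:<TRA> and the Missing example 11:38252005:N:<TRA> are only 14 bp apart. If the comparison keys SVs on exact positions (an assumption; the scripts are not shown in this thread), such near-identical breakpoints are counted once as Extra and once as Missing. The sketch below swaps in a simple window-based matching, which is not what the original scripts do; the function names and the 50 bp default window are hypothetical choices, and the partner breakpoint of each SV is ignored.

```python
def parse_key(key):
    """Split a CHROM:POS:REF:ALT key, as printed in the reports, into (chrom, pos, alt)."""
    chrom, pos, _ref, alt = key.split(":", 3)
    return chrom, int(pos), alt


def match_within_window(reference_keys, candidate_keys, window=50):
    """Greedily pair each candidate with the closest unused reference call that has
    the same chromosome and SV type and lies within `window` bp.
    Returns (matched pairs, unmatched candidates, unmatched reference calls).
    The O(n*m) scan is fine for a few thousand SVs."""
    unmatched_ref = sorted(parse_key(k) for k in reference_keys)
    matched, unmatched_cand = [], []
    for chrom, pos, alt in sorted(parse_key(k) for k in candidate_keys):
        best = None
        for i, (r_chrom, r_pos, r_alt) in enumerate(unmatched_ref):
            if r_chrom != chrom or r_alt != alt or abs(r_pos - pos) > window:
                continue
            if best is None or abs(r_pos - pos) < abs(unmatched_ref[best][1] - pos):
                best = i
        if best is None:
            unmatched_cand.append((chrom, pos, alt))
        else:
            matched.append(((chrom, pos, alt), unmatched_ref.pop(best)))
    return matched, unmatched_cand, unmatched_ref


# The three Extra and three Missing examples from the DO52140 somatic SV report.
extra_examples = ["10:132840774:N:<DEL>", "11:38252019:N:<TRA>", "11:47700673:N:<TRA>"]
missing_examples = ["10:134749140:N:<DEL>", "11:179191:N:<TRA>", "11:38252005:N:<TRA>"]

pairs, still_extra, still_missing = match_within_window(missing_examples, extra_examples)
print(len(pairs), len(still_extra), len(still_missing))
```

With these six examples only the 11:38252005/11:38252019 pair is rescued; whether this pattern accounts for a meaningful share of the 23 Extra and 61 Missing calls would require the full lists.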
>
>
>
>
>_________________________________
>Jonas Demeulemeester, PhD
>Postdoctoral Researcher
>The Francis Crick Institute
>1 Midland Road
>London
>NW1 1AT
>
>T: +44 (0)20 3796 2594
>M: +44 (0)7482 070730
>E: jonas.demeulemeester at crick.ac.uk
>W: www.crick.ac.uk
>
>
>
>On 26 Jan 2017, at 13:41, Jonas Demeulemeester
><Jonas.Demeulemeester at crick.ac.uk> wrote:
>
>Hi all,
>
>I can now confirm Miguel's results with the Sanger workflow on donors
>DO50311 and DO52140.
>The calls made by the dockerised version are identical for indels and SVs,
>with only small discrepancies for SNV_MNVs and CNVs.
>The discrepancies seem independent of the system infrastructure, as the
>numbers of missing/extra variants are the same as those Miguel reported
>previously (on DO52140).
>
>I've also updated the wiki page accordingly.
>
>Best regards,
>Jonas
>
>
>
>RESULTS - DO50311
>------
>
>
>Comparison of cnv for DO50311 using Sanger
>---
>Common: 138
>Extra: 0
>Missing: 0
>
>
>Comparison of indel for DO50311 using Sanger
>---
>Common: 812487
>Extra: 0
>Missing: 0
>
>
>Comparison of snv_mnv for DO50311 using Sanger
>---
>Common: 156313
>Extra: 0
>Missing: 0
>
>
>Comparison of sv for DO50311 using Sanger
>---
>Common: 260
>Extra: 0
>Missing: 0
>
>
>
>
>
>RESULTS - DO52140
>------
>
>
>Comparison of cnv for DO52140 using Sanger
>---
>Common: 36
>Extra: 0
>Missing: 2
>    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>
>
>
>Comparison of indel for DO52140 using Sanger
>---
>Common: 803986
>Extra: 0
>Missing: 0
>
>
>Comparison of snv_mnv for DO52140 using Sanger
>---
>Common: 87234
>Extra: 5
>    - Example: 1:23719098:G,12:43715930:A,20:4058335:A
>Missing: 7
>    - Example: 10:6881937:T,1:148579866:G,11:9271589:A
>
>
>Comparison of sv for DO52140 using Sanger
>---
>Common: 6
>Extra: 0
>Missing: 0
>
>
>
>
>
>
>For comparison, Miguel's report on DO52140:
>Report
>~~~~~~
>
>Comparison of somatic.cnv for DO52140 using Sanger
>---
>Common: 36
>Extra: 0
>Missing: 2
>    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>
>
>
>Comparison of somatic.indel for DO52140 using Sanger
>---
>Common: 803986
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.snv.mnv for DO52140 using Sanger
>---
>Common: 87234
>Extra: 5
>    - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A
>Missing: 7
>    - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A
>
>
>Comparison of somatic.sv for DO52140 using Sanger
>---
>Common: 6
>Extra: 0
>Missing: 0
>
>
>
>
>
>On 16 Jan 2017, at 14:24, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
>Dear all,
>
>Let me summarize the status of the testing for Sanger and DKFZ. The
>validation has been run on two donors for each workflow: DO50311 and DO52140.
>
>Sanger:
>----------
>
>Sanger calls only somatic variants. The results are identical for indels
>and SVs, and almost identical for SNV.MNV and CNV. The discrepancies are
>reproducible (at least on the same machine), i.e. the same ones are found
>after running the workflow a second time.
>
>DKFZ:
>---------
>DKFZ calls somatic and germline variants, except germline CNVs (hence the
>"File not found" messages for germline.cnv in the results below). For both
>germline and somatic variants, the results are identical for SNV.MNV and
>indels, but there are large discrepancies for SV and CNV.
>
>
>Kortine Kleinheinz and Joachim Weischenfeldt are, I believe, in the process
>of investigating this issue.
>
>BWA-Mem failed for me and has also failed for Denis Yuen and Jonas
>Demeulemeester. Denis, I believe, is investigating this problem further. I
>haven't had the chance to investigate it much myself.
>
>Best
>
>Miguel
>
>
>
>
>---------------------
>RESULTS
>---------------------
>
>ubuntu at ip-10-253-35-14:~/DockerTest-Miguel$ cat results.txt
>
>Comparison of somatic.snv.mnv for DO50311 using DKFZ
>---
>Common: 51087
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.indel for DO50311 using DKFZ
>---
>Common: 26469
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.sv for DO50311 using DKFZ
>---
>Common: 231
>Extra: 44
>    - Example: 10:20596800:N:<TRA>,10:56066821:N:<TRA>,11:16776092:N:<TRA>
>Missing: 48
>    - Example: 10:119704959:N:<INV>,10:13116322:N:<TRA>,10:47063485:N:<TRA>
>
>
>Comparison of somatic.cnv for DO50311 using DKFZ
>---
>Common: 731
>Extra: 213
>    - Example: 10:132510034:N:<DEL>,10:20596801:N:<NEUTRAL>,10:47674883:N:<NEUTRAL>
>Missing: 190
>    - Example: 10:100891940:N:<NEUTRAL>,10:104975905:N:<NEUTRAL>,10:119704960:N:<NEUTRAL>
>
>
>Comparison of germline.snv.mnv for DO50311 using DKFZ
>---
>Common: 3850992
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.indel for DO50311 using DKFZ
>---
>Common: 709060
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.sv for DO50311 using DKFZ
>---
>Common: 1393
>Extra: 231
>    - Example: 10:134319313:N:<DEL>,10:134948976:N:<DEL>,10:19996638:N:<DEL>
>Missing: 615
>    - Example: 10:101851839:N:<TRA>,10:101851884:N:<TRA>,10:10745225:N:<DUP>
>
>File not found /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/DO50311//output//DO50311.germline.cnv.vcf.gz
>
>Comparison of somatic.snv.mnv for DO52140 using DKFZ
>---
>Common: 37160
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.indel for DO52140 using DKFZ
>---
>Common: 19347
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.sv for DO52140 using DKFZ
>---
>Common: 72
>Extra: 23
>    - Example: 10:132840774:N:<DEL>,11:38252019:N:<TRA>,11:47700673:N:<TRA>
>Missing: 61
>    - Example: 10:134749140:N:<DEL>,11:179191:N:<TRA>,11:38252005:N:<TRA>
>
>
>Comparison of somatic.cnv for DO52140 using DKFZ
>---
>Common: 275
>Extra: 94
>    - Example: 1:106505931:N:<LOH>,1:109068899:N:<DEL>,1:109359995:N:<DEL>
>Missing: 286
>    - Example: 10:88653561:N:<LOH>,11:179192:N:<LOH>,11:38252006:N:<LOH>
>
>
>Comparison of germline.snv.mnv for DO52140 using DKFZ
>---
>Common: 3833896
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.indel for DO52140 using DKFZ
>---
>Common: 706572
>Extra: 0
>Missing: 0
>
>
>Comparison of germline.sv for DO52140 using DKFZ
>---
>Common: 1108
>Extra: 1116
>    - Example: 10:102158308:N:<DEL>,10:104645247:N:<DEL>,10:105097522:N:<DEL>
>Missing: 2908
>    - Example: 10:100107032:N:<TRA>,10:100107151:N:<TRA>,10:102158345:N:<DEL>
>
>File not found /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/DO52140//output//DO52140.germline.cnv.vcf.gz
>
>Comparison of somatic.snv.mnv for DO50311 using Sanger
>---
>Common: 156299
>Extra: 1
>    - Example: Y:58885197:A:G
>Missing: 14
>    - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C
>
>
>Comparison of somatic.indel for DO50311 using Sanger
>---
>Common: 812487
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.sv for DO50311 using Sanger
>---
>Common: 260
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.cnv for DO50311 using Sanger
>---
>Common: 138
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.snv.mnv for DO52140 using Sanger
>---
>Common: 87234
>Extra: 5
>    - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A
>Missing: 7
>    - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A
>
>
>Comparison of somatic.indel for DO52140 using Sanger
>---
>Common: 803986
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.sv for DO52140 using Sanger
>---
>Common: 6
>Extra: 0
>Missing: 0
>
>
>Comparison of somatic.cnv for DO52140 using Sanger
>---
>Common: 36
>Extra: 0
>Missing: 2
>    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>
>_______________________________________________
>docktesters mailing list
>docktesters at lists.icgc.org<mailto:docktesters at lists.icgc.org>
>https://lists.icgc.org/mailman/listinfo/docktesters
>
>
>The Francis Crick Institute Limited is a registered charity in England
>and Wales no. 1140062 and a company registered in England and Wales no.
>06885462, with its registered office at 1 Midland Road London NW1 1AT
>


