[DOCKTESTERS] Summary of validation of Dockers Sanger and DKFZ(+Delly)

Schlesner, Matthias m.schlesner at Dkfz-Heidelberg.de
Mon Feb 6 01:28:40 EST 2017


Hi Junjun,

This sample has extreme OxoG artifacts which could not be removed completely by our filter. Hence a huge number of artifacts remain, which blows up the file size.
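
OxoG artifacts appear as an excess of G>T substitutions (C>A on the opposite strand), so the substitution spectrum of a VCF gives a rough idea of how much of the call set they may account for. A minimal sketch, assuming a standard VCF with CHROM, POS, ID, REF and ALT in the first five tab-separated columns; the function names and the demo records are illustrative and not part of the DKFZ pipeline or its filter:

```python
import gzip
from collections import Counter


def substitution_spectrum(lines):
    """Count single-base substitutions (REF>ALT) in an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):
            # Keep only simple A/C/G/T single-nucleotide changes.
            if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
                counts[f"{ref}>{alt}"] += 1
    return counts


def oxog_fraction(counts):
    """Fraction of substitutions that are G>T or C>A (the OxoG-like classes)."""
    total = sum(counts.values())
    return (counts["G>T"] + counts["C>A"]) / total if total else 0.0


def spectrum_from_file(path):
    """Same count, streamed from a gzip-compressed VCF (path is supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return substitution_spectrum(handle)


# Hypothetical records, for illustration only.
demo = [
    "##fileformat=VCFv4.1",
    "1\t100\t.\tG\tT\t.\tPASS\t.",
    "1\t200\t.\tC\tA\t.\tPASS\t.",
    "2\t300\t.\tG\tT\t.\tPASS\t.",
    "2\t400\t.\tA\tG\t.\tPASS\t.",
]
demo_counts = substitution_spectrum(demo)
print(dict(demo_counts), oxog_fraction(demo_counts))
```

A high G>T/C>A fraction on its own does not prove an artifact, since some tumour types carry genuine C>A signatures; it is only a first indication.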

Best,
Matthias


Dr. Matthias Schlesner
Division Theoretical Bioinformatics (B080)
Head of Computational Oncology Group

German Cancer Research Center (DKFZ)
Foundation under Public Law
Im Neuenheimer Feld 280
69120 Heidelberg
Germany
office: Berliner Str. 41 (Mathematikon), room 02.MB.116
phone: +49 6221 42-2720
fax:      +49 6221 42-3626

m.schlesner at dkfz.de
www.dkfz.de

Management Board: Prof. Dr. Michael Baumann, Prof. Dr. Josef Puchta

VAT-ID No.: DE143293537

From: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
Date: Monday, February 6, 2017 at 12:52 AM
To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>, Miguel Vazquez <miguel.vazquez at cnio.es>
Cc: docktesters at lists.icgc.org, joachim.weischenfeldt at bric.ku.dk, "Schlesner, Matthias" <m.schlesner at Dkfz-Heidelberg.de>
Subject: Re: [DOCKTESTERS] Summary of validation of Dockers Sanger and DKFZ(+Delly)

Hi Miguel and Jonas,

I hope the DKFZ pipeline authors (cc’d here) will be able to figure out the differences in the calls for DO52140.

Here I have another donor: DO51087. The DKFZ somatic SNV/MNV call set seems surprisingly large: the gzipped VCF file is greater than 500 MB. You can find more information about the file here: https://dcc.icgc.org/repositories/files/FI500885. You can verify that in GNOS as well: https://gtrepo-dkfz.annailabs.com/cghub/metadata/analysisFull/e1e9062e-35e6-447d-bc61-591e76fbeee0.

Matthias, can you please take a look at the VCF? Hopefully you will be able to spot something abnormal there.

Maybe Miguel/Jonas, if you plan to test more donors for the DKFZ pipeline, can you please choose this donor? Tumour aligned BAM is: https://dcc.icgc.org/repositories/files/FI37278, normal aligned BAM is https://dcc.icgc.org/repositories/files/FI37277

Thanks,
Junjun



From: <docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org> on behalf of Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Date: Saturday, February 4, 2017 at 10:14 AM
To: Miguel Vazquez <miguel.vazquez at cnio.es>
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] Summary of validation of Dockers Sanger and DKFZ(+Delly)

Hi Miguel,

The comparison was indeed run largely using your scripts.
I didn't notice any missteps but you never know of course.
Hope they can pinpoint the issue.

Cheers,
Jonas


On 3 Feb 2017, at 19:06, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Excellent Jonas, this is very useful info.

I guess you are using my own scripts for this. The possibility remains that there is a misstep in them regarding Delly. Let's see what comes out of the checks by our friends at DKFZ.
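
One thing that can be checked on the script side is whether exact-position matching is too strict for SV breakpoints: in the results further down, 11:38252019 is listed as extra and 11:38252005 as missing for somatic.sv on DO52140, only 14 bp apart. A minimal sketch of position-tolerant matching, assuming keys in the chrom:pos:ref:alt form used in the reports; the function name and the 50 bp window are illustrative choices, not part of the existing scripts:

```python
def near_matches(extra, missing, window=50):
    """Pair extra and missing keys that share chrom, ref and alt and lie within `window` bp."""
    pairs = []
    for e in extra:
        e_chrom, e_pos, e_ref, e_alt = e.split(":")
        for m in missing:
            m_chrom, m_pos, m_ref, m_alt = m.split(":")
            distance = abs(int(e_pos) - int(m_pos))
            same_event = (e_chrom, e_ref, e_alt) == (m_chrom, m_ref, m_alt)
            if same_event and distance <= window:
                pairs.append((e, m, distance))
    return pairs


# Example keys copied from the somatic.sv comparison for DO52140 (DKFZ).
extra = ["10:132840774:N:<DEL>", "11:38252019:N:<TRA>", "11:47700673:N:<TRA>"]
missing = ["10:134749140:N:<DEL>", "11:179191:N:<TRA>", "11:38252005:N:<TRA>"]

pairs = near_matches(extra, missing)
print(pairs)  # [('11:38252019:N:<TRA>', '11:38252005:N:<TRA>', 14)]
```

If many extra/missing pairs collapse under such a window, the discrepancy is more likely a small shift in breakpoint position than a different set of events; if they do not, the calls genuinely differ.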

Best regards

Have a great weekend

Miguel

On Feb 3, 2017 6:10 PM, "Jonas Demeulemeester" <Jonas.Demeulemeester at crick.ac.uk> wrote:
Dear all,

Also for the DKFZ (+Delly) workflow, I can confirm Miguel’s results on samples DO52140 and DO50311.
The dockerised pipelines return identical calls for SNV.MNVs and indels but partly different ones for SVs and CNVs, independent of the infrastructure.
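
The Common/Extra/Missing counts in the reports below come from comparing two call sets. A minimal sketch of that kind of comparison, assuming each variant is reduced to a chrom:pos:ref:alt key as in the example lines, and assuming "extra" means present only in the Docker output and "missing" means present only in the reference calls (the thread does not spell out the direction). This is an illustration, not the actual comparison scripts, and the demo records are hypothetical:

```python
def variant_keys(vcf_lines):
    """Reduce VCF records to a set of chrom:pos:ref:alt keys (one per ALT allele)."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{ref}:{allele}")
    return keys


def compare_calls(reference_lines, docker_lines):
    """Split the two call sets into common, extra (Docker only) and missing (reference only)."""
    ref_keys = variant_keys(reference_lines)
    new_keys = variant_keys(docker_lines)
    return {
        "common": sorted(ref_keys & new_keys),
        "extra": sorted(new_keys - ref_keys),
        "missing": sorted(ref_keys - new_keys),
    }


# Hypothetical records, for illustration only.
reference = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\t.\tA\tG",
    "2\t2000\t.\tN\t<DEL>",
    "3\t3000\t.\tN\t<TRA>",
]
docker = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\t.\tA\tG",
    "2\t2000\t.\tN\t<DEL>",
    "3\t3014\t.\tN\t<TRA>",
]

result = compare_calls(reference, docker)
for name in ("common", "extra", "missing"):
    print(f"{name.capitalize()}: {len(result[name])}")
```

Because the key includes the exact position, a breakpoint that moves by a few bases is counted once as extra and once as missing, which matters more for SVs and CNVs than for SNVs and indels.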

Best regards,
Jonas





RESULTS DO52140
---------------

Comparison of somatic.sv for DO52140 using DKFZ
---
Common: 72
Extra: 23
    - Example: 10:132840774:N:<DEL>,11:38252019:N:<TRA>,11:47700673:N:<TRA>
Missing: 61
    - Example: 10:134749140:N:<DEL>,11:179191:N:<TRA>,11:38252005:N:<TRA>


Comparison of germline.sv for DO52140 using DKFZ
---
Common: 1108
Extra: 1116
    - Example: 10:102158308:N:<DEL>,10:104645247:N:<DEL>,10:105097522:N:<DEL>
Missing: 2908
    - Example: 10:100107032:N:<TRA>,10:100107151:N:<TRA>,10:102158345:N:<DEL>


Comparison of somatic.snv.mnv for DO52140 using DKFZ
---
Common: 37160
Extra: 0
Missing: 0


Comparison of germline.snv.mnv for DO52140 using DKFZ
---
Common: 3833896
Extra: 0
Missing: 0


Comparison of somatic.indel for DO52140 using DKFZ
---
Common: 19347
Extra: 0
Missing: 0


Comparison of germline.indel for DO52140 using DKFZ
---
Common: 706572
Extra: 0
Missing: 0


Comparison of somatic.cnv for DO52140 using DKFZ
---
Common: 275
Extra: 94
    - Example: 1:106505931:N:<LOH>,1:109068899:N:<DEL>,1:109359995:N:<DEL>
Missing: 286
    - Example: 10:88653561:N:<LOH>,11:179192:N:<LOH>,11:38252006:N:<LOH>




RESULTS DO50311
---------------

Comparison of somatic.sv for DO50311 using DKFZ
---
Common: 231
Extra: 44
    - Example: 10:20596800:N:<TRA>,10:56066821:N:<TRA>,11:16776092:N:<TRA>
Missing: 48
    - Example: 10:119704959:N:<INV>,10:13116322:N:<TRA>,10:47063485:N:<TRA>


Comparison of germline.sv for DO50311 using DKFZ
---
Common: 1393
Extra: 231
    - Example: 10:134319313:N:<DEL>,10:134948976:N:<DEL>,10:19996638:N:<DEL>
Missing: 615
    - Example: 10:101851839:N:<TRA>,10:101851884:N:<TRA>,10:10745225:N:<DUP>


Comparison of somatic.snv.mnv for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0


Comparison of germline.snv.mnv for DO50311 using DKFZ
---
Common: 3850992
Extra: 0
Missing: 0


Comparison of somatic.indel for DO50311 using DKFZ
---
Common: 26469
Extra: 0
Missing: 0


Comparison of germline.indel for DO50311 using DKFZ
---
Common: 709060
Extra: 0
Missing: 0


Comparison of somatic.cnv for DO50311 using DKFZ
---
Common: 731
Extra: 213
    - Example: 10:132510034:N:<DEL>,10:20596801:N:<NEUTRAL>,10:47674883:N:<NEUTRAL>
Missing: 190
    - Example: 10:100891940:N:<NEUTRAL>,10:104975905:N:<NEUTRAL>,10:119704960:N:<NEUTRAL>





_________________________________
Jonas Demeulemeester, PhD
Postdoctoral Researcher
The Francis Crick Institute
1 Midland Road
London
NW1 1AT

T: +44 (0)20 3796 2594
M: +44 (0)7482 070730
E: jonas.demeulemeester at crick.ac.uk
W: www.crick.ac.uk



On 26 Jan 2017, at 13:41, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk> wrote:

Hi all,

I can now confirm Miguel’s results with the Sanger workflow on donors DO50311 and DO52140.
The calls made by the dockerised version are identical for Indels and SVs and produce only small discrepancies for SNV_MNVs and CNVs.
The discrepancies seem independent of the system infrastructure, as the numbers of missing/extra variants called are the same as those Miguel reported previously (on DO52140).
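
Checking that two runs agree comes down to comparing the Common/Extra/Missing counts block by block. A minimal sketch that parses a report in the format used in this thread into a dictionary keyed by (variant type, donor, workflow); the function names are illustrative, and the parser assumes the exact "Comparison of ... for ... using ..." header wording. The counts in the demo are copied from the two Sanger reports in this thread, with the block labels written in the somatic.* style so that the keys line up:

```python
import re

HEADER = re.compile(r"^Comparison of (\S+) for (\S+) using (\S+)\s*$")
COUNT = re.compile(r"^(Common|Extra|Missing):\s*(\d+)\s*$")


def parse_report(text):
    """Map (variant type, donor, workflow) to a dict of common/extra/missing counts."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.groups()
            results[current] = {}
            continue
        count = COUNT.match(line)
        if count and current is not None:
            results[current][count.group(1).lower()] = int(count.group(2))
    return results


def differing_blocks(report_a, report_b):
    """Blocks present in both reports whose counts differ."""
    shared = report_a.keys() & report_b.keys()
    return sorted(key for key in shared if report_a[key] != report_b[key])


first_run = """
Comparison of somatic.snv.mnv for DO50311 using Sanger
---
Common: 156299
Extra: 1
    - Example: Y:58885197:A:G
Missing: 14
    - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C

Comparison of somatic.cnv for DO52140 using Sanger
---
Common: 36
Extra: 0
Missing: 2
    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>
"""

second_run = """
Comparison of somatic.snv.mnv for DO50311 using Sanger
---
Common: 156313
Extra: 0
Missing: 0

Comparison of somatic.cnv for DO52140 using Sanger
---
Common: 36
Extra: 0
Missing: 2
"""

parsed_first = parse_report(first_run)
parsed_second = parse_report(second_run)
print(differing_blocks(parsed_first, parsed_second))
```

Only the counts are compared here; two runs could report the same numbers with different variants, so comparing the example keys or the full key sets is a stricter check.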

I’ve also updated the wiki page accordingly.

Best regards,
Jonas



RESULTS - DO50311
------


Comparison of cnv for DO50311 using Sanger
---
Common: 138
Extra: 0
Missing: 0


Comparison of indel for DO50311 using Sanger
---
Common: 812487
Extra: 0
Missing: 0


Comparison of snv_mnv for DO50311 using Sanger
---
Common: 156313
Extra: 0
Missing: 0


Comparison of sv for DO50311 using Sanger
---
Common: 260
Extra: 0
Missing: 0





RESULTS - DO52140
------


Comparison of cnv for DO52140 using Sanger
---
Common: 36
Extra: 0
Missing: 2
    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>


Comparison of indel for DO52140 using Sanger
---
Common: 803986
Extra: 0
Missing: 0


Comparison of snv_mnv for DO52140 using Sanger
---
Common: 87234
Extra: 5
    - Example: 1:23719098:G,12:43715930:A,20:4058335:A
Missing: 7
    - Example: 10:6881937:T,1:148579866:G,11:9271589:A


Comparison of sv for DO52140 using Sanger
---
Common: 6
Extra: 0
Missing: 0






For comparison, Miguel’s report on DO52140:
Report
~~~~~~

Comparison of somatic.cnv for DO52140 using Sanger
---
Common: 36
Extra: 0
Missing: 2
    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>


Comparison of somatic.indel for DO52140 using Sanger
---
Common: 803986
Extra: 0
Missing: 0


Comparison of somatic.snv.mnv for DO52140 using Sanger
---
Common: 87234
Extra: 5
    - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A
Missing: 7
    - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A


Comparison of somatic.sv for DO52140 using Sanger
---
Common: 6
Extra: 0
Missing: 0





On 16 Jan 2017, at 14:24, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Dear all,

Let me summarize the status of the testing for Sanger and DKFZ. The validation has been run for two donors for each workflow: DO50311 and DO52140.

Sanger:
----------

Sanger calls only somatic variants. The results are identical for indels and SVs, and almost identical for SNV.MNV and CNV. The discrepancies are reproducible (on the same machine at least), i.e. the same ones are found after running the workflow a second time.

DKFZ:
---------
DKFZ calls somatic and germline variants, except germline CNVs. For both germline and somatic variants the results are identical for SNV.MNV and indels, but show large discrepancies for SV and CNV.
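
To put a number on "identical" versus "large discrepancies", the Common/Extra/Missing counts can be turned into a concordance fraction, common / (common + extra + missing), which is the Jaccard index of the two call sets. A minimal sketch using counts from the results below; the function name is illustrative:

```python
def concordance(common, extra, missing):
    """Jaccard index of two call sets from their common/extra/missing counts."""
    total = common + extra + missing
    # Two empty call sets are treated as fully concordant (a convention chosen here).
    return common / total if total else 1.0


# Counts copied from the DO52140 results in this thread.
counts = {
    ("DKFZ", "somatic.snv.mnv"): (37160, 0, 0),
    ("DKFZ", "somatic.sv"): (72, 23, 61),
    ("DKFZ", "germline.sv"): (1108, 1116, 2908),
    ("DKFZ", "somatic.cnv"): (275, 94, 286),
    ("Sanger", "somatic.snv.mnv"): (87234, 5, 7),
}

for (workflow, kind), (common, extra, missing) in counts.items():
    print(f"{workflow:7s} {kind:16s} {concordance(common, extra, missing):.4f}")
```

On these numbers the DKFZ SNV.MNV calls are fully concordant, while fewer than half of the DKFZ somatic SV calls and roughly a fifth of the germline SV calls are shared between the two runs.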


Kortine Kleinheinz and Joachim Weischenfeldt are, I believe, in the process of investigating this issue.

BWA-Mem failed for me and has also failed for Denis Yuen and Jonas Demeulemeester. Denis, I believe, is investigating this problem further. I haven't had the chance to investigate it much myself.

Best

Miguel




---------------------
RESULTS
---------------------

ubuntu@ip-10-253-35-14:~/DockerTest-Miguel$ cat results.txt

Comparison of somatic.snv.mnv for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0


Comparison of somatic.indel for DO50311 using DKFZ
---
Common: 26469
Extra: 0
Missing: 0


Comparison of somatic.sv for DO50311 using DKFZ
---
Common: 231
Extra: 44
    - Example: 10:20596800:N:<TRA>,10:56066821:N:<TRA>,11:16776092:N:<TRA>
Missing: 48
    - Example: 10:119704959:N:<INV>,10:13116322:N:<TRA>,10:47063485:N:<TRA>


Comparison of somatic.cnv for DO50311 using DKFZ
---
Common: 731
Extra: 213
    - Example: 10:132510034:N:<DEL>,10:20596801:N:<NEUTRAL>,10:47674883:N:<NEUTRAL>
Missing: 190
    - Example: 10:100891940:N:<NEUTRAL>,10:104975905:N:<NEUTRAL>,10:119704960:N:<NEUTRAL>


Comparison of germline.snv.mnv for DO50311 using DKFZ
---
Common: 3850992
Extra: 0
Missing: 0


Comparison of germline.indel for DO50311 using DKFZ
---
Common: 709060
Extra: 0
Missing: 0


Comparison of germline.sv for DO50311 using DKFZ
---
Common: 1393
Extra: 231
    - Example: 10:134319313:N:<DEL>,10:134948976:N:<DEL>,10:19996638:N:<DEL>
Missing: 615
    - Example: 10:101851839:N:<TRA>,10:101851884:N:<TRA>,10:10745225:N:<DUP>

File not found /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/DO50311//output//DO50311.germline.cnv.vcf.gz

Comparison of somatic.snv.mnv for DO52140 using DKFZ
---
Common: 37160
Extra: 0
Missing: 0


Comparison of somatic.indel for DO52140 using DKFZ
---
Common: 19347
Extra: 0
Missing: 0


Comparison of somatic.sv for DO52140 using DKFZ
---
Common: 72
Extra: 23
    - Example: 10:132840774:N:<DEL>,11:38252019:N:<TRA>,11:47700673:N:<TRA>
Missing: 61
    - Example: 10:134749140:N:<DEL>,11:179191:N:<TRA>,11:38252005:N:<TRA>


Comparison of somatic.cnv for DO52140 using DKFZ
---
Common: 275
Extra: 94
    - Example: 1:106505931:N:<LOH>,1:109068899:N:<DEL>,1:109359995:N:<DEL>
Missing: 286
    - Example: 10:88653561:N:<LOH>,11:179192:N:<LOH>,11:38252006:N:<LOH>


Comparison of germline.snv.mnv for DO52140 using DKFZ
---
Common: 3833896
Extra: 0
Missing: 0


Comparison of germline.indel for DO52140 using DKFZ
---
Common: 706572
Extra: 0
Missing: 0


Comparison of germline.sv for DO52140 using DKFZ
---
Common: 1108
Extra: 1116
    - Example: 10:102158308:N:<DEL>,10:104645247:N:<DEL>,10:105097522:N:<DEL>
Missing: 2908
    - Example: 10:100107032:N:<TRA>,10:100107151:N:<TRA>,10:102158345:N:<DEL>

File not found /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/DO52140//output//DO52140.germline.cnv.vcf.gz

Comparison of somatic.snv.mnv for DO50311 using Sanger
---
Common: 156299
Extra: 1
    - Example: Y:58885197:A:G
Missing: 14
    - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C


Comparison of somatic.indel for DO50311 using Sanger
---
Common: 812487
Extra: 0
Missing: 0


Comparison of somatic.sv for DO50311 using Sanger
---
Common: 260
Extra: 0
Missing: 0


Comparison of somatic.cnv for DO50311 using Sanger
---
Common: 138
Extra: 0
Missing: 0


Comparison of somatic.snv.mnv for DO52140 using Sanger
---
Common: 87234
Extra: 5
    - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A
Missing: 7
    - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A


Comparison of somatic.indel for DO52140 using Sanger
---
Common: 803986
Extra: 0
Missing: 0


Comparison of somatic.sv for DO52140 using Sanger
---
Common: 6
Extra: 0
Missing: 0


Comparison of somatic.cnv for DO52140 using Sanger
---
Common: 36
Extra: 0
Missing: 2
    - Example: 10:11767915:T:<CNV>,10:11779907:G:<CNV>
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org<mailto:docktesters at lists.icgc.org>
https://lists.icgc.org/mailman/listinfo/docktesters


The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT
