[DOCKTESTERS] Variant call validation results for Sanger
Francis Ouellette
francis at oicr.on.ca
Tue Dec 6 10:52:52 EST 2016
Hi Alex,
Likewise here: on the test that “worked”, are the differences platform-specific, and
are they reproducible?
I think we only need to do this a couple of times, to inform us if the differences are
operator and/or platform specific, or simply (which I think) more about the heuristics
of the testing we are doing.
Thank you for looking into this.
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 5, 2016, at 1:21 PM, Alexander Buchanan <buchanae at ohsu.edu> wrote:
Regarding the validation results I posted last Friday, we think these poor results are likely due to an upstream issue and not the Sanger workflow itself. Those variant call results came from a larger process that included fastq prep, alignment, and then Sanger variant calling, and we think we introduced a problem early on, during fastq prep.
We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match). One example output from USeq:
82486 Key variants
82486 Key variants in shared regions
0.953626071716167 Shared key variants Ti/Tv
82482 Test variants
82482 Test variants in shared regions
0.9536238749407864 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82479 3 3.6371574E-5 3.6371574E-5 0.9999151 3.636981E-5 0.99996364
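As a quick sanity check, the ratios in that row can be recomputed from the three counts using the formulas in the USeq header line. This is only a sketch: the function name `useq_style_metrics` is hypothetical and is not part of USeq. Note that the column USeq labels FPR is defined in the header as nonMatchTest/totalKey, so it can exceed 1 when the test set is much larger than the key set.

```python
def useq_style_metrics(num_match, num_non_match, total_key):
    """Recompute the ratios named in the USeq header line.

    Hypothetical helper, not part of USeq; the formulas are copied
    from the column names in the pasted output.
    """
    called = num_match + num_non_match  # all test variants that were compared
    return {
        "FDR": num_non_match / called,
        "TPR": num_match / total_key,
        # As defined in the USeq header: divided by totalKey, so it may exceed 1.
        "FPR": num_non_match / total_key,
        "PPV": num_match / called,
    }


# Counts from the row above: 82479 matching, 3 non-matching, 82486 key variants.
metrics = useq_style_metrics(82479, 3, 82486)
for name, value in metrics.items():
    print(f"{name} = {value:.8g}")
```

The recomputed values agree with the row above to the precision USeq prints.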
From: <docktesters-bounces+buchanae=ohsu.edu at lists.icgc.org> on behalf of Alexander Buchanan <buchanae at ohsu.edu>
Date: Friday, December 2, 2016 at 4:11 PM
To: "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: [DOCKTESTERS] Variant call validation results for Sanger
I was able to run USeq on data output from running the Sanger workflow on a Cromwell engine, for 5 donors. It’s reporting some pretty big differences, so I still need to investigate. I’ll copy the USeq output at the end of this email.
I also wrote a simple comparison script, similar to what Miguel is doing (but in Python), and it also reports differences.
At this point, I don’t know the source of the difference. Maybe I’m not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I’m not sure that would explain the substantial differences.
Output from python:
python test.py
==================================================
Donor: DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor: DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor: DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor: DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor: DO50432
intersection 3941
key - test 24358
test - key 138224
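The script itself was not posted. The counts above are consistent with a plain set comparison of variant records, which could be sketched roughly as follows. Everything here is an assumption rather than the actual test.py: the function names, the choice of (CHROM, POS, REF, ALT) as the identity of a variant, and the treatment of a multi-allelic ALT field as one string.

```python
import gzip


def parse_variant_keys(lines):
    """Collect (CHROM, POS, REF, ALT) tuples from VCF text lines.

    Assumption: two records are "the same variant" when these four
    columns match; a multi-allelic ALT is compared as a single string.
    """
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def load_variant_keys(path):
    """Read a plain or gzip-compressed VCF file into a set of keys."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_variant_keys(handle)


def compare_sets(key, test):
    """Return the three counts reported per donor."""
    return {
        "intersection": len(key & test),
        "key - test": len(key - test),
        "test - key": len(test - key),
    }
```

With the paths from the USeq arguments, usage would look like `compare_sets(load_variant_keys("key-data/DO50414/somatic_snv_mnv.vcf.gz"), load_variant_keys("ccc-output/DO50414/somatic_snv_mnv.vcf.gz"))`. Keying on the four columns roughly mirrors USeq's "Require matching alternate bases" setting, but the two tools may still normalise records differently.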
Output from USeq:
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
24117 Test variants
24117 Test variants in shared regions
0.919073764621628 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 17395 6722 0.27872455 0.27872455 0.614686 0.2375349 0.72127545
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
52527 Key variants
52527 Key variants in shared regions
0.9612067356158758 Shared key variants Ti/Tv
43476 Test variants
43476 Test variants in shared regions
0.9573203673689897 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 34721 8755 0.20137547 0.20137547 0.6610124 0.16667618 0.7986245
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
120998 Key variants
120998 Key variants in shared regions
0.9540073962824799 Shared key variants Ti/Tv
97436 Test variants
97436 Test variants in shared regions
0.9392950261728001 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 81477 15959 0.16378957 0.16378957 0.6733748 0.13189474 0.8362104
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
124379 Key variants
124379 Key variants in shared regions
0.9678664662605807 Shared key variants Ti/Tv
97967 Test variants
97967 Test variants in shared regions
0.9450632358488693 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82705 15262 0.15578716 0.15578716 0.66494346 0.1227056 0.84421283
Done! 3 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
142165 Test variants
142165 Test variants in shared regions
0.9905488658639037 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 3940 138225 0.97228575 0.97228575 0.13922754 4.884448 0.027714275
Done! 4 seconds
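The two reports can be cross-checked against each other. If both tools count the same thing, the Python intersection should equal USeq's NumMatchTest, and intersection plus each difference should equal the key and test totals. The sketch below hard-codes the numbers pasted above; the variable and function names are illustrative only.

```python
# (intersection, key - test, test - key) from the Python output above
python_counts = {
    "DO50414": (17395, 10904, 6722),
    "DO50415": (34721, 17806, 8755),
    "DO50417": (81477, 39521, 15959),
    "DO50419": (82705, 41674, 15262),
    "DO50432": (3941, 24358, 138224),
}

# (key variants, test variants, NumMatchTest, NumNonMatchTest) from USeq
useq_counts = {
    "DO50414": (28299, 24117, 17395, 6722),
    "DO50415": (52527, 43476, 34721, 8755),
    "DO50417": (120998, 97436, 81477, 15959),
    "DO50419": (124379, 97967, 82705, 15262),
    "DO50432": (28299, 142165, 3940, 138225),
}


def cross_check(py, useq):
    """Return donors whose totals or match counts disagree between tools."""
    totals_mismatch, match_mismatch = [], []
    for donor in sorted(py):
        inter, key_only, test_only = py[donor]
        n_key, n_test, n_match, n_non_match = useq[donor]
        # Set sizes must add up to the totals USeq reports.
        if inter + key_only != n_key or inter + test_only != n_test:
            totals_mismatch.append(donor)
        # Matched and unmatched counts should agree between the tools.
        if inter != n_match or test_only != n_non_match:
            match_mismatch.append(donor)
    return totals_mismatch, match_mismatch


totals_mismatch, match_mismatch = cross_check(python_counts, useq_counts)
print("totals disagree:", totals_mismatch)
print("match counts disagree:", match_mismatch)
```

On the pasted numbers the totals agree for all five donors, and the match counts agree for every donor except DO50432, where the two tools differ by a single variant (3941 vs 3940). The cause of that one-variant difference is not established here. Separately, the DO50432 key set has the same size and Ti/Tv (28299, 0.904886914378029) as the DO50414 key set, so it may be worth confirming that the two key files really are distinct.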