[DOCKTESTERS] Variant call validation results for Sanger

Alexander Buchanan buchanae at ohsu.edu
Mon Dec 5 13:21:33 EST 2016


Regarding the validation results I posted last Friday, we think these poor results are likely due an upstream issue and not the Sanger workflow itself. Those variant call results were from a larger process including fastq prep., alignment, and then Sanger variant calling, and we think we introduced a problem early on during fastq prep.

We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match).  One example output from USeq:

82486    Key variants
82486    Key variants in shared regions
0.953626071716167    Shared key variants Ti/Tv
82482    Test variants
82482    Test variants in shared regions
0.9536238749407864    Shared test variants Ti/Tv

QUALThreshold    NumMatchTest    NumNonMatchTest    FDR=nonMatchTest/(matchTest+nonMatchTest)    decreasingFDR    TPR=matchTest/totalKeyFPR=nonMatchTest/totalKey    PPV=matchTest/(matchTest+nonMatchTest)
none    82479    3    3.6371574E-5    3.6371574E-5    0.9999151    3.636981E-5    0.99996364



From: <docktesters-bounces+buchanae=ohsu.edu at lists.icgc.org> on behalf of Alexander Buchanan <buchanae at ohsu.edu>
Date: Friday, December 2, 2016 at 4:11 PM
To: "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: [DOCKTESTERS] Variant call validation results for Sanger

I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It’s reporting some pretty big differences, so I still need investigate. I’ll copy the USeq output at the end of this email.

I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences.

At this point, I don’t know the source of the difference. Maybe I’m not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I’m not sure that would explain the substantial differences.

Output from python:

python test.py
==================================================
Donor:  DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor:  DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor:  DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor:  DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor:  DO50432
intersection 3941
key - test 24358
test - key 138224

Output from USeq:


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
28299    Key variants
28299    Key variants in shared regions
0.904886914378029         Shared key variants Ti/Tv
24117    Test variants
24117    Test variants in shared regions
0.919073764621628         Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       17395    6722       0.27872455          0.27872455          0.614686              0.2375349            0.72127545


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
52527    Key variants
52527    Key variants in shared regions
0.9612067356158758       Shared key variants Ti/Tv
43476    Test variants
43476    Test variants in shared regions
0.9573203673689897       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       34721    8755       0.20137547          0.20137547          0.6610124            0.16667618          0.7986245


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
120998  Key variants
120998  Key variants in shared regions
0.9540073962824799       Shared key variants Ti/Tv
97436    Test variants
97436    Test variants in shared regions
0.9392950261728001       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       81477    15959    0.16378957          0.16378957          0.6733748            0.13189474          0.8362104


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
124379  Key variants
124379  Key variants in shared regions
0.9678664662605807       Shared key variants Ti/Tv
97967    Test variants
97967    Test variants in shared regions
0.9450632358488693       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       82705    15262    0.15578716          0.15578716          0.66494346          0.1227056            0.84421283


Done! 3 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
28299    Key variants
28299    Key variants in shared regions
0.904886914378029         Shared key variants Ti/Tv
142165  Test variants
142165  Test variants in shared regions
0.9905488658639037       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       3940       138225  0.97228575          0.97228575          0.13922754          4.884448              0.027714275


Done! 4 seconds

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.icgc.org/mailman/private/docktesters/attachments/20161205/1aa2ba36/attachment-0001.html>


More information about the docktesters mailing list