[DOCKTESTERS] Variant call validation results for Sanger

Alexander Buchanan buchanae at ohsu.edu
Fri Dec 2 19:11:15 EST 2016


I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It’s reporting some pretty big differences, so I still need investigate. I’ll copy the USeq output at the end of this email.

I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences.

At this point, I don’t know the source of the difference. Maybe I’m not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I’m not sure that would explain the substantial differences.

Output from python:

python test.py
==================================================
Donor:  DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor:  DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor:  DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor:  DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor:  DO50432
intersection 3941
key - test 24358
test - key 138224

Output from USeq:


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
28299    Key variants
28299    Key variants in shared regions
0.904886914378029         Shared key variants Ti/Tv
24117    Test variants
24117    Test variants in shared regions
0.919073764621628         Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       17395    6722       0.27872455          0.27872455          0.614686              0.2375349            0.72127545


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
52527    Key variants
52527    Key variants in shared regions
0.9612067356158758       Shared key variants Ti/Tv
43476    Test variants
43476    Test variants in shared regions
0.9573203673689897       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       34721    8755       0.20137547          0.20137547          0.6610124            0.16667618          0.7986245


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
120998  Key variants
120998  Key variants in shared regions
0.9540073962824799       Shared key variants Ti/Tv
97436    Test variants
97436    Test variants in shared regions
0.9392950261728001       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       81477    15959    0.16378957          0.16378957          0.6733748            0.13189474          0.8362104


Done! 4 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
124379  Key variants
124379  Key variants in shared regions
0.9678664662605807       Shared key variants Ti/Tv
97967    Test variants
97967    Test variants in shared regions
0.9450632358488693       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       82705    15262    0.15578716          0.15578716          0.66494346          0.1227056            0.84421283


Done! 3 seconds


[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:

somatic_snv_mnv.vcf.gz  Key vcf file
human_chromosomes.bed              Key interrogated regions file
somatic_snv_mnv.vcf.gz  Test vcf file
human_chromosomes.bed              Test interrogated regions file
true        Require matching alternate bases
false       Require matching genotypes
false       Use record VQSLOD score as ranking statistic
false       Exclude non PASS or . records
true        Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...

3137454505         Interrogated bps in key
3137454505         Interrogated bps in test
3137454505         Interrogated bps in common
28299    Key variants
28299    Key variants in shared regions
0.904886914378029         Shared key variants Ti/Tv
142165  Test variants
142165  Test variants in shared regions
0.9905488658639037       Shared test variants Ti/Tv

QUALThreshold   NumMatchTest   NumNonMatchTest           FDR=nonMatchTest/(matchTest+nonMatchTest)     decreasingFDR                TPR=matchTest/totalKey FPR=nonMatchTest/totalKey          PPV=matchTest/(matchTest+nonMatchTest)
none       3940       138225  0.97228575          0.97228575          0.13922754          4.884448              0.027714275


Done! 4 seconds

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.icgc.org/mailman/private/docktesters/attachments/20161203/76114015/attachment-0001.html>


More information about the docktesters mailing list