[DOCKTESTERS] Preliminary results for overlap between testing and original VCF
Christina Yung
Christina.Yung at oicr.on.ca
Mon Nov 21 08:50:47 EST 2016
Hi Miguel,
For all of these pipelines, I suggest comparing to their original outputs from the production runs. You can find the GNOS info to download the BAMs and VCFs in this spreadsheet:
http://pancancer.info/data_releases/may2016/release_may2016.v1.4.tsv
The VCFs are from individual pipelines before any merging and filtering. A subset of BAMs and VCFs are also on AWS (US-West). Let me know if that’s your work environment, and I’ll point you to downloading from S3.
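As a rough sketch of looking a donor up in that spreadsheet once it is downloaded: the column names in the example (donor_id, bam_gnos_id, vcf_gnos_id) and the IDs are placeholders I made up for illustration, not the real headers of the release file, so the function below deliberately matches the donor ID against any cell.

```python
import csv
import io


def rows_mentioning(tsv_text, donor_id):
    """Return every row (as a dict) in which some cell equals donor_id."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if donor_id in row.values()]


# Hypothetical stand-in for the release TSV; real headers and IDs will differ.
EXAMPLE_TSV = (
    "donor_id\tbam_gnos_id\tvcf_gnos_id\n"
    "DO50311\tbam-uuid-1\tvcf-uuid-1\n"
    "DO00000\tbam-uuid-2\tvcf-uuid-2\n"
)

matches = rows_mentioning(EXAMPLE_TSV, "DO50311")
for row in matches:
    print(row)
```

Matching on any cell avoids guessing the header names; once the real headers are known, selecting by the donor column directly would be safer.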
Thanks,
Christina
From: docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org On Behalf Of Miguel Vazquez
Sent: Monday, November 21, 2016 8:43 AM
To: Francis Ouellette
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] Preliminary results for overlap between testing and original VCF
Hi all,
I have a question regarding the comparison with the official VCF for the BWA-Mem pipeline. In the VCF files I'm working with, the callers are: broad, dkfz, sanger and muse. Which one corresponds to BWA-Mem? If none, what should I compare with?
Best
M
On Mon, Nov 21, 2016 at 1:31 PM, Francis Ouellette <francis at oicr.on.ca> wrote:
Miguel,
I’ve updated the wiki with your results, and added another link (on the same page)
to the google doc, where you describe what you did to get USeq to work.
To all:
Christina has commented in an e-mail that we have what we need to test
the pcawg-bwa-mem-workflow pipeline, as well as the pcawg-sanger-cgp-workflow
pipeline.
Adam/Alex: Any advances in either of these fronts?
Talk to some of you in 90 min.
@bffo
From: Christina Yung <Christina.Yung at oicr.on.ca>
Thank you, Miguel. These results are very encouraging. I just have a suggestion: since we’re comparing strictly the outputs of the DKFZ/EMBL pipeline, we should compare the pre-filtered results, i.e., the ~51K calls. We’ll later compare whether the filtering steps give similar results as well, when the dockers become ready.
For testing BWA-Mem, Keiran has documented the steps to convert aligned BAM to unaligned:
https://wiki.oicr.on.ca/display/PANCANCER/Preparing+paired-end+data+for+upload
For Sanger docker, I believe Denis has tested the new version and reported that the problem is fixed.
On Nov 21, 2016, at 04:07, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
Thanks Denis, I'm trying it out now
On Fri, Nov 18, 2016 at 7:18 PM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
Hi,
Yes, you're going to want version 2.0.2 of quay.io/pancancer/pcawg-sanger-cgp-workflow (https://dockstore.org/containers/quay.io/pancancer/pcawg-sanger-cgp-workflow), and it should work on DO50311.
________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
Sent: November 18, 2016 10:06 AM
To: Francis Ouellette
Cc: docktesters at lists.icgc.org; Alysha Moncrieffe
Subject: Re: [DOCKTESTERS] Preliminary results for overlap between testing and original VCF
I've added a description on the google doc. Next week I'll try to put it properly into my scripts so I can run a bunch of these.
What is the status of the Sanger pipeline? Is it fixed already?
On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca> wrote:
Great,
Thank you Miguel! I would call this one a success!
I think we need two such successes for each pipeline.
I will update the table with this one.
Let’s get it done for the others. I will send more mail today.
Miguel: I imagine you documented what you did on google doc?
Thank you all,
francis
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
Hi again
I've done some more investigating and it turns out that I was ignoring the quite obvious 'FILTER' field. Silly me. Filtering now for mutations that 'PASS', I get
Comparison
----------
Total original (dkfz): 16090
Total this: 16088
Common: 16088
Missing: 2. Example: 10:86361665:T, 3:168842417:G
Extra: 0. Example:
Not a perfect match, but very close!!!!
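For reference, a minimal sketch of the 'PASS' filtering described above, assuming tab-separated VCF records and the chrom:pos:alt key format used in the examples. The first record is the sample line quoted elsewhere in this thread; the second is a made-up passing record (its REF and genotype values are placeholders), and the function name is mine, not from the testing scripts.

```python
def passing_mutation_keys(vcf_lines):
    """Return chrom:pos:alt keys for records whose FILTER column is PASS."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, alt, filter_value = fields[0], fields[1], fields[4], fields[6]
        if filter_value != "PASS":
            continue  # any other value lists the filters the call failed
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


# Sample record from this thread: FILTER lists several failed filters.
SAMPLE_LINE = "\t".join([
    "1", "725971", ".", "G", "T", ".",
    "RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF",
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57",
    "GT:DP:DP4", "0/0:115:60,53,2,0", "0/0:114:49,62,0,3",
])
# Hypothetical passing record, for illustration only.
HYPOTHETICAL_PASS_LINE = "\t".join([
    "10", "86361665", ".", "C", "T", ".", "PASS",
    "SOMATIC;SNP", "GT", "0/0", "0/1",
])

kept = passing_mutation_keys([SAMPLE_LINE, HYPOTHETICAL_PASS_LINE])
print(sorted(kept))
```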
Best
Miguel
On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
Dear Francis and friends,
Given that Francis was eager to see some initial estimates of how well the tests were doing in terms of overlap, I have made some advances. Let me show you some of my initial results.
For sample DO50311 with the pipeline from DKFZ (using Delly first to produce the BEDPE file) I get the following result:
Comparison
----------
Total original (dkfz): 16090
Total this: 51087
Common: 16090
Missing: 0. Example:
Extra: 34997. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
Which means that in the original VCF there are 16K mutations, all of which are found in our new VCF (this); however, our new file contains 35K extra mutations. Listed are some examples of extra mutations. Going back to our VCF, here is a sample line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CONTROL TUMOR
1 725971 . G T . RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF SOMATIC;SNP;AF=0.02,0.03;MQ=57 GT:DP:DP4 0/0:115:60,53,2,0 0/0:114:49,62,0,3
I take it this is a good result. Finding all the reported mutations is a great sign I think, and the extra mutations must be a filtering step that we need to account for. I hope someone can point out from the VCF line above what is it that I need to use for the filtering.
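The comparison report above boils down to set arithmetic on mutation keys. Here is a minimal sketch, assuming each mutation is reduced to a chrom:pos:alt string as in the examples; the function names and the toy keys are illustrative and are not taken from the actual Rbbt script.

```python
def compare_mutations(original, this):
    """Split two collections of mutation keys into common, missing and extra."""
    original, this = set(original), set(this)
    return {
        "common": original & this,   # found in both VCFs
        "missing": original - this,  # in the original VCF but not in ours
        "extra": this - original,    # in ours but not in the original VCF
    }


def format_report(original, this, label="dkfz", n_examples=4):
    """Render the comparison in the layout used in this thread."""
    result = compare_mutations(original, this)

    def examples(keys):
        return ", ".join(sorted(keys)[:n_examples])

    return "\n".join([
        "Comparison",
        "----------",
        f"Total original ({label}): {len(set(original))}",
        f"Total this: {len(set(this))}",
        f"Common: {len(result['common'])}",
        f"Missing: {len(result['missing'])}. Example: {examples(result['missing'])}",
        f"Extra: {len(result['extra'])}. Example: {examples(result['extra'])}",
    ])


# Toy keys for illustration only; these are not real calls.
toy_original = {"1:100:A", "2:200:C"}
toy_this = {"1:100:A", "3:300:G"}
report = format_report(toy_original, toy_this)
print(report)
```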
I took the VCF files from a file named 'preliminary_final_release.snvs.tgz' from May 30, which contains VCF files with the merged results from all callers. I simply subset the lines for each caller, in this case dkfz. Also, the files are listed by aliquot, so I have to translate the donor ID to an aliquot ID. I've scripted this quickly using my Rbbt framework, but I'll rewrite it all in bash and add it to my repo of testing scripts at https://github.com/mikisvaz/PCAWG-Docker-Test
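A sketch of the per-caller subsetting might look like the following. The layout of the merged file is not shown in this thread, so the `Callers=` INFO key is an assumption on my part, as are the function name and the example records; adjust the key to whatever the merged VCF actually uses.

```python
def lines_for_caller(vcf_lines, caller, info_key="Callers"):
    """Keep records whose INFO field lists `caller` under `info_key`.

    ASSUMPTION: the merged VCF records its callers as a comma-separated
    INFO entry such as Callers=broad,dkfz. This is not confirmed here.
    """
    selected = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # no INFO column to inspect
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == info_key and caller in value.split(","):
                selected.append(line)
                break
    return selected


# Made-up merged records, for illustration only.
MERGED_EXAMPLE = [
    "1\t100\t.\tG\tA\t.\t.\tCallers=broad,dkfz",
    "2\t200\t.\tT\tC\t.\t.\tCallers=sanger,muse",
]

dkfz_lines = lines_for_caller(MERGED_EXAMPLE, "dkfz")
print(dkfz_lines)
```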
Summary of my progress
-----------------------------------
- Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not integrated; missing data-preparation step), Broad (??)
- Donor integration: GNOS (works), ICGC (works)
- Comparison: DKFZ (missing filtering?), rest (waiting)
I have everything scripted so I can iterate a list of donors and download the data, run pipelines, erase data, compare results.
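The loop has roughly the following shape. This is only a sketch: the four step functions are hypothetical placeholders passed in as arguments, since the real download, pipeline, cleanup and comparison steps live in the testing scripts and are not shown here.

```python
def run_donors(donors, download, run_pipeline, compare, erase):
    """For each donor: download, run the pipeline, compare, then erase.

    All four step functions are supplied by the caller (hypothetical here).
    The data is erased even if the pipeline or comparison fails, so one bad
    donor does not leave large BAM files behind.
    """
    results = {}
    for donor in donors:
        data = download(donor)
        try:
            output = run_pipeline(data)
            results[donor] = compare(donor, output)
        finally:
            erase(data)
    return results


# Stand-in steps that only record what was called, for illustration.
calls = []
results = run_donors(
    ["DO50311"],
    download=lambda donor: calls.append(("download", donor)) or f"{donor}.data",
    run_pipeline=lambda data: calls.append(("run", data)) or f"{data}.vcf",
    compare=lambda donor, output: calls.append(("compare", output)) or "ok",
    erase=lambda data: calls.append(("erase", data)),
)
print(results)
```

Erasing in a `finally` block is a design choice worth keeping in any rewrite, since disk space is usually the limiting factor when iterating over many donors.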
Missing things on my ToDo list
-------------------------------------------
- Integrate BWA-Mem by incorporating the initial step to de-align the BAM files
- Find a programmatic way to access the bundle-id files for each donor from the ICGC data portal; right now I have to go to the web page
- Add filtering step to DKFZ and other pipelines as they become usable.
- Change the scripting of the comparison to bash and add it to https://github.com/mikisvaz/PCAWG-Docker-Test
Best regards to all
Miguel
On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca> wrote:
Anybody else on our poll for next call?
Looks like Friday at 11:00. I will close poll later today.
@bffo
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters