[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Mon Nov 21 09:34:06 EST 2016


Hi Christina,

I've done a quick test downloading the VCF file from GNOS and it appears we
have a *100% overlap* when considering all variants. I'll update my
code to use the GNOS VCFs and no filtering from now on.
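
For reference, the kind of comparison behind these numbers can be sketched
roughly as follows. This is only a minimal illustration in Python, not the
actual testing script; the function names `variant_keys` and `compare` and the
toy input lines are made up for this example. It assumes tab-separated VCF
lines and that variants are keyed as chromosome:position:alt, as in the
examples quoted below (e.g. 10:86361665:T).

```python
def variant_keys(vcf_lines):
    """Collect 'chrom:pos:alt' keys from the lines of a VCF file."""
    keys = set()
    for line in vcf_lines:
        # Skip header lines and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        # ALT may hold several comma-separated alleles; one key per allele.
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(original, this):
    """Split two key sets into common, missing and extra variants."""
    return {
        "common": original & this,   # found in both files
        "missing": original - this,  # in the original only
        "extra": this - original,    # in the new file only
    }


# Toy data, invented for illustration only.
original = variant_keys([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "2\t200\t.\tG\tC\t.\tPASS\t.\n",
])
this = variant_keys([
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "2\t200\t.\tG\tC\t.\tPASS\t.\n",
    "3\t300\t.\tC\tA\t.\tPASS\t.\n",
])
result = compare(original, this)
print({name: len(keys) for name, keys in result.items()})
```

A 100% overlap in this sense means that both "missing" and "extra" come out
empty.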

Best

M

On Mon, Nov 21, 2016 at 2:50 PM, Christina Yung <Christina.Yung at oicr.on.ca>
wrote:

> Hi Miguel,
>
>
>
> For all of these pipelines, I suggest comparing to their original outputs
> from the production runs.  You can find the GNOS info to download the BAMs
> and VCFs in this spreadsheet:
>
> http://pancancer.info/data_releases/may2016/release_may2016.v1.4.tsv
>
>
>
> The VCFs are from individual pipelines before any merging and filtering.
> A subset of BAMs and VCFs are also on AWS (US-West).  Let me know if that’s
> your work environment, and I’ll point you to downloading from S3.
>
>
>
> Thanks,
>
> Christina
>
>
>
> *From:* docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org
> [mailto:docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org] *On
> Behalf Of *Miguel Vazquez
> *Sent:* Monday, November 21, 2016 8:43 AM
> *To:* Francis Ouellette
> *Cc:* docktesters at lists.icgc.org
> *Subject:* Re: [DOCKTESTERS] Preliminary results for overlap between
> testing and original VCF
>
>
>
> Hi all,
>
> I have a question regarding the comparison with the official VCF for the
> BWA-Mem pipeline. In the VCF files I'm working with, the callers are: broad,
> dkfz, sanger and muse. Which one corresponds to BWA-Mem? If none, what
> should I compare with?
>
> Best
>
> M
>
>
>
> On Mon, Nov 21, 2016 at 1:31 PM, Francis Ouellette <francis at oicr.on.ca>
> wrote:
>
> Miguel,
>
>
>
> I’ve updated the wiki with your results, and added another link (on the
> same page) to the google doc, where you describe what you did to get USeq
> to work.
>
>
>
> To all:
>
>
>
> Christina has commented in an e-mail that we had what we needed to test the
> pcawg-bwa-mem-workflow pipeline, as well as the pcawg-sanger-cgp-workflow
> pipeline.
>
>
>
> Adam/Alex: Any advances on either of these fronts?
>
>
>
> Talk to some of you in 90 min.
>
>
>
> @bffo
>
>
>
>
>
> *From: *Christina Yung <Christina.Yung at oicr.on.ca>
>
>
>
> Thank you, Miguel.  These results are very encouraging.  I just have a
> suggestion: since we’re comparing strictly the outputs of the DKFZ/EMBL
> pipeline, we should compare the pre-filtered results, i.e. ~51K calls.
> We’ll later compare if the filtering steps give similar results as well
> when the dockers become ready.
>
>
>
> For testing BWA-Mem, Keiran has documented the steps to convert aligned
> BAM to unaligned:
>
> https://wiki.oicr.on.ca/display/PANCANCER/Preparing+paired-end+data+for+upload
>
>
>
> For Sanger docker, I believe Denis has tested the new version and reported
> that the problem is fixed.
>
>
>
>
>
>
>
>
>
> On Nov 21, 2016, at 04:07, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
>
>
> Thanks Denis, I'm trying it out now
>
>
>
> On Fri, Nov 18, 2016 at 7:18 PM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>
> Hi,
>
> Yes, you're going to want version 2.0.2 of
> quay.io/pancancer/pcawg-sanger-cgp-workflow
> <https://dockstore.org/containers/quay.io/pancancer/pcawg-sanger-cgp-workflow>
> and it should work on DO50311
>
>
> ------------------------------
>
> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org
> [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of
> Miguel Vazquez [miguel.vazquez at cnio.es]
> *Sent:* November 18, 2016 10:06 AM
> *To:* Francis Ouellette
> *Cc:* docktesters at lists.icgc.org; Alysha Moncrieffe
> *Subject:* Re: [DOCKTESTERS] Preliminary results for overlap between
> testing and original VCF
>
> I've added a description on the google doc. Next week I'll try to put it
> properly into my scripts so I can run a bunch of these.
>
> What is the status of the Sanger pipeline, is it fixed already?
>
>
>
> On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca>
> wrote:
>
>
>
> Great,
>
>
>
> Thank you Miguel!  I would call this one a success!
>
>
>
> I think we need two such successes for each pipeline.
>
>
>
> I will update the table with this one.
>
>
>
> Let’s get it done for the others. I will send more mail today.
>
>
>
> Miguel: I imagine you documented what you did on google doc?
>
>
>
> Thank you all,
>
>
>
> francis
>
>
>
> --
> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>
> On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es>
> wrote:
>
>
>
> Hi again
>
>
> I've done some more investigating and it turns out that I was
> ignoring the quite obvious 'FILTER' field. Silly me. Filtering now for
> mutations that 'PASS', I get
>
> Comparison
> ----------
> Total original (dkfz): 16090
> Total this: 16088
>
>
> *Common: 16088*
> *Missing: 2*. Example: 10:86361665:T, 3:168842417:G
> *Extra: 0*. Example:
>
> Not a perfect match, but very close!!!!
>
> Best
>
> Miguel
>
>
>
>
>
> On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
> wrote:
>
> Dear Francis and friends,
>
> Given that Francis was eager to see some initial estimates of how well the
> tests were doing in terms of overlap, I have made some advances. Let me show
> you some of my initial results.
>
> For sample DO50311 with the pipeline from DKFZ (using Delly first to
> produce the BEDPE file) I get the following result:
>
>
> Comparison
> ----------
> Total original (dkfz): 16090
> Total this: 51087
> *Common: 16090*
> *Missing: 0*. Example:
> *Extra: 34997*. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>
>
>
> This means that in the original VCF there are 16K mutations, all of which
> are found in our new VCF (this); however, our new file contains 35K extra
> mutations. Listed are some examples of extra mutations. Going back to our
> VCF, here is a sample line:
>
> #CHROM  POS     ID  REF  ALT  QUAL  FILTER                          INFO                            FORMAT     CONTROL            TUMOR
> 1       725971  .   G    T    .     RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4  0/0:115:60,53,2,0  0/0:114:49,62,0,3
>
> I take it this is a good result. Finding all the reported mutations is a
> great sign I think, and the extra mutations must be due to a filtering step
> that we need to account for. *I hope someone can point out from the VCF line
> above what it is that I need to use for the filtering.*
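
A rough sketch of how the FILTER column of a line like the one above could be
inspected (Python, for illustration only; the function name `passes_filter` is
made up here). It assumes the usual VCF layout, with tab-separated columns,
FILTER as the 7th column, and PASS marking calls that passed all filters.

```python
def passes_filter(vcf_line):
    """True if the FILTER column (7th, tab-separated) of a VCF data line is PASS."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[6] == "PASS"


# The sample line quoted above, with its columns joined by tabs.
sample = "\t".join([
    "1", "725971", ".", "G", "T", ".",
    "RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF",
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57",
    "GT:DP:DP4", "0/0:115:60,53,2,0", "0/0:114:49,62,0,3",
])

# FILTER lists several failed filters here, so this call is not a PASS.
print(passes_filter(sample))
```

Under that assumption, keeping only the lines for which `passes_filter`
returns True would drop extra calls like this one.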
>
> I took the *VCF files from a file I have named
> 'preliminary_final_release.snvs.tgz', from May 30*, which contains VCF files
> with the merged results from all callers. I simply subset the lines for
> each caller, in this case dkfz. Also, the files are listed by aliquot, so I
> have to translate the donor to an aliquot ID. I've scripted this quickly
> using my Rbbt framework, but I'll rewrite it all in bash and add it to my
> repo of testing scripts at https://github.com/mikisvaz/PCAWG-Docker-Test
>
> Summary of my progress
> -----------------------------------
>
>
>
> - Pipelines: DKFZ (works), Sanger (doesn't work. fixed?), BWA-Mem (not
> integrated; missing data-preparation step), Broad (??)
>
> - Donor integration: GNOS (works), ICGC (works)
>
> - Comparison: DKFZ (missing filtering?), rest (waiting)
>
> I have everything scripted so I can iterate a list of donors and download
> the data, run pipelines, erase data, compare results.
>
>
>
> Missing things on my ToDo list
> -------------------------------------------
>
> - Integrate BWA-Mem by incorporating the initial step to de-align the BAM
> files
>
> - Find a programmatic way to access the bundle-id files for each donor
> from the ICGC data portal; right now I have to go to the web page
>
> - Add filtering step to DKFZ and other pipelines as they become usable.
>
> - Change the scripting of the comparison to bash and add it to
> https://github.com/mikisvaz/PCAWG-Docker-Test
>
>
>
> Best regards to all
>
> Miguel
>
>
>
>
>
>
>
> On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
> wrote:
>
>
>
> Anybody else on our poll for next call?
>
> Looks like Friday at 11:00. I will close poll later today.
>
>
>
>
>
> @bffo
>
>
>
>
>
>
>
> <Screenshot 2016-11-08 09.38.56.png>
>
>
>
>
>
>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters