[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Mon Nov 21 04:07:46 EST 2016


Thanks Denis, I'm trying it out now

On Fri, Nov 18, 2016 at 7:18 PM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:

> Hi,
>
> Yes, you're going to want version 2.0.2 of
> quay.io/pancancer/pcawg-sanger-cgp-workflow
> <https://dockstore.org/containers/quay.io/pancancer/pcawg-sanger-cgp-workflow>
> and it should work on DO50311
>
> ------------------------------
> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org
> [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of
> Miguel Vazquez [miguel.vazquez at cnio.es]
> *Sent:* November 18, 2016 10:06 AM
> *To:* Francis Ouellette
> *Cc:* docktesters at lists.icgc.org; Alysha Moncrieffe
> *Subject:* Re: [DOCKTESTERS] Preliminary results for overlap between
> testing and original VCF
>
> I've added a description on the google doc. Next week I'll try to put it
> properly into my scripts so I can run a bunch of these.
>
> What is the status of the Sanger pipeline, is it fixed already?
>
> On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca>
> wrote:
>
>>
>> Great,
>>
>> Thank you Miguel!  I would call this one a success!
>>
>> I think we need two such successes for each pipeline.
>>
>> I will update table with this one.
>>
>> Let’s get it done for the others. I will send more mail today.
>>
>> Miguel: I imagine you documented what you did on google doc?
>>
>> Thank you all,
>>
>> francis
>>
>> --
>> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>>
>> On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es>
>> wrote:
>>
>> Hi again
>>
>> I've done some more investigating and it turns out that I was
>> ignoring the quite obvious 'FILTER' column. Silly me. Filtering now for
>> mutations that 'PASS' I get
>>
>> Comparison
>> ----------
>> Total original (dkfz): 16090
>> Total this: 16088
>> *Common: 16088*
>> *Missing: 2*. Example: 10:86361665:T, 3:168842417:G
>> *Extra: 0*. Example:
>>
>> Not a perfect match, but very close!!!!
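
The comparison above can be sketched in a few lines of Python. This is only an illustration, not the script that produced these numbers: it assumes plain-text, tab-separated VCF lines, and the function names (`mutation_keys`, `compare`) are made up for the example. Mutations are keyed as CHROM:POS:ALT, which matches the examples listed above.

```python
def mutation_keys(vcf_lines, pass_only=True):
    """Collect 'CHROM:POS:ALT' keys from tab-separated VCF data lines."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, alt, filt = fields[0], fields[1], fields[4], fields[6]
        if pass_only and filt != "PASS":
            continue  # drop calls that failed at least one filter
        for allele in alt.split(","):  # one key per ALT allele
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(original, new):
    """Split two key sets into common, missing and extra mutations."""
    return {
        "common": original & new,    # present in both files
        "missing": original - new,   # in the original only
        "extra": new - original,     # in the new file only
    }
```

Calling `compare(mutation_keys(original_lines), mutation_keys(new_lines))` on the two files gives the three sets whose sizes are reported as Common, Missing and Extra. Passing `pass_only=False` for the new file would correspond to the earlier, unfiltered comparison.
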
>>
>> Best
>>
>> Miguel
>>
>>
>> On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
>> wrote:
>>
>>> Dear Francis and friends,
>>>
>>> Given that Francis was eager to see some initial estimates of how well
>>> the tests were doing in terms of overlap, I have made some progress. Let me
>>> show you some of my initial results.
>>>
>>> For sample DO50311 with the pipeline from DKFZ (using Delly first to
>>> produce the BEDPE file) I get the following result:
>>>
>>>
>>>> Comparison
>>>> ----------
>>>> Total original (dkfz): 16090
>>>> Total this: 51087
>>>> *Common: 16090*
>>>> *Missing: 0*. Example:
>>>> *Extra: 34997*. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>>>>
>>>>
>>> This means that the original VCF has 16K mutations, all of which
>>> are found in our new VCF (this); however, our new file contains 35K extra
>>> mutations. Some examples of the extra mutations are listed above. Going
>>> back to our VCF, here is a sample line:
>>>
>>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CONTROL TUMOR
>>> 1       725971  .       G       T       .       RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4       0/0:115:60,53,2,0       0/0:114:49,62,0,3
>>>
>>> I take it this is a good result. Finding all the reported mutations is a
>>> great sign, I think, and the extra mutations must be due to a filtering
>>> step that we need to account for. *I hope someone can point out from the
>>> VCF line above what it is that I need to use for the filtering.*
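
On the question of which field to filter on: in the VCF format the seventh column, FILTER, is PASS when a call passed all filters, and otherwise lists the codes of the filters it failed, separated by semicolons. A minimal sketch that reads the sample line above follows. It splits on whitespace because the tabs did not survive the mail, and `failed_filters` is a name made up for this example.

```python
SAMPLE_LINE = (
    "1 725971 . G T . RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF "
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57 GT:DP:DP4 "
    "0/0:115:60,53,2,0 0/0:114:49,62,0,3"
)


def failed_filters(vcf_line):
    """Return the list of filter codes a VCF record failed (empty if none)."""
    fields = vcf_line.split()  # whitespace split; real VCF files use tabs
    filt = fields[6]           # 7th column is FILTER
    if filt in ("PASS", "."):  # "." means no filters were applied
        return []
    return filt.split(";")


print(failed_filters(SAMPLE_LINE))
```

For the sample line this lists seven failed filters, so a record like this one would be dropped by keeping only the lines whose FILTER value is PASS.
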
>>>
>>> I took the *VCF files from a file I have named
>>> 'preliminary_final_release.snvs.tgz' from May 30*, which contains VCF
>>> files with the merged results from all callers. I simply subset the lines
>>> for each caller, in this case dkfz. Also, the files are listed by aliquot,
>>> so I have to translate the donor to an aliquot ID. I've scripted this
>>> quickly using my Rbbt framework, but I'll rewrite it all in bash and add
>>> it to my repo of testing scripts at
>>> https://github.com/mikisvaz/PCAWG-Docker-Test
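
Subsetting the merged lines by caller could look roughly like the sketch below. It rests on an assumption that is not stated in this thread: that each merged record names its callers in a comma-separated INFO entry, here called `Callers`. The key name is a parameter so it can be changed to whatever the merged files actually use, and `lines_for_caller` is a made-up name. The donor-to-aliquot translation is not covered.

```python
def lines_for_caller(vcf_lines, caller, info_key="Callers"):
    """Yield the data lines whose INFO entry `info_key` lists `caller`.

    ASSUMPTION: the merged VCF records their callers as
    '<info_key>=name1,name2' inside the INFO column.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        info = line.rstrip("\n").split("\t")[7]  # 8th column is INFO
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == info_key and caller in value.split(","):
                yield line
                break
```

For example, `list(lines_for_caller(open(path), "dkfz"))` would keep only the records attributed to dkfz, where `path` points at one of the merged per-aliquot VCF files.
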
>>>
>>> Summary of my progress
>>> -----------------------------------
>>>
>>> - Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not
>>> integrated; missing data-preparation step), Broad (??)
>>> - Donor integration: GNOS (works), ICGC (works)
>>> - Comparison: DKFZ (missing filtering?), rest (waiting)
>>>
>>> I have everything scripted so I can iterate a list of donors and
>>> download the data, run pipelines, erase data, compare results.
>>>
>>> Missing things on my ToDo list
>>> -------------------------------------------
>>>
>>> - Integrate BWA-Mem by incorporating the initial step to de-align the
>>> BAM files
>>> - Find a programmatic way to access the bundle-id files for each donor
>>> from the ICGC data portal; right now I have to go to the web page
>>> - Add filtering step to DKFZ and other pipelines as they become usable.
>>> - Change the scripting of the comparison to bash and add it to
>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>
>>> Best regards to all
>>>
>>> Miguel
>>>
>>>
>>>
>>> On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
>>> wrote:
>>>
>>>>
>>>> Anybody else on our poll for next call?
>>>> Looks like Friday at 11:00. I will close poll later today.
>>>>
>>>>
>>>> @bffo
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> docktesters mailing list
>>>> docktesters at lists.icgc.org
>>>> https://lists.icgc.org/mailman/listinfo/docktesters
>>>>
>>>>
>>>
>>
>>
>