[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Mon Nov 21 08:43:12 EST 2016


Hi all,

I have a question regarding the comparison with the official VCF for the
BWA-Mem pipeline. In the VCF files I'm working with, the callers are broad,
dkfz, sanger and muse. Which one corresponds to BWA-Mem? If none, what
should I compare against?

Best

M

On Mon, Nov 21, 2016 at 1:31 PM, Francis Ouellette <francis at oicr.on.ca>
wrote:

> Miguel,
>
> I’ve updated the wiki with your results, and added another link (on the
> same page)
> to the google doc, where you describe what you did to get USeq to work.
>
> To all:
>
> Christina has commented in an e-mail that we had what we needed to test the
> pcawg-bwa-mem-workflow pipeline, as well as the pcawg-sanger-cgp-workflow
> pipeline.
>
> Adam/Alex: Any advances on either of these fronts?
>
> Talk to some of you in 90 min.
>
> @bffo
>
>
> *From: *Christina Yung <Christina.Yung at oicr.on.ca>
>
> Thank you, Miguel.  These results are very encouraging.  I just have a
> suggestion: since we’re comparing strictly the outputs of the DKFZ/EMBL
> pipeline, we should compare the pre-filtered results, i.e., ~51K calls.
> We’ll later compare if the filtering steps give similar results as well
> when the dockers become ready.
>
> For testing BWA-Mem, Keiran has documented the steps to convert aligned
> BAM to unaligned:
> https://wiki.oicr.on.ca/display/PANCANCER/Preparing+paired-end+data+for+upload
>
> For Sanger docker, I believe Denis has tested the new version and reported
> that the problem is fixed.
>
>
>
>
>
> On Nov 21, 2016, at 04:07, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
> Thanks Denis, I'm trying it out now
>
> On Fri, Nov 18, 2016 at 7:18 PM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>
>> Hi,
>>
>> Yes, you're going to want version 2.0.2 of
>> quay.io/pancancer/pcawg-sanger-cgp-workflow
>> <https://dockstore.org/containers/quay.io/pancancer/pcawg-sanger-cgp-workflow>
>> and it should work on DO50311.
>>
>> ------------------------------
>> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org
>> [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of
>> Miguel Vazquez [miguel.vazquez at cnio.es]
>> *Sent:* November 18, 2016 10:06 AM
>> *To:* Francis Ouellette
>> *Cc:* docktesters at lists.icgc.org; Alysha Moncrieffe
>> *Subject:* Re: [DOCKTESTERS] Preliminary results for overlap between
>> testing and original VCF
>>
>> I've added a description on the google doc. Next week I'll try to put it
>> properly into my scripts so I can run a bunch of these.
>>
>> What is the status of the Sanger pipeline, is it fixed already?
>>
>> On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca>
>> wrote:
>>
>>>
>>> Great,
>>>
>>> Thank you Miguel!  I would call this one a success!
>>>
>>> I think we need two such successes for each pipeline.
>>>
>>> I will update the table with this one.
>>>
>>> Let’s get it done for the others. I will send more mail today.
>>>
>>> Miguel: I imagine you documented what you did on google doc?
>>>
>>> Thank you all,
>>>
>>> francis
>>>
>>> --
>>> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>>>
>>> On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es>
>>> wrote:
>>>
>>> Hi again
>>>
>>> I've done some more investigating and it turns out that I was
>>> ignoring the quite obvious 'FILTER' tag. Silly me. Filtering now for
>>> mutations that 'PASS', I get
>>>
>>> Comparison
>>> ----------
>>> Total original (dkfz): 16090
>>> Total this: 16088
>>> *Common: 16088*
>>> *Missing: 2*. Example: 10:86361665:T, 3:168842417:G
>>> *Extra: 0*. Example:
>>>
>>> Not a perfect match, but very close!!!!
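
The 'PASS' filtering described here could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the script that produced the numbers in this thread: the function name `pass_variant_keys` is hypothetical, the second example record is made up, and the CHROM:POS:ALT key format is inferred from the examples printed in the comparison output.

```python
def pass_variant_keys(vcf_lines):
    """Collect CHROM:POS:ALT keys for records whose FILTER column is exactly PASS."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/meta lines (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, alt, filt = fields[0], fields[1], fields[4], fields[6]
        if filt != "PASS":
            continue
        # A record may list several comma-separated ALT alleles.
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


# Illustrative input: the first record is the sample line quoted in this
# thread (it carries several filter flags, so it is dropped); the second is
# a made-up PASS record included only to show a call that is kept.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tCONTROL\tTUMOR",
    "1\t725971\t.\tG\tT\t.\tRE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF\t"
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57\tGT:DP:DP4\t0/0:115:60,53,2,0\t0/0:114:49,62,0,3",
    "2\t1000\t.\tA\tC\t.\tPASS\tSOMATIC;SNP\tGT\t0/0\t0/1",
]
print(sorted(pass_variant_keys(example)))
```

Matching on the exact string PASS is the simplest reading of the FILTER column; any record with one or more filter flags is treated as not passing.
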
>>>
>>> Best
>>>
>>> Miguel
>>>
>>>
>>> On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
>>> wrote:
>>>
>>>> Dear Francis and friends,
>>>>
>>>> Given that Francis was eager to see some initial estimates of how well
>>>> the tests were doing in terms of overlap, I have made some progress. Let
>>>> me show you some of my initial results.
>>>>
>>>> For sample DO50311 with the pipeline from DKFZ (using Delly first to
>>>> produce the BEDPE file) I get the following result:
>>>>
>>>>
>>>>> Comparison
>>>>> ----------
>>>>> Total original (dkfz): 16090
>>>>> Total this: 51087
>>>>> *Common: 16090*
>>>>> *Missing: 0*. Example:
>>>>> *Extra: 34997*. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>>>>>
>>>>>
>>>> This means that the original VCF contains 16K mutations, all of which
>>>> are found in our new VCF (this); however, our new file contains 35K
>>>> extra mutations. Some examples of extra mutations are listed above.
>>>> Going back to our VCF, here is a sample line for one of them:
>>>>
>>>> #CHROM  POS     ID  REF  ALT  QUAL  FILTER                          INFO                            FORMAT     CONTROL            TUMOR
>>>> 1       725971  .   G    T    .     RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4  0/0:115:60,53,2,0  0/0:114:49,62,0,3
>>>> I take it this is a good result. Finding all the reported mutations is
>>>> a great sign, I think, and the extra mutations must be due to a filtering
>>>> step that we need to account for. *I hope someone can point out from the
>>>> VCF line above what it is that I need to use for the filtering.*
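
The comparison itself amounts to set arithmetic over variant keys. Below is a minimal Python sketch, assuming each call is reduced to a CHROM:POS:ALT string as in the examples above. The function name `compare_calls` and the toy keys in the usage lines are illustrative and are not taken from the actual scripts.

```python
def compare_calls(original, new, n_examples=4):
    """Summarise the overlap between two collections of variant keys."""
    original, new = set(original), set(new)
    missing = original - new   # in the original VCF but not in the new one
    extra = new - original     # in the new VCF but not in the original
    return {
        "total_original": len(original),
        "total_this": len(new),
        "common": len(original & new),
        "missing": len(missing),
        "extra": len(extra),
        "missing_examples": sorted(missing)[:n_examples],
        "extra_examples": sorted(extra)[:n_examples],
    }


# Toy keys for illustration only (not real calls from DO50311).
original_keys = {"1:100:A", "1:200:C", "2:300:G"}
new_keys = {"1:100:A", "1:200:C", "2:300:G", "3:400:T", "3:500:A"}
summary = compare_calls(original_keys, new_keys)
print(summary)
```

Sorting the example keys only serves to make the printed examples deterministic; it is a plain string sort, not a genomic ordering.
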
>>>>
>>>> I took the *VCF files from a file I have named
>>>> 'preliminary_final_release.snvs.tgz' from May 30*, which contains VCF
>>>> files with the merged results from all callers. I simply subset the lines
>>>> for each caller, in this case dkfz. Also, the files are listed by aliquot,
>>>> so I have to translate the donor to an aliquot ID. I've scripted this
>>>> quickly using my Rbbt framework, but I'll rewrite it all in bash and add
>>>> it to my repo of testing scripts at
>>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>>
>>>> Summary of my progress
>>>> -----------------------------------
>>>>
>>>> - Pipelines: DKFZ (works), Sanger (doesn't work. fixed?), BWA-Mem (not
>>>> integrated; missing data-preparation step), Broad (??)
>>>> - Donor integration: GNOS (works), ICGC (works)
>>>> - Comparison: DKFZ (missing filtering?), rest (waiting)
>>>>
>>>> I have everything scripted so I can iterate a list of donors and
>>>> download the data, run pipelines, erase data, compare results.
>>>>
>>>> Missing things on my ToDo list
>>>> -------------------------------------------
>>>>
>>>> - Integrate BWA-Mem by incorporating the initial step to de-align the
>>>> BAM files
>>>> - Find a programmatic way to access the bundle-id files for each donor
>>>> from the ICGC data portal; right now I have to go to the web page
>>>> - Add filtering step to DKFZ and other pipelines as they become usable.
>>>> - Change the scripting of the comparison to bash and add it to
>>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>>
>>>> Best regards to all
>>>>
>>>> Miguel
>>>>
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
>>>> wrote:
>>>>
>>>>>
>>>>> Anybody else on our poll for next call?
>>>>> Looks like Friday at 11:00. I will close poll later today.
>>>>>
>>>>>
>>>>> @bffo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> docktesters mailing list
>>>>> docktesters at lists.icgc.org
>>>>> https://lists.icgc.org/mailman/listinfo/docktesters
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>