[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Mon Nov 21 08:35:39 EST 2016


Hi Francis,

Just a few updates from my end.

- I've ported the comparison code to bash and added it to my repository of
scripts
- I've included the necessary tools to perform BAM unalignment (biobambam);
they are now submodules of my repository, and there are scripts to compile
and use them
- I'm in the process of testing BWA-Mem on the HCC1143 test data
- I've created a batch processing tool that takes a list of donors and goes
through them running all three workflows (in addition to Delly before DKFZ)
and comparing the results. It downloads the donor data, runs the workflows,
compares, and cleans up for the next donor. I will add the finishing touches
once I finish the tests I have running on Sanger with a donor and on BWA-Mem
with the test data.
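
For illustration only, the batch loop described in the last item could be sketched roughly as below. This is a minimal sketch and not the code in the repository: `process_donors` and the four step callables (`download`, `run_workflow`, `compare`, `cleanup`) are hypothetical names, and the workflow labels and their order are assumptions.

```python
from typing import Callable, Dict, Iterable, List

# Illustrative labels only (assumption); Delly is placed before DKFZ
# because the DKFZ run is described as needing Delly first.
WORKFLOWS: List[str] = ["delly", "dkfz", "sanger", "bwa-mem"]


def process_donors(
    donors: Iterable[str],
    download: Callable[[str], str],
    run_workflow: Callable[[str, str], str],
    compare: Callable[[str, str], dict],
    cleanup: Callable[[str], None],
    workflows: List[str] = WORKFLOWS,
) -> Dict[str, Dict[str, dict]]:
    """Run every workflow for every donor and collect the comparisons."""
    results: Dict[str, Dict[str, dict]] = {}
    for donor in donors:
        workdir = download(donor)  # fetch the donor data
        try:
            results[donor] = {}
            for workflow in workflows:
                output = run_workflow(workflow, workdir)
                results[donor][workflow] = compare(workflow, output)
        finally:
            cleanup(workdir)  # always free the space before the next donor
    return results
```

Passing the steps in as callables keeps the loop itself independent of how data is downloaded or how a workflow is launched, and the `finally` clause makes sure a failed workflow does not leave donor data behind.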

My scripts are https://github.com/mikisvaz/PCAWG-Docker-Test

Once I have everything in line I will finish the documentation and
update you guys with another email.

Best

M

On Mon, Nov 21, 2016 at 1:31 PM, Francis Ouellette <francis at oicr.on.ca>
wrote:

> Miguel,
>
> I’ve updated the wiki with your results, and added another link (on the
> same page) to the google doc, where you describe what you did to get USeq
> to work.
>
> To all:
>
> Christina has commented in an e-mail that we had what we needed to test
> the pcawg-bwa-mem-workflow pipeline, as well as the
> pcawg-sanger-cgp-workflow pipeline.
>
> Adam/Alex: Any advances on either of these fronts?
>
> Talk to some of you in 90 min.
>
> @bffo
>
>
> *From: *Christina Yung <Christina.Yung at oicr.on.ca>
>
> Thank you, Miguel.  These results are very encouraging.  I just have a
> suggestion: since we’re comparing strictly the outputs of the DKFZ/EMBL
> pipeline, we should compare the pre-filtered results, i.e., ~51K calls.
> We’ll later compare whether the filtering steps give similar results as well
> when the dockers become ready.
>
> For testing BWA-Mem, Keiran has documented the steps to convert aligned
> BAM to unaligned:
> https://wiki.oicr.on.ca/display/PANCANCER/Preparing+paired-end+data+for+upload
>
> For Sanger docker, I believe Denis has tested the new version and reported
> that the problem is fixed.
>
> On Nov 21, 2016, at 04:07, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>
> Thanks Denis, I'm trying it out now
>
> On Fri, Nov 18, 2016 at 7:18 PM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>
>> Hi,
>>
>> Yes, you're going to want version 2.0.2 of
>> quay.io/pancancer/pcawg-sanger-cgp-workflow
>> <https://dockstore.org/containers/quay.io/pancancer/pcawg-sanger-cgp-workflow>
>> and it should work on DO50311
>>
>> ------------------------------
>> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org
>> [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of
>> Miguel Vazquez [miguel.vazquez at cnio.es]
>> *Sent:* November 18, 2016 10:06 AM
>> *To:* Francis Ouellette
>> *Cc:* docktesters at lists.icgc.org; Alysha Moncrieffe
>> *Subject:* Re: [DOCKTESTERS] Preliminary results for overlap between
>> testing and original VCF
>>
>> I've added a description on the google doc. Next week I'll try to put it
>> properly into my scripts so I can run a bunch of these.
>>
>> What is the status of the Sanger pipeline, is it fixed already?
>>
>> On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca>
>> wrote:
>>
>>>
>>> Great,
>>>
>>> Thank you Miguel!  I would call this one a success!
>>>
>>> I think we need two such successes for each pipeline.
>>>
>>> I will update the table with this one.
>>>
>>> Let’s get it done for the others. I will send more mail today.
>>>
>>> Miguel: I imagine you documented what you did on google doc?
>>>
>>> Thank you all,
>>>
>>> francis
>>>
>>> --
>>> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>>>
>>> On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es>
>>> wrote:
>>>
>>> Hi again
>>>
>>> I've done some more investigating and it turns out that I was
>>> ignoring the quite obvious 'FILTER' column. Silly me. Filtering now for
>>> mutations that 'PASS' I get
>>>
>>> Comparison
>>> ----------
>>> Total original (dkfz): 16090
>>> Total this: 16088
>>> Common: 16088
>>> Missing: 2. Example: 10:86361665:T, 3:168842417:G
>>> Extra: 0. Example:
>>>
>>> Not a perfect match, but very close!!!!
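
The Common/Missing/Extra counts in a report like the one above come down to plain set operations. A minimal sketch, assuming each call is keyed as chromosome:position:alternate-allele (the form of the example keys) and that both key sets have already been filtered; `compare_calls` and `report` are hypothetical names, not the functions used to produce these numbers:

```python
from typing import Dict, Set


def compare_calls(original: Set[str], this: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets of mutation keys into common, missing and extra calls."""
    return {
        "common": original & this,   # reported in both VCFs
        "missing": original - this,  # only in the original VCF
        "extra": this - original,    # only in the newly produced VCF
    }


def report(original: Set[str], this: Set[str], examples: int = 4) -> str:
    """Format the comparison in the same layout as the report above."""
    parts = compare_calls(original, this)
    lines = [
        f"Total original: {len(original)}",
        f"Total this: {len(this)}",
        f"Common: {len(parts['common'])}",
    ]
    for name in ("missing", "extra"):
        shown = ", ".join(sorted(parts[name])[:examples])
        lines.append(f"{name.capitalize()}: {len(parts[name])}. Example: {shown}")
    return "\n".join(lines)
```

Keying on chromosome, position and alternate allele means that two calls at the same site with different alleles count as different mutations; multi-allelic records would need to be split before building the keys.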
>>>
>>> Best
>>>
>>> Miguel
>>>
>>>
>>> On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
>>> wrote:
>>>
>>>> Dear Francis and friends,
>>>>
>>>> Given that Francis was eager to see some initial estimates of how well
>>>> the tests were doing in terms of overlap, I have made some advances. Let me
>>>> show you some of my initial results.
>>>>
>>>> For sample DO50311 with the pipeline from DKFZ (using Delly first to
>>>> produce the BEDPE file) I get the following result:
>>>>
>>>>
>>>>> Comparison
>>>>> ----------
>>>>> Total original (dkfz): 16090
>>>>> Total this: 51087
>>>>> *Common: 16090*
>>>>> *Missing: 0*. Example:
>>>>> *Extra: 34997*. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>>>>>
>>>>>
>>>> Which means that in the original VCF there are 16K mutations, all of
>>>> them are found in our new VCF (this); however, our new file contains 35K
>>>> extra mutations. Listed are some examples of extra mutations. Going back to
>>>> our VCF, here is a sample line:
>>>>
>>>> #CHROM  POS     ID  REF  ALT  QUAL  FILTER                          INFO                            FORMAT     CONTROL            TUMOR
>>>> 1       725971  .   G    T    .     RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4  0/0:115:60,53,2,0  0/0:114:49,62,0,3
>>>> I take it this is a good result. Finding all the reported mutations is
>>>> a great sign I think, and the extra mutations must be due to a filtering
>>>> step that we need to account for. *I hope someone can point out from the
>>>> VCF line above what it is that I need to use for the filtering.*
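
In the VCF format the seventh column, FILTER, holds PASS when a call passed all filters and otherwise lists the filters it failed, so the sample line above can be checked along these lines. A minimal sketch: `parse_record` and `is_pass` are hypothetical helper names, and splitting on whitespace is an assumption made because the tabs of the original line were lost in the mail.

```python
from typing import Dict, List

# The nine fixed VCF columns; per-sample columns (CONTROL, TUMOR) follow them.
VCF_COLUMNS: List[str] = [
    "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
]


def parse_record(line: str) -> Dict[str, str]:
    """Map the fixed columns of one VCF data line to their names."""
    fields = line.split()  # assumption: no field contains whitespace
    return dict(zip(VCF_COLUMNS, fields))


def is_pass(record: Dict[str, str]) -> bool:
    """True only when the FILTER column is exactly PASS."""
    return record["FILTER"] == "PASS"


sample = (
    "1 725971 . G T . RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF "
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57 GT:DP:DP4 "
    "0/0:115:60,53,2,0 0/0:114:49,62,0,3"
)
record = parse_record(sample)
print(record["FILTER"].split(";"))  # names of the filters listed for this call
print(is_pass(record))              # False: the call would be dropped
```

Keeping only the records for which `is_pass` is true is one way to apply the filtering step before comparing against the original calls.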
>>>>
>>>> I took the *VCF files from an archive I have named
>>>> 'preliminary_final_release.snvs.tgz' from May 30*, which contains VCF
>>>> files with the merged results from all callers. I simply subset the lines
>>>> for each caller, in this case dkfz. Also, the files are listed by aliquot,
>>>> so I have to translate the donor to an aliquot ID. I've scripted this
>>>> quickly using my Rbbt framework, but I'll rewrite it all in bash and add
>>>> it to my repo of testing scripts at
>>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>>
>>>> Summary of my progress
>>>> -----------------------------------
>>>>
>>>> - Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not
>>>> integrated; missing data-preparation step), Broad (??)
>>>> - Donor integration: GNOS (works), ICGC (works)
>>>> - Comparison: DKFZ (missing filtering?), rest (waiting)
>>>>
>>>> I have everything scripted so I can iterate a list of donors and
>>>> download the data, run pipelines, erase data, compare results.
>>>>
>>>> Missing things on my ToDo list
>>>> -------------------------------------------
>>>>
>>>> - Integrate BWA-Mem by incorporating the initial step to de-align the
>>>> BAM files
>>>> - Find a programmatic way to access the bundle-id files for each donor
>>>> from the ICGC data portal; right now I have to go to the web page
>>>> - Add filtering step to DKFZ and other pipelines as they become usable.
>>>> - Change the scripting of the comparison to bash and add it to
>>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>>
>>>> Best regards to all
>>>>
>>>> Miguel
>>>>
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
>>>> wrote:
>>>>
>>>>>
>>>>> Anybody else on our poll for next call?
>>>>> Looks like Friday at 11:00. I will close poll later today.
>>>>>
>>>>>
>>>>> @bffo
>>>>>
>>>>>
>>>>>
>>>>> <Screenshot 2016-11-08 09.38.56.png>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> docktesters mailing list
>>>>> docktesters at lists.icgc.org
>>>>> https://lists.icgc.org/mailman/listinfo/docktesters
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>