[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Fri Nov 18 10:06:44 EST 2016


I've added a description to the Google Doc. Next week I'll try to integrate
it properly into my scripts so I can run a batch of these.

What is the status of the Sanger pipeline? Is it fixed already?

On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca>
wrote:

>
> Great,
>
> Thank you Miguel!  I would call this one a success!
>
> I think we need two such successes for each pipeline.
>
> I will update the table with this one.
>
> Let’s get it done for the others. I will send more mail today.
>
> Miguel: I imagine you documented what you did in the Google Doc?
>
> Thank you all,
>
> francis
>
> --
> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>
>
> On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es>
> wrote:
>
> Hi again
>
> I've done some more investigating and it turns out that I was ignoring
> the quite obvious 'FILTER' field. Silly me. Filtering now for mutations
> that 'PASS' I get
>
> Comparison
> ----------
> Total original (dkfz): 16090
> Total this: 16088
> Common: 16088
> Missing: 2. Example: 10:86361665:T, 3:168842417:G
> Extra: 0. Example:
>
> Not a perfect match, but very close!!!!
>
> Best
>
> Miguel
>
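The comparison with 'PASS' filtering described above can be sketched roughly as follows. This is only an illustration, not the actual Rbbt script: the function names `variant_keys` and `compare` are hypothetical, the toy records are made up, and applying the filter only to the newly produced VCF is an assumption. Variants are keyed as chromosome:position:alternate allele, matching the examples in the report.

```python
def variant_keys(vcf_lines, pass_only=False):
    """Collect 'CHROM:POS:ALT' keys from tab-separated VCF lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, _ref, alt, _qual, flt = line.rstrip("\n").split("\t")[:7]
        if pass_only and flt != "PASS":
            continue  # drop records that failed one or more filters
        for allele in alt.split(","):  # one key per alternate allele
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(original, this):
    """Return the common, missing and extra variant keys."""
    return {
        "common": original & this,
        "missing": original - this,
        "extra": this - original,
    }


# Made-up toy records, only to show the effect of the filter.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
original_vcf = [
    header,
    "1\t100\t.\tA\tC\t.\tPASS\t.",
    "2\t200\t.\tG\tT\t.\tPASS\t.",
]
this_vcf = [
    header,
    "1\t100\t.\tA\tC\t.\tPASS\t.",
    "2\t200\t.\tG\tT\t.\tPASS\t.",
    "3\t300\t.\tC\tA\t.\tRE;BL\t.",
]

unfiltered = compare(variant_keys(original_vcf), variant_keys(this_vcf))
filtered = compare(variant_keys(original_vcf),
                   variant_keys(this_vcf, pass_only=True))
print(len(unfiltered["extra"]), len(filtered["extra"]))
```

Using sets keeps the three counts (common, missing, extra) consistent with each other by construction.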
>
> On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es>
> wrote:
>
>> Dear Francis and friends,
>>
>> Given that Francis was eager to see some initial estimates of how well the
>> tests were doing in terms of overlap, I have made some progress. Let me
>> show you some of my initial results.
>>
>> For sample DO50311 with the pipeline from DKFZ (using Delly first to
>> produce the BEDPE file) I get the following result:
>>
>>
>>> Comparison
>>> ----------
>>> Total original (dkfz): 16090
>>> Total this: 51087
>>> Common: 16090
>>> Missing: 0. Example:
>>> Extra: 34997. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>>>
>>>
>> Which means that the original VCF contains 16K mutations, all of which
>> are found in our new VCF (this); however, our new file contains 35K extra
>> mutations. Some examples of extra mutations are listed above. Going back
>> to our VCF, here is a sample line:
>>
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CONTROL TUMOR
>> 1       725971  .       G       T       .       RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4       0/0:115:60,53,2,0       0/0:114:49,62,0,3
>>
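To make the columns of the sample line above easier to read, here is a small sketch that pairs each value with its header name. It assumes the fields are whitespace-separated as they appear in this email (a real VCF is tab-separated), and the helper name `parse_record` is hypothetical. In the VCF format the FILTER column holds either PASS or a semicolon-separated list of the filters a record failed.

```python
HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CONTROL TUMOR"
LINE = ("1 725971 . G T . RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF "
        "SOMATIC;SNP;AF=0.02,0.03;MQ=57 GT:DP:DP4 "
        "0/0:115:60,53,2,0 0/0:114:49,62,0,3")


def parse_record(header, line):
    """Map each column name to its value for one record."""
    names = header.lstrip("#").split()
    values = line.split()  # whitespace split: tabs were lost in the email
    return dict(zip(names, values))


record = parse_record(HEADER, LINE)

# An empty list means the record passed; otherwise these are the failed filters.
failed_filters = [] if record["FILTER"] == "PASS" else record["FILTER"].split(";")

# Same chromosome:position:alt key used in the comparison report.
variant_key = ":".join([record["CHROM"], record["POS"], record["ALT"]])

print(variant_key, failed_filters)
```

For this record the FILTER column is not PASS, so a comparison restricted to passing records would not count it.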
>> I take it this is a good result. Finding all the reported mutations is a
>> great sign, I think, and the extra mutations must be due to a filtering
>> step that we need to account for. *I hope someone can point out from the
>> VCF line above what it is that I need to use for the filtering.*
>>
>> I took the *VCF files from a file named
>> 'preliminary_final_release.snvs.tgz' from May 30*, which contains VCF files
>> with the merged results from all callers. I simply subset the lines for
>> each caller, in this case dkfz. Also, the files are listed by aliquot, so I
>> have to translate the donor ID to an aliquot ID. I've scripted this quickly
>> using my Rbbt framework, but I'll rewrite it all in bash and add it to my
>> repo of testing scripts at https://github.com/mikisvaz/PCAWG-Docker-Test
>>
>> Summary of my progress
>> -----------------------------------
>>
>> - Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not
>> integrated; missing data-preparation step), Broad (??)
>> - Donor integration: GNOS (works), ICGC (works)
>> - Comparison: DKFZ (missing filtering?), rest (waiting)
>>
>> I have everything scripted so I can iterate a list of donors and download
>> the data, run pipelines, erase data, compare results.
>>
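The per-donor loop described above (download the data, run pipelines, erase data, compare results) could be structured along these lines. This is a sketch only: `process_donors` and the four step functions are hypothetical placeholders, not the actual scripts, and the stub steps below merely record the order in which they are called.

```python
def process_donors(donors, download, run_pipeline, erase, compare):
    """Run download -> pipeline -> erase -> compare for each donor.

    All four callables are hypothetical placeholders for the real scripts.
    """
    results = {}
    for donor in donors:
        data = download(donor)
        try:
            output = run_pipeline(data)
        finally:
            # Remove the downloaded input even if the pipeline fails.
            erase(data)
        results[donor] = compare(donor, output)
    return results


# Toy run with stub steps that only record the order of calls.
calls = []


def _download(donor):
    calls.append(("download", donor))
    return f"{donor}.bam"


def _run(data):
    calls.append(("run", data))
    return f"{data}.vcf"


def _erase(data):
    calls.append(("erase", data))


def _compare(donor, output):
    calls.append(("compare", donor))
    return output


results = process_donors(["DO50311"], _download, _run, _erase, _compare)
print(results)
```

Erasing inside a `finally` block is a design choice for this sketch: large input files are cleaned up even when a pipeline run fails partway through.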
>> Missing things on my ToDo list
>> -------------------------------------------
>>
>> - Integrate BWA-Mem by incorporating the initial step to de-align the BAM
>> files
>> - Find a programmatic way to access the bundle-id files for each donor
>> from the ICGC data portal; right now I have to go to the web page
>> - Add filtering step to DKFZ and other pipelines as they become usable.
>> - Change the scripting of the comparison to bash and add it to
>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>
>> Best regards to all
>>
>> Miguel
>>
>>
>>
>> On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
>> wrote:
>>
>>>
>>> Anybody else on our poll for next call?
>>> Looks like Friday at 11:00. I will close poll later today.
>>>
>>>
>>> @bffo
>>>
>>>
>>>
>>> _______________________________________________
>>> docktesters mailing list
>>> docktesters at lists.icgc.org
>>> https://lists.icgc.org/mailman/listinfo/docktesters
>>>
>>>
>>
>
>