[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Christina Yung Christina.Yung at oicr.on.ca
Fri Nov 18 12:40:51 EST 2016


Thank you, Miguel.  These results are very encouraging.  I just have a suggestion: since we're strictly comparing the outputs of the DKFZ/EMBL pipeline, we should compare the pre-filtered results, i.e. the ~51K calls.  Later, once the dockers are ready, we'll check whether the filtering steps also give similar results.

For testing BWA-Mem, Keiran has documented the steps to convert aligned BAMs to unaligned ones:
https://wiki.oicr.on.ca/display/PANCANCER/Preparing+paired-end+data+for+upload

For Sanger docker, I believe Denis has tested the new version and reported that the problem is fixed.

Best,
Christina


From: docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org [mailto:docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org] On Behalf Of Miguel Vazquez
Sent: Friday, November 18, 2016 10:07 AM
To: Francis Ouellette
Cc: docktesters at lists.icgc.org; Alysha Moncrieffe
Subject: Re: [DOCKTESTERS] Preliminary results for overlap between testing and original VCF

I've added a description to the Google Doc. Next week I'll try to incorporate it properly into my scripts so I can run a batch of these comparisons.
What is the status of the Sanger pipeline? Is it fixed already?

On Fri, Nov 18, 2016 at 3:37 PM, Francis Ouellette <francis at oicr.on.ca> wrote:

Great,

Thank you Miguel!  I would call this one a success!

I think we need two such successes for each pipeline.

I will update the table with this one.

Let’s get it done for the others. I will send more mail today.

Miguel: I imagine you documented what you did in the Google Doc?

Thank you all,

francis

--
B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette







On Nov 18, 2016, at 8:48 AM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Hi again,

I've done some more investigating, and it turns out that I was ignoring the rather obvious 'FILTER' field. Silly me. Now, keeping only the mutations whose FILTER is 'PASS', I get:

Comparison
----------
Total original (dkfz): 16090
Total this: 16088
Common: 16088
Missing: 2. Example: 10:86361665:T, 3:168842417:G
Extra: 0. Example:
Not a perfect match, but very close!!!!
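A minimal Python sketch of this comparison, assuming each mutation is keyed as CHROM:POS:ALT (which matches the examples, e.g. 1:725971:T for a record with ALT T) and that the inputs are uncompressed, tab-delimited VCF text. The names `vcf_keys` and `compare` are illustrative, not taken from the Rbbt scripts. With `pass_only=True` only records whose FILTER is PASS are kept; `pass_only=False` keeps every call, which is what comparing the pre-filtered (~51K) DKFZ output would need.

```python
"""Sketch: compare mutation calls between two VCF files.

Assumptions (not from the original scripts): mutations are keyed as
CHROM:POS:ALT and the input is uncompressed, tab-delimited VCF text.
"""
from typing import Iterable, List, Set


def vcf_keys(lines: Iterable[str], pass_only: bool = True) -> Set[str]:
    """Return the CHROM:POS:ALT keys, optionally only for FILTER == PASS."""
    keys: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, _ref, alt, _qual, filt = fields[:7]
        if pass_only and filt != "PASS":
            continue
        # Multi-allelic ALT fields (e.g. "T,C") yield one key per allele.
        for allele in alt.split(","):
            keys.add(f"{chrom}:{pos}:{allele}")
    return keys


def compare(original: Set[str], this: Set[str], n_examples: int = 4) -> str:
    """Format a report in the same shape as the one in the message above."""
    missing: List[str] = sorted(original - this)  # in original, not in this
    extra: List[str] = sorted(this - original)    # in this, not in original
    return "\n".join([
        f"Total original: {len(original)}",
        f"Total this: {len(this)}",
        f"Common: {len(original & this)}",
        f"Missing: {len(missing)}. Example: {', '.join(missing[:n_examples])}",
        f"Extra: {len(extra)}. Example: {', '.join(extra[:n_examples])}",
    ])
```

Using set operations makes the Missing and Extra counts independent of the record order in the two files.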
Best
Miguel


On Fri, Nov 18, 2016 at 2:06 PM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
Dear Francis and friends,
Given that Francis was eager to see some initial estimates of how well the tests perform in terms of overlap, I have made some progress. Let me show you some of my initial results.
For sample DO50311, using the DKFZ pipeline (with Delly run first to produce the BEDPE file), I get the following result:
Comparison
----------
Total original (dkfz): 16090
Total this: 51087
Common: 16090
Missing: 0. Example:
Extra: 34997. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A

This means that the original VCF contains 16K mutations, all of which are found in our new VCF ("this"); however, our new file contains 35K extra mutations. Some examples of the extra mutations are listed above. Going back to our VCF, here is the line for one of them:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CONTROL TUMOR
1       725971  .       G       T       .       RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF  SOMATIC;SNP;AF=0.02,0.03;MQ=57  GT:DP:DP4       0/0:115:60,53,2,0       0/0:114:49,62,0,3
I take this as a good result. Finding all the reported mutations is a great sign, I think, and the extra mutations must come from a filtering step that we need to account for. I hope someone can point out, from the VCF line above, what I need to use for the filtering.
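Per the VCF specification, the FILTER column is PASS when a record passed all filters and otherwise a semicolon-separated list of the filters it failed (their meanings should be declared in the file's ##FILTER header lines). The sample line above therefore failed seven filters and is not a PASS call. A small Python sketch makes this explicit; `parse_record` is a hypothetical helper, and it assumes no field contains whitespace, so `str.split()` handles both tab-delimited files and the space-aligned line pasted above.

```python
"""Sketch: split one VCF data line into its comparison key, failed
filters and INFO entries (illustrative only)."""
from typing import Dict, List, Tuple, Union

InfoValue = Union[bool, str]


def parse_record(line: str) -> Tuple[str, List[str], Dict[str, InfoValue]]:
    fields = line.split()  # assumes no field contains whitespace
    chrom, pos, _id, _ref, alt, _qual, filt, info = fields[:8]
    key = f"{chrom}:{pos}:{alt}"
    # "PASS" means all filters passed. "." means no filters were applied,
    # which this sketch treats as "no failures" (an assumption).
    # Anything else is a ";"-separated list of the filters that failed.
    failed = [] if filt in ("PASS", ".") else filt.split(";")
    parsed_info: Dict[str, InfoValue] = {}
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        # Flag entries such as SOMATIC have no "=value" part.
        parsed_info[name] = value if sep else True
    return key, failed, parsed_info


sample = ("1 725971 . G T . RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF "
          "SOMATIC;SNP;AF=0.02,0.03;MQ=57 GT:DP:DP4 "
          "0/0:115:60,53,2,0 0/0:114:49,62,0,3")
key, failed, info = parse_record(sample)
print(key, failed, info["AF"])
# prints: 1:725971:T ['RE', 'BL', 'TAC', 'HSDEPTH', 'SBAF', 'FRQ', 'VAF'] 0.02,0.03
```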
I took the original VCF files from an archive I have, named 'preliminary_final_release.snvs.tgz' and dated May 30, which contains VCF files with the merged results from all callers. I simply subset the lines for each caller, in this case dkfz. The files are also listed by aliquot, so I have to translate the donor ID to the aliquot ID. I've scripted this quickly using my Rbbt framework, but I'll rewrite it all in bash and add it to my repo of testing scripts at https://github.com/mikisvaz/PCAWG-Docker-Test
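A minimal Python sketch of subsetting the merged file for one caller. The message does not say how the merged VCF records which caller made each call, so this assumes, hypothetically, an INFO entry of the form Callers=dkfz,sanger; `CALLERS_KEY` would have to be changed to match the real tag. The donor-to-aliquot translation would be a separate lookup table.

```python
"""Sketch: keep only the records of one caller from a merged VCF.

HYPOTHETICAL format: each record is assumed to list its callers in an
INFO entry such as "Callers=dkfz,sanger".
"""
from typing import Iterable, Iterator, Set

CALLERS_KEY = "Callers"  # hypothetical INFO tag; adjust to the real file


def callers_of(info: str) -> Set[str]:
    """Return the set of callers named in an INFO column, if any."""
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        if sep and name == CALLERS_KEY:
            return set(value.split(","))
    return set()


def subset_by_caller(lines: Iterable[str], caller: str) -> Iterator[str]:
    """Yield header lines plus the data lines attributed to `caller`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep headers so the output is still a valid VCF
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 8 and caller in callers_of(fields[7]):
            yield line
```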
Summary of my progress
-----------------------------------

- Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not integrated; missing data-preparation step), Broad (??)
- Donor integration: GNOS (works), ICGC (works)
- Comparison: DKFZ (missing filtering?), rest (waiting)
I have everything scripted so I can iterate a list of donors and download the data, run pipelines, erase data, compare results.
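The per-donor loop described above could look roughly like this Python sketch. The step functions (`download`, `run_pipeline`, `compare`) are hypothetical placeholders supplied by the caller, not functions from the PCAWG-Docker-Test repository; the point is the `try`/`finally`, which erases each donor's data even when a step fails.

```python
"""Sketch: download, run, compare and erase data for a list of donors."""
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical step signature: each step takes a donor ID and a work directory.
Step = Callable[[str, Path], object]


def run_donors(donors: Iterable[str], download: Step, run_pipeline: Step,
               compare: Step) -> Dict[str, object]:
    """Process donors one at a time so only one donor's data is on disk."""
    results: Dict[str, object] = {}
    for donor in donors:
        workdir = Path(tempfile.mkdtemp(prefix=f"{donor}_"))
        try:
            download(donor, workdir)
            run_pipeline(donor, workdir)
            results[donor] = compare(donor, workdir)
        finally:
            # Erase downloaded data and pipeline outputs, even on failure.
            shutil.rmtree(workdir, ignore_errors=True)
    return results
```

As written, an exception from any step still stops the whole batch (after cleanup); catching it per donor and recording the error would let the remaining donors run.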

Missing things on my ToDo list
-------------------------------------------
- Integrate BWA-Mem by incorporating the initial step to de-align the BAM files
- Find a programmatic way to access the bundle-id files for each donor from the ICGC data portal; right now I have to go to the web page
- Add filtering step to DKFZ and other pipelines as they become usable.
- Change the scripting of the comparison to bash and add it to https://github.com/mikisvaz/PCAWG-Docker-Test

Best regards to all
Miguel



On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca> wrote:

Is anybody else going to answer our poll for the next call?
Looks like Friday at 11:00. I will close the poll later today.


@bffo



<Screenshot 2016-11-08 09.38.56.png>



_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters



