[DOCKTESTERS] Preliminary results for overlap between testing and original VCF

Miguel Vazquez miguel.vazquez at cnio.es
Fri Nov 18 08:06:35 EST 2016


Dear Francis and friends,

Given that Francis was eager to see some initial estimates of how well the
tests were doing in terms of overlap, I have made some advances. Let me show
you some of my initial results.

For sample DO50311 with the pipeline from DKFZ (using Delly first to
produce the BEDPE file) I get the following result:


> *Comparison----------*
> Total original (dkfz): 16090
> Total this: 51087
> *Common: 16090*
> *Missing: 0*. Example:
> *Extra: 34997*. Example: 1:10157:C, 1:725511:A, 1:725971:T, 1:726707:A
>
>
Which means that the original VCF contains 16K mutations, all of which
are found in our new VCF (this); however, our new file contains 35K extra
mutations. Some examples of extra mutations are listed above. Going back to
our VCF, here is a sample line:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	CONTROL	TUMOR
1	725971	.	G	T	.	RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF	SOMATIC;SNP;AF=0.02,0.03;MQ=57	GT:DP:DP4	0/0:115:60,53,2,0	0/0:114:49,62,0,3

I take it this is a good result. Finding all the reported mutations is a
great sign, I think, and the extra mutations must be due to a filtering step
that we need to account for. *I hope someone can point out from the VCF line
above what it is that I need to use for the filtering.*
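
As a starting point for that question, here is a minimal Python sketch of one possible filter. It rests on an unconfirmed assumption: that the original release kept only records whose FILTER column is PASS. In the VCF format the seventh column holds either PASS or a semicolon-separated list of the filters a record failed, so the sample line above would be dropped under this rule. The function names are hypothetical and are not part of any pipeline.

```python
def failed_filters(vcf_line):
    """Return the filters a VCF record failed (empty list if it is PASS)."""
    fields = vcf_line.rstrip("\n").split("\t")
    flt = fields[6]  # FILTER is the 7th tab-separated column
    if flt in ("PASS", "."):  # "." means no filters were applied at all
        return []
    return flt.split(";")


def is_pass(vcf_line):
    """Assumed rule: keep a record only if FILTER is exactly PASS."""
    return vcf_line.rstrip("\n").split("\t")[6] == "PASS"


sample = "\t".join([
    "1", "725971", ".", "G", "T", ".",
    "RE;BL;TAC;HSDEPTH;SBAF;FRQ;VAF",
    "SOMATIC;SNP;AF=0.02,0.03;MQ=57",
    "GT:DP:DP4", "0/0:115:60,53,2,0", "0/0:114:49,62,0,3",
])
print(failed_filters(sample), is_pass(sample))
```

If the extra 35K calls all carry a non-PASS FILTER value, that would support the assumption; if some of them are PASS, the filtering must be happening somewhere else.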

The *VCF files come from a file I have named
'preliminary_final_release.snvs.tgz', from May 30*, which contains VCF files
with the merged results from all callers. I simply subset the lines for
each caller, in this case dkfz. Also, the files are listed by aliquot, so I
have to translate the donor ID to the aliquot ID. I've scripted this quickly
using my Rbbt framework, but I'll rewrite it all in bash and add it to my repo
of testing scripts at https://github.com/mikisvaz/PCAWG-Docker-Test
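
For anyone who wants to reproduce the counts before the scripts land in the repo, the comparison itself can be sketched in a few lines of Python. This is not the Rbbt script, only an illustration under two assumptions: that mutations are keyed as chromosome:position:alt (which matches the examples in the report above) and that each record has a single ALT allele. The function names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect 'chrom:pos:alt' keys from VCF records, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, _ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add(f"{chrom}:{pos}:{alt}")
    return keys


def compare(original_lines, this_lines):
    """Return the common, missing and extra mutations between two VCFs."""
    original = variant_keys(original_lines)
    this = variant_keys(this_lines)
    return {
        "common": original & this,
        "missing": original - this,  # in the original but not in ours
        "extra": this - original,    # in ours but not in the original
    }
```

Called with the lines of the original dkfz subset and the lines of the newly produced VCF, the sizes of the three sets correspond to the Common, Missing and Extra figures in the report.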

Summary of my progress
-----------------------------------

- Pipelines: DKFZ (works), Sanger (doesn't work; fixed?), BWA-Mem (not
integrated; missing data-preparation step), Broad (??)
- Donor integration: GNOS (works), ICGC (works)
- Comparison: DKFZ (missing filtering?), rest (waiting)

I have everything scripted so I can iterate a list of donors and download
the data, run pipelines, erase data, compare results.

Missing things on my ToDo list
-------------------------------------------

- Integrate BWA-Mem by incorporating the initial step to de-align the BAM
files
- Find a programmatic way to access the bundle-id files for each donor from
the ICGC data portal; right now I have to go to the web page
- Add filtering step to DKFZ and other pipelines as they become usable.
- Change the scripting of the comparison to bash and add it to
https://github.com/mikisvaz/PCAWG-Docker-Test

Best regards to all

Miguel



On Tue, Nov 8, 2016 at 3:40 PM, Francis Ouellette <francis at oicr.on.ca>
wrote:

>
> Anybody else on our poll for next call?
> Looks like Friday at 11:00. I will close poll later today.
>
>
> @bffo