[DOCKTESTERS] BWA-Mem update
Keiran Raine
kr2 at sanger.ac.uk
Thu Mar 23 05:58:49 EDT 2017
Hi,
Sorry if this is already on your Confluence page, but I can't access it (possibly because I'm outside OICR, or because the default for your space is owner-only).
Can you confirm whether the CaVEMan calling was based on the BAM file that the original data was generated with, or on one mapped with the new/recent mapping flow?
Also, the key information for determining if a call change is erroneous:
1. Is the variant marked 'PASSED'?
2. What are the probabilities attached to the VCF record (should be in the info field)?
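A minimal sketch of pulling those two pieces of information out of a single VCF data line, assuming the standard tab-separated column order (CHROM POS ID REF ALT QUAL FILTER INFO) and that a passing record carries the literal FILTER value `PASS`. The function name and the sample record are made up for illustration, and the INFO keys shown are placeholders rather than confirmed CaVEMan field names.

```python
def summarize_vcf_record(line):
    """Return (is_passed, info_dict) for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    vcf_filter, info_text = fields[6], fields[7]
    info = {}
    for entry in info_text.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # Flag-style INFO keys have no "=value" part; store them as True.
        info[key] = value if sep else True
    return vcf_filter == "PASS", info


if __name__ == "__main__":
    # Hypothetical record, not taken from any real call set.
    record = "1\t12345\t.\tG\tA\t.\tPASS\tDP=40;MP=0.99;SP=0.01"
    passed, info = summarize_vcf_record(record)
    print(passed, info)
```

Running this over the handful of Extra and Missing positions in both call sets would show whether the changed calls sit near the filtering or probability thresholds.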
As previously stated, we do expect a small variance between the results for data processed at the beginning of the project and those processed at the end, as well as some minor changes introduced when the normal panel was moved from a web service to a local file.
Regards,
Keiran
From: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
Date: Wednesday, 22 March 2017 at 20:18
To: Miguel Vazquez <mikisvaz at gmail.com>, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update
I finished one of the dockerized Sanger tests and upon verification there were just a few differences, but I'm not sure if they are normal or not.
Results:
root@dockstore-test3:~/PCAWG-Docker-Test# bin/compare_result.sh Sanger DO50398
var/spool/cwl/0/caveman/
var/spool/cwl/0/caveman/splitList
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.muts.ids.vcf.gz
var/spool/cwl/0/caveman/alg_bean
var/spool/cwl/0/caveman/prob_arr
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.snps.ids.vcf.gz.tbi
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.no_analysis.bed
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.snps.ids.vcf.gz
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.flagged.muts.vcf.gz
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.muts.ids.vcf.gz.tbi
var/spool/cwl/0/caveman/cov_arr
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.flagged.muts.vcf.gz.tbi
var/spool/cwl/0/caveman/caveman.cfg.ini
Comparison for DO50398 using Sanger
---
Common: 171325
Extra: 3
- Example: 14:20031258:G,8:43827158:A,X:61711363:C
Missing: 13
- Example: 10:106963148:T,17:64794691:G,1:82709263:T
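The internals of the comparison script are not shown in this thread, so the following is only a sketch of how counts like Common / Extra / Missing can be derived. It assumes each call is keyed as chrom:pos:alt (the format of the examples above), that "Extra" means present only in the new run, and that "Missing" means present only in the original result. `compare_calls` is a hypothetical name, and the shared key in the usage lines is a placeholder.

```python
def compare_calls(original_keys, new_keys):
    """Compare two collections of 'chrom:pos:alt' keys with set arithmetic."""
    original, new = set(original_keys), set(new_keys)
    return {
        "common": len(original & new),
        "extra": sorted(new - original),      # only in the new run
        "missing": sorted(original - new),    # only in the original result
    }


def agreement(common, extra, missing):
    """Fraction of all distinct calls that both runs share."""
    total = common + extra + missing
    return common / total if total else 1.0


if __name__ == "__main__":
    # "1:100:A" is a made-up shared call; the other keys are from the output above.
    original = ["1:100:A", "10:106963148:T"]
    new = ["1:100:A", "14:20031258:G"]
    print(compare_calls(original, new))
    # Counts reported above for DO50398.
    print(agreement(171325, 3, 13))
```

With the reported counts, 16 of 171341 distinct calls differ, so the two runs agree on well over 99.9% of calls.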
Because I'm an infrastructure architect, my main reason for running the test was to monitor resource utilization, so I wrote a wiki page detailing my observations:
https://wiki.oicr.on.ca/display/~gmihaiescu/Dockerized+Sanger+workflow
I have three more Docker tests running: two of them run Sanger against the same donor (but using VMs with 8 cores, because I want to see whether the run time and resource utilization are constant), and a third is running DKFZ.
Cheers,
George
From: Miguel Vazquez <mikisvaz at gmail.com>
Date: Wednesday, March 22, 2017 at 1:08 PM
To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update
Thanks Jonas for this information.
I hope that someone here can suggest what to try next. Perhaps the version issue that Jonas points out is the key.
I just want to add that, as I told Jonas earlier, my own tests using the new split BAM files also gave 3% mismatches.
Best regards
Miguel
On Wed, Mar 22, 2017 at 6:56 PM, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk> wrote:
Hi all,
A brief update on the BWA-Mem docker tests.
I prepared normal + tumor lane-level unaligned BAMs for DO503011 and ran the BWA-Mem workflow for normal and tumor separately.
When doing the comparison, however, I still get 3% of reads that are aligned differently (see below for a few examples).
However, when checking the headers of the original and newly mapped bam files (attached) I noticed that the original is mapped using a different version of BWA and SeqWare.
I’m hoping the mapping differences can be ascribed to this.
Is there a list available somewhere detailing which samples were mapped using which versions?
That way we could select a relevant test sample without having to sort through the headers of all different bams.
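As a sketch of how that header check could be automated across many BAMs: the SAM format records each program in an `@PG` header line with tab-separated tags such as `ID`, `PN` (program name) and `VN` (version). The helper below only parses header text that has already been extracted (for example with `samtools view -H`); its name and the sample header lines, including the version strings, are made up for illustration.

```python
def program_versions(header_text):
    """Collect (program, version) pairs from the @PG lines of a SAM header."""
    versions = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Each field after the record type looks like TAG:value.
        tags = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        program = tags.get("PN", tags.get("ID", "?"))
        versions.append((program, tags.get("VN", "?")))
    return versions


if __name__ == "__main__":
    # Hypothetical header; the version numbers are placeholders.
    header = (
        "@HD\tVN:1.4\tSO:coordinate\n"
        "@PG\tID:bwa\tPN:bwa\tVN:0.0.0-example\tCL:bwa mem ref.fa reads.bam\n"
    )
    print(program_versions(header))
```

Tabulating the result per sample would give the kind of version list asked for above without opening each header by hand.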
Best wishes,
Jonas
newly aligned:
ID                                  flag  chr         pos
HS2000-1012_275:7:1101:17411:15403  99    3           112743126
HS2000-1012_275:7:1101:17411:15403  147   3           112743376
HS2000-1012_275:7:1101:11883:83640  99    16          28672999
HS2000-1012_275:7:1101:11883:83640  147   16          28673223
HS2000-1012_275:7:1101:16576:28476  163   GL000238.1  21309
HS2000-1012_275:7:1101:16576:28476  83    GL000238.1  21664
vs the original:
ID                                  flag  chr         pos
HS2000-1012_275:7:1101:17411:15403  99    8           54944243
HS2000-1012_275:7:1101:17411:15403  147   8           54944493
HS2000-1012_275:7:1101:11883:83640  163   16          28464362
HS2000-1012_275:7:1101:11883:83640  83    16          28464586
HS2000-1012_275:7:1101:16576:28476  99    126124549
HS2000-1012_275:7:1101:16576:28476  147   126124903
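A rough sketch of how a mismatch fraction like the 3% above could be computed from two such listings. It assumes only primary alignments are listed, and it pairs records by read name plus mate number (SAM FLAG bits 0x40 and 0x80) rather than by the raw flag, since the same mate can carry a different flag after remapping. All function names are hypothetical, and the sample rows assume the first read in the listings above splits as flag 99/147 on chromosome 8 (original) and chromosome 3 (new).

```python
def mate_number(flag):
    """Return 1 or 2 from SAM FLAG bits 0x40 (first) and 0x80 (last), else 0."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0  # unpaired read or mate information missing


def index_alignments(rows):
    """Map (read name, mate number) -> (chromosome, position)."""
    return {
        (name, mate_number(flag)): (chrom, pos) for name, flag, chrom, pos in rows
    }


def fraction_moved(original_rows, new_rows):
    """Fraction of reads present in both listings whose placement differs."""
    original = index_alignments(original_rows)
    new = index_alignments(new_rows)
    shared = original.keys() & new.keys()
    if not shared:
        return 0.0
    moved = sum(1 for key in shared if original[key] != new[key])
    return moved / len(shared)


if __name__ == "__main__":
    name = "HS2000-1012_275:7:1101:17411:15403"
    original = [(name, 99, "8", 54944243), (name, 147, "8", 54944493)]
    new = [(name, 99, "3", 112743126), (name, 147, "3", 112743376)]
    print(fraction_moved(original, new))
```

Keying on the mate number also lines up reads such as 11883:83640, whose flags switch between 99/147 and 163/83 across the two listings.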
_________________________________
Jonas Demeulemeester, PhD
Postdoctoral Researcher
The Francis Crick Institute
1 Midland Road
London
NW1 1AT
T: +44 (0)20 3796 2594
M: +44 (0)7482 070730
E: jonas.demeulemeester at crick.ac.uk
W: www.crick.ac.uk