[DOCKTESTERS] BWA-Mem update

George Mihaiescu George.Mihaiescu at oicr.on.ca
Thu Apr 27 12:17:04 EDT 2017


I was on vacation last week and then busy with other tasks, but I would like to add that I ran DKFZ on donor DO50398 and the comparison returned 100% validation.


Comparison for DO50398 using DKFZ
---
Common: 109936
Extra: 0
Missing: 0


Cheers,

George


From: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
Date: Friday, March 24, 2017 at 11:58 PM
To: Keiran Raine <kr2 at sanger.ac.uk>, Miguel Vazquez <mikisvaz at gmail.com>, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update

Hi Keiran,

I used the original aligned BAMs available in Collaboratory and GNOS sites.

One of my two other Sanger tests, run against the same donor, has completed too, and it produced exactly the same output when I ran the "compare_result.sh" script. However, I'm not sure what you meant by "the key information for determining if a call change is erroneous".
Is the check script validating the result correctly or not?

I'll probably send a final report on Monday with the results of all four tests (three Sanger and one DKFZ).

Cheers,
George

From: Keiran Raine <kr2 at sanger.ac.uk>
Date: Thursday, March 23, 2017 at 4:58 AM
To: George Mihaiescu <George.Mihaiescu at oicr.on.ca>, Miguel Vazquez <mikisvaz at gmail.com>, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update

Hi,

Sorry if this is on your Confluence page, but I'm unable to access it (possibly because I'm outside OICR, or because the default for your space is owner-only).

Can you confirm whether the CaVEMan calling was based on the BAM file that the original data was generated with, or on one mapped with the new/recent mapping flow?

Also, the key information for determining if a call change is erroneous:

1. Is the variant marked 'PASSED'?
2. What are the probabilities attached to the VCF record (should be in the info field)?
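
The two checks above could be applied to individual VCF records along the following lines. This is only a minimal sketch: it assumes standard tab-separated VCF columns, where the FILTER column holds the literal value PASS for a record that passed all filters, and it simply collects whatever key=value pairs appear in the INFO column. The function name and the sample record (including its INFO keys) are made up for illustration and are not taken from CaVEMan output.

```python
def parse_vcf_record(line):
    """Split one tab-separated VCF data line into the parts needed for the two checks."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, info = fields[:8]

    info_map = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # INFO entries without "=" are flags; record them as True.
        info_map[key] = value if sep else True

    return {
        "key": f"{chrom}:{pos}:{alt}",   # same chr:pos:alt shape as the comparison examples
        "passed": filt == "PASS",        # check 1: did the record pass all filters?
        "info": info_map,                # check 2: probabilities, if present, live here
    }


# Made-up record for illustration only; PROB_A and PROB_B are placeholder INFO keys.
example = "1\t12345\t.\tC\tT\t.\tPASS\tDP=60;PROB_A=0.99;PROB_B=0.01;SOMEFLAG"
record = parse_vcf_record(example)
print(record["key"], record["passed"], record["info"])
```

Running this over the Extra and Missing positions from both call sets would show whether the differing calls are filtered or borderline in probability.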

As previously stated, we do expect a small variance in the results between data processed at the beginning of the project and data processed at the end, as well as some minor changes introduced when the normal panel was moved from a web service to a local file.

Regards,

Keiran

From: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
Date: Wednesday, 22 March 2017 at 20:18
To: Miguel Vazquez <mikisvaz at gmail.com>, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update

I finished one of the dockerized Sanger tests and upon verification there were just a few differences, but I'm not sure if they are normal or not.

Results:

root@dockstore-test3:~/PCAWG-Docker-Test# bin/compare_result.sh Sanger DO50398
var/spool/cwl/0/caveman/
var/spool/cwl/0/caveman/splitList
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.muts.ids.vcf.gz
var/spool/cwl/0/caveman/alg_bean
var/spool/cwl/0/caveman/prob_arr
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.snps.ids.vcf.gz.tbi
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.no_analysis.bed
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.snps.ids.vcf.gz
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.flagged.muts.vcf.gz
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.muts.ids.vcf.gz.tbi
var/spool/cwl/0/caveman/cov_arr
var/spool/cwl/0/caveman/7f94d650-41b9-4664-bcde-dc8533e4602d_vs_69586c55-6f81-4728-8a82-bd97bceafaaa.flagged.muts.vcf.gz.tbi
var/spool/cwl/0/caveman/caveman.cfg.ini

Comparison for DO50398 using Sanger
---
Common: 171325
Extra: 3
    - Example: 14:20031258:G,8:43827158:A,X:61711363:C
Missing: 13
    - Example: 10:106963148:T,17:64794691:G,1:82709263:T
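
The Common/Extra/Missing report can be read as a set comparison over chr:pos:alt keys. The sketch below is not the actual compare_result.sh logic. It assumes that "Extra" means calls present only in the new run and "Missing" means calls present only in the reference result. The function name and the toy keys are hypothetical.

```python
def compare_calls(reference_keys, test_keys):
    """Classify variant keys (chr:pos:alt strings) as common, extra, or missing."""
    reference, test = set(reference_keys), set(test_keys)
    return {
        "common": sorted(reference & test),    # called in both runs
        "extra": sorted(test - reference),     # assumption: only in the new run
        "missing": sorted(reference - test),   # assumption: only in the reference
    }


# Toy keys, not taken from the DO50398 results.
reference_calls = ["1:100:A", "1:200:C", "2:300:G"]
test_calls = ["1:100:A", "2:300:G", "3:400:T"]

result = compare_calls(reference_calls, test_calls)
for label in ("common", "extra", "missing"):
    print(f"{label.capitalize()}: {len(result[label])}")
```

Under that reading, the run above differs from the reference in 16 calls out of more than 171,000.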




Because I'm an infrastructure architect, my main reason for running the test was to monitor resource utilization, so I wrote a wiki page detailing my observations:

https://wiki.oicr.on.ca/display/~gmihaiescu/Dockerized+Sanger+workflow

I have three more Docker tests running: two of them run Sanger against the same donor (but using VMs with 8 cores, because I want to see whether the run time and resource utilization are constant), and a third runs DKFZ.

Cheers,
George

From: Miguel Vazquez <mikisvaz at gmail.com>
Date: Wednesday, March 22, 2017 at 1:08 PM
To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update

Thanks Jonas for this information.

I hope that someone here can provide us with some suggestions on what to try next. Perhaps the version issue that Jonas points out is the key.

I just want to add that, as I told Jonas earlier, my own tests using the new split BAM files also gave 3% mismatches.
Best regards
Miguel

On Wed, Mar 22, 2017 at 6:56 PM, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk> wrote:
Hi all,

A brief update on the BWA-Mem docker tests.
I prepared normal + tumor lane-level unaligned BAMs for DO503011 and ran the BWA-Mem workflow for normal and tumor separately.
On comparison, however, I am still getting 3% of reads that are aligned differently (see below for a few examples).
However, when checking the headers of the original and newly mapped BAM files (attached), I noticed that the original was mapped using a different version of BWA and SeqWare.
I’m hoping the mapping differences can be ascribed to this.

Is there a list available somewhere detailing which samples were mapped using which versions?
That way we could select a relevant test sample without having to sort through the headers of all different bams.
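
As a rough illustration of the header check described above, the @PG lines of two BAM headers can be compared programmatically. The sketch assumes the header text has already been extracted (for example with `samtools view -H`) and relies only on the standard @PG tags ID, PN and VN. The function names and the placeholder headers, including their version strings, are invented for the example.

```python
def program_versions(header_text):
    """Map each @PG ID in SAM header text to its (program name, version) pair."""
    versions = {}
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Each field after the record type is TAG:VALUE; split on the first colon only.
        tags = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        versions[tags.get("ID", "?")] = (tags.get("PN"), tags.get("VN"))
    return versions


def version_differences(original_header, new_header):
    """Return the @PG IDs whose program name or version differs between two headers."""
    old, new = program_versions(original_header), program_versions(new_header)
    return {
        pg_id: (old.get(pg_id), new.get(pg_id))
        for pg_id in sorted(set(old) | set(new))
        if old.get(pg_id) != new.get(pg_id)
    }


# Placeholder headers; the version strings are invented, not read from the attached files.
original = "@HD\tVN:1.4\tSO:coordinate\n@PG\tID:bwa\tPN:bwa\tVN:0.0.1\tCL:bwa mem ref.fa"
remapped = "@HD\tVN:1.4\tSO:coordinate\n@PG\tID:bwa\tPN:bwa\tVN:0.0.2\tCL:bwa mem ref.fa"
diffs = version_differences(original, remapped)
print(diffs)
```

Collecting this per sample would give the kind of sample-to-version list asked about above without inspecting each header by hand.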

Best wishes,
Jonas





newly aligned:

ID                                    flag  chr         pos
HS2000-1012_275:7:1101:17411:15403    99    3           112743126
HS2000-1012_275:7:1101:17411:15403    147   3           112743376
HS2000-1012_275:7:1101:11883:83640    99    16          28672999
HS2000-1012_275:7:1101:11883:83640    147   16          28673223
HS2000-1012_275:7:1101:16576:28476    163   GL000238.1  21309
HS2000-1012_275:7:1101:16576:28476    83    GL000238.1  21664

vs the original:

ID                                    flag  chr  pos
HS2000-1012_275:7:1101:17411:15403    99    8    54944243
HS2000-1012_275:7:1101:17411:15403    147   8    54944493
HS2000-1012_275:7:1101:11883:83640    163   16   28464362
HS2000-1012_275:7:1101:11883:83640    83    16   28464586
HS2000-1012_275:7:1101:16576:28476    99    126124549
HS2000-1012_275:7:1101:16576:28476    147   126124903
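
A percentage like the 3% quoted above could be computed by matching reads on name plus mate number and comparing their placements. This is a sketch under stated assumptions, not the comparison actually used in the tests: it treats each alignment as an (ID, flag, chr, pos) row, uses SAM flag bits 0x40 and 0x80 to tell the two mates apart, and skips secondary (0x100) and supplementary (0x800) alignments. All function names and the toy rows are hypothetical.

```python
def mate_number(flag):
    """Return 1 or 2 from SAM flag bits 0x40 (first in pair) and 0x80 (second in pair)."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0  # unpaired read, or mate bits not set


def index_alignments(rows):
    """Key primary alignments by (read ID, mate number) -> (chromosome, position)."""
    index = {}
    for read_id, flag, chrom, pos in rows:
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        index[(read_id, mate_number(flag))] = (chrom, pos)
    return index


def fraction_moved(original_rows, new_rows):
    """Fraction of reads present in both sets whose chromosome or position differs."""
    original, new = index_alignments(original_rows), index_alignments(new_rows)
    shared = original.keys() & new.keys()
    if not shared:
        return 0.0
    moved = sum(1 for key in shared if original[key] != new[key])
    return moved / len(shared)


# Toy rows, not the reads listed in this thread.
original_rows = [("readA", 99, "1", 1000), ("readA", 147, "1", 1250),
                 ("readB", 99, "2", 500), ("readB", 147, "2", 750)]
new_rows = [("readA", 99, "1", 1000), ("readA", 147, "1", 1250),
            ("readB", 99, "3", 900), ("readB", 147, "3", 1150)]
print(f"{fraction_moved(original_rows, new_rows):.1%} of shared reads moved")
```

Keying on the mate number rather than the raw flag matters because the same mate can carry different flag values in the two files, for instance when its strand changes.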


_________________________________
Jonas Demeulemeester, PhD
Postdoctoral Researcher
The Francis Crick Institute
1 Midland Road
London
NW1 1AT

T: +44 (0)20 3796 2594
M: +44 (0)7482 070730
E: jonas.demeulemeester at crick.ac.uk
W: www.crick.ac.uk
