[DOCKTESTERS] BWA-Mem update
Keiran Raine
kr2 at sanger.ac.uk
Thu Mar 23 06:14:40 EDT 2017
The order of reads passed into BWA does have an effect due to the way the insert size is calculated during proper-pair determination.
Diff_bams in PCAP-core has an option to ignore MAPQ=0, does that pass or fail with differences?
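
To make the MAPQ=0 question concrete, here is a minimal sketch of a comparison that can skip ambiguous placements. It is not the diff_bams implementation: the `Aln` record, the `count_differences` function and the rule "skip a read only when both alignments have MAPQ 0" are assumptions made for illustration.

```python
from typing import NamedTuple


class Aln(NamedTuple):
    """Hypothetical minimal alignment record."""
    chrom: str
    pos: int
    mapq: int


def count_differences(orig, new, ignore_mapq0=False):
    """Compare two dicts mapping a read key to an Aln.

    Returns (reads compared, reads placed differently). With
    ignore_mapq0=True, reads whose alignments both have MAPQ 0 are
    skipped (assumed rule, not taken from diff_bams).
    """
    compared = 0
    differing = 0
    for key in orig.keys() & new.keys():
        a, b = orig[key], new[key]
        if ignore_mapq0 and a.mapq == 0 and b.mapq == 0:
            continue
        compared += 1
        if (a.chrom, a.pos) != (b.chrom, b.pos):
            differing += 1
    return compared, differing
```

If most of the discrepant reads drop out with `ignore_mapq0=True`, the differences are confined to placements the aligner itself considers ambiguous.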
Keiran
From: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Date: Thursday, 23 March 2017 at 10:07
To: Keiran Raine <kr2 at sanger.ac.uk>
Cc: Miguel Vazquez <mikisvaz at gmail.com>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update
Thanks Keiran for the info!
Digging deeper into the headers, the versions of BWA (0.7.8-r455), bamsort (0.0.148) and bammarkduplicates (0.0.148) do seem to be the same for DO50311, and it’s only the workflow and SeqWare versions that differ.
I don’t really see how this could create the 3% discrepancies we’re getting though.
Is there anything else we might be overlooking here or some stochasticity involved, as the mismatched reads really do map differently, despite having completely identical sequences?
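
The header check described above can be sketched as follows. This assumes the header text has already been dumped to a string (for example with a SAM/BAM header viewer) and that each tool is recorded on an @PG line with PN and VN tags, as in the SAM specification; the function names are hypothetical.

```python
def pg_versions(header_text):
    """Collect {program name: set of versions} from @PG header lines."""
    versions = {}
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = line.split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        name = tags.get("PN", tags.get("ID"))
        if name is not None:
            versions.setdefault(name, set()).add(tags.get("VN", "unknown"))
    return versions


def version_differences(header_a, header_b):
    """Return programs whose recorded versions differ between two headers."""
    a, b = pg_versions(header_a), pg_versions(header_b)
    return {
        name: (a.get(name, set()), b.get(name, set()))
        for name in a.keys() | b.keys()
        if a.get(name, set()) != b.get(name, set())
    }
```

An empty result from `version_differences` means every program listed in the two headers carries the same version string, which is what the check above found for BWA, bamsort and bammarkduplicates.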
Thanks,
Jonas
_________________________________
Jonas Demeulemeester, PhD
Postdoctoral Researcher
The Francis Crick Institute
1 Midland Road
London
NW1 1AT
T: +44 (0)20 3796 2594
M: +44 (0)7482 070730
E: jonas.demeulemeester at crick.ac.uk
W: www.crick.ac.uk
On 23 Mar 2017, at 09:47, Keiran Raine <kr2 at sanger.ac.uk> wrote:
Hi,
The jsonl files on pancancer.org contain the versions of the software used originally. If someone can give me the BWA and bammarkduplicates(2?) versions used, this may be explained.
Bammarkduplicates had a bug fix a few months into the mapping, but the differences reported at the time (I don't remember who did it) were <1%.
Keiran
From: Miguel Vazquez <mikisvaz at gmail.com>
Date: Wednesday, 22 March 2017 at 18:08
To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update
Thanks Jonas for this information.
I hope that someone here can suggest what to try next. Perhaps the version issue that Jonas points out is the key.
I just want to add that, as I told Jonas earlier, my own tests using the new split BAM files also gave 3% mismatches.
Best regards
Miguel
On Wed, Mar 22, 2017 at 6:56 PM, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk> wrote:
Hi all,
A brief update on the BWA-Mem docker tests.
I prepared normal + tumor lane-level unaligned BAMs for DO503011 and ran the BWA-Mem workflow for normal and tumor separately.
Doing the comparison however, I am still getting 3% of reads that are aligned differently (see below for a few examples).
However, when checking the headers of the original and newly mapped bam files (attached) I noticed that the original is mapped using a different version of BWA and SeqWare.
I’m hoping the mapping differences can be ascribed to this.
Is there a list available somewhere detailing which samples were mapped using which versions?
That way we could select a relevant test sample without having to sort through the headers of all different bams.
Best wishes,
Jonas
newly aligned:
ID flag chr pos
HS2000-1012_275:7:1101:17411:15403 99 3 112743126
HS2000-1012_275:7:1101:17411:15403 147 3 112743376
HS2000-1012_275:7:1101:11883:83640 99 16 28672999
HS2000-1012_275:7:1101:11883:83640 147 16 28673223
HS2000-1012_275:7:1101:16576:28476 163 GL000238.1 21309
HS2000-1012_275:7:1101:16576:28476 83 GL000238.1 21664
vs the original:
ID flag chr pos
HS2000-1012_275:7:1101:17411:15403 99 8 54944243
HS2000-1012_275:7:1101:17411:15403 147 8 54944493
HS2000-1012_275:7:1101:11883:83640 163 16 28464362
HS2000-1012_275:7:1101:11883:83640 83 16 28464586
HS2000-1012_275:7:1101:16576:28476 99 12 6124549
HS2000-1012_275:7:1101:16576:28476 147 12 6124903
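
A small sketch of how the two listings above can be compared programmatically. The read placements are copied from the listings; the function names are hypothetical, and keying each record by read ID plus the first-in-pair flag bit (0x40) is an assumption about how mates should be matched, since the same mate can carry a different flag in each file.

```python
# Read placements copied from the listings above: ID, flag, chr, pos.
NEW = """
HS2000-1012_275:7:1101:17411:15403 99 3 112743126
HS2000-1012_275:7:1101:17411:15403 147 3 112743376
HS2000-1012_275:7:1101:11883:83640 99 16 28672999
HS2000-1012_275:7:1101:11883:83640 147 16 28673223
HS2000-1012_275:7:1101:16576:28476 163 GL000238.1 21309
HS2000-1012_275:7:1101:16576:28476 83 GL000238.1 21664
"""

ORIG = """
HS2000-1012_275:7:1101:17411:15403 99 8 54944243
HS2000-1012_275:7:1101:17411:15403 147 8 54944493
HS2000-1012_275:7:1101:11883:83640 163 16 28464362
HS2000-1012_275:7:1101:11883:83640 83 16 28464586
HS2000-1012_275:7:1101:16576:28476 99 12 6124549
HS2000-1012_275:7:1101:16576:28476 147 12 6124903
"""


def parse(text):
    """Map (read ID, is_first_in_pair) to (chr, pos)."""
    placements = {}
    for line in text.strip().splitlines():
        read_id, flag, chrom, pos = line.split()
        is_read1 = bool(int(flag) & 0x40)  # SAM flag bit: first in pair
        placements[(read_id, is_read1)] = (chrom, int(pos))
    return placements


def differing_reads(new, orig):
    """Return the keys present in both sets whose placement differs."""
    return sorted(k for k in new.keys() & orig.keys() if new[k] != orig[k])


if __name__ == "__main__":
    new, orig = parse(NEW), parse(ORIG)
    for key in differing_reads(new, orig):
        print(key, "new:", new[key], "orig:", orig[key])
```

Matching on the first-in-pair bit rather than on the full flag matters for the second and third pairs, where the mates swap strand between the two alignments.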
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.