[DOCKTESTERS] BWA-Mem update
Jonas Demeulemeester
Jonas.Demeulemeester at crick.ac.uk
Mon Apr 10 12:33:16 EDT 2017
Hi George,
We should be able to give you a final answer on this by tomorrow at the latest :)
Would that be OK for you?
In the meantime, you could start downloading the relevant unaligned BAM files, as this will take some time anyway.
Miguel and I have been running the workflow on donor DO51057, and you can find the GNOS IDs of the unaligned files in the attached JSON file.
Download all of the BAMs listed under the “unaligned_bams” headers (there are two of these headers in the JSON, one for the normal and one for the tumor).
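A minimal Python sketch for pulling those entries out of the attachment, assuming DO51057.json.gz is gzip-compressed JSON and that each “unaligned_bams” value is either a list of BAM records or a single record. The layout of each record (for example, which field holds the GNOS ID) isn't described here, so the sketch walks the whole document and returns whatever sits under that key; `find_unaligned_bams` and `load_donor_json` are hypothetical helper names.

```python
import gzip
import json
from typing import Any, List


def find_unaligned_bams(node: Any) -> List[Any]:
    """Recursively collect every value stored under an "unaligned_bams" key.

    Assumption: the exact nesting of the donor JSON is unknown, so the whole
    document is searched instead of following a fixed path.
    """
    found: List[Any] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "unaligned_bams":
                # Assumption: the value is a list of BAM records or a single record.
                found.extend(value if isinstance(value, list) else [value])
            else:
                found.extend(find_unaligned_bams(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_unaligned_bams(item))
    return found


def load_donor_json(path: str) -> Any:
    # The attachment is gzip-compressed JSON, so decompress while reading as text.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

If the file matches those assumptions, `find_unaligned_bams(load_donor_json("DO51057.json.gz"))` should return the records from both the normal and the tumor entries, concatenated in document order.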
The BWA-Mem docker is internally set to use 8 cores, so running it on a VM with more cores is unlikely to reduce the run time.
Best wishes,
Jonas
_________________________________
Jonas Demeulemeester, PhD
Postdoctoral Researcher
The Francis Crick Institute
1 Midland Road
London
NW1 1AT
T: +44 (0)20 3796 2594
M: +44 (0)7482 070730
E: jonas.demeulemeester at crick.ac.uk
W: www.crick.ac.uk
On 10 Apr 2017, at 17:11, George Mihaiescu <George.Mihaiescu at oicr.on.ca> wrote:
Hi,
I would like to run the BWA-mem dockerized workflow in the Collaboratory environment, but I need some help in order to do this:
* A ready-to-run script or instructions
* The input files: single file or multiple files, whatever the script needs as an input
* The donor ID, preferably the same donor that was already used in order to prove the reproducibility of the results
I can start the workflow on a large VM to get the results faster.
Also, I'm currently running the DKFZ workflow on DO50398 because I've already run the Sanger workflow on it, and I want to compare the run times of the two workflows on the same data set.
Thank you,
George
From: Miguel Vazquez <mikisvaz at gmail.com>
Date: Wednesday, March 22, 2017 at 2:08 PM
To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
Cc: Keiran Raine <kr2 at sanger.ac.uk>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
Subject: Re: [DOCKTESTERS] BWA-Mem update
Thanks, Jonas, for this information.
I hope that someone here can give us some suggestions on what to try next. Perhaps the version issue that Jonas points out is the key.
I just want to add that, as I told Jonas earlier, my own tests using the new split BAM files also gave 3% mismatched reads.
Best regards
Miguel
On Wed, Mar 22, 2017 at 6:56 PM, Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk> wrote:
Hi all,
A brief update on the BWA-Mem docker tests.
I prepared lane-level unaligned BAMs for the normal and tumor samples of DO503011 and ran the BWA-Mem workflow on the normal and tumor separately.
Comparing the results with the original alignments, however, I still get 3% of reads that are aligned differently (see below for a few examples).
When checking the headers of the original and newly mapped BAM files (attached), I noticed that the original was mapped using different versions of BWA and SeqWare.
I’m hoping the mapping differences can be ascribed to this.
Is there a list available somewhere detailing which samples were mapped using which versions?
That way we could select a relevant test sample without having to sort through the headers of all the different BAMs.
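One way to compare versions without reading every header by hand is to parse the @PG (program) records, which carry the program name (PN) and version (VN) tags defined in the SAM specification. A minimal Python sketch, assuming each header has already been dumped as text (e.g. with `samtools view -H file.bam`). Which header lines hold the SeqWare version isn't stated here, so this only looks at @PG records; `parse_pg_records` and `program_versions` are hypothetical helper names.

```python
from typing import Dict, List


def parse_pg_records(header_text: str) -> List[Dict[str, str]]:
    """Return the TAG:VALUE fields of every @PG line in SAM header text."""
    records: List[Dict[str, str]] = []
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue
        fields: Dict[str, str] = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only; values such as CL may contain colons.
            tag, sep, value = field.partition(":")
            if sep:
                fields[tag] = value
        records.append(fields)
    return records


def program_versions(header_text: str, program: str) -> List[str]:
    """Versions (VN) of @PG entries whose PN or ID equals `program`, ignoring case."""
    wanted = program.lower()
    return [
        pg.get("VN", "")
        for pg in parse_pg_records(header_text)
        if wanted in (pg.get("PN", "").lower(), pg.get("ID", "").lower())
    ]
```

Running `program_versions(header, "bwa")` on the header dumps of an original and a re-aligned BAM side by side would show whether the recorded BWA versions differ, which could also be used to group candidate test samples by version.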
Best wishes,
Jonas
newly aligned:
ID                                   flag  chr         pos
HS2000-1012_275:7:1101:17411:15403   99    3           112743126
HS2000-1012_275:7:1101:17411:15403   147   3           112743376
HS2000-1012_275:7:1101:11883:83640   99    16          28672999
HS2000-1012_275:7:1101:11883:83640   147   16          28673223
HS2000-1012_275:7:1101:16576:28476   163   GL000238.1  21309
HS2000-1012_275:7:1101:16576:28476   83    GL000238.1  21664
vs the original:
ID                                   flag  chr         pos
HS2000-1012_275:7:1101:17411:15403   99    8           54944243
HS2000-1012_275:7:1101:17411:15403   147   8           54944493
HS2000-1012_275:7:1101:11883:83640   163   16          28464362
HS2000-1012_275:7:1101:11883:83640   83    16          28464586
HS2000-1012_275:7:1101:16576:2847699126124549
HS2000-1012_275:7:1101:16576:28476147126124903
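The 3% figure depends on how "aligned differently" is defined, and the exact comparison used isn't stated. One plausible definition, sketched in Python under that assumption: match primary alignments by read name plus mate (the 0x40/0x80 flag bits) and count how many land on a different reference or position. Each input tuple holds the first four SAM columns (QNAME, FLAG, RNAME, POS), as in the tables above; `primary_positions` and `fraction_discordant` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

# One alignment record: (QNAME, FLAG, RNAME, POS), i.e. the first four SAM columns.
Record = Tuple[str, int, str, int]

MATE_BITS = 0x40 | 0x80                    # 0x40 = first segment, 0x80 = last segment
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def primary_positions(records: Iterable[Record]) -> Dict[Tuple[str, int], Tuple[str, int]]:
    """Map (read name, mate bits) -> (reference, position) for primary alignments only."""
    positions: Dict[Tuple[str, int], Tuple[str, int]] = {}
    for qname, flag, rname, pos in records:
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # skip secondary/supplementary lines so each mate appears once
        positions[(qname, flag & MATE_BITS)] = (rname, pos)
    return positions


def fraction_discordant(original: Iterable[Record], realigned: Iterable[Record]) -> float:
    """Fraction of mates present in both sets whose reference or position differs."""
    before = primary_positions(original)
    after = primary_positions(realigned)
    shared = before.keys() & after.keys()
    if not shared:
        return 0.0
    moved = sum(1 for key in shared if before[key] != after[key])
    return moved / len(shared)
```

Applied to the four fully legible rows of each table above (read pairs 17411:15403 and 11883:83640), every mate moves, so `fraction_discordant` returns 1.0 for that subset.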
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DO51057.json.gz
Type: application/x-gzip
Size: 33031 bytes
Desc: DO51057.json.gz
URL: <https://lists.icgc.org/mailman/private/docktesters/attachments/20170410/be15eda5/attachment-0001.bin>