[DOCKTESTERS] Thanks!

Miguel Vazquez mikisvaz at gmail.com
Wed Mar 15 06:30:04 EDT 2017


Excellent Jonas! Thank you so much. I'll pull your changes and try to help
out debugging the issues.

Cheers

Miguel

On Wed, Mar 15, 2017 at 11:20 AM, Jonas Demeulemeester <
Jonas.Demeulemeester at crick.ac.uk> wrote:

> Hi all,
>
> I’ve written up the code to prepare unaligned bam files split by read
> group from the merged bams (*prepare_unaligned.sh*, I deprecated the
> previous one as *prepare_unaligned_deprecated.sh*).
> Briefly, it’s using Picard to split and reset the bams and afterwards to
> correct the headers.
> (I’ve added a wrapper script to install Picard locally as well:
> *install_picard.sh*)
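
To make the header-correction idea concrete, here is a minimal Python sketch. It is not the actual prepare_unaligned.sh (which uses Picard); it only shows how one header per read group could be derived from a merged header, keeping the @HD and @CO lines and dropping alignment-specific @SQ and @PG lines. The function name and the sample header are made up for illustration.

```python
def split_header_by_read_group(header_text):
    """Return {read_group_id: header_text}, each header holding exactly one @RG line."""
    lines = [line for line in header_text.splitlines() if line.strip()]
    # Lines shared by every per-lane header; @SQ/@PG are dropped because
    # they describe an alignment that an unaligned BAM no longer has.
    shared_top = [line for line in lines if line.startswith("@HD")]
    shared_bottom = [line for line in lines if line.startswith("@CO")]
    headers = {}
    for line in lines:
        if not line.startswith("@RG"):
            continue
        # Split on the first ':' only, since IDs such as "CRUK-CI:..." contain colons.
        fields = dict(field.split(":", 1) for field in line.split("\t")[1:])
        rg_id = fields["ID"]
        if rg_id in headers:
            raise ValueError(f"duplicate read group ID: {rg_id}")
        headers[rg_id] = "\n".join(shared_top + [line] + shared_bottom) + "\n"
    return headers


# Hypothetical merged header with two read groups (IDs are placeholders).
merged_header = "\n".join([
    "@HD\tVN:1.4",
    "@SQ\tSN:1\tLN:249250621",
    "@RG\tID:CENTRE:EXAMPLE\tPL:ILLUMINA\tPU:CENTRE:EXAMPLE_1",
    "@RG\tID:CENTRE:EXAMPLE'\tPL:ILLUMINA\tPU:CENTRE:EXAMPLE_2",
    "@CO\tdcc_project_code:DOCKER-TEST",
])
per_lane_headers = split_header_by_read_group(merged_header)
for rg_id, header in per_lane_headers.items():
    print(rg_id)
    print(header)
```

Each resulting header could then be applied to the matching per-read-group BAM with whatever reheadering tool is in use.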
>
> For subsampled merged DO50311 bam files this results in 5 separate bams
> for the tumor (tumor.unaligned.1–5.bam) with the following headers
> corresponding to the 5 different read groups in the original data:
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005334-DNA_C03'''    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005334-DNA_C03    PM:Illumina HiSeq 2000    SM:b02b4bba-6e66-44fb-a48f-38c309aaaac5    PU:CRUK-CI:LP6005334-DNA_C03_7
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005334-DNA_C03''''    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005334-DNA_C03    PM:Illumina HiSeq 2000    SM:b02b4bba-6e66-44fb-a48f-38c309aaaac5    PU:CRUK-CI:LP6005334-DNA_C03_8
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005334-DNA_C03    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005334-DNA_C03    PM:Illumina HiSeq 2000    SM:b02b4bba-6e66-44fb-a48f-38c309aaaac5    PU:CRUK-CI:LP6005334-DNA_C03_1
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005334-DNA_C03'    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005334-DNA_C03    PM:Illumina HiSeq 2000    SM:b02b4bba-6e66-44fb-a48f-38c309aaaac5    PU:CRUK-CI:LP6005334-DNA_C03_2
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005334-DNA_C03''    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005334-DNA_C03    PM:Illumina HiSeq 2000    SM:b02b4bba-6e66-44fb-a48f-38c309aaaac5    PU:CRUK-CI:LP6005334-DNA_C03_6
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
>
>
> For the (subsampled) normal there are 3 read groups and hence 3 unaligned
> bam files (normal.unaligned.1–3.bam) with the following headers:
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005333-DNA_C03''    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-27T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005333-DNA_C03    PM:Illumina HiSeq 2000    SM:8c0354eb-6a3e-4a98-b41c-f8add599884c    PU:CRUK-CI:LP6005333-DNA_C03_1
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005333-DNA_C03'    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-26T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005333-DNA_C03    PM:Illumina HiSeq 2000    SM:8c0354eb-6a3e-4a98-b41c-f8add599884c    PU:CRUK-CI:LP6005333-DNA_C03_8
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> @HD     VN:1.4
> @RG     ID:CRUK-CI:LP6005333-DNA_C03    PL:ILLUMINA    CN:CRUK-CI    DT:2014-07-26T01:00:00+0100    PI:0    LB:WGS:CRUK-CI:LP6005333-DNA_C03    PM:Illumina HiSeq 2000    SM:8c0354eb-6a3e-4a98-b41c-f8add599884c    PU:CRUK-CI:LP6005333-DNA_C03_7
> @CO     dcc_project_code:DOCKER-TEST
> @CO     submitter_donor_id:dummy
> @CO     submitter_specimen_id:dummy.specimen
> @CO     submitter_sample_id:dummy.sample
> @CO     dcc_specimen_type:Primary tumour - solid tissue
> @CO     use_cntl:85098796-a2c1-11e3-a743-6c6c38d06053
>
>
> I’ve also modified the downstream code to run tests and prepare the JSON
> file for input (*run_test.sh*, *BWA-Mem.json.template*).
> If I’m not mistaken, feeding either all tumor or all normal bam files to
> the BWA-Mem docker should result in the desired, merged output, as all
> files are processed separately internally before being merged in a final
> BWA-Mem docker step.
> Please correct me if I’m wrong.
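
As a rough sketch of how such a JSON input might be assembled from the per-read-group BAMs, the Python snippet below builds a job description listing every lane-level file. The key names `reads` and `output_file_basename` and the helper `build_bwa_mem_job` are assumptions made for the example; they are not taken from BWA-Mem.json.template, which should be consulted for the real parameter names.

```python
import json


def build_bwa_mem_job(bam_paths, output_basename):
    """Build a job dictionary listing every lane-level unaligned BAM as an input file."""
    if not bam_paths:
        raise ValueError("at least one unaligned BAM is required")
    return {
        # Hypothetical key: one File entry per read-group BAM.
        "reads": [{"class": "File", "path": path} for path in bam_paths],
        # Hypothetical key: basename for the merged, aligned output.
        "output_file_basename": output_basename,
    }


# The five tumor BAMs named as in the message above.
tumor_bams = [f"tumor.unaligned.{i}.bam" for i in range(1, 6)]
job = build_bwa_mem_job(tumor_bams, "tumor")
print(json.dumps(job, indent=2))
```

A second call with the three normal BAMs would give the corresponding job for the normal sample.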
> In any case, I’m pushing the code from my repo (
> https://github.com/jdemeul/PCAWG-Docker-Test) to Miguel’s (
> https://github.com/mikisvaz/PCAWG-Docker-Test), so anyone interested can
> look at it (and try it)
>
> Using this setup, the BWA-Mem docker runs successfully here (on my
> downsampled DO50311 dummy bams), up until the point the output
> unaligned_bam_bai file needs to be collected.
> (*Error while running job: Error collecting output for parameter
> 'merged_output_bai': Long-running script killed after 20 seconds.*)
> This is an error I was having before as well. I initially thought it was
> a disk space issue, but I no longer think this is the case.
> I’ve attached the run output. Does anyone know what might be the issue
> here?
>
> Best wishes,
> Jonas
>
>
>
> _________________________________
> Jonas Demeulemeester, PhD
> Postdoctoral Researcher
> The Francis Crick Institute
> 1 Midland Road
> London
> NW1 1AT
>
> *T:* +44 (0)20 3796 2594
> *M:* +44 (0)7482 070730
> *E:* jonas.demeulemeester at crick.ac.uk
> *W:* www.crick.ac.uk
>
>
>
> On 14 Mar 2017, at 14:32, Keiran Raine <kr2 at sanger.ac.uk> wrote:
>
> Hi,
>
> You would also only expect a minimal level of duplicates in a good test
> sample, and likely quite a small number of readgroups.
>
> Keiran
>
> *From: *Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
> *Date: *Tuesday, 14 March 2017 at 13:49
> *To: *Miguel Vazquez <mikisvaz at gmail.com>
> *Cc: *Junjun Zhang <Junjun.Zhang at oicr.on.ca>, Keiran Raine <
> kr2 at sanger.ac.uk>, George Mihaiescu <George.Mihaiescu at oicr.on.ca>, "
> docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
> Hi Miguel,
>
> I’ll have a go at modifying your scripts to do this kind of preprocessing.
>
> As to why alignment by lane level vs alignment of a single merged bam
> would result in only 3% discrepancies, I can imagine that read lengths etc
> may not be that different between the different libraries (for our tested
> donors at least).
> Please correct me if I’m wrong though!
>
> Best regards,
> Jonas
>
> _________________________________
> Jonas Demeulemeester, PhD
> Postdoctoral Researcher
> The Francis Crick Institute
> 1 Midland Road
> London
> NW1 1AT
>
> *T:* +44 (0)20 3796 2594
> *M:* +44 (0)7482 070730
> *E:* jonas.demeulemeester at crick.ac.uk
> *W:* www.crick.ac.uk
>
>
>
>
> On 14 Mar 2017, at 12:44, Miguel Vazquez <mikisvaz at gmail.com> wrote:
>
>
> Hi Junjun and Keiran,
>
> I'm sorry guys, but this is too alien for me; this was never my area of
> expertise. I'm going to need someone to write a script for me that takes a
> BAM file and turns it into whatever I need to run BWA-Mem on. At least
> pseudo-code or something that I can start with.
>
> I think perhaps someone more knowledgeable than me should consider whether
> this procedure as a whole is acceptable in terms of reproducibility, and how
> best to document it or whether it could be improved.
> Also, I don't think I understand the nature of the problem, because from
> what I can fathom it should have either broken the process or
> produced a much larger number of discrepancies than 3%. Can someone explain
> in layman's terms how only 3% of reads can be affected?
>
>
> Best regards
> Miguel
>
>
>
> On Tue, Mar 14, 2017 at 1:28 PM, Junjun Zhang <Junjun.Zhang at oicr.on.ca>
> wrote:
>
> Hi Keiran,
>
> Thanks for the detailed explanation. So, in order to reproduce the PCAWG BWA
> MEM alignment result, one must use lane-level BAMs (one lane, one BAM) as
> input.
>
> A preprocessing step is needed to prepare lane-level BAMs from the merged BAM.
>
> @Miguel, hope this is helpful. Let us know if you have any other
> questions.
>
> Best regards
> Junjun
>
>
> On Mar 14, 2017, at 5:16 AM, Keiran Raine <kr2 at sanger.ac.uk> wrote:
>
> Hi Junjun,
>
> You won't be able to separate out the readgroups in the headers if the
> input is a merged BAM file. If there are different libraries, read
> lengths, etc., it will cause problems for insert-size determination (used in
> determining proper pairs) and result in inter-library duplicate removal (by
> definition, reads from different libraries can't be duplicates).
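
A toy numeric example may help show why pooling libraries distorts insert-size estimates. The Python sketch below uses made-up insert sizes for two hypothetical libraries and a deliberately simplistic "within 50 bp of the median" rule; it is not how BWA-MEM actually estimates insert sizes or flags proper pairs, only an illustration of the effect.

```python
from statistics import median

# Made-up insert sizes (bp) for two hypothetical libraries.
library_a = [290, 295, 300, 305, 310]
library_b = [480, 490, 500, 510, 520]


def count_within(sizes, centre, tolerance=50):
    """Count fragments whose insert size lies within +/- tolerance of centre."""
    return sum(abs(size - centre) <= tolerance for size in sizes)


# Estimating the centre per library: every fragment looks "proper".
per_library_ok = (count_within(library_a, median(library_a))
                  + count_within(library_b, median(library_b)))

# Estimating one centre from the pooled data: it falls between the two
# libraries, so fragments from both look like outliers.
pooled_centre = median(library_a + library_b)
pooled_ok = count_within(library_a + library_b, pooled_centre)

print(per_library_ok, pooled_centre, pooled_ok)
```

With real data the libraries usually overlap far more than in this toy case, which is consistent with only a small fraction of reads being affected.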
>
> If you really need to do it this way you'd have to add a pre-processing
> step; bamtofastq can split a BAM into its component readgroups in a single
> pass.
>
> Regards,
>
> Keiran Raine
> Principal Bioinformatician
> Cancer Genome Project
> Wellcome Trust Sanger Institute
>
> kr2 at sanger.ac.uk
> Tel: +44 (0)1223 834244 Ext: 4983
> Office: H104
>
>
> On 13 Mar 2017, at 21:16, Junjun Zhang <Junjun.Zhang at oicr.on.ca> wrote:
>
> Hi Keiran,
>
> Can you please comment on this, i.e., the comparison between alignment done
> lane by lane vs. done with all lanes mixed?
>
> Basically, we are trying to prepare input BAMs for testing the PCAWG BWA MEM
> workflow. The starting point is the aligned BAM because we don't have the
> unaligned lane BAMs any more. The key point here is: should the input BAMs be
> organized by lane, one lane one BAM? Or just one BAM containing all lanes?
>
> Thanks,
> Junjun
>
>
>
> *From: *Miguel Vazquez <mikisvaz at gmail.com>
> *Date: *Monday, March 13, 2017 at 2:31 PM
> *To: *Junjun Zhang <junjun.zhang at oicr.on.ca>
> *Cc: *George Mihaiescu <George.Mihaiescu at oicr.on.ca>, Jonas
> Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>, "
> docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi Junjun
>
> About the unaligned BAM files, in fact I do have them for the two tests
> I've run. I could make them available for George but I think he could just
> as well produce them on site, since he might have to do that anyway. But we
> can always explore that option, though right now I don't know of a simple
> way to move these files around.
>
> About the number of lanes let me just say good grief! This is the first
> time I've heard about it. So if I understand you correctly I need to:
>
> 1- Download the metadata for the BAM file
> 2- Determine the read_groups
> 3- Split the BAM file according to these read_groups
> 4- Unalign these BAM files and produce header files with different lanes
> 5- Run BWA-Mem
> 6- Compare collectively the reads from these BAM files with the original
> BAM
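
Steps 2 and 3 of the list above can be illustrated with a small Python sketch that groups SAM text records by their RG:Z tag. This is only a toy on plain SAM lines; on real BAM files a dedicated tool (Picard, bamtofastq, or similar) would do the splitting. The function name and the sample records are made up for the example.

```python
from collections import defaultdict


def group_reads_by_read_group(sam_lines):
    """Map each read group ID to the list of read names carrying that RG:Z tag."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        read_group = next(
            (tag[5:] for tag in fields[11:] if tag.startswith("RG:Z:")), None
        )
        if read_group is None:
            raise ValueError(f"read {fields[0]} has no RG tag")
        groups[read_group].append(fields[0])
    return dict(groups)


# Hypothetical unaligned records from two lanes.
sam_records = [
    "@HD\tVN:1.4",
    "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tRG:Z:lane_a",
    "read1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF\tRG:Z:lane_a",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\tRG:Z:lane_b",
]
reads_by_lane = group_reads_by_read_group(sam_records)
print(reads_by_lane)
```

Writing each group to its own file, with a header that keeps only the matching @RG line, would give the lane-level inputs described in step 4.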
>
> Could you please confirm that this is the case? Is this consistent with
> the 3% mismatches? A similar percentage was found in the HCC1143; could
> this be the reason for that as well? Also I asked Keiran about these
> headers and he said they were OK. If you could please confirm that I need
> to do this extended process I'd be grateful, because it's quite involved and
> there are concepts here I'm not familiar with.
>
> Regards
>
> Miguel
>
>
> On Mon, Mar 13, 2017 at 6:51 PM, Junjun Zhang <Junjun.Zhang at oicr.on.ca>
> wrote:
>
> Hi Miguel,
>
> I thought you kept the unaligned sequence you prepared for the testing.
>
> Following your link about preparing unaligned input, I found this:
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/prepare_unaligned.sh#L16-L35,
> which actually could explain the high mismatch rate.
>
> When the BWA MEM workflow runs, the alignments are done one lane-level BAM at
> a time, and the aligned BAMs are merged later:
> https://github.com/ICGC-TCGA-PanCancer/Seqware-BWA-Workflow/blob/develop/src/main/java/com/github/seqware/WorkflowClient.java#L201
>
> I see the script prepare_unaligned.sh always generates one read group
> (i.e., lane) for normal or tumour, no matter how many read groups (lanes)
> are in the aligned BAMs. This has a big impact on the alignment result,
> because aligning lanes independently differs from aligning them all together.
>
> The PCAWG Sequence Submission SOP has a step to prepare unaligned BAM, but
> it only works when the input is a *single-lane BAM file*:
> https://wiki.oicr.on.ca/display/PANCANCER/PCAWG+%28a.k.a.+PCAP+or+PAWG%29+Sequence+Submission+SOP+-+v1.0#PCAWG(a.k.a.PCAPorPAWG)SequenceSubmissionSOP-v1.0-a)FollowthisifyoustartfromsinglelaneBAMfiles
>
> So, I think in order to test the alignment workflow properly, we
> will need to prepare *lane-level* unaligned BAMs (one lane, one BAM) as
> inputs. For example, this aligned BAM:
> https://gtrepo-ebi.annailabs.com/cghub/metadata/analysisFull/c9fa1c22-6432-4851-af67-30f4b4812c63
> has 7 read groups (search for read_group). It
> needs to be converted to 7 individual lane-level BAM files.
>
> Not sure whether it's the best way to do BAM splitting, but here is
> someone's Python code to do it: https://gist.github.com/seandavi/2014542
>
> Hope this helps,
> Junjun
>
>
>
> *From: *Miguel Vazquez <mikisvaz at gmail.com>
> *Date: *Monday, March 13, 2017 at 1:01 PM
> *To: *George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> *Cc: *Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>, Junjun
> Zhang <junjun.zhang at oicr.on.ca>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi George,
>
> The unaligned BAM files are not available as far as I know; rather, you
> must unalign the final BAM files, the normal ones you get from ICGC or
> GNOS. This process is also in my scripts, as you see here:
>
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/run_batch.sh#L32
>
> About the steps in the workflows, I don't know them myself. I think you'll
> need to ask the developers, and not all workflows use the same underlying
> workflow enactment tool. There is no easy answer.
>
>
>
> On Mon, Mar 13, 2017 at 5:57 PM, George Mihaiescu <
> George.Mihaiescu at oicr.on.ca> wrote:
>
> Junjun told me this would provide value to the testing process, so I would
> like to kick off a test of the BWA_mem docker.
> Can somebody provide some quick instructions and the location of the
> unaligned BAM files that were used already?
>
> Also, do we have somewhere the steps involved in each workflow, so I can
> get an idea of how far they are while running?
> For example, s58_cgpPindel_pin2vcf_95 is three steps from finish, or 50
> steps from finish…
>
> Thank you,
> George
>
> *From: *Miguel Vazquez <mikisvaz at gmail.com>
> *Date: *Monday, March 13, 2017 at 8:52 AM
>
> *To: *George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> *Cc: *Junjun Zhang <Junjun.Zhang at oicr.on.ca>, Jonas Demeulemeester <
> Jonas.Demeulemeester at crick.ac.uk>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi George,
> Answers inline
>
> On Mon, Mar 13, 2017 at 2:43 PM, George Mihaiescu <
> George.Mihaiescu at oicr.on.ca> wrote:
>
> Hi Miguel,
>
> I've started the test by running "bin/run_test.sh Sanger DO50398", so I
> guess with just one workflow running it should complete faster than two
> weeks.
>
>
> I think it still should take a long time. My scripts will run one workflow
> after another.
>
>
>
> Because I'm running in Collaboratory I've changed the "get_icgc_donor.sh"
> script to use a docker container that has the icgc client inside and pulls
> data from Collaboratory. There is no "bam.bas" file downloaded, just
> ".bam" and ".bam.bai" files; not sure if this is an issue.
>
>
>
> I wondered the same thing first time I did this, but this file is produced
> by the pipeline. There was some problem with this that was dealt with by
> the developers and updated in the docker. So I think you won't have a
> problem
>
>
> By looking at the "bin/compare_result_type.sh" it looks like it's using
> the gnos client to pull down the existing VCF files for comparison reasons,
> but I think we store those files in Collaboratory as well, so I'll work
> with Junjun to adapt the script for this.
>
>
>
> Let me know if you need any help
>
>
> I think I initially tried to run the DKFZ workflow, but it complained
> about having to run Delly first, so I abandoned this for now.
>
>
> Yes, if you look at run_batch.sh you will see that when using DKFZ it
> will always run Delly first. Delly prepares some files that the DKFZ workflow
> needs, namely related to copy number I believe.
>
>
>
> I'll set up a new VM and run the "run_batch.sh" on the DO52140 donor.
>
>
> Remember that you will need to add the relevant hash keys for the different
> files in etc/donor_files.csv. It's a bit tedious right now. You need to
> go to the ICGC DCC and find these codes manually for the files you need.
> Ask me if you need help. Once you have them all you can run all the workflows
> for that donor and evaluate the results.
>
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/etc/donor_files.csv
>
>
>
> Regards
> Miguel
>
>
>
> George
>
> *From: *Miguel Vazquez <mikisvaz at gmail.com>
> *Date: *Monday, March 13, 2017 at 6:53 AM
> *To: *George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> *Cc: *Junjun Zhang <Junjun.Zhang at oicr.on.ca>, Jonas Demeulemeester <
> Jonas.Demeulemeester at crick.ac.uk>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi George,
>
> The Sanger workflow is very lengthy, it takes about two weeks in my tests.
>
>
> About correctness, my scripts also cover that part, if you are not using
> them they might still help you to clarify how we do it. The idea is to take
> each of the output files produced: SNV_MNV, Indel, SV, and CNV, for both
> germline and somatic and compare it with the result uploaded to GNOS (not
> all pipelines produce all files). This is the relevant part in the
> run_batch.sh script:
>
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/run_batch.sh#L42-L46
> The bin/compare_result_type.sh script will take care of downloading the
> correct file from GNOS and running the comparison. The comparison itself is
> simple since all files are VCFs, it consists in taking out the variants in
> terms of chromosome, position, reference and alternative allele and
> measuring the overlaps.
>
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/compare_result_type.sh
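
The comparison described above can be sketched in a few lines of Python. This is not the code in bin/compare_result_type.sh; it only illustrates the idea of reducing each VCF to (chromosome, position, reference, alternative) keys and measuring the overlap. Splitting multi-allelic ALT fields into separate keys is an assumption of this sketch, and the function names and sample records are made up.

```python
def variant_keys(vcf_lines):
    """Reduce VCF records to a set of (chrom, pos, ref, alt) tuples."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header and blank lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Assumption of this sketch: treat each ALT allele as its own variant.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def overlap_summary(test_lines, reference_lines):
    """Count variants shared by both call sets and unique to each."""
    test = variant_keys(test_lines)
    reference = variant_keys(reference_lines)
    return {
        "common": len(test & reference),
        "only_test": len(test - reference),
        "only_reference": len(reference - test),
    }


# Hypothetical call sets: a new docker run vs. the previously uploaded result.
test_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "1\t200\t.\tG\tC,A\t.\tPASS\t.",
    "2\t300\t.\tT\tG\t.\tPASS\t.",
]
reference_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "1\t200\t.\tG\tC\t.\tPASS\t.",
    "3\t50\t.\tC\tT\t.\tPASS\t.",
]
summary = overlap_summary(test_vcf, reference_vcf)
print(summary)
```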
>
> About which donors to test, DO52140 is one that Jonas and I have both tested,
> so it could be interesting to get a third opinion. Also, any other donor
> could be interesting to see if something new comes up. I'm not sure which
> option is best.
>
> Miguel
>
>
>
>
> On Mon, Mar 13, 2017 at 5:12 AM, George Mihaiescu <
> George.Mihaiescu at oicr.on.ca> wrote:
>
> Hi,
>
> I've started Sanger on DO50398 and it's been running for more than 24
> hours, currently at "Workflow step succeeded: s58_bbAllele_merge_59"
>
> I just started a second run on a different VM on the same donor, just to
> compare run times.
> The VM used has 8 cores, 48 GB of RAM and 1.1 TB disk and I'll send some
> monitoring graphs when it finishes the workflow, but I have no idea how to
> check its correctness.
>
> Give me a list of donors and what workflows you want me to run and I'll
> try to schedule them tomorrow.
>
> George
>
>
> *From: *Junjun Zhang <Junjun.Zhang at oicr.on.ca>
> *Date: *Sunday, March 12, 2017 at 10:45 PM
> *To: *Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>, George
> Mihaiescu <George.Mihaiescu at oicr.on.ca>
> *Cc: *Miguel Vazquez <miguel.vazquez at cnio.es>, Denis Yuen <
> Denis.Yuen at oicr.on.ca>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
> Thanks Miguel and Jonas for your help here!
>
> Do you have any update on the latest testing? Please feel free to update
> the wiki with any news:
> https://wiki.oicr.on.ca/display/PANCANCER/2017-03-13+PCAWG-TECH+Teleconference
>
> Regards,
> Junjun
>
>
>
> *From: *Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
> *Date: *Saturday, March 11, 2017 at 7:15 PM
> *To: *George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> *Cc: *Miguel Vazquez <miguel.vazquez at cnio.es>, Junjun Zhang <
> junjun.zhang at oicr.on.ca>, Denis Yuen <Denis.Yuen at oicr.on.ca>, "
> docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi George,
>
> Yup, I've been running the PCAWG dockers mainly using Miguel's set of
> scripts.
> Give them a go and if you run into issues, just let us know!
>
> Cheers,
> Jonas
>
>
>
> On 11 Mar 2017, at 17:00, George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> wrote:
>
> Sure, I'll give it a try and report later.
>
> Thank you,
>
> *George Mihaiescu*
> Senior Cloud Architect
>
> *Ontario Institute for Cancer Research*
> MaRS Centre
> 661 University Avenue
> Suite 510
> Toronto, Ontario
> Canada M5G 0A3
>
> Email: George.Mihaiescu at oicr.on.ca
> Toll-free: 1-866-678-6427
> Twitter: @OICR_news
>
> www.oicr.on.ca
>
> This message and any attachments may contain confidential and/or
> privileged information for the sole use of the intended recipient. Any
> review or distribution by anyone other than the person for whom it was
> originally intended is strictly prohibited. If you have received this
> message in error, please contact the sender and delete all copies.
> Opinions, conclusions or other information contained in this message may
> not be that of the organization.
>
>
> *From: *Miguel Vazquez <miguel.vazquez at cnio.es>
> *Date: *Saturday, March 11, 2017 at 10:57 AM
> *To: *Junjun Zhang <Junjun.Zhang at oicr.on.ca>
> *Cc: *Denis Yuen <Denis.Yuen at oicr.on.ca>, Jonas Demeulemeester <
> jonas.demeulemeester at crick.ac.uk>, George Mihaiescu <
> George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> *Subject: *Re: [DOCKTESTERS] Thanks!
>
>
> Hi Junjun,
>
> I think Jonas has been using my scripts to run some of the tests, maybe
> George could try them as well, it should be very easy for him to try the
> Sanger, Delly+DKFZ, BWA-Mem, and the BiasFilter.
>
> https://github.com/mikisvaz/PCAWG-Docker-Test
>
> He would just need to update the tokens for DACO access and the scripts
> will take care of downloading the BAM files, running the workflows and
> evaluating the result.
>
> The documentation there is reasonably up to date, but if this sounds good
> then perhaps he could contact me and I could walk him through the details.
>
> Best regards
> Miguel
>
> On Fri, Mar 10, 2017 at 9:51 PM, Junjun Zhang <Junjun.Zhang at oicr.on.ca>
> wrote:
>
> Dear Docktesters,
>
> George Mihaiescu, cloud architect of the Collaboratory at OICR, plans to
> run some bioinformatics workflows to test the Collab environment.
>
> I thought this would be a good opportunity to get extra help testing
> the PCAWG dockerized workflows.
>
> Miguel, Denis and others, what workflows / datasets do you think would be
> good for George to run?
>
> Thanks,
> Junjun
>
>
>
> *From:*<docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org> on
> behalf of Denis Yuen <Denis.Yuen at oicr.on.ca>
> *Date: *Wednesday, March 1, 2017 at 10:26 AM
> *To: *"docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
> *Subject: *[DOCKTESTERS] Thanks!
>
>
>
> Hi,
>
> Just wanted to say thanks to Miguel and Jonas for keeping the workflow
> testing data page up-to-date.
>
> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>
> As we work on new versions or debugging, it is invaluable to know what
> versions of the workflows have worked outside OICR, thanks!
>
>
> *Denis Yuen*
> Senior Software Developer
>
> *Ontario Institute for Cancer Research*
> MaRS Centre
> 661 University Avenue
> Suite 510
> Toronto, Ontario, Canada M5G 0A3
> Toll-free: 1-866-678-6427
> Twitter: @OICR_news
> www.oicr.on.ca
>
>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters
>
>
>
> *The Francis Crick Institute Limited is a registered charity in England
> and Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 1 Midland Road London NW1 1AT*
>
>
> -- The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a company
> registered in England with number 2742969, whose registered office is 215
> Euston Road, London, NW1 2BE.