[DOCKTESTERS] Thanks!

Miguel Vazquez mikisvaz at gmail.com
Mon Mar 13 13:01:06 EDT 2017


Hi George,

As far as I know, the unaligned BAM files are not available; instead you must
unalign the final BAM files, the normal ones you get from ICGC or GNOS.
This step is also in my scripts, as you can see here:

https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/run_batch.sh#L32

As for the steps in the workflows, I don't know them myself. I think you'll
need to ask the developers, and not all workflows use the same underlying
workflow enactment tool, so there is no easy answer.



On Mon, Mar 13, 2017 at 5:57 PM, George Mihaiescu <
George.Mihaiescu at oicr.on.ca> wrote:

> Junjun told me this would provide value to the testing process, so I would
> like to kick off a test of the BWA_mem docker.
> Can somebody provide some quick instructions and the location of the
> unaligned BAM files that were used already?
>
> Also, are the steps involved in each workflow documented somewhere, so I
> can get an idea of how far along a run is?
> For example, is s58_cgpPindel_pin2vcf_95 three steps from the finish, or 50
> steps from the finish?
>
> Thank you,
> George
>
> From: Miguel Vazquez <mikisvaz at gmail.com>
> Date: Monday, March 13, 2017 at 8:52 AM
>
> To: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
> Cc: Junjun Zhang <Junjun.Zhang at oicr.on.ca>, Jonas Demeulemeester <
> Jonas.Demeulemeester at crick.ac.uk>, "docktesters at lists.icgc.org" <
> docktesters at lists.icgc.org>
> Subject: Re: [DOCKTESTERS] Thanks!
>
> Hi George,
>
> Answers inline
>
> On Mon, Mar 13, 2017 at 2:43 PM, George Mihaiescu <
> George.Mihaiescu at oicr.on.ca> wrote:
>
>> Hi Miguel,
>>
>> I've started the test by running "bin/run_test.sh Sanger DO50398", so I
>> guess with just one workflow running it should complete in less than two
>> weeks.
>>
>
> I think it still should take a long time. My scripts will run one workflow
> after another.
>
>
>>
>> Because I'm running in Collaboratory, I've changed the "get_icgc_donor.sh"
>> script to use a docker container that has the icgc client inside and pull
>> data from Collaboratory. No "bam.bas" file is downloaded, just the ".bam"
>> and ".bam.bai" files; I'm not sure whether this is an issue.
>>
>>
> I wondered the same thing the first time I did this, but this file is
> produced by the pipeline. There was a problem with it that the developers
> fixed, and the docker was updated accordingly, so I think you won't have a
> problem.
>
>
>> Looking at "bin/compare_result_type.sh", it seems to use the gnos client
>> to pull down the existing VCF files for comparison, but I think we store
>> those files in Collaboratory as well, so I'll work with Junjun to adapt
>> the script for this.
>>
>>
> Let me know if you need any help
>
>
>> I think I initially tried to run the DKFZ workflow, but it complained
>> about having to run Delly first, so I abandoned this for now.
>>
>
> Yes, if you look at run_batch.sh you will see that when using DKFZ it
> will always run Delly first. Delly prepares some files that the DKFZ
> workflow needs, related to copy number I believe.
>
>
>>
>> I'll set up a new VM and run the "run_batch.sh" on the DO52140 donor.
>>
>
> Remember that you will need to add the relevant hash keys for the different
> files in etc/donor_files.csv. It's a bit tedious right now: you need to
> go to the ICGC DCC and find these codes manually for the files you need.
> Ask me if you need help. Once you have them all, you can run all the
> workflows for that donor and evaluate the results.
>
> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/etc/donor_files.csv
>
>
> Regards
>
> Miguel
>
>
>>
>> George
>>
>> From: Miguel Vazquez <mikisvaz at gmail.com>
>> Date: Monday, March 13, 2017 at 6:53 AM
>> To: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
>> Cc: Junjun Zhang <Junjun.Zhang at oicr.on.ca>, Jonas Demeulemeester <
>> Jonas.Demeulemeester at crick.ac.uk>, "docktesters at lists.icgc.org" <
>> docktesters at lists.icgc.org>
>> Subject: Re: [DOCKTESTERS] Thanks!
>>
>> Hi George,
>>
>> The Sanger workflow is very lengthy; it takes about two weeks in my
>> tests.
>>
>> About correctness, my scripts also cover that part; even if you are not
>> using them, they might still help clarify how we do it. The idea is to take
>> each of the output files produced (SNV_MNV, Indel, SV, and CNV, for both
>> germline and somatic) and compare it with the result uploaded to GNOS (not
>> all pipelines produce all files). This is the relevant part of the
>> run_batch.sh script:
>>
>> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/run_batch.sh#L42-L46
>>
>> The bin/compare_result_type.sh script will take care of downloading the
>> correct file from GNOS and running the comparison. The comparison itself is
>> simple since all files are VCFs: it consists of extracting the variants in
>> terms of chromosome, position, reference and alternative allele, and
>> measuring the overlaps.
>>
>> https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/bin/compare_result_type.sh
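
The comparison described above can be sketched in a few lines of Python. This
is only an illustration of the idea, not the code in
bin/compare_result_type.sh: the function names and the example records are
made up, and it assumes plain tab-separated VCF lines with the standard column
order (CHROM, POS, ID, REF, ALT, ...). Splitting multi-allelic ALT fields into
one key per allele is also an assumption of this sketch.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], fields[1], fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternative allele
            keys.add((chrom, int(pos), ref, alt))
    return keys


def overlap(test_keys, reference_keys):
    """Count shared and private variants between two call sets."""
    return {
        "common": len(test_keys & reference_keys),
        "only_test": len(test_keys - reference_keys),
        "only_reference": len(reference_keys - test_keys),
    }


# Made-up records standing in for a new docker run and the GNOS result.
new_run = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "2\t200\t.\tG\tC,T\t.\tPASS\t.\n",
]
gnos_result = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "3\t300\t.\tC\tG\t.\tPASS\t.\n",
]
print(overlap(variant_keys(new_run), variant_keys(gnos_result)))
# prints {'common': 1, 'only_test': 2, 'only_reference': 1}
```

For real files the lines could come from open(path) or, for compressed VCFs,
gzip.open(path, "rt").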
>>
>> About which donors to test, DO52140 is one that Jonas and I have both
>> tested, and it could be interesting to get a third opinion. Then again, any
>> other donor could be interesting to see if something new comes up. I'm not
>> sure which option is best.
>>
>> Miguel
>>
>>
>>
>>
>> On Mon, Mar 13, 2017 at 5:12 AM, George Mihaiescu <
>> George.Mihaiescu at oicr.on.ca> wrote:
>>
>>> Hi,
>>>
>>> I've started Sanger on DO50398 and it's been running for more than 24
>>> hours, currently at "Workflow step succeeded: s58_bbAllele_merge_59".
>>>
>>> I just started a second run on a different VM on the same donor, just to
>>> compare run times.
>>> The VM used has 8 cores, 48 GB of RAM and a 1.1 TB disk. I'll send some
>>> monitoring graphs when it finishes the workflow, but I have no idea how to
>>> check its correctness.
>>>
>>> Give me a list of donors and what workflows you want me to run and I'll
>>> try to schedule them tomorrow.
>>>
>>> George
>>>
>>>
>>> From: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
>>> Date: Sunday, March 12, 2017 at 10:45 PM
>>> To: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>, George
>>> Mihaiescu <George.Mihaiescu at oicr.on.ca>
>>> Cc: Miguel Vazquez <miguel.vazquez at cnio.es>, Denis Yuen <
>>> Denis.Yuen at oicr.on.ca>, "docktesters at lists.icgc.org" <
>>> docktesters at lists.icgc.org>
>>> Subject: Re: [DOCKTESTERS] Thanks!
>>>
>>> Thanks Miguel and Jonas for your help here!
>>>
>>> Do you have any update on the latest testing? Please feel free to update
>>> the wiki with any news:
>>> https://wiki.oicr.on.ca/display/PANCANCER/2017-03-13+PCAWG-TECH+Teleconference
>>>
>>> Regards,
>>> Junjun
>>>
>>>
>>>
>>> From: Jonas Demeulemeester <Jonas.Demeulemeester at crick.ac.uk>
>>> Date: Saturday, March 11, 2017 at 7:15 PM
>>> To: George Mihaiescu <George.Mihaiescu at oicr.on.ca>
>>> Cc: Miguel Vazquez <miguel.vazquez at cnio.es>, Junjun Zhang <
>>> junjun.zhang at oicr.on.ca>, Denis Yuen <Denis.Yuen at oicr.on.ca>, "
>>> docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
>>> Subject: Re: [DOCKTESTERS] Thanks!
>>>
>>> Hi George,
>>>
>>> Yup, I've been running the PCAWG dockers mainly using Miguel's set of
>>> scripts.
>>> Give them a go and if you run into issues, just let us know!
>>>
>>> Cheers,
>>> Jonas
>>>
>>>
>>> On 11 Mar 2017, at 17:00, George Mihaiescu <George.Mihaiescu at oicr.on.ca>
>>> wrote:
>>>
>>> Sure, I'll give it a try and report later.
>>>
>>> Thank you,
>>>
>>> *George Mihaiescu*
>>> Senior Cloud Architect
>>>
>>> *Ontario Institute for Cancer Research*
>>> MaRS Centre
>>> 661 University Avenue
>>> Suite 510
>>> Toronto, Ontario
>>> Canada M5G 0A3
>>>
>>> Email: George.Mihaiescu at oicr.on.ca
>>> Toll-free: 1-866-678-6427
>>> Twitter: @OICR_news
>>>
>>> www.oicr.on.ca
>>>
>>> This message and any attachments may contain confidential and/or
>>> privileged information for the sole use of the intended recipient. Any
>>> review or distribution by anyone other than the person for whom it was
>>> originally intended is strictly prohibited. If you have received this
>>> message in error, please contact the sender and delete all copies.
>>> Opinions, conclusions or other information contained in this message may
>>> not be that of the organization.
>>>
>>>
>>>
>>> From: Miguel Vazquez <miguel.vazquez at cnio.es>
>>> Date: Saturday, March 11, 2017 at 10:57 AM
>>> To: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
>>> Cc: Denis Yuen <Denis.Yuen at oicr.on.ca>, Jonas Demeulemeester <
>>> jonas.demeulemeester at crick.ac.uk>, George Mihaiescu <
>>> George.Mihaiescu at oicr.on.ca>, "docktesters at lists.icgc.org" <
>>> docktesters at lists.icgc.org>
>>> Subject: Re: [DOCKTESTERS] Thanks!
>>>
>>> Hi Junjun,
>>>
>>> I think Jonas has been using my scripts to run some of the tests, maybe
>>> George could try them as well, it should be very easy for him to try the
>>> Sanger, Delly+DKFZ, BWA-Mem, and the BiasFilter.
>>>
>>> https://github.com/mikisvaz/PCAWG-Docker-Test
>>>
>>> He would just need to update the tokens for DACO access and the scripts
>>> will take care of downloading the BAM files, running the workflows and
>>> evaluating the result.
>>>
>>> The documentation there is reasonably up to date, but if this sounds good
>>> then perhaps he could contact me and I could walk him through the details.
>>>
>>> Best regards
>>>
>>> Miguel
>>>
>>> On Fri, Mar 10, 2017 at 9:51 PM, Junjun Zhang <Junjun.Zhang at oicr.on.ca>
>>> wrote:
>>>
>>>> Dear Docktesters,
>>>>
>>>> George Mihaiescu, cloud architect of the Collaboratory at OICR, plans
>>>> to run some bioinformatics workflows to test the Collab environment.
>>>>
>>>> I thought this would be a good opportunity to get extra help testing
>>>> the PCAWG dockerized workflows.
>>>>
>>>> Miguel, Denis and others, what workflows / datasets do you think would
>>>> be good for George to run?
>>>>
>>>> Thanks,
>>>> Junjun
>>>>
>>>>
>>>>
>>>> From: <docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org> on
>>>> behalf of Denis Yuen <Denis.Yuen at oicr.on.ca>
>>>> Date: Wednesday, March 1, 2017 at 10:26 AM
>>>> To: "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
>>>> Subject: [DOCKTESTERS] Thanks!
>>>>
>>>> Hi,
>>>>
>>>> Just wanted to say thanks to Miguel and Jonas for keeping the workflow
>>>> testing data page up-to-date.
>>>>
>>>> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>>>>
>>>>
>>>> As we work on new versions or debugging, it is invaluable to know what
>>>> versions of the workflows have worked outside OICR, thanks!
>>>>
>>>>
>>>>
>>>> *Denis Yuen*
>>>> Senior Software Developer
>>>>
>>>>
>>>> *Ontario Institute for Cancer Research*
>>>> MaRS Centre
>>>> 661 University Avenue
>>>> Suite 510
>>>> Toronto, Ontario, Canada M5G 0A3
>>>>
>>>> Toll-free: 1-866-678-6427
>>>> Twitter: @OICR_news
>>>> www.oicr.on.ca
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> docktesters mailing list
>>>> docktesters at lists.icgc.org
>>>> https://lists.icgc.org/mailman/listinfo/docktesters
>>>>
>>>>
>>> The Francis Crick Institute Limited is a registered charity in England
>>> and Wales no. 1140062 and a company registered in England and Wales no.
>>> 06885462, with its registered office at 1 Midland Road London NW1 1AT
>>>
>>>
>>>
>>>
>>
>