[DOCKTESTERS] Amazon account

Denis Yuen Denis.Yuen at oicr.on.ca
Tue Sep 20 10:48:27 EDT 2016


Hi,

An AMI/Vagrant script would make sense.
I like Ansible as well, and we have some reusable code here, used to construct the VMs for the pan-cancer run, that could be forked and stripped down to create a testing VM: https://github.com/ICGC-TCGA-PanCancer/container-host-bag/tree/bcf5c074f69914ab9542ff8e2ef2607612cd05f7 (you'll probably want just the common, install-docker, java, and workflow roles, for example).

Docker in Docker would be complicated and with the restrictions that cwltool places on running containers, I wouldn't be surprised if there was some incompatibility.


Denis Yuen
Bioinformatics Software Developer

Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue
Suite 510
Toronto, Ontario, Canada M5G 0A3

Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.

________________________________
From: Miguel Vazquez [mikisvaz at gmail.com]
Sent: September 20, 2016 10:22 AM
To: Brian O'Connor
Cc: Junjun Zhang; Denis Yuen; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] Amazon account

Hi all,

I think there are three scripting or systematizing efforts that would help

1) Setting up the environment to run Dockstore. Junjun suggested Docker inside Docker, which might be a bit convoluted, but a preset virtual machine or AWS AMI would be very suitable. Brian's script would be the script that provisions such a VM, and Vagrant would be a possible way to program this, if the technology plays well with the VM vendors.

2) Downloading the samples and preparing the Dockstore.json files to run the workflows and produce the results in standard places.

3) Downloading the released version of the files and comparing with the results of the test. This could be as simple as retrieving the list of SNVs and indels and measuring the overlaps.
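
The overlap measurement in point 3) could be sketched roughly as follows. This is only a minimal illustration, assuming both the released results and the test results are available as plain-text VCF lines. The function names read_variant_keys and overlap_summary are hypothetical and are not part of any existing tool mentioned in this thread.

```python
def read_variant_keys(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text lines."""
    keys = set()
    for line in lines:
        # Skip blank lines, meta lines (##...) and the column header (#CHROM...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A record may list several comma-separated ALT alleles; count each one.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def overlap_summary(released, tested):
    """Compare two sets of variant keys and report simple overlap counts."""
    shared = released & tested
    union = released | tested
    return {
        "shared": len(shared),
        "only_released": len(released - tested),
        "only_tested": len(tested - released),
        # Jaccard index: shared calls over all calls seen in either set.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Tiny made-up records, for illustration only (not real results).
    released_lines = ["#CHROM\tPOS\tID\tREF\tALT\n", "1\t100\t.\tA\tT\n"]
    tested_lines = ["1\t100\t.\tA\tT\n", "2\t400\t.\tC\tG\n"]
    print(overlap_summary(read_variant_keys(released_lines),
                          read_variant_keys(tested_lines)))
```

Matching on exact position and alleles is the simplest possible choice. Indels in particular may be represented differently by different callers, so a real comparison would probably need some normalization first.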

There is also the issue of launching the instances, issuing the jobs, monitoring completion, and possibly gathering result files for validation before shutting the instances down.

Point 1) is fairly easy with the documentation we have already. Point 2) is about 25% done: we still have 3 other Dockstore workflows to set up, as well as drivers for the ICGC client and GNOS client. Point 3) might not be that hard either once the files are gathered and organized.

The moving parts here that would require specific drivers I think are:
- The code to provision VM images, Vagrant for instance, though it is really just running a script
- The different VM technologies: AWS, Collaboratory, etc
- The 4 different pipelines: Sanger, Broad, DKFZ, BWA
- The 2 downloading tools: ICGC and GNOS
- The several different result files: SNV, Indels, SV, etc
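
To get a feel for how many combinations these moving parts produce, the list above can be enumerated as a test matrix. This is a rough sketch only: the platform list is incomplete (the original says "etc"), the dictionary keys and the function name build_test_matrix are hypothetical, and result file types are left out because which ones apply presumably depends on the pipeline.

```python
from itertools import product

# Values taken from the list above; the platform list is not exhaustive.
VM_PLATFORMS = ["AWS", "Collaboratory"]
PIPELINES = ["Sanger", "Broad", "DKFZ", "BWA"]
DOWNLOAD_TOOLS = ["ICGC", "GNOS"]


def build_test_matrix(platforms, pipelines, download_tools):
    """Return one dict per (platform, pipeline, download tool) combination."""
    return [
        {"platform": platform, "pipeline": pipeline, "download_tool": tool}
        for platform, pipeline, tool in product(platforms, pipelines, download_tools)
    ]


if __name__ == "__main__":
    matrix = build_test_matrix(VM_PLATFORMS, PIPELINES, DOWNLOAD_TOOLS)
    print(len(matrix), "combinations")
    for entry in matrix:
        print(entry)
```

With the values listed here this gives 2 x 4 x 2 = 16 combinations, which is one way to judge how much driver code full coverage would need.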

The effort required to set all this infrastructure up might be a bit of overkill, though. On the other hand, the results might be a deliverable piece of work with more reach than just these tests. Maybe we can discuss this at our next meeting.

Best

M



On Tue, Sep 20, 2016 at 3:42 PM, Brian O'Connor <Brian.OConnor at oicr.on.ca> wrote:
Hi Junjun,

Great to hear this!

I would recommend against trying to make this a docker in docker.  It’s something to be avoided because it really causes problems running in various environments.

Maybe a well-commented bash script?  I think the instructions are long, but really we expect these workflows to run in a lot of different environments and it’s better to explain how it works so users can customize for their environment.

What do you all think?

Brian

> On Sep 19, 2016, at 11:02 PM, Junjun Zhang <junjun.zhang at oicr.on.ca> wrote:
>
> Good news: both Miguel and I have the Sanger pipeline running, on AWS and Collab respectively.
>
> Here is the documentation on all steps we went through to get things set up: https://docs.google.com/document/d/1EPo2Wgh-WJz75GdykgTI1fpm89yIdoGGHAyVlJ9PbcA/edit
>
> As you can see, there are quite a few steps to go through. Does it make sense to build a Docker image for setting up the testing environment?
>
> It's kind of like docker in docker, is that OK?
>
> Junjun
>
>
> From: docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org [docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
> Sent: Monday, September 19, 2016 3:39 PM
> To: Miguel Vazquez
> Cc: docktesters at lists.icgc.org
> Subject: Re: [DOCKTESTERS] Amazon account
>
> Hi,
>
> Quoting myself from an email since it is applicable here too
>
> Hi,
> Sounds reasonable. The hardware requirements listed at https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker#hardware-requirements match my recollection, and an r3.4xlarge matches that handily in terms of RAM and CPU.
>
> The only thing I would check is that the working directory (where you run Dockstore) is on a large (1 TB) volume. The workflow can easily overwhelm 320 GB if that's all you have, causing it to crash.
>
> When we were running for Pan-cancer, we sometimes used lvm to merge all ephemeral drives on an AWS instance into one larger drive.
> But for testing, it would probably be simpler just to use one large EBS volume.
>
> Denis Yuen
>
> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Miguel Vazquez [mikisvaz at gmail.com]
> Sent: September 19, 2016 3:26 PM
> To: Francis Ouellette
> Cc: docktesters at lists.icgc.org; Zhibin
> Subject: Re: [DOCKTESTERS] Amazon account
>
> Thanks Francis.
>
> BTW Brian and Junjun, I think I might not have enough disk space in the instance you got for me, gtdownload croaks:
>
> Error:  The system *might* run out of disk space before all downloads are complete, Downloading will continue until less than 1.00 GB is available.
>
> and does not seem to download anything.
>
> I took the liberty of making a directory in /mn/, which has 95 GB available, but it does not seem to change things. Excuse my ignorance, but how big are these files?
>
> Miguel
>
> On Mon, Sep 19, 2016 at 9:22 PM, Francis Ouellette <francis at oicr.on.ca> wrote:
> Hi Miguel,
>
> I’m CCing docktester .. I think Junjun or Brian will be best to answer this …
>  I think a bit more RAM would be OK, so could do: m4.10xlarge (10 CPU and 160 GB RAM).
>
> Would that be good Brian?
>
> @bffo
>
>
> --
> B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>
>
>
>> On Sep 19, 2016, at 3:09 PM, Zhibin <zhibin at gmail.com> wrote:
>>
>> Jo Miguel,
>>
>> I am not familiar with Sanger pipeline. You should launch instances based on the number of CPUs and memory you need.
>>
>> Best,
>>
>> Zhibin
>>
>> On Mon, Sep 19, 2016 at 3:05 PM, Miguel Vazquez <mikisvaz at gmail.com> wrote:
>> Hello again,
>>
>> What instance type would you suggest for running the Sanger pipeline? I'm not very used to AWS and I wouldn't want to burn through your credit accidentally. I was thinking of r3.4xlarge, which has 16 cores, 122 GB memory and 1 x 320 GB (SSD). Would that be a good choice?
>>
>> Best
>>
>> Miguel
>>
>> On Mon, Sep 19, 2016 at 8:42 PM, Miguel Vazquez <mikisvaz at gmail.com> wrote:
>> Thanks Zhibin
>> Best regards
>> Miguel
>
>
>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters

