[DOCKTESTERS] Amazon account

Miguel Vazquez miguel.vazquez at cnio.es
Tue Sep 20 10:24:20 EDT 2016


I'm resending this; I used the wrong email address again and it bounced.
Apologies.

On Tue, Sep 20, 2016 at 4:22 PM, Miguel Vazquez <mikisvaz at gmail.com> wrote:

> Hi all,
>
> I think there are three scripting or systematizing efforts that would help:
>
> 1) Setting up the environment to run Dockstore. Junjun suggested
> docker inside docker, which might be a bit convoluted, but a preset virtual
> machine or AWS AMI would be very suitable. Brian's script would be the
> script that provisions such a VM, and Vagrant would be a possible way to
> program this if this technology plays well with VM vendors.
>
> 2) Downloading the samples and preparing the Dockstore.json files to run
> the workflows and produce the results in standard places.
>
> 3) Downloading the released version of the files and comparing with the
> results of the test. This could be as simple as retrieving the list of SNVs
> and indels and measuring the overlaps.
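To make the comparison in point 3) concrete, here is a minimal sketch, assuming both the released calls and the test calls are available as VCF-formatted lines. The function names and the choice of (chromosome, position, ref, alt) as the comparison key are my own assumptions for illustration, not something we have agreed on, and the sample records are made up.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def overlap_report(test_keys, released_keys):
    """Summarize how two call sets overlap."""
    shared = test_keys & released_keys
    union = test_keys | released_keys
    return {
        "shared": len(shared),
        "only_test": len(test_keys - released_keys),
        "only_released": len(released_keys - test_keys),
        # Two empty call sets are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up records, only to show the call pattern.
released = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tT",
    "2\t200\t.\tG\tC",
    "3\t300\t.\tCT\tC",
]
test_run = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tT",
    "3\t300\t.\tCT\tC",
    "4\t400\t.\tG\tA",
]

report = overlap_report(variant_keys(test_run), variant_keys(released))
print(report)
```

Exact key matching is the simplest option; indels in particular may need normalization before comparison if the two call sets represent them differently.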
>
> There is also the issue of launching the instances, issuing the jobs,
> monitoring completion, and possibly gathering result files for validation
> prior to shutting them down.
>
> Point 1) is fairly easy with the documentation we have already. Point 2)
> is almost 25% done; we still have 3 other Dockstore workflows to set up,
> and also drivers for the ICGC client and GNOS client. Point 3) might
> not be that hard either once the files are gathered and organized.
>
> The moving parts here that would require specific drivers I think are:
> - The code to provision VM images, Vagrant for instance, though it is
> really just running a script
> - The different VM technologies: AWS, Collaboratory, etc
> - The 4 different pipelines: Sanger, Broad, DKFZ, BWA
> - The 2 downloading tools: ICGC and GNOS
> - The several different result files: SNV, Indels, SV, etc
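To get a feel for how many combinations the moving parts listed above imply, here is a rough sketch that enumerates them. It assumes every pipeline has to be tested on every platform with either download client, which may well not be true in practice; the dictionary keys and the function name are made up for illustration.

```python
from itertools import product

# Values taken from the list above; the "etc" entries are left out.
PLATFORMS = ["AWS", "Collaboratory"]
PIPELINES = ["Sanger", "Broad", "DKFZ", "BWA"]
DOWNLOADERS = ["ICGC", "GNOS"]


def build_matrix():
    """Return one entry per platform/pipeline/downloader combination."""
    return [
        {"platform": platform, "pipeline": pipeline, "downloader": downloader}
        for platform, pipeline, downloader in product(
            PLATFORMS, PIPELINES, DOWNLOADERS
        )
    ]


matrix = build_matrix()
print(len(matrix), "combinations")
for entry in matrix[:3]:
    print(entry)
```

Even with only these three axes that is 16 runs, which is part of why a driver per moving part looks more manageable than one script per combination.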
>
> The effort required to set all this infrastructure up might be a bit of
> overkill though. On the other hand the results might be a deliverable piece
> of work that might have more reach than just these tests. Maybe we can
> discuss this at our next meeting.
>
> Best
>
> M
>
>
>
> On Tue, Sep 20, 2016 at 3:42 PM, Brian O'Connor <Brian.OConnor at oicr.on.ca>
> wrote:
>
>> Hi Junjun,
>>
>> Great to hear this!
>>
>> I would recommend against trying to make this a docker in docker.  It’s
>> something to be avoided because it really causes problems running in
>> various environments.
>>
>> Maybe a well commented bash script?  I think the instructions are long
>> but really we expect these workflows to run in a lot of different
>> environments and it’s better to explain how it works so users can customize
>> for their environment.
>>
>> What do you all think?
>>
>> Brian
>>
>> > On Sep 19, 2016, at 11:02 PM, Junjun Zhang <junjun.zhang at oicr.on.ca>
>> wrote:
>> >
>> > Good news, both Miguel and I have the Sanger pipeline running on AWS
>> and Collab respectively.
>> >
>> > Here is the documentation on all steps we went through to get things
>> set up:
>> https://docs.google.com/document/d/1EPo2Wgh-WJz75GdykgTI1fpm89yIdoGGHAyVlJ9PbcA/edit
>> >
>> > As you can see there are quite a few steps to go through. Does it make
>> sense to build a docker image for setting up the testing environment?
>> >
>> > It's kind of like docker in docker, is that OK?
>> >
>> > Junjun
>> >
>> >
>> > From: docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org
>> [docktesters-bounces+junjun.zhang=oicr.on.ca at lists.icgc.org] on behalf
>> of Denis Yuen [Denis.Yuen at oicr.on.ca]
>> > Sent: Monday, September 19, 2016 3:39 PM
>> > To: Miguel Vazquez
>> > Cc: docktesters at lists.icgc.org
>> > Subject: Re: [DOCKTESTERS] Amazon account
>> >
>> > Hi,
>> >
>> > Quoting myself from an email since it is applicable here too
>> >
>> > Hi,
>> > Sounds reasonable, the hardware requirements listed at
>> https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker#hardware-requirements
>> match my recollection and an r3.4xlarge matches that handily in terms of
>> RAM and CPU
>> >
>> > The only thing I would check would be to make sure that the working
>> directory (where you run Dockstore) is on a large (1 TB) volume. The
>> workflow can easily fill 320 GB if that's all you have, causing it to
>> crash.
>> >
>> > When we were running for Pan-cancer, we sometimes used lvm to merge all
>> ephemeral drives on an AWS instance into one larger drive.
>> > But for testing, it would probably be simpler just to use one large EBS
>> volume.
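The working-directory check described above can be sketched as a small pre-flight test run before launching the workflow. This is only an illustration using Python's standard `shutil.disk_usage`; the 1 TB threshold comes from the suggestion above, while the function names and the optional `free` override (there so the logic can be tried without a large disk) are assumptions of mine.

```python
import shutil

TB = 10**12  # decimal terabyte, in bytes


def free_bytes(path):
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def check_workdir(path, required_bytes=TB, free=None):
    """Return (ok, available_bytes) for the volume holding `path`.

    `free` lets a caller supply a known figure instead of querying the disk.
    """
    available = free_bytes(path) if free is None else free
    return available >= required_bytes, available


if __name__ == "__main__":
    ok, available = check_workdir(".")
    print(f"{available / 10**9:.0f} GB free in working directory")
    if not ok:
        print("Less than 1 TB free: the workflow may fill the disk and crash.")
```

Run from the directory where Dockstore will be launched, this reports whether the volume meets the threshold before any download starts.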
>> >
>> > Denis Yuen
>> > Bioinformatics Software Developer
>> >
>> > Ontario Institute for Cancer Research
>> > MaRS Centre
>> > 661 University Avenue
>> > Suite 510
>> > Toronto, Ontario, Canada M5G 0A3
>> > Toll-free: 1-866-678-6427
>> > Twitter: @OICR_news
>> > www.oicr.on.ca
>> > From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org
>> [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of
>> Miguel Vazquez [mikisvaz at gmail.com]
>> > Sent: September 19, 2016 3:26 PM
>> > To: Francis Ouellette
>> > Cc: docktesters at lists.icgc.org; Zhibin
>> > Subject: Re: [DOCKTESTERS] Amazon account
>> >
>> > Thanks Francis.
>> >
>> > BTW Brian and Junjun, I think I might not have enough disk space in the
>> instance you got for me, gtdownload croaks:
>> >
>> > Error:  The system *might* run out of disk space before all downloads
>> are complete, Downloading will continue until less than 1.00 GB is
>> available.
>> >
>> > and does not seem to download anything.
>> >
>> > I took the liberty of making a directory in /mn/, which has 95 GB
>> available, but it does not seem to change things. Excuse my ignorance, but
>> how big are these files?
>> >
>> > Miguel
>> >
>> > On Mon, Sep 19, 2016 at 9:22 PM, Francis Ouellette <francis at oicr.on.ca>
>> wrote:
>> > Hi Miguel,
>> >
>> > I’m CCing docktesters; I think Junjun or Brian will be best placed to
>> answer this.
>> >  I think a bit more RAM would be OK, so could do: m4.10xlarge (40 vCPUs
>> and 160 GB RAM).
>> >
>> > Would that be good Brian?
>> >
>> > @bffo
>> >
>> >
>> > --
>> > B.F. Francis Ouellette          http://oicr.on.ca/person/francis-ouellette
>> >
>> >
>> >
>> >> On Sep 19, 2016, at 3:09 PM, Zhibin <zhibin at gmail.com> wrote:
>> >>
>> >> Hi Miguel,
>> >>
>> >> I am not familiar with the Sanger pipeline. You should launch instances
>> based on the number of CPUs and memory you need.
>> >>
>> >> Best,
>> >>
>> >> Zhibin
>> >>
>> >> On Mon, Sep 19, 2016 at 3:05 PM, Miguel Vazquez <mikisvaz at gmail.com>
>> wrote:
>> >> Hello again,
>> >>
>> >> What instance type would you suggest to run the Sanger pipeline? I'm
>> not very used to AWS and I wouldn't want to burn through your credit
>> accidentally. I was thinking of r3.4xlarge, which has 16 cores, 122 GB
>> memory and 1x320 GB (SSD); would that be a good choice?
>> >>
>> >> Best
>> >>
>> >> Miguel
>> >>
>> >> On Mon, Sep 19, 2016 at 8:42 PM, Miguel Vazquez <mikisvaz at gmail.com>
>> wrote:
>> >> Thanks Zhibin
>> >> Best regards
>> >> Miguel
>> >
>> >
>> >
>>
>>
>