[DOCKTESTERS] Amazon account
Junjun Zhang
Junjun.Zhang at oicr.on.ca
Tue Sep 20 11:09:31 EDT 2016
Thanks for sharing the thoughts and ansible code.
I will just do the docker image as a learning exercise. I just want to get my hands dirty. If docker in docker is so bad, at least I will know firsthand and understand why.
An AMI/Vagrant script would make sense.
I like Ansible as well, and we have some reusable code here that was used to construct the VMs for the pan-cancer run. It could be forked and stripped down to create a testing VM: https://github.com/ICGC-TCGA-PanCancer/container-host-bag/tree/bcf5c074f69914ab9542ff8e2ef2607612cd05f7 (you'll probably want just the common, install-docker, java, and workflow roles, for example).
Docker in Docker would be complicated and with the restrictions that cwltool places on running containers, I wouldn't be surprised if there was some incompatibility.
Hi all,
I think there are three scripting or systematizing efforts that would help:
1) Setting up the environment to run the dockerstore. Junjun suggested docker inside docker, which might be a bit convoluted, but a preset virtual machine or AWS AMI would be very suitable. Brian's script would be the one that provisions such a VM, and Vagrant would be a possible way to program this if the technology plays well with VM vendors.
2) Downloading the samples and preparing the Dockerstore.json files to run and produce the results in standard places
3) Downloading the released version of the files and comparing with the results of the test. This could be as simple as retrieving the list of SNVs and indels and measuring the overlaps.
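As a rough illustration of point 3), here is a minimal Python sketch of the overlap measurement. It assumes both the released calls and the test calls are plain VCF text and that two calls match when chromosome, position, REF and ALT are identical. The function names and the example records are made up for illustration; a real comparison would probably also need to normalize indel representation.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A record may list several ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def overlap_summary(released, tested):
    """Count shared and private calls and compute the Jaccard index."""
    shared = released & tested
    union = released | tested
    return {
        "shared": len(shared),
        "released_only": len(released - tested),
        "tested_only": len(tested - released),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical records, only to show the call pattern.
released_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tT\t.\tPASS\t.",
    "1\t2000\t.\tG\tGA\t.\tPASS\t.",
    "2\t3000\t.\tC\tG,T\t.\tPASS\t.",
]
tested_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tT\t.\tPASS\t.",
    "2\t3000\t.\tC\tG\t.\tPASS\t.",
    "3\t4000\t.\tTAC\tT\t.\tPASS\t.",
]

summary = overlap_summary(variant_keys(released_vcf), variant_keys(tested_vcf))
print(summary)
```

For real files the same functions can be fed an open file handle instead of a list, since they only iterate over lines.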
There is also the issue of launching the instances, issuing the jobs, monitoring completion, and possibly gathering results files for validation prior to turning them down.
Point 1) is fairly easy with the documentation we have already. Point 2) is almost 25% done; we still have 3 other dockerstore workflows to set up, and also drivers for the ICGC client and GNOS client. Point 3) might not be that hard either once the files are gathered and organized.
The moving parts here that would require specific drivers I think are:
- The code to provision VM images, Vagrant for instance, though it is really just running a script
- The different VM technologies: AWS, Collaboratory, etc
- The 4 different pipelines: Sanger, Broad, DKFZ, BWA
- The 2 downloading tools: ICGC and GNOS
- The several different result files: SNV, Indels, SV, etc
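To get a feel for how many combinations these moving parts produce, the list above can be enumerated as a test matrix. This is only a sketch in Python: the variable and function names are hypothetical, the platform list is cut at the two named above, result file types are left out because they depend on the pipeline, and not every combination is necessarily a meaningful test.

```python
from itertools import product

# Values taken from the list above; the platform list there ends in "etc".
vm_platforms = ["AWS", "Collaboratory"]
pipelines = ["Sanger", "Broad", "DKFZ", "BWA"]
download_tools = ["ICGC", "GNOS"]


def build_test_matrix(platforms, workflows, downloaders):
    """Return one dict per (platform, pipeline, downloader) combination."""
    return [
        {"platform": p, "pipeline": w, "downloader": d}
        for p, w, d in product(platforms, workflows, downloaders)
    ]


matrix = build_test_matrix(vm_platforms, pipelines, download_tools)
print(len(matrix))  # 2 * 4 * 2 = 16 combinations
for entry in matrix[:3]:
    print(entry)
```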
The effort required to set all this infrastructure up might be a bit of overkill, though. On the other hand, the result might be a deliverable piece of work with more reach than just these tests. Maybe we can discuss this at our next meeting.
Hi Junjun,
Great to hear this!
I would recommend against trying to make this a docker in docker. It’s something to be avoided because it really causes problems running in various environments.
Maybe a well-commented bash script? I think the instructions are long, but really we expect these workflows to run in a lot of different environments, and it’s better to explain how it works so users can customize for their environment.
> Good news, both Miguel and myself have Sanger pipeline running on AWS and Collab respectively.
> Here is the documentation on all steps we went through to get things set up: https://docs.google.com/document/d/1EPo2Wgh-WJz75GdykgTI1fpm89yIdoGGHAyVlJ9PbcA/edit
> As you can see, there are quite a few steps to go through. Does it make sense to build a docker image for setting up the testing environment?
> It's kind of like docker in docker, is that OK?
> Thanks Francis.
> BTW Brian and Junjun, I think I might not have enough disk space in the instance you got for me, gtdownload croaks:
> Error: The system *might* run out of disk space before all downloads are complete, Downloading will continue until less than 1.00 GB is available.
> and does not seem to download anything.
> I took the liberty of making a directory in /mn/, which has 95GB available, but it does not seem to change things. Excuse my ignorance, but how big are these files?
>> Hi Miguel,
>> I am not familiar with Sanger pipeline. You should launch instances based on the number of CPUs and memory you need.
>> Best,
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters
