[DOCKTESTERS] Sanger symlinked input file issue

Keiran Raine kr2 at sanger.ac.uk
Thu Oct 20 04:28:12 EDT 2016


Hi Denis,

Less painful than I thought.

Just update line 163 of the dockerfile to pull this version:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockerfile

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

> On 19 Oct 2016, at 21:36, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
> 
> Hi,
> 
> FYI, breaking this out into a thread so hopefully this renders better in threaded email clients. 
> 
> Keiran, I took a look at running that /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl command while pointing it at the symlinked version in /var/spool/cwl and surprisingly it works (or at least executes well past the displayed error). 
> 
> Unfortunately, this command is generated and it doesn't seem to be part of the CGP-Somatic-Docker <https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker> codebase. 
> Basically, the SeqWare code that is in that repo encodes this command (note that it refers to the version in /var/spool/cwl)
> 
> seqware at 83bd876f1b5d:/datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab/generated-scripts$ cat s58_cgpPindel_input_70.sh
> #!/usr/bin/env bash
> set -o errexit
> set -o pipefail
> 
> export SEQWARE_SETTINGS=/var/spool/cwl/.seqware/settings
> cd /datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab
> /usr/bin/time /usr/bin/time --format="Wall_s %e\nUser_s %U\nSystem_s %S\nMax_kb %M" --output=/var/spool/cwl/timings/0_cgpPindel_input_2 /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/Workflow_Bundle_CgpSomaticCore/0.0.0/bin/wrapper.sh /opt/wtsi-cgp pindel.pl -p input -r /var/spool/cwl/reference_files/genome.fa -e MT,GL%,hs37d5,NC_007605 -st WGS -as GRCh37 -sp human -s /var/spool/cwl/reference_files/pindel/simpleRepeats.bed.gz -f /var/spool/cwl/reference_files/pindel/genomicRules.lst -g /var/spool/cwl/reference_files/pindel/human.GRCh37.indelCoding.bed.gz -u /var/spool/cwl/reference_files/pindel/pindel_np.gff3.gz -sf /var/spool/cwl/reference_files/pindel/softRules.lst -b /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz -o /var/spool/cwl/0/pindel -t /var/spool/cwl/7875b5196f6b8b52847f99bf370aada0.bam -n /var/spool/cwl/fdcb1bd7cffca69d15383ca9566c58e0.bam -i 2 -c 4
> 
> I think /opt/wtsi-cgp pindel.pl must then be generating the failing command (using  /opt/wtsi-cgp/bin/pindel_input_gen.pl  ) that I listed below with the original symlinked location. 
> I believe that the culprit is this section of pindel.pl which must be tracing the input back to the original target of the symlink and using that instead. 
> 
>   # make all things that appear to be paths absolute
>   for (keys %opts) {
>     $opts{$_} = abs_path($opts{$_}) if(defined $opts{$_} && -e $opts{$_});
>   }
> 
> Not entirely sure why this affects this particular donor, but that's what I've found so far. Should we attempt to modify pindel or is there a configuration setting we can take advantage of? 
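To illustrate the behaviour described above: Perl's Cwd::abs_path resolves symbolic links, so a path given under the output area comes back as the link's target. The sketch below reproduces the same effect in Python rather than Perl, where os.path.realpath follows symlinks and os.path.abspath does not. The directory and file names are hypothetical stand-ins for /var/lib/cwl and /var/spool/cwl, not the real layout.

```python
import os
import tempfile


def compare_paths(link_path):
    """Return (abspath, realpath) for a path that may be a symlink."""
    return os.path.abspath(link_path), os.path.realpath(link_path)


def demo():
    # Hypothetical stand-ins for the read-only input area and the output area.
    root = os.path.realpath(tempfile.mkdtemp())
    staged = os.path.join(root, "lib_cwl")
    spool = os.path.join(root, "spool_cwl")
    os.makedirs(staged)
    os.makedirs(spool)

    # A placeholder file standing in for the staged BAM.
    target = os.path.join(staged, "sample.bam")
    with open(target, "w") as handle:
        handle.write("placeholder")

    # The symlink in the output area, as the workflow creates for its inputs.
    link = os.path.join(spool, "sample.bam")
    os.symlink(target, link)

    return link, target, compare_paths(link)


if __name__ == "__main__":
    link, target, (kept, resolved) = demo()
    print("abspath :", kept)      # stays in the output area
    print("realpath:", resolved)  # points back at the staged input
```

If pindel.pl behaves like realpath here, any sibling file looked up next to the resolved path (such as a .bas) would be searched for in the input area rather than the output area.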
> 
> 
> 
> From: Keiran Raine [kr2 at sanger.ac.uk]
> Sent: October 17, 2016 4:25 AM
> To: Denis Yuen
> Cc: Miguel Vazquez; Adam Struck; docktesters at lists.icgc.org
> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
> 
> Hi Denis,
> 
> The commands executed by the Pindel step appear to be handed the original BAM file location (/var/lib/cwl) rather than the symlinks in the output area (/var/spool/cwl).  This would be the problem.
> 
> All the jobs that execute after the BAS generation should use the symlinked BAMs in the output area (although I think it's only important for pindel and brass).
> 
> Regards,
> 
> Keiran Raine
> Principal Bioinformatician
> Cancer Genome Project
> Wellcome Trust Sanger Institute
> 
> kr2 at sanger.ac.uk
> Tel:+44 (0)1223 834244 Ext: 4983
> Office: H104
> 
>> On 15 Oct 2016, at 22:10, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>> 
>> Hi,
>> 
>> Agreed, to sum up:
>> 
>> 1) The donor test set includes bas files. The real donor sets do not. 
>> 2) That said, the way the CWL file is written, regardless of whether a bas file is provided in the test set, they don't actually make it into the docker container. Instead, they get generated inside the container while it is running.
>> 3) The pindel step does indeed fail in DO50311 on a host that successfully ran the test data. 
>> 
>> Keiran, some additional info for debugging.
>> 
>> The CWL file results in this docker invocation:
>> 
>> [job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \
>>     run \
>>     -i \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \
>>     --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \
>>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \
>>     --workdir=/var/spool/cwl \
>>     --read-only=true \
>>     --user=1000 \
>>     --env=TMPDIR=/tmp \
>>     --env=HOME=/var/spool/cwl \
>>     quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \
>>     python \
>>     /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
>>     --tumor \
>>     /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \
>>     --normal \
>>     /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \
>>     --refFrom \
>>     /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \
>>     --bbFrom \
>>     /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz
>> 
>> The listing of the working directory (/var/spool/cwl) is as follows and does seem to include generated bas files:
>> 
>> ubuntu at sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr
>> total 92K
>> -rw-r--r--  1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini
>> -rw-r--r--  1 ubuntu ubuntu   28 Oct 14 22:07 .Rprofile
>> drwxr-xr-x  3 ubuntu root   4.0K Oct 14 22:07 .seqware
>> drwxr-xr-x  2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts
>> lrwxrwxrwx  1 ubuntu ubuntu   89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam
>> lrwxrwxrwx  1 ubuntu ubuntu   93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai
>> lrwxrwxrwx  1 ubuntu ubuntu   89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
>> lrwxrwxrwx  1 ubuntu ubuntu   93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai
>> drwxr-xr-x  3 ubuntu ubuntu 4.0K Oct 14 22:07 1
>> drwxr-xr-x  2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype
>> drwxr-xr-x  8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files
>> drwxr-xr-x  2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5
>> -rw-r--r--  1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz
>> -rw-r--r--  1 ubuntu ubuntu   33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5
>> -rw-r--r--  1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas
>> -rw-r--r--  1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas
>> -rw-r--r--  1 ubuntu ubuntu   33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5
>> -rw-r--r--  1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz
>> drwxr-xr-x  2 ubuntu ubuntu 4.0K Oct 15 10:02 timings
>> drwxr-xr-x  5 ubuntu ubuntu 4.0K Oct 15 10:02 0
>> drwxr-xr-x  3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts
>> drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 .
>> drwxrwxrwt 21 root   root   4.0K Oct 15 20:29 ..
>> 
>> The full output of the failing script is:
>> 
>> Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz
>> 
>> Unknown sort order field: unknown
>> Collated 500000 readpairs (in 6 sec.)
>> [V] 1   34.4825MB/s     133279
>> Thread Worker 1: started
>> Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> Collated 500000 readpairs (in 4 sec.)
>> [V] 2   39.9836MB/s     154626
>> Thread Worker 2: started
>> Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> Collated 500000 readpairs (in 4 sec.)
>> Thread Worker 3: started
>> Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'
>> [V] 3   42.4188MB/s     164102
>> Collated 500000 readpairs (in 4 sec.)
>> Thread Worker 4: started
>> Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> [V] 4   43.7799MB/s     169368
>> Collated 500000 readpairs (in 4 sec.)
>> An error occurred while running:
>>         /opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
>> ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> 
>> Perl exited with active threads:
>>         0 running and unjoined
>>         3 finished and unjoined
>>         0 running and detached
>> Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115.
>> Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2.
>>  at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190
>> 
>> Command exited with non-zero status 25
>> 10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k
>> 1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps 
>> 
>> Please let me know if any of the files on that host would be useful to debug this.
>> 
>> 
>> From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
>> Sent: October 15, 2016 2:52 AM
>> To: Denis Yuen
>> Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Just to clarify, Denis: the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and this matched Keiran's comment that a missing BAS file was consistent with the pindel error. I have no idea what the workflow was doing, so as far as I know the BAS could have been created correctly and the error was something else. 
>> 
>> Best regards
>> 
>> On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>> Hi,
>> 
>> Adam, to summarise:
>> My observations seem to match yours, the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring. 
>> 
>> 
>> From: Adam Struck [strucka at ohsu.edu]
>> Sent: October 14, 2016 6:10 PM
>> To: Denis Yuen; Keiran Raine
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Hi Denis,
>>  
>> Sorry to chime in late.  The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow?
>> 
>> Inputs are symlinked to the OUTDIR.
>> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039
>> 
>> Bas files are written to OUTDIR
>> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780
>> 
>> I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the Cromwell engine on the CCC platform without an issue.
>>  
>> -Adam 
>>  
>> From: <docktesters-bounces+strucka=ohsu.edu at lists.icgc.org> on behalf of Denis Yuen <Denis.Yuen at oicr.on.ca>
>> Date: Friday, October 14, 2016 at 2:55 PM
>> To: Keiran Raine <kr2 at sanger.ac.uk>
>> Cc: "docktesters at lists.icgc.org" <docktesters at lists.icgc.org>
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>>  
>> Hi,
>> 
>> Just as a heads-up for the end-of-week in this thread. 
>> 
>> > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>>      rm -rf ~/.cpanm
>> 
>> This got me on the right track, I actually needed the following syntax
>> 
>> RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
>>      rm -rf ~/.cpanm
>> 
>> However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later. 
>> 
>> I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on.
>> 
>> From: Keiran Raine [kr2 at sanger.ac.uk]
>> Sent: October 12, 2016 2:09 PM
>> To: Denis Yuen
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Hi Denis,
>>  
>> You've hit an issue that only has occurred in the last few weeks for us also.
>>  
>> BioPerl released a new version (first in ~20 months) that split the repository moving a whole section into a different package.
>>  
>> The fix would be to force the first install of BioPerl to a specific version.  Modify line 25/26 of the Dockerfile from:
>> 
>> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
>>      rm -rf ~/.cpanm
>>  
>> to
>>  
>> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>>      rm -rf ~/.cpanm
>>  
>> Thankfully something I could identify immediately.
>>  
>> Regards,
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 12 Oct 2016, at 17:04, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>>  
>> Hi, 
>> 
>> Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section.
>> Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)?
>> I'm attaching the build log from the Dockerfile and the log from inside the container.
>> 
>> 
>>  
>> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
>> Sent: October 12, 2016 10:59 AM
>> To: Keiran Raine
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Hi,
>> 
>> While that would have been a good explanation, unfortunately, it doesn't seem to be the case. 
>> In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the bam files are described like 
>> 
>> inputs:
>>   tumor:
>>     type: File
>>     inputBinding:
>>       position: 1
>>       prefix: --tumor
>>     secondaryFiles:
>>     - .bai
>>  
>>   refFrom:
>>     type: File
>>     inputBinding:
>>       position: 3
>>       prefix: --refFrom
>>   bbFrom:
>>     type: File
>>     inputBinding:
>>       position: 4
>>       prefix: --bbFrom
>>   normal:
>>     type: File
>>     inputBinding:
>>       position: 2
>>       prefix: --normal
>>     secondaryFiles:
>>     - .bai
>> The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were. If Miguel has the "docker run" output from the run (should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime. 
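One way to carry out the check described above is to pull the container-side paths out of the `--volume=` options in the printed "docker run" command and see whether any of them is a .bas file. The following Python sketch assumes the `--volume=HOST:CONTAINER[:MODE]` form with Linux-style paths (no colons inside the paths); the function names are hypothetical and not part of any tool in this thread.

```python
import shlex


def mounted_container_paths(docker_command):
    """Return container-side paths from --volume=HOST:CONTAINER[:MODE] options."""
    # Join backslash-continued lines so the command is parsed as one line.
    flat = docker_command.replace("\\\n", " ")
    paths = []
    for token in shlex.split(flat):
        if not token.startswith("--volume="):
            continue
        parts = token[len("--volume="):].split(":")
        if len(parts) >= 2:
            paths.append(parts[1])
    return paths


def has_bas_mount(docker_command):
    """True if any mounted container path ends in .bas."""
    return any(p.endswith(".bas") for p in mounted_container_paths(docker_command))
```

Run against the stdout of a workflow launch, `has_bas_mount` returning False would be consistent with the bas files never being mounted and instead being generated inside the container.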
>>  
>> From: Keiran Raine [kr2 at sanger.ac.uk]
>> Sent: October 12, 2016 10:49 AM
>> To: Denis Yuen
>> Cc: Miguel Vazquez; docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Hi Denis, 
>>  
>> I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that it writes the output of that step to a different location isn't detected.
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 12 Oct 2016, at 15:36, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:
>>  
>> Hi,
>> I can make the modification, I'll run it through the test data and that should finish in roughly a day. 
>> In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?
>> 
>>  
>> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
>> Sent: October 12, 2016 6:16 AM
>> To: Miguel Vazquez
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> 
>> Hi, 
>>  
>> This is assuming that it is possible to write to the location the BAM are in.
>>  
>> I think Denis would be best placed to make the minor modification as I don't know the process they are using for build and deploy of the images (I made modifications and then handed over for CWL).
>>  
>> Regards,
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 12 Oct 2016, at 10:37, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>>  
>> [The rest of the list were out of the loop for this part of the conversation, so I'm putting them back in. In short, the Sanger pipeline produces the BAS file but not co-located with the BAM]
>> 
>> Hi Keiran,
>> 
>> Would it be possible then to change this and try again? What needs to happen? I guess you'll need to change the code and have a new docker image produced. Would this be our best alternative? 
>> 
>>  
>> Miguel
>> 
>> 
>>  
>> On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine <kr2 at sanger.ac.uk> wrote:
>> In the original version we didn't do this step; if we have write access, it can be made to do that.
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 11 Oct 2016, at 13:49, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>>  
>> Hi Keiran,
>> 
>> If the BAS and BAM files need to be co-located, why is the BAS file not created next to the BAM file?
>> 
>> Would it not be better if it read
>> 
>> private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
>>   Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
>>   File f = new File(sampleBam);
>>   thisJob.getCommand()
>>     .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
>>     .addArgument(installBase)
>>     .addArgument("bam_stats")
>>     .addArgument("-i " + sampleBam)
>>     .addArgument("-o " + sampleBam + ".bas");
>>   return thisJob;
>> }
>>  
>> Best
>> 
>> Miguel
>>  
>> On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine <kr2 at sanger.ac.uk> wrote:
>> Relevant section of code: 
>> 
>> https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 11 Oct 2016, at 13:40, Keiran Raine <kr2 at sanger.ac.uk> wrote:
>>  
>> Hi, 
>>  
>> There is a step generating the BAS files:
>>  
>> [2016/10/10 07:28:37] |  Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
>> [2016/10/10 07:28:37] |  Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh
>>  
>> But if the BAM files and BAS aren't co-located then you have a problem.  You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end.
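The symlinking approach suggested above could look roughly like the following Python sketch: link each BAM (and its index, if present) into the work space, hand the linked paths to the tools, and remove the links at the end. The name `linked_inputs` and its arguments are hypothetical and not part of the workflow code; this is only an illustration of the idea under those assumptions.

```python
import contextlib
import os


@contextlib.contextmanager
def linked_inputs(bam_paths, workdir):
    """Symlink each BAM (and its .bai, if present) into workdir; unlink on exit."""
    links = []
    try:
        for bam in bam_paths:
            for src in (bam, bam + ".bai"):
                if not os.path.exists(src):
                    continue
                dst = os.path.join(workdir, os.path.basename(src))
                os.symlink(src, dst)
                links.append(dst)
        # Yield only the BAM links; tools derive .bai/.bas names from these.
        yield [os.path.join(workdir, os.path.basename(b)) for b in bam_paths]
    finally:
        # Remove only the links created here; generated files are left alone.
        for dst in links:
            if os.path.islink(dst):
                os.unlink(dst)
```

Because the tools only ever see the path inside the work space, a .bas file written as `<linked bam>.bas` ends up next to the path they were given, even when the original input location is read-only. This holds only as long as no tool resolves the symlink back to its target.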
>>  
>> This is one of the changes we had to implement differently as the BAS file data was being held in the GNOS xml data structures during the initial processing.  Moving to this means that any BAM input is sufficient.
>>  
>> Hope this is easier to solve now,
>> 
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>  
>> kr2 at sanger.ac.uk
>> Tel:+44 (0)1223 834244 Ext: 4983
>> Office: H104
>> 
>> On 11 Oct 2016, at 13:31, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>>  
>> Keiran,
>> 
>> It's still downloading the files, but in fact it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them?
>> 
>> Best
>> 
>> Miguel
>>  
>> On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:
>>  
>> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds which could indicate a problem with either the headers or the absence of the BAS file from the expected location.
>>  
>>  
>> I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by gnos, could that be? Or that my script fails to copy them. I'll try to gather a different sample with my client and check. 
>> 
>> Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.
>> 
>> Best
>> 
>> Miguel
>>  
>>  
>>  
>> 
>> -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
>>  
>> <build.log><outer.build.log>
>>  
>> 
> 




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
