[DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Denis Yuen Denis.Yuen at oicr.on.ca
Wed Oct 12 10:59:32 EDT 2016


Hi,

While that would have been a good explanation, unfortunately, it doesn't seem to be the case.
In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the BAM files are described like this:


inputs:
  tumor:
    type: File
    inputBinding:
      position: 1
      prefix: --tumor
    secondaryFiles:
    - .bai

  refFrom:
    type: File
    inputBinding:
      position: 3
      prefix: --refFrom
  bbFrom:
    type: File
    inputBinding:
      position: 4
      prefix: --bbFrom
  normal:
    type: File
    inputBinding:
      position: 2
      prefix: --normal
    secondaryFiles:
    - .bai


Because the type is File (as opposed to Directory), the BAM and BAI files are mounted individually into the Docker container at runtime, but the BAS files never were. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.
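
One way to do that check is to pull the volume mounts out of the logged "docker run" line and look for a .bas entry. The following is only a sketch: the class and method names are hypothetical, the sample command line is made up, and it assumes the mounts appear as "-v host:container" or "--volume=host:container" arguments, which should be confirmed against the actual log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the workflow: lists the volume mounts
// found in a logged "docker run" command line.
class DockerMounts {

    // Returns the raw mount specs (host:container[:mode]) from the command line.
    // Assumes whitespace-separated arguments with no quoted spaces in paths.
    static List<String> mounts(String dockerRunLine) {
        List<String> result = new ArrayList<>();
        String[] args = dockerRunLine.trim().split("\\s+");
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--volume=")) {
                result.add(arg.substring("--volume=".length()));
            } else if ((arg.equals("-v") || arg.equals("--volume")) && i + 1 < args.length) {
                result.add(args[++i]);
            }
        }
        return result;
    }

    // True if the host side of any mount ends with the given suffix, e.g. ".bas".
    static boolean anyHostPathEndsWith(List<String> mounts, String suffix) {
        for (String mount : mounts) {
            String hostPath = mount.split(":")[0];
            if (hostPath.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up command line for illustration only.
        String line = "docker run -i --volume=/data/tumor.bam:/var/lib/cwl/tumor.bam:ro "
                + "--volume=/data/tumor.bam.bai:/var/lib/cwl/tumor.bam.bai:ro image cmd";
        List<String> found = mounts(line);
        System.out.println(found);
        System.out.println("BAS mounted: " + anyHostPathEndsWith(found, ".bas"));
    }
}
```

If no mount ends in .bas, that would be consistent with the BAS files never having been visible inside the container.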

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 10:49 AM
To: Denis Yuen
Cc: Miguel Vazquez; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that it writes the output of that step to a different location isn't detected.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 15:36, Denis Yuen <Denis.Yuen at oicr.on.ca> wrote:

Hi,
I can make the modification. I'll run it through the test data, which should finish in roughly a day.
In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?


________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 6:16 AM
To: Miguel Vazquez
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

This assumes that it is possible to write to the location the BAMs are in.

I think Denis would be best placed to make the minor modification as I don't know the process they are using for build and deploy of the images (I made modifications and then handed over for CWL).

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 10:37, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

[The rest of the list were out of the loop for this part of the conversation, so I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.]

Hi Keiran,

Would it be possible then to change this and try again? What needs to happen? I guess you'll need to change the code, and a new Docker image will need to be produced. Would this be our best alternative?


Miguel



On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine <kr2 at sanger.ac.uk> wrote:
In the original version we didn't do this step; if we have write access, it can be made to do that.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:49, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Hi Keiran,

If the BAS and BAM files need to be collocated, why is it not created next to the BAM file?

Would it not be better if it read:

private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
    Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
    File f = new File(sampleBam);
    thisJob.getCommand()
        .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
        .addArgument(installBase)
        .addArgument("bam_stats")
        .addArgument("-i " + sampleBam)
        // write the BAS file next to the BAM
        .addArgument("-o " + sampleBam + ".bas");
    return thisJob;
}

Best

Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine <kr2 at sanger.ac.uk> wrote:
Relevant section of code:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:40, Keiran Raine <kr2 at sanger.ac.uk> wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] |  Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] |  Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM and BAS files aren't co-located then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end.
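
A rough sketch of that symlink approach is below. It is not taken from CgpSomaticCore.java: the class and method names are hypothetical, and it assumes the work space is writable and on a file system that supports symbolic links. Presumably the .bai index would need the same treatment so that tools can find it next to the linked BAM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: links input BAMs into a writable work space.
class BamLinker {

    // Creates one symlink per BAM inside workDir and returns the link paths.
    static List<Path> linkInto(Path workDir, List<Path> bams) throws IOException {
        Files.createDirectories(workDir);
        List<Path> links = new ArrayList<>();
        for (Path bam : bams) {
            Path link = workDir.resolve(bam.getFileName());
            Files.createSymbolicLink(link, bam.toAbsolutePath());
            links.add(link);
        }
        return links;
    }

    // Removes the links only; the original BAM files are left untouched.
    static void unlink(List<Path> links) throws IOException {
        for (Path link : links) {
            if (Files.isSymbolicLink(link)) {
                Files.delete(link);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the read-only input area and the work space.
        Path inputDir = Files.createTempDirectory("input");
        Path workDir = Files.createTempDirectory("work");
        Path bam = Files.createFile(inputDir.resolve("tumour.bam"));

        List<Path> links = linkInto(workDir, Arrays.asList(bam));
        // A BAS file written as <link>.bas now lands in the work space, next to the link.
        System.out.println("BAS would be written to " + links.get(0) + ".bas");

        unlink(links);
        System.out.println("Original still present: " + Files.exists(bam));
    }
}
```

With this layout the tools would be pointed at the link paths, so any file they write alongside the BAM ends up in the work space rather than in the input location.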

This is one of the changes we had to implement differently as the BAS file data was being held in the GNOS xml data structures during the initial processing.  Moving to this means that any BAM input is sufficient.

Hope this is easier to solve now,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:31, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS files. Could you please educate me a bit on what these are and how I can create them?

Best

Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez <miguel.vazquez at cnio.es> wrote:

4. It looks like the *_pindel_input_* steps run for only 22-23 seconds which could indicate a problem with either the headers or the absence of the BAS file from the expected location.


I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems that either BAS files are not gathered by GNOS (could that be?) or my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.
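
A check of this kind could be automated before launching a run. The sketch below is hypothetical (class and method names are made up) and assumes the naming used elsewhere in this thread, where the companion file name is the BAM name with a suffix appended: ".bai" as in the CWL secondaryFiles, and ".bas" as in the proposed "-o sampleBam.bas" argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical pre-flight check: finds BAMs that lack a companion file.
class SiblingCheck {

    // Returns the *.bam files in dir that have no sibling named <bam><suffix>.
    static List<Path> missing(Path dir, String suffix) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(p -> p.getFileName().toString().endsWith(".bam"))
                   .sorted()
                   .forEach(p -> {
                       Path sibling = p.resolveSibling(p.getFileName() + suffix);
                       if (!Files.exists(sibling)) {
                           result.add(p);
                       }
                   });
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("BAMs without .bai: " + missing(dir, ".bai"));
        System.out.println("BAMs without .bas: " + missing(dir, ".bas"));
    }
}
```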

Best

Miguel




-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.