From miguel.vazquez at cnio.es Tue Oct 4 03:22:43 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Tue, 4 Oct 2016 09:22:43 +0200
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6DDD0535-6E45-413E-8E93-60E22F6C3510@oicr.on.ca> <8231F9BE-D796-4F7C-9F61-8BAEA336F038@ohsu.edu> <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca>
Message-ID:

Hi all,

I'm sorry to have missed the call yesterday; I got held up with administrative things. What did I miss?

Best

Miguel

On Fri, Sep 30, 2016 at 6:19 PM, Francis Ouellette wrote:

> Adam,
>
> Can you complete the table on the wiki:
>
> https://wiki.oicr.on.ca/pages/editpage.action?pageId=66309629
>
> Actually: Miguel and Junjun, can you also complete that table?
>
> Thank you.
>
> Adam: can you try to make the workflow work on one platform with the same data that Miguel and Junjun tried?
>
> Thank you,
>
> @bffo
>
> --
> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>
> On Sep 30, 2016, at 12:02 PM, Adam Struck wrote:
>
> I thought I should add that I've run the Sanger pipeline on the test data in three separate environments using both CWL and WDL descriptors.
>
> Feel free to email me the errors you are seeing and I'll try to debug.
>
> Adam Struck
> Scientific Programmer
> Computational Biology
> Oregon Health and Science University
>
> On Sep 30, 2016, at 7:52 AM, Francis Ouellette wrote:
>
> yes, good plan.
>
> @bffo
>
> --
> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>
> On Sep 30, 2016, at 10:46 AM, Miguel Vazquez wrote:
>
> Dear Francis,
>
> I haven't had a chance to go at it again since the meeting (I got a new position in Norway), but next week I will try to find some time.
> I think I might just try another pipeline, since it seems like the Sanger pipeline needs fixing. Although perhaps I will first try Denis's test data again and see if I can reproduce the error, after which we probably should contact the person in charge.
>
> Best
>
> M
>
> On Fri, Sep 30, 2016 at 3:52 PM, Francis Ouellette wrote:
>
>> Dear Miguel and Junjun,
>>
>> Any more attempts at testing the PCAWG Sanger docker container?
>>
>> If you reproduce the same error, we will need to involve Brian and Keiran Raine (author of the container).
>>
>> Let's get this one figured out.
>>
>> I am going to assume that the making of the docker container is what needs resolving.
>>
>> Brian: We may need your input here.
>>
>> Details of our current experiment should continue to be posted here:
>>
>> https://goo.gl/XX5BG9
>>
>> Thank you all,
>>
>> francis
>>
>> PS: It would be good for others on the list to follow the directions in the above Google doc and see if they can succeed with this workflow.
>>
>> Junjun and Miguel have tried different clouds, but used the Sanger workflow on the same data set.
>>
>> Thank you for trying to do this.
>>
>> It would be good to hear back from anybody before Monday AM (tech call).
>>
>> @bffo
>>
>> PS: I CCed Keiran, but am waiting to hear back from Brian before we need to involve him further.
>> PPS: Junjun/Miguel: maybe you can try the DKFZ docker as well (on the same data set)?
>>
>> --
>> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>>
>> Begin forwarded message:
>>
>> *From:* Christina Yung
>> *Subject:* [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>> *Date:* September 30, 2016 at 9:19:39 AM EDT
>> *To:* "pawg-tech (pawg-tech at lists.icgc.org)"
>>
>> Hi Everyone,
>>
>> Below is a draft agenda for Monday's tech call. Please let me know if you have any agenda items for discussion.
>>
>> https://wiki.oicr.on.ca/display/PANCANCER/2016-10-03+PCAWG-TECH+Teleconference
>>
>> Have a great weekend!
>>
>> Christina
>>
>> Call Info
>>
>> *Usual Time 9 AM Eastern Time, Mondays*
>>
>> *UK 0208 322 2500*
>>
>> *Canada 1-866-220-6419*
>>
>> *United States 1-877-420-0272*
>>
>> *All others: please see the attached PDF file with a list of numbers for other countries.*
>>
>> *Participant Code 5910819#*
>>
>> Agenda
>>
>> Time | Item | Who | Attachments/Links
>>
>> 5min | Welcome. Wait for group members to log on | Christina Yung, OICR |
>>
>> 10min | Overall status | Christina Yung, OICR |
>>
>> • Linkouts to Most Current PCAWG Data
>> • Report data issues to pcawg-data at icgc.org, GNOS issues to: Help at annaisystems.com
>> • From Boston F2F: PCAWG datasets & dependencies
>>
>> *Action Items*
>>
>> 1. [Joachim] Consensus SV - final?
>> 2. [Jakob] Consensus SNVs - changes to "SNV near indels" annotation?
>> 3. [Junjun] Specimen ID mapping for miRNA and methylation
>> 4. [Jonathan & Joachim] Consensus calls for cell lines, followed by filtering
>> 5. [Matthias & Gordon] Docker containers for filtering methods
>> 6. [Christina] Run alignment & variant workflows on the medulloblastoma sample (tumor 40x, normal 30x) from the ICGC benchmark to estimate the false negative rate
>> 7. [Christina] Follow up with institutes interested in hosting PCAWG data long-term
>> 8. [All] As per Jennifer's email on Sept 16, please provide authorship information, again or for the first time, using the PCAWG Author Form (http://goo.gl/forms/5Wq5x5X1DK). Save the "Edit your response" link so you can go back later to provide updates, for example about your evolving role in writing specific papers.
>> 9. [All] Contribute to the manuscripts on:
>>    a. infrastructure: https://docs.google.com/document/d/10alAxrWLdLSyhci-rfNuVH13rFXCJkaY_rzf1KJn7nc/edit
>>    b. variants: Paper (https://goo.gl/g9CLsu), Supplement (https://goo.gl/EWYh7e)
>>    c. Rogue's Gallery of Cancer Genome Sequencing Artifacts (outline)
>> 10. [Junjun] Discuss PCAWG vs DCC glossary terms at the next PCAWG-10/13 conference
>>
>> 10min | Status of dockerizing workflows | Brian O'Connor, UCSC; Gordon Saksena, Broad; Francis Ouellette, OICR |
>>
>> *Status of PCAWG Workflow ports to Dockstore*:
>>
>> Denis has been porting the Dockstore entries to CWL version 1.0, which is part of our effort to publish Dockstore (this doesn't affect the content of the pipelines, only their "descriptors", which allow them to be run via Dockstore). Denis has also worked on testing BWA-Mem, Sanger and EMBL (which all work with CWL 1.0 and Keiran's test dataset), has fixed issues with DKFZ, and will shortly test the latter with a real sample.
>>
>> 1. BWA-Mem - Ready for testing by Francis' team.
>> 2. Sanger - Ready for testing by Francis' team.
>> 3. EMBL - Ready for testing by Francis' team.
>> 4. DKFZ - Ready for testing by Francis' team. I've exchanged emails with Manuel Ballesteros, who has been testing this pipeline.
>> 5. Broad - Variant calling (MuTect, dRanger, snowman) needs some work; Gordon sent details previously.
>> 6. OxoG - Waiting for Dimitri to provide the OxoG docker.
>> 7. Variantbam
>> 8. Consensus algorithm
>>
>> PCAWG Docker (Dockstore) Testing Working Group
>>
>> 5min | Other business? | Group |
>>
>> *Christina K. Yung, PhD*
>> Project Manager, Cancer Genome Collaboratory
>>
>> *Ontario Institute for Cancer Research*
>> MaRS Centre
>> 661 University Avenue, Suite 510
>> Toronto, Ontario, Canada M5G 0A3
>> Tel: 416-673-8578
>>
>> www.oicr.on.ca
>>
>> This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient.
>> Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
>>
>> _______________________________________________
>> PAWG-TECH mailing list
>> PAWG-TECH at lists.icgc.org
>> https://lists.icgc.org/mailman/listinfo/pawg-tech
>>
>> _______________________________________________
>> docktesters mailing list
>> docktesters at lists.icgc.org
>> https://lists.icgc.org/mailman/listinfo/docktesters

From francis at oicr.on.ca Tue Oct 4 10:24:26 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 4 Oct 2016 14:24:26 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6DDD0535-6E45-413E-8E93-60E22F6C3510@oicr.on.ca> <8231F9BE-D796-4F7C-9F61-8BAEA336F038@ohsu.edu> <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca>
Message-ID: <8ECBAF05-EACF-4D5F-8C0A-36C69EAE9F99@oicr.on.ca>

Hi Miguel,

I think it will be important for all to update this table:

https://wiki.oicr.on.ca/pages/editpage.action?pageId=66309629

(in particular, Adam S, Miguel, Junjun and Christina).

Another important message from Lincoln: we need to test the BWA-mem workflow.

Christina: do we need to "undo" a BAM file for that?
We have a lot of testing to do, and the next call is in 13 days (we are skipping next week because of the Thanksgiving holiday in Canada next Monday).

Brian/Denis: apparently there is a problem with the Sanger workflow docker container, since it didn't work for Miguel and Junjun but did work on test data for Adam.

Please advise what you would like tested. (I asked Adam to test the "standard" dataset for the Sanger workflow.)

thank you,

@bffo

--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
From Denis.Yuen at oicr.on.ca Tue Oct 4 11:04:26 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Tue, 4 Oct 2016 15:04:26 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <8ECBAF05-EACF-4D5F-8C0A-36C69EAE9F99@oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6DDD0535-6E45-413E-8E93-60E22F6C3510@oicr.on.ca> <8231F9BE-D796-4F7C-9F61-8BAEA336F038@ohsu.edu> <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca>, <8ECBAF05-EACF-4D5F-8C0A-36C69EAE9F99@oicr.on.ca>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A05E3@exmb2.ad.oicr.on.ca>

Hi,

> Brian/Denis: apparently there is a problem with the Sanger workflow docker container, since it didn't work for Miguel and Junjun but did work on test data for Adam.
>
> Please advise what you would like tested. (I asked Adam to test the "standard" dataset for the Sanger workflow.)

I think the first useful test would be to see whether the test data (https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/test1.json) works in Junjun's and Miguel's environments. That way we can determine whether the issue is caused by a difference in environment or by the data.
bwa-mem is a good suggestion, since it is also a useful test. Along with Delly, it is one of the fastest workflows in this test, so running the test data through them will give results quickly.

From christina.yung at oicr.on.ca Tue Oct 4 11:22:55 2016
From: christina.yung at oicr.on.ca (Christina Yung)
Date: Tue, 4 Oct 2016 11:22:55 -0400
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A05E3@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6DDD0535-6E45-413E-8E93-60E22F6C3510@oicr.on.ca> <8231F9BE-D796-4F7C-9F61-8BAEA336F038@ohsu.edu> <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca> <8ECBAF05-EACF-4D5F-8C0A-36C69EAE9F99@oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A05E3@exmb2.ad.oicr.on.ca>
Message-ID: <8d125501-ff3a-b2a1-de6e-326d970bc322@oicr.on.ca>

For testing BWA-mem with real data, you'll have to follow the SOP to unalign the data first.
https://wiki.oicr.on.ca/display/PANCANCER/PCAWG+%28a.k.a.+PCAP+or+PAWG%29+Sequence+Submission+SOP+-+v1.0 Christina On 16-10-04 11:04 AM, Denis Yuen wrote: > Hi, > > Brian/Dennis: apparently there is a problem with the Sanger workflow > docker container because it didn?t work for Miguel > and Junjun, > but worked on test data for Adam. > > Please advise what you would like tested. (I asked Adam to test > ?standard? > dataset for Sanger workflow. > > I think the first useful test would be to see if the test data ( > https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/test1.json > ) works in JunJun/Miguel's environments. That way we can diagnose > whether there is a difference in environment causing the issue or an > issue with the data. > > bwa-mem is a good suggestion since it is also a useful test. It along > with delly are the fastest workflows in this test, so running test > data through them will quickly get results. > > > ------------------------------------------------------------------------ > *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org > [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf > of Francis Ouellette [francis at oicr.on.ca] > *Sent:* October 4, 2016 10:24 AM > *To:* docktesters at lists.icgc.org > *Cc:* kr2 at sanger.ac.uk > *Subject:* Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH > teleconference > > Hi Miguel, > > I think it will be important for all to update this table: > > https://wiki.oicr.on.ca/pages/editpage.action?pageId=66309629 > > > (in particular, Adam S, Miguel, Junjun and Christina). > > Other important message from Lincoln: We need to test BWA-mem workflow. > > Christina: do we need to ?undo? a BAM file for that? > > We have a lot of testing to do, and next call is in 13 days (skipping > next > week because of thanksgiving holiday in Canada next Monday). 
>
> Brian/Denis: apparently there is a problem with the Sanger workflow
> docker container because it didn't work for Miguel and Junjun,
> but worked on test data for Adam.
>
> Please advise what you would like tested. (I asked Adam to test the
> "standard" dataset for the Sanger workflow.)
>
> thank you,
>
> @bffo
>
> --
> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

From miguel.vazquez at cnio.es Tue Oct 4 11:31:19 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Tue, 4 Oct 2016 17:31:19 +0200
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A05E3@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6DDD0535-6E45-413E-8E93-60E22F6C3510@oicr.on.ca> <8231F9BE-D796-4F7C-9F61-8BAEA336F038@ohsu.edu> <8212A167-228D-4184-A707-F5DC39F8FA27@oicr.on.ca> <8ECBAF05-EACF-4D5F-8C0A-36C69EAE9F99@oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A05E3@exmb2.ad.oicr.on.ca>
Message-ID:

I'm already on it

On Tue, Oct 4, 2016 at 5:04 PM, Denis Yuen wrote:
> Hi,
>
> Brian/Denis: apparently there is a problem with the Sanger workflow
> docker container because it didn't work for Miguel and Junjun,
> but worked on test data for Adam.
>
> Please advise what you would like tested. (I asked Adam to test the
> "standard" dataset for the Sanger workflow.)
>
> I think the first useful test would be to see if the test data
> (https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/test1.json)
> works in JunJun's and Miguel's environments. That way we can diagnose
> whether a difference in environment is causing the issue or whether the
> issue is with the data.

From miguel.vazquez at cnio.es Wed Oct 5 04:44:10 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Wed, 5 Oct 2016 10:44:10 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
Message-ID:

Hello all,

I've created a repository at

https://github.com/mikisvaz/PCAWG-Docker-Test

with scripts that can help you run the tests. They can help download
samples, prepare the Dockerstore.json files and run them. I've added an
entry in the Google doc.

I've run the Sanger workflow on the HCC1143 (test) data and it seems to
have completed correctly (I updated the table). I'm now running the same
test data through the DKFZ workflow, and sample DO50311 through the Delly
workflow.

I have two outstanding questions:

- What are the other workflows I need to test?
- What is the bedpe file used by DKFZ, and how is it produced? Denis has
  pointed me to some forum threads that I have to study a bit, but it seems
  that we need to run something to produce it before we can run the
  workflow, and I wonder how this fits into our testing scheme. I'll read
  more about it when I have a chance.

Best regards

From miguel.vazquez at cnio.es Wed Oct 5 06:44:39 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Wed, 5 Oct 2016 12:44:39 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
Message-ID:

Hi all,

There was an error running the test data through the DKFZ workflow. It
appears to have a path hardcoded somewhere. Here are the logs:

8<---------------
+ [[ 1 == 0 ]]
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :STARTED:
+ cntStarted=4
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :0:
++ wc -l
+ cntSuccessful=4
++ expr 4 - 4
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :STARTED:
++ wc -l
+ cntStarted=2
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :0:
+ cntSuccessful=2
++ expr 2 - 2
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
There was at least one error in a job status logfile. Will exit now!
+ [[ true == true ]]
+ echo 'There was at least one error in a job status logfile. Will exit now!'
+ exit 5
*mv: cannot stat `/mnt/datastore/resultdata/*': No such file or directory*
Result directory listing is:
Error while running job: Error collecting output for parameter 'germline_indel_vcf_gz': Did not find output file with glob pattern: '['*.germline.indel.vcf.gz']'
[job temp5368999563481290609.cwl] completed permanentFail
Final process status is permanentFail
Workflow error, try again with --debug for more information:
Process status is ['permanentFail']
stdout : java.lang.RuntimeException: problems running command: cwltool --enable-dev --non-strict --enable-net --outdir /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/outputs/ --tmpdir-prefix /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/working/ /tmp/1475650852550-0/temp5368999563481290609.cwl /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/workflow_params.json
8<---------------

The Dockerstore.json I used is as follows:

{
  "run-id": "run_id",
  "tumor-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/tumor.bam",
    "class": "File"
  },
  "normal-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/normal.bam",
    "class": "File"
  },
  "reference-gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//dkfz-workflow-dependencies_150318_0951.tar.gz",
    "class": "File"
  },
  "delly-bedpe": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//run_id.embl-delly_1-3-0-preFilter.20150318.somatic.sv.bedpe.txt",
    "class": "File"
  },
  "germline_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.indel.vcf.gz",
    "class": "File"
  },
  "somatic_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.vcf.gz",
    "class": "File"
  },
  "germline_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.snv.mnv.vcf.gz",
    "class": "File"
  },
  "somatic_cnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.tar.gz",
    "class": "File"
  },
  "somatic_cnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.vcf.gz",
    "class": "File"
  },
  "somatic_indel_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.tar.gz",
    "class": "File"
  },
  "somatic_snv_mnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.tar.gz",
    "class": "File"
  },
  "somatic_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.vcf.gz",
    "class": "File"
  }
}

From Denis.Yuen at oicr.on.ca Wed Oct 5 11:39:53 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 5 Oct 2016 15:39:53 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> ,
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A079F@exmb2.ad.oicr.on.ca>

Hi,

Sorry, unfortunately the message isn't very clear.

The output after "mv: cannot stat `/mnt/datastore/resultdata/*': No such file
or directory" comes from the CWL runner, which is basically saying it couldn't
find the output in the Docker container.

The output above that comes from DKFZ itself (or rather from Roddy, DKFZ's
workflow engine). The actual problem is "There was at least one error in a job
status logfile. Will exit now!", but I don't see any more details. Could you
send the full output as an attachment? There may be more hints in the rest of
the output.

I don't see any glaring issues with the JSON, assuming that
/home/ubuntu/DockerTest-Miguel/data/HCC1143/tumor.bam does in fact exist at
that location.
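Denis's two suggestions (confirm that the input files named in the JSON really exist, and find the job-state file that tripped Roddy's error check) can be automated with a short Python sketch. It is not part of Dockstore, cwltool or Roddy: the function names are hypothetical, the choice of which keys count as inputs is an assumption (entries such as germline_indel_vcf_gz point at outputs the workflow has not produced yet), and the job-state counting only mirrors the `grep -v null:`, `grep :STARTED:` and `grep :0:` pipeline visible in the trace, not a documented log format.

```python
import os
from typing import Dict, Iterable, List


def missing_input_files(params: Dict[str, object], input_keys: Iterable[str]) -> List[str]:
    """Return the input keys whose File entry is absent from params or missing on disk.

    Only keys listed in input_keys are checked (hypothetical choice), because the
    output entries in the same JSON name files the workflow has not written yet.
    """
    missing = []
    for key in input_keys:
        entry = params.get(key)
        if not isinstance(entry, dict) or entry.get("class") != "File":
            missing.append(key)  # key absent, or not a {"class": "File"} entry
        elif not os.path.isfile(str(entry.get("path", ""))):
            missing.append(key)  # declared, but no regular file at that path on this host
    return missing


def count_job_states(lines: Iterable[str]) -> Dict[str, int]:
    """Re-count one jobStateLogfile.txt the way the Roddy trace does.

    started    ~ grep -v null: | grep :STARTED: | wc -l
    successful ~ grep -v null: | grep :0: | wc -l
    errors     ~ started - successful (cntErrornous in the trace)
    """
    relevant = [line for line in lines if "null:" not in line]
    started = sum(1 for line in relevant if ":STARTED:" in line)
    successful = sum(1 for line in relevant if ":0:" in line)
    return {"started": started, "successful": successful, "errors": started - successful}


def check_execution_store(store_dir: str) -> Dict[str, Dict[str, int]]:
    """Apply count_job_states to every jobStateLogfile.txt found under store_dir."""
    results = {}
    for root, _dirs, files in os.walk(store_dir):
        if "jobStateLogfile.txt" in files:
            path = os.path.join(root, "jobStateLogfile.txt")
            with open(path) as handle:
                results[path] = count_job_states(handle)
    return results
```

For example, `missing_input_files(json.load(open("Dockerstore.json")), ["tumor-bam", "normal-bam", "reference-gz", "delly-bedpe"])` should return an empty list before launching, and pointing `check_execution_store` at the run's roddyExecutionStore directory would list every job-state file with its counts. The two files shown in the trace both report zero errors, so the file behind "There was at least one error" is presumably one processed before the excerpt starts.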
Denis Yuen Bioinformatics Software Developer Ontario Institute for Cancer Research MaRS Centre 661 University Avenue Suite 510 Toronto, Ontario, Canada M5G 0A3 Toll-free: 1-866-678-6427 Twitter: @OICR_news www.oicr.on.ca This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. ________________________________ From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es] Sent: October 5, 2016 6:44 AM To: Francis Ouellette Cc: kr2 at sanger.ac.uk; docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi all, There was en Error running the test data for the DKFZ Workflow. 
It appears that some path is hardcoded somewhere. Here are the logs:

8<---------------
+ [[ 1 == 0 ]]
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :STARTED:
+ cntStarted=4
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :0:
++ wc -l
+ cntSuccessful=4
++ expr 4 - 4
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :STARTED:
++ wc -l
+ cntStarted=2
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :0:
+ cntSuccessful=2
++ expr 2 - 2
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
There was at least one error in a job status logfile. Will exit now!
+ [[ true == true ]]
+ echo 'There was at least one error in a job status logfile. Will exit now!'
+ exit 5
mv: cannot stat `/mnt/datastore/resultdata/*': No such file or directory
Result directory listing is:
Error while running job: Error collecting output for parameter 'germline_indel_vcf_gz': Did not find output file with glob pattern: '['*.germline.indel.vcf.gz']'
[job temp5368999563481290609.cwl] completed permanentFail
Final process status is permanentFail
Workflow error, try again with --debug for more information: Process status is ['permanentFail']
stdout : java.lang.RuntimeException: problems running command: cwltool --enable-dev --non-strict --enable-net --outdir /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/outputs/ --tmpdir-prefix /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/working/ /tmp/1475650852550-0/temp5368999563481290609.cwl /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/workflow_params.json
8<---------------

The Dockerstore.json I used is as follows:

{
  "run-id": "run_id",
  "tumor-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/tumor.bam",
    "class": "File"
  },
  "normal-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/normal.bam",
    "class": "File"
  },
  "reference-gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//dkfz-workflow-dependencies_150318_0951.tar.gz",
    "class": "File"
  },
  "delly-bedpe": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//run_id.embl-delly_1-3-0-preFilter.20150318.somatic.sv.bedpe.txt",
    "class": "File"
  },
  "germline_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.indel.vcf.gz",
    "class": "File"
  },
  "somatic_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.vcf.gz",
    "class": "File"
  },
  "germline_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.snv.mnv.vcf.gz",
    "class": "File"
  },
  "somatic_cnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.tar.gz",
    "class": "File"
  },
  "somatic_cnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.vcf.gz",
    "class": "File"
  },
  "somatic_indel_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.tar.gz",
    "class": "File"
  },
  "somatic_snv_mnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.tar.gz",
    "class": "File"
  },
  "somatic_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.vcf.gz",
    "class": "File"
  }
}

On Wed, Oct 5, 2016 at 10:44 AM, Miguel Vazquez wrote:

Hello all,

I've created a repository at https://github.com/mikisvaz/PCAWG-Docker-Test with scripts that can help you run the tests. They can help download samples, prepare the Dockerstore.json files and run them. I've added an entry in the Google doc.

I've run the Sanger workflow on the HCC1143 (test) data and it seems to have completed correctly (I updated the table). I'm now in the process of running the same test data on the DKFZ workflow, and sample DO50311 on the Delly workflow.

I have two standing questions:
- What are the other workflows I need to test?
- What is the bedpe file used by DKFZ, and how is it produced? Denis has pointed me to some forum threads that I have to study a bit, but it seems that we need to run something to produce it before we can run the workflow, and I wonder how this fits into our testing scheme. I'll read more about it when I have a chance.

Best regards

From Denis.Yuen at oicr.on.ca Wed Oct 5 11:43:13 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 5 Oct 2016 15:43:13 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>,
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A07B1@exmb2.ad.oicr.on.ca>

Hi,

Heads-up: is the token in https://github.com/mikisvaz/PCAWG-Docker-Test/blob/master/icgc-storage-client-1.0.19/conf/application.properties a real token?
As for workflows, I would probably go in this order: bwa-mem, delly, sanger (which you have already done), dkfz.

From Denis.Yuen at oicr.on.ca Wed Oct 5 14:15:04 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 5 Oct 2016 18:15:04 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A07B1@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>, , <27512884B2D81B41AAB7BB266248F240C09A07B1@exmb2.ad.oicr.on.ca>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A07F6@exmb2.ad.oicr.on.ca>

Hi,

FYI, I checked with the ICGC team: it doesn't look like a valid token, but it might be an expired one. If it is an expired token, it might be useful to delete it from your token manager on the ICGC site to be doubly sure.
From miguel.vazquez at cnio.es Wed Oct 5 15:56:19 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Wed, 5 Oct 2016 21:56:19 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A07B1@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A07F6@exmb2.ad.oicr.on.ca>
Message-ID:

Yes Denis, it was my token. I uploaded it by mistake and then revoked it.
From Denis.Yuen at oicr.on.ca Thu Oct 6 10:20:21 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Thu, 6 Oct 2016 14:20:21 +0000
Subject: [DOCKTESTERS] [Is-Suspect-Spam] RE: Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A079F@exmb2.ad.oicr.on.ca>,
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A083C@exmb2.ad.oicr.on.ca>

Hi,

Ah sorry, my mistake, I misread your output. The DKFZ workflow, unlike the others, doesn't actually function with test data. You can only use it to test whether the workflow starts up properly.

https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows#running-with-the-dockstore-command-line

I've checked in a params.json (Dockstore-BTCA-SG.json) that does work, but it requires real data.

Sorry for that.

________________________________
From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
Sent: October 6, 2016 2:44 AM
To: Denis Yuen
Subject: Re: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

These are all the logs I could salvage. Unfortunately, I didn't capture STDERR and got this from the screen buffer.

Some folders are mentioned, like /mnt/datastore/testdata/, and the string 'testdata' there is a bit suspicious. Perhaps grepping for it in the source code or scripts will reveal whether something is hardcoded?

If you'd like, I can run the test again, tell you how to reproduce it using the scripts I made (it should be pretty quick to set up for this data), or give you access to the instance I'm running.

Miguel

On Wed, Oct 5, 2016 at 5:39 PM, Denis Yuen wrote:

Hi,

Sorry, unfortunately, the message isn't very clear.

The output before " mv: cannot stat `/mnt/datastore/resultdata/*': No such file or directory" comes from the CWL runner, which is basically saying it couldn't find the output in the Docker container. The output above that is from DKFZ itself (Roddy, rather, which is DKFZ's workflow engine). The problem is actually "There was at least one error in a job status logfile. Will exit now!", but I don't see any more details.

Could you send the full output as an attachment? There may be more hints in the rest of the output.

I don't see any glaring issues with the JSON, assuming that /home/ubuntu/DockerTest-Miguel/data/HCC1143/tumor.bam does in fact exist at that location.

Denis Yuen
Bioinformatics Software Developer
Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue
Suite 510
Toronto, Ontario, Canada M5G 0A3
Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
Sent: October 5, 2016 6:44 AM
To: Francis Ouellette
Cc: kr2 at sanger.ac.uk; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi all,

There was an error running the test data for the DKFZ workflow.
It appears to have a path hardcoded somewhere. Here are the logs:

8<---------------
+ [[ 1 == 0 ]]
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :STARTED:
+ cntStarted=4
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :0:
++ wc -l
+ cntSuccessful=4
++ expr 4 - 4
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_071931526_roddy_snvCalling/jobStateLogfile.txt
+ for logfile in '${jobstateFiles[@]}'
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ grep :STARTED:
++ wc -l
+ cntStarted=2
++ cat /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
++ grep -v null:
++ wc -l
++ grep :0:
+ cntSuccessful=2
++ expr 2 - 2
+ cntErrornous=0
+ [[ 0 -gt 0 ]]
+ [[ 0 == 0 ]]
+ echo 'No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt'
No errors found for /mnt/datastore/testdata/run_id/roddyExecutionStore/exec_161005_072436413_roddy_indelCalling/jobStateLogfile.txt
There was at least one error in a job status logfile. Will exit now!
+ [[ true == true ]]
+ echo 'There was at least one error in a job status logfile. Will exit now!'
+ exit 5
mv: cannot stat `/mnt/datastore/resultdata/*': No such file or directory
Result directory listing is:
Error while running job: Error collecting output for parameter 'germline_indel_vcf_gz': Did not find output file with glob pattern: '['*.germline.indel.vcf.gz']'
[job temp5368999563481290609.cwl] completed permanentFail
Final process status is permanentFail
Workflow error, try again with --debug for more information: Process status is ['permanentFail']
stdout : java.lang.RuntimeException: problems running command: cwltool --enable-dev --non-strict --enable-net --outdir /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/outputs/ --tmpdir-prefix /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/working/ /tmp/1475650852550-0/temp5368999563481290609.cwl /mnt/1TB/work/DockerTest-Miguel/tests/DKFZ/HCC1143/./datastore/launcher-7bb3a7ef-64e2-43ac-81c9-dd42e1b35f1b/workflow_params.json
8<---------------

The Dockerstore.json I used is as follows:

{
  "run-id": "run_id",
  "tumor-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/tumor.bam",
    "class": "File"
  },
  "normal-bam": {
    "path": "/home/ubuntu/DockerTest-Miguel/data/HCC1143/normal.bam",
    "class": "File"
  },
  "reference-gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//dkfz-workflow-dependencies_150318_0951.tar.gz",
    "class": "File"
  },
  "delly-bedpe": {
    "path": "/home/ubuntu/DockerTest-Miguel/resources//run_id.embl-delly_1-3-0-preFilter.20150318.somatic.sv.bedpe.txt",
    "class": "File"
  },
  "germline_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.indel.vcf.gz",
    "class": "File"
  },
  "somatic_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.vcf.gz",
    "class": "File"
  },
  "germline_snv_mnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.germline.snv.mnv.vcf.gz",
    "class": "File"
  },
  "somatic_cnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.tar.gz",
    "class": "File"
  },
  "somatic_cnv_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.cnv.vcf.gz",
    "class": "File"
  },
  "somatic_indel_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.tar.gz",
    "class": "File"
  },
  "somatic_snv_mnv_tar_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.snv.mnv.tar.gz",
    "class": "File"
  },
  "somatic_indel_vcf_gz": {
    "path": "/home/ubuntu/DockerTest-Miguel/tests/DKFZ/HCC1143//output//HCC1143.somatic.indel.vcf.gz",
    "class": "File"
  }
}

On Wed, Oct 5, 2016 at 10:44 AM, Miguel Vazquez wrote:

Hello all,

I've created a repository at https://github.com/mikisvaz/PCAWG-Docker-Test with scripts that can help you run the tests. They can download samples, prepare the Dockerstore.json files and run them. I've added an entry in the Google doc.

I've run the Sanger workflow on the HCC1143 (test) data and it seems to have completed correctly (I updated the table). I'm now running the same test data through the DKFZ workflow, and sample DO50311 through the Delly workflow.

I have two outstanding questions:

- What are the other workflows I need to test?
- What is the bedpe file used by DKFZ, and how is it produced? Denis has pointed me to some forum threads that I have to study a bit, but it seems that we need to run something to produce it before we can run the workflow, and I wonder how this fits into our testing scheme. I'll read more about it when I have a chance.

Best regards

From miguel.vazquez at cnio.es Fri Oct 7 02:27:22 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Fri, 7 Oct 2016 08:27:22 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
Message-ID:

Hi all,

The Delly workflow on DO50311 just finished correctly. I've added that to the table.
It's important to point out that I consider a run successful simply if the workflow completes, not if the results are what they need to be. That, I guess, we can check at a later stage.

By the way, the Delly workflow seems to produce only SVs; is that the case? No SNVs? It also produces a bedpe file that I guess I can try on DKFZ, which is what was missing. I'm going to try that now, albeit with no clear idea of what I'm doing, I must say; I'm not familiar with any of this.

The current status, as far as I understand it, is that only Delly works straight out of the box on real data, but it produces only SVs. Sanger works on test data and fails on real data. DKFZ fails on test data and requires a bedpe file for real data (produced by Delly?). BWA-Mem requires preprocessing the files?

I have to say that, from where I stand, the preprocessing required by Delly and BWA-Mem is a bit inconvenient and makes the tools not stand-alone.

I admit that I'm not an expert, so I might have got some info wrong. Any comments on this, please?

Best

Miguel

From miguel.vazquez at cnio.es Fri Oct 7 05:35:38 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Fri, 7 Oct 2016 11:35:38 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <1FF2CBA3-3521-4C37-AA3B-952A115CAE5F@sanger.ac.uk>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <1FF2CBA3-3521-4C37-AA3B-952A115CAE5F@sanger.ac.uk>
Message-ID:

On Fri, Oct 7, 2016 at 10:37 AM, Keiran Raine wrote:

> Hi,
>
> Can someone give me an idea what the errors you see on real data are for
> Sanger? Which real data is in use?
>

Junjun and I tried it on donor DO50398 and it failed in the pindel step.
I'll try to find our error report and forward it to you. > > Also what is non-standard in the BWA-mem preprocessing, I didn't work on > the actual workflow but I did give input at the beginning (and have a very > simple to use version in house which can run from fastq, BAM or CRAM > inputs). > It seems to require removing the alignment of the BAM file or something like that, right? I need to read more on it. Can you send me some info on how you did it? Best > > Keiran Raine > Principal Bioinformatician > Cancer Genome Project > Wellcome Trust Sanger Institute > > kr2 at sanger.ac.uk > Tel:+44 (0)1223 834244 Ext: 4983 > Office: H104 > > On 7 Oct 2016, at 07:27, Miguel Vazquez wrote: > > Hi all, > > The Delly workflow on DO50311 just finished correctly. I've added that to > the table. It's important to point out that I consider a success just that > the workflow completes, not that the results are what they need to be. That > I guess we can do at a later stage. > > By the way, the Delly workflow just seems to produce SV, is that the case? > no SNV? It also produces a bedpe file that I guess I can try on the DKFZ, > which is what was missing. I'm going to try that now, albeit with no clear > idea of what I'm doing I must say, I'm not familiar with any of this. > > Current status so far as I understand it is that only Delly works straigh > out of the box on real data, but only produces only SV. Sanger works on > test data and fails on real data. DKFZ fails on Test data and requires > bedpe file for real data (produced by Delly?) BWA-Mem requires preprocesing > the files? > > I have to say that from where I stand the preprocesing required by Delly > and BWA-Mem is a bit inconvenient and makes the tools not stand-alone. > > > I admit that I'm not an expert so I might have got some info wrong. Any > comments on this please? 
> > Best > > Miguel > > > > On Fri, Sep 30, 2016 at 3:52 PM, Francis Ouellette > wrote: > >> Dear Miguel and Junjun, >> >> Any more attempts on testing the PCAWG sanger docker container? >> >> If you reproduce the same error, we will need to involve Bryan and >> Keiran Raine (author of the container). >> >> Let?s get this one figured out. >> >> I am going to assume that the making of the docker container is what >> needs resolving. >> >> Brian: We may need your input here. >> >> Details of our current experiment should continue to be posted here: >> >> https://goo.gl/XX5BG9 >> >> Thank you all, >> >> francis >> >> PS would be good for others on list to follow directions on above google >> doc and also see if they can succeed on this workflow. >> >> Junjun and Miguel have tried different clouds, but used the sanger >> workflow, on the same data set. >> >> Thank you for trying to do this. >> >> Would be good if I heard back from anybody before Monday AM (tech call). >> >> @bffo >> >> PS I CCed Keiran, but waiting to hear back from Brian before we need to >> involve him some more. >> PPS Junjun/Miguel: maybe you can try the DKFZ docker as well? (on the >> same data set). >> >> >> -- >> B.F. Francis Ouellette http://oicr.on.ca/person/fran >> cis-ouellette >> >> >> >> Begin forwarded message: >> >> *From: *Christina Yung >> *Subject: **[PAWG-TECH] Draft agenda for PCAWG-TECH teleconference* >> *Date: *September 30, 2016 at 9:19:39 AM EDT >> *To: *"pawg-tech (pawg-tech at lists.icgc.org)" >> >> Hi Everyone, >> >> >> >> Below is a draft agenda for Monday?s tech call. Please let me know if >> you have any agenda items for discussion. >> >> https://wiki.oicr.on.ca/display/PANCANCER/2016-10-03+PCAWG- >> TECH+Teleconference >> >> >> >> Have a great weekend! 
>> >> Christina >> >> >> Call Info >> >> *Usual Time 9 AM Eastern Time, Mondays* >> >> *UK 0208 322 2500* >> >> *Canada 1-866-220-6419* >> >> *United States **1-877-420-0272* >> >> *All Others Please see attached PDF file >> with >> a list of numbers for other countries.* >> >> *Participant Code 5910819#* >> Agenda >> >> >> *Time* >> >> *Item* >> >> *Who* >> >> *Attachments/Links* >> >> 5min >> >> Welcome. Wait for group members to log on >> >> Christina Yung , OICR >> >> 10min >> >> Overall status >> Christina Yung , OICR >> >> ? Linkouts to Most Current PCAWG Data >> >> >> ? Report data issues to pcawg-data at icgc.org, GNOS issue to: >> Help at annaisystems.com >> >> ? From Boston F2F: PCAWG datasets & dependencies >> >> >> *Action Items* >> >> 1. [Joachim] Consensus SV - final? >> >> 2. [Jakob] Consensus SNVs - changes to "SNV near indels" >> annotation? >> >> 3. [Junjun] Specimen ID mapping for miRNA and methylation >> >> 4. [Jonathan & Joachim] Consensus calls for cell lines, followed >> by filtering >> >> 5. [Matthias & Gordon] Docker containers for filtering methods >> >> 6. [Christina] Run alignment & variant workflows on >> medulloblastoma sample (tumor 40x, normal 30x) from ICGC benchmark to >> estimate false negative rate >> >> 7. [Christina] Follow up with institutes interested in hosting >> PCAWG data long-term >> >> 8. [All] As per Jennifer's email on Sept 16, please provide >> authorship information again or for the first time using PCAWG Author Form ( >> http://goo.gl/forms/5Wq5x5X1DK). Save the link "Edit your response" so >> you can go back later to provide updates, for example about your evolving >> role in writing specific papers. >> >> 9. [All] Contribute to the manuscripts on >> >> a. infrastructure: https://docs.google.com/docume >> nt/d/10alAxrWLdLSyhci-rfNuVH13rFXCJkaY_rzf1KJn7nc/edit >> >> b. variants: Paper ( https://goo.gl/g9CLsu ), Supplement ( >> https://goo.gl/EWYh7e ) >> >> c. 
Rogue's Gallery of Cancer Genome Sequencing Artifacts ( outline >> >> ) >> >> 10. [Junjun] Discuss PCAWG vs DCC glossary terms at next PCAWG-10/13 >> conference >> >> 10min >> >> Status of dockerizing workflows >> Brian O'Connor , UCSC >> >> Gordon Saksena >> , Broad >> >> Francis Ouellette , OICR >> *Status of PCAWG Workflow ports to Dockstore*: >> >> Denis has been porting the Dockstore entries to CWL version 1.0 which is >> part of our effort to publish Dockstore (this doesn't affect the content of >> the pipelines, simply their "descriptors" which allow them to be runnable >> via Dockstore). Denis has also worked on testing BWA-Mem, Sanger, EMBL >> (which all work with CWL 1.0 and Kerian's test dataset) and has fixed >> issues with DKFZ and is testing the latter with a real sample shortly. >> >> 1. BWA-Mem - Ready for testing by Francis' team >> >> 2. Sanger - Ready for testing by Francis' team. >> >> 3. EMBL - Ready for testing by Francis' team. >> >> 4. DKFZ - Ready for testing by Francis' team. I've exchanged >> emails with Manuel Ballesteros who has been testing this pipeline. >> >> 5. Broad - Variant calling (MuTect, dRanger, snowman), need some >> work, Gordan sent details previously >> >> 6. OxoG - Waiting for Dimitri to provide OxoG docker >> >> 7. Variantbam >> >> 8. Consensus algorithm >> >> PCAWG Docker (Dockstore) Testing Working Group >> >> >> 5min >> >> Other business? >> >> Group >> >> >> >> >> >> >> >> *Christina K. Yung, PhD* >> Project Manager, Cancer Genome Collaboratory >> >> *Ontario Institute for Cancer Research* >> MaRS Centre >> >> 661 University Avenue, Suite 510 >> Toronto, Ontario, Canada M5G 0A3 >> Tel: 416-673-8578 >> >> www.oicr.on.ca >> >> >> This message and any attachments may contain confidential and/or >> privileged information for the sole use of the intended recipient. Any >> review or distribution by anyone other than the person for whom it was >> originally intended is strictly prohibited. 
If you have received this >> message in error, please contact the sender and delete all copies. >> Opinions, conclusions or other information contained in this message may >> not be that of the organization. >> >> >> _______________________________________________ >> PAWG-TECH mailing list >> PAWG-TECH at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/pawg-tech >> >> >> >> _______________________________________________ >> docktesters mailing list >> docktesters at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/docktesters >> >> > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a company > registered in England with number 2742969, whose registered office is 215 > Euston Road, London, NW1 2BE. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kr2 at sanger.ac.uk Fri Oct 7 04:37:26 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Fri, 7 Oct 2016 09:37:26 +0100 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> Message-ID: <1FF2CBA3-3521-4C37-AA3B-952A115CAE5F@sanger.ac.uk> Hi, Can someone give me an idea of the errors you see on real data for Sanger? Which real data is in use? Also, what is non-standard in the BWA-mem preprocessing? I didn't work on the actual workflow, but I did give input at the beginning (and have a very simple-to-use version in house, which can run from fastq, BAM or CRAM inputs). Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 > On 7 Oct 2016, at 07:27, Miguel Vazquez wrote: > > Hi all, > > The Delly workflow on DO50311 just finished correctly. I've added that to the table.
It's important to point out that I consider it a success just that the workflow completes, not that the results are what they need to be. That, I guess, we can check at a later stage. > > By the way, the Delly workflow only seems to produce SVs; is that the case? No SNVs? It also produces a bedpe file that I guess I can try on the DKFZ workflow, which is what was missing. I'm going to try that now, albeit with no clear idea of what I'm doing, I must say; I'm not familiar with any of this. > > Current status, as far as I understand it: only Delly works straight out of the box on real data, but it only produces SVs. Sanger works on test data and fails on real data. DKFZ fails on test data and requires a bedpe file for real data (produced by Delly?). BWA-Mem requires preprocessing the files? > > I have to say that from where I stand the preprocessing required by Delly and BWA-Mem is a bit inconvenient and makes the tools not stand-alone. > > > I admit that I'm not an expert, so I might have got some info wrong. Any comments on this, please? > > Best > > Miguel -------------- next part -------------- An HTML attachment was scrubbed...
URL: From miguel.vazquez at cnio.es Sat Oct 8 04:11:41 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Sat, 8 Oct 2016 10:11:41 +0200 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> Message-ID: Hi all, Good news. The DKFZ workflow was successful for donor DO50311. I've updated the table. I'm trying the Sanger workflow again with that donor. Have a nice weekend Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Mon Oct 10 03:11:31 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 10 Oct 2016 09:11:31 +0200 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> Message-ID: Hi all, I've reproduced the Sanger error again. I'm attaching the result of the command to see if this helps Best M -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Sanger.Test.log Type: application/octet-stream Size: 94843 bytes Desc: not available URL: From kr2 at sanger.ac.uk Mon Oct 10 04:28:04 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Mon, 10 Oct 2016 09:28:04 +0100 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> Message-ID: Hi Miguel, It's odd that there's no error message. Can you give me a listing of the following areas (with file sizes too): /var/spool/cwl/0/pindel/tmpPindel/*/ And the contents of any files in the logs directory that have a non-zero file size: /var/spool/cwl/0/pindel/tmpPindel/logs Thanks, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 > On 10 Oct 2016, at 08:11, Miguel Vazquez wrote: > > Hi all, > > I've reproduced the Sanger error again. I'm attaching the result of the command to see if this helps > > Best > > M -------------- next part -------------- An HTML attachment was scrubbed...
URL: From miguel.vazquez at cnio.es Tue Oct 11 05:19:09 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Tue, 11 Oct 2016 11:19:09 +0200 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> Message-ID: Hi again, Here is the log file for the Sanger pipeline using a different donor. I hope this helps. Other than that, there is not much I can do. The filesystem does not seem to hold anything other than the input files that are copied into the 'datastore' subdirectory, and the docker image has been wiped. If someone has any other suggestion on how I can make myself useful in this matter, I'd be glad to hear it. Otherwise, I'm moving on. Regards, Miguel On Mon, Oct 10, 2016 at 9:11 AM, Miguel Vazquez wrote: > Hi all, > > I've reproduced the Sanger error again. I'm attaching the result of the > command to see if this helps > > Best > > M > > On Fri, Sep 30, 2016 at 3:52 PM, Francis Ouellette > wrote: > >> Dear Miguel and Junjun, >> >> Any more attempts on testing the PCAWG sanger docker container? >> >> If you reproduce the same error, we will need to involve Bryan and >> Keiran Raine (author of the container). >> >> Let's get this one figured out. >> >> I am going to assume that the making of the docker container is what >> needs resolving. >> >> Brian: We may need your input here. >> >> Details of our current experiment should continue to be posted here: >> >> https://goo.gl/XX5BG9 >> >> Thank you all, >> >> francis >> >> PS would be good for others on list to follow directions on above google >> doc and also see if they can succeed on this workflow. >> >> Junjun and Miguel have tried different clouds, but used the sanger >> workflow, on the same data set. >> >> Thank you for trying to do this. >> >> Would be good if I heard back from anybody before Monday AM (tech call).
>> >> @bffo >> >> PS I CCed Keiran, but waiting to hear back from Brian before we need to >> involve him some more. >> PPS Junjun/Miguel: maybe you can try the DKFZ docker as well? (on the >> same data set). >> >> >> -- >> B.F. Francis Ouellette http://oicr.on.ca/person/fran >> cis-ouellette >> >> >> >> Begin forwarded message: >> >> *From: *Christina Yung >> *Subject: **[PAWG-TECH] Draft agenda for PCAWG-TECH teleconference* >> *Date: *September 30, 2016 at 9:19:39 AM EDT >> *To: *"pawg-tech (pawg-tech at lists.icgc.org)" >> >> Hi Everyone, >> >> >> >> Below is a draft agenda for Monday?s tech call. Please let me know if >> you have any agenda items for discussion. >> >> https://wiki.oicr.on.ca/display/PANCANCER/2016-10-03+PCAWG- >> TECH+Teleconference >> >> >> >> Have a great weekend! >> >> Christina >> >> >> Call Info >> >> *Usual Time 9 AM Eastern Time, Mondays* >> >> *UK 0208 322 2500* >> >> *Canada 1-866-220-6419* >> >> *United States **1-877-420-0272* >> >> *All Others Please see attached PDF file >> with >> a list of numbers for other countries.* >> >> *Participant Code 5910819#* >> Agenda >> >> >> >> *Time* >> >> *Item* >> >> *Who* >> >> *Attachments/Links* >> >> 5min >> >> Welcome. Wait for group members to log on >> >> Christina Yung , OICR >> >> 10min >> >> Overall status >> Christina Yung , OICR >> >> ? Linkouts to Most Current PCAWG Data >> >> >> ? Report data issues to pcawg-data at icgc.org, GNOS issue to: >> Help at annaisystems.com >> >> ? From Boston F2F: PCAWG datasets & dependencies >> >> >> *Action Items* >> >> 1. [Joachim] Consensus SV - final? >> >> 2. [Jakob] Consensus SNVs - changes to "SNV near indels" >> annotation? >> >> 3. [Junjun] Specimen ID mapping for miRNA and methylation >> >> 4. [Jonathan & Joachim] Consensus calls for cell lines, followed >> by filtering >> >> 5. [Matthias & Gordon] Docker containers for filtering methods >> >> 6. 
[Christina] Run alignment & variant workflows on >> medulloblastoma sample (tumor 40x, normal 30x) from ICGC benchmark to >> estimate false negative rate >> >> 7. [Christina] Follow up with institutes interested in hosting >> PCAWG data long-term >> >> 8. [All] As per Jennifer's email on Sept 16, please provide >> authorship information again or for the first time using PCAWG Author Form ( >> http://goo.gl/forms/5Wq5x5X1DK). Save the link "Edit your response" so >> you can go back later to provide updates, for example about your evolving >> role in writing specific papers. >> >> 9. [All] Contribute to the manuscripts on >> >> a. infrastructure: https://docs.google.com/docume >> nt/d/10alAxrWLdLSyhci-rfNuVH13rFXCJkaY_rzf1KJn7nc/edit >> >> b. variants: Paper ( https://goo.gl/g9CLsu ), Supplement ( >> https://goo.gl/EWYh7e ) >> >> c. Rogue's Gallery of Cancer Genome Sequencing Artifacts ( outline >> >> ) >> >> 10. [Junjun] Discuss PCAWG vs DCC glossary terms at next PCAWG-10/13 >> conference >> >> 10min >> >> Status of dockerizing workflows >> Brian O'Connor , UCSC >> >> Gordon Saksena >> , Broad >> >> Francis Ouellette , OICR >> *Status of PCAWG Workflow ports to Dockstore*: >> >> Denis has been porting the Dockstore entries to CWL version 1.0 which is >> part of our effort to publish Dockstore (this doesn't affect the content of >> the pipelines, simply their "descriptors" which allow them to be runnable >> via Dockstore). Denis has also worked on testing BWA-Mem, Sanger, EMBL >> (which all work with CWL 1.0 and Kerian's test dataset) and has fixed >> issues with DKFZ and is testing the latter with a real sample shortly. >> >> 1. BWA-Mem - Ready for testing by Francis' team >> >> 2. Sanger - Ready for testing by Francis' team. >> >> 3. EMBL - Ready for testing by Francis' team. >> >> 4. DKFZ - Ready for testing by Francis' team. I've exchanged >> emails with Manuel Ballesteros who has been testing this pipeline. >> >> 5. 
Broad - Variant calling (MuTect, dRanger, snowman), need some >> work, Gordan sent details previously >> >> 6. OxoG - Waiting for Dimitri to provide OxoG docker >> >> 7. Variantbam >> >> 8. Consensus algorithm >> >> PCAWG Docker (Dockstore) Testing Working Group >> >> >> 5min >> >> Other business? >> >> Group >> >> >> >> >> >> >> >> *Christina K. Yung, PhD* >> Project Manager, Cancer Genome Collaboratory >> >> *Ontario Institute for Cancer Research* >> MaRS Centre >> >> 661 University Avenue, Suite 510 >> Toronto, Ontario, Canada M5G 0A3 >> Tel: 416-673-8578 >> >> www.oicr.on.ca >> >> >> >> This message and any attachments may contain confidential and/or >> privileged information for the sole use of the intended recipient. Any >> review or distribution by anyone other than the person for whom it was >> originally intended is strictly prohibited. If you have received this >> message in error, please contact the sender and delete all copies. >> Opinions, conclusions or other information contained in this message may >> not be that of the organization. >> >> >> _______________________________________________ >> PAWG-TECH mailing list >> PAWG-TECH at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/pawg-tech >> >> >> >> _______________________________________________ >> docktesters mailing list >> docktesters at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/docktesters >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Sanger.DO50398.log
Type: application/octet-stream
Size: 146368 bytes
Desc: not available

From kr2 at sanger.ac.uk  Tue Oct 11 07:50:02 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Tue, 11 Oct 2016 12:50:02 +0100
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca>
Message-ID: <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>

Hi,

1. Can you confirm that the donors you are using are the standard BAMs mapped within the PanCancer project, i.e. those available in GNOS, as I have copies of all of them locally?

2. Is there an MD5 check of the BAM files carried out? When pulled from GNOS, it was part of that application.

3. All the output of pindel is being written to '/var/spool/cwl/0/pindel/tmpPindel/'. Is this area part of the 1TB (or larger) workspace?

4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

If you don't have the ability for the state to be saved, it's going to be very difficult to debug this.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

From miguel.vazquez at cnio.es  Tue Oct 11 08:22:11 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Tue, 11 Oct 2016 14:22:11 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: 

> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds
> which could indicate a problem with either the headers or the absence of
> the BAS file from the expected location.

I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS, could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.

Best

Miguel

From miguel.vazquez at cnio.es  Tue Oct 11 08:31:02 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Tue, 11 Oct 2016 14:31:02 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: 

Keiran,

It's still downloading the files, but it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them?

Best

Miguel
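Keiran's second question above asks whether the BAM files were MD5-checked; when the BAMs were pulled from GNOS, that check was part of the download application, but a copy fetched another way may not have been verified. A minimal sketch of such a check in plain Java, using only `java.security.MessageDigest`. The class name `BamMd5Check` is hypothetical, and the expected checksum is assumed to come from wherever the BAM was obtained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: verify a downloaded BAM against an expected MD5 before running the workflow.
final class BamMd5Check {

    // Streams the file through MD5 in chunks (BAMs can be very large) and returns lowercase hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b)); // two hex digits per byte
        }
        return sb.toString();
    }

    // True if the file's MD5 matches the expected hex string, ignoring case and surrounding spaces.
    static boolean matches(Path file, String expectedHex) throws IOException, NoSuchAlgorithmException {
        return md5Hex(file).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A mismatch would point to a corrupted or incomplete transfer rather than a workflow problem, which narrows down where to look.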
From kr2 at sanger.ac.uk  Tue Oct 11 08:40:02 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Tue, 11 Oct 2016 13:40:02 +0100
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: 

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM and BAS files aren't co-located, then you have a problem. You could symlink the BAM files into the workspace and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently, as the BAS file data was held in the GNOS XML data structures during the initial processing. Moving to this means that any BAM input is sufficient.

Hope this is easier to solve now,

Keiran Raine

From kr2 at sanger.ac.uk  Tue Oct 11 08:44:19 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Tue, 11 Oct 2016 13:44:19 +0100
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk>

Relevant section of code:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine
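Keiran's workaround (symlink the BAMs into a writable workspace so that the generated BAS files end up beside them, then delete the symlinks at the end) can be sketched with `java.nio.file`. This is an illustration under stated assumptions, not the workflow's code: `WorkspaceLinker` and its methods are hypothetical, the index is assumed to be named `<bam>.bai`, and the workspace is assumed to be writable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper illustrating the symlink workaround; not part of CgpSomaticCore.
final class WorkspaceLinker {

    // Creates workspace/<bam name> -> original BAM (and its .bai if present).
    // Every link created is appended to 'created' so clean-up can remove exactly those paths.
    static Path linkIntoWorkspace(Path bam, Path workspace, List<Path> created) throws IOException {
        Files.createDirectories(workspace);
        Path linkedBam = workspace.resolve(bam.getFileName());
        Files.createSymbolicLink(linkedBam, bam.toAbsolutePath());
        created.add(linkedBam);

        Path bai = bam.resolveSibling(bam.getFileName() + ".bai"); // assumed index naming
        if (Files.exists(bai)) {
            Path linkedBai = workspace.resolve(bai.getFileName());
            Files.createSymbolicLink(linkedBai, bai.toAbsolutePath());
            created.add(linkedBai);
        }
        return linkedBam;
    }

    // The BAS file is then written beside the linked BAM, inside the workspace.
    static Path basFor(Path linkedBam) {
        return linkedBam.resolveSibling(linkedBam.getFileName() + ".bas");
    }

    // Deletes only the symlinks themselves; the original BAM/BAI files are left untouched.
    static void removeLinks(List<Path> created) throws IOException {
        for (Path link : created) {
            Files.deleteIfExists(link);
        }
    }
}
```

Recording the created links and deleting only those keeps the clean-up step from ever touching the original inputs.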
From miguel.vazquez at cnio.es  Tue Oct 11 08:47:03 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Tue, 11 Oct 2016 14:47:03 +0200
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: 

Hi Keiran,

On Tue, Oct 11, 2016 at 2:40 PM, Keiran Raine wrote:

> But if the BAM files and BAS aren't co-located then you have a problem.
> You could symlink the BAM files into the work space and have all tools work
> from that path instead, deleting the symlinks at the end.

I'm afraid that this is not under my control. Since I'm running this through Dockstore, there is little I can do about it, as far as I understand. Maybe someone who knows Dockstore can help out at this point?

Best

Miguel
From francis at oicr.on.ca  Tue Oct 11 09:20:32 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 11 Oct 2016 13:20:32 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: 

Brian/Denis:

Suggestion for a tool/container: additional documentation (in README.md?) with a checklist of which files need to be present in, or supplied to, the existing container to make the pipeline work.

@bffo
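The checklist Francis suggests could also be enforced mechanically before a run is launched. A minimal sketch, assuming (from this thread, not from the container's documentation) that each input BAM should have a co-located `<bam>.bai` and, by the time pindel runs, a co-located `<bam>.bas`; `InputChecklist` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: report which expected input files are missing.
final class InputChecklist {

    // Companion files assumed to sit next to each BAM (taken from this thread, not from docs).
    static final String[] COMPANION_SUFFIXES = {".bai", ".bas"};

    static List<String> missingFiles(List<Path> bams) {
        List<String> missing = new ArrayList<>();
        for (Path bam : bams) {
            if (!Files.isRegularFile(bam)) {
                missing.add(bam.toString()); // the BAM itself is absent; skip its companions
                continue;
            }
            for (String suffix : COMPANION_SUFFIXES) {
                Path companion = bam.resolveSibling(bam.getFileName() + suffix);
                if (!Files.exists(companion)) {
                    missing.add(companion.toString());
                }
            }
        }
        return missing;
    }
}
```

Whether the `.bas` has to be supplied up front or is generated by the workflow is still being worked out in this thread, so the suffix list is the part to adjust.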
From Denis.Yuen at oicr.on.ca  Tue Oct 11 10:55:32 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Tue, 11 Oct 2016 14:55:32 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A112A@exmb2.ad.oicr.on.ca>

Hi,

Sorry, catching up with the thread after Thanksgiving.

For debugging the state of the filesystem, you can instruct Dockstore to leave the filesystem in place after execution. You can subsequently start the container interactively and examine it. The documentation is here: https://github.com/ga4gh/dockstore-ui/commit/b3fe88d706da59e6af350ff1dec9d9492a690724 but, in short, you need to add `cwltool-extra-parameters: --leave-container` to your config.

For the BAS files: Keiran, do you have any thoughts on how this functions differently between the test data and the donor Miguel is using? From the code snippet https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 it looks like the BAS files are generated from the BAM files.

Miguel, I would try to look for that resulting BAS file after saving the filesystem as above, and see if there appears to be anything wrong with it.

Denis Yuen
Bioinformatics Software Developer
Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue
Suite 510
Toronto, Ontario, Canada M5G 0A3

Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca
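Once the container's filesystem has been kept, Denis's suggestion comes down to finding where the BAS files were actually written. A small sketch that walks a directory tree (for example, the kept container filesystem or a copy of it) and lists every `*.bas` file; the class name `BasFinder` is hypothetical. An empty result, or a BAS file in a different directory from its BAM, would be consistent with the co-location problem Keiran described.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical debugging helper: list every .bas file under a root directory.
final class BasFinder {

    static List<Path> findBasFiles(Path root) throws IOException {
        // Files.walk holds open directory handles, so close the stream with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".bas"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Printing `Files.size(...)` next to each hit would also flag empty or truncated BAS files.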
URL: From miguel.vazquez at cnio.es Wed Oct 12 05:37:26 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Wed, 12 Oct 2016 11:37:26 +0200 Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference In-Reply-To: <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> Message-ID: [The rest of the list where out of the loop for this part of the conversation, I'm putting them back in. In short, the Sanger pipeline produces the BAS file but not co-located with the BAM] Hi Keiran, Would it be possible then to change this and try again? what needs to happen? I guess you'll need to change the code and a new docker image be produced. Would this be our best alternative? Miguel On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine wrote: > In the original version we didn't do this step, if we have write access it > can be made to do that > > Keiran Raine > Principal Bioinformatician > Cancer Genome Project > Wellcome Trust Sanger Institute > > kr2 at sanger.ac.uk > Tel:+44 (0)1223 834244 Ext: 4983 > Office: H104 > > On 11 Oct 2016, at 13:49, Miguel Vazquez wrote: > > Hi Keiran, > > If the BAS and BAM files need to be collocated, why is it not created next > to the BAM file? 
> > Would it not be better if it read > > private Job basFileBaseJob(int tumourCount, String sampleBam, String > process, int index) { > Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, > index); > File f = new File(sampleBam); > thisJob.getCommand() > .addArgument(getWorkflowBaseDir()+ "/bin/wrapper.sh") > .addArgument(installBase) > .addArgument("bam_stats") > .addArgument("-i " + sampleBam) > .addArgument("-o " + sampleBam + ".bas") > ; > return thisJob; > } > > Best > > Miguel > > On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote: > >> Relevant section of code: >> >> https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/bl >> ob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore >> .java#L769-L780 >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:40, Keiran Raine wrote: >> >> Hi, >> >> There is a step generating the BAS files: >> >> [2016/10/10 07:28:37] | Running command: bash >> /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/genera >> ted-scripts/s58_basFileGenerate_control_11-runner.sh >> [2016/10/10 07:28:37] | Running command: bash >> /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/genera >> ted-scripts/s58_basFileGenerate_tumours_12-runner.sh >> >> But if the BAM files and BAS aren't co-located then you have a problem. >> You could symlink the BAM files into the work space and have all tools work >> from that path instead, deleting the symlinks at the end. >> >> This is one of the changes we had to implement differently as the BAS >> file data was being held in the GNOS xml data structures during the initial >> processing. Moving to this means that any BAM input is sufficient. 
>>
>> Hope this is easier to solve now,
>>
>> On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:
>>
>> Keiran,
>>
>> It's still downloading the files, but in fact it does not seem to download
>> any BAS file. Could you please educate me a bit on what these are and how I
>> can create them?
>>
>> Best
>>
>> Miguel
>>
>> On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:
>>
>>>> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds,
>>>> which could indicate a problem with either the headers or the absence of
>>>> the BAS file from the expected location.
>>>
>>> I think that you just revealed the problem. There are in fact no BAS
>>> files, only BAM and BAI. There were BAS files for the test data HCC1143,
>>> which is the one that in fact works. It seems like BAS files are not
>>> gathered by GNOS, could that be? Or maybe my script fails to copy them.
>>> I'll try to gather a different sample with my client and check.
>>>
>>> Not knowing a thing about these files explains why I didn't notice. I'll
>>> get back to you when I know more.
>>>
>>> Best
>>>
>>> Miguel
>>
>> -- The Wellcome Trust Sanger Institute is operated by Genome Research
>> Limited, a charity registered in England with number 1021457 and a company
>> registered in England with number 2742969, whose registered office is 215
>> Euston Road, London, NW1 2BE.
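Keiran's suggested workaround can be sketched in Java, which is the language of the SeqWare workflow code quoted above. The workaround is to symlink the BAM files into the workspace, have all tools work from those links, and delete the links at the end. This is a minimal illustration under stated assumptions, not code from CGP-Somatic-Docker. The class `BamWorkspaceLinker` and its methods are hypothetical. The sketch assumes that each index is named `<bam>.bai` and that the workspace is writable and on a filesystem that supports symbolic links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of CGP-Somatic-Docker): exposes read-only BAM
 * inputs through a writable workspace, so that a BAS file written as
 * "<linked bam>.bas" is co-located with the BAM path downstream tools receive.
 */
final class BamWorkspaceLinker {

    /** Files linked per BAM: the BAM itself and (assumed naming) its <bam>.bai index. */
    private static final String[] SUFFIXES = {"", ".bai"};

    private BamWorkspaceLinker() {}

    /** Links each BAM (and its index, if present) into the workspace; returns the linked BAM paths. */
    public static List<Path> linkInputs(List<Path> bams, Path workspace) throws IOException {
        Files.createDirectories(workspace);
        List<Path> linked = new ArrayList<>();
        for (Path bam : bams) {
            Path source = bam.toAbsolutePath();
            for (String suffix : SUFFIXES) {
                Path target = Paths.get(source + suffix);
                if (!Files.exists(target)) {
                    continue; // e.g. no index next to this BAM
                }
                Path link = workspace.resolve(target.getFileName());
                if (Files.isSymbolicLink(link)) {
                    Files.delete(link); // replace a stale link from an earlier run
                } else if (Files.exists(link)) {
                    throw new IOException("Refusing to replace a real file: " + link);
                }
                Files.createSymbolicLink(link, target);
            }
            linked.add(workspace.resolve(source.getFileName()));
        }
        return linked;
    }

    /** Co-located BAS path, mirroring the proposed "-o " + sampleBam + ".bas" argument. */
    public static Path basPathFor(Path bam) {
        return Paths.get(bam + ".bas");
    }

    /** Deletes only the symlinks; the original inputs and any generated BAS files stay in place. */
    public static void unlinkInputs(List<Path> bams, Path workspace) throws IOException {
        for (Path bam : bams) {
            for (String suffix : SUFFIXES) {
                Path link = workspace.resolve(bam.getFileName() + suffix);
                if (Files.isSymbolicLink(link)) {
                    Files.delete(link);
                }
            }
        }
    }
}
```

A workflow using this would pass the returned linked paths, rather than the originals, to `bam_stats` and to every later step. The BAS file written as `<linked bam>.bas` then sits next to the path those steps read. Only the links are removed at the end.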
From kr2 at sanger.ac.uk Wed Oct 12 06:16:57 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Wed, 12 Oct 2016 11:16:57 +0100
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk>
Message-ID: <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk>

Hi,

This assumes that it is possible to write to the location the BAM files are in.

I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed over for CWL).

Regards,

Keiran Raine

From miguel.vazquez at cnio.es Wed Oct 12 07:17:10 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Wed, 12 Oct 2016 13:17:10 +0200
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk>
Message-ID:

OK Keiran,

I'll leave it up to you guys, then. I'll try again when the issue is resolved somehow.

Thanks for your help

Miguel

From Denis.Yuen at oicr.on.ca Wed Oct 12 10:36:58 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 12 Oct 2016 14:36:58 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca>

Hi,

I can make the modification. I'll run it through the test data, and that should finish in roughly a day.

In the meantime, though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?
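Keiran's caveat and Denis's question both depend on what already sits next to each BAM before the run starts. The caveat is that the change assumes the BAM directory is writable. The question is why the test data passes while the donor data fails. A preflight check could report this for each input. The sketch below is hypothetical and is not part of the pipeline. `Files.isWritable` only reflects an access check made at the moment it is called, so `BAS_WRITABLE` is a hint rather than a guarantee that a later write will succeed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical preflight check (not part of the pipeline): reports whether a
 * BAM already has a co-located "<bam>.bas" file and, if not, whether the BAM's
 * directory looks writable enough for one to be created there.
 */
final class BasPreflight {

    public enum Status {
        BAS_PRESENT,      // <bam>.bas already exists next to the BAM
        BAS_WRITABLE,     // no BAS yet, but the BAM directory appears writable
        BAS_NOT_WRITABLE  // no BAS, and a co-located BAS cannot be written
    }

    private BasPreflight() {}

    public static Status check(Path bam) {
        Path bas = Paths.get(bam + ".bas");
        if (Files.isRegularFile(bas)) {
            return Status.BAS_PRESENT;
        }
        Path dir = bam.toAbsolutePath().getParent();
        // Access check at this moment only; permissions or mounts may change later.
        return (dir != null && Files.isWritable(dir)) ? Status.BAS_WRITABLE : Status.BAS_NOT_WRITABLE;
    }
}
```

Miguel described donor inputs that contain only BAM and BAI files. For inputs like those, this check would return one of the two statuses that mean no BAS file is present. Such a result would flag the problem before the pindel input steps run.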
From kr2 at sanger.ac.uk Wed Oct 12 10:49:27 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Wed, 12 Oct 2016 15:49:27 +0100
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca>
Message-ID:

Hi Denis,

I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that the step writes its output to a different location isn't detected.

Keiran Raine

From Denis.Yuen at oicr.on.ca Wed Oct 12 10:59:32 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 12 Oct 2016 14:59:32 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To:
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca>

Hi,

While that would have been a good explanation, unfortunately it doesn't seem to be the case.
In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ) , the bam files are described like inputs: tumor: type: File inputBinding: position: 1 prefix: --tumor secondaryFiles: - .bai refFrom: type: File inputBinding: position: 3 prefix: --refFrom bbFrom: type: File inputBinding: position: 4 prefix: --bbFrom normal: type: File inputBinding: position: 2 prefix: --normal secondaryFiles: - .bai The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were. If Miguel has the "docker run" output from the run (should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime. ________________________________ From: Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 10:49 AM To: Denis Yuen Cc: Miguel Vazquez; docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, I expect when you unpack the the test data the BAS files exist in the archive in that area so the fact it runs the out of that step to a different location isn't detected. Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 15:36, Denis Yuen > wrote: Hi, I can make the modification, I'll run it through the test data and that should finish in roughly a day. In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data? 
________________________________ From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 6:16 AM To: Miguel Vazquez Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, This is assuming that it is possible to write to the location the BAM are in. I think Denis would be best placed to make the minor modification as I don't know the process they are using for build and deploy of the images (I made modifications and then handed over for CWL). Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 10:37, Miguel Vazquez > wrote: [The rest of the list where out of the loop for this part of the conversation, I'm putting them back in. In short, the Sanger pipeline produces the BAS file but not co-located with the BAM] Hi Keiran, Would it be possible then to change this and try again? what needs to happen? I guess you'll need to change the code and a new docker image be produced. Would this be our best alternative? Miguel On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine > wrote: In the original version we didn't do this step, if we have write access it can be made to do that Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 11 Oct 2016, at 13:49, Miguel Vazquez > wrote: Hi Keiran, If the BAS and BAM files need to be collocated, why is it not created next to the BAM file? 
Would it not be better if it read:

    private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
      Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
      File f = new File(sampleBam);
      thisJob.getCommand()
        .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
        .addArgument(installBase)
        .addArgument("bam_stats")
        .addArgument("-i " + sampleBam)
        .addArgument("-o " + sampleBam + ".bas")
        ;
      return thisJob;
    }

Best

Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:
Relevant section of code:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] |
Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] |
Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM and BAS files aren't co-located, then you have a problem. You could symlink the BAM files into the workspace and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently, as the BAS file data was held in the GNOS XML data structures during the initial processing. Moving to this approach means that any BAM input is sufficient.
Hope this is easier to solve now,

Keiran Raine

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS files. Could you please explain a bit what these are and how I can create them?

Best

Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS; could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing anything about these files explains why I didn't notice. I'll get back to you when I know more.

Best

Miguel

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
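Keiran's suggested workaround earlier in this thread (symlink the BAM files into the workspace, have every tool work from that path so the BAS is written beside it, and delete the links at the end) could look roughly like the Java sketch below. It uses only `java.nio.file`; the class name, method names and directory layout are hypothetical, and it does not model SeqWare jobs or the actual `CgpSomaticCore` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BamWorkspaceSketch {

    // Co-located BAS name, following the "<bam>.bas" convention in the proposed basFileBaseJob.
    static Path basPathFor(Path bam) {
        return bam.resolveSibling(bam.getFileName() + ".bas");
    }

    // Symlink a (possibly read-only) BAM, and its .bai if present, into a writable
    // workspace. Tools are then pointed at the link, so "<link>.bas" lands in the workspace.
    static Path linkIntoWorkspace(Path bam, Path workspace) throws IOException {
        Files.createDirectories(workspace);
        Path link = workspace.resolve(bam.getFileName());
        Files.createSymbolicLink(link, bam.toAbsolutePath());
        Path bai = bam.resolveSibling(bam.getFileName() + ".bai");
        if (Files.exists(bai)) {
            Files.createSymbolicLink(workspace.resolve(bai.getFileName()), bai.toAbsolutePath());
        }
        return link;
    }

    // Delete only the links at the end: deleteIfExists removes a symlink itself, not its
    // target, and any generated .bas file in the workspace is left in place.
    static void removeLinks(Path link) throws IOException {
        Files.deleteIfExists(link.resolveSibling(link.getFileName() + ".bai"));
        Files.deleteIfExists(link);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("bas-sketch");
        Path inputDir = Files.createDirectories(root.resolve("inputs"));
        Path bam = Files.createFile(inputDir.resolve("tumor.bam"));
        Files.createFile(inputDir.resolve("tumor.bam.bai"));

        Path link = linkIntoWorkspace(bam, root.resolve("workspace"));
        System.out.println("BAS would be written to: " + basPathFor(link));
        removeLinks(link);
        System.out.println("original BAM still present: " + Files.exists(bam));
    }
}
```

Because the tools would be given the link path, the `<bam>.bas` name from Miguel's proposed `basFileBaseJob` resolves inside the writable workspace rather than next to a possibly read-only input mount.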
From Denis.Yuen at oicr.on.ca Wed Oct 12 12:04:47 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 12 Oct 2016 16:04:47 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca>, , <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A1314@exmb2.ad.oicr.on.ca>

Hi,

Keiran, I'm having trouble rebuilding the Sanger Docker container in what I think is an unrelated section.
Has anything changed in the build dependencies (for example, a floating version that has changed over time)?
I'm attaching the build log from the Dockerfile and the log from inside the container.

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
Sent: October 12, 2016 10:59 AM
To: Keiran Raine
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

While that would have been a good explanation, unfortunately, it doesn't seem to be the case.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.log
Type: text/x-log
Size: 32586 bytes
Desc: build.log
-------------- next part --------------
A non-text attachment was scrubbed...
Name: outer.build.log
Type: text/x-log
Size: 19806 bytes
Desc: outer.build.log

From kr2 at sanger.ac.uk Wed Oct 12 14:09:36 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Wed, 12 Oct 2016 19:09:36 +0100
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A1314@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A1314@exmb2.ad.oicr.on.ca>
Message-ID: <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk>

Hi Denis,

You've hit an issue that has only occurred for us in the last few weeks as well. BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package. The fix is to force the first install of BioPerl to a specific version.
Modify line 25/26 of the Dockerfile from:

    RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
        rm -rf ~/.cpanm

to:

    RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
        rm -rf ~/.cpanm

Thankfully, this was something I could identify immediately.

Regards,

Keiran Raine

> On 12 Oct 2016, at 17:04, Denis Yuen wrote:
>
> Hi,
>
> Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section.
> Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)?
> I'm attaching the build log from the Dockerfile and the log from inside the container.
>
> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
> Sent: October 12, 2016 10:59 AM
> To: Keiran Raine
> Cc: docktesters at lists.icgc.org
> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi,
>
> While that would have been a good explanation, unfortunately, it doesn't seem to be the case.
From gsaksena at broadinstitute.org Wed Oct 12 15:11:07 2016
From: gsaksena at broadinstitute.org (Gordon Saksena)
Date: Wed, 12 Oct 2016 15:11:07 -0400
Subject: [DOCKTESTERS] Broad dockers
Message-ID:

Hi,

I have approval for the Broad PCAWG dockers to go on the staging (non-password-protected) portion of Dockstore. The GitHub repo and DockerHub image have permissions granted to folks who sent me their GitHub and Docker usernames. This should be adequate for the initial rounds of testing.

The tokens portion of the pipeline should be ready for folks to give it a spin. The non-protected reference files are passed in via HTTP in the inputs file, and it only needs the normal BAM. It takes about 5 hours on a full-size file using a couple of cores. Other portions will be added as they are split out and tested.

While the current permissions setup is fine for testing, something different will be needed post-publication. Many of the algorithms are (or will be) free only for educational or research use, but require a separate license for commercial use.
We should discuss how that use case can be supported, and whether it has implications for testing.

https://hub.docker.com/r/broadinstitute/pcawg_tokens/
https://github.com/broadinstitute/pcawg
https://github.com/broadinstitute/pcawg/blob/master/tasks/tokens/taskdef.tokens.wdl
https://github.com/broadinstitute/pcawg/blob/master/tasks/tokens/inputtest_http_refdata.tokens.json

Gordon

From gsaksena at broadinstitute.org Wed Oct 12 17:42:27 2016
From: gsaksena at broadinstitute.org (Gordon Saksena)
Date: Wed, 12 Oct 2016 17:42:27 -0400
Subject: Re: [DOCKTESTERS] Broad dockers
In-Reply-To:
References:
Message-ID:

Btw, the Broad docker MTA is now all set for OICR, UCSC, and OHSU.

Gordon

From Denis.Yuen at oicr.on.ca Thu Oct 13 10:46:24 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Thu, 13 Oct 2016 14:46:24 +0000
Subject: Re: [DOCKTESTERS] Broad dockers
In-Reply-To:
References:
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A141C@exmb2.ad.oicr.on.ca>

Hi,
Thanks for the heads-up. We'll look into these containers and see what we can do in terms of testing them and getting them posted to Dockstore.

I do have a few questions:
1) Which versions of Cromwell and wdl4s was this portion of the pipeline tested with?
2) This release is for the tokens portion of the pipeline. How many portions of the pipeline do you anticipate will be available in the end? (And do you have a handy graph/chart that we can use to describe this?)
3) For post-publication, are you thinking about a dual-license model (something like https://www.quora.com/What-is-the-best-license-to-apply-on-an-open-source-project-that-I-intend-to-sell-commercially-as-well ) or about some form of DRM in the Docker image?

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Gordon Saksena [gsaksena at broadinstitute.org]
Sent: October 12, 2016 3:11 PM
To: docktesters
Cc: Gad Getz
Subject: [DOCKTESTERS] Broad dockers

From gsaksena at broadinstitute.org Thu Oct 13 11:57:37 2016
From: gsaksena at broadinstitute.org (Gordon Saksena)
Date: Thu, 13 Oct 2016 11:57:37 -0400
Subject: Re: [DOCKTESTERS] Broad dockers
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A141C@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A141C@exmb2.ad.oicr.on.ca>
Message-ID:

1) I tested under Cromwell 0.19.3. I will likely move over to 0.21 soon. I used wdltool 0.4.

2) At the moment I have almost 20 dockers listed, including some that are filters and some that have known buggy algorithms. More discussion is needed, probably both internally and externally, to develop a more stable list and map out timelines. Let me discuss this internally at Broad first before sharing the list. The intent is to allow users both A) to reproduce the PCAWG work and B) to apply the algorithms to their own new projects.
A) (reproducibility) could be mostly served by one big docker, as was run in production, but there are currently issues around a) distributing the aggregated panel of normals for MuTect and dRanger, b) some GPL code needs to be unlinked from the mini-bam creator, and c) the MuTect-ContEst rescue code needs to be patched in. The one big docker is not ideal for B) (re-application), due to the additional issues: a) wanting bug fixes, b) you need to wait for algorithms you don't care about, c) the SV caller algorithms demand a ton of RAM and CPU time on certain samples, making them bad neighbors on cheap VMs, d) some algorithms (eg aggregators, filters) have not been dockerized yet and would delay the release of others, and e) even portions licensed for free commercial use would be bundled under the more restrictive GATK license. 3) The GATK dual-license model is documented here: https://software.broadinstitute.org/gatk/download/licensing.php And, something like it will probably apply to certain portions that do not use GATK. We have no plans to use DRM. Gordon On Thu, Oct 13, 2016 at 10:46 AM, Denis Yuen wrote: > Hi, > Thanks for the heads-up, we'll look into these containers and see what we > can do in terms of testing them and getting them posted to the Dockstore. > > I do have a few questions: > 1) What versions of Cromwell and wdl4s was this portion of the pipeline > tested with? > 2) This release is for the tokens portion of the pipeline, how many > portions of the pipeline do you anticipate will be available in the end? > (and do you have a handy graph/chart that we can use to describe this?) > 3) For post-publication, are you thinking about a dual-license model ( > something like https://www.quora.com/What-is-the-best-license-to-apply-on- > an-open-source-project-that-I-intend-to-sell-commercially-as-well ) or > are you thinking about some form of DRM in the Docker image? 
>
> ------------------------------
> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Gordon Saksena [gsaksena at broadinstitute.org]
> *Sent:* October 12, 2016 3:11 PM
> *To:* docktesters
> *Cc:* Gad Getz
> *Subject:* [DOCKTESTERS] Broad dockers
>
> Hi,
>
> I have approval for the Broad PCAWG dockers to go on the staging (non-password protected) portion of Dockstore. The Github Repo and DockerHub image have permissions granted to folks who sent me their github and docker usernames. This should be adequate for the initial rounds of testing.
>
> The tokens portion of the pipeline should be ready for folks to give it a spin. The non-protected reference files are passed in via http in the inputs file, and it only needs the normal BAM. It takes about 5 hours on a full size file using a couple cores. Other portions will be added as they are split out and tested.
>
> While the current permissions setup is fine for testing, something different will be needed for post publication. Many of the algorithms are (or will be) free only for educational or research use, but require a separate license for commercial use. We should discuss how that use case can be supported, and whether it has implications on testing.
>
> https://hub.docker.com/r/broadinstitute/pcawg_tokens/
> https://github.com/broadinstitute/pcawg
> https://github.com/broadinstitute/pcawg/blob/master/tasks/tokens/taskdef.tokens.wdl
> https://github.com/broadinstitute/pcawg/blob/master/tasks/tokens/inputtest_http_refdata.tokens.json
>
> Gordon

From Denis.Yuen at oicr.on.ca Fri Oct 14 17:55:33 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Fri, 14 Oct 2016 21:55:33 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A1314@exmb2.ad.oicr.on.ca>, <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca>

Hi,

Just a heads-up on this thread for the end of the week.

> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>     rm -rf ~/.cpanm

This got me on the right track; I actually needed the following syntax:

    RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
        rm -rf ~/.cpanm

However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later.

I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on.
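The version pinned in the quoted `cpanm` line (`Bio::Root::Version@1.006924`) and the tarball in the working line (`CJFIELDS/BioPerl-1.6.924.tar.gz`) name the same BioPerl release written two ways: Perl's decimal module version `1.006924` corresponds to the dotted distribution version `1.6.924`, reading the fractional digits in groups of three. A minimal Java sketch of that conversion; the class name `PerlVersion` is hypothetical, and the logic is simplified (it does not pad to three components the way Perl's `version` module does):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a Perl decimal version such as "1.006924" into
// the dotted form "1.6.924" by reading the fractional digits in groups of three.
// Simplified: unlike Perl's version module, it does not pad to three components.
final class PerlVersion {
    private PerlVersion() {}

    static String toDotted(String decimal) {
        int dot = decimal.indexOf('.');
        if (dot < 0) {
            return decimal; // plain integer version: nothing to split
        }
        StringBuilder frac = new StringBuilder(decimal.substring(dot + 1));
        // Right-pad so the fraction splits evenly: "5" -> "500" (1.5 == 1.500)
        while (frac.length() % 3 != 0) {
            frac.append('0');
        }
        List<String> parts = new ArrayList<>();
        parts.add(String.valueOf(Integer.parseInt(decimal.substring(0, dot))));
        for (int i = 0; i < frac.length(); i += 3) {
            // parseInt drops leading zeros: "006" -> 6
            parts.add(String.valueOf(Integer.parseInt(frac.substring(i, i + 3))));
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(toDotted("1.006924")); // 1.6.924
    }
}
```

The same rule maps `1.007001` to `1.7.1`, which is why pinning by module version and pinning by distribution tarball can point at the same release.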
________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 2:09 PM
To: Denis Yuen
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

You've hit an issue that has only occurred in the last few weeks for us too. BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package.

The fix would be to force the first install of BioPerl to a specific version. Modify lines 25/26 of the Dockerfile from:

    RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
        rm -rf ~/.cpanm

to:

    RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
        rm -rf ~/.cpanm

Thankfully, this was something I could identify immediately.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 17:04, Denis Yuen wrote:

Hi, Keiran,

I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. Has anything changed about the build dependencies (for example, a floating version that changed over time)? I'm attaching the build log from the Dockerfile and the log from inside the container.

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
Sent: October 12, 2016 10:59 AM
To: Keiran Raine
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

While that would have been a good explanation, unfortunately it doesn't seem to be the case.
In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the BAM files are described like this:

    inputs:
      tumor:
        type: File
        inputBinding:
          position: 1
          prefix: --tumor
        secondaryFiles:
          - .bai
      refFrom:
        type: File
        inputBinding:
          position: 3
          prefix: --refFrom
      bbFrom:
        type: File
        inputBinding:
          position: 4
          prefix: --bbFrom
      normal:
        type: File
        inputBinding:
          position: 2
          prefix: --normal
        secondaryFiles:
          - .bai

The type of File (as opposed to directory) means that while the BAM and BAI files are individually mounted into the Docker container while it runs, the BAS files never were. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 10:49 AM
To: Denis Yuen
Cc: Miguel Vazquez; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that it writes the output of that step to a different location isn't detected.

Keiran Raine

On 12 Oct 2016, at 15:36, Denis Yuen wrote:

Hi,

I can make the modification; I'll run it through the test data, and that should finish in roughly a day. In the meantime, though, I am puzzled: why would an issue like this affect a donor dataset but not the test data?
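The point about the CWL `File` inputs quoted above can be made concrete. Under CWL's string-pattern rule for `secondaryFiles`, each leading `^` strips one extension from the primary path and the rest of the pattern is appended; only the resulting paths are staged alongside the primary file. The Java sketch below is an illustration only: the class and method names are hypothetical, and the `<bam>.bas` name is an assumption borrowed from the `sampleBam + ".bas"` naming used elsewhere in this thread. It shows that with `secondaryFiles: [.bai]`, a `.bas` path is never among the staged files:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of CWL's string-pattern rule for secondaryFiles:
// each leading '^' removes one extension from the primary path, then the rest
// of the pattern is appended. Only these paths are staged with the File input.
final class SecondaryFiles {
    private SecondaryFiles() {}

    static String resolve(String primary, String pattern) {
        String path = primary;
        int i = 0;
        while (i < pattern.length() && pattern.charAt(i) == '^') {
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            if (dot > slash) { // only strip an extension from the file name itself
                path = path.substring(0, dot);
            }
            i++;
        }
        return path + pattern.substring(i);
    }

    static List<String> staged(String primary, List<String> patterns) {
        List<String> files = new ArrayList<>();
        files.add(primary);
        for (String pattern : patterns) {
            files.add(resolve(primary, pattern));
        }
        return files;
    }

    public static void main(String[] args) {
        // Same shape as the tumor/normal inputs in Dockstore.cwl: secondaryFiles: [.bai]
        List<String> files = staged("/data/tumor.bam", Arrays.asList(".bai"));
        System.out.println(files);                                 // [/data/tumor.bam, /data/tumor.bam.bai]
        System.out.println(files.contains("/data/tumor.bam.bas")); // false
    }
}
```

This also fits Keiran's explanation for the test data: if the BAS files happen to sit in an unpacked archive next to the BAMs, nothing here reveals that the pipeline never staged or produced them in the expected place.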
________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 6:16 AM
To: Miguel Vazquez
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

This assumes that it is possible to write to the location the BAMs are in. I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed over for CWL).

Regards,

Keiran Raine

On 12 Oct 2016, at 10:37, Miguel Vazquez wrote:

[The rest of the list were out of the loop for this part of the conversation; I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.]

Hi Keiran,

Would it be possible, then, to change this and try again? What needs to happen? I guess you'll need to change the code and a new docker image will need to be produced. Would this be our best alternative?

Miguel

On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine wrote:

In the original version we didn't do this step; if we have write access, it can be made to do that.

Keiran Raine

On 11 Oct 2016, at 13:49, Miguel Vazquez wrote:

Hi Keiran,

If the BAS and BAM files need to be co-located, why is the BAS file not created next to the BAM file?
Would it not be better if it read:

    private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
        Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
        File f = new File(sampleBam);
        thisJob.getCommand()
            .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
            .addArgument(installBase)
            .addArgument("bam_stats")
            .addArgument("-i " + sampleBam)
            .addArgument("-o " + sampleBam + ".bas"); // write the .bas next to the input BAM
        return thisJob;
    }

Best
Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:

Relevant section of code:
https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

    [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
    [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM and BAS files aren't co-located, then you have a problem. You could symlink the BAM files into the workspace and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently, as the BAS file data was being held in the GNOS XML data structures during the initial processing. Moving to this means that any BAM input is sufficient.
Hope this is easier to solve now,

Keiran Raine

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS files. Could you please educate me a bit on what these are and how I can create them?

Best
Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the HCC1143 test data, which is the one that in fact works. It seems like BAS files are not gathered by GNOS; could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check. Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.

Best
Miguel

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
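Miguel's finding (BAM and BAI present, no BAS) together with Keiran's point that the BAM and BAS files must be co-located suggests checking for companion files before a long run starts, rather than inferring the problem from 22-23 second pindel_input steps. A minimal Java sketch of such a pre-flight check; `CompanionCheck` is hypothetical and not part of the Sanger workflow, and it assumes the companion files are named `<bam>.bai` and `<bam>.bas`:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: reports any companion file (.bai, .bas)
// that is not sitting in the same directory as its BAM.
// The "<bam>.bai" / "<bam>.bas" naming is an assumption, not confirmed workflow behaviour.
final class CompanionCheck {
    private CompanionCheck() {}

    private static final String[] SUFFIXES = {".bai", ".bas"};

    static List<Path> missingCompanions(Path bam) {
        List<Path> missing = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            // e.g. /data/normal.bam -> /data/normal.bam.bas
            Path companion = bam.resolveSibling(bam.getFileName() + suffix);
            if (!Files.isRegularFile(companion)) { // follows symlinks by default
                missing.add(companion);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // usage: java CompanionCheck /path/to/tumour.bam /path/to/normal.bam
        for (String arg : args) {
            for (Path p : missingCompanions(Paths.get(arg))) {
                System.err.println("missing: " + p);
            }
        }
    }
}
```

A check like this only detects the condition; where the BAS file should come from (GNOS download or the pipeline's own basFileGenerate step) is the question the thread is still working through.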
From strucka at ohsu.edu Fri Oct 14 18:10:14 2016
From: strucka at ohsu.edu (Adam Struck)
Date: Fri, 14 Oct 2016 22:10:14 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca>
Message-ID: <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu>

Hi Denis,

Sorry to chime in late. The bas file and input BAMs should be getting co-located already (see below). Where are these files ending up when you run the workflow?

Inputs are symlinked to the OUTDIR.
https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039

Bas files are written to OUTDIR:
https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the Cromwell engine on the CCC platform without an issue.

-Adam
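Keiran's suggestion (symlink the BAMs into the workspace, have all tools work from that path, delete the links at the end) and Adam's description (inputs symlinked to OUTDIR, BAS files written to OUTDIR) describe the same pattern. A hedged Java sketch using `java.nio.file`; the `Workspace` class and its method names are illustrative only and are not the code in `CgpSomaticCore.java`, and the `<bam>.bai` / `<bam>.bas` naming is an assumption:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch (not the workflow's actual code): link the BAM and its
// index into a writable output directory so that a .bas written there sits
// next to the path every downstream tool reads.
final class Workspace {
    private Workspace() {}

    /** Symlinks bam and bam.bai into outDir and returns the linked BAM path. */
    static Path linkInputs(Path bam, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Path linkedBam = outDir.resolve(bam.getFileName());
        Path bai = bam.resolveSibling(bam.getFileName() + ".bai");
        Files.createSymbolicLink(linkedBam, bam.toAbsolutePath());
        Files.createSymbolicLink(outDir.resolve(bai.getFileName()), bai.toAbsolutePath());
        return linkedBam;
    }

    /** Output path for the BAS so it is co-located with the linked BAM. */
    static Path basFor(Path linkedBam) {
        return linkedBam.resolveSibling(linkedBam.getFileName() + ".bas");
    }

    /** Removes the links only; the original BAM and BAI are untouched. */
    static void unlinkInputs(Path linkedBam) throws IOException {
        Files.deleteIfExists(linkedBam.resolveSibling(linkedBam.getFileName() + ".bai"));
        Files.deleteIfExists(linkedBam);
    }

    public static void main(String[] args) throws IOException {
        // usage: java Workspace /path/to/normal.bam /path/to/outdir
        Path linked = linkInputs(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("linked BAM: " + linked);
        System.out.println("BAS output: " + basFor(linked));
    }
}
```

Because the links live in a directory the workflow controls, this pattern sidesteps Keiran's caveat about needing write access to wherever the original BAMs are mounted.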
From Denis.Yuen at oicr.on.ca Fri Oct 14 18:20:46 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Fri, 14 Oct 2016 22:20:46 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca>, <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A169F@exmb2.ad.oicr.on.ca>

Hi, Adam,

To summarise: my observations seem to match yours; the bas file and input BAMs are generated inside the Docker container with the test data.

However, Miguel has observed that something else seems to be happening with DO50311 that looks like a missing bas file. I'm currently running that donor to see if I can extract more information and determine what is occurring.
________________________________ From: Adam Struck [strucka at ohsu.edu] Sent: October 14, 2016 6:10 PM To: Denis Yuen; Keiran Raine Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, Sorry, to chime in late. The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow? Inputs are symlinked to the OUTDIR. https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039 Bas files are written to OUTDIR https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 I have now run 25 donors worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the cromwell engine on the CCC platform without an issue. -Adam From: on behalf of Denis Yuen Date: Friday, October 14, 2016 at 2:55 PM To: Keiran Raine Cc: "docktesters at lists.icgc.org" Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, Just as a heads-up for the end-of-week in this thread. > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ rm -rf ~/.cpanm This got me on the right track, I actually needed the following syntax RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \ rm -rf ~/.cpanm However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later. I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on. 
________________________________ From: Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 2:09 PM To: Denis Yuen Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, You've hit an issue that only has occurred in the last few weeks for us also. BioPerl released a new version (first in ~20 months) that split the repository moving a whole section into a different package. The fix would be to force the first install of BioPerl to a specific version. Modify line 25/26 of the Dockerfile from: RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \ rm -rf ~/.cpanm to RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ rm -rf ~/.cpanm Thankfully something I could identify immediately. Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 17:04, Denis Yuen > wrote: Hi, Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)? I'm attaching the build log from the Dockerfile and the log from inside the container. ________________________________ From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca] Sent: October 12, 2016 10:59 AM To: Keiran Raine Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, While that would have been a good explanation, unfortunately, it doesn't seem to be the case. 
In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ) , the bam files are described like inputs: tumor: type: File inputBinding: position: 1 prefix: --tumor secondaryFiles: - .bai refFrom: type: File inputBinding: position: 3 prefix: --refFrom bbFrom: type: File inputBinding: position: 4 prefix: --bbFrom normal: type: File inputBinding: position: 2 prefix: --normal secondaryFiles: - .bai The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were. If Miguel has the "docker run" output from the run (should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime. ________________________________ From: Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 10:49 AM To: Denis Yuen Cc: Miguel Vazquez; docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, I expect when you unpack the the test data the BAS files exist in the archive in that area so the fact it runs the out of that step to a different location isn't detected. Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 15:36, Denis Yuen > wrote: Hi, I can make the modification, I'll run it through the test data and that should finish in roughly a day. In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data? 
________________________________ From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 6:16 AM To: Miguel Vazquez Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, This is assuming that it is possible to write to the location the BAM are in. I think Denis would be best placed to make the minor modification as I don't know the process they are using for build and deploy of the images (I made modifications and then handed over for CWL). Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 10:37, Miguel Vazquez > wrote: [The rest of the list where out of the loop for this part of the conversation, I'm putting them back in. In short, the Sanger pipeline produces the BAS file but not co-located with the BAM] Hi Keiran, Would it be possible then to change this and try again? what needs to happen? I guess you'll need to change the code and a new docker image be produced. Would this be our best alternative? Miguel On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine > wrote: In the original version we didn't do this step, if we have write access it can be made to do that Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 11 Oct 2016, at 13:49, Miguel Vazquez > wrote: Hi Keiran, If the BAS and BAM files need to be collocated, why is it not created next to the BAM file? 
Would it not be better if it read

private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
    Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);

    File f = new File(sampleBam);

    thisJob.getCommand()
        .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
        .addArgument(installBase)
        .addArgument("bam_stats")
        .addArgument("-i " + sampleBam)
        .addArgument("-o " + sampleBam + ".bas")
        ;

    return thisJob;
}

Best
Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:

Relevant section of code:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM files and BAS aren't co-located then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently, as the BAS file data was being held in the GNOS XML data structures during the initial processing. Moving to this means that any BAM input is sufficient.
Hope this is easier to solve now,

Keiran Raine

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them?

Best
Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS, could that be? Or my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.

Best
Miguel

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
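Keiran's suggestion (symlink the BAMs into the work space, run every tool against the link, and remove the links at the end) can be sketched with `java.nio.file`. This is a minimal illustration that assumes the workspace is writable; `WorkspaceLinks`, `linkBam`, `basFor` and `unlink` are hypothetical names, not part of the CgpSomaticCore workflow code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: link input BAMs (and their .bai) into a writable
// workspace so that a generated .bas can sit beside the path tools are given.
class WorkspaceLinks {

    // Creates <workspace>/<name>.bam -> bam, plus the .bai link if it exists.
    static Path linkBam(Path bam, Path workspace) throws IOException {
        Path link = workspace.resolve(bam.getFileName());
        Files.createSymbolicLink(link, bam.toAbsolutePath());
        Path bai = Paths.get(bam.toString() + ".bai");
        if (Files.exists(bai)) {
            Files.createSymbolicLink(workspace.resolve(bai.getFileName()), bai.toAbsolutePath());
        }
        return link;
    }

    // The BAS path expected next to whatever BAM path a tool is handed.
    static Path basFor(Path bam) {
        return Paths.get(bam.toString() + ".bas");
    }

    // Removes a link; deleting a symlink leaves its target untouched.
    static void unlink(Path link) throws IOException {
        Files.deleteIfExists(link);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a staging directory and a writable workspace.
        Path staging = Files.createTempDirectory("staging");
        Path workspace = Files.createTempDirectory("workspace");
        Path bam = Files.createFile(staging.resolve("tumour.bam"));
        Files.createFile(staging.resolve("tumour.bam.bai"));

        Path link = linkBam(bam, workspace);
        System.out.println("run tools on: " + link);
        System.out.println("BAS expected at: " + basFor(link));

        // Clean-up at the end of the workflow.
        unlink(link);
        unlink(workspace.resolve("tumour.bam.bai"));
    }
}
```

Because `basFor` derives the BAS location from the path a tool receives, the BAS only ends up beside the BAM if every tool is handed the workspace link rather than the original input path.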
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From miguel.vazquez at cnio.es Sat Oct 15 02:52:19 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Sat, 15 Oct 2016 08:52:19 +0200
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A169F@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca> <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu> <27512884B2D81B41AAB7BB266248F240C09A169F@exmb2.ad.oicr.on.ca>
Message-ID: 

Just to clarify, Denis: the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and it matched Keiran's comment that a missing BAS file was consistent with the pindel error. I have no idea what the workflow was doing, so as far as I know the BAS could have been created correctly and the error was something else.
Best regards

On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen wrote:
> Hi,
>
> Adam, to summarise:
> My observations seem to match yours: the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring.
>
> ------------------------------
> *From:* Adam Struck [strucka at ohsu.edu]
> *Sent:* October 14, 2016 6:10 PM
> *To:* Denis Yuen; Keiran Raine
> *Cc:* docktesters at lists.icgc.org
> *Subject:* Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi Denis,
>
> Sorry to chime in late. The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow?
>
> Inputs are symlinked to the OUTDIR.
> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039
>
> Bas files are written to OUTDIR.
> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780
>
> I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the cromwell engine on the CCC platform without an issue.
>
> -Adam
>
> *From:* on behalf of Denis Yuen
> *Date:* Friday, October 14, 2016 at 2:55 PM
> *To:* Keiran Raine
> *Cc:* "docktesters at lists.icgc.org"
> *Subject:* Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi,
>
> Just as a heads-up for the end of the week in this thread.
>
> > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
> >     rm -rf ~/.cpanm
>
> This got me on the right track; I actually needed the following syntax:
>
> RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
>     rm -rf ~/.cpanm
>
> However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later.
>
> I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on.
> ------------------------------
> *From:* Keiran Raine [kr2 at sanger.ac.uk]
> *Sent:* October 12, 2016 2:09 PM
> *To:* Denis Yuen
> *Cc:* docktesters at lists.icgc.org
> *Subject:* Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi Denis,
>
> You've hit an issue that has only occurred in the last few weeks for us also.
>
> BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package.
>
> The fix would be to force the first install of BioPerl to a specific version. Modify line 25/26 of the Dockerfile from:
>
> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
>     rm -rf ~/.cpanm
>
> to
>
> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>     rm -rf ~/.cpanm
>
> Thankfully something I could identify immediately.
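Keiran's pin (`Bio::Root::Version@1.006924`) and the variant that worked for Denis (`CJFIELDS/BioPerl-1.6.924.tar.gz`) appear to name the same BioPerl release, written once as a Perl decimal version and once in dotted form. Under the usual Perl `version.pm` convention, the fractional digits of a decimal version are read in groups of three. The helper below is a hypothetical sketch of that conversion only; it does not reproduce every normalisation detail of `version.pm`.

```java
// Hypothetical helper: convert a Perl decimal version such as "1.006924" to
// dotted form by reading the fractional digits in groups of three.
class PerlVersion {

    static String toDotted(String decimal) {
        String[] parts = decimal.split("\\.", 2);
        StringBuilder out = new StringBuilder(String.valueOf(Integer.parseInt(parts[0])));
        if (parts.length == 2) {
            String frac = parts[1];
            // Right-pad with zeros so the fraction splits into whole groups of 3.
            while (frac.length() % 3 != 0) {
                frac += "0";
            }
            for (int i = 0; i < frac.length(); i += 3) {
                out.append('.').append(Integer.parseInt(frac.substring(i, i + 3)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDotted("1.006924")); // prints 1.6.924
    }
}
```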
>
> Regards,
>
> Keiran Raine
>
> On 12 Oct 2016, at 17:04, Denis Yuen wrote:
>
> Hi,
>
> Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section.
> Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)?
> I'm attaching the build log from the Dockerfile and the log from inside the container.
>
> ------------------------------
> *From:* docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
> *Sent:* October 12, 2016 10:59 AM
> *To:* Keiran Raine
> *Cc:* docktesters at lists.icgc.org
> *Subject:* Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi,
>
> While that would have been a good explanation, unfortunately, it doesn't seem to be the case.
> In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the bam files are described like this:
>
> inputs:
>   tumor:
>     type: File
>     inputBinding:
>       position: 1
>       prefix: --tumor
>     secondaryFiles:
>       - .bai
>
>   refFrom:
>     type: File
>     inputBinding:
>       position: 3
>       prefix: --refFrom
>   bbFrom:
>     type: File
>     inputBinding:
>       position: 4
>       prefix: --bbFrom
>   normal:
>     type: File
>     inputBinding:
>       position: 2
>       prefix: --normal
>     secondaryFiles:
>       - .bai
>
> The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were.
> If Miguel has the "docker run" output from the run (should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.
>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From Denis.Yuen at oicr.on.ca Sat Oct 15 17:10:04 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Sat, 15 Oct 2016 21:10:04 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: 
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca> <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu> <27512884B2D81B41AAB7BB266248F240C09A169F@exmb2.ad.oicr.on.ca>,
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A1713@exmb2.ad.oicr.on.ca>

Hi,

Agreed, to sum up:
1) The donor test set includes bas files. The real donor sets do not.
2) That said, the way the CWL file is written, regardless of whether a bas file is provided in the test set, the bas files don't actually make it into the docker container. Instead, they get generated inside the container while it is running.
3) The pindel step does indeed fail for DO50311 on a host that successfully ran the test data.

Keiran, some additional info for debugging.
The CWL file results in this docker invocation:

[job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \
    run \
    -i \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \
    --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1000 \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \
    python \
    /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
    --tumor \
    /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \
    --normal \
    /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \
    --refFrom \
    /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \
    --bbFrom \
    /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz

The listing of the working directory (/var/spool/cwl) is as follows and does seem to include generated bas files:

ubuntu at sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr
total 92K
-rw-r--r-- 1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini
-rw-r--r-- 1 ubuntu ubuntu 28 Oct 14 22:07 .Rprofile
drwxr-xr-x 3 ubuntu root 4.0K Oct 14 22:07 .seqware
drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts
lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam
lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai
lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai
drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 14 22:07 1
drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype
drwxr-xr-x 8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files
drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5
-rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz
-rw-r--r-- 1 ubuntu ubuntu 33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5
-rw-r--r-- 1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas
-rw-r--r-- 1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas
-rw-r--r-- 1 ubuntu ubuntu 33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5
-rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz
drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 15 10:02 timings
drwxr-xr-x 5 ubuntu ubuntu 4.0K Oct 15 10:02 0
drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts
drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 .
drwxrwxrwt 21 root root 4.0K Oct 15 20:29 ..

The full output of the failing script is:

Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz

Unknown sort order field: unknown
Collated 500000 readpairs (in 6 sec.)
[V] 1 34.4825MB/s 133279
Thread Worker 1: started
Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
Collated 500000 readpairs (in 4 sec.)
[V] 2 39.9836MB/s 154626
Thread Worker 2: started
Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
Collated 500000 readpairs (in 4 sec.)
Thread Worker 3: started
Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'
[V] 3 42.4188MB/s 164102
Collated 500000 readpairs (in 4 sec.)
Thread Worker 4: started
Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
[V] 4 43.7799MB/s 169368
Collated 500000 readpairs (in 4 sec.)
An error occurred while running:
/opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
Perl exited with active threads:
    0 running and unjoined
    3 finished and unjoined
    0 running and detached
Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115.
Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190
Command exited with non-zero status 25
10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k
1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps

Please let me know if any of the files on that host would be useful to debug this.
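Keiran's earlier caveat, that writing the BAS beside the BAM assumes the BAM's location is writable, can be checked against the invocation above: each BAM and BAI is bind-mounted individually with `:ro`, the container runs with `--read-only=true`, and only `/var/spool/cwl` and `/tmp` are mounted `:rw`. The sketch below is a simplified model of those flags (the most specific mount containing a path decides; otherwise the read-only root does), assuming no `:` inside paths and ignoring tmpfs and other mount types. `VolumeCheck` is a hypothetical name and the host paths in `main` are shortened placeholders. It suggests why a `.bas` written next to the staged BAM under `/var/lib/cwl` would not be possible in this setup while one in `/var/spool/cwl` is; it is not a diagnosis of the DO50311 pindel failure.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a simplified model of `docker run --volume=SRC:DST[:MODE]`
// combined with `--read-only=true`, used to ask whether a container path is writable.
class VolumeCheck {

    static final class Mount {
        final Path target;      // container-side path (DST)
        final boolean readOnly; // true when MODE lists "ro"

        Mount(Path target, boolean readOnly) {
            this.target = target;
            this.readOnly = readOnly;
        }
    }

    // Accepts "--volume=SRC:DST[:MODE]" or a bare "SRC:DST[:MODE]".
    static Mount parse(String flag) {
        String spec = flag.startsWith("--volume=") ? flag.substring("--volume=".length()) : flag;
        String[] parts = spec.split(":");
        boolean ro = parts.length > 2 && Arrays.asList(parts[2].split(",")).contains("ro");
        return new Mount(Paths.get(parts[1]), ro);
    }

    // The most specific mount (most path components) containing the path wins;
    // if no mount contains it, the container root filesystem decides.
    static boolean isWritable(String containerPath, List<String> volumeFlags, boolean readOnlyRoot) {
        Path path = Paths.get(containerPath);
        Mount best = null;
        for (String flag : volumeFlags) {
            Mount m = parse(flag);
            if (path.startsWith(m.target)
                    && (best == null || m.target.getNameCount() > best.target.getNameCount())) {
                best = m;
            }
        }
        return best != null ? !best.readOnly : !readOnlyRoot;
    }

    public static void main(String[] args) {
        String staged = "/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam";
        List<String> volumes = Arrays.asList(
            "--volume=/host/normal.bam:" + staged + ":ro",
            "--volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw",
            "--volume=/host/workingQriPMs:/tmp:rw");
        System.out.println(isWritable(staged + ".bas", volumes, true));  // prints false
        System.out.println(isWritable("/var/spool/cwl/fdcb1bd7cffca69d15383ca9566c58e0.bam.bas", volumes, true)); // prints true
    }
}
```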
________________________________ From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es] Sent: October 15, 2016 2:52 AM To: Denis Yuen Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Just to clarify Denis, the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and this matched Keiran comment that a missing BAS file was consistent with the pindel error.I have no idea of what the workflow was doing so as far as I know the BAS could have been created correctly and the error was something else. Best regards On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen > wrote: Hi, Adam, to summarise: My observations seem to match yours, the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring. ________________________________ From: Adam Struck [strucka at ohsu.edu] Sent: October 14, 2016 6:10 PM To: Denis Yuen; Keiran Raine Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, Sorry, to chime in late. The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow? Inputs are symlinked to the OUTDIR. 
https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039 Bas files are written to OUTDIR https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 I have now run 25 donors worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the cromwell engine on the CCC platform without an issue. -Adam From: > on behalf of Denis Yuen > Date: Friday, October 14, 2016 at 2:55 PM To: Keiran Raine > Cc: "docktesters at lists.icgc.org" > Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, Just as a heads-up for the end-of-week in this thread. > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ rm -rf ~/.cpanm This got me on the right track, I actually needed the following syntax RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \ rm -rf ~/.cpanm However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later. I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on. ________________________________ From: Keiran Raine [kr2 at sanger.ac.uk] Sent: October 12, 2016 2:09 PM To: Denis Yuen Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi Denis, You've hit an issue that only has occurred in the last few weeks for us also. BioPerl released a new version (first in ~20 months) that split the repository moving a whole section into a different package. 
The fix would be to force the first install of BioPerl to a specific version. Modify line 25/26 of the Dockerfile from: RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \ rm -rf ~/.cpanm to RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ rm -rf ~/.cpanm Thankfully something I could identify immediately. Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 On 12 Oct 2016, at 17:04, Denis Yuen > wrote: Hi, Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)? I'm attaching the build log from the Dockerfile and the log from inside the container. ________________________________ From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca] Sent: October 12, 2016 10:59 AM To: Keiran Raine Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Hi, While that would have been a good explanation, unfortunately, it doesn't seem to be the case. 
In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the bam files are described like this:

inputs:
  tumor:
    type: File
    inputBinding:
      position: 1
      prefix: --tumor
    secondaryFiles:
    - .bai

  refFrom:
    type: File
    inputBinding:
      position: 3
      prefix: --refFrom
  bbFrom:
    type: File
    inputBinding:
      position: 4
      prefix: --bbFrom
  normal:
    type: File
    inputBinding:
      position: 2
      prefix: --normal
    secondaryFiles:
    - .bai

The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 10:49 AM
To: Denis Yuen
Cc: Miguel Vazquez; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that the step writes its output to a different location isn't detected.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 15:36, Denis Yuen wrote:

Hi,

I can make the modification; I'll run it through the test data and that should finish in roughly a day.

In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?
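Part of the answer to Denis's question is which companion files sit beside each BAM: in this thread Miguel reports that the HCC1143 test data came with .bas files, while his DO50311 download had only .bam and .bai. Below is a minimal Java sketch of a preflight check that would make this visible before a long run. It is hypothetical and not part of CGP-Somatic-Docker; the class name CompanionFileCheck and the suffix list (.bai, .bas) are assumptions drawn from this thread. Per Denis's point, with `type: File` and only `.bai` under `secondaryFiles`, a host-side .bas would not be mounted into the container anyway, so the check is mainly useful on the host or in the directory where the workflow is expected to write the .bas.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical preflight check (not part of CGP-Somatic-Docker): reports which
 * companion files expected next to a BAM are missing. The suffix list is an
 * assumption based on this thread (.bai index, .bas statistics).
 */
final class CompanionFileCheck {

    static final List<String> EXPECTED_SUFFIXES = List.of(".bai", ".bas");

    private CompanionFileCheck() {}

    /** Returns the companion paths (e.g. sample.bam.bas) that do not exist beside the BAM. */
    static List<Path> missingCompanions(Path bam) {
        List<Path> missing = new ArrayList<>();
        for (String suffix : EXPECTED_SUFFIXES) {
            // sample.bam -> sample.bam.bai, sample.bam.bas in the same directory
            Path companion = bam.resolveSibling(bam.getFileName() + suffix);
            if (!Files.exists(companion)) {
                missing.add(companion);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (String arg : args) {
            Path bam = Paths.get(arg);
            List<Path> missing = missingCompanions(bam);
            System.out.println(bam + (missing.isEmpty() ? ": OK" : ": missing " + missing));
        }
    }
}
```

For example, `java CompanionFileCheck tumour.bam normal.bam` prints one line per BAM, listing any companion files that are absent.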
________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 6:16 AM
To: Miguel Vazquez
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

This is assuming that it is possible to write to the location the BAMs are in.

I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed over for CWL).

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 10:37, Miguel Vazquez wrote:

[The rest of the list were out of the loop for this part of the conversation; I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.]

Hi Keiran,

Would it be possible then to change this and try again? What needs to happen? I guess you'll need to change the code and a new docker image will have to be produced. Would this be our best alternative?

Miguel

On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine wrote:

In the original version we didn't do this step; if we have write access, it can be made to do that.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:49, Miguel Vazquez wrote:

Hi Keiran,

If the BAS and BAM files need to be co-located, why is the BAS not created next to the BAM file?
Would it not be better if it read:

private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
    Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
    File f = new File(sampleBam);
    thisJob.getCommand()
        .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
        .addArgument(installBase)
        .addArgument("bam_stats")
        .addArgument("-i " + sampleBam)
        .addArgument("-o " + sampleBam + ".bas")
        ;
    return thisJob;
}

Best

Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:

Relevant section of code:

https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM files and BAS aren't co-located then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently as the BAS file data was being held in the GNOS xml data structures during the initial processing. Moving to this means that any BAM input is sufficient.
Hope this is easier to solve now,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them?

Best

Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that does in fact work. It seems like BAS files are not gathered by GNOS; could that be? Or my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.

Best

Miguel

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
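Keiran's suggestion in this thread (symlink the BAMs into the work space, point every tool at that path, and delete the symlinks at the end) could look roughly like the Java sketch below. It is an illustration under stated assumptions, not the code Adam links in CgpSomaticCore.java: BamStaging, stage, basFor and cleanup are hypothetical names. It assumes the work space is writable, which matters here because the docker invocation Denis posted mounts every input read-only (`:ro`), while only /var/spool/cwl is mounted `:rw`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: symlink each input BAM (and its .bai, if present) into a
 * writable work space so that bam_stats can write the .bas beside the symlink,
 * then give every later tool the symlinked path. Names are illustrative only.
 */
final class BamStaging {

    private final Path workDir;
    private final List<Path> createdLinks = new ArrayList<>();

    BamStaging(Path workDir) {
        this.workDir = workDir;
    }

    /** Symlinks the BAM and, if it exists, BAM.bai into the work space; returns the staged BAM path. */
    Path stage(Path bam) throws IOException {
        Path target = bam.toAbsolutePath();
        Path stagedBam = link(target);
        Path bai = target.resolveSibling(target.getFileName() + ".bai");
        if (Files.exists(bai)) {
            link(bai);
        }
        return stagedBam;
    }

    /** Where the .bas for a staged BAM would be written: next to the symlink, not the read-only input. */
    static Path basFor(Path stagedBam) {
        return stagedBam.resolveSibling(stagedBam.getFileName() + ".bas");
    }

    /** Deletes only the symlinks this object created; the BAMs they point to are untouched. */
    void cleanup() throws IOException {
        for (Path link : createdLinks) {
            Files.deleteIfExists(link);
        }
        createdLinks.clear();
    }

    private Path link(Path target) throws IOException {
        Path link = workDir.resolve(target.getFileName());
        // Throws FileAlreadyExistsException if something with this name is already in the work space.
        Files.createSymbolicLink(link, target);
        createdLinks.add(link);
        return link;
    }
}
```

Tracking the created links means cleanup removes only what this step added, and `Files.deleteIfExists` on a symbolic link removes the link itself rather than its target.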
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters

From kr2 at sanger.ac.uk Mon Oct 17 04:25:12 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Mon, 17 Oct 2016 09:25:12 +0100
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A1713@exmb2.ad.oicr.on.ca>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca> <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu> <27512884B2D81B41AAB7BB266248F240C09A169F@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A1713@exmb2.ad.oicr.on.ca>
Message-ID:

Hi Denis,

The commands executed by the Pindel step appear to be handed the original BAM file location (/var/lib/cwl) rather than the symlinks in the output area (/var/spool/cwl). This would be the problem.
All the jobs that execute after the BAS generation should use the symlinked BAMs in the output area (although I think it's only important for pindel and brass).

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

> On 15 Oct 2016, at 22:10, Denis Yuen wrote:
>
> Hi,
>
> Agreed, to sum up:
>
> 1) The donor test set includes bas files. The real donor sets do not.
> 2) That said, the way the CWL file is written, regardless of whether a bas file is provided in the test set, they don't actually make it into the docker container. Instead, they get generated inside the container while it is running.
> 3) The pindel step does indeed fail in DO50311 on a host that successfully ran the test data.
>
> Keiran, some additional info for debugging.
>
> The CWL file results in this docker invocation:
>
> [job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \
>     run \
>     -i \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \
>     --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \
>     --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \
>     --workdir=/var/spool/cwl \
>     --read-only=true \
>     --user=1000 \
>     --env=TMPDIR=/tmp \
>     --env=HOME=/var/spool/cwl \
>     quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \
>     python \
>     /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
>     --tumor \
>     /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \
>     --normal \
>     /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \
>     --refFrom \
>     /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \
>     --bbFrom \
>     /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz
>
> The listing of the working directory (/var/spool/cwl) is as follows and does seem to include generated bas files:
>
> ubuntu@sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr
> total 92K
> -rw-r--r-- 1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini
> -rw-r--r-- 1 ubuntu ubuntu 28 Oct 14 22:07 .Rprofile
> drwxr-xr-x 3 ubuntu root 4.0K Oct 14 22:07 .seqware
> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts
> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam
> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai
> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai
> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 14 22:07 1
> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype
> drwxr-xr-x 8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files
> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5
> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz
> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5
> -rw-r--r-- 1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas
> -rw-r--r-- 1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas
> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5
> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz
> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 15 10:02 timings
> drwxr-xr-x 5 ubuntu ubuntu 4.0K Oct 15 10:02 0
> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts
> drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 .
> drwxrwxrwt 21 root root 4.0K Oct 15 20:29 ..
>
> The full output of the failing script is:
>
> Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz
>
> Unknown sort order field: unknown
> Collated 500000 readpairs (in 6 sec.)
> [V] 1 34.4825MB/s 133279
> Thread Worker 1: started
> Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
> Collated 500000 readpairs (in 4 sec.)
> [V] 2 39.9836MB/s 154626
> Thread Worker 2: started
> Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
> Collated 500000 readpairs (in 4 sec.)
> Thread Worker 3: started
> Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'
> [V] 3 42.4188MB/s 164102
> Collated 500000 readpairs (in 4 sec.)
> Thread Worker 4: started
> Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
> [V] 4 43.7799MB/s 169368
> Collated 500000 readpairs (in 4 sec.)
> An error occurred while running:
> /opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
> ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>
> Perl exited with active threads:
> 0 running and unjoined
> 3 finished and unjoined
> 0 running and detached
> Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115.
> Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2.
> at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190
>
> Command exited with non-zero status 25
> 10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k
> 1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps
>
> Please let me know if any of the files on that host would be useful to debug this.
>
>
> From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
> Sent: October 15, 2016 2:52 AM
> To: Denis Yuen
> Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org
> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Just to clarify, Denis: the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and this matched Keiran's comment that a missing BAS file was consistent with the pindel error. I have no idea what the workflow was doing, so as far as I know the BAS could have been created correctly and the error was something else.
>
> Best regards
>
> On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen wrote:
>
> Hi,
>
> Adam, to summarise:
> My observations seem to match yours: the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring.
>
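Keiran's diagnosis is that pindel is handed the /var/lib/cwl/... path, while the .bas was written beside the symlink in /var/spool/cwl (in Denis's listing, the `.bam.bas` files sit next to the symlinked BAMs there). Below is a hypothetical Java sketch of building the pindel_input_gen.pl arguments from the symlinked path and failing early if no .bas is beside it. PindelInputArgs and its methods are made-up names; the argument layout copies the failing command in the log; the OUTDIR/<input basename> convention is inferred from that listing. That the missing .bas is what triggers "Failed to get insert size for readgroup" is an inference from this thread, not something the log states.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical helper: map an input BAM mounted under /var/lib/cwl to its
 * symlink in the output directory (/var/spool/cwl), where the .bas was
 * generated, and refuse to build the pindel command if the .bas is not there.
 */
final class PindelInputArgs {

    private PindelInputArgs() {}

    /** Assumed convention: the symlink for an input BAM is OUTDIR/<basename of the input>. */
    static Path stagedBam(Path outDir, Path inputBam) {
        return outDir.resolve(inputBam.getFileName());
    }

    /** Builds the argument list for pindel_input_gen.pl, pointing -b at the symlinked BAM. */
    static List<String> command(Path outDir, Path inputBam, Path tmpDir, int threads, Path excludeBed) {
        Path bam = stagedBam(outDir, inputBam);
        Path bas = bam.resolveSibling(bam.getFileName() + ".bas");
        if (!Files.exists(bas)) {
            throw new IllegalStateException("No .bas file beside " + bam);
        }
        return List.of(
                "/usr/bin/perl", "/opt/wtsi-cgp/bin/pindel_input_gen.pl",
                "-b", bam.toString(),
                "-o", tmpDir.toString(),
                "-t", Integer.toString(threads),
                "-e", excludeBed.toString());
    }
}
```

Failing before launch trades a confusing multi-threaded error deep in the run for an immediate message naming the file that is missing.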
From francis at oicr.on.ca Mon Oct 17 08:50:41 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Mon, 17 Oct 2016 12:50:41 +0000
Subject: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
In-Reply-To: <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu>
References: <0F84ED6166CE664E8563B61CB2ECB98CCBA7F1FA@exmb2.ad.oicr.on.ca> <16D5F0B4-2278-4A78-A038-02A3C61737CF@oicr.on.ca> <6D379955-753C-4F00-A4C8-5E32C654349E@sanger.ac.uk> <3AD4BF40-3610-4BFF-BCBF-19D63BA88820@sanger.ac.uk> <3C1FBE27-69BD-4C2C-A683-0AB8541A1602@sanger.ac.uk> <689D9C27-0DC6-4FAF-A8BB-4885477E6AC9@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A1297@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A12E1@exmb2.ad.oicr.on.ca> <66DF88B3-B90D-419B-8E9E-D9864BE237FA@sanger.ac.uk> <27512884B2D81B41AAB7BB266248F240C09A165C@exmb2.ad.oicr.on.ca> <7DDCBF2C-66B0-42F2-B1CF-60508098BDD8@ohsu.edu>
Message-ID: <3EEB41D2-872B-44CC-9597-4C8FA121DB88@oicr.on.ca>

Adam,

Can you post your success with the Sanger workflow on https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data

Thank you,

@bffo

--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

On Oct 14, 2016, at 3:10 PM, Adam Struck wrote:

Hi Denis,

Sorry to chime in late. The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow?

Inputs are symlinked to the OUTDIR.

https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039

Bas files are written to OUTDIR

https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the cromwell engine on the CCC platform without an issue.
-Adam

From: on behalf of Denis Yuen
Date: Friday, October 14, 2016 at 2:55 PM
To: Keiran Raine
Cc: "docktesters at lists.icgc.org"
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

Just an end-of-week heads-up on this thread.

> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>     rm -rf ~/.cpanm

This got me on the right track; I actually needed the following syntax:

RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
    rm -rf ~/.cpanm

However, making the suggested change appears to break the workflow when running it with the test data. In short, the bas file is definitely being generated inside the Docker container, and moving it to the suggested location breaks the workflow later on. I'm currently attempting to run donor DO50311 to see if I can get more insight into what is going on.

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 2:09 PM
To: Denis Yuen
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

You've hit an issue that only started occurring for us in the last few weeks as well. BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package. The fix is to force the first install of BioPerl to a specific version. Modify lines 25/26 of the Dockerfile from:

RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
    rm -rf ~/.cpanm

to:

RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
    rm -rf ~/.cpanm

Thankfully, this was something I could identify immediately.
Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 17:04, Denis Yuen wrote:

Hi, Keiran,

I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. Has anything changed about the build dependencies (for example, a floating version that changed over time)? I'm attaching the build log from the Dockerfile and the log from inside the container.

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
Sent: October 12, 2016 10:59 AM
To: Keiran Raine
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

While that would have been a good explanation, unfortunately it doesn't seem to be the case. In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ), the bam files are described like this:

inputs:
  tumor:
    type: File
    inputBinding:
      position: 1
      prefix: --tumor
    secondaryFiles:
    - .bai
  refFrom:
    type: File
    inputBinding:
      position: 3
      prefix: --refFrom
  bbFrom:
    type: File
    inputBinding:
      position: 4
      prefix: --bbFrom
  normal:
    type: File
    inputBinding:
      position: 2
      prefix: --normal
    secondaryFiles:
    - .bai

Because the type is File (as opposed to a directory), the bam and bai files are individually mounted into the docker container while it runs, but the bas files never were. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.
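Denis's point about which files reach the container can be illustrated with a small Python sketch of how a CWL runner derives the companion files staged next to a `File` input from its `secondaryFiles` patterns. It follows the rule in the CWL v1.0 specification (a plain pattern is appended to the full path; each leading `^` first strips one extension). `staged_paths` is a hypothetical helper, not cwltool's code, and the paths are made up for illustration.

```python
import os
from typing import List


def staged_paths(primary: str, patterns: List[str]) -> List[str]:
    """Return the primary file plus the secondary files implied by
    CWL-style secondaryFiles patterns (hypothetical helper, sketch only)."""
    paths = [primary]
    for pattern in patterns:
        base = primary
        # Each leading caret removes one extension from the primary path.
        while pattern.startswith("^"):
            pattern = pattern[1:]
            base = os.path.splitext(base)[0]
        # Whatever remains of the pattern is appended to the (trimmed) path.
        paths.append(base + pattern)
    return paths


if __name__ == "__main__":
    tumor = "/inputs/tumor.bam"  # hypothetical host path
    mounted = staged_paths(tumor, [".bai"])
    print(mounted)                    # ['/inputs/tumor.bam', '/inputs/tumor.bam.bai']
    print(tumor + ".bas" in mounted)  # False: a .bas next to the BAM is not staged
```

With only `.bai` declared, a `.bas` sitting beside the BAM on the host never gets mounted. Declaring it would only help for inputs that actually ship with a BAS file, which, per this thread, the real donor sets do not.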
________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 10:49 AM
To: Denis Yuen
Cc: Miguel Vazquez; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

I expect that when you unpack the test data, the BAS files already exist in the archive in that location, so the fact that this step writes its output to a different location isn't detected.

Keiran Raine

On 12 Oct 2016, at 15:36, Denis Yuen wrote:

Hi,

I can make the modification; I'll run it through the test data, which should finish in roughly a day.

In the meantime, though, I am puzzled. Why would an issue like this affect a donor dataset but not the test data?

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 6:16 AM
To: Miguel Vazquez
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

This assumes that it is possible to write to the location the BAMs are in. I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed them over for CWL).

Regards,

Keiran Raine

On 12 Oct 2016, at 10:37, Miguel Vazquez wrote:

[The rest of the list were out of the loop for this part of the conversation, so I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.]

Hi Keiran,

Would it be possible then to change this and try again?
What needs to happen? I guess you'll need to change the code and a new docker image will have to be produced. Would this be our best alternative?

Miguel

On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine wrote:

In the original version we didn't do this step; if we have write access, it can be made to do that.

Keiran Raine

On 11 Oct 2016, at 13:49, Miguel Vazquez wrote:

Hi Keiran,

If the BAS and BAM files need to be co-located, why is the BAS file not created next to the BAM file? Would it not be better if it read:

private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
    Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
    File f = new File(sampleBam);
    thisJob.getCommand()
        .addArgument(getWorkflowBaseDir() + "/bin/wrapper.sh")
        .addArgument(installBase)
        .addArgument("bam_stats")
        .addArgument("-i " + sampleBam)
        .addArgument("-o " + sampleBam + ".bas");
    return thisJob;
}

Best

Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:

Relevant section of code:
https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM and BAS files aren't co-located, then you have a problem.
You could symlink the BAM files into the workspace and have all tools work from that path instead, deleting the symlinks at the end. This is one of the changes we had to implement differently, as the BAS file data was held in the GNOS XML data structures during the initial processing. Moving to this approach means that any BAM input is sufficient.

Hope this is easier to solve now,

Keiran Raine

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but it does not in fact seem to download any BAS files. Could you please educate me a bit on what these are and how I can create them?

Best

Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that does work. It seems like BAS files are not gathered by GNOS; could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check. Not knowing a thing about these files explains why I didn't notice.

I'll get back to you when I know more.

Best

Miguel
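Keiran's suggestion can be sketched in Python: symlink the BAMs into a writable workspace, give every tool the link path so the BAS file lands next to the path the tools see, and remove the links at the end. `link_inputs`, `bas_path` and `remove_links` are hypothetical names, and this is not the CgpSomaticCore implementation. It assumes the workspace is writable, while the original input location may be read-only.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def link_inputs(bam: str, workspace: str) -> Path:
    """Symlink a BAM (and its .bai, if present) into a writable workspace.
    Returns the symlinked BAM path that downstream tools should be given."""
    ws = Path(workspace)
    ws.mkdir(parents=True, exist_ok=True)
    src = Path(bam).absolute()
    link = ws / src.name
    for suffix in ("", ".bai"):
        target = Path(str(src) + suffix)
        dest = Path(str(link) + suffix)
        if target.exists() and not (dest.exists() or dest.is_symlink()):
            dest.symlink_to(target)
    return link


def bas_path(bam_link: Path) -> Path:
    """The BAS file is written next to the BAM path the tools are given."""
    return Path(str(bam_link) + ".bas")


def remove_links(workspace: str) -> List[Path]:
    """Delete only the symlinks in the workspace (the clean-up at the end)."""
    removed = []
    for p in Path(workspace).iterdir():
        if p.is_symlink():
            p.unlink()
            removed.append(p)
    return removed


def demo() -> Tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        inputs = Path(tmp, "inputs")  # stands in for the read-only input area
        inputs.mkdir()
        (inputs / "tumor.bam").write_bytes(b"")
        (inputs / "tumor.bam.bai").write_bytes(b"")
        work = Path(tmp, "work")      # stands in for the writable output area
        link = link_inputs(str(inputs / "tumor.bam"), str(work))
        bas = bas_path(link)
        bas.write_text("bam_stats output would go here\n")
        colocated = link.is_symlink() and bas.parent == link.parent
        removed_two = len(remove_links(str(work))) == 2
        survived = bas.exists() and (inputs / "tumor.bam").exists()
        return colocated, removed_two, survived


if __name__ == "__main__":
    print(demo())  # (True, True, True)
```

One caveat, in line with the pindel.pl problem discussed in this thread: the approach only works if downstream tools keep using the symlink path rather than resolving it back to the original location.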
From i.buchhalter at dkfz-heidelberg.de  Tue Oct 18 03:06:10 2016
From: i.buchhalter at dkfz-heidelberg.de (Ivo Buchhalter)
Date: Tue, 18 Oct 2016 09:06:10 +0200
Subject: [DOCKTESTERS] DKFZ bias filter docker
Message-ID: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de>

Dear docktesters,

I was told to contact you regarding the DKFZ bias filter docker. The docker is basically ready and has passed internal testing. The scripts and Dockerfile can be found here:
https://github.com/eilslabs/DKFZBiasFilter

Johannes and Manuel (CCed) have been working on bringing it into Dockstore (they ran into an NFS problem). Johannes is currently on vacation (until October 23rd).

Please let me know if it is sufficient to have the docker available on Dockstore some time next week. Alternatively, you could work with the scripts and Dockerfile on GitHub. If none of this works, I (or probably Manuel) can also try to get it running on Dockstore some time later this week.
Best,
Ivo

PS: Sorry to the DKFZ people: my earlier email bounced back because the email address link on the Wiki had a typo.

From francis at oicr.on.ca  Tue Oct 18 03:32:02 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 18 Oct 2016 07:32:02 +0000
Subject: [DOCKTESTERS] Fwd: DKFZ bias filter docker
References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de>
Message-ID: <2CFD3EE1-FB12-448F-8B99-391F7CE46CA2@oicr.on.ca>

Denis and Brian: this is for you (from today's tech call).

@bffo

Begin forwarded message:

From: Ivo Buchhalter
Subject: [DOCKTESTERS] DKFZ bias filter docker
Date: October 18, 2016 at 12:06:10 AM PDT
Cc: "Werner, Johannes", prinz, "Schlesner, Matthias"

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Denis.Yuen at oicr.on.ca  Tue Oct 18 11:38:00 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Tue, 18 Oct 2016 15:38:00 +0000
Subject: [DOCKTESTERS] DKFZ bias filter docker
In-Reply-To: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de>
References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca>

Hi,

Thanks for the heads-up. We can probably help out with writing the CWL descriptor for this, but I do have a few questions.

1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out?
2) Could we get a readme that describes how to use this?
3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this?

Thanks!

From i.buchhalter at dkfz-heidelberg.de  Wed Oct 19 02:19:56 2016
From: i.buchhalter at dkfz-heidelberg.de (Ivo Buchhalter)
Date: Wed, 19 Oct 2016 08:19:56 +0200
Subject: [DOCKTESTERS] DKFZ bias filter docker
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca>
References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca>
Message-ID: <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de>

Hi Denis,

Sorry for the missing information. I updated the README in the repository. I hope things are clearer now.

> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out?

The filter is part of the DKFZ workflow. Since the other workflows don't use similar filters, the DKFZ stand-alone filter was run on the complete data set after merging the calls (somatic SNV calls only).

> 2) Could we get a readme that describes how to use this?

I updated the README. I hope it's clearer now.

> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this?

I will check whether we can provide this later, but the filter generally runs in only a couple of minutes (on a typical somatic "PASS" VCF with <= 10 000 variants).

Best,
Ivo
From Denis.Yuen at oicr.on.ca  Wed Oct 19 15:33:07 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 19 Oct 2016 19:33:07 +0000
Subject: [DOCKTESTERS] DKFZ bias filter docker
In-Reply-To: <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de>
References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca>, <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca>

Hi,

Thanks, this definitely makes it clearer. I think this gives us enough information to start working on a CWL descriptor, or to assist with one being written.
From Denis.Yuen at oicr.on.ca  Wed Oct 19 16:36:39 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Wed, 19 Oct 2016 20:36:39 +0000
Subject: [DOCKTESTERS] Sanger symlinked input file issue
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A19FA@exmb2.ad.oicr.on.ca>

Hi,

FYI, I'm breaking this out into a separate thread so that it hopefully renders better in threaded email clients.

Keiran, I tried running that /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl command while pointing it at the symlinked BAM in /var/spool/cwl and, surprisingly, it works (or at least executes well past the displayed error).

Unfortunately, this command is generated, and it doesn't seem to be part of the CGP-Somatic-Docker codebase. Basically, the SeqWare code in that repo encodes this command (note that it refers to the BAMs in /var/spool/cwl):

seqware@83bd876f1b5d:/datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab/generated-scripts$ cat s58_cgpPindel_input_70.sh
#!/usr/bin/env bash
set -o errexit
set -o pipefail
export SEQWARE_SETTINGS=/var/spool/cwl/.seqware/settings
cd /datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab
/usr/bin/time /usr/bin/time --format="Wall_s %e\nUser_s %U\nSystem_s %S\nMax_kb %M" --output=/var/spool/cwl/timings/0_cgpPindel_input_2 /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/Workflow_Bundle_CgpSomaticCore/0.0.0/bin/wrapper.sh /opt/wtsi-cgp pindel.pl -p input -r /var/spool/cwl/reference_files/genome.fa -e MT,GL%,hs37d5,NC_007605 -st WGS -as GRCh37 -sp human -s /var/spool/cwl/reference_files/pindel/simpleRepeats.bed.gz -f /var/spool/cwl/reference_files/pindel/genomicRules.lst -g /var/spool/cwl/reference_files/pindel/human.GRCh37.indelCoding.bed.gz -u /var/spool/cwl/reference_files/pindel/pindel_np.gff3.gz -sf /var/spool/cwl/reference_files/pindel/softRules.lst -b /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz -o /var/spool/cwl/0/pindel -t /var/spool/cwl/7875b5196f6b8b52847f99bf370aada0.bam -n /var/spool/cwl/fdcb1bd7cffca69d15383ca9566c58e0.bam -i 2 -c 4

I think /opt/wtsi-cgp pindel.pl must then be generating the failing command (using /opt/wtsi-cgp/bin/pindel_input_gen.pl) that I listed below, which uses the original location that the symlink points to (/var/lib/cwl).

I believe the culprit is this section of pindel.pl, which must be tracing the input back to the original target of the symlink and using that instead:

# make all things that appear to be paths absolute
for (keys %opts) {
  $opts{$_} = abs_path($opts{$_}) if(defined $opts{$_} && -e $opts{$_});
}

I'm not entirely sure why this affects this particular donor, but that's what I've found so far. Should we attempt to modify pindel, or is there a configuration setting we can take advantage of?

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 17, 2016 4:25 AM
To: Denis Yuen
Cc: Miguel Vazquez; Adam Struck; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

The commands executed by the Pindel step appear to be handed the original BAM file location (/var/lib/cwl) rather than the symlinks in the output area (/var/spool/cwl). This would be the problem. All the jobs that execute after the BAS generation should use the symlinked BAMs in the output area (although I think it's only important for pindel and brass).

Regards,

Keiran Raine

On 15 Oct 2016, at 22:10, Denis Yuen wrote:

Hi,

Agreed. To sum up:

1) The donor test set includes bas files. The real donor sets do not.
2) That said, because of the way the CWL file is written, bas files don't actually make it into the docker container, regardless of whether one is provided in the test set. Instead, they are generated inside the container while it is running.
3) The pindel step does indeed fail for DO50311 on a host that successfully ran the test data.

Keiran, here is some additional info for debugging. The CWL file results in this docker invocation:

[job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \
    run \
    -i \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \
    --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \
    --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1000 \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \
    python \
    /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
    --tumor \
    /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \
    --normal \
    /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \
    --refFrom \
    /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \
    --bbFrom \
    /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz

The listing of the working directory (/var/spool/cwl) is as follows and does seem to include the generated bas files:

ubuntu@sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr
total 92K
-rw-r--r-- 1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini
-rw-r--r-- 1 ubuntu ubuntu 28 Oct 14 22:07 .Rprofile
drwxr-xr-x 3 ubuntu root 4.0K Oct 14 22:07 .seqware
drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts
lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam
lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai
lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 14 22:07 1 drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype drwxr-xr-x 8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5 -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz -rw-r--r-- 1 ubuntu ubuntu 33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5 -rw-r--r-- 1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas -rw-r--r-- 1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas -rw-r--r-- 1 ubuntu ubuntu 33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5 -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 15 10:02 timings drwxr-xr-x 5 ubuntu ubuntu 4.0K Oct 15 10:02 0 drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 . drwxrwxrwt 21 root root 4.0K Oct 15 20:29 .. The full output of the failing script is: Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz Unknown sort order field: unknown Collated 500000 readpairs (in 6 sec.) [V] 1 34.4825MB/s 133279 Thread Worker 1: started Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' Collated 500000 readpairs (in 4 sec.) 
[V] 2 39.9836MB/s 154626 Thread Worker 2: started Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' Collated 500000 readpairs (in 4 sec.) Thread Worker 3: started Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03' [V] 3 42.4188MB/s 164102 Collated 500000 readpairs (in 4 sec.) Thread Worker 4: started Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' [V] 4 43.7799MB/s 169368 Collated 500000 readpairs (in 4 sec.) An error occurred while running: /opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' Perl exited with active threads: 0 running and unjoined 3 finished and unjoined 0 running and detached Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115. Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2. at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190 Command exited with non-zero status 25 10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k 1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps Please let me know if any of the files on that host would be useful to debug this. 
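Denis's point that only the BAM and BAI files reach the container can be checked mechanically from a `docker run` line like the one above. The following is a minimal Python sketch, not part of the workflow: the function names are hypothetical, and it assumes `--volume=host:container[:mode]` arguments whose paths contain no spaces or colons.

```python
from typing import Dict, List


def mounted_targets(docker_cmd: str) -> Dict[str, str]:
    """Map each container path to its host path for every --volume=host:container[:mode]."""
    mounts: Dict[str, str] = {}
    # A whitespace split is enough because these paths contain no spaces;
    # the bare "\" line-continuation tokens are simply skipped.
    for token in docker_cmd.split():
        if token.startswith("--volume="):
            host, container = token[len("--volume="):].split(":")[:2]
            mounts[container] = host
    return mounts


def bams_without_bas(mounts: Dict[str, str]) -> List[str]:
    """Return mounted BAM paths whose sibling <bam>.bas is not also mounted."""
    return sorted(
        path for path in mounts
        if path.endswith(".bam") and path + ".bas" not in mounts
    )


if __name__ == "__main__":
    cmd = r"""docker run \
      --volume=/in/t/x.bam:/var/lib/cwl/stg1/x.bam:ro \
      --volume=/in/t/x.bam.bai:/var/lib/cwl/stg1/x.bam.bai:ro \
      --volume=/tmp/work:/var/spool/cwl:rw \
      image python run.py"""
    print(bams_without_bas(mounted_targets(cmd)))
```

Pasting the full invocation from the message above into `cmd` would, under these assumptions, list both staged BAMs, since no `.bas` volume appears in it.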
________________________________
From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
Sent: October 15, 2016 2:52 AM
To: Denis Yuen
Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Just to clarify, Denis: the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and it matched Keiran's comment that a missing BAS file was consistent with the pindel error. I have no idea what the workflow was doing, so as far as I know the BAS could have been created correctly and the error was something else.

Best regards

On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen wrote:
Hi,

Adam, to summarise:
My observations seem to match yours: the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring.

________________________________
From: Adam Struck [strucka at ohsu.edu]
Sent: October 14, 2016 6:10 PM
To: Denis Yuen; Keiran Raine
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

Sorry to chime in late. The bas file and input BAMs should already be getting co-located (see below). Where are these files ending up when you run the workflow?

Inputs are symlinked to the OUTDIR.
https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039

Bas files are written to OUTDIR:
https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the Cromwell engine on the CCC platform without an issue.

-Adam

From: on behalf of Denis Yuen
Date: Friday, October 14, 2016 at 2:55 PM
To: Keiran Raine
Cc: "docktesters at lists.icgc.org"
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

Just as a heads-up for the end of the week in this thread.

> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
> rm -rf ~/.cpanm

This got me on the right track; I actually needed the following syntax:

RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
rm -rf ~/.cpanm

However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later.

I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on.

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 2:09 PM
To: Denis Yuen
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

You've hit an issue that has only occurred for us in the last few weeks as well.

BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package.
The fix would be to force the first install of BioPerl to a specific version. Modify lines 25/26 of the Dockerfile from:

RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \
rm -rf ~/.cpanm

to

RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
rm -rf ~/.cpanm

Thankfully, this was something I could identify immediately.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 17:04, Denis Yuen wrote:

Hi,

Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section.
Has anything changed about the build dependencies (for example, a floating version that changed over time)?
I'm attaching the build log from the Dockerfile and the log from inside the container.

________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca]
Sent: October 12, 2016 10:59 AM
To: Keiran Raine
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

While that would have been a good explanation, unfortunately it doesn't seem to be the case.
In the CWL file (https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl), the bam files are described like this:

inputs:
  tumor:
    type: File
    inputBinding:
      position: 1
      prefix: --tumor
    secondaryFiles:
    - .bai

  refFrom:
    type: File
    inputBinding:
      position: 3
      prefix: --refFrom
  bbFrom:
    type: File
    inputBinding:
      position: 4
      prefix: --bbFrom
  normal:
    type: File
    inputBinding:
      position: 2
      prefix: --normal
    secondaryFiles:
    - .bai

Because the type is File (as opposed to Directory), the bam and bai files are individually mounted into the docker container while it runs, but the bas files never were. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime.

________________________________
From: Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 10:49 AM
To: Denis Yuen
Cc: Miguel Vazquez; docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi Denis,

I expect that when you unpack the test data, the BAS files exist in the archive in that area, so the fact that it writes the output of that step to a different location isn't detected.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 15:36, Denis Yuen wrote:

Hi,
I can make the modification; I'll run it through the test data, and that should finish in roughly a day.
In the meantime, though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?
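To make the staging rule concrete: under the CWL specification, each `secondaryFiles` string is a pattern applied to the primary file's path (each leading `^` strips one extension from the path, and the remainder of the pattern is appended). The minimal Python sketch below implements only that rule in isolation; it is not cwltool's code, and the function name is hypothetical. With only `.bai` listed, no pattern ever resolves to a `.bas` path, which is consistent with Denis's observation.

```python
from typing import List


def expand_secondary_files(primary: str, patterns: List[str]) -> List[str]:
    """Expand CWL secondaryFiles string patterns against a primary file path."""
    expanded = []
    for pattern in patterns:
        path = primary
        # Each leading caret removes the last extension of the basename, if it has one.
        while pattern.startswith("^"):
            pattern = pattern[1:]
            basename = path.rsplit("/", 1)[-1]
            if "." in basename:
                path = path[: path.rfind(".")]
        # Whatever remains of the pattern is appended to the (possibly trimmed) path.
        expanded.append(path + pattern)
    return expanded
```

For example, `expand_secondary_files("/data/tumour.bam", [".bai"])` yields only `/data/tumour.bam.bai`. Adding `- .bas` to the descriptor would make the pattern resolve to the `.bas` path as well; whether that is desirable here (the real donor downloads had no BAS files at all) is a separate question.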
________________________________
From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Keiran Raine [kr2 at sanger.ac.uk]
Sent: October 12, 2016 6:16 AM
To: Miguel Vazquez
Cc: docktesters at lists.icgc.org
Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference

Hi,

This is assuming that it is possible to write to the location the BAMs are in.

I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed over for CWL).

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 12 Oct 2016, at 10:37, Miguel Vazquez wrote:

[The rest of the list were out of the loop for this part of the conversation; I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.]

Hi Keiran,

Would it be possible, then, to change this and try again? What needs to happen? I guess you'll need to change the code and produce a new docker image. Would this be our best alternative?

Miguel

On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine wrote:
In the original version we didn't do this step; if we have write access, it can be made to do that.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:49, Miguel Vazquez wrote:

Hi Keiran,

If the BAS and BAM files need to be co-located, why is the BAS not created next to the BAM file?
Would it not be better if it read:

private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) {
  Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index);
  File f = new File(sampleBam);
  thisJob.getCommand()
    .addArgument(getWorkflowBaseDir()+ "/bin/wrapper.sh")
    .addArgument(installBase)
    .addArgument("bam_stats")
    .addArgument("-i " + sampleBam)
    .addArgument("-o " + sampleBam + ".bas")
    ;
  return thisJob;
}

Best

Miguel

On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine wrote:
Relevant section of code:
https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:40, Keiran Raine wrote:

Hi,

There is a step generating the BAS files:

[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh
[2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh

But if the BAM files and BAS aren't co-located, then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end.

This is one of the changes we had to implement differently, as the BAS file data was being held in the GNOS XML data structures during the initial processing. Moving to this means that any BAM input is sufficient.
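Keiran's suggested workaround (symlink the BAMs into the writable workspace, point every tool at the link, and remove the links at the end) might look like the following Python sketch. The function names and the workspace layout are hypothetical illustrations, not the CgpSomaticCore implementation.

```python
import os
from typing import Dict


def link_into_workspace(bam: str, workspace: str) -> Dict[str, str]:
    """Symlink a (possibly read-only) BAM and its .bai into a writable workspace.

    Returns the paths tools should use. The .bas path sits next to the link,
    so it is co-located with the BAM as seen from the workspace.
    """
    os.makedirs(workspace, exist_ok=True)
    link = os.path.join(workspace, os.path.basename(bam))
    for suffix in ("", ".bai"):
        if not os.path.lexists(link + suffix):
            os.symlink(bam + suffix, link + suffix)
    return {"bam": link, "bai": link + ".bai", "bas": link + ".bas"}


def remove_links(paths: Dict[str, str]) -> None:
    """Delete the BAM/BAI symlinks created above; generated files such as .bas are left alone."""
    for key in ("bam", "bai"):
        if os.path.islink(paths[key]):
            os.unlink(paths[key])
```

As the later messages in this thread show, this only helps if every downstream tool keeps using the link path rather than resolving it back to its target.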
Hope this is easier to solve now,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

On 11 Oct 2016, at 13:31, Miguel Vazquez wrote:

Keiran,

It's still downloading the files, but in fact it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them?

Best

Miguel

On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez wrote:

> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location.

I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS; could that be? Or maybe my script fails to copy them. I'll try to gather a different sample with my client and check.

Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more.

Best

Miguel

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
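Keiran's point 4 ties the very short pindel_input runs to a BAS file missing from its expected location, and the failure later in the thread reads `Failed to get insert size for readgroup`. The hypothetical Python sketch below shows how such a lookup fails when `<bam>.bas` is absent. It assumes a tab-separated BAS file with a header row containing `readgroup` and `mean_insert_size` columns; those column names are an assumption for illustration, and this is not the pindel code.

```python
import csv
import os


def insert_size_for(bam_path: str, read_group: str) -> float:
    """Look up the mean insert size for a read group in <bam>.bas (assumed layout)."""
    bas_path = bam_path + ".bas"
    if os.path.exists(bas_path):
        with open(bas_path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                if row.get("readgroup") == read_group:
                    return float(row["mean_insert_size"])
    # A missing .bas file and an unknown read group both end up here.
    raise LookupError(f"Failed to get insert size for readgroup: '{read_group}'")
```

The point of the sketch is only that the lookup is keyed on the BAM's own path: if the tool sees a different BAM path than the one the `.bas` was written beside, it fails the same way as when no `.bas` exists.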
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters

From kr2 at sanger.ac.uk Thu Oct 20 04:04:18 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Thu, 20 Oct 2016 09:04:18 +0100
Subject: [DOCKTESTERS] Sanger symlinked input file issue
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A19FA@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A19FA@exmb2.ad.oicr.on.ca>
Message-ID: <37D04046-ABED-4687-B07A-0DE0845B34BA@sanger.ac.uk>

Hi Denis,

I was afraid of this, but I thought it had been fixed before the release used in the docker image:
https://github.com/cancerit/cgpPindel/commit/13dea0d2546f932d5c521e5aa612069d2b6226ba

Unfortunately, you can't cleanly upgrade to the release with that change, as it switched from the legacy Bio::DB::Sam to Bio::DB::HTS, which changes the whole stack of tools.

I'll try to create a hotfix pulling that change back in time and add a release for it (I will have to think about this, as hubflow doesn't support that).
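The mechanism Denis traced in pindel.pl relies on Perl's `Cwd::abs_path`, which resolves symbolic links to the canonical path. It can be reproduced with Python's `os.path.realpath`. The sketch below is an illustration under that analogy, with hypothetical directory names standing in for `/var/lib/cwl` and `/var/spool/cwl`: the `.bas` written next to the workspace symlink is found through `os.path.abspath`, which keeps the link's own location, but not through the resolved path, which points back into the staging area.

```python
import os
from typing import Tuple


def bas_path(bam: str, resolve_symlinks: bool) -> str:
    """Return where a tool would look for <bam>.bas.

    resolve_symlinks=True mimics Cwd::abs_path (symlinks are followed);
    False only makes the path absolute, keeping the link's own directory.
    """
    full = os.path.realpath(bam) if resolve_symlinks else os.path.abspath(bam)
    return full + ".bas"


def bas_visibility(root: str) -> Tuple[bool, bool]:
    """Build staging/work dirs under root and report (found via link, found via target)."""
    staging = os.path.join(root, "staging")  # stands in for /var/lib/cwl/stg...
    work = os.path.join(root, "work")        # stands in for /var/spool/cwl
    os.makedirs(staging)
    os.makedirs(work)
    original = os.path.join(staging, "normal.bam")
    open(original, "w").close()
    link = os.path.join(work, "normal.bam")
    os.symlink(original, link)
    open(link + ".bas", "w").close()         # BAS generated beside the link
    return (os.path.exists(bas_path(link, resolve_symlinks=False)),
            os.path.exists(bas_path(link, resolve_symlinks=True)))
```

Calling `bas_visibility` on an empty temporary directory reports the `.bas` as visible through the link path only. This matches Denis's finding that `pindel_input_gen.pl` runs when pointed at the `/var/spool/cwl` symlink.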
Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104

> On 19 Oct 2016, at 21:36, Denis Yuen wrote:
>
> Hi,
>
> FYI, breaking this out into a thread so hopefully this renders better in threaded email clients.
>
> Keiran, I took a look at running that /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl command while pointing it at the symlinked version in /var/spool/cwl, and surprisingly it works (or at least executes well past the displayed error).
>
> Unfortunately, this command is generated and it doesn't seem to be part of the CGP-Somatic-Docker codebase.
> Basically, the SeqWare code in that repo encodes this command (note that it refers to the version in /var/spool/cwl):
>
> seqware@83bd876f1b5d:/datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab/generated-scripts$ cat s58_cgpPindel_input_70.sh
> #!/usr/bin/env bash
> set -o errexit
> set -o pipefail
>
> export SEQWARE_SETTINGS=/var/spool/cwl/.seqware/settings
> cd /datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab
> /usr/bin/time /usr/bin/time --format="Wall_s %e\nUser_s %U\nSystem_s %S\nMax_kb %M" --output=/var/spool/cwl/timings/0_cgpPindel_input_2 /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/Workflow_Bundle_CgpSomaticCore/0.0.0/bin/wrapper.sh /opt/wtsi-cgp pindel.pl -p input -r /var/spool/cwl/reference_files/genome.fa -e MT,GL%,hs37d5,NC_007605 -st WGS -as GRCh37 -sp human -s /var/spool/cwl/reference_files/pindel/simpleRepeats.bed.gz -f /var/spool/cwl/reference_files/pindel/genomicRules.lst -g /var/spool/cwl/reference_files/pindel/human.GRCh37.indelCoding.bed.gz -u /var/spool/cwl/reference_files/pindel/pindel_np.gff3.gz -sf /var/spool/cwl/reference_files/pindel/softRules.lst -b /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz -o /var/spool/cwl/0/pindel -t /var/spool/cwl/7875b5196f6b8b52847f99bf370aada0.bam -n /var/spool/cwl/fdcb1bd7cffca69d15383ca9566c58e0.bam -i 2 -c 4
>
> I think /opt/wtsi-cgp pindel.pl must then be generating the failing command (using /opt/wtsi-cgp/bin/pindel_input_gen.pl) that I listed below with the original symlinked location.
> I believe that the culprit is this section of pindel.pl, which must be tracing the input back to the original target of the symlink and using that instead.
>
> # make all things that appear to be paths absolute
> for (keys %opts) {
>   $opts{$_} = abs_path($opts{$_}) if(defined $opts{$_} && -e $opts{$_});
> }
>
> Not entirely sure why this affects this particular donor, but that's what I've found so far. Should we attempt to modify pindel, or is there a configuration setting we can take advantage of?
>
>
>
> From: Keiran Raine [kr2 at sanger.ac.uk]
> Sent: October 17, 2016 4:25 AM
> To: Denis Yuen
> Cc: Miguel Vazquez; Adam Struck; docktesters at lists.icgc.org
> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>
> Hi Denis,
>
> The commands executed by the Pindel step appear to be handed the original BAM file location (/var/lib/cwl) rather than the symlinks in the output area (/var/spool/cwl). This would be the problem.
>
> All the jobs that execute after the BAS generation should use the symlinked BAMs in the output area (although I think it's only important for pindel and brass).
>
> Regards,
>
> Keiran Raine
> Principal Bioinformatician
> Cancer Genome Project
> Wellcome Trust Sanger Institute
>
> kr2 at sanger.ac.uk
> Tel:+44 (0)1223 834244 Ext: 4983
> Office: H104
>
>> On 15 Oct 2016, at 22:10, Denis Yuen wrote:
>>
>> Hi,
>>
>> Agreed, to sum up:
>>
>> 1) The donor test set includes bas files. The real donor sets do not.
>> 2) That said, the way the CWL file is written, regardless of whether a bas file is provided in the test set, they don't actually make it into the docker container.
>> Instead, they get generated inside the container while it is running.
>> 3) The pindel step does indeed fail in DO50311 on a host that successfully ran the test data.
>>
>> Keiran, some additional info for debugging.
>>
>> The CWL file results in this docker invocation:
>>
>> [job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \
>> run \
>> -i \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \
>> --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \
>> --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \
>> --workdir=/var/spool/cwl \
>> --read-only=true \
>> --user=1000 \
>> --env=TMPDIR=/tmp \
>> --env=HOME=/var/spool/cwl \
>> quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \
>> python \
>> /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
>> --tumor \
>> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \
>> --normal \
>> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \
>> --refFrom \
>> /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \
>> --bbFrom \
>> /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz
>>
>> The listing of the working directory (/var/spool/cwl) is as follows and does seem to include generated bas files:
>>
>> ubuntu@sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr
>> total 92K
>> -rw-r--r-- 1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini
>> -rw-r--r-- 1 ubuntu ubuntu 28 Oct 14 22:07 .Rprofile
>> drwxr-xr-x 3 ubuntu root 4.0K Oct 14 22:07 .seqware
>> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts
>> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam
>> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai
>> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
>> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai
>> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 14 22:07 1
>> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype
>> drwxr-xr-x 8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files
>> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5
>> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz
>> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5
>> -rw-r--r-- 1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas
>> -rw-r--r-- 1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas
>> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5
>> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz
>> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 15 10:02 timings
>> drwxr-xr-x 5 ubuntu ubuntu 4.0K Oct 15 10:02 0
>> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts
>> drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 .
>> drwxrwxrwt 21 root root 4.0K Oct 15 20:29 ..
>>
>> The full output of the failing script is:
>>
>> Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz
>>
>> Unknown sort order field: unknown
>> Collated 500000 readpairs (in 6 sec.)
>> [V] 1 34.4825MB/s 133279
>> Thread Worker 1: started
>> Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> Collated 500000 readpairs (in 4 sec.)
>> [V] 2 39.9836MB/s 154626
>> Thread Worker 2: started
>> Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> Collated 500000 readpairs (in 4 sec.)
>> Thread Worker 3: started
>> Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'
>> [V] 3 42.4188MB/s 164102
>> Collated 500000 readpairs (in 4 sec.)
>> Thread Worker 4: started
>> Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>> [V] 4 43.7799MB/s 169368
>> Collated 500000 readpairs (in 4 sec.)
>> An error occurred while running:
>> /opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam
>> ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03''
>>
>> Perl exited with active threads:
>> 0 running and unjoined
>> 3 finished and unjoined
>> 0 running and detached
>> Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115.
>> Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2.
>> at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190
>>
>> Command exited with non-zero status 25
>> 10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k
>> 1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps
>>
>> Please let me know if any of the files on that host would be useful to debug this.
>>
>>
>> From: mikisvaz at gmail.com [mikisvaz at gmail.com] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es]
>> Sent: October 15, 2016 2:52 AM
>> To: Denis Yuen
>> Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>>
>> Just to clarify, Denis: the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and it matched Keiran's comment that a missing BAS file was consistent with the pindel error. I have no idea what the workflow was doing, so as far as I know the BAS could have been created correctly and the error was something else.
>>
>> Best regards
>>
>> On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen wrote:
>> Hi,
>>
>> Adam, to summarise:
>> My observations seem to match yours: the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring.
>>
>>
>> From: Adam Struck [strucka at ohsu.edu]
>> Sent: October 14, 2016 6:10 PM
>> To: Denis Yuen; Keiran Raine
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>>
>> Hi Denis,
>>
>> Sorry to chime in late. The bas file and input BAMs should already be getting co-located (see below). Where are these files ending up when you run the workflow?
>>
>> Inputs are symlinked to the OUTDIR.
>> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039
>>
>> Bas files are written to OUTDIR:
>> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780
>>
>> I have now run 25 donors' worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the Cromwell engine on the CCC platform without an issue.
>>
>> -Adam
>>
>> From: on behalf of Denis Yuen
>> Date: Friday, October 14, 2016 at 2:55 PM
>> To: Keiran Raine
>> Cc: "docktesters at lists.icgc.org"
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>>
>> Hi,
>>
>> Just as a heads-up for the end of the week in this thread.
>>
>> > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \
>> rm -rf ~/.cpanm
>>
>> This got me on the right track; I actually needed the following syntax:
>>
>> RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \
>> rm -rf ~/.cpanm
>>
>> However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later.
>>
>> I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on.
>>
>> From: Keiran Raine [kr2 at sanger.ac.uk]
>> Sent: October 12, 2016 2:09 PM
>> To: Denis Yuen
>> Cc: docktesters at lists.icgc.org
>> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
>>
>> Hi Denis,
>>
>> You've hit an issue that has only occurred for us in the last few weeks as well.
>> >> BioPerl released a new version (the first in ~20 months) that split the repository, moving a whole section into a different package. >> >> The fix would be to force the first install of BioPerl to a specific version. Modify lines 25/26 of the Dockerfile from: >> >> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> to >> >> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version@1.006924 Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> Thankfully, this was something I could identify immediately. >> >> Regards, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 17:04, Denis Yuen > wrote: >> >> Hi, >> >> Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. >> Has anything changed about the build dependencies (for example, a floating version that changed over time)? >> I'm attaching the build log from the Dockerfile and the log from inside the container. >> >> >> >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org ] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca ] >> Sent: October 12, 2016 10:59 AM >> To: Keiran Raine >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi, >> >> While that would have been a good explanation, unfortunately it doesn't seem to be the case.
>> In the CWL file (https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl), the bam files are described as follows: >> >> inputs: >> tumor: >> type: File >> inputBinding: >> position: 1 >> prefix: --tumor >> secondaryFiles: >> - .bai >> >> refFrom: >> type: File >> inputBinding: >> position: 3 >> prefix: --refFrom >> bbFrom: >> type: File >> inputBinding: >> position: 4 >> prefix: --bbFrom >> normal: >> type: File >> inputBinding: >> position: 2 >> prefix: --normal >> secondaryFiles: >> - .bai >> The type File (as opposed to Directory) means that, while the bam and bai files are individually mounted into the docker container while it runs, the bas files never are. If Miguel has the "docker run" output from the run (it should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime. >> >> From: Keiran Raine [kr2 at sanger.ac.uk ] >> Sent: October 12, 2016 10:49 AM >> To: Denis Yuen >> Cc: Miguel Vazquez; docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi Denis, >> >> I expect that when you unpack the test data, the BAS files already exist in the archive in that area, so the fact that it writes the output of that step to a different location isn't detected. >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 15:36, Denis Yuen > wrote: >> >> Hi, >> I can make the modification; I'll run it through the test data, and that should finish in roughly a day. >> In the meantime, though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data?
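The staging behaviour described above follows from how CWL resolves `secondaryFiles` string patterns for a `File` input: a leading `^` strips one extension from the primary path, and the rest of the pattern is appended. The following is a minimal, non-authoritative Python model of that rule, not code from Dockstore.cwl or cwltool, and the paths are made up for illustration. It shows that with only `.bai` declared, a `.bas` sitting next to the BAM is never staged. Declaring `.bas` too would stage it, but as far as I know CWL v1.0 requires every listed secondary file to be present, so that alone would not help donors whose downloads contain no BAS file.

```python
"""Sketch of CWL secondaryFiles string-pattern resolution (illustrative only)."""
from typing import List


def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply one secondaryFiles string pattern to a primary file path."""
    path = primary
    # Each leading '^' removes the last extension of the final path component.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = path.rfind(".")
        slash = path.rfind("/")
        if dot > slash:  # no extension in the basename: path is left unchanged
            path = path[:dot]
    # Whatever remains of the pattern is appended to the path.
    return path + pattern


def staged_files(primary: str, patterns: List[str]) -> List[str]:
    """Primary file followed by the secondary files a runner would stage with it."""
    return [primary] + [resolve_secondary(primary, p) for p in patterns]


tumour = "/data/tumour.bam"  # hypothetical path
print(staged_files(tumour, [".bai"]))          # what the quoted CWL declares
print(staged_files(tumour, [".bai", ".bas"]))  # hypothetical: BAS declared too
print(resolve_secondary(tumour, "^.bai"))      # '^' replaces the extension
```

The first call yields only the BAM and `tumour.bam.bai`; the `.bas` path appears only when it is declared explicitly.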
>> >> >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org ] on behalf of Keiran Raine [kr2 at sanger.ac.uk ] >> Sent: October 12, 2016 6:16 AM >> To: Miguel Vazquez >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi, >> >> This is assuming that it is possible to write to the location the BAMs are in. >> >> I think Denis would be best placed to make the minor modification, as I don't know the process they are using to build and deploy the images (I made modifications and then handed over for CWL). >> >> Regards, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 10:37, Miguel Vazquez > wrote: >> >> [The rest of the list was out of the loop for this part of the conversation; I'm putting them back in. In short, the Sanger pipeline produces the BAS file, but not co-located with the BAM.] >> >> Hi Keiran, >> >> Would it be possible, then, to change this and try again? What needs to happen? I guess you'll need to change the code and a new docker image will need to be produced. Would this be our best alternative? >> >> >> Miguel >> >> >> >> On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine > wrote: >> In the original version we didn't do this step; if we have write access, it can be made to do that. >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:49, Miguel Vazquez > wrote: >> >> Hi Keiran, >> >> If the BAS and BAM files need to be co-located, why is the BAS not created next to the BAM file?
>> >> Would it not be better if it read >> >> private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) { >> Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index); >> >> File f = new File(sampleBam); >> >> thisJob.getCommand() >> >> .addArgument(getWorkflowBaseDir()+ "/bin/wrapper.sh") >> >> .addArgument(installBase) >> >> .addArgument("bam_stats") >> >> .addArgument("-i " + sampleBam) >> >> .addArgument("-o " + sampleBam + ".bas") >> >> ; >> >> return thisJob; >> } >> >> Best >> >> Miguel >> >> On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine > wrote: >> Relevant section of code: >> >> https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:40, Keiran Raine > wrote: >> >> Hi, >> >> There is a step generating the BAS files: >> >> [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh >> [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh >> >> But if the BAM files and BAS aren't co-located then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end. >> >> This is one of the changes we had to implement differently as the BAS file data was being held in the GNOS xml data structures during the initial processing. Moving to this means that any BAM input is sufficient. 
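Keiran's suggestion (symlink the input BAMs into the writable work space, write the BAS next to the link, and have every tool use the link path) can be sketched as below. This is a hypothetical Python illustration, not code from CGP-Somatic-Docker: `link_into_workspace` and `bas_for` are made-up helper names, the two temporary directories stand in for the read-only `/var/lib/cwl` mounts and the writable `/var/spool/cwl` output area, and the files are empty placeholders rather than real BAMs. It also shows the catch Denis later traced to pindel.pl: a tool that canonicalises the path (as Perl's `Cwd::abs_path` does, or `os.path.realpath` in Python) follows the symlink back to the original BAM and looks for a BAS that is not there.

```python
"""Hypothetical sketch: co-locating BAS files via symlinks, and how realpath defeats it."""
import os
import tempfile


def link_into_workspace(bam: str, workspace: str) -> str:
    """Symlink a BAM (and its .bai, if present) into the workspace; return the link."""
    link = os.path.join(workspace, os.path.basename(bam))
    os.symlink(bam, link)
    if os.path.exists(bam + ".bai"):
        os.symlink(bam + ".bai", link + ".bai")
    return link


def bas_for(bam_path: str) -> str:
    """Path of the BAS file a downstream tool would expect next to this BAM path."""
    return bam_path + ".bas"


with tempfile.TemporaryDirectory() as root:
    inputs = os.path.join(root, "inputs")        # stands in for the read-only mount
    workspace = os.path.join(root, "workspace")  # stands in for the writable OUTDIR
    os.makedirs(inputs)
    os.makedirs(workspace)

    bam = os.path.join(inputs, "sample.bam")
    open(bam, "w").close()          # empty placeholders, not real BAM/BAI files
    open(bam + ".bai", "w").close()

    link = link_into_workspace(bam, workspace)
    open(bas_for(link), "w").close()  # the BAS step writes next to the link

    # abspath keeps the symlink, so the co-located BAS is found.
    print(os.path.exists(bas_for(os.path.abspath(link))))   # True
    # realpath follows the symlink back to the inputs directory: no BAS there.
    print(os.path.exists(bas_for(os.path.realpath(link))))  # False
```

The design point is that co-location only holds for the link path itself; any tool that resolves symlinks before deriving sibling file names needs to be given, or to keep, the link path.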
>> >> Hope this is easier to solve now, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:31, Miguel Vazquez > wrote: >> >> Keiran, >> >> It's still downloading the files, but in fact it does not seem to download any BAS files. Could you please educate me a bit on what these are and how I can create them? >> >> Best >> >> Miguel >> >> On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez > wrote: >> >> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location. >> >> >> I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS; could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check. >> >> Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more. >> >> Best >> >> Miguel >> >> >> >> >> -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
>> _______________________________________________ >> docktesters mailing list >> docktesters at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/docktesters
From kr2 at sanger.ac.uk Thu Oct 20 04:28:12 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Thu, 20 Oct 2016 09:28:12 +0100 Subject: [DOCKTESTERS] Sanger symlinked input file issue In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A19FA@exmb2.ad.oicr.on.ca> References: <27512884B2D81B41AAB7BB266248F240C09A19FA@exmb2.ad.oicr.on.ca> Message-ID: Hi Denis, Less painful than I thought.
Just update line 163 of the Dockerfile to pull this version: https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockerfile Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 > On 19 Oct 2016, at 21:36, Denis Yuen wrote: > > Hi, > > FYI, I'm breaking this out into a separate thread so that it hopefully renders better in threaded email clients. > > Keiran, I tried running that /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl command while pointing it at the symlinked version in /var/spool/cwl, and surprisingly it works (or at least executes well past the displayed error). > > Unfortunately, this command is generated, and it doesn't seem to be part of the CGP-Somatic-Docker codebase. > Basically, the SeqWare code in that repo encodes this command (note that it refers to the version in /var/spool/cwl): > > seqware at 83bd876f1b5d:/datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab/generated-scripts$ cat s58_cgpPindel_input_70.sh > #!/usr/bin/env bash > set -o errexit > set -o pipefail > > export SEQWARE_SETTINGS=/var/spool/cwl/.seqware/settings > cd /datastore/oozie-76d2e11f-9e92-44aa-841e-a7ddf70a4aab > /usr/bin/time /usr/bin/time --format="Wall_s %e\nUser_s %U\nSystem_s %S\nMax_kb %M" --output=/var/spool/cwl/timings/0_cgpPindel_input_2 /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/Workflow_Bundle_CgpSomaticCore/0.0.0/bin/wrapper.sh /opt/wtsi-cgp pindel.pl -p input -r /var/spool/cwl/reference_files/genome.fa -e MT,GL%,hs37d5,NC_007605 -st WGS -as GRCh37 -sp human -s /var/spool/cwl/reference_files/pindel/simpleRepeats.bed.gz -f /var/spool/cwl/reference_files/pindel/genomicRules.lst -g /var/spool/cwl/reference_files/pindel/human.GRCh37.indelCoding.bed.gz -u /var/spool/cwl/reference_files/pindel/pindel_np.gff3.gz -sf /var/spool/cwl/reference_files/pindel/softRules.lst -b
/var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz -o /var/spool/cwl/0/pindel -t /var/spool/cwl/7875b5196f6b8b52847f99bf370aada0.bam -n /var/spool/cwl/fdcb1bd7cffca69d15383ca9566c58e0.bam -i 2 -c 4 > > I think /opt/wtsi-cgp pindel.pl must then be generating the failing command (using /opt/wtsi-cgp/bin/pindel_input_gen.pl) that I listed below, with the original location that the symlink points to. > I believe the culprit is this section of pindel.pl, which must be tracing the input back to the original target of the symlink and using that instead. > > # make all things that appear to be paths absolute > for (keys %opts) { > $opts{$_} = abs_path($opts{$_}) if(defined $opts{$_} && -e $opts{$_}); > } > > I'm not entirely sure why this affects this particular donor, but that's what I've found so far. Should we attempt to modify pindel, or is there a configuration setting we can take advantage of? > > > > From: Keiran Raine [kr2 at sanger.ac.uk ] > Sent: October 17, 2016 4:25 AM > To: Denis Yuen > Cc: Miguel Vazquez; Adam Struck; docktesters at lists.icgc.org > Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference > > Hi Denis, > > The commands executed by the Pindel step appear to be handed the original BAM file location (/var/lib/cwl) rather than the symlinks in the output area (/var/spool/cwl). This would be the problem. > > All the jobs that execute after the BAS generation should use the symlinked BAMs in the output area (although I think it's only important for pindel and brass). > > Regards, > > Keiran Raine > Principal Bioinformatician > Cancer Genome Project > Wellcome Trust Sanger Institute > > kr2 at sanger.ac.uk > Tel:+44 (0)1223 834244 Ext: 4983 > Office: H104 > >> On 15 Oct 2016, at 22:10, Denis Yuen > wrote: >> >> Hi, >> >> Agreed, to sum up: >> >> 1) The test data set includes bas files. The real donor sets do not.
>> 2) That said, the way the CWL file is written, regardless of whether bas files are provided in the test set, they don't actually make it into the docker container. Instead, they get generated inside the container while it is running. >> 3) The pindel step does indeed fail in DO50311 on a host that successfully ran the test data. >> >> Keiran, here is some additional info for debugging. >> >> The CWL file results in this docker invocation: >> >> [job temp8674499429956656923.cwl] /tmp/tmp0NOg7v$ docker \ >> run \ >> -i \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam:ro \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/8bcaff90-24fb-4d61-8898-800a95dce3e0/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/ab975a04-937f-40fc-b3e5-40b41c2295fc/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz:ro \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/70fb518e-42c8-4fdd-b473-d3b380aafbdb/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \ >>
--volume=/home/ubuntu/CGP-Somatic-Docker-original/./datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/inputs/d3ae586e-1251-470b-bbf8-f498e5895312/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz:ro \ >> --volume=/tmp/tmp0NOg7v:/var/spool/cwl:rw \ >> --volume=/home/ubuntu/CGP-Somatic-Docker-original/datastore/launcher-ba5b9c2f-0b64-473b-8c9f-99e1153ad0de/workingQriPMs:/tmp:rw \ >> --workdir=/var/spool/cwl \ >> --read-only=true \ >> --user=1000 \ >> --env=TMPDIR=/tmp \ >> --env=HOME=/var/spool/cwl \ >> quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.0-cwl1 \ >> python \ >> /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \ >> --tumor \ >> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam \ >> --normal \ >> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam \ >> --refFrom \ >> /var/lib/cwl/stg218c30fe-3a28-4c3a-9803-253754dae462/GRCh37d5_CGP_refBundle.tar.gz \ >> --bbFrom \ >> /var/lib/cwl/stg4e55e3b3-46aa-4c4b-b4d6-3f2749498168/GRCh37d5_battenberg.tar.gz >> >> The listing of the working directory (/var/spool/cwl) is as follows and does seem to include generated bas files: >> >> ubuntu at sanger-retest:/tmp/tmp0NOg7v$ ls -alhtr >> total 92K >> -rw-r--r-- 1 ubuntu ubuntu 1.6K Oct 14 22:07 workflow.ini >> -rw-r--r-- 1 ubuntu ubuntu 28 Oct 14 22:07 .Rprofile >> drwxr-xr-x 3 ubuntu root 4.0K Oct 14 22:07 .seqware >> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 ngsCounts >> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam >> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam.bai -> /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai >> lrwxrwxrwx 1 ubuntu ubuntu 89 Oct 14 22:07 fdcb1bd7cffca69d15383ca9566c58e0.bam -> 
/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam >> lrwxrwxrwx 1 ubuntu ubuntu 93 Oct 14 22:07 7875b5196f6b8b52847f99bf370aada0.bam.bai -> /var/lib/cwl/stg2e1e9218-4ca7-4cbb-b474-dfdf6ac0b8a6/7875b5196f6b8b52847f99bf370aada0.bam.bai >> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 14 22:07 1 >> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:07 genotype >> drwxr-xr-x 8 ubuntu ubuntu 4.0K Oct 14 22:19 reference_files >> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 14 22:19 genotype_b02b4bba-6e66-44fb-a48f-38c309aaaac5 >> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz >> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 14 22:19 b02b4bba-6e66-44fb-a48f-38c309aaaac5.genotype.tar.gz.md5 >> -rw-r--r-- 1 ubuntu ubuntu 1.5K Oct 14 22:58 fdcb1bd7cffca69d15383ca9566c58e0.bam.bas >> -rw-r--r-- 1 ubuntu ubuntu 2.3K Oct 14 23:17 7875b5196f6b8b52847f99bf370aada0.bam.bas >> -rw-r--r-- 1 ubuntu ubuntu 33 Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz.md5 >> -rw-r--r-- 1 ubuntu ubuntu 2.4K Oct 15 08:23 b02b4bba-6e66-44fb-a48f-38c309aaaac5.csc_0-0-0.20161014.somatic.genotype.tar.gz >> drwxr-xr-x 2 ubuntu ubuntu 4.0K Oct 15 10:02 timings >> drwxr-xr-x 5 ubuntu ubuntu 4.0K Oct 15 10:02 0 >> drwxr-xr-x 3 ubuntu ubuntu 4.0K Oct 15 10:06 bbCounts >> drwx------ 11 ubuntu ubuntu 4.0K Oct 15 20:28 . >> drwxrwxrwt 21 root root 4.0K Oct 15 20:29 .. >> >> The full output of the failing script is: >> >> Errors from command: /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz >> >> Unknown sort order field: unknown >> Collated 500000 readpairs (in 6 sec.) 
>> [V] 1 34.4825MB/s 133279 >> Thread Worker 1: started >> Thread 1 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' >> Collated 500000 readpairs (in 4 sec.) >> [V] 2 39.9836MB/s 154626 >> Thread Worker 2: started >> Thread 2 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' >> Collated 500000 readpairs (in 4 sec.) >> Thread Worker 3: started >> Thread 3 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03' >> [V] 3 42.4188MB/s 164102 >> Collated 500000 readpairs (in 4 sec.) >> Thread Worker 4: started >> Thread 4 terminated abnormally: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' >> [V] 4 43.7799MB/s 169368 >> Collated 500000 readpairs (in 4 sec.) >> An error occurred while running: >> /opt/wtsi-cgp/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,QCFAIL,SUPPLEMENTARY T=/var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c/tmp1kNw/collate_tmp filename=/var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam >> ERROR: Converter thread error: Failed to get insert size for readgroup: 'CRUK-CI:LP6005333-DNA_C03'' >> >> Perl exited with active threads: >> 0 running and unjoined >> 3 finished and unjoined >> 0 running and detached >> Thread 2 terminated abnormally: main=HASH(0x50044b0) at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 115. >> Thread error: "/usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/lib/cwl/stge94eb32c-dc0f-4523-a88c-d1e00a322537/fdcb1bd7cffca69d15383ca9566c58e0.bam -o /var/spool/cwl/0/pindel/tmpPindel/8c0354eb-6a3e-4a98-b41c-f8add599884c -t 4 -e /var/spool/cwl/reference_files/brass/ucscHiDepth_0.01_mrg1000_no_exon_coreChrs.bed.gz" unexpectedly returned exit value 29 at (eval 410) line 13 thread 2. 
>> at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 190 >> >> Command exited with non-zero status 25 >> 10.23user 8.79system 0:32.33elapsed 58%CPU (0avgtext+0avgdata 10819936maxresident)k >> 1184inputs+8552outputs (2major+711948minor)pagefaults 0swaps >> >> Please let me know if any of the files on that host would be useful to debug this. >> >> >> From: mikisvaz at gmail.com [mikisvaz at gmail.com ] on behalf of Miguel Vazquez [miguel.vazquez at cnio.es ] >> Sent: October 15, 2016 2:52 AM >> To: Denis Yuen >> Cc: Adam Struck; Keiran Raine; docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Just to clarify Denis, the BAS file was not present in the download of the donor data, while it was present in the download of the test data. That is as much as I observed, and this matched Keiran comment that a missing BAS file was consistent with the pindel error.I have no idea of what the workflow was doing so as far as I know the BAS could have been created correctly and the error was something else. >> >> Best regards >> >> On Sat, Oct 15, 2016 at 12:20 AM, Denis Yuen > wrote: >> Hi, >> >> Adam, to summarise: >> My observations seem to match yours, the bas file and input bams are generated inside the Docker container with the test data. However, Miguel has observed that something else seems to be happening with DO50311 that looks like the bas file being missing. I'm currently running that donor to see if I can extract more information and determine what is occurring. >> >> >> From: Adam Struck [strucka at ohsu.edu ] >> Sent: October 14, 2016 6:10 PM >> To: Denis Yuen; Keiran Raine >> >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi Denis, >> >> Sorry, to chime in late. The bas file and input BAMs should be getting colocalized already (see below). Where are these files ending up when you run the workflow? 
>> >> Inputs are symlinked to the OUTDIR. >> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L1035-L1039 >> >> Bas files are written to OUTDIR >> https://github.com/adamstruck/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 >> >> I have now run 25 donors worth of data (from PRAD-UK) through this workflow using the WDL descriptor and the cromwell engine on the CCC platform without an issue. >> >> -Adam >> >> From: > on behalf of Denis Yuen > >> Date: Friday, October 14, 2016 at 2:55 PM >> To: Keiran Raine > >> Cc: "docktesters at lists.icgc.org " > >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi, >> >> Just as a heads-up for the end-of-week in this thread. >> >> > RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> This got me on the right track, I actually needed the following syntax >> >> RUN cpanm --mirror https://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install CJFIELDS/BioPerl-1.6.924.tar.gz Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> However, it looks like making the suggested change breaks the workflow when attempting to run with the test data. In short, the bas file is definitely being generated inside the Docker container. Moving it to the suggested location breaks the workflow later. >> >> I'm currently attempting to run the donor DO50311 to see if I can get more insight into what is going on. >> >> From: Keiran Raine [kr2 at sanger.ac.uk ] >> Sent: October 12, 2016 2:09 PM >> To: Denis Yuen >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi Denis, >> >> You've hit an issue that only has occurred in the last few weeks for us also. 
>> >> BioPerl released a new version (first in ~20 months) that split the repository moving a whole section into a different package. >> >> The fix would be to force the first install of BioPerl to a specific version. Modify line 25/26 of the Dockerfile from: >> >> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> to >> >> RUN cpanm --mirror http://cpan.metacpan.org -l $OPT File::ShareDir File::ShareDir::Install Bio::Root::Version at 1.006924 Const::Fast Graph && \ >> rm -rf ~/.cpanm >> >> Thankfully something I could identify immediately. >> >> Regards, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 17:04, Denis Yuen > wrote: >> >> Hi, >> >> Keiran, I'm having trouble rebuilding the Sanger docker container in what I think is an unrelated section. >> Has anything changed about the build dependencies (for example, if there is a floating version that changed over time)? >> I'm attaching the build log from the Dockerfile and the log from inside the container. >> >> >> >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org ] on behalf of Denis Yuen [Denis.Yuen at oicr.on.ca ] >> Sent: October 12, 2016 10:59 AM >> To: Keiran Raine >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi, >> >> While that would have been a good explanation, unfortunately, it doesn't seem to be the case. 
>> In the CWL file ( https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/Dockstore.cwl ) , the bam files are described like >> >> inputs: >> tumor: >> type: File >> inputBinding: >> position: 1 >> prefix: --tumor >> secondaryFiles: >> - .bai >> >> refFrom: >> type: File >> inputBinding: >> position: 3 >> prefix: --refFrom >> bbFrom: >> type: File >> inputBinding: >> position: 4 >> prefix: --bbFrom >> normal: >> type: File >> inputBinding: >> position: 2 >> prefix: --normal >> secondaryFiles: >> - .bai >> The type of File (as opposed to directory) means that while the bam and bai files are individually mounted into the docker container while it runs, the bas files never were. If Miguel has the "docker run" output from the run (should just be in the stdout of the run), we should be able to verify this by looking at what gets mounted into the container at runtime. >> >> From: Keiran Raine [kr2 at sanger.ac.uk ] >> Sent: October 12, 2016 10:49 AM >> To: Denis Yuen >> Cc: Miguel Vazquez; docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi Denis, >> >> I expect when you unpack the the test data the BAS files exist in the archive in that area so the fact it runs the out of that step to a different location isn't detected. >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 15:36, Denis Yuen > wrote: >> >> Hi, >> I can make the modification, I'll run it through the test data and that should finish in roughly a day. >> In the meantime though, I am puzzled. Why would an issue like this affect a donor dataset, but not the test data? 
>> >> >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org ] on behalf of Keiran Raine [kr2 at sanger.ac.uk ] >> Sent: October 12, 2016 6:16 AM >> To: Miguel Vazquez >> Cc: docktesters at lists.icgc.org >> Subject: Re: [DOCKTESTERS] [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference >> >> Hi, >> >> This is assuming that it is possible to write to the location the BAM are in. >> >> I think Denis would be best placed to make the minor modification as I don't know the process they are using for build and deploy of the images (I made modifications and then handed over for CWL). >> >> Regards, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 12 Oct 2016, at 10:37, Miguel Vazquez > wrote: >> >> [The rest of the list where out of the loop for this part of the conversation, I'm putting them back in. In short, the Sanger pipeline produces the BAS file but not co-located with the BAM] >> >> Hi Keiran, >> >> Would it be possible then to change this and try again? what needs to happen? I guess you'll need to change the code and a new docker image be produced. Would this be our best alternative? >> >> >> Miguel >> >> >> >> On Tue, Oct 11, 2016 at 4:07 PM, Keiran Raine > wrote: >> In the original version we didn't do this step, if we have write access it can be made to do that >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:49, Miguel Vazquez > wrote: >> >> Hi Keiran, >> >> If the BAS and BAM files need to be collocated, why is it not created next to the BAM file? 
>> >> Would it not be better if it read >> >> private Job basFileBaseJob(int tumourCount, String sampleBam, String process, int index) { >> Job thisJob = prepTimedJob(tumourCount, "basFileGenerate", process, index); >> >> File f = new File(sampleBam); >> >> thisJob.getCommand() >> >> .addArgument(getWorkflowBaseDir()+ "/bin/wrapper.sh") >> >> .addArgument(installBase) >> >> .addArgument("bam_stats") >> >> .addArgument("-i " + sampleBam) >> >> .addArgument("-o " + sampleBam + ".bas") >> >> ; >> >> return thisJob; >> } >> >> Best >> >> Miguel >> >> On Tue, Oct 11, 2016 at 2:44 PM, Keiran Raine > wrote: >> Relevant section of code: >> >> https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/blob/develop/src/main/java/io/seqware/pancancer/CgpSomaticCore.java#L769-L780 >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:40, Keiran Raine > wrote: >> >> Hi, >> >> There is a step generating the BAS files: >> >> [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_control_11-runner.sh >> [2016/10/10 07:28:37] | Running command: bash /datastore/oozie-6599f0b9-8af7-44ca-a608-03c5bbc159c6/generated-scripts/s58_basFileGenerate_tumours_12-runner.sh >> >> But if the BAM files and BAS aren't co-located then you have a problem. You could symlink the BAM files into the work space and have all tools work from that path instead, deleting the symlinks at the end. >> >> This is one of the changes we had to implement differently as the BAS file data was being held in the GNOS xml data structures during the initial processing. Moving to this means that any BAM input is sufficient. 
>> >> Hope this is easier to solve now, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 >> Office: H104 >> >> On 11 Oct 2016, at 13:31, Miguel Vazquez > wrote: >> >> Keiran, >> >> It's still downloading the files, but it does not seem to download any BAS file. Could you please educate me a bit on what these are and how I can create them? >> >> Best >> >> Miguel >> >> On Tue, Oct 11, 2016 at 2:22 PM, Miguel Vazquez > wrote: >> >> 4. It looks like the *_pindel_input_* steps run for only 22-23 seconds, which could indicate a problem with either the headers or the absence of the BAS file from the expected location. >> >> >> I think that you just revealed the problem. There are in fact no BAS files, only BAM and BAI. There were BAS files for the test data HCC1143, which is the one that in fact works. It seems like BAS files are not gathered by GNOS; could that be? Or perhaps my script fails to copy them. I'll try to gather a different sample with my client and check. >> >> Not knowing a thing about these files explains why I didn't notice. I'll get back to you when I know more. >> >> Best >> >> Miguel >> >> >> >> >> -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
>> _______________________________________________ >> docktesters mailing list >> docktesters at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/docktesters
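Keiran's workaround from this thread (symlink the BAM files into a writable workspace, have every tool read from that path so the generated BAS files end up co-located with the BAMs, and delete the symlinks at the end) could look roughly like the following in Java, the language of CgpSomaticCore.java. This is a minimal sketch, not the project's code: the class BamWorkspace, its methods, and the assumption that each index sits next to its BAM as <bam>.bai are hypothetical; the BAS name follows the sampleBam + ".bas" convention from Miguel's proposed patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of CGP-Somatic-Docker): symlinks input BAMs into a
 * writable workspace so BAS files generated there sit next to the BAM path the tools read.
 */
final class BamWorkspace implements AutoCloseable {
    private final Path workspace;
    private final List<Path> links = new ArrayList<>();

    BamWorkspace(Path workspace) throws IOException {
        // Create the workspace directory if it does not exist yet.
        this.workspace = Files.createDirectories(workspace);
    }

    /** Links the BAM (and its .bai, if present) into the workspace; returns the linked BAM path. */
    Path link(Path bam) throws IOException {
        Path target = bam.toAbsolutePath();
        Path linkedBam = Files.createSymbolicLink(workspace.resolve(target.getFileName()), target);
        links.add(linkedBam);
        // Assumption: the index lives next to the BAM as "<bam>.bai".
        Path bai = Paths.get(target + ".bai");
        if (Files.exists(bai)) {
            links.add(Files.createSymbolicLink(workspace.resolve(bai.getFileName()), bai));
        }
        return linkedBam;
    }

    /** Co-located BAS path, using the sampleBam + ".bas" naming from the proposed patch. */
    static Path basFor(Path linkedBam) {
        return Paths.get(linkedBam + ".bas");
    }

    /** Deletes only the symlinks; the original BAMs and any generated BAS files are kept. */
    @Override
    public void close() throws IOException {
        for (Path link : links) {
            Files.deleteIfExists(link); // removes the link itself, never the link target
        }
        links.clear();
    }
}
```

Opening the workspace in a try-with-resources block means the symlinks are removed even if a later step throws, while the original BAMs (and any BAS files written into the workspace) are left in place.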
From i.buchhalter at dkfz-heidelberg.de Thu Oct 20 09:48:05 2016 From: i.buchhalter at dkfz-heidelberg.de (Ivo Buchhalter) Date: Thu, 20 Oct 2016 15:48:05 +0200 Subject: [DOCKTESTERS] DKFZ bias filter docker In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca> References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca> <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca> Message-ID: Hi Denis, I just talked to Manuel. He told me that there is/was some architectural problem with the Docker I made. He and Johannes have already fixed most of it. If it works out for you, we will provide you with a CWL by the end of next week. I suppose it is better not to work on it in parallel, and I am sure you have enough on your plate. I hope that works for you! Best, Ivo On 10/19/2016 09:33 PM, Denis Yuen wrote: > Hi, > > Thanks, this definitely helps to make it more clear. > I think this will give us enough information for us to start working on a CWL descriptor or to assist one being written. > > ________________________________________ > From: Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] > Sent: October 19, 2016 2:19 AM > To: Denis Yuen; Ivo Buchhalter; docktesters at lists.icgc.org > Cc: Werner, Johannes; prinz; Schlesner, Matthias > Subject: Re: [DOCKTESTERS] DKFZ bias filter docker > > Hi Denis, > > Sorry for the missing information. I updated the README in the > repository. I hope things are more clear now. >> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? > The filter is part of the DKFZ workflow.
Since the other workflows don't > use similar filters, the DKFZ standalone filter was run on the complete > data set after merging the calls (only somatic SNV calls). >> 2) Could we get a readme that describes how to use this? > I updated the README. I hope it's more clear now. >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? > I will check if we can provide this later, but the filter generally runs > only a couple of minutes (on a normal somatic "PASS" vcf with <= 10 000 > variants). > > Best, > Ivo > > >> Thanks! >> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker. The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd). >> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub. If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week. >> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo.
> On 10/18/2016 05:38 PM, Denis Yuen wrote: >> Hi, >> >> Thanks for the heads-up; we can probably help out with writing the CWL descriptor for this, but I do have a few questions. >> >> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? >> 2) Could we get a readme that describes how to use this? >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? >> >> Thanks! >> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker. The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd). >> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub. If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week.
>> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo. From buchanae at ohsu.edu Thu Oct 20 16:08:21 2016 From: buchanae at ohsu.edu (Alexander Buchanan) Date: Thu, 20 Oct 2016 20:08:21 +0000 Subject: [DOCKTESTERS] Broad PCAWG Tokens definition? Message-ID: <4F0774A8-E3A5-4C3E-AABE-5117523E87FE@ohsu.edu> Hey Gordon, I'm new here, so maybe I missed this, but what is the tokens task? How would you describe what it does and the results it produces? Thanks! Alex Buchanan From gsaksena at broadinstitute.org Thu Oct 20 17:03:42 2016 From: gsaksena at broadinstitute.org (Gordon Saksena) Date: Thu, 20 Oct 2016 17:03:42 -0400 Subject: [DOCKTESTERS] Broad PCAWG Tokens definition? In-Reply-To: <4F0774A8-E3A5-4C3E-AABE-5117523E87FE@ohsu.edu> References: <4F0774A8-E3A5-4C3E-AABE-5117523E87FE@ohsu.edu> Message-ID: The tokens task is part of the PoN (Panel of Normals) filter. It is a Java program that collects stats on the Normal BAM for the current donor. These stats are later aggregated with stats from the other samples (in another docker), and then used to flag certain variants in a VCF as suspect (in a third docker). The overall algorithm is in the process of being published. It should be one of the more straightforward algorithms to test: it has very predictable CPU time and RAM usage, and should produce outputs that can be tested via an exact binary match. It accepts just the normal BAM for its input. I'm planning later dockers to have a similar structure, though with increased memory and core requirements. The .wdl file will continue to contain a single task, with the bulk of the pipeline wiring embedded inside the docker.
The dockers will either accept the source BAMs (for callers) or VCFs (for filters) as inputs. If you have feedback, I can incorporate it into the other dockers. Gordon On Thu, Oct 20, 2016 at 4:08 PM, Alexander Buchanan wrote: > Hey Gordon, > > > > I'm new here, so maybe I missed this, but what is the tokens task? How > would you describe what it does and the results it produces? > > > > Thanks! > > Alex Buchanan From buchanae at ohsu.edu Thu Oct 20 17:18:55 2016 From: buchanae at ohsu.edu (Alexander Buchanan) Date: Thu, 20 Oct 2016 21:18:55 +0000 Subject: [DOCKTESTERS] Broad PCAWG Tokens definition? In-Reply-To: References: <4F0774A8-E3A5-4C3E-AABE-5117523E87FE@ohsu.edu> Message-ID: Great, thanks Gordon. FYI, I just finished a successful run of the tokens wdl with a relatively small BAM file. I had to rebuild the docker image, because the one hosted on Dockerhub was missing a source file. I filed a GitHub issue for that here: https://github.com/broadinstitute/pcawg/issues/10 A rebuild of the image seemed to do the trick. I had some issues building it (now resolved), probably because my build environment was different (Mac) and/or broken (mysterious network issues with Bioconductor.org). Is there some expected input/output pair I should be using to validate that it's working as expected? I don't have any feedback on the internals of the docker images necessarily (pipette, single wdl task, etc). I think my first impression was just being a little confused looking at the repo and not knowing where to start. The docs mention Firecloud, GCE, make, etc., which doesn't match my environment, so I didn't easily know where to start. But after a day of poking around, I figured it out. I guess that's to be expected when opening up a new codebase on an unfamiliar and unreleased project.
Thanks again, Alex Buchanan From: Gordon Saksena Date: Thursday, October 20, 2016 at 2:03 PM To: Alexander Buchanan Cc: "docktesters at lists.icgc.org" Subject: Re: Broad PCAWG Tokens definition? The tokens task is part of the PoN (Panel of Normals) filter. It is a Java program that collects stats on the Normal BAM for the current donor. These stats are later aggregated with stats from the other samples (in another docker), and then used to flag certain variants in a VCF as suspect (in a third docker). The overall algorithm is in the process of being published. It should be one of the more straightforward algorithms to test: it has very predictable CPU time and RAM usage, and should produce outputs that can be tested via an exact binary match. It accepts just the normal BAM for its input. I'm planning later dockers to have a similar structure, though with increased memory and core requirements. The .wdl file will continue to contain a single task, with the bulk of the pipeline wiring embedded inside the docker. The dockers will either accept the source BAMs (for callers) or VCFs (for filters) as inputs. If you have feedback, I can incorporate it into the other dockers. Gordon On Thu, Oct 20, 2016 at 4:08 PM, Alexander Buchanan > wrote: Hey Gordon, I'm new here, so maybe I missed this, but what is the tokens task? How would you describe what it does and the results it produces? Thanks! Alex Buchanan From buchanae at ohsu.edu Thu Oct 20 17:19:56 2016 From: buchanae at ohsu.edu (Alexander Buchanan) Date: Thu, 20 Oct 2016 21:19:56 +0000 Subject: [DOCKTESTERS] Broad PCAWG Tokens definition? Message-ID: <17A0C9DA-616E-4D47-90C7-982F2A937783@ohsu.edu> I forgot to add: the tokens task is the only part that's ready for testing, correct?
From: on behalf of Alexander Buchanan Date: Thursday, October 20, 2016 at 2:18 PM To: Gordon Saksena Cc: "docktesters at lists.icgc.org" Subject: Re: [DOCKTESTERS] Broad PCAWG Tokens definition? Great, thanks Gordon. FYI, I just finished a successful run of the tokens wdl with a relatively small BAM file. I had to rebuild the docker image, because the one hosted on Dockerhub was missing a source file. I filed a GitHub issue for that here: https://github.com/broadinstitute/pcawg/issues/10 A rebuild of the image seemed to do the trick. I had some issues building it (now resolved), probably because my build environment was different (Mac) and/or broken (mysterious network issues with Bioconductor.org). Is there some expected input/output pair I should be using to validate that it's working as expected? I don't have any feedback on the internals of the docker images necessarily (pipette, single wdl task, etc). I think my first impression was just being a little confused looking at the repo and not knowing where to start. The docs mention Firecloud, GCE, make, etc., which doesn't match my environment, so I didn't easily know where to start. But after a day of poking around, I figured it out. I guess that's to be expected when opening up a new codebase on an unfamiliar and unreleased project. Thanks again, Alex Buchanan From: Gordon Saksena Date: Thursday, October 20, 2016 at 2:03 PM To: Alexander Buchanan Cc: "docktesters at lists.icgc.org" Subject: Re: Broad PCAWG Tokens definition? The tokens task is part of the PoN (Panel of Normals) filter. It is a Java program that collects stats on the Normal BAM for the current donor. These stats are later aggregated with stats from the other samples (in another docker), and then used to flag certain variants in a VCF as suspect (in a third docker). The overall algorithm is in the process of being published.
It should be one of the more straightforward algorithms to test: it has very predictable CPU time and RAM usage, and should produce outputs that can be tested via an exact binary match. It accepts just the normal BAM for its input. I'm planning later dockers to have a similar structure, though with increased memory and core requirements. The .wdl file will continue to contain a single task, with the bulk of the pipeline wiring embedded inside the docker. The dockers will either accept the source BAMs (for callers) or VCFs (for filters) as inputs. If you have feedback, I can incorporate it into the other dockers. Gordon On Thu, Oct 20, 2016 at 4:08 PM, Alexander Buchanan > wrote: Hey Gordon, I'm new here, so maybe I missed this, but what is the tokens task? How would you describe what it does and the results it produces? Thanks! Alex Buchanan From gsaksena at broadinstitute.org Thu Oct 20 17:32:50 2016 From: gsaksena at broadinstitute.org (Gordon Saksena) Date: Thu, 20 Oct 2016 17:32:50 -0400 Subject: [DOCKTESTERS] Broad PCAWG Tokens definition? In-Reply-To: <17A0C9DA-616E-4D47-90C7-982F2A937783@ohsu.edu> References: <17A0C9DA-616E-4D47-90C7-982F2A937783@ohsu.edu> Message-ID: Tokens is the only one ready for testing. Yes, I need to sync the Dockerhub repo with the code updates from Git. I am using a debian7bp environment for my builds and testing. The readme is for creating and debugging new tasks. Your test environment is probably going to be quite different. There is a .tok.gz file included in the Broad intermediate files for each donor. The intermediate files currently live on GNOS, in the form of one tarfile per donor. Either you can pull it from there, or I could post one to Jamboree for some donor. Gordon On Thu, Oct 20, 2016 at 5:19 PM, Alexander Buchanan wrote: > I forgot to add: the tokens task is the only part that's ready for > testing, correct?
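Gordon notes that the tokens output should be testable by an exact binary match, and that each donor's Broad intermediate tarfile on GNOS includes a .tok.gz file to compare against. A minimal sketch of such a check in Java follows; TokensCheck and its methods are hypothetical helpers, not part of the Broad repository. One hedged caveat: a gzip header can record a modification time, so two compressions of identical data are not guaranteed to be byte-identical; if the raw comparison fails, comparing the decompressed content (gunzip = true) helps separate a real difference from a header difference.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPInputStream;

/** Hypothetical helpers for checking a tokens output against a reference copy. */
final class TokensCheck {
    private TokensCheck() {}

    /** True if both files have identical bytes; with gunzip=true, compares the decompressed content. */
    static boolean sameContent(Path expected, Path actual, boolean gunzip) throws IOException {
        try (InputStream a = open(expected, gunzip); InputStream b = open(actual, gunzip)) {
            int x;
            do {
                x = a.read();
                if (x != b.read()) {
                    return false; // differing byte, or one stream ended before the other
                }
            } while (x != -1);
            return true;
        }
    }

    private static InputStream open(Path p, boolean gunzip) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(p));
        return gunzip ? new BufferedInputStream(new GZIPInputStream(in)) : in;
    }

    /** Lowercase hex SHA-256 of the raw file bytes. */
    static String sha256Hex(Path p) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(p)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // %x on a byte formats it as unsigned
        }
        return hex.toString();
    }
}
```

For example, TokensCheck.sameContent(Paths.get("reference.tok.gz"), Paths.get("output.tok.gz"), false) performs the strict byte-for-byte check; the file names are placeholders. sha256Hex is useful when only a checksum of the reference, rather than the file itself, can be shared.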
> > > > *From: * on behalf > of Alexander Buchanan > *Date: *Thursday, October 20, 2016 at 2:18 PM > *To: *Gordon Saksena > *Cc: *"docktesters at lists.icgc.org" > *Subject: *Re: [DOCKTESTERS] Broad PCAWG Tokens definition? > > > > Great, thanks Gordon. > > > > FYI, I just finished a successful run of the tokens wdl with a relatively > small BAM file. I had to rebuild the docker image, because the one hosted > on Dockerhub was missing a source file. I filed a GitHub issue for that > here: https://github.com/broadinstitute/pcawg/issues/10 > > > > A rebuild of the image seemed to do the trick. I had some issues building > it (now resolved), probably because my build environment was different > (Mac) and/or broken (mysterious network issues with Bioconductor.org). > > > > Is there some expected input/output pair I should be using to validate > that it's working as expected? > > > > I don't have any feedback on the internals of the docker images > necessarily (pipette, single wdl task, etc). I think my first impression > was just being a little confused looking at the repo and not knowing where > to start. The docs mention Firecloud, GCE, make, etc., which doesn't match > my environment, so I didn't easily know where to start. But after a day of > poking around, I figured it out. I guess that's to be expected when opening > up a new codebase on an unfamiliar and unreleased project. > > > > Thanks again, > > Alex Buchanan > > > > *From: *Gordon Saksena > *Date: *Thursday, October 20, 2016 at 2:03 PM > *To: *Alexander Buchanan > *Cc: *"docktesters at lists.icgc.org" > *Subject: *Re: Broad PCAWG Tokens definition? > > > > The tokens task is part of the PoN (Panel of Normals) filter. It is a > Java program that collects stats on the Normal BAM for the current donor. > These stats are later aggregated with stats from the other samples (in > another docker), and then used to flag certain variants in a VCF as suspect > (in a third docker).
The overall algorithm is in the process of being > published. > > > > It should be one of the more straightforward algorithms to test: it has > very predictable CPU time and RAM usage, and should produce outputs that > can be tested via an exact binary match. It accepts just the normal BAM > for its input. > > > > I'm planning later dockers to have a similar structure, though with > increased memory and core requirements. The .wdl file will continue to > contain a single task, with the bulk of the pipeline wiring embedded inside > the docker. The dockers will either accept the source BAMs (for callers) > or VCFs (for filters) as inputs. If you have feedback, I can incorporate it > into the other dockers. > > > > Gordon > > > > On Thu, Oct 20, 2016 at 4:08 PM, Alexander Buchanan > wrote: > > Hey Gordon, > > > > I'm new here, so maybe I missed this, but what is the tokens task? How > would you describe what it does and the results it produces? > > > > Thanks! > > Alex Buchanan From i.buchhalter at Dkfz-Heidelberg.de Wed Oct 26 09:32:52 2016 From: i.buchhalter at Dkfz-Heidelberg.de (Buchhalter, Ivo) Date: Wed, 26 Oct 2016 13:32:52 +0000 Subject: [DOCKTESTERS] DKFZ bias filter docker In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca> References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca> <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca> Message-ID: Hi Denis, Unfortunately Manuel, who was supposed to help us fix the docker, fell sick. He will be back by the middle of next week. Will it be sufficient if we submit our docker by the end of next week, or should we try to start working on it (probably with your help)? Thanks, Ivo > On 19 Oct 2016, at 21:33, Denis Yuen wrote: > > Hi, > > Thanks, this definitely helps to make it more clear.
> I think this will give us enough information for us to start working on a CWL descriptor or to assist one being written. > > ________________________________________ > From: Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] > Sent: October 19, 2016 2:19 AM > To: Denis Yuen; Ivo Buchhalter; docktesters at lists.icgc.org > Cc: Werner, Johannes; prinz; Schlesner, Matthias > Subject: Re: [DOCKTESTERS] DKFZ bias filter docker > > Hi Denis, > > Sorry for the missing information. I updated the README in the > repository. I hope things are more clear now. >> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? > The filter is part of the DKFZ workflow. Since the other workflows don't > use similar filters, the DKFZ standalone filter was run on the complete > data set after merging the calls (only somatic SNV calls). >> 2) Could we get a readme that describes how to use this? > I updated the README. I hope it's more clear now. >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? > I will check if we can provide this later, but the filter generally runs > only a couple of minutes (on a normal somatic "PASS" vcf with <= 10 000 > variants). > > Best, > Ivo > > >> Thanks! >> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker.
The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd). >> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub. If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week. >> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo. > On 10/18/2016 05:38 PM, Denis Yuen wrote: >> Hi, >> >> Thanks for the heads-up; we can probably help out with writing the CWL descriptor for this, but I do have a few questions. >> >> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? >> 2) Could we get a readme that describes how to use this? >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? >> >> Thanks!
>> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker. The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd). >> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub. If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week. >> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo.
From Denis.Yuen at oicr.on.ca Wed Oct 26 11:04:58 2016 From: Denis.Yuen at oicr.on.ca (Denis Yuen) Date: Wed, 26 Oct 2016 15:04:58 +0000 Subject: [DOCKTESTERS] DKFZ bias filter docker In-Reply-To: References: <8b96b9c2-d0a4-d8ea-b2ca-7dabfc3a785d@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A1875@exmb2.ad.oicr.on.ca> <95c07a22-98c6-8602-a6ce-f17ca178ec9b@dkfz.de> <27512884B2D81B41AAB7BB266248F240C09A19D9@exmb2.ad.oicr.on.ca>, Message-ID: <27512884B2D81B41AAB7BB266248F240C09A2111@exmb2.ad.oicr.on.ca> Hi, I think next week should be sufficient; thanks for the update! ________________________________________ From: Buchhalter, Ivo [i.buchhalter at Dkfz-Heidelberg.de] Sent: October 26, 2016 9:32 AM To: Denis Yuen Cc: Buchhalter, Ivo; docktesters at lists.icgc.org; Werner, Johannes; Prinz, Manuel; Schlesner, Matthias Subject: Re: [DOCKTESTERS] DKFZ bias filter docker Hi Denis, Unfortunately Manuel, who was supposed to help us fix the docker, fell sick. He will be back by the middle of next week. Will it be sufficient if we submit our docker by the end of next week, or should we try to start working on it (probably with your help)? Thanks, Ivo > On 19 Oct 2016, at 21:33, Denis Yuen wrote: > > Hi, > > Thanks, this definitely helps to make it more clear. > I think this will give us enough information for us to start working on a CWL descriptor or to assist one being written. > > ________________________________________ > From: Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] > Sent: October 19, 2016 2:19 AM > To: Denis Yuen; Ivo Buchhalter; docktesters at lists.icgc.org > Cc: Werner, Johannes; prinz; Schlesner, Matthias > Subject: Re: [DOCKTESTERS] DKFZ bias filter docker > > Hi Denis, > > Sorry for the missing information. I updated the README in the > repository. I hope things are more clear now.
>> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? > The filter is part of the DKFZ workflow. Since the other workflows don't > use similar filters, the DKFZ standalone filter was run on the complete > data set after merging the calls (only somatic SNV calls). >> 2) Could we get a readme that describes how to use this? > I updated the README. I hope it's more clear now. >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? > I will check if we can provide this later, but the filter generally runs > only a couple of minutes (on a normal somatic "PASS" vcf with <= 10 000 > variants). > > Best, > Ivo > > >> Thanks! >> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker. The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd). >> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub.
If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week. >> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo. > On 10/18/2016 05:38 PM, Denis Yuen wrote: >> Hi, >> >> Thanks for the heads-up; we can probably help out with writing the CWL descriptor for this, but I do have a few questions. >> >> 1) How does this relate to the existing dkfz workflow? ( https://github.com/ICGC-TCGA-PanCancer/dkfz_dockered_workflows ) Is this a step that follows that workflow or is this a portion of that workflow that has been split out? >> 2) Could we get a readme that describes how to use this? >> 3) Do you have some non-confidential small test data that we can use to run/test this quickly before running a real donor through this? >> >> Thanks! >> >> ________________________________________ >> From: docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org [docktesters-bounces+denis.yuen=oicr.on.ca at lists.icgc.org] on behalf of Ivo Buchhalter [i.buchhalter at dkfz-heidelberg.de] >> Sent: October 18, 2016 3:06 AM >> To: docktesters at lists.icgc.org >> Cc: Werner, Johannes; prinz; Schlesner, Matthias >> Subject: [DOCKTESTERS] DKFZ bias filter docker >> >> Dear docktesters, >> >> I was told to contact you regarding the DKFZ bias filter docker. The >> docker is basically ready and it also passed internal testing. The >> scripts and Dockerfile can be found here: >> https://github.com/eilslabs/DKFZBiasFilter >> Johannes and Manuel (CCed) have been working on bringing it into >> Dockstore (they ran into an NFS problem). Johannes is currently on >> vacation (until October 23rd).
>> Please let me know if it is sufficient to have the docker available on >> Dockstore some time next week. Alternatively, it might be possible for >> you to work with the scripts and Dockerfile on GitHub. If none of this >> works, I (or probably Manuel) can also try to get it running on Dockstore >> some time later this week. >> >> Best, >> Ivo >> >> PS: Sorry to the DKFZ people: the earlier email bounced back because the >> email address link on the Wiki had a typo.