From Denis.Yuen at oicr.on.ca Fri Dec 2 12:15:35 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Fri, 2 Dec 2016 17:15:35 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>

Hi,

I've gone through and created a new release for each of the core tools (bwa-mem, delly, dkfz, sanger) with two (non-scientific) changes.

1) Each of these tools has been redirected to use reference data hosted on the ICGC portal ( https://dcc.icgc.org/releases/PCAWG ) rather than AWS, in order to conserve our use of S3. This means that the sample JSON parameter files have been updated, and any references inside the Docker images that we're aware of have also been updated. As usual, if re-running workflows frequently, you'll want to host these files locally. If you are using the Dockstore command-line interface, you'll need to upgrade to the latest release of dockstore (1.2) to use this new file location.

2) A number of users have run into problems running the workflow in a multi-user environment (i.e. not running Docker containers as the first user on a host). This release replaces most usage of sudo inside the tools with gosu to deal with this issue.

The new release numbers are documented at https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
In particular:

* bwa-mem 2.6.8_1.2
* dkfz 2.0.1_cwl1.0
* sanger 2.0.3
* embl 2.0.1-cwl1.0

Denis Yuen
Bioinformatics Software Developer
Ontario Institute for Cancer Research
MaRS Centre, 661 University Avenue, Suite 510
Toronto, Ontario, Canada M5G 0A3
Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited.
If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Fri Dec 2 15:39:03 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Fri, 2 Dec 2016 20:39:03 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>

OHSU people: any chance some of these will be tested before Monday AM's call?

Miguel: can you rerun some of your successful tests with this new dockstore?

All: any testing possible?

Gordon: what is up with the Broad docker containers?

We also have a request from Peter van Loo to join this group. I said yes, but he wants to test other containers from his group, which is great! After I get some specifics from him I will add Peter and other(s) from his group. Stay tuned.

Also, please update things on the wiki/google doc as you progress.

Thank you,

@bffo

--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters

From mikisvaz at gmail.com Fri Dec 2 17:38:50 2016
From: mikisvaz at gmail.com (Miguel Vazquez)
Date: Fri, 2 Dec 2016 23:38:50 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
In-Reply-To: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
Message-ID:

Francis,

I'll rerun the test.
I'm still waiting on the Sanger wf test, which has been running for about 10 days. I'll see if it completes by Monday; otherwise I'll consider aborting it, retesting dkfz, and seeing if I can debug bwa-mem.

Best

Miguel

From buchanae at ohsu.edu Fri Dec 2 19:08:55 2016
From: buchanae at ohsu.edu (Alexander Buchanan)
Date: Sat, 3 Dec 2016 00:08:55 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
In-Reply-To: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
Message-ID: <53D2FB61-82B2-4610-A6D9-87049F8E5964@ohsu.edu>

I don't think we'll have time before Monday morning to run any of the new containers.
I'll probably try out the new dkfz container early next week though, as I have been having lots of issues with running DKFZ on Cromwell. The issues are related to read-only files being mounted and dkfz switching users to "roddy", so I'm hoping the gosu changes will help.

From buchanae at ohsu.edu Fri Dec 2 19:11:15 2016
From: buchanae at ohsu.edu (Alexander Buchanan)
Date: Sat, 3 Dec 2016 00:11:15 +0000
Subject: [DOCKTESTERS] Variant call validation results for Sanger
Message-ID: <76AFD1D3-1EEA-427A-83C2-B28A7883E317@ohsu.edu>

I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email. I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences. At this point, I don't know the source of the difference.
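A set-based comparison along these lines could be sketched as follows. This is only an illustration, not Buchanan's actual script (which is not shown in the thread): the (chrom, pos, alt) key, the function names, and the plain-text tab-separated input are all assumptions, and gzip handling and multi-allelic records are omitted for brevity.

```python
# Hypothetical sketch of a key-vs-test variant comparison.
# Variants are keyed by (chrom, pos, alt); real VCFs would need
# gzip decompression and multi-allelic handling as well.

def load_variants(vcf_lines):
    """Collect (chrom, pos, alt) keys from the data lines of a VCF."""
    variants = set()
    for line in vcf_lines:
        if line.startswith('#'):
            continue  # header lines carry no variant calls
        chrom, pos, _id, _ref, alt = line.rstrip('\n').split('\t')[:5]
        variants.add((chrom, int(pos), alt))
    return variants

def compare(key, test):
    """Report the overlap and asymmetric differences between two call sets."""
    return {
        'intersection': len(key & test),
        'key - test': len(key - test),
        'test - key': len(test - key),
    }
```

With set semantics like these, "key - test" counts calls present in the reference (key) data but missing from the test run, and "test - key" counts extra calls made only by the test run, matching the three numbers reported per donor below.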
Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.

Output from python:

python test.py
==================================================
Donor: DO50414
intersection   17395
key - test     10904
test - key      6722
==================================================
Donor: DO50415
intersection   34721
key - test     17806
test - key      8755
==================================================
Donor: DO50417
intersection   81477
key - test     39521
test - key     15959
==================================================
Donor: DO50419
intersection   82705
key - test     41674
test - key     15262
==================================================
Donor: DO50432
intersection    3941
key - test     24358
test - key    138224

Output from USeq:

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
  somatic_snv_mnv.vcf.gz  Key vcf file
  human_chromosomes.bed   Key interrogated regions file
  somatic_snv_mnv.vcf.gz  Test vcf file
  human_chromosomes.bed   Test interrogated regions file
  true   Require matching alternate bases
  false  Require matching genotypes
  false  Use record VQSLOD score as ranking statistic
  false  Exclude non PASS or . records
  true   Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
  3137454505  Interrogated bps in key
  3137454505  Interrogated bps in test
  3137454505  Interrogated bps in common
  28299  Key variants
  28299  Key variants in shared regions
  0.904886914378029  Shared key variants Ti/Tv
  24117  Test variants
  24117  Test variants in shared regions
  0.919073764621628  Shared test variants Ti/Tv

QUALThreshold  NumMatchTest  NumNonMatchTest  FDR=nonMatchTest/(matchTest+nonMatchTest)  decreasingFDR  TPR=matchTest/totalKey  FPR=nonMatchTest/totalKey  PPV=matchTest/(matchTest+nonMatchTest)
none  17395  6722  0.27872455  0.27872455  0.614686  0.2375349  0.72127545

Done! 4 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
  somatic_snv_mnv.vcf.gz  Key vcf file
  human_chromosomes.bed   Key interrogated regions file
  somatic_snv_mnv.vcf.gz  Test vcf file
  human_chromosomes.bed   Test interrogated regions file
  true   Require matching alternate bases
  false  Require matching genotypes
  false  Use record VQSLOD score as ranking statistic
  false  Exclude non PASS or . records
  true   Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
  3137454505  Interrogated bps in key
  3137454505  Interrogated bps in test
  3137454505  Interrogated bps in common
  52527  Key variants
  52527  Key variants in shared regions
  0.9612067356158758  Shared key variants Ti/Tv
  43476  Test variants
  43476  Test variants in shared regions
  0.9573203673689897  Shared test variants Ti/Tv

QUALThreshold  NumMatchTest  NumNonMatchTest  FDR=nonMatchTest/(matchTest+nonMatchTest)  decreasingFDR  TPR=matchTest/totalKey  FPR=nonMatchTest/totalKey  PPV=matchTest/(matchTest+nonMatchTest)
none  34721  8755  0.20137547  0.20137547  0.6610124  0.16667618  0.7986245

Done! 4 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
  somatic_snv_mnv.vcf.gz  Key vcf file
  human_chromosomes.bed   Key interrogated regions file
  somatic_snv_mnv.vcf.gz  Test vcf file
  human_chromosomes.bed   Test interrogated regions file
  true   Require matching alternate bases
  false  Require matching genotypes
  false  Use record VQSLOD score as ranking statistic
  false  Exclude non PASS or . records
  true   Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
  3137454505  Interrogated bps in key
  3137454505  Interrogated bps in test
  3137454505  Interrogated bps in common
  120998  Key variants
  120998  Key variants in shared regions
  0.9540073962824799  Shared key variants Ti/Tv
  97436  Test variants
  97436  Test variants in shared regions
  0.9392950261728001  Shared test variants Ti/Tv

QUALThreshold  NumMatchTest  NumNonMatchTest  FDR=nonMatchTest/(matchTest+nonMatchTest)  decreasingFDR  TPR=matchTest/totalKey  FPR=nonMatchTest/totalKey  PPV=matchTest/(matchTest+nonMatchTest)
none  81477  15959  0.16378957  0.16378957  0.6733748  0.13189474  0.8362104

Done! 4 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
  somatic_snv_mnv.vcf.gz  Key vcf file
  human_chromosomes.bed   Key interrogated regions file
  somatic_snv_mnv.vcf.gz  Test vcf file
  human_chromosomes.bed   Test interrogated regions file
  true   Require matching alternate bases
  false  Require matching genotypes
  false  Use record VQSLOD score as ranking statistic
  false  Exclude non PASS or . records
  true   Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
  3137454505  Interrogated bps in key
  3137454505  Interrogated bps in test
  3137454505  Interrogated bps in common
  124379  Key variants
  124379  Key variants in shared regions
  0.9678664662605807  Shared key variants Ti/Tv
  97967  Test variants
  97967  Test variants in shared regions
  0.9450632358488693  Shared test variants Ti/Tv

QUALThreshold  NumMatchTest  NumNonMatchTest  FDR=nonMatchTest/(matchTest+nonMatchTest)  decreasingFDR  TPR=matchTest/totalKey  FPR=nonMatchTest/totalKey  PPV=matchTest/(matchTest+nonMatchTest)
none  82705  15262  0.15578716  0.15578716  0.66494346  0.1227056  0.84421283

Done! 3 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
  somatic_snv_mnv.vcf.gz  Key vcf file
  human_chromosomes.bed   Key interrogated regions file
  somatic_snv_mnv.vcf.gz  Test vcf file
  human_chromosomes.bed   Test interrogated regions file
  true   Require matching alternate bases
  false  Require matching genotypes
  false  Use record VQSLOD score as ranking statistic
  false  Exclude non PASS or . records
  true   Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
  3137454505  Interrogated bps in key
  3137454505  Interrogated bps in test
  3137454505  Interrogated bps in common
  28299  Key variants
  28299  Key variants in shared regions
  0.904886914378029  Shared key variants Ti/Tv
  142165  Test variants
  142165  Test variants in shared regions
  0.9905488658639037  Shared test variants Ti/Tv

QUALThreshold  NumMatchTest  NumNonMatchTest  FDR=nonMatchTest/(matchTest+nonMatchTest)  decreasingFDR  TPR=matchTest/totalKey  FPR=nonMatchTest/totalKey  PPV=matchTest/(matchTest+nonMatchTest)
none  3940  138225  0.97228575  0.97228575  0.13922754  4.884448  0.027714275

Done! 4 seconds
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From miguel.vazquez at cnio.es Sat Dec 3 06:10:36 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Sat, 3 Dec 2016 12:10:36 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID:

Denis,

I've tried updating to the latest dockstore and ran into a bug.

ubuntu at ip-10-253-35-14:~$ dockstore
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: io/cwl/avro/CWL$GsonBuildException
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: io.cwl.avro.CWL$GsonBuildException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
        at java.util.zip.ZipFile.read(Native Method)
        at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
        at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
        at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
        at sun.misc.Resource.getBytes(Resource.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        ... 13 more

What I did was move the old version of dockstore from ~/bin/dockstore to ~/bin/dockstore.old and download the new version from https://github.com/ga4gh/dockstore/releases/download/1.1/dockstore
When I ran the new dockstore, it downloaded https://seqwaremaven.oicr.on.ca/artifactory/collab-release/io/dockstore/dockstore-client/1.1/dockstore-client-1.1.jar and when I then tried running dockstore --version I got the error above. Anything I can do?

Best regards

Miguel

From francis at oicr.on.ca Sat Dec 3 18:48:10 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Sat, 3 Dec 2016 23:48:10 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID:

Anybody else with these problems?

@bffo

On Dec 3, 2016, at 06:10, Miguel Vazquez wrote:

Denis, I've tried updating to the latest dockstore and ran into a bug.
From miguel.vazquez at cnio.es Mon Dec 5 07:41:58 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 5 Dec 2016 13:41:58 +0100
Subject: [DOCKTESTERS] Sanger pipeline completed, small discrepancies with GNOS VCF (99.9905% accuracy)
Message-ID:

Dear all,

The Sanger pipeline completed, after about two weeks of computing, for donor DO50311. The results are the following:

Comparison for DO50311 using Sanger ---
Common: 156299
Extra: 1 - Example: Y:58885197:G
Missing: 14 - Example: 1:102887902:T, 1:143165228:G, 16:87047601:C

The donor results for DKFZ yielded:

Comparison for DO50311 using DKFZ ---
Common: 51087
Extra: 0
Missing: 0

In both cases I'm comparing against the VCF file downloaded from GNOS.
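For reference, one way to turn counts like these into an overall agreement figure is to divide the common calls by the union of both call sets. This formula is an assumption for illustration; the quoted 99.9905% may have been computed with a slightly different denominator or rounding.

```python
# Hypothetical agreement calculation from common/extra/missing counts.
def agreement(common, extra, missing):
    """Fraction of all distinct calls (union of both sets) shared by both."""
    return common / (common + extra + missing)

# Sanger vs GNOS for DO50311, using the counts reported above: ~0.9999
sanger = agreement(156299, 1, 14)
# DKFZ vs GNOS for DO50311: exact agreement
dkfz = agreement(51087, 0, 0)
```

Under this definition the Sanger run agrees with the GNOS VCF on all but 15 of roughly 156k calls, consistent with the "small discrepancies" in the subject line.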
I've updated the information here https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data Best regards Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From Denis.Yuen at oicr.on.ca Mon Dec 5 09:18:58 2016 From: Denis.Yuen at oicr.on.ca (Denis Yuen) Date: Mon, 5 Dec 2016 14:18:58 +0000 Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu In-Reply-To: References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> , Message-ID: <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca> Hi, Having some difficulty reproducing this. It's possible that a jar file was corrupted in transit, try deleting ~/.dockstore/self-installs/ and retrying. ________________________________ From: Francis Ouellette Sent: December 3, 2016 6:48 PM To: docktesters at lists.icgc.org Cc: Denis Yuen; Miguel Vazquez Subject: Re: [DOCKTESTERS] Core workflow icgc reference data location and gosu Anybody else with these problems? @bffo On Dec 3, 2016, at 06:10, Miguel Vazquez > wrote: Denis, I've tried updating to the latest dockstore and ran into a bug. 
ubuntu@ip-10-253-35-14:~$ dockstore
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: io/cwl/avro/CWL$GsonBuildException
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: io.cwl.avro.CWL$GsonBuildException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
        at java.util.zip.ZipFile.read(Native Method)
        at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
        at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
        at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
        at sun.misc.Resource.getBytes(Resource.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        ... 13 more

What I did: I moved the old version of dockstore from ~/bin/dockstore to ~/bin/dockstore.old and downloaded the new version from https://github.com/ga4gh/dockstore/releases/download/1.1/dockstore I ran the new dockstore; it downloaded https://seqwaremaven.oicr.on.ca/artifactory/collab-release/io/dockstore/dockstore-client/1.1/dockstore-client-1.1.jar and then when I tried running dockstore --version I got the error above. Anything I can do? Best regards Miguel

On Fri, Dec 2, 2016 at 6:15 PM, Denis Yuen wrote:
> Hi, I've gone through and created a new release for each of the core tools (bwa-mem, delly, dkfz, sanger) with two (non-scientific) changes. [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL: From francis at oicr.on.ca Mon Dec 5 09:38:01 2016 From: francis at oicr.on.ca (Francis Ouellette) Date: Mon, 5 Dec 2016 14:38:01 +0000 Subject: [DOCKTESTERS] Joining PCAWG testing WG In-Reply-To: <70079FBF-6A3A-4AC3-A712-FDCB8B65713A@oicr.on.ca> References: <2F7E0094-10A8-4C90-89DB-B89A6AFB60DE@oicr.on.ca> <70079FBF-6A3A-4AC3-A712-FDCB8B65713A@oicr.on.ca> Message-ID: Hi Peter, You and Jonas have been added to the docktester list. I will send you recent docktester messages. @bffo -- B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette On Dec 2, 2016, at 3:40 PM, Francis Ouellette wrote: Yes, I will add you to the mailing list, and we don't have much of a TC schedule. We do most of our work via the mailing list and wiki.
cheers, @bffo -- B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette On Dec 2, 2016, at 3:38 PM, Peter Van Loo wrote: Hi Francis, Yes, Jonas and me! Thanks! Can you add us to the relevant mailing lists and let us know when your TCs are? Cheers, Peter -- Peter Van Loo, PhD Winton Group Leader - Cancer Genomics The Francis Crick Institute 1 Midland Road London NW1 1AT http://www.crick.ac.uk/peter-van-loo From: Francis Ouellette Date: Friday, 2 December 2016 20:36 To: Peter Van Loo Cc: Kyle Ellrott, Paul Spellman, Jonas Demeulemeester Subject: Re: Joining PCAWG testing WG Sorry for the late reply, Peter; the answer is yes, of course, but who is "we"? Jonas and you? @bffo -- B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette On Nov 30, 2016, at 3:57 PM, Peter Van Loo wrote: Hi Francis, Just to chase up on this again: could we join the PCAWG testing WG? We're setting up to run the main PCAWG pipelines through a series of simulated BAM files from the SMC-Het DREAM challenge, to be able to characterise in detail how well we can detect subclones. It would be great if we could work with the PCAWG testing WG to get everything to run! Thanks! Cheers, Peter -- Peter Van Loo, PhD Winton Group Leader - Cancer Genomics The Francis Crick Institute 1 Midland Road London NW1 1AT http://www.crick.ac.uk/peter-van-loo The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From miguel.vazquez at cnio.es Mon Dec 5 10:05:15 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 5 Dec 2016 16:05:15 +0100 Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca> References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca> Message-ID: Thanks Denis, your suggestion of removing the jar files worked. Best regards Miguel

On Mon, Dec 5, 2016 at 3:18 PM, Denis Yuen wrote:
> Hi,
> Having some difficulty reproducing this.
> It's possible that a jar file was corrupted in transit, try deleting
> ~/.dockstore/self-installs/ and retrying.
[...]

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From miguel.vazquez at cnio.es Mon Dec 5 12:16:03 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 5 Dec 2016 18:16:03 +0100 Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu In-Reply-To: References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca> Message-ID: By the way Denis, I now get this message when running dockstore:

cwltool version is 1.0.20161202203310 , Dockstore is tested with 1.0.20161114152756 Override and run with `--script`

I've added the --script to the command line and it works, but just so you know. Miguel

On Mon, Dec 5, 2016 at 4:05 PM, Miguel Vazquez wrote:
> Thanks Denis, your suggestion of removing the jar files worked.
[...]

-------------- next part -------------- An HTML attachment was scrubbed... URL: From Denis.Yuen at oicr.on.ca Mon Dec 5 12:47:40 2016 From: Denis.Yuen at oicr.on.ca (Denis Yuen) Date: Mon, 5 Dec 2016 17:47:40 +0000 Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu In-Reply-To: References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca> <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca> , Message-ID: <27512884B2D81B41AAB7BB266248F240C09A617F@exmb2.ad.oicr.on.ca> Hi, Yup, we've added that warning because we test with a specific version of cwltool.
It looks like you have a newer version of cwltool, which is probably OK (assuming you haven't run into any problems). The warning is aimed more at users of older versions of cwltool than at newer ones. Denis Yuen Bioinformatics Software Developer Ontario Institute for Cancer Research
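The warning Miguel quotes in this thread comes from comparing the installed cwltool against the version Dockstore was tested with. The check can be sketched as follows (version strings are the ones reported in the thread; the function itself is an illustration, not Dockstore's actual implementation):

```python
TESTED_CWLTOOL = "1.0.20161114152756"  # version Dockstore 1.2 is tested against, per this thread

def cwltool_warning(installed, tested=TESTED_CWLTOOL):
    """Return the warning text shown when the installed cwltool does not
    exactly match the pinned version; None when the versions match."""
    if installed == tested:
        return None
    return ("cwltool version is %s , Dockstore is tested with %s "
            "Override and run with `--script`" % (installed, tested))

print(cwltool_warning("1.0.20161202203310") is None)  # False: mismatch, warning is shown
print(cwltool_warning("1.0.20161114152756") is None)  # True: exact match, no warning
```

As Denis notes, a newer cwltool is usually fine; passing `--script` simply overrides the check.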
-------------- next part -------------- An HTML attachment was scrubbed... URL: From buchanae at ohsu.edu Mon Dec 5 13:21:33 2016 From: buchanae at ohsu.edu (Alexander Buchanan) Date: Mon, 5 Dec 2016 18:21:33 +0000 Subject: [DOCKTESTERS] Variant call validation results for Sanger Message-ID: Regarding the validation results I posted last Friday, we think these poor results are likely due to an upstream issue and not the Sanger workflow itself. Those variant call results were from a larger process including fastq prep, alignment, and then Sanger variant calling, and we think we introduced a problem early on during fastq prep.
We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match). One example output from USeq: 82486 Key variants 82486 Key variants in shared regions 0.953626071716167 Shared key variants Ti/Tv 82482 Test variants 82482 Test variants in shared regions 0.9536238749407864 Shared test variants Ti/Tv QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest) none 82479 3 3.6371574E-5 3.6371574E-5 0.9999151 3.636981E-5 0.99996364 From: on behalf of Alexander Buchanan Date: Friday, December 2, 2016 at 4:11 PM To: "docktesters at lists.icgc.org" Subject: [DOCKTESTERS] Variant call validation results for Sanger I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email. I also wrote a simple comparison script, similar to what Miguel is doing (but in Python), and it also reports differences. At this point, I don't know the source of the difference. Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.
Output from python: python test.py
==================================================
Donor: DO50414   intersection 17395   key - test 10904    test - key 6722
==================================================
Donor: DO50415   intersection 34721   key - test 17806    test - key 8755
==================================================
Donor: DO50417   intersection 81477   key - test 39521    test - key 15959
==================================================
Donor: DO50419   intersection 82705   key - test 41674    test - key 15262
==================================================
Donor: DO50432   intersection 3941    key - test 24358    test - key 138224

Output from USeq: [2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed VCF Comparator Settings: somatic_snv_mnv.vcf.gz Key vcf file human_chromosomes.bed Key interrogated regions file somatic_snv_mnv.vcf.gz Test vcf file human_chromosomes.bed Test interrogated regions file true Require matching alternate bases false Require matching genotypes false Use record VQSLOD score as ranking statistic false Exclude non PASS or . records true Compare all variant Parsing and filtering variant data for common interrogated regions... Comparing calls... 3137454505 Interrogated bps in key 3137454505 Interrogated bps in test 3137454505 Interrogated bps in common 28299 Key variants 28299 Key variants in shared regions 0.904886914378029 Shared key variants Ti/Tv 24117 Test variants 24117 Test variants in shared regions 0.919073764621628 Shared test variants Ti/Tv QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest) none 17395 6722 0.27872455 0.27872455 0.614686 0.2375349 0.72127545 Done!
4 seconds [2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed VCF Comparator Settings: somatic_snv_mnv.vcf.gz Key vcf file human_chromosomes.bed Key interrogated regions file somatic_snv_mnv.vcf.gz Test vcf file human_chromosomes.bed Test interrogated regions file true Require matching alternate bases false Require matching genotypes false Use record VQSLOD score as ranking statistic false Exclude non PASS or . records true Compare all variant Parsing and filtering variant data for common interrogated regions... Comparing calls... 3137454505 Interrogated bps in key 3137454505 Interrogated bps in test 3137454505 Interrogated bps in common 52527 Key variants 52527 Key variants in shared regions 0.9612067356158758 Shared key variants Ti/Tv 43476 Test variants 43476 Test variants in shared regions 0.9573203673689897 Shared test variants Ti/Tv QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest) none 34721 8755 0.20137547 0.20137547 0.6610124 0.16667618 0.7986245 Done! 4 seconds [2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed VCF Comparator Settings: somatic_snv_mnv.vcf.gz Key vcf file human_chromosomes.bed Key interrogated regions file somatic_snv_mnv.vcf.gz Test vcf file human_chromosomes.bed Test interrogated regions file true Require matching alternate bases false Require matching genotypes false Use record VQSLOD score as ranking statistic false Exclude non PASS or . records true Compare all variant Parsing and filtering variant data for common interrogated regions... Comparing calls... 
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
120998 Key variants
120998 Key variants in shared regions
0.9540073962824799 Shared key variants Ti/Tv
97436 Test variants
97436 Test variants in shared regions
0.9392950261728001 Shared test variants Ti/Tv

QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 81477 15959 0.16378957 0.16378957 0.6733748 0.13189474 0.8362104

Done! 4 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
124379 Key variants
124379 Key variants in shared regions
0.9678664662605807 Shared key variants Ti/Tv
97967 Test variants
97967 Test variants in shared regions
0.9450632358488693 Shared test variants Ti/Tv

QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82705 15262 0.15578716 0.15578716 0.66494346 0.1227056 0.84421283

Done!
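The summary columns in these reports follow directly from the formulas printed in the header row. As a sanity check (not part of USeq itself), the DO50419 row above can be recomputed from its raw counts:

```python
# Recompute the USeq summary row for donor DO50419 from the reported
# counts; this is a sanity check of the printed formulas, not USeq code.
total_key = 124379   # Key variants
match = 82705        # NumMatchTest
non_match = 15262    # NumNonMatchTest
total_test = match + non_match  # 97967 test variants in shared regions

fdr = non_match / total_test  # FDR = nonMatchTest/(matchTest+nonMatchTest)
tpr = match / total_key       # TPR = matchTest/totalKey
fpr = non_match / total_key   # FPR = nonMatchTest/totalKey
ppv = match / total_test      # PPV = matchTest/(matchTest+nonMatchTest)
print(f"FDR={fdr:.5f} TPR={tpr:.5f} FPR={fpr:.5f} PPV={ppv:.5f}")
# → FDR=0.15579 TPR=0.66494 FPR=0.12271 PPV=0.84421
```

The values agree with the reported 0.15578716 / 0.66494346 / 0.1227056 / 0.84421283 to the printed precision. (Note that what USeq labels FPR here is normalized by total key variants, not by interrogated negatives, which is why DO50432's "FPR" below exceeds 1.)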
3 seconds

[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed

VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant

Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
142165 Test variants
142165 Test variants in shared regions
0.9905488658639037 Shared test variants Ti/Tv

QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 3940 138225 0.97228575 0.97228575 0.13922754 4.884448 0.027714275

Done! 4 seconds
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Tue Dec 6 10:41:33 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 6 Dec 2016 15:41:33 +0000
Subject: [DOCKTESTERS] Sanger pipeline completed, small discrepancies with GNOS VCF (99.9905% accuracy)
In-Reply-To: References: Message-ID:

Miguel,

I personally think this is a slippery slope, but the PCAWG-tech group (i.e. Lincoln and Christina) asked if you could repeat this experiment and see whether you get the same differing variants or not, and in particular whether the minor differences you see are caused by the cloud infrastructure you are using. It would be good to test for that too.
Can you redo before Monday's call?

@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

On Dec 5, 2016, at 7:41 AM, Miguel Vazquez wrote:

Dear all,

The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311

The results are the following:

Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1
- Example: Y:58885197:G
Missing: 14
- Example: 1:102887902:T,1:143165228:G,16:87047601:C

The donor results for DKFZ yielded

Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0

In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here

https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data

Best regards

Miguel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Tue Dec 6 10:52:52 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 6 Dec 2016 15:52:52 +0000
Subject: [DOCKTESTERS] Variant call validation results for Sanger
In-Reply-To: References: Message-ID: <6AF31004-13C6-4EF0-B3A4-9988D284D8D2@oicr.on.ca>

Hi Alex,

Likewise here: on the test that "worked", are the differences platform-specific, and are they reproducible? I think we only need to do this a couple of times, to tell us whether the differences are operator- and/or platform-specific, or simply (as I suspect) more about the heuristics of the testing we are doing.

Thank you for looking into this.

@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

On Dec 5, 2016, at 1:21 PM, Alexander Buchanan wrote:

Regarding the validation results I posted last Friday, we think these poor results are likely due to an upstream issue and not the Sanger workflow itself. Those variant call results were from a larger process including fastq prep, alignment, and then Sanger variant calling, and we think we introduced a problem early on during fastq prep.
We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match). One example output from USeq:

82486 Key variants
82486 Key variants in shared regions
0.953626071716167 Shared key variants Ti/Tv
82482 Test variants
82482 Test variants in shared regions
0.9536238749407864 Shared test variants Ti/Tv

QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82479 3 3.6371574E-5 3.6371574E-5 0.9999151 3.636981E-5 0.99996364

From: Alexander Buchanan
Date: Friday, December 2, 2016 at 4:11 PM
To: "docktesters at lists.icgc.org"
Subject: [DOCKTESTERS] Variant call validation results for Sanger

I was able to run USeq on data output from running the Sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email. I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences. At this point, I don't know the source of the difference. Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.
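A set-based comparison like the script described above can be sketched as follows. This is a guess at the approach, not Alexander's actual script; the `load_variants` helper and the inline records are made up for illustration, and variants are keyed as chrom:pos:alt (the format Miguel's examples use):

```python
import io

def load_variants(stream):
    """Collect chrom:pos:alt keys from a VCF stream, skipping header lines."""
    variants = set()
    for line in stream:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # CHROM, POS, ALT are VCF columns 1, 2 and 5 (indices 0, 1, 4)
        variants.add(f"{fields[0]}:{fields[1]}:{fields[4]}")
    return variants

# Tiny made-up records standing in for the key and test VCFs; a real run
# would read the gzipped somatic_snv_mnv.vcf.gz files instead.
key_vcf = "#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tT\n2\t200\t.\tC\tG\n"
test_vcf = "#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tT\n3\t300\t.\tG\tC\n"

key = load_variants(io.StringIO(key_vcf))
test = load_variants(io.StringIO(test_vcf))
print("intersection", len(key & test))  # 1
print("key - test", len(key - test))    # 1
print("test - key", len(test - key))    # 1
```

Set difference in both directions distinguishes missed calls (key - test) from extra calls (test - key), which is exactly the asymmetry visible in the DO50432 numbers.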
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Fri Dec 9 14:46:31 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Fri, 9 Dec 2016 19:46:31 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
References: <0F84ED6166CE664E8563B61CB2ECB98CCBB4123C@exmb2.ad.oicr.on.ca>
Message-ID: <357A36EA-3A35-4B69-8C72-BD422005718B@oicr.on.ca>

Please e-mail Miguel (and/or this list) if there are any docktesting updates before Monday. I will not be on the pcawg-tech call, but Miguel will be.
https://wiki.oicr.on.ca/display/PANCANCER/2016-12-12+PCAWG-TECH+Teleconference

Have a great weekend,

@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

Begin forwarded message:

From: Christina Yung
Subject: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
Date: December 9, 2016 at 2:40:11 PM EST
To: "pawg-tech (pawg-tech at lists.icgc.org)"

Hi Everyone,

Below is a draft agenda for Monday's tech call. Please feel free to edit on the wiki:
https://wiki.oicr.on.ca/display/PANCANCER/2016-12-12+PCAWG-TECH+Teleconference

Best regards,
Christina

Call Info
Usual Time: 9 AM Eastern Time, Mondays
UK: 0208 322 2500
Canada: 1-866-220-6419
United States: 1-877-420-0272
All Others: Please see attached PDF file with a list of numbers for other countries.
Participant Code: 5910819#

Agenda (Time / Item / Who / Attachments-Links)

5min - Welcome. Wait for group members to log on - Christina Yung, OICR

10min - Overall status - Christina Yung, OICR
- Linkouts to Most Current PCAWG Data
- Report data issues to pcawg-data at icgc.org, GNOS issues to Help at annaisystems.com
- From Boston F2F: PCAWG datasets & dependencies

Action Items
1. [Junjun] Specimen ID mapping for miRNA and methylation, see update below:
   a. miRNA release posted in Linkouts to Most Current PCAWG Data item #11.
      i. Release table includes various entity IDs, source miRNA data, and mapping between miRNA samples and WGS / RNA-Seq samples, Google Spreadsheet: https://docs.google.com/spreadsheets/d/1QG8UTaX71H6tpvt-XJp0_pGFkxURWZ8tC1cIMg3sMJM/edit#gid=2063341485
      ii. The release is also available in TSV and JSONL formats (easy for programmatic parsing):
          - http://pancancer.info/data_releases/mirna/mirna_release.v1.0.jsonl
          - http://pancancer.info/data_releases/mirna/mirna_release.v1.0.tsv
          - http://pancancer.info/data_releases/mirna/mirna_sample_sheet.v1.0.tsv
   b. New RNA expression data file produced; previously reported issue about missing aliquot fixed.
   c.
Methylation WG will have the first complete set of analysis results by mid-December. Will wait till then to revisit the matching of sample IDs.

2. [Junjun] Create directories on ICGC Portal for reference datasets, see update below:
   a. Reference data folder structure reorganized, available at https://dcc.icgc.org/releases/PCAWG (a README under each sub-folder will be added to provide additional information).

      PCAWG
      └── reference_data
          ├── data_for_testing
          ├── pcawg-broad
          ├── pcawg-bwa-mem
          ├── pcawg-delly
          └── pcawg-sanger

   b. Other additional reference data from other working groups can be added too.
      - hg19_cosmic_v54_120711.vcf - Adam Butler confirmed that this file can be redistributed with the other reference files used by Broad.

3. [Jonathan] For cell lines, consensus SVs available: pdf; syn7373725. Consensus SNVs & indels: merged results passed to Broad for filtering.
   a. dkfz-filtered SNVs have been added to https://www.synapse.org/#!Synapse:syn7510859

4. [Christina] For medulloblastoma sample (tumor 40x, normal 30x) from ICGC benchmark, run alignment & variant workflows:
   a. Sanger - completed
   b. DKFZ/EMBL - Encountered 2nd Roddy error. Logs sent to Michael.
   c. Broad v1 completed - passed to Broad to fix

5. [All] Contribute to the manuscripts:
   a. infrastructure: Paper ( https://goo.gl/utx3cC ), Supplement ( https://goo.gl/gtYUv7 )
   b. variants: Paper ( https://goo.gl/g9CLsu ), Supplement ( https://goo.gl/EWYh7e )
   c. Rogue's Gallery of Cancer Genome Sequencing Artifacts ( outline )
      i. some examples from Matthias: GroupSeminar_20160531_onlyPCAWG.pptx

- Nuno just reported 7 whitelisted aliquots are missing from V1.6 of consensus SV. Joachim explained that 1 of the 4 SV callers failed for these aliquots, and has added a note to Synapse.

15min - Update on publication plan - Lincoln Stein, OICR

10min - Status of dockerizing workflows - Denis Yuen, OICR; Brian O'Connor, UCSC; Gordon Saksena, Broad

Status of PCAWG Workflow ports to Dockstore:
1. [Kyle] MuSE
2.
[Gordon] Broad dockers with #days estimated for development:
   a. 0 - tokens alg, for SNV/Indel PoN generation (not GATK)
   b. 5 - SNV/Indel PoN filter
   c. 3 - MuTect with ContEst wiring bug, plus rescue, plus SV minibam, using public PoN (GATK)
   d. 3 - MuTect without ContEst wiring bug, using public PoN. (GATK license)
   e. ? - MuTect PoN generator
   f. 3 - OxoQ measure + OxoG filter (GATK license due to filter step)
   g. 5 - Snowman (GPL needs to be unlinked. not GATK)
   h. 5 - dRanger, includes preprocess and BreakPointer. (does public PoN exist? release without PoN?)
   i. ? - dRanger PoN generator
   j. 3 - frag counter
   k. 3 - haplotype caller (GATK)
   l. 3 - RECapseg (GATK)
   m. 3 - vcf merge (GATK)
   n. ? - het sites force call (GATK, results never used due to bug)
   o. ? - BQSR + cocleaning (relying on Kyle)
   p. 10 - deTiN - detection and filter (GATK)
   q. 3 - gVCF merge (GATK)
   r. 5 - blat SNV filter (GATK)
   s. 10 - germline overlap SNV filter (GATK)
   t. ? - Variantbam
3. [Jonathan] Consensus algorithm for SNVs: 2+ of 4
4. [Jonathan] Consensus algorithm for indels: stacked logistic regression - Update from Jonathan on 11/22: 1 week to work on scripts, 1.5 weeks to Dockerize
5. [Joachim] Consensus algorithm for SVs - will check in on 12/5

Of note:
- Naming convention: "pcawg--workflow" for complex, docker-based workflows; or "pcawg--tool" for standalone, single tools
- currently working on releases using gosu to tackle the "unknown user issue" and a new test.json pointing at the pcawg site
- PCAWG DOI Generation for a howto guide on doing this (we use GitHub + Zenodo)
- Brian's Dockstore tutorials:
  - https://www.youtube.com/watch?v=sInP-ByF9xU
  - https://www.youtube.com/edit?video_id=-JuKsSQja3g
Tutorial from 12/6: https://goo.gl/2bnXq & https://www.youtube.com/watch?v=Gb6LnmpZj_g

10min - Status of testing dockerized workflows - Francis Ouellette, OICR; Denis Yuen, OICR; Brian O'Connor, UCSC

PCAWG Docker (Dockstore) Testing Working Group
Workflow Testing Data
Docker containers to be tested
________________________________
Copy of table from "Workflow Testing Data" representing the latest status of PCAWG docker containers currently present on the PUBLIC Dockstore.org and the status of their testing (taken 08:30 AM EDT, Dec 5, 2016).
________________________________

Status of workflows being tested:
1. BWA-Mem
2. Sanger - Changes proposed by Keiran have been made. Currently testing whether the new version fixes issues with a specific donor (DO50311); the test had to be restarted due to a reboot for a security patch.
   a. Tests passed with 2.0.2. Watch out though: the test data ran in the regular time, but the previously failing donor DO50311 took upwards of 8 days.
3. EMBL
4. DKFZ
5. DKFZ's PCR & strand bias filtering

5min - Other business? - Group

Christina K. Yung, PhD
Project Manager, Cancer Genome Collaboratory
Ontario Institute for Cancer Research
MaRS Centre, 661 University Avenue, Suite 510
Toronto, Ontario, Canada M5G 0A3
Tel: 416-673-8578
www.oicr.on.ca
www.cancercollaboratory.org

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
_______________________________________________
PAWG-TECH mailing list
PAWG-TECH at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/pawg-tech
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 240 bytes
Desc: image001.png
URL:

From miguel.vazquez at cnio.es Mon Dec 12 04:50:02 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 10:50:02 +0100
Subject: [DOCKTESTERS] DKFZ 2nd validation (DO52140): 100% match!
Message-ID:

Hi all,

I have a second validation result for the DKFZ (plus Delly) workflow, and it confirms the perfect result of the previous one.

Comparison for DO52140 using DKFZ
---
Common: 37160
Extra: 0
Missing: 0

I've updated the information in the wiki. A second validation is on the way for Sanger, for which the first one had a few mismatches (+1; -14).

Best

Miguel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From miguel.vazquez at cnio.es Mon Dec 12 05:44:07 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 11:44:07 +0100
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)
Message-ID:

Dear all,

I was wondering if someone here was acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems like it uses EM, but I didn't find anything random in it, such as during initialization, which was my initial guess. The other thing I thought is that when it splits the work for parallel processing it might choose a different number of splits to accommodate the number of CPUs, and that this might affect the calculations.

Is there someone here that could help shed some light?
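On the parallel-split guess above: even without explicit randomness, floating-point arithmetic is not associative, so accumulating the same values over a different number of splits can change a result in its last digits. A minimal illustration:

```python
# Floating-point addition is not associative: the same three values
# summed with a different grouping give different IEEE 754 doubles.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False
```

So a workflow that picks its split count from the CPU count could see last-bit differences in any statistic accumulated per split, which may be enough to flip a borderline call.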
As soon as some other tests finish I'll be running the process again, but since it takes so long perhaps a little insight would help.

Best regards

Miguel

On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez wrote:
> Dear all,
>
> The Sanger pipeline completed, after about 2 weeks of computing, for donor
> DO50311
>
> The results are the following:
>
> Comparison for DO50311 using Sanger
> ---
> Common: 156299
> Extra: 1
> - Example: Y:58885197:G
> Missing: 14
> - Example: 1:102887902:T,1:143165228:G,16:87047601:C
>
> The donor results for DKFZ yielded
>
> Comparison for DO50311 using DKFZ
> ---
> Common: 51087
> Extra: 0
> Missing: 0
>
> In both cases I'm comparing against the VCF file downloaded from GNOS. I've
> updated the information here
>
> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>
> Best regards
>
> Miguel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Mon Dec 12 07:38:46 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Mon, 12 Dec 2016 12:38:46 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: References: Message-ID: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>

I know I'm not supposed to be there (and I'm not :-), but one slippery slope I want this Dockstore testing working group to be wary about (and Christina, this is really directed at you, chairing the discussion today) is that the request from Lincoln for this group to reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain all of the discrepancies we see. I don't think we ever saw that kind of data from the people that ran the original workflow.

If this group can ascertain that a Dockstore container basically works, I think we need to call that test a success, and move on to the next one.
What Miguel is suggesting/asking below is very good, but I could see this becoming a very slippery slope, which I would advise us against slipping down.

Anyway, going off to my day off,

Have a great discussion,

Francis

--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette

On Dec 12, 2016, at 05:44, Miguel Vazquez wrote:

Dear all,

I was wondering if someone here was acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems like it uses EM, but I didn't find anything random in it, such as during initialization, which was my initial guess. The other thing I thought is that when it splits the work for parallel processing it might choose a different number of splits to accommodate the number of CPUs, and that this might affect the calculations.

Is there someone here that could help shed some light? As soon as some other tests finish I'll be running the process again, but since it takes so long perhaps a little insight would help.

Best regards

Miguel

On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez wrote:
Dear all,

The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311

The results are the following:

Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1
- Example: Y:58885197:G
Missing: 14
- Example: 1:102887902:T,1:143165228:G,16:87047601:C

The donor results for DKFZ yielded

Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0

In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here

https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data

Best regards

Miguel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Brian.OConnor at oicr.on.ca Mon Dec 12 09:12:48 2016
From: Brian.OConnor at oicr.on.ca (Brian O'Connor)
Date: Mon, 12 Dec 2016 14:12:48 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
Message-ID:

Hi Francis,

I agree with you: I think Miguel is showing what this group needs to show, that someone else can run the tools from Dockstore, have that be successful, and get results largely in agreement with previous results (or duplicate runs). I think a statement about the possibility of stochastic results in the README for each tool would be sufficient. This could be something that Keiran can craft/comment on for Sanger's pipeline, since he's in the best position for this one.

Brian

> On Dec 12, 2016, at 7:38 AM, Francis Ouellette wrote:
>
> I know I'm not supposed to be there (and I'm not :-), but one slippery slope I want this Dockstore testing working group to be wary about (and Christina, this is really directed at you, chairing the discussion today) is that the request from Lincoln for this group to reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain all of the discrepancies we see. I don't think we ever saw that kind of data from the people that ran the original workflow.
>
> If this group can ascertain that a Dockstore container basically works, I think we need to call that test a success, and move on to the next one. What Miguel is suggesting/asking below is very good, but I could see this becoming a very slippery slope, which I would advise us against slipping down.
>
> Anyway, going off to my day off,
>
> Have a great discussion,
>
> Francis
>
> --
> B.F.
Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>
> On Dec 12, 2016, at 05:44, Miguel Vazquez wrote:
>
>> Dear all,
>>
>> I was wondering if someone here was acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems like it uses EM, but I didn't find anything random in it, such as during initialization, which was my initial guess. The other thing I thought is that when it splits the work for parallel processing it might choose a different number of splits to accommodate the number of CPUs, and that this might affect the calculations.
>>
>> Is there someone here that could help shed some light? As soon as some other tests finish I'll be running the process again, but since it takes so long perhaps a little insight would help.
>>
>> Best regards
>>
>> Miguel
>>
>> On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez wrote:
>> Dear all,
>>
>> The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311
>>
>> The results are the following:
>>
>> Comparison for DO50311 using Sanger
>> ---
>> Common: 156299
>> Extra: 1
>> - Example: Y:58885197:G
>> Missing: 14
>> - Example: 1:102887902:T,1:143165228:G,16:87047601:C
>>
>> The donor results for DKFZ yielded
>>
>> Comparison for DO50311 using DKFZ
>> ---
>> Common: 51087
>> Extra: 0
>> Missing: 0
>>
>> In both cases I'm comparing against the VCF file downloaded from GNOS.
I've updated the information here
>>
>> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>>
>> Best regards
>>
>> Miguel
>>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters

From kr2 at sanger.ac.uk Mon Dec 12 09:38:01 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Mon, 12 Dec 2016 14:38:01 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
Message-ID: <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>

Hi,

I'd need access to the full set of result files from the run, but can you confirm the Pindel and ASCAT VCFs are exactly the same? Both feed into the CaVEMan analysis. ASCAT is the least stable of the algorithms, as it randomly assigns the B-allele, and if this donor is known to have an unusual copy number/rearrangement state that is likely to be the cause (I wouldn't consider a sample like this to be particularly good for testing, though). What were the results on the other samples? I assume cleaner data has also been run.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel: +44 (0)1223 834244 Ext: 4983
Office: H104

> On 12 Dec 2016, at 14:12, Brian O'Connor wrote:
>
> Hi Francis,
>
> I agree with you, I think Miguel is showing what this group needs to show, that someone else can run the tools from Dockstore, have that be successful, and the results are largely in agreement with previous results (or duplicate runs). I think maybe a statement about the possibility of stochastic results in the README for each tool would be sufficient. This could be something that Keiran can craft/comment on for Sanger's pipeline since he's in the best position for this one.
> > Brian > >> On Dec 12, 2016, at 7:38 AM, Francis Ouellette wrote: >> >> I know I'm not supposed to be there (and I'm not :-), but one slippery slope I want this Dockstore testing working group to be wary of (and Christina, this is really directed at you, chairing the discussion today): the request from Lincoln for this group to reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain all of the discrepancies we see. I don't think we ever saw that kind of data from the people who ran the original workflow. >> >> If this group can ascertain that a Dockstore container basically works, I think we need to call that test a success and move on to the next one. What Miguel is suggesting/asking below is very good, but I could see this becoming a very slippery slope, which I would advise us against slipping down. >> >> Anyway, going off to my day off, >> >> Have a great discussion, >> >> Francis >> >> -- >> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kr2 at sanger.ac.uk Mon Dec 12 09:48:11 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Mon, 12 Dec 2016 14:48:11 +0000 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> Message-ID: <38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk> Additionally, are these stats based on PASSED SUB variants? A couple of the missing items are clear SNPs which would be filtered. 
Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel: +44 (0)1223 834244 Ext: 4983 Office: H104 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From miguel.vazquez at cnio.es Mon Dec 12 09:51:36 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 12 Dec 2016 15:51:36 +0100 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: <38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk> References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> <38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk> Message-ID: Hi Keiran, No, I've not filtered for PASS, so these are all variants as far as I know. Best M On Mon, Dec 12, 2016 at 3:48 PM, Keiran Raine wrote: > Additionally, are these stats based on PASSED SUB variants? A couple of > the missing items are clear SNPs which would be filtered. -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Mon Dec 12 09:56:13 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 12 Dec 2016 15:56:13 +0100 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk> References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk> Message-ID: Hi Keiran, I don't know how to check the pindel and ASCAT VCFs; I have not saved the Docker image. If you give me detailed instructions I can save it on my next run and get them for you. As for the difficulties with this donor (just my luck to choose this one at random), I'm running the pipeline on another donor; perhaps it will show no discrepancies, or perhaps it's a better subject for our inquiries. We should see soon, I hope; it's 4 days into the analysis. Best Miguel On Mon, Dec 12, 2016 at 3:38 PM, Keiran Raine wrote: > Hi, > > I'd need access to the full set of result files from the run but can you > confirm the pindel and ASCAT VCFs are exactly the same? Both feed into > CaVEMan analysis. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Christina.Yung at oicr.on.ca Mon Dec 12 10:24:30 2016 From: Christina.Yung at oicr.on.ca (Christina Yung) Date: Mon, 12 Dec 2016 15:24:30 +0000 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk> Message-ID: <0F84ED6166CE664E8563B61CB2ECB98CCBB47002@exmb2.ad.oicr.on.ca> Thanks to Miguel for all the great work, and for giving an update on the tech call. Consolidating some comments, I think we can conclude that a docker has passed testing when it produces 1. The same outputs as the production runs (SNVs, indels, SVs; somatic + germline in some cases), or 2. Outputs with very small discrepancies from the production runs, plus an explanation from the workflow author of the discrepancies. Workflow authors need to point out steps that are stochastic, or changes in the docker that introduce any non-random differences. 
Best, Christina From: docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org [mailto:docktesters-bounces+christina.yung=oicr.on.ca at lists.icgc.org] On Behalf Of Miguel Vazquez Sent: Monday, December 12, 2016 9:56 AM To: Keiran Raine Cc: docktesters at lists.icgc.org Subject: Re: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kr2 at sanger.ac.uk Mon Dec 12 10:38:42 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Mon, 12 Dec 2016 15:38:42 +0000 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk> Message-ID: <12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk> Hi Miguel, ASCAT is *.somatic.cnv.vcf.gz Pindel is *.somatic.indel.vcf.gz Are you not using vcftools to do comparisons on all generated VCF files? 
All variants: vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz --diff-site --out in1_v_in2 Passed variants: vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz --diff-site --out in1_v_in2 --remove-filtered-all (unfortunately, a sort instability in Pindel may require the indel VCF to be re-sorted first on: chr, pos, ref, alt) Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel: +44 (0)1223 834244 Ext: 4983 Office: H104 -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Mon Dec 12 11:16:20 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Mon, 12 Dec 2016 17:16:20 +0100 Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger pipeline with GNOS VCF (99.9905% accuracy) In-Reply-To: <12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk> References: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca> <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk> <12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk> Message-ID: Thanks Keiran, that is what Christina asked us to do, so I'll check it next. On Dec 12, 2016 4:38 PM, "Keiran Raine" wrote: > Hi Miguel, > > ASCAT is *.somatic.cnv.vcf.gz > > Pindel is *.somatic.indel.vcf.gz > > Are you not using vcftools to do comparisons on all generated VCF files? 
> > All variants: > vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz > --diff-site --out in1_v_in2 > > Passed variants: > vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz > --diff-site --out in1_v_in2 --remove-filtered-all > > (unfortunately a sort instability in Pindel may require the indel vcf to > be resorted first on: chr, pos, ref, alt) > > Regards, > > > Keiran Raine > Principal Bioinformatician > Cancer Genome Project > Wellcome Trust Sanger Institute > > kr2 at sanger.ac.uk > Tel:+44 (0)1223 834244 Ext: 4983 <+44%201223%20834244> > Office: H104 > > On 12 Dec 2016, at 14:56, Miguel Vazquez wrote: > > Hi Keiran, > > I don't know how to check the pindel and ASCAT VCF's. I have not saved the > docker image. If you give me detailed instructions I can save it on my next > run and get them for you. > > As for the difficulties on this donor (just my luck to choose this one at > random), I'm running the pipeline on another donor, perhaps it will show no > discrepancies, or perhaps its a better subject for our inquiries. We should > see soon, I hope; it's 4 days into the analysis. > > Best > > Miguel > > On Mon, Dec 12, 2016 at 3:38 PM, Keiran Raine wrote: > >> Hi, >> >> I'd need access to the full set of result files from the run but can you >> confirm the pindel and ASCAT VCF's exactly the same? Both feed into >> caveman analysis. >> >> ASCAT is the least stable of the algorithms as it randomly assigns the >> B-allele and if this donor is known to have an unusual >> copynumber/rearrangment state it is likely to be the cause (I wouldn't >> consider a sample like this to particularly good for testing though). >> >> What were the results on the other samples, I assume cleaner data has >> also been run? 
>> >> Regards, >> >> Keiran Raine >> Principal Bioinformatician >> Cancer Genome Project >> Wellcome Trust Sanger Institute >> >> kr2 at sanger.ac.uk >> Tel:+44 (0)1223 834244 Ext: 4983 <+44%201223%20834244> >> Office: H104 >> >> On 12 Dec 2016, at 14:12, Brian O'Connor >> wrote: >> >> Hi Francis, >> >> I agree with you; I think Miguel is showing what this group needs to >> show: that someone else can run the tools from Dockstore, have that be >> successful, and have the results largely in agreement with previous results >> (or duplicate runs). I think maybe a statement about the possibility of >> stochastic results in the README for each tool would be sufficient. This >> could be something that Keiran can craft/comment on for Sanger's pipeline, >> since he's in the best position for this one. >> >> Brian >> >> On Dec 12, 2016, at 7:38 AM, Francis Ouellette >> wrote: >> >> I know I'm not supposed to be there (and I'm not :-), but one slippery >> slope I want this Dockstore testing working group to be wary about (and >> Christina, this is really directed at you, chairing the discussion today) >> is that the request from Lincoln for this group to reproduce what we are doing is >> fine, but I don't think it is this working group's task to reproduce and >> explain all of the discrepancies we see. I don't think we ever saw that >> kind of data from the people that ran the original workflow. >> >> If this group can ascertain that a Dockstore container basically works, >> I think we need to call that test a success and move on to the next one. >> What Miguel is suggesting/asking below is very good, but I could see this >> becoming a very slippery slope, which I would advise us against >> slipping down. >> >> Anyway, going off to my day off, >> >> Have a great discussion, >> >> Francis >> >> -- >> B.F.
Francis Ouellette http://oicr.on.ca/person/francis-ouellette >> >> On Dec 12, 2016, at 05:44, Miguel Vazquez wrote: >> >> Dear all, >> >> I was wondering if someone here was acquainted with the Sanger workflow >> and could help explain these discrepancies. I've skimmed through the code, >> and it seems like it uses EM, but I didn't find anything random in it, such as >> during initialization, which was my initial guess. The other thing I thought >> is that when it splits the work for parallel processing it might choose a >> different number of splits to accommodate the number of CPUs, and that this >> might affect the calculations. >> >> Is there someone here that could help shed some light? As soon as some >> other tests finish I'll be running the process again, but since it takes so >> long perhaps a little insight would help. >> >> Best regards >> >> Miguel >> >> >> >> On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez >> wrote: >> Dear all, >> >> The Sanger pipeline completed, after about 2 weeks of computing, for >> donor DO50311 >> >> The results are the following: >> >> Comparison for DO50311 using Sanger >> --- >> Common: 156299 >> Extra: 1 >> - Example: Y:58885197:G >> Missing: 14 >> - Example: 1:102887902:T,1:143165228:G,16:87047601:C >> >> >> The donor results for DKFZ yielded >> >> Comparison for DO50311 using DKFZ >> --- >> Common: 51087 >> Extra: 0 >> Missing: 0 >> >> >> In both cases I'm comparing against the VCF file downloaded from GNOS.
>> I've updated the information here >> >> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data >> >> >> Best regards >> >> Miguel >> >> >> _______________________________________________ >> docktesters mailing list >> docktesters at lists.icgc.org >> https://lists.icgc.org/mailman/listinfo/docktesters >> >> >> >> >> -- The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a company >> registered in England with number 2742969, whose registered office is 215 >> Euston Road, London, NW1 2BE. >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From francis at oicr.on.ca Thu Dec 15 16:17:52 2016 From: francis at oicr.on.ca (Francis Ouellette) Date: Thu, 15 Dec 2016 21:17:52 +0000 Subject: [DOCKTESTERS] updates Message-ID: Any updates from docktesters? Miguel: are you free again on Monday? I may need somebody to cover for me again at the pcawg-tech conf call. Many thanks, @bffo -- B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette From miguel.vazquez at cnio.es Fri Dec 16 05:05:34 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Fri, 16 Dec 2016 11:05:34 +0100 Subject: [DOCKTESTERS] DKFZ extended validation for 2 samples. Only differences in CNV Message-ID: Hi Christina et al. Like you asked me, I've extended the validation from SNV to Indel and SNV and also for germline For the two samples all matches perfectly except SNV where we find some large differences.
Best regards Miguel Report ~~~~~ Comparison of germline.indel for DO50311 using DKFZ --- Common: 709060 Extra: 0 Missing: 0 Comparison of germline.snv.mnv for DO50311 using DKFZ --- Common: 3850992 Extra: 0 Missing: 0 Comparison of somatic.cnv for DO50311 using DKFZ --- Common: 731 Extra: 213 - Example: 10:132510034:,10:20596801:,10:47674883: Missing: 190 - Example: 10:100891940:,10:104975905:,10:119704960: Comparison of somatic.indel for DO50311 using DKFZ --- Common: 26469 Extra: 0 Missing: 0 Comparison of somatic.snv.mnv for DO50311 using DKFZ --- Common: 51087 Extra: 0 Missing: 0 Comparison of germline.indel for DO52140 using DKFZ --- Common: 706572 Extra: 0 Missing: 0 Comparison of germline.snv.mnv for DO52140 using DKFZ --- Common: 3833896 Extra: 0 Missing: 0 Comparison of somatic.cnv for DO52140 using DKFZ --- Common: 275 Extra: 94 - Example: 1:106505931:,1:109068899:,1:109359995: Missing: 286 - Example: 10:88653561:,11:179192:,11:38252006: Comparison of somatic.indel for DO52140 using DKFZ --- Common: 19347 Extra: 0 Missing: 0 Comparison of somatic.snv.mnv for DO52140 using DKFZ --- Common: 37160 Extra: 0 Missing: 0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Fri Dec 16 05:07:01 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Fri, 16 Dec 2016 11:07:01 +0100 Subject: [DOCKTESTERS] DKFZ extended validation for 2 samples. Only differences in CNV In-Reply-To: References: Message-ID: Excuse me, obviously I meant *For the two samples all matches perfectly except CNV where we find some large differences. * On Fri, Dec 16, 2016 at 11:05 AM, Miguel Vazquez wrote: > Hi Christina et al. > > Like you asked me I've extended the validation from SNV to Indel and SNV > and also for germline > > For the two samples all matches perfectly except SNV where we find some > large differences. 
> > Best regards > > Miguel > > Report > ~~~~~ > > Comparison of germline.indel for DO50311 using DKFZ > --- > Common: 709060 > Extra: 0 > Missing: 0 > > > Comparison of germline.snv.mnv for DO50311 using DKFZ > --- > Common: 3850992 > Extra: 0 > Missing: 0 > > > Comparison of somatic.cnv for DO50311 using DKFZ > --- > Common: 731 > Extra: 213 > - Example: 10:132510034:,10:20596801:,10: > 47674883: > Missing: 190 > - Example: 10:100891940:,10:104975905:,10: > 119704960: > > > Comparison of somatic.indel for DO50311 using DKFZ > --- > Common: 26469 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO50311 using DKFZ > --- > Common: 51087 > Extra: 0 > Missing: 0 > > > Comparison of germline.indel for DO52140 using DKFZ > --- > Common: 706572 > Extra: 0 > Missing: 0 > > > Comparison of germline.snv.mnv for DO52140 using DKFZ > --- > Common: 3833896 > Extra: 0 > Missing: 0 > > > Comparison of somatic.cnv for DO52140 using DKFZ > --- > Common: 275 > Extra: 94 > - Example: 1:106505931:,1:109068899:,1:109359995: > Missing: 286 > - Example: 10:88653561:,11:179192:,11:38252006: > > > Comparison of somatic.indel for DO52140 using DKFZ > --- > Common: 19347 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO52140 using DKFZ > --- > Common: 37160 > Extra: 0 > Missing: 0 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Fri Dec 16 05:50:36 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Fri, 16 Dec 2016 11:50:36 +0100 Subject: [DOCKTESTERS] Sanger extended validation for 1 sample. Only differences in SNV Message-ID: Dear Christina and Keiran, I've extended the analysis also for the Sanger workflow. I only have one sample (DO50311) since the second one (DO52140) its still computing since December 8 (8 days) Keiran I believed you asked me about how indels and CNVs matched. Is this what you needed? 
All matches except for the +1 -14 differences I reported before Best regards Miguel Report ~~~~~ Comparison of somatic.cnv for DO50311 using Sanger --- Common: 138 Extra: 0 Missing: 0 Comparison of somatic.indel for DO50311 using Sanger --- Common: 812487 Extra: 0 Missing: 0 Comparison of somatic.snv.mnv for DO50311 using Sanger --- Common: 156299 Extra: 1 - Example: Y:58885197:A:G Missing: 14 - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C Comparison of somatic.sv for DO50311 using Sanger --- Common: 260 Extra: 0 Missing: 0 *Note. To make matching more stringent in indels I've added the reference to the mutation code we end up comparing. This extends to SNV as well, so where previously I wrote *16:87047601:C* I now write *16:87047601:A:C*. The extremely thorough reader will notice that the reports for DKFZ below show the discrepancies in CNV not following this new format. I introduced this afterwards, but the results have not changed for DKFZ; I've checked. On Fri, Dec 16, 2016 at 11:07 AM, Miguel Vazquez wrote: > Excuse me, obviously I meant > > *For the two samples all matches perfectly except CNV where we find some > large differences. * > > On Fri, Dec 16, 2016 at 11:05 AM, Miguel Vazquez > wrote: > >> Hi Christina et al.
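[Editor's note] The Common/Extra/Missing counts in these reports amount to set arithmetic over chrom:pos:ref:alt keys (the key format follows Miguel's note above). A minimal Python sketch, not the actual comparison script; the example keys mix real positions from the thread with a hypothetical shared call:

```python
# Sketch of the comparison behind the Common/Extra/Missing reports:
# each call is reduced to a "chrom:pos:ref:alt" key, then the new run
# is compared against the reference (GNOS) run as sets.
def compare_calls(new_run, reference):
    new_keys, ref_keys = set(new_run), set(reference)
    return {
        "Common": len(new_keys & ref_keys),
        "Extra": sorted(new_keys - ref_keys),    # called now, absent from reference
        "Missing": sorted(ref_keys - new_keys),  # in reference, absent now
    }

# "Y:1000:G:A" is a made-up shared call; the others appear in the reports.
reference = ["1:102887902:A:T", "16:87047601:A:C", "Y:1000:G:A"]
new_run = ["16:87047601:A:C", "Y:1000:G:A", "Y:58885197:A:G"]
print(compare_calls(new_run, reference))
```

The same tallies fall out of the vcftools `--diff-site` output mentioned earlier in the thread; the set version just makes the bookkeeping explicit.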
>> >> Best regards >> >> Miguel >> >> Report >> ~~~~~ >> >> Comparison of germline.indel for DO50311 using DKFZ >> --- >> Common: 709060 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of germline.snv.mnv for DO50311 using DKFZ >> --- >> Common: 3850992 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of somatic.cnv for DO50311 using DKFZ >> --- >> Common: 731 >> Extra: 213 >> - Example: 10:132510034:,10:20596801:,10:47674883:< >> NEUTRAL> >> Missing: 190 >> - Example: 10:100891940:,10:1049 >> 75905:,10:119704960: >> >> >> Comparison of somatic.indel for DO50311 using DKFZ >> --- >> Common: 26469 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of somatic.snv.mnv for DO50311 using DKFZ >> --- >> Common: 51087 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of germline.indel for DO52140 using DKFZ >> --- >> Common: 706572 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of germline.snv.mnv for DO52140 using DKFZ >> --- >> Common: 3833896 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of somatic.cnv for DO52140 using DKFZ >> --- >> Common: 275 >> Extra: 94 >> - Example: 1:106505931:,1:109068899:,1:109359995: >> Missing: 286 >> - Example: 10:88653561:,11:179192:,11:38252006: >> >> >> Comparison of somatic.indel for DO52140 using DKFZ >> --- >> Common: 19347 >> Extra: 0 >> Missing: 0 >> >> >> Comparison of somatic.snv.mnv for DO52140 using DKFZ >> --- >> Common: 37160 >> Extra: 0 >> Missing: 0 >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kr2 at sanger.ac.uk Fri Dec 16 06:33:46 2016 From: kr2 at sanger.ac.uk (Keiran Raine) Date: Fri, 16 Dec 2016 11:33:46 +0000 Subject: [DOCKTESTERS] Sanger extended validation for 1 sample. Only differences in SNV In-Reply-To: References: Message-ID: <260604EE-713F-4095-96F6-F41260E67AA8@sanger.ac.uk> Hi Miguel, Please be aware that we agreed to include the last version of the algorithms used in the core analysis for the final docker (as there were fixes along the way). 
There were several versions of the core SNV caller fixing edge cases. I don't see these discrepancies being an issue, especially if they aren't marked as 'PASS'. It's theoretically possible for some of these to be floating-point differences (esp. if different CPU arch), this can cause a call to flip between the SNP and SUB output if very close to the cut-off. Sending the physical VCF records (with the VCF header) for these from the relevant run (missing from the old run, Extra from the new) would allow a confirmation, but as I say if they aren't marked 'PASS' we wouldn't be planning to do any thing about them. Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute kr2 at sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 4983 Office: H104 > On 16 Dec 2016, at 10:50, Miguel Vazquez wrote: > > Dear Christina and Keiran, > > I've extended the analysis also for the Sanger workflow. I only have one sample (DO50311) since the second one (DO52140) its still computing since December 8 (8 days) > > Keiran I believed you asked me about how indels and CNVs matched. Is this what you needed? > > All matches except for the +1 -14 differences I reported before > > Best regards > > Miguel > > Report > ~~~~~ > > Comparison of somatic.cnv for DO50311 using Sanger > --- > Common: 138 > Extra: 0 > Missing: 0 > > > Comparison of somatic.indel for DO50311 using Sanger > --- > Common: 812487 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO50311 using Sanger > --- > Common: 156299 > Extra: 1 > - Example: Y:58885197:A:G > Missing: 14 > - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C > > > Comparison of somatic.sv for DO50311 using Sanger > --- > Common: 260 > Extra: 0 > Missing: 0 > > > *Note. To make matching more stringent in indels I've added the reference to the mutation code end up comparing. This extends to SNV as well so where previously I wrote 16:87047601:C I now write 16:87047601:A:C. 
The extremely thorough reader will notice that the reports for DKFZ bellow show the discrepancies in CNV not following this new format. I've introduced this afterwards, but the results have not changed for DKFZ; I've checked. > > On Fri, Dec 16, 2016 at 11:07 AM, Miguel Vazquez > wrote: > Excuse me, obviously I meant > > For the two samples all matches perfectly except CNV where we find some large differences. > > On Fri, Dec 16, 2016 at 11:05 AM, Miguel Vazquez > wrote: > Hi Christina et al. > > Like you asked me I've extended the validation from SNV to Indel and SNV and also for germline > > For the two samples all matches perfectly except SNV where we find some large differences. > > Best regards > > Miguel > > Report > ~~~~~ > > Comparison of germline.indel for DO50311 using DKFZ > --- > Common: 709060 > Extra: 0 > Missing: 0 > > > Comparison of germline.snv.mnv for DO50311 using DKFZ > --- > Common: 3850992 > Extra: 0 > Missing: 0 > > > Comparison of somatic.cnv for DO50311 using DKFZ > --- > Common: 731 > Extra: 213 > - Example: 10:132510034:,10:20596801:,10:47674883: > Missing: 190 > - Example: 10:100891940:,10:104975905:,10:119704960: > > > Comparison of somatic.indel for DO50311 using DKFZ > --- > Common: 26469 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO50311 using DKFZ > --- > Common: 51087 > Extra: 0 > Missing: 0 > > > Comparison of germline.indel for DO52140 using DKFZ > --- > Common: 706572 > Extra: 0 > Missing: 0 > > > Comparison of germline.snv.mnv for DO52140 using DKFZ > --- > Common: 3833896 > Extra: 0 > Missing: 0 > > > Comparison of somatic.cnv for DO52140 using DKFZ > --- > Common: 275 > Extra: 94 > - Example: 1:106505931:,1:109068899:,1:109359995: > Missing: 286 > - Example: 10:88653561:,11:179192:,11:38252006: > > > Comparison of somatic.indel for DO52140 using DKFZ > --- > Common: 19347 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO52140 using DKFZ > --- > Common: 37160 > Extra: 0 > Missing: 0 > > > 
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Fri Dec 16 08:29:09 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Fri, 16 Dec 2016 14:29:09 +0100 Subject: [DOCKTESTERS] Sanger extended validation for 1 sample. Only differences in SNV In-Reply-To: <260604EE-713F-4095-96F6-F41260E67AA8@sanger.ac.uk> References: <260604EE-713F-4095-96F6-F41260E67AA8@sanger.ac.uk> Message-ID: Thanks for your input Keiran. Indeed none of the extra or missing mutations are marked 'PASS'. Best Miguel On Fri, Dec 16, 2016 at 12:33 PM, Keiran Raine wrote: > Hi Miguel, > > Please be aware that we agreed to include the last version of the > algorithms used in the core analysis for the final docker (as there were > fixes along the way). > > There were several versions of the core SNV caller fixing edge cases. I > don't see these discrepancies being an issue, especially if they aren't > marked as 'PASS'. > > It's theoretically possible for some of these to be floating-point > differences (esp. if different CPU arch), this can cause a call to flip > between the SNP and SUB output if very close to the cut-off. > > Sending the physical VCF records (with the VCF header) for these from the > relevant run (missing from the old run, Extra from the new) would allow a > confirmation, but as I say if they aren't marked 'PASS' we wouldn't be > planning to do any thing about them. 
> > Regards, > > Keiran Raine > Principal Bioinformatician > Cancer Genome Project > Wellcome Trust Sanger Institute > > kr2 at sanger.ac.uk > Tel:+44 (0)1223 834244 Ext: 4983 <+44%201223%20834244> > Office: H104 > > On 16 Dec 2016, at 10:50, Miguel Vazquez wrote: > > Dear Christina and Keiran, > > I've extended the analysis also for the Sanger workflow. I only have one > sample (DO50311) since the second one (DO52140) its still computing since > December 8 (8 days) > > Keiran I believed you asked me about how indels and CNVs matched. Is this > what you needed? > > All matches except for the +1 -14 differences I reported before > > Best regards > > Miguel > > Report > ~~~~~ > > Comparison of somatic.cnv for DO50311 using Sanger > --- > Common: 138 > Extra: 0 > Missing: 0 > > > Comparison of somatic.indel for DO50311 using Sanger > --- > Common: 812487 > Extra: 0 > Missing: 0 > > > Comparison of somatic.snv.mnv for DO50311 using Sanger > --- > Common: 156299 > Extra: 1 > - Example: Y:58885197:A:G > Missing: 14 > - Example: 1:102887902:A:T,1:143165228:C:G,16:87047601:A:C > > > Comparison of somatic.sv for DO50311 using Sanger > --- > Common: 260 > Extra: 0 > Missing: 0 > > > *Note. To make matching more stringent in indels I've added the reference > to the mutation code end up comparing. This extends to SNV as well so where > previously I wrote *16:87047601:C I* now write *16:87047601:A:C.* The > extremely thorough reader will notice that the reports for DKFZ bellow show > the discrepancies in CNV not following this new format. I've introduced > this afterwards, but the results have not changed for DKFZ; I've checked. > > On Fri, Dec 16, 2016 at 11:07 AM, Miguel Vazquez > wrote: > >> Excuse me, obviously I meant >> >> *For the two samples all matches perfectly except CNV where we find some >> large differences. * >> >> On Fri, Dec 16, 2016 at 11:05 AM, Miguel Vazquez >> wrote: >> >>> Hi Christina et al. 
>>> >>> Like you asked me I've extended the validation from SNV to Indel and SNV >>> and also for germline >>> >>> For the two samples all matches perfectly except SNV where we find some >>> large differences. >>> >>> Best regards >>> >>> Miguel >>> >>> Report >>> ~~~~~ >>> >>> Comparison of germline.indel for DO50311 using DKFZ >>> --- >>> Common: 709060 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of germline.snv.mnv for DO50311 using DKFZ >>> --- >>> Common: 3850992 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of somatic.cnv for DO50311 using DKFZ >>> --- >>> Common: 731 >>> Extra: 213 >>> - Example: 10:132510034:,10:20596801 >>> :,10:47674883: >>> Missing: 190 >>> - Example: 10:100891940:,10:1049 >>> 75905:,10:119704960: >>> >>> >>> Comparison of somatic.indel for DO50311 using DKFZ >>> --- >>> Common: 26469 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of somatic.snv.mnv for DO50311 using DKFZ >>> --- >>> Common: 51087 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of germline.indel for DO52140 using DKFZ >>> --- >>> Common: 706572 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of germline.snv.mnv for DO52140 using DKFZ >>> --- >>> Common: 3833896 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of somatic.cnv for DO52140 using DKFZ >>> --- >>> Common: 275 >>> Extra: 94 >>> - Example: 1:106505931:,1:109068899:,1:109359995: >>> Missing: 286 >>> - Example: 10:88653561:,11:179192:,11:38252006: >>> >>> >>> Comparison of somatic.indel for DO52140 using DKFZ >>> --- >>> Common: 19347 >>> Extra: 0 >>> Missing: 0 >>> >>> >>> Comparison of somatic.snv.mnv for DO52140 using DKFZ >>> --- >>> Common: 37160 >>> Extra: 0 >>> Missing: 0 >>> >>> >> > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a company > registered in England with number 2742969, whose registered office is 215 > Euston Road, London, NW1 2BE. 
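[Editor's note] Keiran's point that only 'PASS' discrepancies would matter can be checked mechanically: FILTER is the seventh fixed column of a VCF record. A minimal Python sketch with hypothetical records (not taken from the actual runs):

```python
# Sketch: bucket discrepant VCF records by FILTER status (column 7).
# Per the thread, discrepancies not marked 'PASS' were not considered
# an issue, so only the "passed" bucket would warrant follow-up.
def split_by_filter(records):
    passed, failed = [], []
    for rec in records:
        fields = rec.split("\t")
        (passed if fields[6] == "PASS" else failed).append(rec)
    return passed, failed

# Hypothetical records for illustration only.
records = [
    "Y\t58885197\t.\tA\tG\t.\tPASS\t.",
    "1\t102887902\t.\tA\tT\t.\tMQ\t.",
]
passed, failed = split_by_filter(records)
print(len(passed), len(failed))
```

In Miguel's case every extra/missing call landed in the second bucket, which is why the discrepancies were set aside.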
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From francis at oicr.on.ca Fri Dec 16 12:16:23 2016 From: francis at oicr.on.ca (Francis Ouellette) Date: Fri, 16 Dec 2016 17:16:23 +0000 Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference References: <0F84ED6166CE664E8563B61CB2ECB98CCBB55265@exmb2.ad.oicr.on.ca> Message-ID: <869814F7-3C18-4E6E-8B18-0C26AA7B4F48@oicr.on.ca> I thought all of you were on the pcawg-tech list, but that wasn't true. Just passing this along as FYI. Miguel is presenting Monday AM. @bffo Begin forwarded message: From: Christina Yung > Subject: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference Date: December 16, 2016 at 12:13:30 EST To: "pawg-tech (pawg-tech at lists.icgc.org)" > Hi All, Below is a draft agenda for Monday's tech call. Please feel free to edit on the wiki: https://wiki.oicr.on.ca/display/PANCANCER/2016-12-19+PCAWG-TECH+Teleconference Also, two reminders: 1. To ensure your access to NCI's sFTP (sftp://dccsftp.nci.nih.gov, aka Jamboree) is uninterrupted, please request NIH-Ext accounts by December 19th. Instructions are in the attached email. 2. OICR-hosted systems including the wiki will be unavailable during maintenance on Dec 17, 8am through Dec 18, 10pm ET. Best regards, Christina Call Info Usual Time 9 AM Eastern Time, Mondays UK 0208 322 2500 Canada 1-866-220-6419 United States 1-877-420-0272 All Others Please see attached PDF file with a list of numbers for other countries. Participant Code 5910819# Agenda Time Item Who Attachments/Links 5min Welcome. Wait for group members to log on Christina Yung, OICR 15min Overall status Christina Yung, OICR - Linkouts to Most Current PCAWG Data - Report data issues to pcawg-data at icgc.org, GNOS issues to: Help at annaisystems.com - From Boston F2F: PCAWG datasets & dependencies Action Items 1. [Junjun] Specimen ID mapping for miRNA and methylation, see update below: a.
miRNA release posted in Linkouts to Most Current PCAWG Data item #11. i. Following Lincoln last week's suggestion, a new PCAWG sample sheet is prepared: https://docs.google.com/spreadsheets/d/1lT8kUkox6SHlXpCfIkUqs9BnQ3kU8ljyxSiR-ytiyxk/edit#gid=625855956. Samples from WGS, RNA-Seq and miRNA-Seq are all included, sorted by projects, donors, then specimens. Availability of same specimen with WGS, RNA-Seq and/or miRNA-Seq data is easily reviewable. b. New RNA expression data file produced, previously reported issue about missing aliquot fixed. On going support for user's inquiries: [cid:8de7f5fc-ecd2-4664-a653-107e34932dfe at oicr.on.ca] c. Methylation WG will have have first complete set of analysis result by mid December. Will wait till then to revisit the matching of sample IDs. No update since last week 2. [Junjun] Create directories on ICGC Portal for reference datasets, see update below: a. Reference data folder structure reorganized available at https://dcc.icgc.org/releases/PCAWG (README under each sub-folder will be added to provide addition information). No update since last week b. Other additional reference data from other working groups can be added too. c. design folder structure for this additional reference datasets 3. [Jonathan] For cell lines, consensus SVs available: pdf; syn7373725. Consensus SNVs & indels: a. merged results passed to Broad for filtering b. dkfz-filtered SNVs have been added to https://www.synapse.org/#!Synapse:syn7510859 4. [Christina] For medulloblastoma sample (tumor 40x, normal 30x) from ICGC benchmark, run alignment & variant workflows a. Sanger - completed b. DKFZ/EMBL - Encountered 2nd Roddy error. Logs sent to Michael. c. Broad v1 completed - passed to Broad to fix 5. [All] Contribute to the manuscripts a. infrastructure: Paper ( https://goo.gl/utx3cC ), Supplement ( https://goo.gl/gtYUv7 ) b. variants: Paper ( https://goo.gl/g9CLsu ), Supplement ( https://goo.gl/EWYh7e ) c. 
Rogue's Gallery of Cancer Genome Sequencing Artifacts ( outline ) i. some examples from Matthias: GroupSeminar_20160531_onlyPCAWG.pptx 6. Access to PCAWG Jamboree 10min Status of dockerizing workflows Denis Yuen, OICR Brian O'Connor, UCSC Gordon Saksena, Broad Status of PCAWG Workflow ports to Dockstore: 1. [Kyle] MuSE 2. [Gordon] Broad dockers with #days estimated for development a. 0 - tokens alg, for SNV/Indel PoN generation (not GATK) b. 5 - SNV/Indel PoN filter c. 3 - MuTect with ContEst wiring bug, plus rescue, plus SV minibam, using public PoN (GATK) d. 3 - MuTect without ContEst wiring bug, using public PoN. (GATK license) e. ? - MuTect PoN generator f. 3 - OxoQ measure + OxoG filter (GATK license due to filter step) g. 5 - Snowman (GPL needs to be unlinked. not GATK) h. 5 - dRanger, includes preprocess and BreakPointer. (does public PoN exist? release without PoN?) i. ? - dRanger PoN generator j. 3 - frag counter k. 3 - haplotype caller (GATK) l. 3 - RECapseg (GATK) m. 3 - vcf merge (GATK) n. ? - het sites force call (GATK, results never used due to bug) o. ? - BQSR + cocleaning (relying on Kyle) p. 10 - deTiN - detection and filter (GATK) q. 3 - gVCF merge (GATK) r. 5 - blat SNV filter (GATK) s. 10 - germline overlap SNV filter (GATK) t. ? - Variantbam 3. [Jonathan] Consensus algorithm for SNVs: 2+ of 4 4. [Jonathan] Consensus algorithm for indels: stacked logistic regression - Update from Jonathan on 12/9: cleaning up scripts and removing unnecesssary steps 5. [Joachim] Consensus algorithm for SVs - will check in on 12/5 Of note: ? Naming convention: "pcawg--workflow" for complex, docker-based workflows; or "pcawg--tool" for standalone, single tools ? currently working on releases using gosu to tackle "unknown user issue" and new test.json pointing at pcawg site ? PCAWG DOI Generation for a howto guide on doing this (we use GitHub + Zenodo) ? Brian's Dockstore tutorials: ? https://www.youtube.com/watch?v=sInP-ByF9xU ? 
https://www.youtube.com/edit?video_id=-JuKsSQja3g - Tutorial from 12/6: https://goo.gl/2bnXq & https://www.youtube.com/watch?v=Gb6LnmpZj_g 10min Status of testing dockerized workflows Miguel Vazquez, CNIO Francis Ouellette, OICR Denis Yuen, OICR Brian O'Connor, UCSC PCAWG Docker (Dockstore) Testing Working Group Workflow Testing Data Docker containers to be tested ________________________________ Copy of table from "Workflow Testing Data" representing the latest status of PCAWG docker containers currently present on the PUBLIC Dockstore.org and the status of their testing (taken 08:30 AM EDT, Dec 5, 2016). Miguel Vazquez slides ________________________________ Status of workflows being tested: 1. BWA-Mem 2. Sanger - Changes proposed by Keiran have been made. Currently testing whether the new version fixes issues with a specific donor (DO50311); test had to be restarted due to a reboot for a security patch a. Tests passed with 2.0.2; watch out though: test data ran in the regular time, but the previously failing donor DO50311 took upward of 8 days 3. EMBL 4. DKFZ 5. DKFZ's PCR & strand bias filtering 5min Other business? Group Next meeting will be on January 9, 2017 Christina K. Yung, PhD Project Manager, Cancer Genome Collaboratory Ontario Institute for Cancer Research MaRS Centre 661 University Avenue, Suite 510 Toronto, Ontario, Canada M5G 0A3 Tel: 416-673-8578 www.oicr.on.ca www.cancercollaboratory.org This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
_______________________________________________ PAWG-TECH mailing list PAWG-TECH at lists.icgc.org https://lists.icgc.org/mailman/listinfo/pawg-tech -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 174 bytes Desc: image001.png URL: -------------- next part -------------- An embedded message was scrubbed... From: Christina Yung Subject: [PAWG] FW: Action Requested to maintain access to PCAWG Jamboree Date: Fri, 16 Dec 2016 17:03:33 +0000 Size: 44239 URL: From miguel.vazquez at cnio.es Tue Dec 27 08:24:57 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Tue, 27 Dec 2016 14:24:57 +0100 Subject: [DOCKTESTERS] =?utf-8?q?Sanger_2=C2=BA_validation_=28DO52140=29?= =?utf-8?q?=3A_Very_small_discrepancies?= Message-ID: Dear all, The test for the Sanger workflow on the second sample is complete. The results are posted below. In brief, *somatic indels and SV are a 100% match* (803986 and 6 matches, respectively); there are *small differences in somatic.snv.mnv* (+5 -7 with 87234 matches) *and somatic.cnv* (-2 and 36 matches) Best regards Report ~~~~~~ Comparison of somatic.cnv for DO52140 using Sanger --- Common: 36 Extra: 0 Missing: 2 - Example: 10:11767915:T:,10:11779907:G: Comparison of somatic.indel for DO52140 using Sanger --- Common: 803986 Extra: 0 Missing: 0 Comparison of somatic.snv.mnv for DO52140 using Sanger --- Common: 87234 Extra: 5 - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A Missing: 7 - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A Comparison of somatic.sv for DO52140 using Sanger --- Common: 6 Extra: 0 Missing: 0 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From francis at oicr.on.ca Tue Dec 27 11:30:21 2016 From: francis at oicr.on.ca (Francis Ouellette) Date: Tue, 27 Dec 2016 16:30:21 +0000 Subject: [DOCKTESTERS] =?iso-8859-1?q?Sanger_2=BA_validation_=28DO52140=29?= =?iso-8859-1?q?=3A_Very_small_discrepancies?= In-Reply-To: References: Message-ID: <89C0788F-3762-4CDE-AEB0-37339637E375@oicr.on.ca> Are the differences the same or different than the last time you ran this? Was this Sanger workflow run on the same infrastructure as the last one you did? @bffo On Dec 27, 2016, at 08:24, Miguel Vazquez > wrote: Dear all, The test for the Sanger workflow on the second sample is complete. The results are posted below. In brief, somatic indels and SV are a 100% match (803986 and 6 matches, respectively); there are small differences in somatic.snv.mnv (+5 -7 with 87234 matches) and somatic.cnv (-2 and 36 matches) Best regards Report ~~~~~~ Comparison of somatic.cnv for DO52140 using Sanger --- Common: 36 Extra: 0 Missing: 2 - Example: 10:11767915:T:,10:11779907:G: Comparison of somatic.indel for DO52140 using Sanger --- Common: 803986 Extra: 0 Missing: 0 Comparison of somatic.snv.mnv for DO52140 using Sanger --- Common: 87234 Extra: 5 - Example: 1:23719098:A:G,12:43715930:T:A,20:4058335:T:A Missing: 7 - Example: 10:6881937:A:T,1:148579866:A:G,11:9271589:T:A Comparison of somatic.sv for DO52140 using Sanger --- Common: 6 Extra: 0 Missing: 0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.vazquez at cnio.es Tue Dec 27 13:16:27 2016 From: miguel.vazquez at cnio.es (Miguel Vazquez) Date: Tue, 27 Dec 2016 19:16:27 +0100 Subject: [DOCKTESTERS] =?utf-8?q?Sanger_2=C2=BA_validation_=28DO52140=29?= =?utf-8?q?=3A_Very_small_discrepancies?= Message-ID: Francis, This was on a new donor. I decided to get a second donor before repeating the first. The first donor also had differences in SNVs, but no differences in CNV.
Since the discrepancies for the first donor were not deemed too critical, I've actually moved on to trying the bwa-mem workflow, which I couldn't get to run on test data, so I'm trying it on real data.

Should I instead try a second run on a donor to test whether the differences are the same?

Best
Miguel

On Dec 27, 2016 5:30 PM, "Francis Ouellette" wrote:

Are the differences the same as or different from the last time you ran this? Was this Sanger workflow run on the same infrastructure as the last one you did?

@bffo

On Dec 27, 2016, at 08:24, Miguel Vazquez wrote:

Dear all,

The test for the Sanger workflow on the second sample is complete. The results are posted below.

In brief, somatic indels and SV are a 100% match (803986 and 6 matches, respectively); there are small differences in somatic.snv.mnv (+5/-7, with 87234 matches) and in somatic.cnv (-2, with 36 matches).

Best regards

[...]

_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francis at oicr.on.ca Tue Dec 27 16:21:26 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 27 Dec 2016 21:21:26 +0000
Subject: [DOCKTESTERS] Sanger 2º validation (DO52140): Very small discrepancies
Message-ID:

The question that we should answer at least once is: are the differences we see in the output of a given pipeline (however small they are) caused by randomness (stochastic), or are they caused by the way we are running it?

So if you run the same pipeline twice and do or do not get the same answer (8 or 10 differences, as you have seen), are you able to reproduce that, or will it be different each time you run it?

We don't need to do that for all pipelines that generate different results, but maybe once, so that when Lincoln asks "is this difference stochastic or reproducible?" I can say something better than what I said last time, which was "Stochastic, is my educated guess" :-)

@bffo

On Dec 27, 2016, at 13:16, Miguel Vazquez wrote:

Francis,

This was on a new donor. I decided to get a second donor before repeating the first. The first donor also had differences in SNVs, but no differences in CNV.

Since the discrepancies for the first donor were not deemed too critical, I've actually moved on to trying the bwa-mem workflow, which I couldn't get to run on test data, so I'm trying it on real data.

Should I instead try a second run on a donor to test whether the differences are the same?

Best
Miguel

On Dec 27, 2016 5:30 PM, "Francis Ouellette" wrote:

Are the differences the same as or different from the last time you ran this? Was this Sanger workflow run on the same infrastructure as the last one you did?

@bffo

On Dec 27, 2016, at 08:24, Miguel Vazquez wrote:

Dear all,

The test for the Sanger workflow on the second sample is complete. The results are posted below.
In brief, somatic indels and SV are a 100% match (803986 and 6 matches, respectively); there are small differences in somatic.snv.mnv (+5/-7, with 87234 matches) and in somatic.cnv (-2, with 36 matches).

Best regards

[...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
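
[Editor's note for testers: the two recurring operations in this thread — the Common/Extra/Missing comparison in Miguel's reports and Francis's stochastic-vs-reproducible question — can both be expressed as set operations over variant keys of the form chrom:pos:ref:alt. The sketch below is a hypothetical illustration of that idea, not the actual comparison tooling used to produce the reports above; all function names are made up for this example.]

```python
# Hypothetical sketch: variant-call comparison as set operations.
# Each call is a "chrom:pos:ref:alt" key, as in the reports above.

def compare_calls(reference, candidate):
    """Return (common, extra, missing) variant-key sets,
    relative to the reference call set."""
    ref, cand = set(reference), set(candidate)
    return ref & cand, cand - ref, ref - cand

def report(name, donor, reference, candidate):
    """Render a comparison in the same shape as the thread's reports."""
    common, extra, missing = compare_calls(reference, candidate)
    lines = [f"Comparison of {name} for {donor} using Sanger", "---",
             f"Common: {len(common)}", f"Extra: {len(extra)}"]
    if extra:
        lines.append("- Example: " + ",".join(sorted(extra)[:3]))
    lines.append(f"Missing: {len(missing)}")
    if missing:
        lines.append("- Example: " + ",".join(sorted(missing)[:3]))
    return "\n".join(lines)

def classify(reference, run1, run2):
    """Francis's question: run the pipeline twice on the same donor.
    If both runs disagree with the reference in exactly the same way,
    the discrepancy is reproducible; otherwise it is stochastic."""
    d1 = set(reference) ^ set(run1)   # symmetric difference per run
    d2 = set(reference) ^ set(run2)
    if not d1 and not d2:
        return "identical to reference"
    return "reproducible" if d1 == d2 else "stochastic"

if __name__ == "__main__":
    ref = ["1:100:A:G", "2:200:C:T", "3:300:G:A"]
    new = ["1:100:A:G", "2:200:C:T", "4:400:T:C"]
    print(report("somatic.snv.mnv", "DO52140", ref, new))
    print(classify(ref, new, new))
```

Running the comparison on two repeat runs of the same donor and checking whether the symmetric differences coincide would give a concrete answer to "stochastic or reproducible?" for one pipeline.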