From Denis.Yuen at oicr.on.ca Fri Dec 2 12:15:35 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Fri, 2 Dec 2016 17:15:35 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and gosu
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Hi,
I've gone through and created a new release for each of the core tools (bwa-mem, delly, dkfz, sanger) with two (non-scientific) changes.
1) Each of these tools has been redirected to use reference data hosted on the ICGC portal ( https://dcc.icgc.org/releases/PCAWG ) rather than AWS, in order to conserve our use of S3. This means that the sample JSON parameter files have been updated, and any references inside the Docker images that we're aware of have also been updated. As usual, if re-running workflows frequently, you'll want to host these files locally.
If you are using the Dockstore command-line interface, you'll need to upgrade to the latest release of dockstore (1.2) to use this new file location.
2) A number of users have run into problems running the workflow when running in a multi-user environment (i.e. not running Docker containers with the first user on a host). This release replaces most usage of sudo inside the tools with gosu to deal with this issue.
The new release numbers are documented at https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
In particular:
* bwa-mem 2.6.8_1.2
* dkfz 2.0.1_cwl1.0
* sanger 2.0.3
* embl 2.0.1-cwl1.0
Denis Yuen
Bioinformatics Software Developer
Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue
Suite 510
Toronto, Ontario, Canada M5G 0A3
Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca
This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From francis at oicr.on.ca Fri Dec 2 15:39:03 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Fri, 2 Dec 2016 20:39:03 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
OHSU people: any chance some of these will be tested before Monday AM's call?
Miguel: Can you rerun some of your successful tests with this new dockstore?
All: any testing possible?
Gordon: What is up with the Broad docker containers?
We also have a request from Peter van Loo to join this group. I said yes, but he wants to test other containers from his group, which is great!
After I get some specifics from him I will add Peter and other(s) from his group.
Stay tuned,
Also, please update things on wiki/google doc as you progress forward,
Thank you,
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
From mikisvaz at gmail.com Fri Dec 2 17:38:50 2016
From: mikisvaz at gmail.com (Miguel Vazquez)
Date: Fri, 2 Dec 2016 23:38:50 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
<47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
Message-ID:
Francis, I'll rerun the test. I'm still waiting for the Sanger wf test that has been running for about 10 days. I'll see if it completes by Monday; otherwise I'll consider aborting it, retesting dkfz, and seeing if I can debug bwa-mem.
Best
Miguel
From buchanae at ohsu.edu Fri Dec 2 19:08:55 2016
From: buchanae at ohsu.edu (Alexander Buchanan)
Date: Sat, 3 Dec 2016 00:08:55 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To: <47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
<47AFC784-7986-447F-AA3D-2892402BED04@oicr.on.ca>
Message-ID: <53D2FB61-82B2-4610-A6D9-87049F8E5964@ohsu.edu>
I don't think we'll have time before Monday morning to run any of the new containers.
I'll probably try out the new dkfz container early next week though, as I have been having lots of issues with running DKFZ on Cromwell. The issues are related to read-only files being mounted and dkfz switching users to 'roddy', so I'm hoping the gosu changes will help.
From buchanae at ohsu.edu Fri Dec 2 19:11:15 2016
From: buchanae at ohsu.edu (Alexander Buchanan)
Date: Sat, 3 Dec 2016 00:11:15 +0000
Subject: [DOCKTESTERS] Variant call validation results for Sanger
Message-ID: <76AFD1D3-1EEA-427A-83C2-B28A7883E317@ohsu.edu>
I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email.
I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences.
At this point, I don't know the source of the difference. Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.
Output from python:
python test.py
==================================================
Donor: DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor: DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor: DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor: DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor: DO50432
intersection 3941
key - test 24358
test - key 138224
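For readers who want to reproduce this kind of check, a comparison along the lines described above can be sketched as follows. This is a hypothetical sketch only; the actual test.py and its input format were not shared, and variants are modelled here simply as (chrom, pos, alt) tuples:

```python
# Hypothetical sketch of a set-based variant comparison like the one
# described above; the real test.py was not shared.
# Variants are modelled as (chrom, pos, alt) tuples.

def compare(key, test):
    """Return intersection and one-sided difference counts."""
    key, test = set(key), set(test)
    return {
        "intersection": len(key & test),
        "key - test": len(key - test),   # in key only (missed calls)
        "test - key": len(test - key),   # in test only (extra calls)
    }

key = [("1", 100, "A"), ("1", 200, "T"), ("2", 300, "G")]
test = [("1", 100, "A"), ("2", 300, "G"), ("3", 400, "C")]
result = compare(key, test)
print(result)  # {'intersection': 2, 'key - test': 1, 'test - key': 1}
```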
Output from USeq:
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
24117 Test variants
24117 Test variants in shared regions
0.919073764621628 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 17395 6722 0.27872455 0.27872455 0.614686 0.2375349 0.72127545
Done! 4 seconds
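As a sanity check, the summary metrics USeq reports above follow directly from the formulas in its table header; recomputing them from the DO50414 counts:

```python
# Recomputing the USeq summary metrics for DO50414 from the counts above,
# using the formulas given in the table header row.
match, nonmatch = 17395, 6722   # NumMatchTest, NumNonMatchTest
total_key = 28299               # Key variants

fdr = nonmatch / (match + nonmatch)  # false discovery rate
tpr = match / total_key              # true positive rate (sensitivity)
fpr = nonmatch / total_key           # nonmatching test calls per key variant
ppv = match / (match + nonmatch)     # positive predictive value

print(round(fdr, 8), round(tpr, 6), round(fpr, 7), round(ppv, 8))
# prints: 0.27872455 0.614686 0.2375349 0.72127545
```

The recomputed values agree with the reported row, so the differences are in the calls themselves rather than in the metric arithmetic.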
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
52527 Key variants
52527 Key variants in shared regions
0.9612067356158758 Shared key variants Ti/Tv
43476 Test variants
43476 Test variants in shared regions
0.9573203673689897 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 34721 8755 0.20137547 0.20137547 0.6610124 0.16667618 0.7986245
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
120998 Key variants
120998 Key variants in shared regions
0.9540073962824799 Shared key variants Ti/Tv
97436 Test variants
97436 Test variants in shared regions
0.9392950261728001 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 81477 15959 0.16378957 0.16378957 0.6733748 0.13189474 0.8362104
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
124379 Key variants
124379 Key variants in shared regions
0.9678664662605807 Shared key variants Ti/Tv
97967 Test variants
97967 Test variants in shared regions
0.9450632358488693 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82705 15262 0.15578716 0.15578716 0.66494346 0.1227056 0.84421283
Done! 3 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
142165 Test variants
142165 Test variants in shared regions
0.9905488658639037 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 3940 138225 0.97228575 0.97228575 0.13922754 4.884448 0.027714275
Done! 4 seconds
From miguel.vazquez at cnio.es Sat Dec 3 06:10:36 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Sat, 3 Dec 2016 12:10:36 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID:
Denis,
I've tried updating to the latest dockstore and ran into a bug.
ubuntu at ip-10-253-35-14:~$ dockstore
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: io/cwl/avro/CWL$GsonBuildException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: io.cwl.avro.CWL$GsonBuildException
at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at sun.misc.Resource.getBytes(Resource.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
... 13 more
What I did was move the old version of dockstore from ~/bin/dockstore to ~/bin/dockstore.old and download the new version from
https://github.com/ga4gh/dockstore/releases/download/1.1/dockstore
I ran the new dockstore; it downloaded
https://seqwaremaven.oicr.on.ca/artifactory/collab-release/io/dockstore/dockstore-client/1.1/dockstore-client-1.1.jar
and then when I tried running dockstore --version I got the error above.
Anything I can do?
Best regards
Miguel
From francis at oicr.on.ca Sat Dec 3 18:48:10 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Sat, 3 Dec 2016 23:48:10 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location
and gosu
In-Reply-To:
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID:
Anybody else with these problems?
@bffo
From miguel.vazquez at cnio.es Mon Dec 5 07:41:58 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 5 Dec 2016 13:41:58 +0100
Subject: [DOCKTESTERS] Sanger pipeline completed,
small discrepancies with GNOS VCF (99.9905% accuracy)
Message-ID:
Dear all,
The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311.
The results are the following:
Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1 - Example: Y:58885197:G
Missing: 14 - Example: 1:102887902:T, 1:143165228:G, 16:87047601:C
The donor results for DKFZ yielded
Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0
In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here:
https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
Best regards
Miguel
From Denis.Yuen at oicr.on.ca Mon Dec 5 09:18:58 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Mon, 5 Dec 2016 14:18:58 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location
and gosu
In-Reply-To:
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca>
Hi,
Having some difficulty reproducing this.
It's possible that a jar file was corrupted in transit; try deleting ~/.dockstore/self-installs/ and retrying.
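An "invalid LOC header" ZipException from the class loader generally means the downloaded jar itself is damaged. Before (or after) clearing the cache, the archive can be checked directly; a minimal sketch in Python (file names and paths are illustrative):

```python
import os
import tempfile
import zipfile

def jar_is_intact(path):
    """True if the archive has a valid directory and all entries decompress cleanly."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None  # testzip() returns the first bad entry, or None
    except zipfile.BadZipFile:
        return False

# Demonstration: a freshly written archive passes, a truncated copy fails.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "dockstore-client.jar")
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("io/cwl/avro/CWL.class", b"\x00" * 4096)

bad = os.path.join(workdir, "truncated.jar")
with open(good, "rb") as src, open(bad, "wb") as dst:
    dst.write(src.read()[:100])  # simulate a download cut off mid-transfer

print(jar_is_intact(good), jar_is_intact(bad))
```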
________________________________
From: Francis Ouellette
Sent: December 3, 2016 6:48 PM
To: docktesters at lists.icgc.org
Cc: Denis Yuen; Miguel Vazquez
Subject: Re: [DOCKTESTERS] Core workflow icgc reference data location and gosu
Anybody else with these problems?
@bffo
On Dec 3, 2016, at 06:10, Miguel Vazquez wrote:
Denis,
I've tried updating to the latest dockstore and ran into a bug.
ubuntu at ip-10-253-35-14:~$ dockstore
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: io/cwl/avro/CWL$GsonBuildException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: io.cwl.avro.CWL$GsonBuildException
at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at sun.misc.Resource.getBytes(Resource.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
... 13 more
What I did was move the old version of dockstore from ~/bin/dockstore to ~/bin/dockstore.old and download the new version from
https://github.com/ga4gh/dockstore/releases/download/1.1/dockstore
I ran the new dockstore, it downloaded
https://seqwaremaven.oicr.on.ca/artifactory/collab-release/io/dockstore/dockstore-client/1.1/dockstore-client-1.1.jar
and then when I tried running dockstore --version I got the error above. Anything I can do?
Best regards
Miguel
From francis at oicr.on.ca Mon Dec 5 09:38:01 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Mon, 5 Dec 2016 14:38:01 +0000
Subject: [DOCKTESTERS] Joining PCAWG testing WG
In-Reply-To: <70079FBF-6A3A-4AC3-A712-FDCB8B65713A@oicr.on.ca>
References:
<2F7E0094-10A8-4C90-89DB-B89A6AFB60DE@oicr.on.ca>
<70079FBF-6A3A-4AC3-A712-FDCB8B65713A@oicr.on.ca>
Message-ID:
Hi Peter,
You and Jonas have been added to the docktester list.
I will send you recent docktester messages.
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 2, 2016, at 3:40 PM, Francis Ouellette wrote:
Yes, I will add you to the mailing list, and we don't have much of a TC schedule. We do most of our work via the mailing list and wiki.
cheers,
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 2, 2016, at 3:38 PM, Peter Van Loo wrote:
Hi Francis,
Yes, Jonas and me!
Thanks!
Can you add us to the relevant mailing lists and let us know when your TCs are?
Cheers,
Peter
--
Peter Van Loo, PhD
Winton Group Leader - Cancer Genomics
The Francis Crick Institute
1 Midland Road
London NW1 1AT
http://www.crick.ac.uk/peter-van-loo
From: Francis Ouellette
Date: Friday, 2 December 2016 20:36
To: Peter Van Loo
Cc: Kyle Ellrott, Paul Spellman, Jonas Demeulemeester
Subject: Re: Joining PCAWG testing WG
Sorry for the late reply Peter,
the answer is yes, of course, but who is "we"? Jonas and you?
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Nov 30, 2016, at 3:57 PM, Peter Van Loo wrote:
Hi Francis,
Just to chase up on this again: could we join the PCAWG testing WG?
We're setting up to run the main PCAWG pipelines through a series of simulated BAM files from the SMC-Het DREAM challenge, to be able to characterise in detail how well we can detect subclones. It would be great if we could work with the PCAWG testing WG to get everything to run!
Thanks!
Cheers,
Peter
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
From miguel.vazquez at cnio.es Mon Dec 5 10:05:15 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 5 Dec 2016 16:05:15 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To: <27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca>
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
<27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca>
Message-ID:
Thanks Denis, your suggestion of removing the jar files worked.
Best regards
Miguel
On Mon, Dec 5, 2016 at 3:18 PM, Denis Yuen wrote:
> Hi,
> Having some difficulty reproducing this.
> It's possible that a jar file was corrupted in transit, try deleting
> ~/.dockstore/self-installs/ and retrying.
From miguel.vazquez at cnio.es Mon Dec 5 12:16:03 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 5 Dec 2016 18:16:03 +0100
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To:
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
<27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca>
Message-ID:
By the way Denis, I now get this message when running dockstore:
cwltool version is 1.0.20161202203310 , Dockstore is tested with
1.0.20161114152756
Override and run with `--script`
I've added `--script` to the command line and it works; just so you know.
Miguel
On Mon, Dec 5, 2016 at 4:05 PM, Miguel Vazquez
wrote:
> Thanks Denis, your suggestion of removing the jar files worked.
>
> Best regards
>
> Miguel
>
From Denis.Yuen at oicr.on.ca Mon Dec 5 12:47:40 2016
From: Denis.Yuen at oicr.on.ca (Denis Yuen)
Date: Mon, 5 Dec 2016 17:47:40 +0000
Subject: [DOCKTESTERS] Core workflow icgc reference data location and
gosu
In-Reply-To:
References: <27512884B2D81B41AAB7BB266248F240C09A54A3@exmb2.ad.oicr.on.ca>
<27512884B2D81B41AAB7BB266248F240C09A5D21@exmb2.ad.oicr.on.ca>
,
Message-ID: <27512884B2D81B41AAB7BB266248F240C09A617F@exmb2.ad.oicr.on.ca>
Hi,
Yup, we've added that warning because we test with a specific version of cwltool.
It looks like you have a newer version of cwltool, which is probably fine (assuming you haven't run into any problems). The warning is aimed more at users of older versions of cwltool than newer ones.
From buchanae at ohsu.edu Mon Dec 5 13:21:33 2016
From: buchanae at ohsu.edu (Alexander Buchanan)
Date: Mon, 5 Dec 2016 18:21:33 +0000
Subject: [DOCKTESTERS] Variant call validation results for Sanger
Message-ID:
Regarding the validation results I posted last Friday, we think these poor results are likely due to an upstream issue and not the Sanger workflow itself. Those variant call results were from a larger process including fastq prep, alignment, and then Sanger variant calling, and we think we introduced a problem early on during fastq prep.
We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match). One example output from USeq:
82486 Key variants
82486 Key variants in shared regions
0.953626071716167 Shared key variants Ti/Tv
82482 Test variants
82482 Test variants in shared regions
0.9536238749407864 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82479 3 3.6371574E-5 3.6371574E-5 0.9999151 3.636981E-5 0.99996364
From: on behalf of Alexander Buchanan
Date: Friday, December 2, 2016 at 4:11 PM
To: "docktesters at lists.icgc.org"
Subject: [DOCKTESTERS] Variant call validation results for Sanger
I was able to run USeq on data output from running the sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email.
I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences.
At this point, I don't know the source of the difference. Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.
Output from python:
python test.py
==================================================
Donor: DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor: DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor: DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor: DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor: DO50432
intersection 3941
key - test 24358
test - key 138224
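A comparison script along these lines might reduce each gzipped VCF to a set of chrom:pos:alt keys and take set differences. A sketch under those assumptions (helper names and the toy input files are illustrative, not the actual test script):

```python
import gzip
import os
import tempfile

def vcf_keys(path):
    """Collect chrom:pos:alt keys from a gzipped VCF, one key per alternate allele."""
    keys = set()
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta and header lines
            chrom, pos, _id, _ref, alt = line.rstrip("\n").split("\t")[:5]
            for allele in alt.split(","):  # multi-allelic records yield one key each
                keys.add(f"{chrom}:{pos}:{allele}")
    return keys

def report(key_set, test_set):
    """(intersection, key - test, test - key), matching the printout above."""
    return (len(key_set & test_set), len(key_set - test_set), len(test_set - key_set))

# Demonstration on a tiny hand-written VCF pair.
workdir = tempfile.mkdtemp()
def write_vcf(name, rows):
    path = os.path.join(workdir, name)
    with gzip.open(path, "wt") as fh:
        fh.write("##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n")
        fh.writelines("\t".join(row) + "\n" for row in rows)
    return path

key = write_vcf("key.vcf.gz", [("1", "100", ".", "A", "T"), ("2", "200", ".", "C", "G,GA")])
test = write_vcf("test.vcf.gz", [("1", "100", ".", "A", "T"), ("3", "300", ".", "T", "C")])
print(report(vcf_keys(key), vcf_keys(test)))
```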
Output from USeq:
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
24117 Test variants
24117 Test variants in shared regions
0.919073764621628 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 17395 6722 0.27872455 0.27872455 0.614686 0.2375349 0.72127545
Done! 4 seconds
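The columns in each row follow directly from the formulas in the header line. Sanity-checking the DO50414 row, with the counts taken from the output above:

```python
# Counts from the DO50414 USeq comparison above.
total_key = 28299       # key variants in shared regions
match_test = 17395      # test variants that match a key variant
non_match_test = 6722   # test variants with no key counterpart

fdr = non_match_test / (match_test + non_match_test)  # false discovery rate
tpr = match_test / total_key                          # sensitivity against the key set
fpr = non_match_test / total_key
ppv = match_test / (match_test + non_match_test)      # note PPV = 1 - FDR

print(f"FDR={fdr:.8f} TPR={tpr:.6f} FPR={fpr:.7f} PPV={ppv:.8f}")
```

These agree with the 0.27872455 / 0.614686 / 0.2375349 / 0.72127545 figures USeq reports for that donor.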
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
52527 Key variants
52527 Key variants in shared regions
0.9612067356158758 Shared key variants Ti/Tv
43476 Test variants
43476 Test variants in shared regions
0.9573203673689897 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 34721 8755 0.20137547 0.20137547 0.6610124 0.16667618 0.7986245
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
120998 Key variants
120998 Key variants in shared regions
0.9540073962824799 Shared key variants Ti/Tv
97436 Test variants
97436 Test variants in shared regions
0.9392950261728001 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 81477 15959 0.16378957 0.16378957 0.6733748 0.13189474 0.8362104
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
124379 Key variants
124379 Key variants in shared regions
0.9678664662605807 Shared key variants Ti/Tv
97967 Test variants
97967 Test variants in shared regions
0.9450632358488693 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82705 15262 0.15578716 0.15578716 0.66494346 0.1227056 0.84421283
Done! 3 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
142165 Test variants
142165 Test variants in shared regions
0.9905488658639037 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 3940 138225 0.97228575 0.97228575 0.13922754 4.884448 0.027714275
Done! 4 seconds
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From francis at oicr.on.ca Tue Dec 6 10:41:33 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 6 Dec 2016 15:41:33 +0000
Subject: [DOCKTESTERS] Sanger pipeline completed,
small discrepancies with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
Message-ID:
Miguel,
I personally think this is a slippery slope, but the PCAWG-tech (i.e. Lincoln and Christina) asked
if you could repeat this experiment and see whether you get the same differing variants or not,
and in particular, whether you think the minor differences you see are caused by the cloud
infrastructure you are using.
Would be good to test for that too.
Can you redo before Monday's call?
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 5, 2016, at 7:41 AM, Miguel Vazquez wrote:
Dear all,
The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311
The results are the following:
Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1
- Example: Y:58885197:G
Missing: 14
- Example: 1:102887902:T,1:143165228:G,16:87047601:C
The donor results for DKFZ yielded
Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0
In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here
https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
Best regards
Miguel
From francis at oicr.on.ca Tue Dec 6 10:52:52 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Tue, 6 Dec 2016 15:52:52 +0000
Subject: [DOCKTESTERS] Variant call validation results for Sanger
In-Reply-To:
References:
Message-ID: <6AF31004-13C6-4EF0-B3A4-9988D284D8D2@oicr.on.ca>
Hi Alex,
Likewise here, on the test that 'worked': are the differences platform specific, and
are they reproducible?
I think we only need to do this a couple of times, to inform us if the differences are
operator and/or platform specific, or simply (which I think) more about the heuristics
of the testing we are doing.
Thank you for looking into this.
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 5, 2016, at 1:21 PM, Alexander Buchanan wrote:
Regarding the validation results I posted last Friday, we think these poor results are likely due to an upstream issue and not the Sanger workflow itself. Those variant call results were from a larger process including fastq prep, alignment, and then Sanger variant calling, and we think we introduced a problem early on during fastq prep.
We have a different set of Sanger results that reused the existing alignments from GNOS, and those variants match the expected results much more closely (99.99% match). One example output from USeq:
82486 Key variants
82486 Key variants in shared regions
0.953626071716167 Shared key variants Ti/Tv
82482 Test variants
82482 Test variants in shared regions
0.9536238749407864 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82479 3 3.6371574E-5 3.6371574E-5 0.9999151 3.636981E-5 0.99996364
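The summary columns in that row follow directly from the three raw counts (match = 82479, non-match = 3, total key = 82486). A quick recomputation (a generic sketch, not USeq's code):

```python
def useq_metrics(match_test, nonmatch_test, total_key):
    """Recompute the USeq summary columns from the raw counts."""
    called = match_test + nonmatch_test  # all test calls at this threshold
    return {
        "FDR": nonmatch_test / called,   # nonMatchTest / (matchTest + nonMatchTest)
        "TPR": match_test / total_key,   # matchTest / totalKey
        "FPR": nonmatch_test / total_key,
        "PPV": match_test / called,
    }

m = useq_metrics(82479, 3, 82486)
# Agrees with the reported row: FDR ~ 3.64e-5, TPR ~ 0.9999151, PPV ~ 0.9999636
```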
From: Alexander Buchanan
Date: Friday, December 2, 2016 at 4:11 PM
To: docktesters at lists.icgc.org
Subject: [DOCKTESTERS] Variant call validation results for Sanger
I was able to run USeq on data output from running the Sanger workflow on a Cromwell engine, for 5 donors. It's reporting some pretty big differences, so I still need to investigate. I'll copy the USeq output at the end of this email.
I also wrote a simple comparison script, similar to what Miguel is doing (but in python), and it also reports differences.
At this point, I don't know the source of the difference. Maybe I'm not comparing the data correctly, or maybe the workflows were run incorrectly. Maybe the tools have some element of randomness, but I'm not sure that would explain the substantial differences.
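A comparison along those lines might look like the sketch below (hypothetical code, not the actual test.py; it keys variants by (chrom, pos, alt), which matches the kind of site IDs Miguel reports):

```python
import gzip

def load_sites(vcf_path):
    """Collect (chrom, pos, alt) keys from a gzip-compressed VCF."""
    sites = set()
    with gzip.open(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):  # skip header lines
                continue
            chrom, pos, _id, _ref, alt = line.split("\t")[:5]
            for allele in alt.split(","):  # count multi-allelic records per allele
                sites.add((chrom, int(pos), allele))
    return sites

def compare(key_sites, test_sites):
    """Set arithmetic behind the 'intersection / key - test / test - key' report."""
    return {
        "intersection": len(key_sites & test_sites),
        "key - test": len(key_sites - test_sites),
        "test - key": len(test_sites - key_sites),
    }
```

For a donor, `compare(load_sites("key.vcf.gz"), load_sites("test.vcf.gz"))` yields the three counts reported per donor.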
Output from python:
python test.py
==================================================
Donor: DO50414
intersection 17395
key - test 10904
test - key 6722
==================================================
Donor: DO50415
intersection 34721
key - test 17806
test - key 8755
==================================================
Donor: DO50417
intersection 81477
key - test 39521
test - key 15959
==================================================
Donor: DO50419
intersection 82705
key - test 41674
test - key 15262
==================================================
Donor: DO50432
intersection 3941
key - test 24358
test - key 138224
Output from USeq:
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50414/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50414/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
24117 Test variants
24117 Test variants in shared regions
0.919073764621628 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 17395 6722 0.27872455 0.27872455 0.614686 0.2375349 0.72127545
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50415/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50415/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
52527 Key variants
52527 Key variants in shared regions
0.9612067356158758 Shared key variants Ti/Tv
43476 Test variants
43476 Test variants in shared regions
0.9573203673689897 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 34721 8755 0.20137547 0.20137547 0.6610124 0.16667618 0.7986245
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50417/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50417/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
120998 Key variants
120998 Key variants in shared regions
0.9540073962824799 Shared key variants Ti/Tv
97436 Test variants
97436 Test variants in shared regions
0.9392950261728001 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 81477 15959 0.16378957 0.16378957 0.6733748 0.13189474 0.8362104
Done! 4 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50419/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50419/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
124379 Key variants
124379 Key variants in shared regions
0.9678664662605807 Shared key variants Ti/Tv
97967 Test variants
97967 Test variants in shared regions
0.9450632358488693 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 82705 15262 0.15578716 0.15578716 0.66494346 0.1227056 0.84421283
Done! 3 seconds
[2 Dec 2016 15:09] USeq_8.9.6 Arguments: -a key-data/DO50432/somatic_snv_mnv.vcf.gz -b human_chromosomes.bed -c ccc-output/DO50432/somatic_snv_mnv.vcf.gz -d human_chromosomes.bed
VCF Comparator Settings:
somatic_snv_mnv.vcf.gz Key vcf file
human_chromosomes.bed Key interrogated regions file
somatic_snv_mnv.vcf.gz Test vcf file
human_chromosomes.bed Test interrogated regions file
true Require matching alternate bases
false Require matching genotypes
false Use record VQSLOD score as ranking statistic
false Exclude non PASS or . records
true Compare all variant
Parsing and filtering variant data for common interrogated regions...
Comparing calls...
3137454505 Interrogated bps in key
3137454505 Interrogated bps in test
3137454505 Interrogated bps in common
28299 Key variants
28299 Key variants in shared regions
0.904886914378029 Shared key variants Ti/Tv
142165 Test variants
142165 Test variants in shared regions
0.9905488658639037 Shared test variants Ti/Tv
QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest)
none 3940 138225 0.97228575 0.97228575 0.13922754 4.884448 0.027714275
Done! 4 seconds
_______________________________________________
docktesters mailing list
docktesters at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/docktesters
From francis at oicr.on.ca Fri Dec 9 14:46:31 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Fri, 9 Dec 2016 19:46:31 +0000
Subject: [DOCKTESTERS] Fwd: [PAWG-TECH] Draft agenda for PCAWG-TECH
teleconference
References: <0F84ED6166CE664E8563B61CB2ECB98CCBB4123C@exmb2.ad.oicr.on.ca>
Message-ID: <357A36EA-3A35-4B69-8C72-BD422005718B@oicr.on.ca>
Please e-mail Miguel (and/or this list) if there are any docktesting updates before Monday.
I will not be on pcawg-tech call, but Miguel will be.
https://wiki.oicr.on.ca/display/PANCANCER/2016-12-12+PCAWG-TECH+Teleconference
Have a great weekend,
@bffo
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
Begin forwarded message:
From: Christina Yung
Subject: [PAWG-TECH] Draft agenda for PCAWG-TECH teleconference
Date: December 9, 2016 at 2:40:11 PM EST
To: "pawg-tech (pawg-tech at lists.icgc.org)"
Hi Everyone,
Below is a draft agenda for Monday's tech call. Please feel free to edit on the wiki:
https://wiki.oicr.on.ca/display/PANCANCER/2016-12-12+PCAWG-TECH+Teleconference
Best regards,
Christina
Call Info
Usual Time 9 AM Eastern Time, Mondays
UK 0208 322 2500
Canada 1-866-220-6419
United States 1-877-420-0272
All Others Please see attached PDF file with a list of numbers for other countries.
Participant Code 5910819#
Agenda
Time
Item
Who
Attachments/Links
5min
Welcome. Wait for group members to log on
Christina Yung, OICR
10min
Overall status
Christina Yung, OICR
- Linkouts to Most Current PCAWG Data
- Report data issues to pcawg-data at icgc.org, GNOS issues to Help at annaisystems.com
- From Boston F2F: PCAWG datasets & dependencies
Action Items
1. [Junjun] Specimen ID mapping for miRNA and methylation, see update below:
a. miRNA release posted in Linkouts to Most Current PCAWG Data item #11.
i. Release table includes various entity IDs, source miRNA data, and mapping between miRNA samples and WGS / RNA-Seq samples, Google Spreadsheet: https://docs.google.com/spreadsheets/d/1QG8UTaX71H6tpvt-XJp0_pGFkxURWZ8tC1cIMg3sMJM/edit#gid=2063341485
ii. The release is also available here in TSV, JSONL formats (easy for programmatic parsing):
- http://pancancer.info/data_releases/mirna/mirna_release.v1.0.jsonl
- http://pancancer.info/data_releases/mirna/mirna_release.v1.0.tsv
- http://pancancer.info/data_releases/mirna/mirna_sample_sheet.v1.0.tsv
b. New RNA expression data file produced, previously reported issue about missing aliquot fixed.
c. Methylation WG will have its first complete set of analysis results by mid-December. Will wait until then to revisit the matching of sample IDs.
2. [Junjun] Create directories on ICGC Portal for reference datasets, see update below:
a. Reference data folder structure has been reorganized and is available at https://dcc.icgc.org/releases/PCAWG (a README under each sub-folder will be added to provide additional information).
PCAWG
└── reference_data
    ├── data_for_testing
    ├── pcawg-broad
    ├── pcawg-bwa-mem
    ├── pcawg-delly
    └── pcawg-sanger
b. Other additional reference data from other working groups can be added too.
- hg19_cosmic_v54_120711.vcf - Adam Butler confirmed that this file can be redistributed with the other reference files used by Broad.
3. [Jonathan] For cell lines, consensus SVs available: pdf; syn7373725. Consensus SNVs & indels:
a. merged results passed to Broad for filtering
b. dkfz-filtered SNVs have been added to https://www.synapse.org/#!Synapse:syn7510859
4. [Christina] For medulloblastoma sample (tumor 40x, normal 30x) from ICGC benchmark, run alignment & variant workflows
a. Sanger - completed
b. DKFZ/EMBL - Encountered 2nd Roddy error. Logs sent to Michael.
c. Broad v1 completed - passed to Broad to fix
5. [All] Contribute to the manuscripts
a. infrastructure: Paper ( https://goo.gl/utx3cC ), Supplement ( https://goo.gl/gtYUv7 )
b. variants: Paper ( https://goo.gl/g9CLsu ), Supplement ( https://goo.gl/EWYh7e )
c. Rogue's Gallery of Cancer Genome Sequencing Artifacts ( outline )
i. some examples from Matthias: GroupSeminar_20160531_onlyPCAWG.pptx
- Nuno just reported 7 whitelisted aliquots are missing from V1.6 of consensus SV. Joachim explained that 1 of the 4 SV callers failed for these aliquots, and has added a note to Synapse.
15min
Update on publication plan
Lincoln Stein, OICR
10min
Status of dockerizing workflows
Denis Yuen, OICR
Brian O'Connor, UCSC
Gordon Saksena, Broad
Status of PCAWG Workflow ports to Dockstore:
1. [Kyle] MuSE
2. [Gordon] Broad dockers with #days estimated for development
a. 0 - tokens alg, for SNV/Indel PoN generation (not GATK)
b. 5 - SNV/Indel PoN filter
c. 3 - MuTect with ContEst wiring bug, plus rescue, plus SV minibam, using public PoN (GATK)
d. 3 - MuTect without ContEst wiring bug, using public PoN. (GATK license)
e. ? - MuTect PoN generator
f. 3 - OxoQ measure + OxoG filter (GATK license due to filter step)
g. 5 - Snowman (GPL needs to be unlinked. not GATK)
h. 5 - dRanger, includes preprocess and BreakPointer. (does public PoN exist? release without PoN?)
i. ? - dRanger PoN generator
j. 3 - frag counter
k. 3 - haplotype caller (GATK)
l. 3 - RECapseg (GATK)
m. 3 - vcf merge (GATK)
n. ? - het sites force call (GATK, results never used due to bug)
o. ? - BQSR + cocleaning (relying on Kyle)
p. 10 - deTiN - detection and filter (GATK)
q. 3 - gVCF merge (GATK)
r. 5 - blat SNV filter (GATK)
s. 10 - germline overlap SNV filter (GATK)
t. ? - Variantbam
3. [Jonathan] Consensus algorithm for SNVs: 2+ of 4
4. [Jonathan] Consensus algorithm for indels: stacked logistic regression - Update from Jonathan on 11/22: 1 week to work on scripts, 1.5 week to Dockerize
5. [Joachim] Consensus algorithm for SVs - will check in on 12/5
Of note:
- Naming convention: "pcawg--workflow" for complex, docker-based workflows; or "pcawg--tool" for standalone, single tools
- Currently working on releases using gosu to tackle the "unknown user issue" and a new test.json pointing at the PCAWG site
- PCAWG DOI Generation for a howto guide on doing this (we use GitHub + Zenodo)
- Brian's Dockstore tutorials:
  - https://www.youtube.com/watch?v=sInP-ByF9xU
  - https://www.youtube.com/edit?video_id=-JuKsSQja3g
  - Tutorial from 12/6: https://goo.gl/2bnXq & https://www.youtube.com/watch?v=Gb6LnmpZj_g
10min
Status of testing dockerized workflows
Francis Ouellette, OICR
Denis Yuen, OICR
Brian O'Connor, UCSC
PCAWG Docker (Dockstore) Testing Working Group
Workflow Testing Data
Docker containers to be tested
________________________________
Copy of table from "Workflow Testing Data" representing latest status of PCAWG docker
containers currently present on the PUBLIC Dockstore.org and what the status of their testing is (taken 08:30 AM EDT, Dec 5, 2016).
________________________________
Status of workflows being tested:
1. BWA-Mem
2. Sanger - Changes proposed by Keiran have been made. Currently testing whether the new version fixes issues with a specific donor (DO50311); the test had to be restarted due to a reboot for a security patch
a. Tests passed with 2.0.2; watch out though: the test data ran in the regular time, but the previously failing donor DO50311 took upwards of 8 days
3. EMBL
4. DKFZ
5. DKFZ's PCR & strand bias filtering
5min
Other business?
Group
Christina K. Yung, PhD
Project Manager, Cancer Genome Collaboratory
Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue, Suite 510
Toronto, Ontario, Canada M5G 0A3
Tel: 416-673-8578
www.oicr.on.ca
www.cancercollaboratory.org
This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
_______________________________________________
PAWG-TECH mailing list
PAWG-TECH at lists.icgc.org
https://lists.icgc.org/mailman/listinfo/pawg-tech
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 240 bytes
Desc: image001.png
URL:
From miguel.vazquez at cnio.es Mon Dec 12 04:50:02 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 10:50:02 +0100
Subject: [DOCKTESTERS] DKFZ 2nd validation (DO52140): 100% match!
Message-ID:
Hi all,
I have a second validation result for the DKFZ (plus Delly) workflow, and it confirms the perfect result of the previous one.
Comparison for DO52140 using DKFZ
---
Common: 37160
Extra: 0
Missing: 0
I've updated the information in the wiki. A second validation is on the way
for Sanger, for which the first one had a few mismatches (+1; -14).
Best
Miguel
From miguel.vazquez at cnio.es Mon Dec 12 05:44:07 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 11:44:07 +0100
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
Message-ID:
Dear all,
I was wondering if someone here is acquainted with the Sanger workflow and
could help explain these discrepancies. I've skimmed through the code, and
it seems like it uses EM, but I didn't find anything random in it, such as
during initialization, which was my initial guess. The other thing I thought
of is that when it splits the work for parallel processing it might choose a
different number of splits to accommodate the number of CPUs, and that this
might affect the calculations.
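For what it's worth, even with no explicit randomness the number of splits can nudge results: floating-point addition is not associative, so per-chunk partial sums need not equal a single-pass sum. A generic illustration (not taken from the Sanger code):

```python
vals = [0.1] * 10

whole = sum(vals)                       # one pass over all ten values
halves = sum(vals[:5]) + sum(vals[5:])  # two "worker" chunks, then combine

# Same data, different grouping, different last bit. The difference is ~1e-16,
# harmless on its own, but it can tip hard thresholds downstream.
print(whole == halves)  # False
```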
Is there someone here that could help shed some light? As soon as some
other tests finish I'll be running the process again, but since it takes so
long perhaps a little insight would help.
Best regards
Miguel
On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez
wrote:
> Dear all,
>
> The Sanger pipeline completed, after about 2 weeks of computing, for donor
> DO50311
>
> The results are the following:
>
>
>
>
>
>
>
> Comparison for DO50311 using Sanger
> ---
> Common: 156299
> Extra: 1
> - Example: Y:58885197:G
> Missing: 14
> - Example: 1:102887902:T,1:143165228:G,16:87047601:C
>
>
> The donor results for DKFZ yielded
>
> Comparison for DO50311 using DKFZ
> ---
> Common: 51087
> Extra: 0
> Missing: 0
>
>
> In both cases I'm comparing against the VCF file downloaded from GNOS. I've
> updated the information here
>
> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>
>
> Best regards
>
> Miguel
>
>
From francis at oicr.on.ca Mon Dec 12 07:38:46 2016
From: francis at oicr.on.ca (Francis Ouellette)
Date: Mon, 12 Dec 2016 12:38:46 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
Message-ID: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
I know I'm not supposed to be there (and I'm not :-), but one slippery slope I want this Dockstore testing working group to be wary of (and Christina, this is really directed at you, chairing the discussion today) is that the request from Lincoln to reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain all of the discrepancies we see. I don't think we ever saw that kind of data from the people that ran the original workflow.
If this group can ascertain that a Dockstore container basically works, I think we need to call that test a success, and move on to the next one. What Miguel is suggesting/asking below is very good, but I could see this turning into a very slippery slope, which I would advise us against slipping down.
Anyway, going off to my day off,
Have a great discussion,
Francis
--
B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
On Dec 12, 2016, at 05:44, Miguel Vazquez wrote:
Dear all,
I was wondering if someone here is acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems like it uses EM, but I didn't find anything random in it, such as during initialization, which was my initial guess. The other thing I thought of is that when it splits the work for parallel processing it might choose a different number of splits to accommodate the number of CPUs, and that this might affect the calculations.
Is there someone here that could help shed some light? As soon as some other tests finish I'll be running the process again, but since it takes so long perhaps a little insight would help.
Best regards
Miguel
On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez wrote:
Dear all,
The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311
The results are the following:
Comparison for DO50311 using Sanger
---
Common: 156299
Extra: 1
- Example: Y:58885197:G
Missing: 14
- Example: 1:102887902:T,1:143165228:G,16:87047601:C
The donor results for DKFZ yielded
Comparison for DO50311 using DKFZ
---
Common: 51087
Extra: 0
Missing: 0
In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here
https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
Best regards
Miguel
From Brian.OConnor at oicr.on.ca Mon Dec 12 09:12:48 2016
From: Brian.OConnor at oicr.on.ca (Brian O'Connor)
Date: Mon, 12 Dec 2016 14:12:48 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: <65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
Message-ID:
Hi Francis,
I agree with you. I think Miguel is showing what this group needs to show: that someone else can run the tools from Dockstore, have that be successful, and that the results are largely in agreement with previous results (or duplicate runs). I think a statement about the possibility of stochastic results in the README for each tool would be sufficient. This could be something that Keiran can craft/comment on for Sanger's pipeline, since he's in the best position for this one.
Brian
> On Dec 12, 2016, at 7:38 AM, Francis Ouellette wrote:
>
> I know I'm not supposed to be there (and I'm not :-), but one slippery slope I want this Dockstore testing working group to be wary of (and Christina, this is really directed at you, chairing the discussion today) is that the request from Lincoln to reproduce what we are doing is fine, but I don't think it is this working group's task to reproduce and explain all of the discrepancies we see. I don't think we ever saw that kind of data from the people that ran the original workflow.
>
> If this group can ascertain that a Dockstore container basically works, I think we need to call that test a success, and move on to the next one. What Miguel is suggesting/asking below is very good, but I could see this turning into a very slippery slope, which I would advise us against slipping down.
>
> Anyway, going off to my day off,
>
> Have a great discussion,
>
> Francis
>
> --
> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>
> On Dec 12, 2016, at 05:44, Miguel Vazquez wrote:
>
>> Dear all,
>>
>> I was wondering if someone here is acquainted with the Sanger workflow and could help explain these discrepancies. I've skimmed through the code, and it seems like it uses EM, but I didn't find anything random in it, such as during initialization, which was my initial guess. The other thing I thought of is that when it splits the work for parallel processing it might choose a different number of splits to accommodate the number of CPUs, and that this might affect the calculations.
>>
>> Is there someone here that could help shed some light? As soon as some other tests finish I'll be running the process again, but since it takes so long perhaps a little insight would help.
>>
>> Best regards
>>
>> Miguel
>>
>>
>>
>> On Mon, Dec 5, 2016 at 1:41 PM, Miguel Vazquez wrote:
>> Dear all,
>>
>> The Sanger pipeline completed, after about 2 weeks of computing, for donor DO50311
>>
>> The results are the following:
>>
>> Comparison for DO50311 using Sanger
>> ---
>> Common: 156299
>> Extra: 1
>> - Example: Y:58885197:G
>> Missing: 14
>> - Example: 1:102887902:T,1:143165228:G,16:87047601:C
>>
>>
>> The donor results for DKFZ yielded
>>
>> Comparison for DO50311 using DKFZ
>> ---
>> Common: 51087
>> Extra: 0
>> Missing: 0
>>
>>
>> In both cases I'm comparing against the VCF file downloaded from GNOS. I've updated the information here
>>
>> https://wiki.oicr.on.ca/display/PANCANCER/Workflow+Testing+Data
>>
>>
>> Best regards
>>
>> Miguel
>>
>>
> _______________________________________________
> docktesters mailing list
> docktesters at lists.icgc.org
> https://lists.icgc.org/mailman/listinfo/docktesters
From kr2 at sanger.ac.uk Mon Dec 12 09:38:01 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Mon, 12 Dec 2016 14:38:01 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
Message-ID: <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
Hi,
I'd need access to the full set of result files from the run, but can you confirm the pindel and ASCAT VCFs are exactly the same? Both feed into the CaVEMan analysis.
ASCAT is the least stable of the algorithms, as it randomly assigns the B-allele; if this donor is known to have an unusual copy number/rearrangement state, that is likely to be the cause (I wouldn't consider a sample like this to be particularly good for testing, though).
What were the results on the other samples? I assume cleaner data has also been run.
Regards,
Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104
> On 12 Dec 2016, at 14:12, Brian O'Connor wrote:
>
> Hi Francis,
>
> I agree with you. I think Miguel is showing what this group needs to show: that someone else can run the tools from Dockstore, have that be successful, and get results largely in agreement with previous results (or duplicate runs). I think maybe a statement about the possibility of stochastic results in the README for each tool would be sufficient. This could be something that Keiran can craft/comment on for Sanger's pipeline, since he's in the best position for this one.
>
> Brian
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From kr2 at sanger.ac.uk Mon Dec 12 09:48:11 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Mon, 12 Dec 2016 14:48:11 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
Message-ID: <38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk>
Additionally, are these stats based on PASSED SUB variants? A couple of the missing items are clear SNPs which would be filtered.
Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104
From miguel.vazquez at cnio.es Mon Dec 12 09:51:36 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 15:51:36 +0100
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: <38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk>
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
<38E4FB68-1BDF-458D-83C0-30BE53647805@sanger.ac.uk>
Message-ID:
Hi Keiran,
No, I've not filtered for PASS, so these are all variants as far as I know.
Best
M
From miguel.vazquez at cnio.es Mon Dec 12 09:56:13 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 15:56:13 +0100
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: <64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
<64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
Message-ID:
Hi Keiran,
I don't know how to check the pindel and ASCAT VCFs. I have not saved the
docker image. If you give me detailed instructions I can save it on my next
run and get them for you.
As for the difficulties with this donor (just my luck to choose this one at
random), I'm running the pipeline on another donor; perhaps it will show no
discrepancies, or perhaps it's a better subject for our inquiries. We should
see soon, I hope; it's 4 days into the analysis.
Best
Miguel
From Christina.Yung at oicr.on.ca Mon Dec 12 10:24:30 2016
From: Christina.Yung at oicr.on.ca (Christina Yung)
Date: Mon, 12 Dec 2016 15:24:30 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
<64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
Message-ID: <0F84ED6166CE664E8563B61CB2ECB98CCBB47002@exmb2.ad.oicr.on.ca>
Thanks to Miguel for all the great work, and for giving an update on the tech call.
Consolidating some comments, I think we can conclude that a docker has passed testing when it produces:
1. The same outputs as the production runs (SNVs, indels, SVs; somatic + germline in some cases), or
2. Outputs with very small discrepancies from the production runs, plus an explanation from the workflow author of the discrepancies. Workflow authors need to point out steps that are stochastic, or changes in the docker that introduce any non-random differences.
Best,
Christina
From kr2 at sanger.ac.uk Mon Dec 12 10:38:42 2016
From: kr2 at sanger.ac.uk (Keiran Raine)
Date: Mon, 12 Dec 2016 15:38:42 +0000
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To:
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
<64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
Message-ID: <12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk>
Hi Miguel,
ASCAT is *.somatic.cnv.vcf.gz
Pindel is *.somatic.indel.vcf.gz
Are you not using vcftools to do comparisons on all generated VCF files?
All variants:
vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz --diff-site --out in1_v_in2
Passed variants:
vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz --diff-site --out in1_v_in2 --remove-filtered-all
(unfortunately a sort instability in Pindel may require the indel vcf to be resorted first on: chr, pos, ref, alt)
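That resort can be sketched roughly as below. This is a hypothetical illustration, not part of the workflow itself: the file names are made up, a tiny demo VCF stands in for a real `*.somatic.indel.vcf.gz`, and plain gzip output is assumed to be acceptable to `vcftools --gzvcf`. The important part is that both files being diffed are sorted the same way.

```shell
# Hypothetical sketch of the resort: order the VCF body on chr, pos, ref, alt
# while passing the header through untouched. File names are examples only.

# Tiny demo input (stand-in for a real *.somatic.indel.vcf.gz):
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t200\t.\tG\tA\n1\t100\t.\tC\tT\n1\t100\t.\tA\tG\n' \
  | gzip -c > demo.somatic.indel.vcf.gz

# Header lines pass through first; body lines sort on chr, pos (numeric), ref, alt.
(
  zcat demo.somatic.indel.vcf.gz | grep '^#'
  zcat demo.somatic.indel.vcf.gz | grep -v '^#' | sort -k1,1 -k2,2n -k4,4 -k5,5
) | gzip -c > demo.sorted.vcf.gz

zcat demo.sorted.vcf.gz
```

The sorted file could then be fed to the `vcftools --gzvcf ... --gzdiff ... --diff-site` commands above in place of the raw indel VCF.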
Regards,
Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr2 at sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104
From miguel.vazquez at cnio.es Mon Dec 12 11:16:20 2016
From: miguel.vazquez at cnio.es (Miguel Vazquez)
Date: Mon, 12 Dec 2016 17:16:20 +0100
Subject: [DOCKTESTERS] Help understand the small discrepancies in Sanger
pipeline with GNOS VCF (99.9905% accuracy)
In-Reply-To: <12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk>
References:
<65F9A62D-04DB-4BD1-A196-F628FF46FAA0@oicr.on.ca>
<64755405-B418-43F5-ADEC-5C4D682E8EC8@sanger.ac.uk>
<12197707-78EE-45CB-ACB1-0A1D2819470B@sanger.ac.uk>
Message-ID:
Thanks Keiran, that is what Christina asked us to do, so I'll check it next
On Dec 12, 2016 4:38 PM, "Keiran Raine" wrote:
> Hi Miguel,
>
> ASCAT is *.somatic.cnv.vcf.gz
>
> Pindel is *.somatic.indel.vcf.gz
>
> Are you not using vcftools to do comparisons on all generated VCF files?
>
> All variants:
> vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz
> --diff-site --out in1_v_in2
>
> Passed variants:
> vcftools --gzvcf input_file1.vcf.gz --gzdiff input_file2.vcf.gz
> --diff-site --out in1_v_in2 --remove-filtered-all
>
> (unfortunately a sort instability in Pindel may require the indel vcf to
> be resorted first on: chr, pos, ref, alt)
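The resort mentioned in the note above can be sketched with standard Unix tools. This is a minimal illustration under some assumptions: the VCFs are plain gzip-compressed, the file names are placeholders, and a tiny demo input is generated inline. Lexicographic chromosome order is fine for diffing as long as both files are sorted the same way.

```shell
# Demo input standing in for a Pindel *.somatic.indel.vcf.gz (placeholder data).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n2\t5\t.\tA\tT\n1\t300\t.\tC\tG\n1\t100\t.\tA\tC\n' \
  | gzip -c > demo.somatic.indel.vcf.gz

# Resort the body on chr, pos (numeric), ref, alt; keep the header unchanged.
( zcat demo.somatic.indel.vcf.gz | grep '^#'
  zcat demo.somatic.indel.vcf.gz | grep -v '^#' \
    | sort -t "$(printf '\t')" -k1,1 -k2,2n -k4,4 -k5,5
) | gzip -c > demo.somatic.indel.resorted.vcf.gz
```

The resorted file can then be fed to the vcftools --gzvcf/--gzdiff comparison above.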
>
> Regards,
>
>
> Keiran Raine
> Principal Bioinformatician
> Cancer Genome Project
> Wellcome Trust Sanger Institute
>
> kr2 at sanger.ac.uk
> Tel: +44 (0)1223 834244 Ext: 4983
> Office: H104
>
> On 12 Dec 2016, at 14:56, Miguel Vazquez wrote:
>
> Hi Keiran,
>
> I don't know how to check the Pindel and ASCAT VCFs. I have not saved the
> Docker image. If you give me detailed instructions I can save it on my next
> run and get them for you.
>
> As for the difficulties with this donor (just my luck to choose this one at
> random), I'm running the pipeline on another donor; perhaps it will show no
> discrepancies, or perhaps it's a better subject for our inquiries. We should
> see soon, I hope; it's 4 days into the analysis.
>
> Best
>
> Miguel
>
> On Mon, Dec 12, 2016 at 3:38 PM, Keiran Raine wrote:
>
>> Hi,
>>
>> I'd need access to the full set of result files from the run, but can you
>> confirm the Pindel and ASCAT VCFs are exactly the same? Both feed into the
>> CaVEMan analysis.
>>
>> ASCAT is the least stable of the algorithms, as it randomly assigns the
>> B-allele, and if this donor is known to have an unusual
>> copy-number/rearrangement state, that is likely to be the cause (I wouldn't
>> consider a sample like this to be particularly good for testing, though).
>>
>> What were the results on the other samples? I assume cleaner data has
>> also been run?
>>
>> Regards,
>>
>> Keiran Raine
>> Principal Bioinformatician
>> Cancer Genome Project
>> Wellcome Trust Sanger Institute
>>
>> kr2 at sanger.ac.uk
>> Tel: +44 (0)1223 834244 Ext: 4983
>> Office: H104
>>
>> On 12 Dec 2016, at 14:12, Brian O'Connor
>> wrote:
>>
>> Hi Francis,
>>
>> I agree with you. I think Miguel is showing what this group needs to
>> show: that someone else can run the tools from Dockstore, that the run is
>> successful, and that the results are largely in agreement with previous
>> results (or duplicate runs). I think a statement about the possibility of
>> stochastic results in the README for each tool would be sufficient. This
>> could be something that Keiran can craft/comment on for Sanger's pipeline,
>> since he's in the best position for this one.
>>
>> Brian
>>
>> On Dec 12, 2016, at 7:38 AM, Francis Ouellette
>> wrote:
>>
>> I know I'm not supposed to be here (and I'm not :-), but one slippery
>> slope I want this Dockstore testing working group to be wary of (and
>> Christina, this is really directed at you, chairing the discussion today)
>> is this: the request from Lincoln for this group to reproduce what we are
>> doing is fine, but I don't think it is this working group's task to
>> reproduce and explain all of the discrepancies we see. I don't think we
>> ever saw that kind of data from the people that ran the original workflow.
>>
>> If this group can ascertain that a Dockstore container basically works,
>> I think we need to call that test a success and move on to the next one.
>> What Miguel is suggesting/asking below is very good, but I could see this
>> becoming a very slippery slope, which I would advise us against
>> slipping down.
>>
>> Anyway, going off to my day off,
>>
>> Have a great discussion,
>>
>> Francis
>>
>> --
>> B.F. Francis Ouellette http://oicr.on.ca/person/francis-ouellette
>>
>> On Dec 12, 2016, at 05:44, Miguel Vazquez