Section 2 Compare bcbio/umccrise runs
2.1 Running on Raijin/Gadi
Step 1: Grab an interactive node
- Most memory is for bcftools isec
qsub -I -q normalbw -l ncpus=28,walltime=02:00:00,mem=80gb
Step 2: Create woof run environment
source /g/data3/gx8/extras/woof/load_woof.sh
Step 3: Run woof compare
- Supports only SNVs/INDELs for now.
# compare bcbio runs
woof compare --sample sample_label path/to/run1/final path/to/run2/final
# compare umccrise runs
woof compare --sample sample_label path/to/run1/umccrised/sample path/to/run2/umccrised/sample
- The above will create the following directory structure (example):
|-- final
|   |-- 2016.249.17.MH.P033
|   |   |-- bcftools_isec
|   |   |-- vcf_counts
|   |   |-- vcf_eval
|   |   `-- vcf_pass
|   `-- CUP-Pairs8
|       |-- bcftools_isec
|       |-- vcf_counts
|       |-- vcf_eval
|       `-- vcf_pass
`-- work
    |-- 2016.249.17.MH.P033
    |   |-- cromwell-executions
    |   |-- cromwell-workflow-logs
    |   |-- cromwell_config.conf
    |   |-- cromwell_inputs.json
    |   |-- cromwell_log.log
    |   |-- cromwell_meta.json
    |   |-- cromwell_opts.json
    |   |-- cromwell_samples.tsv
    |   |-- persist
    |   `-- wdl
    `-- CUP-Pairs8
        |-- cromwell-executions
        |-- cromwell-workflow-logs
        |-- cromwell_config.conf
        |-- cromwell_inputs.json
        |-- cromwell_log.log
        |-- cromwell_meta.json
        |-- cromwell_opts.json
        |-- cromwell_samples.tsv
        |-- persist
        `-- wdl
The final evaluation results are in final/<sample>/vcf_eval/<vcf_type>/<ALL-or-PASS>/eval_stats.tsv
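To collect all result files of a run in one go, something like the following could work. It is a minimal sketch that relies only on the path layout given above; the function name list_eval_stats and its output-directory argument are illustrative assumptions, not part of woof.

```shell
# List every eval_stats.tsv below <outdir>/final, following the layout
# final/<sample>/vcf_eval/<vcf_type>/<ALL-or-PASS>/eval_stats.tsv.
# list_eval_stats is a hypothetical helper, not a woof command.
list_eval_stats() {
    outdir="${1:-.}"   # directory that contains final/ (defaults to cwd)
    find "$outdir/final" -path '*/vcf_eval/*' -name 'eval_stats.tsv' | sort
}
```

Run it from (or point it at) the directory that contains final/, e.g. list_eval_stats path/to/output, and pipe the paths into whatever you use to inspect TSV files.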
Step 4: Run woof report
2.2 Multi-Sample Mode
If you want to run woof compare on multiple samples (say, A & B), you can hack it in the following (relatively) simple way:
woof compare --justprep path/to/run1/A/final path/to/run2/A/final -s SAMPLE_A -o woof
woof compare --justprep path/to/run1/B/final path/to/run2/B/final -s SAMPLE_B -o woof
Each of the above runs prints out a cromwell command, and ‘just prepares’ a directory structure like the following:
woof
├── final/ # empty
└── work
    ├── SAMPLE_A
    │   ├── cromwell_config.conf
    │   ├── cromwell_inputs.json
    │   ├── cromwell_opts.json
    │   ├── cromwell_samples.tsv
    │   └── wdl
    │       ├── compare.wdl
    │       └── tasks/[...]
    └── SAMPLE_B
        ├── cromwell_config.conf
        ├── cromwell_inputs.json
        ├── cromwell_opts.json
        ├── cromwell_samples.tsv
        └── wdl
            ├── compare.wdl
            └── tasks/[...]
The cromwell_samples.tsv file contains rows with the sample name (e.g. SAMPLE_A), VCF name (e.g. ensemble),
and paths to VCF1 and VCF2. To process several samples in one go, concatenate the cromwell_samples.tsv files of all the samples you want into a single file, then run the Cromwell command from the work directory that holds it:
cd woof/work/SAMPLE_A
cat ../SAMPLE_B/cromwell_samples.tsv >> cromwell_samples.tsv
cromwell -Xms1g -Xmx3g run -Dconfig.file=cromwell_config.conf \
-DLOG_LEVEL=ERROR -DLOG_LEVEL=WARN \
--metadata-output cromwell_meta.json \
--options cromwell_opts.json \
--inputs cromwell_inputs.json \
wdl/compare.wdl 2>&1 | tee -a cromwell_log.log
This populates the final directory shown in the file tree above.
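With more than two samples, the concatenation step can be wrapped in a small loop. The sketch below mirrors the plain cat ... >> approach used above, so it assumes the sample sheets have no header row; merge_sample_sheets is a hypothetical helper name, not something woof provides.

```shell
# Append the sample sheets of the remaining samples to the first sample's
# cromwell_samples.tsv, mirroring the `cat ... >>` step above.
# merge_sample_sheets is a hypothetical helper, not a woof command.
# Assumption: the sheets have no header row (plain concatenation).
merge_sample_sheets() {
    workdir="$1"   # e.g. woof/work
    first="$2"     # sample whose work directory Cromwell will be run from
    shift 2
    for sample in "$@"; do
        cat "$workdir/$sample/cromwell_samples.tsv" \
            >> "$workdir/$first/cromwell_samples.tsv"
    done
}
```

For example, merge_sample_sheets woof/work SAMPLE_A SAMPLE_B would reproduce the manual step, after which the Cromwell command is run from woof/work/SAMPLE_A. Because the helper appends, calling it twice duplicates rows, so run it once per prepared directory.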