Section 2 Compare bcbio/umccrise runs
2.1 Running on Raijin/Gadi
Step 1: Grab an interactive node
- Most of the memory is needed for bcftools isec.
qsub -I -q normalbw -l ncpus=28,walltime=02:00:00,mem=80gb
Step 2: Create woof run environment
source /g/data3/gx8/extras/woof/load_woof.sh
Step 3: Run woof compare
- Supports only SNVs/INDELs for now.
# compare bcbio runs
woof compare --sample sample_label path/to/run1/final path/to/run2/final
# compare umccrise runs
woof compare --sample sample_label path/to/run1/umccrised/sample path/to/run2/umccrised/sample
- The above will create the following directory structure (example):
|-- final
|   |-- 2016.249.17.MH.P033
|   |   |-- bcftools_isec
|   |   |-- vcf_counts
|   |   |-- vcf_eval
|   |   `-- vcf_pass
|   `-- CUP-Pairs8
|       |-- bcftools_isec
|       |-- vcf_counts
|       |-- vcf_eval
|       `-- vcf_pass
`-- work
    |-- 2016.249.17.MH.P033
    |   |-- cromwell-executions
    |   |-- cromwell-workflow-logs
    |   |-- cromwell_config.conf
    |   |-- cromwell_inputs.json
    |   |-- cromwell_log.log
    |   |-- cromwell_meta.json
    |   |-- cromwell_opts.json
    |   |-- cromwell_samples.tsv
    |   |-- persist
    |   `-- wdl
    `-- CUP-Pairs8
        |-- cromwell-executions
        |-- cromwell-workflow-logs
        |-- cromwell_config.conf
        |-- cromwell_inputs.json
        |-- cromwell_log.log
        |-- cromwell_meta.json
        |-- cromwell_opts.json
        |-- cromwell_samples.tsv
        |-- persist
        `-- wdl
The final evaluation results are in final/<sample>/vcf_eval/<vcf_type>/<ALL-or-PASS>/eval_stats.tsv
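To locate every evaluation table without walking the tree by hand, something like the following may help. It is only a sketch: list_eval_stats is a hypothetical helper (not part of woof), and it assumes its argument is the directory that contains final/ and work/.

```shell
# Hypothetical helper (not part of woof): list all eval_stats.tsv files
# under <outdir>/final/<sample>/vcf_eval/...
# Assumes <outdir> is the directory holding final/ and work/.
list_eval_stats() {
  local outdir="${1:-.}"
  find "${outdir}/final" -type f -path '*/vcf_eval/*' -name 'eval_stats.tsv' | sort
}
```

For example, list_eval_stats . prints one path per line when run from the directory shown in the tree above.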
Step 4: Run woof report
2.2 Multi-Sample Mode
If you want to run woof compare on multiple samples (say, A & B), you can hack it in the following (relatively) simple way:
woof compare --justprep path/to/run1/A/final path/to/run2/A/final -s SAMPLE_A -o woof
woof compare --justprep path/to/run1/B/final path/to/run2/B/final -s SAMPLE_B -o woof
Each of the above runs prints out a Cromwell command and ‘just prepares’ a directory structure like the following:
woof
├── final/   # empty
└── work
    ├── SAMPLE_A
    │   ├── cromwell_config.conf
    │   ├── cromwell_inputs.json
    │   ├── cromwell_opts.json
    │   ├── cromwell_samples.tsv
    │   └── wdl
    │       ├── compare.wdl
    │       └── tasks/[...]
    └── SAMPLE_B
        ├── cromwell_config.conf
        ├── cromwell_inputs.json
        ├── cromwell_opts.json
        ├── cromwell_samples.tsv
        └── wdl
            ├── compare.wdl
            └── tasks/[...]
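With more than a couple of samples, the --justprep calls can be generated in a loop. The sketch below is an assumption-laden illustration: prep_samples and the WOOF_CMD variable are hypothetical names, and it presumes each run directory is laid out as <run>/<sample>/final and that labels follow the SAMPLE_<name> pattern used in the example. It uses only the woof flags already shown above.

```shell
# Hypothetical helper (not part of woof): prepare one comparison per sample.
# Assumes runs are laid out as <run>/<sample>/final, as in the example above.
prep_samples() {
  local run1="$1" run2="$2" outdir="$3"
  shift 3
  local s
  for s in "$@"; do
    # WOOF_CMD is an illustrative override; set WOOF_CMD="echo woof" for a dry run.
    ${WOOF_CMD:-woof} compare --justprep \
      "${run1}/${s}/final" "${run2}/${s}/final" \
      -s "SAMPLE_${s}" -o "${outdir}"
  done
}
```

Calling prep_samples path/to/run1 path/to/run2 woof A B is then equivalent to the two commands above.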
The cromwell_samples.tsv file contains rows with the sample name (e.g. SAMPLE_A), VCF name (e.g. ensemble), and paths to VCF1 and VCF2. Simply concatenate those files for each sample you want into one, then run the Cromwell command:
cd woof/work/SAMPLE_A
cat ../SAMPLE_B/cromwell_samples.tsv >> cromwell_samples.tsv
cromwell -Xms1g -Xmx3g run -Dconfig.file=cromwell_config.conf \
-DLOG_LEVEL=ERROR -DLOG_LEVEL=WARN \
--metadata-output cromwell_meta.json \
--options cromwell_opts.json \
--inputs cromwell_inputs.json \
wdl/compare.wdl 2>&1 | tee -a cromwell_log.log
This populates the final directory shown in the file tree above.
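For more than two samples, the concatenation step can be wrapped in a small function. This is a minimal sketch, not part of woof: merge_samples_tsv is a hypothetical name, and it assumes the cromwell_samples.tsv files have no header row (the text above does not say either way).

```shell
# Hypothetical helper (not part of woof): append the cromwell_samples.tsv of
# each additional sample to that of the target sample.
# Assumes the TSV files have no header row.
merge_samples_tsv() {
  local workdir="$1" target="$2"
  shift 2
  local s
  for s in "$@"; do
    cat "${workdir}/${s}/cromwell_samples.tsv" \
      >> "${workdir}/${target}/cromwell_samples.tsv"
  done
}
```

For example, merge_samples_tsv woof/work SAMPLE_A SAMPLE_B SAMPLE_C collects all rows into SAMPLE_A, from whose directory the Cromwell command is then run. Because the function appends, running it twice duplicates rows.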