Section 2 Compare bcbio/umccrise runs

2.1 Running on Raijin/Gadi

Step 1: Grab an interactive node

  • Most of the requested memory is needed by bcftools isec.
qsub -I -q normalbw -l ncpus=28,walltime=02:00:00,mem=80gb

Step 2: Create woof run environment

source /g/data3/gx8/extras/woof/

Step 3: Run woof compare

  • Supports only SNVs/INDELs for now.
  • Running woof compare creates the following directory structure (example):
|-- final
|   |-- 2016.249.17.MH.P033
|   |   |-- bcftools_isec
|   |   |-- vcf_counts
|   |   |-- vcf_eval
|   |   `-- vcf_pass
|   `-- CUP-Pairs8
|       |-- bcftools_isec
|       |-- vcf_counts
|       |-- vcf_eval
|       `-- vcf_pass
`-- work
    |-- 2016.249.17.MH.P033
    |   |-- cromwell-executions
    |   |-- cromwell-workflow-logs
    |   |-- cromwell_config.conf
    |   |-- cromwell_inputs.json
    |   |-- cromwell_log.log
    |   |-- cromwell_meta.json
    |   |-- cromwell_opts.json
    |   |-- cromwell_samples.tsv
    |   |-- persist
    |   `-- wdl
    `-- CUP-Pairs8
        |-- cromwell-executions
        |-- cromwell-workflow-logs
        |-- cromwell_config.conf
        |-- cromwell_inputs.json
        |-- cromwell_log.log
        |-- cromwell_meta.json
        |-- cromwell_opts.json
        |-- cromwell_samples.tsv
        |-- persist
        `-- wdl

The final evaluation results are in final/<sample>/vcf_eval/<vcf_type>/<ALL-or-PASS>/eval_stats.tsv
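
With several samples and VCF types, the result files end up spread over many subdirectories. A minimal sketch for listing them all, assuming the output layout shown above; list_eval_stats is a hypothetical helper name, not part of woof:

```shell
# Hypothetical helper (not part of woof): list every eval_stats.tsv
# under <outdir>/final, assuming the layout
# final/<sample>/vcf_eval/<vcf_type>/<ALL-or-PASS>/eval_stats.tsv
list_eval_stats() {
  outdir="${1:-.}"   # directory that contains final/ and work/
  find "$outdir/final" -type f -path '*/vcf_eval/*' -name 'eval_stats.tsv' | sort
}
```

Call it with the directory that holds final/ and work/, e.g. list_eval_stats path/to/outdir.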

2.2 Multi-Sample Mode

If you want to run woof compare on multiple samples (say, A and B), you can work around it in the following (relatively) simple way:

woof compare --justprep path/to/run1/A/final path/to/run2/A/final -s SAMPLE_A -o woof
woof compare --justprep path/to/run1/B/final path/to/run2/B/final -s SAMPLE_B -o woof

Each of the above runs prints out a cromwell command, and ‘just prepares’ a directory structure like the following:

├── final/ # empty
└── work
    ├── SAMPLE_A
    │   ├── cromwell_config.conf
    │   ├── cromwell_inputs.json
    │   ├── cromwell_opts.json
    │   ├── cromwell_samples.tsv
    │   └── wdl
    │       ├── compare.wdl
    │       └── tasks/[...]
    └── SAMPLE_B
        ├── cromwell_config.conf
        ├── cromwell_inputs.json
        ├── cromwell_opts.json
        ├── cromwell_samples.tsv
        └── wdl
            ├── compare.wdl
            └── tasks/[...]
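
The --justprep calls above follow one pattern, so for more samples the commands can be generated rather than typed. A minimal sketch, assuming every sample sits under <run>/<sample>/final as in the example; print_justprep_cmds is a hypothetical helper, not part of woof, and it only prints the commands so they can be checked first:

```shell
# Hypothetical helper (not part of woof): print one
# `woof compare --justprep` command per sample name.
# Assumes the <run>/<sample>/final layout from the example; the
# SAMPLE_ prefix and the output directory "woof" mirror that example.
print_justprep_cmds() {
  run1="$1"
  run2="$2"
  shift 2
  for s in "$@"; do
    printf 'woof compare --justprep %s/%s/final %s/%s/final -s SAMPLE_%s -o woof\n' \
      "$run1" "$s" "$run2" "$s" "$s"
  done
}
```

For example, print_justprep_cmds path/to/run1 path/to/run2 A B C prints three commands; pipe the output to sh once it looks right.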

The cromwell_samples.tsv file contains rows with the sample name (e.g. SAMPLE_A), VCF name (e.g. ensemble), and the paths to VCF1 and VCF2. Concatenate these files for all the samples you want into a single file, then run the Cromwell command from the directory that holds the combined file:

cd woof/work/SAMPLE_A
cat ../SAMPLE_B/cromwell_samples.tsv >> cromwell_samples.tsv

cromwell -Xms1g -Xmx3g run -Dconfig.file=cromwell_config.conf \
  --metadata-output cromwell_meta.json \
  --options cromwell_opts.json \
  --inputs cromwell_inputs.json \
  wdl/compare.wdl 2>&1 | tee -a cromwell_log.log

This populates the (previously empty) final directory shown in the file tree above.
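
For more than two samples, the concatenation step can be written as a loop. A minimal sketch, assuming the cromwell_samples.tsv files have no header row (worth checking before merging); merge_samples_tsv is a hypothetical helper, not part of woof:

```shell
# Hypothetical helper (not part of woof): append the cromwell_samples.tsv
# of every other sample under <workdir> to the one of <main_sample>.
# Assumes the files have no header row. Running it twice appends the
# rows twice, so run it once on freshly prepared directories.
merge_samples_tsv() {
  workdir="$1"   # the work/ directory created by --justprep
  main="$2"      # sample whose directory Cromwell will be run from
  target="$workdir/$main/cromwell_samples.tsv"
  for tsv in "$workdir"/*/cromwell_samples.tsv; do
    [ -f "$tsv" ] || continue              # glob matched nothing
    [ "$tsv" = "$target" ] && continue     # skip the target itself
    cat "$tsv" >> "$target"
  done
}
```

For example, merge_samples_tsv path/to/work SAMPLE_A collects the rows of all other samples into SAMPLE_A's file; the Cromwell command is then run from SAMPLE_A's directory as shown above.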

2.3 Diagram