BWA

Introduction

BWA (Burrows-Wheeler Aligner) is a widely used aligner for mapping short reads to a reference genome. It works for both eukaryotes and prokaryotes and remains one of the standard tools in genomics workflows.

Requirements

BWA is available as a module on NCSA servers.

Load the module:

module load bwa

Notes on Resource Allocation

Resource needs depend on the genomics project, and a few performance differences are worth pointing out:

Eukaryotes:

Eukaryote genomes are usually larger, so alignment can take longer and need more memory.
BWA is not splice-aware, so it is not suited to RNA-seq reads that span introns.
Indexing (with bwa index) can take a while for large genomes.

Prokaryotes:

Prokaryote genomes are comparatively small and usually do not require much memory.
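
To gauge which case applies before requesting resources, it can help to check how large the reference is and whether it has already been indexed. The sketch below is one way to do that, assuming an uncompressed FASTA file; the helper names ref_size and index_exists are hypothetical, and the only BWA-specific knowledge used is that bwa index writes files with the extensions .amb, .ann, .bwt, .pac and .sa next to the reference.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of BWA).

# Print the number of sequence characters in an uncompressed FASTA file,
# ignoring header lines and line breaks.
ref_size() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Succeed only if all five files written by `bwa index` exist and are non-empty.
index_exists() {
    for ext in amb ann bwt pac sa; do
        [ -s "$1.$ext" ] || return 1
    done
    return 0
}

# Example use; the default path is a placeholder.
REF="${REF:-ref/ref.fasta}"
if [ -f "$REF" ]; then
    echo "Reference size: $(ref_size "$REF") bases"
    if index_exists "$REF"; then
        echo "Index found; bwa index can be skipped"
    else
        echo "No complete index found; run: bwa index $REF"
    fi
fi
```

A reference of a few million bases (typical for prokaryotes) should index quickly, while a reference of several billion bases is a sign to request more time and memory.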

Usage

Below is a basic code snippet that can be used in a SLURM batch script to run a BWA alignment (here, bwa mem) on paired-end reads. This is a typical use case for a project that aims to discover genomic variants.
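
Because the snippet is meant for a SLURM batch script, the file would normally begin with #SBATCH directives. The header below is a minimal sketch: the job name, memory and time values are placeholders (assumptions, not site recommendations), and partition or account options are left out because they are site-specific. The THREADS variable is a hypothetical convenience that reads SLURM_CPUS_PER_TASK, which SLURM sets inside the job when --cpus-per-task is requested.

```shell
#!/bin/bash
#SBATCH --job-name=bwa_mem        # placeholder name
#SBATCH --cpus-per-task=8         # should match the -t value given to bwa mem
#SBATCH --mem=16G                 # placeholder; adjust to the genome size
#SBATCH --time=04:00:00           # placeholder wall time

# Load BWA where the `module` command is available (e.g., on the cluster).
if command -v module >/dev/null 2>&1; then
    module load bwa
fi

# Use the CPU count granted by SLURM; fall back to 1 outside a SLURM job.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "bwa mem will be run with $THREADS thread(s)"
```

The alignment commands then follow this header in the same file, and -t "$THREADS" can replace a hard-coded thread count so the two values cannot drift apart.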

# Path to Working Directory

myWorkDir="/path/to/my/working/directory"
cd "$myWorkDir" || exit 1

# Declare the reference genome for convenience

REF="$myWorkDir/ref/ref.fasta"

# Index the reference genome

## Notes:
## Usually we perform the reference indexing just once per project
## Storing the reference path in a variable (i.e., $REF) is only a convenience; the path could also be written directly in the commands.

bwa index "$REF"

# Declare the FASTQ files

## Notes:
## Multiple samples could be processed in a loop instead of editing these paths by hand
## However, the requested resources should match the number of alignments running at the same time

R1="$myWorkDir/samples/sample1_R1.fastq.gz"
R2="$myWorkDir/samples/sample1_R2.fastq.gz"

## Name of the alignment output for the sample (SAM file)

OUT="$myWorkDir/sample1.sam"

## -- Run BWA MEM to align reads
## -- Parallelize the job using the flag -t, matching the --cpus-per-task value.

bwa mem -t 8 "$REF" "$R1" "$R2" > "$OUT"
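
The notes above mention supplying the FASTQ files in a loop. A minimal sketch of that idea follows, assuming every sample is named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz as in the example and that the reference is already indexed. The function align_sample and the DRY_RUN switch are hypothetical; by default the sketch only prints the commands it would run, so it can be checked before any alignment starts.

```shell
#!/bin/bash
# Paths follow the example above; the working directory is a placeholder.
myWorkDir="${myWorkDir:-/path/to/my/working/directory}"
REF="$myWorkDir/ref/ref.fasta"
THREADS="${SLURM_CPUS_PER_TASK:-8}"

# Align one sample given its R1 file. The R2 file and the output name are
# derived from the R1 name. With DRY_RUN=1 (the default) the command is
# printed instead of executed.
align_sample() {
    r1_file="$1"
    r2_file="${r1_file%_R1.fastq.gz}_R2.fastq.gz"
    sample="$(basename "$r1_file" _R1.fastq.gz)"
    out="$myWorkDir/$sample.sam"
    if [ ! -f "$r2_file" ]; then
        echo "Skipping $sample: missing $r2_file" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "bwa mem -t $THREADS $REF $r1_file $r2_file > $out"
    else
        bwa mem -t "$THREADS" "$REF" "$r1_file" "$r2_file" > "$out"
    fi
}

# Process every R1 file found in the samples directory, one after another.
for f in "$myWorkDir"/samples/*_R1.fastq.gz; do
    [ -e "$f" ] || continue          # the glob matched nothing
    align_sample "$f" || continue    # skip samples without a mate file
done
```

Setting DRY_RUN=0 runs the alignments for real. Because this loop is sequential, each alignment reuses the same CPU allocation; running samples concurrently would require requesting correspondingly more resources.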

References

Li H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997.