samtools

Introduction

samtools is a widely used suite of programs for working with high-throughput sequencing data, typically stored in the SAM and BAM formats. It provides the core utilities needed in most NGS workflows, including sorting, indexing, viewing, and basic analysis of alignment files.

Requirements

samtools is available as a module on Delta and the Illinois Campus Cluster (ICC). All required dependencies are loaded automatically along with the module:

module load samtools

Usage

Below are examples of commonly used samtools functions, followed by a template SLURM script demonstrating typical batch usage.

Viewing alignments

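Convert a SAM file to BAM, or keep only the mapped reads from an existing BAM file:

# Convert SAM to BAM (-b writes BAM; -S is accepted for compatibility, since
# recent samtools versions detect the input format automatically)
samtools view -bS input.sam > output.bam

# Keep only mapped reads (-F 4 excludes records whose unmapped flag is set)
samtools view -b -F 4 input.bam > mapped_reads.bam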
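
The -F 4 filter acts on the FLAG field of each alignment record: bit 0x4 marks a read as unmapped, and -F drops any record that has one of the listed bits set. The bit test itself can be sketched in plain bash as follows. The helper name is_unmapped is hypothetical and is not part of samtools; the two FLAG values are chosen only for illustration.

```shell
#!/bin/bash
# is_unmapped: hypothetical helper, not a samtools command.
# Succeeds (exit status 0) when FLAG bit 0x4 (read unmapped) is set.
is_unmapped() {
    local flag=$1
    (( (flag & 4) != 0 ))
}

# FLAG 77 = 1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair.
# FLAG 99 = 1 + 2 + 32 + 64: paired, properly paired, mate reversed, first in pair.
for flag in 77 99; do
    if is_unmapped "$flag"; then
        echo "$flag: unmapped, removed by -F 4"
    else
        echo "$flag: mapped, kept by -F 4"
    fi
done
```

The complementary option, -f 4, would keep only the records for which this test succeeds.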

Sorting BAM files

Coordinate sorting, the default for samtools sort, is required for indexing and for many downstream analyses:

samtools sort -o sample.sorted.bam input.bam
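
To confirm that a file is coordinate-sorted, one option is to read the SO tag on the @HD line of its header. The sketch below assumes the header text arrives on standard input; the helper name sort_order is hypothetical, and the header line in the demonstration is hand-written rather than taken from a real file.

```shell
#!/bin/bash
# sort_order: hypothetical helper. Reads SAM header text on stdin and prints
# the value of the SO tag from the @HD line, or "unknown" if no SO tag is found.
sort_order() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Intended use on the cluster (requires samtools):
#   samtools view -H sample.sorted.bam | sort_order

# Self-contained demonstration with a hand-written header line:
printf '@HD\tVN:1.6\tSO:coordinate\n' | sort_order
```

A result of coordinate indicates that the file can be indexed; queryname or unsorted means it should be passed through samtools sort first.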

Indexing BAM files

Indexing a coordinate-sorted BAM file enables fast random access to genomic regions, which many analysis tools require:

samtools index sample.sorted.bam
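
By default the index is written next to the BAM file with a .bai suffix (sample.sorted.bam.bai here). When a pipeline is re-run, it can be useful to rebuild the index only if it is missing or older than the BAM file. A minimal sketch of that check, assuming the default .bai naming; the helper name needs_index is hypothetical:

```shell
#!/bin/bash
# needs_index: hypothetical helper, not a samtools command.
# Succeeds when <bam>.bai does not exist or the BAM is newer than its index.
needs_index() {
    local bam=$1
    local bai="${bam}.bai"
    [[ ! -e "$bai" || "$bam" -nt "$bai" ]]
}

# Intended use on the cluster (requires samtools):
#   if needs_index sample.sorted.bam; then samtools index sample.sorted.bam; fi

# Self-contained demonstration with an empty placeholder file:
tmpdir=$(mktemp -d)
touch "$tmpdir/demo.bam"
if needs_index "$tmpdir/demo.bam"; then
    echo "index missing or stale"
fi
rm -rf "$tmpdir"
```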

Extracting statistics

Generate a summary of alignment statistics for quality assessment:

samtools flagstat sample.sorted.bam > sample.flagstat.txt
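
The saved report is plain text, so individual numbers can be pulled out for a quick quality check. The sketch below computes the mapped percentage from the "in total" and "mapped" lines. The helper name mapped_fraction is hypothetical, and the two report lines in the demonstration are made-up values that only mimic the flagstat layout.

```shell
#!/bin/bash
# mapped_fraction: hypothetical helper. Reads flagstat text on stdin and prints
# QC-passed mapped reads as a percentage of QC-passed total reads.
mapped_fraction() {
    awk '
        / in total /                   { total = $1 }
        /^[0-9]+ \+ [0-9]+ mapped \(/  { mapped = $1 }
        END {
            if (total > 0) printf "%.2f\n", 100 * mapped / total
            else print "NA"
        }
    '
}

# Intended use:
#   mapped_fraction < sample.flagstat.txt

# Self-contained demonstration with made-up counts:
mapped_fraction <<'EOF'
200 + 0 in total (QC-passed reads + QC-failed reads)
180 + 0 mapped (90.00% : N/A)
EOF
```

The pattern for the mapped line is anchored to the leading counts so that other lines containing the word "mapped", such as "primary mapped" in newer samtools versions, are not picked up by mistake.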

SLURM batch script example

The following snippet can serve as the body of a SLURM batch script for basic alignment file processing with samtools:

# Path to the working directory

myWorkDir="/path/to/my/working/directory"
cd "$myWorkDir" || exit 1

## -- Input SAM files are expected in $myWorkDir/alignments/ (for example: $myWorkDir/alignments/*.sam)
## -- Processed files are written to $myWorkDir/processed/

mkdir -p "$myWorkDir/processed"

for samfile in "$myWorkDir"/alignments/*.sam; do

    # Skip the literal pattern if no SAM files are present
    [ -e "$samfile" ] || continue

    samplename=$(basename "$samfile" .sam)

    # Convert SAM to BAM

    samtools view -bS "$samfile" > "$myWorkDir/processed/${samplename}.bam"

    # Sort the BAM by coordinate

    samtools sort -o "$myWorkDir/processed/${samplename}.sorted.bam" "$myWorkDir/processed/${samplename}.bam"

    # Index the sorted BAM

    samtools index "$myWorkDir/processed/${samplename}.sorted.bam"

    # Generate alignment summary statistics

    samtools flagstat "$myWorkDir/processed/${samplename}.sorted.bam" > "$myWorkDir/processed/${samplename}.flagstat.txt"

done
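
To submit the loop above with sbatch, the script also needs a shebang line and a block of #SBATCH directives at the top. The header below is a minimal sketch: the job name, CPU count, wall time, and log file name are placeholder values, and site-specific options such as --account and --partition are deliberately left out because they differ between clusters.

```shell
#!/bin/bash
# All values below are placeholders; adjust them to your allocation.
#SBATCH --job-name=samtools_processing
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00
# %j in the log file name expands to the job ID.
#SBATCH --output=samtools_%j.log

# Load samtools when the module system is available, as on the clusters above.
if command -v module >/dev/null 2>&1; then
    module load samtools
fi

# Use the CPUs granted by SLURM for samtools threading; fall back to 1 elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "threads available to samtools: ${threads}"
```

With this header in place, the thread count can be passed to the sorting step, for example samtools sort -@ "$threads" -o out.sorted.bam in.bam, so that the job uses the CPUs it requested.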

References

  1. Li H, et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16): 2078–2079. Available at: http://www.htslib.org/