samtools
Introduction
samtools is a widely used suite of programs for working with high-throughput sequencing data, typically stored in the SAM and BAM formats. It provides the core utilities needed for most NGS processing, including viewing, sorting, indexing, and basic analysis of alignment files.
Requirements
samtools is available as a module on Delta and the Illinois Campus Cluster (ICC). All required dependencies are loaded automatically when the module is loaded:
module load samtools
Usage
Below are examples of commonly used samtools functions, followed by a template SLURM script demonstrating typical batch usage.
Viewing alignments
Convert a SAM file to BAM, or keep only the mapped reads from a BAM file:
# Convert SAM to BAM
samtools view -bS input.sam > output.bam
# Extract mapped reads (-F 4 excludes reads whose unmapped flag, 0x4, is set)
samtools view -b -F 4 input.bam > mapped_reads.bam
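The -F 4 filter above works on the FLAG field (column 2) of each alignment record: a read is excluded when the bitwise AND of its FLAG and the filter value is non-zero. The following is a minimal sketch of that bit test in plain shell arithmetic; flag_status is a hypothetical helper written for illustration and is not part of samtools.

```shell
# Sketch of the bit test behind "samtools view -F 4".
# flag_status is a hypothetical helper, not a samtools command.
flag_status() {
    # $1: integer FLAG value taken from column 2 of a SAM record
    if [ $(( $1 & 4 )) -ne 0 ]; then
        echo "unmapped"   # bit 0x4 is set, so -F 4 would drop this read
    else
        echo "mapped"     # bit 0x4 is clear, so -F 4 would keep this read
    fi
}

flag_status 0     # no bits set
flag_status 4     # only the unmapped bit set
flag_status 77    # 64 + 8 + 4 + 1: the unmapped bit is among those set
flag_status 99    # 64 + 32 + 2 + 1: the unmapped bit is not set
```

The same logic applies to any other value passed to -F: every read with at least one of the listed bits set is excluded.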
Sorting BAM files
Sorting BAM files by coordinate (the default for samtools sort) is required for indexing and for many downstream analyses:
samtools sort -o sample.sorted.bam input.bam
Indexing BAM files
Indexing a coordinate-sorted BAM file enables fast random access to specific regions, which many analysis tools require:
samtools index sample.sorted.bam
Extracting statistics
Generate a summary of alignment statistics for quality assessment:
samtools flagstat sample.sorted.bam > sample.flagstat.txt
SLURM batch script example
The following code snippet can be used to create a SLURM batch script for basic alignment file processing with samtools:
#!/bin/bash
# Add the #SBATCH directives required by your cluster here (account, partition, time limit, ...)
# Load samtools and its dependencies
module load samtools
# Path to working directory
myWorkDir="/path/to/my/working/directory"
cd "$myWorkDir" || exit 1
## -- Input SAM files are expected in one folder. For example: $myWorkDir/alignments/*.sam
## -- Processed files are written to a separate folder. For example: $myWorkDir/processed/
mkdir -p "$myWorkDir/processed"
for samfile in "$myWorkDir"/alignments/*.sam; do
    samplename=$(basename "$samfile" .sam)
    # Convert SAM to BAM
    samtools view -bS "$samfile" > "$myWorkDir/processed/${samplename}.bam"
    # Sort the BAM by coordinate
    samtools sort -o "$myWorkDir/processed/${samplename}.sorted.bam" "$myWorkDir/processed/${samplename}.bam"
    # Index the sorted BAM
    samtools index "$myWorkDir/processed/${samplename}.sorted.bam"
    # Generate alignment summary statistics
    samtools flagstat "$myWorkDir/processed/${samplename}.sorted.bam" > "$myWorkDir/processed/${samplename}.flagstat.txt"
done
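One caveat with the loop above: if the alignments folder contains no .sam files, the shell leaves the pattern unexpanded and the loop body runs once with the literal string "*.sam". The following is a minimal sketch of a guard against that case; list_sam_files is a hypothetical helper, and the temporary directory and the file name sampleA.sam are used for demonstration only.

```shell
# Sketch: iterate over *.sam files safely, skipping an unmatched glob.
# list_sam_files is a hypothetical helper, not a samtools command.
list_sam_files() {
    # $1: directory expected to contain .sam files
    for f in "$1"/*.sam; do
        # When nothing matches, $f is the literal pattern and does not exist
        [ -e "$f" ] || continue
        # Print the sample name, i.e. the file name without the .sam suffix
        basename "$f" .sam
    done
}

# Demonstration on a throwaway directory (illustrative file name)
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.sam"
list_sam_files "$demo_dir"
rm -r "$demo_dir"
```

In the batch script, the same existence test can be placed on the first line of the loop body so that an empty input folder produces no output files instead of a failed samtools call.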
References
Li H, et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16): 2078–2079. Available at: http://www.htslib.org/