BCFtools
Introduction
BCFtools is a suite of utilities designed for manipulating and analyzing variant call data stored in VCF and BCF formats. It provides essential tools for filtering, converting, and summarizing VCF information, making it an important component of most variant-calling workflows.
Requirements
BCFtools is available as a module on Delta and the Illinois Campus Cluster (ICC). Loading the module also loads all required dependencies:
module load bcftools
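To confirm that the module loaded correctly, a quick check along these lines can help. This is a minimal sketch; the helper name `check_tool` is made up for this example and is not part of BCFtools or the module system.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; try 'module load $1' first" >&2
        return 1
    fi
}

# Print the BCFtools version line only when the command is available.
if check_tool bcftools; then
    bcftools --version | head -n 1
fi
```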
Usage
Below are examples of commonly used BCFtools functions, followed by a sample SLURM batch script suitable for processing VCF or BCF files.
Viewing and filtering VCFs
View and filter VCF files:
# View a VCF
bcftools view input.vcf.gz
# Keep only variants with a quality score (QUAL) above 30 and write them to an uncompressed VCF
bcftools view -i 'QUAL>30' input.vcf.gz -o filtered.vcf
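When the filtered file will be indexed or passed to other tools, writing bgzip-compressed output is usually more convenient. The sketch below assumes the placeholder file names `input.vcf.gz` and `filtered.vcf.gz`, and it only runs BCFtools when both the command and the input file are present.

```shell
in_vcf="input.vcf.gz"        # placeholder input name
out_vcf="filtered.vcf.gz"    # placeholder output name

if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    # -Oz writes bgzip-compressed VCF instead of plain text
    bcftools view -i 'QUAL>30' -Oz -o "$out_vcf" "$in_vcf"
    # -H omits the header, so each remaining line is one variant record
    n_kept=$(bcftools view -H "$out_vcf" | wc -l)
    echo "Records with QUAL>30: $n_kept"
else
    echo "Skipping: bcftools or $in_vcf is not available" >&2
fi
```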
Indexing VCFs
Indexing enables fast access to variants by genomic region and is required by many downstream bioinformatics tools. The file to be indexed must be a bgzip-compressed VCF or a BCF:
bcftools index output.vcf.gz
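Once a file is indexed, a region can be extracted without reading the whole file. A minimal sketch, assuming a bgzip-compressed file with the placeholder name `input.vcf.gz` and a placeholder region `chr1:1-1000000` (contig names depend on your reference genome):

```shell
vcf="input.vcf.gz"          # placeholder file name
region="chr1:1-1000000"     # placeholder region; use a contig from your own VCF header

if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    # Writes a .csi index by default; -t would write a .tbi index instead.
    # -f overwrites an existing index.
    bcftools index -f "$vcf"
    # Print the number of records stored in the index
    bcftools index -n "$vcf"
    # -r needs the index and returns only records overlapping the region
    n_region=$(bcftools view -H -r "$region" "$vcf" | wc -l)
    echo "Records in $region: $n_region"
else
    echo "Skipping: bcftools or $vcf is not available" >&2
fi
```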
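Summarizing VCFs
Generate summary statistics, such as SNP and indel counts and the transition/transversion ratio, and write them to a text file: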
bcftools stats input.vcf.gz > input.stats.txt
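The stats report is a tab-delimited text file whose sections are identified by the first column. The headline counts are on the lines that start with `SN` (summary numbers), and they can be pulled out with standard tools. In this sketch, `summary_numbers` is a hypothetical helper name chosen for illustration.

```shell
# summary_numbers is a hypothetical helper: it prints the "SN" (summary numbers)
# lines of a bcftools stats report, dropping the first two columns.
summary_numbers() {
    grep '^SN' "$1" | cut -f 3-
}

stats_file="input.stats.txt"   # file written by the bcftools stats command above
if [ -f "$stats_file" ]; then
    summary_numbers "$stats_file"
fi
```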
SLURM batch script example
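The following snippet can serve as the body of a SLURM batch script that filters, indexes, and summarizes every VCF file in a directory. Add the #SBATCH directives required by your cluster and allocation at the top of the script: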
# Path to the working directory
myWorkDir="/path/to/my/working/directory"
cd "$myWorkDir" || exit 1
## -- Input VCF files are expected in a dedicated folder. For example: $myWorkDir/variants/*.vcf.gz
## -- Processed files are written to a separate folder. For example: $myWorkDir/bcf_processed/
mkdir -p "$myWorkDir/bcf_processed"
for myvcf in "$myWorkDir"/variants/*.vcf.gz
do
    samplename=$(basename "$myvcf" .vcf.gz)
    filtered="$myWorkDir/bcf_processed/${samplename}.filtered.vcf.gz"
    # Filter variants by quality; -Oz writes bgzip-compressed VCF so the file can be indexed
    bcftools view -i 'QUAL>30' -Oz -o "$filtered" "$myvcf"
    # Index the filtered VCF
    bcftools index "$filtered"
    # Generate statistics
    bcftools stats "$filtered" > "$myWorkDir/bcf_processed/${samplename}.stats.txt"
done
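A batch script also needs a header that requests resources. The header below is only a sketch: the job name, partition, account, time limit, and resource values are placeholders that must be replaced with values valid for your allocation on Delta or ICC, and the script name `bcftools_job.sh` used afterwards is likewise hypothetical.

```shell
#!/bin/bash
# All values below are placeholders; replace them with ones valid for your allocation.
#SBATCH --job-name=bcftools_filter
#SBATCH --partition=my_partition
#SBATCH --account=my_account
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
# %j in the log file name expands to the job ID
#SBATCH --output=bcftools_%j.log

# Load BCFtools when the module command is available.
if command -v module >/dev/null 2>&1; then
    module load bcftools
fi

# SLURM sets SLURM_CPUS_PER_TASK inside a job that requests --cpus-per-task;
# fall back to 1 when the script is run outside SLURM.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Using $threads compression thread(s)"

# The processing loop shown above goes here. To use the extra threads when
# writing compressed output, add: --threads "$threads" to the bcftools view command.
```

Saved together with the processing loop as `bcftools_job.sh`, the script could be submitted with `sbatch bcftools_job.sh` and monitored with `squeue -u $USER`.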
References
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.