AlphaFold
AlphaFold is an artificial intelligence (AI) program developed by Google DeepMind to predict the 3D structures of proteins from their amino acid sequences. The latest version, AlphaFold 3, can predict the structures of a broader range of biomolecules than just proteins, including DNA, RNA, and ligands. For a broad overview of AlphaFold 3, see AlphaFold 3 predicts the structure and interactions of all of life’s molecules. For a more detailed view of its inner workings, see The Illustrated AlphaFold.
Availability
Version | Delta | ICC
---|---|---
3.0.1 | X | X
AlphaFold 3 is available for all Delta and Illinois Campus Cluster (ICC) users.
Obtaining model parameters
The AlphaFold 3 model parameters are available under the AlphaFold 3 Model Parameters Terms of Use. You may not use these model parameters except in compliance with these Terms of Use. Read them carefully before using AlphaFold 3.
To obtain the model parameters, follow the instructions at Obtaining Model Parameters.
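Once you have been granted access, place the decompressed parameter file in the models/ sub-directory of the run directory described below. A minimal sketch, assuming the parameters arrive as a zstd-compressed file named af3.bin.zst (the actual file name may differ):

# Hypothetical delivery file name; substitute whatever Google provides
mkdir -p af3-runs/models
zstd --decompress af3.bin.zst -o af3-runs/models/af3.bin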
Using AlphaFold 3
The AlphaFold 3 inference pipeline has two stages:

1. The CPU-intensive data pipeline stage, which generates features from the input sequences by searching publicly available sequence and structural databases.
2. The GPU-intensive model inference stage, which predicts the structure of the molecule from the generated features. This stage requires the model parameters.

You can run either the full pipeline or the two stages separately to better allocate computational resources. The following Slurm scripts show how to run AlphaFold 3 on Delta and ICC. As the demonstration input, we will use the amino acid sequence of the Disco-interacting protein 2 homolog B (DIP2B) molecule, stored as a JSON file:
{
"name": "DIP2B_dimer_AF-Q9P265-F1-model_v4",
"modelSeeds": [
15545
],
"sequences": [
{
"protein": {
"id": [
"A"
],
"sequence": "MAERGLEPSPAAVAALPPEVRAQLAELELELSEGDITQKGYEKKRSKLLSPYSPQTQETDSAVQKELRNQTPAPSAAQTSAPSKYHRTRSGGARDERYRSDIHTEAVQAALAKHKEQKMALPMPTKRRSTFVQSPADACTPPDTSSASEDEGSLRRQAALSAALQQSLQNAESWINRSIQGSSTSSSASSTLSHGEVKGTSGSLADVFANTRIENFSAPPDVTTTTSSSSSSSSIRPANIDLPPSGIVKGMHKGSNRSSLMDTADGVPVSSRVSTKIQQLLNTLKRPKRPPLKEFFVDDSEEIVEVPQPDPNQPKPEGRQMTPVKGEPLGVICNWPPALESALQRWGTTQAKCSCLTALDMTGKPVYTLTYGKLWSRSLKLAYTLLNKLGTKNEPVLKPGDRVALVYPNNDPVMFMVAFYGCLLAEVIPVPIEVPLTRKDAGGQQIGFLLGSCGIALALTSEVCLKGLPKTQNGEIVQFKGWPRLKWVVTDSKYLSKPPKDWQPHISPAGTEPAYIEYKTSKEGSVMGVTVSRLAMLSHCQALSQACNYSEGETIVNVLDFKKDAGLWHGMFANVMNKMHTISVPYSVMKTCPLSWVQRVHAHKAKVALVKCRDLHWAMMAHRDQRDVSLSSLRMLIVTDGANPWSVSSCDAFLSLFQSHGLKPEAICPCATSAEAMTVAIRRPGVPGAPLPGRAILSMNGLSYGVIRVNTEDKNSALTVQDVGHVMPGGMMCIVKPDGPPQLCKTDEIGEICVSSRTGGMMYFGLAGVTKNTFEVIPVNSAGSPVGDVPFIRSGLLGFVGPGSLVFVVGKMDGLLMVSGRRHNADDIVATGLAVESIKTVYRGRIAVFSVSVFYDERIVVVAEQRPDASEEDSFQWMSRVLQAIDSIHQVGVYCLALVPANTLPKTPLGGIHISQTKQLFLEGSLHPCNILMCPHTCVTNLPKPRQKQPGVGPASVMVGNLVAGKRIAQAAGRDLGQIEENDLVRKHQFLAEILQWRAQATPDHVLFMLLNAKGTTVCTASCLQLHKRAERIASVLGDKGHLNAGDNVVLLYPPGIELIAAFYGCLYAGCIPVTVRPPHAQNLTATLPTVRMIVDVSKAACILTSQTLMRLLRSREAAAAVDVKTWPTIIDTDDLPRKRLPQLYKPPTPEMLAYLDFSVSTTGMLTGVKMSHSAVNALCRAIKLQCELYSSRQIAICLDPYCGLGFALWCLCSVYSGHQSVLIPPMELENNLFLWLSTVNQYKIRDTFCSYSVMELCTKGLGNQVEVLKTRGINLSCVRTCVVVAEERPRVALQQSFSKLFKDIGLSPRAVSTTFGSRVNVAICLQGTSGPDPTTVYVDLKSLRHDRVRLVERGAPQSLLLSESGKILPGVKVVIVNPETKGPVGDSHLGEIWVNSPHTASGYYTIYDSETLQADHFNTRLSFGDAAQTLWARTGYLGFVRRTELTAATGERHDALYVVGALDETLELRGLRYHPIDIETSVSRIHRSIAECAVFTWTNLLVVVVELCGSEQEALDLVPLVTNVVLEEHYLIVGVVVVVDPGVIPINSRGEKQRMHLRDSFLADQLDPIYVAYNM",
"modifications": []
}
}
],
"bondedAtomPairs": [],
"dialect": "alphafold3",
"version": 1
}
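Before submitting a job, it can be worth checking that the input file parses as valid JSON, for example with Python's standard library:

# Prints an error message and exits non-zero if the file is not valid JSON
python -m json.tool inputs/DIP2B_dimer_AF-Q9P265-F1-model_v4.json > /dev/null \
    && echo "input JSON OK"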
In the scripts below we assume the following directory structure:
af3-runs/
|- inputs/
| |- DIP2B_dimer_AF-Q9P265-F1-model_v4.json
|- outputs/
|- models/
| |- af3.bin
|- af3.slurm
The inputs directory contains the JSON file with the sequence information. The Slurm and AlphaFold 3 outputs will be produced in the outputs directory. The af3.bin file in the models sub-directory contains the model parameters which, as mentioned earlier, you have to obtain from Google. The af3.slurm file is the Slurm script.
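You can create the three sub-directories in one step before adding the input JSON, the model parameters, and the Slurm script:

# Create the run directory layout shown above
mkdir -p af3-runs/{inputs,outputs,models}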
Split workflow
Data pipeline with CPU
#!/bin/bash
#SBATCH --job-name=af3-dip2b-data-pipeline
#SBATCH --output=outputs/slurm-%j.out
#SBATCH --error=outputs/slurm-%j.err
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --account=<account_name>
#SBATCH --partition=<partition_name>
#SBATCH --mem=80GB
# Shared locations of the AlphaFold 3 container image and public databases
ROOT=/taiga/ncsa/alphafold/
DB_DIR=${ROOT}/datasets
IMAGE=${ROOT}/alphafold3/alphafold3.sif

apptainer exec \
    --bind ${ROOT}/alphafold3:/root/af3 \
    --bind ${DB_DIR}:/root/public_databases \
    --bind ${PWD}/inputs:/root/af_input \
    --bind ${PWD}/outputs:/root/af_output \
    --bind ${PWD}/models:/root/models \
    ${IMAGE} \
    python /root/af3/run_alphafold.py \
        --model_dir=/root/models \
        --db_dir=/root/public_databases \
        --json_path=/root/af_input/DIP2B_dimer_AF-Q9P265-F1-model_v4.json \
        --output_dir=/root/af_output \
        --norun_inference  # run the data pipeline only; skip model inference
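Because the bind mounts use ${PWD}, submit the script from inside the af3-runs directory; for example:

cd af3-runs
sbatch af3.slurm   # submit the data pipeline job
squeue -u $USER    # check the job's status in the queue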
On Delta, replace <account_name> with XXXX-delta-cpu, where XXXX is your project allocation code, and <partition_name> with cpu. On ICC, replace <account_name> with an account available to you and <partition_name> with a CPU partition available to you. On job completion, you will have the following output:
outputs/
|- slurm-<job_number>.out
|- dip2b_dimer_af-q9p265-f1-model_v4_<timestamp>/
| |- dip2b_dimer_af-q9p265-f1-model_v4_data.json
The dip2b_dimer_af-q9p265-f1-model_v4_data.json file contains the generated features; use this file as the input to predict the structure of the protein.
Model inference with GPU
#!/bin/bash
#SBATCH --job-name=af3-dip2b-model-inference
#SBATCH --output=outputs/slurm-%j.out
#SBATCH --error=outputs/slurm-%j.err
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gpus-per-task=1
#SBATCH --account=<account_name>
#SBATCH --partition=<partition_name>
#SBATCH --mem=80GB
module load cuda/12.6
# Shared locations of the AlphaFold 3 container image and public databases
ROOT=/taiga/ncsa/alphafold/
DB_DIR=${ROOT}/datasets
IMAGE=${ROOT}/alphafold3/alphafold3.sif

apptainer exec \
    --nv \
    --bind ${ROOT}/alphafold3:/root/af3 \
    --bind ${DB_DIR}:/root/public_databases \
    --bind ${PWD}/inputs:/root/af_input \
    --bind ${PWD}/outputs:/root/af_output \
    --bind ${PWD}/models:/root/models \
    ${IMAGE} \
    python /root/af3/run_alphafold.py \
        --model_dir=/root/models \
        --db_dir=/root/public_databases \
        --json_path=/root/af_output/dip2b_dimer_af-q9p265-f1-model_v4_<timestamp>/dip2b_dimer_af-q9p265-f1-model_v4_data.json \
        --output_dir=/root/af_output \
        --norun_data_pipeline  # run model inference only; skip the data pipeline
On Delta, replace <account_name> with XXXX-delta-gpu, where XXXX is your project allocation code, and <partition_name> with a GPU partition name, such as gpuA100x4. On ICC, replace <account_name> with an account available to you and <partition_name> with a GPU partition available to you. On ICC, also replace --gpus-per-task=1 with --gres=gpu:<gpu_type>, where <gpu_type> is a type of GPU available to you, such as A100. For more information on specifying GPU nodes on ICC, see Specifying nodes with GPUs.
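For example, the GPU request lines on ICC might look like the following; the partition name and GPU type here are placeholders, so check which ones are available to you:

#SBATCH --partition=<gpu_partition_name>
#SBATCH --gres=gpu:A100:1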
Notice that in this case we are using the output of the data pipeline stage as the input (--json_path). On job completion, you will have the following output (newest run first):
outputs/
|- slurm-<new_job_number>.out
|- dip2b_dimer_af-q9p265-f1-model_v4_<new_timestamp>/
| |- dip2b_dimer_af-q9p265-f1-model_v4_confidences.json
| |- dip2b_dimer_af-q9p265-f1-model_v4_data.json
| |- ...
| |- ranking_scores.csv
| |- ...
|- slurm-<job_number>.out
|- dip2b_dimer_af-q9p265-f1-model_v4_<timestamp>/
| |- dip2b_dimer_af-q9p265-f1-model_v4_data.json
The dip2b_dimer_af-q9p265-f1-model_v4_<new_timestamp> sub-directory contains the final model predictions of the structure of the molecule.
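If you keep the two stages in separate Slurm scripts, you can chain them so that the GPU job starts only after the data pipeline succeeds. A sketch, assuming hypothetical script names af3-data.slurm and af3-inference.slurm:

# Submit the CPU data pipeline job and capture its job ID
DATA_JOB=$(sbatch --parsable af3-data.slurm)
# Queue the GPU inference job to start only if the first job succeeds
sbatch --dependency=afterok:${DATA_JOB} af3-inference.slurm

Note that the inference script's --json_path must point at the timestamped directory the data pipeline job creates, so you may need to fill that in (or submit the second job by hand) once the directory name is known.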
Full pipeline
#!/bin/bash
#SBATCH --job-name=af3-dip2b-full-pipeline
#SBATCH --output=outputs/slurm-%j.out
#SBATCH --error=outputs/slurm-%j.err
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gpus-per-task=1
#SBATCH --account=<account_name>
#SBATCH --partition=<partition_name>
#SBATCH --mem=80GB
module load cuda/12.6
# Shared locations of the AlphaFold 3 container image and public databases
ROOT=/taiga/ncsa/alphafold/
DB_DIR=${ROOT}/datasets
IMAGE=${ROOT}/alphafold3/alphafold3.sif

apptainer exec \
    --nv \
    --bind ${ROOT}/alphafold3:/root/af3 \
    --bind ${DB_DIR}:/root/public_databases \
    --bind ${PWD}/inputs:/root/af_input \
    --bind ${PWD}/outputs:/root/af_output \
    --bind ${PWD}/models:/root/models \
    ${IMAGE} \
    python /root/af3/run_alphafold.py \
        --model_dir=/root/models \
        --db_dir=/root/public_databases \
        --json_path=/root/af_input/DIP2B_dimer_AF-Q9P265-F1-model_v4.json \
        --output_dir=/root/af_output
On Delta, replace <account_name> with XXXX-delta-gpu, where XXXX is your project allocation code, and <partition_name> with a GPU partition name, such as gpuA100x4. On ICC, replace <account_name> with an account available to you and <partition_name> with a GPU partition available to you. On ICC, also replace --gpus-per-task=1 with --gres=gpu:<gpu_type>, where <gpu_type> is a type of GPU available to you, such as A100. For more information on specifying GPU nodes on ICC, see Specifying nodes with GPUs.
On job completion, you will have the following output:
outputs/
|- slurm-<job_number>.out
|- dip2b_dimer_af-q9p265-f1-model_v4/
| |- dip2b_dimer_af-q9p265-f1-model_v4_confidences.json
| |- dip2b_dimer_af-q9p265-f1-model_v4_data.json
| |- ...
| |- ranking_scores.csv
| |- ...
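The ranking_scores.csv file lists the ranking score of each predicted sample; a quick way to view it as an aligned table:

# Render the comma-separated ranking scores as aligned columns
column -s, -t outputs/dip2b_dimer_af-q9p265-f1-model_v4/ranking_scores.csv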