Quick Start Guide
This guide will help you run your first AlphaFold 3 predictions using various input types.
Before starting, ensure you have completed the installation and have obtained model parameters.
Basic Workflow
1. Create input JSON: define your biomolecular structure prediction task.
2. Run AlphaFold 3: execute the Docker container with your input.
3. Analyze outputs: review predicted structures and confidence metrics.
Example 1: Simple Protein Structure
Let’s start with a basic homodimer protein prediction.
Create $HOME/af_input/fold_input.json:
{
"name" : "2PV7" ,
"sequences" : [
{
"protein" : {
"id" : [ "A" , "B" ],
"sequence" : "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
}
}
],
"modelSeeds" : [ 1 ],
"dialect" : "alphafold3" ,
"version" : 1
}
The "id": ["A", "B"] specifies a homodimer with two copies of the same protein chain.
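If you prefer to generate the input file programmatically, a minimal Python sketch could look like the following. The helper name `make_protein_input` is hypothetical (it is not part of AlphaFold 3), and the sequence is truncated to its first 20 residues to keep the example short.

```python
import json


def make_protein_input(name, sequence, chain_ids, seeds):
    """Build an input dict for one protein entity (hypothetical helper)."""
    return {
        "name": name,
        # Listing several chain IDs requests several copies of the same chain.
        "sequences": [{"protein": {"id": list(chain_ids), "sequence": sequence}}],
        "modelSeeds": list(seeds),
        "dialect": "alphafold3",
        "version": 1,
    }


# First 20 residues only; substitute the full chain sequence in practice.
job = make_protein_input("2PV7", "GMRESYANENQFGFKTINSD", ["A", "B"], [1])
print(json.dumps(job, indent=2))
```

Writing the printed text to $HOME/af_input/fold_input.json gives a file with the same structure as the example above.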
Run Prediction
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output
Expected Output
The prediction creates an output directory $HOME/af_output/2pv7/ (AlphaFold 3 lowercases the job name when naming outputs) containing:
2pv7_model.cif - Top-ranked predicted structure
2pv7_confidences.json - Full confidence metrics
2pv7_summary_confidences.json - Summary confidence scores
2pv7_data.json - Input with MSA and template data
ranking_scores.csv - Scores for all predictions
seed-1_sample-0/ through seed-1_sample-4/ - Individual sample predictions
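To pick the best sample from ranking_scores.csv, something like the sketch below may help. It assumes the file has a header row with `seed`, `sample`, and `ranking_score` columns; check your own output before relying on those names. The CSV text here is made-up illustrative data, not real results.

```python
import csv
import io


def best_sample(csv_text):
    """Return (seed, sample, score) for the row with the highest ranking score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda row: float(row["ranking_score"]))
    return int(best["seed"]), int(best["sample"]), float(best["ranking_score"])


# Illustrative content only; column names are an assumption.
example = "seed,sample,ranking_score\n1,0,0.71\n1,1,0.84\n1,2,0.79\n"
print(best_sample(example))
```

For a real run, read the file contents with `open(path).read()` and pass them to `best_sample`.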
Example 2: Protein-Ligand Complex
Predict a protein bound to an ATP molecule.
Using CCD Code
{
"name" : "protein_atp_complex" ,
"sequences" : [
{
"protein" : {
"id" : "A" ,
"sequence" : "MKVLWAALLVTFLAGCQAKVDQIAEGAVRKIEEELGAIAAAH"
}
},
{
"ligand" : {
"id" : "L" ,
"ccdCodes" : [ "ATP" ]
}
}
],
"modelSeeds" : [ 1 , 2 , 3 ],
"dialect" : "alphafold3" ,
"version" : 1
}
Listing several modelSeeds runs the model once per seed, each with a different random seed. More seeds produce more candidate structures to rank; they do not by themselves make any single prediction more accurate.
Example 3: RNA Structure with Modifications
Predict an RNA structure with modified nucleotides.
{
"name" : "modified_rna" ,
"sequences" : [
{
"rna" : {
"id" : "R" ,
"sequence" : "AGCUAGCUAGCUAGCU" ,
"modifications" : [
{ "modificationType" : "2MG" , "basePosition" : 1 },
{ "modificationType" : "5MC" , "basePosition" : 4 }
]
}
}
],
"modelSeeds" : [ 1 ],
"dialect" : "alphafold3" ,
"version" : 1
}
Example 4: Protein with Post-Translational Modifications
{
"name" : "modified_protein" ,
"sequences" : [
{
"protein" : {
"id" : "A" ,
"sequence" : "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK" ,
"modifications" : [
{ "ptmType" : "HY3" , "ptmPosition" : 1 },
{ "ptmType" : "P1L" , "ptmPosition" : 5 }
]
}
}
],
"modelSeeds" : [ 1 ],
"dialect" : "alphafold3" ,
"version" : 1
}
Modifications are specified using CCD codes and 1-based residue positions. In this example, the first residue is modeled as HY3 rather than an unmodified proline (P).
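Because positions are 1-based, an off-by-one error is easy to make. The sketch below (a hypothetical helper, not part of AlphaFold 3) lists which standard residue each modification replaces and rejects positions outside the sequence.

```python
def residues_replaced(sequence, modifications):
    """Return (position, original_residue, ccd_code) for each modification."""
    replaced = []
    for mod in modifications:
        position = mod["ptmPosition"]
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside 1..{len(sequence)}")
        # Convert the 1-based position to a 0-based Python index.
        replaced.append((position, sequence[position - 1], mod["ptmType"]))
    return replaced


sequence = "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK"
modifications = [
    {"ptmType": "HY3", "ptmPosition": 1},
    {"ptmType": "P1L", "ptmPosition": 5},
]
for position, residue, code in residues_replaced(sequence, modifications):
    print(f"residue {position} ({residue}) -> {code}")
```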
Example 5: DNA-Protein Complex
{
"name" : "dna_protein_complex" ,
"sequences" : [
{
"protein" : {
"id" : "P" ,
"sequence" : "MTEKLTSAELGTRGVGLAKVAADGYVPDEAVRKAL"
}
},
{
"dna" : {
"id" : "D1" ,
"sequence" : "GACCTCT"
}
},
{
"dna" : {
"id" : "D2" ,
"sequence" : "AGAGGTC"
}
}
],
"modelSeeds" : [ 1 , 2 ],
"dialect" : "alphafold3" ,
"version" : 1
}
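The two DNA chains in this example are reverse complements of each other, so they can pair as a duplex. A small sketch for deriving the second strand from the first (the function name is illustrative):

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


strand_1 = "GACCTCT"
strand_2 = reverse_complement(strand_1)
print(strand_2)
```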
Example 6: Covalent Ligand with Bond Specification
For covalent ligands, you can specify bonds between atoms.
{
"name" : "covalent_ligand" ,
"sequences" : [
{
"protein" : {
"id" : "A" ,
"sequence" : "MKTIIALSYIFCLVFADYKDDDDK"
}
},
{
"ligand" : {
"id" : "L" ,
"ccdCodes" : [ "HEM" ]
}
}
],
"bondedAtomPairs" : [
[[ "A" , 12 , "SG" ], [ "L" , 1 , "FE" ]]
],
"modelSeeds" : [ 1 ],
"dialect" : "alphafold3" ,
"version" : 1
}
Bonds are specified as [[entity_id, residue_position, atom_name], [entity_id, residue_position, atom_name]]. Residue positions are 1-based and must fall within the referenced chain, and atom names must match CCD definitions.
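A bond that points at a residue position beyond the end of a chain is an easy mistake to make. The following sketch checks positions before you submit a job; `check_bonded_atom_pairs` is a hypothetical helper, and treating a ligand's length as the number of entries in `ccdCodes` is an assumption of this sketch. It does not validate atom names.

```python
def check_bonded_atom_pairs(job):
    """Return a list of problems found in job['bondedAtomPairs'] (empty if none)."""
    lengths = {}
    for entry in job["sequences"]:
        for body in entry.values():
            ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
            if "sequence" in body:
                size = len(body["sequence"])
            elif "ccdCodes" in body:
                size = len(body["ccdCodes"])  # assumption: one position per CCD code
            else:
                size = None  # unknown size, position not checked
            for chain_id in ids:
                lengths[chain_id] = size
    problems = []
    for pair in job.get("bondedAtomPairs", []):
        for chain_id, position, atom in pair:
            if chain_id not in lengths:
                problems.append(f"unknown entity {chain_id!r}")
            elif lengths[chain_id] is not None and not 1 <= position <= lengths[chain_id]:
                problems.append(
                    f"{chain_id}:{position}:{atom} is outside 1..{lengths[chain_id]}"
                )
    return problems


job = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKTIIALSYIFCLVFADYKDDDDK"}},
        {"ligand": {"id": "L", "ccdCodes": ["HEM"]}},
    ],
    "bondedAtomPairs": [[["A", 12, "SG"], ["L", 1, "FE"]]],
}
print(check_bonded_atom_pairs(job))
```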
Running Pipeline in Stages
You can split the pipeline into stages to optimize resource usage.
Stage 1: Data Pipeline Only (CPU)
Generate MSAs and templates without running inference:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <DB_DIR>:/root/public_databases \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --output_dir=/root/af_output \
    --norun_inference
This stage is CPU-only and doesn’t require a GPU. Run it on a cheaper CPU-only instance to save costs.
Stage 2: Inference Only (GPU)
Run inference using pre-computed MSAs and templates:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_output/job_name/job_name_data.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output \
    --norun_data_pipeline
Process a directory of JSON files:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --input_dir=/root/af_input \
    --model_dir=/root/models \
    --output_dir=/root/af_output
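To prepare a directory for --input_dir, one option is to write one JSON file per job. The sketch below is illustrative: `write_inputs`, the job names, and the peptide sequences are placeholders, and a temporary directory stands in for $HOME/af_input.

```python
import json
import tempfile
from pathlib import Path


def write_inputs(jobs, input_dir):
    """Write each job dict to <input_dir>/<job name>.json and return the paths."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for job in jobs:
        path = input_dir / f"{job['name']}.json"
        path.write_text(json.dumps(job, indent=2))
        paths.append(path)
    return paths


def single_chain_job(name, sequence):
    """Minimal single-protein job (placeholder content for illustration)."""
    return {
        "name": name,
        "sequences": [{"protein": {"id": "A", "sequence": sequence}}],
        "modelSeeds": [1],
        "dialect": "alphafold3",
        "version": 1,
    }


jobs = [single_chain_job("pep_a", "MKTAYIAKQRQ"), single_chain_job("pep_b", "MKVLWAALLVT")]
with tempfile.TemporaryDirectory() as tmp:
    written_names = sorted(path.name for path in write_inputs(jobs, tmp))
print(written_names)
```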
Understanding Output Files
Structure File (.cif)
The mmCIF file contains the predicted 3D coordinates:
Compatible with PyMOL, ChimeraX, and other structural biology tools
Contains all atoms for all chains
Includes B-factor values corresponding to pLDDT confidence
Confidence Metrics
Summary Confidences
{
  "ptm": 0.87,                // Predicted TM-score (0-1, higher is better)
  "iptm": 0.82,               // Interface TM-score (0-1)
  "ranking_score": 0.855,     // Overall ranking score
  "has_clash": false,         // Significant clashes detected?
  "fraction_disordered": 0.05
}
Interpreting Confidence Scores
pLDDT (per-atom, 0-100)
90-100: Very high confidence
70-90: Confident
50-70: Low confidence
<50: Very low confidence
pTM
>0.5: Overall predicted fold might be similar to the true structure
ipTM
>0.8: Confident, high-quality prediction
0.6-0.8: Gray zone (may or may not be correct)
<0.6: Likely a failed prediction
PAE (Predicted Aligned Error)
Low values (0-5 Å): High confidence in relative positions
High values (>10 Å): Low confidence in relative positions
Useful for identifying domains and interfaces
Ranking Score
Range: -100 to 1.5
Formula: 0.8×ipTM + 0.2×pTM + 0.5×disorder - 100×clash, where disorder is fraction_disordered and clash is 1 when has_clash is true
Use to rank predictions across multiple seeds
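The ranking formula and the pLDDT bands above can be written out as a short Python sketch. The function names and input values are illustrative only, and the code follows the formula as quoted in this guide rather than the AlphaFold 3 source.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """Ranking score as quoted above: 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*clash."""
    return 0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered - 100.0 * float(has_clash)


def plddt_band(plddt):
    """Map a pLDDT value (0-100) to the confidence bands listed above."""
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"


# Illustrative values, not taken from a real prediction.
score = ranking_score(iptm=0.80, ptm=0.90, fraction_disordered=0.10, has_clash=False)
print(round(score, 3), plddt_band(92.0))
```

A detected clash subtracts 100, which is why clashing predictions always sort below clash-free ones.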
Advanced Options
Custom MSA and Templates
Provide your own MSA (in A3M format):
{
  "protein": {
    "id": "A",
    "sequence": "MKTAYIAKQRQ",
    "unpairedMsa": ">query\nMKTAYIAKQRQ\n>seq1\nMKT-YIAKQRQ\n>seq2\nMKTAYI-KQRQ\n",
    "pairedMsa": "",
    "templates": []
  }
}
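Building the A3M string by hand invites stray whitespace or a missing newline. A minimal sketch for assembling it from aligned sequences follows; `to_a3m` is a hypothetical helper, and the header names simply mirror the example above.

```python
def to_a3m(query, hits):
    """Join a query and aligned hits into an A3M string, query first."""
    lines = [">query", query]
    for index, hit in enumerate(hits, start=1):
        # In A3M, lowercase letters are insertions and do not occupy a query column.
        aligned_length = sum(1 for char in hit if not char.islower())
        if aligned_length != len(query):
            raise ValueError(f"seq{index} does not align to the query length")
        lines += [f">seq{index}", hit]
    return "\n".join(lines) + "\n"


a3m = to_a3m("MKTAYIAKQRQ", ["MKT-YIAKQRQ", "MKTAYI-KQRQ"])
print(a3m)
```

Passing the result through `json.dumps` when writing the input file takes care of encoding the newlines as \n.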
Running MSA-Free
Predict structure without MSA search:
{
"protein" : {
"id" : "A" ,
"sequence" : "MKTAYIAKQRQ" ,
"unpairedMsa" : "" ,
"pairedMsa" : "" ,
"templates" : []
}
}
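To switch an existing job to MSA-free mode, the empty fields shown above can be set on every protein entry. The sketch below assumes a job dict shaped like the earlier examples; `make_msa_free` is a hypothetical helper.

```python
import copy


def make_msa_free(job):
    """Return a copy of the job with empty MSA and template fields on each protein."""
    result = copy.deepcopy(job)
    for entry in result["sequences"]:
        protein = entry.get("protein")
        if protein is not None:
            protein.update(unpairedMsa="", pairedMsa="", templates=[])
    return result


job = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKTAYIAKQRQ"}},
        {"ligand": {"id": "L", "ccdCodes": ["ATP"]}},
    ]
}
msa_free_job = make_msa_free(job)
print(msa_free_job["sequences"][0]["protein"])
```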
Multiple Seeds for Better Sampling
{
"name" : "my_prediction" ,
"modelSeeds" : [ 1 , 2 , 3 , 4 , 5 ],
"sequences" : [ ... ]
}
By default, AlphaFold 3 generates 5 samples per seed. With 5 seeds, you’ll get 25 total predictions to choose from.
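The count is simply seeds multiplied by samples per seed. As a sketch, the sample directories you would expect (using the seed-N_sample-M naming shown under Expected Output) can be enumerated like this; the function name is illustrative.

```python
from itertools import product


def sample_dirs(seeds, samples_per_seed=5):
    """List expected per-sample directory names for the given seeds."""
    return [
        f"seed-{seed}_sample-{sample}"
        for seed, sample in product(seeds, range(samples_per_seed))
    ]


dirs = sample_dirs([1, 2, 3, 4, 5])
print(len(dirs), dirs[0], dirs[-1])
```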
Use SSD for Databases: Place genetic databases on an SSD or RAM-backed filesystem for 5-10x faster MSA search.
Reuse MSA Data: For multiple predictions with the same chains, run the data pipeline once and reuse the output JSON.
Enable JAX Compilation Cache: Use --jax_compilation_cache_dir to avoid recompiling between runs.
Optimize Bucket Sizes: Adjust the --buckets flag to minimize recompilation for your typical input sizes.
Common Issues
Out of Memory
# Enable unified memory in Dockerfile
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2
If you see “Failed to construct RDKit reference structure”:
Try increasing iterations: --conformer_max_iterations=10000
Or provide a reference structure using user-provided CCD format
Invalid JSON
Ensure SMILES strings are properly escaped:
# Use jq to escape SMILES
jq -R . <<< 'CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO'
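If you generate the input file from Python, `json.dumps` performs the same escaping, so the SMILES string can be kept as a raw string in code. This is a minimal sketch; the ligand entry uses the `smiles` field with a placeholder ID.

```python
import json

# Raw string, so the backslashes in the SMILES are kept literally.
smiles = r"CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO"
ligand = {"ligand": {"id": "L", "smiles": smiles}}

# json.dumps doubles each backslash, producing valid JSON.
encoded = json.dumps(ligand)
print(encoded)
```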
Next Steps
Input Documentation: Complete JSON format specification
Output Documentation: Detailed output format guide
Performance Tuning: Optimize for speed and throughput
Getting Help
If you encounter issues:
Check Known Issues
Search GitHub Issues
Contact the team at [email protected]