AlphaFold2 & ColabFold: Next-Gen Protein Structure Prediction
For CS 7150

An Analysis of "Highly accurate protein structure prediction with AlphaFold" (Jumper et al., Nature 2021)

Introduction

AlphaFold2, developed by DeepMind and described in the 2021 Nature paper "Highly accurate protein structure prediction with AlphaFold" by Jumper et al., combines attention-based deep learning with evolutionary sequence data to predict protein three-dimensional structures in minutes at near-experimental accuracy (median GDT > 90 at CASP14). It effectively resolves the protein structure prediction problem, a grand challenge that stood open for more than 50 years, during which researchers had to rely on slow, expensive, and sometimes unsuccessful experimental methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy. By making structure determination rapid and reliable, this breakthrough is transforming biological research and therapeutic development.

Our team was inspired to delve deeper into this breakthrough to understand how AlphaFold2 works and to explore its potential applications in research and education. After investigating implementation options, we chose ColabFold, an adaptation of AlphaFold2 designed to run efficiently on Google Colab. ColabFold makes this powerful technology usable without specialized hardware, democratizing access to state-of-the-art protein structure prediction. Our project demonstrates how ColabFold can be used to predict protein structures and provides hands-on experience with a technology that is transforming structural biology worldwide.

To assess the impact of evolutionary and template data on predictive performance, we will perform two contrasting ColabFold experiments. In the first, we will supply a deep MSA of ~150 homologs plus four high-quality structural templates to establish a high-information baseline. In the second, we will restrict the MSA to the query sequence alone and disable template use to simulate an information-poor scenario, then compare per-residue confidence (pLDDT) and overall pTM scores across multiple refinement cycles.

Research question: How does the availability of evolutionary alignments and structural templates affect ColabFold's protein structure predictions?

AlphaFold2 Architecture: A Deep Dive

The AlphaFold2 architecture represents a revolutionary approach to protein structure prediction, combining evolutionary information with deep learning to achieve unprecedented accuracy. Unlike previous methods that relied on fragment assembly and physics-based simulations, AlphaFold2 uses an end-to-end neural network approach to directly predict 3D coordinates.

Figure 5: Overall architecture of AlphaFold2 showing the complete pipeline from amino acid sequence to 3D structure prediction.

Key Components

1. Multiple Sequence Alignment (MSA) Processing

The first critical component of AlphaFold2 is the processing of evolutionary information through multiple sequence alignments:

  • Searches genetic databases to find evolutionarily related sequences
  • Aligns these sequences to capture conservation patterns
  • Generates a rich representation that captures which amino acids tend to co-evolve
  • This evolutionary information is crucial, as co-evolving residues often indicate spatial proximity in the folded structure (see the toy sketch below)
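
To make the co-evolution intuition concrete, here is a minimal toy sketch (not AlphaFold2 code, which learns these couplings inside the network) that scores pairs of alignment columns by mutual information; the four-sequence alignment is invented purely for illustration:

```python
from collections import Counter
import math

# Hypothetical toy alignment: four homologous sequences, five columns.
msa = ["MKVLA", "MRVIA", "MKVLG", "MRVIG"]

def column(i):
    return [seq[i] for seq in msa]

def mutual_information(i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(msa)
    pi, pj = Counter(column(i)), Counter(column(j))
    pij = Counter(zip(column(i), column(j)))
    return sum(
        (c / n) * math.log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )

# Columns 1 and 3 co-vary perfectly (K pairs with L, R with I): high signal,
# hinting the two residues may be in contact. Column 0 is conserved: no signal.
print(mutual_information(1, 3))  # 1.0
print(mutual_information(0, 1))  # 0.0
```

In real pipelines this pairwise statistic is superseded by the learned attention of the Evoformer, but the underlying signal is the same.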

2. Template Processing

While not always necessary for accurate predictions, AlphaFold2 can use information from known protein structures:

  • Searches structural databases (PDB) for proteins with similar sequences
  • Extracts distance maps and structural features from these templates
  • Integrates this information with sequence-based predictions
  • This component helps guide the model, especially for proteins with known homologs (a distance-map sketch follows this list)
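
As a rough illustration of what extracting a distance map from a template involves, the hedged sketch below loads a template with Biopython (`pip install biopython`) and computes the Cα-Cα distance matrix; the local file name and chain ID are assumptions for illustration, not part of the AlphaFold2 pipeline:

```python
import numpy as np
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
# Hypothetical local copy of one of the lysozyme templates.
structure = parser.get_structure("template", "1ior.pdb")
chain = structure[0]["A"]  # assume the template is chain A

# Collect C-alpha coordinates for residues that have one.
ca_coords = np.array([res["CA"].coord for res in chain if "CA" in res])

# Pairwise C-alpha distance map: entry (i, j) is the distance in angstroms.
diff = ca_coords[:, None, :] - ca_coords[None, :, :]
distance_map = np.linalg.norm(diff, axis=-1)
print(distance_map.shape)  # (n_residues, n_residues)
```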

3. Evoformer

The core of AlphaFold2's architecture is the Evoformer, a novel transformer-based neural network:

  • Contains 48 identical blocks that iteratively refine representations
  • Processes both MSA and pairwise representations simultaneously
  • Uses specialized attention mechanisms to update information bidirectionally between these representations
  • Key mechanisms include:
    • Row-wise gated self-attention: processes evolutionary information across sequence positions
    • Column-wise gated self-attention: processes evolutionary information across aligned sequences
    • Triangle multiplication: models higher-order interactions between residue pairs (sketched in code after Figure 6)
    • Transition layers: non-linear processing that integrates information
Figure 6: Detailed structure of the Evoformer module showing the various attention mechanisms and information flow.
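
The following simplified, untrained NumPy sketch shows the tensor algebra behind triangle multiplication (the "outgoing edges" variant): each pair (i, j) is updated from the edges (i, k) and (j, k) of every triangle it participates in. Random matrices stand in for the learned projections, and the gating and layer normalization of the real module are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, c = 16, 8                         # residues, pair-channel width
z = rng.normal(size=(n_res, n_res, c))   # pair representation

# Stand-ins for learned linear projections.
W_a = rng.normal(size=(c, c)) / np.sqrt(c)
W_b = rng.normal(size=(c, c)) / np.sqrt(c)
W_out = rng.normal(size=(c, c)) / np.sqrt(c)

a = z @ W_a   # "left" edge features,  shape (n_res, n_res, c)
b = z @ W_b   # "right" edge features, shape (n_res, n_res, c)

# Triangle update: pair (i, j) aggregates edges (i, k) and (j, k) over all k.
update = np.einsum("ikc,jkc->ijc", a, b)
z = z + update @ W_out   # residual connection back into the pair stack
print(z.shape)           # (16, 16, 8)
```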

4. Structure Module

The final stage of AlphaFold2 is the Structure Module, which converts refined representations into 3D coordinates:

  • Consists of 8 blocks with shared weights
  • Takes the representations from the Evoformer and gradually builds the 3D structure
  • Uses Invariant Point Attention (IPA): a novel attention mechanism that is invariant to global rotations and translations, so it respects the geometry of 3D space
  • Predicts backbone frames (rotations and translations)
  • Computes atom positions based on these frames
  • Predicts torsion angles to determine side chain conformations (a frame-application sketch follows Figure 7)
Figure 7: Detailed structure of the Structure Module showing the various attention mechanisms and information flow.
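
A minimal sketch of the Structure Module's final step: each residue carries a backbone frame (a rotation R_i and translation t_i), and global atom positions come from applying that frame to idealized local coordinates. The local N/CA/C geometry below is approximate and the single identity frame is hypothetical, purely for illustration:

```python
import numpy as np

def apply_frame(rotation, translation, local_coords):
    """Map local atom coordinates into the global frame: x_global = R @ x + t."""
    return local_coords @ rotation.T + translation

# Idealized backbone atom positions in a residue's local frame (angstroms).
local_backbone = np.array([
    [-0.52, 1.36, 0.0],   # N  (approximate)
    [0.00,  0.00, 0.0],   # CA (frame origin)
    [1.52,  0.00, 0.0],   # C  (approximate)
])

# One hypothetical predicted frame: identity rotation, translated 10 A along x.
R = np.eye(3)
t = np.array([10.0, 0.0, 0.0])
print(apply_frame(R, t, local_backbone))
```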

5. Recycling and Confidence Estimation

AlphaFold2 employs two additional techniques that significantly improve its performance:

  • Recycling: The entire prediction process is repeated multiple times (typically 3 recycles), with each iteration using the output of the previous one as input. This allows the model to refine its predictions based on emerging structural information (a schematic loop is sketched after this list).
  • Confidence Estimation: AlphaFold2 provides two key metrics:
    • pLDDT (predicted Local Distance Difference Test): per-residue confidence scores from 0-100, with higher values indicating greater confidence
    • PAE (Predicted Aligned Error): estimates the expected position error between any two residues, providing insight into the reliability of domain arrangements in multi-domain proteins
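
The sketch below is a schematic of the recycling loop (not DeepMind's code): rerun the network, feed the previous coordinates back in, and stop early once the backbone stops moving. The toy stand-in model simply pulls coordinates toward a fixed target so the loop demonstrably converges:

```python
import numpy as np

def predict_with_recycling(model, features, num_recycles=3, tolerance=0.5):
    """Run `model` repeatedly, feeding each output back in, with early stopping."""
    prev_coords = None
    coords = None
    for _ in range(num_recycles + 1):      # initial pass + num_recycles reruns
        coords = model(features, prev_coords)
        if prev_coords is not None:
            rmsd = np.sqrt(np.mean(np.sum((coords - prev_coords) ** 2, axis=-1)))
            if rmsd < tolerance:           # structure stopped moving: converged
                break
        prev_coords = coords
    return coords

# Toy stand-in "model": each call pulls coordinates halfway toward a target.
target = np.random.default_rng(0).normal(size=(129, 3))
def toy_model(features, prev):
    start = prev if prev is not None else np.zeros_like(target)
    return start + 0.5 * (target - start)

final = predict_with_recycling(toy_model, features=None, num_recycles=6)
```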

This sophisticated architecture enables AlphaFold2 to achieve remarkable accuracy, with many predictions reaching experimental-level quality (GDT scores > 90). The integration of evolutionary information with geometric reasoning through deep learning represents a fundamental breakthrough in protein structure prediction.

Methods

Our approach utilizes ColabFold, an open-source adaptation of AlphaFold2 designed for accessibility and efficiency. Developed by Mirdita et al., ColabFold significantly reduces the computational barriers to protein structure prediction while maintaining AlphaFold2's accuracy. Key advantages of ColabFold include:

  • Accelerated MSA generation using MMseqs2, reducing search times from hours to minutes
  • Optimized implementation for Google Colab's free GPU resources
  • Streamlined preprocessing and reduced memory requirements
  • Support for both protein monomer and complex predictions
  • Interactive visualization tools for structural analysis
  • Batch processing capabilities for multiple sequences

ColabFold preserves the core AlphaFold2 architecture while making it accessible to researchers without specialized computing infrastructure. Our implementation follows the standard ColabFold pipeline, which employs a multi-stage neural network architecture consisting of:

  1. Multiple Sequence Alignment (MSA) generation using MMseqs2 to capture evolutionary information
  2. Template structure identification from the PDB database
  3. Neural network processing through the Evoformer (48 blocks), which refines the MSA and pairwise representations
  4. Structure module (8 blocks) that predicts backbone coordinates and side chain orientations
  5. Confidence assessment using pLDDT (predicted Local Distance Difference Test) scores

For our experiments, we configured ColabFold with the following parameters (a sketch of the equivalent command-line invocation follows the list):

  • Model type: "auto" (automatically selects appropriate model based on input)
  • MSA depth: 512 sequences for optimal runs, reduced to 2 sequences for limitation testing
  • Extra MSA depth: 1024 sequences for optimal runs, reduced to 0 for limitation testing
  • Number of recycles: 6
  • Early stopping tolerance: 0.5 Å (Cα RMSD change between recycles)
  • Use of templates: Enabled with PDB70 database for optimal runs, disabled for some limitation tests

Our experiments covered two distinct scenarios. First, we predicted the structure of lysozyme (129 amino acids), achieving very high confidence (mean pLDDT ≈ 98 on the 0-100 scale) that compared favorably with the experimentally determined structure (PDB ID: 1LYZ). The entire prediction took approximately 1-2 minutes on a T4 GPU, with early stopping triggering after just 2 recycles due to structural convergence. Second, we conducted limit-testing experiments on a larger fragment (~1000 amino acids) with a minimal MSA and no templates, which produced dramatically lower-confidence predictions (pLDDT averaging 30-50), demonstrating AlphaFold2's critical dependence on sufficient evolutionary and template information, especially for larger proteins.

Link to Code

Our implementation and demonstration can be found in our GitHub repository: AlphaFold2-Demo. The repository includes our Jupyter notebook with step-by-step execution of the ColabFold pipeline, visualization code, and explanations of each component. We've also incorporated auxiliary scripts for structural comparison between predicted models and experimentally determined structures.

Open In Colab

Click the button above to open our demo notebook directly in Google Colab where you can run the protein structure prediction pipeline yourself.

Experiments

Experiment 1: Lysozyme Prediction (Optimal Conditions)

We conducted structure prediction experiments on lysozyme (129 amino acids), a well-characterized enzyme found in egg whites and human tears. Our implementation:

  1. Generated deep multiple sequence alignments (>2500 sequences)
    Figure 8: MSA coverage for the lysozyme input.
    Figure 9: Top five MSA hits for the lysozyme input.
  2. Identified suitable templates (1ior, 1ioq, 1iot, 1kxw)
    Figure 10: Suitable templates for lysozyme.
  3. Ran predictions with up to 5 recycles across multiple AlphaFold models; early stopping triggered after 2 recycles
    Figure 11: Convergence after 2 recycles due to early stopping.
  4. Achieved exceptional prediction quality, with pLDDT scores averaging 98.4 and a pTM score of 0.913
    Figure 12: Lysozyme prediction with confidence coloring.
  5. Visualized predictions using both rainbow coloring (by residue position) and confidence coloring
  6. Compared our predicted structure with the experimental structure (1LYZ)
    Figure 13: Predicted vs. experimental structure of lysozyme.

The high confidence score (blue coloring in pLDDT visualization) indicates a highly reliable prediction across the entire protein structure, demonstrating AlphaFold2's remarkable ability to predict protein structures with near-experimental accuracy.
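
The kind of structural comparison mentioned above can be scripted; the hedged sketch below superimposes the predicted model onto the experimental structure on Cα atoms with Biopython and reports the RMSD. File names are placeholders, and the naive one-to-one residue pairing assumes both files cover the same 129 residues (a real comparison would align sequences first):

```python
from Bio.PDB import PDBParser, Superimposer

parser = PDBParser(QUIET=True)
pred = parser.get_structure("pred", "lysozyme_predicted.pdb")  # placeholder
expt = parser.get_structure("expt", "1lyz.pdb")                # placeholder

def ca_atoms(structure):
    """C-alpha atoms of the first chain of the first model."""
    chain = next(structure[0].get_chains())
    return [res["CA"] for res in chain if "CA" in res]

pred_ca, expt_ca = ca_atoms(pred), ca_atoms(expt)
n = min(len(pred_ca), len(expt_ca))  # naive pairing; align sequences in practice

sup = Superimposer()
sup.set_atoms(expt_ca[:n], pred_ca[:n])  # fixed atoms first, then moving
print(f"C-alpha RMSD after superposition: {sup.rms:.2f} A")
```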

Experiment 2: Exploring the Limitations of AlphaFold2

To probe the boundaries of AlphaFold2's capabilities, we stressed the model with a deliberately challenging prediction scenario that combines two difficulties: a long protein fragment and severely restricted evolutionary information.

Limited Evolutionary Information for a Long Protein

We tested how prediction quality changes for a significantly larger protein fragment (~1000 amino acids) when evolutionary information is restricted:

  • Reduced MSA depth from 512 to just 2 sequences
  • Maintained all other parameters
  • No structural templates available
  • Observed significant degradation in prediction quality (quantified in the sketch after Figure 17):
    • Best pLDDT dropped from 98.4 (lysozyme) to just 43.3
    • Required the full 6 recycles (no early stopping)
    • High predicted aligned error (predominantly red PAE matrix)
    • Unstable structures across recycles, with large RMSD variations
    • Substantially different structures from different models
Figure 14: Predictions from models 1 and 2.
Figure 15: Predictions from models 3 and 4.
Figure 16: Predictions from model 5, which achieved the best pLDDT of 43.3.
Figure 17: Prediction result for the long protein fragment with limited information.
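
Per-residue confidence is easy to pull out of the outputs: AlphaFold2 and ColabFold write pLDDT into the B-factor column of the predicted PDB file. The sketch below (the output file name is a placeholder) computes the mean pLDDT and counts very-low-confidence residues:

```python
import numpy as np
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
# Placeholder name for a ColabFold output model of the long fragment.
model = parser.get_structure("pred", "fragment_rank_001.pdb")[0]

# pLDDT (0-100) is stored per atom in the B-factor field; read it at each CA.
plddt = np.array([res["CA"].bfactor
                  for chain in model
                  for res in chain if "CA" in res])
print(f"mean pLDDT: {plddt.mean():.1f}")
print(f"residues below pLDDT 50: {(plddt < 50).sum()} of {len(plddt)}")
```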

Key Insights from Experiments

These experiments collectively demonstrate that AlphaFold2's prediction quality depends on multiple interdependent factors:

  • Evolutionary information (MSA depth) plays a crucial role in achieving high-confidence predictions
  • Template availability provides important structural guidance
  • Prediction quality degrades substantially under challenging conditions with limited information
  • The combination of protein complexity and limited evolutionary information presents the greatest challenge
  • Under optimal conditions, predictions can achieve near-experimental accuracy

Conclusion

Our hands-on exploration of AlphaFold2 through ColabFold revealed both the impressive capabilities and the important limitations of this revolutionary technology. While we achieved remarkable accuracy on lysozyme (pLDDT of 98.4), our challenging scenario demonstrated clear performance boundaries. When working with a large protein fragment under constrained conditions (minimal MSA depth and no templates), prediction confidence dropped sharply, with pLDDT scores averaging only 30-50 and high predicted aligned errors, as visualized in the predominantly red PAE maps.

These findings highlight a crucial insight about the interdependent factors affecting prediction quality: the absence of sufficient evolutionary information (a shallow MSA) combined with the lack of structural templates dramatically degrades performance, especially for challenging proteins. Our comparison shows that AlphaFold2's success depends on the integration of multiple data sources rather than any single algorithmic innovation. This breakthrough, recognized with the 2024 Nobel Prize in Chemistry, represents a sophisticated synthesis of evolutionary information, structural knowledge, and deep learning rather than a complete departure from traditional approaches.

References

[1] Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). This groundbreaking paper introduced AlphaFold2 and its revolutionary approach to protein structure prediction. It details the neural network architecture that solved a 50-year-old grand challenge in biology with near-experimental accuracy.

[2] Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679-682 (2022). This paper describes an optimized implementation of AlphaFold2 designed to run efficiently on consumer hardware. ColabFold democratizes access to state-of-the-art protein structure prediction by reducing computational requirements while maintaining high accuracy.

[3] Google DeepMind. Demis Hassabis and John Jumper awarded Nobel Prize in Chemistry. DeepMind Blog (2024). Retrieved April 21, 2025. This announcement highlights the unprecedented scientific impact of AlphaFold2, which earned a Nobel Prize just four years after publication. The recognition underscores how AI can solve fundamental scientific problems previously thought to require decades more research.

[4] Ahdritz, G., Bouatta, N., Kadyan, S. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods (2024). This paper presents OpenFold, an open-source reimplementation and retraining of AlphaFold2 that provides deeper insights into how the model works. The study reveals important details about AlphaFold2's learning mechanisms and its ability to generalize to novel protein structures.

Team Members (Authors)

Rishi Mule
Dhanshree Baravkar