AlphaFold2, developed by DeepMind and described in the 2021 Nature paper "Highly accurate protein structure prediction with AlphaFold" by Jumper et al., uses deep learning with attention mechanisms and evolutionary sequence data to predict three-dimensional protein structures in minutes, often with near-experimental accuracy (median GDT_TS above 90 in CASP14). It largely solves the single-chain protein structure prediction problem, commonly called the protein folding problem, which had stood as a grand challenge for over 50 years. Before AlphaFold2, no rapid, cost-effective computational solution existed, and researchers relied on slow, expensive, and sometimes unsuccessful experimental methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy. This breakthrough enables rapid, reliable protein structure prediction, transforming biological research and therapeutic development.
Our team was inspired to delve deeper into this breakthrough to understand how AlphaFold2 works and to explore its potential applications in research and education. After investigating implementation options, we chose ColabFold, an adaptation of AlphaFold2 designed to run efficiently on Google Colab. This tool makes the technology usable without specialized hardware, democratizing access to state-of-the-art protein structure prediction. Our project aims to demonstrate how ColabFold can be used to predict protein structures and to provide hands-on experience with a technology that is transforming structural biology worldwide.
To assess the impact of evolutionary and template data on predictive performance, we will perform two contrasting ColabFold experiments. In the first, we will supply a deep MSA of ~150 homologs plus four high-quality structural templates to establish a high-information baseline. In the second, we will restrict the MSA to the query sequence alone and disable template use to simulate an information-poor scenario, then compare per-residue confidence (pLDDT) and overall pTM scores across multiple refinement cycles.
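The two conditions can be written down as a pair of command lines. The sketch below is a minimal illustration, assuming the local `colabfold_batch` command-line tool and its `--msa-mode`, `--templates`, and `--num-recycle` options as we understand them (check `colabfold_batch --help` for the installed version). The file and directory names are hypothetical. The MMseqs2 mode returns whatever alignment depth the search finds, so fixing the MSA at a specific number of homologs would require supplying a custom alignment instead.

```python
import shlex


def colabfold_command(fasta, out_dir, msa_mode, use_templates, num_recycle=3):
    """Build a colabfold_batch argument list for one experimental condition."""
    cmd = ["colabfold_batch", "--msa-mode", msa_mode,
           "--num-recycle", str(num_recycle)]
    if use_templates:
        cmd.append("--templates")
    cmd += [fasta, out_dir]
    return cmd


# Hypothetical input/output names, for illustration only.
rich = colabfold_command("query.fasta", "out_rich", "mmseqs2_uniref_env", True)
poor = colabfold_command("query.fasta", "out_poor", "single_sequence", False)

for name, cmd in (("rich", rich), ("poor", poor)):
    # Print the command instead of running it, so the sketch has no side effects.
    print(name, shlex.join(cmd))
```

Keeping both conditions in one helper makes it explicit that only the MSA mode and template flag differ between runs; everything else is held constant.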
How does the availability of evolutionary alignments and structural templates affect ColabFold's protein structure predictions?
The AlphaFold2 architecture represents a revolutionary approach to protein structure prediction, combining evolutionary information with deep learning to achieve unprecedented accuracy. Unlike previous methods that relied on fragment assembly and physics-based simulations, AlphaFold2 uses an end-to-end neural network approach to directly predict 3D coordinates.
The first critical component of AlphaFold2 is the processing of evolutionary information through multiple sequence alignments:
While not always necessary for accurate predictions, AlphaFold2 can use information from known protein structures:
The core of AlphaFold2's architecture is the Evoformer, a novel transformer-based neural network:
The final stage of AlphaFold2 is the Structure Module, which converts refined representations into 3D coordinates:
AlphaFold2 employs two additional techniques that significantly improve its performance:
This sophisticated architecture enables AlphaFold2 to achieve remarkable accuracy, with many predictions reaching experimental-level quality (GDT scores > 90). The integration of evolutionary information with geometric reasoning through deep learning represents a fundamental breakthrough in protein structure prediction.
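To make the GDT figure concrete, the following is a simplified sketch of the GDT_TS score: the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the fraction of Cα atoms in the prediction that lie within the cutoff of their reference positions. It assumes the two structures are already superimposed; the full metric searches for the best superposition at each cutoff, so this version can only underestimate it. The coordinates are made up for illustration.

```python
import math


def gdt_ts(pred_ca, ref_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two already-superimposed CA traces."""
    if len(pred_ca) != len(ref_ca) or not ref_ca:
        raise ValueError("coordinate lists must be non-empty and equal length")
    # Per-residue CA deviation in angstroms.
    dists = [math.dist(p, r) for p, r in zip(pred_ca, ref_ca)]
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy coordinates (not real data): deviations of 0.5, 1.5, 3.0 and 9.0 angstroms.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
pred = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
print(gdt_ts(pred, ref))  # 56.25
```

A score above 90 therefore means that nearly all residues sit within 1-2 Å of the reference, which is why it is treated as comparable to experimental quality.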
Our approach utilizes ColabFold, an open-source adaptation of AlphaFold2 designed for accessibility and efficiency. Developed by Mirdita et al., ColabFold significantly reduces the computational barriers to protein structure prediction while maintaining AlphaFold2's accuracy. Key advantages of ColabFold include:
ColabFold preserves the core AlphaFold2 architecture while making it accessible to researchers without specialized computing infrastructure. Our implementation follows the standard ColabFold pipeline, which employs a multi-stage neural network architecture consisting of:
For our experiments, we configured ColabFold with the following parameters:
Our experiments covered two distinct scenarios. First, we predicted the structure of lysozyme (129 amino acids), obtaining a very high mean confidence score (pLDDT of about 98 on the 0-100 scale, or 0.98 when normalized) and a model that compared favorably with the experimentally determined structure (PDB ID: 1LYZ). The entire prediction took approximately 1-2 minutes on a T4 GPU, with early stopping triggering after just 2 recycles because the structure had converged. Second, we conducted limit-testing experiments on a larger fragment (~1000 amino acids) with minimal MSA and no templates, which produced dramatically lower-confidence predictions (pLDDT averaging 30-50 on the same 0-100 scale). This demonstrates how strongly AlphaFold2 depends on sufficient evolutionary and template information, especially for larger proteins.
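The idea behind recycle early stopping can be illustrated with a small sketch: after each recycle, measure how much the pairwise Cα distances changed relative to the previous recycle, and stop once the change falls below a tolerance. This is a simplified illustration, not ColabFold's exact criterion; the function names, the tolerance of 0.5 Å, and the toy coordinates are all assumptions.

```python
import math


def ca_distance_change(prev, curr):
    """Mean absolute change in pairwise CA distances between two recycles."""
    total, pairs = 0.0, 0
    for i in range(len(curr)):
        for j in range(i + 1, len(curr)):
            total += abs(math.dist(curr[i], curr[j]) - math.dist(prev[i], prev[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


def recycles_until_converged(structures, tolerance=0.5):
    """Index of the first recycle whose change from the previous one is below tolerance.

    Falls back to the last available recycle if none converges.
    """
    for k in range(1, len(structures)):
        if ca_distance_change(structures[k - 1], structures[k]) < tolerance:
            return k
    return len(structures) - 1


# Toy three-residue traces (not real data): initial pass, recycle 1, recycle 2.
s0 = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
s1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]
print(recycles_until_converged([s0, s1, s2]))  # 2
```

Comparing internal distances rather than raw coordinates makes the check independent of how each recycle's structure happens to be oriented in space.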
Our implementation and demonstration can be found in our GitHub repository: AlphaFold2-Demo. The repository includes our Jupyter notebook with step-by-step execution of the ColabFold pipeline, visualization code, and explanations of each component. We've also incorporated auxiliary scripts for structural comparison between predicted models and experimentally determined structures.
We conducted structure prediction experiments on lysozyme (129 amino acids), a well-characterized enzyme found in egg whites and human tears. Our implementation:
The high confidence score (blue coloring in pLDDT visualization) indicates a highly reliable prediction across the entire protein structure, demonstrating AlphaFold2's remarkable ability to predict protein structures with near-experimental accuracy.
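Per-residue confidence can be read straight from the predicted model, because AlphaFold2 and ColabFold write pLDDT (0-100) into the B-factor column of the output PDB file. The sketch below parses that column for Cα atoms and maps each value to the confidence bands used for the usual blue-to-orange coloring (above 90, 70-90, 50-70, below 50). The helper names and the three sample records are illustrative, not taken from our lysozyme output.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z, bfactor):
    """Format one fixed-column PDB ATOM record (occupancy fixed at 1.00)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, chain, resseq, x, y, z, 1.0, bfactor)


def ca_plddt(pdb_text):
    """Return the pLDDT of every CA atom, read from B-factor columns 61-66."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(plddt):
    """Map a pLDDT value (0-100) to its conventional confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up records for illustration only.
sample = "\n".join([
    atom_line(1, " CA ", "LYS", "A", 1, 1.0, 2.0, 3.0, 98.4),
    atom_line(2, " CA ", "VAL", "A", 2, 4.0, 2.5, 3.1, 85.0),
    atom_line(3, " CA ", "PHE", "A", 3, 7.5, 3.0, 2.9, 42.0),
])
scores = ca_plddt(sample)
print(sum(scores) / len(scores), [confidence_band(s) for s in scores])
```

Averaging these values over all residues gives the single mean pLDDT figure quoted for a model.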
To understand the boundaries of AlphaFold2's capabilities, we ran two challenging prediction scenarios:
We tested how prediction quality changes for a significantly larger protein fragment (~1000 amino acids) under challenging conditions, when evolutionary information is restricted:
These experiments collectively demonstrate that AlphaFold2's prediction quality depends on multiple interdependent factors:
Our hands-on exploration of AlphaFold2 through ColabFold has revealed both the impressive capabilities and the important limitations of this technology. While the lysozyme prediction reached very high confidence (mean pLDDT of 98.4 on the 0-100 scale, or 0.984 when normalized), our experiments with challenging scenarios showed clear performance boundaries. When working with a protein fragment under constrained conditions (minimal MSA depth and no templates), prediction confidence dropped sharply: pLDDT averaged only around 30-50, and the predicted aligned error (PAE) was high, as visualized in the red confidence maps.
These findings highlight a crucial insight about the interdependent factors affecting prediction quality: insufficient evolutionary information (a shallow MSA) combined with a lack of structural templates dramatically degrades performance, especially for challenging proteins. Our experimental comparison shows that AlphaFold2's success depends on the integration of multiple data sources rather than on any single algorithmic innovation. This Nobel Prize-winning breakthrough [3] represents a sophisticated synthesis of evolutionary information, structural knowledge, and deep learning rather than a complete departure from traditional approaches.
[1] Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). This groundbreaking paper introduced AlphaFold2 and its revolutionary approach to protein structure prediction. It details the neural network architecture that solved a 50-year-old grand challenge in biology with near-experimental accuracy.
[2] Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679-682 (2022). This paper describes an optimized implementation of AlphaFold2 designed to run efficiently on consumer hardware. ColabFold democratizes access to state-of-the-art protein structure prediction by reducing computational requirements while maintaining high accuracy.
[3] Google DeepMind. Demis Hassabis and John Jumper awarded Nobel Prize in Chemistry. DeepMind Blog (2024). Retrieved April 21, 2025. This announcement highlights the unprecedented scientific impact of AlphaFold2, which earned a Nobel Prize just three years after publication. The recognition underscores how AI can solve fundamental scientific problems previously thought to require decades more research.
[4] Ahdritz, G., Bouatta, N., Kadyan, S. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods (2024). This paper presents OpenFold, an open-source reimplementation and retraining of AlphaFold2 that provides deeper insights into how the model works. The study reveals important details about AlphaFold2's learning mechanisms and its ability to generalize to novel protein structures.