Comprehensive encoding of conformational and compositional protein structural ensembles through the mmCIF data structure

IUCrJ. 2024 Jul 1;11(Pt 4):494-501. doi: 10.1107/S2052252524005098.

Abstract

In the folded state, biomolecules exchange between multiple conformational states crucial for their function. However, most structural models derived from experiments and computational predictions encode only a single state. To represent biomolecules accurately, we must move towards modeling and predicting structural ensembles. Information about structural ensembles exists within experimental data from X-ray crystallography and cryo-electron microscopy. Although new tools are available to detect conformational and compositional heterogeneity within these ensembles, the legacy PDB data structure does not robustly encapsulate this complexity. We propose modifications to the macromolecular crystallographic information file (mmCIF) to improve the representation and interrelation of conformational and compositional heterogeneity. These modifications will enable the capture of macromolecular ensembles in a human- and machine-interpretable way, potentially catalyzing breakthroughs for ensemble-function predictions, analogous to the achievements of AlphaFold with single-structure prediction.

Keywords: biomolecules; cryoEM; ensemble–function predictions; macromolecular ensembles; mmCIF.
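For context, the existing mmCIF convention for crystallographic models records conformational heterogeneity atom by atom, through the `_atom_site.label_alt_id` and `_atom_site.occupancy` columns. The sketch below reads a small, hypothetical, hand-written `_atom_site` loop (one serine side chain in two alternate conformations) and groups occupancies by residue and alternate-location label. It is illustrative only. It is not the modification proposed in this work, and it assumes a single whitespace-delimited loop with no quoted values rather than using a full mmCIF parser.

```python
from collections import defaultdict

# Hypothetical, hand-written _atom_site fragment (not from a real entry):
# serine 10 of chain A has its side chain modeled in conformers A and B.
MMCIF_FRAGMENT = """
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.occupancy
ATOM 1 CA . SER A 10 1.00
ATOM 2 CB A SER A 10 0.60
ATOM 3 OG A SER A 10 0.60
ATOM 4 CB B SER A 10 0.40
ATOM 5 OG B SER A 10 0.40
"""


def parse_atom_site(text):
    """Parse one whitespace-delimited _atom_site loop into a list of dicts.

    Simplifying assumption: no quoted values, no other categories present.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines[0] != "loop_":
        raise ValueError("expected a loop_ block")
    columns, rows = [], []
    for ln in lines[1:]:
        if ln.startswith("_atom_site."):
            columns.append(ln.split(".", 1)[1])  # keep the item name only
        else:
            rows.append(dict(zip(columns, ln.split())))
    return rows


def alt_conformer_occupancy(rows):
    """Map (asym_id, seq_id) -> {alt_id: occupancy} for atoms with an altloc."""
    result = defaultdict(dict)
    for row in rows:
        alt = row["label_alt_id"]
        if alt in (".", "?"):
            continue  # "." means the atom is shared by every conformer
        key = (row["label_asym_id"], int(row["label_seq_id"]))
        result[key][alt] = float(row["occupancy"])
    return dict(result)


if __name__ == "__main__":
    print(alt_conformer_occupancy(parse_atom_site(MMCIF_FRAGMENT)))
```

The result groups the two conformers of residue A10 with occupancies 0.6 and 0.4. Nothing in these columns states whether conformer "A" at one residue is coupled to conformer "A" at another residue, or how a compositional change (for example, a partially occupied ligand) relates to a given conformer. This is the kind of interrelation that a single per-atom label leaves implicit.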
