Snowflake: A deep learning-based human leukocyte antigen matching algorithm considering allele-specific surface accessibility

Matthias Niemann; Benedict M Matern; Eric Spierings

doi:10.3389/fimmu.2022.937587

Front Immunol. 2022 Jul 29:13:937587. doi: 10.3389/fimmu.2022.937587. eCollection 2022.

Authors

Matthias Niemann¹, Benedict M Matern², Eric Spierings²,³

Affiliations

¹ Research and Development, PIRCHE AG, Berlin, Germany.
² Center for Translational Immunology, University Medical Center, Utrecht, Netherlands.
³ Central Diagnostic Laboratory, University Medical Center, Utrecht, Netherlands.

Abstract

Histocompatibility in solid-organ transplantation has a strong impact on long-term graft survival. Although recent advances in matching of both B-cell and T-cell epitopes have improved the understanding of allorecognition, the immunogenic determinants are still not fully understood. We hypothesized that HLA solvent accessibility is allele-specific, which would support refinement of HLA B-cell epitope prediction. We developed a computational pipeline named Snowflake to calculate solvent accessibility of HLA Class I proteins from deposited HLA crystal structures, supplemented by HLA structures constructed with the AlphaFold protein folding predictor and peptide binding predictions from the APE-Gen docking framework. This dataset was used to train a four-layer bidirectional long short-term memory recurrent neural network, which in turn inferred solvent accessibility of all known HLA Class I proteins. We extracted 676 HLA Class I experimental structures from the Protein Data Bank and supplemented them with 37 Class I alleles for which structures were predicted. For each predicted structure, 10 known binding peptides as reported by the Immune Epitope DataBase were rendered into the binding groove. Although HLA Class I proteins predominantly fold similarly, we found higher variation in the root mean square difference of solvent accessibility between experimental structures of different HLAs than between structures with identical amino acid sequences, suggesting that the solvent-accessible surface of HLA is protein-specific. Hence, residues may be surface-accessible on, e.g., HLA-A*02:01 but not on HLA-A*01:01. Mapping these data to antibody-verified epitopes as defined by the HLA Epitope Registry reveals patterns of (1) consistently accessible residues, (2) only subsets of an epitope's residues being consistently accessible, and (3) varying surface accessibility of an epitope's residues.
Our data suggest that B-cell epitope definitions can be refined by considering allele-specific solvent accessibility rather than aggregating HLA protein surface maps by HLA class or locus. To support studies on epitope analyses in organ transplantation, the calculation of donor-allele-specific solvent-accessible amino acid mismatches was implemented as a cloud-based web service.
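
The two quantities named in the abstract, the root mean square difference of solvent accessibility between two structures and donor-allele-specific solvent-accessible amino acid mismatches, can be illustrated with a minimal Python sketch. This is not the Snowflake implementation. The function names, the toy sequences, the per-residue accessibility values, and the 0.25 accessibility threshold are all assumptions made for illustration, and the sketch presumes that sequences and accessibility profiles are already aligned position by position.

```python
import math


def rms_accessibility_difference(rsa_a, rsa_b):
    """Root mean square difference between two aligned per-residue
    solvent accessibility profiles (hypothetical helper)."""
    if len(rsa_a) != len(rsa_b):
        raise ValueError("profiles must cover the same aligned positions")
    if not rsa_a:
        raise ValueError("profiles must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(rsa_a, rsa_b)]
    return math.sqrt(sum(squared) / len(squared))


def accessible_mismatches(donor_seq, recipient_seqs, donor_rsa, threshold=0.25):
    """Positions where the donor residue occurs in none of the recipient
    alleles and is accessible on the donor allele itself.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    positions = []
    for i, residue in enumerate(donor_seq):
        shared = any(seq[i] == residue for seq in recipient_seqs)
        if not shared and donor_rsa[i] >= threshold:
            positions.append(i)
    return positions


# Toy data (invented): one donor allele, two recipient alleles.
donor_seq = "GSHSMRY"
recipient_seqs = ["GSHTLRF", "GAHTLRY"]
donor_rsa = [0.1, 0.5, 0.3, 0.0, 0.6, 0.2, 0.4]

rms = rms_accessibility_difference([0.0, 0.5, 1.0], [0.0, 0.1, 0.7])
mismatches = accessible_mismatches(donor_seq, recipient_seqs, donor_rsa)
```

In this toy case, positions 3 and 4 (zero-based) are amino acid mismatches, but only position 4 is counted because position 3 is buried on the donor allele. Using the donor allele's own accessibility profile, rather than one profile aggregated per locus, is the allele-specific refinement the abstract argues for.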

Keywords: 3D-structures; HLA; antibodies; deep-learning; epitope; epitope matching; neural network; structure prediction.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Alleles
Deep Learning*
Epitopes, B-Lymphocyte*
HLA Antigens / genetics
HLA-A Antigens
Humans
Solvents

Substances

Epitopes, B-Lymphocyte
HLA Antigens
HLA-A Antigens
Solvents