Evaluation of computational programs to predict HLA genotypes from genomic sequencing data

Brief Bioinform. 2018 Mar 1;19(2):179-187. doi: 10.1093/bib/bbw097.

Abstract

Motivation: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging and laboratory tests are also time-consuming and labour intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative.

Results: In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of > 1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data). The highest accuracy at clinically relevant resolution (four digits) we observe is 81% on RNAseq data by PHLAT and 99% accuracy by OptiType when limited to Class I genes only. We also observed variability between the tools for resource consumption, with runtime ranging from an average of 5 h (HLAminer) to 7 min (seq2HLA) and memory from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer) per sample. While a minimal coverage is required, other factors also determine prediction accuracy and the results between tools do not correlate well. Therefore, by combining tools, there is the potential to develop a highly accurate ensemble method that is able to deliver fast, economical HLA typing from existing sequencing data.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Exome
  • Genotype
  • HLA Antigens / genetics*
  • Histocompatibility Testing / methods*
  • Humans
  • Sequence Analysis, DNA / methods*

Substances

  • HLA Antigens