Klebsiella pneumoniae is a growing cause of healthcare-associated infections for which multi-drug resistance is a concern. Its polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible and many isolates are serologically non-typeable. Molecular typing techniques provide useful insights, but existing methods fail to take full advantage of the information in whole genome sequences. We investigated the diversity of the capsule synthesis loci (K-loci) among 2503 K. pneumoniae genomes. We incorporated analyses of full-length K-locus nucleotide sequences and also clustered protein-encoding sequences to identify, annotate and compare K-locus structures. We propose a standardized nomenclature for K-loci and present a curated reference database. A total of 134 distinct K-loci were identified, including 31 novel types. Comparative analyses indicated 508 unique protein-encoding gene clusters that appear to reassort via homologous recombination. Extensive intra- and inter-locus nucleotide diversity was detected among the wzi and wzc genes, indicating that current molecular typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K-loci based on full locus information extracted from whole genome sequences (https://github.com/katholt/Kaptive). This work highlights the extensive diversity of Klebsiella K-loci and the proteins that they encode. The nomenclature, reference database and novel typing method presented here will become essential resources for genomic surveillance and epidemiological investigations of this pathogen.
Keywords: Klebsiella; capsule; K-locus; genomic surveillance.