Selection of a phylogenetically informative region of the norovirus genome for outbreak linkage

Linda Verhoef¹; Kelly P Williams; Annelies Kroneman; Bruno Sobral; Wilfrid van Pelt; Marion Koopmans; FBVE network

doi:10.1007/s11262-011-0673-x

Virus Genes. 2012 Feb;44(1):8-18. doi: 10.1007/s11262-011-0673-x. Epub 2011 Sep 30.

Collaborators

FBVE network:
D Brown, B Adak, J Gray, J Harris, M Iturriza, B Böttiger, K Mølbak, C Johnsen, K-O Hedlund, Y Andersson, M Thorhagen, M Lysén, M Hjertqvist, P Pothier, E Kohli, K Balay, J Kaplon, G Belliot, S Le Guyader, F Ruggeri, I Di Bartolo, E Schreier, K Stark, J Koch, M Höhne, K Vainio, K Nygard, G Kapperud

Affiliation

¹ National Institute for Public Health and the Environment (RIVM), Postbak 22, 3720 BA, Bilthoven, The Netherlands. linda.verhoef@rivm.nl

Abstract

The recognition of a common source norovirus outbreak is supported by finding identical norovirus sequences in patients. Norovirus sequencing has been established in many (national) public health laboratories and academic centers, but these laboratories often sequence different, partial regions of the genome. Agreement on a target region with sufficient diversity to resolve links between outbreaks is therefore crucial. Although harmonization of laboratory methods is one of the keystone activities of networks that aim to identify common source norovirus outbreaks, it has proven difficult to accomplish, particularly in an international context. Here, we aimed to provide a bioinformatic method for identifying the genomic region that is informative for detecting common source norovirus outbreaks. A data set of 502 unique full-length capsid gene sequences from the public domain, combined with epidemiological data including linkage information, was used to build over 3,000 maximum likelihood (ML) trees for sequence regions of different lengths and positions. All ML trees were evaluated for the robustness and specificity with which known linked norovirus outbreaks clustered against the background diversity of strains. Commonly used PCR target regions differed greatly in their robustness for cluster detection. The capsid gene region spanning nucleotides 900-1,400 was identified as the best substitute for the full-length capsid gene. The reliability of this approach depends on the quality of the background data set, and we recommend periodically reassessing this growing data set. The approach may also be applicable to sequence-based data sets of other pathogens.
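The region-scanning idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the study built ML trees for each candidate region, whereas this sketch swaps in a simpler distance-based criterion. A known outbreak cluster counts as "resolved" in a window if every within-cluster uncorrected p-distance is smaller than every distance from a cluster member to a background strain. All function names (`p_distance`, `windows`, `cluster_resolved`, `window_score`, `best_windows`), the window `size` and `step` parameters, and the toy alignment are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical sketch: uncorrected p-distances stand in for the ML trees used in the study.


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    # No comparable sites: treat as maximally distant.
    return diffs / compared if compared else 1.0


def windows(aln_len: int, size: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows of a fixed size across the alignment."""
    for start in range(0, aln_len - size + 1, step):
        yield start, start + size


def cluster_resolved(seqs: Dict[str, str], group: Set[str]) -> bool:
    """True if all within-group distances are smaller than all group-to-background distances."""
    background = [n for n in seqs if n not in group]
    within = max(
        (p_distance(seqs[a], seqs[b]) for a in group for b in group if a < b),
        default=0.0,
    )
    between = min(
        (p_distance(seqs[a], seqs[c]) for a in group for c in background),
        default=float("inf"),
    )
    return within < between


def window_score(seqs: Dict[str, str], groups: List[Set[str]], start: int, end: int) -> float:
    """Fraction of known linked groups that are resolved using only columns start..end-1."""
    sub = {name: s[start:end] for name, s in seqs.items()}
    return sum(cluster_resolved(sub, g) for g in groups) / len(groups)


def best_windows(
    seqs: Dict[str, str], groups: List[Set[str]], size: int, step: int
) -> List[Tuple[Tuple[int, int], float]]:
    """Score every window and return them from most to least informative."""
    aln_len = min(len(s) for s in seqs.values())
    scored = [((s, e), window_score(seqs, groups, s, e)) for s, e in windows(aln_len, size, step)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy alignment (invented): the linked pair only differs from background in the second half.
    toy = {
        "ob1_a": "AAAAAAAACCCCCCCC",
        "ob1_b": "AAAAAAAACCCCCCCG",
        "bg1": "AAAAAAAAGGGGGGGG",
        "bg2": "AAAAAAAATTTTTTTT",
    }
    print(best_windows(toy, [{"ob1_a", "ob1_b"}], size=8, step=8))
    # → [((8, 16), 1.0), ((0, 8), 0.0)]
```

In the toy run, the first window is identical across all strains, so it cannot separate the linked pair from the background, while the second window can. A real analysis would also vary the window size, as the study did when comparing regions of different lengths, and would replace the distance criterion with tree-based support.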

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Caliciviridae Infections / epidemiology
Caliciviridae Infections / virology*
Capsid Proteins / genetics
Computational Biology / methods*
Disease Outbreaks
Genetic Linkage*
Genome, Viral*
Genotype
Humans
Molecular Sequence Data
Netherlands / epidemiology
Norovirus / classification*
Norovirus / genetics*
Norovirus / isolation & purification
Phylogeny*
United States / epidemiology

Substances

Capsid Proteins