HIV nucleotide sequence data can identify clusters of persons with genetically similar strains, suggesting transmission. We simulated the effect of lowered data completeness, defined as the percentage of persons with diagnosed HIV who have a reported sequence, on transmission patterns and on detection of growing HIV transmission clusters. We analyzed HIV surveillance data for persons with HIV diagnosed during 2008-2014 who resided in Michigan or Washington. We calculated genetic distances, constructed the inferred transmission network for each jurisdiction, and compared transmission network characteristics and detection of growing transmission clusters in the full dataset with those in artificially reduced datasets. As simulated completeness fell from the full dataset to 5%, the percentage of persons linked to a cluster decreased (Michigan: 54% to 18%; Washington: 46% to 16%). Patterns of transmission between certain populations remained robust as data completeness was reduced. As data completeness was artificially decreased, sensitivity of cluster detection diminished substantially in both states. In Michigan, sensitivity decreased from 100% with the full dataset to 62% at 50% completeness and 21% at 25% completeness. In Washington, sensitivity decreased from 100% with the full dataset to 71% at 50% completeness and 29% at 25% completeness. Lower sequence data completeness limits the ability to detect clusters that may benefit from investigation; however, inferences can be made about transmission patterns even with low data completeness, given sufficient numbers. Data completeness should be prioritized, because missed or delayed detection of transmission clusters could result in additional infections.
Keywords: HIV; completeness; detection; sensitivity; sequence; transmission cluster.
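
The completeness simulation summarized above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that persons are linked when their pairwise genetic distance is at or below a threshold, that clusters are the connected components of the resulting network, and that a reference cluster counts as detected when at least two of its members still share a cluster in the reduced dataset. The function names, the threshold value, the detection rule, and the toy distances are all hypothetical, and the sketch does not model how a cluster is classified as growing.

```python
import random


def clusters(persons, distances, threshold):
    """Connected components (size >= 2) of the network in which two persons
    are linked when their pairwise distance is <= threshold."""
    persons = set(persons)
    parent = {p: p for p in persons}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), d in distances.items():
        if a in persons and b in persons and d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in persons:
        groups.setdefault(find(p), set()).add(p)
    return [g for g in groups.values() if len(g) >= 2]


def subsample(persons, completeness, rng):
    """Randomly keep a fraction of persons to mimic lower completeness."""
    k = round(len(persons) * completeness)
    return rng.sample(sorted(persons), k)


def percent_linked(persons, distances, threshold):
    """Percentage of the given persons who fall in any cluster."""
    persons = list(persons)
    if not persons:
        return 0.0
    linked = sum(len(c) for c in clusters(persons, distances, threshold))
    return 100.0 * linked / len(persons)


def sensitivity(reference_clusters, reduced_clusters, min_shared=2):
    """Fraction of reference clusters still detected (assumed rule: some
    reduced cluster contains at least min_shared of its members)."""
    if not reference_clusters:
        return 0.0
    detected = sum(
        1 for ref in reference_clusters
        if any(len(ref & c) >= min_shared for c in reduced_clusters)
    )
    return detected / len(reference_clusters)


# Toy data (hypothetical): six persons and a few pairwise distances.
persons = ["A", "B", "C", "D", "E", "F"]
distances = {
    ("A", "B"): 0.004, ("B", "C"): 0.010,
    ("D", "E"): 0.012, ("E", "F"): 0.030, ("A", "D"): 0.050,
}
THRESHOLD = 0.015  # illustrative value only, not taken from the study

full_clusters = clusters(persons, distances, THRESHOLD)
reduced_persons = subsample(persons, 0.5, random.Random(0))
reduced_clusters = clusters(reduced_persons, distances, THRESHOLD)
print(percent_linked(reduced_persons, distances, THRESHOLD))
print(sensitivity(full_clusters, reduced_clusters))
```

In practice the subsampling step would be repeated many times at each completeness level and the results averaged, since a single random draw can over- or understate the loss of sensitivity.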