The global impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for rarely acting positive selection are best performed via comparison of empirical data with simulated data wherein commonly acting evolutionary factors, including mutation and recombination, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. Although there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intrahost evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them with existing empirical data. Of these, 592 models (∼5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intrahost SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed toward strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Keywords: SARS-CoV-2; evolutionary genomics; population genetics; viral evolution.
© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.