We have assembled two sets of HIV-1 V3 sequences with defined epidemiologic relationships associated with experimentally determined coreceptor usage or MT-2 cell tropism. These data sets were used for three purposes. First, they were employed to test existing methods for predicting coreceptor usage and MT-2 cell tropism. Of these methods, the presence of one basic amino acid at position 11 or 25 proved to be most reliable for both phenotypic classifications, although its predictive power for the X4 phenotype was less than 50%. Second, we used the sequence sets to train neural networks to infer coreceptor usage from V3 genotype with better success than the best available motif-based method, and with a predictive power equal to that of the best motif-based method for MT-2 cell tropism. Third, we used the sequence sets to reexamine patterns of variability associated with the different phenotypes, and we showed that the phenotype-associated sequence patterns could be reproduced from large sets of V3 sequences using phenotypes predicted by the trained neural network.
Copyright 2001 Academic Press.
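
As an illustration of the motif-based rule tested above (a basic amino acid at V3 position 11 or 25), the check can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. It assumes the input is already aligned so that string index 10 corresponds to V3 position 11 and index 24 to position 25, and it assumes that "basic" means arginine (R) or lysine (K). The function name, the constant, and the two toy sequences are hypothetical and serve only to exercise the code; they are not data from the study.

```python
# Assumption: "basic" is taken to mean arginine (R) or lysine (K).
BASIC_RESIDUES = frozenset("RK")


def has_basic_at_11_or_25(v3_aligned: str, basic=BASIC_RESIDUES) -> bool:
    """Return True if an aligned V3 sequence has a basic residue at
    position 11 or position 25 (1-based numbering).

    Assumption: the caller has aligned the sequence so that string
    index 10 is V3 position 11 and string index 24 is V3 position 25.
    """
    if len(v3_aligned) < 25:
        raise ValueError("sequence too short to contain position 25")
    seq = v3_aligned.upper()
    return seq[10] in basic or seq[24] in basic


# Toy 35-residue strings for illustration only (not study data).
toy_without_basic = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
# Same string with arginine substituted at position 11.
toy_with_basic = toy_without_basic[:10] + "R" + toy_without_basic[11:]

if __name__ == "__main__":
    for name, seq in [("without", toy_without_basic), ("with", toy_with_basic)]:
        print(name, has_basic_at_11_or_25(seq))
```

A positive result from such a rule is only a prediction of the X4 or MT-2-tropic phenotype; as noted above, its predictive power for the X4 phenotype was less than 50% on the assembled data sets.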