Members of the SOX family of transcription factors are found throughout the animal kingdom, are characterized by the presence of a DNA-binding HMG domain, and are involved in a diverse range of developmental processes. Previous attempts to group SOX genes and deduce their structural, functional, and evolutionary relationships have relied largely on complete or partial HMG box sequences from a limited number of genes. In this study, we have used complete HMG domain sequence, full-length protein structure, and gene organization data to study the pattern of evolution within the family. For the first time, a substantial number of invertebrate SOX sequences have been included in the analysis. We find support for subdivision of the family into groups A-H, as has been suggested in some previous studies, and for the assignment of two new groups, I and J. For vertebrate genes, relatedness as suggested by HMG domain sequence appears to be congruent with relatedness as indicated by the overall structure of the full-length protein and the intron-exon structure of the genes. Most of the SOX groups identified in vertebrates were represented by a single SOX sequence in each invertebrate species studied. We have named anonymous sequences and, where appropriate, have suggested systematic names for some previously identified sequences. In addition, we identify an HMG domain signature motif that may be considered representative of the SOX family. Based on our data, we propose a robust phylogeny of SOX genes that reflects their evolutionary history in metazoans.
Copyright 2000 Academic Press.