Background: The UK Biobank is a large, UK-based prospective cohort with deep phenotypic and genomic data on roughly half a million individuals. This resource includes data on approximately 78,000 individuals of "non-white British ancestry." Although most epidemiological studies have focused predominantly on populations of European ancestry, the UK Biobank's "non-white British ancestry" samples offer an opportunity to study health and disease in a broader segment of the population. Here, we present an empirical description of the continental ancestry and population structure among the individuals in this UK Biobank subset.
Results: Reference populations from the 1000 Genomes Project for Africa, Europe, East Asia, and South Asia were used to estimate ancestry for each individual. Those with at least 80% ancestry in one of these four continental ancestry groups were taken forward (N = 62,484). Principal component and K-means clustering analyses were used to identify and characterize population structure within each ancestry group. Of the approximately 78,000 individuals in the UK Biobank of "non-white British" ancestry, 50,685, 6,653, 2,782, and 2,364 individuals were assigned to the European, African, South Asian, and East Asian continental ancestry groups, respectively. Each continental ancestry group exhibits prominent population structure that is consistent with self-reported country of birth data and geography.
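The 80% retention rule described above can be illustrated with a minimal sketch. It assumes each individual already has estimated ancestry fractions for the four continental groups; the identifiers, fractions, and the helper name `assign_group` are hypothetical and are not taken from the study's pipeline. The final check only confirms that the four reported group sizes sum to the reported N.

```python
from collections import Counter

# The four continental ancestry groups named in the abstract.
GROUPS = ("African", "European", "East Asian", "South Asian")
THRESHOLD = 0.80  # "at least 80% ancestry in one of these four" groups


def assign_group(fractions, threshold=THRESHOLD):
    """Return the group whose estimated fraction meets the threshold, else None.

    `fractions` maps group name -> estimated ancestry fraction (hypothetical input).
    """
    best = max(GROUPS, key=lambda g: fractions.get(g, 0.0))
    return best if fractions.get(best, 0.0) >= threshold else None


# Hypothetical ancestry estimates for four made-up individuals.
estimates = {
    "id1": {"African": 0.95, "European": 0.03, "East Asian": 0.01, "South Asian": 0.01},
    "id2": {"African": 0.10, "European": 0.85, "East Asian": 0.02, "South Asian": 0.03},
    "id3": {"African": 0.45, "European": 0.50, "East Asian": 0.02, "South Asian": 0.03},
    "id4": {"African": 0.05, "European": 0.10, "East Asian": 0.05, "South Asian": 0.80},
}

assigned = {iid: assign_group(f) for iid, f in estimates.items()}
# Individuals below the threshold in every group (here id3) are not taken forward.
retained = {iid: g for iid, g in assigned.items() if g is not None}
counts = Counter(retained.values())

# Group sizes as reported in the abstract; they should sum to N = 62,484.
reported = {"European": 50685, "African": 6653, "South Asian": 2782, "East Asian": 2364}
reported_total = sum(reported.values())
```

In this sketch, principal component and K-means analyses would then be run separately on the individuals retained in each group; that step is omitted here because the abstract does not specify its parameters.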
Conclusions: The methods outlined here provide an avenue to leverage the UK Biobank's deeply phenotyped data, allowing researchers to maximize its potential in the study of health and disease in individuals of non-white British ancestry.
Keywords: Ancestry; Population structure; UK Biobank.
© 2022. The Author(s).