Comprehensive whole-genome analyses of the UK Biobank reveal significant sex differences in both genotype missingness and allele frequency on the X chromosome

Hum Mol Genet. 2024 Feb 28;33(6):543-551. doi: 10.1093/hmg/ddad201.

Abstract

The UK Biobank is the most widely used dataset for genome-wide association studies (GWAS). GWAS of sex, which essentially test for sex differences in minor allele frequencies (sdMAF), have identified autosomal SNPs with significant sdMAF, including in the UK Biobank, but excluded the X chromosome. Our recent report identified multiple regions on the X chromosome with significant sdMAF, using short-read sequencing data from other datasets. We performed a whole-genome sdMAF analysis of ~410k white British individuals from the UK Biobank, using array-genotyped, imputed or exome sequencing data. We observed marked sdMAF on the X chromosome, particularly at the boundaries between the pseudo-autosomal regions (PAR) and the non-PAR (NPR), as well as throughout the NPR, consistent with our earlier report. A small fraction of autosomal SNPs also showed significant sdMAF. In the centrally imputed data, which relied mostly on low-coverage whole genome sequence, 2.1% of NPR SNPs showed significant sdMAF. The whole exome sequencing data also displayed sdMAF on the X chromosome, including at some NPR SNPs with heterozygous genotype calls in males. Genotyping, sequencing and imputation of X chromosomal SNPs require further attention to ensure data integrity for downstream association analyses.
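
As a rough illustration of what an sdMAF comparison involves, the sketch below contrasts female and male allele frequencies at one SNP with a pooled two-proportion z-test on allele counts. This is not the authors' sdMAF test: it treats alleles as independent draws (which implicitly assumes Hardy-Weinberg equilibrium in females), and the function names and the example counts are hypothetical. For an NPR SNP, females contribute two alleles each and males contribute one.

```python
import math


def female_allele_counts(n_ref_ref, n_ref_alt, n_alt_alt):
    """Alternate-allele count and total allele count for diploid females."""
    alt = n_ref_alt + 2 * n_alt_alt
    total = 2 * (n_ref_ref + n_ref_alt + n_alt_alt)
    return alt, total


def male_npr_allele_counts(n_ref, n_alt):
    """Alternate-allele count and total allele count for hemizygous males
    at a non-pseudo-autosomal (NPR) SNP: one allele per male."""
    return n_alt, n_ref + n_alt


def allele_frequency_z_test(f_alt, f_total, m_alt, m_total):
    """Pooled two-proportion z-test for a female-male allele frequency
    difference. Returns (difference, z statistic, two-sided p-value).
    Simplifying assumption: alleles are independent draws."""
    p_f = f_alt / f_total
    p_m = m_alt / m_total
    p_pool = (f_alt + m_alt) / (f_total + m_total)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / f_total + 1 / m_total))
    if se == 0:
        # Monomorphic in both sexes: no evidence of a difference.
        return p_f - p_m, 0.0, 1.0
    z = (p_f - p_m) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_f - p_m, z, p_value


# Hypothetical genotype counts for one NPR SNP (not real data).
f_alt, f_total = female_allele_counts(n_ref_ref=250, n_ref_alt=200, n_alt_alt=50)
m_alt, m_total = male_npr_allele_counts(n_ref=400, n_alt=100)
diff, z, p = allele_frequency_z_test(f_alt, f_total, m_alt, m_total)
print(f"frequency difference={diff:.3f}, z={z:.2f}, p={p:.2e}")
```

Under this coding a male can only be counted as carrying the reference or the alternate allele at an NPR SNP, so heterozygous male calls like those reported in the exome data would have to be handled (for example, set to missing) before counting.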

Keywords: GWAS; Sex; X chromosome; association; genotyping.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Specimen Banks*
  • Chromosomes, Human, X / genetics
  • Female
  • Gene Frequency / genetics
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Male
  • Sex Characteristics
  • UK Biobank*