Optimized high-throughput screening of non-coding variants identified from genome-wide association studies

Nucleic Acids Res. 2023 Feb 22;51(3):e18. doi: 10.1093/nar/gkac1198.

Abstract

The vast majority of disease-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants affect transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants, we developed snpSTARRseq, a high-throughput experimental method that can assess the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data, we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked against orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies, whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
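
To make the idea of allele-specific enhancer activity concrete, the following is a minimal sketch of how one might compare the two alleles of a SNP in a STARR-seq-style assay. It is not the snpSTARRseq pipeline, whose details the abstract does not describe. It assumes that per-allele read counts are available for both the plasmid input (DNA) and the transcribed output (RNA). The function names, the pseudocount, and the choice of a Fisher exact test are illustrative assumptions.

```python
import math


def allelic_log2_ratio(rna_ref, rna_alt, dna_ref, dna_alt, pseudocount=0.5):
    """Return log2 of (alt RNA/DNA) over (ref RNA/DNA).

    Positive values mean the alternate allele is relatively more active.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ref_activity = (rna_ref + pseudocount) / (dna_ref + pseudocount)
    alt_activity = (rna_alt + pseudocount) / (dna_alt + pseudocount)
    return math.log2(alt_activity / ref_activity)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Here the rows would be RNA and DNA, and the columns ref and alt.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    rna_ref, rna_alt, dna_ref, dna_alt = 40, 90, 100, 105
    print(allelic_log2_ratio(rna_ref, rna_alt, dna_ref, dna_alt))
    print(fisher_exact_two_sided(rna_ref, rna_alt, dna_ref, dna_alt))
```

Normalizing RNA counts by the DNA input for each allele separately corrects for unequal representation of the two alleles in the plasmid library. A real analysis would also need replicate handling and multiple-testing correction across all variants tested.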

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Humans
  • Male
  • Polymorphism, Single Nucleotide
  • Regulatory Sequences, Nucleic Acid*
  • Transcription Factors / genetics

Substances

  • Transcription Factors