At least 40 human diseases are associated with repeat expansions; yet the mutational origin and instability mechanisms remain unknown for most of them. Previously, the genetic epidemiology of some expanding loci, and the backgrounds predisposing them to instability, have been studied in different populations through analysis of the diversity flanking the respective pathogenic repeats. Here, we aimed to develop a pipeline to assess disease-associated haplotypes at oligonucleotide repeat loci, combining the analysis of single nucleotide polymorphisms (SNPs) and short tandem repeats (STRs). Machado-Joseph disease (MJD/SCA3), the most frequent dominant ataxia worldwide, was used as an example to detail the procedure. To identify genetic backgrounds that segregate with expanded/mutated alleles in MJD, we selected a set of 26 SNPs and 7 STRs flanking the causative CAG repeat. Key criteria and steps for this selection are described; they included (1) for SNPs, haplotype blocks that minimize the occurrence of recombination; and (2) for STRs, match scores of repetitive sequences found with Tandem Repeats Finder, used to increase the potential polymorphic information content. To directly assess SNP haplotypes in phase with MJD expansions, we optimized a strategy with preferential amplification of normal over expanded alleles, in addition to SNP allele-specific amplifications; this allowed the identification of disease-associated SNP haplotypes even when only the proband is available in a given family. To infer STR haplotypes, we optimized a multiplex PCR including the 7 STRs plus the MJD_CAG repeat, followed by analysis of segregation or the use of the PHASE software. This protocol is a ready-to-use tool to assess MJD haplotypes in different populations. The pipeline designed here can also be used to assess disease-associated haplotypes in other repeat-expansion diseases.
In these neurological diseases, this should be of great utility to (1) study genetic epidemiology (population of origin, age, and spreading routes of mutations); (2) investigate the mechanisms responsible for de novo expansions; (3) detect predisposing haplotypes; (4) identify phenotype modifiers; (5) help solve cases of apparent homoallelism (two normal alleles of the same size) in diagnosis; and (6) identify the best targets for the development of allele-specific therapies in ethnically diverse patient populations.
Keywords: CAG expansion; Machado-Joseph disease; SCA3; SNP; STR; haplotype; mutation origin; repeat instability.
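
Illustrative note: the abstract names polymorphic information content (PIC) as the property that STR selection aims to increase. As a minimal sketch, and not the authors' implementation, the standard PIC formula (one minus the sum of squared allele frequencies, minus the sum over allele pairs of twice the product of their squared frequencies) could be computed as follows. The function names and the example allele sizes are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return {allele: frequency} from a list of observed alleles.

    `alleles` is a hypothetical input: one entry per chromosome,
    e.g. STR repeat counts observed in a population sample.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs):
    """Polymorphic information content for one marker.

    PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    where p_i are the allele frequencies (they should sum to 1).
    """
    p = list(freqs)
    sum_squares = sum(x * x for x in p)
    pair_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - sum_squares - pair_term


if __name__ == "__main__":
    # Hypothetical sample of STR allele sizes (repeat counts).
    sample = [12, 12, 14, 15]
    freqs = allele_frequencies(sample)
    print(freqs, pic(freqs.values()))
```

Under this definition a monomorphic marker has a PIC of 0, and the value rises as more alleles occur at more even frequencies, which is why markers with higher expected PIC are more informative for haplotype analysis.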