Background: Jointly analyzing multiple phenotypes (traits) may increase power in genetic association studies by aggregating weak genetic effects. However, the chance that at least one phenotype is missing for an individual grows rapidly as the number of phenotypes increases, especially in real datasets. It is common practice to discard individuals with missing phenotypes, or phenotypes with a large proportion of missing values. Such discarding may lead to a loss of power or even to a sample size insufficient for analysis. To our knowledge, many existing phenotype imputation methods rely on multivariate normal assumptions. Violation of these assumptions may lead to inflated type I error rates or even loss of power in some cases. To overcome these limitations, we propose a novel phenotype imputation method based on a new Gaussian copula model with three different loss functions to address the issue of missing phenotypes.
Results: In a variety of simulations and a real genetic association study of lung function, we show that our method outperforms existing methods and can increase the power of the association test relative to comparable phenotype imputation methods. The proposed method is implemented in an R package available at https://github.com/jane-zizhen-zhao/CopulaPhenoImpute1.0
Conclusions: We propose a novel phenotype imputation method with a new Gaussian copula model based on three loss functions. Results of the simulation studies and real data analyses illustrate that the proposed method outperforms comparable methods.
Keywords: Gaussian copula; Genetic studies; Inflated type I error; Loss function; Phenotype imputation.
© 2024. The Author(s).