Gradient-based sparse principal component analysis with extensions to online learning

Yixuan Qiu; Jing Lei; Kathryn Roeder

doi:10.1093/biomet/asac041

Biometrika. 2022 Jul 12;110(2):339-360. doi: 10.1093/biomet/asac041. eCollection 2023 Jun.

Authors

Yixuan Qiu¹, Jing Lei², Kathryn Roeder²

Affiliations

¹ School of Statistics and Management, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China.
² Department of Statistics and Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, U.S.A.

Abstract

Sparse principal component analysis is an important technique for simultaneous dimensionality reduction and variable selection with high-dimensional data. In this work we combine the unique geometric structure of the sparse principal component analysis problem with recent advances in convex optimization to develop novel gradient-based sparse principal component analysis algorithms. These algorithms enjoy the same global convergence guarantee as the original alternating direction method of multipliers, and can be more efficiently implemented with the rich toolbox developed for gradient methods from the deep learning literature. Most notably, these gradient-based algorithms can be combined with stochastic gradient descent methods to produce efficient online sparse principal component analysis algorithms with provable numerical and statistical performance guarantees. The practical performance and usefulness of the new algorithms are demonstrated in various simulation studies. As an application, we show how the scalability and statistical accuracy of our method enable us to find interesting functional gene groups in high-dimensional RNA sequencing data.
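The abstract's central idea is that once sparse principal component analysis is driven by gradient steps, those steps can be replaced by stochastic ones that process one observation at a time. The sketch below shows the general shape of such an online update. It is a hypothetical illustration and not the authors' algorithm. It does not reproduce their convex formulation or the convergence guarantees described above. It uses a simple nonconvex heuristic in place of their method: an Oja-type stochastic gradient step on the leading eigenvector, then soft-thresholding (the proximal operator of an l1 penalty), then renormalization to unit length. The function names `soft_threshold`, `online_sparse_pc` and `spiked_stream` are our own, as are the parameters `step` and `lam` and the synthetic spiked-covariance data. All of them are assumptions made for illustration.

```python
import math
import random


def soft_threshold(values, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in values]


def online_sparse_pc(stream, dim, step=0.005, lam=0.5):
    """Estimate a sparse leading eigenvector from a stream of samples.

    Hypothetical illustration, NOT the authors' algorithm: each sample x
    triggers an Oja-type stochastic gradient ascent step v + step * (x . v) x
    on v' S v / 2, then soft-thresholding with threshold step * lam, then
    renormalization to unit Euclidean norm.
    """
    v = [1.0 / math.sqrt(dim)] * dim  # start from the uniform unit vector
    for x in stream:
        proj = sum(xi * vi for xi, vi in zip(x, v))  # x . v
        ascent = [vi + step * proj * xi for vi, xi in zip(v, x)]
        shrunk = soft_threshold(ascent, step * lam)
        norm = math.sqrt(sum(a * a for a in shrunk))
        if norm > 0.0:  # keep the previous iterate if every entry was zeroed
            v = [a / norm for a in shrunk]
    return v


def spiked_stream(n, u, theta, rng):
    """Yield n synthetic samples from N(0, I + theta * u u'), u a unit vector."""
    scale = math.sqrt(theta)
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        yield [scale * z * ui + rng.gauss(0.0, 1.0) for ui in u]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 10
    u = [1.0 / math.sqrt(3)] * 3 + [0.0] * (dim - 3)  # sparse true direction
    v_hat = online_sparse_pc(spiked_stream(4000, u, 4.0, rng), dim)
    print("|<v_hat, u>| =", round(abs(sum(a * b for a, b in zip(v_hat, u))), 3))
    print("estimate:", [round(a, 2) for a in v_hat])
```

The sketch keeps only the current direction in memory, so the cost per sample is linear in the dimension. The soft-thresholding step pushes coordinates outside the true support toward zero. Because of the final renormalization, however, the update is a heuristic with no convergence guarantee.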

Keywords: Convex optimization; Dimensionality reduction; Gradient descent; Online learning; Sparse principal component analysis.
