An increasingly common viewpoint is that protein dynamics datasets reside in a nonlinear subspace of low conformational energy. Ideal data analysis tools should therefore account for such nonlinear geometry. The Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich mathematical structure to account for a wide range of geometries that can be modeled after an energy landscape. Second, many standard data analysis tools developed for data in Euclidean space can be generalized to Riemannian manifolds. In the context of protein dynamics, a conceptual challenge comes from the lack of guidelines for constructing a smooth Riemannian structure based on an energy landscape. In addition, computational feasibility in computing geodesics and related mappings poses a major challenge. This work considers these challenges. The first part of the paper develops a local approximation technique for computing geodesics and related mappings on Riemannian manifolds in a computationally feasible manner. The second part constructs a smooth manifold and a Riemannian structure that is based on an energy landscape for protein conformations. The resulting Riemannian geometry is tested on several data analysis tasks relevant for protein dynamics data. In particular, the geodesics with given start- and end-points approximately recover corresponding molecular dynamics trajectories for proteins that undergo relatively ordered transitions with medium-sized deformations. The Riemannian protein geometry also gives physically realistic summary statistics and retrieves the underlying dimension even for large-sized deformations within seconds on a laptop.
Keywords: Riemannian manifold; dimension reduction; interpolation; manifold-valued data; protein dynamics.