Adaptive optics (AO) is an established technique for measuring and compensating for optical aberrations. One of its key components is the wavefront sensor (WFS), typically a Shack-Hartmann (SH) sensor, which captures an image related to the aberrated wavefront. We propose an efficient implementation of the SH-WFS centroid extraction algorithm tailored for edge computing. In the edge-computing paradigm, data are processed close to the source (i.e., at the edge) on low-power embedded architectures that combine CPU cores with heterogeneous accelerators (e.g., GPUs, field-programmable gate arrays). Because the control-loop latency must be minimized to keep up with the temporal dynamics of the wavefront aberration, the algorithm is optimized to exploit the unified CPU/GPU memory of recent low-power embedded architectures. Experimental results show that the centroid extraction latency is below 2 ms for spot images of up to 700×700 pixels. Our approach therefore meets the timing requirements of small- to medium-sized AO systems, which are equipped with deformable mirrors having tens of actuators.
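For readers unfamiliar with centroid extraction, the following minimal sketch illustrates the general idea on the CPU. It assumes a plain thresholded center-of-gravity estimator over a regular square grid of subapertures, which is a common textbook choice and not necessarily the method implemented in this work. The function name `extract_centroids` and the parameters `grid` and `threshold` are hypothetical, introduced only for illustration.

```python
def extract_centroids(image, grid, threshold=0.0):
    """Thresholded center-of-gravity centroid for each subaperture.

    image     : list of rows (lists of numbers), the SH spot image
    grid      : number of subapertures per side (assumed square, regular grid)
    threshold : background level subtracted from every pixel (assumption)

    Returns a list of (row, col) centroids in image coordinates, ordered
    row-major over the subaperture grid. A subaperture with no pixel above
    the threshold yields None.
    """
    sub_h = len(image) // grid      # subaperture height in pixels
    sub_w = len(image[0]) // grid   # subaperture width in pixels
    centroids = []
    for gy in range(grid):
        for gx in range(grid):
            total = sum_r = sum_c = 0.0
            for r in range(gy * sub_h, (gy + 1) * sub_h):
                row = image[r]
                for c in range(gx * sub_w, (gx + 1) * sub_w):
                    weight = row[c] - threshold
                    if weight > 0:
                        total += weight
                        sum_r += weight * r
                        sum_c += weight * c
            if total > 0:
                centroids.append((sum_r / total, sum_c / total))
            else:
                centroids.append(None)
    return centroids


# Toy 4x4 image split into a 2x2 grid of subapertures.
spots = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 3],
]
print(extract_centroids(spots, grid=2))
```

In this sketch each subaperture is processed independently of the others, which is the property that makes this kind of computation a natural candidate for parallel execution on a GPU.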