A Distributed Computing Platform for fMRI Big Data Analytics

IEEE Trans Big Data. 2019 Jun;5(2):109-119. doi: 10.1109/TBDATA.2018.2811508. Epub 2018 Mar 6.

Abstract

Since the BRAIN Initiative and Human Brain Project began, a few efforts have been made to address the computational challenges of neuroscience Big Data. The promises of these two projects were to model the complex interaction of brain and behavior and to understand and diagnose brain diseases by collecting and analyzing large quanitites of data. Archiving, analyzing, and sharing the growing neuroimaging datasets posed major challenges. New computational methods and technologies have emerged in the domain of Big Data but have not been fully adapted for use in neuroimaging. In this work, we introduce the current challenges of neuroimaging in a big data context. We review our efforts toward creating a data management system to organize the large-scale fMRI datasets, and present our novel algorithms/methods for the distributed fMRI data processing that employs Hadoop and Spark. Finally, we demonstrate the significant performance gains of our algorithms/methods to perform distributed dictionary learning.

Keywords: apache-spark; big data analytics; distributed computing; fMRI; machine learning.