Task-based functional magnetic resonance imaging (tfMRI) has been widely used to study functional brain networks during task performance. Modeling tfMRI data is challenging for at least two reasons: the lack of ground truth for the underlying neural activity and the highly complex intrinsic structure of the data. To better understand brain networks from fMRI data, data-driven approaches have been proposed, for instance independent component analysis (ICA) and sparse dictionary learning (SDL). However, both ICA and SDL build only shallow models, and both rest on the strong assumption that the original fMRI signal can be linearly decomposed into time series components with corresponding spatial maps. As growing evidence shows that human brain function is hierarchically organized, new approaches that can infer and model the hierarchical structure of brain networks are needed. Recently, deep convolutional neural networks (CNNs) have drawn much attention because they have proven to be a powerful method for learning mid-level and high-level abstractions from low-level raw data. Inspired by the power of deep CNNs, in this paper we develop a new CNN-based neural network structure, called the deep convolutional auto-encoder (DCAE), which combines the advantages of the data-driven approach with the hierarchical feature abstraction ability of CNNs in order to learn mid-level and high-level features from complex, large-scale tfMRI time series in an unsupervised manner. The DCAE has been applied to and tested on the publicly available Human Connectome Project tfMRI data sets, and promising results are achieved.
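
For reference, the linear decomposition assumed by ICA and SDL can be sketched as follows. The symbols are illustrative assumptions rather than notation taken from this paper: X denotes the matrix of fMRI signals with t time points and n voxels, D holds k time series components as its columns, and A holds the corresponding spatial maps as its rows.

```latex
\[
X \approx D A, \qquad
X \in \mathbb{R}^{t \times n},\quad
D \in \mathbb{R}^{t \times k},\quad
A \in \mathbb{R}^{k \times n}
\]
```

Under this form, each voxel's time series is a weighted sum of the k columns of D. ICA typically seeks statistically independent components (in fMRI, usually the spatial maps), whereas SDL seeks a sparse A. Either way the model is a single linear layer, with no intermediate levels of abstraction, which is the sense in which such models are shallow.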