LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images

Hui Li; Tianyang Xu; Xiao-Jun Wu; Jiwen Lu; Josef Kittler

doi:10.1109/TPAMI.2023.3268209

IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):11040-11052. doi: 10.1109/TPAMI.2023.3268209. Epub 2023 Aug 7.

PMID: 37074897

Abstract

Deep learning-based fusion methods have been achieving promising performance in image fusion tasks. This is attributed to the network architecture, which plays a very important role in the fusion process. However, in general, it is hard to specify a good fusion architecture, and consequently, the design of fusion networks is still a black art rather than a science. To address this problem, we formulate the fusion task mathematically and establish a connection between its optimal solution and the network architecture that can implement it. This approach leads to a novel method, proposed in the paper, for constructing a lightweight fusion network. It avoids the time-consuming empirical network design by a trial-and-test strategy. In particular, we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm producing the learnable model. The low-rank representation (LRR) objective is the foundation of our learnable model. The matrix multiplications, which are at the heart of the solution, are transformed into convolutional operations, and the iterative process of optimisation is replaced by a special feed-forward network. Based on this novel network architecture, an end-to-end lightweight fusion network is constructed to fuse infrared and visible light images. Its successful training is facilitated by a detail-to-semantic information loss function proposed to preserve the image details and to enhance the salient features of the source images. Our experiments show that the proposed fusion network exhibits better fusion performance than the state-of-the-art fusion methods on public datasets. Interestingly, our network requires fewer training parameters than other existing methods.
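
The abstract's two architectural ideas, replacing matrix multiplications with convolutions and replacing an iterative optimiser with a fixed-depth feed-forward pass, can be illustrated with a generic toy example. The sketch below is not the LRRNet architecture and does not use the LRR objective; it unrolls a standard ISTA-style sparse-coding iteration on a 1-D signal purely for illustration. All function names, the circular boundary handling, and the fixed values of `kernel`, `step` and `theta` are assumptions made here; in a learned network of this kind, such quantities would typically be trainable per layer.

```python
# Illustrative sketch only: generic algorithm unrolling (ISTA-style),
# NOT the LRRNet model or its LRR objective. All names are hypothetical.

def conv1d_circular(signal, kernel):
    """Circular 1-D convolution: out[i] = sum_j kernel[j] * signal[(i - j) % n]."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]

def corr1d_circular(signal, kernel):
    """Adjoint (transpose) of conv1d_circular, i.e. circular correlation."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
            for i in range(n)]

def circulant_matrix(kernel, n):
    """Dense n x n matrix whose product with a vector equals conv1d_circular.
    Assumes len(kernel) <= n."""
    padded = list(kernel) + [0.0] * (n - len(kernel))
    return [[padded[(i - m) % n] for m in range(n)] for i in range(n)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def soft_threshold(values, theta):
    """Element-wise shrinkage: sign(v) * max(|v| - theta, 0)."""
    return [max(abs(v) - theta, 0.0) * (1.0 if v >= 0 else -1.0) for v in values]

def unrolled_ista(x, kernel, step, theta, num_layers):
    """Run a fixed number of ISTA updates; each loop pass plays the role of one 'layer'.

    Update: z <- soft_threshold(z - step * A^T (A z - x), theta),
    where A is applied as a convolution instead of a dense matrix product.
    """
    z = [0.0] * len(x)
    for _ in range(num_layers):
        residual = [a - b for a, b in zip(conv1d_circular(z, kernel), x)]
        grad = corr1d_circular(residual, kernel)
        z = soft_threshold([zi - step * gi for zi, gi in zip(z, grad)], theta)
    return z

if __name__ == "__main__":
    x = [1.0, 2.0, 0.0, -1.0]
    kernel = [0.5, 0.25]
    # The structured matrix product and the convolution give the same result.
    print(matvec(circulant_matrix(kernel, len(x)), x))
    print(conv1d_circular(x, kernel))
    # Three unrolled 'layers' of the iterative update.
    print(unrolled_ista(x, kernel, step=1.0, theta=0.1, num_layers=3))
```

The design point is that the loop has a fixed length, so the whole computation is a feed-forward pass whose depth equals the number of unrolled iterations, and the dense matrix is never formed because the convolution applies the same structured operator directly.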