MIMO-Uformer: A Transformer-Based Image Deblurring Network for Vehicle Surveillance Scenarios

Jian Zhang; Baoping Cheng; Tengying Zhang; Yongsheng Zhao; Tao Fu; Zijian Wu; Xiaoming Tao

doi:10.3390/jimaging10110274

J Imaging. 2024 Oct 31;10(11):274. doi: 10.3390/jimaging10110274.

Authors

Jian Zhang¹, Baoping Cheng¹ ², Tengying Zhang¹, Yongsheng Zhao¹, Tao Fu¹, Zijian Wu¹, Xiaoming Tao²

Affiliations

¹ China Mobile (Hangzhou) Information Technology Co., Ltd., Hangzhou 311100, China.
² Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

PMID: 39590738
DOI: 10.3390/jimaging10110274

Abstract

Motion blur is a common problem in surveillance scenarios, where it obstructs the acquisition of valuable information. Thanks to the success of deep learning, a series of CNN-based architectures have been designed for image deblurring and have made great progress. As another type of neural network, transformers have exhibited powerful deep representation learning and impressive performance on high-level vision tasks. Transformer-based networks leverage self-attention to capture long-range dependencies in the data, yet its computational complexity is quadratic in the spatial resolution, which makes transformers infeasible for restoring high-resolution images. In this article, we propose an efficient transformer-based deblurring network, named MIMO-Uformer, for vehicle-surveillance scenarios. The distinctive feature of MIMO-Uformer is that the basic window-based multi-head self-attention (W-MSA) of the Swin transformer is employed to reduce the computational complexity and is then incorporated into a multi-input and multi-output U-shaped network (MIMO-UNet), whose multi-scale image processing benefits performance. However, most deblurring networks are designed for global blur, while local blur is more common in vehicle-surveillance scenarios, since the motion blur is primarily caused by locally moving vehicles. Based on this observation, we further propose an Intersection over Patch (IoP) factor and a supervised morphological loss to improve performance on local blur. Extensive experiments on a public dataset and a self-established dataset are carried out to verify the effectiveness of the method. As a result, the PSNR is improved by at least 0.21 dB on GOPRO and 0.74 dB on the self-established dataset compared to existing benchmarks.
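
The complexity argument in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited per-layer cost estimates for global self-attention and window-based self-attention given in the Swin transformer literature; the abstract itself states no formulas. The function names, the channel count of 96, and the window size of 7 are illustrative assumptions, not values reported for MIMO-Uformer.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global self-attention on an h x w map with c channels.

    Assumed estimate: 4*h*w*c^2 (projections) + 2*(h*w)^2*c (attention),
    which is quadratic in the number of pixels h*w.
    """
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_flops(h: int, w: int, c: int, m: int) -> int:
    """Approximate cost of window-based self-attention with m x m windows.

    Assumed estimate: 4*h*w*c^2 + 2*m^2*h*w*c, which is linear in h*w
    because attention is computed only inside each fixed-size window.
    """
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    channels, window = 96, 7  # illustrative values, not taken from the paper
    for side in (56, 112, 224):
        global_cost = msa_flops(side, side, channels)
        window_cost = wmsa_flops(side, side, channels, window)
        print(
            f"{side}x{side}: global={global_cost:.3e}, "
            f"windowed={window_cost:.3e}, ratio={global_cost / window_cost:.1f}x"
        )
```

Under these assumed estimates, doubling the side length of the feature map multiplies the windowed cost by four, while the attention term of the global cost grows by a factor of sixteen. This is why windowed attention is the more practical choice for high-resolution restoration.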

Keywords: IoP factor; MIMO-UNet; image deblurring; supervised morphological loss; transformer-based network; vehicle-surveillance scenarios.

Grants and funding

U22B2001/the National Natural Science Foundation of China