SOD-YOLOv8-Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes

Boshra Khalili; Andrew W Smyth

doi:10.3390/s24196209

Sensors (Basel). 2024 Sep 25;24(19):6209. doi: 10.3390/s24196209.

Authors

Boshra Khalili¹, Andrew W Smyth¹

Affiliation

¹ Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA.

Abstract

Object detection, a crucial aspect of computer vision, plays a vital role in traffic management, emergency response, autonomous vehicles, and smart cities. Despite significant advances in object detection, detecting small objects in images captured by high-altitude cameras remains challenging because of factors such as object size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose small object detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by efficient generalized feature pyramid networks (GFPNs), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. Additionally, we introduce a fourth detection layer to effectively utilize high-resolution spatial information. The efficient multi-scale attention (EMA) module, embedded in the C2f-EMA module, further enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce powerful-IoU (PIoU) as a replacement for CIoU; it focuses on moderate-quality anchor boxes and adds a penalty based on differences between predicted and ground-truth bounding box corners. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models across various metrics, without substantially increasing the computational cost or latency compared to YOLOv8s. Specifically, relative to YOLOv8s, it increases recall from 40.1% to 43.9%, precision from 51.2% to 53.9%, mAP_0.5 from 40.6% to 45.1%, and mAP_0.5:0.95 from 24% to 26.6%. Furthermore, experiments conducted in dynamic real-world traffic scenes illustrate SOD-YOLOv8's significant enhancements across diverse environmental conditions, highlighting its reliability and effective object detection capabilities in challenging scenarios.
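The corner-based penalty mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the PIoU formula, so the form used here is an assumption: the penalty factor P averages the four edge distances between the predicted and ground-truth boxes, each normalised by the ground-truth width or height, and the loss is (1 - IoU) + (1 - exp(-P^2)). The function names `iou` and `piou_loss` and the (x1, y1, x2, y2) box layout are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def piou_loss(pred, target):
    """Assumed PIoU-style loss: IoU loss plus a corner/edge distance penalty.

    The penalty depends only on the ground-truth box size, so it does not
    shrink when the predicted box is enlarged.
    """
    w_gt = target[2] - target[0]
    h_gt = target[3] - target[1]
    # Mean edge distance, normalised by the ground-truth width or height.
    p = (
        abs(pred[0] - target[0]) / w_gt
        + abs(pred[2] - target[2]) / w_gt
        + abs(pred[1] - target[1]) / h_gt
        + abs(pred[3] - target[3]) / h_gt
    ) / 4.0
    return (1.0 - iou(pred, target)) + (1.0 - math.exp(-p * p))


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    for pred in [gt, (1.0, 1.0, 11.0, 11.0), (5.0, 5.0, 15.0, 15.0)]:
        print(pred, round(piou_loss(pred, gt), 4))
```

Under this assumed form, a perfect prediction gives a loss of zero, and the loss grows toward its upper bound of 2 as the predicted corners move away from the ground-truth corners. The focusing behaviour on moderate-quality anchor boxes described in the abstract is not modelled in this sketch.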

Keywords: YOLOv8; attention mechanism; bounding box regression; feature pyramid network; small object detection.

Grants and funding

EEC-2133516/National Science Foundation