Patch-based convolutional neural networks for automatic landmark detection of 3D facial images in clinical settings

Bodore Al-Baker; Ashraf Ayoub; Xiangyang Ju; Peter Mossey

doi:10.1093/ejo/cjae056

Eur J Orthod. 2024 Dec 1;46(6):cjae056. doi: 10.1093/ejo/cjae056.

Authors

Bodore Al-Baker¹, Ashraf Ayoub², Xiangyang Ju³, Peter Mossey⁴

Affiliations

¹ Orthodontic Department, Hamad Dental Center, Hamad Medical Corporation, Doha, Qatar.
² Scottish Craniofacial Research Group, Glasgow University Dental Hospital & School, School of Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
³ Medical Devices Unit, Department of Clinical Physics and Bioengineering, National Health Service of Greater Glasgow and Clyde, Glasgow, United Kingdom.
⁴ Dental Hospital and School, University of Dundee, Dundee, United Kingdom.

PMID: 39607679
DOI: 10.1093/ejo/cjae056

Abstract

Background: Landmark annotation of 3D facial images is crucial in clinical orthodontics and orthognathic surgery for accurate diagnosis and treatment planning. While manual landmarking has traditionally been the gold standard, it is labour-intensive and prone to variability.

Objective: This study presents a framework for automated landmark detection in 3D facial images within a clinical context, using convolutional neural networks (CNNs), and assesses its accuracy against ground-truth data.

Material and methods: Initially, an in-house dataset of 408 3D facial images, each annotated with 37 landmarks by an expert, was constructed. Subsequently, a 2.5D patch-based CNN architecture was trained using this dataset to detect the same set of landmarks automatically.
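The abstract does not describe how the patches are built, so the following is only an illustrative sketch of what a "2.5D patch" could look like. It assumes (this is not stated in the source) that the 3D surface is represented as a depth map, a 2D grid whose cells store depth values, and that a square window is cropped around a candidate location. The function name `extract_patch`, the `half_size` parameter, and the zero fill value are hypothetical.

```python
def extract_patch(depth_map, row, col, half_size, fill=0.0):
    """Crop a square (2*half_size + 1) window centred on (row, col).

    depth_map is a list of equal-length rows of depth values. Cells that
    fall outside the map are filled with `fill` (an assumed padding choice).
    """
    n_rows = len(depth_map)
    n_cols = len(depth_map[0])
    patch = []
    for r in range(row - half_size, row + half_size + 1):
        patch_row = []
        for c in range(col - half_size, col + half_size + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            patch_row.append(depth_map[r][c] if inside else fill)
        patch.append(patch_row)
    return patch


# Hypothetical 3 x 4 depth map with values 0.0 .. 11.0.
depth_map = [[float(r * 4 + c) for c in range(4)] for r in range(3)]

# A 3 x 3 patch centred on the top-left corner is padded on two sides.
corner_patch = extract_patch(depth_map, 0, 0, 1)
```

In a patch-based pipeline, windows like this would be passed to the CNN in place of the whole image; the actual patch size, sampling strategy, and padding used in the study are not given in the abstract.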

Results: The developed CNN model demonstrated high accuracy, with an overall mean localization error of 0.83 ± 0.49 mm. The majority of the landmarks had low localization errors, with 95% exhibiting a mean error of less than 1 mm across all axes. Moreover, the method achieved a high success detection rate, with 88% of detections having an error below 1.5 mm and 94% below 2 mm.
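The two kinds of figure reported above can be illustrated with a short sketch. It assumes the localization error is the Euclidean distance in millimetres between a predicted landmark and its ground-truth position, and that the success detection rate is the fraction of detections whose error falls below a threshold. The coordinates are made up for illustration and are not data from the study; the function names are hypothetical.

```python
import math
from statistics import mean


def localization_errors(predicted, ground_truth):
    """Euclidean distance (mm) between each predicted and reference landmark."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def success_detection_rate(errors, threshold_mm):
    """Fraction of detections with an error strictly below threshold_mm."""
    return sum(e < threshold_mm for e in errors) / len(errors)


# Made-up (x, y, z) coordinates in mm for four landmarks.
predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 1.75), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 1.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 0.0)]

errors = localization_errors(predicted, ground_truth)
mean_error = mean(errors)
sdr_1_5 = success_detection_rate(errors, 1.5)
sdr_2_0 = success_detection_rate(errors, 2.0)
```

With real data, `errors` would hold one value per landmark per image, and the summary would be reported as mean ± standard deviation alongside the rates at 1.5 mm and 2 mm.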

Conclusion: The automated method used in this study demonstrated accuracy comparable to that achieved with manual annotations within clinical settings. In addition, the proposed framework for automatic landmark localization exhibited improved accuracy over existing models in the literature. Despite these advancements, it is important to acknowledge the limitations of this research, such as its single-centre design and reliance on a single annotator. Future work should address computational time challenges to achieve further enhancements. This approach has significant potential to improve the efficiency and accuracy of orthodontic and orthognathic procedures.

Keywords: 3D facial images; convolutional neural networks; landmark annotation; mean localization error; orthodontics; orthognathic surgery.

MeSH terms

Anatomic Landmarks* / diagnostic imaging
Face* / anatomy & histology
Face* / diagnostic imaging
Female
Humans
Image Processing, Computer-Assisted / methods
Imaging, Three-Dimensional* / methods
Male
Neural Networks, Computer*