CSTAN: A Deepfake Detection Network with CST Attention for Superior Generalization

Sensors (Basel). 2024 Nov 5;24(22):7101. doi: 10.3390/s24227101.

Abstract

As deepfake forgery technology advances, highly realistic fake faces pose serious security risks to sensor-based facial recognition systems. Recent deepfake detection methods mainly rely on deep-learning-based binary classifiers. Although these models achieve high accuracy when trained and tested on the same dataset, they generalize poorly when tested on a different dataset. We propose a deepfake detection model named Channel-Spatial-Triplet Attention Network (CSTAN), which focuses on the difference between real and fake features and thereby improves the generalization of the detection model. To strengthen the model's ability to learn features of forged image regions, we design the Channel-Spatial-Triplet (CST) attention mechanism, which extracts subtle local information by capturing feature channels and spatial correlations at three different scales. Additionally, we propose a novel feature extraction network, OD-ResNet-34, which embeds ODConv into the feature extraction network to improve its dynamic adaptability to data features. With the model trained on the FaceForensics++ (FF++) dataset and tested on the Celeb-DF-v1 and Celeb-DF-v2 datasets, the experimental results show that it generalizes better across datasets than similar models.
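As a rough illustration of the general idea of reweighting a feature map along its channel and spatial dimensions, the following minimal sketch applies a parameter-free channel gate followed by a spatial gate. It is not the CST attention mechanism described above: the function names, the sigmoid gating, the absence of learned weights, and the single scale are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(fmap):
    """Scale each channel by the sigmoid of its global average.

    fmap is a list of C channels, each an H x W list of lists.
    Hypothetical, parameter-free stand-in for a learned channel attention.
    """
    gated = []
    for channel in fmap:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weight = sigmoid(mean)
        gated.append([[value * weight for value in row] for row in channel])
    return gated


def spatial_gate(fmap):
    """Scale each spatial position by the sigmoid of its cross-channel mean.

    Hypothetical, parameter-free stand-in for a learned spatial attention.
    """
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    gate = [
        [sigmoid(sum(fmap[c][i][j] for c in range(channels)) / channels)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * gate[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]


def channel_spatial_attention(fmap):
    """Apply the channel gate first, then the spatial gate."""
    return spatial_gate(channel_gate(fmap))


# Toy feature map with 2 channels of size 2 x 2 (values are made up).
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
attended = channel_spatial_attention(features)
```

In this toy example the output keeps the input shape, and every gate lies strictly between 0 and 1, so activations are attenuated rather than amplified. A trained attention module would instead learn its gating weights from data.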

Keywords: attention mechanism; deepfake detection; detection model; feature extraction.

MeSH terms

  • Algorithms
  • Automated Facial Recognition / methods
  • Deep Learning*
  • Face / anatomy & histology
  • Face / physiology
  • Facial Recognition / physiology
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Neural Networks, Computer*