Re-identification from histopathology images

Jonathan Ganz; Jonas Ammeling; Samir Jabari; Katharina Breininger; Marc Aubreville

doi:10.1016/j.media.2024.103335

Med Image Anal. 2025 Jan:99:103335. doi: 10.1016/j.media.2024.103335. Epub 2024 Sep 19.

Authors

Jonathan Ganz¹, Jonas Ammeling¹, Samir Jabari², Katharina Breininger³, Marc Aubreville⁴

Affiliations

¹ Technische Hochschule Ingolstadt, Esplanade 10, 85049, Ingolstadt, Germany.
² Klinikum Nuremberg, Institute of Pathology, Paracelsus Medical University, Prof. Ernst-Nathan-Straße 1, 90419, Nuremberg, Germany; Institute of Pathology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstraße 8-10, 91054, Erlangen, Germany.
³ Center for AI and Data Science, Julius-Maximilians-Universität Würzburg, John-Skilton-Straße 4a, 97074, Würzburg, Germany; Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Werner-von-Siemens-Straße 61, 91052, Erlangen, Germany.
⁴ Technische Hochschule Ingolstadt, Esplanade 10, 85049, Ingolstadt, Germany; Flensburg Artificial Intelligence Research (FLAIR) and Department Information and Communication, Flensburg University of Applied Sciences, Kanzleistraße 91-93, 24943, Flensburg, Germany. Electronic address: marc.aubreville@hs-flensburg.de.

PMID: 39316996

Abstract

In numerous studies, deep learning algorithms have proven their potential for the analysis of histopathology images, for example, for revealing the subtypes of tumors or the primary origin of metastases. These models require large datasets for training, which must be anonymized to prevent possible patient identity leaks. This study demonstrates that even relatively simple deep learning algorithms can re-identify patients in large histopathology datasets with substantial accuracy. In addition, we compared a comprehensive set of state-of-the-art whole slide image classifiers and feature extractors for the given task. We evaluated our algorithms on two TCIA datasets including lung squamous cell carcinoma (LSCC) and lung adenocarcinoma (LUAD). We also demonstrate the algorithm's performance on an in-house dataset of meningioma tissue. We predicted the source patient of a slide with F₁ scores of up to 80.1% and 77.19% on the LSCC and LUAD datasets, respectively, and with 77.09% on our meningioma dataset. Based on our findings, we formulated a risk assessment scheme to estimate the risk to the patient's privacy prior to publication.
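
To make the reported metric concrete, the following minimal sketch shows how an F₁ score could be computed when each slide is assigned a predicted source patient. It is an illustration only, not the authors' evaluation code: the abstract does not state the averaging scheme, so macro averaging over patients is an assumption here, and the function name `macro_f1` and the patient labels are hypothetical.

```python
from collections import Counter


def macro_f1(true_ids, predicted_ids):
    """Macro-averaged F1 over patient labels.

    Assumption: one predicted patient ID per slide, and macro averaging
    over the patients that occur in true_ids (not stated in the abstract).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_ids, predicted_ids):
        if truth == pred:
            tp[truth] += 1   # slide attributed to the correct patient
        else:
            fp[pred] += 1    # slide wrongly attributed to patient `pred`
            fn[truth] += 1   # slide of patient `truth` was missed

    scores = []
    for patient in sorted(set(true_ids)):
        denom = 2 * tp[patient] + fp[patient] + fn[patient]
        scores.append(2 * tp[patient] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: six slides from three patients.
true_ids = ["A", "A", "B", "B", "C", "C"]
predicted_ids = ["A", "B", "B", "B", "C", "A"]
score = macro_f1(true_ids, predicted_ids)
```

In this toy case the per-patient F₁ values are 0.5, 0.8, and 2/3, and the macro score is their mean. With many patients and few slides per patient, a micro-averaged or weighted variant could give noticeably different numbers, so the choice of averaging matters when comparing against published figures.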

Keywords: Deep learning; Digital pathology; Re-identification.

MeSH terms

Adenocarcinoma of Lung / diagnostic imaging
Adenocarcinoma of Lung / pathology
Algorithms*
Carcinoma, Squamous Cell / diagnostic imaging
Carcinoma, Squamous Cell / pathology
Data Anonymization
Deep Learning
Humans
Image Interpretation, Computer-Assisted / methods
Lung Neoplasms* / diagnostic imaging
Lung Neoplasms* / pathology
Meningeal Neoplasms / diagnostic imaging
Meningeal Neoplasms / pathology
Meningioma* / diagnostic imaging
Meningioma* / pathology