BRAX, Brazilian labeled chest x-ray dataset

Eduardo P Reis; Joselisa P Q de Paiva; Maria C B da Silva; Guilherme A S Ribeiro; Victor F Paiva; Lucas Bulgarelli; Henrique M H Lee; Paulo V Santos; Vanessa M Brito; Lucas T W Amaral; Gabriel L Beraldo; Jorge N Haidar Filho; Gustavo B S Teles; Gilberto Szarf; Tom Pollard; Alistair E W Johnson; Leo A Celi; Edson Amaro Jr

doi:10.1038/s41597-022-01608-8

Sci Data. 2022 Aug 10;9(1):487. doi: 10.1038/s41597-022-01608-8.

Authors

Eduardo P Reis¹ ², Joselisa P Q de Paiva³, Maria C B da Silva³, Guilherme A S Ribeiro³, Victor F Paiva⁴, Lucas Bulgarelli⁵, Henrique M H Lee³, Paulo V Santos³, Vanessa M Brito³, Lucas T W Amaral³, Gabriel L Beraldo³, Jorge N Haidar Filho⁴, Gustavo B S Teles³, Gilberto Szarf³, Tom Pollard⁵, Alistair E W Johnson⁶, Leo A Celi⁵ ⁷ ⁸, Edson Amaro Jr⁴ ³

Affiliations

¹ Hospital Israelita Albert Einstein - Big Data Analytics, São Paulo, Brazil. eduardo.reis@einstein.br.
² Hospital Israelita Albert Einstein - Imaging Department, São Paulo, Brazil. eduardo.reis@einstein.br.
³ Hospital Israelita Albert Einstein - Imaging Department, São Paulo, Brazil.
⁴ Hospital Israelita Albert Einstein - Big Data Analytics, São Paulo, Brazil.
⁵ Massachusetts Institute of Technology - Laboratory for Computational Physiology, Cambridge, USA.
⁶ The Hospital for Sick Children - Peter Gilgan Centre for Research and Learning, Toronto, Canada.
⁷ Beth Israel Deaconess Medical Center - Department of Medicine, Boston, USA.
⁸ Harvard T.H. Chan School of Public Health - Department of Biostatistics, Boston, USA.

Abstract

Chest radiographs allow for the meticulous examination of a patient's chest but demand specialized training for proper interpretation. Automated analysis of medical imaging has become increasingly accessible with the advent of machine learning (ML) algorithms. Large labeled datasets are key elements for training and validation of these ML solutions. In this paper we describe the Brazilian labeled chest x-ray dataset, BRAX: an automatically labeled dataset designed to assist researchers in the validation of ML models. The dataset contains 24,959 chest radiography studies from patients presenting to a large general Brazilian hospital. A total of 40,967 images are available in the BRAX dataset. All images have been verified by trained radiologists and de-identified to protect patient privacy. Fourteen labels were derived from free-text radiology reports written in Brazilian Portuguese using Natural Language Processing.

Publication types

Dataset

MeSH terms

Algorithms*
Brazil
Humans
Natural Language Processing*
Radiography, Thoracic*
X-Rays
