Computational derivation of structural alerts from large toxicology data sets

Ernst Ahlberg; Lars Carlsson; Scott Boyer

doi:10.1021/ci500314a

J Chem Inf Model. 2014 Oct 27;54(10):2945-52. doi: 10.1021/ci500314a. Epub 2014 Oct 13.

Authors

Ernst Ahlberg¹, Lars Carlsson, Scott Boyer

Affiliation

¹ Drug Safety and Metabolism, AstraZeneca Research & Development, Pepparredsleden 1, 43183 Mölndal, Sweden.

PMID: 25275755

Abstract

Structural alerts have been one of the backbones of computational toxicology and have applications in many areas, including cosmetic, environmental, and pharmaceutical toxicology. The development of structural alerts has always involved a manual analysis of existing data related to a relevant end point, followed by the determination of substructures that appear to be related to a specific outcome. The substructures are then analyzed for their utility in posterior validation studies, which at times have stretched over years or even decades. With higher-throughput methods now being employed in many areas of toxicology, data sets are growing at an unprecedented rate. This growth has made manual analysis of data sets impractical in many cases. This report outlines a fully automatic method that highlights significant substructures for toxicologically important data sets. The method identifies important substructures by computationally breaking chemical structures into fragments and analyzing those fragments for their contribution to the given activity by calculating a p-value and a substructure accuracy. The method is intended to aid the expert in locating and analyzing alerts, either by automatic retrieval of alerts or by enhancing existing alerts. The method has been applied to a data set of Ames mutagenicity results and compared with the substructures generated by manual curation of this same data set, as well as with another computationally based substructure identification method. The results show that this method can retrieve significant substructures quickly, that the substructures are comparable and in some cases superior to those derived from manual curation, that the substructures found cover all previously known substructures, and that they can be used to make reasonably accurate predictions of Ames activity.
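
As a rough illustration of the scoring step the abstract describes, the Python sketch below assigns a p-value and a substructure accuracy to one fragment. The abstract does not say which statistical test the authors use, so the one-sided hypergeometric test here is an assumption. The function names, the toy fragment labels, and the reading of "substructure accuracy" as the fraction of fragment-containing compounds that are active are likewise hypothetical, and fragment generation itself is not shown.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n compounds are drawn from N, of which K are active."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def score_fragment(compounds, fragment):
    """Score one fragment against labelled compounds.

    compounds: list of (set_of_fragment_labels, is_active) pairs.
    Returns (p_value, accuracy), or None if no compound contains the fragment.
    The test and the accuracy definition are assumptions, not the paper's.
    """
    N = len(compounds)
    K = sum(1 for _, active in compounds if active)
    hits = [active for frags, active in compounds if fragment in frags]
    n = len(hits)          # compounds containing the fragment
    if n == 0:
        return None
    k = sum(hits)          # active compounds containing the fragment
    p_value = hypergeom_upper_tail(k, N, K, n)
    accuracy = k / n       # assumed definition of substructure accuracy
    return p_value, accuracy


# Toy data with made-up fragment labels: 8 compounds, 4 of them active.
toy = [
    ({"nitro", "ring"}, True),
    ({"nitro"}, True),
    ({"nitro", "amide"}, True),
    ({"ring"}, True),
    ({"ring", "amide"}, False),
    ({"amide"}, False),
    ({"ring"}, False),
    ({"amide"}, False),
]

p_nitro, acc_nitro = score_fragment(toy, "nitro")
print(f"nitro: p = {p_nitro:.4f}, accuracy = {acc_nitro:.2f}")
```

In a real application the fragments would come from a cheminformatics toolkit, and the p-values would usually need a multiple-testing correction, since many fragments are scored against the same data set.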

MeSH terms

Animals
Computer Simulation
Datasets as Topic
Drug Design
Humans
Models, Chemical*
Molecular Conformation
Mutagenicity Tests
Mutagens / chemistry*
Mutagens / toxicity
Predictive Value of Tests
Small Molecule Libraries / chemistry*
Small Molecule Libraries / toxicity
Structure-Activity Relationship

Substances

Mutagens
Small Molecule Libraries