Computational derivation of structural alerts from large toxicology data sets

J Chem Inf Model. 2014 Oct 27;54(10):2945-52. doi: 10.1021/ci500314a. Epub 2014 Oct 13.

Abstract

Structural alerts have been one of the backbones of computational toxicology and have applications in many areas including cosmetic, environmental, and pharmaceutical toxicology. The development of structural alerts has always involved a manual analysis of existing data related to a relevant end point followed by the determination of substructures that appear to be related to a specific outcome. The substructures are then analyzed for their utility in posterior validation studies, which at times have stretched over years or even decades. With higher throughput methods now being employed in many areas of toxicology, data sets are growing at an unprecedented rate. This growth has made manual analysis of data sets impractical in many cases. This report outlines a fully automatic method that highlights significant substructures for toxicologically important data sets. The method identifies important substructures by computationally breaking chemical structures into fragments and analyzing those fragments for their contribution to the given activity by the calculation of a p-value and a substructure accuracy. The method is intended to aid the expert in locating and analyzing alerts by automatic retrieval of alerts or by enhancing existing alerts. The method has been applied to a data set of Ames mutagenicity results and compared to the substructures generated by manual curation of this same data set as well as to another computationally based substructure identification method. The results show that this method can retrieve significant substructures quickly, that the substructures are comparable and in some cases superior to those derived from manual curation, that the substructures found cover all previously known substructures, and that they can be used to make reasonably accurate predictions of Ames activity.
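
The fragment-scoring step described above can be illustrated with a minimal sketch. The abstract does not say which statistical test yields the p-value or how substructure accuracy is defined, so the code below rests on assumptions: a one-sided hypergeometric tail test for enrichment among active compounds, and accuracy taken as the fraction of fragment-containing compounds that are active. The function name `fragment_stats` and its parameters are hypothetical, and the fragmentation of chemical structures is not shown.

```python
from math import comb


def fragment_stats(n_total, n_active, n_with_fragment, n_active_with_fragment):
    """Score one fragment against a binary activity label.

    Assumptions (not taken from the paper):
      * p-value   = one-sided hypergeometric tail, P(X >= observed actives)
      * accuracy  = actives containing the fragment / compounds containing it
    """
    if n_with_fragment == 0:
        raise ValueError("fragment does not occur in the data set")

    # Fraction of fragment-containing compounds that are active.
    accuracy = n_active_with_fragment / n_with_fragment

    # Probability of seeing at least this many actives among the
    # fragment-containing compounds if the fragment were unrelated to activity.
    upper = min(n_active, n_with_fragment)
    denom = comb(n_total, n_with_fragment)
    tail = sum(
        comb(n_active, k) * comb(n_total - n_active, n_with_fragment - k)
        for k in range(n_active_with_fragment, upper + 1)
    )
    return tail / denom, accuracy


# Toy counts, invented for illustration only: 10 compounds, 5 active,
# 4 contain the fragment, and all 4 of those are active.
p_value, accuracy = fragment_stats(10, 5, 4, 4)
print(f"p = {p_value:.4f}, accuracy = {accuracy:.2f}")
```

In a workflow of this kind, fragments with a small p-value and a high accuracy would be the candidates passed to an expert for review as possible alerts; any thresholds would need to be chosen for the data set at hand.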

MeSH terms

  • Animals
  • Computer Simulation
  • Datasets as Topic
  • Drug Design
  • Humans
  • Models, Chemical*
  • Molecular Conformation
  • Mutagenicity Tests
  • Mutagens / chemistry*
  • Mutagens / toxicity
  • Predictive Value of Tests
  • Small Molecule Libraries / chemistry*
  • Small Molecule Libraries / toxicity
  • Structure-Activity Relationship

Substances

  • Mutagens
  • Small Molecule Libraries