Hierarchical strategy for identifying active chemotype classes in compound databases

Chem Biol Drug Des. 2006 Jun;67(6):395-408. doi: 10.1111/j.1747-0285.2006.00397.x.

Abstract

A general methodology is presented for analyzing patterns of activity in compound databases. It is based on structural chemotypes and provides a focused, hierarchical classification of active compounds. Each node in the hierarchical tree corresponds to a specific chemotype and is labeled by a unique code or identifier. All chemotypes at a given level of the hierarchy define equivalence classes, and each chemotype of higher structural resolution is a strict child (i.e., a subset) of a chemotype of lower resolution. Active chemotypes contain a relatively high proportion of actives and are characterized using enrichment plots. These plots show the relationship of occupancy to activity enrichment for a set of chemotypes at a given level of structural resolution. Paths through the hierarchy from chemotypes of lower to those of higher structural resolution (e.g., reduced cyclic system skeletons → cyclic system skeletons → cyclic systems → complete molecules) are unique. Unique paths in the hierarchy that pass only through active chemotypes are called chains or paths of actives. These chains provide links for identifying structurally related active compounds at increasing levels of structural resolution. Analysis of actives can also be carried out at any specific level of structural resolution deemed appropriate by the investigator. Chemotype codes can be used to search compound databases for new molecules possessing these codes or sets of hierarchically related codes. An example based on the NCI AIDS database illustrates the general approach and provides a more detailed description of several interesting classes of active chemotypes and their interrelationships.
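
The ideas of occupancy, activity enrichment, and chains of actives can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy records, the chemotype code strings (e.g. `R1.S1.C1`), the three-level hierarchy, the definition of enrichment as the active fraction within a chemotype divided by the active fraction in the whole set, and the two thresholds used to call a chemotype "active".

```python
from collections import defaultdict

# Hypothetical toy records: (compound id, chemotype codes ordered from
# lowest to highest structural resolution, activity flag).
COMPOUNDS = [
    ("c1", ("R1", "R1.S1", "R1.S1.C1"), True),
    ("c2", ("R1", "R1.S1", "R1.S1.C1"), True),
    ("c3", ("R1", "R1.S1", "R1.S1.C2"), False),
    ("c4", ("R1", "R1.S2", "R1.S2.C3"), False),
    ("c5", ("R2", "R2.S3", "R2.S3.C4"), False),
    ("c6", ("R2", "R2.S3", "R2.S3.C4"), False),
    ("c7", ("R2", "R2.S3", "R2.S3.C5"), True),
    ("c8", ("R2", "R2.S4", "R2.S4.C6"), False),
]


def chemotype_stats(compounds):
    """Return {code: (occupancy, enrichment)} for every chemotype code.

    Occupancy is the number of compounds carrying the code. Enrichment is
    assumed here to be (active fraction in the chemotype) divided by
    (active fraction in the whole data set).
    """
    total = len(compounds)
    total_active = sum(1 for _, _, active in compounds if active)
    if total == 0 or total_active == 0:
        raise ValueError("need at least one compound and one active")
    baseline = total_active / total

    counts = defaultdict(lambda: [0, 0])  # code -> [occupancy, actives]
    for _, codes, active in compounds:
        for code in codes:
            counts[code][0] += 1
            counts[code][1] += int(active)
    return {code: (n, (a / n) / baseline) for code, (n, a) in counts.items()}


def chains_of_actives(compounds, min_occupancy=2, min_enrichment=1.2):
    """Return the root-to-leaf paths that pass only through active chemotypes.

    A chemotype counts as active when it meets both thresholds; the
    threshold values are illustrative, not taken from the paper.
    """
    stats = chemotype_stats(compounds)

    def is_active(code):
        occupancy, enrichment = stats[code]
        return occupancy >= min_occupancy and enrichment >= min_enrichment

    paths = {codes for _, codes, _ in compounds}
    return sorted(path for path in paths if all(is_active(c) for c in path))


if __name__ == "__main__":
    for code, (occupancy, enrichment) in sorted(chemotype_stats(COMPOUNDS).items()):
        print(f"{code:10s} occupancy={occupancy} enrichment={enrichment:.2f}")
    print(chains_of_actives(COMPOUNDS))
```

Plotting the (occupancy, enrichment) pairs for all codes at one level would give a rough analogue of an enrichment plot for that level. In the toy data, only the path through `R1`, `R1.S1`, and `R1.S1.C1` satisfies both thresholds at every level, so it is the single chain reported.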

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chemistry Techniques, Analytical / methods*
  • Computational Biology
  • Databases, Factual*
  • Models, Molecular
  • Sensitivity and Specificity