Rule-based classification models of molecular autofluorescence

J Chem Inf Model. 2015 Feb 23;55(2):434-45. doi: 10.1021/ci5007432. Epub 2015 Feb 9.

Abstract

Fluorescence-based detection has been commonly used in high-throughput screening (HTS) assays. Autofluorescent compounds, which can emit light in the absence of artificial fluorescent markers, often interfere with the detection of fluorophores and result in false positive signals in these assays. This interference presents a major issue in fluorescence-based screening techniques. In an effort to reduce the time and cost that will be spent on prescreening of autofluorescent compounds, in silico autofluorescence prediction models were developed for selected fluorescence-based assays in this study. Five prediction models were developed based on the respective fluorophores used in these HTS assays, which absorb and emit light at specific wavelengths (excitation/emission): Alexa Fluor 350 (A350) (340 nm/450 nm), 7-amino-4-trifluoromethyl-coumarin (AFC) (405 nm/520 nm), Alexa Fluor 488 (A488) (480 nm/540 nm), Rhodamine (547 nm/598 nm), and Texas Red (547 nm/618 nm). The C5.0 rule-based classification algorithm and PubChem 2D chemical structure fingerprints were used to develop prediction models. To optimize the accuracies of these prediction models despite the highly imbalanced ratio of fluorescent versus nonfluorescent compounds presented in the collected data sets, oversampling and undersampling strategies were applied. The average final accuracy achieved for the training set was 97%, and that for the testing set was 92%. In addition, five external data sets were used to further validate the models. Ultimately, 14 representative structural features (or rules) were determined to efficiently predict autofluorescence in data sets containing both fluorescent and nonfluorescent compounds. Several cases were illustrated in this study to demonstrate the applicability of these rules.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computer Simulation
  • Fluorescence
  • Fluorescent Dyes / chemistry
  • Fluorescent Dyes / classification*
  • Fuzzy Logic
  • High-Throughput Screening Assays / methods*
  • Machine Learning
  • Models, Chemical*
  • Predictive Value of Tests
  • Quantitative Structure-Activity Relationship
  • Structure-Activity Relationship

Substances

  • Fluorescent Dyes