Understanding false positives in reporter gene assays: in silico chemogenomics approaches to prioritize cell-based HTS data

Thomas J Crisman; Christian N Parker; Jeremy L Jenkins; Josef Scheiber; Mathis Thoma; Zhao Bin Kang; Richard Kim; Andreas Bender; James H Nettles; John W Davies; Meir Glick

doi:10.1021/ci6005504

J Chem Inf Model. 2007 Jul-Aug;47(4):1319-27. doi: 10.1021/ci6005504. Epub 2007 Jul 4.

Authors

Thomas J Crisman¹, Christian N Parker, Jeremy L Jenkins, Josef Scheiber, Mathis Thoma, Zhao Bin Kang, Richard Kim, Andreas Bender, James H Nettles, John W Davies, Meir Glick

Affiliation

¹ Lead Discovery Center, Novartis Institutes for Biomedical Research Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.

PMID: 17608469

Abstract

High-throughput screening (HTS) data are often noisy, containing both false positives and false negatives. Careful triaging and prioritization of the primary hit list can therefore save time and money by identifying potential false positives before the expense of follow-up is incurred. Of particular concern are cell-based reporter gene assays (RGAs), where the number of hits may be too high to scrutinize manually for erroneous data. Based on statistical models built from the chemical structures of 650 000 compounds tested in RGAs, we created "frequent hitter" models that make it possible to prioritize potential false positives. Furthermore, we followed up the frequent hitter evaluation with chemical structure-based in silico target predictions to hypothesize a mechanism for the observed "off-target" response. The predicted cellular targets of the frequent hitters were known to be associated with undesirable effects such as cytotoxicity. More specifically, the most frequently predicted targets relate to apoptosis and cell differentiation, and include kinases, topoisomerases, and protein phosphatases. The mechanism-based frequent hitter hypothesis was tested using 160 additional druglike compounds predicted by the model to be nonspecific actives in RGAs. This validation was successful (a 50% hit rate, compared with a normal hit rate as low as 2%), and it demonstrates the power of computational models for understanding complex relations between chemical structure and biological function.
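To make the frequent hitter idea and the reported validation figures concrete, the following is a minimal Python sketch and not the authors' method. The abstract does not specify the statistical model, so the sketch substitutes a much simpler stand-in: it counts how often each compound is active across the assays in which it was tested and flags compounds above a cutoff. The function names, the 0.5 cutoff, and the toy records are all hypothetical. The last function shows the arithmetic behind comparing the 50% validation hit rate with the 2% baseline.

```python
from collections import defaultdict


def hit_frequencies(results):
    """Return the fraction of tested assays in which each compound was active.

    `results` is an iterable of (compound_id, assay_id, is_active) tuples.
    This simple frequency count is an illustrative stand-in, not the
    structure-based statistical model described in the abstract.
    """
    tested = defaultdict(int)
    active = defaultdict(int)
    for compound_id, _assay_id, is_active in results:
        tested[compound_id] += 1
        if is_active:
            active[compound_id] += 1
    return {c: active[c] / tested[c] for c in tested}


def flag_frequent_hitters(frequencies, threshold=0.5):
    """Return compounds whose hit frequency meets a (hypothetical) cutoff."""
    return sorted(c for c, f in frequencies.items() if f >= threshold)


def fold_enrichment(observed_hit_rate, baseline_hit_rate):
    """Ratio of an observed hit rate to a baseline hit rate."""
    return observed_hit_rate / baseline_hit_rate


# Toy, made-up screening records: (compound, assay, active?)
results = [
    ("cpd_A", "rga_1", True), ("cpd_A", "rga_2", True), ("cpd_A", "rga_3", True),
    ("cpd_B", "rga_1", False), ("cpd_B", "rga_2", True), ("cpd_B", "rga_3", False),
    ("cpd_C", "rga_1", False), ("cpd_C", "rga_2", False),
]

freqs = hit_frequencies(results)
print(flag_frequent_hitters(freqs))

# Hit rates quoted in the abstract: 50% for predicted frequent hitters
# versus a normal hit rate as low as 2%.
print(fold_enrichment(0.50, 0.02))
```

With the figures from the abstract, the ratio works out to roughly a 25-fold enrichment of hits among the compounds the model predicted to be nonspecific actives.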

MeSH terms

False Positive Reactions
Genes, Reporter*
Genomics*
Reproducibility of Results