SCUBIDOO: A Large yet Screenable and Easily Searchable Database of Computationally Created Chemical Compounds Optimized toward High Likelihood of Synthetic Tractability

J Chem Inf Model. 2015 Sep 28;55(9):1824-35. doi: 10.1021/acs.jcim.5b00203. Epub 2015 Sep 4.

Abstract

De novo drug design is widely assisted by computational approaches that enable the generation of large numbers of new virtual molecules within a short time frame. While the novelty of the computationally generated compounds can easily be assessed, such approaches often neglect the synthetic feasibility of the molecules, thus creating a potential barrier to further investigation. Therefore, we have developed SCUBIDOO, a freely accessible database concept that currently holds 21 million virtual products originating from a small library of building blocks and a collection of robust organic reactions. This large data set was reduced to three representative and computationally tractable samples denoted as S, M, and L, containing 9,994, 99,977, and 999,794 products, respectively. These small sets are useful as starting points for ligand identification and optimization projects. The generated products come with synthesis instructions and alerts of possible side reactions, and we show that they exhibit drug-like properties while still extending into unexplored quadrants of chemical space, thus suggesting novelty. We show multiple examples that demonstrate how SCUBIDOO can facilitate the search around initial hits. This database might be a useful idea generator for early ligand discovery projects since it allows a focus on those molecules that are likely to be synthetically feasible and can therefore be studied further. Together with its modular building block construction principle, this database is also suitable for structure-activity relationship studies or fragment-growing strategies.
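The construction principle described in the abstract (pairing building blocks through a set of reactions, attaching side-reaction alerts, and drawing smaller samples) can be illustrated with a toy sketch. This is not the SCUBIDOO implementation: the class names, the functional-group labels, the five example building blocks, the two reactions, and the alert rule are all hypothetical, and uniform random sampling stands in for whatever procedure the authors used to obtain the representative S, M, and L sets.

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class BuildingBlock:
    name: str
    groups: frozenset  # hypothetical functional-group labels


@dataclass(frozen=True)
class Reaction:
    name: str
    group_a: str  # group required on the first reactant
    group_b: str  # group required on the second reactant


@dataclass(frozen=True)
class Product:
    reaction: str
    block_a: str
    block_b: str
    side_reaction_alert: bool


def enumerate_products(blocks, reactions):
    """Pair every two distinct blocks through every compatible reaction."""
    products = []
    for first, second in combinations(blocks, 2):
        for rxn in reactions:
            orientations = [(first, second)]
            if rxn.group_a != rxn.group_b:
                orientations.append((second, first))
            for x, y in orientations:
                if rxn.group_a in x.groups and rxn.group_b in y.groups:
                    reactive = {rxn.group_a, rxn.group_b}
                    # Toy alert rule (assumption): flag a product when a
                    # reactant carries a second group this reaction can use.
                    alert = bool(
                        (x.groups - {rxn.group_a}) & reactive
                        or (y.groups - {rxn.group_b}) & reactive
                    )
                    products.append(Product(rxn.name, x.name, y.name, alert))
    return products


def draw_sample(products, size, seed=0):
    """Uniform random subset; a stand-in for a representative sample."""
    return random.Random(seed).sample(products, size)


blocks = [
    BuildingBlock("benzoic_acid", frozenset({"acid"})),
    BuildingBlock("aniline", frozenset({"amine"})),
    BuildingBlock("glycine", frozenset({"acid", "amine"})),
    BuildingBlock("bromobenzene", frozenset({"aryl_halide"})),
    BuildingBlock("phenylboronic_acid", frozenset({"boronic_acid"})),
]
reactions = [
    Reaction("amide_coupling", "acid", "amine"),
    Reaction("suzuki_coupling", "aryl_halide", "boronic_acid"),
]

products = enumerate_products(blocks, reactions)
sample_s = draw_sample(products, 2)

for p in products:
    print(p)
```

Because each product records the reaction and the two building blocks it came from, a hit can be traced back to a synthesis route, and neighbors of that hit can be found by holding one building block fixed and varying the other, which is the kind of modular search the abstract describes.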

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual*
  • Drug Design
  • Drug Discovery / methods*
  • Principal Component Analysis
  • Structure-Activity Relationship