High-content image generation for drug discovery using generative adversarial networks

Shaista Hussain; Ayesha Anees; Ankit Das; Binh P Nguyen; Mardiana Marzuki; Shuping Lin; Graham Wright; Amit Singhal

doi:10.1016/j.neunet.2020.09.007

Neural Netw. 2020 Dec:132:353-363. doi: 10.1016/j.neunet.2020.09.007. Epub 2020 Sep 20.

Authors

Shaista Hussain¹, Ayesha Anees², Ankit Das², Binh P Nguyen³, Mardiana Marzuki⁴, Shuping Lin⁵, Graham Wright⁵, Amit Singhal⁶

Affiliations

¹ Institute of High Performance Computing, A*STAR, 138673, Singapore. Electronic address: hussains@ihpc.a-star.edu.sg.
² Institute of High Performance Computing, A*STAR, 138673, Singapore.
³ School of Mathematics and Statistics, VUW, 6140, New Zealand.
⁴ Singapore Immunology Network, A*STAR, 138648, Singapore.
⁵ Skin Research Institute of Singapore, A*STAR, 138648, Singapore.
⁶ Singapore Immunology Network, A*STAR, 138648, Singapore. Electronic address: Amit_Singhal@immunol.a-star.edu.sg.

PMID: 32977280
DOI: 10.1016/j.neunet.2020.09.007

Abstract

The immense amount of high-content image data generated in drug discovery screening requires computationally driven, automated analysis. The emergence of advanced machine learning algorithms, such as deep learning models, has transformed the interpretation and analysis of imaging data. However, deep learning methods generally require a large number of high-quality data samples, which can be limited during preclinical investigations. To address this issue, we propose a computational framework based on generative modeling to synthesize images, which can be used for phenotypic profiling of perturbations induced by drug compounds. We investigated three variants of the Generative Adversarial Network (GAN) in our framework, namely a basic Vanilla GAN, a Deep Convolutional GAN (DCGAN) and a Progressive GAN (ProGAN), and found DCGAN to be the most efficient in generating realistic synthetic images. A pre-trained convolutional neural network (CNN) was used to extract features of both real and synthetic images, followed by a classification model trained on real and synthetic images. The quality of the synthesized images was evaluated by comparing their feature distributions with those of real images. The DCGAN-based framework was applied to high-content image data from a drug screen to synthesize high-quality cellular images, which were used to augment the real image data. The augmented dataset yielded better classification performance than that obtained using only real images. We also demonstrated the application of the proposed method to the generation of bacterial images and computed feature distributions for bacterial images specific to different drug treatments. In summary, our results show that the proposed DCGAN-based framework can be used to generate realistic synthetic high-content images, thus enabling the study of drug-induced effects on cells and bacteria.
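The evaluation step described above, comparing feature distributions of real and synthetic images, can be illustrated with a minimal sketch. The abstract does not state which comparison measure was used, so the Fréchet distance between Gaussian fits of the two feature sets is an assumption made here for illustration. The function name `frechet_distance` is hypothetical, and the random arrays stand in for features that a pre-trained CNN would produce.

```python
import numpy as np


def frechet_distance(real_feats, synth_feats):
    """Frechet distance between Gaussian fits of two feature sets.

    Each input is an (n_samples, n_features) array. This metric is an
    illustrative assumption, not necessarily the one used in the paper.
    """
    mu_r = real_feats.mean(axis=0)
    mu_s = synth_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_s = np.cov(synth_feats, rowvar=False)
    # tr((cov_r cov_s)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative
    # for covariance matrices; clip guards against round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_s) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)


# Placeholder data standing in for CNN features (assumption: 8-D features).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
close = rng.normal(0.0, 1.0, size=(500, 8))  # mimics realistic synthetic images
far = rng.normal(2.0, 1.0, size=(500, 8))    # mimics poor synthetic images

d_close = frechet_distance(real, close)
d_far = frechet_distance(real, far)
print(f"close: {d_close:.3f}, far: {d_far:.3f}")
```

A smaller distance suggests that the synthetic features more closely match the real ones, which is the kind of signal that could be used to rank generators such as Vanilla GAN, DCGAN and ProGAN against each other.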

Keywords: Deep learning; Drug discovery; Generative modeling; High-content imaging.

MeSH terms

Algorithms
Data Accuracy
Deep Learning*
Drug Discovery / methods*
Humans
Image Processing, Computer-Assisted* / methods
Neural Networks, Computer*