An automatic end-to-end chemical synthesis development platform powered by large language models

Yixiang Ruan; Chenyin Lu; Ning Xu; Yuchen He; Yixin Chen; Jian Zhang; Jun Xuan; Jianzhang Pan; Qun Fang; Hanyu Gao; Xiaodong Shen; Ning Ye; Qiang Zhang; Yiming Mo

doi:10.1038/s41467-024-54457-x

An automatic end-to-end chemical synthesis development platform powered by large language models

Nat Commun. 2024 Nov 23;15(1):10160. doi: 10.1038/s41467-024-54457-x.

Authors

Yixiang Ruan^{1

2}, Chenyin Lu², Ning Xu^{1

2}, Yuchen He^{1

2}, Yixin Chen^{1

2}, Jian Zhang², Jun Xuan², Jianzhang Pan^{2

3}, Qun Fang^{2

3}, Hanyu Gao⁴, Xiaodong Shen⁵, Ning Ye⁶, Qiang Zhang^{2

7}, Yiming Mo^{8

9}

Affiliations

¹ College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China.
² Zhejiang-Hong Kong Joint Laboratory for Intelligent Molecule and Material Design and Synthesis, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311215, China.
³ Institute of Microanalytical Systems, Department of Chemistry, Zhejiang University, Hangzhou, 310058, China.
⁴ Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, 999077, China.
⁵ Chemical & Analytical Development, Suzhou Novartis Technical Development Co. Ltd., Changshu, 215537, China.
⁶ Rezubio Pharmaceuticals Co. Ltd., Zhuhai, 519070, China.
⁷ College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
⁸ College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China. yimingmo@zju.edu.cn.
⁹ Zhejiang-Hong Kong Joint Laboratory for Intelligent Molecule and Material Design and Synthesis, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311215, China. yimingmo@zju.edu.cn.

PMID: 39580482
DOI: 10.1038/s41467-024-54457-x

Abstract

The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF's broader applicability and versability was validated on various synthesis tasks of three distinct reactions (S_NAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).

MeSH terms

Alcohols / chemistry
Aldehydes / chemistry
Catalysis
Chemistry Techniques, Synthetic / methods
Copper* / chemistry
Cyclic N-Oxides
Kinetics
Oxidation-Reduction
Software

Substances

Copper
TEMPO
Aldehydes
Alcohols
Cyclic N-Oxides