An automatic end-to-end chemical synthesis development platform powered by large language models

Nat Commun. 2024 Nov 23;15(1):10160. doi: 10.1038/s41467-024-54457-x.

Abstract

The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF's broader applicability and versability was validated on various synthesis tasks of three distinct reactions (SNAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).

MeSH terms

  • Alcohols / chemistry
  • Aldehydes / chemistry
  • Catalysis
  • Chemistry Techniques, Synthetic / methods
  • Copper* / chemistry
  • Cyclic N-Oxides
  • Kinetics
  • Oxidation-Reduction
  • Software

Substances

  • Copper
  • TEMPO
  • Aldehydes
  • Alcohols
  • Cyclic N-Oxides