Evaluating Polymer Representations via Quantifying Structure-Property Relationships

Ruimin Ma; Zeyu Liu; Quanwei Zhang; Zhiyu Liu; Tengfei Luo

doi:10.1021/acs.jcim.9b00358

J Chem Inf Model. 2019 Jul 22;59(7):3110-3119. doi: 10.1021/acs.jcim.9b00358. Epub 2019 Jul 3.

Authors

Ruimin Ma¹, Zeyu Liu¹, Quanwei Zhang¹, Zhiyu Liu¹, Tengfei Luo¹ ²

Affiliations

¹ Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.
² Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.

PMID: 31268306

Abstract

Machine learning techniques are being applied to quantify structure-property relationships for a wide variety of materials, and how the materials are represented plays a key role. Although representation learning algorithms have been studied extensively, their application to domain-specific areas such as polymers has been limited, largely because of the lack of benchmark databases. In this work, we investigate different types of polymer representations, including Morgan fingerprint (MF), molecular embedding (ME), and molecular graph (MG), using a benchmark database drawn from a subset of PolyInfo, a well-known web-based polymer database. We evaluate the quality of each representation by quantifying the relationships between the representation and polymer properties, including density, melting temperature, and glass transition temperature. Different representation learning schemes for MEs are investigated: supervised learning (SL), semisupervised learning (SSL), and transfer learning (TL). In SL, only the labeled molecules in our benchmark database are used for representation learning; in SSL, both labeled and unlabeled molecules in the benchmark database are used; and in TL, molecules from an external database different from the benchmark database are used. ME outperforms the other representations for structure-property relationship quantification in all cases studied, with R² of 0.724 for density, 0.684 for melting temperature, and 0.865 for glass transition temperature. MF follows (R² of 0.562 for density, 0.645 for melting temperature, and 0.849 for glass transition temperature), while MG is much inferior to both ME and MF (R² of 0.260 for density, -0.149 for melting temperature, and 0.711 for glass transition temperature), likely because of the relatively small volume of training data available.
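The R² scores above can be read as coefficients of determination, R² = 1 − SS_res/SS_tot, assuming the standard definition (the common form that can go below zero). A negative score, such as the -0.149 reported for MG on melting temperature, means the predictions fit the test targets worse than a constant prediction of their mean would. A minimal plain-Python sketch; the function name and the toy numbers are hypothetical, not taken from the study:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the targets from their mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical melting temperatures in K (toy values, not data from the paper).
y_true = [400.0, 450.0, 500.0, 550.0]

perfect = r2_score(y_true, y_true)                     # exact predictions
mean_only = r2_score(y_true, [475.0] * 4)              # always predicts the mean
poor = r2_score(y_true, [550.0, 500.0, 450.0, 400.0])  # worse than the mean
print(perfect, mean_only, poor)  # prints: 1.0 0.0 -3.0
```

A score of 0 marks the "predict the mean" baseline, so any representation scoring below 0 on a held-out set carries less usable information for that regression than the baseline does.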
For MEs, the similarities between substructure embeddings are estimated differently under the different learning schemes (SL, SSL, and TL), which leads to different performance scores in structure-property relationship quantification. Combining MEs has little effect on predictive performance compared with using a single ME in the corresponding regression tasks, showing no information gain from mixing MEs.
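To make "similarity of substructure embeddings" and "combining MEs" concrete, the sketch below assumes cosine similarity as the similarity measure and simple concatenation as the combination method; the abstract does not say which measure or combination the authors used. The helper names, the substructure labels "A" and "B", and the 3-D vectors are all hypothetical toy values. The point it shows is that the same pair of substructures can look more or less similar depending on which learned embedding space it is measured in.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def concatenate_embeddings(*embeddings):
    """Combine several embeddings of the same item into one feature vector."""
    combined = []
    for emb in embeddings:
        combined.extend(emb)
    return combined


# Hypothetical embeddings of two substructures, A and B, as they might come out
# of two different learning schemes (toy numbers, not from the paper).
scheme_1 = {"A": [1.0, 0.0, 0.0], "B": [1.0, 1.0, 0.0]}
scheme_2 = {"A": [0.0, 1.0, 0.0], "B": [0.0, 1.0, 0.0]}

sim_1 = cosine_similarity(scheme_1["A"], scheme_1["B"])  # about 0.707
sim_2 = cosine_similarity(scheme_2["A"], scheme_2["B"])  # 1.0: identical direction
combined = concatenate_embeddings(scheme_1["A"], scheme_2["A"])  # 6 features
print(round(sim_1, 3), sim_2, combined)
```

Concatenation only stacks existing features side by side; if the individual embeddings already encode the same structural information, a regressor trained on the combined vector has little new to exploit.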

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Machine Learning*
Models, Molecular
Molecular Structure
Polymers / chemistry*
Structure-Activity Relationship

Substances

Polymers