Expansion of bond dissociation prediction with machine learning to medicinally and environmentally relevant chemical space
Literature Information
Shree Sowndarya S. V., Yeonjoon Kim, Seonah Kim, Peter C. St. John, Robert S. Paton
Bond dissociation energetics underpin the thermodynamics of chemical transformations in which bonds are broken or formed, and can also be used to predict reaction rates and selectivities. Current machine learning (ML) models for predicting bond dissociation energy (BDE) are largely limited in elemental coverage to hydrogen and the second-row elements. This has restricted the applicability of ML-derived BDE predictions, particularly for molecules of medicinal relevance, since the heteroatoms S, Cl, F, P, Br, and I are commonly found in approved pharmaceuticals. Atmospherically and environmentally relevant molecules containing multiple halogen atoms have been similarly inaccessible. In this study, we considerably expand the size, elemental composition, and bond types of an extensive BDE database and train a new ML BDE model that covers C, H, N, O, S, Cl, F, P, Br, and I. We curate a new quantum chemical dataset of 531 244 unique zero-point-energy-inclusive homolytic dissociations of organic compounds. We investigate accuracy for out-of-sample molecules and implement iterative training and testing cycles during model development to improve model accuracy. Targeted augmentation of the training data with as few as eight additional molecules reduced prediction errors from 5.7 to 0.8 kcal mol⁻¹ for pharmaceutically relevant molecules containing multiple C(sp²)–halogen bonds, and from 2.7 to 1.2 kcal mol⁻¹ for polyhaloalkyl compounds with multiple C(sp³)–halogen bonds. Our updated and expanded model (ALFABET) achieves a mean absolute error of 0.6 kcal mol⁻¹ for both enthalpies and free energies compared to the quantum chemical ground truth. The graph-based representations used here outperform traditional cheminformatics features such as radial fingerprints, and including more expensive QM-derived parameters, such as optimized bond lengths, gives no discernible improvement in accuracy. Finally, we illustrate high accuracy in external prediction tasks for large halogenated natural products, pharmaceutically relevant halogenated molecules, atmospherically important halocarbons, and polyfluoroalkyl substances related to environmental toxicity.
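The two quantities the abstract reports, a homolytic bond dissociation energy and a mean absolute error against a reference, can be illustrated with a minimal Python sketch. It assumes the usual definition of a homolytic BDE for A–B → A• + B• as the sum of the radical enthalpies minus the parent enthalpy. The function names and all numerical values are hypothetical placeholders chosen for illustration; none of them come from the paper, its dataset, or the ALFABET code.

```python
from statistics import mean


def bond_dissociation_energy(h_parent, h_radical_a, h_radical_b):
    """Homolytic BDE for A-B -> A. + B. (all inputs in the same energy unit).

    Assumed definition: BDE = H(A.) + H(B.) - H(A-B).
    """
    return h_radical_a + h_radical_b - h_parent


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Placeholder enthalpies (kcal/mol), invented purely for illustration.
reference_bde = bond_dissociation_energy(
    h_parent=-100.0, h_radical_a=-40.0, h_radical_b=30.0
)

# Placeholder "model" predictions compared with placeholder reference values.
predicted = [90.5, 100.0]
reference = [reference_bde, 101.0]
error = mean_absolute_error(predicted, reference)

print(f"reference BDE: {reference_bde:.1f} kcal/mol")
print(f"MAE: {error:.2f} kcal/mol")
```

In this sketch a lower `error` corresponds to the kind of improvement the abstract describes, where the error of the ML prediction is measured against quantum chemical reference values.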