Machine Learning in Chemistry

We are working on improving and accelerating quantum chemical research with machine learning (ML). See my Perspective, Quantum Chemistry in the Age of Machine Learning, for a concise overview of this research field[9] and my book chapter for a brief tutorial introduction.[12]

Recently, we introduced the ML-nuclear ensemble approach for cost-efficient, high-precision simulation of absorption spectra.[11]

We develop techniques for accelerating nonadiabatic excited-state dynamics with machine learning. In our exploratory work, we demonstrated that ML based on kernel ridge regression can make simulations with thousands of trajectories routine.[5] We have also shown that deep learning can be used for pure ML nonadiabatic dynamics of molecules.[6]

Earlier, we proposed structure-based sampling and self-correcting machine learning for precise representation of molecular potential energy surfaces and for calculating vibrational levels with spectroscopic accuracy (errors less than 1 cm−1 relative to the reference ab initio spectrum).[4] Structure-based sampling ensures that ML works most of the time in the interpolation regime, where it performs best. Our approach reduces the number of required quantum mechanical calculations by up to 90%.
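The idea of choosing training geometries so that ML mostly interpolates can be illustrated with a small sketch. The greedy farthest-point selection below is an assumed stand-in for illustration only and is not necessarily the exact procedure of ref. [4]; the function name, the one-dimensional "bond length" grid, and the number of selected points are all hypothetical.

```python
import numpy as np

def farthest_point_sampling(descriptors, n_select, first_index=0):
    """Greedily pick points that are spread out in descriptor space.

    Illustrative stand-in for structure-based sampling; not the exact
    algorithm of ref. [4].
    """
    descriptors = np.asarray(descriptors, dtype=float)
    selected = [first_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(descriptors - descriptors[first_index], axis=1)
    for _ in range(n_select - 1):
        # The next training point is the one farthest from the current set.
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        new_dist = np.linalg.norm(descriptors - descriptors[next_index], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Hypothetical toy data: 251 "geometries" described by a single bond length.
grid = np.linspace(0.5, 3.0, 251).reshape(-1, 1)
training_indices = farthest_point_sampling(grid, n_select=10)
```

Because the selected points cover the whole sampled region, including its edges, most remaining geometries lie between training points rather than outside them.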

We also apply ML methods to improve the accuracy of lower-level quantum mechanical (QM) methods, namely DFT and especially semiempirical quantum chemical (SQC) methods, and use them to calculate various molecular properties with reasonable accuracy at low computational cost.

Improvement of a semiempirical quantum chemical (SQC) method using Machine Learning (ML)

Applying ML methods alone often leads to very inaccurate predictions, which limits their use in chemistry. We therefore propose two hybrid ML/QM approaches that address the deficiencies of both ML and QM techniques. For our studies we use our large database of QM properties.[1]

The first approach improves QM methods implicitly. In practice, we do this by correcting semiempirical parameters using ML (the ML-SQC technique).[2]

The second approach improves QM methods explicitly by applying ML corrections on top of them (the Δ-ML technique).[3]
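As a minimal sketch of the Δ-ML idea, the ML model below is trained on the difference between a high-level target and a low-level baseline, and the prediction is the baseline plus the learned correction. Everything specific here is an assumption made for illustration: the two toy one-dimensional functions standing in for the "low-level" and "high-level" methods, the plain NumPy kernel ridge regression, and the hyperparameters `sigma` and `lam`. It is not the implementation used in ref. [3] or in MLatom.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two sets of 1D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_krr(x_train, y_train, sigma=0.5, lam=1e-6):
    """Solve (K + lam*I) alpha = y for the regression coefficients."""
    kernel = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(kernel + lam * np.eye(len(x_train)), y_train)

def predict_krr(x_new, x_train, alpha, sigma=0.5):
    """Evaluate the kernel ridge regression model at new inputs."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical stand-ins for a cheap baseline and an expensive target method.
def low_level(x):
    return np.sin(x)

def high_level(x):
    return np.sin(x) + 0.1 * x ** 2

# Train on the difference (the "delta") at a few training points.
x_train = np.linspace(0.0, 3.0, 15)
delta_train = high_level(x_train) - low_level(x_train)
alpha = fit_krr(x_train, delta_train)

# Delta-ML prediction: baseline value plus the ML-predicted correction.
x_test = np.linspace(0.1, 2.9, 50)
delta_ml = low_level(x_test) + predict_krr(x_test, x_train, alpha)

baseline_error = np.max(np.abs(low_level(x_test) - high_level(x_test)))
delta_ml_error = np.max(np.abs(delta_ml - high_level(x_test)))
```

In this toy setting the correction is a smoother function than the target itself, which is the usual motivation for learning the difference rather than the property directly.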

Recently, we generalized Δ-machine learning to semi-automatically define training sets for many quantum chemical corrections and so build highly accurate hierarchical machine learning (hML) potential energy surfaces.[10]

For our ML studies we develop and use our own ML program package MLatom.[7][8]


12. Pavlo O. Dral, Quantum Chemistry Assisted by Machine Learning. In Advances in Quantum Chemistry: Chemical Physics and Quantum Chemistry, Volume 81, 1st ed.; Kenneth Ruud, Erkki J. Brändas, Eds. Academic Press: 2020; Vol. 81. DOI: 10.1016/bs.aiq.2020.05.002. (free copy online till 6 Nov. 2020 | blog post | online tutorial)

11. Bao-Xin Xue, Mario Barbatti, Pavlo O. Dral, Machine Learning for Absorption Cross Sections, J. Phys. Chem. A 2020, 124, 7199–7210. DOI: 10.1021/acs.jpca.0c05310.

10. Pavlo O. Dral, Alec Owens, Alexey Dral, Gábor Csányi, Hierarchical Machine Learning of Potential Energy Surfaces. J. Chem. Phys. 2020, 152, 204110. DOI: 10.1063/5.0006498.

9. Pavlo O. Dral, Quantum Chemistry in the Age of Machine Learning. J. Phys. Chem. Lett. 2020, 11, 2336–2347. DOI: 10.1021/acs.jpclett.9b03664.

8. Pavlo O. Dral, Bao-Xin Xue, Fuchun Ge, Yi-Fan Hou, MLatom: A Program Package for Quantum Chemical Research Assisted by Machine Learning. J. Comput. Chem. 2019, 40, 2339–2347. DOI: 10.1002/jcc.26004.

7. Pavlo O. Dral, MLatom: A Package for Atomistic Simulations with Machine Learning, 2013–2020.

6. Wen-Kai Chen, Xiang-Yang Liu, Weihai Fang, Pavlo O. Dral, Ganglong Cui, Deep Learning for Nonadiabatic Excited-State Dynamics. J. Phys. Chem. Lett. 2018, 9, 6702–6708. DOI: 10.1021/acs.jpclett.8b03026.

5. Pavlo O. Dral, Mario Barbatti, Walter Thiel, Nonadiabatic Excited-State Dynamics with Machine Learning. J. Phys. Chem. Lett. 2018, 9, 5660–5663. DOI: 10.1021/acs.jpclett.8b02469.

4. Pavlo O. Dral, Alec Owens, Sergei N. Yurchenko, Walter Thiel, Structure-Based Sampling and Self-Correcting Machine Learning for Accurate Calculations of Potential Energy Surfaces and Vibrational Levels. J. Chem. Phys. 2017, 146, 244108. DOI: 10.1063/1.4989536.

3. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096. DOI: 10.1021/acs.jctc.5b00099.
arXiv:1503.04987 [physics.chem-ph].

2. Pavlo O. Dral, O. Anatole von Lilienfeld, Walter Thiel, Machine Learning of Parameters for Accurate Semiempirical Quantum Chemical Calculations. J. Chem. Theory Comput. 2015, 11, 2120–2125. DOI: 10.1021/acs.jctc.5b00141.

1. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022. DOI: 10.1038/sdata.2014.22.
Data set download link: figshare.