Machine Learning in Chemistry

I am currently working on accelerating nonadiabatic excited-state dynamics with machine learning (ML). In our exploratory work, we have demonstrated that ML based on kernel ridge regression makes routine simulations with thousands of trajectories feasible.[5] We have also shown that deep learning can be used for pure ML nonadiabatic dynamics of molecules.[6]

Earlier, I proposed structure-based sampling and self-correcting ML for the precise representation of molecular potential energy surfaces and for calculating vibrational levels with spectroscopic accuracy (errors of less than 1 cm−1 relative to the reference ab initio spectrum).[4] Structure-based sampling ensures that ML works mostly in the interpolation regime, where it performs best. My approach reduces the number of required quantum mechanical calculations by up to 90%.
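The idea of choosing training geometries by their structure can be illustrated with greedy farthest-point selection, which is used here only as a stand-in: it is not claimed to be the published sampling procedure. The sketch assumes each geometry is already encoded as a tuple of numbers (a hypothetical descriptor) and repeatedly picks the point that lies farthest from everything selected so far, so that the remaining points tend to fall between training points rather than outside them. The function name `farthest_point_sampling` and the toy grid are illustrative assumptions.

```python
import math


def farthest_point_sampling(points, n_select, start=0):
    """Greedily pick n_select indices so that the chosen points spread out.

    points   : list of equal-length tuples (hypothetical structural descriptors)
    n_select : number of points to pick
    start    : index of the first selected point (an arbitrary choice)
    """
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        # The next training point is the one farthest from the current set.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Toy example: 11 equally spaced one-dimensional "geometries".
grid = [(float(i),) for i in range(11)]
chosen = farthest_point_sampling(grid, 3)
print(chosen)
```

On this toy grid the selection covers both ends of the range first and then fills the middle, which is the behaviour that keeps later predictions in the interpolation regime.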

We also apply ML methods to improve the accuracy of less accurate quantum mechanical (QM) methods, namely DFT and especially semiempirical quantum chemical (SQC) methods, and use the improved methods to calculate various molecular properties with reasonable accuracy and low computational cost.

Improvement of a semiempirical quantum chemical (SQC) method using Machine Learning (ML)

Applying ML methods alone often leads to very inaccurate predictions, which limits their use in chemistry. We therefore propose two different hybrid ML/QM approaches that eliminate deficiencies of both ML and QM techniques. For our studies we use our large database of QM structures and properties of about 134 thousand molecules.[1]

The first approach improves QM methods implicitly. In practice, we do this by correcting semiempirical parameters using ML (ML-SQC technique).[2]

The second approach improves QM methods explicitly by adding ML corrections on top of the QM results (Δ-ML technique).[3]
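A minimal sketch of the on-top correction idea, under loudly stated assumptions: the "low-level method" is a harmonic potential, the "high-level method" is a Morse potential, and the correction is learned with a small kernel ridge regression written in plain Python. All function names, the one-dimensional coordinate, and the hyperparameters (`SIGMA`, `LAM`) are hypothetical choices for illustration; this is not the published implementation and not the MLatom interface.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def krr_fit(xs, ys, sigma, lam):
    """Return regression coefficients alpha = (K + lam * I)^-1 y."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, ys)


def krr_predict(x, xs, alphas, sigma):
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alphas, xs))


def low_level(x):
    """Hypothetical cheap baseline: harmonic potential."""
    return 0.5 * x ** 2


def high_level(x):
    """Hypothetical accurate target: Morse potential with the same curvature."""
    return 0.5 * (1.0 - math.exp(-x)) ** 2


SIGMA, LAM = 0.5, 1e-6  # assumed hyperparameters, not tuned
train_x = [-0.5 + 2.5 * i / 7 for i in range(8)]
# The model is trained on the difference between the two methods.
deltas = [high_level(x) - low_level(x) for x in train_x]
alphas = krr_fit(train_x, deltas, SIGMA, LAM)


def delta_ml(x):
    """Baseline result plus the learned on-top correction."""
    return low_level(x) + krr_predict(x, train_x, alphas, SIGMA)


x_test = 1.0
baseline_error = abs(low_level(x_test) - high_level(x_test))
delta_ml_error = abs(delta_ml(x_test) - high_level(x_test))
print(f"baseline error: {baseline_error:.4f}, corrected error: {delta_ml_error:.4f}")
```

The design point is that the model learns only the difference between the two levels of theory, so the high-level method is evaluated just at the training points while predictions cost one low-level evaluation plus a kernel sum.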

For our ML studies, I develop and use my own ML program package, MLatom.[7]


7. Pavlo O. Dral, MLatom: A Package for Atomistic Simulations with Machine Learning, Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany, 2013–2018.

6. Wen-Kai Chen, Xiang-Yang Liu, Weihai Fang, Pavlo O. Dral, Ganglong Cui, Deep Learning for Nonadiabatic Excited-State Dynamics. J. Phys. Chem. Lett. 2018, 9, 6702–6708. DOI: 10.1021/acs.jpclett.8b03026.

5. Pavlo O. Dral, Mario Barbatti, Walter Thiel, Nonadiabatic Excited-State Dynamics with Machine Learning. J. Phys. Chem. Lett. 2018, 9, 5660–5663. DOI: 10.1021/acs.jpclett.8b02469.

4. Pavlo O. Dral, Alec Owens, Sergei N. Yurchenko, Walter Thiel, Structure-Based Sampling and Self-Correcting Machine Learning for Accurate Calculations of Potential Energy Surfaces and Vibrational Levels. J. Chem. Phys. 2017, 146, 244108. DOI: 10.1063/1.4989536.

3. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096. DOI: 10.1021/acs.jctc.5b00099.
arXiv:1503.04987 [physics.chem-ph].

2. Pavlo O. Dral, O. Anatole von Lilienfeld, Walter Thiel, Machine Learning of Parameters for Accurate Semiempirical Quantum Chemical Calculations. J. Chem. Theory Comput. 2015, 11, 2120–2125. DOI: 10.1021/acs.jctc.5b00141.

1. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022. DOI: 10.1038/sdata.2014.22.
Data set download link: figshare.