We have proposed structure-based sampling and self-correcting machine learning (ML) for representing molecular potential energy surfaces precisely and for calculating vibrational levels with spectroscopic accuracy (errors below 1 cm⁻¹ relative to the reference ab initio spectrum). Structure-based sampling keeps ML in the interpolation regime, where it performs best, for most predictions. Our approach reduces the number of required quantum mechanical calculations by up to 90%.
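The idea of choosing training geometries by their structure can be illustrated with farthest-point sampling, used here only as a stand-in: the procedure in the cited J. Chem. Phys. paper may differ in its descriptor and selection rule. The function name, the tuple-based descriptors, and the Euclidean metric below are all assumptions made for this sketch.

```python
import math


def farthest_point_sampling(points, n_select, start=0):
    """Pick n_select indices so that the chosen points spread over the set.

    points   -- list of equal-length tuples (hypothetical structural descriptors)
    n_select -- number of training points to pick
    start    -- index of the first point (arbitrary choice)
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < n_select:
        # The next training point is the one farthest from the current selection.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))

    return selected
```

Because each new point is the one farthest from everything already chosen, the remaining geometries tend to lie between selected ones, which is the interpolation situation described above.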
We also apply ML to improve the accuracy of approximate quantum mechanical (QM) methods, namely density functional theory (DFT) and especially semiempirical quantum chemical (SQC) methods, and use the corrected methods to calculate various molecular properties with reasonable accuracy at low computational cost.
Applied alone, ML methods often give very inaccurate predictions, which limits their use in chemistry. We therefore propose two different hybrid ML/QM approaches that offset the deficiencies of both ML and QM techniques. These studies draw on our large database of QM properties.
For these ML studies I develop and use my own program package, MLatom.
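One hybrid scheme of this kind is Δ-machine learning, named in the publication list on this page: the ML model learns only the difference between a cheap and an accurate method, and the cheap method supplies the rest. The following is a minimal sketch on a toy problem, not the published implementation. The one-dimensional Morse and harmonic curves standing in for the accurate and cheap methods, the kernel ridge regression model, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1D arrays of geometries."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))


def fit_krr(x_train, y_train, sigma=0.5, lam=1e-8):
    """Kernel ridge regression: solve (K + lam*I) alpha = y."""
    kernel = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(kernel + lam * np.eye(len(x_train)), y_train)


def predict_krr(x_train, alpha, x_new, sigma=0.5):
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


def e_high(r):
    """Toy stand-in for an accurate method: Morse curve (D=1, a=1.5, r0=1)."""
    return (1.0 - np.exp(-1.5 * (r - 1.0)))**2


def e_low(r):
    """Toy stand-in for a cheap method: harmonic approximation of the same well."""
    return 2.25 * (r - 1.0)**2


def delta_ml_demo(n_train=10):
    r_train = np.linspace(0.7, 2.0, n_train)
    r_test = np.linspace(0.7, 2.0, 101)

    # Train on the correction only: accurate minus cheap.
    alpha = fit_krr(r_train, e_high(r_train) - e_low(r_train))

    # Prediction = cheap method + learned correction.
    e_delta = e_low(r_test) + predict_krr(r_train, alpha, r_test)

    err_low = np.max(np.abs(e_low(r_test) - e_high(r_test)))
    err_delta = np.max(np.abs(e_delta - e_high(r_test)))
    return err_low, err_delta
```

Calling `delta_ml_demo()` returns the maximum error of the cheap curve alone and of the cheap curve plus the learned correction, so the two can be compared directly on the test grid.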
5. Pavlo O. Dral, Alec Owens, Sergei N. Yurchenko, Walter Thiel, Structure-Based Sampling and Self-Correcting Machine Learning for Accurate Calculations of Potential Energy Surfaces and Vibrational Levels. J. Chem. Phys. 2017, 146, 244108. DOI: 10.1063/1.4989536.
3. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096. DOI: 10.1021/acs.jctc.5b00099.
2. Pavlo O. Dral, O. Anatole von Lilienfeld, Walter Thiel, Machine Learning of Parameters for Accurate Semiempirical Quantum Chemical Calculations. J. Chem. Theory Comput. 2015, 11, 2120–2125. DOI: 10.1021/acs.jctc.5b00141.
1. Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022. DOI: 10.1038/sdata.2014.22.
Data set download link: figshare.