(p)KREG Models for Accurate Molecular Potential Energy Surfaces

We improved (p)KREG models for an accurate representation of molecular potential energy surfaces (PESs) by including gradient information explicitly in their formalism. As we show on extensive benchmarks, our models are better than or on par with other state-of-the-art machine learning models. We also found that learning both energies and energy gradients is important for modeling PESs properly; learning only one of them is not enough. The methods and benchmarks are reported in J. Chem. Theory Comput. The code is available in MLatom, tutorials are available online, and calculations can be performed on the XACS cloud.

Machine learning (ML) is very powerful at fitting multidimensional functions. In quantum chemistry, this capability is exploited to fit properties such as potential energy surfaces, excitation energies, dipole moments, and electron densities. The choice of descriptors that encode geometrical and/or other information about molecules is also key. Kernel methods (KMs) such as kernel ridge regression (KRR) are among the most widely used machine learning algorithms in this field, together with neural networks and linear models.

One of our previously developed ML approaches to learning quantum chemical properties is the KRR-based KREG model, which uses a global relative-to-equilibrium (RE) descriptor based on scaled internuclear distances together with the Gaussian kernel function. Its pKREG variant uses a permutationally invariant kernel to enforce invariance with respect to atom permutations.
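As a rough illustration of these ideas, the following Python sketch builds an RE-style descriptor and a Gaussian kernel for a water molecule. It is not the MLatom implementation: the descriptor form (equilibrium distance divided by current distance for each atom pair), the plain averaging over permutations, the function names, the geometries, and the value of sigma are all assumptions made for this example.

```python
import numpy as np

def re_descriptor(coords, eq_coords):
    """Assumed RE-style descriptor: r_eq / r for every unique atom pair."""
    i, j = np.triu_indices(len(coords), k=1)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    r_eq = np.linalg.norm(eq_coords[i] - eq_coords[j], axis=1)
    return r_eq / r

def gaussian_kernel(x1, x2, sigma=1.0):
    """Gaussian kernel between two descriptor vectors."""
    return float(np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2)))

def permutational_kernel(c1, c2, eq_coords, permutations, sigma=1.0):
    """Simplified permutationally invariant kernel (an assumption of this sketch):
    average the Gaussian kernel over the given atom reorderings of c2."""
    x1 = re_descriptor(c1, eq_coords)
    values = [
        gaussian_kernel(x1, re_descriptor(c2[list(p)], eq_coords), sigma)
        for p in permutations
    ]
    return sum(values) / len(values)

# Illustrative water geometries in angstrom (atom order: O, H, H).
eq = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]])
geom = np.array([[0.0, 0.0, 0.0], [0.80, 0.60, 0.0], [-0.70, 0.55, 0.0]])
swapped = geom[[0, 2, 1]]          # same structure, hydrogens relabelled
perms = [(0, 1, 2), (0, 2, 1)]     # identity and H <-> H exchange

# The plain kernel treats the relabelled structure as a different point ...
k_plain = gaussian_kernel(re_descriptor(geom, eq), re_descriptor(swapped, eq), sigma=0.1)
# ... while the permutation-averaged kernel gives the same value for both labellings.
k_perm_same = permutational_kernel(geom, geom, eq, perms, sigma=0.1)
k_perm_swapped = permutational_kernel(geom, swapped, eq, perms, sigma=0.1)
print(k_plain, k_perm_same, k_perm_swapped)
```

The invariance in this sketch relies on the reference geometry itself being symmetric with respect to the exchanged atoms, which holds for the two hydrogens of the illustrative water structure.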

Our previous works showed that (p)KREG models can be used to efficiently and accurately learn scalar properties (potential energies, excitation energies, oscillator strengths). However, it is known that the accuracy can be greatly improved by including derivative information explicitly in the KM formalism. Thus, in this work, we implement the explicit learning of derivatives in (p)KREG models.
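To make the idea of including derivatives in the kernel formalism concrete, here is a minimal one-dimensional sketch of kernel ridge regression trained on function values alone or on values and derivatives together. It is a toy under stated assumptions, not the (p)KREG implementation: the Morse-like potential, its parameters, the training grid, sigma, and the regularization value are all chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def kernel_blocks(xa, xb, sigma):
    """Gaussian kernel between 1D points plus the derivative blocks needed below."""
    d = xa[:, None] - xb[None, :]
    k = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    k_db = d / sigma ** 2 * k                               # dk / dx_b
    k_dadb = (1.0 / sigma ** 2 - d ** 2 / sigma ** 4) * k   # d2k / (dx_a dx_b)
    return k, k_db, k_dadb

def fit(x, y, dy=None, sigma=0.5, lam=1e-10):
    """Solve for regression coefficients; pass dy to also learn derivatives."""
    k, k_db, k_dadb = kernel_blocks(x, x, sigma)
    if dy is None:
        mat, rhs = k, y
    else:
        # Rows: values first, then derivatives. The matrix is symmetric.
        mat = np.block([[k, k_db], [k_db.T, k_dadb]])
        rhs = np.concatenate([y, dy])
    return np.linalg.solve(mat + lam * np.eye(len(rhs)), rhs)

def predict(x_new, x, coef, sigma=0.5):
    """Predict function values from either kind of fitted model."""
    k, k_db, _ = kernel_blocks(x_new, x, sigma)
    n = len(x)
    out = k @ coef[:n]
    if len(coef) == 2 * n:          # model was trained with derivatives
        out = out + k_db @ coef[n:]
    return out

def morse(r, depth=1.0, a=1.0, r0=1.5):
    """Toy Morse-like potential and its analytic derivative."""
    e = np.exp(-a * (r - r0))
    return depth * (1.0 - e) ** 2, 2.0 * depth * a * (1.0 - e) * e

x_train = np.linspace(1.0, 3.0, 5)
y_train, g_train = morse(x_train)
coef_e = fit(x_train, y_train)             # values only
coef_eg = fit(x_train, y_train, g_train)   # values and derivatives

x_test = np.linspace(1.0, 3.0, 101)
y_test, _ = morse(x_test)
rmse_e = np.sqrt(np.mean((predict(x_test, x_train, coef_e) - y_test) ** 2))
rmse_eg = np.sqrt(np.mean((predict(x_test, x_train, coef_eg) - y_test) ** 2))
print("RMSE, values only:", rmse_e)
print("RMSE, values + derivatives:", rmse_eg)
```

In the derivative-aware model, each training point contributes two basis functions (the kernel and its derivative with respect to the training point), so the same five geometries supply ten constraints instead of five. The two printed errors can then be compared for this toy potential.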

We evaluated the performance of our models on single-molecule potential energy surfaces from the popular MD17 dataset and from our recently developed, more challenging WS22 dataset. The learning curves of KREG models show that including energy gradient information is necessary to significantly improve model performance, as expected. In addition, including both energies and energy gradients makes the model more robust in some challenging cases, such as urocanic acid in the WS22 dataset, where models trained only on energy gradients fail completely. We therefore recommend training on both energies and energy gradients whenever both are available: including energies adds little computational cost but greatly improves model robustness in small-data settings.

We also compare our (p)KREG models with other state-of-the-art machine learning models such as sGDML, ANI, DPMD, PhysNet, and GAP-SOAP. The learning curves show that the pKREG model has competitive accuracy across all potential energy surfaces. For small training sets, simple molecules, and narrower energy distributions (as in MD17), KREG underperforms, while pKREG has higher accuracy, comparable to that of sGDML. The difference between KREG and pKREG shrinks as the training set grows, because larger training sets implicitly contain more of the underlying symmetry information. For broader energy distributions and more distorted structures, as in the WS22 dataset, and for larger training sets in both MD17 and WS22, (p)KREG models start to outperform sGDML. Finally, in some challenging cases such as urocanic acid and o-HBDI, sGDML, which does not learn energies explicitly, fails to learn energies.

Both KREG and pKREG models can be accessed in our open-source package MLatom, which can also be used on the XACS cloud computing platform free of charge at MLatom@XACS. Tutorials can be found at http://mlatom.com/kreg/.

Publication:
