Benchmark of Semiempirical Methods
Details of the OMx methods were discussed recently. The corresponding paper evaluated the methods on their own training sets. A fair comparison with other semiempirical quantum-chemical (SQC) methods, however, requires validation on established, independent benchmark sets. In our paper [1] we therefore provide a very extensive benchmark study on a large compilation of data sets comprising 13035 original and derived reference data.
This study showcases the strengths and exposes the weaknesses of each method studied and is therefore useful in two main respects. First, it provides a guide for computational chemists looking for a fast yet sufficiently accurate method for the problem at hand. Second, it gives developers a better overview of what has to be improved in the next generation of SQC methods.
As for the benchmarked SQC methods, we compared Thiel’s OM1, OM2, and OM3 methods and the D2, D3, and D3T dispersion-corrected variants of OM2 and OM3 with MNDO and a range of popular MNDO-type methods: AM1, RM1, PM3, PM6, and PM7.
We benchmarked the above SQC methods for general ground-state properties, including energies (atomization and reaction energies, heats of formation, barriers, relative energies, ionization potentials, and electron affinities) and geometries. In addition to our own validation sets, we used other established sets: the high-confidence W4-11 set of atomization energies, the extensive GMTKN30 and CE345 sets that are popular in the DFT community, sets used by other groups for fitting and validating MNDO-type semiempirical methods, and a subset of the huge set of 134 thousand species. Most of these sets consist of many subsets, each targeting a different property.
For ease of reference, the conclusions of the paper [1] include a table summarizing which method is best for each set and subset. Generally, the OMx methods appear there most frequently. PM6 and PM7 are the best among the tested MNDO-type methods. Importantly, the OMx methods outperform the other SQC methods for species with unexpected structures, as in the ‘mindless’ MB08-165 set. This is a clear indication of the robustness of the OMx methods.
In addition, we tested the performance of the SQC methods in calculations of noncovalent interactions. Here we used many sets of relatively small complexes (often from the BEGDB database) and of large complexes. Although the OMx-Dn methods and PM7 have comparable accuracy for small complexes, there are some distinctions. Generally, the OMx-Dn methods are better for dispersion-dominated and mixed electrostatic/dispersion complexes, while PM7 is better for hydrogen-bonded complexes. The good performance of PM7 may be explained by its explicit hydrogen-bond corrections, which are an integral part of the method. No such corrections are used in the OMx-Dn approaches. Since large systems are typical targets of SQC calculations, it is important to mention that the OMx-Dn methods are superior for very large complexes (including hydrogen-bonded ones), as exemplified by the S30L set.
Our benchmark has clearly shown that the OMx and OMx-Dn methods can be recommended in most cases. Since the OMx methods are currently parametrized only for the elements C, H, N, O, and F, PM6 and PM7 remain a valuable alternative for molecules containing other elements. Of course, we also have to keep in mind that, whenever possible, it is a good idea to calibrate any method against more accurate experimental or theoretical data for your system before using it.
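The recommendation above can be read as a simple decision rule. The following is only a minimal sketch of that rule, not code from the paper: the function name `suggest_methods`, the `noncovalent` flag, and the family labels "OMx" and "OMx-Dn" are hypothetical choices made for illustration. The finer distinctions for small complexes (for example, PM7 for hydrogen-bonded ones) are deliberately left out.

```python
# Elements for which the OMx methods are currently parametrized (from the text).
OMX_ELEMENTS = frozenset({"C", "H", "N", "O", "F"})


def suggest_methods(elements, noncovalent=False):
    """Suggest candidate SQC method families for a system.

    Hypothetical helper: `elements` is an iterable of element symbols,
    `noncovalent` marks systems where noncovalent interactions matter.
    """
    unsupported = set(elements) - OMX_ELEMENTS
    if unsupported:
        # OMx has no parameters for these elements; fall back to PM6/PM7.
        return ["PM6", "PM7"]
    if noncovalent:
        # Dispersion-corrected variants (D2, D3, D3T) of OM2 and OM3.
        return ["OMx-Dn"]
    return ["OMx"]


if __name__ == "__main__":
    print(suggest_methods(["C", "H", "O"]))                # organic molecule
    print(suggest_methods(["C", "H", "Cl"]))               # contains chlorine
    print(suggest_methods(["C", "H", "N"], noncovalent=True))
```

Whatever this rule suggests should still be calibrated against more accurate reference data for the system of interest.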
1. Pavlo O. Dral, Xin Wu, Lasse Spörkel, Axel Koslowski, Walter Thiel, Semiempirical Quantum-Chemical Orthogonalization-Corrected Methods: Benchmarks for Ground-State Properties. J. Chem. Theory Comput. 2016, ASAP. DOI: 10.1021/acs.jctc.5b01047.