r2SCAN datasets
Development of robust, high-fidelity datasets for training universal machine learning interatomic potentials
MatPES
The Materials Potential Energy Surface (MatPES) collaboration [1] aims to generate small, low-noise, high-coverage datasets of DFT-computed properties (energies, forces, stresses, magnetic moments, etc.) for training universal machine learning interatomic potentials.
The data are generated with the MatPESStaticSet input set in pymatgen, and efficient PBE and r2SCAN workflows are implemented in atomate2.
The full dataset can be downloaded from MPContribs and uses the MatPESTrainDoc schema from emmet-core.
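As a rough illustration of working with a downloaded copy, the sketch below parses per-structure records and checks that the array shapes are mutually consistent before they are passed to a training pipeline. The key names (`energy`, `forces`, `stress`, `magmoms`), the six-component stress layout, and the `TrainingRecord` class are assumptions made for this example. They are not the MatPESTrainDoc schema, which should be consulted in emmet-core for the actual field names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingRecord:
    """Hypothetical per-structure record; NOT the MatPESTrainDoc schema."""

    energy: float              # total energy of the structure
    forces: List[List[float]]  # one [fx, fy, fz] vector per atom
    stress: List[float]        # assumed layout: six Voigt components
    magmoms: List[float]       # one magnetic moment per atom


def parse_record(raw: dict) -> TrainingRecord:
    """Build a TrainingRecord from a dict, rejecting inconsistent shapes."""
    forces = raw["forces"]
    n_atoms = len(forces)
    if n_atoms == 0 or any(len(f) != 3 for f in forces):
        raise ValueError("forces must hold one 3-vector per atom")
    if len(raw["stress"]) != 6:
        raise ValueError("stress must have six components")
    if len(raw["magmoms"]) != n_atoms:
        raise ValueError("magmoms must have one entry per atom")
    return TrainingRecord(
        energy=float(raw["energy"]),
        forces=[[float(x) for x in f] for f in forces],
        stress=[float(s) for s in raw["stress"]],
        magmoms=[float(m) for m in raw["magmoms"]],
    )


def load_records(text: str) -> List[TrainingRecord]:
    """Parse a JSON array of record dicts (hypothetical file layout)."""
    return [parse_record(r) for r in json.loads(text)]
```

Validating shapes at load time surfaces malformed records early, rather than as an opaque tensor-shape error during training.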
MP-ALOE
In a similar vein, Kuner et al. [2] sought to expand both the size and the range of chemistries covered by an r2SCAN dataset, using an active learning method to explore under-sampled regions of the potential energy surface. The resulting Materials Project active learning of off-equilibrium structures (MP-ALOE) dataset contains ~900,000 r2SCAN calculations that are compatible with MatPES.
[WIP] MP-ALOE will similarly be available on MPContribs, through both the explorer and bulk download.
References:
[1] A. D. Kaplan, R. Liu, J. Qi, T. W. Ko, B. Deng, J. Riebesell, G. Ceder, K. A. Persson, and S. P. Ong, "A foundational potential energy surface dataset for materials," arXiv:2503.04070 (2025). (DOI) (MPContribs explorer) (website)
[2] M. C. Kuner, A. D. Kaplan, K. A. Persson, M. Asta, and D. C. Chrzan, "MP-ALOE: An r2SCAN dataset for universal machine learning interatomic potentials," arXiv:2507.05559 (2025). (DOI) (original figshare)