# Elasticity Predictions

## Introduction

The Materials Project (MP) provides a growing collection of elastic constants calculated from first-principles density functional theory (DFT). See Elasticity calculations for details of the MP elasticity workflow [1].

For compounds that have not yet been processed with the elasticity workflow, MP offers statistical learning (SL) predictions of the Voigt-Reuss-Hill [2] average bulk and shear moduli ($K_{VRH}$ and $G_{VRH}$, respectively). The SL models were trained on a diverse set of 1,940 k-nary compounds, using the $K_{VRH}$ and $G_{VRH}$ moduli calculated by the elasticity workflow.
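As a reminder of what the Voigt-Reuss-Hill average means: the Hill value is the arithmetic mean of the Voigt (upper) and Reuss (lower) bounds on a modulus. The minimal Python sketch below illustrates this. The function name `vrh_average` and the input values are hypothetical, chosen only for illustration, and are not MP data.

```python
def vrh_average(voigt: float, reuss: float) -> float:
    """Return the Hill average of a modulus: the mean of the Voigt and Reuss bounds."""
    return 0.5 * (voigt + reuss)


# Hypothetical Voigt and Reuss bounds in GPa (illustrative numbers only).
k_vrh = vrh_average(voigt=160.0, reuss=150.0)  # bulk modulus
g_vrh = vrh_average(voigt=80.0, reuss=70.0)    # shear modulus
print(k_vrh, g_vrh)
```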

IMPORTANT NOTE: The SL predictions are generated by a local polynomial regression of available DFT-calculated elasticity data on several composition and structural descriptors (see Table 1). They are not directly physics-based predictions and should not be viewed as having accuracy or precision similar to that of experimental or DFT data. See the SL paper [3] and Table 2 for details on prediction accuracy. In addition, the SL predictions are based on only a fraction of the DFT elasticity data currently available. More sophisticated SL models trained on larger datasets are an objective for future work.

## Formalism

Ensemble statistical learning techniques construct a predictor from a collection, or ensemble, of weak learners. Each weak learner is either a single descriptor or a function of only a few descriptors, which limits the level of interaction between descriptors. Gradient boosting (GB) is a flexible ensemble technique that makes few assumptions about the form of the solution: it iteratively builds a predictor from a series of weak learners while minimizing a loss function [4]. GB implementations use regularization to reduce the risk of over-fitting, typically by limiting the level of interaction between descriptors, limiting the number of iterations according to a risk criterion, and employing shrinkage [4]. At each iteration, the weak learner that yields the greatest reduction in the loss is selected and added to the model; when shrinkage is employed, each new term is attenuated by the learning rate. See the SL paper [3] and references therein for details of the regression model.
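The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MP implementation: it assumes a squared-error loss, and it swaps in one-split decision stumps on a single descriptor as the weak learners in place of the local polynomial regression used for the MP models. The names `fit_stump`, `boost`, and `predict`, the fixed iteration count, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Fit the best one-split, piecewise-constant function of a single descriptor.

    Returns (sse, threshold, left_value, right_value), or None if x is constant.
    """
    best = None
    # Split at every distinct value except the largest, so both sides are non-empty.
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        fitted = np.where(left, left_value, right_value)
        sse = float(((residual - fitted) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, float(threshold), float(left_value), float(right_value))
    return best


def boost(X, y, n_iter=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and single-descriptor stumps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base = float(y.mean())              # initial constant model
    pred = np.full(y.shape[0], base)
    terms = []
    for _ in range(n_iter):
        residual = y - pred             # negative gradient of the squared-error loss
        candidates = []
        for j in range(X.shape[1]):     # one candidate weak learner per descriptor
            stump = fit_stump(X[:, j], residual)
            if stump is not None:
                candidates.append((stump[0], j) + stump[1:])
        if not candidates:
            break
        # Select the weak learner that reduces the loss the most.
        _, j, threshold, left_value, right_value = min(candidates)
        # Shrinkage: attenuate the new term by the learning rate.
        term = (j, threshold, learning_rate * left_value, learning_rate * right_value)
        terms.append(term)
        pred += np.where(X[:, j] <= threshold, term[2], term[3])
    return base, terms


def predict(base, terms, X):
    """Evaluate the boosted model: the constant plus all attenuated stump terms."""
    X = np.asarray(X, dtype=float)
    pred = np.full(X.shape[0], base, dtype=float)
    for j, threshold, left_value, right_value in terms:
        pred += np.where(X[:, j] <= threshold, left_value, right_value)
    return pred


# Synthetic demonstration: y depends on descriptor 0 only; descriptor 1 is noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] + 0.05 * rng.normal(size=200)
base, terms = boost(X_demo, y_demo)
mse = float(np.mean((y_demo - predict(base, terms, X_demo)) ** 2))
print(f"training MSE: {mse:.4f}, variance of y: {np.var(y_demo):.4f}")
```

Because each stump depends on one descriptor, the sketch allows no interaction between descriptors, which is the strictest form of the interaction limit mentioned above. In practice the number of iterations would be chosen by a risk criterion such as cross validation rather than fixed in advance.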

## Descriptors

The successful application of SL requires a set of descriptor candidates that sufficiently explain the diversity of the phenomenon being learned. We distinguish between composition and structural descriptors. Composition descriptors are calculated from elemental properties and only require knowledge of a compound’s composition. Structural descriptors require knowledge of a compound’s specific structure and are calculated using DFT. The descriptors used for the final learned model and their relative influence (RI) are presented in Table 1.

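| Model | Rank | Descriptor | Underlying property | RI (%) |
|-------|------|------------|---------------------|--------|
| $K$ | 1 | $\log(V)$ | volume per atom | 46.6 |
| $K$ | 2 | $\mu_{1}(R_n)$ | row number | 24.5 |
| $K$ | 3 | $E_c$ | cohesive energy | 19.4 |
| $K$ | 4 | $\mu_{-4}(X)$ | electronegativity | 9.5 |
| $G$ | 1 | $E_c$ | cohesive energy | 37.0 |
| $G$ | 2 | $\log(V)$ | volume per atom | 35.9 |
| $G$ | 3 | $\mu_{-3}(R_n)$ | row number | 13.8 |
| $G$ | 4 | $\mu_{4}(X)$ | electronegativity | 13.3 |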

Table 1: Descriptor rank and relative influence (RI) for the $K$ and $G$ models. Composition descriptors are constructed as Hölder means $\mu_p(x)$ (power $p$, property $x$). This table and caption are from de Jong et al.'s SL paper [3], which also provides details on Hölder means.
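For readers unfamiliar with Hölder (power) means, the sketch below shows one common definition: $\mu_p(x) = \left(\sum_i w_i x_i^p\right)^{1/p}$ with normalized weights $w_i$, and the geometric mean as the $p = 0$ limit. The function name `holder_mean` is hypothetical, and weighting by atomic fraction is an assumption made here for illustration; see the SL paper for the exact construction used in the MP models. The example uses the Pauling electronegativities of Na (0.93) and Cl (3.16) in equal proportions.

```python
import numpy as np


def holder_mean(values, weights=None, p=1.0):
    """Weighted Hölder (power) mean of positive values; p = 0 gives the geometric mean."""
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Hölder means are defined here for positive values only")
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the weights so they sum to one
    if p == 0:
        return float(np.exp(np.sum(w * np.log(x))))
    return float(np.sum(w * x ** p) ** (1.0 / p))


# Pauling electronegativities of Na and Cl, weighted equally (assumed atomic fractions).
electronegativity = [0.93, 3.16]
fractions = [0.5, 0.5]
for p in (-4, 0, 1, 4):
    print(p, holder_mean(electronegativity, fractions, p=p))
```

Negative powers weight the mean toward the smallest value and positive powers toward the largest, so descriptors such as $\mu_{-4}(X)$ and $\mu_{4}(X)$ emphasize different elements in the same compound.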

## Accuracy

The accuracy of the $K$ and $G$ models is summarized in Table 2.

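| Model | Iteration threshold | Prediction RMSE (log(GPa)) | Predictions within 5% relative error (%) | Within 10% (%) | Within 20% (%) | Within 30% (%) |
|-------|---------------------|----------------------------|------------------------------------------|----------------|----------------|----------------|
| $K$ | 99 | 0.0750 | 33.1 | 58.4 | 87.3 | 94.5 |
| $G$ | 90 | 0.1378 | 13.6 | 28.8 | 53.0 | 73.0 |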

Table 2: Iteration threshold as determined by cross validation, prediction root mean squared error (RMSE), and percentage of predictions within 5, 10, 20, and 30 percent relative error for the $K$ and $G$ models. This table and caption are from de Jong et al.'s SL paper [3].
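To make the columns of Table 2 concrete, the sketch below computes the same kinds of metrics for a set of predictions. It is an illustration under assumptions, not the procedure used in the SL paper: the logarithm is taken as base 10, relative error is measured against the reference (DFT) value, and the function name `accuracy_summary` and the input numbers are hypothetical.

```python
import numpy as np


def accuracy_summary(reference, predicted, thresholds=(0.05, 0.10, 0.20, 0.30)):
    """Return the RMSE of log10(modulus) and the percentage of predictions
    whose relative error is within each threshold."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # RMSE in log space (assumed base 10), matching the log(GPa) unit in Table 2.
    rmse_log = float(np.sqrt(np.mean((np.log10(predicted) - np.log10(reference)) ** 2)))
    # Relative error with respect to the reference value (an assumption).
    relative_error = np.abs(predicted - reference) / reference
    within = {t: float(100.0 * np.mean(relative_error <= t)) for t in thresholds}
    return rmse_log, within


# Hypothetical reference and predicted moduli in GPa (not MP data).
reference = [100.0, 100.0, 100.0, 100.0]
predicted = [104.0, 108.0, 115.0, 150.0]
rmse_log, within = accuracy_summary(reference, predicted)
print(rmse_log, within)
```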

## Citations

To cite elastic constant predictions within the Materials Project, please reference the following works:

1. de Jong M, Chen W, Notestine R, Persson K, Ceder G, Jain A, Asta M, and Gamst A (2016) A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds. Scientific Reports 6: 34256. doi:10.1038/srep34256
2. de Jong M, Chen W, Angsten T, Jain A, Notestine R, Gamst A, Sluiter M, Ande CK, van der Zwaag S, Plata JJ, Toher C, Curtarolo S, Ceder G, Persson KA, Asta M (2015) Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2: 150009. doi:10.1038/sdata.2015.9

## Authors

1. Randy Notestine
2. Maarten de Jong
3. Kyle Bystrom

## References

1. de Jong M, Chen W, Angsten T, Jain A, Notestine R, Gamst A, Sluiter M, Ande CK, van der Zwaag S, Plata JJ, Toher C, Curtarolo S, Ceder G, Persson KA, Asta M (2015) Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2: 150009.

2. Hill, R. The elastic behaviour of a crystalline aggregate. Proceedings of the Physical Society. Section A 65, 349 (1952).

3. de Jong M, Chen W, Notestine R, Persson K, Ceder G, Jain A, Asta M, and Gamst A (2016) A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds, Scientific Reports 6: 34256. doi:10.1038/srep34256

4. Hastie, T., Tibshirani, R. & Friedman, J. The elements of statistical learning: data mining, inference, and prediction. (Springer, 2011), second edn.