# Elasticity Predictions

## Introduction
The Materials Project (MP) provides a growing collection of elastic constants calculated from first-principles density functional theory (DFT). Please see Elasticity calculations for details regarding the MP elasticity workflow ^{1}.
For compounds that have not yet been processed with the elasticity workflow, the MP offers statistical learning (SL) predictions of the Voigt-Reuss-Hill ^{2} average bulk and shear moduli (\(K_{VRH}\) and \(G_{VRH}\), respectively). The SL models were trained on a diverse set of 1,940 k-nary compounds, using the \(K_{VRH}\) and \(G_{VRH}\) moduli calculated by the elasticity workflow.
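For reference, the Voigt-Reuss-Hill average is simply the arithmetic mean of the Voigt (upper) and Reuss (lower) bounds on the polycrystalline modulus ^{2}. A minimal sketch (the numeric values are illustrative, not taken from the MP database):

```python
def vrh_average(voigt, reuss):
    """Voigt-Reuss-Hill average: the arithmetic mean of the Voigt (upper)
    and Reuss (lower) bounds on a polycrystalline elastic modulus."""
    return 0.5 * (voigt + reuss)

# Illustrative bulk-modulus bounds in GPa (not real MP data):
k_vrh = vrh_average(120.0, 100.0)  # -> 110.0
```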
IMPORTANT NOTE: The SL predictions are generated by a local polynomial regression of several composition and structural descriptors (see Table 1) against available DFT-calculated elasticity data. They are not directly physics-based predictions and should not be viewed as having accuracy or precision comparable to experimental or DFT data. See the SL paper ^{3} and Table 2 for details on prediction accuracy. In addition, the SL predictions are based on only a fraction of the currently available DFT elasticity data; more sophisticated SL models trained on larger datasets are an objective for future work.
## Formalism
Ensemble statistical learning techniques construct a predictor from a collection, or ensemble, of weak learners. Each weak learner is either a single descriptor or a function of just a few descriptors, which limits the level of interaction between descriptors. Gradient boosting (GB) is a very flexible ensemble technique that makes few assumptions regarding the form of the solution and iteratively builds a predictor from a series of weak learners while minimizing the residual of a loss function ^{4}. GB implementations use regularization techniques to reduce the risk of overfitting; these typically include limiting the level of interaction between descriptors, stopping the iterations according to a risk criterion, and employing shrinkage ^{4}. At each iteration, the weak learner that most reduces the residual of the loss function is selected and added to the model; when shrinkage is employed, each new term is attenuated by the learning rate. See the SL paper ^{3} and references therein for details of the regression model.
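The fit-residuals-and-shrink loop described above can be sketched from scratch for squared-error loss with one-dimensional decision stumps as weak learners. This is a toy illustration of the general GB idea, not the implementation used by MP; the stump learner, learning rate, and iteration count are all illustrative:

```python
def fit_stump(x, residuals):
    """Return the 1-D threshold stump minimizing squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not right:  # no split possible at the largest value
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # degenerate case: all x identical
        return lambda xi: 0.0
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm


def gradient_boost(x, y, n_iter=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current
    residuals and attenuated by the learning rate (shrinkage)."""
    base = sum(y) / len(y)                 # start from the mean response
    stumps, pred = [], [base] * len(y)
    for _ in range(n_iter):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stumps.append(fit_stump(x, residuals))
        pred = [pi + learning_rate * stumps[-1](xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)
```

Because each stump is attenuated by the learning rate, no single weak learner dominates; many small corrections accumulate into the final predictor, which is what reduces the risk of overfitting.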
## Descriptors
The successful application of SL requires a set of descriptor candidates that sufficiently explain the diversity of the phenomenon being learned. We distinguish between composition and structural descriptors. Composition descriptors are calculated from elemental properties and only require knowledge of a compound’s composition. Structural descriptors require knowledge of a compound’s specific structure and are calculated using DFT. The descriptors used for the final learned model and their relative influence (RI) are presented in Table 1.
| Model | Rank | Descriptor | Underlying property | RI (%) |
|-------|------|------------|---------------------|--------|
| \(K\) | 1 | \(\log(V)\) | volume per atom | 46.6 |
| | 2 | \(\mu_1(R_n)\) | row number | 24.5 |
| | 3 | \(E_c\) | cohesive energy | 19.4 |
| | 4 | \(\mu_4(X)\) | electronegativity | 9.5 |
| \(G\) | 1 | \(E_c\) | cohesive energy | 37.0 |
| | 2 | \(\log(V)\) | volume per atom | 35.9 |
| | 3 | \(\mu_3(R_n)\) | row number | 13.8 |
| | 4 | \(\mu_4(X)\) | electronegativity | 13.3 |
Table 1: Descriptor rank and relative influence (RI) for the models for \(K\) and \(G\). Composition descriptors are constructed as Hölder means \(\mu_p(x)\) (power \(p\), property \(x\)). This table and caption are from de Jong et al.'s SL paper ^{3}, which also provides details on Hölder means.
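The Hölder (power) mean \(\mu_p(x)\) of a set of elemental property values reduces to the arithmetic mean for \(p = 1\), the geometric mean in the limit \(p \to 0\), and the harmonic mean for \(p = -1\). A minimal unweighted sketch (the SL paper's descriptors may weight elements by composition; the property values below are illustrative):

```python
import math


def holder_mean(values, p):
    """Unweighted Hölder (power) mean with exponent p.
    p = 1: arithmetic mean; p = 0: geometric mean (limit); p = -1: harmonic mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Illustrative: mu_4 of two electronegativity-like values weights the
# larger value more heavily than the arithmetic mean (mu_1) does.
mu4 = holder_mean([1.0, 3.0], 4)
mu1 = holder_mean([1.0, 3.0], 1)
```

Varying the exponent \(p\) thus yields a family of composition descriptors from a single elemental property, which is how the \(\mu_1\), \(\mu_3\), and \(\mu_4\) entries of Table 1 arise.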
## Accuracy
The accuracy of the model is summarized below.
| Model | Iteration threshold | Prediction RMSE (log(GPa)) | Within 5% | Within 10% | Within 20% | Within 30% |
|-------|---------------------|----------------------------|-----------|------------|------------|------------|
| \(K\) | 99 | 0.0750 | 33.1 | 58.4 | 87.3 | 94.5 |
| \(G\) | 90 | 0.1378 | 13.6 | 28.8 | 53.0 | 73.0 |
Table 2: Iteration threshold as determined by cross-validation, prediction root-mean-squared error (RMSE), and percentage of predictions within 5, 10, 20, and 30 percent relative error for the \(K\) and \(G\) models. This table and caption are from de Jong et al.'s SL paper ^{3}.
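The two accuracy metrics in Table 2, RMSE of the base-10 logarithm of the modulus and the percentage of predictions within a relative-error tolerance, can be computed from paired predicted and DFT-calculated moduli. A sketch with hypothetical function names, assuming moduli in GPa:

```python
import math


def log_rmse(predicted, actual):
    """RMSE of log10(modulus), matching the log(GPa) units of Table 2."""
    n = len(predicted)
    return math.sqrt(sum((math.log10(p) - math.log10(a)) ** 2
                         for p, a in zip(predicted, actual)) / n)


def pct_within(predicted, actual, tol):
    """Percentage of predictions whose relative error is at most tol."""
    hits = sum(abs(p - a) / a <= tol for p, a in zip(predicted, actual))
    return 100.0 * hits / len(predicted)
```

For example, `pct_within(predicted, actual, 0.10)` corresponds to the "Within 10%" column of Table 2.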
## Citations
To cite elastic constant predictions within the Materials Project, please reference the following works:
- "de Jong M, Chen W, Notestine R, Persson K, Ceder G, Jain A, Asta M, and Gamst A (2016) A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds, Scientific Reports 6: 34256." doi:10.1038/srep34256
- "de Jong M, Chen W, Angsten T, Jain A, Notestine R, Gamst A, Sluiter M, Ande CK, van der Zwaag S, Plata JJ, Toher C, Curtarolo S, Ceder G, Persson KA, Asta M (2015) Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2: 150009." doi:10.1038/sdata.2015.9
## Authors
- Randy Notestine
- Maarten de Jong
- Kyle Bystrom
## References

1. de Jong M, Chen W, Angsten T, Jain A, Notestine R, Gamst A, Sluiter M, Ande CK, van der Zwaag S, Plata JJ, Toher C, Curtarolo S, Ceder G, Persson KA, Asta M (2015) Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2: 150009. doi:10.1038/sdata.2015.9
2. Hill R (1952) The elastic behaviour of a crystalline aggregate. Proceedings of the Physical Society. Section A 65: 349.
3. de Jong M, Chen W, Notestine R, Persson K, Ceder G, Jain A, Asta M, Gamst A (2016) A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds. Scientific Reports 6: 34256. doi:10.1038/srep34256
4. Hastie T, Tibshirani R, Friedman J (2011) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer.