Data Builders

Description of builders used to produce Materials Project data.

Builders that produce the MongoDB collections associated with specific categories of materials data are implemented in the emmet-builders software package. Each builder takes a set of MongoDB collections as input, extracts and transforms data from them, and writes the result to a new output collection. The schemas of these input and output collections are defined by a set of standardized document models. We structure these document models with the Python library pydantic and store them in the emmet-core software package. The same models also define the schema for the Materials Project API, whose server-side code is implemented in emmet-api.
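The extract, transform, and output flow described above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not emmet-builders code: in-memory lists of dicts stand in for MongoDB collections, a stdlib dataclass stands in for a pydantic document model, and every name here (`SummaryDoc`, `run_builder`, and the fields `task_id`, `formula`, `energy`, `nsites`) is hypothetical. The three-stage split loosely mirrors the `get_items` / `process_item` / `update_targets` structure that maggma builders use.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Iterator, List


# Hypothetical document model. emmet-core defines its models with pydantic;
# a stdlib dataclass is used here only to keep the sketch dependency-free.
@dataclass
class SummaryDoc:
    material_id: str
    formula: str
    energy_per_atom: float

    def __post_init__(self) -> None:
        # Minimal schema check standing in for pydantic validation.
        if not self.material_id:
            raise ValueError("material_id must be non-empty")


def get_items(source: List[Dict]) -> Iterator[Dict]:
    """Extract: yield raw documents from the input 'collection'."""
    yield from source


def process_item(item: Dict) -> Dict:
    """Transform: build a validated output document from one input document."""
    doc = SummaryDoc(
        material_id=item["task_id"],
        formula=item["formula"],
        energy_per_atom=item["energy"] / item["nsites"],
    )
    return asdict(doc)


def update_targets(target: List[Dict], docs: Iterable[Dict]) -> None:
    """Output: write the transformed documents to the target 'collection'."""
    target.extend(docs)


def run_builder(source: List[Dict], target: List[Dict]) -> None:
    """Chain the three stages over every input document."""
    update_targets(target, (process_item(item) for item in get_items(source)))


if __name__ == "__main__":
    tasks = [{"task_id": "task-1", "formula": "Si2", "energy": -10.8, "nsites": 2}]
    summary: List[Dict] = []
    run_builder(tasks, summary)
    print(summary)
```

Because every output document passes through the model before it is written, a malformed input fails loudly at build time instead of producing a collection that violates the schema.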

To browse our document models, see the emmet-core package. For example, ThermoDoc is the document model that powers both the ThermoBuilder and the Materials Project's thermo API endpoint.
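To illustrate how one document model can serve both a builder and an API endpoint, here is a hedged sketch. `ThermoLikeDoc` is a hypothetical stand-in, not emmet's ThermoDoc, and its fields are assumptions chosen for illustration; `build_thermo` and `thermo_endpoint` are likewise hypothetical names, and a stdlib dataclass replaces pydantic so the example runs without dependencies. The point it shows is that the builder writes documents through the model and the API reads them back through the same model, so both sides agree on the schema.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Dict, List


@dataclass
class ThermoLikeDoc:
    """Hypothetical stand-in for a thermo document model (not emmet's ThermoDoc)."""

    material_id: str
    formation_energy_per_atom: float
    energy_above_hull: float

    @classmethod
    def from_dict(cls, data: Dict) -> "ThermoLikeDoc":
        # Keep only the fields the schema declares; a missing field raises TypeError.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})


def build_thermo(raw: Dict) -> Dict:
    """Builder side: emit a document that conforms to the shared model."""
    return asdict(ThermoLikeDoc.from_dict(raw))


def thermo_endpoint(store: List[Dict], material_id: str) -> str:
    """API side: re-validate the stored document with the same model, then serialize."""
    for data in store:
        if data["material_id"] == material_id:
            return json.dumps(asdict(ThermoLikeDoc.from_dict(data)))
    raise KeyError(material_id)


if __name__ == "__main__":
    store = [
        build_thermo(
            {
                "material_id": "example-1",
                "formation_energy_per_atom": -0.5,
                "energy_above_hull": 0.0,
                "_internal": 1,  # dropped: not part of the schema
            }
        )
    ]
    print(thermo_endpoint(store, "example-1"))
```

In this sketch, changing a field on the model changes both what the builder writes and what the endpoint returns, which is the property that makes a single shared schema useful.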

The figure below illustrates the entire Materials Project build pipeline including builders and all input/output collections:

For information on how to run any of the emmet builders, see the Running Builders section of the maggma software package documentation; maggma defines much of the core builder-related code and the command-line interface.
