# Data Builders

Builders that produce MongoDB collections for specific categories of materials data are implemented in the [emmet-builders](https://github.com/materialsproject/emmet/tree/main/emmet-builders) software package. Each builder takes a set of MongoDB collections as input, extracts and transforms data from them, and writes the result to a new collection. The schemas of the input and output collections are defined by a set of standardized document models. We use the Python library [pydantic](https://pydantic-docs.helpmanual.io) to structure these document models, and we store them in the [emmet-core](https://github.com/materialsproject/emmet/tree/main/emmet-core) software package. The same models also define the schema for the Materials Project API, whose server-side code is implemented in [emmet-api](https://github.com/materialsproject/emmet/tree/main/emmet-api).
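The extract, transform, and output flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from emmet: `ToyThermoDoc`, `ToyThermoBuilder`, and the field names are hypothetical, in-memory lists stand in for MongoDB collections, and a standard-library dataclass stands in for a pydantic document model. The method names follow the `get_items` / `process_item` / `update_targets` split used by maggma's `Builder` class.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class ToyThermoDoc:
    """Hypothetical output schema; a stand-in for a pydantic document model."""

    material_id: str
    formula: str
    energy_per_atom: float  # eV/atom


class ToyThermoBuilder:
    """Hypothetical builder: reads one input collection, writes one output collection."""

    def __init__(self, source: List[Dict], target: List[Dict]) -> None:
        self.source = source  # stand-in for an input MongoDB collection
        self.target = target  # stand-in for the output MongoDB collection

    def get_items(self) -> Iterator[Dict]:
        # Extract: yield only the raw documents that carry the fields we need.
        for doc in self.source:
            if "energy" in doc and doc.get("nsites", 0) > 0:
                yield doc

    def process_item(self, item: Dict) -> Dict:
        # Transform: derive new fields and shape them with the document model.
        doc = ToyThermoDoc(
            material_id=item["material_id"],
            formula=item["formula"],
            energy_per_atom=item["energy"] / item["nsites"],
        )
        return asdict(doc)

    def update_targets(self, items: Iterable[Dict]) -> None:
        # Output: write the processed documents to the target collection.
        self.target.extend(items)

    def run(self) -> None:
        self.update_targets(self.process_item(i) for i in self.get_items())


tasks = [
    {"material_id": "toy-1", "formula": "NaCl", "energy": -8.0, "nsites": 2},
    {"material_id": "toy-2", "formula": "Si", "nsites": 2},  # no energy: skipped
]
thermo: List[Dict] = []
ToyThermoBuilder(tasks, thermo).run()
print(thermo)
```

Because the output documents are shaped by a single model class, the same class could in principle describe an API response, which is the role the emmet-core models play for emmet-builders and emmet-api.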

To browse the document models defined in emmet, see the [emmet-core source](https://github.com/materialsproject/emmet/tree/main/emmet-core/emmet/core). For example, the [`ThermoDoc`](https://github.com/materialsproject/emmet/blob/main/emmet-core/emmet/core/thermo.py) document model powers both the [`ThermoBuilder`](https://github.com/materialsproject/emmet/blob/main/emmet-builders/emmet/builders/vasp/thermo.py) and the Materials Project's [thermo API endpoint](https://api.materialsproject.org/docs#/Thermo/get_by_key_thermo__material_id___get).

The figure below illustrates the entire Materials Project build pipeline including builders and all input/output collections:

![](/files/Z7S7Yi7NyHY1dFVx4voq)

For information on how to run any of the emmet builders, see the [Running Builders](https://materialsproject.github.io/maggma/getting_started/running_builders/) section of the documentation for maggma, the software package that defines much of the core builder code and the CLI.

