
Access-controlled Data

Not all data products provided by the Materials Project are subject to the same terms of use.


A full list of the terms of use that apply to each Materials Project data product is available at: Materials Project - Terms of Use

The mp-api Python client for the Materials Project API handles data access automatically, according to the terms of use you agreed to at account registration and any further terms you accept later (e.g., opting into using the GNoME dataset for non-commercial uses).

Retrieving data directly from the Materials Project's data lakehouse bypasses the data access control convenience features provided by the mp-api Python client. This is perfectly acceptable, but you as a user remain responsible for abiding by the Materials Project's terms of use linked above.

For convenience, a list of relevant metadata filtering methods (similar to the methods used in mp-api) is provided here for each Materials Project data product that has special access restrictions.

Graph Networks for Materials Exploration Database (GNoME)

The GNoME dataset is subject to the following terms: GNoME Database License and Terms of Use

A subset of the data in the following data product endpoints is affected by these terms: tasks, tasks-build, summary, and materials.

The following batch_id values correspond to documents associated with the GNoME dataset:

["gnome_r2scan_statics"] 

For example, if you have a full copy of the tasks table (or tasks-build) stored locally as an Arrow dataset, the following snippet creates a new dataset with all GNoME entries removed:

import pyarrow.dataset as ds
import pyarrow.compute as pc

gnome_batch_ids = ["gnome_r2scan_statics"]

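# Replace the placeholder string with the path to your local dataset
dataset_with_gnome = ds.dataset("<PATH TO LOCAL DATASET>")
# Keep only rows whose batch_id is not one of the GNoME batch IDs
filter_expr = pc.invert(pc.field("batch_id").isin(gnome_batch_ids))
no_gnome_dataset = dataset_with_gnome.filter(filter_expr)
# write new dataset, sort, groupby, etc.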
...

A similar process can be applied to the summary and materials tables; only the field name in the filter expression needs to be adjusted.
