Not all data products provided by the Materials Project are subject to the same terms of use.
The full list of terms of use that apply to each Materials Project data product is available at: Materials Project - Terms of Use
The mp-api Python client for the Materials Project API automatically handles data access according to the terms of use you agreed to at account registration, along with any further terms you accept later (e.g., opting into the GNoME dataset for non-commercial use).
Retrieving data directly from the Materials Project's data lakehouse bypasses the data access controls that the mp-api Python client provides as a convenience. This is perfectly acceptable, but you as a user remain responsible for abiding by the Materials Project's terms of use linked above.
For convenience, a list of relevant metadata filtering methods (similar to those used in mp-api) is provided here for each Materials Project data product that has special access restrictions.
Graph Networks for Materials Exploration Database (GNoME)
A subset of the data in the following data product endpoints is affected by this set of terms: tasks, tasks-build, summary, and materials.
The following batch_ids correspond to documents associated with the GNoME dataset:
["gnome_r2scan_statics"]
For example, if you have a full copy of the tasks (or tasks-build) table stored locally as an Arrow dataset, the following snippet creates a new dataset with all GNoME entries removed:
import pyarrow.dataset as ds
import pyarrow.compute as pc

gnome_batch_ids = ["gnome_r2scan_statics"]

dataset_with_gnome = ds.dataset(<PATHTOLOCALDATASET>)

filter_expr = pc.invert(pc.field("batch_id").isin(gnome_batch_ids))
no_gnome_dataset = dataset_with_gnome.filter(filter_expr)

# write new dataset, sort, groupby, etc....
A similar process can be applied to the summary and materials tables; only the field name in the filter expression needs to be adjusted: