Tips for Large Downloads
Tips for downloading large datasets from the API
The Materials Project API rate-limits requests, with limits starting at 25 requests per second. Below are a few tips for downloading large datasets quickly without hitting the limit:
To get data for multiple materials, pass query parameter values as lists where available. For example, avoid looping over material ID values and calling get_data_by_id for each individual material. Instead, pass the material IDs as a list to the search method.

Before requesting data, use the has_props key to find which materials have data for your desired property. Queries are wasted when data is requested for materials that either do not exist or do not contain the property of interest, so first determine which materials have the data you are looking for. For example, you can query for the material ID values of all entries that have dielectric and density of states data.
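The two tips above can be sketched as follows. This is a minimal illustration, not the official example: it assumes a recent mp-api release in which the summary endpoint is reachable as `mpr.materials.summary` and `HasProps` is importable from `emmet.core.summary`. The helper names `ids_with_props`, `fetch_by_ids`, and `main` are hypothetical and introduced only for this sketch.

```python
def ids_with_props(summary, has_props):
    """Return the IDs of materials that have data for the listed properties.

    `summary` is assumed to be the summary endpoint of an open MPRester
    (mpr.materials.summary in recent mp-api releases).
    """
    # Request only material_id: this is a discovery query, not a data download.
    docs = summary.search(has_props=has_props, fields=["material_id"])
    return [str(doc.material_id) for doc in docs]


def fetch_by_ids(summary, material_ids, fields):
    """Fetch the given fields for many materials with a single search call."""
    # One list-valued query replaces a loop of per-material get_data_by_id calls.
    return summary.search(material_ids=list(material_ids), fields=list(fields))


def main(api_key):
    """Example wiring against the real client (requires mp-api and an API key)."""
    # Third-party imports are kept local so the helpers above can be used
    # and tested without mp-api installed.
    from emmet.core.summary import HasProps
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        summary = mpr.materials.summary  # assumed endpoint path
        ids = ids_with_props(summary, [HasProps.dielectric, HasProps.dos])
        return fetch_by_ids(summary, ids, ["material_id", "volume"])
```

Calling `main("your_api_key_here")` would issue two search requests in total, regardless of how many materials match, instead of one request per material.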
Restrict the data returned to the specific fields of interest, to the extent possible. This reduces the load on our servers and greatly speeds up data retrieval. For example, if you are only interested in the material ID, volume, and list of elements, you can pass those values to the fields argument.

For large, long-running, or frequently duplicated queries, we ask that you retrieve the data using the API only once and work from a local copy. This will speed up your own analyses and also avoid unnecessary load on the Materials Project servers. Additionally, we've optimized downloads of full collections, so they are often faster and more efficient than providing long lists of material_ids or fields.
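As a minimal sketch of the last two tips, the code below downloads only three fields for the whole summary collection and keeps a local JSON copy, so the API is queried once. It assumes a recent mp-api release where `mpr.materials.summary.search` called without filters returns the full collection. The helper names `load_or_download` and `download_summary` and the cache file name are hypothetical, chosen only for this illustration.

```python
import json
from pathlib import Path


def load_or_download(cache_path, download):
    """Return cached records if the file exists; otherwise download once and cache."""
    path = Path(cache_path)
    if path.exists():
        # Later runs read the local copy and never touch the API.
        return json.loads(path.read_text())
    records = download()
    path.write_text(json.dumps(records))
    return records


def download_summary(api_key):
    """Fetch material ID, volume, and elements for the full summary collection."""
    # Imported here so the caching helper works without mp-api installed.
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        # Restricting `fields` keeps the response small; the endpoint path
        # mpr.materials.summary is an assumption about the installed version.
        docs = mpr.materials.summary.search(
            fields=["material_id", "volume", "elements"]
        )
    # Convert to plain types so the records can be written as JSON.
    return [
        {
            "material_id": str(doc.material_id),
            "volume": doc.volume,
            "elements": [str(el) for el in doc.elements],
        }
        for doc in docs
    ]
```

A typical call would be `load_or_download("mp_summary.json", lambda: download_summary("your_api_key_here"))`. Deleting the cache file forces a fresh download the next time the function runs.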
We are currently in the process of developing additional tools for users to download extremely large datasets more easily. If you have further questions, please contact support@materialsproject.org specifying in detail the issue you are facing and listing the steps you have taken to try to resolve the issue.