Last updated
To make our data as accessible as possible (in line with the FAIR principles), significantly improve data downloads, and take pressure off our servers, we are making a growing list of our data products available through the AWS OpenData Program. Also see the entries for MP-managed data on the or the . Usage of all data provided through our API or directly through OpenData is subject to our terms of use.
MP data is organized in 3 buckets named materialsproject-{raw,parsed,build}. Note that the particular organization of our data in these buckets is still in flux and can change without notice as we integrate them into our cloud infrastructure.
We are in the process of providing VASP output files for our calculations in the raw bucket. Look out for announcements through our email lists and notifications on our website.
The parsed bucket contains objects that MP generates by parsing the VASP output files. These objects form the basis for our builder pipelines, which create the derived high-level data collections served through the MP API and website. All S3 objects in this bucket are serialized pymatgen or emmet Python objects, and most are stored as gzip-compressed JSON files, one per MP ID (i.e. <prefix>/<mp-id>.json.gz). We are in the process of grouping documents into JSON Lines (JSONL) files to reduce the number of files and significantly improve transfer speeds. The tasks prefix is now organized by nelements/output.spacegroup.number and a timestamp (dt) derived from the earliest completed_at in the list of tasks included in the respective object. The latest set of tasks is in the /tasks_atomate2 prefix.
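Once a file from this bucket has been downloaded, it can be read with the Python standard library alone. The following is a minimal sketch, assuming the file is either a single gzip-compressed JSON document or a gzip-compressed JSONL file with one document per line; the function names are illustrative and not part of any MP package.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def read_json_gz(path: Path) -> Any:
    """Load one gzip-compressed JSON document (e.g. <mp-id>.json.gz)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def iter_jsonl_gz(path: Path) -> Iterator[Any]:
    """Yield one document per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between documents
                yield json.loads(line)
```

Both helpers return plain dictionaries and lists. Turning them back into pymatgen or emmet objects is a separate step that depends on how the documents were serialized, so it is left out of this sketch.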
All objects for a prefix can be downloaded, using the format
The build bucket contains the high-level derived data that forms the source for the collections available through the MP API, as well as pre-built objects and images for efficient visualization on the website.
The collections and pre-built objects are versioned by the database release date, and individual documents are grouped into gzip-compressed JSONL files. Images are stored in PNG format. Use the AWS CLI's ls command or the bucket's web interface to list the categories available under each prefix (see below).
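Because releases are named by date, the most recent one can be picked programmatically from a listing. The sketch below assumes the listing entries end in a YYYY-MM-DD component, as the release dates shown on this page do; latest_release is a hypothetical helper name.

```python
from datetime import date
from typing import Iterable


def latest_release(entries: Iterable[str]) -> str:
    """Return the most recent YYYY-MM-DD release name among listing entries."""
    releases = []
    for entry in entries:
        # Use the last path component, e.g. "collections/2024-11-14/" -> "2024-11-14".
        name = entry.strip("/").split("/")[-1]
        try:
            releases.append((date.fromisoformat(name), name))
        except ValueError:
            continue  # skip entries that are not release dates
    if not releases:
        raise ValueError("no date-named releases found")
    return max(releases)[1]
```

For example, latest_release(["collections/2022-10-28/", "collections/2023-11-01/", "collections/2024-11-14/"]) returns "2024-11-14".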
The mp-api Python client internally uses direct downloads from the OpenData repositories to improve convenience and efficiency. All data in MP's OpenData buckets can also be downloaded directly using the AWS CLI.
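For a direct download without any AWS tooling, a single object can in principle be fetched over HTTPS. This is only a sketch under several assumptions: that the object follows the <prefix>/<mp-id>.json.gz layout described above, that the bucket serves anonymous requests through the standard virtual-hosted S3 endpoint, and that the prefix and MP ID used in the usage comment actually exist. All function names are illustrative.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen


def object_key(prefix: str, mp_id: str) -> str:
    """Build a key following the <prefix>/<mp-id>.json.gz layout."""
    return f"{prefix.strip('/')}/{mp_id}.json.gz"


def object_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style S3 URL (assumes anonymous access is allowed)."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def download(url: str, destination: str) -> None:
    """Stream the object at `url` to a local file without loading it into memory."""
    with urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Hypothetical usage; the prefix and MP ID are placeholders:
# key = object_key("/dos", "mp-149")
# download(object_url("materialsproject-parsed", key), "mp-149.json.gz")
```

Since the bucket layout is still in flux, it is safer to list a prefix first and take keys from the listing than to construct them by hand.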
Start by exploring the contents of the bucket you're interested in, either by navigating to the bucket's web interface or by using the CLI's ls command.
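The same exploration can be done from Python. The following is a minimal sketch using boto3, assuming the buckets permit anonymous (unsigned) listing; make_anonymous_client and list_subprefixes are illustrative names, not part of mp-api.

```python
from typing import Any, Iterator


def make_anonymous_client() -> Any:
    """Create an S3 client that sends unsigned (anonymous) requests."""
    # Imported lazily so list_subprefixes stays usable with any compatible client.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_subprefixes(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield the "directories" directly under `prefix`, similar to the CLI's ls."""
    paginator = client.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 group deeper keys into CommonPrefixes entries.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


# Hypothetical usage (requires network access and boto3):
# client = make_anonymous_client()
# for sub in list_subprefixes(client, "materialsproject-build", "collections/"):
#     print(sub)
```

Passing the client in as an argument keeps the listing logic independent of how credentials are configured, so the same function works with a signed client if anonymous access turns out not to be enabled.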
materialsproject-build

Prefix            Version        Objects        Size
/collections      /2022-10-28    12.6k          2.8 GB
/collections      /2023-11-01    18.4k          6.1 GB
/collections      /2024-11-14    213k           83 GB
/objects          /2022-10-28    289k           55.9 GB
/images           N/A            200k           58 GB

materialsproject-parsed

Prefix            Objects        Size
/dos              692k           63.2 GB
/bandstructures   705k           1.5 TB
/chgcars          415k           7.7 TB
/aeccar{0,1,2}s   138.8k each    1.1 TB each
/elfcars          107.5k         101 GB
/locpots          158k           2.5 TB
/tasks            1556           34 GB
/tasks_atomate2   1286           28 GB
MP data is also available through the AWS OpenData Program.