Materials Project Documentation

AWS OpenData

MP data is also available through the AWS OpenData Program.

Last updated 5 months ago

In an effort to make our data as accessible as possible (in line with the FAIR principles), to significantly speed up data downloads, and to take pressure off our servers, we are making a growing list of our data products available through the AWS OpenData Program. Also see the entries for MP-managed data on the AWS OpenData Registry or the AWS Data Exchange. Usage of all data provided through our API or directly through OpenData is subject to our terms.

Overview

MP data is organized in three S3 buckets named materialsproject-raw, materialsproject-parsed, and materialsproject-build (materialsproject-{raw,parsed,build}). Note that the exact organization of our data in these buckets is still in flux and can change without notice as we integrate the buckets into our cloud infrastructure.
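The three bucket names follow a single pattern, so bash brace expansion can generate them. A minimal sketch, assuming bash; mp_bucket_uris is a hypothetical helper name, not part of any MP tool:

```shell
# Hypothetical helper: print the S3 URI of each MP OpenData bucket.
# Bash expands materialsproject-{raw,parsed,build} into three bucket names.
mp_bucket_uris() {
  local bucket
  for bucket in materialsproject-{raw,parsed,build}; do
    printf 's3://%s/\n' "$bucket"
  done
}

# Example (needs the AWS CLI and network access): list the top level of every bucket.
#   mp_bucket_uris | xargs -n 1 aws s3 ls --no-sign-request
```

While VASP outputs are still being added to the raw bucket, its listing may be sparse.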

raw data

We are in the process of providing VASP output files for our calculations in the raw bucket. Look out for announcements through our email lists and notifications on our website.

parsed data

The parsed bucket contains objects that MP generates by parsing the VASP output files. These objects form the basis for our builder pipelines, which create the derived high-level data collections served through the MP API and website. All S3 objects in this bucket are serialized pymatgen or emmet python objects, and most are stored as one gzip-compressed JSON file per MP ID (i.e. <prefix>/<mp-id>.json.gz). We are in the process of grouping documents into JSON Lines (JSONL) files to reduce the number of files and significantly improve transfer speeds. Tasks are now organized by nelements/output.spacegroup.number and a timestamp (dt) derived from the earliest completed_at in the list of tasks included in the respective object. The latest set of tasks is in the /tasks_atomate2 prefix (see the database changelog for details).

prefix             # objects     size
-----------------  ------------  -----------
/dos               692k          63.2 GB
/bandstructures    705k          1.5 TB
/chgcars           415k          7.7 TB
/aeccar{0,1,2}s    138.8k each   1.1 TB each
/elfcars           107.5k        101 GB
/locpots           158k          2.5 TB
/tasks             1556          34 GB
/tasks_atomate2    1286          28 GB
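For prefixes that still hold one file per MP ID, the object's S3 URI follows directly from the <prefix>/<mp-id>.json.gz pattern described above. A minimal sketch, assuming bash; mp_parsed_uri is a hypothetical helper, and mp-149 in the example comment is only an illustrative ID whose presence under /dos is an assumption:

```shell
# Hypothetical helper: build the S3 URI of a per-MP-ID object in the parsed bucket,
# following the <prefix>/<mp-id>.json.gz layout. Accepts "dos", "/dos" or "dos/".
mp_parsed_uri() {
  local prefix="${1#/}"   # drop one leading slash
  prefix="${prefix%/}"    # drop one trailing slash
  printf 's3://materialsproject-parsed/%s/%s.json.gz\n' "$prefix" "$2"
}

# Example (needs the AWS CLI and network access): stream one object to stdout
# ("-" as the destination) and peek at the start of the decompressed JSON.
#   aws s3 cp --no-sign-request "$(mp_parsed_uri /dos mp-149)" - | gzip -dc | head -c 500
```

Prefixes organized differently, such as /tasks and /tasks_atomate2, do not follow this per-ID pattern.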

build data

prefix         version       # objects   size
-------------  ------------  ----------  -------
/collections   /2022-10-28   12.6k       2.8 GB
/collections   /2023-11-01   18.4k       6.1 GB
/collections   /2024-11-14   213k        83 GB
/objects       /2022-10-28   289k        55.9 GB
/images        N/A           200k        58 GB
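The version prefixes in the table above are database release dates in YYYY-MM-DD form, so they sort chronologically as plain strings. The sketch below (bash; mp_latest_version is a hypothetical helper name) reads the output of aws s3 ls for a prefix on stdin, where sub-prefixes appear as "PRE <name>/" lines, and prints the newest dated version:

```shell
# Hypothetical helper: print the newest YYYY-MM-DD sub-prefix from `aws s3 ls` output.
mp_latest_version() {
  # Keep only "PRE <name>/" lines and strip the trailing slash,
  # then keep names shaped like YYYY-MM-DD. ISO dates sort
  # chronologically as strings, so the last one after sort is newest.
  awk '$1 == "PRE" { sub(/\/$/, "", $2); print $2 }' \
    | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' \
    | sort \
    | tail -n 1
}

# Example (needs the AWS CLI and network access):
#   version="$(aws s3 ls --no-sign-request s3://materialsproject-build/collections/ | mp_latest_version)"
#   aws s3 ls --no-sign-request "s3://materialsproject-build/collections/${version}/"
```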

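Explore & Download Data

The AWS CLI's ls command lists the contents of a bucket or prefix. The --no-sign-request flag sends anonymous requests, so no AWS account or credentials are needed. Use the format
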
aws s3 ls --no-sign-request s3://<bucket>/<prefix>/[<version>]/

# examples
aws s3 ls --no-sign-request s3://materialsproject-parsed/
aws s3 ls --no-sign-request s3://materialsproject-build/
aws s3 ls --no-sign-request s3://materialsproject-build/collections/2022-10-28/
aws s3 ls --no-sign-request s3://materialsproject-build/images/
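Several prefixes hold multiple terabytes, so it can be worth checking a prefix's total size before copying it. The ls command accepts --recursive, --human-readable and --summarize; the last of these appends "Total Objects" and "Total Size" lines to the listing. A minimal bash sketch that extracts the size line (s3_total_size is a hypothetical helper name):

```shell
# Hypothetical helper: read `aws s3 ls --summarize` output on stdin and print
# the value of its "Total Size:" line (e.g. "2.8 GiB" with --human-readable).
s3_total_size() {
  awk -F': ' '/Total Size:/ { print $2 }'
}

# Example (needs the AWS CLI and network access; --recursive walks every object):
#   aws s3 ls --no-sign-request --recursive --human-readable --summarize \
#       s3://materialsproject-build/collections/2022-10-28/ | s3_total_size
```

The recursive listing itself can take a while for prefixes with hundreds of thousands of objects, and --human-readable reports binary units (KiB, MiB, GiB, TiB).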

All objects under a prefix can be downloaded with the cp command, using the format

# the AWS CLI will parallelize download requests automatically
aws s3 cp --no-sign-request --recursive s3://<bucket>/<prefix>/ <output-dir>/

# examples
aws s3 cp --no-sign-request --recursive s3://materialsproject-parsed/tasks/ mp-tasks/
aws s3 cp --no-sign-request --recursive \
    s3://materialsproject-build/images/structures/ mp-images-structures/
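For very large prefixes, aws s3 sync can be more convenient than cp: it copies only objects that are missing locally or appear to differ (by size or modification time), so re-running the same command resumes an interrupted transfer. It also accepts --exclude/--include filters and --dryrun. A minimal bash wrapper, where mp_sync is a hypothetical helper name and the *.jsonl.gz filter in the example assumes the collection files carry that extension:

```shell
# Hypothetical helper: mirror one prefix of an MP bucket into a local directory.
# Usage: mp_sync BUCKET PREFIX OUTDIR [extra `aws s3 sync` options...]
mp_sync() {
  if [ "$#" -lt 3 ]; then
    echo "usage: mp_sync BUCKET PREFIX OUTDIR [aws s3 sync options...]" >&2
    return 2
  fi
  local bucket="$1" prefix="$2" outdir="$3"
  shift 3
  prefix="${prefix#/}"   # accept "/tasks" as well as "tasks"
  prefix="${prefix%/}"   # and "tasks/"
  # sync skips objects that already exist locally and appear unchanged,
  # so re-running the same call picks up where an interrupted run stopped.
  aws s3 sync --no-sign-request "s3://${bucket}/${prefix}/" "${outdir%/}/" "$@"
}

# Examples (need the AWS CLI and network access):
#   mp_sync materialsproject-parsed /tasks/ mp-tasks --dryrun   # preview only
#   mp_sync materialsproject-build collections/2022-10-28 mp-collections \
#       --exclude '*' --include '*.jsonl.gz'
```

Filters are applied in order, so --exclude '*' followed by --include keeps only the matching files.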

The build bucket contains the high-level derived data that forms the source for the collections available through the MP API, as well as pre-built objects and images for efficient visualization on the website.

The collections and pre-built objects are versioned by database release date, with individual documents grouped into gzip-compressed JSONL files. Images are stored in PNG format. Use the ls command of the AWS CLI or the bucket explorer to list the categories available under each prefix (see the ls examples above).
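In a JSON Lines file each line holds one complete JSON document, so standard shell tools are enough for a first look at a downloaded collection file before loading it into Python. A minimal bash sketch; jsonl_gz_count and jsonl_gz_head are hypothetical helper names, and the file path in the example is illustrative:

```shell
# Hypothetical helper: count the documents (lines) in a gzip-compressed JSONL file.
jsonl_gz_count() {
  # grep -c '' counts every line, including a final line without a trailing newline.
  gzip -dc "$1" | grep -c ''
}

# Hypothetical helper: print the first N documents (default 1), e.g. to inspect field names.
jsonl_gz_head() {
  gzip -dc "$1" | head -n "${2:-1}"
}

# Example, for a file already downloaded with `aws s3 cp` (path is illustrative):
#   jsonl_gz_count mp-collections/some-file.jsonl.gz
#   jsonl_gz_head mp-collections/some-file.jsonl.gz 2
```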

The mp-api python client internally uses direct downloads from the OpenData buckets for convenience and efficiency. All data in MP's OpenData buckets can also be downloaded directly using the AWS CLI.

A bucket's contents can also be explored without the CLI by navigating to the bucket's web interface (e.g. https://materialsproject-parsed.s3.amazonaws.com/index.html).
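The bucket web interface at https://materialsproject-parsed.s3.amazonaws.com/index.html is served from the bucket's public HTTPS endpoint, https://<bucket>.s3.amazonaws.com/. Assuming individual objects are served from the same endpoint (which anonymous --no-sign-request access suggests), a single file can also be fetched with curl when the AWS CLI is not installed. A minimal bash sketch of the URI conversion; s3_to_https is a hypothetical helper, and keys containing special characters would additionally need URL encoding:

```shell
# Hypothetical helper: turn s3://<bucket>/<key> into https://<bucket>.s3.amazonaws.com/<key>.
# Keys are not URL-encoded here; this is fine for plain names such as mp-149.json.gz.
s3_to_https() {
  local uri="$1"
  case "$uri" in
    s3://*) ;;
    *) echo "not an s3:// URI: $uri" >&2; return 1 ;;
  esac
  local rest="${uri#s3://}"
  local bucket="${rest%%/*}"
  local key=""
  # Everything after the first "/" (if any) is the object key.
  if [ "$rest" != "$bucket" ]; then
    key="${rest#*/}"
  fi
  printf 'https://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}

# Example (needs curl and network access; mp-149 under /dos is an illustrative, unverified key):
#   curl -fsSL "$(s3_to_https s3://materialsproject-parsed/dos/mp-149.json.gz)" | gzip -dc | head -c 500
```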
