Introduction to MP's contribution platform MPContribs
MPContribs provides a platform and application programming interface (API) to contribute computational as well as experimental data to Materials Project. Data on MPContribs is collectively maintained as annotations to existing MP materials (or formulas and chemical systems), and automatically exposed to over 200,000 MP users. The platform serves as the backbone for data and apps contributed to MP while leaving full ownership and control over the data with contributors. Contributed data is automatically shown on MP's materials details pages or its disambiguation pages for formulas and chemical systems. A dedicated landing page is provided for each MPContribs project, which can be used to reference the dataset in journal publications through Digital Object Identifiers (DOIs) provided by MP in collaboration with the DOE Office of Scientific and Technical Information (OSTI). The MPContribs Python client can be used to programmatically retrieve, upload, and modify contributed data.
See below for a list of current MPContribs deployments and an overview of its concepts. Continue with the following sections in MP's documentation to learn more:
The table below lists the various MPContribs portals currently available. When you explore data or consider contributing your data to MP, please pick the portal that best aligns with it.
Datasets that don't fall under the purview of the other portals
Benchmark datasets for Machine Learning
Datasets from beam lines at X-Ray light sources (NSLS-II, ALS, etc.)
Each MPContribs deployment is organized into projects. The MP account creating the project becomes its owner. An owner can ask for the MP accounts of their collaborators to be given access to their project. A collaborator assumes the same level of permissions within a project as the owner.
A project contains a list of contributions to existing MP materials (or alternatively to formulas and chemical systems). It's in the owner's purview to decide what exactly constitutes a project. Often this will simply be an umbrella for a dataset containing contributions to MP materials that are comparable in their scientific context and thus are consistent in their data schema.
Any MP account can create (or be an owner of) a maximum of 3 projects at any time. Project owners can immediately start adding up to 500 contributions to their project without approval from MP. To add more contributions, project owners or their collaborators can reach out to MPContribs administrators to obtain approval.
By default, projects are set to private, i.e. only visible to owners and their collaborators. Each individual contribution in a project is set to public by default and thus automatically released to the public when the project is published. Since the public/private flag can be controlled for each contribution individually, some contributions in a project can be kept private even if the project is public. The public/private state of a project and its contributions can be changed/reverted at any time.
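The interplay between the project-level and contribution-level flags can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the MPContribs implementation: the `Project` and `Contribution` classes, their field names, and the `publicly_visible` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    identifier: str
    is_public: bool = True  # contributions default to public


@dataclass
class Project:
    name: str
    is_public: bool = False  # projects default to private
    contributions: list = field(default_factory=list)


def publicly_visible(project):
    """Return identifiers of contributions the public can see.

    Hypothetical helper: a contribution is visible only if both the
    project and the contribution itself are flagged public.
    """
    if not project.is_public:
        return []
    return [c.identifier for c in project.contributions if c.is_public]


project = Project(name="example_project")
project.contributions = [
    Contribution("mp-1"),                   # public by default
    Contribution("mp-2", is_public=False),  # kept private individually
]

before = publicly_visible(project)  # project still private
project.is_public = True            # owner publishes the project
after = publicly_visible(project)   # only the public contribution is released
```

Publishing the project releases `mp-1` but not `mp-2`, and setting `project.is_public` back to `False` hides both again.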
A single contribution constitutes a small blob of data linked to the corresponding MP material through identifiers such as MP's materials IDs, formulas or chemical systems. In addition to these identifiers, each individual contribution can contain the following four components:
- A data component containing hierarchically organized key-value data (think nested dictionaries). In its flattened format, this component can contain a maximum of 50 keys/fields each of which becomes a column in the overview table on the project landing pages. Nested fields in the data dictionary are organized as nested columns on the landing page table. Any data types included in the data component become queryable, filterable and sortable using a wide variety of operators. Also see the API documentation for any MPContribs deployment for a generic list of available filters.
- A structures component containing a list of up to 10 pymatgen structures with optionally customized names. A string in the format used for Crystallographic Information Files (CIFs) is stored with each structure and can be retrieved through the API or downloaded through the project landing pages.
- A tables component containing a list of up to 10 pandas DataFrames. This component is intended for the inclusion of 2D spectra (think CSV files) with each contribution. A Plotly graph is generated for each table and included in the corresponding contribution detail page for visualization purposes. Each DataFrame's name and other attributes (title, axis labels, ...) needed to configure the Plotly graph can be controlled via the DataFrame's attrs attribute. The total number of table rows is stored and all table cells are formatted automatically. The API paginates the table rows for more efficient data retrieval. Each table can be downloaded as CSV programmatically or through the project landing pages.
- An attachments component containing a list of up to 10 MPContribs Attachments with customized names. Attachments can be gzipped text files (CSV, JSON, ...) or images in PNG, JPEG, GIF, or TIFF formats. An attachment can either be created directly from a file path or from a Python list or dictionary using the mpcontribs.client.Attachment.from_data() method. Each attachment can be up to 2.4 MB in size. Attachment metadata is queryable, but attachment contents are not (think e-mail attachments).
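To make the 50-key limit on the data component more concrete, the sketch below flattens a nested dictionary and counts the resulting fields. It is a minimal illustration under stated assumptions: the `flatten` and `check_data_component` helpers are hypothetical, the dot separator is an assumption about how nested keys might be joined, and the example values are made up.

```python
MAX_FLAT_KEYS = 50  # limit on flattened keys stated in the text above


def flatten(data, parent="", sep="."):
    """Flatten a nested dict into a single level of joined keys.

    The "." separator is an assumption made for this illustration.
    """
    flat = {}
    for key, value in data.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def check_data_component(data):
    """Return the flattened data, or raise if it exceeds the key limit."""
    flat = flatten(data)
    if len(flat) > MAX_FLAT_KEYS:
        raise ValueError(
            f"data component has {len(flat)} flattened keys; "
            f"the maximum is {MAX_FLAT_KEYS}"
        )
    return flat


# Made-up example values, for illustration only.
example = {
    "formula": "Fe2O3",
    "bandgap": {"direct": 1.0, "indirect": 0.5},
}

columns = check_data_component(example)
```

Each key of `columns` corresponds to one column in the overview table, with `bandgap.direct` and `bandgap.indirect` grouped as nested columns under `bandgap`.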
Duplicate structures, tables, and attachments are only saved once internally but referenced by all contributions they were submitted with. See the section about contributing data for more information and examples.
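One common way to store duplicates only once is content-addressed storage, where each component is keyed by a hash of its serialized content. The sketch below shows the idea. It is not a description of how MPContribs implements deduplication internally: the `ComponentStore` class, the JSON serialization, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json


class ComponentStore:
    """Hypothetical store that keeps one copy per unique component."""

    def __init__(self):
        self._blobs = {}

    def add(self, component):
        """Store the component if unseen and return its content hash."""
        # Serialize deterministically so equal content yields equal hashes.
        payload = json.dumps(component, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        self._blobs.setdefault(digest, component)
        return digest

    def __len__(self):
        return len(self._blobs)


store = ComponentStore()

# Made-up table content shared by two contributions.
table = {"columns": ["x", "y"], "rows": [[0, 1], [1, 2]]}

contributions = [
    {"identifier": "mp-1", "tables": [store.add(table)]},
    {"identifier": "mp-2", "tables": [store.add(dict(table))]},
]
```

Both contributions end up referencing the same hash, and the store holds a single copy of the table.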