title

Summary

Progress in a machine learning field is both tracked and propelled by robust benchmarks. In materials informatics, benchmarks often report standard metrics such as mean absolute error (MAE) and root mean squared error (RMSE) on traditional train-test splits. For materials discovery, one domain-specific metric is the ability to predict known materials that have been held out. For experimental materials discovery, a robust measure of performance is whether we can predict materials of the future based only on training data from the past. In other words: "how well can we predict what will be discovered in the future?" The Materials Project database [@jain_commentary_2013] records when experimentally validated compounds were first reported in the literature. As a robust validation setup, we formalize time-series splits of Materials Project crystal structures for benchmarking generative models via the mp-time-split Python package (see \autoref{fig:summary}). The package provides convenience functions for downloading and processing snapshots of experimentally verified Materials Project entries and for creating time-series splits of the data.

Figure: Summary visualization of splitting Materials Project entries into train and test splits using grouping by first report of experimental verification in the literature.\label{fig:summary}
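
The idea of training on the past and testing on the future can be sketched in a few lines. The following is a minimal illustration, not the mp-time-split implementation: the `Entry` type, the `chronological_split` helper, and the toy identifiers and years are all hypothetical. Entries are grouped by the year of first report, so that a single year never straddles the train and test sets.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Toy stand-in for a database record (not a real Materials Project entry)."""

    entry_id: str
    first_report_year: int


def chronological_split(entries, cutoff_year):
    """Train on entries first reported up to and including cutoff_year; test on later ones."""
    train = [e for e in entries if e.first_report_year <= cutoff_year]
    test = [e for e in entries if e.first_report_year > cutoff_year]
    return train, test


# Hypothetical records with made-up identifiers and years.
entries = [
    Entry("toy-1", 1975),
    Entry("toy-2", 1998),
    Entry("toy-3", 1998),
    Entry("toy-4", 2011),
    Entry("toy-5", 2019),
]

train, test = chronological_split(entries, cutoff_year=1998)
print([e.entry_id for e in train])
print([e.entry_id for e in test])
```

Because the cutoff is applied to the year rather than to a row index, both toy entries from 1998 land in the training set together.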

Statement of need

Time-based splits have been used in the past for validating materials informatics models. For example, Tshitoyan et al. [@tshitoyanUnsupervisedWordEmbeddings2019] "tested whether [the] model -- if trained at various points in the past -- would have correctly predicted thermoelectric materials reported later in the literature." Likewise, Palizhati et al. [@palizhati_agents_2022] "seeded [multi-fidelity agents with the] first 500 experimentally discovered compositions (based on ICSD timeline of their first publication) and their corresponding DFT data." Aykol et al. [@aykol_network_2019] describe the difficulties of predicting future trends of materials discovery from the time evolution of a materials stability network. Each of these examples used bespoke splitting of the data. Recently, Zhao et al. [@zhao_physics_2022] used a rediscovery metric to evaluate their generative model for crystal structure, though without a time-based split. The need to generate millions of structures to replicate small portions of the held-out dataset highlights the difficulty of the task. When used with other benchmarking metrics, time-based rediscovery can provide the rigor required to evaluate generative materials discovery models effectively. mp-time-split acts as a convenient, standardized backend for rediscovery benchmarking metrics as well as for other applications such as those listed above.
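
To make the notion of a rediscovery metric concrete, the sketch below computes the fraction of held-out items that a generator reproduces at least once. It is an illustration under simplifying assumptions, not the metric used in any of the cited works or in mp-time-split: the `rediscovery_rate` function is hypothetical, the labels are placeholders, and exact string equality stands in for the tolerance-based structure comparison that real crystal structures would require.

```python
def rediscovery_rate(generated, held_out):
    """Fraction of distinct held-out items that appear at least once among generated items."""
    held_out_set = set(held_out)
    if not held_out_set:
        raise ValueError("held_out must contain at least one item")
    # Assumption: exact equality is the match criterion (a stand-in for structure matching).
    matched = held_out_set & set(generated)
    return len(matched) / len(held_out_set)


# Placeholder labels, not real compounds.
held_out = ["A2B", "AB3", "ABC"]
generated = ["A2B", "A2B", "XY", "ABC", "Z"]

rate = rediscovery_rate(generated, held_out)
print(f"rediscovered {rate:.2%} of held-out items")
```

Duplicates among the generated items do not inflate the score, since each held-out item counts at most once.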

In particular, the mp-time-split package provides the following features:

- downloading and storing snapshots of Materials Project crystal structures via pymatgen [REF] (experimentally verified, theoretical, or both)
- modification of search criteria to fetch custom datasets
- utilities for post-processing the Materials Project entries
- convenient access to a snapshot dataset
- predefined scikit-learn TimeSeriesSplit cross-validation splits [REF]
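
For readers unfamiliar with scikit-learn's `TimeSeriesSplit`, the sketch below shows how it produces expanding-window folds once samples are sorted chronologically. It uses only scikit-learn and NumPy directly and does not call mp-time-split; the array of years is hypothetical toy data, and the choice of five folds is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical first-report years for 12 toy entries (not real data).
years = np.array(
    [2003, 1987, 2015, 1995, 2009, 1979, 2021, 1999, 2012, 1991, 2018, 2006]
)

# TimeSeriesSplit assumes samples are already in time order, so sort first.
order = np.argsort(years, kind="stable")
sorted_years = years[order]

tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(sorted_years))

for fold, (train_idx, test_idx) in enumerate(folds):
    # Each fold trains on all earlier samples and tests on the next block.
    print(fold, sorted_years[train_idx].max(), sorted_years[test_idx].min())
```

Each successive fold extends the training window forward in time, so every test block lies strictly after the data used for training.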

We believe mp-time-split provides the convenience and standardization required for rigorous benchmarking of generative materials discovery models. mp-time-split serves as the basis for a set of benchmarking metrics hosted in the matbench-genmetrics suite, which has recently been applied to xtal2png [@baird_xtal2png_2022], a generative model for crystal structure.

Acknowledgements

S.G.B. and T.D.S. acknowledge support by the National Science Foundation, USA under Grant No. DMR-1651668.
