This repository was archived by the owner on Jun 17, 2023. It is now read-only.

Update paper.md
sgbaird committed Nov 14, 2022
1 parent 923e094 commit 34be9f4
137 changes: 0 additions & 137 deletions reports/paper.md
@@ -59,20 +59,6 @@ literature.\label{fig:summary}](figures/time-split-abstract.png)
<!--- Mention similar options in molecular discovery benchmarking, e.g. guacamol which I believe has something similar in terms of rediscovery, though maybe not time-based. Mention legacy materials informatics (CrabNet, CGCNN, etc.) and the shift towards inverse design via generative modeling (CDVAE, FTCP, PGCGM, CubicGAN, etc.). --->


<!-- The latest advances in machine learning are often in natural language processing such as with long
short-term memory networks (LSTMs) and Transformers, or image processing such as with
generative adversarial networks (GANs), variational autoencoders (VAEs), and guided
diffusion models. `xtal2png` encodes and decodes crystal structures via PNG
images (see e.g. \autoref{fig:64-bit}) by writing and reading the necessary information
for crystal reconstruction (unit cell, atomic elements, atomic coordinates) as a square
matrix of numbers. This is akin to making/reading a QR code for crystal
structures, where the `xtal2png` representation is an invertible representation. The
ability to feed these images directly into image-based pipelines allows you, as a
materials informatics practitioner, to get streamlined results for new state-of-the-art
image-based machine learning models applied to crystal structures.
![A real size $64\times64$ pixel `xtal2png` representation of a crystal structure.\label{fig:64-bit}](figures/Zn8B8Pb4O24,volume=623,uid=bc2d.png) -->

# Statement of need

Time-based splits have been used in the past for validating materials informatics
@@ -106,129 +92,6 @@ rigorous benchmarking of generative materials discovery models. `mp-time-split`
as the basis for a set of benchmarking metrics hosted in the [`matbench-genmetrics`](https://github.com/sparks-baird/matbench-genmetrics) suite
which has recently been applied to `xtal2png` [@baird_xtal2png_2022], a generative model for crystal structure.

<!-- Using a state-of-the-art method in a separate domain with a custom data representation
is often an expensive and drawn-out process. For example, [@vaswaniAttentionAllYou2017]
introduced the revolutionary natural language processing Transformer architecture in
June 2017, yet the application of Transformers to the adjacent domain of materials
informatics (chemical-formula-based predictions) was not publicly realized until late
2019 [@goodallPredictingMaterialsProperties2019], approximately two-and-a-half years
later, with peer-reviewed publications dating to late 2020
[@goodallPredictingMaterialsProperties2020]. Interestingly, a nearly identical
implementation was being developed concurrently in a different research group with
slightly later public release [@wangCompositionallyrestrictedAttentionbasedNetwork2020]
and publication [@wangCompositionallyRestrictedAttentionbased2021] dates. Another
example of a state-of-the-art algorithm domain transfer is refactoring image-processing
models for crystal structure applications, which was first introduced in a preprint
[@kipfSemisupervisedClassificationGraph2016] and published with application for
materials' property prediction in a peer-reviewed journal over a year later
[@xieCrystalGraphConvolutional2018]. Similarly, VAEs were introduced in 2013
[@kingmaAutoEncodingVariationalBayes2014a] and implemented for molecules in 2016
[@gomez-bombarelliAutomaticChemicalDesign2016], and denoising diffusion probabilistic
models (DDPMs) were introduced in 2015 [@sohl-dicksteinDeepUnsupervisedLearning2015] and
implemented for crystal structures in 2021 [@xieCrystalDiffusionVariational2021]. Here,
we focus on state-of-the-art domain transfer (especially of generative models) from
image processing to crystal structure to enable materials science practitioners to
leverage the most advanced image processing models for materials' property prediction
and inverse design. -->

<!-- `xtal2png` is a Python package that allows you to convert between a crystal structure
and a PNG image for direct use with image-based machine learning models. Let's take
[Google's image-to-image diffusion model,
Palette](https://iterative-refinement.github.io/palette/)
[@sahariaPaletteImagetoImageDiffusion2022a], which supports unconditional image
generation, conditional inpainting, and conditional image restoration, which are modeling tasks
that can be used in crystal generation, structure prediction, and structure
relaxation, respectively. Rather than dig into the code and spending hours, days, or
weeks modifying, debugging, and playing GitHub phone tag with the developers before you
can (maybe) get preliminary results, `xtal2png` lets you get comparable results using the default parameters, assuming the instructions can be run without
error. While there are other invertible representations for crystal structures
[@xieCrystalDiffusionVariational2022;@renInvertibleCrystallographicRepresentation2022a]
as well as cross-domain conversions such as converting between molecules and strings
[@weiningerSMILESChemicalLanguage1988;@selfies], to our knowledge, this is the first
package that enables conversion between a crystal structure and an image file format.
![(a) upscaled example image and (b) legend of the `xtal2png` encoding.\label{fig:example-and-legend}](figures/example-and-legend.png)
`xtal2png` was designed to be easy to use by both
"[Pythonistas](https://en.wiktionary.org/wiki/Pythonista)" and entry-level coders alike.
`xtal2png` provides a straightforward Python application programming interface (API) and
command-line interface (CLI). `xtal2png` relies on `pymatgen.core.structure.Structure`
[@ongPythonMaterialsGenomics2013] objects for representing crystal structures and also
supports reading crystallographic information files (CIFs) from directories. `xtal2png`
encodes crystallographic information related to the unit cell, crystallographic
symmetry, and atomic elements and coordinates which are each scaled individually
according to the information type. An upscaled version of the PNG image and a legend of
the representation are given in \autoref{fig:example-and-legend}. Due to the encoding of
numerical values as PNG images (allowable values are integers between 0 and
255), a round-off error is present during a single round of encoding and decoding.
An example comparing an original vs. decoded structure is given in
\autoref{fig:original-decoded}.
There are some limitations and design considerations for `xtal2png` that are described
in `xtal2png`'s [documentation](https://xtal2png.readthedocs.io/en/latest/index.html) in
the Overview section.
At this time, it is unclear to what extent deviation from the aforementioned design
choices will affect performance. We intend to use hyperparameter optimization to
determine an optimal configuration for crystal structure generation tasks using the
`xtal2png` representation.
![(a) Original and (b) `xtal2png` decoded visualizations of
[`mp-560471`](https://materialsproject.org/materials/mp-560471/) / Zn$_2$B$_2$PbO$_6$. Images were generated using [ase visualizations](https://wiki.fysik.dtu.dk/ase/ase/visualize/visualize.html). \label{fig:original-decoded}](figures/original-decoded.png){ width=50% }
The significance of the representation lies in being able to directly use the PNG
representation with image-based models which often do not directly support custom
dataset types. We expect the use of `xtal2png` as a screening tool for such models to
save significant user time of code refactoring and adaptation during the process of
obtaining preliminary results on a newly released model. After obtaining preliminary
results, you get to decide whether it's worth it to you to take on the
higher-cost/higher-expertise task of modifying the codebase and using a more customized
approach. Or you can stick with the results of xtal2png. It's up to you!
We plan to apply `xtal2png` to a probabilistic diffusion generative model as a
proof of concept and present our findings in the near future. -->

<!-- ![Caption for example figure.\label{fig:example}](figure.png) -->

<!-- # Mathematics
Single dollars ($) are required for inline mathematics e.g. $f(x) = e^{\pi/x}$
Double dollars make self-standing equations:
$$\Theta(x) = \left\{\begin{array}{l}
0\textrm{ if } x < 0\cr
1\textrm{ else}
\end{array}\right.$$
You can also use plain \LaTeX for equations
\begin{equation}\label{eq:fourier}
\hat f(\omega) = \int_{-\infty}^{\infty} f(x) e^{i\omega x} dx
\end{equation}
and refer to \autoref{eq:fourier} from text. -->

<!--
# Citations
Citations to entries in paper.bib should be in
[rMarkdown](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html)
format.
If you want to cite a software repository URL (e.g. something on GitHub without a preferred
citation) then you can do it with the example BibTeX entry below for @fidgit.
For a quick reference, the following citation commands can be used:
- `@author:2001` -> "Author et al. (2001)"
- `[@author:2001]` -> "(Author et al., 2001)"
- `[@author1:2001; @author2:2001]` -> "(Author1 et al., 2001; Author2 et al., 2002)" -->

<!-- # Figures
Figures can be included like this:
![Caption for example figure.\label{fig:example}](figure.png)
and referenced from text using \autoref{fig:example}.
Figure sizes can be customized by adding an optional second parameter:
![Caption for example figure.](figure.png){ width=20% } -->

# Acknowledgements

S.G.B. and T.D.S. acknowledge support by the National Science Foundation, USA under
