Merge pull request #84 from WMD-group/doc_updates
Doc updates
AntObi authored Aug 7, 2023
2 parents 1698aed + 858d10f commit ba87938
Showing 13 changed files with 229 additions and 307 deletions.
19 changes: 8 additions & 11 deletions README.md
@@ -1,6 +1,5 @@
# ElementEmbeddings


[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
@@ -143,21 +142,19 @@ The `composition_featuriser` function can be used to featurise the data. The com
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats="mean")
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])

df_featurised
```

| formula | mean_Number | mean_MendeleevNumber | mean_AtomicWeight | mean_MeltingT | ... | mean_SpaceGroupNumber |
|---------|-------------|----------------------|--------------------|-------------------|-----|-----------------------|
| CsPbI3 | 59.2 | 74.8 | 144.16377238 | 412.55 | ... | 129.20000000000002 |
| Fe2O3 | 15.2 | 74.19999999999999 | 31.937640000000002 | 757.2800000000001 | ... | 98.80000000000001 |
| NaCl | 14.0 | 48.0 | 29.221384640000004 | 271.235 | ... | 146.5 |
| ZnS | 23.0 | 78.5 | 48.7225 | 540.52 | ... | 132.0 |

(The columns of the resulting dataframe have been truncated for clarity.)
| formula | mean_Number | mean_MendeleevNumber | mean_AtomicWeight | mean_MeltingT | mean_Column | mean_Row | mean_CovalentRadius | mean_Electronegativity | mean_NsValence | mean_NpValence | mean_NdValence | mean_NfValence | mean_NValence | mean_NsUnfilled | mean_NpUnfilled | mean_NdUnfilled | mean_NfUnfilled | mean_NUnfilled | mean_GSvolume_pa | mean_GSbandgap | mean_GSmagmom | mean_SpaceGroupNumber | sum_Number | sum_MendeleevNumber | sum_AtomicWeight | sum_MeltingT | sum_Column | sum_Row | sum_CovalentRadius | sum_Electronegativity | sum_NsValence | sum_NpValence | sum_NdValence | sum_NfValence | sum_NValence | sum_NsUnfilled | sum_NpUnfilled | sum_NdUnfilled | sum_NfUnfilled | sum_NUnfilled | sum_GSvolume_pa | sum_GSbandgap | sum_GSmagmom | sum_SpaceGroupNumber |
|---------|-------------|----------------------|--------------------|-------------------|-------------|----------|---------------------|------------------------|----------------|----------------|--------------------|--------------------|---------------|-----------------|-----------------|-----------------|-----------------|----------------|------------------|----------------|--------------------|-----------------------|------------|---------------------|-------------------|--------------|------------|---------|--------------------|-----------------------|---------------|---------------|---------------|---------------|--------------|----------------|----------------|----------------|----------------|---------------|--------------------|---------------|--------------|----------------------|
| CsPbI3 | 59.2 | 74.8 | 144.16377238 | 412.55 | 13.2 | 5.4 | 161.39999999999998 | 2.22 | 1.8 | 3.4 | 8.0 | 2.8000000000000003 | 16.0 | 0.2 | 1.4 | 0.0 | 0.0 | 1.6 | 54.584 | 0.6372 | 0.0 | 129.20000000000002 | 296.0 | 374.0 | 720.8188619 | 2062.75 | 66.0 | 27.0 | 807.0 | 11.100000000000001 | 9.0 | 17.0 | 40.0 | 14.0 | 80.0 | 1.0 | 7.0 | 0.0 | 0.0 | 8.0 | 272.92 | 3.186 | 0.0 | 646.0 |
| Fe2O3 | 15.2 | 74.19999999999999 | 31.937640000000002 | 757.2800000000001 | 12.8 | 2.8 | 92.4 | 2.7960000000000003 | 2.0 | 2.4 | 2.4000000000000004 | 0.0 | 6.8 | 0.0 | 1.2 | 1.6 | 0.0 | 2.8 | 9.755 | 0.0 | 0.8442651200000001 | 98.80000000000001 | 76.0 | 371.0 | 159.6882 | 3786.4 | 64.0 | 14.0 | 462.0 | 13.98 | 10.0 | 12.0 | 12.0 | 0.0 | 34.0 | 0.0 | 6.0 | 8.0 | 0.0 | 14.0 | 48.775000000000006 | 0.0 | 4.2213256 | 494.0 |
| NaCl | 14.0 | 48.0 | 29.221384640000004 | 271.235 | 9.0 | 3.0 | 134.0 | 2.045 | 1.5 | 2.5 | 0.0 | 0.0 | 4.0 | 0.5 | 0.5 | 0.0 | 0.0 | 1.0 | 26.87041666665 | 1.2465 | 0.0 | 146.5 | 28.0 | 96.0 | 58.44276928000001 | 542.47 | 18.0 | 6.0 | 268.0 | 4.09 | 3.0 | 5.0 | 0.0 | 0.0 | 8.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 53.7408333333 | 2.493 | 0.0 | 293.0 |
| ZnS | 23.0 | 78.5 | 48.7225 | 540.52 | 14.0 | 3.5 | 113.5 | 2.115 | 2.0 | 2.0 | 5.0 | 0.0 | 9.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 19.8734375 | 1.101 | 0.0 | 132.0 | 46.0 | 157.0 | 97.445 | 1081.04 | 28.0 | 7.0 | 227.0 | 4.23 | 4.0 | 4.0 | 10.0 | 0.0 | 18.0 | 0.0 | 2.0 | 0.0 | 0.0 | 2.0 | 39.746875 | 2.202 | 0.0 | 264.0 |

The returned dataframe contains the mean-pooled features of the magpie representation for the four formulas.
The returned dataframe contains the mean-pooled and sum-pooled features of the magpie representation for the four formulas.
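
To make the two pooling statistics concrete, the sketch below recomputes the `mean_Number` and `sum_Number` entries for NaCl and Fe2O3 by hand. It is an illustration only, not the package's implementation: the helper `pool` and the small `atomic_number` lookup are hypothetical names introduced here, and the sketch assumes each element's value is weighted by its stoichiometric count, which is consistent with the values in the table above.

```python
import numpy as np

# Atomic numbers of the elements used in this illustration.
atomic_number = {"Na": 11, "Cl": 17, "Fe": 26, "O": 8}


def pool(composition, prop):
    """Return (mean, sum) of an elemental property, weighted by atom counts.

    `composition` maps element symbols to stoichiometric counts and
    `prop` maps element symbols to the property value (hypothetical helper).
    """
    values = np.array([prop[el] for el in composition], dtype=float)
    counts = np.array(list(composition.values()), dtype=float)
    total = float(np.dot(values, counts))  # sum pooling
    return total / counts.sum(), total     # mean pooling, sum pooling


nacl_mean, nacl_sum = pool({"Na": 1, "Cl": 1}, atomic_number)
fe2o3_mean, fe2o3_sum = pool({"Fe": 2, "O": 3}, atomic_number)

print(nacl_mean, nacl_sum)    # 14.0 28.0
print(fe2o3_mean, fe2o3_sum)  # 15.2 76.0
```

These numbers match the `mean_Number` and `sum_Number` columns for NaCl and Fe2O3 in the table.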

## Development notes

14 changes: 10 additions & 4 deletions docs/about.md
@@ -7,13 +7,19 @@
[![GitHub issues](https://img.shields.io/github/issues-raw/WMD-Group/ElementEmbeddings)](https://github.com/WMD-group/ElementEmbeddings/issues)
[![CI Status](https://github.com/WMD-group/ElementEmbeddings/actions/workflows/ci.yml/badge.svg)](https://github.com/WMD-group/ElementEmbeddings/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/WMD-group/ElementEmbeddings/branch/main/graph/badge.svg?token=OCMIM5SHL0)](https://codecov.io/gh/WMD-group/ElementEmbeddings)
[![DOI](https://zenodo.org/badge/493285385.svg)](https://zenodo.org/badge/latestdoi/493285385)
[![PyPI](https://img.shields.io/pypi/v/ElementEmbeddings)](https://pypi.org/project/ElementEmbeddings/)
[![documentation](https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat)](https://wmd-group.github.io/ElementEmbeddings/)
![python version](https://img.shields.io/pypi/pyversions/elementembeddings)

The **ElementEmbeddings** package provides high-level tools for analysing elemental
The **Element Embeddings** package provides high-level tools for analysing elemental
embeddings data. This primarily involves visualising the correlation between
embedding schemes using different statistical measures.

Motivation
--------
* **Documentation:** <https://wmd-group.github.io/ElementEmbeddings/>
* **Examples:** <https://github.com/WMD-group/ElementEmbeddings/tree/main/examples>
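
As a rough illustration of what comparing embeddings with a statistical measure can look like, the sketch below computes the cosine similarity between small vectors. Cosine similarity is chosen here only because a cosine-similarity heatmap image is added in this commit; the three vectors are made-up placeholders rather than real element embeddings, and `cosine_similarity` is a hypothetical helper, not a function from the package.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Placeholder vectors (not real embeddings).
a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]  # same direction as a
c = [0.0, 3.0, 0.0]  # orthogonal to a

sim_ab = cosine_similarity(a, b)  # close to 1: parallel vectors
sim_ac = cosine_similarity(a, c)  # 0: orthogonal vectors
print(sim_ab, sim_ac)
```

Evaluating such a measure for every pair of elements gives the kind of matrix that can be drawn as a heatmap.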

## Motivation

Machine learning approaches for materials informatics have become increasingly
widespread. Some of these involve the use of deep learning
@@ -22,4 +28,4 @@ rather than specified by the user of the model. While an important goal of
machine learning training is to minimise the chosen error function to make more
accurate predictions, it is also important for us materials scientists to be able
to interpret these models. As such, we aim to evaluate and compare different atomic embedding
schemes in a consistent framework.
schemes in a consistent framework.
58 changes: 55 additions & 3 deletions docs/contribution.md
@@ -1,11 +1,63 @@
## Bug reports, feature requests and questions
# Contributing

Please use the [Issue Tracker](https://github.com/WMD-group/ElementEmbeddings/issues) to report bugs or request features in the first instance. Contributions are always welcome.
This is a quick guide on how to follow best practice and contribute smoothly to `ElementEmbeddings`.

## Code contributions

We are always looking for ways to make `ElementEmbeddings` better and more useful to a wider community. To contribute, use the ["Fork and Pull"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow and stick as closely as possible to the following:

* Code style should comply with [PEP8](https://peps.python.org/pep-0008/) where possible. [Google's house style](https://google.github.io/styleguide/pyguide.html) is also helpful, including a good model for docstrings.
* Please use comments liberally when adding nontrivial features, and take the chance to clean up other people's code while looking at it.
* Add tests wherever possible, and use the test suite to check if you broke anything.
* Add tests wherever possible, and use the test suite to check if you broke anything.

## Workflow

We follow the [GitHub flow](https://guides.github.com/introduction/flow/index.html), using
branches for new work and pull requests for verifying the work.

The steps for a new piece of work can be summarised as follows:

1. Push up or create [an issue](https://guides.github.com/features/issues).
2. Create a branch from main, with a sensible name that relates to the issue.
3. Do the work and commit changes to the branch. Push the branch
regularly to GitHub to make sure no work is accidentally lost.
4. Write or update unit tests for the code you work on.
5. When you are finished with the work, ensure that all of the unit
tests pass on your own machine.
6. Open a pull request [on the pull request page](https://github.com/WMD-group/ElementEmbeddings/pulls).
7. If nobody acknowledges your pull request promptly, feel free to poke one of the main developers into action.

## Pull requests

For a general overview of using pull requests on GitHub look [in the GitHub docs](https://help.github.com/en/articles/about-pull-requests).

When creating a pull request you should:

* Ensure that the title succinctly describes the changes so it is easy to read on the overview page
* Reference the issue which the pull request is closing

Recommended reading: [How to Write the Perfect Pull Request](https://github.blog/2015-01-21-how-to-write-the-perfect-pull-request/)

## Dev requirements

When developing locally, it is recommended to install the Python packages in `requirements-dev.txt`.

```bash
pip install -r requirements-dev.txt
```

This will allow you to run the tests locally with pytest as described in the main README,
as well as run pre-commit hooks to automatically format python files with isort and black.
To install the pre-commit hooks (only needs to be done once):

```bash
pre-commit install
pre-commit run --all-files # optionally run hooks on all files
```

Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.

## Bug reports, feature requests and questions

Please use the [Issue Tracker](https://github.com/WMD-group/ElementEmbeddings/issues) to report bugs or request features in the first instance. Contributions are always welcome.
Binary file added docs/images/magpie_cosine_sim_heatmap.png
Binary file added docs/images/magpie_umap.png
13 changes: 8 additions & 5 deletions docs/installation.md
@@ -1,18 +1,21 @@
# Getting Started

The latest version of the package can be installed using:
The latest stable release can be installed via pip using:

```
pip install git+git://github.com/WMD-group/ElementEmbeddings.git
```bash
pip install ElementEmbeddings
```

## Developer's installation (optional)

For development work, `ElementEmbeddings` can be installed from a copy of the [source repository](https://github.com/WMD-group/ElementEmbeddings.git); this is preferred if using experimental code branches.

To clone the project from GitHub and make a local installation:

```
```bash
git clone https://github.com/WMD-group/ElementEmbeddings.git
cd ElementEmbeddings
pip install -e .
```
With `-e`, pip will create links to the source folder so that the changes to the code will be reflected on the PATH.

With `-e`, pip will create links to the source folder so that the changes to the code will be reflected on the PATH.
64 changes: 49 additions & 15 deletions docs/reference.md
@@ -1,14 +1,39 @@
# Elemental Embeddings

The data contained in this folder is a collection of various elemental representation/embedding schemes
The data contained in this repository are a collection of various elemental representation/embedding schemes. We provide the literature source for these representations as well as the data source from which the files were obtained. A majority of these representations have been obtained from the following repositories:

* [lrcfmd/ElMD](https://github.com/lrcfmd/ElMD/tree/master)
* [Kaaiian/CBFV](https://github.com/Kaaiian/CBFV/tree/master)

## Linear representations

For the linear/scalar representations, the `Embedding` class will load these representations as one-hot vectors where the vector components are ordered following the scale (i.e. the `atomic` representation is ordered by atomic numbers).

### Modified Pettifor scale

The following paper describes the details of the modified Pettifor chemical scale:
[The optimal one-dimensional periodic table: a modified Pettifor chemical scale from data mining](https://iopscience.iop.org/article/10.1088/1367-2630/18/9/093011/meta)

[Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/mod_petti.json)

### Atomic numbers

We included `atomic` as a linear representation to generate one-hot vectors corresponding to the atomic numbers.
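
The idea of a one-hot vector ordered by atomic number can be sketched as follows. This is a minimal illustration, not the `Embedding` class itself: the helper `one_hot`, the vector length of 118, and the convention that atomic number Z occupies index Z - 1 are all assumptions made for this example.

```python
import numpy as np


def one_hot(position, size):
    """Vector of zeros with a single 1 at the (1-based) scale position."""
    vec = np.zeros(size)
    vec[position - 1] = 1.0
    return vec


n_elements = 118  # assumed vector length for this sketch

h = one_hot(1, n_elements)  # hydrogen, atomic number 1
o = one_hot(8, n_elements)  # oxygen, atomic number 8

print(int(np.argmax(o)))  # 7
print(np.dot(h, o))       # 0.0
```

Because any two distinct one-hot vectors are orthogonal, such a representation carries no built-in notion of similarity between elements beyond their position on the scale.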

## Vector representations

The following representations are all vector representations (some are local, some are distributed) and the `Embedding` class will load these representations as they are.

### Magpie

## Magpie
The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework:
[A general-purpose machine learning framework for predicting properties of inorganic materials](https://www.nature.com/articles/npjcompumats201628)

The source code for Magpie can be found
[here](https://bitbucket.org/wolverton/magpie/src/master/)

[Data source](https://github.com/Kaaiian/CBFV/blob/master/cbfv/element_properties/magpie.csv)

The 22 dimensional embedding vector includes the following elemental properties:

<details>
@@ -32,30 +57,36 @@ The 22 dimensional embedding vector includes the following elemental properties:
* Space Group Number
</details>

* `magpie_sc` is scaled version of the magpie embeddings
* `magpie_sc` is a scaled version of the magpie embeddings. [Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/magpie_sc.json)

## mat2vec
### mat2vec

The following paper describes the implementation of mat2vec:
[Unsupervised word embeddings capture latent knowledge from materials science literature](https://www.nature.com/articles/s41586-019-1335-8)

## MatScholar
[Data source](https://github.com/Kaaiian/CBFV/blob/master/cbfv/element_properties/mat2vec.csv)

### MatScholar

The following paper describes the natural language processing implementation of Materials Scholar (matscholar):
[Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00470)

## MEGnet
[Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/matscholar.json)

### MEGNet

The following paper describes the details of the construction of the MatErials Graph Network (MEGNet):
[Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals](https://doi.org/10.1021/acs.chemmater.9b01294)

## Modified Pettifor scale
The following paper describes the details of the modified Pettifor chemical scale:
[The optimal one dimensional periodic table: a modified Pettifor chemical scale from data mining](https://iopscience.iop.org/article/10.1088/1367-2630/18/9/093011/meta)
[Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/megnet16.json)

### Oliynyk

## Oliynkyk
The following paper describes the details:
[High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds](https://pubs.acs.org/doi/full/10.1021/acs.chemmater.6b02724)

[Data source](https://github.com/Kaaiian/CBFV/blob/master/cbfv/element_properties/oliynyk.csv)

The 44 features of the embedding vector are formed of the following properties:
<details>
<summary> Click to see the 44 features!</summary>
@@ -106,21 +137,24 @@ The 44 features of the embedding vector are formed of the following properties:
* Cohesive_energy
</details>

* `oliynyk_sc` is scaled version of the oliynyk embeddings
* `oliynyk_sc` is a scaled version of the oliynyk embeddings: [Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/oliynyk_sc.json)

## Random
### Random

This is a set of 200-dimensional vectors in which the components are randomly generated.

The 118 200-dimensional vectors in `random_200_new` was generated using the following code:
The 118 200-dimensional vectors in `random_200_new` were generated using the following code:

```python
import numpy as np

mu , sigma = 0 , 0.1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
```
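
Because the generator is seeded, the same array should be produced on every run. The sketch below wraps the call in a function to check this; `make_random_embedding` is a hypothetical name introduced for illustration, and the defaults (seed 42, mean 0, standard deviation 1, shape 118 by 200) are taken from the snippet above.

```python
import numpy as np


def make_random_embedding(seed=42, n_elements=118, dim=200, mu=0.0, sigma=1.0):
    """Draw an (n_elements, dim) array of normally distributed components."""
    rng = np.random.default_rng(seed=seed)
    return rng.normal(mu, sigma, (n_elements, dim))


first = make_random_embedding()
second = make_random_embedding()

print(first.shape)                    # (118, 200)
print(np.array_equal(first, second))  # True
```

Each row can then be assigned to one element, giving 118 vectors of 200 components each.
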
## SkipAtom

### SkipAtom

The following paper describes the details:
[Distributed representations of atoms and materials for machine learning](https://www.nature.com/articles/s41524-022-00729-3)

[Data source](https://github.com/lantunes/skipatom/blob/main/data/skipatom_20201009_induced.csv)