[ENH] Neat and automated transfer learning with OPTIMADE API for auto-adjusted problem-specific ML model generation on the fly #16

Merged · 50 commits · Apr 4, 2024
Commits
55043df
(modelAdjusters) set up scaffold for the submodule
amkrajewski Mar 27, 2024
bf83909
(MA) documentation and initialization of `LocalAdjuster`
amkrajewski Mar 27, 2024
7408aed
(MA) minor initialization additions and 2 plotting helper functions
amkrajewski Mar 27, 2024
bd93bde
(core) added `plotly` to the requirements
amkrajewski Mar 27, 2024
f1af828
(MA) finalized the key `adjust` routine
amkrajewski Mar 27, 2024
6825060
(MA) added returns to the `adjust` routine
amkrajewski Mar 27, 2024
d69151b
(MA) add a message if ClearML is to be used in case someone does it o…
amkrajewski Mar 27, 2024
3a4b7a0
(MA) improved memory handling during adjustments
amkrajewski Mar 28, 2024
2faea84
(MA) implemented a hyperparameter search matrix for 27 common options…
amkrajewski Mar 28, 2024
47e3fbf
(MA) implemented a hyperparameter search matrix for 27 common options…
amkrajewski Mar 28, 2024
a3a6c4c
(MA) added `optimade` to the `dev` requirements
amkrajewski Mar 28, 2024
96c3220
(MA) implemented initialization of `OPTIMADEAdjuster` and its docstrings
amkrajewski Mar 28, 2024
e07e7f6
(MA) implemented querying OPTIMADE and passing structures to featuriz…
amkrajewski Mar 28, 2024
a823eca
(core) added `[http_client]` to the `optimade` requirement needed for…
amkrajewski Mar 28, 2024
637f1bb
(MA) added entry name collection to the `OPTIMADEAdjuster`
amkrajewski Mar 28, 2024
889cad5
(MA) upped the default result number limit to `10000` and added it as…
amkrajewski Mar 28, 2024
2525b2e
(core) removed info about importing from the top `pysipfenn` namespac…
amkrajewski Mar 28, 2024
7889c30
(MA) finalized the `fetchAndFeturize` routine of `OPTIMADEAdjuster`,…
amkrajewski Mar 28, 2024
71c8557
(MA) add a message on possible degeneracy of datasets from overlappi…
amkrajewski Mar 28, 2024
3c5c290
(MA) added class parameter of `validationLabels`, assigned at every …
amkrajewski Mar 28, 2024
a8fd4cd
(MA) improved several printouts
amkrajewski Mar 28, 2024
df8c77c
(MA) added assertions that data to operate on is available before tu…
amkrajewski Mar 28, 2024
5209b22
(MA) added automatic shuffling of the OPTIMADE-obtained data
amkrajewski Mar 29, 2024
2cbcbc1
(core) limit the convenience import scope of the top level `__init__…
amkrajewski Mar 29, 2024
4919926
(MA) distributed optional dependency imports as needed
amkrajewski Mar 29, 2024
6e69976
(deps) moved `plotly` from optional to required dependencies
amkrajewski Mar 29, 2024
1f6ec51
(core) small addition to the top level imports
amkrajewski Mar 29, 2024
b7bb2c4
(core) reverted the `__init__.py` scope narrowing for the top level …
amkrajewski Mar 29, 2024
231bd5f
(MA) added tracking of the compositions coming in from OPTIMADE; not u…
amkrajewski Mar 29, 2024
cb02fb8
(MA) plotting style improvements
amkrajewski Mar 29, 2024
31cad70
(MA) added a convenience function for highlighting points at given in…
amkrajewski Mar 29, 2024
01d646f
(MA) added reduced formula homogenization just in case
amkrajewski Mar 29, 2024
81b41de
(MA) added printout on the number of unique compositions at the end o…
amkrajewski Mar 29, 2024
6858fd4
(MA) added population of validation labels with "Training" after data…
amkrajewski Mar 29, 2024
36fce5a
(MA) improved target path documentation
amkrajewski Mar 29, 2024
ff16218
(MA) added ETA printouts to the hyperparameter search
amkrajewski Mar 29, 2024
bedfc09
(MA) bugfix in CSV data ingestion which did not expect default pySIPF…
amkrajewski Mar 30, 2024
a079d36
(MA) additional assertions at the plotting step
amkrajewski Mar 30, 2024
e024222
(MA) general minor improvements including populating `self.validation…
amkrajewski Mar 30, 2024
8d94133
(test) added an example NumPy file with stored descriptor data
amkrajewski Mar 30, 2024
8aa6dda
(test) added an example CSV file with stored descriptor data
amkrajewski Mar 30, 2024
6316916
(test) added an example NumPy file with a stored target data array
amkrajewski Mar 30, 2024
87fb74e
(test) added an example CSV file with a stored target data table
amkrajewski Mar 30, 2024
8d635af
(test) added a pretty complete testing suite for model adjusters; tes…
amkrajewski Mar 30, 2024
f6e92e0
(test) improved `ModelAdjusters` test documentation
amkrajewski Mar 30, 2024
57eb3e5
(MA) reduced hyperparameter search epoch *default* number to 20 and i…
amkrajewski Apr 3, 2024
bd354c9
(MA) added a set of assertions to the `OPTIMADEAdjuster` initialization
amkrajewski Apr 3, 2024
785dabb
(MA) added an assertion that more than no data was fetched, so that the er…
amkrajewski Apr 3, 2024
ac819da
(MA) implemented `endpointOverride` parameter to allow user to specif…
amkrajewski Apr 3, 2024
4c12c21
(tests) added testing procedure for endpoint overriding
amkrajewski Apr 3, 2024
(MA) implemented a hyperparameter search matrix for 27 common options; meant mostly for tuning to smaller datasets
amkrajewski committed Mar 28, 2024

commit 47e3fbfcf1706185d6a1e7432123ba5085de65c6
161 changes: 159 additions & 2 deletions pysipfenn/core/modelAdjusters.py
@@ -1,12 +1,13 @@
 import os
-from typing import Union, Literal, Tuple, List
+from typing import Union, Literal, Tuple, List, Dict
 from copy import deepcopy
 import gc
 
 import numpy as np
 import torch
 from torch.utils.data import DataLoader, TensorDataset
 import plotly.express as px
+import plotly.graph_objects as go
 from pysipfenn.core.pysipfenn import Calculator
 
 class LocalAdjuster:
@@ -257,7 +258,6 @@ def adjust(
         if verbose:
             print(f'Train: {transferLosses[-1]:.4f} | Epoch: 0/{epochs}')
 
-
         for epoch in range(epochs):
             model.train()
             for data, target in dataloaderTrain:
@@ -305,7 +305,160 @@
 
         return self.adjustedModel, transferLosses, validationLosses
 
+    def matrixHyperParameterSearch(
+            self,
+            validation: float = 0.2,
+            epochs: int = 100,
+            batchSize: int = 32,
+            lossFunction: Literal["MSE", "MAE"] = "MAE",
+            learningRates: Tuple[float] = (1e-6, 1e-5, 1e-4),
+            optimizers: Tuple[Literal["Adam", "AdamW", "Adamax", "RMSprop"]] = ("Adam", "AdamW", "Adamax"),
+            weightDecays: Tuple[float] = (1e-5, 1e-4, 1e-3),
+            verbose: bool = True,
+            plot: bool = True
+    ) -> Tuple[torch.nn.Module, Dict[str, Union[float, str]]]:
+        """
+        Performs a grid search over the hyperparameters provided to find the best combination. By default, it will
+        (a) plot the training history with plotly in your browser, and (b) print the best hyperparameters found. If the
+        ClearML platform was set to be used for logging (at the class initialization), the results will be uploaded
+        there as well. If the default values are used, it will test 27 combinations of learning rates, optimizers, and
+        weight decays. The method will then adjust the model to the best hyperparameters found, corresponding to the
+        lowest validation loss if validation is used, or the lowest training loss if validation is not used
+        (``validation=0``). Note that validation is used by default.
+
+        Args:
+            validation: Same as in the ``adjust`` method. Default is ``0.2``.
+            epochs: Same as in the ``adjust`` method. Default is ``100``.
+            batchSize: Same as in the ``adjust`` method. Default is ``32``.
+            lossFunction: Same as in the ``adjust`` method. Default is ``MAE``, i.e. Mean Absolute Error or L1 loss.
+            learningRates: Tuple of floats with the learning rates to be tested. Default is ``(1e-6, 1e-5, 1e-4)``. See
+                the ``adjust`` method for more information.
+            optimizers: Tuple of strings with the optimizers to be tested. Default is ``("Adam", "AdamW", "Adamax")``.
+                See the ``adjust`` method for more information.
+            weightDecays: Tuple of floats with the weight decays to be tested. Default is ``(1e-5, 1e-4, 1e-3)``. See
+                the ``adjust`` method for more information.
+            verbose: Same as in the ``adjust`` method. Default is ``True``.
+            plot: Whether to plot the training history after all the combinations are tested. Default is ``True``.
+        """
+        if verbose:
+            print("Starting the hyperparameter search...")
+
+        bestModel: torch.nn.Module = None
+        bestTrainingLoss: float = np.inf
+        bestValidationLoss: float = np.inf
+        bestHyperparameters: Dict[str, Union[float, str, None]] = {
+            "learningRate": None,
+            "optimizer": None,
+            "weightDecay": None,
+            "epochs": None
+        }
+
+        trainLossHistory: List[List[float]] = []
+        validationLossHistory: List[List[float]] = []
+        labels: List[str] = []
+
+        for learningRate in learningRates:
+            for optimizer in optimizers:
+                for weightDecay in weightDecays:
+                    labels.append(f"LR: {learningRate} | OPT: {optimizer} | WD: {weightDecay}")
+                    model, trainingLoss, validationLoss = self.adjust(
+                        validation=validation,
+                        learningRate=learningRate,
+                        epochs=epochs,
+                        batchSize=batchSize,
+                        optimizer=optimizer,
+                        weightDecay=weightDecay,
+                        lossFunction=lossFunction,
+                        verbose=True
+                    )
+                    trainLossHistory.append(trainingLoss)
+                    validationLossHistory.append(validationLoss)
+                    if validation > 0:
+                        localBestValidationLoss, bestEpoch = min((val, idx) for idx, val in enumerate(validationLoss))
+                        if localBestValidationLoss < bestValidationLoss:
+                            print(f"New best model found with LR: {learningRate}, OPT: {optimizer}, WD: {weightDecay}, "
+                                  f"Epoch: {bestEpoch + 1}/{epochs} | Train: {trainingLoss[bestEpoch]:.4f} | "
+                                  f"Validation: {localBestValidationLoss:.4f}")
+                            del bestModel
+                            gc.collect()
+                            bestModel = model
+                            bestTrainingLoss = trainingLoss[bestEpoch]
+                            bestValidationLoss = localBestValidationLoss
+                            bestHyperparameters["learningRate"] = learningRate
+                            bestHyperparameters["optimizer"] = optimizer
+                            bestHyperparameters["weightDecay"] = weightDecay
+                            bestHyperparameters["epochs"] = bestEpoch + 1
+                        else:
+                            print(f"Model with LR: {learningRate}, OPT: {optimizer}, WD: {weightDecay} did not improve.")
+                    else:
+                        localBestTrainingLoss, bestEpoch = min((val, idx) for idx, val in enumerate(trainingLoss))
+                        if localBestTrainingLoss < bestTrainingLoss:
+                            print(f"New best model found with LR: {learningRate}, OPT: {optimizer}, WD: {weightDecay}, "
+                                  f"Epoch: {bestEpoch + 1}/{epochs} | Train: {localBestTrainingLoss:.4f}")
+                            del bestModel
+                            gc.collect()
+                            bestModel = model
+                            bestTrainingLoss = localBestTrainingLoss
+                            bestHyperparameters["learningRate"] = learningRate
+                            bestHyperparameters["optimizer"] = optimizer
+                            bestHyperparameters["weightDecay"] = weightDecay
+                            bestHyperparameters["epochs"] = bestEpoch + 1
+                        else:
+                            print(f"Model with LR: {learningRate}, OPT: {optimizer}, WD: {weightDecay} did not improve.")
+
+        if verbose:
+            print(f"\n\nBest model found with LR: {bestHyperparameters['learningRate']}, OPT: {bestHyperparameters['optimizer']}, "
+                  f"WD: {bestHyperparameters['weightDecay']}, Epoch: {bestHyperparameters['epochs']}")
+            if validation > 0:
+                print(f"Train: {bestTrainingLoss:.4f} | Validation: {bestValidationLoss:.4f}")
+            else:
+                print(f"Train: {bestTrainingLoss:.4f}")
+        assert bestModel is not None, "The best model was not found. Something went wrong during the hyperparameter search."
+        self.adjustedModel = bestModel
+        del bestModel
+        gc.collect()
+
+        if plot:
+            fig1 = go.Figure()
+            for idx, label in enumerate(labels):
+                fig1.add_trace(
+                    go.Scatter(
+                        x=np.arange(epochs+1),
+                        y=trainLossHistory[idx],
+                        mode='lines+markers',
+                        name=label)
+                )
+            fig1.update_layout(
+                title="Training Loss History",
+                xaxis_title="Epoch",
+                yaxis_title="Loss",
+                legend_title="Hyperparameters",
+                showlegend=True,
+                template="plotly_white"
+            )
+            fig1.show()
+            if validation > 0:
+                fig2 = go.Figure()
+                for idx, label in enumerate(labels):
+                    fig2.add_trace(
+                        go.Scatter(
+                            x=np.arange(epochs+1),
+                            y=validationLossHistory[idx],
+                            mode='lines+markers',
+                            name=label)
+                    )
+                fig2.update_layout(
+                    title="Validation Loss History",
+                    xaxis_title="Epoch",
+                    yaxis_title="Loss",
+                    legend_title="Hyperparameters",
+                    showlegend=True,
+                    template="plotly_white"
+                )
+                fig2.show()
+
+        return self.adjustedModel, bestHyperparameters
+
+
@@ -317,3 +470,7 @@ class OPTIMADEAdjuster(LocalAdjuster):
     settings used by that database or focusing its attention to specific chemistry like, for instance, all compounds of
     Sn and all perovskites. It accepts OPTIMADE query as an input and then operates based on the ``LocalAdjuster`` class.
     """