EMLE training #35

kzinovjev · 2024-11-02T15:22:33Z

This implements the EMLETrainer class and the emle-trainer script, which exposes training as a CLI. The reference data is expected in the same format as for emle-analyse (a tarball with ORCA and horton outputs). This should facilitate active learning, where error analysis and retraining are performed on the same data.
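As a rough illustration of working with such a tarball, the sketch below counts the regular files in an archive by extension, using only the Python standard library. The member names (`mol_0001.out`, `mol_0001.h5`, ...) and the helper functions are hypothetical placeholders; the actual layout expected by the scripts is not described here.

```python
import io
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_tarball(fileobj):
    """Count the regular files in a tarball, grouped by file extension."""
    counts = Counter()
    # "r:*" lets tarfile detect the compression (gzip, bz2, none, ...) itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                counts[PurePosixPath(member.name).suffix] += 1
    return counts


def make_demo_tarball():
    """Build a tiny in-memory tarball with placeholder (hypothetical) names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in ("mol_0001.out", "mol_0001.h5", "mol_0002.out", "mol_0002.h5"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


summary = summarize_tarball(make_demo_tarball())
print(dict(summary))
```

A quick check like this can confirm that every structure has both kinds of output before the same archive is used for analysis and for retraining.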

Main changes to the common codebase:

All EMLE-specific AEV logic is moved to the EMLEAEVComputer class. This includes masking, normalization, and mapping between EMLE and AEVComputer zid values (the latter is only needed when sharing a common computer with ani2x).
A small increment (1e-16) was added in various places in EMLEBase to avoid NaN in the gradients w.r.t. EMLE parameters. It was not needed before, since the gradients were only calculated w.r.t. atomic positions.
The aev_mean feature was removed.
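To illustrate why a small increment can remove NaN gradients, here is a minimal sketch in plain Python. It is not the EMLEBase code, and the places where the PR adds the increment are not listed here. The example applies the chain rule by hand to f(p) = sqrt(p**2 + eps), following IEEE-754 arithmetic the way an autograd engine would. The helper `ieee_div` and the function name are made up for this illustration.

```python
import math


def ieee_div(a, b):
    """Divide like IEEE-754 floats do (plain Python raises on division by zero)."""
    if b != 0.0:
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan
    return math.copysign(math.inf, a)


def grad_sqrt_of_square(p, eps=0.0):
    """Chain-rule gradient of f(p) = sqrt(p**2 + eps) with respect to p."""
    inner = p * p + eps                              # g(p) = p**2 + eps
    d_inner = 2.0 * p                                # dg/dp
    d_outer = ieee_div(1.0, 2.0 * math.sqrt(inner))  # d sqrt(g) / dg
    return d_inner * d_outer                         # 0 * inf gives NaN


unsafe = grad_sqrt_of_square(0.0)            # no increment: 0 * inf
safe = grad_sqrt_of_square(0.0, eps=1e-16)   # increment keeps d_outer finite
away = grad_sqrt_of_square(3.0, eps=1e-16)   # far from zero: unaffected
print(unsafe, safe, away)
```

Without the increment the gradient at p = 0 is NaN, because the outer derivative is infinite while the inner one is zero. With it, the outer derivative stays finite and the product is exactly zero, while values away from zero are unchanged to within floating-point precision.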

kzinovjev · 2024-11-14T13:37:59Z

Ready to go. Changes:

Numerous bugfixes in the core and in the emle-analyze and emle-train scripts and API. Tested training for both a QM7-based model and a custom one. Analysis now also works well (fixed multiple unit-related bugs).
Fixed units of the embedding forces in EMLECalculator. Propagation now seems to be stable.
Updated docstrings for EMLEAnalyzer
Added a short README section for emle-analyze and emle-train
Changed logging to loguru
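Since several of the fixes above concern units, here is a minimal sketch of the kind of conversion involved. It assumes the raw values are in atomic units (Hartree and Hartree/Bohr), which is an assumption made for illustration and not something stated in this PR. The function names are hypothetical, and the constants are the CODATA 2018 values.

```python
# CODATA 2018 conversion factors.
HARTREE_TO_KCAL_MOL = 627.5094740631
BOHR_TO_ANGSTROM = 0.529177210903


def energy_to_kcal_mol(energy_hartree):
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_MOL


def forces_to_kcal_mol_angstrom(forces_hartree_bohr):
    """Convert per-atom force vectors from Hartree/Bohr to kcal/mol/A."""
    # Forces are energy per length, so the length factor goes in the denominator.
    factor = HARTREE_TO_KCAL_MOL / BOHR_TO_ANGSTROM
    return [[component * factor for component in force]
            for force in forces_hartree_bohr]


energy = energy_to_kcal_mol(1.0)
forces = forces_to_kcal_mol_angstrom([[1.0, 0.0, 0.0], [0.0, -0.5, 0.0]])
print(energy, forces)
```

Forgetting the length factor leaves forces off by roughly a factor of two, which is enough to destabilize dynamics while energies still look correct.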

@JMorado, feel free to add to the list if I forgot anything substantial!

lohedges · 2024-11-14T13:52:39Z

Many thanks for this. I've merged into my feature_train branch, blackened, and am now running the CI against the updated AEV model. Everything passes locally and the sire-emle interface works without issue.

JMorado · 2024-11-14T14:45:44Z

Many thanks to both of you. Looks good to me.

kzinovjev and others added 30 commits October 23, 2024 18:38

A (very) rough sketch of emle-train

b592076

Merge branch 'chemle:main' into emle-train

798c4d2

Created entrypoint and directory for the emle-train code

6dfefc7

Add computation of q_core

99bc36e

Add AEVCalculator module

0d6a9f2

Add module with batch data structures for training

8b9fc8c

Add utils module

897f61a

Add IVM module

de0cc7f

Add updates to EMLETrainer

bfc397e

Add module with loss functions

1fa8abc

Clean __init__ and add _trainer.py file

0bf481f

Update mapping on the AEVCalculator

0769c3a

Update batching

593c782

Miscelanneous fixes

617e970

Remove numpy conversion from the AEVCalculator

155402a

A (very) rough sketch of emle-train

5139a2c

Created entrypoint and directory for the emle-train code

cba197b

Add computation of q_core

13511b6

Add AEVCalculator module

092c952

Add module with batch data structures for training

c3e519d

Add utils module

ff6ca9d

Add IVM module

59d531c

Add updates to EMLETrainer

3cfe71f

Add module with loss functions

d1370e0

Clean __init__ and add _trainer.py file

764e95c

Merge branch 'main' into emle-train

d018233

Merge remote-tracking branch 'origin/emle-train' into emle-train

da9c13e

# Conflicts: # emle/train/_aev_calculator.py # emle/train/_trainer.py # emle/train/_utils.py

Implement EMLEAEVComputer

2a45a27

Add EMLEAEVComputer to exported classes

e0609dd

Add default hypers to EMLEAEVComputer

7a3703a

kzinovjev and others added 25 commits November 13, 2024 15:39

Merge remote-tracking branch 'origin/emle-train' into emle-train

c9c7e12

Do not assign default torch device in BaseBackend

32694a8

Replace print statements with logging

179c84b

Merge branch 'emle-train' of github.com:kzinovjev/emle-engine into emle-train

8fa1548

Add logger initialization logic

3783d29

Always return energy in kcal/mol and forces in kcal/mol/A from the backends in _analyzer.py

65515c9

Fix docstring in forward method

4df218b

Convert EMLE energies to kcal/mol in EMLEAnalyzer

b54ffd1

Merge remote-tracking branch 'origin/emle-train' into emle-train

27a8d2e

Fix placement and dtype of tensors

a38a077

Merge branch 'emle-train' of github.com:kzinovjev/emle-engine into emle-train

bb4e5d8

Fix plot data

5804f75

Fix mesh calculation in EMLEAnalyzer (wrong xyz units)

3e8ce00

Write out static energy predictions with exact MBIS properties in emle-analyze

40e9226

Merge remote-tracking branch 'origin/emle-train' into emle-train

add2a8c

Fix embedding forces units in calculator

44095bb

Add --pc-xyz-file CLI argument

e5c752f

Add README section on coordinates logging

94b4472

Detach tensors

bc227ee

Merge branch 'emle-train' of github.com:kzinovjev/emle-engine into emle-train

3bc131d

Add README sections for emle-analyze and emle-train

e551c08

Remove -q argument from emle-analyze (infer automatically from MBIS)

d875161

Merge remote-tracking branch 'origin/emle-train' into emle-train

b5a3125

Merge branch 'main' into emle-train

db2843b

Update docstrings in EMLEAnalyzer

56ef30c

kzinovjev reopened this Nov 14, 2024

lohedges merged commit 56ef30c into chemle:main Nov 14, 2024
3 of 5 checks passed
