**Codecov Report**

❌ Patch coverage is …

| Coverage Diff | main | #651 | +/- |
|---|---:|---:|---:|
| Coverage | 39.30% | 28.81% | -10.49% |
| Files | 30 | 35 | +5 |
| Lines | 1743 | 2513 | +770 |
| Hits | 685 | 724 | +39 |
| Misses | 1058 | 1789 | +731 |
## Overview

This PR adds the inaugural benchmarking feature for Garden, built around Matbench Discovery.
## Discussion
Everything lives in a new `garden_ai.benchmarks` module. Here is an example script running the full Matbench Discovery benchmark on MACE (there are more in the `garden_ai/benchmarks/matbench_discovery/examples/` folder):
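A minimal sketch of what such a run can look like, assuming `MatbenchDiscovery` is imported from `garden_ai.benchmarks` and the full-benchmark task is exposed as a method named `run_full_benchmark` (both are assumptions; the real entrypoints and argument names are in the `examples/` scripts):

```python
from garden_ai.benchmarks import MatbenchDiscovery  # assumed import path


def mace_factory():
    # Runs inside the groundhog-managed venv: build and return an ASE calculator.
    from mace.calculators import mace_mp

    return mace_mp(model="medium", device="cuda")


benchmark = MatbenchDiscovery()

# `run_full_benchmark` is an assumed method name. `.local()` runs the task in a
# venv on this machine; `.submit()` would send the same call to a Globus Compute endpoint.
results = benchmark.run_full_benchmark.local(
    model_factory=mace_factory,
    model_packages=["mace-torch"],
)
```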
The benchmark tasks are implemented as `@hog.method()`s on the `MatbenchDiscovery` class. This makes it easy to run on remote sites through Globus Compute, or for someone to use the Garden SDK directly on a system they have access to via a `.local()` call. The downside is that the `tasks.py` file is pretty huge, since all of the logic for running the benchmark and calculating the metrics has to live in one file; the upside is that we don't need to add `matbench_discovery` as a dependency of the Garden SDK, since groundhog installs it in the venv it creates to run the functions.

I implemented the full list of tasks defined by the `matbench_discovery.enums.Task` enum, so in theory any model that can be used as an ASE calculator can run the benchmark, no matter what it was trained to do. So far I have only tested MACE, SevenNet, MatterSim, and EquiformerV2.
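For reference, a quick way to see the task names the class mirrors is to introspect the upstream enum directly:

```python
from matbench_discovery.enums import Task

# Each benchmark task on MatbenchDiscovery corresponds to a member of this enum.
print([task.name for task in Task])
```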
The general idea is that you pass in a `model_factory`, which is a function that builds and returns a model instance, and `model_packages`, which is the list of Python packages the model factory needs to run. The tasks call these inside the venv to set up the model for benchmarking. It currently only supports models that are pip-installable, but it shouldn't be too hard to pull down and instantiate a model from git (or from the future project formerly known as Graft).
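Concretely, the pair might look like the sketch below. The package and class names are hypothetical; the only contract assumed here is that the factory can be called with no arguments inside the task venv and returns something usable as an ASE calculator:

```python
# Hypothetical model: swap in whatever package and class your model actually ships.
def my_model_factory():
    from my_ml_potential import MyCalculator  # hypothetical, imported inside the task venv

    return MyCalculator(checkpoint="default", device="cuda")


# pip-installable dependencies the factory needs inside the task venv
my_model_packages = ["my-ml-potential==1.2.3"]
```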
Since it takes ~20 GPU hours to run the full benchmark, I implemented a checkpoint/resume system that writes the calculated energies and the index of each processed structure to a JSON file in `~/.garden/benchmarks/` on the system running the benchmark. If you give a `.submit()`, `.remote()`, or `.local()` call a `checkpoint_path` kwarg, it will look there for an existing checkpoint file, figure out which structures have already been processed, and resume from there. We print the checkpoint path to stdout when the job starts, and also attach it to the future we get back from a `.submit()` call, so you can grab the checkpoint path like this:
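A sketch of that pattern, continuing the hypothetical `run_full_benchmark` call from the earlier example and assuming the future exposes the path as a `checkpoint_path` attribute (the exact attribute name may differ):

```python
# Kick off a run on a remote endpoint; the checkpoint path is printed to stdout
# and attached to the returned future (attribute name assumed here).
future = benchmark.run_full_benchmark.submit(
    model_factory=mace_factory,
    model_packages=["mace-torch"],
)
print(future.checkpoint_path)

# If the job dies partway through, resume by passing the same path back in:
resumed = benchmark.run_full_benchmark.submit(
    model_factory=mace_factory,
    model_packages=["mace-torch"],
    checkpoint_path=future.checkpoint_path,
)
```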
## Metrics Calculations

I reuse the metric calculation functions that matbench uses internally, but I had issues importing them directly from matbench, so I copied the implementation into the `tasks.py` file. We reproduce the key metrics from the official Matbench Discovery leaderboard and add some of our own more 'meta' metrics. Here is an example blob of metrics from a MACE run:

TODO: add some metrics when the running job finishes.
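For a sense of what gets computed, here is a hand-written sketch of the core stability-classification metrics on the leaderboard (F1, precision, DAF) plus a couple of regression metrics, assuming arrays of predicted and DFT reference energies above the convex hull. This is an illustration, not the implementation copied into `tasks.py`:

```python
import numpy as np
from sklearn.metrics import f1_score, mean_absolute_error, precision_score, r2_score


def discovery_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Illustrative stability metrics from true/predicted energy above hull (eV/atom)."""
    true_stable = np.asarray(e_hull_true) <= threshold
    pred_stable = np.asarray(e_hull_pred) <= threshold
    precision = precision_score(true_stable, pred_stable)
    return {
        "F1": f1_score(true_stable, pred_stable),
        "Precision": precision,
        # Discovery acceleration factor: precision relative to the base rate of stable materials.
        "DAF": precision / true_stable.mean(),
        "MAE": mean_absolute_error(e_hull_true, e_hull_pred),
        "R2": r2_score(e_hull_true, e_hull_pred),
    }
```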
## Publishing results

Super users can publish results to the official Garden leaderboard using the `publish_benchmark_result` helper function. Regular users can call the function, but the backend will reject non-super users' requests.
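Roughly like the sketch below; the import path and signature are assumptions, and the helper may take more arguments than shown:

```python
from garden_ai.benchmarks import publish_benchmark_result  # assumed import path

# Only super users' requests are accepted by the backend.
publish_benchmark_result(results)  # assumed to take the results object from a benchmark run
```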
## Testing

Manually tested a bunch of times using the scripts in `garden_ai/benchmarks/matbench_discovery/examples/`.
## Documentation

No documentation updates yet, but I will need to write up some tutorials to help users figure out how to run and debug it.
📚 Documentation preview 📚: https://garden-ai--651.org.readthedocs.build/en/651/