Releases · mle-infrastructure/mle-toolbox
Minor fixes 🔧
[v0.3.5] - [08/2024]
- Adds TF/CUDA deterministic behavior flag (see the sketch below).
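
The notes don't spell out the flag's name or mechanics. As a rough sketch, deterministic TF/CUDA behavior is usually enabled through environment variables set before TensorFlow is imported; the variables below are standard TensorFlow switches, and how the toolbox flag maps onto them is an assumption:

```python
import os

# Standard TensorFlow determinism switches; must be set before
# `import tensorflow`. The mapping from the toolbox flag onto these
# environment variables is an assumption, not documented in the notes.
os.environ["TF_DETERMINISTIC_OPS"] = "1"    # deterministic op implementations
os.environ["TF_CUDNN_DETERMINISTIC"] = "1"  # deterministic cuDNN kernels
```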
Robustify & fix OS resource setting 🚂
[v0.3.4] - [03/2023]
Added
- Adds PBT, Successive Halving & Hyperband experiment support (see the sketch after this list).
- Adds report support for non-search experiments.
- Adds robust `local -> remote` experiment launching.
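
The strategies themselves live in the `mle-hyperopt` package, which the toolbox wraps. A minimal ask/tell sketch of a Hyperband search, with constructor arguments following my reading of the `mle-hyperopt` interface (treat the argument names and values as assumptions/placeholders):

```python
from mle_hyperopt import HyperbandSearch

# Hyperband over a single real-valued parameter; `max_resource` and `eta`
# control the successive-halving schedule (values here are placeholders).
strategy = HyperbandSearch(
    real={"lrate": {"begin": 1e-4, "end": 1e-1, "prior": "log-uniform"}},
    search_config={"max_resource": 27, "eta": 3},
    seed_id=42,
)

configs = strategy.ask()          # batch of candidate configurations
scores = [0.5 for _ in configs]   # placeholder evaluation results
strategy.tell(configs, scores)    # report scores back to the strategy
```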
Changed
- Restructures experiment wrapper `launch_experiment`.
- Moves PBT experiment utilities to `mle-hyperopt`.
- Fixes versions of subpackages so that dependencies are static.
- Renames `get_jax_os_ready` to `get_os_env_ready` and includes `device_config` in the auto-setup of `MLExperiment`.
Fixed
- Updates `mle init` to work with the vim editor.
- Fixes all broken links in the README.
Restructure imports - run in Colab etc.
[v0.3.3]
mle-monitor, GCS sync refactor, examples
Added
- Introduces experimental notebook-friendly `MLELauncher`, which allows you to schedule experiments from within a local notebook (a purely hypothetical usage sketch follows this list).
- Adds `mle protocol` subcommand to get a quick view of the last experiments and their status.
- Adds `mle project` to initialize a new project based on cloning the `mle-project` repository.
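
Since `MLELauncher` is experimental and its interface is not shown in these notes, the following is a purely hypothetical sketch of notebook-based scheduling; the import path, argument names and method name are all assumptions:

```python
# Purely hypothetical sketch - every name below is an assumption, not the
# documented MLELauncher interface.
from mle_toolbox import MLELauncher  # import path assumed

launcher = MLELauncher(
    experiment_config="configs/abc.yaml",  # argument name assumed
    resource_to_run="local",               # argument name assumed
)
launcher.launch()                          # method name assumed
```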
Changed
- Refactors out resource monitoring and the protocol database to the `mle-monitor` sub-package.
- Refactors out job launching and status monitoring to the `mle-launcher` sub-package.
- Moves population-based training and hypothesis testing into the `experimental` submodule.
- Moves the documentation page to the `mle-docs` sub-repository.
mle-hyperopt, mini features & bug fixes
Added
- 3D animation post-processing helpers (`animate_3D_scatter` and `animate_3D_surface`) and test coverage for visualizations (static/dynamic).
- `nevergrad` multi-objective hyperparameter optimization. Check out the toy example.
- Adds `@experiment` decorator for easy integration:

```python
from mle_toolbox import experiment


@experiment("configs/abc.json", model_config={"num_layers": 2})
def run(mle, a):
    print(mle.model_config)
    print(mle.log)
    print(a)


if __name__ == "__main__":
    run(a=2)
```

- Adds `combine_experiments`, which loads different `meta_log` and `hyper_log` objects and makes them "dot"-accessible:

```python
from mle_toolbox import combine_experiments  # import path assumed

experiment_dirs = ["../tests/unit/fixtures/experiment_1",
                   "../tests/unit/fixtures/experiment_2"]
meta, hyper = combine_experiments(experiment_dirs, aggregate_seeds=False)
```

- Adds option to run grid search for multiple base configurations without having to create individual experiment configuration files.
Changed
- Configuration loading is now more toolbox-specific. `load_json_config` and `load_yaml_config` are now part of `mle-logging`. The toolbox now has two "new" functions, `load_job_config` and `load_experiment_config`, which prepare the raw configs for future usage.
- The `job_config` file no longer has to be a `.json` file, but can (and probably should) be a `.yaml` file. This makes formatting easier. The hyperoptimization pipeline will generate configuration files of the same file type.
- Moves core hyperparameter optimization functionality to `mle-hyperopt`. At this point the toolbox wraps around the search strategies and handles the `mle-logging` log loading/data retrieval.
- Reduces the test suite, since all hyperopt strategy-internal tests are taken care of in `mle-hyperopt`.
Fixed
- Fixed unique file naming of zip files stored in GCS bucket. Now based on the time string.
- Grid engine monitoring now also tracks waiting/pending jobs.
- Fixes a bug in the random seed setting for synchronous batch jobs. Previously a new set of seeds was sampled for each batch, which led to problems when aggregating different logs by their seed ID. Now the first set of seeds is stored and provided as an input to all subsequent `JobQueue` startups (sketched below).
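
A schematic illustration of this fix; every name below is illustrative rather than the actual toolbox internals:

```python
import numpy as np


def launch_batches(num_batches: int, num_seeds: int) -> None:
    """Illustrative only: sample the seed set once and reuse it per batch."""
    rng = np.random.default_rng(42)
    seeds = rng.integers(0, 10_000, size=num_seeds).tolist()  # stored once
    for batch_id in range(num_batches):
        # Every JobQueue startup now receives the *same* seed ids, so logs
        # can later be aggregated consistently by seed id.
        print(f"Batch {batch_id}: launching jobs with seed_ids={seeds}")


launch_batches(num_batches=3, num_seeds=2)
```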
mle-logging, PBT & fixes
Added
- Adds general processing job, which generalizes the post-processing job and enables 'shared'/centralized data pre-processing before a (search) experiment and results post-processing/figure generation afterwards. Check out the MNIST example.
- Adds population-based training experiment type (still experimental). Check out the MNIST example and the simple quadratic from the paper.
- Adds a set of unit/integration tests for more robustness and `flake8` linting.
- Adds code coverage with secrets token.
- Adds `mle.ready_to_log` based on `log_every_k_updates` in `log_config`. No more modulo confusion (see the sketch after this list).
- Adds slack clusterbot integration, which allows for notifications and report upload.
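
A sketch of the bookkeeping `ready_to_log` replaces. The stand-in class below only mimics the assumed semantics; the real logic lives on the toolbox's experiment object:

```python
class ReadyToLogSketch:
    """Stand-in mimicking the assumed `mle.ready_to_log` semantics."""

    def __init__(self, log_every_k_updates: int):
        self.log_every_k_updates = log_every_k_updates

    def ready_to_log(self, update_counter: int) -> bool:
        # Centralizes the modulo check instead of scattering it in user code.
        return update_counter % self.log_every_k_updates == 0


mle = ReadyToLogSketch(log_every_k_updates=10)
for update in range(1, 101):
    if mle.ready_to_log(update):  # replaces manual `update % k == 0` checks
        print(f"Logging at update {update}")
```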
Changed
- Allows logging of array data in the meta log `.hdf5` file by making `tolerant_mean` work for matrices.
- Changes the configuration `.yaml` to use `experiment_type` instead of `job_type` for a clear distinction:
  - job: Single submission process on a resource (e.g. a single seed for a single configuration).
  - eval: Single parameter configuration, which can be executed/trained for multiple seeds (individual jobs!).
  - experiment: Refers to the entire sequence of jobs to be executed (e.g. a grid search with pre/post-processing).
- Restructures the `experiment` subdirectory into `job` for consistent naming.
- Refactors out `MLELogger` into the separate `mle-logging` package. It is a core ingredient that should stand alone.
Fixed
- Fixed `mle retrieve` to be actually useful and work robustly.
- Fixed `mle report` to retrieve results if they don't exist (or to use a local directory provided by the user).
- Fixed `mle report` to generate reports via an `.html` file and the dependency `xhtml2pdf`.
- Fixed the unique hash for experiment results storage. Previously this only used the content of `base_config.json`, which did not result in a unique hash when running different searches via `job_config.yaml`. Now the hash is generated based on a merged dictionary of the time string, `base_config` and `job_config` (see the sketch after this list).
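
A minimal sketch of the merge-then-hash pattern this fix describes; the toolbox's actual implementation is not shown in the notes, so the hash function and time format below are arbitrary stand-ins:

```python
import hashlib
import json
from datetime import datetime

# Illustrative stand-ins for the loaded configuration dictionaries.
base_config = {"num_layers": 2}
job_config = {"search_type": "grid"}

# Merge the time string with both configs, then hash: different searches on
# the same base_config now yield different hashes.
merged = {
    "time_str": datetime.now().strftime("%Y-%m-%d_%H-%M-%S"),
    "base_config": base_config,
    "job_config": job_config,
}
# sha256 is an arbitrary choice here, not necessarily what the toolbox uses.
hash_id = hashlib.sha256(json.dumps(merged, sort_keys=True).encode()).hexdigest()
print(hash_id[:8])  # short id used e.g. for result storage
```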
Async Job Scheduling
Added
- Adds monitoring panel for GCP in the `mle monitor` dashboard.
- Adds asynchronous job launching via the new `ExperimentQueue` and monitoring based on a `max_running_jobs` budget. This release changes the previous job launching infrastructure: we no longer rely on one process per job, but monitor all scheduled jobs passively in a for-loop (see the sketch after this list).
- Adds GitHub Pages hosted documentation using mkdocs and the Material framework. The documentation is hosted under roberttlange.github.io/mle-toolbox. It is still very much work in progress.
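
A schematic version of that passive scheduling loop; all names are illustrative stand-ins, not the actual `ExperimentQueue` internals:

```python
import random
import time


def run_queue(job_ids, max_running_jobs: int):
    """Illustrative scheduler: submit up to a budget, then poll passively."""
    queue, running, completed = list(job_ids), [], []
    while queue or running:
        # Top up the running set until the max_running_jobs budget is hit.
        while queue and len(running) < max_running_jobs:
            running.append(queue.pop(0))  # stand-in for a non-blocking submit
        # One passive pass over all scheduled jobs - no process per job.
        for job in list(running):
            if random.random() < 0.5:     # stand-in for a job status check
                running.remove(job)
                completed.append(job)
        time.sleep(0.1)                   # polling interval
    return completed


print(run_queue(job_ids=list(range(5)), max_running_jobs=2))
```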
Changed
- Adds support for additional setup bash files when launching GCP VMs in `single_job_args`.
- Adds Q/A for upload/deletion of directories to the GCS bucket.
- All GCP CPU resources are now queried via custom machine types (default: cheap n1).
- Separates different `requirements.txt` files for minimal installation, examples and testing.
- Restructures the search experiment API in the `.yaml` file. We now differentiate between 3 pillars (see the sketch after this list):
  - `search_logging`: General options such as reloading of a previous log, verbosity, metrics in the `.hdf5` log to monitor and how to do so.
  - `search_resources`: How many jobs, batches, maximum number of simultaneously running jobs, etc.
  - `search_config`: Options regarding the search type (random, grid, smbo) and the parameters to search over (spaces, resolution, etc.).
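
Written out as the Python dictionary such a `.yaml` file would parse into, the three pillars might look as follows; the top-level keys come from the notes, while all nested keys and values are assumptions:

```python
# Top-level keys from the release notes; every nested key/value is assumed.
search_experiment_config = {
    "search_logging": {
        "reload_log": False,            # reload a previous search log?
        "eval_metrics": ["test_loss"],  # .hdf5 metrics to monitor
    },
    "search_resources": {
        "num_search_batches": 5,        # how many batches of jobs
        "num_evals_per_batch": 4,       # jobs per batch
        "max_running_jobs": 10,         # simultaneous-job budget
    },
    "search_config": {
        "search_type": "random",        # random / grid / smbo
        "search_params": {              # spaces, resolution, etc.
            "lrate": {"begin": 1e-4, "end": 1e-1, "prior": "log-uniform"},
        },
    },
}
```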
Google Cloud Platform Experiment Support
Added
- Adds `HypothesisTester`: Simple time-average difference comparison between individual runs, with multiple-testing correction and p-value plotting. Example `hypothesis_testing.ipynb` notebook.
- Adds `MetaLog` and `HyperLog` classes: Implement convenient functionalities like `hyper_log.filter()` and ease the post-processing analysis.
- Adds GCP job launch/monitor support for all experiment types and organizes GCS syncing of results.
Changed
- `load_result_logs` is now directly imported with `import mle_toolbox`, since it is part of the core functionality (see the sketch after this list).
- Major restructuring of the `experiment` sub-directory (`local`, `cluster`, `cloud`) with an easy 3-part extension for new resources: `monitor`, `launch`, `check_job_args`.
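
A sketch of the resulting post-processing workflow. The import and `hyper_log.filter()` are named in the notes, but the return values and keyword arguments below are assumptions:

```python
# Sketch: return values and filter() keyword arguments are assumptions.
from mle_toolbox import load_result_logs

meta_log, hyper_log = load_result_logs("experiments/2021_grid_search/")
best_runs = hyper_log.filter(metric_name="test_loss", top_k=3)  # args assumed
```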
Fixed
- Fixes plotting with the new `MetaLog` and `HyperLog` classes.
Bash Experiment, Encryption & Extra cmd inputs
Added
- Allows multi-config + multi-seed bash experiments. The user needs to take care of the input arguments (`-exp_dir`, `-config_fname`, `-seed_id`) themselves and within the bash script. We provide a minimal example of how to do so in examples/bash_configs.
- Adds backend functions for `monitor_slurm_cluster` and a local version to get resource utilisation.
- Adds SHA-256 encryption/decryption of ssh credentials. Also part of the initialization setup.
- Adds `extra_cmd_line_inputs` to `single_job_args` so that you can add a static input via the command line. This will also be incorporated in the `MLExperiment` as an `extra_config` dotmap dictionary.
Changed
- Changes plots of monitored resource utilisation to `plotext` to avoid the gnuplot dependency.
- Changes the logger interface: One now has to provide dictionaries as inputs to `update_log`. This is supposed to make the logging more robust.
- Changes template files and refactors/renames files in the `utils` subdirectory:
  - `core_experiment`: Includes helpers used in (almost) every experiment.
  - `core_files_load`: Helpers used to load various core components (configs).
  - `core_files_merge`: Helpers used to merge meta-logs.
  - `helpers`: Random small functionalities (not crucial).
- Renames the `hyperopt` subdirectory: `hyperopt_<type>`, `hyperspace`, `hyperlogger`.
- Changes the naming of the config from `cc` to `mle_config` for easy readability.
- Changes the naming of files to be more intuitive: E.g. `abc_1_def.py`, `abc_2_def.py` are changed to `abc_def_1.py`, `abc_def_2.py`.
Fixed
- Fixed local launch of remote projects via a `screen` session and piping to `qrsh` or `srun --pty bash`. If you are on a local machine and run `mle run`, you will get to choose the remote resource and can later reattach to that resource.
- Fixed 2D plots with `fixed_params`. The naming as well as the subtitle of the `.png` files/plots accounts for the fixed parameter.
`mle init`, `MLE_Experiment` & refactoring
- Adds `mle init` to configure the template toml. The command first searches for an existing config to update. If none is found, we go through the process of updating values in a default config.
- Prints configuration and protocol summary with rich. This gets rid of the `tabulate` dependency.
- Updates `monitor_slurm_cluster` to work with the new `mle monitor`. This gets rid of the `colorclass` and `terminaltables` dependencies.
- Fixes report generation bug (everything has to be a string for markdown-ification!).
- Fixes monitor bug: No longer reloads the local database at each update call.
- Adds `get_jax_os_ready` helper for setting up JAX environment variables.
- Adds `load_model_ckpt` for smooth reloading of stored checkpoints.
- Adds `MLE_Experiment` abstraction for minimal imports and a smooth workflow (a hypothetical usage sketch follows this list).
- A lot of internal refactoring: E.g. getting rid of the `multi_runner` sub-directory.
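
Since the `MLE_Experiment` interface is not shown in these notes (later releases above refer to it as `MLExperiment`), the following is a purely hypothetical sketch of the minimal-import workflow it targets; the import path, constructor argument and attribute names are all assumptions:

```python
# Purely hypothetical sketch - every name below is an assumption.
from mle_toolbox import MLE_Experiment  # import path assumed

mle = MLE_Experiment(config_fname="configs/abc.json")  # argument name assumed
print(mle.train_config)  # attribute name assumed
print(mle.log)           # logger attached during auto-setup
```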