diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 2ddb1a81..ca92d6c3 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -40,7 +40,7 @@ repos: - id: pyupgrade args: ["--py37-plus"] - repo: https://github.com/PyCQA/isort - rev: 5.10.1 + rev: 5.12.0 hooks: - id: isort - repo: https://github.com/python/black diff --git a/README.md b/README.md index e71a2484..a78e6864 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -![tests](https://github.com/LucasAlegre/mo-gym/workflows/Python%20tests/badge.svg) +![tests](https://github.com/Farama-Foundation/mo-gymnasium/workflows/Python%20tests/badge.svg) [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/LucasAlegre/mo-gym/blob/main/LICENSE) [![Discord](https://img.shields.io/discord/999693014618362036?label=discord)](https://discord.gg/ygmkfnBvKA) @@ -9,11 +9,16 @@ # MO-Gymnasium: Multi-Objective Reinforcement Learning Environments + + Gymnasium environments for multi-objective reinforcement learning (MORL). The environments follow the standard [gymnasium's API](https://github.com/Farama-Foundation/Gymnasium), but return vectorized rewards as numpy arrays. For details on multi-objective MDP's (MOMDP's) and other MORL definitions, see [A practical guide to multi-objective reinforcement learning and planning](https://link.springer.com/article/10.1007/s10458-022-09552-y). + + ## Install + Via pip: ```bash @@ -27,13 +32,18 @@ cd MO-Gymnasium pip install -e . ``` + + ## Usage + + ```python import gymnasium as gym import mo_gymnasium as mo_gym +import numpy as np -env = mo_gym.make('minecart-v0') # It follows the original gym's API ... +env = mo_gym.make('minecart-v0') # It follows the original Gymnasium API ... obs = env.reset() next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs)) # but vector_reward is a numpy array! @@ -46,20 +56,23 @@ env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2])) You can also check more examples in this colab notebook! [MORL-Baselines](https://github.com/LucasAlegre/morl-baselines) is a repository containing various implementations of multi-objective reinforcement learning algorithms. It relies on the MO-Gymnasium API and shows various examples of the usage of wrappers and environments. 
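Putting the snippet above into a complete rollout loop, here is a minimal sketch (it assumes `reset` follows Gymnasium's `(observation, info)` convention and substitutes a random policy for `your_agent`):

```python
import mo_gymnasium as mo_gym

env = mo_gym.make('minecart-v0')
obs, info = env.reset(seed=42)  # assumed Gymnasium-style reset signature: (observation, info)

episode_return = None
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for your_agent.act(obs)
    obs, vector_reward, terminated, truncated, info = env.step(action)
    # vector_reward is a numpy array with one entry per objective
    episode_return = vector_reward if episode_return is None else episode_return + vector_reward

print(episode_return)  # accumulated vector return, e.g. [ore1, ore2, fuel] for minecart-v0
env.close()
```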
+ + ## Environments -| Env | Obs/Action spaces | Objectives | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| + +| Env | Obs/Action spaces | Objectives | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `deep-sea-treasure-v0`
| Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
| `resource-gathering-v0`
| Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or a gem. Enemies have a 10% chance of killing the agent. From [Barrett & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162). |
| `fishwood-v0`
| Discrete / Discrete | `[fish_amount, wood_amount]` | ESR environment in which the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return). |
| `fruit-tree-v0`
| Discrete / Discrete | `[nutri1, ..., nutri6]` | Full binary tree of depth d=5,6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | | `breakable-bottles-v0`
| Discrete (Dictionary) / Discrete | `[time_penalty, bottles_delivered, potential]` | Gridworld with 5 cells. The agent must collect bottles from the source location and deliver them to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336). |
| `four-room-v0`
| Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). | -| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). | +| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). | | `mo-mountaincar-v0`
| Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms). | -| `mo-MountainCarContinuous-v0`
| Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. | +| `mo-mountaincarcontinuous-v0`
| Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. | | `mo-lunar-lander-v2`
| Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the "LunarLander-v2" environment. Objectives defined similarly as in [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE). | | `mo-reacher-v0`
| Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from [PyBullet](https://github.com/benelot/pybullet-gym/blob/ec9e87459dd76d92fe3e59ee4417e5a665504f62/pybulletgym/envs/roboschool/robots/manipulators/reacher.py), but with 4 different target positions. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
| `minecart-v0`
| Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). | @@ -68,8 +81,12 @@ You can also check more examples in this colab notebook! | `mo-halfcheetah-v4`
| Continuous / Continuous | `[velocity, energy]` | Multi-objective version of [HalfCheetah-v4](https://www.gymlibrary.ml/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL). | | `mo-hopper-v4`
| Continuous / Continuous | `[velocity, height, energy]` | Multi-objective version of [Hopper-v4](https://www.gymlibrary.ml/environments/mujoco/hopper/) env. | + + ## Citing + + If you use this repository in your work, please cite: ```bibtex @@ -81,10 +98,16 @@ If you use this repository in your work, please cite: } ``` + + ## Acknowledgments + + * The `minecart-v0` env is a refactor of https://github.com/axelabels/DynMORL. * The `deep-sea-treasure-v0`, `fruit-tree-v0` and `mo-supermario-v0` envs are based on https://github.com/RunzheYang/MORL. * The `four-room-v0` env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer. * The `fishwood-v0` code was provided by Denis Steckelmacher and Conor F. Hayes. * The `water-reservoir-v0` code was provided by Mathieu Reymond. + + diff --git a/docs/api/api.md b/docs/api/api.md deleted file mode 100644 index ad5d96e8..00000000 --- a/docs/api/api.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -title: "API" ---- - -# API -The environments follow the standard [gymnasium's API](https://github.com/Farama-Foundation/Gymnasium), but return vectorized rewards as numpy arrays. - -Here is a minimal example of how to create an environment and interact with it. -```python -import gymnasium -import mo_gymnasium as mo_gym - -env = mo_gym.make('minecart-v0') # It follows the original Gymnasium API ... - -obs = env.reset() -next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs)) # but vector_reward is a numpy array! - -# Optionally, you can scalarize the reward function with the LinearReward wrapper -env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2])) -``` - -[![MO-Gym Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LucasAlegre/mo-gym/blob/main/mo_gymnasium_demo.ipynb) -You can also check more examples in this colab notebook! diff --git a/docs/community/community.md b/docs/community/community.md new file mode 100644 index 00000000..b389fef0 --- /dev/null +++ b/docs/community/community.md @@ -0,0 +1,12 @@ +# Community + +If you want to help us out, reach us, or simply ask questions, you can join the Farama discord server [here](https://discord.gg/WF3FqsBk). + +## Acknowledgements + +Aside from the main contributors, some people have also contributed to the project in various ways. We would like to thank them all for their contributions. + +```{include} ../../README.md +:start-after: +:end-before: +``` diff --git a/docs/environments/environments.md b/docs/environments/environments.md index c26f2973..807d187b 100644 --- a/docs/environments/environments.md +++ b/docs/environments/environments.md @@ -4,21 +4,7 @@ title: "Environments" # Available environments -| Env | Obs/Action spaces | Objectives | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `deep-sea-treasure-v0`
| Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | -| `resource-gathering-v0`
| Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From [Barret & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162). | -| `fishwood-v0`
| Discrete / Discrete | `[fish_amount, wood_amount]` | ESR environment, the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return). | -| `fruit-tree-v0`
| Discrete / Discrete | `[nutri1, ..., nutri6]` | Full binary tree of depth d=5,6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | -| `breakable-bottles-v0`
| Discrete (Dictionary) / Discrete | `[time_penalty, bottles_delivered, potential]` | Gridworld with 5 cells. The agents must collect bottles from the source location and deliver to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336). | -| `four-room-v0`
| Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). | -| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). | -| `mo-mountaincar-v0`
| Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms). | -| `mo-MountainCarContinuous-v0`
| Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. | -| `mo-lunar-lander-v2`
| Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the "LunarLander-v2" environment. Objectives defined similarly as in [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE). | -| `mo-reacher-v0`
| Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from [PyBullet](https://github.com/benelot/pybullet-gym/blob/ec9e87459dd76d92fe3e59ee4417e5a665504f62/pybulletgym/envs/roboschool/robots/manipulators/reacher.py), but there are 4 different target positions. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). | -| `minecart-v0`
| Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). | -| `mo-highway-v0` and `mo-highway-fast-v0`
| Continuous / Discrete | `[speed, right_lane, collision]` | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying on the rightest lane. From [highway-env](https://github.com/eleurent/highway-env). | -| `mo-supermario-v0`
| Image / Discrete | `[x_pos, time, death, coin, enemy]` | Multi-objective version of [SuperMarioBrosEnv](https://github.com/Kautenja/gym-super-mario-bros). Objectives are defined similarly as in [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | -| `mo-halfcheetah-v4`
| Continuous / Continuous | `[velocity, energy]` | Multi-objective version of [HalfCheetah-v4](https://www.gymlibrary.ml/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL). | -| `mo-hopper-v4`
| Continuous / Continuous | `[velocity, height, energy]` | Multi-objective version of [Hopper-v4](https://www.gymlibrary.ml/environments/mujoco/hopper/) env. | +```{include} ../../README.md +:start-after: +:end-before: +``` diff --git a/docs/index.md b/docs/index.md index 20219815..245eb217 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,13 +4,6 @@ firstpage: lastpage: --- -```{toctree} -:hidden: -:caption: API - -api/api -``` - ```{toctree} :hidden: :caption: Environments @@ -36,6 +29,7 @@ wrappers/wrappers :hidden: :caption: Development +community/community Github Donate @@ -43,30 +37,28 @@ Donate # MO-Gymnasium is a standardized API and a suite of environments for multi-objective reinforcement learning (MORL) -For details on multi-objective MDP's (MOMDP's) and other MORL definitions, see [A practical guide to multi-objective reinforcement learning and planning](https://link.springer.com/article/10.1007/s10458-022-09552-y). +```{include} ../README.md +:start-after: +:end-before: +``` -## Install +## API -### From Pypi -```bash -pip install mo-gymnasium +```{include} ../README.md +:start-after: +:end-before: ``` -### From source -```bash -git clone https://github.com/Farama-Foundation/MO-Gymnasium -cd MO-Gymnasium -pip install -e . +## Install + +```{include} ../README.md +:start-after: +:end-before: ``` ## Citing -If you use this repository in your work, please cite: - -```bibtex -@inproceedings{Alegre+2022bnaic, - author = {Lucas N. Alegre and Florian Felten and El-Ghazali Talbi and Gr{\'e}goire Danoy and Ann Now{\'e} and Ana L. C. Bazzan and Bruno C. da Silva}, - title = {{MO-Gym}: A Library of Multi-Objective Reinforcement Learning Environments}, - booktitle = {Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn 2022}, - year = {2022} -} + +```{include} ../README.md +:start-after: +:end-before: ``` diff --git a/pyproject.toml b/pyproject.toml index 3e456e47..7cf93b99 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -54,9 +54,9 @@ all = [ testing = ["pytest ==7.1.3"] [project.urls] -Homepage = "https://farama.org" +Homepage = "https://mo-gymnasium.farama.org" Repository = "https://github.com/Farama-Foundation/MO-Gymnasium" -Documentation = "https://gymnasium.farama.org" +Documentation = "https://mo-gymnasium.farama.org" "Bug Report" = "https://github.com/Farama-Foundation/MO-Gymnasium/issues" [tool.setuptools]