diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 2ddb1a81..ca92d6c3 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -40,7 +40,7 @@ repos:
- id: pyupgrade
args: ["--py37-plus"]
- repo: https://github.com/PyCQA/isort
- rev: 5.10.1
+ rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/python/black
diff --git a/README.md b/README.md
index e71a2484..a78e6864 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-![tests](https://github.com/LucasAlegre/mo-gym/workflows/Python%20tests/badge.svg)
+![tests](https://github.com/Farama-Foundation/mo-gymnasium/workflows/Python%20tests/badge.svg)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/LucasAlegre/mo-gym/blob/main/LICENSE)
[![Discord](https://img.shields.io/discord/999693014618362036?label=discord)](https://discord.gg/ygmkfnBvKA)
@@ -9,11 +9,16 @@
# MO-Gymnasium: Multi-Objective Reinforcement Learning Environments
+
+
Gymnasium environments for multi-objective reinforcement learning (MORL). The environments follow the standard [Gymnasium API](https://github.com/Farama-Foundation/Gymnasium), but return vectorized rewards as numpy arrays.
For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see [A practical guide to multi-objective reinforcement learning and planning](https://link.springer.com/article/10.1007/s10458-022-09552-y).
+
+
## Install
+
Via pip:
```bash
@@ -27,13 +32,18 @@ cd MO-Gymnasium
pip install -e .
```
+
+
## Usage
+
+
```python
import gymnasium as gym
import mo_gymnasium as mo_gym
+import numpy as np
-env = mo_gym.make('minecart-v0') # It follows the original gym's API ...
+env = mo_gym.make('minecart-v0') # It follows the original Gymnasium API ...
obs, info = env.reset()
next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs)) # but vector_reward is a numpy array!
@@ -46,20 +56,23 @@ env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
You can also check more examples in this colab notebook!
[MORL-Baselines](https://github.com/LucasAlegre/morl-baselines) is a repository containing various implementations of multi-objective reinforcement learning algorithms. It relies on the MO-Gymnasium API and shows various examples of the usage of wrappers and environments.
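To make the `LinearReward` line above concrete, here is a minimal sketch of a scalarized rollout. It is only an illustration: a random policy stands in for `your_agent`, and it assumes that `LinearReward`-style scalarization amounts to a weighted sum (dot product) of the vector reward, computed here by hand rather than through the wrapper.

```python
import mo_gymnasium as mo_gym
import numpy as np

env = mo_gym.make('minecart-v0')
weights = np.array([0.8, 0.2, 0.2])  # one weight per objective: [ore1, ore2, fuel]

obs, info = env.reset()
terminated = truncated = False
scalar_return = 0.0
while not (terminated or truncated):
    # Random policy used as a stand-in for a real agent.
    action = env.action_space.sample()
    obs, vector_reward, terminated, truncated, info = env.step(action)
    # Weighted-sum scalarization done by hand; the LinearReward wrapper
    # is meant to fold this into the environment's reward directly.
    scalar_return += np.dot(weights, vector_reward)

print(f"Scalarized return of the random rollout: {scalar_return:.2f}")
```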
+
+
## Environments
-| Env | Obs/Action spaces | Objectives | Description |
-|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+
+| Env | Obs/Action spaces | Objectives | Description |
+|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `deep-sea-treasure-v0` | Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
| `resource-gathering-v0` | Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gems. Enemies have a 10% chance of killing the agent. From [Barrett & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162). |
| `fishwood-v0` | Discrete / Discrete | `[fish_amount, wood_amount]` | ESR environment where the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return). |
| `fruit-tree-v0` | Discrete / Discrete | `[nutri1, ..., nutri6]` | Full binary tree of depth d = 5, 6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
| `breakable-bottles-v0` | Discrete (Dictionary) / Discrete | `[time_penalty, bottles_delivered, potential]` | Gridworld with 5 cells. The agent must collect bottles from the source location and deliver them to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336). |
| `four-room-v0` | Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
-| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). |
+| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). |
| `mo-mountaincar-v0` | Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms). |
-| `mo-MountainCarContinuous-v0` | Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. |
+| `mo-mountaincarcontinuous-v0` | Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. |
| `mo-lunar-lander-v2` | Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the "LunarLander-v2" environment. Objectives defined similarly to [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE). |
| `mo-reacher-v0` | Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from [PyBullet](https://github.com/benelot/pybullet-gym/blob/ec9e87459dd76d92fe3e59ee4417e5a665504f62/pybulletgym/envs/roboschool/robots/manipulators/reacher.py), but there are 4 different target positions. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
| `minecart-v0` | Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). |
@@ -68,8 +81,12 @@ You can also check more examples in this colab notebook!
| `mo-halfcheetah-v4` | Continuous / Continuous | `[velocity, energy]` | Multi-objective version of [HalfCheetah-v4](https://www.gymlibrary.ml/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL). |
| `mo-hopper-v4` | Continuous / Continuous | `[velocity, height, energy]` | Multi-objective version of [Hopper-v4](https://www.gymlibrary.ml/environments/mujoco/hopper/) env. |
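As a quick sanity check against the table above, the following sketch (using only the API shown in the Usage section, plus the standard Gymnasium `action_space.sample()`) confirms that the reward returned by `step` is a vector with one entry per listed objective:

```python
import mo_gymnasium as mo_gym

# `deep-sea-treasure-v0` lists two objectives: [treasure, time_penalty].
env = mo_gym.make('deep-sea-treasure-v0')
obs, info = env.reset()
obs, vector_reward, terminated, truncated, info = env.step(env.action_space.sample())

# The reward is a numpy array, one entry per objective.
print(vector_reward.shape)  # expected: (2,)
```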
+
+
## Citing
+
+
If you use this repository in your work, please cite:
```bibtex
@@ -81,10 +98,16 @@ If you use this repository in your work, please cite:
}
```
+
+
## Acknowledgments
+
+
* The `minecart-v0` env is a refactor of https://github.com/axelabels/DynMORL.
* The `deep-sea-treasure-v0`, `fruit-tree-v0` and `mo-supermario-v0` envs are based on https://github.com/RunzheYang/MORL.
* The `four-room-v0` env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.
* The `fishwood-v0` code was provided by Denis Steckelmacher and Conor F. Hayes.
* The `water-reservoir-v0` code was provided by Mathieu Reymond.
+
+
diff --git a/docs/api/api.md b/docs/api/api.md
deleted file mode 100644
index ad5d96e8..00000000
--- a/docs/api/api.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: "API"
----
-
-# API
-The environments follow the standard [gymnasium's API](https://github.com/Farama-Foundation/Gymnasium), but return vectorized rewards as numpy arrays.
-
-Here is a minimal example of how to create an environment and interact with it.
-```python
-import gymnasium
-import mo_gymnasium as mo_gym
-
-env = mo_gym.make('minecart-v0') # It follows the original Gymnasium API ...
-
-obs = env.reset()
-next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs)) # but vector_reward is a numpy array!
-
-# Optionally, you can scalarize the reward function with the LinearReward wrapper
-env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
-```
-
-[![MO-Gym Demo in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LucasAlegre/mo-gym/blob/main/mo_gymnasium_demo.ipynb)
-You can also check more examples in this colab notebook!
diff --git a/docs/community/community.md b/docs/community/community.md
new file mode 100644
index 00000000..b389fef0
--- /dev/null
+++ b/docs/community/community.md
@@ -0,0 +1,12 @@
+# Community
+
+If you want to help out, reach out to us, or simply ask questions, you can join the Farama Discord server [here](https://discord.gg/WF3FqsBk).
+
+## Acknowledgements
+
+Aside from the main contributors, a number of people have also contributed to the project in various ways, and we would like to thank them all.
+
+```{include} ../../README.md
+:start-after:
+:end-before:
+```
diff --git a/docs/environments/environments.md b/docs/environments/environments.md
index c26f2973..807d187b 100644
--- a/docs/environments/environments.md
+++ b/docs/environments/environments.md
@@ -4,21 +4,7 @@ title: "Environments"
# Available environments
-| Env | Obs/Action spaces | Objectives | Description |
-|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `deep-sea-treasure-v0` | Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
-| `resource-gathering-v0` | Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From [Barret & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162). |
-| `fishwood-v0` | Discrete / Discrete | `[fish_amount, wood_amount]` | ESR environment, the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return). |
-| `fruit-tree-v0` | Discrete / Discrete | `[nutri1, ..., nutri6]` | Full binary tree of depth d=5,6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
-| `breakable-bottles-v0` | Discrete (Dictionary) / Discrete | `[time_penalty, bottles_delivered, potential]` | Gridworld with 5 cells. The agents must collect bottles from the source location and deliver to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336). |
-| `four-room-v0` | Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
-| `water-reservoir-v0` | Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). |
-| `mo-mountaincar-v0` | Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms). |
-| `mo-MountainCarContinuous-v0` | Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. |
-| `mo-lunar-lander-v2` | Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the "LunarLander-v2" environment. Objectives defined similarly as in [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE). |
-| `mo-reacher-v0` | Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from [PyBullet](https://github.com/benelot/pybullet-gym/blob/ec9e87459dd76d92fe3e59ee4417e5a665504f62/pybulletgym/envs/roboschool/robots/manipulators/reacher.py), but there are 4 different target positions. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
-| `minecart-v0` | Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). |
-| `mo-highway-v0` and `mo-highway-fast-v0` | Continuous / Discrete | `[speed, right_lane, collision]` | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying on the rightest lane. From [highway-env](https://github.com/eleurent/highway-env). |
-| `mo-supermario-v0` | Image / Discrete | `[x_pos, time, death, coin, enemy]` | Multi-objective version of [SuperMarioBrosEnv](https://github.com/Kautenja/gym-super-mario-bros). Objectives are defined similarly as in [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). |
-| `mo-halfcheetah-v4` | Continuous / Continuous | `[velocity, energy]` | Multi-objective version of [HalfCheetah-v4](https://www.gymlibrary.ml/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL). |
-| `mo-hopper-v4` | Continuous / Continuous | `[velocity, height, energy]` | Multi-objective version of [Hopper-v4](https://www.gymlibrary.ml/environments/mujoco/hopper/) env. |
+```{include} ../../README.md
+:start-after:
+:end-before:
+```
diff --git a/docs/index.md b/docs/index.md
index 20219815..245eb217 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -4,13 +4,6 @@ firstpage:
lastpage:
---
-```{toctree}
-:hidden:
-:caption: API
-
-api/api
-```
-
```{toctree}
:hidden:
:caption: Environments
@@ -36,6 +29,7 @@ wrappers/wrappers
:hidden:
:caption: Development
+community/community
Github
Donate
@@ -43,30 +37,28 @@ Donate
# MO-Gymnasium is a standardized API and a suite of environments for multi-objective reinforcement learning (MORL)
-For details on multi-objective MDP's (MOMDP's) and other MORL definitions, see [A practical guide to multi-objective reinforcement learning and planning](https://link.springer.com/article/10.1007/s10458-022-09552-y).
+```{include} ../README.md
+:start-after:
+:end-before:
+```
-## Install
+## API
-### From Pypi
-```bash
-pip install mo-gymnasium
+```{include} ../README.md
+:start-after:
+:end-before:
```
-### From source
-```bash
-git clone https://github.com/Farama-Foundation/MO-Gymnasium
-cd MO-Gymnasium
-pip install -e .
+## Install
+
+```{include} ../README.md
+:start-after:
+:end-before:
```
## Citing
-If you use this repository in your work, please cite:
-
-```bibtex
-@inproceedings{Alegre+2022bnaic,
- author = {Lucas N. Alegre and Florian Felten and El-Ghazali Talbi and Gr{\'e}goire Danoy and Ann Now{\'e} and Ana L. C. Bazzan and Bruno C. da Silva},
- title = {{MO-Gym}: A Library of Multi-Objective Reinforcement Learning Environments},
- booktitle = {Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn 2022},
- year = {2022}
-}
+
+```{include} ../README.md
+:start-after:
+:end-before:
```
diff --git a/pyproject.toml b/pyproject.toml
index 3e456e47..7cf93b99 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -54,9 +54,9 @@ all = [
testing = ["pytest ==7.1.3"]
[project.urls]
-Homepage = "https://farama.org"
+Homepage = "https://mo-gymnasium.farama.org"
Repository = "https://github.com/Farama-Foundation/MO-Gymnasium"
-Documentation = "https://gymnasium.farama.org"
+Documentation = "https://mo-gymnasium.farama.org"
"Bug Report" = "https://github.com/Farama-Foundation/MO-Gymnasium/issues"
[tool.setuptools]