
docs: polish documentation (#74)
Co-authored-by: muchvo <[email protected]>
Gaiejj and muchvo authored Aug 23, 2023
1 parent 9823134 commit 20e7d31
Showing 68 changed files with 1,259 additions and 148 deletions.
167 changes: 157 additions & 10 deletions README.md
@@ -19,7 +19,7 @@
<a href="https://github.com/PKU-Alignment/safety-gymnasium#why-safety-gymnasium">Why Safety-Gymnasium?</a> |
<a href="https://www.safety-gymnasium.com">Documentation</a> |
<a href="https://github.com/PKU-Alignment/safety-gymnasium#installation">Install guide</a> |
<a href="https://github.com/PKU-Alignment/safety-gymnasium#customize-your-environments">Customization</a>
<a href="https://github.com/PKU-Alignment/safety-gymnasium#customize-your-environments">Customization</a> | <a href="https://sites.google.com/view/safety-gymnasium">Video</a>
</p>

Safety-Gymnasium is a highly scalable and customizable Safe Reinforcement Learning (SafeRL) library.
@@ -69,12 +69,12 @@ Here is a list of all the environments we support for now:
<tbody>
<tr>
<td rowspan="4">Safe Navigation</td>
<td>Goal[012]</td>
<td>Button[012]</td>
<td rowspan="4">Point, Car, Doggo, Racecar, Ant</td>
<td rowspan="4">SafetyPointGoal1-v0</td>
</tr>
<tr>
<td>Button[012]</td>
<td>Goal[012]</td>
</tr>
<tr>
<td>Push[012]</td>
@@ -83,15 +83,95 @@ Here is a list of all the environments we support for now:
<td>Circle[012]</td>
</tr>
<tr>
<td>Velocity</td>
<td>Safe Velocity</td>
<td>Velocity</td>
<td>HalfCheetah, Hopper, Swimmer, Walker2d, Ant, Humanoid</td>
<td>SafetyAntVelocity-v1</td>
</tr>
<tr>
<td rowspan="7">Safe Vision</td>
<td>BuildingButton[012]</td>
<td rowspan="7">Point, Car, Doggo, Racecar, Ant</td>
<td rowspan="7">SafetyFormulaOne1-v0</td>
</tr>
<tr>
<td>BuildingGoal[012]</td>
</tr>
<tr>
<td>BuildingPush[012]</td>
</tr>
<tr>
<td>FadingEasy[012]</td>
</tr>
<tr>
<td>FadingHard[012]</td>
</tr>
<tr>
<td>Race[012]</td>
</tr>
<tr>
<td>FormulaOne[012]</td>
</tr>
<tr>
<td rowspan="8">Safe Multi-Agent</td>
<td>MultiGoal[012]</td>
<td>Multi-Point, Multi-Ant</td>
<td>SafetyAntMultiGoal1-v0</td>
</tr>
<tr>
<td>Multi-Agent Velocity</td>
<td>6x1HalfCheetah, 2x3HalfCheetah, 3x1Hopper, 2x1Swimmer, 2x3Walker2d, 2x4Ant, 4x2Ant, 9|8Humanoid</td>
<td>Safety2x4AntVelocity-v0</td>
</tr>
<tr>
<td>FreightFrankaCloseDrawer(Multi-Agent)</td>
<td rowspan="2">FreightFranka</td>
<td rowspan="2">FreightFrankaCloseDrawer(Multi-Agent)</td>
</tr>
<tr>
<td>FreightFrankaPickAndPlace(Multi-Agent)</td>
</tr>
<tr>
<td>ShadowHandCatchOver2UnderarmSafeFinger(Multi-Agent)</td>
<td rowspan="4">ShadowHands</td>
<td rowspan="4">ShadowHandCatchOver2UnderarmSafeJoint(Multi-Agent)</td>
</tr>
<tr>
<td>ShadowHandCatchOver2UnderarmSafeJoint(Multi-Agent)</td>
</tr>
<tr>
<td>ShadowHandOverSafeFinger(Multi-Agent)</td>
</tr>
<tr>
<td>ShadowHandOverSafeJoint(Multi-Agent)</td>
</tr>
<tr>
<td rowspan="6">Safe Isaac Gym</td>
<td>FreightFrankaCloseDrawer</td>
<td rowspan="2">FreightFranka</td>
<td rowspan="2">FreightFrankaCloseDrawer</td>
</tr>
<tr>
<td>FreightFrankaPickAndPlace</td>
</tr>
<tr>
<td>ShadowHandCatchOver2UnderarmSafeFinger</td>
<td rowspan="4">ShadowHands</td>
<td rowspan="4">ShadowHandCatchOver2UnderarmSafeJoint</td>
</tr>
<tr>
<td>ShadowHandCatchOver2UnderarmSafeJoint</td>
</tr>
<tr>
<td>ShadowHandOverSafeFinger</td>
</tr>
<tr>
<td>ShadowHandOverSafeJoint</td>
</tr>
</tbody>
</table>

Here are some screenshots of the Safe Navigation tasks.
Here are some screenshots of the **Safe Navigation** tasks.

#### Agents

@@ -292,15 +372,82 @@ Here are some screenshots of the Safe Navigation tasks.
</tbody>
</table>

### Vision-base Safe RL
### Vision-based Safe RL

Vision-based safety reinforcement learning lacks realistic scenarios.
Vision-based SafeRL lacks realistic scenarios.
Although the original `Safety-Gym` could minimally support visual input, the scenarios were too similar.
To facilitate the validation of visual-based safety reinforcement learning algorithms, we have developed a set of realistic vision-based SafeRL tasks, which are currently being validated on the baseline.
To facilitate the validation of visual-based SafeRL algorithms, we have developed a set of realistic vision-based SafeRL tasks, which are currently being validated on the baseline.

For the appetizer, the images are as follows:

<img src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/images/vision_input.png" width="100%"/>
<table class="docutils align-default">
<tbody>
<tr class="row-odd">
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race0.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race0.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">Race0</span></a></strong>
</p>
</td>
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race1.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race1.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">Race1</span></a></strong>
</p>
</td>
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race2.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/race2.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">Race2</span></a></strong>
</p>
</td>
</tr>
<tr class="row-odd">
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one0.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one0.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">FormulaOne0</span></a></strong>
</p>
</td>
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one1.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one1.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">FormulaOne1</span></a></strong>
</p>
</td>
<td>
<figure class="align-default">
<a class="reference external image-reference"><img
alt="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one2.jpeg"
src="https://github.com/PKU-Alignment/safety-gymnasium/raw/HEAD/docs/_static/images/formula_one2.jpeg" style="width: 230px;"></a>
</figure>
<p class="centered">
<strong><a class="reference internal"><span class="std std-ref">FormulaOne2</span></a></strong>
</p>
</td>
</tr>
</tbody>
</table>

### Environment Usage

@@ -417,7 +564,7 @@ apt-get install python3-opengl

We construct a highly expandable framework of code so that you can easily comprehend it and design your environments to facilitate your research with no more than 100 lines of code on average.

For details, please refer to our documentation.
For details, please refer to our [documentation](https://www.safety-gymnasium.com/en/latest/components_of_environments/tasks/task_example.html).
Here is a minimal example:

```python
# (example collapsed in the diff view)
```
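The minimal example itself is collapsed in this diff view. As a hedged stand-in (not the repository's actual snippet), the sketch below imitates the Gymnasium-style interaction loop that Safety-Gymnasium documents, in which `step()` returns an extra `cost` signal alongside the usual `reward`. The stub environment and its numbers are purely illustrative; with the real library you would instead create an environment such as `safety_gymnasium.make("SafetyPointGoal1-v0")`.

```python
# Hedged sketch: a toy stand-in for a Safety-Gymnasium environment.
# The one structural difference from plain Gymnasium is that step()
# returns (obs, reward, cost, terminated, truncated, info) -- the
# `cost` term reports safety violations separately from the reward.
import random


class StubSafeEnv:
    """Illustrative stub; not part of the safety-gymnasium package."""

    def reset(self, seed=None):
        random.seed(seed)
        obs, info = [0.0, 0.0], {}
        return obs, info

    def step(self, action):
        obs = [random.random(), random.random()]
        reward = 1.0                   # task-progress signal
        cost = float(obs[0] > 0.9)     # safety-violation signal (extra term)
        terminated, truncated, info = False, False, {}
        return obs, reward, cost, terminated, truncated, info


env = StubSafeEnv()
obs, info = env.reset(seed=0)
episode_reward, episode_cost = 0.0, 0.0
for _ in range(10):
    action = [0.0]  # a real agent would sample from env.action_space
    obs, reward, cost, terminated, truncated, info = env.step(action)
    episode_reward += reward
    episode_cost += cost
    if terminated or truncated:
        break
print(episode_reward, episode_cost)
```

SafeRL algorithms consume the reward and cost streams separately, which is why the loop accumulates them in two counters rather than folding cost into the reward.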
1 change: 0 additions & 1 deletion docs/api/bases.md
@@ -5,7 +5,6 @@ title: Bases
# Bases

```{toctree}
:hidden:
bases/underlying.md
bases/base_task.md
bases/base_agent.md
1 change: 0 additions & 1 deletion docs/api/utils.md
@@ -5,6 +5,5 @@ title: Utils
# Utils

```{toctree}
:hidden:
utils/random_generator.md
```
2 changes: 1 addition & 1 deletion docs/components_of_environments/agents.rst
@@ -1,7 +1,7 @@
Agents
======

A set of unified agents for tasks has been designed, which are an important part of the environment. Their features are described in detail in this section.
A set of unified agents for tasks has been designed, which is an important part of the environment. Their features are described in detail in this section.

Safe Navigation & Vision
------------------------
2 changes: 1 addition & 1 deletion docs/components_of_environments/agents/doggo.rst
@@ -16,7 +16,7 @@ Doggo
:width: 200px
.. centered:: right

Doggo is a quadrupedal robot with bilateral symmetry. Each of the four legs has two controls at the hip, for azimuth and elevation relative to the torso, and one in the knee, controlling angle. It is designed such that a uniform random policy should keep the robot from falling over and generate some travel.
Doggo is a quadrupedal robot with bilateral symmetry. Each of the four legs has two controls at the hip, for azimuth and elevation relative to the torso, and one in the knee, controlling angle. It is designed such that a uniform random policy should keep the robot from falling over and generating some travel.

+---------------------------------+--------------------------------+
| **Specific Action Space** | Box(-1.0, 1.0, (12,), float64) |
2 changes: 1 addition & 1 deletion docs/components_of_environments/agents/freight_franka.rst
@@ -69,7 +69,7 @@ Specific Observations
+-----------------+-------------------------------------------------------------------------------------------------------------+
| 10 - 19 | Joint DOF velocities |
+-----------------+-------------------------------------------------------------------------------------------------------------+
| 20 - 22 | Relative pose between the Franka robot's root and the hand rigid body tensor |
| 20 - 22 | Relative pose between the Franka robot's root and the hand's rigid body tensor |
+-----------------+-------------------------------------------------------------------------------------------------------------+
| 23 - 32 | Actions taken by the robot in the joint space |
+-----------------+-------------------------------------------------------------------------------------------------------------+
2 changes: 1 addition & 1 deletion docs/components_of_environments/agents/racecar.rst
@@ -16,7 +16,7 @@ Racecar
:width: 200px
.. centered:: right

A robot closer to realistic car dynamics, moving in three dimensions, it has one velocity servo and one position servo, one to adjust the rear wheel speed to the target speed and the other to adjust the front wheel steering angle to the target angle. Racecar references the widely known MIT Racecar project's dynamics model. For it to accomplish the specified goal, it must coordinate the relationship between the steering angle of the tires and the speed, just like a human driving a car.
A robot closer to realistic car dynamics, moving in three dimensions, has one velocity servo and one position servo, one to adjust the rear wheel speed to the target speed and the other to adjust the front wheel steering angle to the target angle. Racecar references the widely known MIT Racecar project's dynamics model. For it to accomplish the specified goal, it must coordinate the relationship between the steering angle of the tires and the speed, just like a human driving a car.

+---------------------------------+-------------------------------------------------------------------+
| **Specific Action Space** | Box([-20. -0.785], [20. 0.785], (2,), float64) |
2 changes: 1 addition & 1 deletion docs/components_of_environments/agents/shadowhands.rst
@@ -18,7 +18,7 @@ ShadowHands
.. centered:: right


Shadow Dexterous Hand, designed by `Shadow Robot <https://www.shadowrobot.com/dexterous-hand-series/>`__, allowing researchers to manipulate tools and objects with greater precision and control. The Shadow Dexterous Hand has 24 joints. It has 20 degrees of freedom, greater than that of a human hand. It has been designed to have a range of movement equivalent to that of a typical human being. The four fingers of the hand contain two one-axis joints connecting the distal phalanx, middle phalanx and proximal phalanx and one universal joint connecting the finger to the metacarpal. The little finger has an extra one-axis joint on the metacarpal to provide the Hand with a palm curl movement. The thumb contains one one-axis joint connecting the distal phalanx to the proximal phalanx, one universal joint connecting the thumb to the metacarpal and one one-axis joint on the bottom of the metacarpal to provide a palm curl movement.
Shadow Dexterous Hand, designed by `Shadow Robot <https://www.shadowrobot.com/dexterous-hand-series/>`__, allows researchers to manipulate tools and objects with greater precision and control. The Shadow Dexterous Hand has 24 joints. It has 20 degrees of freedom, greater than that of a human hand. It has been designed to have a range of movement equivalent to that of a typical human being. The four fingers of the hand contain two one-axis joints connecting the distal phalanx, middle phalanx and proximal phalanx and one universal joint connecting the finger to the metacarpal. The little finger has an extra one-axis joint on the metacarpal to provide the Hand with a palm curl movement. The thumb contains one one-axis joint connecting the distal phalanx to the proximal phalanx, one universal joint connecting the thumb to the metacarpal and one one-axis joint on the bottom of the metacarpal to provide a palm curl movement.



2 changes: 1 addition & 1 deletion docs/components_of_environments/objects.rst
@@ -130,7 +130,7 @@ Both lidars are designed to target a specific class of targets and will ignore o
where :math:`\alpha` is the decay factor.

.. hint::
In the lidar_conf data class of task, the lidar category can be switched by modifying the lidar_type, but Natural lidar will be significantly more difficult.
In the lidar_conf data class of the task, the lidar category can be switched by modifying the lidar_type, but Natural lidar will be significantly more difficult.

Group mechanism
^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/components_of_environments/objects/geom.rst
@@ -180,7 +180,7 @@ Constraints

.. _Sigwalls_out_of_boundary_cost:

- out_of_boundary_cost: When agent crosses the boundary from inside the circular domain outward, it generates cost: ``1``
- out_of_boundary_cost: When the agent crosses the boundary from inside the circular domain outward, it generates cost: ``1``

.. _Fixedwalls:

2 changes: 1 addition & 1 deletion docs/components_of_environments/tasks.rst
@@ -253,7 +253,7 @@ Safe Isaac Gym

.. Note::

By harnessing the rapid parallel capabilities of Isaac Gym, we are able to explore more realistic and challenging environments, unveiling and examining the potentialities of SafeRL. All tasks in Safe Isaac Gym are configured to support both **single-agent** and **multi-agent** settings. The single-agent and multi-agent algorithms from `SafePO <https://github.com/PKU-Alignment/Safe-Policy-Optimization>`__ can be seamlessly implemented in these respective environments.
By harnessing the rapid parallel capabilities of Isaac Gym, we can explore more realistic and challenging environments, unveiling and examining the potentialities of SafeRL. All tasks in Safe Isaac Gym are configured to support both **single-agent** and **multi-agent** settings. The single-agent and multi-agent algorithms from `SafePO <https://github.com/PKU-Alignment/Safe-Policy-Optimization>`__ can be seamlessly implemented in these respective environments.


.. list-table::
