Merged
26 commits
99f22b9
fix memory arguments and make update_distribution private
ClemensSchwarke Oct 8, 2025
b7f568e
switch to ruff and update licenses
ClemensSchwarke Oct 8, 2025
afad146
remove empty lines after docstrings
ClemensSchwarke Oct 8, 2025
be07ba7
reformat yield
ClemensSchwarke Oct 8, 2025
9c8778a
replace single quotes
ClemensSchwarke Oct 8, 2025
cfde2df
fix initial ruff errors
ClemensSchwarke Oct 9, 2025
dee6720
add pep8-naming, tidy-imports, perflint, ruff rules
ClemensSchwarke Oct 9, 2025
4de6f15
add annotations rules
ClemensSchwarke Oct 9, 2025
cf86da0
add types
ClemensSchwarke Oct 10, 2025
743c60f
python 3.9
ClemensSchwarke Oct 10, 2025
2d5240d
return type none
ClemensSchwarke Oct 10, 2025
af24fcf
fix all typing errors
ClemensSchwarke Oct 10, 2025
c896924
fix docstrings
ClemensSchwarke Oct 10, 2025
f03e34f
fix license format
ClemensSchwarke Oct 10, 2025
2806aec
Merge branch 'main' into fix/formatting
ClemensSchwarke Oct 10, 2025
9f2a827
add __all__ to init functions
ClemensSchwarke Oct 10, 2025
56415cf
fix comments
ClemensSchwarke Oct 13, 2025
7e3e65d
fix docstring formatting
ClemensSchwarke Oct 13, 2025
128aa1c
minor readme changes
ClemensSchwarke Oct 13, 2025
aeb326c
functional fixes
ClemensSchwarke Oct 13, 2025
93f9ba8
Fix type hints
ClemensSchwarke Oct 13, 2025
0823b34
correct type hint for tuple of multiple tensors
ClemensSchwarke Oct 13, 2025
f36f24b
minor clarification
ClemensSchwarke Oct 13, 2025
7bf5573
remove _map_path from wandb logger
ClemensSchwarke Oct 17, 2025
e935b5a
name none variables in rollout storage
ClemensSchwarke Oct 17, 2025
55619e2
rename hidden_states and define type
ClemensSchwarke Oct 24, 2025
22 changes: 0 additions & 22 deletions .flake8

This file was deleted.

29 changes: 5 additions & 24 deletions .pre-commit-config.yaml
@@ -1,40 +1,21 @@
repos:
- repo: https://github.com/python/black
rev: 23.10.1
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.14.0
hooks:
- id: black
args: ["--line-length", "120", "--preview"]
- repo: https://github.com/pycqa/flake8
rev: 6.1.0
hooks:
- id: flake8
additional_dependencies: [flake8-simplify, flake8-return]
- id: ruff-check
- id: ruff-format
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: check-symlinks
- id: destroyed-symlinks
- id: check-yaml
- id: check-toml
- id: check-merge-conflict
- id: check-case-conflict
- id: check-executables-have-shebangs
- id: check-toml
- id: end-of-file-fixer
- id: check-shebang-scripts-are-executable
- id: detect-private-key
- id: debug-statements
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files"]
- repo: https://github.com/asottile/pyupgrade
rev: v3.15.0
hooks:
- id: pyupgrade
args: ["--py37-plus"]
- repo: https://github.com/codespell-project/codespell
rev: v2.2.6
hooks:
4 changes: 3 additions & 1 deletion CONTRIBUTORS.md
@@ -17,12 +17,14 @@ Please keep the lists sorted alphabetically.

 ---

-* Mayank Mittal
+* Clemens Schwarke
+* Mayank Mittal

 ## Authors

+* Clemens Schwarke
 * David Hoeller
 * Mayank Mittal
 * Nikita Rudin

 ## Contributors
14 changes: 5 additions & 9 deletions README.md
@@ -1,15 +1,14 @@
-# RSL RL
+# RSL-RL

-A fast and simple implementation of RL algorithms, designed to run fully on GPU.
-This code is an evolution of `rl-pytorch` provided with NVIDIA's Isaac Gym.
+A fast and simple implementation of learning algorithms for robotics. For an overview of the library please have a look at https://arxiv.org/pdf/2509.10771.

 Environment repositories using the framework:

 * **`Isaac Lab`** (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
-* **`Legged-Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+* **`Legged Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
 * **`MuJoCo Playground`** (built on top of MuJoCo MJX and Warp): https://github.com/google-deepmind/mujoco_playground/

-The main branch supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:
+The library currently supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:

 * [Random Network Distillation (RND)](https://proceedings.mlr.press/v229/schwarke23a.html) - Encourages exploration by adding
 a curiosity driven intrinsic reward.
@@ -22,8 +21,6 @@ information.
 **Affiliation**: Robotic Systems Lab, ETH Zurich & NVIDIA <br/>
 **Contact**: [email protected]

-> **Note:** The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it isn't currently actively maintained.
-

 ## Setup

@@ -57,8 +54,7 @@ For documentation, we adopt the [Google Style Guide](https://sphinxcontrib-napol
 We use the following tools for maintaining code quality:

 - [pre-commit](https://pre-commit.com/): Runs a list of formatters and linters over the codebase.
-- [black](https://black.readthedocs.io/en/stable/): The uncompromising code formatter.
-- [flake8](https://flake8.pycqa.org/en/latest/): A wrapper around PyFlakes, pycodestyle, and McCabe complexity checker.
+- [ruff](https://github.com/astral-sh/ruff): An extremely fast Python linter and code formatter, written in Rust.

 Please check [here](https://pre-commit.com/#install) for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:

59 changes: 30 additions & 29 deletions config/example_config.yaml
@@ -1,21 +1,21 @@
 runner:
 class_name: OnPolicyRunner
-# -- general
-num_steps_per_env: 24 # number of steps per environment per iteration
-max_iterations: 1500 # number of policy updates
+# General
+num_steps_per_env: 24 # Number of steps per environment per iteration
+max_iterations: 1500 # Number of policy updates
 seed: 1
-# -- observations
-obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # maps observation groups to types. See `vec_env.py` for more information
-# -- logging parameters
-save_interval: 50 # check for potential saves every `save_interval` iterations
+# Observations
+obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # Maps observation groups to sets. See `vec_env.py` for more information
+# Logging parameters
+save_interval: 50 # Check for potential saves every `save_interval` iterations
 experiment_name: walking_experiment
 run_name: ""
-# -- logging writer
+# Logging writer
 logger: tensorboard # tensorboard, neptune, wandb
 neptune_project: legged_gym
 wandb_project: legged_gym

-# -- policy
+# Policy
 policy:
 class_name: ActorCritic
 activation: elu
@@ -25,45 +25,46 @@ runner:
 critic_hidden_dims: [256, 256, 256]
 init_noise_std: 1.0
 noise_std_type: "scalar" # 'scalar' or 'log'
+state_dependent_std: false

-# -- algorithm
+# Algorithm
 algorithm:
 class_name: PPO
-# -- training
+# Training
 learning_rate: 0.001
 num_learning_epochs: 5
 num_mini_batches: 4 # mini batch size = num_envs * num_steps / num_mini_batches
 schedule: adaptive # adaptive, fixed
-# -- value function
+# Value function
 value_loss_coef: 1.0
 clip_param: 0.2
 use_clipped_value_loss: true
-# -- surrogate loss
+# Surrogate loss
 desired_kl: 0.01
 entropy_coef: 0.01
 gamma: 0.99
 lam: 0.95
 max_grad_norm: 1.0
-# -- miscellaneous
+# Miscellaneous
 normalize_advantage_per_mini_batch: false

-# -- random network distillation
+# Random network distillation
 rnd_cfg:
-weight: 0.0 # initial weight of the RND reward
-weight_schedule: null # note: this is a dictionary with a required key called "mode". Please check the RND module for more information
-reward_normalization: false # whether to normalize RND reward
-# -- learning parameters
-learning_rate: 0.001 # learning rate for RND
-# -- network parameters
-num_outputs: 1 # number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
-predictor_hidden_dims: [-1] # hidden dimensions of predictor network
-target_hidden_dims: [-1] # hidden dimensions of target network
+weight: 0.0 # Initial weight of the RND reward
+weight_schedule: null # This is a dictionary with a required key called "mode". Please check the RND module for more information
+reward_normalization: false # Whether to normalize RND reward
+# Learning parameters
+learning_rate: 0.001 # Learning rate for RND
+# Network parameters
+num_outputs: 1 # Number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
+predictor_hidden_dims: [-1] # Hidden dimensions of predictor network
+target_hidden_dims: [-1] # Hidden dimensions of target network

-# -- symmetry augmentation
+# Symmetry augmentation
 symmetry_cfg:
-use_data_augmentation: true # this adds symmetric trajectories to the batch
-use_mirror_loss: false # this adds symmetry loss term to the loss function
-data_augmentation_func: null # string containing the module and function name to import
+use_data_augmentation: true # This adds symmetric trajectories to the batch
+use_mirror_loss: false # This adds symmetry loss term to the loss function
+data_augmentation_func: null # String containing the module and function name to import
 # Example: "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states"
 #
 # .. code-block:: python
@@ -73,4 +74,4 @@ runner:
 # obs: Optional[torch.Tensor] = None, actions: Optional[torch.Tensor] = None, cfg: "BaseEnvCfg" = None, obs_type: str = "policy"
 # ) -> Tuple[torch.Tensor, torch.Tensor]:
 #
-mirror_loss_coeff: 0.0 #coefficient for symmetry loss term. If 0, no symmetry loss is used
+mirror_loss_coeff: 0.0 # Coefficient for symmetry loss term. If 0, no symmetry loss is used
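Two comments in `config/example_config.yaml` describe behavior that a short sketch may make concrete: the mini batch size formula next to `num_mini_batches`, and the `"module:function"` string format expected by `data_augmentation_func`. The Python below is only an illustration under stated assumptions. The helper names `mini_batch_size` and `resolve_function` are hypothetical, `num_envs=4096` is an arbitrary value that does not appear in the config, and the diff does not show how the library itself resolves the string.

```python
import importlib
from typing import Callable


def mini_batch_size(num_envs: int, num_steps_per_env: int, num_mini_batches: int) -> int:
    """Apply the config comment: num_envs * num_steps / num_mini_batches."""
    return num_envs * num_steps_per_env // num_mini_batches


def resolve_function(path: str) -> Callable:
    """Import a callable from a "package.module:function" string (hypothetical helper)."""
    module_name, _, func_name = path.partition(":")
    if not module_name or not func_name:
        raise ValueError(f"Expected 'module:function', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# num_envs is not part of the config; 4096 is an illustrative assumption.
print(mini_batch_size(num_envs=4096, num_steps_per_env=24, num_mini_batches=4))  # 24576

# Standard-library stand-in for a path such as
# "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states".
sqrt = resolve_function("math:sqrt")
print(sqrt(16.0))  # 4.0
```

Splitting on the colon keeps the module path and the attribute name unambiguous, which is presumably why the example string in the config uses that separator.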
21 changes: 0 additions & 21 deletions licenses/dependencies/black-license.txt

This file was deleted.

22 changes: 0 additions & 22 deletions licenses/dependencies/flake8-license.txt

This file was deleted.

21 changes: 0 additions & 21 deletions licenses/dependencies/isort-license.txt

This file was deleted.

19 changes: 0 additions & 19 deletions licenses/dependencies/pyupgrade-license.txt

This file was deleted.
