159 changes: 159 additions & 0 deletions docs/source/features/hydra.rst
@@ -127,3 +127,162 @@ the post init update is as follows:

Here, when modifying ``env.decimation`` or ``env.sim.dt``, the user needs to give the updated ``env.sim.render_interval``,
``env.scene.height_scanner.update_period``, and ``env.scene.contact_forces.update_period`` as input as well.


Group Override
--------------
Group override lets you swap out entire groups of environment- or agent-level settings in one go.
Instead of overriding individual fields, you select a named preset defined under a ``variants`` mapping
directly inside your config classes.
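
At a high level, the ``variants`` mapping has the following shape (a minimal sketch; ``MyEnvCfg``, ``MyAlternativeObservationsCfg``, and ``my_option`` are hypothetical names used purely for illustration):

.. code-block:: python

    @configclass
    class MyEnvCfg(ManagerBasedRLEnvCfg):
        def __post_init__(self):
            super().__post_init__()
            # Each top-level key becomes a Hydra group, selectable on the CLI as ``env.<key>``;
            # each nested key is one named option within that group.
            self.variants = {
                "observations": {
                    "my_option": MyAlternativeObservationsCfg(),
                }
            }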


Defining Variants
^^^^^^^^^^^^^^^^^
Declare alternatives under ``self.variants`` in your environment and agent configs. Each top-level key under
``variants`` becomes a Hydra group (``env.<key>`` or ``agent.<key>``), and each nested key is a selectable option.
Comment on lines +141 to +142

Contributor

So this doesn't work if variants are inside the inner configs?

Collaborator Author

Yes, it currently only binds variants to ``env`` and ``agent``; all inner configs are specified through dotted paths. I thought about introducing a more generalized variant class so that variants could be bound not only at the top-level ``env`` or ``agent`` but at any inner level. That feels quite invasive, though, and needs a lot more careful thought. If something nice comes out of it, I think supporting both dotted paths and inner-config variants could be pretty nice.


Environment variants example:

.. code-block:: python

    @configclass
    class ReachEnvCfg(ManagerBasedRLEnvCfg):
        def __post_init__(self):
            super().__post_init__()
            # Share across all derived envs
            self.variants = {
                "observations": {
                    "noise_less": NoiselessObservationsCfg(),
                }
            }

    @configclass
    class FrankaReachEnvCfg(ReachEnvCfg):
        def __post_init__(self):
            super().__post_init__()
            variants = {
                "actions.arm_action": {
                    "joint_position_to_limit": mdp.JointPositionToLimitsActionCfg(
                        asset_name="robot", joint_names=["panda_joint.*"]
                    ),
                    "relative_joint_position": mdp.RelativeJointPositionActionCfg(
                        asset_name="robot", joint_names=["panda_joint.*"], scale=0.2
                    ),
                }
            }
            self.variants.update(variants)

RSL-RL agent variants example:

.. code-block:: python

    @configclass
    class FrankaReachPPORunnerCfg(RslRlOnPolicyRunnerCfg):
        num_steps_per_env = 24
        ...
        policy = RslRlPpoActorCriticCfg(
            ...
        )
        algorithm = RslRlPpoAlgorithmCfg(
            ...
        )
        variants = {
            "policy": {
                "large_network": RslRlPpoActorCriticCfg(
                    actor_hidden_dims=[512, 256, 128, 64], critic_hidden_dims=[512, 256, 128, 64], ...
                ),
                "medium_network": RslRlPpoActorCriticCfg(
                    actor_hidden_dims=[256, 128, 64], critic_hidden_dims=[256, 128, 64], ...
                ),
            },
            "algorithm": {
                "small_batch_lr": RslRlPpoAlgorithmCfg(num_mini_batches=16, learning_rate=1.0e-4, ...),
            },
        }


RL Games agent variants example:

.. code-block:: yaml

    params:
      env: ...
      config: ...
      network:
        ...
        mlp:
          units: [64, 64]
          activation: elu
          d2rl: False

    variants:
      params.network.mlp:
        large_network:
          units: [256, 128, 64]
          activation: elu
          d2rl: False
The above defines a selectable group at ``agent.params.network.mlp`` with option ``large_network``.





Override Syntax
^^^^^^^^^^^^^^^
Select one preset per group via Hydra-style CLI flags.

.. tab-set::
    :sync-group: rl-override

    .. tab-item:: rsl_rl
        :sync: rsl_rl

        .. code-block:: bash

            python scripts/reinforcement_learning/rsl_rl/train.py \
                --task=Isaac-Reach-Franka-v0 \
                --headless \
                env.observations=noise_less \
                env.actions.arm_action=relative_joint_position \
                agent.policy=large_network

        Hydra resolves these overrides as follows:

        .. list-table::
            :widths: 30 70
            :header-rows: 1

            * - CLI key
              - Resolved variant node
            * - ``env.observations``
              - ``ReachEnvCfg.variants["observations"]["noise_less"]``
            * - ``env.actions.arm_action``
              - ``FrankaReachEnvCfg.variants["actions.arm_action"]["relative_joint_position"]``
            * - ``agent.policy``
              - ``FrankaReachPPORunnerCfg.variants["policy"]["large_network"]``

    .. tab-item:: rl_games
        :sync: rl_games

        .. code-block:: bash

            python scripts/reinforcement_learning/rl_games/train.py \
                --task=Isaac-Reach-Franka-v0 \
                --headless \
                env.observations=noise_less \
                env.actions.arm_action=relative_joint_position \
                agent.params.network.mlp=large_network

        Hydra resolves these overrides as follows:

        .. list-table::
            :widths: 35 65
            :header-rows: 1

            * - CLI key
              - Resolved variant node
            * - ``agent.params.network.mlp``
              - ``variants["params.network.mlp"]["large_network"]`` (from the RL Games YAML config)

These flags let you switch between qualitatively different experiment setups with a single option per group.
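
Group overrides should also compose with the individual field overrides described earlier in this document, so a single command can both select a preset and adjust a scalar. A hypothetical invocation (the ``agent.max_iterations`` override below is purely illustrative):

.. code-block:: bash

    python scripts/reinforcement_learning/rsl_rl/train.py \
        --task=Isaac-Reach-Franka-v0 \
        --headless \
        env.observations=noise_less \
        agent.policy=large_network \
        agent.max_iterations=3000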
2 changes: 1 addition & 1 deletion source/isaaclab_tasks/config/extension.toml
@@ -1,7 +1,7 @@
[package]

# Note: Semantic Versioning is used: https://semver.org/
version = "0.11.1"
version = "0.12.0"

# Description
title = "Isaac Lab Environments"
9 changes: 9 additions & 0 deletions source/isaaclab_tasks/docs/CHANGELOG.rst
@@ -1,6 +1,15 @@
Changelog
---------

0.12.0 (2025-10-15)
~~~~~~~~~~~~~~~~~~~~

Added
^^^^^

* Added support for Hydra group config overrides and provided an example with the Isaac-Reach-Franka-v0 environment.


0.11.1 (2025-09-24)
~~~~~~~~~~~~~~~~~~~~

@@ -81,3 +81,15 @@ params:
clip_value: True
clip_actions: False
bounds_loss_coef: 0.0001

variants:
params.network.mlp:
large_network:
units: [256, 128, 64]
activation: elu
d2rl: False

initializer:
name: default
regularizer:
name: None
@@ -37,3 +37,55 @@ class FrankaReachPPORunnerCfg(RslRlOnPolicyRunnerCfg):
desired_kl=0.01,
max_grad_norm=1.0,
)
variants = {
"policy": {
"large_network": RslRlPpoActorCriticCfg(
init_noise_std=1.0,
actor_hidden_dims=[512, 256, 128, 64],
critic_hidden_dims=[512, 256, 128, 64],
activation="elu",
),
"medium_network": RslRlPpoActorCriticCfg(
init_noise_std=1.0,
actor_hidden_dims=[256, 128, 64],
critic_hidden_dims=[256, 128, 64],
activation="elu",
),
"small_network": RslRlPpoActorCriticCfg(
init_noise_std=1.0,
actor_hidden_dims=[128, 64],
critic_hidden_dims=[128, 64],
activation="elu",
),
},
"algorithm": {
"large_batch_lr": RslRlPpoAlgorithmCfg(
value_loss_coef=1.0,
use_clipped_value_loss=True,
clip_param=0.2,
entropy_coef=0.001,
num_learning_epochs=8,
num_mini_batches=2,
learning_rate=1.0e-3,
schedule="adaptive",
gamma=0.99,
lam=0.95,
desired_kl=0.01,
max_grad_norm=1.0,
),
"small_batch_lr": RslRlPpoAlgorithmCfg(
value_loss_coef=1.0,
use_clipped_value_loss=True,
clip_param=0.2,
entropy_coef=0.001,
num_learning_epochs=8,
num_mini_batches=16,
learning_rate=1.0e-4,
schedule="adaptive",
gamma=0.99,
lam=0.95,
desired_kl=0.01,
max_grad_norm=1.0,
),
},
}
@@ -43,6 +43,19 @@ def __post_init__(self):
self.commands.ee_pose.body_name = "panda_hand"
self.commands.ee_pose.ranges.pitch = (math.pi, math.pi)

variants = {
Contributor

Can we have this only in the test case and not in the main codebase?

Collaborator Author

Yeah, I can do that for sure. Though it would be nice to have a more visible place (maybe in the future) to show how to use group override. I do think it is pretty nice that you can do
``env.actions.arm_action=joint_position_to_limit`` or
``env.actions.arm_action=relative_joint_position``
instead of being super explicit in the task ID.

"actions.arm_action": {
"joint_position_to_limit": mdp.JointPositionToLimitsActionCfg(
asset_name="robot", joint_names=["panda_joint.*"]
),
"relative_joint_position": mdp.RelativeJointPositionActionCfg(
asset_name="robot", joint_names=["panda_joint.*"], scale=0.2
),
}
}

self.variants.update(variants)


@configclass
class FrankaReachEnvCfg_PLAY(FrankaReachEnvCfg):
@@ -116,6 +116,28 @@ def __post_init__(self):
policy: PolicyCfg = PolicyCfg()


@configclass
class NoiselessObservationsCfg:
    """Noiseless observation specifications for the MDP."""

@configclass
class PolicyCfg(ObsGroup):
"""Observations for policy group."""

# observation terms (order preserved)
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "ee_pose"})
actions = ObsTerm(func=mdp.last_action)

def __post_init__(self):
self.enable_corruption = False
self.concatenate_terms = True

# observation groups
policy: PolicyCfg = PolicyCfg()


@configclass
class EventCfg:
"""Configuration for events."""
@@ -227,3 +249,5 @@ def __post_init__(self):
),
},
)
# variants defined at base env will be shared across all derived robot-specific envs
self.variants = {"observations": {"noise_less": NoiselessObservationsCfg()}}