Some algorithms, like HER, are designed for goal-conditioned environments, like the OpenAI Gym GoalEnv or the multiworld MultitaskEnv environments.
These environments differ from normal gym environments in that they return dictionaries for observations, like so:
```python
env = CarEnv()
obs = env.reset()
next_obs, reward, done, info = env.step(action)
print(obs)
# Output:
# {
#     'observation': ...,
#     'desired_goal': ...,
#     'achieved_goal': ...,
# }
```
The GoalEnv environments also have a function with the signature
```python
def compute_rewards(achieved_goal, desired_goal):
    # achieved_goal and desired_goal are vectors
    ...
```
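For instance, a reaching task might implement this as the negative Euclidean distance between the two goals. This is a minimal sketch; the dense distance-based reward is an assumption for illustration, not something either library prescribes:
```python
import numpy as np

def compute_rewards(achieved_goal, desired_goal):
    # Hypothetical dense reward: negative Euclidean distance between
    # the achieved and desired goals. Works for single vectors or
    # batches of vectors stacked along the first axis.
    return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)
```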
while the MultitaskEnv has a signature like
```python
def compute_rewards(observation, action, next_observation):
    # observation and next_observation are dictionaries
    ...
```
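A comparable sketch under the MultitaskEnv signature, assuming the dictionaries carry the `achieved_goal` and `desired_goal` entries shown earlier:
```python
import numpy as np

def compute_rewards(observation, action, next_observation):
    # observation and next_observation are dictionaries like the one
    # printed above; this sketch rewards how close the next state's
    # achieved goal is to the desired goal.
    achieved = next_observation['achieved_goal']
    desired = next_observation['desired_goal']
    return -np.linalg.norm(achieved - desired, axis=-1)
```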
Because observations are dictionaries rather than flat arrays, normal RL algorithms won't even "type check" with these environments. To learn more about these environments, check out the URLs above.
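A common workaround, sketched here rather than taken from either library, is to concatenate the relevant dictionary entries into a single flat vector before passing them to a standard policy:
```python
import numpy as np

def flatten_obs(obs_dict):
    # Concatenate the observation with the desired goal so that a
    # standard (non-goal-aware) policy sees one flat vector.
    return np.concatenate([obs_dict['observation'],
                           obs_dict['desired_goal']])
```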
The ObsDictRelabelingBuffer performs hindsight experience replay with either type of environment and works by saving specific values in the observation dictionary.
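The relabeling idea at the heart of HER can be sketched in a few lines. The actual buffer also handles storage, batching, and different goal-sampling strategies; `relabel_transition` is a hypothetical helper for illustration, not part of the buffer's API:
```python
def relabel_transition(obs, action, next_obs, compute_rewards):
    # Hindsight relabeling: pretend the goal we actually achieved was
    # the goal we wanted all along, then recompute the reward under
    # that substituted goal (using the MultitaskEnv-style signature).
    new_goal = next_obs['achieved_goal']
    obs = dict(obs, desired_goal=new_goal)
    next_obs = dict(next_obs, desired_goal=new_goal)
    reward = compute_rewards(obs, action, next_obs)
    return obs, action, reward, next_obs
```
Training on relabeled transitions gives the agent a reward signal even on trajectories that never reached the originally desired goal.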