Lunar Lander with Reinforcement Learning

Algorithms

  • Soft Actor-Critic (SAC)
  • Deep Q-Network (DQN)
  • Proximal Policy Optimization (PPO)

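All three agents are trained with Stable Baselines3 against Gymnasium's Lunar Lander environment. As a minimal quick-start sketch (not the repo's exact notebook code; it assumes gymnasium[box2d] and stable-baselines3 are installed and that your Gymnasium version ships LunarLander-v3):

```python
import gymnasium as gym
from stable_baselines3 import SAC

# SAC needs a continuous action space, so request the continuous variant;
# DQN, by contrast, only works with the default discrete action space.
env = gym.make("LunarLander-v3", continuous=True)

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=750_000)  # step budget matching the results below
model.save("sac-lunar-lander")  # placeholder filename
```
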
Results

Hardware: Google Colab L4 GPU

| Model Type | Discrete | Average Reward | Total Training Steps | HuggingFace Repo |
|------------|----------|----------------|----------------------|------------------|
| PPO        | No       | 220.66         | 750,000              | Link             |
| PPO        | Yes      | 214.55         | 750,000              | Link             |
| SAC        | No       | 288.74         | 750,000              | Link             |
| DQN        | Yes      | 218.56         | 750,000              | Link             |

Training Notes

  • Set ent_coef for PPO, since a nonzero entropy coefficient encourages exploration of other actions; Stable Baselines3 defaults the value to 0.0 (see the first sketch after this list). More Information
  • Do not set your eval_freq too low (e.g. keep it >= 10,000), as overly frequent evaluation interruptions can cause instability during learning; see the evaluation sketch below.
  • Stable Baselines3's DQN parameters exploration_initial_eps and exploration_final_eps determine how exploratory your model is at the beginning and end of training; see the DQN sketch below.

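The entropy coefficient is passed when constructing PPO. A minimal sketch, assuming Stable Baselines3 and Gymnasium; the 0.01 value is illustrative, not the repo's tuned setting:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v3")  # discrete variant; older Gymnasium uses v2

# ent_coef defaults to 0.0 in Stable Baselines3; a small positive value adds
# an entropy bonus to the loss, which encourages trying other actions.
model = PPO("MlpPolicy", env, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=750_000)
```
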
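Periodic evaluation in Stable Baselines3 is typically wired up through EvalCallback, whose eval_freq controls how often training is paused to evaluate. A sketch keeping it at the 10,000-step floor suggested above (the save path and episode count are placeholders):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

env = gym.make("LunarLander-v3")
eval_env = gym.make("LunarLander-v3")  # evaluate on a separate env instance

# Evaluate every 10,000 training steps; much lower values interrupt
# learning often enough to cause the instability noted above.
eval_callback = EvalCallback(
    eval_env,
    eval_freq=10_000,
    n_eval_episodes=5,
    best_model_save_path="./best_model/",  # placeholder path
)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=750_000, callback=eval_callback)
```
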
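DQN in Stable Baselines3 uses a linear epsilon-greedy exploration schedule. A sketch with illustrative values rather than the repo's tuned ones:

```python
import gymnasium as gym
from stable_baselines3 import DQN

env = gym.make("LunarLander-v3")  # DQN requires the discrete action space

# Anneal epsilon from fully random actions (1.0) down to 5% random actions
# over the first 10% of training, then hold it at the final value.
model = DQN(
    "MlpPolicy",
    env,
    exploration_initial_eps=1.0,
    exploration_final_eps=0.05,
    exploration_fraction=0.1,
    verbose=1,
)
model.learn(total_timesteps=750_000)
```
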
Finding Theta Blog Posts

About

Repository containing code and notebooks exploring how to solve Gymnasium's Lunar Lander environment through reinforcement learning.
