Final-year project: "Planning to Explore in Reinforcement Learning".
Each agent is designed to interact with gym-like environments. Models are represented as MDPs, which are either deterministic or stochastic. Each model-based agent can be passed a model to use.
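As a rough illustration of this design, the sketch below shows one way a deterministic and a stochastic tabular MDP model could be written and handed to a model-based agent. All names here (`DeterministicMDP`, `StochasticMDP`, `ModelBasedAgent`, `update`, `known`, `select_action`) are hypothetical and chosen for the example; they are not taken from this project's code, and the exploration rule is only a placeholder.

```python
import random
from typing import Dict, Hashable, Optional, Sequence, Tuple

State = Hashable
Action = Hashable
Outcome = Tuple[State, float]  # (next_state, reward)


class DeterministicMDP:
    """Hypothetical tabular model: one outcome per (state, action)."""

    def __init__(self) -> None:
        self.transitions: Dict[Tuple[State, Action], Outcome] = {}

    def update(self, state: State, action: Action,
               next_state: State, reward: float) -> None:
        # A later observation simply overwrites the earlier one.
        self.transitions[(state, action)] = (next_state, reward)

    def known(self, state: State, action: Action) -> bool:
        return (state, action) in self.transitions

    def predict(self, state: State, action: Action) -> Optional[Outcome]:
        return self.transitions.get((state, action))


class StochasticMDP:
    """Hypothetical tabular model: outcome counts per (state, action)."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[State, Action], Dict[Outcome, int]] = {}

    def update(self, state: State, action: Action,
               next_state: State, reward: float) -> None:
        outcomes = self.counts.setdefault((state, action), {})
        key = (next_state, reward)
        outcomes[key] = outcomes.get(key, 0) + 1

    def known(self, state: State, action: Action) -> bool:
        return (state, action) in self.counts

    def distribution(self, state: State, action: Action) -> Dict[Outcome, float]:
        # Empirical probabilities; an empty dict if the pair is unseen.
        outcomes = self.counts.get((state, action), {})
        total = sum(outcomes.values())
        return {o: c / total for o, c in outcomes.items()}


class ModelBasedAgent:
    """Hypothetical agent that accepts an existing model or builds its own."""

    def __init__(self, actions: Sequence[Action], model=None) -> None:
        self.actions = list(actions)
        self.model = model if model is not None else DeterministicMDP()

    def select_action(self, state: State) -> Action:
        # Placeholder exploration: try the first unmodelled action,
        # otherwise fall back to a uniformly random one.
        for action in self.actions:
            if not self.model.known(state, action):
                return action
        return random.choice(self.actions)

    def learn(self, state: State, action: Action,
              next_state: State, reward: float) -> None:
        self.model.update(state, action, next_state, reward)
```

In a gym-like loop, the agent would call `select_action` on the current observation, step the environment, and pass the resulting transition to `learn`. Passing a pre-filled model to the constructor lets the agent plan from prior knowledge instead of starting from an empty table.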