Open
New keras agent based on TD3
Create keras_td3.py
Conflicts: exarl/workflows/workflow_vault/async_learner.py exarl/workflows/workflow_vault/sync_learner.py
Data structures Comms Environments
2. Revert lookup params to run_params to have a single point of change
casting all variables in calc_target_f to float32
fix to config files params: model/model_type
* 1. Fixed spelling and added greedy throttle to SingleRun.pm
  2. Fixed support for -A option (previously not being respected)
  3. Ensured Tensorflow is not loaded when using Pytorch
  4. Fixed log/results directory to increment instead of defaulting to RUN001
  5. Added convergence cutoffs to workflows
  6. Fixed/simplified step counting in workflows
  7. Added hyper-parameter tuning using sbatch and optuna
* 1. Flake8
  2. Added fix to sync with episode block leading to deadlock
  3. Commented hyper-parameter tuning script
  4. Fixed RUN000 to the next dir if logs exist
  5. Adjusted output to match hyper-parameter script
* 1. Updating docs for workflows
  2. Minor fix to bsuite_batch help message
* 1. Added ability to load external agents and environments
  2. Fixed print and end of run
  3. Async convergence call had too many parameters
… for end of episodes. Still touching some things up to get TD3-v1 working with the new fixBroken branch.
…0 work with the fix to the q target calculation and do great on the Pendulum test problem
… has dimension > 1.
…ist for no reason. Not sure why that was the case before but it works just fine now with the simpler fix
… works with Pendulum just as well as DDPG.
… testing however before pull request.
Fix done bug
The OUNoise call only works for a scalar action. Currently the noise is a scalar, so if the action space is bigger, it adds the same value to all actions. This fix ensures that the OUNoise matches the shape of the action space.
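A minimal sketch of the idea behind this fix, not the code from this PR: an Ornstein-Uhlenbeck process whose state has the same shape as the action space, so each action dimension gets its own noise value. The class name `OUNoiseSketch` and the parameters `theta`, `sigma`, and `dt` are assumptions chosen for illustration.

```python
import numpy as np


class OUNoiseSketch:
    """Hypothetical OU noise whose state matches the action shape."""

    def __init__(self, action_shape, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        # One mean value per action dimension instead of a single scalar.
        self.mu = np.full(action_shape, mu, dtype=np.float32)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Start the process at zero for every action dimension.
        self.state = np.zeros_like(self.mu)

    def __call__(self):
        # Euler-Maruyama step; the Gaussian draw has the action shape,
        # so the dimensions evolve independently.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape)
        self.state = (self.state + drift + diffusion).astype(np.float32)
        return self.state.copy()


noise = OUNoiseSketch(action_shape=(3,), seed=0)
sample = noise()  # shape (3,), one independent value per action
```

With a scalar state, adding the sample to a 3-dimensional action would broadcast the same value to every component, which is the behavior the commit message describes.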
Update ddpg.py
…sted on the Pendulum case. Again, need to do some performance testing at some point, but the buffer does work on the Pendulum problem. Also added a default buffer for the TD3-v1 case.
Added the capability to do n-step DDPG-v0 and TD3-v1
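For readers unfamiliar with n-step targets, the following is a generic sketch of how an n-step return is usually computed. It is an illustration under stated assumptions, not the buffer logic added in this commit; the function name `n_step_return` and its arguments are hypothetical.

```python
def n_step_return(rewards, gamma, bootstrap_value, done):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    rewards: the n rewards r_t, ..., r_{t+n-1}
    bootstrap_value: critic estimate at the state reached after n steps
    done: True if the episode ended within the n steps (no bootstrap)
    """
    g = 0.0 if done else bootstrap_value
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1.0, gamma = 0.5, bootstrap value 8.0:
# 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
target = n_step_return([1.0, 1.0, 1.0], gamma=0.5, bootstrap_value=8.0, done=False)
```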
…ize x self.num_actions, but the noise being sampled was only a self.num_actions length vector. So the same sampled noise was being added to each next_action in a batch. This fixes it. I also changed the noise shape in the action method as well. This one was currently fine, but ensuring it is the shape of sampled_actions ensures that it is the right shape. Having it as a self.num_actions vector could make it not robust to future changes of action space shape.
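The broadcasting problem described here can be sketched as follows. This assumes clipped Gaussian noise added to the next actions, as in the usual TD3 target smoothing; the function name and the `noise_std`, `noise_clip`, `lower`, and `upper` parameters are hypothetical and not taken from the PR.

```python
import numpy as np


def smoothed_next_actions(next_actions, noise_std, noise_clip, lower, upper, rng):
    """Add independent clipped noise to every action in the batch."""
    # Sample with the full (batch_size, num_actions) shape. Sampling only
    # (num_actions,) would broadcast one noise vector to every row.
    noise = rng.normal(0.0, noise_std, size=next_actions.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(next_actions + noise, lower, upper)


rng = np.random.default_rng(0)
batch = np.zeros((4, 2))  # batch_size = 4, num_actions = 2
out = smoothed_next_actions(batch, noise_std=0.2, noise_clip=0.5,
                            lower=-1.0, upper=1.0, rng=rng)
```

Because `batch` is all zeros, each row of `out` is just that row's noise, which makes it easy to check that the rows differ.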
…the Dense layer output is sized wrong if the action space is of higher dimension than 1.
… is wrong unless the lower bound is equal to the negative of the upper bound. I switched to the sigmoid activation by default and then scale it properly. Before, the output of the tanh layer was just multiplied by the upper_bound, so the output of the tanh layer was scaled to +/-upper_bound.
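The difference between the two scalings can be illustrated with plain NumPy. This is a sketch of the arithmetic only, assuming the sigmoid output is mapped affinely onto the bounds; the function names are hypothetical and the actual Keras layers in the agent may differ.

```python
import numpy as np


def scale_tanh(raw, upper):
    # Old behaviour as described: output lies in [-upper, +upper],
    # regardless of the actual lower bound.
    return np.tanh(raw) * upper


def scale_sigmoid(raw, lower, upper):
    # Sigmoid lies in (0, 1), so this maps onto (lower, upper).
    s = 1.0 / (1.0 + np.exp(-raw))
    return lower + s * (upper - lower)


raw = np.array([-10.0, 0.0, 10.0])
lower, upper = 0.0, 4.0
old = scale_tanh(raw, upper)            # dips below the lower bound of 0
new = scale_sigmoid(raw, lower, upper)  # stays within [0, 4]
```

For an action space such as [0, 4], the tanh version can emit values near -4, while the sigmoid version stays inside the bounds.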
…e version of ExaRL that works with the old version of gym. I don't have the old version working to test it though.
Fixing a shape issue in TD3. In train_critic,
Adding Soft Actor-Critic method to the fixBroken branch
No description provided.