LSTM #147
base: main
Conversation
Nice that you're adding LSTM. However, I'm not confident that it works, because I don't know if `cell` is being used to calculate the new `hidden`. If you could point me to where that's happening in `ppo_lstm.py`, it might help me understand. Also, does this run on the parity tests? When I first developed the memory component, LSTM didn't automatically batch the first dimension, so you had to know the batch size beforehand when passing in some inputs. I'm not sure whether you found a workaround for that. My misunderstandings may also be due to my not having seen the code base for a while.
If you can show that this works on a few parity tests and that the `cell` part is being used for updating the state, and address those few minor comments, then I think we're good to go.
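For reference on the reviewer's concern, the standard LSTM update makes the dependence explicit: the new hidden is `h = o * tanh(c)`, so `cell` must feed the computation. A minimal NumPy sketch of one step (generic LSTM math, not this PR's Haiku code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM. W, U, b hold the stacked
    input/forget/candidate/output gate parameters (4*H rows)."""
    H = h_prev.shape[-1]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates
    i = sigmoid(z[:H])              # input gate
    f = sigmoid(z[H:2 * H])         # forget gate
    g = np.tanh(z[2 * H:3 * H])     # candidate cell
    o = sigmoid(z[3 * H:])          # output gate
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden depends on the new cell
    return h, c
```

If `cell` were dropped, two calls differing only in `c_prev` would return the same hidden; with a correct LSTM they must differ.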
@@ -358,15 +374,27 @@ def forward_fn(
    inputs: jnp.ndarray, state: jnp.ndarray
) -> Tuple[Tuple[jnp.ndarray, jnp.ndarray], jnp.ndarray]:
    """forward function"""
    torso = hk.nets.MLP(
Why is this being removed?
behavior_values: jnp.ndarray
behavior_log_probs: jnp.ndarray

# GRU specific
Change this to "LSTM specific" or "Recurrent specific". Also, wouldn't this need `cell` as well?
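If the state container gains an LSTM option, it would presumably need a `cell` field alongside `hidden`. A hypothetical sketch (the `RecurrentState` name and fields are placeholders, not the repo's actual `TrainingState`):

```python
from typing import NamedTuple
import numpy as np

class RecurrentState(NamedTuple):
    """Hypothetical recurrent-state container for illustration only."""
    hidden: np.ndarray
    cell: np.ndarray  # meaningful for LSTM; could stay zero for GRU

def initial_state(hidden_size: int) -> RecurrentState:
    # Both components start as zeros with a leading batch dim of 1.
    return RecurrentState(
        hidden=np.zeros((1, hidden_size)),
        cell=np.zeros((1, hidden_size)),
    )
```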
seed=seed,
player_id=player_id,
)
else:
I would change this to an `elif args.ppo.rnn_type == "gru"`, then an `else` that raises an error. We wouldn't want any string other than `"lstm"` to set the rnn_type to GRU.
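The suggested dispatch might look like the sketch below (the constructor bodies are placeholders, not the repo's API; only the branching structure is the point):

```python
def make_network(rnn_type: str) -> str:
    """Dispatch on rnn_type; reject anything unrecognised instead of
    silently falling back to GRU. Return values are stand-ins for the
    real network constructors."""
    if rnn_type == "lstm":
        return "lstm-network"   # e.g. build the hk.LSTM variant here
    elif rnn_type == "gru":
        return "gru-network"    # e.g. build the hk.GRU variant here
    else:
        raise ValueError(
            f"Unknown rnn_type: {rnn_type!r} (expected 'lstm' or 'gru')"
        )
```

With this shape, a typo such as `"LSTM"` fails loudly at startup instead of training the wrong architecture.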
agent1: 'PPO'

# Environment
env_id: MountainCar-v0
The file is called `pendulum.yaml`, but it has `env_id: MountainCar-v0`. Am I missing something?
@@ -2,7 +2,7 @@

# Agents
agent1: 'PPO_memory'
-agent2: 'TitForTat'
+agent2: 'PPO_memory'
The file is called `ppo_mem_v_tft.yaml`, but it now has `agent2: PPO_memory`. Why was this changed?
key = jax.random.split(
    agent2._state.random_key, args.popsize * args.num_opps
).reshape(args.popsize, args.num_opps, -1)
if args.ppo.rnn_type == "lstm" and args.agent2 == "PPO_memory":
I need some help understanding this. If we want to use an LSTM, the initial hidden state is a Haiku `LSTMState` object that holds the `hidden` and `cell` states. And if we want a GRU, then the hidden state is in `jnp.tile(agent2._mem.hidden, (args.popsize, args.num_opps, 1, 1))`. Are these both `NamedTuple`s, and is `agent2.batch_init()` able to handle both of them?
Maybe I'm missing something that changed in how the agents and the agent methods are initialized, but I don't see any diffs for that file here.
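One way to reconcile the two cases: Haiku's `LSTMState` is a `NamedTuple` of `(hidden, cell)`, so tiling every leaf of the state handles both the bare GRU array and the LSTM pair. A NumPy sketch (the `LSTMState` here is declared locally to mirror Haiku's; in JAX the same effect comes from `jax.tree_util.tree_map` with `jnp.tile` over the state pytree):

```python
from typing import NamedTuple
import numpy as np

class LSTMState(NamedTuple):
    """Local stand-in for Haiku's LSTMState NamedTuple."""
    hidden: np.ndarray
    cell: np.ndarray

def tile_state(state, popsize: int, num_opps: int):
    """Tile either a bare GRU hidden array or an LSTMState to
    shape (popsize, num_opps, batch, hidden_size)."""
    tile = lambda x: np.tile(x, (popsize, num_opps, 1, 1))
    if isinstance(state, LSTMState):
        # Tile each leaf, keeping the NamedTuple structure intact.
        return LSTMState(tile(state.hidden), tile(state.cell))
    return tile(state)  # GRU case: a plain array
```

Whether `agent2.batch_init()` already treats its argument as a pytree (and so handles both) is exactly the question the reviewer is raising.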
hiddens: jnp.ndarray,
):
    """Surrogate loss using clipped probability ratios."""
    (distribution, values), _ = network.apply(
Since we are now using an LSTM, is it the case that `network.apply()` now requires both `hidden` and `cell`?
No, it needs the `LSTMHidden`.
initial_hidden_state=initial_hidden_state,
optimizer=optimizer,
random_key=random_key,
gru_dim=args.ppo.hidden_size,
In the PPO file, you could now change this input to something more general, such as `recurrent_dim`, rather than `gru_dim`.
@newtonkwan - can you pick this PR up and get it into main before the release?
Adding an LSTMAgent.
This adds an LSTM option to the PPO agents.
Changes to Core Features: