WIP: feature(pu): adapt to unizero-multitask ddp, and adapt ppo to support jericho config #858
base: main
Conversation
…n mask), e.g. detective env
if param.grad is not None:
    allreduce(param.grad.data)
else:
    # If the gradient is None, create a zero tensor with the same shape as the parameter and perform allreduce
Remove the commented-out code and add an English comment; then these modifications can be merged.
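For reference, the pattern under review looks roughly like the sketch below, assuming `torch.distributed` is already initialized and that the diff's `allreduce` helper wraps `dist.all_reduce`:

```python
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Sync gradients across ranks, tolerating parameters with grad=None."""
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data)
        else:
            # If the gradient is None (e.g. a head masked out for this task),
            # allreduce a zero tensor of the same shape so every rank still
            # enters the collective and none of them deadlocks.
            zero_grad = torch.zeros_like(param.data)
            dist.all_reduce(zero_grad)
            param.grad = zero_grad
```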
# dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
# TODO:
import datetime
dist.init_process_group(backend=backend, rank=rank, world_size=world_size, timeout=datetime.timedelta(seconds=60000))
Why add this timeout?
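For context, the questioned change amounts to something like this sketch; the 60000-second value comes from the diff, while the motivation (keeping slow multi-task setup or long evaluation phases from tripping the ~30-minute default collective timeout) is an assumption:

```python
import datetime
import torch.distributed as dist

def init_ddp(backend: str, rank: int, world_size: int) -> None:
    # Raising the timeout from the default (30 minutes) to 60000 s (~16.7 h)
    # prevents long idle phases on one rank from aborting the process group.
    dist.init_process_group(
        backend=backend,
        rank=rank,
        world_size=world_size,
        timeout=datetime.timedelta(seconds=60000),
    )
```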
# if self._rank == 0:
#     self._monitor = get_simple_monitor_type(self._policy.monitor_vars())(TickTime(), expire=10)

self._monitor = get_simple_monitor_type(self._policy.monitor_vars())(TickTime(), expire=10)
Add an argument named only_monitor_rank0 to control this logic, defaulting to True.
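A sketch of that suggestion; `_setup_monitor` is a hypothetical method name, while `get_simple_monitor_type`, `TickTime`, and `expire=10` are taken from the diff:

```python
def _setup_monitor(self, only_monitor_rank0: bool = True) -> None:
    # With the suggested default (True), only rank 0 builds the monitor;
    # setting it to False restores per-rank monitoring.
    if not only_monitor_rank0 or self._rank == 0:
        self._monitor = get_simple_monitor_type(self._policy.monitor_vars())(
            TickTime(), expire=10
        )
    else:
        self._monitor = None
```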
for k in engine.log_buffer:
    engine.log_buffer[k].clear()
return
# if engine.rank != 0:
Also pass the only_monitor_rank0 argument to the hook class.
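A sketch of how the flag could be threaded through; the hook class and its constructor here are hypothetical, and only the buffer-clearing body comes from the diff:

```python
class LogShowHook:
    """Hypothetical hook carrying the only_monitor_rank0 flag."""

    def __init__(self, only_monitor_rank0: bool = True) -> None:
        self._only_monitor_rank0 = only_monitor_rank0

    def __call__(self, engine) -> None:
        if self._only_monitor_rank0 and engine.rank != 0:
            # Unmonitored ranks just drop their buffered stats and return.
            for k in engine.log_buffer:
                engine.log_buffer[k].clear()
            return
        # rank 0 (or every rank, when the flag is False) renders logs here
```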
    self._global_state_encoder = nn.Identity()
elif len(global_obs_shape) == 3:
    self._mixer = Mixer(agent_num, embedding_size, embedding_size, activation=activation)
    self._global_state_encoder = ConvEncoder(global_obs_shape, hidden_size_list=hidden_size_list, activation=activation, norm_type='BN')
Why use BN rather than LN as the default here?
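One possible shape of the fix, as a sketch: expose `norm_type` instead of hard-coding 'BN', defaulting to 'LN' since LayerNorm does not depend on batch statistics (RL batches are often small or correlated). `ConvEncoder` and `Mixer` are the classes used in the diff; the standalone function is purely illustrative:

```python
import torch.nn as nn

def build_global_state_encoder(global_obs_shape, hidden_size_list,
                               activation=None, norm_type: str = 'LN'):
    # Mirrors the branch in the diff, but with the normalization made
    # configurable and defaulting to LayerNorm as the reviewer suggests.
    if len(global_obs_shape) == 1:
        return nn.Identity()
    elif len(global_obs_shape) == 3:
        return ConvEncoder(global_obs_shape, hidden_size_list=hidden_size_list,
                           activation=activation, norm_type=norm_type)
    else:
        raise ValueError(f'unsupported global_obs_shape: {global_obs_shape}')
```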
agent_state, global_state = agent_state.unsqueeze(0), global_state.unsqueeze(0)
agent_state = agent_state.unsqueeze(0)
if single_step and len(global_state.shape) == 2:
    global_state = global_state.unsqueeze(0)
add shape comments
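A sketch of the shape comments being requested, under the usual QMIX layout assumption (T = time, B = batch, A = agent_num, N = per-agent obs dim, M = global obs dim):

```python
import torch

def add_time_dim(agent_state: torch.Tensor, global_state: torch.Tensor,
                 single_step: bool = True):
    if single_step:
        # agent_state: (B, A, N) -> (1, B, A, N); prepend a T=1 time axis
        agent_state = agent_state.unsqueeze(0)
        if len(global_state.shape) == 2:
            # global_state: (B, M) -> (1, B, M); image-like global states
            # (B, C, H, W) keep their layout and are handled downstream.
            global_state = global_state.unsqueeze(0)
    return agent_state, global_state
```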
@@ -205,7 +214,10 @@ def forward(self, data: dict, single_step: bool = True) -> dict:
    agent_q_act = torch.gather(agent_q, dim=-1, index=action.unsqueeze(-1))
    agent_q_act = agent_q_act.squeeze(-1)  # T, B, A
    if self.mixer:
        global_state_embedding = self._global_state_encoder(global_state)
        if len(global_state.shape) == 5:
add some comments
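A sketch of what a comment could document here, assuming a 5-D global state means an image observation with a time axis, (T, B, C, H, W), which must be folded to 4-D before a conv encoder:

```python
import torch

def encode_global_state(global_state: torch.Tensor, encoder: torch.nn.Module) -> torch.Tensor:
    # Assumption: a 5-D global state is image-like with a time axis,
    # i.e. (T, B, C, H, W); conv encoders expect 4-D (batch, C, H, W),
    # so fold T into the batch and unfold it again after encoding.
    if len(global_state.shape) == 5:
        T, B = global_state.shape[:2]
        embedding = encoder(global_state.reshape(T * B, *global_state.shape[2:]))
        return embedding.reshape(T, B, -1)  # back to (T, B, embed_dim)
    return encoder(global_state)
```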
@@ -265,12 +265,17 @@ def compute_actor(self, x: torch.Tensor) -> Dict:
    >>> assert actor_outputs['logit'].shape == torch.Size([4, 64])
    """
    if self.share_encoder:
        x = self.encoder(x)
        # import ipdb;ipdb.set_trace()
Modify the corresponding API comments, and use isinstance(x, dict) to control the logic.
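A sketch of the suggested branch; the `'observation'` key is an assumed name for Jericho-style dict observations (token ids plus an action mask), not the repo's actual schema:

```python
import torch
import torch.nn as nn
from typing import Dict, Union

def encode_obs(encoder: nn.Module,
               x: Union[torch.Tensor, Dict[str, torch.Tensor]]) -> torch.Tensor:
    # Dict observations carry extra fields (e.g. an action mask) that the
    # encoder should not see; encode only the observation tensor itself.
    if isinstance(x, dict):
        return encoder(x['observation'])
    return encoder(x)
```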
Description
Related Issue
TODO
Check List