Description
https://github.com/rllm-org/rllm/blob/main/rllm/experimental/unified_trainer.py#L216
Here we do `norm_adv_by_std_in_grpo=self.rllm_config.stepwise_advantage.get("norm_adv_by_std_in_grpo", True)`, but if I'm not mistaken, the `norm_adv_by_std_in_grpo` flag is defined at `rllm.algorithm.norm_adv_by_std_in_grpo` in the config files, so this lookup would always fall back to the default of `True` even when the user sets the flag explicitly.
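A plain-dict sketch of the mismatch (the key names come from the code linked above; the dict values here are purely illustrative, not taken from any real config):

```python
# Illustrative config shaped like the rLLM YAML: the flag lives under
# algorithm, not under stepwise_advantage.
config = {
    "algorithm": {"norm_adv_by_std_in_grpo": False},
    "stepwise_advantage": {"enable": True},
}

# Lookup as written in unified_trainer.py: the key is absent under
# stepwise_advantage, so it silently falls back to the default (True),
# ignoring the user's explicit False.
current = config["stepwise_advantage"].get("norm_adv_by_std_in_grpo", True)

# Suspected intended lookup, reading from the algorithm section:
intended = config["algorithm"].get("norm_adv_by_std_in_grpo", True)

print(current, intended)  # True False
```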
Steps to Reproduce
n/a
Error Output / Traceback
rLLM Version
latest main
Training Backend
tinker
Python Version
3.11.14
GPU / CUDA Version
No response
vLLM Version (if applicable)
No response
Training Script / Config
Additional Context
No response