System Info
(TaskRunner pid=782838) step:0 - val-core/vision_table_comparison/reward/mean@8:0.5797759237576205 - val-aux/vision_table_comparison/reward/std@8:0.04346181558967558 - val-aux/vision_table_comparison/reward/best@2/mean:0.5981306398997328 - val-aux/vision_table_comparison/reward/best@2/std:0.027649894546757196 - val-aux/vision_table_comparison/reward/worst@2/mean:0.5615872453302144 - val-aux/vision_table_comparison/reward/worst@2/std:0.0451028965670866 - val-aux/vision_table_comparison/reward/best@4/mean:0.607696205476826 - val-aux/vision_table_comparison/reward/best@4/std:0.013642463673652247 - val-aux/vision_table_comparison/reward/worst@4/mean:0.5405101236723503 - val-aux/vision_table_comparison/reward/worst@4/std:0.04120077883325907 - val-core/vision_table_comparison/reward/best@8/mean:0.6128662523748177 - val-core/vision_table_comparison/reward/best@8/std:0.0074369439234818575 - val-aux/vision_table_comparison/reward/worst@8/mean:0.5212478603658811 - val-aux/vision_table_comparison/reward/worst@8/std:0.03235882541980303 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0
(TaskRunner pid=782838) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790512, ip=192.168.0.11, actor_id=d605e2db7eb23153620ff54801000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f7830d00470>)
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(TaskRunner pid=782838) return self.__get_result()
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(TaskRunner pid=782838) raise self._exception
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
(TaskRunner pid=782838) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
(TaskRunner pid=782838) return func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
(TaskRunner pid=782838) return func(self_instance, *args, **kwargs_inner)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
(TaskRunner pid=782838) output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
(TaskRunner pid=782838) return self.log(decorated_function, *args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
(TaskRunner pid=782838) entropy, log_probs = self._forward_micro_batch(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
(TaskRunner pid=782838) log_probs = logprobs_from_logits(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
(TaskRunner pid=782838) output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
(TaskRunner pid=782838) output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
(TaskRunner pid=782838) return CrossEntropyLoss.apply(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
(TaskRunner pid=782838) return super().apply(*args, **kwargs) # type: ignore[misc]
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
(TaskRunner pid=782838) assert labels.shape == (n_rows,)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) AssertionError
(TaskRunner pid=782838) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): the same ray::WorkerDict.ref_compute_ref_log_prob() traceback, ending in "assert labels.shape == (n_rows,)" and AssertionError, was also raised by pid=790557 (actor_id=124c3751660a438e2946d09201000000) and pid=790562 (actor_id=c122ced1511239abde824b3f01000000), both on ip=192.168.0.11.
(TaskRunner pid=782838) swanlab: 🏠 View project at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr
(TaskRunner pid=782838) swanlab: 🚀 View run at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr/runs/jyqb6wz79pxyj1d93isth
Training Progress: 0%| | 0/500 [02:34<?, ?it/s]
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_files=/workspace/verl_grpo/verl_grpo_data/train.parquet', 'data.val_files=/workspace/verl_grpo/verl_grpo_data/test.parquet', 'critic.enable=False']
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/verl/verl/trainer/main_ppo.py", line 447, in <module>
main()
File "/usr/local/lib/python3.12/dist-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/main_ppo.py", line 45, in main
run_ppo(config)
File "/workspace/verl/verl/trainer/main_ppo.py", line 99, in run_ppo
ray.get(runner.run.remote(config))
File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2882, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 968, in get_objects
raise value.as_instanceof_cause()
5bdabf01000000, repr=<main_ppo.TaskRunner object at 0x7f7e30145e20>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/main_ppo.py", line 366, in run
trainer.fit()
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1483, in fit
ref_log_prob = self._compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1177, in _compute_ref_log_prob
ref_log_prob = self.ref_policy_wg.compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 53, in call
output = ray.get(output)
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(AssertionError): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790530, ip=192.168.0.11, actor_id=7577f135929f6e4afe85fe4b01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f979cf144a0>)
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
return func(self_instance, *args, **kwargs_inner)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
entropy, log_probs = self._forward_micro_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
log_probs = logprobs_from_logits(
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
return CrossEntropyLoss.apply(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
assert labels.shape == (n_rows,)
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
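The failing check is assert labels.shape == (n_rows,) inside flash_attn's Triton CrossEntropyLoss.forward, where n_rows is the first dimension of the 2-D logits. In the traceback, verl's logprobs_from_logits calls into it; assuming that wrapper first flattens logits to (-1, vocab) and labels to (-1,), the assertion fires whenever the number of logit positions differs from the number of label tokens. The minimal pure-Python sketch below reproduces that contract on shape tuples only. The helper names flatten_for_cross_entropy and check_cross_entropy_shapes are hypothetical and are not part of verl or flash_attn; the vocabulary size used is illustrative.

```python
from math import prod


def flatten_for_cross_entropy(logits_shape, labels_shape):
    """Return the 2-D logits shape and 1-D labels shape after flattening.

    Assumption: mirrors a wrapper that reshapes logits to (-1, vocab)
    and labels to (-1,) before calling the fused cross-entropy kernel.
    """
    *leading, vocab = logits_shape
    n_rows = prod(leading)         # total token positions covered by the logits
    n_labels = prod(labels_shape)  # total label tokens
    return (n_rows, vocab), (n_labels,)


def check_cross_entropy_shapes(logits_shape, labels_shape):
    """Raise a readable AssertionError when labels.shape != (n_rows,)."""
    (n_rows, _vocab), flat_labels = flatten_for_cross_entropy(logits_shape, labels_shape)
    if flat_labels != (n_rows,):
        raise AssertionError(
            f"logits give {n_rows} rows but labels give {flat_labels[0]} tokens "
            f"(logits {tuple(logits_shape)}, labels {tuple(labels_shape)})"
        )
    return n_rows


# Matching shapes: 1 x 1024 token positions and 1024 labels.
print(check_cross_entropy_shapes((1, 1024, 50000), (1, 1024)))

# Mismatched shapes: labels two tokens shorter than the logits.
try:
    check_cross_entropy_shapes((1, 1024, 50000), (1, 1022))
except AssertionError as err:
    print("mismatch:", err)
```

One way to apply this is to log logits.shape and labels.shape just before the failing call in _forward_micro_batch and compare them with the helper; a mismatch would indicate that the logits and the labels were built from sequences of different lengths, rather than a fault in the loss kernel itself.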
Information
Tasks
Reproduction
GRPO training of qwen3-vl-thinking-4b
ubuntu@zhizhehui:~/verl_grpo$ pip show verl
Name: verl
Version: 0.7.0.dev0
Summary: verl: Volcano Engine Reinforcement Learning for LLM
Home-page: https://github.com/volcengine/verl
Author: Bytedance - Seed - MLSys
Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk
License: Apache-2.0
Location: /workspace/.local/lib/python3.12/site-packages
Editable project location: /workspace/verl
Requires: accelerate, codetiming, datasets, dill, hydra-core, numpy, packaging, pandas, peft, pyarrow, pybind11, pylatexenc, ray, tensorboard, tensordict, torchdata, transformers, wandb
Required-by:
Expected behavior
Training should proceed normally to step 1.
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
(TaskRunner pid=782838) output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
(TaskRunner pid=782838) output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
(TaskRunner pid=782838) return CrossEntropyLoss.apply(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
(TaskRunner pid=782838) return super().apply(*args, **kwargs) # type: ignore[misc]
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
(TaskRunner pid=782838) assert labels.shape == (n_rows,)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) AssertionError
(TaskRunner pid=782838) swanlab: 🏠 View project at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr
(TaskRunner pid=782838) swanlab: 🚀 View run at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr/runs/jyqb6wz79pxyj1d93isth
Training Progress: 0%| | 0/500 [02:34<?, ?it/s]
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_files=/workspace/verl_grpo/verl_grpo_data/train.parquet', 'data.val_files=/workspace/verl_grpo/verl_grpo_data/test.parquet', 'critic.enable=False']
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/verl/verl/trainer/main_ppo.py", line 447, in <module>
main()
File "/usr/local/lib/python3.12/dist-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/main_ppo.py", line 45, in main
run_ppo(config)
File "/workspace/verl/verl/trainer/main_ppo.py", line 99, in run_ppo
ray.get(runner.run.remote(config))
File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2882, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 968, in get_objects
raise value.as_instanceof_cause()
5bdabf01000000, repr=<main_ppo.TaskRunner object at 0x7f7e30145e20>)
File "/workspace/verl/verl/trainer/main_ppo.py", line 366, in run
trainer.fit()
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1483, in fit
ref_log_prob = self._compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1177, in _compute_ref_log_prob
ref_log_prob = self.ref_policy_wg.compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 53, in call
output = ray.get(output)
^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(AssertionError): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790530, ip=192.168.0.11, actor_id=7577f135929f6e4afe85fe4b01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f979cf144a0>)
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
return func(self_instance, *args, **kwargs_inner)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
entropy, log_probs = self._forward_micro_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
log_probs = logprobs_from_logits(
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
return CrossEntropyLoss.apply(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
assert labels.shape == (n_rows,)
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
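For context, the failing check (`assert labels.shape == (n_rows,)` in flash_attn's Triton `cross_entropy.py`) requires exactly one label per row of the flattened logits. The sketch below is a hypothetical, pure-Python illustration of that contract, not verl's or flash_attn's code. It assumes the caller flattens logits of shape (..., vocab) to (n_rows, vocab) and labels of shape (...) to (n_rows,). The names `flattened_row_counts` and `reference_logprobs` are illustrative only.

```python
import math
from typing import Sequence


def flattened_row_counts(logits_shape: Sequence[int], labels_shape: Sequence[int]) -> tuple[int, int]:
    """Return (n_rows, n_labels) after flattening.

    Assumption: logits are (..., vocab) and labels are (...); all leading
    dims are collapsed into one. The kernel's assertion needs these equal.
    """
    n_rows = math.prod(logits_shape[:-1])
    n_labels = math.prod(labels_shape)
    return n_rows, n_labels


def reference_logprobs(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> list[float]:
    """Pure-Python per-token log-probabilities: log_softmax(row)[label].

    Raises ValueError (instead of a bare AssertionError) when the number
    of labels does not match the number of logits rows.
    """
    n_rows = len(logits)
    if len(labels) != n_rows:
        raise ValueError(f"expected {n_rows} labels (one per logits row), got {len(labels)}")
    out = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the row max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[label] - lse)
    return out


if __name__ == "__main__":
    # Hypothetical shapes: logits cover 10 tokens but labels cover only 9.
    print(flattened_row_counts((1, 10, 32), (1, 9)))  # (10, 9): would trip the assertion
    print(reference_logprobs([[0.0, 0.0]], [1]))
```

If the two counts differ for the real tensors reaching `logprobs_from_logits` (for example, logits covering a different number of tokens than the labels), the flash_attn assertion fails exactly as above. The log alone does not show which side is off in this Qwen3-VL run.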
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
GRPO training of qwen3-vl-thinking-4b
ubuntu@zhizhehui:~/verl_grpo$ pip show verl
Name: verl
Version: 0.7.0.dev0
Summary: verl: Volcano Engine Reinforcement Learning for LLM
Home-page: https://github.com/volcengine/verl
Author: Bytedance - Seed - MLSys
Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk
License: Apache-2.0
Location: /workspace/.local/lib/python3.12/site-packages
Editable project location: /workspace/verl
Requires: accelerate, codetiming, datasets, dill, hydra-core, numpy, packaging, pandas, peft, pyarrow, pybind11, pylatexenc, ray, tensorboard, tensordict, torchdata, transformers, wandb
Required-by:
Expected behavior
Training should proceed normally to step 1.