assert labels.shape == (n_rows,) #5205

@z-x-x136

Description

System Info

(TaskRunner pid=782838) step:0 - val-core/vision_table_comparison/reward/mean@8:0.5797759237576205 - val-aux/vision_table_comparison/reward/std@8:0.04346181558967558 - val-aux/vision_table_comparison/reward/best@2/mean:0.5981306398997328 - val-aux/vision_table_comparison/reward/best@2/std:0.027649894546757196 - val-aux/vision_table_comparison/reward/worst@2/mean:0.5615872453302144 - val-aux/vision_table_comparison/reward/worst@2/std:0.0451028965670866 - val-aux/vision_table_comparison/reward/best@4/mean:0.607696205476826 - val-aux/vision_table_comparison/reward/best@4/std:0.013642463673652247 - val-aux/vision_table_comparison/reward/worst@4/mean:0.5405101236723503 - val-aux/vision_table_comparison/reward/worst@4/std:0.04120077883325907 - val-core/vision_table_comparison/reward/best@8/mean:0.6128662523748177 - val-core/vision_table_comparison/reward/best@8/std:0.0074369439234818575 - val-aux/vision_table_comparison/reward/worst@8/mean:0.5212478603658811 - val-aux/vision_table_comparison/reward/worst@8/std:0.03235882541980303 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0
(TaskRunner pid=782838) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790512, ip=192.168.0.11, actor_id=d605e2db7eb23153620ff54801000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f7830d00470>)
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(TaskRunner pid=782838) return self.__get_result()
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(TaskRunner pid=782838) raise self._exception
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
(TaskRunner pid=782838) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
(TaskRunner pid=782838) return func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
(TaskRunner pid=782838) return func(self_instance, *args, **kwargs_inner)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
(TaskRunner pid=782838) output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
(TaskRunner pid=782838) return self.log(decorated_function, *args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
(TaskRunner pid=782838) entropy, log_probs = self._forward_micro_batch(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
(TaskRunner pid=782838) log_probs = logprobs_from_logits(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
(TaskRunner pid=782838) output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
(TaskRunner pid=782838) output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
(TaskRunner pid=782838) return CrossEntropyLoss.apply(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
(TaskRunner pid=782838) return super().apply(*args, **kwargs) # type: ignore[misc]
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
(TaskRunner pid=782838) assert labels.shape == (n_rows,)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) AssertionError
(TaskRunner pid=782838) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790557, ip=192.168.0.11, actor_id=124c3751660a438e2946d09201000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fa509b985f0>)
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(TaskRunner pid=782838) return self.__get_result()
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(TaskRunner pid=782838) raise self._exception
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
(TaskRunner pid=782838) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
(TaskRunner pid=782838) return func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
(TaskRunner pid=782838) return func(self_instance, *args, **kwargs_inner)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
(TaskRunner pid=782838) output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
(TaskRunner pid=782838) return self.log(decorated_function, *args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
(TaskRunner pid=782838) entropy, log_probs = self._forward_micro_batch(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
(TaskRunner pid=782838) log_probs = logprobs_from_logits(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
(TaskRunner pid=782838) output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
(TaskRunner pid=782838) output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
(TaskRunner pid=782838) return CrossEntropyLoss.apply(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
(TaskRunner pid=782838) return super().apply(*args, **kwargs) # type: ignore[misc]
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
(TaskRunner pid=782838) assert labels.shape == (n_rows,)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) AssertionError
(TaskRunner pid=782838) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790562, ip=192.168.0.11, actor_id=c122ced1511239abde824b3f01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fcf2dc44740>)
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(TaskRunner pid=782838) return self.__get_result()
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(TaskRunner pid=782838) raise self._exception
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
(TaskRunner pid=782838) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
(TaskRunner pid=782838) return func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
(TaskRunner pid=782838) return func(self_instance, *args, **kwargs_inner)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
(TaskRunner pid=782838) output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
(TaskRunner pid=782838) return self.log(decorated_function, *args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
(TaskRunner pid=782838) output = func(*args, **kwargs)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
(TaskRunner pid=782838) entropy, log_probs = self._forward_micro_batch(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
(TaskRunner pid=782838) log_probs = logprobs_from_logits(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
(TaskRunner pid=782838) output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
(TaskRunner pid=782838) output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
(TaskRunner pid=782838) return CrossEntropyLoss.apply(
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
(TaskRunner pid=782838) return super().apply(*args, **kwargs) # type: ignore[misc]
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
(TaskRunner pid=782838) assert labels.shape == (n_rows,)
(TaskRunner pid=782838) ^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=782838) AssertionError
(TaskRunner pid=782838) swanlab: 🏠 View project at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr
(TaskRunner pid=782838) swanlab: 🚀 View run at https://swanlab.cn/@zxx2000/qwen3vl_table_ocr/runs/jyqb6wz79pxyj1d93isth
Training Progress: 0%| | 0/500 [02:34<?, ?it/s]
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_files=/workspace/verl_grpo/verl_grpo_data/train.parquet', 'data.val_files=/workspace/verl_grpo/verl_grpo_data/test.parquet', 'critic.enable=False']
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/verl/verl/trainer/main_ppo.py", line 447, in <module>
main()
File "/usr/local/lib/python3.12/dist-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/main_ppo.py", line 45, in main
run_ppo(config)
File "/workspace/verl/verl/trainer/main_ppo.py", line 99, in run_ppo
ray.get(runner.run.remote(config))
File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2882, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 968, in get_objects
raise value.as_instanceof_cause()

5bdabf01000000, repr=<main_ppo.TaskRunner object at 0x7f7e30145e20>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/main_ppo.py", line 366, in run
trainer.fit()
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1483, in fit
ref_log_prob = self._compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/trainer/ppo/ray_trainer.py", line 1177, in _compute_ref_log_prob
ref_log_prob = self.ref_policy_wg.compute_ref_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 53, in __call__
output = ray.get(output)
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(AssertionError): ray::WorkerDict.ref_compute_ref_log_prob() (pid=790530, ip=192.168.0.11, actor_id=7577f135929f6e4afe85fe4b01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f979cf144a0>)
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/ray/base.py", line 841, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/single_controller/base/decorator.py", line 456, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/transferqueue_utils.py", line 314, in dummy_inner
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/profile.py", line 274, in wrapper
return func(self_instance, *args, **kwargs_inner)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/fsdp_workers.py", line 1061, in compute_ref_log_prob
output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 378, in compute_log_prob
entropy, log_probs = self._forward_micro_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/workers/actor/dp_actor.py", line 219, in _forward_micro_batch
log_probs = logprobs_from_logits(
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 94, in logprobs_from_logits
output = logprobs_from_logits_flash_attn(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/verl/verl/utils/torch_functional.py", line 122, in logprobs_from_logits_flash_attn
output = cross_entropy_loss(logits, labels, inplace_backward=inplace_backward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 320, in cross_entropy_loss
return CrossEntropyLoss.apply(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 576, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flash_attn/ops/triton/cross_entropy.py", line 171, in forward
assert labels.shape == (n_rows,)
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
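
For anyone triaging this: the assertion that fails is a plain shape check between the logits and the labels passed to `cross_entropy_loss`. The sketch below mirrors that check on shape tuples only, so it can be run without a GPU. It assumes the kernel treats the logits as a 2-D `(n_rows, n_cols)` tensor and expects 1-D labels of length `n_rows`; `check_ce_shapes` is a hypothetical helper for illustration, not part of verl or flash_attn.

```python
from typing import Sequence, Tuple


def check_ce_shapes(
    logits_shape: Sequence[int], labels_shape: Sequence[int]
) -> Tuple[bool, str]:
    """Mirror the shape check that fails in the traceback above.

    Assumption: logits are 2-D (n_rows, n_cols) and labels must be
    1-D with exactly n_rows entries.
    """
    if len(logits_shape) != 2:
        return False, f"logits must be 2-D, got shape {tuple(logits_shape)}"
    n_rows, _n_cols = logits_shape
    if tuple(labels_shape) != (n_rows,):
        return False, f"labels shape {tuple(labels_shape)} != ({n_rows},)"
    return True, "ok"


if __name__ == "__main__":
    # Matching shapes pass; a labels tensor that is one token short does not.
    print(check_ce_shapes((6, 32000), (6,)))
    print(check_ce_shapes((6, 32000), (5,)))
```

Logging `logits.shape` and `labels.shape` just before the `logprobs_from_logits` call in `_forward_micro_batch` and feeding them to a check like this would show which of the two tensors has the unexpected length.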

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

GRPO training of qwen3-vl-thinking-4b.

ubuntu@zhizhehui:~/verl_grpo$ pip show verl
Name: verl
Version: 0.7.0.dev0
Summary: verl: Volcano Engine Reinforcement Learning for LLM
Home-page: https://github.com/volcengine/verl
Author: Bytedance - Seed - MLSys
Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk
License: Apache-2.0
Location: /workspace/.local/lib/python3.12/site-packages
Editable project location: /workspace/verl
Requires: accelerate, codetiming, datasets, dill, hydra-core, numpy, packaging, pandas, peft, pyarrow, pybind11, pylatexenc, ray, tensorboard, tensordict, torchdata, transformers, wandb
Required-by:
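
Only the verl version is listed above, while the failing assertion is inside flash_attn. A minimal sketch for collecting the versions of the other packages that appear in the traceback, using the standard-library `importlib.metadata`, is given below. The distribution names in the list are assumptions (for example, flash_attn is assumed to be installed under the name `flash-attn`), and `collect_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Return the installed version of each distribution, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust to the local install.
    wanted = ["verl", "torch", "flash-attn", "transformers", "ray"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version}")
```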

Expected behavior

Training should run normally and reach step 1.

Metadata

    Labels

    bug (Something isn't working)
