Description
Training with the official LLaVA-3D-Instruct-860K.json dataset fails with IndexError: index 0 is out of bounds for dimension 0 with size 0.
This is the error message:
Traceback (most recent call last):
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/accelerate/accelerator.py", line 924, in accumulate
    yield
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/transformers/trainer.py", line 1869, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/transformers/trainer.py", line 2772, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/transformers/trainer.py", line 2795, in compute_loss
    outputs = model(**inputs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/sda/shenhao/conda/envs/llava-3d/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/sda/shenhao/code/LLaVA-3D/llava/model/language_model/llava_llama.py", line 86, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/mnt/sda/shenhao/code/LLaVA-3D/llava/model/llava_arch.py", line 332, in prepare_inputs_labels_for_multimodal
    cur_prompt_features = prompt_features[cur_prompt_idx].unsqueeze(0) # (1, C)
IndexError: index 0 is out of bounds for dimension 0 with size 0
I think the reason is a mismatch between the data format and the code's assumptions: when the conversation contains <boxes> tokens but no corresponding 3D box data exists, the code indexes into an empty prompt_features tensor, which causes the out-of-bounds error.
Execution Flow
- DataLoader: If a sample lacks target.boxes data (due to wrong field name or empty list), clicks = []
- Collate: prompt_features = encode_prompts(clicks) → shape (0, 3) (empty tensor)
- Tokenizer: <boxes> in the conversation gets converted to LOC_TOKEN_INDEX tokens
- Model: num_prompts > 0, so the loop accesses prompt_features[0], prompt_features[1], ..., but the tensor is empty
- Error: IndexError: index 0 is out of bounds for dimension 0 with size 0
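The mismatch in the flow above could be caught before the indexing step with a small guard. The following is only a sketch under assumptions: `check_prompt_alignment` is a hypothetical helper, not a function in the LLaVA-3D repository, and the value used for `LOC_TOKEN_INDEX` is a placeholder chosen for illustration. It uses plain Python lists so that it runs without torch.

```python
# Hypothetical guard, not part of LLaVA-3D: compare the number of location
# placeholder tokens in a sequence with the number of prompt feature rows.

# Placeholder value for illustration only; the real constant lives in the repo.
LOC_TOKEN_INDEX = -300


def count_placeholders(input_ids, loc_token_index=LOC_TOKEN_INDEX):
    """Count how many token ids equal the location placeholder index."""
    return sum(1 for token_id in input_ids if token_id == loc_token_index)


def check_prompt_alignment(input_ids, num_prompt_features,
                           loc_token_index=LOC_TOKEN_INDEX):
    """Raise a descriptive error if there are more placeholders than features.

    num_prompt_features stands for prompt_features.shape[0] in the model code.
    Returns the number of placeholders when the sample is consistent.
    """
    num_placeholders = count_placeholders(input_ids, loc_token_index)
    if num_placeholders > num_prompt_features:
        raise ValueError(
            f"sample has {num_placeholders} location token(s) but only "
            f"{num_prompt_features} prompt feature row(s)"
        )
    return num_placeholders


if __name__ == "__main__":
    ok_ids = [1, 15, LOC_TOKEN_INDEX, 2]
    print(check_prompt_alignment(ok_ids, num_prompt_features=1))
    try:
        # Mirrors the failing case: one placeholder, empty prompt_features.
        check_prompt_alignment(ok_ids, num_prompt_features=0)
    except ValueError as err:
        print("mismatch:", err)
```

A check like this would turn the opaque IndexError into a message that points at the offending sample, which makes it easier to decide whether to skip the sample or fix the data.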
Here is a problematic data sample (the first entry, whose "box" list is empty), followed by an entry with a non-empty "box" list:
{
  "id": 838720,
  "video": "scannet/scene0069_00",
  "conversations": [
    {
      "value": "<video>\nIdentify the object according to the following description.\nThe door's neighbor is a long radiator.\nThere may be no corresponding object, or there may be one or more objects.",
      "from": "human"
    },
    {
      "value": "Answer: <boxes>.",
      "from": "gpt"
    }
  ],
  "box": [],
  "metadata": {
    "dataset": "multi3drefer",
    "question_type": "zt_wo_d",
    "ann_id": 8,
    "object_id": []
  }
},
{
  "id": 858841,
  "video": "scannet/scene0502_00",
  "conversations": [
    {
      "value": "<video>\nIdentify the object according to the following description.\nA whiteboard-fronting office chair sits in the room.\nThere may be no corresponding object, or there may be one or more objects.",
      "from": "human"
    },
    {
      "value": "Answer: <boxes>.",
      "from": "gpt"
    }
  ],
  "box": [
    [
      -0.3842504620552063,
      0.2782879201695323,
      0.6257577687501907,
      0.630746603012085,
      0.6165543142706156,
      0.6325612962245941
    ]
  ],
  "metadata": {
    "dataset": "multi3drefer",
    "question_type": "st_w_d",
    "ann_id": 56,
    "object_id": [
      1
    ]
  }
}
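To estimate how many entries are affected, the dataset could be scanned for samples that mention the `<boxes>` placeholder while carrying an empty or missing `box` field. The sketch below assumes the JSON file is a top-level list of objects shaped like the entries above; `find_empty_box_samples` and `load_samples` are hypothetical helper names, and the inline samples are trimmed copies of the two entries shown.

```python
import json

PLACEHOLDER = "<boxes>"  # token text as it appears in the sample conversations


def has_placeholder(sample):
    """Return True if any conversation turn contains the placeholder text."""
    return any(
        PLACEHOLDER in turn.get("value", "")
        for turn in sample.get("conversations", [])
    )


def find_empty_box_samples(samples):
    """Return ids of samples that use the placeholder but have no box data."""
    return [
        sample.get("id")
        for sample in samples
        if has_placeholder(sample) and not sample.get("box")
    ]


def load_samples(path):
    """Load the dataset, assuming a top-level JSON list of sample objects."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Trimmed copies of the two entries from this issue.
    demo = [
        {"id": 838720,
         "conversations": [{"value": "Answer: <boxes>.", "from": "gpt"}],
         "box": []},
        {"id": 858841,
         "conversations": [{"value": "Answer: <boxes>.", "from": "gpt"}],
         "box": [[-0.38, 0.28, 0.63, 0.63, 0.62, 0.63]]},
    ]
    print(find_empty_box_samples(demo))  # → [838720]
    # For the full file (path is an assumption about the local setup):
    # print(len(find_empty_box_samples(load_samples("LLaVA-3D-Instruct-860K.json"))))
```

Whether such samples should be filtered out or handled by the model as a valid "no object" case is a design decision for the maintainers; the scan only identifies them.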