[BUG] dataset parameter not passed when using config.json with MMMUProDataset #1185

@javilonso

🐛 Bug Report: dataset parameter not passed when using config.json with MMMUProDataset

What's happening?

When a benchmark that uses MMMUProDataset is loaded via config.json, the dataset is not built correctly because the dataset parameter from the config is never passed to the class constructor.

This happens due to how build_dataset_from_config() filters parameters based on the class constructor signature using Python's inspect module.


✅ Works (using command-line args)

python3 run.py --data MMMU_Pro_10c --model Qwen2-VL-2B-Instruct --verbose

❌ Fails (using config file)

python3 run.py --config config.json

Example config.json:

{
  "model": {
    "Qwen2-VL-2B-Instruct": {
      "class": "Qwen2VLChat",
      "model_path": "Qwen/Qwen2-VL-2B-Instruct",
      "temperature": 0.1
    }
  },
  "data": {
    "MMMU": {
      "class": "MMMUProDataset",
      "dataset": "MMMU_Pro_10c"
    }
  }
}

📌 Root Cause

  • The function build_dataset_from_config() uses this logic:

    valid_params = {k: v for k, v in config.items() if k in sig.parameters}
  • MMMUProDataset.__init__() is defined with **kwargs:

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
  • Because of this, inspect.signature(cls.__init__) returns:

    <Signature (self, **kwargs)>
  • It does not include dataset, even though the superclass (ImageBaseDataset) accepts dataset explicitly:

    def __init__(self, dataset='MMBench', skip_noimg=True):
        self.dataset_name = dataset
  • So the filtering logic incorrectly drops the dataset key from config.
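The chain above can be reproduced in isolation. The following is a minimal sketch, not the project's code: the two classes are simplified stand-ins that only mirror the constructor signatures quoted above, and `config` is assumed to be the dataset entry with the `class` key already removed.

```python
import inspect


# Simplified stand-ins that mirror only the constructor signatures quoted above.
class ImageBaseDataset:
    def __init__(self, dataset='MMBench', skip_noimg=True):
        self.dataset_name = dataset


class MMMUProDataset(ImageBaseDataset):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


# Assumption: the dataset entry from config.json, without the "class" key.
config = {"dataset": "MMMU_Pro_10c"}

sig = inspect.signature(MMMUProDataset.__init__)
param_names = list(sig.parameters)  # only the names declared on the subclass

# Same filtering expression as in the issue: "dataset" is not a declared name.
valid_params = {k: v for k, v in config.items() if k in sig.parameters}

ds = MMMUProDataset(**valid_params)

print(param_names)      # ['self', 'kwargs']
print(valid_params)     # {}
print(ds.dataset_name)  # MMBench (the superclass default, not MMMU_Pro_10c)
```

In this sketch the constructor does not raise. It silently falls back to the superclass default, which is why the problem can be easy to miss.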


✅ Why build_dataset() works

In contrast, build_dataset() passes arguments directly without filtering:

return cls(dataset=dataset_name, **kwargs)

It doesn't rely on inspect.signature, so dataset is passed correctly and everything works.


🔁 Suggested Fix

Update the logic in build_dataset_from_config() to detect when a class uses **kwargs, and skip filtering in that case:

if any(p.kind == p.VAR_KEYWORD for p in sig.parameters.values()):
    valid_params = config
else:
    valid_params = {k: v for k, v in config.items() if k in sig.parameters}
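As a self-contained sketch of this check, the snippet below wraps it in a helper. The name `filter_params` and the three toy classes are hypothetical and exist only for illustration; they are not part of the repository.

```python
import inspect


def filter_params(cls, config):
    """Hypothetical helper: keep every key when cls.__init__ accepts **kwargs,
    otherwise keep only keys that match a declared parameter name."""
    sig = inspect.signature(cls.__init__)
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    )
    if accepts_var_kw:
        return dict(config)  # copy, so the caller's config is not aliased
    return {k: v for k, v in config.items() if k in sig.parameters}


# Toy classes for illustration only.
class Base:
    def __init__(self, dataset='MMBench', skip_noimg=True):
        self.dataset_name = dataset


class Forwarding(Base):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class Strict:
    def __init__(self, dataset='X'):
        self.dataset_name = dataset


forwarded = filter_params(Forwarding, {"dataset": "MMMU_Pro_10c"})
strict = filter_params(Strict, {"dataset": "A", "unknown": 1})

print(forwarded)  # {'dataset': 'MMMU_Pro_10c'}
print(strict)     # {'dataset': 'A'}
print(Forwarding(**forwarded).dataset_name)  # MMMU_Pro_10c
```

One trade-off of this approach: for a `**kwargs` constructor, unknown keys are no longer filtered out, so a typo in config.json would surface as a `TypeError` from the superclass constructor rather than being dropped silently.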

Alternatively, traverse the class's MRO to find the first constructor with concrete parameters (not just **kwargs).
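A rough sketch of the MRO-based alternative follows. The helper name `accepted_params` and the toy classes are hypothetical, and the sketch assumes that every `**kwargs` constructor forwards its keywords to `super().__init__`, which may not hold for every class.

```python
import inspect


def accepted_params(cls):
    """Hypothetical helper: collect named __init__ parameters along the MRO,
    moving to the next class only while the current __init__ takes **kwargs."""
    names = set()
    for klass in cls.__mro__:
        if klass is object:
            break  # nothing useful to learn from object.__init__
        init = vars(klass).get("__init__")
        if init is None:
            continue  # this class does not define its own __init__
        params = list(inspect.signature(init).parameters.values())
        names.update(
            p.name
            for p in params
            if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
            and p.name != "self"
        )
        if not any(p.kind is p.VAR_KEYWORD for p in params):
            break  # this constructor does not forward extra keywords
    return names


# Toy classes for illustration only.
class Base:
    def __init__(self, dataset='MMBench', skip_noimg=True):
        self.dataset_name = dataset


class Forwarding(Base):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


allowed = accepted_params(Forwarding)
config = {"dataset": "MMMU_Pro_10c", "unknown": 1}
valid_params = {k: v for k, v in config.items() if k in allowed}

print(sorted(allowed))  # ['dataset', 'skip_noimg']
print(valid_params)     # {'dataset': 'MMMU_Pro_10c'}
```

Compared with skipping the filter entirely, this variant still drops keys that no constructor in the chain declares, at the cost of more code and the forwarding assumption stated above.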


✅ Expected Behavior

Running with --config config.json should work identically to the command-line approach, even for classes that inherit dataset via super().__init__(**kwargs).

