-
Notifications
You must be signed in to change notification settings - Fork 476
Description
🐛 Bug Report: dataset
parameter not passed when using config.json
with MMMUProDataset
What's happening?
When trying to load any benchmark from MMMUProDataset
via config.json
, the dataset fails to build correctly because the dataset
parameter is not passed to the class constructor.
This happens due to how build_dataset_from_config()
filters parameters based on the class constructor signature using Python's inspect
module.
✅ Works (using command-line args)
python3 run.py --data MMMU_Pro_10c --model Qwen2-VL-2B-Instruct --verbose
❌ Fails (using config file)
python3 run.py --config config.json
Example config.json
:
{
"model": {
"Qwen2-VL-2B-Instruct": {
"class": "Qwen2VLChat",
"model_path": "Qwen/Qwen2-VL-2B-Instruct",
"temperature": 0.1
}
},
"data": {
"MMMU": {
"class": "MMMUProDataset",
"dataset": "MMMU_Pro_10c"
}
}
}
📌 Root Cause
-
The function
build_dataset_from_config()
uses this logic:valid_params = {k: v for k, v in config.items() if k in sig.parameters}
-
MMMUProDataset.__init__()
is defined with**kwargs
:def __init__(self, **kwargs): super().__init__(**kwargs)
-
Because of this,
inspect.signature(cls.__init__)
returns:Signature(self, **kwargs)
-
It does not include
dataset
, even though the superclass (ImageBaseDataset
) acceptsdataset
explicitly:def __init__(self, dataset='MMBench', skip_noimg=True): self.dataset_name = dataset
-
So the filtering logic incorrectly drops the
dataset
key fromconfig
.
✅ Why build_dataset()
works
In contrast, build_dataset()
passes arguments directly without filtering:
return cls(dataset=dataset_name, **kwargs)
It doesn't rely on inspect.signature
, so dataset
is passed correctly and everything works.
🔁 Suggested Fix
Update the logic in build_dataset_from_config()
to detect when a class uses **kwargs
, and skip filtering in that case:
if any(p.kind == p.VAR_KEYWORD for p in sig.parameters.values()):
valid_params = config
else:
valid_params = {k: v for k, v in config.items() if k in sig.parameters}
Alternatively, traverse the class's MRO to find the first constructor with concrete parameters (not just **kwargs
).
✅ Expected Behavior
Running with --config config.json
should work identically to the command-line approach, even for classes that inherit dataset
via super().__init__(**kwargs)
.