
SpelkeBench: Benchmarking Spelke Segmentation

The SpelkeBench Benchmark

Segmentation is the task of identifying object boundaries: given an image and a point on an object, the goal is to produce a mask delineating that object's boundaries. Traditional segmentation methods often rely on category labels (e.g., "car" or "tree"). In contrast, we draw from developmental psychology the notion of Spelke objects: groupings of physical entities that reliably move together under applied forces, a concept first introduced by Liz Spelke in [Principles of Object Perception](https://www.harvardlds.org/wp-content/uploads/2017/01/Spelke1990-1.pdf). Because these segments are defined by category-agnostic causal motion relationships, they reflect how objects interact and respond in the real world, making them especially relevant for physical reasoning and robotic manipulation.

SpelkeBench is a ~500-image evaluation dataset designed to assess whether segmentation algorithms can identify such segments. The dataset spans two complementary domains: high-resolution natural imagery sourced from EntitySeg and real-world robotic interaction scenes from Open X-Embodiment. Together, these domains support evaluation across both unconstrained natural scenes and structured physical environments. The examples below compare SpelkeBench annotations against SAM and EntitySeg segments, showing that SpelkeBench's annotated segments align more closely with the Spelke notion of objecthood.

(Figure: example SpelkeBench segments compared to SAM and EntitySeg.)

Dataset Overview

SpelkeBench provides a standardized evaluation framework for Spelke segmentation with:

  • ~500 images spanning natural and robotic scenes
  • Ground-truth segments that align with the Spelke object concept, defined by physical motion coherence
  • Virtual poke points (stored as centroids) indicating where to apply segmentation queries

Download the Dataset

Clone this repository and download the SpelkeBench dataset:

git clone https://github.com/neuroailab/SpelkeBench.git
cd SpelkeBench
bash download_spelke_bench.sh

This will download spelke_bench.h5 to the datasets/ directory.

Dataset Format

The dataset is provided as a single HDF5 file where each key corresponds to an image sample containing:

| Field      | Description                   | Shape       |
|------------|-------------------------------|-------------|
| `rgb`      | Input RGB image               | `[H, W, 3]` |
| `segment`  | Ground-truth Spelke segments  | `[N, H, W]` |
| `centroid` | Virtual poke locations (x, y) | `[N, 2]`    |

Where N is the number of ground truth segments/centroids for that image.
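As a concrete illustration of this layout, the sketch below writes a tiny synthetic sample in the same shape and reads it back with `h5py`. It assumes each top-level key is an HDF5 group holding the three datasets from the table above; check the real `spelke_bench.h5` for the authoritative layout and key names.

```python
import h5py
import numpy as np

H, W, N = 64, 64, 2

# Build a tiny stand-in file with the documented layout (an assumption:
# one group per image sample, with rgb/segment/centroid datasets inside).
with h5py.File("toy_spelke_bench.h5", "w") as f:
    g = f.create_group("example_image_0")
    g.create_dataset("rgb", data=np.zeros((H, W, 3), dtype=np.uint8))
    g.create_dataset("segment", data=np.zeros((N, H, W), dtype=np.uint8))
    g.create_dataset("centroid", data=np.array([[10, 20], [30, 40]]))

# Read it back the way you would iterate over the real dataset.
with h5py.File("toy_spelke_bench.h5", "r") as f:
    for key in f:
        rgb = f[key]["rgb"][:]             # [H, W, 3]
        segments = f[key]["segment"][:]    # [N, H, W]
        centroids = f[key]["centroid"][:]  # [N, 2], (x, y) poke points
        print(key, rgb.shape, segments.shape, centroids.shape)
```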


Evaluating Your Model on SpelkeBench

SpelkeBench provides tools to evaluate any segmentation model that can segment objects based on point prompts. The evaluation pipeline handles dataset loading, parallel inference, and metric computation.

Step 1: Install SpelkeBench

conda create -n spelkebench python=3.10 -y
conda activate spelkebench
pip install -e .

This installs the command-line utilities: spelkebench-infer, spelkebench-launch, and spelkebench-evaluate.

Step 2: Implement the Model Interface

Create a model class that inherits from spelke_bench.models.segmentation_class.SegmentationModel:

from spelke_bench.models.segmentation_class import SegmentationModel
import numpy as np

class YourSegmentationModel(SegmentationModel):
    def __init__(self):
        """Initialize your model, load weights, etc."""
        super().__init__()
        # Your initialization code here
        
    def run_inference(self, input_image, poke_point):
        """
        Perform segmentation based on a poke point.
        
        Args:
            input_image (np.ndarray): RGB image of shape [H, W, 3] with values in [0, 255] range
            poke_point (tuple): (x, y) coordinates
        
        Returns:
            np.ndarray: Binary segmentation mask of shape [H, W] with values in {0, 1}
        """
        # Your segmentation logic here
        pass
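To sanity-check the interface before wiring up a real model, a trivial (hypothetical) `run_inference` body can simply return a fixed-radius disc around the poke point. This is only a smoke test of the evaluation plumbing, not a meaningful segmenter:

```python
import numpy as np

def disc_baseline(input_image, poke_point, radius=25):
    """Toy run_inference body: a binary disc of the given radius
    centered on the poke point, clipped to the image bounds."""
    h, w = input_image.shape[:2]
    x, y = poke_point  # (x, y) = (column, row), as in the interface docstring
    ys, xs = np.ogrid[:h, :w]
    mask = ((xs - x) ** 2 + (ys - y) ** 2) <= radius ** 2
    return mask.astype(np.uint8)  # [H, W] mask with values in {0, 1}

# Example: poke the center of a blank 100x100 image.
img = np.zeros((100, 100, 3), dtype=np.uint8)
mask = disc_baseline(img, (50, 50), radius=10)
print(mask.shape, int(mask.sum()))
```

In a real submission this logic would live inside `YourSegmentationModel.run_inference`, returning the mask exactly as specified in the docstring.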

Step 3: Run Inference

Single GPU Inference

For quick testing or debugging on a subset of images:

spelkebench-infer \
  --model_name your_model.SegmentationModel \
  --dataset_path ./datasets/spelke_bench.h5 \
  --output_dir ./results/my_model \
  --device cuda:0 \
  --img_names entityseg_1_image2926 entityseg_2_image1258

Multi-Node Distributed Inference

For cluster environments with multiple nodes (e.g., 4 nodes with 4 GPUs each):

On each node, run:

spelkebench-launch \
  --gpus 0 1 2 3 \
  --dataset_path ./datasets/spelke_bench.h5 \
  --output_dir ./results/my_model \
  --num_splits 4 \
  --split_num <node_id> \
  --model_name your_model.SegmentationModel

Replace <node_id> with 0, 1, 2, or 3 for each respective node.
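Under the hood, a launcher like this must partition the dataset's image keys deterministically so each node processes a disjoint subset. One plausible scheme, shown here as an assumption rather than the benchmark's actual implementation, is a sorted strided split:

```python
def split_keys(all_keys, num_splits, split_num):
    """Deterministic strided partition: split i takes every num_splits-th
    key from the sorted list, starting at offset i."""
    return sorted(all_keys)[split_num::num_splits]

# Ten hypothetical image keys split across 4 nodes.
keys = [f"image_{i:03d}" for i in range(10)]
print(split_keys(keys, num_splits=4, split_num=0))  # image_000, image_004, image_008
```

Because every node sorts the same key list, the splits are disjoint and cover the whole dataset regardless of launch order.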

Step 4: Evaluate Results

Once inference is complete, compute metrics:

spelkebench-evaluate \
  --input_dir ./results/my_model \
  --output_dir ./results/my_model/metrics \
  --dataset_path ./datasets/spelke_bench.h5

This will:

  • Generate visual comparisons between predictions and ground truth
  • Calculate and print Average Recall (AR) and mean IoU scores
  • Save per-image metrics and visualizations
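At its core, the mean IoU reported above is an intersection-over-union between binary masks. A minimal sketch of that computation follows; the benchmark's exact matching and averaging rules (and its AR definition) may differ:

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary [H, W] masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

# Top half vs. left half of a 4x4 grid:
# intersection = 4 cells, union = 12 cells, IoU = 1/3.
a = np.zeros((4, 4), dtype=np.uint8); a[:2, :] = 1
b = np.zeros((4, 4), dtype=np.uint8); b[:, :2] = 1
print(mask_iou(a, b))
```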

Command-Line Arguments

| Argument         | Description                          |
|------------------|--------------------------------------|
| `--model_name`   | Python path to your model class      |
| `--dataset_path` | Path to `spelke_bench.h5`            |
| `--output_dir`   | Directory for saving predictions     |
| `--device`       | GPU device (e.g., `cuda:0`)          |
| `--img_names`    | Specific image keys to process       |
| `--gpus`         | GPU IDs for parallel processing      |
| `--num_splits`   | Total splits for multi-node runs     |
| `--split_num`    | Current split index                  |
