neuroailab/bbscore_public

BBScore: Neural Benchmarking Framework

BBScore is a framework for benchmarking deep learning models against neural (fMRI, electrophysiology) and behavioral datasets. It handles model loading, stimulus preprocessing, feature extraction, and comparison with biological data.

  • Simple Notebook For Loading Data and Plotting (Open in Colab)

  • Data Analysis Notebook, by Josh Wilson (Open in Colab)

Quick Start

1. Check Your System

Before installing, check if your machine can run BBScore:

python check_system.py --quick

For a detailed check with a specific configuration:

python check_system.py --model resnet50 --benchmark TVSDV1 --metric ridge

2. Install (Recommended: Use the Install Script)

# Make the script executable
chmod +x install.sh

# Run the interactive installer (recommended for students)
./install.sh

The installer features:

  • Interactive setup wizard with arrow-key navigation
  • Tab completion for directory paths
  • Auto-detection of GPU (NVIDIA CUDA / Apple Silicon MPS)
  • Automatic conda/miniconda installation if needed

Quick install (skip wizard, use defaults):

./install.sh --quick              # Auto-detect GPU
./install.sh --quick --cpu-only   # Force CPU-only PyTorch

All options:

./install.sh --help

Or install manually:

# Create conda environment
conda create -n bbscore python=3.10 -y
conda activate bbscore

# Install PyTorch (adjust for your CUDA version)
pip install torch torchvision torchaudio

# Install decord from conda-forge
conda install -c conda-forge decord -y

# Install dependencies
pip install -r requirements.txt

3. Activate the Environment

If you used the install script:

# Use the generated activation script
source activate_bbscore.sh

# Or activate conda directly
conda activate bbscore

The install script automatically configures the SCIKIT_LEARN_DATA environment variable.

Manual setup (if not using install script):

# Required: Set data directory (50GB+ free space recommended)
export SCIKIT_LEARN_DATA="/path/to/your/data/bbscore_data"

# Add to your shell config for persistence
echo 'export SCIKIT_LEARN_DATA="/path/to/your/data/bbscore_data"' >> ~/.bashrc
source ~/.bashrc
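
A quick way to confirm the manual setup took effect is to check the variable from Python. The following is a minimal sketch, not part of BBScore: `check_data_dir` is a hypothetical helper, and the 50 GB threshold simply mirrors the recommendation above.

```python
import os
import shutil
from pathlib import Path


def check_data_dir(path, min_free_gb=50.0):
    """Return a list of problems with the data directory (empty if none).

    Hypothetical helper for illustration; BBScore's own checks live in
    check_system.py and may differ.
    """
    data_dir = Path(path).expanduser()
    if not data_dir.is_dir():
        return [f"{data_dir} does not exist or is not a directory"]
    problems = []
    if not os.access(data_dir, os.W_OK):
        problems.append(f"{data_dir} is not writable")
    # shutil.disk_usage reports bytes; convert to (decimal) gigabytes.
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free, {min_free_gb:.0f} GB recommended"
        )
    return problems


if __name__ == "__main__":
    value = os.environ.get("SCIKIT_LEARN_DATA")
    if value is None:
        print("SCIKIT_LEARN_DATA is not set")
    else:
        for line in check_data_dir(value) or ["data directory looks fine"]:
            print(line)
```

Run it in the same shell where you exported the variable; a new terminal only sees the value after `~/.bashrc` has been sourced.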

4. Run Your First Benchmark

# Simple example with a small model
python run.py --model resnet18 --layer _orig_mod.resnet.encoder.stages.3 --benchmark V1StaticFullFieldSineGratings --metric ridge

# Video benchmark with online metric (lower memory)
python run.py --model resnet50 --layer _orig_mod.resnet.encoder.stages.3 --benchmark OnlineTVSDV1 --metric online_linear_regressor

For Students: Step-by-Step Guide

What You Need

| Resource | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16+ GB |
| GPU | None (CPU works) | 4+ GB VRAM |
| Disk | 50 GB | 100+ GB |
| Python | 3.9+ | 3.11 |

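No GPU? Use:

  • ridge metric instead of online_linear_regressor
  • Smaller models (resnet18, efficientnet_b0)
  • Online benchmarks (OnlineTVSDV1, OnlineTVSDV4) only if RAM is the bottleneck: they accept only online_linear_regressor, which runs on CPU but benefits from a GPU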

Validate Your Setup

Before starting your project, run the validation script to confirm your machine can handle the full pipeline. The first run can take about 20 minutes because it downloads the neural datasets.

python validate.py

This runs three tiers of checks:

| Tier | What it tests | Time |
| --- | --- | --- |
| 1. Environment | Python version, dependencies, registries, hardware (RAM, GPU, disk) | ~30s |
| 2. Model Inference | Loads ResNet-18 (vision) and GPT-2 Small (language), runs forward passes | ~2-3 min |
| 3. Data & Pipeline | Downloads NSD and LeBel2023 datasets, validates data shapes, runs ridge and temporal RSA on synthetic data | ~5-30 min (first run downloads data) |

All three tiers must pass before you start your project.

You can run individual tiers to isolate issues:

python validate.py --tier 1     # Environment only
python validate.py --tier 2     # Environment + model inference
python validate.py --tier 3     # All tiers (default)

Expected output on a working setup:

  Validation Summary
  PASS  Tier 1: Environment & Dependencies  (1s)
  PASS  Tier 2: Model Loading & Inference   (45s)
  PASS  Tier 3: Data & Pipeline             (120s)

  Your machine is ready for BBScore experiments.

Metric-Benchmark Compatibility

Not all metrics work with all benchmarks. The framework validates this automatically, but here is the reference:

| Benchmark Type | Compatible Metrics |
| --- | --- |
| NSD, TVSD, BMD, LeBel2023 (offline neural) | ridge, torch_ridge, pls, rsa, temporal_rsa, versa, bidirectional, one_to_one, soft_matching, semi_matching, temporal_ridge, inverse_ridge |
| LeBel2023TR (TR-level language) | ridge, temporal_rsa |
| LeBel2023Audio (audio average) | ridge, torch_ridge, pls, rsa, temporal_rsa, versa, bidirectional, one_to_one, soft_matching, semi_matching, temporal_ridge, inverse_ridge |
| LeBel2023AudioTR (TR-level audio) | ridge, temporal_rsa |
| OnlineTVSD | online_linear_regressor |
| OnlinePhysionContact | physion_contact_prediction, physion_contact_detection |
| OnlinePhysionPlacement | physion_placement_prediction, physion_placement_detection |
| (Augmented)SSV2 | online_linear_classifier, online_transformer_classifier |

Using an incompatible metric will print a warning with the list of compatible options.
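
If you want to check a pairing before launching a long run, the reference table can be mirrored in a small lookup. This is only an illustrative sketch: the dictionary, the prefix-matching rule, the `AugmentedSSV2` spelling, and the helper names are assumptions made here, not the framework's own validation code.

```python
# Metric lists copied from the compatibility reference above.
OFFLINE_METRICS = [
    "ridge", "torch_ridge", "pls", "rsa", "temporal_rsa", "versa",
    "bidirectional", "one_to_one", "soft_matching", "semi_matching",
    "temporal_ridge", "inverse_ridge",
]
TR_METRICS = ["ridge", "temporal_rsa"]
SSV2_METRICS = ["online_linear_classifier", "online_transformer_classifier"]

# Keys are benchmark-name prefixes (an assumption about naming).
COMPATIBLE = {
    "NSD": OFFLINE_METRICS,
    "TVSD": OFFLINE_METRICS,
    "BMD": OFFLINE_METRICS,
    "LeBel2023": OFFLINE_METRICS,
    "LeBel2023TR": TR_METRICS,
    "LeBel2023Audio": OFFLINE_METRICS,
    "LeBel2023AudioTR": TR_METRICS,
    "OnlineTVSD": ["online_linear_regressor"],
    "OnlinePhysionContact": [
        "physion_contact_prediction", "physion_contact_detection",
    ],
    "OnlinePhysionPlacement": [
        "physion_placement_prediction", "physion_placement_detection",
    ],
    "SSV2": SSV2_METRICS,
    "AugmentedSSV2": SSV2_METRICS,
}


def compatible_metrics(benchmark):
    """Return the metric list for the longest prefix matching `benchmark`."""
    matches = [prefix for prefix in COMPATIBLE if benchmark.startswith(prefix)]
    if not matches:
        raise KeyError(f"no compatibility entry for {benchmark!r}")
    # Longest prefix wins, so LeBel2023AudioTR... is not treated as LeBel2023.
    return COMPATIBLE[max(matches, key=len)]


def is_compatible(benchmark, metric):
    return metric in compatible_metrics(benchmark)


print(is_compatible("NSDV1Shared", "ridge"))
print(is_compatible("OnlineTVSDV4", "ridge"))
```

Matching on the longest prefix matters because several LeBel2023 variants share a common stem but accept different metrics.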

Note on TR-level benchmarks: LeBel2023TR and LeBel2023AudioTR are standalone classes that bypass the standard BenchmarkScore pipeline. They implement their own GroupKFold ridge regression internally (using sklearn.linear_model.RidgeCV with story-level cross-validation), which is distinct from the temporal_ridge metric in the registry. The registry's temporal_ridge (Ridge3DChunkedMetric) expects 3D chunked features from the standard pipeline and is not compatible with TR-level benchmarks. Use ridge for encoding accuracy and temporal_rsa for representational geometry comparisons.
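
To make "story-level cross-validation" concrete, here is a minimal standard-library sketch of group-wise fold assignment: every TR from a given story lands on the same side of each split, so the regression is never tested on a story it was trained on. This is not the benchmark's code. The function name and story labels are hypothetical, and the round-robin assignment is a simplification of what `sklearn.model_selection.GroupKFold` does.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    Simplified stand-in for sklearn's GroupKFold: groups are assigned to
    folds round-robin in sorted order rather than balanced by size.
    """
    unique = sorted(set(groups))
    if n_splits < 2 or n_splits > len(unique):
        raise ValueError("n_splits must be between 2 and the number of groups")
    fold_of = {group: i % n_splits for i, group in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# One label per TR; the story names are made up for illustration.
stories = ["a", "a", "b", "b", "c", "c", "d"]
for train_idx, test_idx in group_k_fold(stories, n_splits=2):
    held_out = sorted({stories[i] for i in test_idx})
    print("held-out stories:", held_out, "test TRs:", test_idx)
```

Splitting by story rather than by individual TR avoids leakage from temporally adjacent samples, which would otherwise inflate encoding scores.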

Recommended Workflow

  1. Start with small experiments:

    python run.py --model resnet18 --layer _orig_mod.resnet.encoder.stages.3 --benchmark NSDV1Shared --metric ridge

  2. Scale up gradually:

    python run.py --model dinov2_base --layer blocks.11 --benchmark NSDV1Shared --metric ridge

Available Components

Benchmark Examples

| Benchmark | Type | Memory | Description |
| --- | --- | --- | --- |
| BMD_V1 | Video | Low | Macaque V1 neural responses |
| NSDV1Shared | Image | Low | Human fMRI V1 (NSD dataset) |
| TVSDV4 | Video | High | Macaque V4 neural responses |
| SSV2Benchmark | Video | High | Something-Something-V2 |
| LeBel2023{UTS01-08} | Text/fMRI | Low | Language comprehension fMRI |
| LeBel2023TR{UTS01-08} | Text/fMRI | Low | TR-level language encoding |
| LeBel2023Audio{UTS01-08} | Audio/fMRI | Low | Audio comprehension fMRI |
| LeBel2023AudioTR{UTS01-08} | Audio/fMRI | Low | TR-level audio encoding |

Models (Examples)

| Model | Parameters | VRAM | Type |
| --- | --- | --- | --- |
| resnet18 | 11M | 2 GB | Image |
| resnet50 | 26M | 3 GB | Image |
| dinov2_base | 86M | 4 GB | Image |
| dinov2_large | 304M | 8 GB | Image |
| videomae_base | 87M | 8 GB | Video |
| clip_vit_b32 | 151M | 4 GB | Image |
| whisper_base | 74M | 2 GB | Audio |
| wav2vec2_base | 95M | 2 GB | Audio |
| hubert_base | 95M | 2 GB | Audio |

Metrics

| Metric | GPU Required | Description |
| --- | --- | --- |
| ridge | No | Ridge regression (sklearn) |
| online_linear_regressor | Recommended but not required | Online ridge with SGD and L2 regularization |
| pls | No | Partial Least Squares |
| rsa | No | Representational Similarity Analysis |

Command Reference

Basic Run

python run.py --model <MODEL> --layer <LAYER> --benchmark <BENCHMARK> --metric <METRIC>

Common Options

--batch-size 8       # Adjust based on your GPU memory
--device cuda:0      # Specify GPU

Examples

# Image model on NSD (human fMRI)
python run.py --model resnet50 --layer _orig_mod.resnet.encoder.stages.3 --benchmark NSDV1Shared --metric ridge

# Video model on TVSD (macaque ephys)
python run.py --model videomae_base --layer encoder.layer.11 --benchmark TVSDV1 --metric ridge

# DINO on V4
python run.py --model dinov2_base --layer blocks.11 --benchmark OnlineTVSDV4 --metric online_linear_regressor

# Audio model on LeBel2023 (human fMRI)
python run.py --model whisper_base --layer _orig_mod.layers.5 --benchmark LeBel2023AudioUTS01 --metric ridge

# Audio TR-level benchmark
python run.py --model wav2vec2_base --layer _orig_mod.encoder.layers.11 --benchmark LeBel2023AudioTRUTS01 --metric ridge

# Fast alpha search for high-dimensional features
python run.py --model dinov2_large --layer blocks.23 --benchmark NSDV1Shared --metric ridge --subsample-features-for-alpha 2000
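
When running several of these in a row, it can help to assemble the commands in Python instead of retyping them. The sketch below is a convenience wrapper, not something shipped with BBScore: `build_run_command` is a hypothetical helper and only uses flags that appear in this README.

```python
import shlex


def build_run_command(model, layer, benchmark, metric,
                      batch_size=None, device=None):
    """Assemble a run.py invocation as an argument list (hypothetical helper)."""
    cmd = [
        "python", "run.py",
        "--model", model,
        "--layer", layer,
        "--benchmark", benchmark,
        "--metric", metric,
    ]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if device is not None:
        cmd += ["--device", device]
    return cmd


# Model/layer pairs taken from the examples in this README.
RUNS = [
    ("resnet18", "_orig_mod.resnet.encoder.stages.3"),
    ("dinov2_base", "blocks.11"),
]

for model, layer in RUNS:
    cmd = build_run_command(model, layer, "NSDV1Shared", "ridge", batch_size=8)
    # Print the command; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if a layer name ever contains unusual characters.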

Finding Layer Names

To see available layer names for any model, print the model architecture:

from models import MODEL_REGISTRY

# Get the model class
model_info = MODEL_REGISTRY['resnet18']
model_instance = model_info['class']()
model = model_instance.get_model('ResNet18')

# Print all layer names
for name, module in model.named_modules():
    print(name)

List Available Options

python check_system.py --list

Troubleshooting

Out of Memory (GPU)

# Reduce batch size
python run.py ... --batch-size 2

# Use CPU
python run.py ... --device cpu --metric ridge

No GPU / Installation Issues

# Reinstall with CPU-only PyTorch (smaller download, always works)
./install.sh --quick --cpu-only

Out of Memory (RAM)

  • Use Online* benchmarks instead of standard ones
  • Use smaller models

Slow Training

Dataset Download Issues

  • Ensure SCIKIT_LEARN_DATA is set to a writable directory
  • Check you have enough disk space
  • Some datasets require AWS credentials (see data/ folder)

Loss Functions for OnlineLinearRegressor

When using OnlineLinearRegressor, you can choose different loss functions:

| Loss Type | Description |
| --- | --- |
| mse | Mean squared error with L2 regularization (default) |
| correlation | Pearson correlation loss |
| combined | MSE + correlation (tune correlation_weight) |
| ccc | Concordance Correlation Coefficient (combines correlation + scale) |
| ccc_mse | CCC + MSE combined |

Example:

from metrics import OnlineLinearRegressor

# Default configuration (MSE + L2)
metric = OnlineLinearRegressor(
    input_feature_dim=768,
    loss_type='mse',  # Default
    n_epochs=100,
)

# Alternative: CCC loss is also available
metric_ccc = OnlineLinearRegressor(
    input_feature_dim=768,
    loss_type='ccc',
    n_epochs=100,
)
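
To see why CCC "combines correlation + scale", it helps to compare it with Pearson correlation on a prediction that has the right shape but the wrong scale. The sketch below uses the textbook definitions with population moments, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). It is not the code in metrics/losses.py, which may differ in details such as sign or normalization.

```python
from math import sqrt


def _moments(x, y):
    """Return means, population variances, and population covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return mean_x, mean_y, var_x, var_y, cov


def pearson(x, y):
    """Pearson correlation: insensitive to shifts and rescaling."""
    _, _, var_x, var_y, cov = _moments(x, y)
    return cov / sqrt(var_x * var_y)


def ccc(x, y):
    """Concordance correlation: also penalizes offset and scale mismatch."""
    mean_x, mean_y, var_x, var_y, cov = _moments(x, y)
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


target = [1.0, 2.0, 3.0, 4.0]
prediction = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but twice too large

print("pearson:", pearson(prediction, target))
print("ccc:    ", ccc(prediction, target))
```

On this toy input Pearson correlation is 1 while CCC is 0.4, so a CCC-based loss keeps pushing the regressor toward the correct scale and offset where a pure correlation loss would already be satisfied.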

Project Structure

bbscore_public/
├── benchmarks/          # Benchmark definitions (NSD, TVSD, Physion, etc.)
├── data/                # Dataset loaders and downloaders
├── metrics/             # Scoring methods (ridge, RSA, PLS, online)
│   └── losses.py        # Loss functions (MSE, CCC, Pearson, etc.)
├── models/              # Model wrappers (HuggingFace, TorchVision)
├── run.py               # Main entry point
├── eval.py              # Batch evaluation script
├── check_system.py      # System diagnostic tool
├── install.sh           # Interactive installation script
├── activate_bbscore.sh  # Environment activation (generated by install.sh)
└── requirements.txt     # Python dependencies

Getting Help

  1. Check your system: python check_system.py
  2. List options: python check_system.py --list
