SUBMARIT - SUBMARket Identification and Testing

A Python implementation of submarket clustering algorithms for analyzing product substitution patterns.

Overview

SUBMARIT is a comprehensive toolkit for identifying and analyzing submarkets based on product substitution patterns. This Python implementation provides:

- Efficient clustering algorithms for submarket identification
- Statistical evaluation methods
- Validation techniques, including k-fold cross-validation
- Support for large-scale data analysis
- MATLAB compatibility layer for seamless migration

Installation

From Source

git clone https://github.com/yourusername/submarit.git
cd submarit
pip install -e .

For Development

pip install -e ".[dev]"
pre-commit install

Quick Start

import submarit
import numpy as np

# Load substitution matrix
data = submarit.load_substitution_data("data.csv")
matrix = submarit.SubstitutionMatrix(data)

# Run clustering
clusterer = submarit.LocalSearch(n_clusters=5)
labels = clusterer.fit_predict(matrix)

# Evaluate results
evaluator = submarit.ClusterEvaluator()
metrics = evaluator.evaluate(matrix, labels)
print(f"Log-likelihood: {metrics.log_likelihood}")
print(f"Z-score: {metrics.z_score}")

Features

Core Algorithms
- Local search optimization (a quick approximate variant and a direct log-likelihood variant)
- Constrained clustering with fixed assignments
- Multiple initialization strategies
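The local search idea listed above can be illustrated on a toy matrix. The sketch below is a minimal first-improvement local search over a fixed number of clusters, assuming a dense NumPy substitution matrix `S` in which `S[i, j]` measures substitution from product `i` to product `j`. The objective is an assumption swapped in for illustration: a modularity-style score (observed within-cluster substitution minus what row and column totals alone would predict), not SUBMARIT's log-likelihood criterion. `local_search` and `excess_within_substitution` are hypothetical names, not part of the submarit API.

```python
import numpy as np


def excess_within_substitution(S, labels):
    """Hypothetical stand-in objective: within-cluster substitution minus the
    amount expected from row/column totals alone, as a share of all
    substitution (a modularity-style score, NOT SUBMARIT's log-likelihood)."""
    total = S.sum()
    expected = np.outer(S.sum(axis=1), S.sum(axis=0)) / total
    same = labels[:, None] == labels[None, :]
    return float(((S - expected) * same).sum() / total)


def local_search(S, n_clusters, seed=0, max_passes=100):
    """First-improvement local search: move one item at a time to another
    cluster whenever that raises the objective; stop when a full pass over
    all items finds no improving move."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    if n < n_clusters:
        raise ValueError("need at least as many items as clusters")
    rng = np.random.default_rng(seed)
    # Random start in which every cluster has at least one item.
    labels = rng.permutation(np.arange(n) % n_clusters)
    best = excess_within_substitution(S, labels)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            current = labels[i]
            if np.count_nonzero(labels == current) == 1:
                continue  # moving i would empty its cluster
            for c in range(n_clusters):
                if c == current:
                    continue
                labels[i] = c  # trial move
                score = excess_within_substitution(S, labels)
                if score > best + 1e-12:
                    best, current, improved = score, c, True
                else:
                    labels[i] = current  # undo the trial move
        if not improved:
            break
    return labels, best


# Toy matrix: products 0-1 and 2-3 mostly substitute for each other.
S = np.array([[0, 10, 1, 1],
              [10, 0, 1, 1],
              [1, 1, 0, 10],
              [1, 1, 10, 0]])
labels, score = local_search(S, n_clusters=2)
```

Forbidding moves that empty a cluster keeps the number of submarkets fixed; on this toy matrix the search ends with products 0 and 1 in one cluster and 2 and 3 in the other. Because local search can stop at a local optimum, it is usually restarted from several initializations.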
Evaluation Metrics
- Log-likelihood calculations
- Z-value computations
- GAP statistic for optimal cluster selection
- Entropy-based comparisons
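The GAP statistic compares how compact the clustering of the observed data is against reference data with no cluster structure. Given log within-cluster dispersions already computed for k = 1..K on the data and on B reference data sets, the gap and the selection rule of Tibshirani, Walther, and Hastie (2001) can be sketched as below. How SUBMARIT measures dispersion on a substitution matrix and generates its reference data is not shown; the function names and the numbers in the example are hypothetical.

```python
import numpy as np


def gap_statistic(log_w_obs, log_w_ref):
    """Gap(k) = mean over b of log W*_kb, minus log W_k.

    log_w_obs: shape (K,), log within-cluster dispersion of the data, k = 1..K.
    log_w_ref: shape (B, K), the same quantity for B reference (null) data sets.
    Returns (gap, s) with s_k = sd_k * sqrt(1 + 1/B).
    """
    log_w_obs = np.asarray(log_w_obs, dtype=float)
    log_w_ref = np.asarray(log_w_ref, dtype=float)
    n_ref = log_w_ref.shape[0]
    gap = log_w_ref.mean(axis=0) - log_w_obs
    # Standard deviation dividing by B (ddof=0), as in the original paper.
    s = log_w_ref.std(axis=0) * np.sqrt(1.0 + 1.0 / n_ref)
    return gap, s


def select_n_clusters(gap, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to the largest k."""
    for k in range(len(gap) - 1):
        if gap[k] >= gap[k + 1] - s[k + 1]:
            return k + 1  # convert 0-based index to a cluster count
    return len(gap)


# Made-up dispersions for k = 1, 2, 3 (two identical reference draws).
log_w_obs = [3.0, 1.5, 1.4]
log_w_ref = [[3.0, 2.5, 2.2],
             [3.0, 2.5, 2.2]]
gap, s = gap_statistic(log_w_obs, log_w_ref)
best_k = select_n_clusters(gap, s)
```

Here the gaps are 0.0, 1.0 and 0.8, so the rule picks k = 2: the first k whose gap is not beaten by the next one by more than one standard error.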
Validation
- K-fold cross-validation
- Empirical distribution generation
- Rand index calculations
- P-value computations
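The Rand and adjusted Rand indices compare two partitions of the same products, and an empirical p-value locates an observed statistic within a null distribution. The sketch below computes both indices from a contingency table and, as an assumption about how the null distribution is built, uses random relabellings of one partition (a permutation test). It is a generic illustration rather than SUBMARIT's own validation code; all function names are hypothetical.

```python
import numpy as np
from math import comb


def contingency_table(a, b):
    """Counts of items shared by each (cluster in a, cluster in b) pair."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)
    return table


def _pair_sums(a, b):
    table = contingency_table(a, b)
    n_pairs = comb(int(table.sum()), 2)
    both = sum(comb(int(x), 2) for x in table.ravel())      # together in a and b
    in_a = sum(comb(int(x), 2) for x in table.sum(axis=1))  # together in a
    in_b = sum(comb(int(x), 2) for x in table.sum(axis=0))  # together in b
    return n_pairs, both, in_a, in_b


def rand_index(a, b):
    """Share of item pairs on which two partitions agree (Rand, 1971)."""
    n_pairs, both, in_a, in_b = _pair_sums(a, b)
    # Agreements = pairs together in both + pairs apart in both.
    return (n_pairs + 2 * both - in_a - in_b) / n_pairs


def adjusted_rand_index(a, b):
    """Rand index corrected for chance (Hubert and Arabie, 1985)."""
    n_pairs, both, in_a, in_b = _pair_sums(a, b)
    expected = in_a * in_b / n_pairs
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (both - expected) / (max_index - expected)


def empirical_p_value(observed, null_samples):
    """Share of null draws at least as large as the observed statistic,
    with a +1 correction so the p-value is never exactly zero."""
    null_samples = np.asarray(null_samples, dtype=float)
    return (1 + np.count_nonzero(null_samples >= observed)) / (1 + null_samples.size)


a = np.repeat([0, 1, 2], 10)
b = np.repeat([2, 0, 1], 10)  # same partition, different label names
rng = np.random.default_rng(0)
observed = adjusted_rand_index(a, b)
null = [adjusted_rand_index(a, rng.permutation(b)) for _ in range(199)]
p = empirical_p_value(observed, null)
```

The plain Rand index stays well above zero even for unrelated partitions, which is why the chance-corrected version is usually preferred when comparing submarket solutions.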
Performance
- Optimized NumPy operations
- Optional Numba JIT compilation
- Parallel processing support
- Memory-efficient sparse matrix handling
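Optional Numba acceleration usually means the code must still run when Numba is not installed. One common pattern, sketched here as an assumption rather than a description of how submarit wires it up, is to fall back to a no-op decorator on `ImportError`. Explicit loops such as the hypothetical `within_cluster_total` below are slow in plain Python but compile to machine code under Numba.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(*args, **kwargs):
        """No-op stand-in so decorated functions still run as plain Python."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as @njit
        return lambda func: func    # used as @njit(...)


@njit
def within_cluster_total(S, labels):
    """Sum of S[i, j] over all pairs (i, j) assigned to the same cluster."""
    total = 0.0
    n = S.shape[0]
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                total += S[i, j]
    return total


S = np.ones((3, 3))
labels = np.array([0, 0, 1])
value = within_cluster_total(S, labels)
```

The equivalent NumPy expression, `S[labels[:, None] == labels[None, :]].sum()`, gives the same result but allocates an n-by-n boolean mask; the loop form avoids that allocation and mainly pays off when Numba is available and the matrix is large.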

Documentation

📚 Online Documentation

Full documentation is available at https://submarit.readthedocs.io

📖 Documentation Contents

- Installation Guide - Platform-specific installation instructions
- Quick Start Tutorial - Get started with SUBMARIT in minutes
- API Reference - Complete API documentation with examples
- Algorithm Theory - Mathematical foundations and implementation details
- Performance Guide - Optimization strategies and benchmarks
- FAQ - Frequently asked questions

🔄 Migration from MATLAB

- Migration Guide - Comprehensive guide for MATLAB users
- Function Mapping - 1-to-1 MATLAB to Python function reference
- Migration Examples - Jupyter notebook with practical examples

📓 Example Notebooks

- Getting Started - Basic introduction to SUBMARIT
- Advanced Clustering - Advanced techniques and algorithms
- Performance Optimization - Tips for optimal performance
- Visualization Gallery - Example visualizations of SUBMARIT results
- MATLAB Migration - Examples for MATLAB users

🧪 Testing

- Test Suite Documentation - Guide to running tests
- Benchmarks - Performance benchmark results

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use SUBMARIT in your research, please cite:

@software{submarit,
  title = {SUBMARIT: SUBMARket Identification and Testing},
  year = {2024},
  url = {https://github.com/yourusername/submarit}
}

Acknowledgments

This is a Python implementation of the original MATLAB SUBMARIT package. The original MATLAB files are preserved in the matlab_original/ directory for reference and validation purposes.

Original MATLAB Implementation Credits

The MATLAB implementation includes contributions from:

- Stephen France, Mississippi State University (RandIndex4.m, 2012)
- Additional contributors (names unknown)

The methodology is based on submarket identification research from marketing science literature, including:

- Rand (1971) - Rand Index for clustering similarity
- Hubert and Arabie (1985) - Adjusted Rand Index
- Urban, Johnson, and Hauser (1984) - Z-value calculations
- Tibshirani, Walther, and Hastie (2001) - GAP statistic for optimal cluster selection
