38 commits
6d342ca
Merge pull request #39 from mountainash-io/release/26.4.0
discreteds Apr 14, 2026
9d39d49
docs: spec for cross-backend performance benchmarks
discreteds Apr 28, 2026
a592679
docs: implementation plan for cross-backend performance benchmarks
discreteds Apr 28, 2026
f9fe15a
fix(benchmarks): exclude ibis-polars from benchmark backends
discreteds Apr 28, 2026
3b8dd1d
feat(benchmarks): add test-perf-save hatch command for JSON baselines
discreteds Apr 28, 2026
e96eac2
feat(benchmarks): add synthetic data generator with tests
discreteds Apr 28, 2026
41f9798
feat(benchmarks): add scaling matrix and strategy isolation benchmarks
discreteds Apr 28, 2026
76a3aa2
fix(benchmarks): skip broken string-match strategies on pandas/narwha…
discreteds Apr 28, 2026
46a6400
chore(benchmarks): link string-match skips to upstream mountainash#89
discreteds Apr 28, 2026
fa64363
docs: rewrite README for expression-based engine architecture
discreteds Apr 28, 2026
d37c56e
docs: add 'why metadata not code' motivation to README
discreteds Apr 28, 2026
8f670bd
docs: use professional tone in README opening
discreteds Apr 28, 2026
d748ac4
chore: update config for local dev (pyrightconfig, coverage, gitignore)
discreteds Apr 28, 2026
608f512
chore: remove stale phase 1-3 benchmark and test scripts from repo root
discreteds Apr 28, 2026
4245e03
chore: remove stale phase 1-3 benchmark results
discreteds Apr 28, 2026
6859e04
chore: remove deprecated test files for deleted engines
discreteds Apr 28, 2026
5934755
chore: remove stale planning and AI over-enthusiasm docs
discreteds Apr 28, 2026
e2be2fa
chore: remove TESTING.md (superseded by README and CLAUDE.md)
discreteds Apr 28, 2026
c7a2fff
chore: remove pre-expression-engine planning and retrospective docs
discreteds Apr 28, 2026
f8b6fde
chore: remove SQL reference files, update gitignore
discreteds Apr 28, 2026
4f74a12
Merge branch 'main' into release/26.4.0
discreteds Apr 28, 2026
ea9fda4
Merge pull request #41 from mountainash-io/release/26.4.0
discreteds Apr 28, 2026
14b70e0
feat: add DimensionRole enum and role field to Dimension
discreteds Apr 29, 2026
6bb3698
feat: add Aggregate model and Lattice data class
discreteds Apr 29, 2026
507d0ab
feat: add prime table and checked multiply for accumulator DNA
discreteds Apr 29, 2026
eef759c
feat: add AccumulatorResult extending RuleResult
discreteds Apr 29, 2026
c5ca703
feat: add AccumulatorCompiler with EXACT compatible/coalesce
discreteds Apr 29, 2026
1ad5c1c
feat: add RANGE/GREATER_THAN/LESS_THAN to AccumulatorCompiler
discreteds Apr 29, 2026
a1d25ac
feat: implement AccumulatorEngine build phase
discreteds Apr 29, 2026
c77a6a7
feat: implement AccumulatorEngine apply phase with integration tests
discreteds Apr 29, 2026
2142c11
feat: export accumulator engine public API
discreteds Apr 29, 2026
faef85d
docs: add accumulator engine to CLAUDE.md
discreteds Apr 29, 2026
f837e3f
test: add cross-backend parametrized tests for accumulator apply phase
discreteds Apr 29, 2026
338756d
test: add accumulator engine performance benchmarks
discreteds Apr 29, 2026
de74c5a
test: add accumulator edge case and isolation tests
discreteds Apr 29, 2026
fed5da6
docs: add multi-lattice composition discussion to backlog
discreteds Apr 29, 2026
29429a1
Merge branch 'develop' into feature/accumulator-engine
discreteds Apr 29, 2026
669a53c
fix: exclude benchmark tests from CI
discreteds Apr 29, 2026
6 changes: 6 additions & 0 deletions .gitignore
@@ -172,3 +172,9 @@ htmlcov/
#testing artifacts
junit.*
coverage.*

#benchmark artifacts
.benchmarks/

.ruff_cache/
benchmark_results
28 changes: 19 additions & 9 deletions CLAUDE.md
@@ -19,6 +19,14 @@ Mountain Ash Utils Rules is a high-performance Python package that provides revo
- **BaseMatchStrategy**: Abstract base class for rule matching strategies
- **ContextHelper**: Utilities for context value extraction and type validation

#### Accumulator Engine
- **AccumulatorEngine**: Build/Apply engine that computes maximal consistent rule combinations with accumulated numerics. Python-controlled iteration with expression-based steps via `mountainash.relations`
- **AccumulatorCompiler**: Compiles `coalesce`, `compatible`, and NA flag expressions per dimension for the accumulator's recursive combination step
- **Lattice**: Data class wrapping the build-phase output — outermost rulesets as a backend DataFrame
- **AccumulatorResult**: Extends `RuleResult` with accumulator-specific accessors (accumulated numerics, provenance, combination depth)
- **Aggregate**: Pydantic model declaring a named numeric column and its monoidal operation (sum, min, max, product)
- **DimensionRole**: Enum (`CONSTRAINT` / `CONTEXT_KEY`) on `Dimension` — distinguishes lattice-partitioning dimensions from coalesced dimensions
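
The monoidal operations named for `Aggregate` (sum, min, max, product) can be illustrated with a minimal sketch. This is not the package's implementation: the `MONOIDS` table and the `accumulate` helper are hypothetical names introduced here, and the only assumption is that each operation pairs an identity element with an associative binary function.

```python
import math
import operator
from functools import reduce

# Hypothetical table: operation name -> (identity element, associative combine function).
# The operation names mirror those listed for Aggregate; the real model may differ.
MONOIDS = {
    "sum": (0, operator.add),
    "min": (math.inf, min),
    "max": (-math.inf, max),
    "product": (1, operator.mul),
}


def accumulate(operation, values):
    """Fold the values with the named monoid, starting from its identity."""
    identity, combine = MONOIDS[operation]
    return reduce(combine, values, identity)


print(accumulate("sum", [5, 10, 2]))     # 17
print(accumulate("product", [2, 3, 4]))  # 24
print(accumulate("min", []))             # inf (identity of an empty fold)
```

Because each operation has an identity and is associative, partial results can be combined in any grouping, which is what makes step-by-step accumulation across rule combinations well defined.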

#### Performance-Optimized Engines (Phases 2-3)
- **HybridRulesEngine**: Hybrid numpy/ibis engine with automatic optimization selection
- **NumpyRuleProcessor**: Vectorized numpy-based rule processor for performance
@@ -62,16 +70,18 @@ The rules engine supports 11 match strategies via the `MatchStrategy` enum, comp
src/mountainash_utils_rules/
├── __init__.py # Package exports and public API
├── __version__.py # Version information
├── constants.py # Constants, enums, and ternary value definitions
├── accumulator_compiler.py # Coalesce/compatible/NA flag expression compilation
├── accumulator_engine.py # AccumulatorEngine: build/apply for rule combinations
├── accumulator_result.py # AccumulatorResult extending RuleResult
├── aggregate.py # Aggregate model for numeric accumulation
├── constants.py # Constants, enums (MatchStrategy, DimensionRole), ternary values
├── compiler.py # DimensionCompiler for filter engine expressions
├── context.py # Context handling utilities with batch optimization
├── dimension.py # Dimension metadata and management
├── engine.py # Original RulesEngine implementation
├── hybrid_engine.py # Phase 2: Hybrid numpy/ibis engine
├── numpy_processor.py # Vectorized numpy rule processor
├── vectorized_engine.py # Phase 3: Revolutionary polars-based engine
├── observer.py # Observability and debugging support
├── rule_manager.py # Rule storage and backend management
└── rule_strategies.py # Match strategy implementations
├── dimension.py # Dimension metadata (with role field) and management
├── engine.py # ExpressionRulesEngine (filter engine)
├── lattice.py # Lattice data class wrapping build output
├── primes.py # Prime table and checked multiply for combination DNA
└── result.py # RuleResult base class

tests/
├── benchmarks/ # Performance benchmarking framework
164 changes: 100 additions & 64 deletions README.md
@@ -1,99 +1,135 @@
# mountainash-utils-rules

![Python](https://img.shields.io/badge/python-3.10%2B-blue) ![Category](https://img.shields.io/badge/category-utils-purple) ![Tests](https://img.shields.io/badge/tests-✓-green) ![Docs](https://img.shields.io/badge/docs-✓-blue)
![Python](https://img.shields.io/badge/python-3.12-blue) ![License](https://img.shields.io/badge/license-MIT-green) ![Backend](https://img.shields.io/badge/backends-7-purple)

Business rules that live in code are slow to change — every update requires a code review, a deploy, and a release cycle. This engine moves the logic into **metadata**: rules are rows in a DataFrame (or a database table), and the engine evaluates them against a context without any rule-specific code. Change a rule, reload the DataFrame, and the new behaviour applies immediately. No redeploy. No code change. No branching logic to maintain.

Mountain Ash - Utils - Rules
The engine evaluates rules against a context and ranks matches by specificity. Define dimensions with match strategies (exact, range, regex, prefix, set membership, etc.), pass a context, and get back the most specific matching rules — ranked, explained, and ready to use.

This utility package provides common functionality used across the Mountain Ash ecosystem.



## Installation

### Development Installation

```bash
# Clone and install in development mode
git clone <repository-url>
cd mountainash-utils-rules
pip install -e .
```

### Using Hatch

```bash
# Create development environment
hatch env create

# Run commands in the environment
hatch run <command>
```
Built on [mountainash](https://github.com/mountainash-io/mountainash) expressions for backend-agnostic evaluation. Rules compile once; contexts evaluate in a single vectorised pass. Supports Polars, Pandas, Ibis (DuckDB, SQLite, Polars), and Narwhals backends with zero code changes.

### Why metadata, not code?

- **Rules change faster than code.** Pricing tiers, eligibility criteria, fraud thresholds — these are business decisions that shift weekly. When rules are data, a product owner can update them in a database and the engine picks up the change on the next evaluation. No PR, no deploy, no downtime.
- **Rules are auditable.** Every rule is a row with a name, dimensions, and match criteria. You can diff two rule sets, version them in a table, and explain exactly why a context matched — because the engine tracks per-dimension ternary results (match / unknown / non-match) for every rule.
- **Rules scale without branching.** A hand-coded rule system with 2,000 rules is unmaintainable. A DataFrame with 2,000 rows is just data. The engine evaluates all of them in one vectorised pass regardless of count.

## Quick Start

```python
import mountainash_utils_rules

# Basic usage example
# TODO: Add specific usage example
import polars as pl
from mountainash_utils_rules import (
    ExpressionRulesEngine, Dimension, DimensionsMetadata, MatchStrategy,
)
from pydantic import BaseModel

# 1. Define your rules as a DataFrame
rules = pl.DataFrame({
    "rule_name": ["premium_au", "standard", "fallback"],
    "region": ["AU", "AU", "<NA>"],  # <NA> = wildcard
    "spend_min": [1000, 0, -999999999],  # -999999999 = wildcard
    "spend_max": [9999, 999, -999999999],
})

# 2. Declare how each dimension matches
metadata = DimensionsMetadata(dimensions=[
    Dimension(dimension_name="region", match_strategy=MatchStrategy.EXACT, data_type=str),
    Dimension(
        dimension_name="spend", match_strategy=MatchStrategy.RANGE, data_type=int,
        range_min_field="spend_min", range_max_field="spend_max",
    ),
])

# 3. Build the engine (compiles expressions once)
engine = ExpressionRulesEngine(rules=rules, dimension_metadata=metadata)

# 4. Evaluate a context
class CustomerContext(BaseModel):
    region: str
    spend: int

result = engine.evaluate(CustomerContext(region="AU", spend=1500))

print(result.count) # 2 — premium_au and fallback survive
print(result.best_match) # premium_au (matches both dimensions)
print(result.explain("premium_au")) # {"region": 1, "spend": 1} — both match
print(result.explain("fallback")) # {"region": 0, "spend": 0} — both wildcard
```

## Match Strategies

| Strategy | Rule Column | Description |
|----------|-------------|-------------|
| `EXACT` | Scalar value | Context value equals rule value |
| `NOT_EQUAL` | Scalar value | Context value differs from rule value |
| `RANGE` | Two columns (min/max) | Context value within [min, max] |
| `GREATER_THAN` | Threshold | Context value > rule threshold |
| `LESS_THAN` | Threshold | Context value < rule threshold |
| `PREFIX` | String | Context value starts with rule value |
| `SUFFIX` | String | Context value ends with rule value |
| `CONTAINS` | Substring | Context value contains rule value |
| `REGEX` | Pattern | Context value matches rule pattern |
| `SET_MEMBERSHIP` | List | Context value is in rule's list |
| `SET_EXCLUSION` | List | Context value is not in rule's list |

## Features

- **1 Python modules** providing core functionality
- **Comprehensive test suite** ensuring reliability
- **Jupyter notebooks** with examples and tutorials
- **3 core dependencies** for robust functionality


Wildcard values (`<NA>` for strings, `-999999999` for numerics) produce an UNKNOWN result — the rule is not eliminated but scores lower on specificity.
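
To make the wildcard behaviour concrete, here is a plain-Python sketch of per-dimension ternary evaluation for `EXACT` and `RANGE`. It does not use the engine's expression layer. The helper names `exact` and `in_range` are hypothetical, and treating a wildcard in either bound as UNKNOWN is an assumption rather than documented behaviour.

```python
STRING_WILDCARD = "<NA>"
NUMERIC_WILDCARD = -999999999

MATCH, UNKNOWN, NON_MATCH = 1, 0, -1


def exact(context_value, rule_value):
    """EXACT: a wildcard rule value is UNKNOWN, otherwise compare for equality."""
    if rule_value == STRING_WILDCARD:
        return UNKNOWN
    return MATCH if context_value == rule_value else NON_MATCH


def in_range(context_value, rule_min, rule_max):
    """RANGE: inclusive [min, max]. Assumption: a wildcard in either bound gives UNKNOWN."""
    if NUMERIC_WILDCARD in (rule_min, rule_max):
        return UNKNOWN
    return MATCH if rule_min <= context_value <= rule_max else NON_MATCH


# The three Quick Start rules evaluated against region="AU", spend=1500
print(exact("AU", "AU"), in_range(1500, 1000, 9999))                        # 1 1
print(exact("AU", "AU"), in_range(1500, 0, 999))                            # 1 -1
print(exact("AU", "<NA>"), in_range(1500, -999999999, -999999999))          # 0 0
```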

## Documentation
## Backend Support

- **[CLAUDE.md](CLAUDE.md)** - Technical documentation and development guide
- **Testing** - Run tests with `pytest` or `hatch run test`
- **[Mountain Ash Documentation](https://mountainash-io.github.io/mountainash-docs/)** - Complete ecosystem documentation
The engine is backend-agnostic. Pass any supported DataFrame type as `rules`:

| Backend | Type |
|---------|------|
| Polars | `pl.DataFrame` |
| Pandas | `pd.DataFrame` |
| Narwhals (Polars) | `nw.from_native(pl.DataFrame(...))` |
| Narwhals (Pandas) | `nw.from_native(pd.DataFrame(...))` |
| Ibis (DuckDB) | `ibis.duckdb.connect().create_table(...)` |
| Ibis (Polars) | `ibis.polars.connect().create_table(...)` |
| Ibis (SQLite) | `ibis.sqlite.connect().create_table(...)` |

All backends produce identical results. Polars is recommended for performance.

## Development

### Testing
## Installation

```bash
# Run tests with Hatch
hatch run test

# Run with coverage
hatch run test:cov
# Development install with hatch
git clone https://github.com/mountainash-io/mountainash-utils-rules.git
cd mountainash-utils-rules
hatch env create
```

### Build Commands

See [CLAUDE.md](CLAUDE.md) for complete build and development commands.
Requires sibling checkouts of `mountainash`, `mountainash-data`, and `mountainash-settings` (see `hatch.toml` for path configuration).

### Contributing
## Development

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request
| Command | Description |
|---------|-------------|
| `hatch run test:test-quick` | Run tests (no coverage) |
| `hatch run test:test` | Run tests with coverage reports |
| `hatch run test:test-target-quick tests/path.py -v` | Run specific tests |
| `hatch run test:test-perf` | Run performance benchmarks |
| `hatch run test:test-perf-save` | Benchmarks + save JSON baseline |
| `hatch run ruff:check` | Lint |
| `hatch run ruff:fix` | Lint + auto-fix |
| `hatch run mypy:check` | Type check |
| `hatch run radon:radon-cc` | Cyclomatic complexity |

## Architecture

The engine uses **signed-integer ternary logic** (-1 = non-match, 0 = unknown, 1 = match) to evaluate each rule dimension independently, then combines results in a single vectorised pass:

## License
1. **Compile** — `DimensionCompiler` converts dimension metadata into backend-agnostic expression templates at construction time.
2. **Bind** — Context values are injected as literal columns alongside the rules DataFrame.
3. **Evaluate** — All dimension expressions execute in one `with_columns` call, producing a ternary value per dimension per rule.
4. **Rank** — Rules with any -1 are eliminated. Survivors are ranked by **specificity** (count of 1s). More specific rules rank higher.

See LICENSE file for details.
The engine and result layer use only `mountainash.relations` and `mountainash.expressions` — no direct backend imports. See [CLAUDE.md](CLAUDE.md) for full architectural details.
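
The Rank step can be sketched without any backend. The snippet below assumes the ternary values have already been computed per rule and per dimension. The `rank` function and the input dictionary are illustrative only and are not part of the package's API.

```python
def rank(ternary_by_rule):
    """Drop rules containing any -1, then order survivors by specificity (count of 1s)."""
    survivors = {
        name: dims
        for name, dims in ternary_by_rule.items()
        if -1 not in dims.values()
    }

    def specificity(name):
        return sum(1 for value in survivors[name].values() if value == 1)

    # sorted() is stable, so ties keep their original rule order
    return sorted(survivors, key=specificity, reverse=True)


# Ternary results for the Quick Start context (region="AU", spend=1500)
results = {
    "premium_au": {"region": 1, "spend": 1},
    "standard": {"region": 1, "spend": -1},
    "fallback": {"region": 0, "spend": 0},
}

print(rank(results))  # ['premium_au', 'fallback']
```

In the engine itself these steps run as vectorised column expressions rather than Python loops; the sketch only mirrors the elimination and ordering logic described above.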

## Mountain Ash Ecosystem

This package is part of the [Mountain Ash](https://github.com/mountainash-io) ecosystem of Python packages.
This package is part of the [Mountain Ash](https://github.com/mountainash-io) data framework ecosystem.

---
*README.md generated by [Mountain Ash Documentation Generator](https://github.com/mountainash-io/mountainash-docs) on 2025-07-21*
## License

MIT — see [LICENSE](LICENSE) for details.