Expose Tantivy 0.26.0 features: aggregations, tokenizers, regex, sort, bytes, count, MLT by nyo16 · Pull Request #23 · nyo16/muninn

nyo16 · 2026-04-11T17:14:39Z

Summary

- Switch Tantivy from git pin (commit 51f340f) to crates.io 0.26.0 release
- Extend FieldDef tuple from 4 to 6 elements to support fast and tokenizer options
- Add 7 new features across schema, query, and analytics layers
- 229 tests (54 new), 0 failures, clippy clean
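
To make the FieldDef change concrete, here is a minimal sketch of the 6-element shape, using the element order given in the commit message (name, type, stored, indexed, fast, tokenizer). `FieldDefSketch.upgrade/1` is a hypothetical helper and not part of Muninn. The string type tag and the defaults (`fast: false`, tokenizer `"default"`) are assumptions made for illustration only.

```elixir
defmodule FieldDefSketch do
  # Hypothetical helper, not part of Muninn.
  # Pads a legacy 4-element definition to the 6-element shape.
  # Assumed defaults: fast = false, tokenizer = "default".
  def upgrade({name, type, stored, indexed}),
    do: {name, type, stored, indexed, false, "default"}

  # Definitions that already have 6 elements pass through unchanged.
  def upgrade({_name, _type, _stored, _indexed, _fast, _tokenizer} = field_def),
    do: field_def
end

legacy = {"title", "text", true, true}
upgraded = FieldDefSketch.upgrade(legacy)
IO.inspect(upgraded)
```

Whether the library pads old tuples this way or always builds 6-element tuples on the Elixir side is not stated in this PR.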

New Features

Schema

| Feature | API | Description |
|---|---|---|
| Bytes field | `Schema.add_bytes_field/3` | Binary data storage/retrieval via `rustler::Binary` |
| Custom tokenizers | `add_text_field("f", tokenizer: "en_stem")` | Per-field tokenizer: `default`, `raw`, `en_stem`, `whitespace` |
| Fast fields | `add_u64_field("f", fast: true)` | Columnar storage for sort & aggregations |

Queries

| Feature | API | Description |
|---|---|---|
| Count | `Searcher.count/3` | Lightweight doc count without retrieval |
| Regex | `Searcher.search_regex/4` | Programmatic `RegexQuery` on text fields |
| MoreLikeThis | `Searcher.search_more_like_this/3` | Find similar documents by term distribution |
| Sort by field | `Searcher.search_query_sorted/5` | Sort by fast field value instead of BM25 score |

Aggregations

| Feature | API | Description |
|---|---|---|
| JSON pass-through NIF | `Searcher.aggregate/5` | Single `DirtyCpu`-scheduled NIF for all aggregation types |
| Builder DSL | `Aggregation.Bucket` / `Aggregation.Metric` | Terms, range, histogram, filter + avg, sum, min, max, stats, cardinality, percentiles |

Example Usage

```elixir
# Custom tokenizer + fast field
schema = Schema.new()
  |> Schema.add_text_field("title", stored: true, tokenizer: "en_stem")
  |> Schema.add_text_field("category", stored: true, tokenizer: "raw", fast: true)
  |> Schema.add_f64_field("price", stored: true, fast: true)

# Count without fetching docs
{:ok, count} = Searcher.count(searcher, "elixir", ["title"])

# Sort by price descending
{:ok, results} = Searcher.search_query_sorted(searcher, "*", ["title"], "price", reverse: true)

# Aggregation: avg price per category
alias Muninn.Aggregation
alias Muninn.Aggregation.{Bucket, Metric}

aggs = Aggregation.new()
  |> Aggregation.add("by_category",
    Bucket.terms("category", size: 10)
    |> Aggregation.sub("avg_price", Metric.avg("price"))
  )

{:ok, results} = Searcher.aggregate(searcher, "*", ["title"], aggs)
```
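
For readers who want to see what the JSON pass-through might carry, the sketch below hand-writes the request for the "avg price per category" aggregation as a plain Elixir map. It assumes the builder DSL emits Tantivy's Elasticsearch-style aggregation request, with sub-aggregations nested under an `"aggs"` key. The exact output of `Aggregation.new/0` and friends is not shown in this PR, so treat the map as an illustration rather than the library's actual serialization.

```elixir
# Assumption: this mirrors Tantivy's Elasticsearch-style aggregation JSON.
# It is hand-written here and does not call any Muninn function.
request = %{
  "by_category" => %{
    # Bucket aggregation: group documents by the "category" fast field.
    "terms" => %{"field" => "category", "size" => 10},
    # Sub-aggregation: computed once per bucket.
    "aggs" => %{
      "avg_price" => %{"avg" => %{"field" => "price"}}
    }
  }
}

# Walk the nested map to confirm where the metric sits.
sub_field = get_in(request, ["by_category", "aggs", "avg_price", "avg", "field"])
IO.inspect(sub_field)
```

Both `category` and `price` are declared with `fast: true` in the schema above, which the commit message lists as a requirement for sort and aggregations.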

Test Plan

- All 175 existing tests pass (backward compatible)
- 54 new tests covering bytes, tokenizers, count, regex, MLT, sort, aggregation builder, aggregation integration
- cargo clippy clean
- mix format + cargo fmt clean

…regex, sort, bytes, count, MLT

Switch Tantivy dependency from git pin to crates.io 0.26.0 release and expose 7 new features through the Elixir API:

Schema:
- Bytes field type with round-trip binary storage via rustler::Binary
- Custom tokenizers per text field (default, raw, en_stem, whitespace)
- Fast field option for numeric/bool/text fields (required for sort + aggregations)

Query features:
- Count collector (Searcher.count/3) - lightweight doc counting without retrieval
- Regex queries (Searcher.search_regex/4) - programmatic RegexQuery API
- MoreLikeThis (Searcher.search_more_like_this/3) - find similar documents
- Sort by field value (Searcher.search_query_sorted/5) - TopDocs::order_by_fast_field

Aggregations:
- Single JSON pass-through NIF (DirtyCpu scheduled) for all aggregation types
- Elixir builder DSL: Aggregation, Aggregation.Bucket, Aggregation.Metric
- Supports terms, range, histogram, filter buckets + avg/sum/min/max/stats/cardinality/percentiles metrics with nesting

FieldDef tuple extended from 4 to 6 elements (name, type, stored, indexed, fast, tokenizer).

229 tests (54 new), 0 failures. Clippy clean.

- CHANGELOG: document all 7 new features added in 0.5.4
- README: update features list, field types table, add sections for regex, MLT, count, sort, aggregations; update API reference and dev status
- Add examples/aggregation_demo.exs showcasing all new features

nyo16 added 12 commits December 31, 2025 12:57

Fix macOS runners for precompiled NIF builds

212933d

- Update aarch64-apple-darwin to use macos-14 (Apple Silicon)
- Update x86_64-apple-darwin to use macos-13-large (Intel)
- macOS-13 standard runners have been retired by GitHub Actions

Bump version to 0.5.2

89ff430

Resolve merge conflicts - use macos-15-intel for x86_64 builds

0faba6d

Merge branch 'master' of github.com:nyo16/muninn

ad55ae9

Merge branch 'master' of github.com:nyo16/muninn

bfae3b0

Merge branch 'master' of github.com:nyo16/muninn

fd13220

Update attest-build-provenance action to v3

b520c88

Merge branch 'master' of github.com:nyo16/muninn

b810044

Bump Cargo.toml version to 0.5.4

5e3e980

Bump version to 0.5.4

f1e7f2d

nyo16 merged commit 909d167 into master Apr 11, 2026
3 checks passed

nyo16 deleted the feature/tantivy-0.26-features branch April 11, 2026 21:31
