From 5c79da02d6e912e9b0e8c11ef749e9de94f593ac Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:25:14 +0000 Subject: [PATCH 01/13] docs: condense README to elevator pitch (#478) --- README.md | 206 +++++++++++------------------------------------------- 1 file changed, 39 insertions(+), 167 deletions(-) diff --git a/README.md b/README.md index ff5a7682e..442d8fa55 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,15 @@ - +Mellea logo -# Mellea - -Mellea is a library for writing generative programs. -Generative programming replaces flaky agents and brittle prompts -with structured, maintainable, robust, and efficient AI workflows. +# Mellea — build predictable AI without guesswork +Inside every AI-powered pipeline, the unreliable part is the same: the LLM call itself. +Silent failures, untestable outputs, no guarantees. +Mellea wraps those calls in Python you can read, test, and reason about — +type-annotated outputs, verifiable requirements, automatic retries. [//]: # ([![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869)) -[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://docs.mellea.ai/) +[![Website](https://img.shields.io/badge/website-mellea.ai-blue)](https://mellea.ai/) +[![Docs](https://img.shields.io/badge/docs-docs.mellea.ai-brightgreen)](https://docs.mellea.ai/) [![PyPI version](https://img.shields.io/pypi/v/mellea)](https://pypi.org/project/mellea/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mellea)](https://pypi.org/project/mellea/) [![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv) @@ -18,189 +19,60 @@ with structured, maintainable, robust, and efficient AI workflows. 
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-3.0-4baaaa.svg)](CODE_OF_CONDUCT.md) [![Discord](https://img.shields.io/discord/1448407063813165219?logo=discord&logoColor=white&label=Discord&color=7289DA)](https://ibm.biz/mellea-discord) - -## Features - - * A standard library of opinionated prompting patterns. - * Sampling strategies for inference-time scaling. - * Clean integration between verifiers and samplers. - - Batteries-included library of verifiers. - - Support for efficient checking of specialized requirements using - activated LoRAs. - - Train your own verifiers on proprietary classifier data. - * Compatible with many inference services and model families. Control cost - and quality by easily lifting and shifting workloads between: - - inference providers - - model families - - model sizes - * Easily integrate the power of LLMs into legacy code-bases (mify). - * Sketch applications by writing specifications and letting `mellea` fill in - the details (generative slots). - * Get started by decomposing your large unwieldy prompts into structured and maintainable mellea problems. - - - -## Getting Started - -You can get started with a local install, or by using Colab notebooks. - -### Getting Started with Local Inference - - - -Install with [uv](https://docs.astral.sh/uv/getting-started/installation/): +## Install ```bash uv pip install mellea ``` -Install with pip: +See [installation docs](https://docs.mellea.ai/getting-started/installation) for extras (`[hf]`, `[watsonx]`, `[docling]`, `[all]`, …) and source installation. -```bash -pip install mellea -``` - -> [!NOTE] -> `mellea` comes with some additional packages as defined in our `pyproject.toml`. 
If you would like to install all the extra optional dependencies, please run the following commands: -> -> ```bash -> uv pip install "mellea[hf]" # for Huggingface extras and Alora capabilities -> uv pip install "mellea[watsonx]" # for watsonx backend -> uv pip install "mellea[docling]" # for docling -> uv pip install "mellea[smolagents]" # for HuggingFace smolagents tools -> uv pip install "mellea[all]" # for all the optional dependencies -> ``` -> -> You can also install all the optional dependencies with `uv sync --all-extras` - -> [!NOTE] -> If running on an Intel mac, you may get errors related to torch/torchvision versions. Conda maintains updated versions of these packages. You will need to create a conda environment and run `conda install 'torchvision>=0.22.0'` (this should also install pytorch and torchvision-extra). Then, you should be able to run `uv pip install mellea`. To run the examples, you will need to use `python ` inside the conda environment instead of `uv run --with mellea `. - -> [!NOTE] -> If you are using python >= 3.13, you may encounter an issue where outlines cannot be installed due to rust compiler issues (`error: can't find Rust compiler`). You can either downgrade to python 3.12 or install the [rust compiler](https://www.rust-lang.org/tools/install) to build the wheel for outlines locally. - -For running a simple LLM request locally (using Ollama with Granite model), this is the starting code: -```python -# file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/example.py -import mellea - -m = mellea.start_session() -print(m.chat("What is the etymology of mellea?").content) -``` - - -Then run it: -> [!NOTE] -> Before we get started, you will need to download and install [ollama](https://ollama.com/). Mellea can work with many different types of backends, but everything in this tutorial will "just work" on a Macbook running IBM's Granite 4 Micro 3B model. 
-```shell -uv run --with mellea docs/examples/tutorial/example.py -``` - -### Get Started with Colab - -| Notebook | Try in Colab | Goal | -|----------|--------------|------| -| Hello, World | Open In Colab | Quick‑start demo | -| Simple Email | Open In Colab | Using the `m.instruct` primitive | -| Instruct-Validate-Repair | Open In Colab | Introduces our first generative programming design pattern | -| Model Options | Open In Colab | Demonstrates how to pass model options through to backends | -| Sentiment Classifier | Open In Colab | Introduces the `@generative` decorator | -| Managing Context | Open In Colab | Shows how to construct and manage context in a `MelleaSession` | -| Generative OOP | Open In Colab | Demonstrates object-oriented generative programming in Mellea | -| Rich Documents | Open In Colab | A generative program that uses Docling to work with rich-text documents | -| Composing Generative Functions | Open In Colab | Demonstrates contract-oriented programming in Mellea | -| `m serve` | Open In Colab | Serve a generative program as an openai-compatible model endpoint | -| MCP | Open In Colab | Mellea + MCP | - - -### Installing from Source - -If you want to contribute to Mellea or need the latest development version, see the -[Getting Started](CONTRIBUTING.md#getting-started) section in our Contributing Guide for -detailed installation instructions. - -## Getting started with validation - -Mellea supports validation of generation results through a **instruct-validate-repair** pattern. -Below, the request for *"Write an email.."* is constrained by the requirements of *"be formal"* and *"Use 'Dear interns' as greeting."*. -Using a simple rejection sampling strategy, the request is sent up to three (loop_budget) times to the model and -the output is checked against the constraints using (in this case) LLM-as-a-judge. 
- - -```python -# file: https://github.com/generative-computing/mellea/blob/main/docs/examples/instruct_validate_repair/101_email_with_validate.py -from mellea import MelleaSession -from mellea.backends import ModelOption -from mellea.backends.ollama import OllamaModelBackend -from mellea.backends import model_ids -from mellea.stdlib.sampling import RejectionSamplingStrategy - -# create a session with Mistral running on Ollama -m = MelleaSession( - backend=OllamaModelBackend( - model_id=model_ids.MISTRALAI_MISTRAL_0_3_7B, - model_options={ModelOption.MAX_NEW_TOKENS: 300}, - ) -) - -# run an instruction with requirements -email_v1 = m.instruct( - "Write an email to invite all interns to the office party.", - requirements=["be formal", "Use 'Dear interns' as greeting."], - strategy=RejectionSamplingStrategy(loop_budget=3), -) - -# print result -print(f"***** email ****\n{str(email_v1)}\n*******") -``` - - -## Getting Started with Generative Slots - -Generative slots allow you to define functions without implementing them. -The `@generative` decorator marks a function as one that should be interpreted by querying an LLM. -The example below demonstrates how an LLM's sentiment classification -capability can be wrapped up as a function using Mellea's generative slots and -a local LLM. +## Example +The `@generative` decorator turns a typed Python function into a structured LLM call. 
+Docstrings become prompts, type hints become schemas — no parsers, no chains: ```python -# file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/sentiment_classifier.py#L1-L13 -from typing import Literal +from pydantic import BaseModel from mellea import generative, start_session +class UserProfile(BaseModel): + name: str + age: int @generative -def classify_sentiment(text: str) -> Literal["positive", "negative"]: - """Classify the sentiment of the input text as 'positive' or 'negative'.""" +def extract_user(text: str) -> UserProfile: + """Extract the user's name and age from the text.""" - -if __name__ == "__main__": - m = start_session() - sentiment = classify_sentiment(m, text="I love this!") - print("Output sentiment is:", sentiment) +m = start_session() +user = extract_user(m, text="User log 42: Alice is 31 years old.") +print(user.name) # Alice +print(user.age) # 31 — always an int, guaranteed by the schema ``` +## Learn More -## Contributing +| Resource | | +|---|---| +| [mellea.ai](https://mellea.ai) | Vision, features, and live demos | +| [docs.mellea.ai](https://docs.mellea.ai) | Full docs — tutorials, API reference, how-to guides | +| [Colab notebooks](docs/examples/notebooks/) | Interactive examples you can run immediately | +| [Code examples](docs/examples/) | Runnable examples: RAG, agents, IVR, MObjects, and more | -We welcome contributions to Mellea! There are several ways to contribute: +## Contributing -1. **Contributing to this repository** - Core features, bug fixes, standard library components -2. **Applications & Libraries** - Build tools using Mellea (host in your own repo with `mellea-` prefix) -3. **Community Components** - Contribute to [mellea-contribs](https://github.com/generative-computing/mellea-contribs) +We welcome contributions of all kinds — bug fixes, new backends, standard library components, examples, and docs. 
-Please see our **[Contributing Guide](CONTRIBUTING.md)** for detailed information on: -- Getting started with development -- Coding standards and workflow -- Testing guidelines -- How to contribute specific types of components +- **[Contributing Guide](https://docs.mellea.ai/community/contributing-guide)** — development setup, workflow, and coding standards +- **[Building Extensions](https://docs.mellea.ai/community/building-extensions)** — create reusable components in your own repo +- **[mellea-contribs](https://github.com/generative-computing/mellea-contribs)** — community library for shared components -Questions? Join our [Discord](https://ibm.biz/mellea-discord)! +Questions? Join our [Discord](https://ibm.biz/mellea-discord). ### IBM ❤️ Open Source AI -Mellea has been started by IBM Research in Cambridge, MA. - +Mellea was started by IBM Research in Cambridge, MA. +--- +Licensed under the [Apache-2.0 License](LICENSE). Copyright © 2026 Mellea. From 611358528b470a2901e1d2e9f8c83b0fc37f788d Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:25:48 +0000 Subject: [PATCH 02/13] docs: link contributing guide to CONTRIBUTING.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 442d8fa55..464cf8627 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema We welcome contributions of all kinds — bug fixes, new backends, standard library components, examples, and docs. 
-- **[Contributing Guide](https://docs.mellea.ai/community/contributing-guide)** — development setup, workflow, and coding standards +- **[Contributing Guide](CONTRIBUTING.md)** — development setup, workflow, and coding standards - **[Building Extensions](https://docs.mellea.ai/community/building-extensions)** — create reusable components in your own repo - **[mellea-contribs](https://github.com/generative-computing/mellea-contribs)** — community library for shared components From 5b291d07b3557097b5c10928f3ce4769ad4980ce Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:29:12 +0000 Subject: [PATCH 03/13] docs: fix license badge link, vision statement, IVR spelling, wording tweaks --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 464cf8627..1443f6e65 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,8 @@ Inside every AI-powered pipeline, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs, no guarantees. -Mellea wraps those calls in Python you can read, test, and reason about — -type-annotated outputs, verifiable requirements, automatic retries. +Mellea is a Python library for writing *generative programs* — replacing brittle prompts and flaky agents +with structured, testable AI workflows built around type-annotated outputs, verifiable requirements, and automatic retries. [//]: # ([![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869)) [![Website](https://img.shields.io/badge/website-mellea.ai-blue)](https://mellea.ai/) @@ -15,7 +15,7 @@ type-annotated outputs, verifiable requirements, automatic retries. 
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv) [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit) -[![GitHub License](https://img.shields.io/github/license/generative-computing/mellea)](https://img.shields.io/github/license/generative-computing/mellea) +[![GitHub License](https://img.shields.io/github/license/generative-computing/mellea)](https://github.com/generative-computing/mellea/blob/main/LICENSE) [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-3.0-4baaaa.svg)](CODE_OF_CONDUCT.md) [![Discord](https://img.shields.io/discord/1448407063813165219?logo=discord&logoColor=white&label=Discord&color=7289DA)](https://ibm.biz/mellea-discord) @@ -30,7 +30,7 @@ See [installation docs](https://docs.mellea.ai/getting-started/installation) for ## Example The `@generative` decorator turns a typed Python function into a structured LLM call. 
-Docstrings become prompts, type hints become schemas — no parsers, no chains: +Docstrings become prompts, type hints become schemas — no templates, no parsers: ```python from pydantic import BaseModel @@ -57,7 +57,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema | [mellea.ai](https://mellea.ai) | Vision, features, and live demos | | [docs.mellea.ai](https://docs.mellea.ai) | Full docs — tutorials, API reference, how-to guides | | [Colab notebooks](docs/examples/notebooks/) | Interactive examples you can run immediately | -| [Code examples](docs/examples/) | Runnable examples: RAG, agents, IVR, MObjects, and more | +| [Code examples](docs/examples/) | Runnable examples: RAG, agents, Instruct-Validate-Repair (IVR), MObjects, and more | ## Contributing From 19e205e4c9115acf63ec52a854a75490cc7b1d5d Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:30:35 +0000 Subject: [PATCH 04/13] docs: replace Discord link with GitHub Discussions --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1443f6e65..cf26811ab 100644 --- a/README.md +++ b/README.md @@ -67,7 +67,7 @@ We welcome contributions of all kinds — bug fixes, new backends, standard libr - **[Building Extensions](https://docs.mellea.ai/community/building-extensions)** — create reusable components in your own repo - **[mellea-contribs](https://github.com/generative-computing/mellea-contribs)** — community library for shared components -Questions? Join our [Discord](https://ibm.biz/mellea-discord). +Questions? Open a [GitHub Discussion](https://github.com/generative-computing/mellea/discussions). 
### IBM ❤️ Open Source AI From d1167a3d016e95c86ca725b7fa01faf741d9a827 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:31:23 +0000 Subject: [PATCH 05/13] docs: remove Discord badge --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index cf26811ab..c856984c1 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,6 @@ with structured, testable AI workflows built around type-annotated outputs, veri [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit) [![GitHub License](https://img.shields.io/github/license/generative-computing/mellea)](https://github.com/generative-computing/mellea/blob/main/LICENSE) [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-3.0-4baaaa.svg)](CODE_OF_CONDUCT.md) -[![Discord](https://img.shields.io/discord/1448407063813165219?logo=discord&logoColor=white&label=Discord&color=7289DA)](https://ibm.biz/mellea-discord) ## Install From 485d53f16b274bb5ae711eecd66159e7bba79a12 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:34:51 +0000 Subject: [PATCH 06/13] docs: use GitHub Discussions, fix table header --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index c856984c1..4fb7ef9ce 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema ## Learn More -| Resource | | +| Resource | Description | |---|---| | [mellea.ai](https://mellea.ai) | Vision, features, and live demos | | [docs.mellea.ai](https://docs.mellea.ai) | Full docs — tutorials, API reference, how-to guides | @@ -66,7 +66,7 @@ We welcome contributions of all kinds — bug fixes, new backends, standard libr - **[Building Extensions](https://docs.mellea.ai/community/building-extensions)** — create reusable components in your own repo - 
**[mellea-contribs](https://github.com/generative-computing/mellea-contribs)** — community library for shared components -Questions? Open a [GitHub Discussion](https://github.com/generative-computing/mellea/discussions). +Questions? See [GitHub Discussions](https://github.com/generative-computing/mellea/discussions). ### IBM ❤️ Open Source AI From 214ef8c83145f3b5b2e65a84c5144cbc72bf9531 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:35:06 +0000 Subject: [PATCH 07/13] docs: fix landing page description --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4fb7ef9ce..c0324c3e1 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema | Resource | Description | |---|---| -| [mellea.ai](https://mellea.ai) | Vision, features, and live demos | +| [mellea.ai](https://mellea.ai) | Vision and features | | [docs.mellea.ai](https://docs.mellea.ai) | Full docs — tutorials, API reference, how-to guides | | [Colab notebooks](docs/examples/notebooks/) | Interactive examples you can run immediately | | [Code examples](docs/examples/) | Runnable examples: RAG, agents, Instruct-Validate-Repair (IVR), MObjects, and more | From 62e44370dd8ca09b448861c0bbad0126fd556b0e Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 11:42:32 +0000 Subject: [PATCH 08/13] docs: add capabilities section, fix table style --- README.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c0324c3e1..787098c35 100644 --- a/README.md +++ b/README.md @@ -49,10 +49,19 @@ print(user.name) # Alice print(user.age) # 31 — always an int, guaranteed by the schema ``` +## What Mellea Does + +- **Structured output** — `@generative` turns typed functions into LLM calls; Pydantic schemas are enforced at generation time +- **Requirements & repair** — attach natural-language requirements to any call; Mellea validates 
and retries automatically +- **Sampling strategies** — rejection sampling, majority voting, inference-time scaling with one parameter change +- **Multiple backends** — Ollama, OpenAI, vLLM, HuggingFace, WatsonX, LiteLLM, Bedrock +- **Legacy integration** — drop Mellea into existing codebases with `mify` +- **MCP compatible** — expose any generative program as an MCP tool + ## Learn More | Resource | Description | -|---|---| +| --- | --- | | [mellea.ai](https://mellea.ai) | Vision and features | | [docs.mellea.ai](https://docs.mellea.ai) | Full docs — tutorials, API reference, how-to guides | | [Colab notebooks](docs/examples/notebooks/) | Interactive examples you can run immediately | From b4a3588ad1809dae2ac63c721dbacd0598f5cf03 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 12:43:45 +0000 Subject: [PATCH 09/13] Update README.md Co-authored-by: Paul Schweigert --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 787098c35..95be4943a 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ # Mellea — build predictable AI without guesswork -Inside every AI-powered pipeline, the unreliable part is the same: the LLM call itself. +Inside every AI-powered pipeline, the unreliable part is the same: the LLM calls itself. Silent failures, untestable outputs, no guarantees. Mellea is a Python library for writing *generative programs* — replacing brittle prompts and flaky agents with structured, testable AI workflows built around type-annotated outputs, verifiable requirements, and automatic retries. 
From 99680fe21cf3ae45f00735efefcbe324c1c24e1a Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 12:46:41 +0000 Subject: [PATCH 10/13] Update README.md Co-authored-by: Paul Schweigert --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 95be4943a..5074d8058 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema - **Requirements & repair** — attach natural-language requirements to any call; Mellea validates and retries automatically - **Sampling strategies** — rejection sampling, majority voting, inference-time scaling with one parameter change - **Multiple backends** — Ollama, OpenAI, vLLM, HuggingFace, WatsonX, LiteLLM, Bedrock -- **Legacy integration** — drop Mellea into existing codebases with `mify` +- **Legacy integration** — easily drop Mellea into existing codebases with `mify` - **MCP compatible** — expose any generative program as an MCP tool ## Learn More From 15afd1c9fa94de94c85287e09355b20fb1442201 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 13:03:23 +0000 Subject: [PATCH 11/13] docs: fix grammar, clarify sampling strategies description --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5074d8058..6c8cfea5f 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ # Mellea — build predictable AI without guesswork -Inside every AI-powered pipeline, the unreliable part is the same: the LLM calls itself. +Inside every AI-powered pipeline, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs, no guarantees. Mellea is a Python library for writing *generative programs* — replacing brittle prompts and flaky agents with structured, testable AI workflows built around type-annotated outputs, verifiable requirements, and automatic retries. 
@@ -53,7 +53,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema - **Structured output** — `@generative` turns typed functions into LLM calls; Pydantic schemas are enforced at generation time - **Requirements & repair** — attach natural-language requirements to any call; Mellea validates and retries automatically -- **Sampling strategies** — rejection sampling, majority voting, inference-time scaling with one parameter change +- **Sampling strategies** — run a generation multiple times and pick the best result; swap between rejection sampling, majority voting, and more with one parameter change - **Multiple backends** — Ollama, OpenAI, vLLM, HuggingFace, WatsonX, LiteLLM, Bedrock - **Legacy integration** — easily drop Mellea into existing codebases with `mify` - **MCP compatible** — expose any generative program as an MCP tool From 62016fb6eafddc1f133864e711f9056c0bd5cb21 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 14:26:45 +0000 Subject: [PATCH 12/13] ci: gate docstring quality and coverage in CI (#616) Add a hard-fail docstring quality gate to the docs-publish workflow: - New 'Docstring quality gate' step runs --quality --fail-on-quality --threshold 100; fails if any quality issue is found or coverage drops below 100% (both currently pass in CI) - Existing audit_coverage step (soft-fail, threshold 80) retained for the summary coverage metric Add typeddict_mismatch checks to audit_coverage.py: - typeddict_phantom: Attributes: documents a field not declared in the TypedDict - typeddict_undocumented: declared field absent from Attributes: section - Mirrors the existing param_mismatch logic for functions Pre-commit: enable --fail-on-quality on the manual-stage hook (CI is the hard gate; hook remains stages: [manual] as docs must be pre-built). Update CONTRIBUTING.md and docs/docs/guide/CONTRIBUTING.md with TypedDict docstring requirements and the two new audit check kinds. 
--- .github/workflows/docs-publish.yml | 42 ++++++++----------- .pre-commit-config.yaml | 11 +++-- CONTRIBUTING.md | 21 ++++++++++ docs/docs/guide/CONTRIBUTING.md | 5 +++ tooling/docs-autogen/audit_coverage.py | 56 ++++++++++++++++++++++++-- 5 files changed, 100 insertions(+), 35 deletions(-) diff --git a/.github/workflows/docs-publish.yml b/.github/workflows/docs-publish.yml index 77ad0f4a5..dc7494863 100644 --- a/.github/workflows/docs-publish.yml +++ b/.github/workflows/docs-publish.yml @@ -105,10 +105,17 @@ jobs: id: audit_coverage run: | set -o pipefail - uv run python tooling/docs-autogen/audit_coverage.py --docs-dir docs/docs/api --threshold 80 --quality 2>&1 \ + uv run python tooling/docs-autogen/audit_coverage.py --docs-dir docs/docs/api --threshold 80 2>&1 \ | tee /tmp/audit_coverage.log continue-on-error: ${{ inputs.strict_validation != true }} + - name: Docstring quality gate + id: quality_gate + run: | + set -o pipefail + uv run python tooling/docs-autogen/audit_coverage.py --docs-dir docs/docs/api --quality --fail-on-quality --threshold 100 2>&1 \ + | tee /tmp/quality_gate.log + # -- Upload artifact for deploy job -------------------------------------- - name: Upload docs artifact @@ -141,12 +148,14 @@ jobs: markdownlint_outcome = "${{ steps.markdownlint.outcome }}" validate_outcome = "${{ steps.validate_mdx.outcome }}" coverage_outcome = "${{ steps.audit_coverage.outcome }}" + quality_gate_outcome = "${{ steps.quality_gate.outcome }}" strict = "${{ inputs.strict_validation }}" == "true" mode = "" if strict else " *(soft-fail)*" lint_log = read_log("/tmp/markdownlint.log") validate_log = read_log("/tmp/validate_mdx.log") coverage_log = read_log("/tmp/audit_coverage.log") + quality_gate_log = read_log("/tmp/quality_gate.log") # Count markdownlint issues (lines matching file:line:col format) lint_issues = len([l for l in lint_log.splitlines() if re.match(r'.+:\d+:\d+ ', l)]) @@ -186,27 +195,11 @@ jobs: mdx_detail = parse_validate_detail(validate_log) - 
# Docstring quality annotation emitted by audit_coverage.py into the log + # Parse docstring quality annotation from quality gate log # Format: ::notice title=Docstring quality::message - # or ::warning title=Docstring quality::message - quality_match = re.search(r"::(notice|warning|error) title=Docstring quality::(.+)", coverage_log) - if quality_match: - quality_level, quality_msg = quality_match.group(1), quality_match.group(2) - quality_icon = "✅" if quality_level == "notice" else "⚠️" - quality_status = "pass" if quality_level == "notice" else "warning" - quality_detail = re.sub(r"\s*—\s*see job summary.*$", "", quality_msg) - quality_row = f"| Docstring Quality | {quality_icon} {quality_status}{mode} | {quality_detail} |" - else: - quality_row = None - - # Split coverage log at quality section to avoid duplicate output in collapsibles - quality_start = coverage_log.find("🔬 Running docstring quality") - if quality_start != -1: - quality_log = coverage_log[quality_start:] - coverage_display_log = coverage_log[:quality_start].strip() - else: - quality_log = "" - coverage_display_log = coverage_log + # or ::error title=Docstring quality::message + quality_gate_match = re.search(r"::(notice|warning|error) title=Docstring quality::(.+)", quality_gate_log) + quality_gate_detail = re.sub(r"\s*—\s*see job summary.*$", "", quality_gate_match.group(2)) if quality_gate_match else "" lines = [ "## Docs Build — Validation Summary\n", @@ -215,16 +208,15 @@ jobs: f"| Markdownlint | {icon(markdownlint_outcome)} {markdownlint_outcome}{mode} | {lint_detail} |", f"| MDX Validation | {icon(validate_outcome)} {validate_outcome}{mode} | {mdx_detail} |", f"| API Coverage | {icon(coverage_outcome)} {coverage_outcome}{mode} | {cov_detail} |", + f"| Docstring Quality | {icon(quality_gate_outcome)} {quality_gate_outcome} | {quality_gate_detail} |", ] - if quality_row: - lines.append(quality_row) lines.append("") for title, log, limit in [ ("Markdownlint output", lint_log, 5_000), ("MDX 
validation output", validate_log, 5_000), - ("API coverage output", coverage_display_log, 5_000), - ("Docstring quality details", quality_log, 1_000_000), + ("API coverage output", coverage_log, 5_000), + ("Docstring quality details", quality_gate_log, 1_000_000), ]: if log: lines += [ diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 302fe676a..513ff68bb 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -51,13 +51,12 @@ repos: language: system pass_filenames: false files: (docs/docs/.*\.mdx$|tooling/docs-autogen/) - # TODO(#616): Move to normal commit flow once docstring quality issues reach 0. - # Griffe loads the full package (~10s), so this is manual-only for now to avoid - # slowing down every Python commit. Re-enable (remove stages: [manual]) and add - # --fail-on-quality once quality issues are resolved. + # Docstring quality gate — manual only (CI is the hard gate via docs-publish.yml). + # Run locally with: pre-commit run docs-docstring-quality --hook-stage manual + # Requires generated API docs (run `uv run python tooling/docs-autogen/build.py` first). - id: docs-docstring-quality - name: Audit docstring quality (informational) - entry: bash -c 'test -d docs/docs/api && uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --docs-dir docs/docs/api || true' + name: Audit docstring quality + entry: uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --fail-on-quality --threshold 0 --docs-dir docs/docs/api language: system pass_filenames: false files: (mellea/.*\.py$|cli/.*\.py$) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index fc2b12b09..0e0cb9918 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -174,6 +174,25 @@ differs in type or behaviour from the constructor input — for example, when a argument is wrapped into a `CBlock`, or when a class-level constant is relevant to callers. Pure-echo entries that repeat `Args:` verbatim should be omitted. 
+**`TypedDict` classes are a special case.** Their fields *are* the entire public +contract, so when an `Attributes:` section is present it must exactly match the +declared fields. The audit will flag: + +- `typeddict_phantom` — `Attributes:` documents a field that is not declared in the `TypedDict` +- `typeddict_undocumented` — a declared field is absent from the `Attributes:` section + +```python +class ConstraintResult(TypedDict): + """Result of a constraint check. + + Attributes: + passed: Whether the constraint was satisfied. + reason: Human-readable explanation. + """ + passed: bool + reason: str +``` + #### Validating docstrings Run the coverage and quality audit to check your changes before committing: @@ -194,6 +213,8 @@ Key checks the audit enforces: | `no_args` | Standalone function has params but no `Args:` section | | `no_returns` | Function has a non-trivial return annotation but no `Returns:` section | | `param_mismatch` | `Args:` documents names not present in the actual signature | +| `typeddict_phantom` | `TypedDict` `Attributes:` documents a field not declared in the class | +| `typeddict_undocumented` | `TypedDict` has a declared field absent from its `Attributes:` section | **IDE hover verification** — open any of these existing classes in VS Code and hover over the class name or a constructor call to confirm the hover card shows `Args:` once diff --git a/docs/docs/guide/CONTRIBUTING.md b/docs/docs/guide/CONTRIBUTING.md index fd0a58434..e92e4a5e4 100644 --- a/docs/docs/guide/CONTRIBUTING.md +++ b/docs/docs/guide/CONTRIBUTING.md @@ -353,6 +353,11 @@ Add `Attributes:` only when a stored value differs in type or behaviour from the input (e.g. a `str` wrapped into a `CBlock`, or a class-level constant). Pure-echo entries that repeat `Args:` verbatim should be omitted. +**`TypedDict` classes** are a special case — their fields are the entire public contract, +so when an `Attributes:` section is present it must exactly match the declared fields. 
+The CI audit will fail on phantom fields (documented but not declared) and undocumented +fields (declared but missing from `Attributes:`). + See [CONTRIBUTING.md](../../CONTRIBUTING.md) for the full validation workflow. --- diff --git a/tooling/docs-autogen/audit_coverage.py b/tooling/docs-autogen/audit_coverage.py index eb9506c9f..353ae3288 100755 --- a/tooling/docs-autogen/audit_coverage.py +++ b/tooling/docs-autogen/audit_coverage.py @@ -102,6 +102,7 @@ def walk_module(module, module_path: str): # --------------------------------------------------------------------------- _ARGS_RE = re.compile(r"^\s*(Args|Arguments|Parameters)\s*:", re.MULTILINE) +_TYPEDDICT_BASES = re.compile(r"\bTypedDict\b") _RETURNS_RE = re.compile(r"^\s*Returns\s*:", re.MULTILINE) _YIELDS_RE = re.compile(r"^\s*Yields\s*:", re.MULTILINE) _RAISES_RE = re.compile(r"^\s*Raises\s*:", re.MULTILINE) @@ -274,6 +275,45 @@ def _check_member(member, full_path: str, short_threshold: int) -> list[dict]: } ) + # TypedDict field mismatch check. + # Unlike regular classes (where Attributes: is optional under Option C), + # TypedDict fields *are* the entire public contract. When an Attributes: + # section exists, every entry must match an actual declared field and every + # declared field must appear — stale or missing entries are always a bug. 
+ is_typeddict = any( + _TYPEDDICT_BASES.search(str(base)) + for base in getattr(member, "bases", []) + ) + if is_typeddict and _ATTRIBUTES_RE.search(doc_text): + attrs_block = re.search( + r"Attributes\s*:(.*?)(?:\n\s*\n|\Z)", doc_text, re.DOTALL + ) + if attrs_block: + doc_field_names = set(_ARGS_ENTRY_RE.findall(attrs_block.group(1))) + actual_fields = { + name + for name, m in member.members.items() + if not name.startswith("_") and getattr(m, "is_attribute", False) + } + phantom = doc_field_names - actual_fields + if phantom: + issues.append( + { + "path": full_path, + "kind": "typeddict_phantom", + "detail": f"Attributes: documents {sorted(phantom)} not declared in TypedDict", + } + ) + undocumented = actual_fields - doc_field_names + if undocumented: + issues.append( + { + "path": full_path, + "kind": "typeddict_undocumented", + "detail": f"TypedDict fields {sorted(undocumented)} missing from Attributes: section", + } + ) + return issues @@ -296,11 +336,15 @@ def audit_docstring_quality( - no_class_args: class whose __init__ has typed params but no Args section on the class - duplicate_init_args: Args: present in both class docstring and __init__ (Option C violation) - param_mismatch: Args section documents names absent from the real signature + - typeddict_phantom: TypedDict Attributes: section documents fields not declared in the class + - typeddict_undocumented: TypedDict has declared fields absent from its Attributes: section - Note: Attributes: sections are intentionally not enforced. Under the Option C - convention, Attributes: is only used when stored values differ in type or - behaviour from the constructor inputs (e.g. type transforms, computed values, - class constants). Pure-echo entries that repeat Args: verbatim are omitted. + Note: Attributes: sections are intentionally not enforced for regular classes. Under + the Option C convention, Attributes: is only used when stored values differ in type or + behaviour from the constructor inputs (e.g. 
type transforms, computed values, class + constants). Pure-echo entries that repeat Args: verbatim are omitted. TypedDicts are + a carve-out: their fields are the entire public contract, so when an Attributes: + section is present it must exactly match the declared fields. Only symbols (and methods whose parent class) present in `documented` are checked when that set is provided — ensuring the audit is scoped to what is @@ -401,6 +445,8 @@ def _print_quality_report(issues: list[dict]) -> None: "no_class_args": "Missing class Args section", "duplicate_init_args": "Duplicate Args: in class + __init__ (Option C violation)", "param_mismatch": "Param name mismatches (documented but not in signature)", + "typeddict_phantom": "TypedDict phantom fields (documented but not declared)", + "typeddict_undocumented": "TypedDict undocumented fields (declared but missing from Attributes:)", } total = len(issues) @@ -419,6 +465,8 @@ def _print_quality_report(issues: list[dict]) -> None: "no_class_args", "duplicate_init_args", "param_mismatch", + "typeddict_phantom", + "typeddict_undocumented", ): items = by_kind.get(kind, []) if not items: From d777cfc6a29de1a4a36b79db5444b321e0b29a7b Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 18 Mar 2026 18:05:27 +0000 Subject: [PATCH 13/13] fix: always populate mot.usage in HuggingFace backend (#694) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Token count extraction in _post_process_async was gated behind `span is not None or metrics_enabled`, so mot.usage was never populated in plain (non-telemetry) runs. Now extracted unconditionally — usage is a standard mot field, not a telemetry concern. 
--- mellea/backends/huggingface.py | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/mellea/backends/huggingface.py b/mellea/backends/huggingface.py index 424d5b2f3..e6236e5c6 100644 --- a/mellea/backends/huggingface.py +++ b/mellea/backends/huggingface.py @@ -1133,18 +1133,11 @@ class used during generation, if any. ) span = mot._meta.get("_telemetry_span") - from ..telemetry.metrics import is_metrics_enabled - metrics_enabled = is_metrics_enabled() - - # Extract token counts only if needed + # Derive token counts from the output sequences (HF models have no usage object). hf_output = mot._meta.get("hf_output") n_prompt, n_completion = None, None - if (span is not None or metrics_enabled) and isinstance( - hf_output, GenerateDecoderOnlyOutput - ): - # HuggingFace local models don't provide usage objects, but we can - # calculate token counts from sequences + if isinstance(hf_output, GenerateDecoderOnlyOutput): try: if input_ids is not None and hf_output.sequences is not None: n_prompt = input_ids.shape[1] @@ -1152,7 +1145,6 @@ class used during generation, if any. except Exception: pass - # Populate standardized usage field (convert to OpenAI format) if n_prompt is not None and n_completion is not None: mot.usage = { "prompt_tokens": n_prompt,
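The pattern the final patch settles on — derive prompt/completion token counts from output shapes unconditionally, then populate an OpenAI-style `usage` dict — can be sketched in isolation. This is a minimal stand-in, not Mellea's backend code: the `derive_usage` helper and the plain-list "tensors" are illustrative substitutes for `transformers.GenerateDecoderOnlyOutput` and torch tensors, whose sequences hold prompt plus completion tokens per sample.

```python
def derive_usage(prompt_len, sequences):
    """Return an OpenAI-style usage dict, or None if counts can't be derived.

    Each row of `sequences` contains the prompt tokens followed by the
    generated tokens, so completion length is total length minus prompt length.
    Mirrors the patch's try/except: any shape surprise yields no usage rather
    than an error.
    """
    try:
        n_prompt = prompt_len
        n_completion = len(sequences[0]) - prompt_len
    except Exception:
        return None
    if n_completion < 0:
        return None
    return {
        "prompt_tokens": n_prompt,
        "completion_tokens": n_completion,
        "total_tokens": n_prompt + n_completion,
    }


# 3 prompt tokens followed by 4 generated tokens in a single output sequence.
usage = derive_usage(3, [[1, 2, 3, 10, 11, 12, 13]])
# usage == {"prompt_tokens": 3, "completion_tokens": 4, "total_tokens": 7}
```

Computing this unconditionally, as the patch does, keeps `usage` a property of every generation rather than a side effect of telemetry being enabled.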