
feat: add MCP server support for CDF data models #618

Draft
ks93 wants to merge 8 commits into main from feat/mcp-server

Conversation

ks93 (Contributor) commented Jan 26, 2026

Summary

Adds MCP (Model Context Protocol) server support to pygen, enabling LLMs to interact with CDF data models through dynamically generated tools.

Features

  • Zero-config UX: Launch MCP server directly from a data model ID with interactive OAuth login
  • Dynamic tool generation: Automatically creates MCP tools based on generated SDK methods
  • Full parameter introspection: Exposes all SDK filter parameters with types and descriptions from docstrings
  • Correct required/optional handling: Required parameters (like aggregate in aggregate tools) are properly enforced
  • View filtering: Use --views to include only specific views (reduces tool count)
  • Operation filtering: Use --operations (short form --ops) to enable only specific operations (list, search, etc.)
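
The parameter introspection described in the list above could look roughly like the following. This is a minimal sketch and not pygen's actual implementation: `parse_arg_descriptions`, `describe_parameters`, the sample `list_assets` function, and the Google-style `Args:` docstring layout are all assumptions made for illustration.

```python
import inspect
import re
from typing import Optional


def parse_arg_descriptions(docstring: str) -> dict:
    """Collect 'name: description' pairs from a Google-style Args section."""
    descriptions = {}
    in_args = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and re.fullmatch(r"\w+:", stripped):
            break  # a new section such as "Returns:" ends the Args block
        elif in_args:
            match = re.match(r"(\w+)(?:\s*\([^)]*\))?:\s*(.+)", stripped)
            if match:
                descriptions[match.group(1)] = match.group(2)
    return descriptions


def describe_parameters(func) -> list:
    """Pair every parameter in the signature with its type and docstring text."""
    descriptions = parse_arg_descriptions(inspect.getdoc(func) or "")
    return [
        {
            "name": name,
            "annotation": param.annotation,
            "description": descriptions.get(name, ""),
        }
        for name, param in inspect.signature(func).parameters.items()
        if name != "self"
    ]


# Hypothetical stand-in for a generated SDK method.
def list_assets(name: Optional[str] = None, limit: int = 25):
    """List assets.

    Args:
        name: Exact name to match.
        limit: Maximum number of items to return.

    Returns:
        A list of assets.
    """
    return []


for spec in describe_parameters(list_assets):
    print(spec["name"], "-", spec["description"])
```

Reading both the signature and the docstring keeps the tool description in step with the generated SDK, so nothing has to be maintained by hand.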

Supported Operations

| Operation | Description |
|-----------|-------------|
| list | List/filter items with full filter support |
| retrieve | Retrieve single item by external_id |
| search | Full-text search with filters |
| aggregate | Count, sum, avg, min, max with optional group_by |
| histogram | Generate histograms for numeric properties |
| delete | Delete items (opt-in via --write flag) |
| graphql_query | Execute GraphQL queries (opt-in via --graphql flag) |

Type Support

  • Datetime: Accepts ISO 8601 strings, converts to datetime.datetime
  • Direct relations: Accepts {"space": str, "externalId": str} objects
  • List of direct relations: Accepts arrays of node reference objects
  • All primitive types: str, int, float, bool
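
As an illustration of the conversions listed above, the sketch below coerces incoming tool arguments before they reach the SDK. The helper names (`parse_iso_datetime`, `parse_node_reference`, `parse_node_references`) are hypothetical, and returning a `(space, external_id)` tuple is an assumption; the PR does not show which representation the SDK expects.

```python
from datetime import datetime, timezone


def parse_iso_datetime(value: str) -> datetime:
    """Convert an ISO 8601 string to a datetime; a trailing 'Z' means UTC."""
    if value.endswith("Z"):
        # datetime.fromisoformat only accepts 'Z' from Python 3.11 onwards.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_node_reference(value: dict) -> tuple:
    """Validate a {"space": ..., "externalId": ...} object."""
    try:
        space, external_id = value["space"], value["externalId"]
    except (KeyError, TypeError) as exc:
        raise ValueError('expected {"space": str, "externalId": str}') from exc
    if not isinstance(space, str) or not isinstance(external_id, str):
        raise ValueError("space and externalId must both be strings")
    return space, external_id


def parse_node_references(values: list) -> list:
    """Validate an array of node reference objects."""
    return [parse_node_reference(item) for item in values]


print(parse_iso_datetime("2026-01-26T10:00:00Z"))
print(parse_node_references([{"space": "my_space", "externalId": "pump_1"}]))
```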

CLI Usage

# Basic usage
pygen mcp-serve "space/dataModel/version" --cluster greenfield --project myproject

# With optional features
pygen mcp-serve "space/dataModel/version" --cluster greenfield --project myproject --graphql --write

# Filter to specific views (reduces tool count for clients with limits)
pygen mcp-serve "space/dataModel/version" -c greenfield -p myproject --views Asset,Equipment

# Enable only specific operations
pygen mcp-serve "space/dataModel/version" -c greenfield -p myproject --ops list,search

Installation

# Install pygen globally with MCP support
uv tool install "cognite-pygen[cli,mcp]" --with pandas

# Or install from local development repo (editable)
uv tool install -e "/path/to/pygen[cli,mcp]" --with pandas --force

Cursor IDE Setup

Example Cursor MCP config (~/.cursor/mcp.json):

{
  "mcpServers": {
    "cognite": {
      "command": "pygen",
      "args": ["mcp-serve", "space/dataModel/version", "-c", "greenfield", "-p", "myproject", "--org", "myorg"]
    }
  }
}

Claude Desktop Setup

Example Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "cognite": {
      "command": "/Users/yourname/.local/bin/pygen",
      "args": [
        "mcp-serve",
        "sp_enterprise_process_industry/RigsbergProcessIndustries@v1",
        "-c", "greenfield",
        "-p", "eos-greenfield",
        "--org", "cog-ai",
        "--views", "RigsbergAsset,RigsbergFile,RigsbergMaintenanceOrder",
        "--ops", "list,search"
      ]
    }
  }
}

Note: Claude Desktop requires the full path to pygen since ~/.local/bin is not in its PATH.
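
Because the absolute path differs per machine, one option is to generate the config entry with a small script. The following is only a sketch: `build_mcp_config` is a hypothetical helper, and it falls back to the bare command name when `pygen` cannot be found on the current PATH.

```python
import json
import shutil


def build_mcp_config(model_id, cluster, project, server_name="cognite"):
    """Build an mcpServers entry that points at the absolute path of pygen."""
    command = shutil.which("pygen") or "pygen"  # fall back to the bare name
    return {
        "mcpServers": {
            server_name: {
                "command": command,
                "args": ["mcp-serve", model_id, "-c", cluster, "-p", project],
            }
        }
    }


config = build_mcp_config("space/dataModel/version", "greenfield", "myproject")
print(json.dumps(config, indent=2))
```

The printed JSON can then be merged into the client's config file by hand.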

Authentication

Uses interactive OAuth 2.0 Authorization Code Flow with PKCE (same as @cognite/dune):

  • Opens browser for Entra ID login
  • Local HTTPS callback server with self-signed certificate
  • No .env files or client secrets required
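
For readers unfamiliar with PKCE, the sketch below shows how a client derives the verifier/challenge pair with the S256 method from RFC 7636. It illustrates the protocol step only and is not taken from `cognite/pygen/_auth.py`; the function names are assumptions.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier(n_bytes: int = 32) -> str:
    """Generate a high-entropy verifier (43 characters for 32 random bytes)."""
    return _b64url(secrets.token_bytes(n_bytes))


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge that is sent with the authorization request."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
# The challenge goes into the browser redirect; the verifier stays local and
# is only sent later, when the authorization code is exchanged for a token.
print(len(verifier), len(challenge))
```

Since only the hash travels through the browser, an intercepted authorization code cannot be redeemed without the verifier, which is why no client secret is needed.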

Test Plan

  • List tools expose all SDK filter parameters
  • Search tools work with query and filters
  • Aggregate tools work with count, group_by
  • Histogram tools work with numeric properties
  • Datetime filters accept ISO 8601 strings
  • Direct relation filters accept object/array format
  • Parameter descriptions extracted from SDK docstrings
  • Required parameters (e.g., aggregate) are properly enforced, optional params have defaults
  • --views flag filters to specified view external IDs
  • --operations flag enables only specified operations

Bump

  • Major
  • Minor
  • Patch
  • Skip

Changelog

Added

  • MCP (Model Context Protocol) server support for pygen-generated SDKs
  • New CLI command pygen mcp-serve to launch MCP server from a data model ID
  • Dynamic tool generation for list, retrieve, search, aggregate, and histogram operations
  • Full parameter introspection with types and descriptions from SDK docstrings
  • Interactive OAuth 2.0 authentication with PKCE flow
  • Optional --graphql and --write flags for GraphQL queries and delete operations
  • --views flag to filter which views are exposed as MCP tools
  • --operations flag to control which operations are enabled (list, retrieve, search, aggregate, histogram)

Fixed

  • Required parameters (like aggregate) are now properly enforced in MCP tool schemas

Add the ability to serve any CDF data model as an MCP (Model Context Protocol)
server, enabling LLMs like Claude to interact with Cognite Data Fusion directly.

Features:
- New `pygen mcp-serve` CLI command
- Interactive OAuth 2.0 PKCE browser authentication (zero config)
- Automatic SDK generation in memory from data model ID
- Exposes list, retrieve, and graphql_query tools for each view

Usage:
  pygen mcp-serve mySpace/MyModel@v1 --cluster api --project my-project

Install with: pip install "cognite-pygen[mcp]"

github-actions bot commented Jan 26, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 5376 | 3654 | 68% | 60% | 🟢 |

New Files

| File | Coverage | Status |
|------|----------|--------|
| cognite/pygen/_auth.py | 38% | 🟢 |
| cognite/pygen/_mcp.py | 10% | 🟢 |
| TOTAL | 24% | 🟢 |

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| cognite/pygen/cli.py | 0% | 🟢 |
| TOTAL | 0% | 🟢 |

updated for commit: afde35a by action🐍

Switch from requests to httpx for OAuth HTTP calls since httpx is already
a transitive dependency of mcp and has proper type stubs.

Also includes ruff formatting fixes.
ks93 changed the title from "feat: Add MCP server support for CDF data models" to "feat: add MCP server support for CDF data models" on Jan 26, 2026
ks93 added 2 commits January 25, 2026 20:28
- Change --graphql to opt-in (default off) instead of --no-graphql
- Add --write flag to enable delete operations (default off)
- Add _make_delete_tool helper for delete functionality

This gives users explicit control over which capabilities to expose.
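
A sketch of how the opt-in flags might translate into the set of enabled operations. The function name `enabled_operations` and the way `--write` and `--graphql` combine with an explicit operation list are assumptions; only the operation names and the default-off behaviour come from this PR.

```python
READ_OPERATIONS = ("list", "retrieve", "search", "aggregate", "histogram")


def enabled_operations(ops=None, write=False, graphql=False) -> list:
    """Return the operations to expose; delete and graphql_query are opt-in."""
    if ops is None:
        selected = list(READ_OPERATIONS)
    else:
        unknown = sorted(set(ops) - set(READ_OPERATIONS))
        if unknown:
            raise ValueError(f"unknown operations: {', '.join(unknown)}")
        # Keep the canonical order regardless of how the user listed them.
        selected = [op for op in READ_OPERATIONS if op in ops]
    if write:
        selected.append("delete")
    if graphql:
        selected.append("graphql_query")
    return selected


print(enabled_operations())
print(enabled_operations(["search", "list"], write=True))
```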

- Introspect SDK list method signatures to expose all filter parameters
- Support datetime filters (ISO 8601 strings converted to datetime)
- Support node reference filters ({"space": str, "externalId": str})
- Support array of node refs for edge list filters
- Handle Python 3.10+ UnionType (X | Y) syntax
- Add search, aggregate, and histogram tools for all view APIs
- Extract parameter descriptions from SDK docstrings
- Refactor to use generic _make_dynamic_tool helper
- Fix histogram result serialization (access .buckets attribute)
- Force-include method-specific params (aggregate, group_by, property, interval)
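
The UnionType handling mentioned in the list above matters because `Optional[X]` and `X | None` report different origins on Python 3.10 to 3.13. A minimal sketch of unwrapping both forms, with `unwrap_optional` as a hypothetical helper name:

```python
import types
import typing
from typing import Optional, Union


def unwrap_optional(annotation):
    """Return (inner_type, is_nullable) for Optional[X], Union[X, None] or X | None."""
    origin = typing.get_origin(annotation)
    union_type = getattr(types, "UnionType", None)  # exists on Python 3.10+
    is_union = origin is Union or (union_type is not None and origin is union_type)
    if not is_union:
        return annotation, False
    args = typing.get_args(annotation)
    non_none = tuple(arg for arg in args if arg is not type(None))
    if len(non_none) == len(args):
        return annotation, False  # a union that does not include None
    if len(non_none) == 1:
        return non_none[0], True
    return Union[non_none], True  # e.g. int | str | None -> Union[int, str]


print(unwrap_optional(Optional[int]))
print(unwrap_optional(str))
```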

Parameters without defaults (required) were incorrectly being marked as
optional with default=None. This caused MCP tools like aggregate to not
enforce required parameters.

Now:
- Required params: no default, non-nullable type annotation
- Optional params: has default, nullable type annotation (type | None)
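
The rule above can be illustrated by deriving a JSON-Schema-style input schema from a function signature. This is a sketch under assumptions: `build_input_schema` and `aggregate_assets` are hypothetical, and the PR does not show whether pygen builds the schema directly or leaves that to the MCP library.

```python
import inspect
import typing
from typing import Optional

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def _json_type(annotation) -> str:
    """Map a (possibly Optional) primitive annotation to a JSON type name."""
    if typing.get_origin(annotation) is typing.Union:
        annotation = next(a for a in typing.get_args(annotation) if a is not type(None))
    return JSON_TYPES.get(annotation, "string")


def build_input_schema(func) -> dict:
    """Parameters without a default become required; the rest carry their default."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": _json_type(param.annotation)}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


# Hypothetical stand-in for a generated aggregate method.
def aggregate_assets(aggregate: str, group_by: Optional[str] = None, limit: int = 25):
    return None


print(build_input_schema(aggregate_assets))
```

Here `aggregate` ends up in `required`, while `group_by` and `limit` keep their defaults and stay optional.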

Allows filtering which views and operations are exposed:
- --views: comma-separated list of view external IDs to include
- --operations/--ops: comma-separated operations (list, retrieve, search, aggregate, histogram)

This helps reduce tool count for MCP clients with limits.

Examples:
  pygen mcp-serve model -c api -p proj --views Asset,Equipment
  pygen mcp-serve model -c api -p proj --ops list,search
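
To show why filtering reduces the tool count, the sketch below enumerates one tool per view and operation pair. The `<view>_<operation>` naming scheme and the helper names are assumptions for illustration; the PR does not state how tools are named.

```python
def parse_csv(value):
    """Split a comma-separated flag value; None means 'no filter'."""
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def select_tools(all_views, all_operations, views=None, ops=None):
    """Return one tool name per (view, operation) pair that passes both filters."""
    chosen_views = [v for v in all_views if views is None or v in views]
    chosen_ops = [o for o in all_operations if ops is None or o in ops]
    return [f"{view}_{op}" for view in chosen_views for op in chosen_ops]


views = ["Asset", "Equipment", "WorkOrder"]
operations = ["list", "retrieve", "search", "aggregate", "histogram"]

print(len(select_tools(views, operations)))
print(select_tools(views, operations, parse_csv("Asset,Equipment"), parse_csv("list,search")))
```

With three views and five operations the unfiltered server would expose 15 tools; restricting to two views and two operations leaves 4.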