mloda.ai: Open Data Access for ML & AI
Declarative data access for AI agents. Describe what you need - mloda delivers it.
```bash
pip install mloda
```

Your AI describes what it needs. mloda figures out how to get it:

```python
from mloda.user import PluginLoader, mloda

PluginLoader.all()

result = mloda.run_all(
    features=["customer_id", "income", "income__sum_aggr", "age__avg_aggr"],
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {
        "customer_id": ["C001", "C002", "C003", "C004", "C005"],
        "age": [25, 35, 45, 30, 50],
        "income": [50000, 75000, 90000, 60000, 85000]
    }}
)
```

Copy, paste, run. mloda resolves dependencies, chains plugins, delivers data.
```
┌─────────────────────────────────────────────────────────────────┐
│                           DATA USERS                             │
│       AI Agents • ML Pipelines • Data Science • Analytics        │
└───────────────────────────┬─────────────────────────────────────┘
                            │  describe what they need
                            ▼
                    ┌───────────────┐
                    │     mloda     │  ← resolves HOW from WHAT
                    │   [Plugins]   │
                    └───────────────┘
                            │  delivers trusted data
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                          DATA SOURCES                            │
│        Databases • APIs • Files • Any source via plugins         │
└─────────────────────────────────────────────────────────────────┘
```
| You want to... | mloda gives you... |
|---|---|
| Give AI agents data access | Declarative API - agents describe WHAT, not HOW |
| Trace every result | Built-in lineage back to source |
| Reuse across projects | Plugins work anywhere - notebook to production |
| Mix data sources | One interface for DBs, APIs, files, anything |
Let LLMs request data without writing code:
```python
# LLM generates this JSON
llm_request = '["customer_id", {"name": "income__sum_aggr"}]'

# mloda executes it
from mloda.user import load_features_from_config, mloda

features = load_features_from_config(llm_request, format="json")
result = mloda.run_all(
    features=features,
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {"customer_id": ["C001", "C002"], "income": [50000, 75000]}}
)
```

More patterns: Context Window Assembly • RAG Pipelines
mloda separates WHAT you need from HOW to get it - through plugins. Existing tools solve parts of this, but none bridge the full gap:
| Category | Products | What it does | Why it's not enough |
|---|---|---|---|
| Feature Stores | Feast, Tecton, Featureform | Store + serve features | Infrastructure-tied, storage-only |
| Semantic Layers | dbt Semantic Layer, Cube | Declarative metrics | SQL-only, centralized |
| DAG Frameworks | Hamilton, Kedro | Dataflows as code | Function-first, no plugin abstraction |
| Data Catalogs | DataHub, Atlan | Metadata & discovery | No execution, no contracts |
| ORMs | SQLAlchemy, Django ORM | Database abstraction | Single database, no ML lifecycle |
mloda is the connection layer - separating WHAT you compute from HOW you compute it. Plugins define transformations. Users describe requirements. mloda resolves the pipeline.
mloda's architecture follows three roles: providers (define plugins), users (access data), and stewards (govern execution). The module structure reflects this: mloda.provider, mloda.user, mloda.steward.
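The three roles map directly onto the imports used throughout this page. A minimal orientation sketch (the provider-side classes under mloda.provider are not shown here; their exact names are covered in Plugin Development):

```python
# Users: describe WHAT they need and run it
from mloda.user import mloda, Feature, PluginLoader

# Stewards: govern and observe execution (see the Extender example below)
from mloda.steward import Extender, ExtenderHook

# Providers: define plugins under mloda.provider (see the plugin types below)
```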
mloda uses three types of plugins:
| Type | What it does |
|---|---|
| FeatureGroup | Defines data transformations |
| ComputeFramework | Execution backend (Pandas, Spark, etc.) |
| Extender | Hooks for logging, validation, monitoring |
Most of the time, you'll work with FeatureGroups - Python classes that define how to access and transform data (see Quick Example above).
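As a rough orientation, a custom FeatureGroup could look like the sketch below. This is a hypothetical sketch, not the actual mloda.provider API: the import path and the method name (guessed from `ExtenderHook.FEATURE_GROUP_CALCULATE_FEATURE` used later on this page) are assumptions - see Plugin Development for the real interface.

```python
# Hypothetical sketch only - the class and method names below are assumptions,
# not the documented mloda.provider API.
from mloda.provider import FeatureGroup  # assumed import path

class TotalSpend(FeatureGroup):
    """Would expose a 'total_spend' feature derived from raw 'income' values."""

    def calculate_feature(self, data):  # method name inferred from ExtenderHook naming
        data["total_spend"] = sum(data["income"])
        return data
```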
Why plugins?
- Steps, not pipelines - Build transformations. mloda wires them together.
- Small and testable - Each plugin is a focused unit. Easy to test, easy to debug.
- AI-friendly - Small, template-like structures. Let AI generate plugins for you.
- Share what isn't secret - Your pipeline runs steps a,b,c,d. Steps b,c,d have no proprietary logic? Share them across projects, teams, even organizations.
- Experiment to production - Same plugins in your notebook and your cluster. No rewrite (see the sketch after this list).
- Stand on shoulders - Combine community plugins with your own. Build on what exists.
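The notebook-to-production point, as a minimal sketch: the feature request stays identical and only the compute framework changes. This assumes the requested features have plugins for both backends (SparkFramework is taken from the multi-framework example further below):

```python
from mloda.user import PluginLoader, mloda

PluginLoader.all()

# The declarative request is the same in both environments
features = ["customer_id", "income__sum_aggr"]
sample = {"SampleData": {"customer_id": ["C001", "C002"], "income": [50000, 75000]}}

# Notebook: execute on Pandas
notebook_result = mloda.run_all(
    features=features,
    compute_frameworks=["PandasDataFrame"],
    api_data=sample,
)

# Cluster: same features, a distributed backend (assumes a Spark-capable
# ComputeFramework plugin is available for these features)
cluster_result = mloda.run_all(
    features=features,
    compute_frameworks=["SparkFramework"],
    api_data=sample,
)
```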
Give LLMs deterministic data access - they declare what, mloda handles how:
```python
from mloda.user import PluginLoader, load_features_from_config, mloda

PluginLoader.all()

# LLM generates this JSON (no Python code needed)
llm_output = '''
[
    "customer_id",
    {"name": "income__sum_aggr"},
    {"name": "age__avg_aggr"},
    {"name": "total_spend", "options": {"aggregation_type": "sum", "in_features": "income"}}
]
'''

# mloda parses JSON into Feature objects
features = load_features_from_config(llm_output, format="json")

result = mloda.run_all(
    features=features,
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {
        "customer_id": ["C001", "C002", "C003"],
        "income": [50000, 75000, 90000],
        "age": [25, 35, 45]
    }}
)
```

LLM-friendly: The agent only declares what it needs - mloda handles the rest.
Gather context from multiple sources declaratively - mloda validates and delivers. Why not let an AI agent do it?
Example: This shows the API pattern. Requires custom FeatureGroup implementations for your data sources.
```python
from mloda.user import Feature, mloda

# Example inputs (placeholders)
user_id = "C001"
user_query = "example support question"

# Build complete context from multiple sources
features = [
    Feature(name="system_instructions", options={"template": "support_agent"}),
    Feature(name="user_profile", options={"user_id": user_id, "include_preferences": True}),
    Feature(name="knowledge_base", options={"query": user_query, "top_k": 5}),
    Feature(name="conversation_history", options={"limit": 20, "summarize_old": True}),
    Feature(name="available_tools", options={"category": "customer_service"}),
    Feature(name="output_format", options={"format": "markdown", "max_length": 500}),
]

result = mloda.run_all(
    features=features,
    compute_frameworks=["PythonDictFramework"],
    api_data={"UserQuery": {"query": [user_query]}}
)
# Each feature resolved via its plugin, validated
```

Build RAG pipelines declaratively - mloda chains the steps for you.
Example: This shows the chaining syntax. Requires custom FeatureGroup implementations for retrieval and processing.
```python
# String-based chaining: query -> validate -> retrieve -> redact
Feature(name="user_query__injection_checked__retrieved__pii_redacted")

# Configuration-based chaining: explicit pipeline
Feature(
    name="safe_context",
    options=Options(context={
        "in_features": "documents__retrieved__pii_redacted",
        "redact_types": ["email", "phone", "ssn"]
    })
)
```

mloda resolves the full chain - you declare the end result, not the steps.
Automatic dependency resolution: You only declare what you need. If pii_redacted depends on retrieved which depends on documents, just ask for pii_redacted - mloda traces back and resolves the full chain.
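As a sketch (assuming custom FeatureGroups for documents, retrieval, and PII redaction are implemented and loaded), declaring only the last link of the chain is enough:

```python
from mloda.user import Feature, mloda

# Sketch only: the documents / retrieved / pii_redacted steps are assumed to be
# provided by custom FeatureGroup plugins. mloda traces the chain backwards and
# schedules the intermediate steps automatically.
result = mloda.run_all(
    features=[Feature(name="documents__retrieved__pii_redacted")],
    compute_frameworks=["PandasDataFrame"],
)
```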
Mix multiple backends in a single pipeline - mloda routes each feature to the right framework:
```python
result = mloda.run_all(
    features=[...],
    compute_frameworks=["PandasDataFrame", "PolarsDataFrame", "SparkFramework"]
)
# Results may come from different frameworks based on plugin compatibility
```

Add your own frameworks - mloda is extensible.
Wrap plugin execution for logging, validation, or lineage tracking:
```python
import time

from mloda.steward import Extender, ExtenderHook
from mloda.user import mloda

class LogExecutionTime(Extender):
    def wraps(self):
        return {ExtenderHook.FEATURE_GROUP_CALCULATE_FEATURE}

    def __call__(self, func, *args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"Took {time.time() - start:.2f}s")
        return result

# Use it (features as in the examples above)
result = mloda.run_all(features, function_extender={LogExecutionTime()})
```

Built-in and custom extenders give you full lineage - trace any result back to its source.
Use mloda when:
- Your agents need data from multiple sources
- You want consistent, validated data access
- You need traceability (audit, debugging)
- Multiple agents share the same data patterns
Don't use mloda for:
- Single database, simple queries → use an ORM
- One-off scripts → just write the code
- Real-time streaming (<5ms) → use Kafka/Flink
- Getting Started - Installation and first steps
- Plugin Development - Build your own plugins
- API Reference - Complete API docs
Most plugins currently live in mloda_plugins/ within this repository. The goal is to gradually migrate them to standalone packages in the registry.
| Repository | Description |
|---|---|
| mloda-registry | Official plugin packages and 40+ development guides |
| mloda-plugin-template | Cookiecutter template for creating standalone plugins |
We welcome contributions! Build plugins, improve docs, or add features.
- GitHub Issues - Report bugs or request features
- Development Guide - How to contribute