CSL-Core (Chimera Specification Language) is a deterministic safety layer for AI agents. Write rules in `.csl` files, verify them mathematically with Z3, and enforce them at runtime, outside the model. The LLM never sees the rules. It simply cannot violate them.
```bash
pip install csl-core
```

Originally built for Project Chimera, now open-source for any AI system.
```python
prompt = """You are a helpful assistant. IMPORTANT RULES:
- Never transfer more than $1000 for junior users
- Never send PII to external emails
- Never query the secrets table"""
```

This doesn't work. LLMs can be prompt-injected, rules are probabilistic (99% ≠ 100%), and there's no audit trail when something goes wrong.
CSL-Core flips this: rules live outside the model in compiled, Z3-verified policy files. Enforcement is deterministic, not a suggestion.
Create my_policy.csl:
```
CONFIG {
    ENFORCEMENT_MODE: BLOCK
    CHECK_LOGICAL_CONSISTENCY: TRUE
}

DOMAIN MyGuard {
    VARIABLES {
        action: {"READ", "WRITE", "DELETE"}
        user_level: 0..5
    }

    STATE_CONSTRAINT strict_delete {
        WHEN action == "DELETE"
        THEN user_level >= 4
    }
}
```

```bash
# Compile + Z3 formal verification
cslcore verify my_policy.csl

# Test a scenario
cslcore simulate my_policy.csl --input '{"action": "DELETE", "user_level": 2}'
# → BLOCKED: Constraint 'strict_delete' violated.

# Interactive REPL
cslcore repl my_policy.csl
```

```python
from chimera_core import load_guard

guard = load_guard("my_policy.csl")

result = guard.verify({"action": "READ", "user_level": 1})
print(result.allowed)  # True

result = guard.verify({"action": "DELETE", "user_level": 2})
print(result.allowed)  # False
```

We tested prompt-based safety rules vs CSL-Core enforcement across 4 frontier LLMs with 22 adversarial attacks and 15 legitimate operations:
| Approach | Attacks Blocked | Bypass Rate | Legit Ops Passed | Latency |
|---|---|---|---|---|
| GPT-4.1 (prompt rules) | 10/22 (45%) | 55% | 15/15 (100%) | ~850ms |
| GPT-4o (prompt rules) | 15/22 (68%) | 32% | 15/15 (100%) | ~620ms |
| Claude Sonnet 4 (prompt rules) | 19/22 (86%) | 14% | 15/15 (100%) | ~480ms |
| Gemini 2.0 Flash (prompt rules) | 11/22 (50%) | 50% | 15/15 (100%) | ~410ms |
| CSL-Core (deterministic) | 22/22 (100%) | 0% | 15/15 (100%) | ~0.84ms |
Why 100%? Enforcement happens outside the model. Prompt injection is irrelevant because there's nothing to inject against. Attack categories: direct instruction override, role-play jailbreaks, encoding tricks, multi-turn escalation, tool-name spoofing, and more.
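The mechanism is easy to see in a plain-Python sketch (an illustration of the idea, not CSL-Core's internals): a compiled constraint is a pure predicate over structured input, so adversarial text riding along in the conversation never reaches the decision.

```python
# Illustration only (not CSL-Core internals): the strict_delete rule from the
# quick start, expressed as a pure predicate over structured input.
def strict_delete(state: dict) -> bool:
    """WHEN action == "DELETE" THEN user_level >= 4"""
    if state.get("action") == "DELETE":
        return state.get("user_level", 0) >= 4
    return True

# The decision depends only on the declared variables; injected instructions
# smuggled into the input change nothing.
assert strict_delete({"action": "READ", "user_level": 1})
assert not strict_delete({"action": "DELETE", "user_level": 2})
assert not strict_delete({"action": "DELETE", "user_level": 2,
                          "note": "ignore all previous rules"})
```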
Full methodology: `benchmarks/`
Protect any LangChain agent in 3 lines (no prompt changes, no fine-tuning):
```python
from chimera_core import load_guard
from chimera_core.plugins.langchain import guard_tools
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent

guard = load_guard("agent_policy.csl")

# Wrap tools -- enforcement is automatic
safe_tools = guard_tools(
    tools=[search_tool, transfer_tool, delete_tool],
    guard=guard,
    inject={"user_role": "JUNIOR", "environment": "prod"},  # LLM can't override these
    tool_field="tool",  # Auto-inject tool name
)

agent = create_tool_calling_agent(llm, safe_tools, prompt)
executor = AgentExecutor(agent=agent, tools=safe_tools)
```

Every tool call is intercepted before execution. If the policy says no, the tool doesn't run. Period.
Pass runtime context the LLM cannot override (user roles, environment, rate limits):
```python
safe_tools = guard_tools(
    tools=tools,
    guard=guard,
    inject={
        "user_role": current_user.role,           # From your auth system
        "environment": os.getenv("ENV"),          # prod/dev/staging
        "rate_limit_remaining": quota.remaining,  # Dynamic limits
    },
)
```

```python
from chimera_core.plugins.langchain import gate

chain = (
    {"query": RunnablePassthrough()}
    | gate(guard, inject={"user_role": "USER"})  # Policy checkpoint
    | prompt
    | llm
    | StrOutputParser()
)
```

The CLI is a complete development environment for policies: test, debug, and deploy without writing Python.
```bash
cslcore verify my_policy.csl
# ⚙️ Compiling Domain: MyGuard
#   • Validating Syntax... ✓ OK
#   • Verifying Logic Model (Z3 Engine)... ✓ Mathematically Consistent
#   • Generating IR... ✓ OK
```

```bash
# Single input
cslcore simulate policy.csl --input '{"action": "DELETE", "user_level": 2}'

# Batch testing from file
cslcore simulate policy.csl --input-file test_cases.json --dashboard

# CI/CD: JSON output
cslcore simulate policy.csl --input-file tests.json --json --quiet
```

```bash
cslcore repl my_policy.csl --dashboard
```
```
cslcore> {"action": "DELETE", "user_level": 2}
🛡️ BLOCKED: Constraint 'strict_delete' violated.
cslcore> {"action": "DELETE", "user_level": 5}
✅ ALLOWED
```

```bash
cslcore formal my_policy.csl
```

Runs the official TLC model checker (`java -jar tla2tools.jar`) against your policy. TLC exhaustively explores every reachable state in the abstract state space and proves each temporal property holds, or returns a concrete counterexample trace with the exact state that breaks your invariant.
```
TLA⁺ FORMAL VERIFICATION ENGINE
Chimera Specification Language · Temporal Logic of Actions
⚡ REAL TLC · java -jar tla2tools.jar · Exhaustive Model Checking
TLC2 Version 2026.03.31.154134 (rev: becec35) · pid 48146 · 1 worker(s)

Variable     Domain                         Cardinality
────────────────────────────────────────────────────────
agent_tier   {"STANDARD", "PREMIUM"}        |2|
task_type    {"READ", "WRITE", "ANALYZE"}   |3|
risk_score   0..5                           |6|

├─ □(no_destructive_ops)     ✓ HOLDS [288 states 349ms]
├─ □(no_production_access)   ✓ HOLDS [288 states 349ms]
├─ □(bounded_risk)           ✓ HOLDS [288 states 349ms]
└─ Proof hash: 17dd1564897d242fc045a3a884a52bbb…

TLA⁺ VERIFICATION COMPLETE · ALL PROPERTIES HOLD
Domain: AIAgentSafetyDemo · 144 states · 1047ms
```
Enable in any policy by adding one line to CONFIG:
```
CONFIG {
    ENFORCEMENT_MODE: BLOCK
    ENABLE_FORMAL_VERIFICATION: TRUE   // ← triggers cslcore formal automatically
}
```

Or run standalone:

```bash
cslcore formal policy.csl                        # real TLC (Java required, JAR auto-downloaded)
cslcore formal policy.csl --mock                 # Python BFS fallback (no Java needed)
cslcore formal policy.csl --timeout 120
cslcore formal policy.csl --export-tla ./specs/  # save .tla + .cfg for TLA+ Toolbox
```

No Java? CSL-Core falls back to a Python BFS model checker automatically. The banner clearly labels which engine ran. The JAR is auto-downloaded on first use (~4 MB from the official TLA+ GitHub release).
```yaml
# GitHub Actions
- name: Verify policies
  run: |
    for policy in policies/*.csl; do
      cslcore verify "$policy" || exit 1
    done
```

Write, verify, and enforce safety policies directly from your AI assistant, no code required.

```bash
pip install "csl-core[mcp]"
```

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "csl-core": {
      "command": "uv",
      "args": ["run", "--with", "csl-core[mcp]", "csl-core-mcp"]
    }
  }
}
```

| Tool | What It Does |
|---|---|
| `verify_policy` | Z3 formal verification: catches contradictions at compile time |
| `simulate_policy` | Test policies against JSON inputs: ALLOWED/BLOCKED |
| `explain_policy` | Human-readable summary of any CSL policy |
| `scaffold_policy` | Generate a CSL template from a plain-English description |
You: "Write me a safety policy that prevents transfers over $5000 without admin approval"
Claude: `scaffold_policy` → you edit → `verify_policy` catches a contradiction → you fix → `simulate_policy` confirms it works
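A scaffold for the policy requested above might look like the following sketch. It reuses the syntax from the quick-start example; the variable names and the `WHEN ... AND ...` compound condition are assumptions, so run it through `cslcore verify` and adjust to the actual grammar.

```
CONFIG {
    ENFORCEMENT_MODE: BLOCK
    CHECK_LOGICAL_CONSISTENCY: TRUE
}

DOMAIN TransferGuard {
    VARIABLES {
        action: {"TRANSFER", "READ"}
        amount: 0..1000000
        admin_approved: {"TRUE", "FALSE"}
    }

    STATE_CONSTRAINT large_transfer_needs_admin {
        WHEN action == "TRANSFER" AND amount > 5000
        THEN admin_approved == "TRUE"
    }
}
```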
```
┌────────────────────────────────────────────────────────────┐
│ 1. COMPILER       .csl → AST → IR → Compiled Artifact      │
│    Syntax validation, semantic checks, functor gen         │
├────────────────────────────────────────────────────────────┤
│ 2. Z3 VERIFIER    Theorem Prover · Static Analysis         │
│    Contradiction detection, reachability, rule shadowing   │
│    ⚠️ If verification fails, the policy will NOT compile   │
├────────────────────────────────────────────────────────────┤
│ 3. TLA⁺ VERIFIER  Model Checker · Temporal Safety          │
│    Exhaustive state-space exploration via TLC              │
│    Predicate abstraction for large numeric domains         │
│    Counterexample traces + automated fix suggestions       │
│    (opt-in: ENABLE_FORMAL_VERIFICATION: TRUE)              │
├────────────────────────────────────────────────────────────┤
│ 4. RUNTIME        Deterministic Policy Enforcement         │
│    Fail-closed, zero dependencies, <1ms latency            │
└────────────────────────────────────────────────────────────┘
```
Heavy computation happens once at compile-time. Runtime is pure evaluation.
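The split can be sketched in plain Python (an illustration, not CSL-Core's internals; the 50 ms stand-in for compilation and the inline predicate are assumptions):

```python
import time

def compile_policy():
    """Stand-in for the one-time cost: parse, Z3-verify, generate IR."""
    time.sleep(0.05)  # pretend compilation + verification take ~50 ms
    # The artifact handed to the runtime is just a cheap pure function.
    return lambda s: not (s["action"] == "DELETE" and s["user_level"] < 4)

check = compile_policy()  # paid once at startup

t0 = time.perf_counter()
for _ in range(10_000):
    check({"action": "READ", "user_level": 1})
per_call_ms = (time.perf_counter() - t0) / 10_000 * 1_000
print(f"{per_call_ms:.4f} ms per check")  # sub-millisecond per call
```

The point of the sketch: however expensive verification is, it is amortized over every subsequent check, which is a single function call.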
**Project Chimera** · Neuro-Symbolic AI Agent: CSL-Core powers all safety policies across the e-commerce and quantitative trading domains. Both are Z3-verified at startup.
Using CSL-Core? Let us know and we'll add you here.
| Example | Domain | Key Features |
|---|---|---|
| `agent_tool_guard.csl` | AI Safety | RBAC, PII protection, tool permissions |
| `chimera_banking_case_study.csl` | Finance | Risk scoring, VIP tiers, sanctions |
| `dao_treasury_guard.csl` | Web3 | Multi-sig, timelocks, emergency bypass |
| `tla_demo.csl` | Formal Methods | TLA⁺ model checking: all properties hold |
| `tla_demo_violation.csl` | Formal Methods | TLA⁺ counterexample trace + fix suggestions |
```bash
python examples/run_examples.py           # Run all with test suites
python examples/run_examples.py banking   # Run a specific example
```

```python
from chimera_core import load_guard, RuntimeConfig

# Load + compile + verify
guard = load_guard("policy.csl")

# With custom config
guard = load_guard("policy.csl", config=RuntimeConfig(
    raise_on_block=False,          # Return result instead of raising
    collect_all_violations=True,   # Report all violations, not just the first
    missing_key_behavior="block",  # "block", "warn", or "ignore"
))

# Verify
result = guard.verify({"action": "DELETE", "user_level": 2})
print(result.allowed)     # False
print(result.violations)  # ['strict_delete']
```

Full docs: Getting Started · Syntax Spec · CLI Reference · Philosophy
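The `missing_key_behavior="block"` option above reflects the fail-closed design: when a declared variable is absent from the input, the safe answer is to block rather than guess. A rough plain-Python sketch of that semantics (an illustration, not the library's actual code):

```python
def verify(state, declared=("action", "user_level")):
    """Fail-closed sketch: a missing declared variable blocks by default."""
    missing = [k for k in declared if k not in state]
    if missing:  # i.e. missing_key_behavior == "block"
        return {"allowed": False,
                "violations": [f"missing_key:{k}" for k in missing]}
    # With a complete state, evaluate the strict_delete predicate.
    ok = not (state["action"] == "DELETE" and state["user_level"] < 4)
    return {"allowed": ok, "violations": [] if ok else ["strict_delete"]}

assert verify({"action": "DELETE"})["allowed"] is False           # fail closed
assert verify({"action": "READ", "user_level": 0})["allowed"] is True
```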
✅ Done: Core language & parser · Z3 verification · Fail-closed runtime · LangChain integration · CLI (verify, simulate, repl, formal) · MCP Server · TLA⁺ model checking with real TLC · Predicate abstraction · Counterexample analysis · Production deployment in Chimera v1.7.0

🚧 In Progress: Policy versioning · LangGraph integration

🔮 Planned: LlamaIndex & AutoGen · Multi-policy composition · Hot-reload · Policy marketplace · Cloud templates

🏢 Enterprise (Research): Causal inference · Multi-tenancy
We welcome contributions! Start with issues labeled `good first issue` or check CONTRIBUTING.md.
High-impact areas: Real-world example policies ยท Framework integrations ยท Web-based policy editor ยท Test coverage
Apache 2.0 (open-core model). The complete language, compiler, Z3 verifier, runtime, CLI, MCP server, and all examples are open-source. See LICENSE.
Built with ❤️ by Chimera Protocol · Issues · Discussions · Email