Open
85 commits
38edfc2
fleet integartion step 0
Dec 12, 2025
f67bc43
updated README
Dec 12, 2025
164853c
readme update
Dec 12, 2025
935826f
another iteraton
Dec 13, 2025
7c09d5b
readme
Dec 18, 2025
eac8d0e
Add FleetTaskEnv for Gymnasium-compatible task environments
Jan 17, 2026
7efae22
conb
Jan 21, 2026
791a071
Add __init__.py to envs package for pip install compatibility
Jan 22, 2026
a24eaf6
fix: Remove default image_type="mcp" to use standard ECR images
Jan 22, 2026
9df9351
fix: Add data_key and data_version params to from_fleet()
Jan 22, 2026
46d1779
fix: Combine data_key and data_version into Fleet SDK format
Jan 22, 2026
7852847
fix: Remove seed parameter from reset() call
Jan 22, 2026
7697fcd
feat: Add request_timeout_s parameter to FleetTaskEnv
Jan 23, 2026
98ec667
Fix: make reset() a sync wrapper around reset_async()
Jan 25, 2026
b62c7e6
Fix: extract text content from MCP CallToolResult
Jan 25, 2026
9ef1dec
Add tests for tool extraction and reset behavior
Jan 25, 2026
1a3e27b
feat: fetch tools in __init__, simplify reset_async
Jan 25, 2026
336ff02
fix: add detailed logging for verifier execution failures
Jan 25, 2026
abb6936
fix: Use Fleet SDK Task.verify_detailed() for verifier execution
Jan 26, 2026
ced5eca
Fix: fetch tools lazily in reset_async to avoid asyncio.run in async …
Jan 26, 2026
d23f08d
fix: add retry with backoff for MCP list_tools and log errors
Jan 28, 2026
f938ab9
fix: add retry logic to call_tool for connection failures
Jan 28, 2026
a2f3531
debug: add logging for call_tool to trace success/failure paths
Jan 29, 2026
a08cb6d
fix: unwrap ExceptionGroup to show actual error cause
Jan 29, 2026
9806eb8
debug: add logging for Fleet instance creation timing
Jan 29, 2026
584d613
fix: Add retry logic to Fleet.make() for transient failures
Feb 5, 2026
93638df
feat(fleet_env): add ContextManager for context management tools
Feb 8, 2026
a1ac1a7
Use image_type='mcp' for computer_use tasks
Feb 10, 2026
1b66bab
fix: Fetch tools for all modalities (tool_use and computer_use)
Feb 11, 2026
f40cd91
Filter to only computer tool for computer_use modality
Feb 11, 2026
3ac6f82
fix: Handle ImageContent in MCP and filter tools for computer_use
Feb 12, 2026
461413e
Add initial screenshot on reset for computer_use tasks
Feb 12, 2026
6e4a522
debug: Log actual screenshot result format from MCP
Feb 13, 2026
80be63b
fix: Handle Fleet MCP base64_image format for VL models
Feb 13, 2026
7a1a755
Remove debug logging from task_env.py
Feb 13, 2026
0c0b535
test: Add tests for base64_image format handling
Feb 13, 2026
675e652
Add reset_timeout_s to avoid blocking on broken manager APIs
Feb 16, 2026
9e7390d
feat: Add env_key to error logs for better debugging
Feb 19, 2026
e2486cb
fix: Add HTTP-level timeouts to streamablehttp_client calls
Feb 22, 2026
8370cd1
fix: Raise on exhausted list_tools retries, allow retry on next reset
Feb 25, 2026
272b0b3
feat: Add Logfire error tracking to fleet env
Feb 26, 2026
216d16a
fix(telemetry): consistent schema with env_key, env_version, task_key…
Feb 27, 2026
34dd0d9
feat(telemetry): add fleet_mcp_tool_error for MCP server errors
Feb 27, 2026
03fbb92
fix(telemetry): set context BEFORE Fleet.make() so init failures have…
Feb 27, 2026
9d12c3e
fix: address bugbot issues in PR #6
Feb 27, 2026
0bc9cfd
fix: add hard timeout to MCP operations to prevent hanging
Feb 27, 2026
b061b4e
fix: async Fleet.make() to prevent event loop blocking
Feb 27, 2026
d8c5ddc
fix: enrich fleet_mcp_tool_error with env:version and step info
Feb 27, 2026
e627cd0
fix: suppress noisy logfire console output and tracebacks
Feb 28, 2026
73b81b5
Fix telemetry dashboard: count init failures as rollouts, add total_s…
Feb 28, 2026
1e9bfce
Add fleet_provisioning_completed telemetry event with provisioning_ti…
Feb 28, 2026
f4ed59b
Fix MCP endpoint routing, telemetry gap, and retry config
Mar 3, 2026
327c782
fix: call_tool retry used non-existent retry_base_delay attr
Mar 3, 2026
887fd1f
feat: auto-select instance TTL based on modality
Mar 3, 2026
84de403
Emit fleet_rollout_completed on close() for orphaned rollouts
Mar 4, 2026
77b9d6a
Simplify orphaned rollout stop reasons to max_steps / abandoned
Mar 4, 2026
540530a
Add Fleet telemetry section to README
Mar 4, 2026
199f67f
Increase tool_use TTL from 600s to 900s to reduce 502s from instance …
Mar 4, 2026
33d53c9
fix: use asyncio.to_thread(Fleet.make()) instead of AsyncFleet.make()
Mar 4, 2026
0d37811
fix: wrap sync blocking calls in asyncio.to_thread() to unblock event…
Mar 4, 2026
f86fa49
feat: add trace upload utilities for eval rollouts
Mar 7, 2026
290600e
fix: Convert OpenAI image_url blocks to Fleet ingest format for prope…
Mar 8, 2026
c99c1e5
fix: Pass reward as score to ingest API so sessions complete
Mar 8, 2026
fc0508f
Add hint-based reward for solver RL training (Options B, C, D)
Mar 8, 2026
31fa602
Revert "Add hint-based reward for solver RL training (Options B, C, D)"
Mar 8, 2026
b8ee588
Merge pull request #6 from fleet-ai/deniz/fleet-logfire
dzorlu Mar 11, 2026
cc5bf37
feat: add partial reward support behind flag
Mar 11, 2026
7c5d64f
Merge pull request #9 from fleet-ai/feat/partial-reward
dzorlu Mar 11, 2026
06dcdd4
feat: Run verifier at close() for orphaned rollouts
Mar 12, 2026
baa192a
Merge pull request #10 from fleet-ai/feat/close-verifier
dzorlu Mar 12, 2026
d651a01
feat: Add submit_final_answer synthetic tool for carlisle tasks
Mar 13, 2026
5111c78
merge: resolve conflicts with deniz/fleet_client (_reward_computed)
Mar 13, 2026
cf91b04
Merge pull request #11 from fleet-ai/deniz/submit-final-answer
dzorlu Mar 13, 2026
2290cb8
feat: Add TaskEvaluator for task generation inner loop
Feb 28, 2026
3c5f902
refactor: rewrite TaskEvaluator to use Fleet harness (POST /v1/jobs)
Mar 1, 2026
fafc3ea
Fix model ID format: use provider/model prefix for Fleet harness
Mar 1, 2026
6bc5939
Make _poll_job async to avoid blocking the event loop
Mar 1, 2026
0695f71
Fix model ID mismatch between Fleet API and configured models
Mar 2, 2026
0051e92
fix: Remove unused json import, defensive copy DEFAULT_MODELS
Mar 15, 2026
8605142
Merge pull request #12 from fleet-ai/deniz/task-evaluator
dzorlu Mar 15, 2026
3566e7c
Expose verifier feedback properties for hint generation
Mar 16, 2026
438c7b3
Merge pull request #14 from fleet-ai/deniz/hint-feedback
dzorlu Mar 17, 2026
a5c82e0
feat: Add DB query methods to FleetEnvClient
Mar 18, 2026
270e010
fix: properly await async describe/query on AsyncFleet env handles
Mar 20, 2026
534fd30
fix: handle null base64_image from Fleet MCP screenshot responses
Mar 29, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -109,3 +109,6 @@ outputs/
.uv/

*.backup*/

# logs
*.log
36 changes: 36 additions & 0 deletions PR_README.md
@@ -0,0 +1,36 @@
### PR: Fleet environments (OpenEnv)

This PR documents and refines the **Fleet** runtime integration for OpenEnv.

#### What this enables
- Run OpenEnv environments on **Fleet (remote)** with **no local Docker**.
- Keep a strict split between:
  - **Orchestration (HTTP)**: `reset / step / state`
  - **Agent actions (MCP)**: `tools/list + tools/call`

#### What this is *not*
- This is **not** the local “Dockerized env server + env container” setup.
- There is **no container/provider abstraction** here; Fleet hosts the runtime remotely (HTTP env server + MCP service). The client only connects.

#### Main abstractions
- **`FleetEnvClient` (HTTP)**: orchestrator handle for reset/step/state.
- **`FleetMCPTools` (MCP)**: agent handle for listing/calling tools.
  - Unions tools across Fleet’s MCP endpoints (today often `api/v1/mcp` and `mcp`)
  - Returns tools in **OpenAI “tools” dict format** (via `convert_tool_format`)
  - Routes tool calls to the owning endpoint (cached after discovery)
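
The union-and-route behaviour described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual `FleetMCPTools` code: `DISCOVERED`, `to_openai_tool`, `union_tools`, and `route` are hypothetical names, the tool entries are made up, and the "first endpoint wins" collision rule is an assumption (collision policy is still an open TODO in this PR). `to_openai_tool` stands in for `convert_tool_format`, whose real implementation is not shown here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical discovery results: endpoint path -> MCP-style tool descriptions.
# The endpoint paths come from the PR text; the tools themselves are invented.
DISCOVERED: Dict[str, List[Dict[str, Any]]] = {
    "api/v1/mcp": [
        {
            "name": "computer",
            "description": "Control the desktop",
            "inputSchema": {"type": "object", "properties": {"action": {"type": "string"}}},
        },
    ],
    "mcp": [
        {
            "name": "query_db",
            "description": "Run a read-only query",
            "inputSchema": {"type": "object", "properties": {"sql": {"type": "string"}}},
        },
        # Same tool name exposed by a second endpoint (a collision).
        {"name": "computer", "description": "Duplicate", "inputSchema": {"type": "object"}},
    ],
}


def to_openai_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape one MCP-style tool description into an OpenAI 'tools' entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def union_tools(
    discovered: Dict[str, List[Dict[str, Any]]],
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Return (openai_tools, owner), where owner maps tool name -> endpoint.

    Assumed collision policy: the first endpoint that lists a name keeps it.
    """
    tools: List[Dict[str, Any]] = []
    owner: Dict[str, str] = {}
    for endpoint, endpoint_tools in discovered.items():
        for tool in endpoint_tools:
            if tool["name"] in owner:
                continue  # later duplicates are ignored
            owner[tool["name"]] = endpoint
            tools.append(to_openai_tool(tool))
    return tools, owner


def route(owner: Dict[str, str], tool_name: str) -> str:
    """Look up the cached owning endpoint for a tool call."""
    if tool_name not in owner:
        raise KeyError(f"Unknown tool: {tool_name!r}")
    return owner[tool_name]


tools, owner = union_tools(DISCOVERED)
print([t["function"]["name"] for t in tools])
print(route(owner, "query_db"))
```

Caching the `owner` map at discovery time means a tool call needs only a dictionary lookup rather than another round of `tools/list` against every endpoint.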

#### Quickstart
- Install: `pip install "openenv-core[fleet]"`
- Set: `export FLEET_API_KEY="..."`
- Run: `python examples/fleet_env_example.py <env_key>`

#### References
- RFC 001: `rfcs/001-abstractions.md`
- RFC 003: `rfcs/003-mcp-support.md`

#### TODOs / known sharp edges
- Endpoint discovery (avoid hardcoding `api/v1/mcp` vs `mcp`)
- Reset inconsistencies across some env keys (better errors + compatibility notes)
- Tool-name collision policy across endpoints
- Retries/backoff and clearer “endpoint down” failure modes
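
One possible shape for the retries/backoff item is sketched below. It is an assumption-laden illustration, not code from this PR: `retry_with_backoff`, its parameters, and the simulated `flaky_list_tools` are hypothetical, and the choice of which exceptions count as retryable is a guess.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await op(), retrying on the given errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await op()
        except retry_on as exc:
            if attempt == max_attempts:
                # Surface a clear "endpoint down" error instead of the raw cause.
                raise RuntimeError(
                    f"endpoint down after {max_attempts} attempts: {exc}"
                ) from exc
            # Delays grow as base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated endpoint that fails twice, then succeeds (hypothetical).
calls = {"n": 0}


async def flaky_list_tools():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated 502")
    return ["computer"]


result = asyncio.run(retry_with_backoff(flaky_list_tools, base_delay_s=0.01))
print(result, calls["n"])
```

Raising a single wrapped error after the last attempt keeps the original exception available through `__cause__` while giving callers one failure type to handle.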

19 changes: 19 additions & 0 deletions README.md
@@ -294,6 +294,25 @@ Supporters include: Meta-PyTorch, Hugging Face, [Patronus AI](https://patronus.a

And we'd also like to acknowledge the team at Farama Foundation as the OpenEnv API was heavily inspired by the work you all have done on Gymnasium. Cheers!

## Fleet Telemetry

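`FleetTaskEnv` emits Logfire events to track the rollout lifecycle. Every `fleet_rollout_started` is matched by a `fleet_rollout_completed` whose `failure_reason` is null on success and otherwise says why the rollout ended, so the counts balance (here `completed` counts rollouts with a null `failure_reason`):

```
started = completed + init_error + tools_error + computer_tool_missing + max_steps + abandoned
```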

| `failure_reason` | When |
|---|---|
| *(null)* | Rollout completed normally (verifier ran) |
| `init_error` | Fleet provisioning failed |
| `tools_error` | `list_tools()` MCP call failed |
| `computer_tool_missing` | CUA modality but no `computer` tool |
| `max_steps` | Caller hit turn limit without running verifier |
| `abandoned` | Caller stopped early (context overflow, job cancelled, crash) |

Set `LOGFIRE_TOKEN` to enable. Events include `step_count`, `reward`, `verifier_success`, and task context (env_key, version, modality).
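
As a rough illustration of the accounting above, the sketch below tallies exported events and checks that starts and completions balance. The event dict shape (`event` and `failure_reason` keys) and the helper name `check_rollout_accounting` are assumptions for illustration; this does not use the Logfire API.

```python
from collections import Counter
from typing import Any, Dict, List

# failure_reason values from the table above; None means a normal completion.
KNOWN_REASONS = {
    None,
    "init_error",
    "tools_error",
    "computer_tool_missing",
    "max_steps",
    "abandoned",
}


def check_rollout_accounting(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count started vs. completed rollouts and break completions down by reason."""
    started = sum(1 for e in events if e.get("event") == "fleet_rollout_started")
    reasons = Counter(
        e.get("failure_reason")
        for e in events
        if e.get("event") == "fleet_rollout_completed"
    )
    unknown = set(reasons) - KNOWN_REASONS
    if unknown:
        raise ValueError(f"Unexpected failure_reason values: {sorted(map(str, unknown))}")
    completed = sum(reasons.values())
    return {
        "started": started,
        "completed": completed,
        "by_reason": dict(reasons),
        "balanced": started == completed,
    }


# Made-up sample: three starts, but only two completions so far.
events = [
    {"event": "fleet_rollout_started"},
    {"event": "fleet_rollout_completed", "failure_reason": None},
    {"event": "fleet_rollout_started"},
    {"event": "fleet_rollout_completed", "failure_reason": "max_steps"},
    {"event": "fleet_rollout_started"},
]
report = check_rollout_accounting(events)
print(report["balanced"], report["by_reason"])
```

An unbalanced report suggests a rollout that started but never emitted its completion event, which is the gap the `close()`-time events are meant to cover.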

## License

BSD 3-Clause License (see [LICENSE](./LICENSE) file)
153 changes: 153 additions & 0 deletions examples/fleet_env_example.py
@@ -0,0 +1,153 @@
"""
Example: Orchestrator + Agent loop using OpenEnv on Fleet.

Demonstrates the split architecture:
1. Orchestrator: Provisions environment, resets episodes (HTTP).
2. Agent: Lists tools, calls tools (MCP).

Prerequisites:
    pip install "openenv-core[fleet]"
    export FLEET_API_KEY="..."
    export FLEET_ENV_KEY="..."  # e.g. "browser-env" or your custom env
"""

import asyncio
import os
import random
import sys
from typing import Any, Dict, List, Sequence

# Ensure we can import from src/ if running from repo root
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "src")))

try:
    # `openenv` installs top-level packages like `envs`, `core`, etc.
    # This example also adds `src/` to sys.path above so it works from a repo checkout.
    from envs.fleet_env import FleetEnvClient
except ImportError as e:
    raise ImportError(
        "Could not import `envs.fleet_env`. "
        "Run from the repo root, or install OpenEnv in editable mode: "
        "`python -m pip install -e '.[fleet]'`."
    ) from e


def get_openai_tool_param_enum(tool_def: Dict[str, Any], param_name: str) -> List[str]:
    """Extract an enum list for a parameter from an OpenAI 'tools' dict."""
    schema = tool_def.get("function", {}).get("parameters", {})
    if not isinstance(schema, dict):
        return []
    props = schema.get("properties", {})
    if not isinstance(props, dict):
        return []
    param_spec = props.get(param_name, {})
    if not isinstance(param_spec, dict):
        return []
    enum = param_spec.get("enum", [])
    return enum if isinstance(enum, list) else []


SAFE_COMPUTER_ACTION_PREFERENCE: Sequence[str] = ("screenshot", "wait", "cursor_position")


def pick_safe_computer_action(tool_def: Dict[str, Any]) -> str:
    """Pick a non-destructive default action for the Fleet 'computer' tool.

    Prefer safe actions like screenshot/wait, falling back to first enum.
    """
    actions = get_openai_tool_param_enum(tool_def, "action")
    if not actions:
        raise ValueError("Tool 'computer' has no available actions in schema.")

    action_set = set(actions)
    safe_available = [a for a in SAFE_COMPUTER_ACTION_PREFERENCE if a in action_set]
    if safe_available:
        return random.choice(safe_available)
    return actions[0]


def main():
    api_key = os.environ.get("FLEET_API_KEY")

    # Get env_key from args or env var
    env_key = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("FLEET_ENV_KEY")

    if not api_key or not env_key:
        print("Usage: python fleet_env_example.py <env_key>")
        print("   or: export FLEET_ENV_KEY=... && python fleet_env_example.py")
        raise ValueError("Please set FLEET_API_KEY and provide an env_key.")

    print(f"Provisioning Fleet environment: {env_key}...")

    # 1. Provision & Split Handles (Synchronous)
    # This must be run outside of an async loop because it manages its own loop.
    try:
        orch, tools = FleetEnvClient.from_fleet(
            api_key=api_key,
            env_key=env_key,
            ttl_seconds=600,  # 10 min TTL
        )
    except Exception as e:
        # Chain the original exception so the root cause stays visible.
        raise ValueError(f"Failed to provision environment: {e}") from e

    try:
        # Run the async agent loop
        asyncio.run(agent_loop(orch, tools))
    except Exception as e:
        # Catch Exception rather than BaseException so Ctrl-C still propagates.
        print(f"\n❌ Agent loop failed: {e}")
    finally:
        # 5. Cleanup (Synchronous)
        print("\nOrchestrator: Closing environment...")
        orch.close()
        print("Done.")


async def agent_loop(orch, tools):
    # 2. Orchestration: Start Episode (HTTP)
    # orch.reset() is sync (requests), so it blocks the loop briefly. That's fine for this example.
    print("Orchestrator: Resetting environment...")
    obs = orch.reset()
    print(f"Reset complete. Initial observation keys: {list(obs.observation.metadata.keys())}")

    # 3. Agent: Discover Tools (Async)
    print("\nAgent: Discovering tools...")
    listed = await tools.list_tools()
    tool_defs = listed.tools
    print(f"Available tools ({len(tool_defs)}): {[t['function']['name'] for t in tool_defs]}")
    # Print the derived schema payloads (mirrors MCP Tool.inputSchema content, but OpenAI-shaped)
    print([t["function"]["parameters"] for t in tool_defs])

    if not tool_defs:
        print("No MCP tools available (all MCP endpoints may be down).")
        return

    # 4. Agent: Call a Tool
    target_tool_name = "computer"
    target_def = next((t for t in tool_defs if t["function"]["name"] == target_tool_name), None)

    if not target_def:
        print(f"Tool '{target_tool_name}' not found, picking first available.")
        target_def = tool_defs[0]
        target_tool_name = target_def["function"]["name"]

    print(f"\nTarget Tool: {target_tool_name}")
    # Inspect schema to construct params (in a real agent, the LLM does this)
    # schema = target_def["function"]["parameters"]
    # print(f"Schema: {json.dumps(schema, indent=2)}")

    params = {}
    if target_tool_name == "computer":
        # Choose a supported action from the schema (safe default).
        params = {"action": pick_safe_computer_action(target_def)}

    print(f"\nAgent: Calling tool '{target_tool_name}' with {params}...")
    result = await tools.call_tool(target_tool_name, params)

    # Result is typically a list of MCP content objects (TextContent/ImageContent)
    # We'll just print a summary.
    print("Agent: Tool execution result received.")
    print(f"{result=}")


if __name__ == "__main__":
    main()

10 changes: 10 additions & 0 deletions pyproject.toml
@@ -26,6 +26,13 @@ dependencies = [
    "tomli-w>=1.2.0"
]

[project.optional-dependencies]
fleet = [
    "mcp>=1.0.0",
    "fleet-python>=0.2.79",
    "logfire>=3.0.0",
]

[project.scripts]
openenv = "openenv_cli.__main__:main"

@@ -39,6 +46,9 @@ include-package-data = true
[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
pythonpath = ["src"]

[tool.coverage.run]
omit = [
    "openenv_cli/templates/**",
1 change: 1 addition & 0 deletions src/envs/__init__.py
@@ -0,0 +1 @@
# OpenEnv environments package