Implement Trajectory collection#86

Open
tamoghnokandar wants to merge 3 commits into withmartian:main from tamoghnokandar:main

Conversation

@tamoghnokandar tamoghnokandar commented Feb 10, 2026

User description

Summary

Adds optional trajectory collection to ARES environments. Episodes can be recorded as full (observation, action, reward, discount) sequences for behavior cloning, offline RL, and debugging.

Changes

New module: ares.environments.trajectory

  • StepRecord – Frozen dataclass for a single step (step_index, step_type, observation, action, reward, discount, timestamp).
  • EpisodeTrajectory – Dataclass for episode metadata (episode_id, task_name, steps, start_time, end_time, total_reward, truncated).
  • TrajectoryCollector – Protocol with begin_episode(), record_step(), and end_episode().
  • JsonTrajectoryCollector – Writes each episode to a JSON file in a configurable output directory.
  • Serialization helpers – serialize_llm_request() and serialize_llm_response() for JSON-safe storage.

Integration in CodeEnvironment

  • Optional trajectory_collector parameter.
  • In reset(): calls begin_episode() and records the FIRST step (initial observation).
  • In step(): records each step (action, next observation, reward, discount); on LAST step, calls end_episode() with truncation status.

Registry and presets

  • ares.make() accepts trajectory_collector and forwards it to presets.
  • Presets pass trajectory_collector through to CodeEnvironment.

Step semantics (dm_env-style)

  • FIRST: observation only; action/reward/discount are None.
  • MID: full (observation, action, reward, discount).
  • LAST: final action; observation may be None; reward is the episode reward; discount is 0.0 (terminal) or 1.0 (truncated).
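
The step semantics above can be sketched as a frozen dataclass plus a small consistency check. The field names come from the PR description; the `validate` helper and the plain `dict` payload types are assumptions made for illustration and are not taken from the PR.

```python
import dataclasses
import time
from typing import Any, Optional


@dataclasses.dataclass(frozen=True)
class StepRecord:
    """One recorded step; field names follow the PR description."""

    step_index: int
    step_type: str  # "FIRST", "MID", or "LAST"
    observation: Optional[dict[str, Any]]
    action: Optional[dict[str, Any]]
    reward: Optional[float]
    discount: Optional[float]
    timestamp: float = dataclasses.field(default_factory=time.time)


def validate(record: StepRecord) -> None:
    """Hypothetical helper: check a record against the dm_env-style rules."""
    if record.step_type == "FIRST":
        # FIRST carries only the initial observation.
        if (record.action, record.reward, record.discount) != (None, None, None):
            raise ValueError("FIRST steps must not carry action/reward/discount")
    elif record.step_type == "LAST":
        # Terminal episodes end with discount 0.0, truncated ones with 1.0.
        if record.discount not in (0.0, 1.0):
            raise ValueError("LAST steps must have discount 0.0 or 1.0")


first = StepRecord(0, "FIRST", {"messages": []}, None, None, None)
last = StepRecord(3, "LAST", None, {"content": "done"}, 1.0, 0.0)
validate(first)
validate(last)
```

Because the dataclass is frozen, a recorded step cannot be altered after the collector has accepted it.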

Usage

import ares
from ares.environments.trajectory import JsonTrajectoryCollector

collector = JsonTrajectoryCollector(output_dir="./trajectories")

env = ares.make(
    "sbv-mswea",
    trajectory_collector=collector,
)

# Run episodes as usual; each episode is written to {episode_id}.json

Testing

  • 25 unit tests in trajectory_test.py
  • Test coverage includes:
    • Serialization
    • Data models
    • JsonTrajectoryCollector lifecycle

Run the trajectory collection tests

uv run pytest src/ares/environments/trajectory_test.py -v

Design Notes

  • Opt-in collection

    • trajectory_collector=None disables recording
  • Protocol-based design

    • Aligns with the existing protocol-driven architecture
  • Non-intrusive integration

    • No changes to the RL loop
    • Trajectory collection is layered on top of the existing reset / step flow

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Add optional trajectory collection by introducing the ares.environments.trajectory module with data models, collectors, and serialization helpers for capturing episodes. Integrate trajectory_collector wiring through CodeEnvironment, the registry, and presets so ares.make can record episodes when provided a collector.

Topic: Trajectory storage
Details: Capture episode data with the new trajectory module featuring StepRecord, EpisodeTrajectory, TrajectoryCollector, JsonTrajectoryCollector, and JSON serialization helpers, along with pytest coverage for serialization and persistence.
Modified files (3)
  • src/ares/environments/__init__.py
  • src/ares/environments/trajectory.py
  • src/ares/environments/trajectory_test.py
Latest Contributors (1)
| User | Commit | Date |
| --- | --- | --- |
| joshua.greaves@gmail.com | Add basic code agents ... | December 18, 2025 |

Topic: Env integration
Details: Hook trajectory_collector into CodeEnvironment.reset/step, the preset plumbing, and the public registry so ares.make can transparently forward collectors and persist episodes.
Modified files (4)
  • src/ares/__init__.py
  • src/ares/environments/code_env.py
  • src/ares/presets.py
  • src/ares/registry.py
Latest Contributors (2)
| User | Commit | Date |
| --- | --- | --- |
| Narmeen07 | Add mechanistic interp... | February 19, 2026 |
| joshua.greaves@gmail.com | Massively simplify the... | January 29, 2026 |

Summary by CodeRabbit

  • New Features
    • Episode trajectory collection is now available to record and persist comprehensive step-by-step execution data from your environments. Use the optional trajectory collector to save detailed records to JSON files, enabling better analysis and debugging of agent behavior during training and evaluation.

@joshgreaves
Contributor

Thanks for this PR! Sorry for the slow review; I've scheduled time tomorrow to take a detailed look.

Contributor

@joshgreaves joshgreaves left a comment

Thanks for making this PR! It took me a while to settle on what I think the best design is here. This is where I landed:

Let's make the TrajectoryCollectingEnvironment(env, trajectory_collector) a wrapper.

  • Takes an env and trajectory_collector in init
  • Follows the Environment protocol
  • Records based on calls to __aenter__, reset, step, close, and __aexit__, using methods on trajectory_collector

The big benefits are:

  • It's backwards compatible. We aren't updating any signatures.
  • It will apply to all environments wrapped this way, not just code environments.

The cases I’d treat as episode end are:

  • step() returns LAST -> normal finish
  • reset() while an episode is active -> previous episode is abandoned/interrupted
  • close() / __aexit__() while active -> previous episode is aborted/closed
  • step() or reset() raises -> current episode is errored/aborted
  • task cancellation during step() / reset() -> aborted/cancelled
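
The wrapper design and the episode-end cases listed above could look roughly like the following sketch. Only the class name, the constructor arguments, and the collector method names come from this thread; the `TimeStep` stand-in, the `task_name` argument, `FakeEnv`, `ListCollector`, and the rule that a non-zero discount on LAST means truncation are all assumptions made for illustration. A `close()` hook would follow the same pattern as `__aexit__` and is omitted for brevity.

```python
import asyncio
import dataclasses
from typing import Any, Optional


@dataclasses.dataclass(frozen=True)
class TimeStep:  # stand-in for the real timestep type
    step_type: str
    reward: Optional[float]
    discount: Optional[float]
    observation: Any


class TrajectoryCollectingEnvironment:
    """Hypothetical wrapper; the name and hook points follow the review comment."""

    def __init__(self, env: Any, trajectory_collector: Any, task_name: str = "unknown") -> None:
        self._env = env
        self._collector = trajectory_collector
        self._task_name = task_name
        self._active = False

    def _finish(self, *, truncated: bool) -> None:
        if self._active:
            self._active = False
            self._collector.end_episode(truncated=truncated)

    async def __aenter__(self) -> "TrajectoryCollectingEnvironment":
        await self._env.__aenter__()
        return self

    async def __aexit__(self, *exc_info: Any) -> Any:
        self._finish(truncated=True)  # closed while an episode was active
        return await self._env.__aexit__(*exc_info)

    async def reset(self) -> TimeStep:
        self._finish(truncated=True)  # previous episode was interrupted
        ts = await self._env.reset()
        self._collector.begin_episode(task_name=self._task_name)
        self._collector.record_step(ts)
        self._active = True
        return ts

    async def step(self, action: Any) -> TimeStep:
        try:
            ts = await self._env.step(action)
        except BaseException:  # errors and asyncio.CancelledError
            self._finish(truncated=True)
            raise
        self._collector.record_step(ts)
        if ts.step_type == "LAST":
            # Assumption: a non-zero discount on LAST means truncation.
            self._finish(truncated=ts.discount != 0.0)
        return ts


class FakeEnv:  # hypothetical two-step environment for the demo
    async def __aenter__(self) -> "FakeEnv":
        return self

    async def __aexit__(self, *exc_info: Any) -> None:
        return None

    async def reset(self) -> TimeStep:
        self._n = 0
        return TimeStep("FIRST", None, None, "obs0")

    async def step(self, action: Any) -> TimeStep:
        self._n += 1
        if self._n >= 2:
            return TimeStep("LAST", 1.0, 0.0, None)
        return TimeStep("MID", 0.0, 1.0, f"obs{self._n}")


class ListCollector:  # hypothetical in-memory collector
    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []

    def begin_episode(self, task_name: str) -> None:
        self.events.append(("begin", task_name))

    def record_step(self, step: TimeStep) -> None:
        self.events.append(("step", step.step_type))

    def end_episode(self, *, truncated: bool = False) -> None:
        self.events.append(("end", truncated))


async def demo() -> list[tuple[str, Any]]:
    collector = ListCollector()
    async with TrajectoryCollectingEnvironment(FakeEnv(), collector) as env:
        await env.reset()
        await env.step("a1")
        await env.reset()  # interrupts the first episode
        await env.step("a1")
        await env.step("a2")  # returns LAST
    return collector.events


events = asyncio.run(demo())
```

In this sketch the first episode is closed with `truncated=True` by the second `reset()`, and the second episode ends normally on LAST, so `__aexit__` has nothing left to finalize.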

from ares.environments.trajectory import StepRecord
from ares.environments.trajectory import TrajectoryCollector

__all__ = [

This seems reasonable, but we might want to consider where is the right place to export these.

from ares.containers import containers
from ares.containers import daytona as ares_daytona
from ares.environments import base
from ares.environments import trajectory as trajectory_mod

Nit: prefer trajectory_lib over trajectory_mod, since it's consistent with other parts of the codebase.

self._step_limit = step_limit
self._prefix = prefix
self._tracker = tracker if tracker is not None else stat_tracker.NullStatTracker()
self._trajectory_collector = trajectory_collector

Let's add a NullTrajectoryCollector (like NullStatTracker), since it simplifies the control flow a bit: we don't have to check whether the collector is None in multiple places.
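
A minimal sketch of that null-object pattern, assuming simplified collector signatures. The method names follow the PR description; `EnvSketch` is a hypothetical stand-in for the environment constructor and is not code from the PR.

```python
from typing import Any, Optional, Protocol


class TrajectoryCollector(Protocol):
    """Method names follow the PR description; signatures are simplified."""

    def begin_episode(self, task_name: str) -> None: ...
    def record_step(self, step: Any) -> None: ...
    def end_episode(self, *, truncated: bool = False) -> None: ...


class NullTrajectoryCollector:
    """No-op collector used when recording is disabled."""

    def begin_episode(self, task_name: str) -> None:
        pass

    def record_step(self, step: Any) -> None:
        pass

    def end_episode(self, *, truncated: bool = False) -> None:
        pass


class EnvSketch:
    """Hypothetical stand-in for the environment constructor logic."""

    def __init__(self, trajectory_collector: Optional[TrajectoryCollector] = None) -> None:
        # Normalize once here so reset()/step() never branch on None.
        self._trajectory_collector: TrajectoryCollector = (
            trajectory_collector if trajectory_collector is not None else NullTrajectoryCollector()
        )

    def reset(self) -> str:
        self._trajectory_collector.begin_episode(task_name="demo")
        return "initial observation"


env = EnvSketch()  # no collector supplied
observation = env.reset()  # collector calls are still safe
```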

@coderabbitai

coderabbitai bot commented Mar 24, 2026

Walkthrough

This PR adds episode trajectory collection capabilities to ARES. It introduces data structures (StepRecord, EpisodeTrajectory) and a TrajectoryCollector protocol with JsonTrajectoryCollector and NullTrajectoryCollector implementations. The trajectory collection is integrated into CodeEnvironment to record episode steps, and exposed through the public API via ares.make() and ares.get_env().

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Trajectory Data Structures & Collection Framework: src/ares/environments/trajectory.py | New module defining StepRecord and EpisodeTrajectory dataclasses with serialization/deserialization methods; introduces TrajectoryCollector protocol; provides NullTrajectoryCollector (no-op) and JsonTrajectoryCollector (writes episodes to disk); includes serialization helpers for LLM requests/responses. |
| Trajectory Tests: src/ares/environments/trajectory_test.py | Comprehensive test coverage for serialization, dataclass round-tripping, JSON persistence, state management, and JsonTrajectoryCollector behavior including directory creation, file I/O, and protocol conformance. |
| Environment Integration: src/ares/environments/code_env.py | Updated CodeEnvironment.__init__ to accept optional trajectory_collector; modified reset() to invoke begin_episode(); updated step() to record step data via record_step() and finalize episodes via end_episode(); tracks truncation state. |
| Registry & Preset API: src/ares/registry.py, src/ares/presets.py | Added optional trajectory_collector parameter to ares.make(), EnvironmentSpec.get_env(), and HarborSpec.get_env(); forwards parameter through the environment construction chain; updated docstrings with trajectory examples. |
| Module Exports: src/ares/__init__.py, src/ares/environments/__init__.py | Exported trajectory-related types (EpisodeTrajectory, JsonTrajectoryCollector, StepRecord, TrajectoryCollector) at package-level API; added new environments submodule init. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Registry as ares.make()
    participant Environment as CodeEnvironment
    participant Collector as TrajectoryCollector
    participant Storage as JSON File

    User->>Registry: make(preset_id, trajectory_collector=...)
    Registry->>Environment: __init__(trajectory_collector=...)
    Environment->>Environment: _trajectory_collector = collector

    User->>Environment: reset()
    Environment->>Collector: begin_episode(task_name)
    Collector->>Collector: Initialize episode state
    Environment->>Collector: record_step(StepRecord(FIRST, ...))

    loop Per Environment Step
        User->>Environment: step(action)
        Environment->>Collector: record_step(StepRecord(MID, ...))
        Collector->>Collector: Append step
    end

    Note over Environment,Collector: Episode terminates
    Environment->>Collector: end_episode(truncated=...)
    Collector->>Storage: Write {episode_id}.json
    Collector-->>Environment: Return EpisodeTrajectory

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hops of joy, I must confess,
New trajectories laid to rest—
Each step is saved, each goal is traced,
In JSON files, so neatly placed!
The episode unfolds so bright,
From reset's dawn to end's last night! 🌙

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.64%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Implement Trajectory collection' directly and clearly summarizes the main feature added in this changeset: a new trajectory collection system for recording episodes. |


Comment on lines +128 to +139
# Record the FIRST step in the trajectory.
# FIRST steps have only observation; action/reward/discount are None per dm_env semantics.
assert self._current_task is not None
self._trajectory_collector.begin_episode(task_name=self._current_task.name)
self._trajectory_collector.record_step(
    trajectory_lib.StepRecord(
        step_index=0,
        step_type="FIRST",
        observation=trajectory_lib.serialize_llm_request(result.observation),
        action=None,
        reward=None,
        discount=None,

reset() begins a new episode without ending the previous one, causing interrupted trajectories to be discarded — should we call _trajectory_collector.end_episode(truncated=True) before starting a new episode and on close()?

Finding type: Logical Bugs | Severity: 🔴 High


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/code_env.py around lines 128-142, the reset() method currently
calls _trajectory_collector.begin_episode(...) and record_step(...) without first ending
a previously active episode. Refactor by inserting a check to call
self._trajectory_collector.end_episode(truncated=True) if an episode is active (or
unconditionally call end_episode(truncated=True) before begin_episode) so interrupted
episodes are finalized rather than discarded, then call begin_episode and record the
FIRST step as before. Also modify the close() method (where the environment is closed)
to call self._trajectory_collector.end_episode(truncated=True) if an episode is active
so partial episodes are finalized on close as well.

Comment on lines +103 to +107
@dataclasses.dataclass
class EpisodeTrajectory:
    """A complete episode trajectory with metadata and step records.

    Attributes:

EpisodeTrajectory is mutable despite CLAUDE.md recommending frozen dataclasses — should we make it frozen=True and return updated copies or explicitly document/allow this mutation?

Finding type: AI Coding Guidelines | Severity: 🟠 Medium


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/trajectory.py around lines 103-107 (the EpisodeTrajectory
dataclass) and around lines 268-276 (JsonTrajectoryCollector.end_episode): Make
EpisodeTrajectory a frozen dataclass (add frozen=True) and change its steps field to an
immutable sequence type (e.g. tuple[StepRecord, ...]). Then refactor
JsonTrajectoryCollector so it no longer mutates an EpisodeTrajectory in place: keep
internal mutable state (e.g. self._current_steps: list[StepRecord] plus
episode_id/task_name/start_time) in begin_episode and record_step, and in end_episode
construct and return a new EpisodeTrajectory instance (with
steps=tuple(self._current_steps), end_time, total_reward, num_steps, truncated) instead
of mutating fields on an existing object; clear the internal state afterwards. Update
NullTrajectoryCollector and any creation sites accordingly to match the new frozen
EpisodeTrajectory signature.
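
The refactor described in this prompt can be sketched as follows: the collector keeps its own mutable state and only builds a frozen EpisodeTrajectory in end_episode(). The episode field names follow the PR description; `JsonCollectorSketch`, the reduced `StepRecord`, the UUID-based episode id, and computing `total_reward` as the sum of step rewards are assumptions made for illustration, not the PR's implementation.

```python
import dataclasses
import json
import pathlib
import tempfile
import time
import uuid
from typing import Optional


@dataclasses.dataclass(frozen=True)
class StepRecord:  # reduced to a few fields for brevity
    step_index: int
    step_type: str
    reward: Optional[float] = None


@dataclasses.dataclass(frozen=True)
class EpisodeTrajectory:
    episode_id: str
    task_name: str
    steps: tuple[StepRecord, ...]  # immutable sequence instead of a list
    start_time: float
    end_time: float
    total_reward: float
    truncated: bool


class JsonCollectorSketch:
    """Hypothetical collector: mutable internals, immutable result."""

    def __init__(self, output_dir: str) -> None:
        self._output_dir = pathlib.Path(output_dir)
        self._output_dir.mkdir(parents=True, exist_ok=True)
        self._episode_id: Optional[str] = None
        self._task_name = ""
        self._start_time = 0.0
        self._steps: list[StepRecord] = []

    def begin_episode(self, task_name: str) -> None:
        self._episode_id = uuid.uuid4().hex  # assumption: random ids
        self._task_name = task_name
        self._start_time = time.time()
        self._steps = []

    def record_step(self, step: StepRecord) -> None:
        if self._episode_id is None:
            raise RuntimeError("record_step() called before begin_episode()")
        self._steps.append(step)

    def end_episode(self, *, truncated: bool = False) -> EpisodeTrajectory:
        if self._episode_id is None:
            raise RuntimeError("end_episode() called with no active episode")
        episode = EpisodeTrajectory(
            episode_id=self._episode_id,
            task_name=self._task_name,
            steps=tuple(self._steps),
            start_time=self._start_time,
            end_time=time.time(),
            # Assumption: total reward is the sum of recorded step rewards.
            total_reward=sum(s.reward or 0.0 for s in self._steps),
            truncated=truncated,
        )
        path = self._output_dir / f"{episode.episode_id}.json"
        path.write_text(json.dumps(dataclasses.asdict(episode)))
        # Clear internal state so the next episode starts fresh.
        self._episode_id = None
        self._steps = []
        return episode


with tempfile.TemporaryDirectory() as tmp:
    collector = JsonCollectorSketch(tmp)
    collector.begin_episode("demo-task")
    collector.record_step(StepRecord(0, "FIRST"))
    collector.record_step(StepRecord(1, "LAST", reward=1.0))
    episode = collector.end_episode()
    saved = json.loads((pathlib.Path(tmp) / f"{episode.episode_id}.json").read_text())
```

Keeping the mutable list inside the collector means callers only ever see a finished, immutable episode.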

Comment on lines +619 to +624
env = spec.get_env(
    selector=selector,
    container_factory=container_factory,
    tracker=tracker,
    trajectory_collector=trajectory_collector,
)

registry.make() forwards trajectory_collector to spec.get_env() causing TypeError for specs that don't accept that kwarg — should we only pass supported args or add a compatibility shim/**kwargs?

Finding type: Breaking Changes | Severity: 🔴 High


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/registry.py around lines 619-624, the make() function unconditionally forwards
trajectory_collector to spec.get_env causing TypeError for specs that don't accept that
kwarg. Change the call to inspect spec.get_env's parameters (via inspect.signature) or
attempt the call and fall back: build a kwargs dict with selector, container_factory,
tracker and only include trajectory_collector if the target function accepts it; then
call spec.get_env(**kwargs). Also adjust the decorator-generated spec at lines 407-421
to accept trajectory_collector as an optional kwarg or accept **kwargs and forward them
to func, so auto-generated specs remain compatible with older user functions.
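
One way to gate the forwarding on the callee's signature, as this prompt suggests, is sketched below with `inspect.signature`. The helper name `call_get_env` and the two example factories are hypothetical; raising when a caller supplies a collector that the factory cannot accept is a design choice of this sketch and is not something the review prescribes.

```python
import inspect
from typing import Any, Callable


def call_get_env(get_env: Callable[..., Any], *, trajectory_collector: Any = None, **kwargs: Any) -> Any:
    """Hypothetical helper: forward trajectory_collector only if get_env accepts it."""
    params = inspect.signature(get_env).parameters
    accepts_collector = "trajectory_collector" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_collector:
        kwargs["trajectory_collector"] = trajectory_collector
    elif trajectory_collector is not None:
        # Failing loudly avoids silently dropping a collector the caller asked for.
        raise TypeError(f"{get_env.__name__} does not accept trajectory_collector")
    return get_env(**kwargs)


def legacy_factory(*, selector, container_factory, tracker=None):
    """Old-style factory without the new keyword argument."""
    return ("legacy", selector)


def new_factory(*, selector, container_factory, tracker=None, trajectory_collector=None):
    """New-style factory that accepts a collector."""
    return ("new", trajectory_collector)


legacy_result = call_get_env(legacy_factory, selector="s", container_factory="c", tracker=None)
new_result = call_get_env(
    new_factory, selector="s", container_factory="c", tracker=None, trajectory_collector="collector"
)
```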


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ares/registry.py (1)

407-421: ⚠️ Potential issue | 🟠 Major

Keep legacy @register_env factories working when no collector is requested.

This wrapper now always injects trajectory_collector=None. Any existing factory that still has the old (..., tracker=None) signature will start failing with unexpected keyword argument 'trajectory_collector' even on plain make(...) calls. Only forward the kwarg when the caller actually provided a collector, or gate it on the wrapped function's signature.

💡 Minimal backwards-compatible forwarding
             def get_env(
                 self,
                 *,
                 selector: TaskSelector,
                 container_factory: containers.ContainerFactory,
                 tracker: stat_tracker.StatTracker | None = None,
                 trajectory_collector: trajectory.TrajectoryCollector | None = None,
             ) -> base.Environment:
                 """Delegate to the decorated function."""
-                return func(
-                    selector=selector,
-                    container_factory=container_factory,
-                    tracker=tracker,
-                    trajectory_collector=trajectory_collector,
-                )
+                kwargs = {
+                    "selector": selector,
+                    "container_factory": container_factory,
+                    "tracker": tracker,
+                }
+                if trajectory_collector is not None:
+                    kwargs["trajectory_collector"] = trajectory_collector
+                return func(**kwargs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/registry.py` around lines 407 - 421, The wrapper function get_env in
the registry is always passing trajectory_collector=None into the decorated
factory (func) which breaks legacy factories that only accept tracker; update
get_env to only forward the trajectory_collector kwarg when a collector was
actually provided (trajectory_collector is not None) or when the wrapped
function accepts that parameter (inspect.signature(func) includes
"trajectory_collector"); modify the call site inside get_env to build kwargs
dynamically (always include selector, container_factory, tracker, and
conditionally include trajectory_collector) or gate forwarding based on the func
signature so old factories continue to work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/environments/code_env.py`:
- Around line 164-197: The step() flow currently leaves an in-progress
trajectory open if _get_time_step() (or subsequent processing) raises; catch
exceptions around the time-step retrieval/processing in step() and ensure the
episode is finalized before re-raising: if an exception occurs, set
self._requires_reset = True, cancel self._code_agent_task if not None, and call
self._trajectory_collector.end_episode(truncated=True) (or end_episode with an
appropriate truncated flag) so the JSON trajectory is closed, then re-raise the
original exception; locate this handling around the call to
self._get_time_step(), the surrounding logic that sets truncated and LAST, and
the recording block that uses trajectory_lib.StepRecord and
_trajectory_collector.

In `@src/ares/registry.py`:
- Around line 619-624: The call to spec.get_env in make() passes
trajectory_collector unconditionally which breaks third‑party EnvironmentSpec
implementations; instead build the kwargs dynamically (e.g., create a dict with
selector, container_factory, tracker and only set 'trajectory_collector' when
trajectory_collector is not None) and call spec.get_env(**kwargs) so older
register_preset() specs that lack that parameter continue to work; modify the
code around the spec.get_env invocation to conditionally include the
trajectory_collector key.

📥 Commits

Reviewing files that changed from the base of the PR and between 519e893 and 0068123.

📒 Files selected for processing (7)
  • src/ares/__init__.py
  • src/ares/environments/__init__.py
  • src/ares/environments/code_env.py
  • src/ares/environments/trajectory.py
  • src/ares/environments/trajectory_test.py
  • src/ares/presets.py
  • src/ares/registry.py

Comment on lines 164 to +197
with self._tracker.timeit(f"{self._prefix}/get_time_step"):
    ts = await self._get_time_step()

truncated = False
if self._step_count >= self._step_limit:
    _LOGGER.debug("[%d] Step limit reached. Returning LAST timestep.", id(self))
    assert self._code_agent_task is not None
    self._code_agent_task.cancel()
    # Truncation: step_type="LAST", discount=1.0, unless we're _also_ already in a terminal state.
    truncated = ts.step_type != "LAST"
    ts = base.TimeStep(step_type="LAST", reward=ts.reward, discount=ts.discount, observation=ts.observation)

if ts.step_type == "LAST":
    self._requires_reset = True

# Record the step in the trajectory.
self._trajectory_collector.record_step(
    trajectory_lib.StepRecord(
        step_index=self._step_count,
        step_type=ts.step_type,
        observation=(
            trajectory_lib.serialize_llm_request(ts.observation)
            if ts.observation is not None
            else None
        ),
        action=trajectory_lib.serialize_llm_response(action),
        reward=ts.reward,
        discount=ts.discount,
        timestamp=time.time(),
    )
)
if ts.step_type == "LAST":
    self._trajectory_collector.end_episode(truncated=truncated)

⚠️ Potential issue | 🟠 Major

Finalize the trajectory when step() exits via an error.

_get_time_step() already raises on agent failures (Lines 214-220), but this block only flips _requires_reset and calls end_episode() on the happy-path LAST branch. After an exception, the environment still looks reusable and the in-progress trajectory is left hanging; JsonTrajectoryCollector.begin_episode() later overwrites it on the next reset (src/ares/environments/trajectory.py Lines 237-243). Please abort/finalize the episode and force a reset before re-raising.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/code_env.py` around lines 164 - 197, The step() flow
currently leaves an in-progress trajectory open if _get_time_step() (or
subsequent processing) raises; catch exceptions around the time-step
retrieval/processing in step() and ensure the episode is finalized before
re-raising: if an exception occurs, set self._requires_reset = True, cancel
self._code_agent_task if not None, and call
self._trajectory_collector.end_episode(truncated=True) (or end_episode with an
appropriate truncated flag) so the JSON trajectory is closed, then re-raise the
original exception; locate this handling around the call to
self._get_time_step(), the surrounding logic that sets truncated and LAST, and
the recording block that uses trajectory_lib.StepRecord and
_trajectory_collector.
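
The error-path handling described in this prompt might be sketched as below. `StepSketch` and `RecordingCollector` are hypothetical stand-ins; the attribute names mirror the ones quoted in the review, and catching `BaseException` (so that task cancellation is covered as well) is an assumption of this sketch.

```python
import asyncio
from typing import Optional


class RecordingCollector:
    """Hypothetical collector that only remembers how episodes ended."""

    def __init__(self) -> None:
        self.ended: list[bool] = []

    def end_episode(self, *, truncated: bool = False) -> None:
        self.ended.append(truncated)


class StepSketch:
    """Hypothetical stand-in for the step() error handling."""

    def __init__(self, collector: RecordingCollector) -> None:
        self._trajectory_collector = collector
        self._code_agent_task: Optional[asyncio.Task] = None
        self._requires_reset = False

    async def _get_time_step(self) -> str:
        raise RuntimeError("agent failed")  # simulated agent failure

    async def step(self, action: str) -> str:
        try:
            return await self._get_time_step()
        except BaseException:
            # BaseException also covers asyncio.CancelledError.
            self._requires_reset = True
            if self._code_agent_task is not None:
                self._code_agent_task.cancel()
            self._trajectory_collector.end_episode(truncated=True)
            raise  # re-raise the original exception unchanged


async def demo() -> tuple[StepSketch, Optional[str]]:
    env = StepSketch(RecordingCollector())
    try:
        await env.step("action")
    except RuntimeError as exc:
        return env, str(exc)
    return env, None


env, error = asyncio.run(demo())
```

After the failure the environment is marked as needing a reset and the collector has closed the episode, so the next reset() starts from a clean state.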

Comment on lines +619 to +624
env = spec.get_env(
    selector=selector,
    container_factory=container_factory,
    tracker=tracker,
    trajectory_collector=trajectory_collector,
)
⚠️ Potential issue | 🟠 Major

Don't break existing register_preset() specs by passing the new kwarg unconditionally.

make() now always calls spec.get_env(..., trajectory_collector=None). Any third-party EnvironmentSpec that still implements the previous signature will now fail on ordinary make() usage, even though trajectory collection was not requested. Build the kwargs dynamically and only include trajectory_collector when it is non-None.

💡 Backwards-compatible call construction
-    env = spec.get_env(
-        selector=selector,
-        container_factory=container_factory,
-        tracker=tracker,
-        trajectory_collector=trajectory_collector,
-    )
+    get_env_kwargs = {
+        "selector": selector,
+        "container_factory": container_factory,
+        "tracker": tracker,
+    }
+    if trajectory_collector is not None:
+        get_env_kwargs["trajectory_collector"] = trajectory_collector
+    env = spec.get_env(**get_env_kwargs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/registry.py` around lines 619 - 624, The call to spec.get_env in
make() passes trajectory_collector unconditionally which breaks third‑party
EnvironmentSpec implementations; instead build the kwargs dynamically (e.g.,
create a dict with selector, container_factory, tracker and only set
'trajectory_collector' when trajectory_collector is not None) and call
spec.get_env(**kwargs) so older register_preset() specs that lack that parameter
continue to work; modify the code around the spec.get_env invocation to
conditionally include the trajectory_collector key.
