
Add gymnasium-compatible wrappers for ARES environments (fixes #25)#98

Open
sarvanithin wants to merge 3 commits into withmartian:main from sarvanithin:feat/gymnasium-wrapper-issue-25

Conversation

Contributor

@sarvanithin sarvanithin commented Mar 23, 2026

Summary

Closes #25.

Adds wrap_as_gym() — a factory that wraps any ARES Environment in a gymnasium v26+ compatible interface, making ARES accessible to RL researchers and libraries already familiar with the gymnasium API.

New public API

from ares.environments import wrap_as_gym

# Synchronous (default) — for scripts / training loops where no event
# loop is already running (GymWrapper raises RuntimeError inside one)
ares_env = MyAresEnv(...)
env = wrap_as_gym(ares_env)
obs, info = env.reset()
while True:
    action = policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

# Async — for use inside asyncio event loops / Jupyter
async with MyAresEnv(...) as ares_env:
    env = wrap_as_gym(ares_env, async_mode=True)
    obs, info = await env.reset()

Design decisions

  • observation_space / action_space are None — ARES operates on structured LLMRequest/LLMResponse objects, not numpy arrays, so standard gymnasium spaces don't apply. Libraries that strictly need array spaces can subclass and override.
  • truncated is always False — ARES raises TimeoutError for time limits rather than using a truncation flag.
  • info["time_step"] passes through the raw TimeStep so callers can inspect step_type, discount, etc. if needed.
  • GymWrapper raises RuntimeError if called from inside a running event loop, with a clear message directing users to AsyncGymWrapper.
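The event-loop guard in the last bullet can be sketched as follows. This is a minimal illustration of the design decision, not the actual implementation: `run_blocking` is a hypothetical helper name, and the real check lives inside `GymWrapper`.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any


def run_blocking(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run `coro` to completion, refusing to nest inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running: safe to create a fresh one.
        return asyncio.run(coro)
    # A loop is already running (e.g. Jupyter); nesting asyncio.run()
    # would fail, so direct the caller to the async wrapper instead.
    raise RuntimeError(
        "GymWrapper cannot be used inside a running event loop; "
        "use wrap_as_gym(env, async_mode=True) instead."
    )
```

Calling `run_blocking` from a plain script executes the coroutine; calling it from inside `asyncio.run(...)` raises the descriptive RuntimeError.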

Files changed

| File | Change |
| --- | --- |
| src/ares/environments/gym_wrapper.py | New — GymWrapper, AsyncGymWrapper, wrap_as_gym |
| src/ares/environments/gym_wrapper_test.py | New — 19 unit tests |
| src/ares/environments/__init__.py | Export new symbols |

Test plan

  • 19 unit tests, all passing (uv run pytest src/ares/environments/gym_wrapper_test.py)
  • uv run ruff check passes
  • uv run ruff format applied

Summary by CodeRabbit

Release Notes

  • New Features

    • Environment state snapshotting: Export environment state and restore from saved snapshots to resume progress later
    • Gymnasium-compatible wrappers: Standardized sync and async interfaces for environment interaction
  • Documentation

    • Added example demonstrating state export and restoration workflow

Implements snapshot/restore functionality to save and replay episodes
from specific checkpoints. Useful for debugging, trajectory analysis,
and mechanistic interpretability.

- Add EnvironmentSnapshot dataclass for serializing env state
- Implement export_state() and load_from_state() methods on CodeBaseEnv
- Support both SWEBench and Harbor environments
- Save container filesystem as tarball with JSON metadata
- Snapshots only work at episode boundaries (can't snapshot mid-episode)
- Add comprehensive test coverage
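The save/load round-trip these bullets describe can be sketched with a minimal stand-in. `MiniSnapshot` below is a toy analogue, not the actual class: the real `EnvironmentSnapshot` also carries container filesystem and agent state, and its exact fields are not shown in this PR text.

```python
import dataclasses
import json
import pathlib


@dataclasses.dataclass(frozen=True)
class MiniSnapshot:
    """Toy stand-in for EnvironmentSnapshot: JSON metadata on disk."""

    task_id: str
    step_count: int
    created_at: str

    def save_to_file(self, path: pathlib.Path) -> None:
        # Frozen dataclass -> plain dict -> JSON text.
        path.write_text(json.dumps(dataclasses.asdict(self)))

    @classmethod
    def load_from_file(cls, path: pathlib.Path) -> "MiniSnapshot":
        return cls(**json.loads(path.read_text()))
```

Because the dataclass is frozen, the loaded instance compares equal to the original, which is what the round-trip tests mentioned above would assert.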

Closes withmartian#39
Moved load_from_state to SweBenchEnv and HarborEnv since they need
different constructor arguments (tasks list). Base class now provides
_restore_from_snapshot helper that subclasses call after init.

- Add load_from_state implementation to SweBenchEnv
- Add load_from_state implementation to HarborEnv
- Refactor base class to use _restore_from_snapshot helper
- Add test for load_from_state functionality
…rtian#25)

Introduces GymWrapper (sync) and AsyncGymWrapper (async) that adapt any
ARES Environment to the gymnasium v26+ interface (reset/step/close with
(obs, reward, terminated, truncated, info) returns).

- wrap_as_gym(env) returns a synchronous GymWrapper by default
- wrap_as_gym(env, async_mode=True) returns an AsyncGymWrapper for use
  inside existing asyncio event loops (e.g. Jupyter notebooks)
- observation_space / action_space exposed as None since ARES operates
  on structured LLM objects rather than numpy array spaces
- TimeStep passed through in info["time_step"] for callers that need
  step_type or discount
- 19 unit tests covering reset, step, close, context managers, and the
  factory function

coderabbitai bot commented Mar 23, 2026


⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee456781-b355-40d1-a3c5-e9645de41843

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 7dbbada.

📒 Files selected for processing (9)
  • examples/03_state_snapshotting.py
  • src/ares/environments/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/gym_wrapper.py
  • src/ares/environments/gym_wrapper_test.py
  • src/ares/environments/harbor_env.py
  • src/ares/environments/snapshot.py
  • src/ares/environments/snapshot_test.py
  • src/ares/environments/swebench_env.py


coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This changeset introduces environment state snapshotting and gymnasium-compatible wrappers for ARES. The snapshot system enables exporting and restoring complete environment states via export_state() and load_from_state() methods, with serialization handled by a new EnvironmentSnapshot dataclass. Gymnasium wrappers (GymWrapper, AsyncGymWrapper) provide sync/async interfaces mapping ARES semantics to gymnasium conventions. Concrete implementations in SweBenchEnv and HarborEnv supply task serialization logic, and a new example demonstrates the full snapshotting workflow.

Changes

- **Snapshot Infrastructure** (src/ares/environments/snapshot.py, src/ares/environments/snapshot_test.py) — Introduces the EnvironmentSnapshot frozen dataclass capturing task/container/agent state with file persistence (save_to_file, load_from_file). Comprehensive tests validate serialization round-trips, metadata accuracy, and state reconstruction.
- **Base Environment Snapshot Support** (src/ares/environments/base.py) — Adds the export_state() public method and restoration plumbing (_restore_from_snapshot, _restore_container) to CodeBaseEnv. Introduces abstract methods _serialize_task/_deserialize_task and helpers for container/task type detection and validation. Expands the constructor with snapshot-related state fields.
- **Environment Implementations** (src/ares/environments/swebench_env.py, src/ares/environments/harbor_env.py) — Implements snapshot serialization (task-specific _serialize_task, _deserialize_task, _get_task_type) and restoration (load_from_state classmethod) for both SWE-bench and Harbor environments.
- **Gymnasium Wrappers** (src/ares/environments/gym_wrapper.py, src/ares/environments/gym_wrapper_test.py) — Introduces GymWrapper (sync) and AsyncGymWrapper (async) wrapping ARES environments in gymnasium-compatible interfaces, plus the wrap_as_gym() factory, event loop detection, and reward/termination normalization. Tests verify sync/async semantics, context managers, and action/observation spaces.
- **Public API & Example** (src/ares/environments/__init__.py, examples/03_state_snapshotting.py) — Establishes the environments package public API exporting key types (Environment, TimeStep, wrappers, utilities). An example script demonstrates end-to-end snapshot export, metadata inspection, and restoration with continued execution.

Sequence Diagram(s)

sequenceDiagram
    participant Env as SweBenchEnv
    participant Container as Container
    participant Task as Task
    participant Snapshot as EnvironmentSnapshot
    participant Disk as Filesystem

    rect rgba(0, 100, 200, 0.5)
        Note over Env,Disk: Export Phase
        Env->>+Container: get_container()
        Container-->>-Env: Container instance
        Env->>Container: download_filesystem()
        Container-->>Env: filesystem.tar.gz
        Env->>+Task: serialize task
        Task-->>-Env: task_data dict
        Env->>Env: extract agent messages
        Env->>+Snapshot: create EnvironmentSnapshot
        Snapshot-->>-Env: snapshot instance
        Env->>Disk: save snapshot metadata (snapshot.json)
        Env->>Disk: persist filesystem archive
    end

    rect rgba(100, 200, 0, 0.5)
        Note over Env,Disk: Restore Phase
        Env->>+Disk: load snapshot metadata
        Disk-->>-Env: EnvironmentSnapshot
        Env->>+Task: deserialize task_data
        Task-->>-Env: Task instance
        Env->>+Container: restore from snapshot
        Container->>Disk: extract filesystem archive
        Container-->>-Env: Container restored
        Env->>Env: _restore_from_snapshot()
        Env->>Env: reset step/reset state
        Env-->>Env: ready for continued execution
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A snapshot in time, preserved with care,
State captured whole, ready to share,
From gymnasium's interface, new paths unfold,
ARES environments, wrapped up, retold! ✨

🚥 Pre-merge checks (✅ 3 passed | ❌ 2 failed)

❌ Failed checks (2 warnings)

- **Out of Scope Changes check** — ⚠️ Warning: The PR includes extensive out-of-scope changes unrelated to issue #25: environment snapshot/restore functionality (export_state, load_from_state, the EnvironmentSnapshot dataclass, snapshot serialization tests) and base environment refactoring, which are separate features not mentioned in the linked issue. Resolution: remove or move snapshot-related changes to a separate PR (snapshot.py, snapshot_test.py, export_state/load_from_state methods, _serialize_task/_deserialize_task implementations, and related imports/exports); keep only the gymnasium wrapper changes.
- **Docstring Coverage** — ⚠️ Warning: Docstring coverage is 55.06%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (3)

- **Description Check** — Check skipped; CodeRabbit's high-level summary is enabled.
- **Title check** — The title accurately describes the main change: adding gymnasium-compatible wrappers for ARES environments. It is specific, concise, and directly related to the primary objective of closing issue #25.
- **Linked Issues check** — The PR implements all core requirements from issue #25: gymnasium interface mapping, correct ARES-to-gym API semantics, the wrap_as_gym factory, and both sync and async wrapper implementations with comprehensive tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (11)
src/ares/environments/base.py (2)

468-472: Fragile access to agent internal state.

_get_agent_messages accesses the private _messages attribute of the code agent via hasattr. This creates tight coupling to MiniSWECodeAgent's implementation. Consider adding a get_messages() method to the CodeAgent interface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 468 - 472, The helper
_get_agent_messages currently reaches into self._code_agent._messages using
hasattr which couples BaseEnvironment to MiniSWECodeAgent internals; add a
public get_messages(self) -> list[dict] method to the CodeAgent interface (and
implement it on MiniSWECodeAgent) and change _get_agent_messages to call
self._code_agent.get_messages() when _code_agent is not None, with a safe
fallback to [] if the method is missing or returns None.

544-548: Consider using timezone-aware timestamps.

datetime.datetime.now().isoformat() produces a naive datetime without timezone info. For cross-system snapshot comparisons, consider using UTC.

♻️ Suggested improvement
-            created_at=datetime.datetime.now().isoformat(),
+            created_at=datetime.datetime.now(datetime.timezone.utc).isoformat(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 544 - 548, The created_at value
for snapshot.EnvironmentSnapshot is currently using a naive timestamp
(datetime.datetime.now().isoformat()); update the assignment so created_at is a
UTC-aware ISO timestamp (e.g., generate the timestamp with datetime.timezone.utc
so it includes the UTC offset) when constructing snap (see
snapshot.EnvironmentSnapshot and the created_at field), ensuring cross-system
comparisons use timezone-aware UTC times.
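The naive/aware distinction in this comment is easy to see side by side:

```python
import datetime

# Naive: no offset information, so comparisons across machines in
# different timezones are ambiguous.
naive = datetime.datetime.now().isoformat()

# Aware: carries an explicit UTC offset and round-trips via fromisoformat.
aware = datetime.datetime.now(datetime.timezone.utc).isoformat()
# `aware` ends with "+00:00"; `naive` has no offset suffix.
```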
src/ares/environments/harbor_env.py (1)

215-216: Inconsistent import path for DaytonaContainer.

Line 215 uses base.ares_daytona.DaytonaContainer, which accesses ares_daytona through the base module. However, ares_daytona is already imported at the top of this file (line 23). Use the direct import for consistency.

♻️ Proposed fix
-        container_factory = container_factory or base.ares_daytona.DaytonaContainer
+        container_factory = container_factory or ares_daytona.DaytonaContainer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/harbor_env.py` around lines 215 - 216, The assignment
uses base.ares_daytona.DaytonaContainer while ares_daytona is already imported;
update the default factory expression (container_factory = container_factory or
base.ares_daytona.DaytonaContainer) to use the direct import
(ares_daytona.DaytonaContainer) so the symbol DaytonaContainer is referenced
consistently with the top-level import and avoids the nested base lookup.
src/ares/environments/snapshot_test.py (1)

92-109: Move imports to module level.

asyncio and contextlib are imported inside test functions (lines 93, 106, 122). Move these to the module level for clarity.

♻️ Suggested fix

Add to imports at top of file:

import asyncio
import contextlib

Then remove the inline imports from test functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/snapshot_test.py` around lines 92 - 109, The test
contains inline imports of asyncio and contextlib; move "import asyncio" and
"import contextlib" to the module-level imports at the top of
src/ares/environments/snapshot_test.py, then remove the inline imports inside
the test that defines mock_running_task and the cleanup block (the code
referencing mock_running_task, env._code_agent_task, and
env._validate_snapshot_allowed should remain unchanged).
examples/03_state_snapshotting.py (2)

115-128: Note: Calling reset() after restore starts a new episode.

The comment on line 118-119 states "The environment is now at the same state as when we snapshotted", but line 120 immediately calls reset() which starts a fresh episode rather than continuing from the snapshotted state. The example demonstrates restoration correctly, but the comment could be misleading about what "same state" means here.

📝 Suggested clarification
         # The environment is now at the same state as when we snapshotted
-            # We can continue taking steps from here
-            ts = await restored_env.reset()  # Reset to start a new episode
+            # The task and container filesystem are restored, but we start a new episode
+            ts = await restored_env.reset()  # Start a new episode with restored container state
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/03_state_snapshotting.py` around lines 115 - 128, The comment
incorrectly implies restored_env stays at the snapshotted episode while the code
immediately calls restored_env.reset(), which actually starts a new episode;
update the explanatory comment around restored_env and reset() (referencing
restored_env, reset(), and the snapshot/restore flow) to clarify that the
restore returns the environment to the saved state but the subsequent reset()
intentionally begins a fresh episode (or, alternatively, remove the reset() if
you intend to continue the restored episode) so the comment matches the code
behavior.

54-66: Consider exposing public properties for commonly accessed state.

The example accesses several private attributes (_step_count, _current_task, _code_agent_task). While acceptable in example code, this pattern may leak into user code. Consider adding public read-only properties for step_count and current_task to provide a stable API for users who need to inspect environment state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/03_state_snapshotting.py` around lines 54 - 66, The example reads
private attributes env._step_count, env._current_task and env._code_agent_task;
add public read-only properties on the environment class (e.g., def
step_count(self)->int and def current_task(self)->Task or similar) that return
the underlying private fields and keep the original private names for internal
use, then update the example to use env.step_count and env.current_task instead
of accessing _step_count/_current_task/_code_agent_task; locate these changes
around the environment class definition and the example usage referencing
env._step_count to implement the properties and switch callers to the new public
accessors.
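The read-only properties this comment proposes might look like the sketch below. `DemoEnv` is a hypothetical stand-in for the environment class; the property names mirror the private fields mentioned in the example.

```python
class DemoEnv:
    """Stand-in showing public read-only accessors over private state."""

    def __init__(self) -> None:
        self._step_count = 0
        self._current_task = None

    @property
    def step_count(self) -> int:
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._step_count

    @property
    def current_task(self):
        return self._current_task
```

Internal code keeps mutating the private fields; external callers get a stable, read-only surface.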
src/ares/environments/snapshot.py (1)

116-127: Consider adding error handling for file operations.

save_to_file doesn't handle potential I/O errors (disk full, permissions). While these would bubble up as exceptions, explicit handling or documentation of possible exceptions would help callers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/snapshot.py` around lines 116 - 127, The save_to_file
method currently writes JSON without handling I/O failures; wrap the file
open/json.dump sequence in a try/except that catches OSError (or broader
IOError) around the open(path, "w") and json.dump calls inside save_to_file,
convert or annotate the caught exception with a clear message that includes the
target path (use the pathlib.Path argument) and either log it via the
module/logger used in this file or re-raise a more specific exception
(preserving the original exception as the __cause__); ensure
dataclasses.asdict(self) and conversion of snapshot_dir to str remain unchanged
and that any resources are properly closed (use context manager) when
propagating the error.
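A sketch of the error handling this comment asks for. The `Snap` class and its fields are illustrative, not the actual snapshot.py implementation; the pattern shown is catching OSError around the write and re-raising with the target path while preserving the cause.

```python
import dataclasses
import json
import pathlib


@dataclasses.dataclass(frozen=True)
class Snap:
    """Toy snapshot used to demonstrate the suggested I/O error handling."""

    name: str

    def save_to_file(self, path: pathlib.Path) -> None:
        try:
            # Context manager guarantees the handle closes even on failure.
            with open(path, "w") as f:
                json.dump(dataclasses.asdict(self), f)
        except OSError as exc:
            # Annotate with the target path; keep the original as __cause__.
            raise OSError(f"failed to write snapshot to {path}") from exc
```

Callers now get an actionable message (which file failed) instead of a bare errno, and the chained traceback still shows the underlying disk/permission error.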
src/ares/environments/swebench_env.py (2)

305-306: Inconsistent import path for DaytonaContainer.

Same issue as in harbor_env.py: use the direct import ares_daytona.DaytonaContainer instead of base.ares_daytona.DaytonaContainer.

♻️ Proposed fix
-        container_factory = container_factory or base.ares_daytona.DaytonaContainer
+        container_factory = container_factory or ares_daytona.DaytonaContainer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/swebench_env.py` around lines 305 - 306, The assignment
for the default container factory uses an inconsistent import path: change the
default from base.ares_daytona.DaytonaContainer to ares_daytona.DaytonaContainer
by updating the container_factory default expression (where container_factory is
set) so it references DaytonaContainer via ares_daytona.DaytonaContainer instead
of base.ares_daytona.DaytonaContainer; mirror the same pattern used in
harbor_env.py to keep imports consistent.

291-293: Remove redundant import.

pathlib is already imported at line 13. The re-import as pathlib_module at line 291 is unnecessary.

♻️ Proposed fix
-        import pathlib as pathlib_module
-
         from ares.environments import snapshot as snapshot_module

         # Load snapshot if path provided
-        if isinstance(snapshot_path, pathlib_module.Path):
+        if isinstance(snapshot_path, pathlib.Path):
             snap = snapshot_module.EnvironmentSnapshot.load_from_file(snapshot_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/swebench_env.py` around lines 291 - 293, The file
contains a redundant import "import pathlib as pathlib_module" (and the adjacent
"from ares.environments import snapshot as snapshot_module"); remove the
redundant pathlib alias import (pathlib_module) and, if any code uses
pathlib_module, change those references to the already-imported "pathlib" symbol
so only the original pathlib import at the top is used.
src/ares/environments/gym_wrapper.py (2)

88-90: Consider including time_step in reset's info dict for consistency.

step() returns info["time_step"] (line 115), but reset() returns an empty dict. Since ARES's reset() also returns a TimeStep, users may expect to access it here as well.

♻️ Suggested fix for consistency
         del seed, options  # Unused; kept for API compatibility.
         time_step = await self._env.reset()
-        return time_step.observation, {}
+        return time_step.observation, {"time_step": time_step}

Note: Update the docstring at line 86 accordingly if this change is adopted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/gym_wrapper.py` around lines 88 - 90, The reset method
in GymWrapper currently returns (time_step.observation, {}) but step() exposes
info["time_step"]; change reset() to return the TimeStep in the info dict (e.g.,
return time_step.observation, {"time_step": time_step}) so callers can access
the initial TimeStep consistently, and update the reset() docstring (around the
existing docstring near line 86) to document that info["time_step"] contains the
TimeStep returned by the environment.

149-150: Consider delegating space attributes to avoid duplication.

These duplicate the None assignments in AsyncGymWrapper. If the underlying implementation changes, both classes need updating.

♻️ Suggested refactor using properties
     def __init__(
         self,
         env: base.Environment[ActionType, ObservationType, Any, Any],
     ) -> None:
         self._async = AsyncGymWrapper(env)
-        self.observation_space: Any = None
-        self.action_space: Any = None
+
+    @property
+    def observation_space(self) -> Any:
+        return self._async.observation_space
+
+    @property
+    def action_space(self) -> Any:
+        return self._async.action_space
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/gym_wrapper.py` around lines 149 - 150, Replace the
duplicated None assignments for observation_space and action_space with property
delegates that forward to the wrapped environment (or the AsyncGymWrapper
counterpart) so changes are centralized: remove the lines setting
self.observation_space and self.action_space to None in GymWrapper, and add
`@property` def observation_space(self): return self._env.observation_space (and
similarly for action_space) or delegate to AsyncGymWrapper's
observation_space/action_space accessors if the wrapper wraps an
AsyncGymWrapper; update any internal uses to read the properties rather than
relying on instance attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/environments/base.py`:
- Around line 494-498: The call container.upload_dir(fs_path, "/") is wrong
because upload_dir expects a directory and its implementation uses remote_path /
relative_path which fails when remote_path is a str; either upload the tarball
as a single file with container.upload_file(fs_path, "/container_fs.tar.gz") or
extract fs_path to a temporary directory and call
container.upload_dir(extracted_dir, "/"); additionally fix the upload_dir
implementation in src/ares/containers/containers.py by ensuring remote_path is
converted to a pathlib.Path (e.g., remote_base = Path(remote_path)) before using
the / operator and keep iteration over Path.rglob("*") to handle directories
correctly.

In `@src/ares/environments/gym_wrapper.py`:
- Around line 34-35: The docstring for _run is misleading because the function
raises RuntimeError when an event loop is already running rather than re-using
it; update the _run function's docstring to accurately describe behavior (i.e.,
it runs the coroutine by creating a new event loop and raises RuntimeError if a
running loop exists) and mention the specific exception raised (RuntimeError)
and any side effects so callers of _run know to avoid calling it from within a
running event loop.
- Around line 7-20: The documentation example is invalid because wrap_as_gym
returns the synchronous GymWrapper whose _run() raises RuntimeError if an event
loop is already running; update the example to either (A) show synchronous usage
outside of async context (call wrap_as_gym(...) from normal def or top-level
code) or (B) keep the async context but pass async_mode=True to wrap_as_gym so
it returns the async-compatible wrapper; reference wrap_as_gym, GymWrapper/_run,
and the async_mode=True flag so readers know which API to change.
- Around line 195-206: The synchronous example in the wrap_as_gym docstring is
shown inside an async context, which is invalid; update the docstring examples
for wrap_as_gym so the sync usage is shown outside any async context (use a
normal with or direct instantiation for MyAresEnv and call env.reset()
synchronously) and keep the async usage example separate using async with and
async_mode=True for wrap_as_gym; adjust the text around the examples to clearly
label "Synchronous usage" and "Async usage" so readers use the correct context
for the reset() call.

---

Nitpick comments:
In `@examples/03_state_snapshotting.py`:
- Around line 115-128: The comment incorrectly implies restored_env stays at the
snapshotted episode while the code immediately calls restored_env.reset(), which
actually starts a new episode; update the explanatory comment around
restored_env and reset() (referencing restored_env, reset(), and the
snapshot/restore flow) to clarify that the restore returns the environment to
the saved state but the subsequent reset() intentionally begins a fresh episode
(or, alternatively, remove the reset() if you intend to continue the restored
episode) so the comment matches the code behavior.
- Around line 54-66: The example reads private attributes env._step_count,
env._current_task and env._code_agent_task; add public read-only properties on
the environment class (e.g., def step_count(self)->int and def
current_task(self)->Task or similar) that return the underlying private fields
and keep the original private names for internal use, then update the example to
use env.step_count and env.current_task instead of accessing
_step_count/_current_task/_code_agent_task; locate these changes around the
environment class definition and the example usage referencing env._step_count
to implement the properties and switch callers to the new public accessors.

In `@src/ares/environments/base.py`:
- Around line 468-472: The helper _get_agent_messages currently reaches into
self._code_agent._messages using hasattr which couples BaseEnvironment to
MiniSWECodeAgent internals; add a public get_messages(self) -> list[dict] method
to the CodeAgent interface (and implement it on MiniSWECodeAgent) and change
_get_agent_messages to call self._code_agent.get_messages() when _code_agent is
not None, with a safe fallback to [] if the method is missing or returns None.
- Around line 544-548: The created_at value for snapshot.EnvironmentSnapshot is
currently using a naive timestamp (datetime.datetime.now().isoformat()); update
the assignment so created_at is a UTC-aware ISO timestamp (e.g., generate the
timestamp with datetime.timezone.utc so it includes the UTC offset) when
constructing snap (see snapshot.EnvironmentSnapshot and the created_at field),
ensuring cross-system comparisons use timezone-aware UTC times.

In `@src/ares/environments/gym_wrapper.py`:
- Around line 88-90: The reset method in GymWrapper currently returns
(time_step.observation, {}) but step() exposes info["time_step"]; change reset()
to return the TimeStep in the info dict (e.g., return time_step.observation,
{"time_step": time_step}) so callers can access the initial TimeStep
consistently, and update the reset() docstring (around the existing docstring
near line 86) to document that info["time_step"] contains the TimeStep returned
by the environment.
- Around line 149-150: Replace the duplicated None assignments for
observation_space and action_space with property delegates that forward to the
wrapped environment (or the AsyncGymWrapper counterpart) so changes are
centralized: remove the lines setting self.observation_space and
self.action_space to None in GymWrapper, and add `@property` def
observation_space(self): return self._env.observation_space (and similarly for
action_space) or delegate to AsyncGymWrapper's observation_space/action_space
accessors if the wrapper wraps an AsyncGymWrapper; update any internal uses to
read the properties rather than relying on instance attributes.

In `@src/ares/environments/harbor_env.py`:
- Around line 215-216: The assignment uses base.ares_daytona.DaytonaContainer
while ares_daytona is already imported; update the default factory expression
(container_factory = container_factory or base.ares_daytona.DaytonaContainer) to
use the direct import (ares_daytona.DaytonaContainer) so the symbol
DaytonaContainer is referenced consistently with the top-level import and avoids
the nested base lookup.

In `@src/ares/environments/snapshot_test.py`:
- Around line 92-109: The test contains inline imports of asyncio and
contextlib; move "import asyncio" and "import contextlib" to the module-level
imports at the top of src/ares/environments/snapshot_test.py, then remove the
inline imports inside the test that defines mock_running_task and the cleanup
block (the code referencing mock_running_task, env._code_agent_task, and
env._validate_snapshot_allowed should remain unchanged).

In `@src/ares/environments/snapshot.py`:
- Around line 116-127: The save_to_file method currently writes JSON without
handling I/O failures; wrap the file open/json.dump sequence in a try/except
that catches OSError (or broader IOError) around the open(path, "w") and
json.dump calls inside save_to_file, convert or annotate the caught exception
with a clear message that includes the target path (use the pathlib.Path
argument) and either log it via the module/logger used in this file or re-raise
a more specific exception (preserving the original exception as the __cause__);
ensure dataclasses.asdict(self) and conversion of snapshot_dir to str remain
unchanged and that any resources are properly closed (use context manager) when
propagating the error.
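A sketch of the suggested error handling; `SnapshotSaveError` is a hypothetical name, and the two-field dataclass is a stand-in for the real `EnvironmentSnapshot`:

```python
import dataclasses
import json
import pathlib


class SnapshotSaveError(OSError):
    """Hypothetical error type wrapping I/O failures during snapshot saves."""


@dataclasses.dataclass
class EnvironmentSnapshot:
    task_id: str
    snapshot_dir: pathlib.Path

    def save_to_file(self, path: pathlib.Path) -> None:
        data = dataclasses.asdict(self)
        data["snapshot_dir"] = str(self.snapshot_dir)  # Path is not JSON-serializable
        try:
            # Context manager guarantees the handle closes even if dump fails.
            with open(path, "w") as f:
                json.dump(data, f)
        except OSError as e:
            raise SnapshotSaveError(f"Failed to write snapshot to {path}: {e}") from e
```

Raising from the caught exception preserves the original traceback as `__cause__`, so callers still see the underlying `OSError` while getting a message that names the target path.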

In `@src/ares/environments/swebench_env.py`:
- Around line 305-306: The assignment for the default container factory uses an
inconsistent import path: change the default from
base.ares_daytona.DaytonaContainer to ares_daytona.DaytonaContainer by updating
the container_factory default expression (where container_factory is set) so it
references DaytonaContainer via ares_daytona.DaytonaContainer instead of
base.ares_daytona.DaytonaContainer; mirror the same pattern used in
harbor_env.py to keep imports consistent.
- Around line 291-293: The file contains a redundant import "import pathlib as
pathlib_module" (and the adjacent "from ares.environments import snapshot as
snapshot_module"); remove the redundant pathlib alias import (pathlib_module)
and, if any code uses pathlib_module, change those references to the
already-imported "pathlib" symbol so only the original pathlib import at the top
is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee456781-b355-40d1-a3c5-e9645de41843

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 7dbbada.

📒 Files selected for processing (9)
  • examples/03_state_snapshotting.py
  • src/ares/environments/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/gym_wrapper.py
  • src/ares/environments/gym_wrapper_test.py
  • src/ares/environments/harbor_env.py
  • src/ares/environments/snapshot.py
  • src/ares/environments/snapshot_test.py
  • src/ares/environments/swebench_env.py

Comment on lines +494 to +498
fs_path = snap.snapshot_dir / "container_fs.tar.gz"
if fs_path.exists():
    await container.upload_dir(fs_path, "/")

return container

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how upload_dir is implemented in container classes
ast-grep --pattern $'async def upload_dir($_, local_path$_, remote_path$_) {
  $$$
}'

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# First, find the container class and how upload_dir is used
rg "upload_dir" -A 5 -B 2

Repository: withmartian/ares

Length of output: 3609


🏁 Script executed:

# Find container class definitions
rg "class.*container" -i

Repository: withmartian/ares

Length of output: 1041


🏁 Script executed:

# Look at the imports in base.py to understand the container type
head -50 src/ares/environments/base.py

Repository: withmartian/ares

Length of output: 1567


🏁 Script executed:

# Get the full upload_dir implementation
sed -n '/(async def upload_dir/,/^[[:space:]]*async def\|^[[:space:]]*def\|^class/p' src/ares/containers/containers.py | head -30

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Alternative: use rg to get more context around upload_dir
rg "async def upload_dir" -A 20 src/ares/containers/containers.py

Repository: withmartian/ares

Length of output: 1032


🏁 Script executed:

# Also check how fs_path is used in base.py context
sed -n '490,510p' src/ares/environments/base.py

Repository: withmartian/ares

Length of output: 782


🏁 Script executed:

# Check how the tarball is created/saved in export_state
rg "container_fs.tar.gz" -B 5 -A 2

Repository: withmartian/ares

Length of output: 1664


🏁 Script executed:

# Check the snapshot test to see intended behavior
sed -n '1,100p' src/ares/environments/snapshot_test.py | grep -A 20 "upload_dir"

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Look for where tarball is extracted or how filesystem is meant to be restored
rg "tar.*gz" -i

Repository: withmartian/ares

Length of output: 337


🏁 Script executed:

# Check download_dir implementation to understand how the tarball is created
rg "async def download_dir" -A 20 src/ares/containers/containers.py

Repository: withmartian/ares

Length of output: 954


🏁 Script executed:

# Check the full context of how export_state saves the filesystem
sed -n '507,550p' src/ares/environments/base.py

Repository: withmartian/ares

Length of output: 1546


🏁 Script executed:

# Check the snapshot.py to understand how snapshot stores/loads data
grep -n "snapshot_dir\|download_dir\|upload_dir" src/ares/environments/snapshot.py | head -20

Repository: withmartian/ares

Length of output: 387


🏁 Script executed:

# Check if there's any extraction or decompression happening with the "tarball"
rg "tar\|gz\|extract\|decompress" src/ares/environments/base.py

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Check the test to see what's actually in container_fs.tar.gz
sed -n '1,200p' src/ares/environments/snapshot_test.py | tail -100

Repository: withmartian/ares

Length of output: 3127


🏁 Script executed:

# Look at the full download_dir call to understand the behavior better
sed -n '530,545p' src/ares/environments/base.py

Repository: withmartian/ares

Length of output: 610


🏁 Script executed:

# Let me check what download_dir actually creates - is it a tarball or a directory?
# Look at the test mock implementation
sed -n '1,150p' src/ares/environments/snapshot_test.py | grep -A 30 "class MockContainerWithDirOps"

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Check if the naming is misleading - does "container_fs.tar.gz" actually create a tarball or directory structure?
# Let me see the full mock implementation for upload_dir
sed -n '1,300p' src/ares/environments/snapshot_test.py | grep -A 10 "async def upload_dir"

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Get the full snapshot_test.py to see the mock implementations
cat src/ares/environments/snapshot_test.py

Repository: withmartian/ares

Length of output: 13577


🏁 Script executed:

# Let me check the exact implementation of upload_dir more carefully, especially the destination_path line
sed -n '/async def upload_dir/,/async def download_file/p' src/ares/containers/containers.py | head -20

Repository: withmartian/ares

Length of output: 765


🏁 Script executed:

# Check if remote_path is maybe converted to Path somewhere
rg "destination_path.*remote_path" -B 2 -A 2

Repository: withmartian/ares

Length of output: 458


🏁 Script executed:

# Check if this code actually works - the remote_path is a string, but it's being used with /
# Let me trace through: remote_path is "/" (a string), relative_path is Path
# destination_path = str(remote_path / relative_path) should fail
# Let me verify if there's any conversion happening
python3 << 'EOF'
# Test if string / Path works
s = "/"
p = "file.txt"
try:
    result = s / p
    print("Success:", result)
except TypeError as e:
    print("Error:", e)
EOF

Repository: withmartian/ares

Length of output: 117


🏁 Script executed:

# Also check if there's maybe a different version or if pathlib.Path is being used
rg "destination_path.*=" -B 5 -A 1 src/ares/containers/

Repository: withmartian/ares

Length of output: 550


🏁 Script executed:

# Check the actual test to see if it exercises upload_dir properly
grep -A 30 "async def upload_dir" src/ares/environments/snapshot_test.py

Repository: withmartian/ares

Length of output: 1172


Critical bug: upload_dir call will fail due to a type error in the container implementation.

The code at line 497 calls container.upload_dir(fs_path, "/") where fs_path is a .tar.gz file. However, the upload_dir implementation in src/ares/containers/containers.py has a type error on the line that computes destination_path = str(remote_path / relative_path). Since remote_path is a string (not a pathlib.Path), the / operator fails with TypeError: unsupported operand type(s) for /: 'str' and 'str'.

Additionally, upload_dir expects a directory and iterates through it with rglob("*"). Passing a .tar.gz file will not yield any files to upload, breaking snapshot restoration. The tarball should either be uploaded as a single file using upload_file or extracted before being passed to upload_dir.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 494 - 498, The call
container.upload_dir(fs_path, "/") is wrong because upload_dir expects a
directory and its implementation uses remote_path / relative_path which fails
when remote_path is a str; either upload the tarball as a single file with
container.upload_file(fs_path, "/container_fs.tar.gz") or extract fs_path to a
temporary directory and call container.upload_dir(extracted_dir, "/");
additionally fix the upload_dir implementation in
src/ares/containers/containers.py by ensuring remote_path is converted to a
pathlib.Path (e.g., remote_base = Path(remote_path)) before using the / operator
and keep iteration over Path.rglob("*") to handle directories correctly.
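A sketch of the corrected helper under the review's assumptions: that the container exposes an `upload_file(local, remote)` coroutine and that container paths are POSIX-style. Joining via `PurePosixPath` avoids the str / str `TypeError` regardless of the host OS:

```python
import pathlib


async def upload_dir(container, local_dir: pathlib.Path, remote_path: str) -> None:
    """Sketch of the corrected helper; container.upload_file is assumed."""
    # Convert the string remote path before joining; "/" / "x" raises TypeError.
    remote_base = pathlib.PurePosixPath(remote_path)
    for local_file in sorted(local_dir.rglob("*")):
        if local_file.is_file():
            relative = local_file.relative_to(local_dir).as_posix()
            destination_path = str(remote_base / relative)
            await container.upload_file(local_file, destination_path)
```

For the tarball case specifically, the simpler fix is to skip this helper entirely and call `upload_file(fs_path, "/container_fs.tar.gz")`, since `rglob` on a file path yields nothing.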

Comment on lines +7 to +20
Usage::

    import asyncio
    from ares.environments import wrap_as_gym

    async def main():
        async with MyAresEnv(...) as ares_env:
            env = wrap_as_gym(ares_env)
            obs, info = env.reset()
            while True:
                action = policy(obs)
                obs, reward, terminated, truncated, info = env.step(action)
                if terminated or truncated:
                    break

⚠️ Potential issue | 🟡 Minor

Documentation example would fail at runtime.

The usage example shows calling the synchronous GymWrapper from inside async def main(), but _run() raises RuntimeError when an event loop is already running. This example will not work as written.

Consider either:

  1. Showing the sync wrapper used from a truly synchronous context (no async def), or
  2. Showing async_mode=True for the async context example.
📝 Suggested fix
 Usage::
 
-    import asyncio
     from ares.environments import wrap_as_gym
 
-    async def main():
-        async with MyAresEnv(...) as ares_env:
-            env = wrap_as_gym(ares_env)
-            obs, info = env.reset()
-            while True:
-                action = policy(obs)
-                obs, reward, terminated, truncated, info = env.step(action)
-                if terminated or truncated:
-                    break
+    # Synchronous usage (from a sync context)
+    env = wrap_as_gym(my_ares_env)
+    obs, info = env.reset()
+    while True:
+        action = policy(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        if terminated or truncated:
+            break
+    env.close()
+
+    # Async usage (inside async context)
+    env = wrap_as_gym(my_ares_env, async_mode=True)
+    obs, info = await env.reset()
+    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/gym_wrapper.py` around lines 7 - 20, The documentation
example is invalid because wrap_as_gym returns the synchronous GymWrapper whose
_run() raises RuntimeError if an event loop is already running; update the
example to either (A) show synchronous usage outside of async context (call
wrap_as_gym(...) from normal def or top-level code) or (B) keep the async
context but pass async_mode=True to wrap_as_gym so it returns the
async-compatible wrapper; reference wrap_as_gym, GymWrapper/_run, and the
async_mode=True flag so readers know which API to change.

Comment on lines +34 to +35
def _run(coro: Any) -> Any:
    """Run a coroutine, re-using the running loop if one exists."""

⚠️ Potential issue | 🟡 Minor

Docstring is misleading.

The docstring says "re-using the running loop if one exists," but the function actually raises RuntimeError when a loop is running.

📝 Suggested fix
 def _run(coro: Any) -> Any:
-    """Run a coroutine, re-using the running loop if one exists."""
+    """Run a coroutine via asyncio.run, raising if a loop is already running."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/gym_wrapper.py` around lines 34 - 35, The docstring for
_run is misleading because the function raises RuntimeError when an event loop
is already running rather than re-using it; update the _run function's docstring
to accurately describe behavior (i.e., it runs the coroutine by creating a new
event loop and raises RuntimeError if a running loop exists) and mention the
specific exception raised (RuntimeError) and any side effects so callers of _run
know to avoid calling it from within a running event loop.

Comment on lines +195 to +206
    Example::

        # Synchronous usage (default)
        async with MyAresEnv(...) as ares_env:
            env = wrap_as_gym(ares_env)
            obs, info = env.reset()

        # Async usage (inside async context)
        async with MyAresEnv(...) as ares_env:
            env = wrap_as_gym(ares_env, async_mode=True)
            obs, info = await env.reset()
    """

⚠️ Potential issue | 🟡 Minor

Docstring example for sync usage is inside async context.

Same issue as the module docstring: the sync example at lines 197-200 is inside async with, which would raise RuntimeError because an event loop is running.

📝 Suggested fix
     Example::
 
-        # Synchronous usage (default)
-        async with MyAresEnv(...) as ares_env:
-            env = wrap_as_gym(ares_env)
-            obs, info = env.reset()
+        # Synchronous usage (from sync context)
+        env = wrap_as_gym(my_ares_env)
+        obs, info = env.reset()
 
         # Async usage (inside async context)
         async with MyAresEnv(...) as ares_env:
             env = wrap_as_gym(ares_env, async_mode=True)
             obs, info = await env.reset()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/gym_wrapper.py` around lines 195 - 206, The synchronous
example in the wrap_as_gym docstring is shown inside an async context, which is
invalid; update the docstring examples for wrap_as_gym so the sync usage is
shown outside any async context (use a normal with or direct instantiation for
MyAresEnv and call env.reset() synchronously) and keep the async usage example
separate using async with and async_mode=True for wrap_as_gym; adjust the text
around the examples to clearly label "Synchronous usage" and "Async usage" so
readers use the correct context for the reset() call.

Comment on lines +272 to +283
async def load_from_state(
    cls,
    snapshot_path: "base.snapshot.EnvironmentSnapshot | pathlib.Path",
    *,
    container_factory: containers.ContainerFactory | None = None,
    code_agent_factory: code_agent_base.CodeAgentFactory | None = None,
    tracker: stat_tracker.StatTracker | None = None,
) -> "SweBenchEnv":
    """Restore SweBenchEnv from snapshot.

    Args:
        snapshot_path: EnvironmentSnapshot or path to snapshot.json
Contributor

SweBenchEnv.load_from_state duplicates HarborEnv.load_from_state — should we move the shared snapshot-restoration logic into base.Environment._load_from_snapshot as a @classmethod and have subclasses supply task (de)serialization?

Finding types: Conciseness Code Dedup and Conventions | Severity: 🟠 Medium



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/swebench_env.py around lines 272 to 319, the classmethod
load_from_state duplicates snapshot normalization, task deserialization, default factory
selection, env construction and _restore_from_snapshot boilerplate that is also present
in HarborEnv.load_from_state. Refactor by adding a shared classmethod on the base
environment (e.g. base.Environment._load_from_snapshot(cls, snap, *, container_factory,
code_agent_factory, tracker)) that: 1) accepts an already-loaded snapshot object, 2)
selects default container/code_agent factories when None, 3) constructs the env instance
with tasks=[task], step_limit=snap.step_limit and tracker, and 4) calls await
env._restore_from_snapshot(snap) and returns env. Then change
SweBenchEnv.load_from_state to only load/normalize snapshot_path, call
cls._deserialize_task to get the task, and delegate the remaining construction/restore
to base.Environment._load_from_snapshot (passing the chosen factories and tracker).
Apply the same refactor to src/ares/environments/harbor_env.py (lines ~169-229) so both
subclasses use the shared helper and only implement their task
serialization/deserialization hooks.
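The proposed split can be sketched with stand-in classes; `Snapshot`, the stub `_restore_from_snapshot`, and the placeholder `_deserialize_task` are all illustrative, not the real ARES code:

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class Snapshot:
    """Stand-in for EnvironmentSnapshot with only the fields the sketch needs."""
    task: dict
    step_limit: int


class Environment:
    def __init__(self, tasks, step_limit, tracker=None):
        self.tasks = tasks
        self.step_limit = step_limit
        self.tracker = tracker
        self.restored = False

    async def _restore_from_snapshot(self, snap: Snapshot) -> None:
        self.restored = True  # real impl rehydrates container/agent state

    @classmethod
    async def _load_from_snapshot(cls, snap: Snapshot, task: Any, *, tracker=None):
        # Shared boilerplate: construct the env, then restore its state.
        env = cls(tasks=[task], step_limit=snap.step_limit, tracker=tracker)
        await env._restore_from_snapshot(snap)
        return env


class SweBenchEnv(Environment):
    @classmethod
    def _deserialize_task(cls, raw: dict) -> Any:
        return raw["task_id"]  # placeholder for real task deserialization

    @classmethod
    async def load_from_state(cls, snap: Snapshot, *, tracker=None):
        # Subclass only normalizes/deserializes; base does the rest.
        task = cls._deserialize_task(snap.task)
        return await cls._load_from_snapshot(snap, task, tracker=tracker)
```

Because `_load_from_snapshot` constructs via `cls(...)`, HarborEnv would get the same restore path for free once it supplies its own `_deserialize_task`.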

Comment on lines +305 to +314
container_factory = container_factory or base.ares_daytona.DaytonaContainer
code_agent_factory = code_agent_factory or mini_swe_agent.MiniSWECodeAgent

env = cls(
    tasks=[task],
    container_factory=container_factory,
    code_agent_factory=code_agent_factory,
    step_limit=snap.step_limit,
    tracker=tracker,
)
Contributor

load_from_state() falls back to base.ares_daytona.DaytonaContainer when container_factory is omitted, so Docker snapshots recorded in snap.container_type are restored as Daytona — should we honor snap.container_type or call _default_container_factory(snap) when container_factory is not provided?

Finding type: Logical Bugs | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/swebench_env.py around lines 305-314, the load_from_state method
always falls back to base.ares_daytona.DaytonaContainer when container_factory is
omitted, which ignores the snapshot's container_type and restores Docker snapshots as
Daytona. Change the code so that when container_factory is None it consults the snapshot
(snap.container_type) or calls the existing helper (for example
base._default_container_factory(snap)) to obtain the proper factory, and only falls back
to base.ares_daytona.DaytonaContainer if the snapshot metadata/helper cannot determine a
factory. Ensure the chosen factory type is compatible with the rest of the method (no
API changes) and add a small comment explaining the behavior.

Comment on lines +115 to +121
async with restored_env:
    print("\n[PART 3] Continuing from restored state...")

    # The environment is now at the same state as when we snapshotted
    # We can continue taking steps from here
    ts = await restored_env.reset()  # Reset to start a new episode
    step_count = 0
Contributor

restored_env.reset() discards the restored snapshot because reset() always stops/clears and reinitializes; should we change Environment.reset() to avoid stopping/clearing for restored envs, add a start-restored method, or else mark snapshots non-resumable and verify _restore_from_snapshot() rehydrates agent/task/message history so we can continue without reset()?

Finding type: Logical Bugs | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
examples/03_state_snapshotting.py around lines 115-121, the example calls
restored_env.reset() to “continue” the restored environment; this is wrong because
reset() destroys and recreates the container/agent and therefore discards the restored
snapshot. Change the example to NOT call reset() after load; instead either (A) call a
non-destructive startup method (implement and call Environment.start_after_restore() or
resume() in src/ares/environments/base.py that brings a restored env into a running
state without stopping/clearing the container), or (B) if you don’t want a new API,
use the restored_env's existing in-memory state (use the restored timestep/task fields
populated by _restore_from_snapshot) and proceed to step() from that state. Also update
or add a test and a short comment explaining that snapshots must be resumed via this
non-destructive path, and inspect/ensure base._restore_from_snapshot rehydrates agent
message history and _code_agent/_code_agent_task appropriately so continuing execution
works without calling reset().
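Option (A) can be sketched as follows; `start_after_restore()` is the hypothetical API proposed above, and the boolean attributes are placeholders for real container/agent state:

```python
class RestoredEnvSketch:
    """Illustrates a non-destructive startup path after snapshot restore."""

    def __init__(self) -> None:
        self._restored = False
        self._running = False

    async def _restore_from_snapshot(self, snap) -> None:
        self._restored = True  # real impl rehydrates container/agent/task state

    async def reset(self):
        # Destructive: discards any restored state and starts a fresh episode.
        self._restored = False
        self._running = True

    async def start_after_restore(self) -> None:
        # Hypothetical API: run the env from restored state without clearing it.
        if not self._restored:
            raise RuntimeError("start_after_restore() requires a restored env.")
        self._running = True
```

The example would then call `await restored_env.start_after_restore()` instead of `reset()`, keeping the rehydrated state alive.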

Comment on lines +438 to +442
def _require_container(self) -> containers.Container:
    """Get container or raise if not available."""
    if self._container is None:
        raise RuntimeError("Container is not available.")
    return self._container
Contributor

_require_container and other snapshot helpers raise bare RuntimeError/ValueError, preventing callers/tests from distinguishing snapshot/episode-boundary failures — should we introduce environment-specific exceptions like SnapshotError/SnapshotTimingError and raise those instead?

Finding type: AI Coding Guidelines | Severity: 🟠 Medium



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 438 to 442, the helper methods
(_require_container, _require_task, _validate_snapshot_allowed and related snapshot
logic) currently raise bare RuntimeError/ValueError. Add environment-specific exception
classes (e.g. SnapshotError as base, SnapshotTimingError for mid-episode snapshot
attempts, and SnapshotMetadataError for invalid snapshot metadata) near the top of the
module or in src/ares/environments/__init__.py. Replace the RuntimeError in
_require_container/_require_task with SnapshotError or a more specific subclass, replace
the RuntimeError raised in _validate_snapshot_allowed with SnapshotTimingError, and
replace the ValueError in _restore_container (invalid snapshot metadata) with
SnapshotMetadataError. Ensure export_state/docstrings reference the new exceptions and
update any tests or callers that expect RuntimeError to catch the new exception types.

Comment on lines +450 to +455
def _validate_snapshot_allowed(self) -> None:
    """Raise error if snapshot not allowed (mid-episode)."""
    if self._code_agent_task is not None and not self._code_agent_task.done():
        raise RuntimeError(
            "Cannot snapshot during active episode. Call export_state() after reset() or after final step()."
        )
Contributor

_validate_snapshot_allowed() blocks snapshots while _code_agent_task is running, so export_state() fails after reset(); should we relax the guard to allow snapshots at the episode boundary by checking _requires_reset?

Finding type: Logical Bugs | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 450 to 455, the _validate_snapshot_allowed
method forbids snapshots whenever self._code_agent_task is running, but reset() (around
lines ~302-322) starts the code agent and returns while that task is still pending,
which prevents valid 'snapshot after reset' behavior. Modify _validate_snapshot_allowed
so it only raises if the agent task is running AND self._requires_reset is False (i.e.,
allow snapshots immediately after reset when _requires_reset is True). Additionally, in
the reset() method (lines ~302-322) set self._requires_reset = True at the end of reset
to mark the post-reset boundary, and in the first step() invocation clear
self._requires_reset = False (or where the agent begins actual execution) so subsequent
snapshots are again blocked while the episode is active. Ensure these changes include
small comments explaining the intent.
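The relaxed guard can be sketched like this, using the review's proposed `_requires_reset` flag (attribute names simplified; the real guard checks `self._code_agent_task`):

```python
class SnapshotGuardSketch:
    """Sketch of the relaxed mid-episode guard."""

    def __init__(self) -> None:
        self._agent_task_running = False
        self._requires_reset = False  # set True at the end of reset()

    def _validate_snapshot_allowed(self) -> None:
        # Block only when the agent is running AND we are past the reset
        # boundary; immediately after reset() the flag permits a snapshot.
        if self._agent_task_running and not self._requires_reset:
            raise RuntimeError(
                "Cannot snapshot during active episode. "
                "Call export_state() after reset() or after final step()."
            )
```

The first step() would clear `_requires_reset` again so snapshots stay blocked while the episode is actually running.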

Comment on lines +586 to +589

# Store messages for later restoration
self._saved_agent_messages = snap.agent_messages

Contributor

Should we seed _code_agent._messages from _saved_agent_messages before run so restored snapshots actually resume the prior conversation?

Finding type: Logical Bugs | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 586 to 589, the _restore_from_snapshot method
stores prior agent messages in self._saved_agent_messages but nothing injects those
messages into the newly created code agent, causing conversation history loss. Modify
the restore/start flow so that when subclasses instantiate the code agent (e.g., their
_start_code_agent or equivalent that currently creates MiniSWECodeAgent), they seed the
agent's internal message history with self._saved_agent_messages before calling
run/start: either add an optional messages parameter to the code agent
factory/constructor and pass self._saved_agent_messages, or after construction set
code_agent._messages = list(self._saved_agent_messages) (defensive copy) before starting
the agent. Also clear self._saved_agent_messages after seeding to avoid double-restore.
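A sketch of the seeding step with stand-in classes; `_messages` and `_saved_agent_messages` come from the review, everything else is illustrative:

```python
class StubCodeAgent:
    """Minimal stand-in for the code agent's message-carrying interface."""

    def __init__(self) -> None:
        self._messages: list[dict] = []


class RestoredEnvMessages:
    """Sketch: seed the agent's history from the saved snapshot messages."""

    def __init__(self, saved_messages: "list[dict] | None") -> None:
        self._saved_agent_messages = saved_messages

    def _start_code_agent(self) -> StubCodeAgent:
        agent = StubCodeAgent()
        if self._saved_agent_messages:
            # Defensive copy, then clear to avoid double-restoring on a
            # later agent start.
            agent._messages = list(self._saved_agent_messages)
            self._saved_agent_messages = None
        return agent
```

Alternatively, the agent factory could accept an optional `messages` parameter so the wrapper never reaches into the agent's private state.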

Comment on lines +11 to +14
from ares.environments.base import TimeStep
from ares.environments.gym_wrapper import AsyncGymWrapper
from ares.environments.gym_wrapper import GymWrapper
from ares.environments.gym_wrapper import wrap_as_gym
Contributor

gym_wrapper_test.py deep-imports AsyncGymWrapper, GymWrapper, and wrap_as_gym from ares.environments.gym_wrapper — should we from ares.environments import gym_wrapper and use gym_wrapper.AsyncGymWrapper, gym_wrapper.GymWrapper, and gym_wrapper.wrap_as_gym to follow CLAUDE.md?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/environments/gym_wrapper_test.py around lines 11 to 14, the file currently does
deep symbol imports of AsyncGymWrapper, GymWrapper, and wrap_as_gym from
ares.environments.gym_wrapper. Change the import to import the module instead (from
ares.environments import gym_wrapper) and then update all usages in the file to
reference gym_wrapper.AsyncGymWrapper, gym_wrapper.GymWrapper, and
gym_wrapper.wrap_as_gym respectively. Ensure tests still run by replacing every
occurrence of the unqualified symbols with the qualified module-prefixed names.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add gymnasium-like interface wrappers

1 participant