Add gymnasium-compatible wrappers for ARES environments (fixes #25)#98
sarvanithin wants to merge 3 commits into withmartian:main
Conversation
Implements snapshot/restore functionality to save and replay episodes from specific checkpoints. Useful for debugging, trajectory analysis, and mechanistic interpretability.

- Add EnvironmentSnapshot dataclass for serializing env state
- Implement export_state() and load_from_state() methods on CodeBaseEnv
- Support both SWEBench and Harbor environments
- Save container filesystem as tarball with JSON metadata
- Snapshots only work at episode boundaries (can't snapshot mid-episode)
- Add comprehensive test coverage

Closes withmartian#39
Moved load_from_state to SweBenchEnv and HarborEnv since they need different constructor arguments (tasks list). Base class now provides _restore_from_snapshot helper that subclasses call after init.

- Add load_from_state implementation to SweBenchEnv
- Add load_from_state implementation to HarborEnv
- Refactor base class to use _restore_from_snapshot helper
- Add test for load_from_state functionality
…rtian#25) Introduces GymWrapper (sync) and AsyncGymWrapper (async) that adapt any ARES Environment to the gymnasium v26+ interface (reset/step/close with (obs, reward, terminated, truncated, info) returns).

- wrap_as_gym(env) returns a synchronous GymWrapper by default
- wrap_as_gym(env, async_mode=True) returns an AsyncGymWrapper for use inside existing asyncio event loops (e.g. Jupyter notebooks)
- observation_space / action_space exposed as None since ARES operates on structured LLM objects rather than numpy array spaces
- TimeStep passed through in info["time_step"] for callers that need step_type or discount
- 19 unit tests covering reset, step, close, context managers, and the factory function
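As a rough illustration of the adapter pattern this commit describes — not the actual ARES implementation — the sketch below wraps a toy async environment in a gymnasium v26-style sync interface. `ToyEnv` and `SyncGymAdapter` are invented names for this sketch; the real wrapper operates on ARES `Environment`/`TimeStep` objects.

```python
# Hypothetical sketch: adapting an async reset/step environment to the
# gymnasium v26 (obs, reward, terminated, truncated, info) convention.
# ToyEnv is a stand-in, not the real ARES Environment API.
import asyncio
from typing import Any


class ToyEnv:
    """Stand-in async environment with ARES-like reset/step."""

    def __init__(self) -> None:
        self._steps = 0

    async def reset(self) -> str:
        self._steps = 0
        return "initial observation"

    async def step(self, action: str) -> tuple[str, float, bool]:
        self._steps += 1
        done = self._steps >= 2
        return f"obs after {action!r}", 1.0, done


class SyncGymAdapter:
    """Runs the async env on a private event loop, gym-v26 style."""

    def __init__(self, env: ToyEnv) -> None:
        self._env = env
        # No numpy spaces: observations/actions are structured objects.
        self.observation_space: Any = None
        self.action_space: Any = None

    def reset(self) -> tuple[str, dict]:
        # asyncio.run raises RuntimeError inside a running loop, which is
        # why an async variant of the wrapper is needed in notebooks.
        return asyncio.run(self._env.reset()), {}

    def step(self, action: str) -> tuple[str, float, bool, bool, dict]:
        obs, reward, done = asyncio.run(self._env.step(action))
        # No time-limit handling in this sketch, so truncated is always False.
        return obs, reward, done, False, {}


env = SyncGymAdapter(ToyEnv())
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step("ls")
```

The `asyncio.run` calls make the sync/async split concrete: each call spins up a fresh event loop, which fails inside an already-running loop — the same constraint the review flags for the real `GymWrapper._run`.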
📝 Walkthrough

This changeset introduces environment state snapshotting and gymnasium-compatible wrappers for ARES. The snapshot system enables exporting and restoring complete environment states via export_state() and load_from_state().
Sequence Diagram(s)

sequenceDiagram
participant Env as SweBenchEnv
participant Container as Container
participant Task as Task
participant Snapshot as EnvironmentSnapshot
participant Disk as Filesystem
rect rgba(0, 100, 200, 0.5)
Note over Env,Disk: Export Phase
Env->>+Container: get_container()
Container-->>-Env: Container instance
Env->>Container: download_filesystem()
Container-->>Env: filesystem.tar.gz
Env->>+Task: serialize task
Task-->>-Env: task_data dict
Env->>Env: extract agent messages
Env->>+Snapshot: create EnvironmentSnapshot
Snapshot-->>-Env: snapshot instance
Env->>Disk: save snapshot metadata (snapshot.json)
Env->>Disk: persist filesystem archive
end
rect rgba(100, 200, 0, 0.5)
Note over Env,Disk: Restore Phase
Env->>+Disk: load snapshot metadata
Disk-->>-Env: EnvironmentSnapshot
Env->>+Task: deserialize task_data
Task-->>-Env: Task instance
Env->>+Container: restore from snapshot
Container->>Disk: extract filesystem archive
Container-->>-Env: Container restored
Env->>Env: _restore_from_snapshot()
Env->>Env: reset step/reset state
Env-->>Env: ready for continued execution
end
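The export/restore phases in the diagram above can be sketched with a plain directory standing in for the container. The file names (snapshot.json, container_fs.tar.gz) follow the PR description, but the code is an illustrative stand-in, not ARES code.

```python
# Illustrative sketch of the export/restore flow: archive a "container"
# filesystem as a tarball plus JSON metadata, then restore by extracting.
import json
import pathlib
import tarfile
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())

# --- Export phase: archive the container filesystem and save metadata ---
container_fs = workdir / "container"
container_fs.mkdir()
(container_fs / "patch.diff").write_text("fix bug")

snapshot_dir = workdir / "snapshot"
snapshot_dir.mkdir()
with tarfile.open(snapshot_dir / "container_fs.tar.gz", "w:gz") as tar:
    tar.add(container_fs, arcname=".")
(snapshot_dir / "snapshot.json").write_text(
    json.dumps({"task_id": "task-1", "step_count": 3})
)

# --- Restore phase: load metadata and extract the archive into a fresh dir ---
meta = json.loads((snapshot_dir / "snapshot.json").read_text())
restored_fs = workdir / "restored"
restored_fs.mkdir()
with tarfile.open(snapshot_dir / "container_fs.tar.gz", "r:gz") as tar:
    tar.extractall(restored_fs)

restored_patch = (restored_fs / "patch.diff").read_text()
```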
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 4
🧹 Nitpick comments (11)
src/ares/environments/base.py (2)
468-472: Fragile access to agent internal state.
`_get_agent_messages` accesses the private `_messages` attribute of the code agent via `hasattr`. This creates tight coupling to `MiniSWECodeAgent`'s implementation. Consider adding a `get_messages()` method to the `CodeAgent` interface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/base.py` around lines 468 - 472, The helper _get_agent_messages currently reaches into self._code_agent._messages using hasattr which couples BaseEnvironment to MiniSWECodeAgent internals; add a public get_messages(self) -> list[dict] method to the CodeAgent interface (and implement it on MiniSWECodeAgent) and change _get_agent_messages to call self._code_agent.get_messages() when _code_agent is not None, with a safe fallback to [] if the method is missing or returns None.
544-548: Consider using timezone-aware timestamps.
`datetime.datetime.now().isoformat()` produces a naive datetime without timezone info. For cross-system snapshot comparisons, consider using UTC.

♻️ Suggested improvement
```diff
- created_at=datetime.datetime.now().isoformat(),
+ created_at=datetime.datetime.now(datetime.timezone.utc).isoformat(),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/base.py` around lines 544 - 548, The created_at value for snapshot.EnvironmentSnapshot is currently using a naive timestamp (datetime.datetime.now().isoformat()); update the assignment so created_at is a UTC-aware ISO timestamp (e.g., generate the timestamp with datetime.timezone.utc so it includes the UTC offset) when constructing snap (see snapshot.EnvironmentSnapshot and the created_at field), ensuring cross-system comparisons use timezone-aware UTC times.

src/ares/environments/harbor_env.py (1)
215-216: Inconsistent import path for `DaytonaContainer`.

Line 215 uses `base.ares_daytona.DaytonaContainer`, which accesses `ares_daytona` through the `base` module. However, `ares_daytona` is already imported at the top of this file (line 23). Use the direct import for consistency.

♻️ Proposed fix
```diff
- container_factory = container_factory or base.ares_daytona.DaytonaContainer
+ container_factory = container_factory or ares_daytona.DaytonaContainer
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/harbor_env.py` around lines 215 - 216, The assignment uses base.ares_daytona.DaytonaContainer while ares_daytona is already imported; update the default factory expression (container_factory = container_factory or base.ares_daytona.DaytonaContainer) to use the direct import (ares_daytona.DaytonaContainer) so the symbol DaytonaContainer is referenced consistently with the top-level import and avoids the nested base lookup.

src/ares/environments/snapshot_test.py (1)
92-109: Move imports to module level.
`asyncio` and `contextlib` are imported inside test functions (lines 93, 106, 122). Move these to the module level for clarity.

♻️ Suggested fix
Add to imports at top of file:
```python
import asyncio
import contextlib
```

Then remove the inline imports from test functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/snapshot_test.py` around lines 92 - 109, The test contains inline imports of asyncio and contextlib; move "import asyncio" and "import contextlib" to the module-level imports at the top of src/ares/environments/snapshot_test.py, then remove the inline imports inside the test that defines mock_running_task and the cleanup block (the code referencing mock_running_task, env._code_agent_task, and env._validate_snapshot_allowed should remain unchanged).

examples/03_state_snapshotting.py (2)
115-128: Note: Calling reset() after restore starts a new episode.

The comment on lines 118-119 states "The environment is now at the same state as when we snapshotted", but line 120 immediately calls `reset()`, which starts a fresh episode rather than continuing from the snapshotted state. The example demonstrates restoration correctly, but the comment could be misleading about what "same state" means here.

📝 Suggested clarification
```diff
  # The environment is now at the same state as when we snapshotted
- # We can continue taking steps from here
- ts = await restored_env.reset()  # Reset to start a new episode
+ # The task and container filesystem are restored, but we start a new episode
+ ts = await restored_env.reset()  # Start a new episode with restored container state
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/03_state_snapshotting.py` around lines 115 - 128, The comment incorrectly implies restored_env stays at the snapshotted episode while the code immediately calls restored_env.reset(), which actually starts a new episode; update the explanatory comment around restored_env and reset() (referencing restored_env, reset(), and the snapshot/restore flow) to clarify that the restore returns the environment to the saved state but the subsequent reset() intentionally begins a fresh episode (or, alternatively, remove the reset() if you intend to continue the restored episode) so the comment matches the code behavior.
54-66: Consider exposing public properties for commonly accessed state.

The example accesses several private attributes (`_step_count`, `_current_task`, `_code_agent_task`). While acceptable in example code, this pattern may leak into user code. Consider adding public read-only properties for `step_count` and `current_task` to provide a stable API for users who need to inspect environment state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/03_state_snapshotting.py` around lines 54 - 66, The example reads private attributes env._step_count, env._current_task and env._code_agent_task; add public read-only properties on the environment class (e.g., def step_count(self)->int and def current_task(self)->Task or similar) that return the underlying private fields and keep the original private names for internal use, then update the example to use env.step_count and env.current_task instead of accessing _step_count/_current_task/_code_agent_task; locate these changes around the environment class definition and the example usage referencing env._step_count to implement the properties and switch callers to the new public accessors.

src/ares/environments/snapshot.py (1)
116-127: Consider adding error handling for file operations.
`save_to_file` doesn't handle potential I/O errors (disk full, permissions). While these would bubble up as exceptions, explicit handling or documentation of possible exceptions would help callers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/snapshot.py` around lines 116 - 127, The save_to_file method currently writes JSON without handling I/O failures; wrap the file open/json.dump sequence in a try/except that catches OSError (or broader IOError) around the open(path, "w") and json.dump calls inside save_to_file, convert or annotate the caught exception with a clear message that includes the target path (use the pathlib.Path argument) and either log it via the module/logger used in this file or re-raise a more specific exception (preserving the original exception as the __cause__); ensure dataclasses.asdict(self) and conversion of snapshot_dir to str remain unchanged and that any resources are properly closed (use context manager) when propagating the error.

src/ares/environments/swebench_env.py (2)
305-306: Inconsistent import path for `DaytonaContainer`.

Same issue as in harbor_env.py: use the direct import `ares_daytona.DaytonaContainer` instead of `base.ares_daytona.DaytonaContainer`.

♻️ Proposed fix
```diff
- container_factory = container_factory or base.ares_daytona.DaytonaContainer
+ container_factory = container_factory or ares_daytona.DaytonaContainer
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/swebench_env.py` around lines 305 - 306, The assignment for the default container factory uses an inconsistent import path: change the default from base.ares_daytona.DaytonaContainer to ares_daytona.DaytonaContainer by updating the container_factory default expression (where container_factory is set) so it references DaytonaContainer via ares_daytona.DaytonaContainer instead of base.ares_daytona.DaytonaContainer; mirror the same pattern used in harbor_env.py to keep imports consistent.
291-293: Remove redundant import.
`pathlib` is already imported at line 13. The re-import as `pathlib_module` at line 291 is unnecessary.

♻️ Proposed fix
```diff
- import pathlib as pathlib_module
- from ares.environments import snapshot as snapshot_module

  # Load snapshot if path provided
- if isinstance(snapshot_path, pathlib_module.Path):
+ if isinstance(snapshot_path, pathlib.Path):
      snap = snapshot_module.EnvironmentSnapshot.load_from_file(snapshot_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/swebench_env.py` around lines 291 - 293, The file contains a redundant import "import pathlib as pathlib_module" (and the adjacent "from ares.environments import snapshot as snapshot_module"); remove the redundant pathlib alias import (pathlib_module) and, if any code uses pathlib_module, change those references to the already-imported "pathlib" symbol so only the original pathlib import at the top is used.src/ares/environments/gym_wrapper.py (2)
88-90: Consider including `time_step` in reset's info dict for consistency.

`step()` returns `info["time_step"]` (line 115), but `reset()` returns an empty dict. Since ARES's `reset()` also returns a `TimeStep`, users may expect to access it here as well.

♻️ Suggested fix for consistency
```diff
  del seed, options  # Unused; kept for API compatibility.
  time_step = await self._env.reset()
- return time_step.observation, {}
+ return time_step.observation, {"time_step": time_step}
```

Note: Update the docstring at line 86 accordingly if this change is adopted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/gym_wrapper.py` around lines 88 - 90, The reset method in GymWrapper currently returns (time_step.observation, {}) but step() exposes info["time_step"]; change reset() to return the TimeStep in the info dict (e.g., return time_step.observation, {"time_step": time_step}) so callers can access the initial TimeStep consistently, and update the reset() docstring (around the existing docstring near line 86) to document that info["time_step"] contains the TimeStep returned by the environment.
149-150: Consider delegating space attributes to avoid duplication.

These duplicate the `None` assignments in `AsyncGymWrapper`. If the underlying implementation changes, both classes need updating.

♻️ Suggested refactor using properties
```diff
  def __init__(
      self,
      env: base.Environment[ActionType, ObservationType, Any, Any],
  ) -> None:
      self._async = AsyncGymWrapper(env)
-     self.observation_space: Any = None
-     self.action_space: Any = None
+
+ @property
+ def observation_space(self) -> Any:
+     return self._async.observation_space
+
+ @property
+ def action_space(self) -> Any:
+     return self._async.action_space
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/gym_wrapper.py` around lines 149 - 150, Replace the duplicated None assignments for observation_space and action_space with property delegates that forward to the wrapped environment (or the AsyncGymWrapper counterpart) so changes are centralized: remove the lines setting self.observation_space and self.action_space to None in GymWrapper, and add `@property` def observation_space(self): return self._env.observation_space (and similarly for action_space) or delegate to AsyncGymWrapper's observation_space/action_space accessors if the wrapper wraps an AsyncGymWrapper; update any internal uses to read the properties rather than relying on instance attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ares/environments/base.py`:
- Around line 494-498: The call container.upload_dir(fs_path, "/") is wrong
because upload_dir expects a directory and its implementation uses remote_path /
relative_path which fails when remote_path is a str; either upload the tarball
as a single file with container.upload_file(fs_path, "/container_fs.tar.gz") or
extract fs_path to a temporary directory and call
container.upload_dir(extracted_dir, "/"); additionally fix the upload_dir
implementation in src/ares/containers/containers.py by ensuring remote_path is
converted to a pathlib.Path (e.g., remote_base = Path(remote_path)) before using
the / operator and keep iteration over Path.rglob("*") to handle directories
correctly.
In `@src/ares/environments/gym_wrapper.py`:
- Around line 34-35: The docstring for _run is misleading because the function
raises RuntimeError when an event loop is already running rather than re-using
it; update the _run function's docstring to accurately describe behavior (i.e.,
it runs the coroutine by creating a new event loop and raises RuntimeError if a
running loop exists) and mention the specific exception raised (RuntimeError)
and any side effects so callers of _run know to avoid calling it from within a
running event loop.
- Around line 7-20: The documentation example is invalid because wrap_as_gym
returns the synchronous GymWrapper whose _run() raises RuntimeError if an event
loop is already running; update the example to either (A) show synchronous usage
outside of async context (call wrap_as_gym(...) from normal def or top-level
code) or (B) keep the async context but pass async_mode=True to wrap_as_gym so
it returns the async-compatible wrapper; reference wrap_as_gym, GymWrapper/_run,
and the async_mode=True flag so readers know which API to change.
- Around line 195-206: The synchronous example in the wrap_as_gym docstring is
shown inside an async context, which is invalid; update the docstring examples
for wrap_as_gym so the sync usage is shown outside any async context (use a
normal with or direct instantiation for MyAresEnv and call env.reset()
synchronously) and keep the async usage example separate using async with and
async_mode=True for wrap_as_gym; adjust the text around the examples to clearly
label "Synchronous usage" and "Async usage" so readers use the correct context
for the reset() call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ee456781-b355-40d1-a3c5-e9645de41843
📒 Files selected for processing (9)
- examples/03_state_snapshotting.py
- src/ares/environments/__init__.py
- src/ares/environments/base.py
- src/ares/environments/gym_wrapper.py
- src/ares/environments/gym_wrapper_test.py
- src/ares/environments/harbor_env.py
- src/ares/environments/snapshot.py
- src/ares/environments/snapshot_test.py
- src/ares/environments/swebench_env.py
```python
fs_path = snap.snapshot_dir / "container_fs.tar.gz"
if fs_path.exists():
    await container.upload_dir(fs_path, "/")

return container
```
🧩 Analysis chain
🏁 Scripts executed (in repository withmartian/ares; output length noted after each command):

```shell
#!/bin/bash
# Check how upload_dir is implemented in container classes
ast-grep --pattern $'async def upload_dir($_, local_path$_, remote_path$_) {
  $$$
}'
# -> output length: 42

# First, find the container class and how upload_dir is used
rg "upload_dir" -A 5 -B 2
# -> output length: 3609

# Find container class definitions
rg "class.*container" -i
# -> output length: 1041

# Look at the imports in base.py to understand the container type
head -50 src/ares/environments/base.py
# -> output length: 1567

# Get the full upload_dir implementation
sed -n '/(async def upload_dir/,/^[[:space:]]*async def\|^[[:space:]]*def\|^class/p' src/ares/containers/containers.py | head -30
# -> output length: 42

# Alternative: use rg to get more context around upload_dir
rg "async def upload_dir" -A 20 src/ares/containers/containers.py
# -> output length: 1032

# Also check how fs_path is used in base.py context
sed -n '490,510p' src/ares/environments/base.py
# -> output length: 782

# Check how the tarball is created/saved in export_state
rg "container_fs.tar.gz" -B 5 -A 2
# -> output length: 1664

# Check the snapshot test to see intended behavior
sed -n '1,100p' src/ares/environments/snapshot_test.py | grep -A 20 "upload_dir"
# -> output length: 42

# Look for where tarball is extracted or how filesystem is meant to be restored
rg "tar.*gz" -i
# -> output length: 337

# Check download_dir implementation to understand how the tarball is created
rg "async def download_dir" -A 20 src/ares/containers/containers.py
# -> output length: 954

# Check the full context of how export_state saves the filesystem
sed -n '507,550p' src/ares/environments/base.py
# -> output length: 1546

# Check the snapshot.py to understand how snapshot stores/loads data
grep -n "snapshot_dir\|download_dir\|upload_dir" src/ares/environments/snapshot.py | head -20
# -> output length: 387

# Check if there's any extraction or decompression happening with the "tarball"
rg "tar\|gz\|extract\|decompress" src/ares/environments/base.py
# -> output length: 42

# Check the test to see what's actually in container_fs.tar.gz
sed -n '1,200p' src/ares/environments/snapshot_test.py | tail -100
# -> output length: 3127

# Look at the full download_dir call to understand the behavior better
sed -n '530,545p' src/ares/environments/base.py
# -> output length: 610

# Let me check what download_dir actually creates - is it a tarball or a directory?
# Look at the test mock implementation
sed -n '1,150p' src/ares/environments/snapshot_test.py | grep -A 30 "class MockContainerWithDirOps"
# -> output length: 42

# Check if the naming is misleading - does "container_fs.tar.gz" actually create a tarball or directory structure?
# Let me see the full mock implementation for upload_dir
sed -n '1,300p' src/ares/environments/snapshot_test.py | grep -A 10 "async def upload_dir"
# -> output length: 42

# Get the full snapshot_test.py to see the mock implementations
cat src/ares/environments/snapshot_test.py
# -> output length: 13577

# Let me check the exact implementation of upload_dir more carefully, especially the destination_path line
sed -n '/async def upload_dir/,/async def download_file/p' src/ares/containers/containers.py | head -20
# -> output length: 765

# Check if remote_path is maybe converted to Path somewhere
rg "destination_path.*remote_path" -B 2 -A 2
# -> output length: 458

# Check if this code actually works - the remote_path is a string, but it's being used with /
# Let me trace through: remote_path is "/" (a string), relative_path is Path
# destination_path = str(remote_path / relative_path) should fail
# Let me verify if there's any conversion happening
python3 << 'EOF'
# Test if string / Path works
s = "/"
p = "file.txt"
try:
    result = s / p
    print("Success:", result)
except TypeError as e:
    print("Error:", e)
EOF
# -> output length: 117

# Also check if there's maybe a different version or if pathlib.Path is being used
rg "destination_path.*=" -B 5 -A 1 src/ares/containers/
# -> output length: 550

# Check the actual test to see if it exercises upload_dir properly
grep -A 30 "async def upload_dir" src/ares/environments/snapshot_test.py
# -> output length: 1172
```
Critical bug: upload_dir call will fail due to type error in container implementation.
The code at line 497 calls `container.upload_dir(fs_path, "/")` where `fs_path` is a .tar.gz file. However, the `upload_dir` implementation in src/ares/containers/containers.py has a type error on the line that computes `destination_path = str(remote_path / relative_path)`. Since `remote_path` is a string (not a pathlib.Path), the `/` operator fails with `TypeError: unsupported operand type(s) for /: 'str' and 'str'`.
Additionally, upload_dir expects a directory and iterates through it with rglob("*"). Passing a .tar.gz file will not yield any files to upload, breaking snapshot restoration. The tarball should either be uploaded as a single file using upload_file or extracted before passing to upload_dir.
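The type error is easy to reproduce standalone. The snippet below mirrors the review's verification script: joining two plain strings with `/` raises `TypeError`, while converting the base to `pathlib.Path` first works (variable names here are illustrative, not from the ARES codebase).

```python
# Demonstrates the bug the review flags and the suggested fix.
import pathlib

remote_path = "/"                      # a str, as passed by the caller
relative_path = "container_fs.tar.gz"  # a str in this minimal repro

try:
    _ = remote_path / relative_path  # str / str: no __truediv__ defined
    failed = False
except TypeError:
    failed = True

# The fix: make the base a Path before joining.
fixed = pathlib.Path(remote_path) / relative_path
```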
Usage::

    import asyncio
    from ares.environments import wrap_as_gym

    async def main():
        async with MyAresEnv(...) as ares_env:
            env = wrap_as_gym(ares_env)
            obs, info = env.reset()
            while True:
                action = policy(obs)
                obs, reward, terminated, truncated, info = env.step(action)
                if terminated or truncated:
                    break
Documentation example would fail at runtime.
The usage example shows calling the synchronous GymWrapper from inside async def main(), but _run() raises RuntimeError when an event loop is already running. This example will not work as written.
Consider either:
- Showing the sync wrapper used from a truly synchronous context (no async def), or
- Showing async_mode=True for the async context example.
📝 Suggested fix
 Usage::

-    import asyncio
     from ares.environments import wrap_as_gym

-    async def main():
-        async with MyAresEnv(...) as ares_env:
-            env = wrap_as_gym(ares_env)
-            obs, info = env.reset()
-            while True:
-                action = policy(obs)
-                obs, reward, terminated, truncated, info = env.step(action)
-                if terminated or truncated:
-                    break
+    # Synchronous usage (from a sync context)
+    env = wrap_as_gym(my_ares_env)
+    obs, info = env.reset()
+    while True:
+        action = policy(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        if terminated or truncated:
+            break
+    env.close()
+
+    # Async usage (inside async context)
+    env = wrap_as_gym(my_ares_env, async_mode=True)
+    obs, info = await env.reset()
+    ...

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/environments/gym_wrapper.py` around lines 7 - 20, The documentation
example is invalid because wrap_as_gym returns the synchronous GymWrapper whose
_run() raises RuntimeError if an event loop is already running; update the
example to either (A) show synchronous usage outside of async context (call
wrap_as_gym(...) from normal def or top-level code) or (B) keep the async
context but pass async_mode=True to wrap_as_gym so it returns the
async-compatible wrapper; reference wrap_as_gym, GymWrapper/_run, and the
async_mode=True flag so readers know which API to change.
def _run(coro: Any) -> Any:
    """Run a coroutine, re-using the running loop if one exists."""
Docstring is misleading.
The docstring says "re-using the running loop if one exists," but the function actually raises RuntimeError when a loop is running.
📝 Suggested fix
 def _run(coro: Any) -> Any:
-    """Run a coroutine, re-using the running loop if one exists."""
+    """Run a coroutine via asyncio.run, raising if a loop is already running."""

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _run(coro: Any) -> Any:
    """Run a coroutine via asyncio.run, raising if a loop is already running."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/environments/gym_wrapper.py` around lines 34 - 35, The docstring for
_run is misleading because the function raises RuntimeError when an event loop
is already running rather than re-using it; update the _run function's docstring
to accurately describe behavior (i.e., it runs the coroutine by creating a new
event loop and raises RuntimeError if a running loop exists) and mention the
specific exception raised (RuntimeError) and any side effects so callers of _run
know to avoid calling it from within a running event loop.
Example::

    # Synchronous usage (default)
    async with MyAresEnv(...) as ares_env:
        env = wrap_as_gym(ares_env)
        obs, info = env.reset()

    # Async usage (inside async context)
    async with MyAresEnv(...) as ares_env:
        env = wrap_as_gym(ares_env, async_mode=True)
        obs, info = await env.reset()
"""
Docstring example for sync usage is inside async context.
Same issue as the module docstring: the sync example at lines 197-200 is inside async with, which would raise RuntimeError because an event loop is running.
📝 Suggested fix
 Example::

-    # Synchronous usage (default)
-    async with MyAresEnv(...) as ares_env:
-        env = wrap_as_gym(ares_env)
-        obs, info = env.reset()
+    # Synchronous usage (from sync context)
+    env = wrap_as_gym(my_ares_env)
+    obs, info = env.reset()

     # Async usage (inside async context)
     async with MyAresEnv(...) as ares_env:
         env = wrap_as_gym(ares_env, async_mode=True)
         obs, info = await env.reset()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Example::

    # Synchronous usage (from sync context)
    env = wrap_as_gym(my_ares_env)
    obs, info = env.reset()

    # Async usage (inside async context)
    async with MyAresEnv(...) as ares_env:
        env = wrap_as_gym(ares_env, async_mode=True)
        obs, info = await env.reset()
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/environments/gym_wrapper.py` around lines 195 - 206, The synchronous
example in the wrap_as_gym docstring is shown inside an async context, which is
invalid; update the docstring examples for wrap_as_gym so the sync usage is
shown outside any async context (use a normal with or direct instantiation for
MyAresEnv and call env.reset() synchronously) and keep the async usage example
separate using async with and async_mode=True for wrap_as_gym; adjust the text
around the examples to clearly label "Synchronous usage" and "Async usage" so
readers use the correct context for the reset() call.
async def load_from_state(
    cls,
    snapshot_path: "base.snapshot.EnvironmentSnapshot | pathlib.Path",
    *,
    container_factory: containers.ContainerFactory | None = None,
    code_agent_factory: code_agent_base.CodeAgentFactory | None = None,
    tracker: stat_tracker.StatTracker | None = None,
) -> "SweBenchEnv":
    """Restore SweBenchEnv from snapshot.

    Args:
        snapshot_path: EnvironmentSnapshot or path to snapshot.json
SweBenchEnv.load_from_state duplicates HarborEnv.load_from_state — should we move the shared snapshot-restoration logic into base.Environment._load_from_snapshot as a @classmethod and have subclasses supply task (de)serialization?
Finding types: Conciseness Code Dedup and Conventions | Severity: 🟠 Medium
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/swebench_env.py around lines 272 to 319, the classmethod
load_from_state duplicates snapshot normalization, task deserialization, default factory
selection, env construction and _restore_from_snapshot boilerplate that is also present
in HarborEnv.load_from_state. Refactor by adding a shared classmethod on the base
environment (e.g. base.Environment._load_from_snapshot(cls, snap, *, container_factory,
code_agent_factory, tracker)) that: 1) accepts an already-loaded snapshot object, 2)
selects default container/code_agent factories when None, 3) constructs the env instance
with tasks=[task], step_limit=snap.step_limit and tracker, and 4) calls await
env._restore_from_snapshot(snap) and returns env. Then change
SweBenchEnv.load_from_state to only load/normalize snapshot_path, call
cls._deserialize_task to get the task, and delegate the remaining construction/restore
to base.Environment._load_from_snapshot (passing the chosen factories and tracker).
Apply the same refactor to src/ares/environments/harbor_env.py (lines ~169-229) so both
subclasses use the shared helper and only implement their task
serialization/deserialization hooks.
container_factory = container_factory or base.ares_daytona.DaytonaContainer
code_agent_factory = code_agent_factory or mini_swe_agent.MiniSWECodeAgent

env = cls(
    tasks=[task],
    container_factory=container_factory,
    code_agent_factory=code_agent_factory,
    step_limit=snap.step_limit,
    tracker=tracker,
)
load_from_state() falls back to base.ares_daytona.DaytonaContainer when container_factory is omitted, so Docker snapshots recorded in snap.container_type are restored as Daytona — should we honor snap.container_type or call _default_container_factory(snap) when container_factory is not provided?
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/swebench_env.py around lines 305-314, the load_from_state method
always falls back to base.ares_daytona.DaytonaContainer when container_factory is
omitted, which ignores the snapshot's container_type and restores Docker snapshots as
Daytona. Change the code so that when container_factory is None it consults the snapshot
(snap.container_type) or calls the existing helper (for example
base._default_container_factory(snap)) to obtain the proper factory, and only falls back
to base.ares_daytona.DaytonaContainer if the snapshot metadata/helper cannot determine a
factory. Ensure the chosen factory type is compatible with the rest of the method (no
API changes) and add a small comment explaining the behavior.
async with restored_env:
    print("\n[PART 3] Continuing from restored state...")

    # The environment is now at the same state as when we snapshotted
    # We can continue taking steps from here
    ts = await restored_env.reset()  # Reset to start a new episode
    step_count = 0
restored_env.reset() discards the restored snapshot because reset() always stops/clears and reinitializes, should we change Environment.reset() to avoid stopping/clearing for restored envs or add a start-restored method, or else mark snapshots non-resumable and verify _restore_from_snapshot() rehydrates agent/task/message history so we can continue without reset()?
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
examples/03_state_snapshotting.py around lines 115-121, the example calls
restored_env.reset() to “continue” the restored environment; this is wrong because
reset() destroys and recreates the container/agent and therefore discards the restored
snapshot. Change the example to NOT call reset() after load; instead either (A) call a
non-destructive startup method (implement and call Environment.start_after_restore() or
resume() in src/ares/environments/base.py that brings a restored env into a running
state without stopping/clearing the container), or (B) if you don’t want a new API,
use the restored_env's existing in-memory state (use the restored timestep/task fields
populated by _restore_from_snapshot) and proceed to step() from that state. Also update
or add a test and a short comment explaining that snapshots must be resumed via this
non-destructive path, and inspect/ensure base._restore_from_snapshot rehydrates agent
message history and _code_agent/_code_agent_task appropriately so continuing execution
works without calling reset().
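Option (A) can be sketched as follows; the `resume` name and the dict-shaped snapshot are hypothetical, chosen only to model a non-destructive startup path:

```python
class RestoredEnvSketch:
    """Minimal model of an env rehydrated from a snapshot."""

    def __init__(self, snapshot: dict):
        # In the real code, _restore_from_snapshot would populate these.
        self._timestep = snapshot["timestep"]
        self._running = False

    def resume(self):
        # Unlike reset(), resume() does not stop/clear the container; it
        # marks the env running and hands back the saved timestep so the
        # caller can continue with step() from the snapshotted state.
        self._running = True
        return self._timestep
```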
def _require_container(self) -> containers.Container:
    """Get container or raise if not available."""
    if self._container is None:
        raise RuntimeError("Container is not available.")
    return self._container
_require_container and other snapshot helpers raise bare RuntimeError/ValueError, preventing callers/tests from distinguishing snapshot/episode-boundary failures — should we introduce environment-specific exceptions like SnapshotError/SnapshotTimingError and raise those instead?
Finding type: AI Coding Guidelines | Severity: 🟠 Medium
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 438 to 442, the helper methods
(_require_container, _require_task, _validate_snapshot_allowed and related snapshot
logic) currently raise bare RuntimeError/ValueError. Add environment-specific exception
classes (e.g. SnapshotError as base, SnapshotTimingError for mid-episode snapshot
attempts, and SnapshotMetadataError for invalid snapshot metadata) near the top of the
module or in src/ares/environments/__init__.py. Replace the RuntimeError in
_require_container/_require_task with SnapshotError or a more specific subclass, replace
the RuntimeError raised in _validate_snapshot_allowed with SnapshotTimingError, and
replace the ValueError in _restore_container (invalid snapshot metadata) with
SnapshotMetadataError. Ensure export_state/docstrings reference the new exceptions and
update any tests or callers that expect RuntimeError to catch the new exception types.
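The suggested hierarchy could look like this; subclassing the built-in exception types currently raised keeps existing `except RuntimeError` / `except ValueError` callers working during the transition:

```python
class SnapshotError(RuntimeError):
    """Base class for snapshot-related failures."""


class SnapshotTimingError(SnapshotError):
    """Raised when a snapshot is attempted mid-episode."""


class SnapshotMetadataError(SnapshotError, ValueError):
    """Raised when snapshot metadata is invalid or incompatible.

    Also inherits ValueError so callers catching the old exception
    type from _restore_container continue to work.
    """
```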
def _validate_snapshot_allowed(self) -> None:
    """Raise error if snapshot not allowed (mid-episode)."""
    if self._code_agent_task is not None and not self._code_agent_task.done():
        raise RuntimeError(
            "Cannot snapshot during active episode. Call export_state() after reset() or after final step()."
        )
_validate_snapshot_allowed() blocks snapshots while _code_agent_task is running, so export_state() fails after reset(); should we relax the guard to allow snapshots at the episode boundary by checking _requires_reset?
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 450 to 455, the _validate_snapshot_allowed
method forbids snapshots whenever self._code_agent_task is running, but reset() (around
lines ~302-322) starts the code agent and returns while that task is still pending,
which prevents valid 'snapshot after reset' behavior. Modify _validate_snapshot_allowed
so it only raises if the agent task is running AND self._requires_reset is False (i.e.,
allow snapshots immediately after reset when _requires_reset is True). Additionally, in
the reset() method (lines ~302-322) set self._requires_reset = True at the end of reset
to mark the post-reset boundary, and in the first step() invocation clear
self._requires_reset = False (or where the agent begins actual execution) so subsequent
snapshots are again blocked while the episode is active. Ensure these changes include
small comments explaining the intent.
# Store messages for later restoration
self._saved_agent_messages = snap.agent_messages
Should we seed _code_agent._messages from _saved_agent_messages before run so restored snapshots actually resume the prior conversation?
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/base.py around lines 586 to 589, the _restore_from_snapshot method
stores prior agent messages in self._saved_agent_messages but nothing injects those
messages into the newly created code agent, causing conversation history loss. Modify
the restore/start flow so that when subclasses instantiate the code agent (e.g., their
_start_code_agent or equivalent that currently creates MiniSWECodeAgent), they seed the
agent's internal message history with self._saved_agent_messages before calling
run/start: either add an optional messages parameter to the code agent
factory/constructor and pass self._saved_agent_messages, or after construction set
code_agent._messages = list(self._saved_agent_messages) (defensive copy) before starting
the agent. Also clear self._saved_agent_messages after seeding to avoid double-restore.
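The seeding plus the defensive copy the prompt asks for, demonstrated on a stub agent (`StubAgent` and `seed_agent_messages` are illustrative names, not the real API):

```python
class StubAgent:
    """Stand-in for the code agent; only the _messages field matters here."""

    def __init__(self):
        self._messages: list = []


def seed_agent_messages(agent: StubAgent, saved_messages: list) -> StubAgent:
    # Defensive copy so later agent mutations don't leak into the snapshot
    # data, then clear the saved list to avoid double-restoring on a
    # subsequent reset.
    agent._messages = list(saved_messages)
    saved_messages.clear()
    return agent
```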
from ares.environments.base import TimeStep
from ares.environments.gym_wrapper import AsyncGymWrapper
from ares.environments.gym_wrapper import GymWrapper
from ares.environments.gym_wrapper import wrap_as_gym
gym_wrapper_test.py deep-imports AsyncGymWrapper, GymWrapper, and wrap_as_gym from ares.environments.gym_wrapper — should we from ares.environments import gym_wrapper and use gym_wrapper.AsyncGymWrapper, gym_wrapper.GymWrapper, and gym_wrapper.wrap_as_gym to follow CLAUDE.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/environments/gym_wrapper_test.py around lines 11 to 14, the file currently does
deep symbol imports of AsyncGymWrapper, GymWrapper, and wrap_as_gym from
ares.environments.gym_wrapper. Change the import to import the module instead (from
ares.environments import gym_wrapper) and then update all usages in the file to
reference gym_wrapper.AsyncGymWrapper, gym_wrapper.GymWrapper, and
gym_wrapper.wrap_as_gym respectively. Ensure tests still run by replacing every
occurrence of the unqualified symbols with the qualified module-prefixed names.
Summary
Closes #25.
Adds `wrap_as_gym()`, a factory that wraps any ARES `Environment` in a gymnasium v26+ compatible interface, making ARES accessible to RL researchers and libraries already familiar with the gymnasium API.

New public API
Design decisions
- `observation_space` / `action_space` are `None`: ARES operates on structured `LLMRequest`/`LLMResponse` objects, not numpy arrays, so standard gymnasium spaces don't apply. Libraries that strictly need array spaces can subclass and override.
- `truncated` is always `False`: ARES raises `TimeoutError` for time limits rather than using a truncation flag.
- `info["time_step"]` passes through the raw `TimeStep` so callers can inspect `step_type`, `discount`, etc. if needed.
- `GymWrapper` raises `RuntimeError` if called from inside a running event loop, with a clear message directing users to `AsyncGymWrapper`.

Files changed
- `src/ares/environments/gym_wrapper.py`: `GymWrapper`, `AsyncGymWrapper`, `wrap_as_gym`
- `src/ares/environments/gym_wrapper_test.py`
- `src/ares/environments/__init__.py`

Test plan
- Tests pass (`uv run pytest src/ares/environments/gym_wrapper_test.py`)
- `uv run ruff check` passes
- `uv run ruff format` applied