Skip to content

Add agent state serialization for Go-Explore checkpointing#95

Open
joshgreaves wants to merge 1 commit intomainfrom
go-explore/agent-state-serialization
Open

Add agent state serialization for Go-Explore checkpointing#95
joshgreaves wants to merge 1 commit intomainfrom
go-explore/agent-state-serialization

Conversation

@joshgreaves
Copy link
Copy Markdown
Contributor

@joshgreaves joshgreaves commented Mar 13, 2026

User description

Summary

  • Adds CodeAgentState frozen dataclass capturing an agent's conversational state (messages, call count, cost)
  • Adds CheckpointableCodeAgent protocol extending CodeAgent with get_state() and restore_and_resume()
  • Implements both methods on MiniSWECodeAgent -- restore_and_resume() injects saved message history and re-enters the step loop without re-running the setup preamble
  • Fixes pre-existing duplicate preset registration bug in presets.py

Context

Part of Go-Explore support (Wave 1). This PR is independent of the other two Wave 1 PRs and can be reviewed/merged in parallel:

  • PR: SnapshotableContainer protocol (go-explore/snapshotable-container)
  • PR: EnvironmentCheckpoint protocol (go-explore/checkpoint-protocol)

The key insight: at any checkpoint, a code agent's behavior is fully determined by its message history. Given the same messages and a container in the correct filesystem state, the agent can resume from any point without replaying commands.

Tests

5 new tests covering state capture, deep copy isolation, immutability, and restore+resume with mock LLM/container.


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Introduce CodeAgentState and CheckpointableCodeAgent to capture conversational context and enable MiniSWECodeAgent to serialize/deep-copy/restore its message/counter state for checkpointing in Go-Explore, complete with tests ensuring immutability and restore behavior. Fix default preset registration by deduplicating harbor dataset/code agent identifiers when populating the registry.

TopicDetails
Agent Checkpointing Enable checkpoint flow by adding CodeAgentState/CheckpointableCodeAgent, letting MiniSWECodeAgent capture, deep-copy, and resume its conversational state while exercising the new behavior via targeted agent tests.
Modified files (3)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
Latest Contributors(2)
UserCommitDate
joshua.greaves@gmail.comAdd-an-LLMResponse-mod...January 29, 2026
ryan@withmartian.comAllowed-all-harbor-dat...January 28, 2026
Preset Dedup Prevent duplicate default presets by tracking seen harbor dataset/code agent combinations before registering them in the preset registry.
Modified files (1)
  • src/ares/presets.py
Latest Contributors(2)
UserCommitDate
Narmeen07Add-mechanistic-interp...February 19, 2026
ryan@withmartian.comRemoved-repeated-indiv...January 29, 2026
This pull request is reviewed by Baz. Review like a pro on (Baz).

Summary by CodeRabbit

  • New Features

    • Agent checkpointing: immutable conversational snapshots and restore-and-resume support; includes assistant-role messages and resumed run-loop handling.
  • Bug Fixes

    • Prevented duplicate preset registrations during initialization.
  • Tests

    • Added unit tests for state capture, immutability, deep-copy behavior, and restore-and-resume workflows.
  • Chores

    • Updated benchmark config path and tightened agent configuration loading/validation.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

Added an immutable CodeAgentState dataclass and CheckpointableCodeAgent protocol; extended MiniSWECodeAgent with get_state and restore_and_resume, adjusted config loading and message handling; added unit tests for state/restore; and added duplicate-guarding to preset registration.

Changes

Cohort / File(s) Summary
Agent base
src/ares/code_agents/code_agent_base.py
Added @dataclasses.dataclass(frozen=True) CodeAgentState (messages, n_calls, total_cost) and CheckpointableCodeAgent protocol declaring get_state() and restore_and_resume(...).
Mini SWE agent
src/ares/code_agents/mini_swe_agent.py
Config path switched to benchmarks/swebench.yaml; construct AgentConfig by filtering AgentConfig.model_fields (ignore unknown keys); added get_state() returning CodeAgentState; added async restore_and_resume() to restore state and resume run loop; _add_message() now accepts "user" and "assistant" and asserts on invalid roles.
State tests
src/ares/code_agents/mini_swe_agent_state_test.py
New pytest-asyncio tests for get_state() snapshot/immutability/deep-copy and restore_and_resume() flow using mocked LLM and container.
Presets dedupe
src/ares/presets.py
Added seen set and composite preset_name to skip duplicate preset registrations in _register_default_presets.
Manifests / metadata
manifest_file, package.json
Minor metadata/manifest entries touched (no API changes).

Sequence Diagram(s)

sequenceDiagram
    participant Agent as MiniSWECodeAgent
    participant LLM as LLM
    participant Container as RunContainer

    Agent->>Agent: get_state() -> CodeAgentState
    Note over Agent: captures messages, n_calls, total_cost

    Agent->>Agent: restore_and_resume(state, task)
    Agent->>LLM: send resume prompt / step
    LLM-->>Agent: response
    Agent->>Container: execute/run step
    Container-->>Agent: result / termination status
    alt not terminated
        Agent->>LLM: continue next step
    else terminated
        Agent->>Agent: finalize and return
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

"I hopped in code with a careful twirl,
Saved chats and counts in a tiny pearl.
When storms return, I stitch the seam—
Restore, resume, and chase the dream.
🐇✨"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add agent state serialization for Go-Explore checkpointing' directly and clearly summarizes the main changes: introduction of CodeAgentState dataclass and CheckpointableCodeAgent protocol for state serialization to enable checkpointing in Go-Explore exploration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch go-explore/agent-state-serialization
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@mwilliammyers mwilliammyers marked this pull request as ready for review March 16, 2026 23:49
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/ares/code_agents/mini_swe_agent.py (1)

156-179: Consider extracting the shared loop logic to reduce duplication.

The while True loop in restore_and_resume() (lines 169-179) duplicates the loop in run() (lines 214-224). Both handle _NonTerminatingError and _TerminatingError identically.

A minor refactor could extract this into a shared _run_loop() method, though the current duplication is acceptable given it's only two occurrences.

♻️ Optional: Extract shared loop logic
+    async def _run_loop(self) -> None:
+        """Execute the step loop until termination."""
+        while True:
+            try:
+                with self.tracker.timeit("mswea/step"):
+                    await self.step()
+            except _NonTerminatingError as e:
+                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+            except _TerminatingError as e:
+                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+                return
+
     async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None:
         # ...
         self._messages = copy.deepcopy(state.messages)
         self._n_calls = state.n_calls
         self._total_cost = state.total_cost
-
-        while True:
-            try:
-                with self.tracker.timeit("mswea/step"):
-                    await self.step()
-            except _NonTerminatingError as e:
-                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-            except _TerminatingError as e:
-                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-                return
+        await self._run_loop()

Then also call await self._run_loop() in run() after the setup preamble.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The
while-True loop handling _NonTerminatingError and _TerminatingError is
duplicated in restore_and_resume and run; extract that shared behavior into a
new coroutine method (e.g., async def _run_loop(self):) which contains the with
self.tracker.timeit("mswea/step"): await self.step() call and the two except
handlers that call self._add_message and return on _TerminatingError, then
replace the loop in both restore_and_resume and run with a single await
self._run_loop() call so both paths reuse the same logic and maintain identical
error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while-True loop handling _NonTerminatingError and
_TerminatingError is duplicated in restore_and_resume and run; extract that
shared behavior into a new coroutine method (e.g., async def _run_loop(self):)
which contains the with self.tracker.timeit("mswea/step"): await self.step()
call and the two except handlers that call self._add_message and return on
_TerminatingError, then replace the loop in both restore_and_resume and run with
a single await self._run_loop() call so both paths reuse the same logic and
maintain identical error handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9451342-00b5-4c71-bb6f-27360f1b91cc

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

Comment on lines 120 to 122
def __post_init__(self):
config_path = pathlib.Path(minisweagent.config.builtin_config_dir) / "extra" / "swebench.yaml"
config_path = pathlib.Path(minisweagent.config.builtin_config_dir) / "benchmarks" / "swebench.yaml"
self._config = yaml.safe_load(config_path.read_text())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent raises FileNotFoundError at startup because MiniSWECodeAgent.post_init hard-codes <builtin_config_dir>/benchmarks/swebench.yaml while the package ships the config at config/extra/swebench.yaml (https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/config/extra/swebench.yaml), so config_path.read_text() fails. Can we load config/extra/swebench.yaml from the package or create benchmarks/swebench.yaml in the builtin config dir so the built-in config is found?

Finding type: Breaking Changes | Severity: 🔴 High


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

In src/ares/code_agents/mini_swe_agent.py around lines 120-122, the __post_init__ method
now hard-codes config_path to builtin_config_dir / "benchmarks" / "swebench.yaml", which
breaks systems that ship the config under config/extra/swebench.yaml. Change this to
first try builtin_config_dir / "benchmarks" / "swebench.yaml" but if that file does not
exist, fall back to builtin_config_dir / "extra" / "swebench.yaml" (or vice-versa to
preserve original behavior); read the first existing file and raise a clear
FileNotFoundError only if neither exists. Ensure the change keeps existing behavior for
environment config and logs which path was used.

Comment on lines +165 to +169
self._messages = copy.deepcopy(state.messages)
self._n_calls = state.n_calls
self._total_cost = state.total_cost

while True:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore_and_resume() copies state.messages into _messages and re-enters the step loop without executing the preamble run() appends, so a checkpoint taken before that preamble resumes with an empty conversation and the next query() lacks the instance/system task instructions. Can we ensure saved state always includes the preamble or re-run it before resuming?

Finding type: Logical Bugs | Severity: 🔴 High


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

In src/ares/code_agents/mini_swe_agent.py around lines 165-169, the async method
restore_and_resume currently deletes the task and blindly copies state.messages into
self._messages, which can resume without the run() preamble (the instance_template user
message). Change restore_and_resume to: do NOT delete the task parameter; after
restoring state fields, detect whether the preamble/instance_template message is missing
(e.g., check for a message based on self._agent_config['instance_template'] or for an
initial user message in self._messages). If it is missing, re-run the same setup
preamble used in run() — call the container to collect system, release, version,
machine, render the instance_template with the provided task and append it via
self._add_message — then continue into the existing step loop. Make the change only in
restore_and_resume (and optionally ensure get_state captures the preamble before
snapshotting) so resumed runs always have the required task/system context.

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/ares/code_agents/mini_swe_agent_state_test.py (1)

74-110: Consider strengthening the message count assertion.

The test correctly validates the restore and resume workflow. However, line 107's assertion len(agent._messages) >= 2 is weak. Given the mock returns one response that triggers termination, the expected message count should be deterministic (restored messages + assistant response + terminating error message).

💡 More precise assertion
     # State should have been restored before the loop ran
     assert agent._n_calls == 4  # 3 from state + 1 from the query in the loop
-    assert len(agent._messages) >= 2  # At least the restored messages
+    # 2 restored + 1 assistant response + 1 terminating error = 4 messages
+    assert len(agent._messages) == 4
     # The first two messages should be the restored ones
     assert agent._messages[0]["content"] == "saved message"
     assert agent._messages[1]["content"] == "saved response"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 74 - 110, The
len(agent._messages) assertion is too weak; update the
test_restore_and_resume_sets_state test to assert the deterministic expected
count after restore_and_resume: restored messages (2) + the assistant response
produced by MockLLMClient + the terminating error message from the container, so
expect exactly 4 messages; keep the other assertions and optionally assert the
contents of agent._messages[2] and agent._messages[3] to match the assistant
response and the terminating output; reference agent.restore_and_resume,
agent._messages, and agent._n_calls when making the change.
src/ares/presets.py (1)

128-147: Consider logging when duplicate presets are skipped.

The deduplication logic correctly prevents ValueError from registry.register_preset. However, silently skipping duplicates could make debugging harder if duplicates appear unexpectedly (e.g., due to list_harbor_datasets() returning duplicates or _make_harbor_dataset_id() collisions).

💡 Suggested improvement: Add debug logging for skipped duplicates
             preset_name = f"{ds_id}-{code_agent_id}"
             if preset_name in seen:
+                _LOGGER.debug("Skipping duplicate preset: %s", preset_name)
                 continue
             seen.add(preset_name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/presets.py` around lines 128 - 147, The loop deduplicates presets
(seen, code_env.list_harbor_datasets(), _make_harbor_dataset_id, preset_name)
but silently skips duplicates; add a debug log immediately before the continue
to record that a duplicate preset was skipped and include identifying info
(preset_name, ds_spec.name, ds_spec.version, code_agent_id, ds_id) so it's easy
to trace whether duplicates come from list_harbor_datasets() or id collisions;
use the module logger (logger.debug(...)) or obtain one via
logging.getLogger(__name__) if not present, keeping the rest of the
register_preset flow unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 130-136: The code builds an AgentConfig instance (agent_config)
from agent_config_dict but never uses it; either remove the dead construction or
replace the dict-based config with the typed instance. Fix by either deleting
the lines that create agent_config
(agent_config_dict/accepted_fields/filtered_config/agent_config) if not needed,
or assign the created object to the instance (self._agent_config =
default_agent.AgentConfig(**filtered_config) and remove the noqa) and update any
code that reads self._agent_config to expect the AgentConfig type; reference
symbols: agent_config_dict, filtered_config, default_agent.AgentConfig, and
self._agent_config.

---

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 74-110: The len(agent._messages) assertion is too weak; update the
test_restore_and_resume_sets_state test to assert the deterministic expected
count after restore_and_resume: restored messages (2) + the assistant response
produced by MockLLMClient + the terminating error message from the container, so
expect exactly 4 messages; keep the other assertions and optionally assert the
contents of agent._messages[2] and agent._messages[3] to match the assistant
response and the terminating output; reference agent.restore_and_resume,
agent._messages, and agent._n_calls when making the change.

In `@src/ares/presets.py`:
- Around line 128-147: The loop deduplicates presets (seen,
code_env.list_harbor_datasets(), _make_harbor_dataset_id, preset_name) but
silently skips duplicates; add a debug log immediately before the continue to
record that a duplicate preset was skipped and include identifying info
(preset_name, ds_spec.name, ds_spec.version, code_agent_id, ds_id) so it's easy
to trace whether duplicates come from list_harbor_datasets() or id collisions;
use the module logger (logger.debug(...)) or obtain one via
logging.getLogger(__name__) if not present, keeping the rest of the
register_preset flow unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9dbab72d-42d6-4dc2-af1a-565d79d70adb

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

Comment on lines +130 to +136
# We handle this by filtering to only fields that AgentConfig accepts.
agent_config_dict = self._config.get("agent", {})
agent_config = default_agent.AgentConfig()
for k, v in agent_config_dict.items():
if hasattr(default_agent.AgentConfig, k):
setattr(agent_config, k, v)
else:
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
accepted_fields = set(default_agent.AgentConfig.model_fields)
filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields}
for k in set(agent_config_dict) - set(filtered_config):
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
agent_config = default_agent.AgentConfig(**filtered_config) # noqa: F841
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused agent_config variable or use it.

The agent_config variable is created but never used. The # noqa: F841 comment suppresses the linter warning but doesn't address the underlying issue. If this variable was intended to replace the self._agent_config dict usage elsewhere, the migration appears incomplete. If it's not needed, consider removing the dead code.

🧹 Option 1: Remove if not needed
         accepted_fields = set(default_agent.AgentConfig.model_fields)
         filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields}
         for k in set(agent_config_dict) - set(filtered_config):
             _LOGGER.info("Ignoring argument %s in agent configuration.", k)
-        agent_config = default_agent.AgentConfig(**filtered_config)  # noqa: F841
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# We handle this by filtering to only fields that AgentConfig accepts.
agent_config_dict = self._config.get("agent", {})
agent_config = default_agent.AgentConfig()
for k, v in agent_config_dict.items():
if hasattr(default_agent.AgentConfig, k):
setattr(agent_config, k, v)
else:
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
accepted_fields = set(default_agent.AgentConfig.model_fields)
filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields}
for k in set(agent_config_dict) - set(filtered_config):
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
agent_config = default_agent.AgentConfig(**filtered_config) # noqa: F841
# We handle this by filtering to only fields that AgentConfig accepts.
agent_config_dict = self._config.get("agent", {})
accepted_fields = set(default_agent.AgentConfig.model_fields)
filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields}
for k in set(agent_config_dict) - set(filtered_config):
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 130 - 136, The code
builds an AgentConfig instance (agent_config) from agent_config_dict but never
uses it; either remove the dead construction or replace the dict-based config
with the typed instance. Fix by either deleting the lines that create
agent_config (agent_config_dict/accepted_fields/filtered_config/agent_config) if
not needed, or assign the created object to the instance (self._agent_config =
default_agent.AgentConfig(**filtered_config) and remove the noqa) and update any
code that reads self._agent_config to expect the AgentConfig type; reference
symbols: agent_config_dict, filtered_config, default_agent.AgentConfig, and
self._agent_config.

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/ares/code_agents/mini_swe_agent.py (1)

156-179: LGTM with optional refactor: consider extracting the step loop.

The restore logic correctly deep-copies the state to prevent mutations. The while True loop (lines 169-179) duplicates the pattern in run() (lines 214-224). Consider extracting to a shared helper like _run_step_loop() to reduce duplication.

♻️ Optional refactor to extract step loop
+    async def _run_step_loop(self) -> None:
+        """Run the step/execute loop until termination."""
+        while True:
+            try:
+                with self.tracker.timeit("mswea/step"):
+                    await self.step()
+            except _NonTerminatingError as e:
+                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+            except _TerminatingError as e:
+                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+                return
+
     async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None:
         ...
         self._messages = copy.deepcopy(state.messages)
         self._n_calls = state.n_calls
         self._total_cost = state.total_cost
-
-        while True:
-            try:
-                with self.tracker.timeit("mswea/step"):
-                    await self.step()
-            except _NonTerminatingError as e:
-                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-            except _TerminatingError as e:
-                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-                return
+        await self._run_step_loop()

Then update run() similarly to call await self._run_step_loop() after setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The while
True loop in restore_and_resume duplicates the step loop in run; extract that
repeated logic into a helper (e.g. _run_step_loop) that encapsulates the
try/except around await self.step(), the tracker.timeit("mswea/step") context,
and handling of _NonTerminatingError and _TerminatingError by calling
self._add_message and returning on termination; replace the loop in
restore_and_resume with await self._run_step_loop() and update run() to call the
same helper so both use the shared implementation.
src/ares/code_agents/code_agent_base.py (1)

21-31: Good design. Note: messages list remains mutable despite frozen dataclass.

The frozen dataclass prevents attribute reassignment but doesn't prevent state.messages.append(...) or state.messages[0]["content"] = "x". This is acceptable since get_state() implementations deep-copy before construction, but callers could still mutate the snapshot.

If stricter immutability is desired, consider using tuple[request.Message, ...] instead:

💡 Optional: use tuple for true immutability
 `@dataclasses.dataclass`(frozen=True)
 class CodeAgentState:
-    messages: list[request.Message]
+    messages: tuple[request.Message, ...]
     n_calls: int
     total_cost: float

Then in MiniSWECodeAgent.get_state():

-    messages=copy.deepcopy(self._messages),
+    messages=tuple(copy.deepcopy(self._messages)),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, The
CodeAgentState currently defines messages as a mutable list which can be mutated
despite the dataclass being frozen; change the field type from
list[request.Message] to tuple[request.Message, ...] in CodeAgentState and
update any state construction sites (e.g., MiniSWECodeAgent.get_state()) to
convert/copy the messages into an immutable tuple when creating the
CodeAgentState instance so the snapshot cannot be mutated after creation.
src/ares/code_agents/mini_swe_agent_state_test.py (1)

74-110: Good integration test. Consider tightening the message count assertion.

The test correctly validates restore+resume flow. The assertion len(agent._messages) >= 2 (line 107) is intentionally loose but could be more precise. After one loop iteration that terminates via _SubmittedError, the agent should have exactly 4 messages: 2 restored + 1 assistant response + 1 user message (the terminating error repr).

💡 Optional: more specific assertion
-    assert len(agent._messages) >= 2  # At least the restored messages
+    # 2 restored + 1 assistant (LLM response) + 1 user (_SubmittedError repr)
+    assert len(agent._messages) == 4
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 74 - 110, The
loose assertion len(agent._messages) >= 2 in test_restore_and_resume_sets_state
should be tightened: after calling restore_and_resume the agent should have
exactly 4 messages (2 restored + 1 assistant response + 1 user
submission/error), so replace the length check with assert len(agent._messages)
== 4 and add assertions that agent._messages[2]["role"] or content matches the
assistant response and agent._messages[3] matches the terminating
user/submission message (use restore_and_resume, agent._messages, and
agent._n_calls to locate and validate these values).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: The CodeAgentState currently defines messages as a mutable
list which can be mutated despite the dataclass being frozen; change the field
type from list[request.Message] to tuple[request.Message, ...] in CodeAgentState
and update any state construction sites (e.g., MiniSWECodeAgent.get_state()) to
convert/copy the messages into an immutable tuple when creating the
CodeAgentState instance so the snapshot cannot be mutated after creation.

In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 74-110: The loose assertion len(agent._messages) >= 2 in
test_restore_and_resume_sets_state should be tightened: after calling
restore_and_resume the agent should have exactly 4 messages (2 restored + 1
assistant response + 1 user submission/error), so replace the length check with
assert len(agent._messages) == 4 and add assertions that
agent._messages[2]["role"] or content matches the assistant response and
agent._messages[3] matches the terminating user/submission message (use
restore_and_resume, agent._messages, and agent._n_calls to locate and validate
these values).

In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while True loop in restore_and_resume duplicates the
step loop in run; extract that repeated logic into a helper (e.g.
_run_step_loop) that encapsulates the try/except around await self.step(), the
tracker.timeit("mswea/step") context, and handling of _NonTerminatingError and
_TerminatingError by calling self._add_message and returning on termination;
replace the loop in restore_and_resume with await self._run_step_loop() and
update run() to call the same helper so both use the shared implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07dad877-7fc7-48ad-8b12-2b6d5edc4da8

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
src/ares/code_agents/mini_swe_agent.py (1)

169-179: Extract duplicated loop into a helper method.

The while True loop here (lines 169-179) is identical to the loop in run() (lines 214-224). Consider extracting to a private _run_loop() method to reduce duplication.

♻️ Suggested refactor
+    async def _run_loop(self) -> None:
+        """Execute the step/query loop until termination."""
+        while True:
+            try:
+                with self.tracker.timeit("mswea/step"):
+                    await self.step()
+            except _NonTerminatingError as e:
+                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+            except _TerminatingError as e:
+                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+                return
+
     async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None:
         ...
         self._messages = copy.deepcopy(state.messages)
         self._n_calls = state.n_calls
         self._total_cost = state.total_cost
-
-        while True:
-            try:
-                with self.tracker.timeit("mswea/step"):
-                    await self.step()
-            except _NonTerminatingError as e:
-                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-            except _TerminatingError as e:
-                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-                return
+        await self._run_loop()

Apply the same change to run() after the setup preamble.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, Extract the
duplicated "while True" loop into a private helper method named _run_loop that
contains the exact loop body (with with self.tracker.timeit("mswea/step"), await
self.step(), the except handlers for _NonTerminatingError and _TerminatingError,
calls to self._add_message("user", repr(e)), and the return on terminating
error); then replace the two inline copies (the loop in the current method and
the one in run()) with a single call to self._run_loop() after any
setup/preamble in run(); keep all symbol usages unchanged (step, _add_message,
tracker.timeit, _NonTerminatingError, _TerminatingError) so behavior and logging
remain identical.
src/ares/code_agents/mini_swe_agent_state_test.py (2)

105-110: Consider more precise assertion for message count.

The >= 2 assertion is weak. After restore (2 messages) + query (1 assistant) + termination handling (1 user), the count should be exactly 4. A precise assertion would catch subtle bugs in the message flow.

🧪 Suggested fix
     # State should have been restored before the loop ran
     assert agent._n_calls == 4  # 3 from state + 1 from the query in the loop
-    assert len(agent._messages) >= 2  # At least the restored messages
+    assert len(agent._messages) == 4  # 2 restored + 1 assistant + 1 termination
     # The first two messages should be the restored ones
     assert agent._messages[0]["content"] == "saved message"
     assert agent._messages[1]["content"] == "saved response"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 105 - 110,
Replace the weak length check on agent._messages with a precise equality
reflecting the expected flow: after restoring two messages, running the query
(adds one assistant message) and handling termination (adds one user message)
the list should have exactly 4 entries; update the assertion in the test (in
mini_swe_agent_state_test.py where agent._messages is checked) to assert
len(agent._messages) == 4 and adjust the associated comment to reflect the exact
expected count, leaving the other assertions (content checks and agent._n_calls)
unchanged.

63-71: Incomplete immutability test.

This test verifies that field reassignment fails but doesn't cover mutation of the messages list contents. Since CodeAgentState uses a mutable list, state.messages.append(...) would succeed silently.

If the messages field is changed to tuple (as suggested in code_agent_base.py), this test would provide complete coverage.

🧪 Additional test case for list mutation
def test_code_agent_state_messages_mutation():
    """Test that mutating the messages list doesn't affect other state copies."""
    messages = [request.UserMessage(role="user", content="test")]
    state = code_agent_base.CodeAgentState(
        messages=messages,
        n_calls=0,
        total_cost=0.0,
    )
    # This mutation succeeds with list but would fail with tuple
    state.messages.append(request.UserMessage(role="user", content="mutated"))
    assert len(state.messages) == 2  # Demonstrates the mutability issue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 63 - 71, The
current test only checks attribute reassignment but not mutation of the messages
list; make the messages field truly immutable by changing
CodeAgentState.messages to use a tuple in code_agent_base.CodeAgentState and
update the test(s): either convert the existing test_code_agent_state_is_frozen
to construct messages as a tuple and assert that attempting to mutate (e.g.,
calling append) raises an exception, or add a new test (e.g.,
test_code_agent_state_messages_mutation) that constructs state with messages as
a tuple and verifies mutation is not possible; reference CodeAgentState and the
messages field when making the change.
src/ares/code_agents/code_agent_base.py (1)

21-31: Consider using tuple for messages to enforce true immutability.

The frozen=True dataclass prevents field reassignment but doesn't prevent mutation of the list contents. Code could still do state.messages.append(...) or state.messages.clear(), breaking the "immutable snapshot" contract.

Since get_state() implementations return deep copies, corruption is unlikely in practice, but using tuple[request.Message, ...] would provide stronger guarantees and make the immutability explicit.

♻️ Suggested change
 `@dataclasses.dataclass`(frozen=True)
 class CodeAgentState:
     """Serializable snapshot of a code agent's conversational state.

     Captures everything needed to resume an agent from where it left off,
     given that its container is in the correct filesystem state.
     """

-    messages: list[request.Message]
+    messages: tuple[request.Message, ...]
     n_calls: int
     total_cost: float

This would require updating get_state() in mini_swe_agent.py to return tuple(copy.deepcopy(self._messages)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, Change the
CodeAgentState.messages type from a mutable list to an immutable tuple by
updating the dataclass field to tuple[request.Message, ...] (CodeAgentState and
its messages field) and update any callers that construct states—specifically
modify get_state() in mini_swe_agent.py to return
tuple(copy.deepcopy(self._messages)) instead of a list so the snapshot is truly
immutable; ensure type hints and any places that iterate over messages still
work with a tuple.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: Change the CodeAgentState.messages type from a mutable list
to an immutable tuple by updating the dataclass field to tuple[request.Message,
...] (CodeAgentState and its messages field) and update any callers that
construct states—specifically modify get_state() in mini_swe_agent.py to return
tuple(copy.deepcopy(self._messages)) instead of a list so the snapshot is truly
immutable; ensure type hints and any places that iterate over messages still
work with a tuple.

In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 105-110: Replace the weak length check on agent._messages with a
precise equality reflecting the expected flow: after restoring two messages,
running the query (adds one assistant message) and handling termination (adds
one user message) the list should have exactly 4 entries; update the assertion
in the test (in mini_swe_agent_state_test.py where agent._messages is checked)
to assert len(agent._messages) == 4 and adjust the associated comment to reflect
the exact expected count, leaving the other assertions (content checks and
agent._n_calls) unchanged.
- Around line 63-71: The current test only checks attribute reassignment but not
mutation of the messages list; make the messages field truly immutable by
changing CodeAgentState.messages to use a tuple in
code_agent_base.CodeAgentState and update the test(s): either convert the
existing test_code_agent_state_is_frozen to construct messages as a tuple and
assert that attempting to mutate (e.g., calling append) raises an exception, or
add a new test (e.g., test_code_agent_state_messages_mutation) that constructs
state with messages as a tuple and verifies mutation is not possible; reference
CodeAgentState and the messages field when making the change.

In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: Extract the duplicated "while True" loop into a private
helper method named _run_loop that contains the exact loop body (with with
self.tracker.timeit("mswea/step"), await self.step(), the except handlers for
_NonTerminatingError and _TerminatingError, calls to self._add_message("user",
repr(e)), and the return on terminating error); then replace the two inline
copies (the loop in the current method and the one in run()) with a single call
to self._run_loop() after any setup/preamble in run(); keep all symbol usages
unchanged (step, _add_message, tracker.timeit, _NonTerminatingError,
_TerminatingError) so behavior and logging remain identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd54aef4-2fa2-4c46-be4b-15b6a5bae682

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/ares/code_agents/mini_swe_agent.py (1)

156-179: Consider extracting the shared step loop into a helper.

restore_and_resume() and run() currently duplicate the same loop/error-handling block; factoring it once reduces drift risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The duplicate
while-try-except loop in restore_and_resume and run should be extracted into a
shared helper to avoid drift: create a private method (e.g. _process_loop or
_run_step_loop) that contains the while True loop, the with
self.tracker.timeit("mswea/step") await self.step(), and the two except handlers
for _NonTerminatingError and _TerminatingError, then call that helper from both
restore_and_resume and run (ensuring restore_and_resume still sets
self._messages, self._n_calls, self._total_cost before invoking the helper).
Ensure the helper returns/propagates when a _TerminatingError occurs so both
callers behave identically.
src/ares/presets.py (1)

136-138: Consider logging when a duplicate preset is skipped.

The silent continue makes collisions harder to diagnose during startup; a debug log with preset_name would improve traceability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/presets.py` around lines 136 - 138, When a duplicate preset_name is
detected in the presets loading loop (the block using seen and preset_name), add
a debug-level log that records the skipped preset_name and context instead of
silently continuing; call the module or class logger (e.g., logger.debug or
self.logger.debug) with a message like "Skipping duplicate preset" and include
preset_name so collisions are traceable during startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The duplicate while-try-except loop in restore_and_resume
and run should be extracted into a shared helper to avoid drift: create a
private method (e.g. _process_loop or _run_step_loop) that contains the while
True loop, the with self.tracker.timeit("mswea/step") await self.step(), and the
two except handlers for _NonTerminatingError and _TerminatingError, then call
that helper from both restore_and_resume and run (ensuring restore_and_resume
still sets self._messages, self._n_calls, self._total_cost before invoking the
helper). Ensure the helper returns/propagates when a _TerminatingError occurs so
both callers behave identically.

In `@src/ares/presets.py`:
- Around line 136-138: When a duplicate preset_name is detected in the presets
loading loop (the block using seen and preset_name), add a debug-level log that
records the skipped preset_name and context instead of silently continuing; call
the module or class logger (e.g., logger.debug or self.logger.debug) with a
message like "Skipping duplicate preset" and include preset_name so collisions
are traceable during startup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1cf00ca7-0f3f-4419-951f-6bb2ce8bf345

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitaidev full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

@gowthamkishore3 Triggering a full review of the PR now!

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: CodeAgentState currently stores a mutable list in the
messages field so callers can mutate the "snapshot"; change messages to an
immutable sequence (e.g., use tuple[request.Message] or
typing.Tuple[request.Message, ...]) and ensure CodeAgentState is constructed
with an immutable copy (tuple(messages)) so nested mutation is prevented; update
any code that instantiates or consumes CodeAgentState (e.g., MiniSWECodeAgent
state capture/restore) to convert between list and tuple accordingly and keep
the frozen dataclass declaration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34552566-0e74-4684-839c-77f0e5c4e48f

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

Comment on lines +21 to +31
@dataclasses.dataclass(frozen=True)
class CodeAgentState:
"""Serializable snapshot of a code agent's conversational state.

Captures everything needed to resume an agent from where it left off,
given that its container is in the correct filesystem state.
"""

messages: list[request.Message]
n_calls: int
total_cost: float
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from dataclasses import dataclass

`@dataclass`(frozen=True)
class S:
    messages: list[dict]

s = S(messages=[{"role": "user", "content": "a"}])
s.messages.append({"role": "assistant", "content": "b"})
print("mutable_len=", len(s.messages))  # demonstrates nested mutability
PY

Repository: withmartian/ares

Length of output: 74


🏁 Script executed:

# Verify file exists and check the exact code
cat -n src/ares/code_agents/code_agent_base.py | head -40

Repository: withmartian/ares

Length of output: 1398


🏁 Script executed:

# Search for all usages of CodeAgentState to understand mutation patterns
rg "CodeAgentState" --type py -A 2 -B 2

Repository: withmartian/ares

Length of output: 3558


🏁 Script executed:

# Check if messages field is mutated anywhere
rg "\.messages\s*=" --type py | head -20

Repository: withmartian/ares

Length of output: 141


🏁 Script executed:

# Check imports to understand request.Message
rg "import.*request\|from.*request" src/ares/code_agents/code_agent_base.py

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Check the immutability test to understand what's being tested
cat -n src/ares/code_agents/mini_swe_agent_state_test.py

Repository: withmartian/ares

Length of output: 4411


🏁 Script executed:

# Check the restore_and_resume implementation to see how messages are used
rg "restore_and_resume" -A 15 src/ares/code_agents/mini_swe_agent.py

Repository: withmartian/ares

Length of output: 712


CodeAgentState is not deeply immutable—callers can mutate state.messages after creation.

The frozen=True decorator only prevents attribute rebinding; it does not freeze nested mutable objects. Line 29 stores a mutable list, so code can call state.messages.append(...) or modify the list in place, violating the "snapshot" contract described in the docstring.

While implementations like mini_swe_agent.MiniSWECodeAgent defensively use copy.deepcopy() when capturing and restoring state, this relies on discipline and doesn't enforce immutability at the type level. The test suite only verifies attribute immutability (rebinding), not nested collection mutability.

Suggested fix
 `@dataclasses.dataclass`(frozen=True)
 class CodeAgentState:
@@
-    messages: list[request.Message]
+    messages: tuple[request.Message, ...]
     n_calls: int
     total_cost: float
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@dataclasses.dataclass(frozen=True)
class CodeAgentState:
"""Serializable snapshot of a code agent's conversational state.
Captures everything needed to resume an agent from where it left off,
given that its container is in the correct filesystem state.
"""
messages: list[request.Message]
n_calls: int
total_cost: float
`@dataclasses.dataclass`(frozen=True)
class CodeAgentState:
"""Serializable snapshot of a code agent's conversational state.
Captures everything needed to resume an agent from where it left off,
given that its container is in the correct filesystem state.
"""
messages: tuple[request.Message, ...]
n_calls: int
total_cost: float
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, CodeAgentState
currently stores a mutable list in the messages field so callers can mutate the
"snapshot"; change messages to an immutable sequence (e.g., use
tuple[request.Message] or typing.Tuple[request.Message, ...]) and ensure
CodeAgentState is constructed with an immutable copy (tuple(messages)) so nested
mutation is prevented; update any code that instantiates or consumes
CodeAgentState (e.g., MiniSWECodeAgent state capture/restore) to convert between
list and tuple accordingly and keep the frozen dataclass declaration.

@gowthamkishore3799
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/ares/code_agents/code_agent_base.py (1)

21-31: ⚠️ Potential issue | 🟠 Major

CodeAgentState is still shallowly mutable via messages.

At Line 29, messages is a mutable list, so callers can mutate snapshot contents after creation (append, item edits), which breaks the immutability expectation of a checkpoint state.

Proposed fix
 `@dataclasses.dataclass`(frozen=True)
 class CodeAgentState:
@@
-    messages: list[request.Message]
+    messages: tuple[request.Message, ...]
     n_calls: int
     total_cost: float
#!/bin/bash
# Verify current snapshot field mutability surface and constructors.
rg -n "class CodeAgentState|messages:\s*list\[request\.Message\]|messages:\s*tuple\[request\.Message" src/ares/code_agents/code_agent_base.py
rg -n "CodeAgentState\(" --type py src/ares/code_agents

Expected result: messages is currently declared as list[...], confirming shallow mutability risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31,
CodeAgentState's messages field is declared as a mutable list which breaks the
frozen dataclass guarantee; change the type of messages to an immutable
tuple[request.Message] in CodeAgentState and update all places that construct a
CodeAgentState (search for CodeAgentState(...) and the constructor/site where
messages is passed) to wrap/convert incoming iterables/lists into
tuple(messages) before constructing the dataclass so the snapshot is truly
immutable; also update any type hints/imports and tests that inspect messages
accordingly.
🧹 Nitpick comments (1)
src/ares/code_agents/mini_swe_agent.py (1)

169-179: Consider extracting the shared step loop to one helper.

Line 169 through Line 179 duplicates the error-handling loop in run(). A shared helper would reduce drift risk between normal execution and resumed execution paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, The
while-try-except step loop that awaits self.step() and handles
_NonTerminatingError/_TerminatingError (logging via _LOGGER.debug, adding
messages with self._add_message, and returning on terminating errors) is
duplicated; extract it into a single helper method (e.g., _run_step_loop or
_execute_steps) that encapsulates the with self.tracker.timeit("mswea/step"):
await self.step() behavior and the two except branches, then call that helper
from run() and any resume path so both use the same error handling and logging
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: CodeAgentState's messages field is declared as a mutable
list which breaks the frozen dataclass guarantee; change the type of messages to
an immutable tuple[request.Message] in CodeAgentState and update all places that
construct a CodeAgentState (search for CodeAgentState(...) and the
constructor/site where messages is passed) to wrap/convert incoming
iterables/lists into tuple(messages) before constructing the dataclass so the
snapshot is truly immutable; also update any type hints/imports and tests that
inspect messages accordingly.

---

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: The while-try-except step loop that awaits self.step()
and handles _NonTerminatingError/_TerminatingError (logging via _LOGGER.debug,
adding messages with self._add_message, and returning on terminating errors) is
duplicated; extract it into a single helper method (e.g., _run_step_loop or
_execute_steps) that encapsulates the with self.tracker.timeit("mswea/step"):
await self.step() behavior and the two except branches, then call that helper
from run() and any resume path so both use the same error handling and logging
logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8b7b13c-6710-4b16-a49b-fab31d4d7a82

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

@gowthamkishore3799
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/ares/code_agents/code_agent_base.py (1)

29-31: ⚠️ Potential issue | 🟠 Major

CodeAgentState.messages is mutable despite frozen=True.

On Line 29, messages is a list, so callers can still mutate snapshot contents in place (e.g., append), which breaks the immutable-checkpoint contract.

Proposed fix
 `@dataclasses.dataclass`(frozen=True)
 class CodeAgentState:
@@
-    messages: list[request.Message]
+    messages: tuple[request.Message, ...]
     n_calls: int
     total_cost: float
#!/bin/bash
python - <<'PY'
from dataclasses import dataclass
`@dataclass`(frozen=True)
class S:
    messages: list[int]
s = S([1])
s.messages.append(2)
print("Nested mutation allowed:", s.messages == [1, 2])
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/code_agent_base.py` around lines 29 - 31, CodeAgentState
currently declares messages: list[request.Message] while the dataclass is
frozen, allowing in-place mutations that violate immutability; change the field
to an immutable sequence (e.g., tuple) by updating the annotation to
tuple[request.Message, ...] (or collections.abc.Sequence[request.Message]) and
ensure any construction path (factory, constructor, or __post_init__ for the
frozen dataclass using object.__setattr__) converts incoming iterables/lists to
a tuple; also update any code that consumes CodeAgentState.messages to treat it
as an immutable sequence rather than a list.
🧹 Nitpick comments (2)
src/ares/presets.py (1)

136-138: Consider logging skipped duplicate presets.

The dedupe is correct, but the silent continue makes collisions hard to diagnose in production runs.

Proposed tweak
             if preset_name in seen:
+                _LOGGER.warning("Skipping duplicate default preset '%s'.", preset_name)
                 continue
             seen.add(preset_name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/presets.py` around lines 136 - 138, The loop that deduplicates
presets currently silently continues when a duplicate is found (checking seen
and preset_name); add a log statement before the continue to record that a
duplicate preset was skipped, including the preset_name and any relevant context
(e.g., source or index) so collisions are discoverable in production; use the
module/class logger (e.g., logger.warning or self.logger.debug depending on
severity) and keep the message concise and structured for easy searching.
src/ares/code_agents/mini_swe_agent.py (1)

169-179: Resume loop duplicates run() loop logic.

This duplicates step/error control flow already present in run() (Lines 214-224), which can drift over time. Consider extracting a shared internal loop helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, The
while-try/except body in mini_swe_agent (the loop around
self.tracker.timeit("mswea/step"), await self.step(), and handlers for
_NonTerminatingError and _TerminatingError that call self._add_message)
duplicates the same control flow in run(); extract that logic into a single
helper (e.g., a private method like _execute_step_loop or _run_step_once) and
have both run() and the other loop call it so the try/except and message
handling for _NonTerminatingError/_TerminatingError and the tracker.timeit
wrapper live in one place (move the await self.step() call into the helper and
return/propagate termination so run() can stop appropriately).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 29-31: CodeAgentState currently declares messages:
list[request.Message] while the dataclass is frozen, allowing in-place mutations
that violate immutability; change the field to an immutable sequence (e.g.,
tuple) by updating the annotation to tuple[request.Message, ...] (or
collections.abc.Sequence[request.Message]) and ensure any construction path
(factory, constructor, or __post_init__ for the frozen dataclass using
object.__setattr__) converts incoming iterables/lists to a tuple; also update
any code that consumes CodeAgentState.messages to treat it as an immutable
sequence rather than a list.

---

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: The while-try/except body in mini_swe_agent (the loop
around self.tracker.timeit("mswea/step"), await self.step(), and handlers for
_NonTerminatingError and _TerminatingError that call self._add_message)
duplicates the same control flow in run(); extract that logic into a single
helper (e.g., a private method like _execute_step_loop or _run_step_once) and
have both run() and the other loop call it so the try/except and message
handling for _NonTerminatingError/_TerminatingError and the tracker.timeit
wrapper live in one place (move the await self.step() call into the helper and
return/propagate termination so run() can stop appropriately).

In `@src/ares/presets.py`:
- Around line 136-138: The loop that deduplicates presets currently silently
continues when a duplicate is found (checking seen and preset_name); add a log
statement before the continue to record that a duplicate preset was skipped,
including the preset_name and any relevant context (e.g., source or index) so
collisions are discoverable in production; use the module/class logger (e.g.,
logger.warning or self.logger.debug depending on severity) and keep the message
concise and structured for easy searching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7028a305-0b43-4ec8-920c-7ed1c842dec2

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

@gowthamkishore3799
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/ares/presets.py (1)

128-147: LGTM! Duplicate guard correctly prevents ValueError from registry.

The seen set prevents re-registration of presets that map to the same name (e.g., when _make_harbor_dataset_id collapses "swebench-verified" to "sbv"). This avoids the ValueError that registry.register_preset raises for duplicates.

Consider adding a debug log when skipping duplicates for easier troubleshooting:

💡 Optional: Add debug logging for skipped duplicates
             if preset_name in seen:
+                _LOGGER.debug("Skipping duplicate preset registration: %s", preset_name)
                 continue

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/presets.py` around lines 128 - 147, Add a debug log when a duplicate
preset_name is skipped so it's easier to trace why a dataset wasn't registered:
inside the loop that builds preset_name (which uses _make_harbor_dataset_id and
iterates code_env.list_harbor_datasets()), when you hit the "if preset_name in
seen: continue" branch, replace or augment that branch to call the module logger
(or processLogger) to emit a debug message including preset_name, ds_spec.name,
ds_spec.version and code_agent_id before continuing; keep the seen set and
registry.register_preset logic unchanged.
src/ares/code_agents/mini_swe_agent.py (1)

156-179: Consider extracting shared loop logic to reduce duplication.

The while True loop in restore_and_resume() (lines 169-179) is nearly identical to the one in run() (lines 214-224). Extracting this into a private method would reduce duplication and ensure consistent behavior.

💡 Optional refactor to extract shared loop
+    async def _run_step_loop(self) -> None:
+        """Execute the step loop until termination."""
+        while True:
+            try:
+                with self.tracker.timeit("mswea/step"):
+                    await self.step()
+            except _NonTerminatingError as e:
+                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+            except _TerminatingError as e:
+                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
+                self._add_message("user", repr(e))
+                return

     async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None:
         ...
         self._messages = copy.deepcopy(state.messages)
         self._n_calls = state.n_calls
         self._total_cost = state.total_cost
-
-        while True:
-            try:
-                with self.tracker.timeit("mswea/step"):
-                    await self.step()
-            except _NonTerminatingError as e:
-                _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-            except _TerminatingError as e:
-                _LOGGER.debug("[%d] Terminating error: %s", id(self), e)
-                self._add_message("user", repr(e))
-                return
+        await self._run_step_loop()

Then update run() similarly to call await self._run_step_loop().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The
while-True loop in restore_and_resume duplicates the same step/exception
handling logic used in run; extract that loop into a private helper (e.g., async
def _run_step_loop(self):) that contains the with
self.tracker.timeit("mswea/step"): await self.step() call and the two except
blocks for _NonTerminatingError and _TerminatingError (including the same
_LOGGER.debug and self._add_message calls and return on terminating); then
replace the loop in both restore_and_resume and run with a single await
self._run_step_loop() call so behavior is centralized and consistent.
src/ares/code_agents/mini_swe_agent_state_test.py (1)

63-71: Test only verifies attribute immutability, not nested collection immutability.

This test confirms that state.n_calls = 5 raises AttributeError, but doesn't verify that state.messages.append(...) would still succeed (due to the mutable list). Consider adding a test to document this known limitation or to verify that get_state() returns an independent copy that isn't affected by such mutations.

The current behavior is acceptable since get_state() uses copy.deepcopy(), but explicit test coverage would guard against future regressions.

💡 Optional: Add test for nested mutation isolation
def test_get_state_messages_mutation_does_not_affect_original():
    """Test that mutating state.messages doesn't affect original agent."""
    agent = _make_agent()
    agent._messages = [request.UserMessage(role="user", content="original")]
    
    state = agent.get_state()
    
    # Mutate the state's messages (this is allowed due to list mutability)
    state.messages.append(request.UserMessage(role="user", content="mutated"))
    
    # Agent should not be affected
    assert len(agent._messages) == 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 63 - 71, The
existing test test_code_agent_state_is_frozen only checks attribute immutability
on CodeAgentState but not nested collection isolation; add a test (e.g.,
test_get_state_messages_mutation_does_not_affect_original) that creates an agent
via _make_agent, sets agent._messages to a list with one request.UserMessage,
calls agent.get_state(), mutates the returned state's messages
(state.messages.append(...)) and then asserts the original agent._messages
length/content is unchanged; reference CodeAgentState, get_state, _make_agent
and agent._messages to locate where to add the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 63-71: The existing test test_code_agent_state_is_frozen only
checks attribute immutability on CodeAgentState but not nested collection
isolation; add a test (e.g.,
test_get_state_messages_mutation_does_not_affect_original) that creates an agent
via _make_agent, sets agent._messages to a list with one request.UserMessage,
calls agent.get_state(), mutates the returned state's messages
(state.messages.append(...)) and then asserts the original agent._messages
length/content is unchanged; reference CodeAgentState, get_state, _make_agent
and agent._messages to locate where to add the test.

In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while-True loop in restore_and_resume duplicates the
same step/exception handling logic used in run; extract that loop into a private
helper (e.g., async def _run_step_loop(self):) that contains the with
self.tracker.timeit("mswea/step"): await self.step() call and the two except
blocks for _NonTerminatingError and _TerminatingError (including the same
_LOGGER.debug and self._add_message calls and return on terminating); then
replace the loop in both restore_and_resume and run with a single await
self._run_step_loop() call so behavior is centralized and consistent.

In `@src/ares/presets.py`:
- Around line 128-147: Add a debug log when a duplicate preset_name is skipped
so it's easier to trace why a dataset wasn't registered: inside the loop that
builds preset_name (which uses _make_harbor_dataset_id and iterates
code_env.list_harbor_datasets()), when you hit the "if preset_name in seen:
continue" branch, replace or augment that branch to call the module logger (or
processLogger) to emit a debug message including preset_name, ds_spec.name,
ds_spec.version and code_agent_id before continuing; keep the seen set and
registry.register_preset logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 00683954-09d8-4be6-8a1a-f998768c06a8

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 1358d0b.

📒 Files selected for processing (4)
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/mini_swe_agent_state_test.py
  • src/ares/presets.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants