Add agent state serialization for Go-Explore checkpointing#95
Add agent state serialization for Go-Explore checkpointing#95joshgreaves wants to merge 1 commit intomainfrom
Conversation
📝 WalkthroughWalkthroughAdded an immutable Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as MiniSWECodeAgent
participant LLM as LLM
participant Container as RunContainer
Agent->>Agent: get_state() -> CodeAgentState
Note over Agent: captures messages, n_calls, total_cost
Agent->>Agent: restore_and_resume(state, task)
Agent->>LLM: send resume prompt / step
LLM-->>Agent: response
Agent->>Container: execute/run step
Container-->>Agent: result / termination status
alt not terminated
Agent->>LLM: continue next step
else terminated
Agent->>Agent: finalize and return
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/ares/code_agents/mini_swe_agent.py (1)
156-179: Consider extracting the shared loop logic to reduce duplication.The
while Trueloop inrestore_and_resume()(lines 169-179) duplicates the loop inrun()(lines 214-224). Both handle_NonTerminatingErrorand_TerminatingErroridentically.A minor refactor could extract this into a shared
_run_loop()method, though the current duplication is acceptable given it's only two occurrences.♻️ Optional: Extract shared loop logic
+ async def _run_loop(self) -> None: + """Execute the step loop until termination.""" + while True: + try: + with self.tracker.timeit("mswea/step"): + await self.step() + except _NonTerminatingError as e: + _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + except _TerminatingError as e: + _LOGGER.debug("[%d] Terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + return + async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None: # ... self._messages = copy.deepcopy(state.messages) self._n_calls = state.n_calls self._total_cost = state.total_cost - - while True: - try: - with self.tracker.timeit("mswea/step"): - await self.step() - except _NonTerminatingError as e: - _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - except _TerminatingError as e: - _LOGGER.debug("[%d] Terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - return + await self._run_loop()Then also call
await self._run_loop()inrun()after the setup preamble.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The while-True loop handling _NonTerminatingError and _TerminatingError is duplicated in restore_and_resume and run; extract that shared behavior into a new coroutine method (e.g., async def _run_loop(self):) which contains the with self.tracker.timeit("mswea/step"): await self.step() call and the two except handlers that call self._add_message and return on _TerminatingError, then replace the loop in both restore_and_resume and run with a single await self._run_loop() call so both paths reuse the same logic and maintain identical error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while-True loop handling _NonTerminatingError and
_TerminatingError is duplicated in restore_and_resume and run; extract that
shared behavior into a new coroutine method (e.g., async def _run_loop(self):)
which contains the with self.tracker.timeit("mswea/step"): await self.step()
call and the two except handlers that call self._add_message and return on
_TerminatingError, then replace the loop in both restore_and_resume and run with
a single await self._run_loop() call so both paths reuse the same logic and
maintain identical error handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9451342-00b5-4c71-bb6f-27360f1b91cc
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
| def __post_init__(self): | ||
| config_path = pathlib.Path(minisweagent.config.builtin_config_dir) / "extra" / "swebench.yaml" | ||
| config_path = pathlib.Path(minisweagent.config.builtin_config_dir) / "benchmarks" / "swebench.yaml" | ||
| self._config = yaml.safe_load(config_path.read_text()) |
There was a problem hiding this comment.
Agent raises FileNotFoundError at startup because MiniSWECodeAgent.post_init hard-codes <builtin_config_dir>/benchmarks/swebench.yaml while the package ships the config at config/extra/swebench.yaml (https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/config/extra/swebench.yaml), so config_path.read_text() fails. Can we load config/extra/swebench.yaml from the package or create benchmarks/swebench.yaml in the builtin config dir so the built-in config is found?
Finding type: Breaking Changes | Severity: 🔴 High
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents:
In src/ares/code_agents/mini_swe_agent.py around lines 120-122, the __post_init__ method
now hard-codes config_path to builtin_config_dir / "benchmarks" / "swebench.yaml", which
breaks systems that ship the config under config/extra/swebench.yaml. Change this to
first try builtin_config_dir / "benchmarks" / "swebench.yaml" but if that file does not
exist, fall back to builtin_config_dir / "extra" / "swebench.yaml" (or vice-versa to
preserve original behavior); read the first existing file and raise a clear
FileNotFoundError only if neither exists. Ensure the change keeps existing behavior for
environment config and logs which path was used.
| self._messages = copy.deepcopy(state.messages) | ||
| self._n_calls = state.n_calls | ||
| self._total_cost = state.total_cost | ||
|
|
||
| while True: |
There was a problem hiding this comment.
restore_and_resume() copies state.messages into _messages and re-enters the step loop without executing the preamble run() appends, so a checkpoint taken before that preamble resumes with an empty conversation and the next query() lacks the instance/system task instructions. Can we ensure saved state always includes the preamble or re-run it before resuming?
Finding type: Logical Bugs | Severity: 🔴 High
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents:
In src/ares/code_agents/mini_swe_agent.py around lines 165-169, the async method
restore_and_resume currently deletes the task and blindly copies state.messages into
self._messages, which can resume without the run() preamble (the instance_template user
message). Change restore_and_resume to: do NOT delete the task parameter; after
restoring state fields, detect whether the preamble/instance_template message is missing
(e.g., check for a message based on self._agent_config['instance_template'] or for an
initial user message in self._messages). If it is missing, re-run the same setup
preamble used in run() — call the container to collect system, release, version,
machine, render the instance_template with the provided task and append it via
self._add_message — then continue into the existing step loop. Make the change only in
restore_and_resume (and optionally ensure get_state captures the preamble before
snapshotting) so resumed runs always have the required task/system context.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/ares/code_agents/mini_swe_agent_state_test.py (1)
74-110: Consider strengthening the message count assertion.The test correctly validates the restore and resume workflow. However, line 107's assertion
len(agent._messages) >= 2is weak. Given the mock returns one response that triggers termination, the expected message count should be deterministic (restored messages + assistant response + terminating error message).💡 More precise assertion
# State should have been restored before the loop ran assert agent._n_calls == 4 # 3 from state + 1 from the query in the loop - assert len(agent._messages) >= 2 # At least the restored messages + # 2 restored + 1 assistant response + 1 terminating error = 4 messages + assert len(agent._messages) == 4 # The first two messages should be the restored ones assert agent._messages[0]["content"] == "saved message" assert agent._messages[1]["content"] == "saved response"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 74 - 110, The len(agent._messages) assertion is too weak; update the test_restore_and_resume_sets_state test to assert the deterministic expected count after restore_and_resume: restored messages (2) + the assistant response produced by MockLLMClient + the terminating error message from the container, so expect exactly 4 messages; keep the other assertions and optionally assert the contents of agent._messages[2] and agent._messages[3] to match the assistant response and the terminating output; reference agent.restore_and_resume, agent._messages, and agent._n_calls when making the change.src/ares/presets.py (1)
128-147: Consider logging when duplicate presets are skipped.The deduplication logic correctly prevents
ValueErrorfromregistry.register_preset. However, silently skipping duplicates could make debugging harder if duplicates appear unexpectedly (e.g., due tolist_harbor_datasets()returning duplicates or_make_harbor_dataset_id()collisions).💡 Suggested improvement: Add debug logging for skipped duplicates
preset_name = f"{ds_id}-{code_agent_id}" if preset_name in seen: + _LOGGER.debug("Skipping duplicate preset: %s", preset_name) continue seen.add(preset_name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/presets.py` around lines 128 - 147, The loop deduplicates presets (seen, code_env.list_harbor_datasets(), _make_harbor_dataset_id, preset_name) but silently skips duplicates; add a debug log immediately before the continue to record that a duplicate preset was skipped and include identifying info (preset_name, ds_spec.name, ds_spec.version, code_agent_id, ds_id) so it's easy to trace whether duplicates come from list_harbor_datasets() or id collisions; use the module logger (logger.debug(...)) or obtain one via logging.getLogger(__name__) if not present, keeping the rest of the register_preset flow unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 130-136: The code builds an AgentConfig instance (agent_config)
from agent_config_dict but never uses it; either remove the dead construction or
replace the dict-based config with the typed instance. Fix by either deleting
the lines that create agent_config
(agent_config_dict/accepted_fields/filtered_config/agent_config) if not needed,
or assign the created object to the instance (self._agent_config =
default_agent.AgentConfig(**filtered_config) and remove the noqa) and update any
code that reads self._agent_config to expect the AgentConfig type; reference
symbols: agent_config_dict, filtered_config, default_agent.AgentConfig, and
self._agent_config.
---
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 74-110: The len(agent._messages) assertion is too weak; update the
test_restore_and_resume_sets_state test to assert the deterministic expected
count after restore_and_resume: restored messages (2) + the assistant response
produced by MockLLMClient + the terminating error message from the container, so
expect exactly 4 messages; keep the other assertions and optionally assert the
contents of agent._messages[2] and agent._messages[3] to match the assistant
response and the terminating output; reference agent.restore_and_resume,
agent._messages, and agent._n_calls when making the change.
In `@src/ares/presets.py`:
- Around line 128-147: The loop deduplicates presets (seen,
code_env.list_harbor_datasets(), _make_harbor_dataset_id, preset_name) but
silently skips duplicates; add a debug log immediately before the continue to
record that a duplicate preset was skipped and include identifying info
(preset_name, ds_spec.name, ds_spec.version, code_agent_id, ds_id) so it's easy
to trace whether duplicates come from list_harbor_datasets() or id collisions;
use the module logger (logger.debug(...)) or obtain one via
logging.getLogger(__name__) if not present, keeping the rest of the
register_preset flow unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9dbab72d-42d6-4dc2-af1a-565d79d70adb
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
| # We handle this by filtering to only fields that AgentConfig accepts. | ||
| agent_config_dict = self._config.get("agent", {}) | ||
| agent_config = default_agent.AgentConfig() | ||
| for k, v in agent_config_dict.items(): | ||
| if hasattr(default_agent.AgentConfig, k): | ||
| setattr(agent_config, k, v) | ||
| else: | ||
| _LOGGER.info("Ignoring argument %s in agent configuration.", k) | ||
| accepted_fields = set(default_agent.AgentConfig.model_fields) | ||
| filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields} | ||
| for k in set(agent_config_dict) - set(filtered_config): | ||
| _LOGGER.info("Ignoring argument %s in agent configuration.", k) | ||
| agent_config = default_agent.AgentConfig(**filtered_config) # noqa: F841 |
There was a problem hiding this comment.
Remove unused agent_config variable or use it.
The agent_config variable is created but never used. The # noqa: F841 comment suppresses the linter warning but doesn't address the underlying issue. If this variable was intended to replace the self._agent_config dict usage elsewhere, the migration appears incomplete. If it's not needed, consider removing the dead code.
🧹 Option 1: Remove if not needed
accepted_fields = set(default_agent.AgentConfig.model_fields)
filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields}
for k in set(agent_config_dict) - set(filtered_config):
_LOGGER.info("Ignoring argument %s in agent configuration.", k)
- agent_config = default_agent.AgentConfig(**filtered_config) # noqa: F841📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # We handle this by filtering to only fields that AgentConfig accepts. | |
| agent_config_dict = self._config.get("agent", {}) | |
| agent_config = default_agent.AgentConfig() | |
| for k, v in agent_config_dict.items(): | |
| if hasattr(default_agent.AgentConfig, k): | |
| setattr(agent_config, k, v) | |
| else: | |
| _LOGGER.info("Ignoring argument %s in agent configuration.", k) | |
| accepted_fields = set(default_agent.AgentConfig.model_fields) | |
| filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields} | |
| for k in set(agent_config_dict) - set(filtered_config): | |
| _LOGGER.info("Ignoring argument %s in agent configuration.", k) | |
| agent_config = default_agent.AgentConfig(**filtered_config) # noqa: F841 | |
| # We handle this by filtering to only fields that AgentConfig accepts. | |
| agent_config_dict = self._config.get("agent", {}) | |
| accepted_fields = set(default_agent.AgentConfig.model_fields) | |
| filtered_config = {k: v for k, v in agent_config_dict.items() if k in accepted_fields} | |
| for k in set(agent_config_dict) - set(filtered_config): | |
| _LOGGER.info("Ignoring argument %s in agent configuration.", k) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/code_agents/mini_swe_agent.py` around lines 130 - 136, The code
builds an AgentConfig instance (agent_config) from agent_config_dict but never
uses it; either remove the dead construction or replace the dict-based config
with the typed instance. Fix by either deleting the lines that create
agent_config (agent_config_dict/accepted_fields/filtered_config/agent_config) if
not needed, or assign the created object to the instance (self._agent_config =
default_agent.AgentConfig(**filtered_config) and remove the noqa) and update any
code that reads self._agent_config to expect the AgentConfig type; reference
symbols: agent_config_dict, filtered_config, default_agent.AgentConfig, and
self._agent_config.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/ares/code_agents/mini_swe_agent.py (1)
156-179: LGTM with optional refactor: consider extracting the step loop.The restore logic correctly deep-copies the state to prevent mutations. The
while Trueloop (lines 169-179) duplicates the pattern inrun()(lines 214-224). Consider extracting to a shared helper like_run_step_loop()to reduce duplication.♻️ Optional refactor to extract step loop
+ async def _run_step_loop(self) -> None: + """Run the step/execute loop until termination.""" + while True: + try: + with self.tracker.timeit("mswea/step"): + await self.step() + except _NonTerminatingError as e: + _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + except _TerminatingError as e: + _LOGGER.debug("[%d] Terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + return + async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None: ... self._messages = copy.deepcopy(state.messages) self._n_calls = state.n_calls self._total_cost = state.total_cost - - while True: - try: - with self.tracker.timeit("mswea/step"): - await self.step() - except _NonTerminatingError as e: - _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - except _TerminatingError as e: - _LOGGER.debug("[%d] Terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - return + await self._run_step_loop()Then update
run()similarly to callawait self._run_step_loop()after setup.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The while True loop in restore_and_resume duplicates the step loop in run; extract that repeated logic into a helper (e.g. _run_step_loop) that encapsulates the try/except around await self.step(), the tracker.timeit("mswea/step") context, and handling of _NonTerminatingError and _TerminatingError by calling self._add_message and returning on termination; replace the loop in restore_and_resume with await self._run_step_loop() and update run() to call the same helper so both use the shared implementation.src/ares/code_agents/code_agent_base.py (1)
21-31: Good design. Note:messageslist remains mutable despite frozen dataclass.The frozen dataclass prevents attribute reassignment but doesn't prevent
state.messages.append(...)orstate.messages[0]["content"] = "x". This is acceptable sinceget_state()implementations deep-copy before construction, but callers could still mutate the snapshot.If stricter immutability is desired, consider using
tuple[request.Message, ...]instead:💡 Optional: use tuple for true immutability
`@dataclasses.dataclass`(frozen=True) class CodeAgentState: - messages: list[request.Message] + messages: tuple[request.Message, ...] n_calls: int total_cost: floatThen in
MiniSWECodeAgent.get_state():- messages=copy.deepcopy(self._messages), + messages=tuple(copy.deepcopy(self._messages)),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, The CodeAgentState currently defines messages as a mutable list which can be mutated despite the dataclass being frozen; change the field type from list[request.Message] to tuple[request.Message, ...] in CodeAgentState and update any state construction sites (e.g., MiniSWECodeAgent.get_state()) to convert/copy the messages into an immutable tuple when creating the CodeAgentState instance so the snapshot cannot be mutated after creation.src/ares/code_agents/mini_swe_agent_state_test.py (1)
74-110: Good integration test. Consider tightening the message count assertion.The test correctly validates restore+resume flow. The assertion
len(agent._messages) >= 2(line 107) is intentionally loose but could be more precise. After one loop iteration that terminates via_SubmittedError, the agent should have exactly 4 messages: 2 restored + 1 assistant response + 1 user message (the terminating error repr).💡 Optional: more specific assertion
- assert len(agent._messages) >= 2 # At least the restored messages + # 2 restored + 1 assistant (LLM response) + 1 user (_SubmittedError repr) + assert len(agent._messages) == 4🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 74 - 110, The loose assertion len(agent._messages) >= 2 in test_restore_and_resume_sets_state should be tightened: after calling restore_and_resume the agent should have exactly 4 messages (2 restored + 1 assistant response + 1 user submission/error), so replace the length check with assert len(agent._messages) == 4 and add assertions that agent._messages[2]["role"] or content matches the assistant response and agent._messages[3] matches the terminating user/submission message (use restore_and_resume, agent._messages, and agent._n_calls to locate and validate these values).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: The CodeAgentState currently defines messages as a mutable
list which can be mutated despite the dataclass being frozen; change the field
type from list[request.Message] to tuple[request.Message, ...] in CodeAgentState
and update any state construction sites (e.g., MiniSWECodeAgent.get_state()) to
convert/copy the messages into an immutable tuple when creating the
CodeAgentState instance so the snapshot cannot be mutated after creation.
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 74-110: The loose assertion len(agent._messages) >= 2 in
test_restore_and_resume_sets_state should be tightened: after calling
restore_and_resume the agent should have exactly 4 messages (2 restored + 1
assistant response + 1 user submission/error), so replace the length check with
assert len(agent._messages) == 4 and add assertions that
agent._messages[2]["role"] or content matches the assistant response and
agent._messages[3] matches the terminating user/submission message (use
restore_and_resume, agent._messages, and agent._n_calls to locate and validate
these values).
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while True loop in restore_and_resume duplicates the
step loop in run; extract that repeated logic into a helper (e.g.
_run_step_loop) that encapsulates the try/except around await self.step(), the
tracker.timeit("mswea/step") context, and handling of _NonTerminatingError and
_TerminatingError by calling self._add_message and returning on termination;
replace the loop in restore_and_resume with await self._run_step_loop() and
update run() to call the same helper so both use the shared implementation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 07dad877-7fc7-48ad-8b12-2b6d5edc4da8
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
src/ares/code_agents/mini_swe_agent.py (1)
169-179: Extract duplicated loop into a helper method.The
while Trueloop here (lines 169-179) is identical to the loop inrun()(lines 214-224). Consider extracting to a private_run_loop()method to reduce duplication.♻️ Suggested refactor
+ async def _run_loop(self) -> None: + """Execute the step/query loop until termination.""" + while True: + try: + with self.tracker.timeit("mswea/step"): + await self.step() + except _NonTerminatingError as e: + _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + except _TerminatingError as e: + _LOGGER.debug("[%d] Terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + return + async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None: ... self._messages = copy.deepcopy(state.messages) self._n_calls = state.n_calls self._total_cost = state.total_cost - - while True: - try: - with self.tracker.timeit("mswea/step"): - await self.step() - except _NonTerminatingError as e: - _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - except _TerminatingError as e: - _LOGGER.debug("[%d] Terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - return + await self._run_loop()Apply the same change to
run()after the setup preamble.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, Extract the duplicated "while True" loop into a private helper method named _run_loop that contains the exact loop body (with with self.tracker.timeit("mswea/step"), await self.step(), the except handlers for _NonTerminatingError and _TerminatingError, calls to self._add_message("user", repr(e)), and the return on terminating error); then replace the two inline copies (the loop in the current method and the one in run()) with a single call to self._run_loop() after any setup/preamble in run(); keep all symbol usages unchanged (step, _add_message, tracker.timeit, _NonTerminatingError, _TerminatingError) so behavior and logging remain identical.src/ares/code_agents/mini_swe_agent_state_test.py (2)
105-110: Consider more precise assertion for message count.The
>= 2assertion is weak. After restore (2 messages) + query (1 assistant) + termination handling (1 user), the count should be exactly 4. A precise assertion would catch subtle bugs in the message flow.🧪 Suggested fix
# State should have been restored before the loop ran assert agent._n_calls == 4 # 3 from state + 1 from the query in the loop - assert len(agent._messages) >= 2 # At least the restored messages + assert len(agent._messages) == 4 # 2 restored + 1 assistant + 1 termination # The first two messages should be the restored ones assert agent._messages[0]["content"] == "saved message" assert agent._messages[1]["content"] == "saved response"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 105 - 110, Replace the weak length check on agent._messages with a precise equality reflecting the expected flow: after restoring two messages, running the query (adds one assistant message) and handling termination (adds one user message) the list should have exactly 4 entries; update the assertion in the test (in mini_swe_agent_state_test.py where agent._messages is checked) to assert len(agent._messages) == 4 and adjust the associated comment to reflect the exact expected count, leaving the other assertions (content checks and agent._n_calls) unchanged.
63-71: Incomplete immutability test.This test verifies that field reassignment fails but doesn't cover mutation of the
messageslist contents. SinceCodeAgentStateuses a mutablelist,state.messages.append(...)would succeed silently.If the
messagesfield is changed totuple(as suggested in code_agent_base.py), this test would provide complete coverage.🧪 Additional test case for list mutation
def test_code_agent_state_messages_mutation(): """Test that mutating the messages list doesn't affect other state copies.""" messages = [request.UserMessage(role="user", content="test")] state = code_agent_base.CodeAgentState( messages=messages, n_calls=0, total_cost=0.0, ) # This mutation succeeds with list but would fail with tuple state.messages.append(request.UserMessage(role="user", content="mutated")) assert len(state.messages) == 2 # Demonstrates the mutability issue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 63 - 71, The current test only checks attribute reassignment but not mutation of the messages list; make the messages field truly immutable by changing CodeAgentState.messages to use a tuple in code_agent_base.CodeAgentState and update the test(s): either convert the existing test_code_agent_state_is_frozen to construct messages as a tuple and assert that attempting to mutate (e.g., calling append) raises an exception, or add a new test (e.g., test_code_agent_state_messages_mutation) that constructs state with messages as a tuple and verifies mutation is not possible; reference CodeAgentState and the messages field when making the change.src/ares/code_agents/code_agent_base.py (1)
21-31: Consider usingtupleformessagesto enforce true immutability.The
frozen=Truedataclass prevents field reassignment but doesn't prevent mutation of the list contents. Code could still dostate.messages.append(...)orstate.messages.clear(), breaking the "immutable snapshot" contract.Since
get_state()implementations return deep copies, corruption is unlikely in practice, but usingtuple[request.Message, ...]would provide stronger guarantees and make the immutability explicit.♻️ Suggested change
`@dataclasses.dataclass`(frozen=True) class CodeAgentState: """Serializable snapshot of a code agent's conversational state. Captures everything needed to resume an agent from where it left off, given that its container is in the correct filesystem state. """ - messages: list[request.Message] + messages: tuple[request.Message, ...] n_calls: int total_cost: floatThis would require updating
get_state()inmini_swe_agent.pyto returntuple(copy.deepcopy(self._messages)).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, Change the CodeAgentState.messages type from a mutable list to an immutable tuple by updating the dataclass field to tuple[request.Message, ...] (CodeAgentState and its messages field) and update any callers that construct states—specifically modify get_state() in mini_swe_agent.py to return tuple(copy.deepcopy(self._messages)) instead of a list so the snapshot is truly immutable; ensure type hints and any places that iterate over messages still work with a tuple.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: Change the CodeAgentState.messages type from a mutable list
to an immutable tuple by updating the dataclass field to tuple[request.Message,
...] (CodeAgentState and its messages field) and update any callers that
construct states—specifically modify get_state() in mini_swe_agent.py to return
tuple(copy.deepcopy(self._messages)) instead of a list so the snapshot is truly
immutable; ensure type hints and any places that iterate over messages still
work with a tuple.
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 105-110: Replace the weak length check on agent._messages with a
precise equality reflecting the expected flow: after restoring two messages,
running the query (adds one assistant message) and handling termination (adds
one user message) the list should have exactly 4 entries; update the assertion
in the test (in mini_swe_agent_state_test.py where agent._messages is checked)
to assert len(agent._messages) == 4 and adjust the associated comment to reflect
the exact expected count, leaving the other assertions (content checks and
agent._n_calls) unchanged.
- Around line 63-71: The current test only checks attribute reassignment but not
mutation of the messages list; make the messages field truly immutable by
changing CodeAgentState.messages to use a tuple in
code_agent_base.CodeAgentState and update the test(s): either convert the
existing test_code_agent_state_is_frozen to construct messages as a tuple and
assert that attempting to mutate (e.g., calling append) raises an exception, or
add a new test (e.g., test_code_agent_state_messages_mutation) that constructs
state with messages as a tuple and verifies mutation is not possible; reference
CodeAgentState and the messages field when making the change.
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: Extract the duplicated "while True" loop into a private
helper method named _run_loop that contains the exact loop body (with with
self.tracker.timeit("mswea/step"), await self.step(), the except handlers for
_NonTerminatingError and _TerminatingError, calls to self._add_message("user",
repr(e)), and the return on terminating error); then replace the two inline
copies (the loop in the current method and the one in run()) with a single call
to self._run_loop() after any setup/preamble in run(); keep all symbol usages
unchanged (step, _add_message, tracker.timeit, _NonTerminatingError,
_TerminatingError) so behavior and logging remain identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cd54aef4-2fa2-4c46-be4b-15b6a5bae682
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/ares/code_agents/mini_swe_agent.py (1)
156-179: Consider extracting the shared step loop into a helper.
restore_and_resume()andrun()currently duplicate the same loop/error-handling block; factoring it once reduces drift risk.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The duplicate while-try-except loop in restore_and_resume and run should be extracted into a shared helper to avoid drift: create a private method (e.g. _process_loop or _run_step_loop) that contains the while True loop, the with self.tracker.timeit("mswea/step") await self.step(), and the two except handlers for _NonTerminatingError and _TerminatingError, then call that helper from both restore_and_resume and run (ensuring restore_and_resume still sets self._messages, self._n_calls, self._total_cost before invoking the helper). Ensure the helper returns/propagates when a _TerminatingError occurs so both callers behave identically.src/ares/presets.py (1)
136-138: Consider logging when a duplicate preset is skipped.The silent
continuemakes collisions harder to diagnose during startup; a debug log withpreset_namewould improve traceability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/presets.py` around lines 136 - 138, When a duplicate preset_name is detected in the presets loading loop (the block using seen and preset_name), add a debug-level log that records the skipped preset_name and context instead of silently continuing; call the module or class logger (e.g., logger.debug or self.logger.debug) with a message like "Skipping duplicate preset" and include preset_name so collisions are traceable during startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The duplicate while-try-except loop in restore_and_resume
and run should be extracted into a shared helper to avoid drift: create a
private method (e.g. _process_loop or _run_step_loop) that contains the while
True loop, the with self.tracker.timeit("mswea/step") await self.step(), and the
two except handlers for _NonTerminatingError and _TerminatingError, then call
that helper from both restore_and_resume and run (ensuring restore_and_resume
still sets self._messages, self._n_calls, self._total_cost before invoking the
helper). Ensure the helper returns/propagates when a _TerminatingError occurs so
both callers behave identically.
In `@src/ares/presets.py`:
- Around line 136-138: When a duplicate preset_name is detected in the presets
loading loop (the block using seen and preset_name), add a debug-level log that
records the skipped preset_name and context instead of silently continuing; call
the module or class logger (e.g., logger.debug or self.logger.debug) with a
message like "Skipping duplicate preset" and include preset_name so collisions
are traceable during startup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1cf00ca7-0f3f-4419-951f-6bb2ce8bf345
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
|
@coderabbitaidev full review |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: CodeAgentState currently stores a mutable list in the
messages field so callers can mutate the "snapshot"; change messages to an
immutable sequence (e.g., use tuple[request.Message] or
typing.Tuple[request.Message, ...]) and ensure CodeAgentState is constructed
with an immutable copy (tuple(messages)) so nested mutation is prevented; update
any code that instantiates or consumes CodeAgentState (e.g., MiniSWECodeAgent
state capture/restore) to convert between list and tuple accordingly and keep
the frozen dataclass declaration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 34552566-0e74-4684-839c-77f0e5c4e48f
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
| @dataclasses.dataclass(frozen=True) | ||
| class CodeAgentState: | ||
| """Serializable snapshot of a code agent's conversational state. | ||
|
|
||
| Captures everything needed to resume an agent from where it left off, | ||
| given that its container is in the correct filesystem state. | ||
| """ | ||
|
|
||
| messages: list[request.Message] | ||
| n_calls: int | ||
| total_cost: float |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from dataclasses import dataclass
`@dataclass`(frozen=True)
class S:
messages: list[dict]
s = S(messages=[{"role": "user", "content": "a"}])
s.messages.append({"role": "assistant", "content": "b"})
print("mutable_len=", len(s.messages)) # demonstrates nested mutability
PYRepository: withmartian/ares
Length of output: 74
🏁 Script executed:
# Verify file exists and check the exact code
cat -n src/ares/code_agents/code_agent_base.py | head -40Repository: withmartian/ares
Length of output: 1398
🏁 Script executed:
# Search for all usages of CodeAgentState to understand mutation patterns
rg "CodeAgentState" --type py -A 2 -B 2Repository: withmartian/ares
Length of output: 3558
🏁 Script executed:
# Check if messages field is mutated anywhere
rg "\.messages\s*=" --type py | head -20Repository: withmartian/ares
Length of output: 141
🏁 Script executed:
# Check imports to understand request.Message
rg "import.*request\|from.*request" src/ares/code_agents/code_agent_base.pyRepository: withmartian/ares
Length of output: 42
🏁 Script executed:
# Check the immutability test to understand what's being tested
cat -n src/ares/code_agents/mini_swe_agent_state_test.pyRepository: withmartian/ares
Length of output: 4411
🏁 Script executed:
# Check the restore_and_resume implementation to see how messages are used
rg "restore_and_resume" -A 15 src/ares/code_agents/mini_swe_agent.pyRepository: withmartian/ares
Length of output: 712
CodeAgentState is not deeply immutable—callers can mutate state.messages after creation.
The frozen=True decorator only prevents attribute rebinding; it does not freeze nested mutable objects. Line 29 stores a mutable list, so code can call state.messages.append(...) or modify the list in place, violating the "snapshot" contract described in the docstring.
While implementations like mini_swe_agent.MiniSWECodeAgent defensively use copy.deepcopy() when capturing and restoring state, this relies on discipline and doesn't enforce immutability at the type level. The test suite only verifies attribute immutability (rebinding), not nested collection mutability.
Suggested fix
`@dataclasses.dataclass`(frozen=True)
class CodeAgentState:
@@
- messages: list[request.Message]
+ messages: tuple[request.Message, ...]
n_calls: int
total_cost: float📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @dataclasses.dataclass(frozen=True) | |
| class CodeAgentState: | |
| """Serializable snapshot of a code agent's conversational state. | |
| Captures everything needed to resume an agent from where it left off, | |
| given that its container is in the correct filesystem state. | |
| """ | |
| messages: list[request.Message] | |
| n_calls: int | |
| total_cost: float | |
| `@dataclasses.dataclass`(frozen=True) | |
| class CodeAgentState: | |
| """Serializable snapshot of a code agent's conversational state. | |
| Captures everything needed to resume an agent from where it left off, | |
| given that its container is in the correct filesystem state. | |
| """ | |
| messages: tuple[request.Message, ...] | |
| n_calls: int | |
| total_cost: float |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, CodeAgentState
currently stores a mutable list in the messages field so callers can mutate the
"snapshot"; change messages to an immutable sequence (e.g., use
tuple[request.Message] or typing.Tuple[request.Message, ...]) and ensure
CodeAgentState is constructed with an immutable copy (tuple(messages)) so nested
mutation is prevented; update any code that instantiates or consumes
CodeAgentState (e.g., MiniSWECodeAgent state capture/restore) to convert between
list and tuple accordingly and keep the frozen dataclass declaration.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/ares/code_agents/code_agent_base.py (1)
21-31:⚠️ Potential issue | 🟠 Major
CodeAgentStateis still shallowly mutable viamessages.At Line 29,
messagesis a mutablelist, so callers can mutate snapshot contents after creation (append, item edits), which breaks the immutability expectation of a checkpoint state.Proposed fix
`@dataclasses.dataclass`(frozen=True) class CodeAgentState: @@ - messages: list[request.Message] + messages: tuple[request.Message, ...] n_calls: int total_cost: float#!/bin/bash # Verify current snapshot field mutability surface and constructors. rg -n "class CodeAgentState|messages:\s*list\[request\.Message\]|messages:\s*tuple\[request\.Message" src/ares/code_agents/code_agent_base.py rg -n "CodeAgentState\(" --type py src/ares/code_agentsExpected result:
messagesis currently declared aslist[...], confirming shallow mutability risk.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/code_agent_base.py` around lines 21 - 31, CodeAgentState's messages field is declared as a mutable list which breaks the frozen dataclass guarantee; change the type of messages to an immutable tuple[request.Message] in CodeAgentState and update all places that construct a CodeAgentState (search for CodeAgentState(...) and the constructor/site where messages is passed) to wrap/convert incoming iterables/lists into tuple(messages) before constructing the dataclass so the snapshot is truly immutable; also update any type hints/imports and tests that inspect messages accordingly.
🧹 Nitpick comments (1)
src/ares/code_agents/mini_swe_agent.py (1)
169-179: Consider extracting the shared step loop to one helper.Line 169 through Line 179 duplicates the error-handling loop in
run(). A shared helper would reduce drift risk between normal execution and resumed execution paths.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, The while-try-except step loop that awaits self.step() and handles _NonTerminatingError/_TerminatingError (logging via _LOGGER.debug, adding messages with self._add_message, and returning on terminating errors) is duplicated; extract it into a single helper method (e.g., _run_step_loop or _execute_steps) that encapsulates the with self.tracker.timeit("mswea/step"): await self.step() behavior and the two except branches, then call that helper from run() and any resume path so both use the same error handling and logging logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 21-31: CodeAgentState's messages field is declared as a mutable
list which breaks the frozen dataclass guarantee; change the type of messages to
an immutable tuple[request.Message] in CodeAgentState and update all places that
construct a CodeAgentState (search for CodeAgentState(...) and the
constructor/site where messages is passed) to wrap/convert incoming
iterables/lists into tuple(messages) before constructing the dataclass so the
snapshot is truly immutable; also update any type hints/imports and tests that
inspect messages accordingly.
---
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: The while-try-except step loop that awaits self.step()
and handles _NonTerminatingError/_TerminatingError (logging via _LOGGER.debug,
adding messages with self._add_message, and returning on terminating errors) is
duplicated; extract it into a single helper method (e.g., _run_step_loop or
_execute_steps) that encapsulates the with self.tracker.timeit("mswea/step"):
await self.step() behavior and the two except branches, then call that helper
from run() and any resume path so both use the same error handling and logging
logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d8b7b13c-6710-4b16-a49b-fab31d4d7a82
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/ares/code_agents/code_agent_base.py (1)
29-31:⚠️ Potential issue | 🟠 Major
CodeAgentState.messagesis mutable despitefrozen=True.On Line 29,
messagesis alist, so callers can still mutate snapshot contents in place (e.g.,append), which breaks the immutable-checkpoint contract.Proposed fix
`@dataclasses.dataclass`(frozen=True) class CodeAgentState: @@ - messages: list[request.Message] + messages: tuple[request.Message, ...] n_calls: int total_cost: float#!/bin/bash python - <<'PY' from dataclasses import dataclass `@dataclass`(frozen=True) class S: messages: list[int] s = S([1]) s.messages.append(2) print("Nested mutation allowed:", s.messages == [1, 2]) PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/code_agent_base.py` around lines 29 - 31, CodeAgentState currently declares messages: list[request.Message] while the dataclass is frozen, allowing in-place mutations that violate immutability; change the field to an immutable sequence (e.g., tuple) by updating the annotation to tuple[request.Message, ...] (or collections.abc.Sequence[request.Message]) and ensure any construction path (factory, constructor, or __post_init__ for the frozen dataclass using object.__setattr__) converts incoming iterables/lists to a tuple; also update any code that consumes CodeAgentState.messages to treat it as an immutable sequence rather than a list.
🧹 Nitpick comments (2)
src/ares/presets.py (1)
136-138: Consider logging skipped duplicate presets.The dedupe is correct, but the silent
continuemakes collisions hard to diagnose in production runs.Proposed tweak
if preset_name in seen: + _LOGGER.warning("Skipping duplicate default preset '%s'.", preset_name) continue seen.add(preset_name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/presets.py` around lines 136 - 138, The loop that deduplicates presets currently silently continues when a duplicate is found (checking seen and preset_name); add a log statement before the continue to record that a duplicate preset was skipped, including the preset_name and any relevant context (e.g., source or index) so collisions are discoverable in production; use the module/class logger (e.g., logger.warning or self.logger.debug depending on severity) and keep the message concise and structured for easy searching.src/ares/code_agents/mini_swe_agent.py (1)
169-179: Resume loop duplicatesrun()loop logic.This duplicates step/error control flow already present in
run()(Lines 214-224), which can drift over time. Consider extracting a shared internal loop helper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 169 - 179, The while-try/except body in mini_swe_agent (the loop around self.tracker.timeit("mswea/step"), await self.step(), and handlers for _NonTerminatingError and _TerminatingError that call self._add_message) duplicates the same control flow in run(); extract that logic into a single helper (e.g., a private method like _execute_step_loop or _run_step_once) and have both run() and the other loop call it so the try/except and message handling for _NonTerminatingError/_TerminatingError and the tracker.timeit wrapper live in one place (move the await self.step() call into the helper and return/propagate termination so run() can stop appropriately).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/ares/code_agents/code_agent_base.py`:
- Around line 29-31: CodeAgentState currently declares messages:
list[request.Message] while the dataclass is frozen, allowing in-place mutations
that violate immutability; change the field to an immutable sequence (e.g.,
tuple) by updating the annotation to tuple[request.Message, ...] (or
collections.abc.Sequence[request.Message]) and ensure any construction path
(factory, constructor, or __post_init__ for the frozen dataclass using
object.__setattr__) converts incoming iterables/lists to a tuple; also update
any code that consumes CodeAgentState.messages to treat it as an immutable
sequence rather than a list.
---
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 169-179: The while-try/except body in mini_swe_agent (the loop
around self.tracker.timeit("mswea/step"), await self.step(), and handlers for
_NonTerminatingError and _TerminatingError that call self._add_message)
duplicates the same control flow in run(); extract that logic into a single
helper (e.g., a private method like _execute_step_loop or _run_step_once) and
have both run() and the other loop call it so the try/except and message
handling for _NonTerminatingError/_TerminatingError and the tracker.timeit
wrapper live in one place (move the await self.step() call into the helper and
return/propagate termination so run() can stop appropriately).
In `@src/ares/presets.py`:
- Around line 136-138: The loop that deduplicates presets currently silently
continues when a duplicate is found (checking seen and preset_name); add a log
statement before the continue to record that a duplicate preset was skipped,
including the preset_name and any relevant context (e.g., source or index) so
collisions are discoverable in production; use the module/class logger (e.g.,
logger.warning or self.logger.debug depending on severity) and keep the message
concise and structured for easy searching.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7028a305-0b43-4ec8-920c-7ed1c842dec2
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/ares/presets.py (1)
128-147: LGTM! Duplicate guard correctly prevents ValueError from registry.The
seenset prevents re-registration of presets that map to the same name (e.g., when_make_harbor_dataset_idcollapses "swebench-verified" to "sbv"). This avoids theValueErrorthatregistry.register_presetraises for duplicates.Consider adding a debug log when skipping duplicates for easier troubleshooting:
💡 Optional: Add debug logging for skipped duplicates
if preset_name in seen: + _LOGGER.debug("Skipping duplicate preset registration: %s", preset_name) continue,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/presets.py` around lines 128 - 147, Add a debug log when a duplicate preset_name is skipped so it's easier to trace why a dataset wasn't registered: inside the loop that builds preset_name (which uses _make_harbor_dataset_id and iterates code_env.list_harbor_datasets()), when you hit the "if preset_name in seen: continue" branch, replace or augment that branch to call the module logger (or processLogger) to emit a debug message including preset_name, ds_spec.name, ds_spec.version and code_agent_id before continuing; keep the seen set and registry.register_preset logic unchanged.src/ares/code_agents/mini_swe_agent.py (1)
156-179: Consider extracting shared loop logic to reduce duplication.The
while Trueloop inrestore_and_resume()(lines 169-179) is nearly identical to the one inrun()(lines 214-224). Extracting this into a private method would reduce duplication and ensure consistent behavior.💡 Optional refactor to extract shared loop
+ async def _run_step_loop(self) -> None: + """Execute the step loop until termination.""" + while True: + try: + with self.tracker.timeit("mswea/step"): + await self.step() + except _NonTerminatingError as e: + _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + except _TerminatingError as e: + _LOGGER.debug("[%d] Terminating error: %s", id(self), e) + self._add_message("user", repr(e)) + return async def restore_and_resume(self, state: code_agent_base.CodeAgentState, task: str) -> None: ... self._messages = copy.deepcopy(state.messages) self._n_calls = state.n_calls self._total_cost = state.total_cost - - while True: - try: - with self.tracker.timeit("mswea/step"): - await self.step() - except _NonTerminatingError as e: - _LOGGER.debug("[%d] Non-terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - except _TerminatingError as e: - _LOGGER.debug("[%d] Terminating error: %s", id(self), e) - self._add_message("user", repr(e)) - return + await self._run_step_loop()Then update
run()similarly to callawait self._run_step_loop().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent.py` around lines 156 - 179, The while-True loop in restore_and_resume duplicates the same step/exception handling logic used in run; extract that loop into a private helper (e.g., async def _run_step_loop(self):) that contains the with self.tracker.timeit("mswea/step"): await self.step() call and the two except blocks for _NonTerminatingError and _TerminatingError (including the same _LOGGER.debug and self._add_message calls and return on terminating); then replace the loop in both restore_and_resume and run with a single await self._run_step_loop() call so behavior is centralized and consistent.src/ares/code_agents/mini_swe_agent_state_test.py (1)
63-71: Test only verifies attribute immutability, not nested collection immutability.This test confirms that
state.n_calls = 5raisesAttributeError, but doesn't verify thatstate.messages.append(...)would still succeed (due to the mutable list). Consider adding a test to document this known limitation or to verify thatget_state()returns an independent copy that isn't affected by such mutations.The current behavior is acceptable since
get_state()usescopy.deepcopy(), but explicit test coverage would guard against future regressions.💡 Optional: Add test for nested mutation isolation
def test_get_state_messages_mutation_does_not_affect_original(): """Test that mutating state.messages doesn't affect original agent.""" agent = _make_agent() agent._messages = [request.UserMessage(role="user", content="original")] state = agent.get_state() # Mutate the state's messages (this is allowed due to list mutability) state.messages.append(request.UserMessage(role="user", content="mutated")) # Agent should not be affected assert len(agent._messages) == 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/code_agents/mini_swe_agent_state_test.py` around lines 63 - 71, The existing test test_code_agent_state_is_frozen only checks attribute immutability on CodeAgentState but not nested collection isolation; add a test (e.g., test_get_state_messages_mutation_does_not_affect_original) that creates an agent via _make_agent, sets agent._messages to a list with one request.UserMessage, calls agent.get_state(), mutates the returned state's messages (state.messages.append(...)) and then asserts the original agent._messages length/content is unchanged; reference CodeAgentState, get_state, _make_agent and agent._messages to locate where to add the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/ares/code_agents/mini_swe_agent_state_test.py`:
- Around line 63-71: The existing test test_code_agent_state_is_frozen only
checks attribute immutability on CodeAgentState but not nested collection
isolation; add a test (e.g.,
test_get_state_messages_mutation_does_not_affect_original) that creates an agent
via _make_agent, sets agent._messages to a list with one request.UserMessage,
calls agent.get_state(), mutates the returned state's messages
(state.messages.append(...)) and then asserts the original agent._messages
length/content is unchanged; reference CodeAgentState, get_state, _make_agent
and agent._messages to locate where to add the test.
In `@src/ares/code_agents/mini_swe_agent.py`:
- Around line 156-179: The while-True loop in restore_and_resume duplicates the
same step/exception handling logic used in run; extract that loop into a private
helper (e.g., async def _run_step_loop(self):) that contains the with
self.tracker.timeit("mswea/step"): await self.step() call and the two except
blocks for _NonTerminatingError and _TerminatingError (including the same
_LOGGER.debug and self._add_message calls and return on terminating); then
replace the loop in both restore_and_resume and run with a single await
self._run_step_loop() call so behavior is centralized and consistent.
In `@src/ares/presets.py`:
- Around line 128-147: Add a debug log when a duplicate preset_name is skipped
so it's easier to trace why a dataset wasn't registered: inside the loop that
builds preset_name (which uses _make_harbor_dataset_id and iterates
code_env.list_harbor_datasets()), when you hit the "if preset_name in seen:
continue" branch, replace or augment that branch to call the module logger (or
processLogger) to emit a debug message including preset_name, ds_spec.name,
ds_spec.version and code_agent_id before continuing; keep the seen set and
registry.register_preset logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 00683954-09d8-4be6-8a1a-f998768c06a8
📒 Files selected for processing (4)
src/ares/code_agents/code_agent_base.pysrc/ares/code_agents/mini_swe_agent.pysrc/ares/code_agents/mini_swe_agent_state_test.pysrc/ares/presets.py
User description
Summary
CodeAgentStatefrozen dataclass capturing an agent's conversational state (messages, call count, cost)CheckpointableCodeAgentprotocol extendingCodeAgentwithget_state()andrestore_and_resume()MiniSWECodeAgent--restore_and_resume()injects saved message history and re-enters the step loop without re-running the setup preamblepresets.pyContext
Part of Go-Explore support (Wave 1). This PR is independent of the other two Wave 1 PRs and can be reviewed/merged in parallel:
go-explore/snapshotable-container)go-explore/checkpoint-protocol)The key insight: at any checkpoint, a code agent's behavior is fully determined by its message history. Given the same messages and a container in the correct filesystem state, the agent can resume from any point without replaying commands.
Tests
5 new tests covering state capture, deep copy isolation, immutability, and restore+resume with mock LLM/container.
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Introduce
CodeAgentStateandCheckpointableCodeAgentto capture conversational context and enableMiniSWECodeAgentto serialize/deep-copy/restore its message/counter state for checkpointing in Go-Explore, complete with tests ensuring immutability and restore behavior. Fix default preset registration by deduplicating harbor dataset/code agent identifiers when populating the registry.CodeAgentState/CheckpointableCodeAgent, lettingMiniSWECodeAgentcapture, deep-copy, and resume its conversational state while exercising the new behavior via targeted agent tests.Modified files (3)
Latest Contributors(2)
Modified files (1)
Latest Contributors(2)
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores