
Conversation

@madclaws
Member

No description provided.

- /start - Loads the model into memory and prepares the internal agent
- /completions - Now executes the py fns and updates the memory
- Native interactive interface if the model is from the mem-agent family
- Communication with the py server for model loading and chat (a request sketch follows below)
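For illustration only, a minimal sketch of that flow over HTTP, assuming the server listens on the local port 6969 used elsewhere in this PR and that the request/response shapes match the walkthrough below; the requests library is just a convenient stand-in for any HTTP client, and the exact schemas live in server/api.py:

import requests

BASE_URL = "http://127.0.0.1:6969"

# /start loads the model and prepares the internal agent.
requests.post(f"{BASE_URL}/start", json={"model": "driaforall/mem-agent"}, timeout=60)

# /v1/chat/completions runs one turn; the server may execute the model's
# <python> tool calls against the memory directory before replying.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "driaforall/mem-agent",
        "messages": [{"role": "user", "content": "What do you remember about me?"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
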
@coderabbitai

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough


Adds a new Python FastAPI memory-agent server under server/ with model loading/caching, MLXRunner integration, sandboxed Python execution, filesystem memory utilities, model cache/health tools, reasoning-parsing utilities, and a system prompt. Adds the server entrypoint, pyproject, and packaging/tooling files. Updates the Rust CLI to async via tokio, changes the runner to optionally use the mem-agent server (with a subprocess fallback), updates command invocation sites, and swaps the model reference in b.modelfile. CI and tooling are adjusted (GitHub Actions, justfile, rust-toolchain). Cargo.toml gains async, HTTP, serde, and error-handling dependencies.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Rust CLI (async)
    participant Server as FastAPI Server
    participant Cache as cache_utils
    participant MLX as MLXRunner
    participant Memory as mem_agent/tools + utils
    participant Sandbox as mem_agent/engine
    participant Reasoning as reasoning_utils

    CLI->>Server: POST /start { model, memory_path }
    Server->>Cache: resolve_single_model(model)
    Cache-->>Server: model_path
    Server->>MLX: get_or_load_model(model_path)
    MLX-->>Server: runner ready

    CLI->>Server: POST /v1/chat/completions { messages, params }
    Server->>Memory: create_memory_if_not_exists(memory_path)
    Server->>MLX: format_chat_messages_for_runner(messages)
    Server->>MLX: generate_batch / generate_streaming
    MLX-->>Server: response (may include <python> / reasoning)
    Server->>Reasoning: extract reasoning & final_answer
    alt response contains python
        Server->>Memory: read_file(s) for context
        Server->>Sandbox: execute_sandboxed_code(python, allowed_path, ...)
        Sandbox->>Memory: perform tools (read/write/list/delete)
        Memory-->>Sandbox: tool results
        Sandbox-->>Server: (locals, error)
        Server->>MLX: provide tool results and iterate (tool turn)
    end
    Server-->>CLI: ChatCompletionResponse (assistant reply)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Focus review on:
    • server/mem_agent/engine.py — sandbox subprocess, pickle-over-stdout, timeout, dynamic installs, builtin/module blacklists.
    • server/mem_agent/tools.py and server/mem_agent/utils.py — filesystem operations, size limits, path sanitization, edge-case handling.
    • server/cache_utils.py — model discovery, resolution heuristics, health checks, deletion logic.
    • server/mlx_runner.py — model loading, streaming generation, stop-token handling, resource cleanup.
    • server/api.py — FastAPI endpoints, global mutable state, concurrency, error responses, tool-turn loop.
    • server/reasoning_utils.py — parsing/streaming correctness across model formats.
    • server/config.py — reading SYSTEM_PROMPT at import time (missing-file behavior).
    • Rust changes: src/main.rs and src/runner/mlx.rs — tokio integration, async paths, HTTP client interactions, subprocess fallback.
    • CI/tooling and manifests: Cargo.toml new deps, .github/workflows, justfile, rust-toolchain.toml, and b.modelfile model reference.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request contains no description provided by the author. An empty description provides zero information about the changeset and cannot be evaluated as "related in some way" to the changes, which is the lenient passing criterion. The description fails to convey any meaningful context, rationale, or explanation for the substantial modifications including the new server infrastructure, memory management system, and async runtime integration. Add a pull request description that explains the purpose and scope of the changes. For this PR, the description should cover the addition of the memory agent server infrastructure, the FastAPI API implementation, the local memory management capabilities, the integration with the Rust CLI via async runtime, and the model configuration changes. This will help reviewers understand the intent and impact of these substantial modifications.
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Adding local memory management" accurately refers to a significant portion of the changeset, specifically the new memory agent system implemented in server/mem_agent/ with tools, utilities, and sandboxed execution capabilities. However, the changeset is substantially broader than this title suggests, encompassing a complete FastAPI server implementation (server/api.py, config.py, main.py), model changes (modelfile), dependencies for HTTP/async support (Cargo.toml), Rust async runtime integration, and associated build tooling. While the title is not misleading or off-topic, it captures only one major component of a multi-faceted change and does not adequately represent the full scope of the pull request.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/integrate-mem-aget

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac9d39a and 823ecd4.

📒 Files selected for processing (1)
  • .github/workflows/rust.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/rust.yml

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@madclaws changed the title from "Using mem-agent models for local memory" to "Adding local memory management" on Oct 20, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 34

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Cargo.toml (1)

4-4: Invalid Rust edition; compilation will fail.

The edition field must be one of 2015, 2018, or 2021. The value 2024 does not exist and will cause a compilation error.

Apply this fix:

-edition = "2024"
+edition = "2021"
🧹 Nitpick comments (15)
server/.gitignore (1)

1-3: Solid foundation, but consider adding .env for secrets.

The existing entries are appropriate for a Python project. However, since this PR introduces a FastAPI server (which typically requires environment configuration for API endpoints, model paths, secrets, etc.), consider adding common patterns to prevent accidental commits of sensitive data.

Consider enhancing the .gitignore with these additions:

 __pycache__/
 *.egg-info/
 .venv/
+.env
+.env.local
+.pytest_cache/
+dist/
+build/

Rationale:

  • .env / .env.local: Protects secrets (API keys, model cache paths, credentials) from being committed.
  • .pytest_cache/: Covers test metadata if pytest is used for validation.
  • dist/ / build/: Standard for Python packages if future packaging/distribution is planned.
Cargo.toml (2)

9-9: Specify explicit TLS backend for reproducible builds.

reqwest defaults to native-tls, but for deterministic and reproducible builds, explicitly choose between rustls or native-tls as recommended in best practices.

Replace with one of these options:

-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "rustls-tls"] }

or if you prefer native TLS:

-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "native-tls"] }

Note: This assumes the blocking feature is actually used in the codebase. If only async calls are made, consider removing it to reduce dependency bloat.


13-13: Pin tokio to a stable minor version for production consistency.

Using version = "1" (major only) allows any 1.x minor/patch, which can introduce unexpected behavioral changes across environments. Per best practices, pin to a specific LTS minor like ~1.43 or ~1.47.

Apply this change:

-tokio = { version = "1" , features = ["macros", "rt-multi-thread"]}
+tokio = { version = "~1.47", features = ["macros", "rt-multi-thread"]}

Alternatively, you can use 1.47 to allow patch updates within 1.47.x while staying stable for minor versions.

server/mem_agent/__init__.py (1)

1-1: Consider adding a module docstring for clarity.

This is an empty package initializer, which is correct. As an optional improvement, adding a brief module docstring would help developers understand the purpose of the mem_agent subsystem when exploring the codebase.

Example:

"""
Memory agent subsystem for local memory management and reasoning.

Provides engines, tools, and utilities for sandboxed code execution,
memory persistence, and multi-step reasoning workflows.
"""
server/pyproject.toml (1)

6-11: Move Black to dev extras and add minimal pins; consider uvicorn[standard].

Black is a formatter and shouldn’t be a runtime dependency. Add minimal version pins for stability and use uvicorn[standard] to pull performant wheels (uvloop/httptools) by default.

Apply this diff:

 [project]
 name = "server"
 version = "0.1.0"
 description = "Local MLX inference server for the Tiles CLI"
 requires-python = ">=3.10"
 dependencies = [
-    "fastapi",
-    "uvicorn",
-    "mlx-lm",
-    "black"
+    "fastapi>=0.95.2",
+    "uvicorn[standard]>=0.34",
+    "mlx-lm"
 ]
 
+[project.optional-dependencies]
+dev = [
+    "black>=24.3.0"
+]

Based on learnings.

server/main.py (1)

11-15: Tidy runner and enable basic logging.

Add log_level for visibility and drop the commented PID code (or implement it). Keeping 127.0.0.1 is fine for local use.

-    uvicorn.run(app, host="127.0.0.1", port=PORT)
+    uvicorn.run(app, host="127.0.0.1", port=PORT, log_level="info")
src/main.rs (1)

22-34: Optional: add structured logging early.

A basic tracing subscriber helps diagnose async issues.

 use clap::{Parser, Subcommand};
 mod commands;
+use tracing_subscriber::{fmt, EnvFilter};
@@
 #[tokio::main]
 pub async fn main() -> Result<(), Box<dyn Error>> {
+    let _ = fmt().with_env_filter(EnvFilter::from_default_env()).try_init();
src/runner/mlx.rs (3)

1-5: HTTP timeouts + remove unused import.

Add request timeouts (avoid hangs) and drop unused NulError.

-use reqwest::Client;
+use reqwest::Client;
 use serde_json::{Value, json};
-use std::ffi::NulError;
 use std::io::Write;
-use std::{io, process::Command};
+use std::{io, process::Command, time::Duration};

And add a base URL constant for reuse:

 use crate::core::modelfile::Modelfile;
 
+const BASE_URL: &str = "http://127.0.0.1:6969";

18-71: Blocking subprocess in async context and deep std::process::exit.

Spawning and waiting with std::process blocks a Tokio worker; and exiting from a library module makes composition/testing hard. Prefer tokio::process::Command and bubble errors up to the caller.

  • Switch to tokio::process::Command and .await the child.
  • Change run_model_by_sub_process to return Result<(), anyhow::Error> (or custom) and handle errors at the top of run.
  • Replace std::process::exit(1) with return Err(e.into()) and print user-friendly hints at the call site.

105-110: Reuse BASE_URL and (optionally) add a timeout for ping.

Small consistency fix.

-    let res = client.get("http://127.0.0.1:6969/ping").send().await?;
+    let res = client.get(format!("{}/ping", BASE_URL)).send().await?;
server/config.py (1)

7-7: Use Path.joinpath or os.path.join for path construction.

String concatenation for path construction is error-prone and platform-dependent. Use Path methods for consistency with the rest of the file.

Apply this diff:

-MEMORY_PATH = os.path.expanduser("~") + "/tiles_memory"
+MEMORY_PATH = str(Path.home() / "tiles_memory")
server/model_card.py (1)

24-24: Remove unused noqa directive.

The UP045 rule is not enabled, making this directive unnecessary.

Apply this diff:

-# ruff: noqa: UP045
server/reasoning_utils.py (1)

40-70: Annotate PATTERNS with typing.ClassVar.

The PATTERNS dictionary is a mutable class attribute that should be annotated with ClassVar to indicate it's a class-level attribute rather than an instance attribute.

Apply this diff:

+from typing import ClassVar, Dict, Optional, Tuple
+
 class ReasoningExtractor:
     """Extract reasoning and final answer from model outputs."""
     
     # Model-specific patterns
-    PATTERNS = {
+    PATTERNS: ClassVar[Dict[str, Dict]] = {
         'gpt-oss': {
server/mem_agent/engine.py (1)

19-189: Add AST validation before code execution.

The system prompt (server/system_prompt.txt line 269) states: "Your <python> block MUST compile under ast.parse and yield no SyntaxError". However, the sandbox doesn't validate this before execution. Invalid code will fail at exec() time, wasting resources and making error messages less clear.

Add AST validation before execution:

import ast

def _run_user_code(
    code: str,
    allow_installs: bool,
    allowed_path: str,
    blacklist: list,
    available_functions: dict,
    log: bool = False,
) -> tuple[dict, str]:
    """Execute code under sandboxed conditions..."""
    
    # Validate syntax before execution
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {}, f"SyntaxError in code: {e}"
    
    try:
        # ... rest of the function

This provides faster feedback and matches the documented requirement.

server/mlx_runner.py (1)

300-301: Narrow bare except when probing tokenizer special tokens.

Use “except Exception” and optionally log at verbose.

-            except:
-                pass
+            except Exception as err:
+                if self.verbose:
+                    print(f"[DEBUG] Failed to probe token '{token}': {err}")
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 022bb7f and 2e17f41.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • Cargo.toml (1 hunks)
  • b.modelfile (1 hunks)
  • server/.gitignore (1 hunks)
  • server/__init__.py (1 hunks)
  • server/api.py (1 hunks)
  • server/cache_utils.py (1 hunks)
  • server/config.py (1 hunks)
  • server/main.py (1 hunks)
  • server/mem_agent/__init__.py (1 hunks)
  • server/mem_agent/engine.py (1 hunks)
  • server/mem_agent/tools.py (1 hunks)
  • server/mem_agent/utils.py (1 hunks)
  • server/mlx_runner.py (1 hunks)
  • server/model_card.py (1 hunks)
  • server/pyproject.toml (1 hunks)
  • server/reasoning_utils.py (1 hunks)
  • server/system_prompt.txt (1 hunks)
  • src/commands/mod.rs (1 hunks)
  • src/main.rs (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • src/main.rs
  • server/pyproject.toml
  • src/runner/mlx.rs
  • src/commands/mod.rs
  • Cargo.toml
🧬 Code graph analysis (8)
src/main.rs (3)
src/core/modelfile.rs (1)
  • parse (253-258)
src/commands/mod.rs (1)
  • run (11-18)
src/runner/mlx.rs (1)
  • run (9-16)
src/runner/mlx.rs (2)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (4)
  • new (88-90)
  • new (107-119)
  • Modelfile (106-225)
  • Modelfile (94-104)
server/reasoning_utils.py (1)
server/cache_utils.py (1)
  • detect_model_type (277-303)
src/commands/mod.rs (2)
src/runner/mlx.rs (1)
  • run (9-16)
src/core/modelfile.rs (1)
  • parse_from_file (246-251)
server/mlx_runner.py (1)
server/reasoning_utils.py (6)
  • ReasoningExtractor (36-182)
  • StreamingReasoningParser (246-430)
  • detect_model_type (73-86)
  • process_token (200-243)
  • process_token (260-414)
  • finalize (416-430)
server/mem_agent/tools.py (1)
server/mem_agent/utils.py (2)
  • check_size_limits (35-54)
  • create_memory_if_not_exists (57-71)
server/cache_utils.py (2)
server/model_card.py (2)
  • read_readme_front_matter (139-167)
  • tokenizer_has_chat_template (170-184)
server/mlx_runner.py (1)
  • run_model_enhanced (955-1049)
server/api.py (5)
server/cache_utils.py (2)
  • detect_framework (241-274)
  • get_model_path (182-201)
server/mlx_runner.py (5)
  • cleanup (321-357)
  • load_model (128-180)
  • _format_conversation (765-794)
  • generate_batch (584-689)
  • get_effective_max_tokens (359-389)
server/mem_agent/utils.py (5)
  • extract_python_code (152-172)
  • extract_reply (175-182)
  • extract_thoughts (185-192)
  • create_memory_if_not_exists (57-71)
  • format_results (195-203)
server/mem_agent/engine.py (1)
  • execute_sandboxed_code (200-314)
src/runner/mlx.rs (2)
  • load_model (112-129)
  • ping (105-110)
🪛 LanguageTool
server/system_prompt.txt

[style] ~46-~46: This phrase is redundant. Consider using “outside”.
Context: ...Skip the <think> block - Provide text outside of these tags - Use <reply> when you hav...

(OUTSIDE_OF)


[style] ~257-~257: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 5062 characters long)
Context: ...thout assignment return empty {} results! 9. Wait for Results: After submitti...

(EN_EXCESSIVE_EXCLAMATION)

🪛 Ruff (0.14.1)
server/reasoning_utils.py

40-70: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

server/model_card.py

24-24: Unused noqa directive (non-enabled: UP045)

Remove unused noqa directive

(RUF100)


55-55: Do not catch blind exception: Exception

(BLE001)


165-165: Consider moving this statement to an else block

(TRY300)


166-166: Do not catch blind exception: Exception

(BLE001)


183-183: Do not catch blind exception: Exception

(BLE001)

server/mem_agent/engine.py

37-37: Do not catch blind exception: Exception

(BLE001)


53-55: Abstract raise to an inner function

(TRY301)


53-55: Avoid specifying long messages outside the exception class

(TRY003)


67-69: Abstract raise to an inner function

(TRY301)


67-69: Avoid specifying long messages outside the exception class

(TRY003)


80-82: Abstract raise to an inner function

(TRY301)


80-82: Avoid specifying long messages outside the exception class

(TRY003)


103-104: try-except-pass detected, consider logging the exception

(S110)


103-103: Do not catch blind exception: Exception

(BLE001)


128-128: subprocess call: check for execution of untrusted input

(S603)


134-134: Do not catch blind exception: Exception

(BLE001)


136-138: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


139-139: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


156-156: Use of exec detected

(S102)


157-157: Do not catch blind exception: Exception

(BLE001)


162-162: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


183-183: Do not catch blind exception: Exception

(BLE001)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Do not catch blind exception: Exception

(BLE001)


194-196: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


197-197: Use explicit conversion flag

Replace with conversion flag

(RUF010)


204-204: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


205-205: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


206-206: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


207-207: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


208-208: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


238-238: subprocess call: check for execution of untrusted input

(S603)


244-244: Do not catch blind exception: Exception

(BLE001)


245-247: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


270-270: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


287-287: subprocess call: check for execution of untrusted input

(S603)


295-297: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


307-307: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


308-308: Do not catch blind exception: Exception

(BLE001)


321-321: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

server/mem_agent/utils.py

70-70: Do not catch blind exception: Exception

(BLE001)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)


130-130: Consider moving this statement to an else block

(TRY300)


131-131: Do not use bare except

(E722)


131-133: try-except-pass detected, consider logging the exception

(S110)


144-144: Consider moving this statement to an else block

(TRY300)


146-146: Do not catch blind exception: Exception

(BLE001)


146-146: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

server/mlx_runner.py

72-72: Consider moving this statement to an else block

(TRY300)


110-110: Avoid specifying long messages outside the exception class

(TRY003)


115-115: Consider moving this statement to an else block

(TRY300)


142-143: try-except-pass detected, consider logging the exception

(S110)


142-142: Do not catch blind exception: Exception

(BLE001)


180-180: Avoid specifying long messages outside the exception class

(TRY003)


221-221: Possible hardcoded password assigned to: "token_content"

(S105)


235-235: Possible hardcoded password assigned to: "token_content"

(S105)


300-300: Do not use bare except

(E722)


300-301: try-except-pass detected, consider logging the exception

(S110)


348-349: try-except-pass detected, consider logging the exception

(S110)


348-348: Do not catch blind exception: Exception

(BLE001)


421-421: Avoid specifying long messages outside the exception class

(TRY003)


611-611: Avoid specifying long messages outside the exception class

(TRY003)


761-761: Do not catch blind exception: Exception

(BLE001)


787-787: Consider moving this statement to an else block

(TRY300)


788-788: Do not catch blind exception: Exception

(BLE001)


829-829: Do not catch blind exception: Exception

(BLE001)


854-854: Unpacked variable before_reasoning is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


876-878: try-except-pass detected, consider logging the exception

(S110)


876-876: Do not catch blind exception: Exception

(BLE001)


1047-1047: Do not catch blind exception: Exception

(BLE001)

server/mem_agent/tools.py

26-26: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


40-40: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


49-49: Avoid specifying long messages outside the exception class

(TRY003)


88-88: Abstract raise to an inner function

(TRY301)


88-88: Create your own exception

(TRY002)


88-88: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Do not catch blind exception: Exception

(BLE001)


94-94: Do not catch blind exception: Exception

(BLE001)


95-95: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


95-95: Create your own exception

(TRY002)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


96-96: Create your own exception

(TRY002)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Consider moving this statement to an else block

(TRY300)


111-111: Do not catch blind exception: Exception

(BLE001)


178-178: Consider moving this statement to an else block

(TRY300)


182-182: Do not catch blind exception: Exception

(BLE001)


183-183: Use explicit conversion flag

Replace with conversion flag

(RUF010)


207-207: Do not catch blind exception: Exception

(BLE001)


230-230: Unused function argument: is_last

(ARG001)


277-277: Do not catch blind exception: Exception

(BLE001)


292-292: Consider moving this statement to an else block

(TRY300)


293-293: Do not catch blind exception: Exception

(BLE001)


328-328: Do not catch blind exception: Exception

(BLE001)

server/cache_utils.py

155-155: Unpacked variable found_model_dir is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


249-249: Unpacked variable pipeline is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


252-253: try-except-pass detected, consider logging the exception

(S110)


252-252: Do not catch blind exception: Exception

(BLE001)


266-266: Do not catch blind exception: Exception

(BLE001)


283-284: try-except-pass detected, consider logging the exception

(S110)


283-283: Do not catch blind exception: Exception

(BLE001)


294-295: try-except-pass detected, consider logging the exception

(S110)


294-294: Do not catch blind exception: Exception

(BLE001)


315-315: Do not catch blind exception: Exception

(BLE001)


376-376: Do not catch blind exception: Exception

(BLE001)


389-389: Do not catch blind exception: Exception

(BLE001)


458-460: try-except-continue detected, consider logging the exception

(S112)


458-458: Do not catch blind exception: Exception

(BLE001)


525-525: Do not catch blind exception: Exception

(BLE001)


572-572: Do not catch blind exception: Exception

(BLE001)


745-745: Starting a process with a shell, possible injection detected

(S605)


971-971: Do not catch blind exception: Exception

(BLE001)


1035-1035: Do not catch blind exception: Exception

(BLE001)


1054-1054: Do not catch blind exception: Exception

(BLE001)


1091-1091: Do not catch blind exception: Exception

(BLE001)

server/api.py

110-110: Unused method argument: memory_path

(ARG002)


110-110: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


112-112: Unused method argument: model

(ARG002)


112-112: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: predetermined_memory_path

(ARG002)


114-114: Unused method argument: model_cache

(ARG002)


114-114: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


115-115: Unused method argument: current_model_path

(ARG002)


116-116: Unused method argument: default_max_tokens

(ARG002)


141-141: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


143-143: Abstract raise to an inner function

(TRY301)


144-144: Do not catch blind exception: Exception

(BLE001)


145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


145-145: Use explicit conversion flag

Replace with conversion flag

(RUF010)


161-162: try-except-pass detected, consider logging the exception

(S110)


161-161: Do not catch blind exception: Exception

(BLE001)


203-203: Consider moving this statement to an else block

(TRY300)


204-204: Do not catch blind exception: Exception

(BLE001)


205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


274-274: Local variable thoughts is assigned to but never used

Remove assignment to unused variable thoughts

(F841)


311-311: Do not catch blind exception: Exception

(BLE001)


312-312: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (5)
server/__init__.py (1)

1-1: File structure and package initialization look good.

An empty __init__.py correctly marks this directory as a Python package. Consider whether public APIs from submodules (e.g., from api.py, config.py) should be re-exported here for improved usability, depending on your module design intent. For now, this approach is reasonable and allows users to import explicitly.
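If re-exports are ever wanted here, a hedged sketch is below; beyond MEMORY_PATH and the FastAPI app object referenced elsewhere in this review, the choice of public symbols is an assumption:

# server/__init__.py -- illustrative only; keeping it empty is also fine.
from .api import app             # FastAPI instance defined in server/api.py
from .config import MEMORY_PATH  # default memory location from server/config.py

__all__ = ["app", "MEMORY_PATH"]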

src/main.rs (1)

22-29: Async entrypoint looks good.

Tokio main + awaiting Run is correct and keeps the original flow intact.

b.modelfile (1)

1-1: No issue found—intentional separation of execution paths.

The repository correctly uses two different modelfiles for two different execution strategies: memgpt.modelfile (with the mem-agent prefix) routes to server execution, while b.modelfile (with a different model) routes to subprocess execution. The gating logic at line 11 of src/runner/mlx.rs is functioning as designed, not misconfigured.

Likely an incorrect or invalid review comment.

src/runner/mlx.rs (1)

9-16: No issues found with mem-agent gating or FROM handling.

The code is safe by design: modelfile.build()? is called in create_modelfile before returning Ok, and build() returns Err if from.is_none(). This guarantee ensures that when mlx::run() receives a Modelfile, from is always Some, making all three unwraps safe.

Verification confirms:

  • mem-agent gating limited to one location (src/runner/mlx.rs:11)
  • No other hardcoded model name checks found
  • Missing FROM fields already validated before run() is invoked
server/reasoning_utils.py (1)

260-414: Verify streaming parser state machine for edge cases.

The StreamingReasoningParser.process_token method has complex state machine logic with buffering, pattern matching, and conditional token skipping. The logic for detecting partial pattern matches (lines 296-317) and conditional skip tokens (lines 392-400) could have edge cases that cause tokens to be incorrectly buffered or dropped.

Consider testing scenarios:

  • Reasoning markers split across multiple tokens
  • Skip tokens appearing in the middle of content (not just at boundaries)
  • Very short tokens that match prefixes of multiple patterns
  • Nested or malformed reasoning/final sections

The state machine should handle these gracefully without losing tokens or entering invalid states.
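A hedged sketch of one such test is below. It assumes process_token(token) returns displayable text (possibly empty) and finalize() flushes any buffered remainder; the constructor argument and the <think> markers are guesses and would need to be adjusted to the real API:

import pytest

from server.reasoning_utils import StreamingReasoningParser

@pytest.mark.parametrize(
    "chunks",
    [
        ["<think>", "some reasoning", "</think>", "final answer"],
        ["<thi", "nk>reason", "ing</th", "ink>", "final answer"],  # marker split across tokens
    ],
)
def test_no_visible_tokens_lost_when_markers_split(chunks):
    parser = StreamingReasoningParser(model_type="qwen")  # model_type value is a guess
    emitted = []
    for chunk in chunks:
        out = parser.process_token(chunk)
        if out:
            emitted.append(out)
    tail = parser.finalize()
    if tail:
        emitted.append(tail)
    visible = "".join(str(part) for part in emitted)
    # The final answer must survive, and no marker fragments should leak through.
    assert "final answer" in visible
    assert "<think" not in visible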

Comment on lines 47 to 49
_runner: MLXRunner = {}
_max_tool_turns = 5


⚠️ Potential issue | 🟠 Major

Fix invalid globals: _runner type/init.

Initialize to None with Optional type, not {}.

-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
 _max_tool_turns = 5
🤖 Prompt for AI Agents
In server/api.py around lines 47 to 49, the global _runner is incorrectly
typed/initialized as "{}"; change its declaration to use an Optional type and
initialize it to None (e.g., _runner: Optional[MLXRunner] = None), and ensure
Optional is imported from typing (and MLXRunner is in scope or imported) so the
global reflects a nullable runner rather than an empty dict.

Comment on lines +131 to +133
agent: Agent()

def get_or_load_model(model_spec: str, verbose: bool = False) -> MLXRunner:

⚠️ Potential issue | 🔴 Critical

This line executes Agent() at import time and leaves ‘agent’ unbound.

Change to a proper variable or remove.

-agent: Agent()
+# Optional: instantiate later if needed
+agent: Optional[Agent] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-agent: Agent()
+# Optional: instantiate later if needed
+agent: Optional[Agent] = None
 def get_or_load_model(model_spec: str, verbose: bool = False) -> MLXRunner:
🤖 Prompt for AI Agents
In server/api.py around lines 131-133, the line "agent: Agent()" is
invalid/undesirable because it invokes Agent() at import time and doesn't
properly bind the variable; replace it with either a proper type annotation and
lazy init like "agent: Optional[Agent] = None" and instantiate Agent() inside
the function that needs it (e.g., get_or_load_model) or if a module-level
singleton is intended, change it to an explicit assignment "agent = Agent()".
Ensure imports for Optional are added if using the first option.

Comment on lines +141 to +151
        model_path, model_name, commit_hash = get_model_path(model_spec)
        if not model_path.exists():
            raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
    except Exception as e:
        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")

    # Check if it's an MLX model

    model_path_str = str(model_path)

    print(_current_model_path)

⚠️ Potential issue | 🟠 Major

Robust model path existence check and remove prints.

Avoid AttributeError when get_model_path returns None; drop noisy prints.

-    print(model_spec)
@@
-        model_path, model_name, commit_hash = get_model_path(model_spec)
-        if not model_path.exists():
+        model_path, model_name, commit_hash = get_model_path(model_spec)
+        if not model_path or not Path(model_path).exists():
             raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
@@
-    print(_current_model_path)
-    print(model_path_str)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.1)

141-141: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


143-143: Abstract raise to an inner function

(TRY301)


144-144: Do not catch blind exception: Exception

(BLE001)


145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


145-145: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In server/api.py around lines 141 to 151, get_model_path may return None causing
an AttributeError when calling model_path.exists(), and there is a stray
print(_current_model_path); change the logic to first verify get_model_path
returned a non-None tuple and that model_path is not None before calling
.exists(), raise a 404 with a clear message if model_path is None or does not
exist, catch only expected exceptions (or re-raise unexpected ones) rather than
blanket-catching Exception for control flow, and remove the print statement
(replace with appropriate logging if needed).

Comment on lines 190 to 206
@app.get("/ping")
async def ping():
return {"message": "Badda-Bing Badda-Bang"}

@app.post("/start")
async def start_model(request: StartRequest):
"""Load the model and start the agent"""
global _messages, _runner
print(str(request))
_messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]

try:
_runner = get_or_load_model(request.model)
return {"message": "Model loaded"}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))


⚠️ Potential issue | 🟠 Major

/start resets global _messages; drop shared state and PII prints.

Globals will mix conversations across clients and leak content. Maintain history per request instead.

-    global _messages, _runner
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner
     try:
         _runner = get_or_load_model(request.model)
         return {"message": "Model loaded"}
🧰 Tools
🪛 Ruff (0.14.1)

203-203: Consider moving this statement to an else block

(TRY300)


204-204: Do not catch blind exception: Exception

(BLE001)


205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In server/api.py around lines 190 to 206, the start_model endpoint currently
prints the incoming request (potential PII) and resets the global _messages
which can mix conversations across clients; remove the print(str(request)) call
and do not mutate or reset any global conversation state in this handler.
Instead, keep model loading separate (e.g., set _runner =
get_or_load_model(request.model) only) and move conversation/history management
to per-request or per-session storage (pass history in request, store keyed by
session id, or use a request-scoped dependency) so each client gets isolated
history; ensure the endpoint returns model load status without touching or
clearing _messages and continue to surface errors via HTTPException as before.
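A minimal, self-contained sketch of the session-keyed option (the session_id field, the defaulting behaviour, and the placeholder prompt are assumptions; the real endpoint would hand the history to the runner instead of echoing its length):

from collections import defaultdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
SYSTEM_PROMPT = "You are a memory agent."  # placeholder, not the PR's real prompt

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    session_id: str
    messages: list[ChatMessage]

# One history per session id instead of one module-level _messages list.
_sessions: dict[str, list[ChatMessage]] = defaultdict(
    lambda: [ChatMessage(role="system", content=SYSTEM_PROMPT)]
)

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatRequest) -> dict:
    history = _sessions[request.session_id]
    history.extend(request.messages)
    # Hand `history` to the runner here; the sketch just reports its length.
    return {"history_length": len(history)}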

Comment on lines 207 to 241
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest):
"""Create a chat completion."""
global _messages, _max_tool_turns
try:
runner = get_or_load_model(request.model)

# if request.stream:
# # Streaming response
# return StreamingResponse(
# generate_chat_stream(runner, request.messages, request),
# media_type="text/plain",
# headers={"Cache-Control": "no-cache"}
# )
# else:
# Non-streaming response
completion_id = f"chatcmpl-{uuid.uuid4()}"
created = int(time.time())

# Convert messages to dict format for runner
# _messages.append(system_message)
_messages.extend(request.messages)
message_dicts = format_chat_messages_for_runner(_messages)
# Let the runner format with chat templates
prompt = runner._format_conversation(message_dicts, use_chat_template=True)

generated_text = runner.generate_batch(
prompt=prompt,
max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
temperature=request.temperature,
top_p=request.top_p,
repetition_penalty=request.repetition_penalty,
use_chat_template=False # Already applied in _format_conversation
)


⚠️ Potential issue | 🟠 Major

Don’t block the event loop; offload generation to threadpool.

MLX generation is CPU-bound; use run_in_threadpool.

-from fastapi.responses import StreamingResponse
+from fastapi.responses import StreamingResponse
+from starlette.concurrency import run_in_threadpool
@@
-        completion_id = f"chatcmpl-{uuid.uuid4()}"
+        completion_id = f"chatcmpl-{uuid.uuid4()}"
         created = int(time.time())
@@
-        _messages.extend(request.messages)
-        message_dicts = format_chat_messages_for_runner(_messages)
+        # Build per-request message history (avoid shared globals)
+        messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+        message_dicts = format_chat_messages_for_runner(messages)
@@
-        generated_text = runner.generate_batch(
-            prompt=prompt,
-            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-            temperature=request.temperature,
-            top_p=request.top_p,
-            repetition_penalty=request.repetition_penalty,
-            use_chat_template=False  # Already applied in _format_conversation
-        )
+        generated_text = await run_in_threadpool(
+            runner.generate_batch,
+            prompt,
+            runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+            request.temperature,
+            request.top_p,
+            request.repetition_penalty,
+            False,  # use_chat_template already applied
+            False,  # interactive
+        )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In server/api.py around lines 207 to 241, the code calls runner.generate_batch
(and potentially runner._format_conversation) directly which blocks the asyncio
event loop for CPU-bound ML generation; fix it by importing and using
FastAPI/Starlette's run_in_threadpool to offload the blocking work: move any
CPU/blocking preparation (formatting) into a sync function if needed, then await
run_in_threadpool(...) around the blocking generate_batch call (and formatting
if heavy) so generation runs on a worker thread and the event loop remains
responsive; ensure the call returns the generated_text to use in the response.

Comment on lines +11 to 15
 pub async fn run(modelfile: &str) {
     match modelfile::parse_from_file(modelfile) {
         Ok(modelfile) => {
-            mlx::run(modelfile);
+            mlx::run(modelfile).await;
         }

⚠️ Potential issue | 🔴 Critical

Validate Modelfile before running to avoid panic.

Runner unwraps FROM; call build() (or check FROM) here and fail fast with a clear message.

-pub async fn run(modelfile: &str) {
-    match modelfile::parse_from_file(modelfile) {
-        Ok(modelfile) => {
-            mlx::run(modelfile).await;
+pub async fn run(modelfile: &str) {
+    match modelfile::parse_from_file(modelfile) {
+        Ok(mut modelfile) => {
+            if let Err(e) = modelfile.build() {
+                eprintln!("{}", e);
+                return;
+            }
+            mlx::run(modelfile).await;
         }
         Err(err) => println!("{}", err),
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/commands/mod.rs around lines 11-15, after successfully parsing the
modelfile you must validate/build it before passing to mlx::run to avoid a panic
when the runner later unwraps FROM; call the modelfile.build() (or the
appropriate validate/build method) and handle its Result, returning or logging a
clear error message and aborting if build fails, otherwise pass the
built/validated modelfile into mlx::run; ensure error paths produce a
user-friendly message that explains the invalid FROM.

Comment on lines +9 to +16
pub async fn run(modelfile: Modelfile) {
    let model = modelfile.from.as_ref().unwrap();
    if model.starts_with("driaforall/mem-agent") {
        let _res = run_model_with_server(modelfile).await;
    } else {
        run_model_by_sub_process(modelfile);
    }
}

⚠️ Potential issue | 🔴 Critical

Guard against missing FROM and relax mem‑agent detection.

Avoid panics and make the mem‑agent switch less brittle.

-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(mut modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_ref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Error: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.contains("mem-agent")
+}
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 9 to 16, avoid unwrapping modelfile.from
(which can panic) and make the mem-agent check more flexible: replace the unwrap
with an if let / match that handles None by logging or returning early, obtain a
&str via as_deref(), normalize to lowercase, and use contains("mem-agent") (or a
more specific substring if desired) instead of the strict starts_with; then call
run_model_with_server(modelfile).await when the match hits and handle/log its
result, otherwise call run_model_by_sub_process(modelfile) as before.

Comment on lines 73 to 103
async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
    // println!("gonna ping");
    // let _ = ping().await;
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    // loading the model from mem-agent via daeomn server
    let modelname = modelfile.from.as_ref().unwrap();
    load_model(&modelname).await.unwrap();
    println!("Running in interactive mode");
    loop {
        print!(">> ");
        stdout.flush().unwrap();
        let mut input = String::new();
        stdin.read_line(&mut input).unwrap();
        let input = input.trim();
        match input {
            "exit" => {
                println!("Exiting interactive mode");
                break;
            }
            _ => {
                if let Ok(response) = chat(input, &modelname).await {
                    println!(">> {}", response)
                } else {
                    println!(">> failed to respond")
                }
            }
        }
    }
    Ok(())
}

⚠️ Potential issue | 🟠 Major

Don’t unwrap in the interactive server path; handle failures gracefully.

Unwraps can panic if the server is down or returns non‑JSON. Propagate errors and keep the CLI alive.

-    let stdin = io::stdin();
-    let mut stdout = io::stdout();
-    // loading the model from mem-agent via daeomn server
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(&modelname).await.unwrap();
+    let stdin = io::stdin();
+    let mut stdout = io::stdout();
+    // load the model via daemon server
+    let modelname = modelfile.from.as_ref().unwrap().to_owned();
+    if let Err(e) = load_model(&modelname).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
@@
-                if let Ok(response) = chat(input, &modelname).await {
+                if let Ok(response) = chat(input, &modelname).await {
                     println!(">> {}", response)
                 } else {
                     println!(">> failed to respond")
                 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
     // println!("gonna ping");
     // let _ = ping().await;
     let stdin = io::stdin();
     let mut stdout = io::stdout();
-    // loading the model from mem-agent via daeomn server
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(&modelname).await.unwrap();
+    // load the model via daemon server
+    let modelname = modelfile.from.as_ref().unwrap().to_owned();
+    if let Err(e) = load_model(&modelname).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
     println!("Running in interactive mode");
     loop {
         print!(">> ");
         stdout.flush().unwrap();
         let mut input = String::new();
         stdin.read_line(&mut input).unwrap();
         let input = input.trim();
         match input {
             "exit" => {
                 println!("Exiting interactive mode");
                 break;
             }
             _ => {
                 if let Ok(response) = chat(input, &modelname).await {
                     println!(">> {}", response)
                 } else {
                     println!(">> failed to respond")
                 }
             }
         }
     }
     Ok(())
 }
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 73 to 103, several unwrap() calls in the
interactive server path (loading the model, flushing stdout, reading stdin, and
awaiting chat) can panic if the daemon/server is down or returns bad data;
replace those unwraps with proper error handling: handle load_model errors by
returning or logging a clear error and keep the CLI running (do not panic),
check and handle stdout.flush() and stdin.read_line() errors with a warning and
continue the loop, and handle chat(...) errors by logging the error details and
printing a friendly failure message while continuing the interactive loop;
ensure the function still returns a Result and propagate non-recoverable errors
upward but avoid panics inside the interactive loop.

Comment on lines 112 to 129
async fn load_model(model_name: &str) -> Result<(), String> {
    let client = Client::new();
    let body = json!({
        "model": model_name
    });
    let res = client
        .post("http://127.0.0.1:6969/start")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        Ok(())
    } else {
        Err(String::from("request failed"))
    }
}

🛠️ Refactor suggestion | 🟠 Major

Return proper HTTP errors; add timeout and is_success.

Network failures should not panic. Include status in errors.

 async fn load_model(model_name: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name
     });
-    let res = client
-        .post("http://127.0.0.1:6969/start")
+    let res = client
+        .post(format!("{}/start", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
-    // println!("{:?}", res);
-    if res.status() == 200 {
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 112-129, replace the unwrap and simplistic
status check with proper error handling and a request timeout: construct the
reqwest Client with a reasonable timeout, await the send() call without unwrap
and match on its Result to convert network errors into Err(String) (include the
error message), then check res.status().is_success() and on failure return Err
including the numeric status and optionally the response text; ensure no panics
and that timeouts/network failures become descriptive Err(String) returns
instead of unwrapping.

Comment on lines +131 to +154
async fn chat(input: &str, model_name: &str) -> Result<String, String> {
    let client = Client::new();
    let body = json!({
        "model": model_name,
        "messages": [{"role": "user", "content": input}]
    });
    let res = client
        .post("http://127.0.0.1:6969/v1/chat/completions")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        let text = res.text().await.unwrap();
        let v: Value = serde_json::from_str(&text).unwrap();
        let content = v["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("<no content>");
        Ok(content.to_owned())
    } else {
        Err(String::from("request failed"))
    }
}

⚠️ Potential issue | 🟠 Major

Avoid unwraps; parse response safely and surface non‑200s.

Also add a longer timeout for generation requests.

 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
-    let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+    let res = client
+        .post(format!("{}/v1/chat/completions", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
         let content = v["choices"][0]["message"]["content"]
             .as_str()
             .unwrap_or("<no content>");
         Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 131 to 154, replace the current chat
function's unwraps and silent failure on non-200 by building the reqwest Client
with a longer timeout, using awaited results with ?/map_err (or match) to
propagate errors instead of unwrap, checking response.status().is_success() and
on non-success returning an Err that includes the status code and response body,
and parsing JSON with serde_json::from_str handling its Result so missing fields
produce a clear Err; finally extract the content by safely indexing with
get(...) and as_str().map(|s| s.to_owned()) or return an Err if absent—this
ensures timeouts are extended and all network/parse errors are surfaced instead
of panicking.

While loading the model we create the necessary files that track
the memory_path, and then pass the memory_path to the py server
in the /start API, instead of the py server handling the creation
of the memory_path. This allows us to configure the memory_path
later via the CLI.
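A hypothetical illustration of the resulting /start payload, with field names taken from the walkthrough's "POST /start { model, memory_path }"; the path value is only an example and the concrete schema lives in server/api.py:

import requests

requests.post(
    "http://127.0.0.1:6969/start",
    json={
        "model": "driaforall/mem-agent",
        "memory_path": "/Users/me/tiles_memory",  # created by the CLI beforehand
    },
    timeout=60,
)
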
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (9)
src/runner/mlx.rs (4)

11-18: Guard against missing FROM and relax mem‑agent detection.

The unwrap() at line 12 will panic if modelfile.from is None. The exact prefix match at line 13 is also brittle and will fail if the model name changes slightly.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


75-106: Don't unwrap in the interactive server path; handle failures gracefully.

Multiple unwrap() calls at lines 81, 82, 83, 87, and 89 can panic if the server is down, the modelfile is missing FROM, or IO operations fail. This will crash the CLI instead of allowing graceful recovery.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


115-132: Return proper HTTP errors; add timeout and use is_success().

Line 126 unwraps the network request which can panic. Line 127 uses exact status comparison instead of is_success(). No timeout is configured, so requests can hang indefinitely.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


134-157: Avoid unwraps; parse response safely and surface non‑200s.

Multiple unwrap() calls at lines 145, 148, and 149 can panic on network failures or malformed JSON. Line 147 uses exact status comparison. No timeout is configured for what could be a long-running generation request.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.

server/api.py (5)

46-46: Critical: Fix invalid type/initialization for _runner.

This issue was flagged in previous reviews but remains unresolved. The global _runner is typed as MLXRunner but initialized to {} (an empty dict), which will cause type errors.

Apply this diff:

-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None

132-132: Critical: Fix invalid agent instantiation syntax.

This issue was flagged in previous reviews. Line 132 agent: Agent() is invalid—it invokes Agent() at import time without properly binding the variable.

Apply this diff:

-agent: Agent()
+agent: Optional[Agent] = None

Then instantiate the agent inside the function that needs it (e.g., within start_model or as a module-level singleton agent = Agent()).


134-177: Fix model path validation and remove debug prints.

Several issues remain from previous reviews:

  1. Lines 138, 152-153: Debug print statements should be removed (potential PII).
  2. Line 143: model_path.exists() can raise AttributeError if get_model_path returns None for model_path (per cache_utils.py line 186, it can return (None, model_name, commit_hash)).
  3. Line 142: commit_hash is unpacked but never used.
  4. Lines 145-146: Broad exception handling without proper chaining.

Apply this diff:

-    print(model_spec)
     # Use the existing model path resolution from cache_utils
 
     try:
-        model_path, model_name, commit_hash = get_model_path(model_spec)
-        if not model_path.exists():
+        model_path, model_name, _commit_hash = get_model_path(model_spec)
+        if not model_path or not Path(model_path).exists():
             raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
     except Exception as e:
-        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")
+        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}") from e
 
     # Check if it's an MLX model
 
     model_path_str = str(model_path)
 
-    print(_current_model_path)
-    print(model_path_str)
     # Check if we need to load a different model

Also add the Path import at the top:

from pathlib import Path

195-207: Remove PII prints and avoid global conversation state.

This endpoint has issues flagged in previous reviews:

  1. Line 199: print(str(request)) leaks PII and should be removed.
  2. Line 200: Resetting global _messages will cause conversations from concurrent requests to interfere with each other.

Apply this diff:

-    global _messages, _runner,_memory_path
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner, _memory_path
     _memory_path = request.memory_path
     try:
         _runner = get_or_load_model(request.model)

Note: Conversation history should be managed per-request or per-session (e.g., passed in the request body or stored with a session ID) rather than in global state.


208-313: Critical: Fix blocking calls, global state, and PII leaks.

This endpoint has multiple unresolved issues from previous reviews:

  1. Lines 234-241, 270-272: runner.generate_batch() is CPU-bound and blocks the asyncio event loop. This will make the server unresponsive during generation.

  2. Lines 229, 266, 279: Mutating global _messages causes conversations from concurrent clients to interfere with each other.

  3. Lines 251, 261-262, 273: Debug print statements leak PII (generated text, results).

  4. Line 275: thoughts is assigned but never used.

  5. Line 301: No fallback when reply is empty, resulting in blank content.

Apply this comprehensive diff:

+from starlette.concurrency import run_in_threadpool
+
 @app.post("/v1/chat/completions")
 async def create_chat_completion(request: ChatCompletionRequest):
     """Create a chat completion."""
-    global _messages, _max_tool_turns, _memory_path
+    global _max_tool_turns, _memory_path
     try:
         runner = get_or_load_model(request.model)
 
         # Non-streaming response
         completion_id = f"chatcmpl-{uuid.uuid4()}"
         created = int(time.time())
 
-        # Convert messages to dict format for runner
-        # _messages.append(system_message)
-        _messages.extend(request.messages)
-        message_dicts = format_chat_messages_for_runner(_messages)
+        # Build per-request message history (avoid shared globals)
+        messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+        message_dicts = format_chat_messages_for_runner(messages)
         # Let the runner format with chat templates
         prompt = runner._format_conversation(message_dicts, use_chat_template=True)
 
-        generated_text = runner.generate_batch(
+        generated_text = await run_in_threadpool(
+            runner.generate_batch,
             prompt=prompt,
-            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-            temperature=request.temperature,
-            top_p=request.top_p,
-            repetition_penalty=request.repetition_penalty,
-            use_chat_template=False  # Already applied in _format_conversation
+            runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+            request.temperature,
+            request.top_p,
+            request.repetition_penalty,
+            20,  # repetition_context_size
+            False,  # use_chat_template
+            False,  # interactive
         )
 
-        thoughts = extract_thoughts(generated_text)
         reply = extract_reply(generated_text)
         python_code = extract_python_code(generated_text)
-        print(generated_text)
         result = ({}, "")
         if python_code:
             create_memory_if_not_exists()
@@ -260,26 +252,25 @@
                 import_module="server.mem_agent.tools",
             )
 
-        print(reply)
-        print(str(result))        
 
         remaining_tool_turns = _max_tool_turns
         while remaining_tool_turns > 0 and not reply:
-            _messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
-            message_dicts = format_chat_messages_for_runner(_messages)
+            messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
+            message_dicts = format_chat_messages_for_runner(messages)
             # Let the runner format with chat templates
             prompt = runner._format_conversation(message_dicts, use_chat_template=True)
-            generated_text = runner.generate_batch(
-                prompt=prompt
+            generated_text = await run_in_threadpool(
+                runner.generate_batch,
+                prompt,
+                None, 0.7, 0.9, 1.1, 20, False, False
             )
-            print(generated_text)
             # Extract the thoughts, reply and python code from the response
-            thoughts = extract_thoughts(generated_text)
             reply = extract_reply(generated_text)
             python_code = extract_python_code(generated_text)
 
-            _messages.append(ChatMessage(role="assistant", content=generated_text))
+            messages.append(ChatMessage(role="assistant", content=generated_text))
             if python_code:
                 create_memory_if_not_exists()
                 result = execute_sandboxed_code(
@@ -298,7 +289,7 @@
                 {
                     "index": 0,
                     "message": {
                         "role": "assistant",
-                        "content": reply
+                        "content": reply or generated_text.strip()
                     },
                     "finish_reason": "stop"
                 }
🧹 Nitpick comments (1)
src/runner/mlx.rs (1)

108-113: Remove or document the unused ping function.

The ping function is not called anywhere in the codebase. If it's intended for future use or debugging, consider marking it with #[allow(dead_code)] and adding a doc comment explaining its purpose. Otherwise, remove it to reduce maintenance burden.
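
A minimal sketch of the "keep but document" option, assuming the daemon exposes a simple health route (the /ping path and the Result<(), String> signature here are illustrative, not taken from the actual helper):

/// Health check against the mem-agent daemon; currently unused,
/// kept for manual debugging of the server connection.
#[allow(dead_code)]
async fn ping() -> Result<(), String> {
    // reqwest::get is the convenience wrapper; a shared Client with a
    // timeout would also work here.
    let res = reqwest::get("http://127.0.0.1:6969/ping")
        .await
        .map_err(|e| e.to_string())?;
    if res.status().is_success() {
        Ok(())
    } else {
        Err(format!("ping failed: {}", res.status()))
    }
}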

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e17f41 and 01ea71e.

📒 Files selected for processing (2)
  • server/api.py (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • src/runner/mlx.rs
🧬 Code graph analysis (2)
src/runner/mlx.rs (3)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (4)
  • new (88-90)
  • new (107-119)
  • from_str (64-71)
  • from_str (235-237)
server/api.py (1)
  • ping (192-193)
server/api.py (5)
server/cache_utils.py (1)
  • get_model_path (182-201)
server/mlx_runner.py (5)
  • cleanup (321-357)
  • load_model (128-180)
  • _format_conversation (765-794)
  • generate_batch (584-689)
  • get_effective_max_tokens (359-389)
server/mem_agent/utils.py (5)
  • extract_python_code (152-172)
  • extract_reply (175-182)
  • extract_thoughts (185-192)
  • create_memory_if_not_exists (57-71)
  • format_results (195-203)
server/mem_agent/engine.py (1)
  • execute_sandboxed_code (200-314)
src/runner/mlx.rs (2)
  • load_model (115-132)
  • ping (108-113)
🪛 Ruff (0.14.1)
server/api.py

111-111: Unused method argument: memory_path

(ARG002)


111-111: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: model

(ARG002)


113-113: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


114-114: Unused method argument: predetermined_memory_path

(ARG002)


115-115: Unused method argument: model_cache

(ARG002)


115-115: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


116-116: Unused method argument: current_model_path

(ARG002)


117-117: Unused method argument: default_max_tokens

(ARG002)


142-142: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


144-144: Abstract raise to an inner function

(TRY301)


145-145: Do not catch blind exception: Exception

(BLE001)


146-146: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


146-146: Use explicit conversion flag

Replace with conversion flag

(RUF010)


162-163: try-except-pass detected, consider logging the exception

(S110)


162-162: Do not catch blind exception: Exception

(BLE001)


204-204: Consider moving this statement to an else block

(TRY300)


205-205: Do not catch blind exception: Exception

(BLE001)


206-206: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


275-275: Local variable thoughts is assigned to but never used

Remove assignment to unused variable thoughts

(F841)


312-312: Do not catch blind exception: Exception

(BLE001)


313-313: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +107 to +129
class Agent:
    def __init__(
        self,
        max_tool_turns: int = 20,
        memory_path: str = None,
        use_vllm: bool = False,
        model: str = None,
        predetermined_memory_path: bool = False,
        model_cache: Dict[str, MLXRunner] = {},
        current_model_path: Optional[str] = None,
        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default

    ):
        # Load the system prompt and add it to the conversation history
        self.system_prompt = SYSTEM_PROMPT
        self.messages: list[ChatMessage] = [
            ChatMessage(role="system", content=self.system_prompt)
        ]

        # Set the maximum number of tool turns and use_vllm flag
        self.max_tool_turns = max_tool_turns
        self.use_vllm = use_vllm

🛠️ Refactor suggestion | 🟠 Major

Remove unused Agent parameters or complete implementation.

The Agent class declares several unused parameters: memory_path, model, predetermined_memory_path, model_cache, current_model_path, and default_max_tokens. Additionally, line 115 uses a mutable default argument which is an anti-pattern in Python.

Apply this diff to clean up the unused parameters and fix the mutable default:

 class Agent:
     def __init__(
         self,
         max_tool_turns: int = 20,
-        memory_path: str = None,
         use_vllm: bool = False,
-        model: str = None,
-        predetermined_memory_path: bool = False,
-        model_cache: Dict[str, MLXRunner] = {},
-        current_model_path: Optional[str] = None,
-        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default
-
     ):
         # Load the system prompt and add it to the conversation history
         self.system_prompt = SYSTEM_PROMPT
         self.messages: list[ChatMessage] = [
             ChatMessage(role="system", content=self.system_prompt)
         ]
 
         # Set the maximum number of tool turns and use_vllm flag
         self.max_tool_turns = max_tool_turns
         self.use_vllm = use_vllm

If these parameters are intended for future use, implement them in the body; otherwise, remove them to avoid confusion.

🧰 Tools
🪛 Ruff (0.14.1)

111-111: Unused method argument: memory_path

(ARG002)


111-111: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: model

(ARG002)


113-113: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


114-114: Unused method argument: predetermined_memory_path

(ARG002)


115-115: Unused method argument: model_cache

(ARG002)


115-115: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


116-116: Unused method argument: current_model_path

(ARG002)


117-117: Unused method argument: default_max_tokens

(ARG002)

🤖 Prompt for AI Agents
In server/api.py around lines 107 to 129, the Agent __init__ declares unused
parameters (memory_path, model, predetermined_memory_path, model_cache,
current_model_path, default_max_tokens) and uses a mutable default for
model_cache; either remove the unused parameters or wire them to instance state,
and fix the mutable default by changing model_cache: Optional[Dict[str,
MLXRunner]] = None and then inside the body set self.model_cache = model_cache
or {}. If you keep parameters for future use, assign them to self (e.g.,
self.memory_path = memory_path, self.model = model, etc.) and ensure
default_max_tokens is stored or documented; otherwise drop them from the
signature to avoid confusion.

    // build the arg list from modelfile
    let mut args: Vec<String> = vec![];
    args.push("--model".to_owned());
    args.push(modelfile.from.unwrap());

⚠️ Potential issue | 🔴 Critical

Don't unwrap modelfile.from; handle the missing FROM case.

Line 24 will panic if modelfile.from is None, which is inconsistent with the good error handling elsewhere in this function.

Apply this diff to handle the missing FROM case gracefully:

-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 24, don't unwrap modelfile.from; instead handle
the None case gracefully: check whether modelfile.from is Some and push its
value into args, or if None return/propagate a proper error (with a clear
message consistent with other errors in this function) rather than panicking;
use pattern matching (if let/ match) or map_err to convert the missing FROM into
the function's error type and include context about the missing FROM field.

}

fn get_memory_path() -> Result<String> {
    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;

⚠️ Potential issue | 🟠 Major

Replace deprecated env::home_dir() with the dirs crate.

std::env::home_dir() has been deprecated since Rust 1.29. It can produce incorrect results on Windows and has platform-specific issues. Use the dirs crate (specifically dirs::home_dir()) or the home crate instead.

First, add the dependency to Cargo.toml:

dirs = "5.0"

Then apply this diff:

 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {

The existing use std::{env, fs}; import can stay as written, since env::var("XDG_CONFIG_HOME") is still used just below; the only addition is the new dirs dependency.
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 160, replace the deprecated
std::env::home_dir() call with dirs::home_dir(): add the dirs = "5.0" dependency
to Cargo.toml, update the imports to use dirs::home_dir (and remove or stop
using std::env::home_dir), call dirs::home_dir().context("Failed to fetch
$HOME")? in place of the old call, and adjust any use of the returned
PathBuf/Option handling if necessary to preserve the existing error path.

        fs::create_dir_all(&tiles_config_dir).context("Failed to create tiles config directory")?;
        fs::write(
            tiles_config_dir.join(".memory_path"),
            memory_path.to_str().unwrap(),

⚠️ Potential issue | 🟠 Major

Don't unwrap to_str(); handle non‑UTF-8 paths.

Line 190 will panic if the memory path contains invalid UTF-8 characters. Use to_string_lossy() (as done correctly at line 193) or propagate an error instead.

Apply this diff:

         fs::write(
             tiles_config_dir.join(".memory_path"),
-            memory_path.to_str().unwrap(),
+            memory_path.to_string_lossy().as_ref(),
         )
         .context("Failed to write the default path to .memory_path")?;
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 190, the call memory_path.to_str().unwrap() can
panic on non‑UTF8 paths; replace it with a non‑panicking approach (use
memory_path.to_string_lossy() like at line 193, or propagate a Result error) so
invalid UTF‑8 is handled safely—update the argument usage to accept the lossy
string (or change the function signature to return/propagate an error) to avoid
unwrap() panics.

- Added rust-toolchain.toml to pin a consistent Rust toolchain
- Added a justfile for running common commands
- Updated CI to use the pinned toolchain and just, and to cache cargo artifacts

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/runner/mlx.rs (2)

55-73: Do not std::process::exit in library code; return early instead.

Exiting the process here harms callers and tests. Log and return.

         Err(e) => {
             if e.kind() == std::io::ErrorKind::NotFound {
                 eprintln!("❌ Error: mlx_lm.chat command not found");
                 eprintln!("💡 Hint: Install mlx-lm by running: pip install mlx-lm");
                 eprintln!("📝 Note: mlx-lm is only available on macOS with Apple Silicon");
-                std::process::exit(1);
+                return;
             } else {
                 eprintln!("❌ Error: Failed to spawn mlx_lm.chat: {}", e);
-                std::process::exit(1);
+                return;
             }
         }

11-106: Address panic paths and deprecated APIs in src/runner/mlx.rs

Verification confirms multiple issues that require fixes:

  1. 12 unwrap() calls panic on error (lines 12, 24, 81, 82, 83, 87, 89, 126, 145, 148, 149, 190): Replace with proper error handling using ?, .context(), or .map_err().

  2. Deprecated env::home_dir() (line 160): Replace with dirs::home_dir() or std::env::var("HOME").

  3. Hard-coded daemon URLs (lines 110, 122, 141 with http://127.0.0.1:6969): Extract to a constant or configuration.

♻️ Duplicate comments (7)
src/runner/mlx.rs (7)

20-26: Avoid unwrap on FROM in subprocess path.

Gracefully handle missing FROM rather than panicking.

-    args.push("--model".to_owned());
-    args.push(modelfile.from.unwrap());
+    args.push("--model".to_owned());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);

11-18: Handle missing FROM and relax mem‑agent detection; don’t drop errors.

Unwrap can panic; make detection flexible and surface server errors.

-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Server path failed: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.to_ascii_lowercase().contains("mem-agent")
+}

75-106: Interactive server path: replace unwraps, surface errors, and keep session alive.

Multiple unwraps can panic; also prefer anyhow::Result for consistency.

-async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
+async fn run_model_with_server(modelfile: Modelfile) -> Result<()> {
     let stdin = io::stdin();
     let mut stdout = io::stdout();
     // loading the model from mem-agent via daemon server
-    let memory_path = get_memory_path()
-        .context("Retrieving memory_path failed")
-        .unwrap();
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(modelname, &memory_path).await.unwrap();
+    let memory_path = get_memory_path().context("Retrieving memory_path failed")?;
+    let Some(modelname) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return Ok(());
+    };
+    if let Err(e) = load_model(modelname, &memory_path).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
     println!("Running in interactive mode");
     loop {
         print!(">> ");
-        stdout.flush().unwrap();
+        if let Err(e) = stdout.flush() {
+            eprintln!("⚠️ flush failed: {}", e);
+        }
         let mut input = String::new();
-        stdin.read_line(&mut input).unwrap();
+        if let Err(e) = stdin.read_line(&mut input) {
+            eprintln!("⚠️ read_line failed: {}", e);
+            continue;
+        }
         let input = input.trim();
         match input {
             "exit" => {
                 println!("Exiting interactive mode");
                 break;
             }
             _ => {
-                if let Ok(response) = chat(input, modelname).await {
-                    println!(">> {}", response)
-                } else {
-                    println!(">> failed to respond")
-                }
+                match chat(input, modelname).await {
+                    Ok(response) => println!(">> {}", response),
+                    Err(e) => println!(">> failed to respond: {}", e),
+                }
             }
         }
     }
     Ok(())
 }

115-132: HTTP start: add timeout, check is_success, remove unwraps, include status/body.

Avoid panics and provide actionable errors.

 async fn load_model(model_name: &str, memory_path: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "memory_path": memory_path
     });
     let res = client
-        .post("http://127.0.0.1:6969/start")
+        .post(format!("{}/start", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    if res.status() == 200 {
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }

134-157: HTTP chat: add timeout, check is_success, parse JSON safely, remove unwraps.

Prevent panics; return clear errors on malformed responses.

 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
     let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
-        let content = v["choices"][0]["message"]["content"]
-            .as_str()
-            .unwrap_or("<no content>");
-        Ok(content.to_owned())
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
+        let content = v
+            .get("choices").and_then(|c| c.get(0))
+            .and_then(|c0| c0.get("message"))
+            .and_then(|m| m.get("content"))
+            .and_then(|c| c.as_str())
+            .ok_or_else(|| "missing choices[0].message.content".to_string())?;
+        Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }

159-165: Replace deprecated env::home_dir() with dirs::home_dir().

Avoid deprecated API and platform quirks.

 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {

Add to Cargo.toml:

[dependencies]
dirs = "5"

189-191: Don’t unwrap to_str(); handle non‑UTF‑8 paths.

Use to_string_lossy() like you already do on Line 193.

         fs::write(
             tiles_config_dir.join(".memory_path"),
-            memory_path.to_str().unwrap(),
+            memory_path.to_string_lossy().as_ref(),
         )
         .context("Failed to write the default path to .memory_path")?;
🧹 Nitpick comments (5)
justfile (1)

9-12: Consider adding Python checks for the new server.

The PR introduces a FastAPI server (server/), but the justfile only validates Rust code. Consider adding targets for Python linting, formatting (e.g., ruff, black), and type checking (e.g., mypy) to maintain consistency across the polyglot codebase.

.github/workflows/rust.yml (1)

19-19: Pin the rust-toolchain action to a stable version.

Using @master makes the workflow vulnerable to unexpected breaking changes. Pin to a specific version tag or use a stable channel reference.

Apply this diff to pin to a stable version:

-    - uses: dtolnay/rust-toolchain@master
+    - uses: dtolnay/rust-toolchain@stable

Alternatively, if you need a specific toolchain version (e.g., from rust-toolchain.toml), you can use:

    - uses: dtolnay/[email protected]
      with:
        toolchain: stable
src/runner/mlx.rs (3)

1-7: Introduce a default BASE_URL and request timeouts; avoid magic strings.

Centralize the daemon URL and prepare for timeouts in HTTP helpers.

 use anyhow::{Context, Result};
 use reqwest::Client;
 use serde_json::{Value, json};
 use std::io::Write;
 use std::path::PathBuf;
 use std::{env, fs};
+use std::time::Duration;
 use std::{io, process::Command};
 
+// Default mem-agent daemon URL; allow overriding via env in helpers if desired.
+const DEFAULT_BASE_URL: &str = "http://127.0.0.1:6969";

176-184: Trim newline when reading .memory_path.

fs::read_to_string includes a trailing newline; trim to avoid invalid paths.

-    if tiles_config_dir.is_dir()
-        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
-    {
-        memory_path = content;
+    if tiles_config_dir.is_dir()
+        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
+    {
+        memory_path = content.trim().to_string();
         is_memory_path_found = true;
     }

115-157: Optional: allow overriding base URL via env.

Keep DEFAULT_BASE_URL but let users set TILES_MEM_AGENT_URL externally.

-        .post(format!("{}/start", DEFAULT_BASE_URL))
+        .post(format!("{}/start", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))
@@
-        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
+        .post(format!("{}/v1/chat/completions", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))

Also applies to: 75-106

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01ea71e and ac9d39a.

📒 Files selected for processing (4)
  • .github/workflows/rust.yml (1 hunks)
  • justfile (1 hunks)
  • rust-toolchain.toml (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • rust-toolchain.toml
  • src/runner/mlx.rs
🧬 Code graph analysis (1)
src/runner/mlx.rs (2)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (2)
  • new (88-90)
  • new (107-119)
🔇 Additional comments (4)
rust-toolchain.toml (1)

1-3: Rust 1.90.0 is a valid release. Rust 1.90.0 is an official stable release (released September 18, 2025). The toolchain configuration is well-formed, follows Rust best practices for reproducible builds, and specifies appropriate standard components.

justfile (1)

1-12: LGTM! Clean and idiomatic justfile structure.

The justfile provides a standard Rust development workflow with appropriate settings for CI:

  • Format checking with --check flag
  • Clippy with warnings-as-errors
  • Sequential execution that fails fast
.github/workflows/rust.yml (2)

20-30: LGTM! Cargo caching properly configured.

The cache configuration follows best practices by caching cargo binaries, registry, git dependencies, and build artifacts. The cache key based on Cargo.lock ensures cache invalidation when dependencies change, and the restore-keys provide good fallback behavior.


32-36: LGTM! Integration with justfile is clean.

The installation check for just and subsequent execution of just check properly centralizes the build/test logic, making it consistent between local development and CI.
