
Conversation

@madclaws
Member

No description provided.

- /start - Loads the model into memory and prepares the internal agent
- /completions - Now executes the py fns and updates the memory
- Native interactive interface if the model is from the mem-agent family
- Communication with the py server for model loading and chat (a request sketch follows below)
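For illustration only, a minimal sketch of that flow over HTTP, assuming the server listens on the local port 6969 used elsewhere in this PR and that the request/response shapes match the walkthrough below; the requests library is just a convenient stand-in for any HTTP client, and the exact schemas live in server/api.py:

import requests

BASE_URL = "http://127.0.0.1:6969"

# /start loads the model and prepares the internal agent.
requests.post(f"{BASE_URL}/start", json={"model": "driaforall/mem-agent"}, timeout=60)

# /v1/chat/completions runs one turn; the server may execute the model's
# <python> tool calls against the memory directory before replying.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "driaforall/mem-agent",
        "messages": [{"role": "user", "content": "What do you remember about me?"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
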
@coderabbitai

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough


Adds a new Python FastAPI memory-agent server under server/ with model loading/caching, MLXRunner integration, sandboxed Python execution, filesystem memory utilities, model cache/health tools, reasoning-parsing utilities, and a system prompt. Adds the server entrypoint, pyproject, and packaging/tooling files. Updates the Rust CLI to async via tokio, changes the runner to optionally use the mem-agent server (with a subprocess fallback), updates command invocation sites, and swaps the model reference in b.modelfile. CI and tooling are adjusted (GitHub Actions, justfile, rust-toolchain). Cargo.toml gains async, HTTP, serde, and error-handling dependencies.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Rust CLI (async)
    participant Server as FastAPI Server
    participant Cache as cache_utils
    participant MLX as MLXRunner
    participant Memory as mem_agent/tools + utils
    participant Sandbox as mem_agent/engine
    participant Reasoning as reasoning_utils

    CLI->>Server: POST /start { model, memory_path }
    Server->>Cache: resolve_single_model(model)
    Cache-->>Server: model_path
    Server->>MLX: get_or_load_model(model_path)
    MLX-->>Server: runner ready

    CLI->>Server: POST /v1/chat/completions { messages, params }
    Server->>Memory: create_memory_if_not_exists(memory_path)
    Server->>MLX: format_chat_messages_for_runner(messages)
    Server->>MLX: generate_batch / generate_streaming
    MLX-->>Server: response (may include <python> / reasoning)
    Server->>Reasoning: extract reasoning & final_answer
    alt response contains python
        Server->>Memory: read_file(s) for context
        Server->>Sandbox: execute_sandboxed_code(python, allowed_path, ...)
        Sandbox->>Memory: perform tools (read/write/list/delete)
        Memory-->>Sandbox: tool results
        Sandbox-->>Server: (locals, error)
        Server->>MLX: provide tool results and iterate (tool turn)
    end
    Server-->>CLI: ChatCompletionResponse (assistant reply)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Focus review on:
    • server/mem_agent/engine.py — sandbox subprocess, pickle-over-stdout, timeout, dynamic installs, builtin/module blacklists.
    • server/mem_agent/tools.py and server/mem_agent/utils.py — filesystem operations, size limits, path sanitization, edge-case handling.
    • server/cache_utils.py — model discovery, resolution heuristics, health checks, deletion logic.
    • server/mlx_runner.py — model loading, streaming generation, stop-token handling, resource cleanup.
    • server/api.py — FastAPI endpoints, global mutable state, concurrency, error responses, tool-turn loop.
    • server/reasoning_utils.py — parsing/streaming correctness across model formats.
    • server/config.py — reading SYSTEM_PROMPT at import time (missing-file behavior).
    • Rust changes: src/main.rs and src/runner/mlx.rs — tokio integration, async paths, HTTP client interactions, subprocess fallback.
    • CI/tooling and manifests: Cargo.toml new deps, .github/workflows, justfile, rust-toolchain.toml, and b.modelfile model reference.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request contains no description provided by the author. An empty description provides zero information about the changeset and cannot be evaluated as "related in some way" to the changes, which is the lenient passing criterion. The description fails to convey any meaningful context, rationale, or explanation for the substantial modifications including the new server infrastructure, memory management system, and async runtime integration. Add a pull request description that explains the purpose and scope of the changes. For this PR, the description should cover the addition of the memory agent server infrastructure, the FastAPI API implementation, the local memory management capabilities, the integration with the Rust CLI via async runtime, and the model configuration changes. This will help reviewers understand the intent and impact of these substantial modifications.
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Adding local memory management" accurately refers to a significant portion of the changeset, specifically the new memory agent system implemented in server/mem_agent/ with tools, utilities, and sandboxed execution capabilities. However, the changeset is substantially broader than this title suggests, encompassing a complete FastAPI server implementation (server/api.py, config.py, main.py), model changes (modelfile), dependencies for HTTP/async support (Cargo.toml), Rust async runtime integration, and associated build tooling. While the title is not misleading or off-topic, it captures only one major component of a multi-faceted change and does not adequately represent the full scope of the pull request.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/integrate-mem-aget

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac9d39a and 823ecd4.

📒 Files selected for processing (1)
  • .github/workflows/rust.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/rust.yml

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@madclaws changed the title from "Using mem-agent models for local memory" to "Adding local memory management" on Oct 20, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 34

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Cargo.toml (1)

4-4: Invalid Rust edition; compilation will fail.

The edition field must be one of 2015, 2018, or 2021. The value 2024 does not exist and will cause a compilation error.

Apply this fix:

-edition = "2024"
+edition = "2021"
🧹 Nitpick comments (15)
server/.gitignore (1)

1-3: Solid foundation, but consider adding .env for secrets.

The existing entries are appropriate for a Python project. However, since this PR introduces a FastAPI server (which typically requires environment configuration for API endpoints, model paths, secrets, etc.), consider adding common patterns to prevent accidental commits of sensitive data.

Consider enhancing the .gitignore with these additions:

 __pycache__/
 *.egg-info/
 .venv/
+.env
+.env.local
+.pytest_cache/
+dist/
+build/

Rationale:

  • .env / .env.local: Protects secrets (API keys, model cache paths, credentials) from being committed.
  • .pytest_cache/: Covers test metadata if pytest is used for validation.
  • dist/ / build/: Standard for Python packages if future packaging/distribution is planned.
Cargo.toml (2)

9-9: Specify explicit TLS backend for reproducible builds.

reqwest defaults to native-tls, but for deterministic and reproducible builds, explicitly choose between rustls or native-tls as recommended in best practices.

Replace with one of these options:

-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "rustls-tls"] }

or if you prefer native TLS:

-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "native-tls"] }

Note: This assumes the blocking feature is actually used in the codebase. If only async calls are made, consider removing it to reduce dependency bloat.


13-13: Pin tokio to a stable minor version for production consistency.

Using version = "1" (major only) allows any 1.x minor/patch, which can introduce unexpected behavioral changes across environments. Per best practices, pin to a specific LTS minor like ~1.43 or ~1.47.

Apply this change:

-tokio = { version = "1" , features = ["macros", "rt-multi-thread"]}
+tokio = { version = "~1.47", features = ["macros", "rt-multi-thread"]}

Alternatively, you can use 1.47 to allow patch updates within 1.47.x while staying stable for minor versions.

server/mem_agent/__init__.py (1)

1-1: Consider adding a module docstring for clarity.

This is an empty package initializer, which is correct. As an optional improvement, adding a brief module docstring would help developers understand the purpose of the mem_agent subsystem when exploring the codebase.

Example:

"""
Memory agent subsystem for local memory management and reasoning.

Provides engines, tools, and utilities for sandboxed code execution,
memory persistence, and multi-step reasoning workflows.
"""
server/pyproject.toml (1)

6-11: Move Black to dev extras and add minimal pins; consider uvicorn[standard].

Black is a formatter and shouldn’t be a runtime dependency. Add minimal version pins for stability and use uvicorn[standard] to pull performant wheels (uvloop/httptools) by default.

Apply this diff:

 [project]
 name = "server"
 version = "0.1.0"
 description = "Local MLX inference server for the Tiles CLI"
 requires-python = ">=3.10"
 dependencies = [
-    "fastapi",
-    "uvicorn",
-    "mlx-lm",
-    "black"
+    "fastapi>=0.95.2",
+    "uvicorn[standard]>=0.34",
+    "mlx-lm"
 ]
 
+[project.optional-dependencies]
+dev = [
+    "black>=24.3.0"
+]

Based on learnings.

server/main.py (1)

11-15: Tidy runner and enable basic logging.

Add log_level for visibility and drop the commented PID code (or implement it). Keeping 127.0.0.1 is fine for local use.

-    uvicorn.run(app, host="127.0.0.1", port=PORT)
+    uvicorn.run(app, host="127.0.0.1", port=PORT, log_level="info")
src/main.rs (1)

22-34: Optional: add structured logging early.

A basic tracing subscriber helps diagnose async issues.

 use clap::{Parser, Subcommand};
 mod commands;
+use tracing_subscriber::{fmt, EnvFilter};
@@
 #[tokio::main]
 pub async fn main() -> Result<(), Box<dyn Error>> {
+    let _ = fmt().with_env_filter(EnvFilter::from_default_env()).try_init();
src/runner/mlx.rs (3)

1-5: HTTP timeouts + remove unused import.

Add request timeouts (avoid hangs) and drop unused NulError.

-use reqwest::Client;
+use reqwest::Client;
 use serde_json::{Value, json};
-use std::ffi::NulError;
 use std::io::Write;
-use std::{io, process::Command};
+use std::{io, process::Command, time::Duration};

And add a base URL constant for reuse:

 use crate::core::modelfile::Modelfile;
 
+const BASE_URL: &str = "http://127.0.0.1:6969";

18-71: Blocking subprocess in async context and deep std::process::exit.

Spawning and waiting with std::process blocks a Tokio worker; and exiting from a library module makes composition/testing hard. Prefer tokio::process::Command and bubble errors up to the caller.

  • Switch to tokio::process::Command and .await the child.
  • Change run_model_by_sub_process to return Result<(), anyhow::Error> (or custom) and handle errors at the top of run.
  • Replace std::process::exit(1) with return Err(e.into()) and print user-friendly hints at the call site.

105-110: Reuse BASE_URL and (optionally) add a timeout for ping.

Small consistency fix.

-    let res = client.get("http://127.0.0.1:6969/ping").send().await?;
+    let res = client.get(format!("{}/ping", BASE_URL)).send().await?;
server/config.py (1)

7-7: Use Path.joinpath or os.path.join for path construction.

String concatenation for path construction is error-prone and platform-dependent. Use Path methods for consistency with the rest of the file.

Apply this diff:

-MEMORY_PATH = os.path.expanduser("~") + "/tiles_memory"
+MEMORY_PATH = str(Path.home() / "tiles_memory")
server/model_card.py (1)

24-24: Remove unused noqa directive.

The UP045 rule is not enabled, making this directive unnecessary.

Apply this diff:

-# ruff: noqa: UP045
server/reasoning_utils.py (1)

40-70: Annotate PATTERNS with typing.ClassVar.

The PATTERNS dictionary is a mutable class attribute that should be annotated with ClassVar to indicate it's a class-level attribute rather than an instance attribute.

Apply this diff:

+from typing import ClassVar, Dict, Optional, Tuple
+
 class ReasoningExtractor:
     """Extract reasoning and final answer from model outputs."""
     
     # Model-specific patterns
-    PATTERNS = {
+    PATTERNS: ClassVar[Dict[str, Dict]] = {
         'gpt-oss': {
server/mem_agent/engine.py (1)

19-189: Add AST validation before code execution.

The system prompt (server/system_prompt.txt line 269) states: "Your <python> block MUST compile under ast.parse and yield no SyntaxError". However, the sandbox doesn't validate this before execution. Invalid code will fail at exec() time, wasting resources and making error messages less clear.

Add AST validation before execution:

import ast

def _run_user_code(
    code: str,
    allow_installs: bool,
    allowed_path: str,
    blacklist: list,
    available_functions: dict,
    log: bool = False,
) -> tuple[dict, str]:
    """Execute code under sandboxed conditions..."""
    
    # Validate syntax before execution
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {}, f"SyntaxError in code: {e}"
    
    try:
        # ... rest of the function

This provides faster feedback and matches the documented requirement.

server/mlx_runner.py (1)

300-301: Narrow bare except when probing tokenizer special tokens.

Use “except Exception” and optionally log at verbose.

-            except:
-                pass
+            except Exception as err:
+                if self.verbose:
+                    print(f"[DEBUG] Failed to probe token '{token}': {err}")
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 022bb7f and 2e17f41.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • Cargo.toml (1 hunks)
  • b.modelfile (1 hunks)
  • server/.gitignore (1 hunks)
  • server/__init__.py (1 hunks)
  • server/api.py (1 hunks)
  • server/cache_utils.py (1 hunks)
  • server/config.py (1 hunks)
  • server/main.py (1 hunks)
  • server/mem_agent/__init__.py (1 hunks)
  • server/mem_agent/engine.py (1 hunks)
  • server/mem_agent/tools.py (1 hunks)
  • server/mem_agent/utils.py (1 hunks)
  • server/mlx_runner.py (1 hunks)
  • server/model_card.py (1 hunks)
  • server/pyproject.toml (1 hunks)
  • server/reasoning_utils.py (1 hunks)
  • server/system_prompt.txt (1 hunks)
  • src/commands/mod.rs (1 hunks)
  • src/main.rs (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • src/main.rs
  • server/pyproject.toml
  • src/runner/mlx.rs
  • src/commands/mod.rs
  • Cargo.toml
🧬 Code graph analysis (8)
src/main.rs (3)
src/core/modelfile.rs (1)
  • parse (253-258)
src/commands/mod.rs (1)
  • run (11-18)
src/runner/mlx.rs (1)
  • run (9-16)
src/runner/mlx.rs (2)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (4)
  • new (88-90)
  • new (107-119)
  • Modelfile (106-225)
  • Modelfile (94-104)
server/reasoning_utils.py (1)
server/cache_utils.py (1)
  • detect_model_type (277-303)
src/commands/mod.rs (2)
src/runner/mlx.rs (1)
  • run (9-16)
src/core/modelfile.rs (1)
  • parse_from_file (246-251)
server/mlx_runner.py (1)
server/reasoning_utils.py (6)
  • ReasoningExtractor (36-182)
  • StreamingReasoningParser (246-430)
  • detect_model_type (73-86)
  • process_token (200-243)
  • process_token (260-414)
  • finalize (416-430)
server/mem_agent/tools.py (1)
server/mem_agent/utils.py (2)
  • check_size_limits (35-54)
  • create_memory_if_not_exists (57-71)
server/cache_utils.py (2)
server/model_card.py (2)
  • read_readme_front_matter (139-167)
  • tokenizer_has_chat_template (170-184)
server/mlx_runner.py (1)
  • run_model_enhanced (955-1049)
server/api.py (5)
server/cache_utils.py (2)
  • detect_framework (241-274)
  • get_model_path (182-201)
server/mlx_runner.py (5)
  • cleanup (321-357)
  • load_model (128-180)
  • _format_conversation (765-794)
  • generate_batch (584-689)
  • get_effective_max_tokens (359-389)
server/mem_agent/utils.py (5)
  • extract_python_code (152-172)
  • extract_reply (175-182)
  • extract_thoughts (185-192)
  • create_memory_if_not_exists (57-71)
  • format_results (195-203)
server/mem_agent/engine.py (1)
  • execute_sandboxed_code (200-314)
src/runner/mlx.rs (2)
  • load_model (112-129)
  • ping (105-110)
🪛 LanguageTool
server/system_prompt.txt

[style] ~46-~46: This phrase is redundant. Consider using “outside”.
Context: ...Skip the <think> block - Provide text outside of these tags - Use <reply> when you hav...

(OUTSIDE_OF)


[style] ~257-~257: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 5062 characters long)
Context: ...thout assignment return empty {} results! 9. Wait for Results: After submitti...

(EN_EXCESSIVE_EXCLAMATION)

🪛 Ruff (0.14.1)
server/reasoning_utils.py

40-70: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

server/model_card.py

24-24: Unused noqa directive (non-enabled: UP045)

Remove unused noqa directive

(RUF100)


55-55: Do not catch blind exception: Exception

(BLE001)


165-165: Consider moving this statement to an else block

(TRY300)


166-166: Do not catch blind exception: Exception

(BLE001)


183-183: Do not catch blind exception: Exception

(BLE001)

server/mem_agent/engine.py

37-37: Do not catch blind exception: Exception

(BLE001)


53-55: Abstract raise to an inner function

(TRY301)


53-55: Avoid specifying long messages outside the exception class

(TRY003)


67-69: Abstract raise to an inner function

(TRY301)


67-69: Avoid specifying long messages outside the exception class

(TRY003)


80-82: Abstract raise to an inner function

(TRY301)


80-82: Avoid specifying long messages outside the exception class

(TRY003)


103-104: try-except-pass detected, consider logging the exception

(S110)


103-103: Do not catch blind exception: Exception

(BLE001)


128-128: subprocess call: check for execution of untrusted input

(S603)


134-134: Do not catch blind exception: Exception

(BLE001)


136-138: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


139-139: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


156-156: Use of exec detected

(S102)


157-157: Do not catch blind exception: Exception

(BLE001)


162-162: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


183-183: Do not catch blind exception: Exception

(BLE001)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Do not catch blind exception: Exception

(BLE001)


194-196: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


197-197: Use explicit conversion flag

Replace with conversion flag

(RUF010)


204-204: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


205-205: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


206-206: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


207-207: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


208-208: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


238-238: subprocess call: check for execution of untrusted input

(S603)


244-244: Do not catch blind exception: Exception

(BLE001)


245-247: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


270-270: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


287-287: subprocess call: check for execution of untrusted input

(S603)


295-297: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


307-307: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


308-308: Do not catch blind exception: Exception

(BLE001)


321-321: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

server/mem_agent/utils.py

70-70: Do not catch blind exception: Exception

(BLE001)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)


130-130: Consider moving this statement to an else block

(TRY300)


131-131: Do not use bare except

(E722)


131-133: try-except-pass detected, consider logging the exception

(S110)


144-144: Consider moving this statement to an else block

(TRY300)


146-146: Do not catch blind exception: Exception

(BLE001)


146-146: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

server/mlx_runner.py

72-72: Consider moving this statement to an else block

(TRY300)


110-110: Avoid specifying long messages outside the exception class

(TRY003)


115-115: Consider moving this statement to an else block

(TRY300)


142-143: try-except-pass detected, consider logging the exception

(S110)


142-142: Do not catch blind exception: Exception

(BLE001)


180-180: Avoid specifying long messages outside the exception class

(TRY003)


221-221: Possible hardcoded password assigned to: "token_content"

(S105)


235-235: Possible hardcoded password assigned to: "token_content"

(S105)


300-300: Do not use bare except

(E722)


300-301: try-except-pass detected, consider logging the exception

(S110)


348-349: try-except-pass detected, consider logging the exception

(S110)


348-348: Do not catch blind exception: Exception

(BLE001)


421-421: Avoid specifying long messages outside the exception class

(TRY003)


611-611: Avoid specifying long messages outside the exception class

(TRY003)


761-761: Do not catch blind exception: Exception

(BLE001)


787-787: Consider moving this statement to an else block

(TRY300)


788-788: Do not catch blind exception: Exception

(BLE001)


829-829: Do not catch blind exception: Exception

(BLE001)


854-854: Unpacked variable before_reasoning is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


876-878: try-except-pass detected, consider logging the exception

(S110)


876-876: Do not catch blind exception: Exception

(BLE001)


1047-1047: Do not catch blind exception: Exception

(BLE001)

server/mem_agent/tools.py

26-26: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


40-40: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


49-49: Avoid specifying long messages outside the exception class

(TRY003)


88-88: Abstract raise to an inner function

(TRY301)


88-88: Create your own exception

(TRY002)


88-88: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Do not catch blind exception: Exception

(BLE001)


94-94: Do not catch blind exception: Exception

(BLE001)


95-95: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


95-95: Create your own exception

(TRY002)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


96-96: Create your own exception

(TRY002)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Consider moving this statement to an else block

(TRY300)


111-111: Do not catch blind exception: Exception

(BLE001)


178-178: Consider moving this statement to an else block

(TRY300)


182-182: Do not catch blind exception: Exception

(BLE001)


183-183: Use explicit conversion flag

Replace with conversion flag

(RUF010)


207-207: Do not catch blind exception: Exception

(BLE001)


230-230: Unused function argument: is_last

(ARG001)


277-277: Do not catch blind exception: Exception

(BLE001)


292-292: Consider moving this statement to an else block

(TRY300)


293-293: Do not catch blind exception: Exception

(BLE001)


328-328: Do not catch blind exception: Exception

(BLE001)

server/cache_utils.py

155-155: Unpacked variable found_model_dir is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


249-249: Unpacked variable pipeline is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


252-253: try-except-pass detected, consider logging the exception

(S110)


252-252: Do not catch blind exception: Exception

(BLE001)


266-266: Do not catch blind exception: Exception

(BLE001)


283-284: try-except-pass detected, consider logging the exception

(S110)


283-283: Do not catch blind exception: Exception

(BLE001)


294-295: try-except-pass detected, consider logging the exception

(S110)


294-294: Do not catch blind exception: Exception

(BLE001)


315-315: Do not catch blind exception: Exception

(BLE001)


376-376: Do not catch blind exception: Exception

(BLE001)


389-389: Do not catch blind exception: Exception

(BLE001)


458-460: try-except-continue detected, consider logging the exception

(S112)


458-458: Do not catch blind exception: Exception

(BLE001)


525-525: Do not catch blind exception: Exception

(BLE001)


572-572: Do not catch blind exception: Exception

(BLE001)


745-745: Starting a process with a shell, possible injection detected

(S605)


971-971: Do not catch blind exception: Exception

(BLE001)


1035-1035: Do not catch blind exception: Exception

(BLE001)


1054-1054: Do not catch blind exception: Exception

(BLE001)


1091-1091: Do not catch blind exception: Exception

(BLE001)

server/api.py

110-110: Unused method argument: memory_path

(ARG002)


110-110: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


112-112: Unused method argument: model

(ARG002)


112-112: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: predetermined_memory_path

(ARG002)


114-114: Unused method argument: model_cache

(ARG002)


114-114: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


115-115: Unused method argument: current_model_path

(ARG002)


116-116: Unused method argument: default_max_tokens

(ARG002)


141-141: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


143-143: Abstract raise to an inner function

(TRY301)


144-144: Do not catch blind exception: Exception

(BLE001)


145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


145-145: Use explicit conversion flag

Replace with conversion flag

(RUF010)


161-162: try-except-pass detected, consider logging the exception

(S110)


161-161: Do not catch blind exception: Exception

(BLE001)


203-203: Consider moving this statement to an else block

(TRY300)


204-204: Do not catch blind exception: Exception

(BLE001)


205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


274-274: Local variable thoughts is assigned to but never used

Remove assignment to unused variable thoughts

(F841)


311-311: Do not catch blind exception: Exception

(BLE001)


312-312: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (5)
server/__init__.py (1)

1-1: File structure and package initialization look good.

An empty __init__.py correctly marks this directory as a Python package. Consider whether public APIs from submodules (e.g., from api.py, config.py) should be re-exported here for improved usability, depending on your module design intent. For now, this approach is reasonable and allows users to import explicitly.
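If re-exports are ever wanted here, a hedged sketch is below; beyond MEMORY_PATH and the FastAPI app object referenced elsewhere in this review, the choice of public symbols is an assumption:

# server/__init__.py -- illustrative only; keeping it empty is also fine.
from .api import app             # FastAPI instance defined in server/api.py
from .config import MEMORY_PATH  # default memory location from server/config.py

__all__ = ["app", "MEMORY_PATH"]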

src/main.rs (1)

22-29: Async entrypoint looks good.

Tokio main + awaiting Run is correct and keeps the original flow intact.

b.modelfile (1)

1-1: No issue found—intentional separation of execution paths.

The repository correctly uses two different modelfiles for two different execution strategies: memgpt.modelfile (with the mem-agent prefix) routes to server execution, while b.modelfile (with a different model) routes to subprocess execution. The gating logic at line 11 of src/runner/mlx.rs is functioning as designed, not misconfigured.

Likely an incorrect or invalid review comment.

src/runner/mlx.rs (1)

9-16: No issues found with mem-agent gating or FROM handling.

The code is safe by design: modelfile.build()? is called in create_modelfile before returning Ok, and build() returns Err if from.is_none(). This guarantee ensures that when mlx::run() receives a Modelfile, from is always Some, making all three unwraps safe.

Verification confirms:

  • mem-agent gating limited to one location (src/runner/mlx.rs:11)
  • No other hardcoded model name checks found
  • Missing FROM fields already validated before run() is invoked
server/reasoning_utils.py (1)

260-414: Verify streaming parser state machine for edge cases.

The StreamingReasoningParser.process_token method has complex state machine logic with buffering, pattern matching, and conditional token skipping. The logic for detecting partial pattern matches (lines 296-317) and conditional skip tokens (lines 392-400) could have edge cases that cause tokens to be incorrectly buffered or dropped.

Consider testing scenarios:

  • Reasoning markers split across multiple tokens
  • Skip tokens appearing in the middle of content (not just at boundaries)
  • Very short tokens that match prefixes of multiple patterns
  • Nested or malformed reasoning/final sections

The state machine should handle these gracefully without losing tokens or entering invalid states.
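A hedged sketch of one such test is below. It assumes process_token(token) returns displayable text (possibly empty) and finalize() flushes any buffered remainder; the constructor argument and the <think> markers are guesses and would need to be adjusted to the real API:

import pytest

from server.reasoning_utils import StreamingReasoningParser

@pytest.mark.parametrize(
    "chunks",
    [
        ["<think>", "some reasoning", "</think>", "final answer"],
        ["<thi", "nk>reason", "ing</th", "ink>", "final answer"],  # marker split across tokens
    ],
)
def test_no_visible_tokens_lost_when_markers_split(chunks):
    parser = StreamingReasoningParser(model_type="qwen")  # model_type value is a guess
    emitted = []
    for chunk in chunks:
        out = parser.process_token(chunk)
        if out:
            emitted.append(out)
    tail = parser.finalize()
    if tail:
        emitted.append(tail)
    visible = "".join(str(part) for part in emitted)
    # The final answer must survive, and no marker fragments should leak through.
    assert "final answer" in visible
    assert "<think" not in visible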

Comment on lines 47 to 49
_runner: MLXRunner = {}
_max_tool_turns = 5


⚠️ Potential issue | 🟠 Major

Fix invalid globals: _runner type/init.

Initialize to None with Optional type, not {}.

-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
 _max_tool_turns = 5
🤖 Prompt for AI Agents
In server/api.py around lines 47 to 49, the global _runner is incorrectly
typed/initialized as "{}"; change its declaration to use an Optional type and
initialize it to None (e.g., _runner: Optional[MLXRunner] = None), and ensure
Optional is imported from typing (and MLXRunner is in scope or imported) so the
global reflects a nullable runner rather than an empty dict.

Comment on lines +131 to +133
agent: Agent()

def get_or_load_model(model_spec: str, verbose: bool = False) -> MLXRunner:

⚠️ Potential issue | 🔴 Critical

This line executes Agent() at import time and leaves ‘agent’ unbound.

Change to a proper variable or remove.

-agent: Agent()
+# Optional: instantiate later if needed
+agent: Optional[Agent] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-agent: Agent()
+# Optional: instantiate later if needed
+agent: Optional[Agent] = None
 def get_or_load_model(model_spec: str, verbose: bool = False) -> MLXRunner:
🤖 Prompt for AI Agents
In server/api.py around lines 131-133, the line "agent: Agent()" is
invalid/undesirable because it invokes Agent() at import time and doesn't
properly bind the variable; replace it with either a proper type annotation and
lazy init like "agent: Optional[Agent] = None" and instantiate Agent() inside
the function that needs it (e.g., get_or_load_model) or if a module-level
singleton is intended, change it to an explicit assignment "agent = Agent()".
Ensure imports for Optional are added if using the first option.

Comment on lines +141 to +151
        model_path, model_name, commit_hash = get_model_path(model_spec)
        if not model_path.exists():
            raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
    except Exception as e:
        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")

    # Check if it's an MLX model

    model_path_str = str(model_path)

    print(_current_model_path)

⚠️ Potential issue | 🟠 Major

Robust model path existence check and remove prints.

Avoid AttributeError when get_model_path returns None; drop noisy prints.

-    print(model_spec)
@@
-        model_path, model_name, commit_hash = get_model_path(model_spec)
-        if not model_path.exists():
+        model_path, model_name, commit_hash = get_model_path(model_spec)
+        if not model_path or not Path(model_path).exists():
             raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
@@
-    print(_current_model_path)
-    print(model_path_str)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.1)

141-141: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


143-143: Abstract raise to an inner function

(TRY301)


144-144: Do not catch blind exception: Exception

(BLE001)


145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


145-145: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In server/api.py around lines 141 to 151, get_model_path may return None causing
an AttributeError when calling model_path.exists(), and there is a stray
print(_current_model_path); change the logic to first verify get_model_path
returned a non-None tuple and that model_path is not None before calling
.exists(), raise a 404 with a clear message if model_path is None or does not
exist, catch only expected exceptions (or re-raise unexpected ones) rather than
blanket-catching Exception for control flow, and remove the print statement
(replace with appropriate logging if needed).

Comment on lines 190 to 206
@app.get("/ping")
async def ping():
return {"message": "Badda-Bing Badda-Bang"}

@app.post("/start")
async def start_model(request: StartRequest):
"""Load the model and start the agent"""
global _messages, _runner
print(str(request))
_messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]

try:
_runner = get_or_load_model(request.model)
return {"message": "Model loaded"}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))


⚠️ Potential issue | 🟠 Major

/start resets global _messages; drop shared state and PII prints.

Globals will mix conversations across clients and leak content. Maintain history per request instead.

-    global _messages, _runner
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner
     try:
         _runner = get_or_load_model(request.model)
         return {"message": "Model loaded"}
🧰 Tools
🪛 Ruff (0.14.1)

203-203: Consider moving this statement to an else block

(TRY300)


204-204: Do not catch blind exception: Exception

(BLE001)


205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In server/api.py around lines 190 to 206, the start_model endpoint currently
prints the incoming request (potential PII) and resets the global _messages
which can mix conversations across clients; remove the print(str(request)) call
and do not mutate or reset any global conversation state in this handler.
Instead, keep model loading separate (e.g., set _runner =
get_or_load_model(request.model) only) and move conversation/history management
to per-request or per-session storage (pass history in request, store keyed by
session id, or use a request-scoped dependency) so each client gets isolated
history; ensure the endpoint returns model load status without touching or
clearing _messages and continue to surface errors via HTTPException as before.
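A minimal, self-contained sketch of the session-keyed option (the session_id field, the defaulting behaviour, and the placeholder prompt are assumptions; the real endpoint would hand the history to the runner instead of echoing its length):

from collections import defaultdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
SYSTEM_PROMPT = "You are a memory agent."  # placeholder, not the PR's real prompt

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    session_id: str
    messages: list[ChatMessage]

# One history per session id instead of one module-level _messages list.
_sessions: dict[str, list[ChatMessage]] = defaultdict(
    lambda: [ChatMessage(role="system", content=SYSTEM_PROMPT)]
)

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatRequest) -> dict:
    history = _sessions[request.session_id]
    history.extend(request.messages)
    # Hand `history` to the runner here; the sketch just reports its length.
    return {"history_length": len(history)}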

Comment on lines 207 to 241
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest):
"""Create a chat completion."""
global _messages, _max_tool_turns
try:
runner = get_or_load_model(request.model)

# if request.stream:
# # Streaming response
# return StreamingResponse(
# generate_chat_stream(runner, request.messages, request),
# media_type="text/plain",
# headers={"Cache-Control": "no-cache"}
# )
# else:
# Non-streaming response
completion_id = f"chatcmpl-{uuid.uuid4()}"
created = int(time.time())

# Convert messages to dict format for runner
# _messages.append(system_message)
_messages.extend(request.messages)
message_dicts = format_chat_messages_for_runner(_messages)
# Let the runner format with chat templates
prompt = runner._format_conversation(message_dicts, use_chat_template=True)

generated_text = runner.generate_batch(
prompt=prompt,
max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
temperature=request.temperature,
top_p=request.top_p,
repetition_penalty=request.repetition_penalty,
use_chat_template=False # Already applied in _format_conversation
)


⚠️ Potential issue | 🟠 Major

Don’t block the event loop; offload generation to threadpool.

MLX generation is CPU-bound; use run_in_threadpool.

-from fastapi.responses import StreamingResponse
+from fastapi.responses import StreamingResponse
+from starlette.concurrency import run_in_threadpool
@@
-        completion_id = f"chatcmpl-{uuid.uuid4()}"
+        completion_id = f"chatcmpl-{uuid.uuid4()}"
         created = int(time.time())
@@
-        _messages.extend(request.messages)
-        message_dicts = format_chat_messages_for_runner(_messages)
+        # Build per-request message history (avoid shared globals)
+        messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+        message_dicts = format_chat_messages_for_runner(messages)
@@
-        generated_text = runner.generate_batch(
-            prompt=prompt,
-            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-            temperature=request.temperature,
-            top_p=request.top_p,
-            repetition_penalty=request.repetition_penalty,
-            use_chat_template=False  # Already applied in _format_conversation
-        )
+        generated_text = await run_in_threadpool(
+            runner.generate_batch,
+            prompt,
+            runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+            request.temperature,
+            request.top_p,
+            request.repetition_penalty,
+            False,  # use_chat_template already applied
+            False,  # interactive
+        )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In server/api.py around lines 207 to 241, the code calls runner.generate_batch
(and potentially runner._format_conversation) directly which blocks the asyncio
event loop for CPU-bound ML generation; fix it by importing and using
FastAPI/Starlette's run_in_threadpool to offload the blocking work: move any
CPU/blocking preparation (formatting) into a sync function if needed, then await
run_in_threadpool(...) around the blocking generate_batch call (and formatting
if heavy) so generation runs on a worker thread and the event loop remains
responsive; ensure the call returns the generated_text to use in the response.

Comment on lines +11 to 15
 pub async fn run(modelfile: &str) {
     match modelfile::parse_from_file(modelfile) {
         Ok(modelfile) => {
-            mlx::run(modelfile);
+            mlx::run(modelfile).await;
         }

⚠️ Potential issue | 🔴 Critical

Validate Modelfile before running to avoid panic.

Runner unwraps FROM; call build() (or check FROM) here and fail fast with a clear message.

-pub async fn run(modelfile: &str) {
-    match modelfile::parse_from_file(modelfile) {
-        Ok(modelfile) => {
-            mlx::run(modelfile).await;
+pub async fn run(modelfile: &str) {
+    match modelfile::parse_from_file(modelfile) {
+        Ok(mut modelfile) => {
+            if let Err(e) = modelfile.build() {
+                eprintln!("{}", e);
+                return;
+            }
+            mlx::run(modelfile).await;
         }
         Err(err) => println!("{}", err),
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/commands/mod.rs around lines 11-15, after successfully parsing the
modelfile you must validate/build it before passing to mlx::run to avoid a panic
when the runner later unwraps FROM; call the modelfile.build() (or the
appropriate validate/build method) and handle its Result, returning or logging a
clear error message and aborting if build fails, otherwise pass the
built/validated modelfile into mlx::run; ensure error paths produce a
user-friendly message that explains the invalid FROM.

Comment on lines +9 to +16
pub async fn run(modelfile: Modelfile) {
    let model = modelfile.from.as_ref().unwrap();
    if model.starts_with("driaforall/mem-agent") {
        let _res = run_model_with_server(modelfile).await;
    } else {
        run_model_by_sub_process(modelfile);
    }
}

⚠️ Potential issue | 🔴 Critical

Guard against missing FROM and relax mem‑agent detection.

Avoid panics and make the mem‑agent switch less brittle.

-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(mut modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_ref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Error: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.contains("mem-agent")
+}
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 9 to 16, avoid unwrapping modelfile.from
(which can panic) and make the mem-agent check more flexible: replace the unwrap
with an if let / match that handles None by logging or returning early, obtain a
&str via as_deref(), normalize to lowercase, and use contains("mem-agent") (or a
more specific substring if desired) instead of the strict starts_with; then call
run_model_with_server(modelfile).await when the match hits and handle/log its
result, otherwise call run_model_by_sub_process(modelfile) as before.

Comment on lines 73 to 103
async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
    // println!("gonna ping");
    // let _ = ping().await;
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    // loading the model from mem-agent via daeomn server
    let modelname = modelfile.from.as_ref().unwrap();
    load_model(&modelname).await.unwrap();
    println!("Running in interactive mode");
    loop {
        print!(">> ");
        stdout.flush().unwrap();
        let mut input = String::new();
        stdin.read_line(&mut input).unwrap();
        let input = input.trim();
        match input {
            "exit" => {
                println!("Exiting interactive mode");
                break;
            }
            _ => {
                if let Ok(response) = chat(input, &modelname).await {
                    println!(">> {}", response)
                } else {
                    println!(">> failed to respond")
                }
            }
        }
    }
    Ok(())
}

⚠️ Potential issue | 🟠 Major

Don’t unwrap in the interactive server path; handle failures gracefully.

Unwraps can panic if the server is down or returns non‑JSON. Propagate errors and keep the CLI alive.

-    let stdin = io::stdin();
-    let mut stdout = io::stdout();
-    // loading the model from mem-agent via daeomn server
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(&modelname).await.unwrap();
+    let stdin = io::stdin();
+    let mut stdout = io::stdout();
+    // load the model via daemon server
+    let modelname = modelfile.from.as_ref().unwrap().to_owned();
+    if let Err(e) = load_model(&modelname).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
@@
-                if let Ok(response) = chat(input, &modelname).await {
+                if let Ok(response) = chat(input, &modelname).await {
                     println!(">> {}", response)
                 } else {
                     println!(">> failed to respond")
                 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
     // println!("gonna ping");
     // let _ = ping().await;
     let stdin = io::stdin();
     let mut stdout = io::stdout();
-    // loading the model from mem-agent via daeomn server
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(&modelname).await.unwrap();
+    // load the model via daemon server
+    let modelname = modelfile.from.as_ref().unwrap().to_owned();
+    if let Err(e) = load_model(&modelname).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
     println!("Running in interactive mode");
     loop {
         print!(">> ");
         stdout.flush().unwrap();
         let mut input = String::new();
         stdin.read_line(&mut input).unwrap();
         let input = input.trim();
         match input {
             "exit" => {
                 println!("Exiting interactive mode");
                 break;
             }
             _ => {
                 if let Ok(response) = chat(input, &modelname).await {
                     println!(">> {}", response)
                 } else {
                     println!(">> failed to respond")
                 }
             }
         }
     }
     Ok(())
 }
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 73 to 103, several unwrap() calls in the
interactive server path (loading the model, flushing stdout, reading stdin, and
awaiting chat) can panic if the daemon/server is down or returns bad data;
replace those unwraps with proper error handling: handle load_model errors by
returning or logging a clear error and keep the CLI running (do not panic),
check and handle stdout.flush() and stdin.read_line() errors with a warning and
continue the loop, and handle chat(...) errors by logging the error details and
printing a friendly failure message while continuing the interactive loop;
ensure the function still returns a Result and propagate non-recoverable errors
upward but avoid panics inside the interactive loop.

Comment on lines 112 to 129
async fn load_model(model_name: &str) -> Result<(), String> {
    let client = Client::new();
    let body = json!({
        "model": model_name
    });
    let res = client
        .post("http://127.0.0.1:6969/start")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        Ok(())
    } else {
        Err(String::from("request failed"))
    }
}

🛠️ Refactor suggestion | 🟠 Major

Return proper HTTP errors; add timeout and is_success.

Network failures should not panic. Include status in errors.

 async fn load_model(model_name: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name
     });
-    let res = client
-        .post("http://127.0.0.1:6969/start")
+    let res = client
+        .post(format!("{}/start", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
-    // println!("{:?}", res);
-    if res.status() == 200 {
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 112-129, replace the unwrap and simplistic
status check with proper error handling and a request timeout: construct the
reqwest Client with a reasonable timeout, await the send() call without unwrap
and match on its Result to convert network errors into Err(String) (include the
error message), then check res.status().is_success() and on failure return Err
including the numeric status and optionally the response text; ensure no panics
and that timeouts/network failures become descriptive Err(String) returns
instead of unwrapping.

Comment on lines +131 to +154
async fn chat(input: &str, model_name: &str) -> Result<String, String> {
    let client = Client::new();
    let body = json!({
        "model": model_name,
        "messages": [{"role": "user", "content": input}]
    });
    let res = client
        .post("http://127.0.0.1:6969/v1/chat/completions")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        let text = res.text().await.unwrap();
        let v: Value = serde_json::from_str(&text).unwrap();
        let content = v["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("<no content>");
        Ok(content.to_owned())
    } else {
        Err(String::from("request failed"))
    }
}

⚠️ Potential issue | 🟠 Major

Avoid unwraps; parse response safely and surface non‑200s.

Also add a longer timeout for generation requests.

 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
-    let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+    let res = client
+        .post(format!("{}/v1/chat/completions", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
         let content = v["choices"][0]["message"]["content"]
             .as_str()
             .unwrap_or("<no content>");
         Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 131 to 154, replace the current chat
function's unwraps and silent failure on non-200 by building the reqwest Client
with a longer timeout, using awaited results with ?/map_err (or match) to
propagate errors instead of unwrap, checking response.status().is_success() and
on non-success returning an Err that includes the status code and response body,
and parsing JSON with serde_json::from_str handling its Result so missing fields
produce a clear Err; finally extract the content by safely indexing with
get(...) and as_str().map(|s| s.to_owned()) or return an Err if absent—this
ensures timeouts are extended and all network/parse errors are surfaced instead
of panicking.

While loading the model we create the necessary files that track
the memory_path, and then pass the memory_path to the py server
in the /start API, instead of the py server handling the creation
of the memory_path. This allows us to configure the memory_path
later via the CLI.
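A hypothetical illustration of the resulting /start payload, with field names taken from the walkthrough's "POST /start { model, memory_path }"; the path value is only an example and the concrete schema lives in server/api.py:

import requests

requests.post(
    "http://127.0.0.1:6969/start",
    json={
        "model": "driaforall/mem-agent",
        "memory_path": "/Users/me/tiles_memory",  # created by the CLI beforehand
    },
    timeout=60,
)
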
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (9)
src/runner/mlx.rs (4)

11-18: Guard against missing FROM and relax mem‑agent detection.

The unwrap() at line 12 will panic if modelfile.from is None. The exact prefix match at line 13 is also brittle and will fail if the model name changes slightly.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


75-106: Don't unwrap in the interactive server path; handle failures gracefully.

Multiple unwrap() calls at lines 81, 82, 83, 87, and 89 can panic if the server is down, the modelfile is missing FROM, or IO operations fail. This will crash the CLI instead of allowing graceful recovery.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


115-132: Return proper HTTP errors; add timeout and use is_success().

Line 126 unwraps the network request which can panic. Line 127 uses exact status comparison instead of is_success(). No timeout is configured, so requests can hang indefinitely.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.


134-157: Avoid unwraps; parse response safely and surface non‑200s.

Multiple unwrap() calls at lines 145, 148, and 149 can panic on network failures or malformed JSON. Line 147 uses exact status comparison. No timeout is configured for what could be a long-running generation request.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.

server/api.py (5)

46-46: Critical: Fix invalid type/initialization for _runner.

This issue was flagged in previous reviews but remains unresolved. The global _runner is typed as MLXRunner but initialized to {} (an empty dict), which will cause type errors.

Apply this diff:

-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None

132-132: Critical: Fix invalid agent instantiation syntax.

This issue was flagged in previous reviews. Line 132 agent: Agent() is invalid—it invokes Agent() at import time without properly binding the variable.

Apply this diff:

-agent: Agent()
+agent: Optional[Agent] = None

Then instantiate the agent inside the function that needs it (e.g., within start_model or as a module-level singleton agent = Agent()).


134-177: Fix model path validation and remove debug prints.

Several issues remain from previous reviews:

  1. Lines 138, 152-153: Debug print statements should be removed (potential PII).
  2. Line 143: model_path.exists() can raise AttributeError if get_model_path returns None for model_path (per cache_utils.py line 186, it can return (None, model_name, commit_hash)).
  3. Line 142: commit_hash is unpacked but never used.
  4. Lines 145-146: Broad exception handling without proper chaining.

Apply this diff:

-    print(model_spec)
     # Use the existing model path resolution from cache_utils
 
     try:
-        model_path, model_name, commit_hash = get_model_path(model_spec)
-        if not model_path.exists():
+        model_path, model_name, _commit_hash = get_model_path(model_spec)
+        if not model_path or not Path(model_path).exists():
             raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
     except Exception as e:
-        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")
+        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}") from e
 
     # Check if it's an MLX model
 
     model_path_str = str(model_path)
 
-    print(_current_model_path)
-    print(model_path_str)
     # Check if we need to load a different model

Also add the Path import at the top:

from pathlib import Path

195-207: Remove PII prints and avoid global conversation state.

This endpoint has issues flagged in previous reviews:

  1. Line 199: print(str(request)) leaks PII and should be removed.
  2. Line 200: Resetting global _messages will cause conversations from concurrent requests to interfere with each other.

Apply this diff:

-    global _messages, _runner,_memory_path
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner, _memory_path
     _memory_path = request.memory_path
     try:
         _runner = get_or_load_model(request.model)

Note: Conversation history should be managed per-request or per-session (e.g., passed in the request body or stored with a session ID) rather than in global state.


208-313: Critical: Fix blocking calls, global state, and PII leaks.

This endpoint has multiple unresolved issues from previous reviews:

  1. Lines 234-241, 270-272: runner.generate_batch() is CPU-bound and blocks the asyncio event loop. This will make the server unresponsive during generation.

  2. Lines 229, 266, 279: Mutating global _messages causes conversations from concurrent clients to interfere with each other.

  3. Lines 251, 261-262, 273: Debug print statements leak PII (generated text, results).

  4. Line 275: thoughts is assigned but never used.

  5. Line 301: No fallback when reply is empty, resulting in blank content.

Apply this comprehensive diff:

+from starlette.concurrency import run_in_threadpool
+
 @app.post("/v1/chat/completions")
 async def create_chat_completion(request: ChatCompletionRequest):
     """Create a chat completion."""
-    global _messages, _max_tool_turns, _memory_path
+    global _max_tool_turns, _memory_path
     try:
         runner = get_or_load_model(request.model)
 
         # Non-streaming response
         completion_id = f"chatcmpl-{uuid.uuid4()}"
         created = int(time.time())
 
-        # Convert messages to dict format for runner
-        # _messages.append(system_message)
-        _messages.extend(request.messages)
-        message_dicts = format_chat_messages_for_runner(_messages)
+        # Build per-request message history (avoid shared globals)
+        messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+        message_dicts = format_chat_messages_for_runner(messages)
         # Let the runner format with chat templates
         prompt = runner._format_conversation(message_dicts, use_chat_template=True)
 
-        generated_text = runner.generate_batch(
+        generated_text = await run_in_threadpool(
+            runner.generate_batch,
             prompt=prompt,
-            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-            temperature=request.temperature,
-            top_p=request.top_p,
-            repetition_penalty=request.repetition_penalty,
-            use_chat_template=False  # Already applied in _format_conversation
+            runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+            request.temperature,
+            request.top_p,
+            request.repetition_penalty,
+            20,  # repetition_context_size
+            False,  # use_chat_template
+            False,  # interactive
         )
 
-        thoughts = extract_thoughts(generated_text)
         reply = extract_reply(generated_text)
         python_code = extract_python_code(generated_text)
-        print(generated_text)
         result = ({}, "")
         if python_code:
             create_memory_if_not_exists()
@@ -260,26 +252,25 @@
                 import_module="server.mem_agent.tools",
             )
 
-        print(reply)
-        print(str(result))        
 
         remaining_tool_turns = _max_tool_turns
         while remaining_tool_turns > 0 and not reply:
-            _messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
-            message_dicts = format_chat_messages_for_runner(_messages)
+            messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
+            message_dicts = format_chat_messages_for_runner(messages)
             # Let the runner format with chat templates
             prompt = runner._format_conversation(message_dicts, use_chat_template=True)
-            generated_text = runner.generate_batch(
-                prompt=prompt
+            generated_text = await run_in_threadpool(
+                runner.generate_batch,
+                prompt,
+                None, 0.7, 0.9, 1.1, 20, False, False
             )
-            print(generated_text)
             # Extract the thoughts, reply and python code from the response
-            thoughts = extract_thoughts(generated_text)
             reply = extract_reply(generated_text)
             python_code = extract_python_code(generated_text)
 
-            _messages.append(ChatMessage(role="assistant", content=generated_text))
+            messages.append(ChatMessage(role="assistant", content=generated_text))
             if python_code:
                 create_memory_if_not_exists()
                 result = execute_sandboxed_code(
@@ -298,7 +289,7 @@
                 {
                     "index": 0,
                     "message": {
                         "role": "assistant",
-                        "content": reply
+                        "content": reply or generated_text.strip()
                     },
                     "finish_reason": "stop"
                 }
🧹 Nitpick comments (1)
src/runner/mlx.rs (1)

108-113: Remove or document the unused ping function.

The ping function is not called anywhere in the codebase. If it's intended for future use or debugging, consider marking it with #[allow(dead_code)] and adding a doc comment explaining its purpose. Otherwise, remove it to reduce maintenance burden.
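
A minimal sketch of the "keep but document" option, assuming the daemon exposes a simple health route (the /ping path and the Result<(), String> signature here are illustrative, not taken from the actual helper):

/// Health check against the mem-agent daemon; currently unused,
/// kept for manual debugging of the server connection.
#[allow(dead_code)]
async fn ping() -> Result<(), String> {
    // reqwest::get is the convenience wrapper; a shared Client with a
    // timeout would also work here.
    let res = reqwest::get("http://127.0.0.1:6969/ping")
        .await
        .map_err(|e| e.to_string())?;
    if res.status().is_success() {
        Ok(())
    } else {
        Err(format!("ping failed: {}", res.status()))
    }
}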

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e17f41 and 01ea71e.

📒 Files selected for processing (2)
  • server/api.py (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • src/runner/mlx.rs
🧬 Code graph analysis (2)
src/runner/mlx.rs (3)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (4)
  • new (88-90)
  • new (107-119)
  • from_str (64-71)
  • from_str (235-237)
server/api.py (1)
  • ping (192-193)
server/api.py (5)
server/cache_utils.py (1)
  • get_model_path (182-201)
server/mlx_runner.py (5)
  • cleanup (321-357)
  • load_model (128-180)
  • _format_conversation (765-794)
  • generate_batch (584-689)
  • get_effective_max_tokens (359-389)
server/mem_agent/utils.py (5)
  • extract_python_code (152-172)
  • extract_reply (175-182)
  • extract_thoughts (185-192)
  • create_memory_if_not_exists (57-71)
  • format_results (195-203)
server/mem_agent/engine.py (1)
  • execute_sandboxed_code (200-314)
src/runner/mlx.rs (2)
  • load_model (115-132)
  • ping (108-113)
🪛 Ruff (0.14.1)
server/api.py

111-111: Unused method argument: memory_path

(ARG002)


111-111: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: model

(ARG002)


113-113: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


114-114: Unused method argument: predetermined_memory_path

(ARG002)


115-115: Unused method argument: model_cache

(ARG002)


115-115: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


116-116: Unused method argument: current_model_path

(ARG002)


117-117: Unused method argument: default_max_tokens

(ARG002)


142-142: Unpacked variable commit_hash is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


144-144: Abstract raise to an inner function

(TRY301)


145-145: Do not catch blind exception: Exception

(BLE001)


146-146: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


146-146: Use explicit conversion flag

Replace with conversion flag

(RUF010)


162-163: try-except-pass detected, consider logging the exception

(S110)


162-162: Do not catch blind exception: Exception

(BLE001)


204-204: Consider moving this statement to an else block

(TRY300)


205-205: Do not catch blind exception: Exception

(BLE001)


206-206: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


275-275: Local variable thoughts is assigned to but never used

Remove assignment to unused variable thoughts

(F841)


312-312: Do not catch blind exception: Exception

(BLE001)


313-313: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +107 to +129
class Agent:
    def __init__(
        self,
        max_tool_turns: int = 20,
        memory_path: str = None,
        use_vllm: bool = False,
        model: str = None,
        predetermined_memory_path: bool = False,
        model_cache: Dict[str, MLXRunner] = {},
        current_model_path: Optional[str] = None,
        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default

    ):
        # Load the system prompt and add it to the conversation history
        self.system_prompt = SYSTEM_PROMPT
        self.messages: list[ChatMessage] = [
            ChatMessage(role="system", content=self.system_prompt)
        ]

        # Set the maximum number of tool turns and use_vllm flag
        self.max_tool_turns = max_tool_turns
        self.use_vllm = use_vllm

🛠️ Refactor suggestion | 🟠 Major

Remove unused Agent parameters or complete implementation.

The Agent class declares several unused parameters: memory_path, model, predetermined_memory_path, model_cache, current_model_path, and default_max_tokens. Additionally, line 115 uses a mutable default argument which is an anti-pattern in Python.

Apply this diff to clean up the unused parameters and fix the mutable default:

 class Agent:
     def __init__(
         self,
         max_tool_turns: int = 20,
-        memory_path: str = None,
         use_vllm: bool = False,
-        model: str = None,
-        predetermined_memory_path: bool = False,
-        model_cache: Dict[str, MLXRunner] = {},
-        current_model_path: Optional[str] = None,
-        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default
-
     ):
         # Load the system prompt and add it to the conversation history
         self.system_prompt = SYSTEM_PROMPT
         self.messages: list[ChatMessage] = [
             ChatMessage(role="system", content=self.system_prompt)
         ]
 
         # Set the maximum number of tool turns and use_vllm flag
         self.max_tool_turns = max_tool_turns
         self.use_vllm = use_vllm

If these parameters are intended for future use, implement them in the body; otherwise, remove them to avoid confusion.

🧰 Tools
🪛 Ruff (0.14.1)

111-111: Unused method argument: memory_path

(ARG002)


111-111: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


113-113: Unused method argument: model

(ARG002)


113-113: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


114-114: Unused method argument: predetermined_memory_path

(ARG002)


115-115: Unused method argument: model_cache

(ARG002)


115-115: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


116-116: Unused method argument: current_model_path

(ARG002)


117-117: Unused method argument: default_max_tokens

(ARG002)

🤖 Prompt for AI Agents
In server/api.py around lines 107 to 129, the Agent __init__ declares unused
parameters (memory_path, model, predetermined_memory_path, model_cache,
current_model_path, default_max_tokens) and uses a mutable default for
model_cache; either remove the unused parameters or wire them to instance state,
and fix the mutable default by changing model_cache: Optional[Dict[str,
MLXRunner]] = None and then inside the body set self.model_cache = model_cache
or {}. If you keep parameters for future use, assign them to self (e.g.,
self.memory_path = memory_path, self.model = model, etc.) and ensure
default_max_tokens is stored or documented; otherwise drop them from the
signature to avoid confusion.

    // build the arg list from modelfile
    let mut args: Vec<String> = vec![];
    args.push("--model".to_owned());
    args.push(modelfile.from.unwrap());

⚠️ Potential issue | 🔴 Critical

Don't unwrap modelfile.from; handle the missing FROM case.

Line 24 will panic if modelfile.from is None, which is inconsistent with the good error handling elsewhere in this function.

Apply this diff to handle the missing FROM case gracefully:

-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 24, don't unwrap modelfile.from; instead handle
the None case gracefully: check whether modelfile.from is Some and push its
value into args, or if None return/propagate a proper error (with a clear
message consistent with other errors in this function) rather than panicking;
use pattern matching (if let/ match) or map_err to convert the missing FROM into
the function's error type and include context about the missing FROM field.

}

fn get_memory_path() -> Result<String> {
    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;

⚠️ Potential issue | 🟠 Major

Replace deprecated env::home_dir() with the dirs crate.

std::env::home_dir() has been deprecated since Rust 1.29. It can produce incorrect results on Windows and has platform-specific issues. Use the dirs crate (specifically dirs::home_dir()) or the home crate instead.

First, add the dependency to Cargo.toml:

dirs = "5.0"

Then apply this diff:

 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {

The existing use std::{env, fs}; import can stay as written, since env::var("XDG_CONFIG_HOME") is still used just below; the only addition is the new dirs dependency.
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 160, replace the deprecated
std::env::home_dir() call with dirs::home_dir(): add the dirs = "5.0" dependency
to Cargo.toml, update the imports to use dirs::home_dir (and remove or stop
using std::env::home_dir), call dirs::home_dir().context("Failed to fetch
$HOME")? in place of the old call, and adjust any use of the returned
PathBuf/Option handling if necessary to preserve the existing error path.

        fs::create_dir_all(&tiles_config_dir).context("Failed to create tiles config directory")?;
        fs::write(
            tiles_config_dir.join(".memory_path"),
            memory_path.to_str().unwrap(),

⚠️ Potential issue | 🟠 Major

Don't unwrap to_str(); handle non‑UTF-8 paths.

Line 190 will panic if the memory path contains invalid UTF-8 characters. Use to_string_lossy() (as done correctly at line 193) or propagate an error instead.

Apply this diff:

         fs::write(
             tiles_config_dir.join(".memory_path"),
-            memory_path.to_str().unwrap(),
+            memory_path.to_string_lossy().as_ref(),
         )
         .context("Failed to write the default path to .memory_path")?;
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 190, the call memory_path.to_str().unwrap() can
panic on non‑UTF8 paths; replace it with a non‑panicking approach (use
memory_path.to_string_lossy() like at line 193, or propagate a Result error) so
invalid UTF‑8 is handled safely—update the argument usage to accept the lossy
string (or change the function signature to return/propagate an error) to avoid
unwrap() panics.

- Added rust-toolchain.toml to pin a consistent Rust toolchain
- Added a justfile for running common commands
- Updated CI to use the pinned toolchain and just, and to cache cargo artifacts

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/runner/mlx.rs (2)

55-73: Do not std::process::exit in library code; return early instead.

Exiting the process here harms callers and tests. Log and return.

         Err(e) => {
             if e.kind() == std::io::ErrorKind::NotFound {
                 eprintln!("❌ Error: mlx_lm.chat command not found");
                 eprintln!("💡 Hint: Install mlx-lm by running: pip install mlx-lm");
                 eprintln!("📝 Note: mlx-lm is only available on macOS with Apple Silicon");
-                std::process::exit(1);
+                return;
             } else {
                 eprintln!("❌ Error: Failed to spawn mlx_lm.chat: {}", e);
-                std::process::exit(1);
+                return;
             }
         }

11-106: Address panic paths and deprecated APIs in src/runner/mlx.rs

Verification confirms multiple issues that require fixes:

  1. 12 unwrap() calls panic on error (lines 12, 24, 81, 82, 83, 87, 89, 126, 145, 148, 149, 190): Replace with proper error handling using ?, .context(), or .map_err().

  2. Deprecated env::home_dir() (line 160): Replace with dirs::home_dir() or std::env::var("HOME").

  3. Hard-coded daemon URLs (lines 110, 122, 141 with http://127.0.0.1:6969): Extract to a constant or configuration.

♻️ Duplicate comments (7)
src/runner/mlx.rs (7)

20-26: Avoid unwrap on FROM in subprocess path.

Gracefully handle missing FROM rather than panicking.

-    args.push("--model".to_owned());
-    args.push(modelfile.from.unwrap());
+    args.push("--model".to_owned());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);

11-18: Handle missing FROM and relax mem‑agent detection; don’t drop errors.

Unwrap can panic; make detection flexible and surface server errors.

-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Server path failed: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.to_ascii_lowercase().contains("mem-agent")
+}

75-106: Interactive server path: replace unwraps, surface errors, and keep session alive.

Multiple unwraps can panic; also prefer anyhow::Result for consistency.

-async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
+async fn run_model_with_server(modelfile: Modelfile) -> Result<()> {
     let stdin = io::stdin();
     let mut stdout = io::stdout();
     // loading the model from mem-agent via daemon server
-    let memory_path = get_memory_path()
-        .context("Retrieving memory_path failed")
-        .unwrap();
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(modelname, &memory_path).await.unwrap();
+    let memory_path = get_memory_path().context("Retrieving memory_path failed")?;
+    let Some(modelname) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return Ok(());
+    };
+    if let Err(e) = load_model(modelname, &memory_path).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
     println!("Running in interactive mode");
     loop {
         print!(">> ");
-        stdout.flush().unwrap();
+        if let Err(e) = stdout.flush() {
+            eprintln!("⚠️ flush failed: {}", e);
+        }
         let mut input = String::new();
-        stdin.read_line(&mut input).unwrap();
+        if let Err(e) = stdin.read_line(&mut input) {
+            eprintln!("⚠️ read_line failed: {}", e);
+            continue;
+        }
         let input = input.trim();
         match input {
             "exit" => {
                 println!("Exiting interactive mode");
                 break;
             }
             _ => {
-                if let Ok(response) = chat(input, modelname).await {
-                    println!(">> {}", response)
-                } else {
-                    println!(">> failed to respond")
-                }
+                match chat(input, modelname).await {
+                    Ok(response) => println!(">> {}", response),
+                    Err(e) => println!(">> failed to respond: {}", e),
+                }
             }
         }
     }
     Ok(())
 }

115-132: HTTP start: add timeout, check is_success, remove unwraps, include status/body.

Avoid panics and provide actionable errors.

 async fn load_model(model_name: &str, memory_path: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "memory_path": memory_path
     });
     let res = client
-        .post("http://127.0.0.1:6969/start")
+        .post(format!("{}/start", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    if res.status() == 200 {
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }

134-157: HTTP chat: add timeout, check is_success, parse JSON safely, remove unwraps.

Prevent panics; return clear errors on malformed responses.

 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
     let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
-        let content = v["choices"][0]["message"]["content"]
-            .as_str()
-            .unwrap_or("<no content>");
-        Ok(content.to_owned())
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
+        let content = v
+            .get("choices").and_then(|c| c.get(0))
+            .and_then(|c0| c0.get("message"))
+            .and_then(|m| m.get("content"))
+            .and_then(|c| c.as_str())
+            .ok_or_else(|| "missing choices[0].message.content".to_string())?;
+        Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }

159-165: Replace deprecated env::home_dir() with dirs::home_dir().

Avoid deprecated API and platform quirks.

 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {

Add to Cargo.toml:

[dependencies]
dirs = "5"

189-191: Don’t unwrap to_str(); handle non‑UTF‑8 paths.

Use to_string_lossy() like you already do on Line 193.

         fs::write(
             tiles_config_dir.join(".memory_path"),
-            memory_path.to_str().unwrap(),
+            memory_path.to_string_lossy().as_ref(),
         )
         .context("Failed to write the default path to .memory_path")?;
🧹 Nitpick comments (5)
justfile (1)

9-12: Consider adding Python checks for the new server.

The PR introduces a FastAPI server (server/), but the justfile only validates Rust code. Consider adding targets for Python linting, formatting (e.g., ruff, black), and type checking (e.g., mypy) to maintain consistency across the polyglot codebase.

.github/workflows/rust.yml (1)

19-19: Pin the rust-toolchain action to a stable version.

Using @master makes the workflow vulnerable to unexpected breaking changes. Pin to a specific version tag or use a stable channel reference.

Apply this diff to pin to a stable version:

-    - uses: dtolnay/rust-toolchain@master
+    - uses: dtolnay/rust-toolchain@stable

Alternatively, if you need a specific toolchain version (e.g., from rust-toolchain.toml), you can use:

    - uses: dtolnay/[email protected]
      with:
        toolchain: stable
src/runner/mlx.rs (3)

1-7: Introduce a default BASE_URL and request timeouts; avoid magic strings.

Centralize the daemon URL and prepare for timeouts in HTTP helpers.

 use anyhow::{Context, Result};
 use reqwest::Client;
 use serde_json::{Value, json};
 use std::io::Write;
 use std::path::PathBuf;
 use std::{env, fs};
+use std::time::Duration;
 use std::{io, process::Command};
 
+// Default mem-agent daemon URL; allow overriding via env in helpers if desired.
+const DEFAULT_BASE_URL: &str = "http://127.0.0.1:6969";

176-184: Trim newline when reading .memory_path.

fs::read_to_string includes a trailing newline; trim to avoid invalid paths.

-    if tiles_config_dir.is_dir()
-        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
-    {
-        memory_path = content;
+    if tiles_config_dir.is_dir()
+        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
+    {
+        memory_path = content.trim().to_string();
         is_memory_path_found = true;
     }

115-157: Optional: allow overriding base URL via env.

Keep DEFAULT_BASE_URL but let users set TILES_MEM_AGENT_URL externally.

-        .post(format!("{}/start", DEFAULT_BASE_URL))
+        .post(format!("{}/start", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))
@@
-        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
+        .post(format!("{}/v1/chat/completions", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))

Also applies to: 75-106

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01ea71e and ac9d39a.

📒 Files selected for processing (4)
  • .github/workflows/rust.yml (1 hunks)
  • justfile (1 hunks)
  • rust-toolchain.toml (1 hunks)
  • src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}

⚙️ CodeRabbit configuration file

Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.

Files:

  • rust-toolchain.toml
  • src/runner/mlx.rs
🧬 Code graph analysis (1)
src/runner/mlx.rs (2)
src/commands/mod.rs (1)
  • run (11-18)
src/core/modelfile.rs (2)
  • new (88-90)
  • new (107-119)
🔇 Additional comments (4)
rust-toolchain.toml (1)

1-3: Rust 1.90.0 is a valid release. Rust 1.90.0 is an official stable release (released September 18, 2025). The toolchain configuration is well-formed, follows Rust best practices for reproducible builds, and specifies appropriate standard components.

justfile (1)

1-12: LGTM! Clean and idiomatic justfile structure.

The justfile provides a standard Rust development workflow with appropriate settings for CI:

  • Format checking with --check flag
  • Clippy with warnings-as-errors
  • Sequential execution that fails fast
.github/workflows/rust.yml (2)

20-30: LGTM! Cargo caching properly configured.

The cache configuration follows best practices by caching cargo binaries, registry, git dependencies, and build artifacts. The cache key based on Cargo.lock ensures cache invalidation when dependencies change, and the restore-keys provide good fallback behavior.


32-36: LGTM! Integration with justfile is clean.

The installation check for just and subsequent execution of just check properly centralizes the build/test logic, making it consistent between local development and CI.
