Adding local memory management #7
base: main
Conversation
- `/start` — loads the model into memory and prepares the internal agent
- `/completions` — now executes the Python functions and updates the memory
- Native interactive interface when the model is from the mem-agent family
- Communication with the Python server for model loading and chat (see the client sketch below)
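As a concrete illustration of the two endpoints, here is a minimal Python client sketch. The port and JSON field names mirror the Rust client quoted later in this review; the exact response shape is an assumption based on the OpenAI-style `/v1/chat/completions` contract.

```python
# Minimal sketch of driving the mem-agent server, assuming it runs on
# 127.0.0.1:6969 as in this PR; field names mirror the Rust client's bodies.
import requests

BASE_URL = "http://127.0.0.1:6969"

# /start loads the model and prepares the internal agent.
requests.post(f"{BASE_URL}/start", json={"model": "driaforall/mem-agent"}).raise_for_status()

# /v1/chat/completions runs one turn; the server executes any <python>
# tool calls against the memory directory before replying.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "driaforall/mem-agent",
        "messages": [{"role": "user", "content": "Remember that my editor is Helix."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```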
📝 Walkthrough

Adds a new Python FastAPI memory-agent server under server/ with model loading/caching, MLXRunner integration, sandboxed Python execution, filesystem memory utilities, model cache/health tools, reasoning-parsing utilities, and a system prompt. Adds a server entrypoint, pyproject, and packaging/tooling files. Updates the Rust CLI to async via tokio, changes the runner to optionally use the mem-agent server (with a subprocess fallback), updates command invocation sites, and swaps the model reference in b.modelfile. CI and tooling are adjusted (GitHub Actions, justfile, rust-toolchain). Cargo.toml gained async/HTTP/serde/error-handling dependencies.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant CLI as Rust CLI (async)
participant Server as FastAPI Server
participant Cache as cache_utils
participant MLX as MLXRunner
participant Memory as mem_agent/tools + utils
participant Sandbox as mem_agent/engine
participant Reasoning as reasoning_utils
CLI->>Server: POST /start { model, memory_path }
Server->>Cache: resolve_single_model(model)
Cache-->>Server: model_path
Server->>MLX: get_or_load_model(model_path)
MLX-->>Server: runner ready
CLI->>Server: POST /v1/chat/completions { messages, params }
Server->>Memory: create_memory_if_not_exists(memory_path)
Server->>MLX: format_chat_messages_for_runner(messages)
Server->>MLX: generate_batch / generate_streaming
MLX-->>Server: response (may include <python> / reasoning)
Server->>Reasoning: extract reasoning & final_answer
alt response contains python
Server->>Memory: read_file(s) for context
Server->>Sandbox: execute_sandboxed_code(python, allowed_path, ...)
Sandbox->>Memory: perform tools (read/write/list/delete)
Memory-->>Sandbox: tool results
Sandbox-->>Server: (locals, error)
Server->>MLX: provide tool results and iterate (tool turn)
end
Server-->>CLI: ChatCompletionResponse (assistant reply)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 34
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Cargo.toml (1)
4-4: Invalid Rust edition; compilation will fail.

The `edition` field must be one of `2015`, `2018`, or `2021`. The value `2024` does not exist and will cause a compilation error. Apply this fix:

```diff
-edition = "2024"
+edition = "2021"
```
🧹 Nitpick comments (15)
server/.gitignore (1)
1-3: Solid foundation, but consider adding `.env` for secrets.

The existing entries are appropriate for a Python project. However, since this PR introduces a FastAPI server (which typically requires environment configuration for API endpoints, model paths, secrets, etc.), consider adding common patterns to prevent accidental commits of sensitive data.
Consider enhancing the .gitignore with these additions:
```diff
 __pycache__/
 *.egg-info/
 .venv/
+.env
+.env.local
+.pytest_cache/
+dist/
+build/
```

Rationale:

- `.env` / `.env.local`: Protects secrets (API keys, model cache paths, credentials) from being committed.
- `.pytest_cache/`: Covers test metadata if pytest is used for validation.
- `dist/` / `build/`: Standard for Python packages if future packaging/distribution is planned.

Cargo.toml (2)
9-9: Specify explicit TLS backend for reproducible builds.

`reqwest` defaults to `native-tls`, but for deterministic and reproducible builds, explicitly choose between `rustls` or `native-tls` as recommended in best practices. Replace with one of these options:

```diff
-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "rustls-tls"] }
```

or if you prefer native TLS:

```diff
-reqwest = { version = "0.12", features = ["json", "blocking"] }
+reqwest = { version = "0.12", features = ["json", "blocking", "native-tls"] }
```

Note: This assumes the `blocking` feature is actually used in the codebase. If only async calls are made, consider removing it to reduce dependency bloat.
13-13: Pin tokio to a stable minor version for production consistency.

Using `version = "1"` (major only) allows any 1.x minor/patch, which can introduce unexpected behavioral changes across environments. Per best practices, pin to a specific LTS minor like `~1.43` or `~1.47`. Apply this change:

```diff
-tokio = { version = "1" , features = ["macros", "rt-multi-thread"]}
+tokio = { version = "~1.47", features = ["macros", "rt-multi-thread"]}
```

Alternatively, you can use `1.47` to allow patch updates within 1.47.x while staying stable for minor versions.

server/mem_agent/__init__.py (1)
1-1: Consider adding a module docstring for clarity.

This is an empty package initializer, which is correct. As an optional improvement, adding a brief module docstring would help developers understand the purpose of the mem_agent subsystem when exploring the codebase.
Example:

```python
"""
Memory agent subsystem for local memory management and reasoning.

Provides engines, tools, and utilities for sandboxed code execution,
memory persistence, and multi-step reasoning workflows.
"""
```

server/pyproject.toml (1)
6-11: Move Black to dev extras and add minimal pins; consider uvicorn[standard].

Black is a formatter and shouldn't be a runtime dependency. Add minimal version pins for stability and use uvicorn[standard] to pull performant wheels (uvloop/httptools) by default.
Apply this diff:
```diff
 [project]
 name = "server"
 version = "0.1.0"
 description = "Local MLX inference server for the Tiles CLI"
 requires-python = ">=3.10"
 dependencies = [
-    "fastapi",
-    "uvicorn",
-    "mlx-lm",
-    "black"
+    "fastapi>=0.95.2",
+    "uvicorn[standard]>=0.34",
+    "mlx-lm"
 ]
+
+[project.optional-dependencies]
+dev = [
+    "black>=24.3.0"
+]
```

Based on learnings.
server/main.py (1)
11-15: Tidy runner and enable basic logging.

Add log_level for visibility and drop the commented PID code (or implement it). Keeping 127.0.0.1 is fine for local use.

```diff
-    uvicorn.run(app, host="127.0.0.1", port=PORT)
+    uvicorn.run(app, host="127.0.0.1", port=PORT, log_level="info")
```

src/main.rs (1)
22-34: Optional: add structured logging early.

A basic tracing subscriber helps diagnose async issues.

```diff
 use clap::{Parser, Subcommand};
 mod commands;
+use tracing_subscriber::{fmt, EnvFilter};
@@
 #[tokio::main]
 pub async fn main() -> Result<(), Box<dyn Error>> {
+    let _ = fmt().with_env_filter(EnvFilter::from_default_env()).try_init();
```

src/runner/mlx.rs (3)
1-5: HTTP timeouts + remove unused import.

Add request timeouts (avoid hangs) and drop unused NulError.

```diff
-use reqwest::Client;
+use reqwest::Client;
 use serde_json::{Value, json};
-use std::ffi::NulError;
 use std::io::Write;
-use std::{io, process::Command};
+use std::{io, process::Command, time::Duration};
```

And add a base URL constant for reuse:

```diff
 use crate::core::modelfile::Modelfile;
+const BASE_URL: &str = "http://127.0.0.1:6969";
```
18-71: Blocking subprocess in async context and deep std::process::exit.

Spawning and waiting with std::process blocks a Tokio worker, and exiting from a library module makes composition/testing hard. Prefer tokio::process::Command and bubble errors up to the caller.

- Switch to `tokio::process::Command` and `.await` the child.
- Change `run_model_by_sub_process` to return `Result<(), anyhow::Error>` (or a custom error) and handle errors at the top of `run`.
- Replace `std::process::exit(1)` with `return Err(e.into())` and print user-friendly hints at the call site.
105-110: Reuse BASE_URL and (optionally) add a timeout for ping.

Small consistency fix.

```diff
-    let res = client.get("http://127.0.0.1:6969/ping").send().await?;
+    let res = client.get(format!("{}/ping", BASE_URL)).send().await?;
```

server/config.py (1)
7-7: Use Path.joinpath or os.path.join for path construction.

String concatenation for path construction is error-prone and platform-dependent. Use Path methods for consistency with the rest of the file. Apply this diff:

```diff
-MEMORY_PATH = os.path.expanduser("~") + "/tiles_memory"
+MEMORY_PATH = str(Path.home() / "tiles_memory")
```

server/model_card.py (1)
24-24: Remove unused noqa directive.

The `UP045` rule is not enabled, making this directive unnecessary. Apply this diff:

```diff
-# ruff: noqa: UP045
```

server/reasoning_utils.py (1)
40-70: Annotate PATTERNS with typing.ClassVar.

The PATTERNS dictionary is a mutable class attribute that should be annotated with `ClassVar` to indicate it's a class-level attribute rather than an instance attribute. Apply this diff:

```diff
+from typing import ClassVar, Dict, Optional, Tuple
+
 class ReasoningExtractor:
     """Extract reasoning and final answer from model outputs."""

     # Model-specific patterns
-    PATTERNS = {
+    PATTERNS: ClassVar[Dict[str, Dict]] = {
         'gpt-oss': {
```

server/mem_agent/engine.py (1)
19-189: Add AST validation before code execution.

The system prompt (server/system_prompt.txt line 269) states: "Your `<python>` block MUST compile under `ast.parse` and yield no `SyntaxError`". However, the sandbox doesn't validate this before execution. Invalid code will fail at exec() time, wasting resources and making error messages less clear. Add AST validation before execution:

```python
import ast

def _run_user_code(
    code: str,
    allow_installs: bool,
    allowed_path: str,
    blacklist: list,
    available_functions: dict,
    log: bool = False,
) -> tuple[dict, str]:
    """Execute code under sandboxed conditions..."""
    # Validate syntax before execution
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {}, f"SyntaxError in code: {e}"
    try:
        # ... rest of the function
```

This provides faster feedback and matches the documented requirement.
server/mlx_runner.py (1)
300-301: Narrow bare except when probing tokenizer special tokens.

Use "except Exception" and optionally log at verbose.

```diff
-        except:
-            pass
+        except Exception as err:
+            if self.verbose:
+                print(f"[DEBUG] Failed to probe token '{token}': {err}")
```
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `server/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (20)
- `Cargo.toml` (1 hunks)
- `b.modelfile` (1 hunks)
- `server/.gitignore` (1 hunks)
- `server/__init__.py` (1 hunks)
- `server/api.py` (1 hunks)
- `server/cache_utils.py` (1 hunks)
- `server/config.py` (1 hunks)
- `server/main.py` (1 hunks)
- `server/mem_agent/__init__.py` (1 hunks)
- `server/mem_agent/engine.py` (1 hunks)
- `server/mem_agent/tools.py` (1 hunks)
- `server/mem_agent/utils.py` (1 hunks)
- `server/mlx_runner.py` (1 hunks)
- `server/model_card.py` (1 hunks)
- `server/pyproject.toml` (1 hunks)
- `server/reasoning_utils.py` (1 hunks)
- `server/system_prompt.txt` (1 hunks)
- `src/commands/mod.rs` (1 hunks)
- `src/main.rs` (1 hunks)
- `src/runner/mlx.rs` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}
⚙️ CodeRabbit configuration file
Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.
Files:
- src/main.rs
- server/pyproject.toml
- src/runner/mlx.rs
- src/commands/mod.rs
- Cargo.toml
🧬 Code graph analysis (8)

src/main.rs (3)
- src/core/modelfile.rs (1): `parse` (253-258)
- src/commands/mod.rs (1): `run` (11-18)
- src/runner/mlx.rs (1): `run` (9-16)

src/runner/mlx.rs (2)
- src/commands/mod.rs (1): `run` (11-18)
- src/core/modelfile.rs (4): `new` (88-90), `new` (107-119), `Modelfile` (106-225), `Modelfile` (94-104)

server/reasoning_utils.py (1)
- server/cache_utils.py (1): `detect_model_type` (277-303)

src/commands/mod.rs (2)
- src/runner/mlx.rs (1): `run` (9-16)
- src/core/modelfile.rs (1): `parse_from_file` (246-251)

server/mlx_runner.py (1)
- server/reasoning_utils.py (6): `ReasoningExtractor` (36-182), `StreamingReasoningParser` (246-430), `detect_model_type` (73-86), `process_token` (200-243), `process_token` (260-414), `finalize` (416-430)

server/mem_agent/tools.py (1)
- server/mem_agent/utils.py (2): `check_size_limits` (35-54), `create_memory_if_not_exists` (57-71)

server/cache_utils.py (2)
- server/model_card.py (2): `read_readme_front_matter` (139-167), `tokenizer_has_chat_template` (170-184)
- server/mlx_runner.py (1): `run_model_enhanced` (955-1049)

server/api.py (5)
- server/cache_utils.py (2): `detect_framework` (241-274), `get_model_path` (182-201)
- server/mlx_runner.py (5): `cleanup` (321-357), `load_model` (128-180), `_format_conversation` (765-794), `generate_batch` (584-689), `get_effective_max_tokens` (359-389)
- server/mem_agent/utils.py (5): `extract_python_code` (152-172), `extract_reply` (175-182), `extract_thoughts` (185-192), `create_memory_if_not_exists` (57-71), `format_results` (195-203)
- server/mem_agent/engine.py (1): `execute_sandboxed_code` (200-314)
- src/runner/mlx.rs (2): `load_model` (112-129), `ping` (105-110)
🪛 LanguageTool
server/system_prompt.txt
[style] ~46-~46: This phrase is redundant. Consider using “outside”.
Context: ...Skip the <think> block - Provide text outside of these tags - Use <reply> when you hav...
(OUTSIDE_OF)
[style] ~257-~257: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 5062 characters long)
Context: ...thout assignment return empty {} results! 9. Wait for Results: After submitti...
(EN_EXCESSIVE_EXCLAMATION)
🪛 Ruff (0.14.1)
server/reasoning_utils.py
40-70: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
server/model_card.py
24-24: Unused noqa directive (non-enabled: UP045)
Remove unused noqa directive
(RUF100)
55-55: Do not catch blind exception: Exception
(BLE001)
165-165: Consider moving this statement to an else block
(TRY300)
166-166: Do not catch blind exception: Exception
(BLE001)
183-183: Do not catch blind exception: Exception
(BLE001)
server/mem_agent/engine.py
37-37: Do not catch blind exception: Exception
(BLE001)
53-55: Abstract raise to an inner function
(TRY301)
53-55: Avoid specifying long messages outside the exception class
(TRY003)
67-69: Abstract raise to an inner function
(TRY301)
67-69: Avoid specifying long messages outside the exception class
(TRY003)
80-82: Abstract raise to an inner function
(TRY301)
80-82: Avoid specifying long messages outside the exception class
(TRY003)
103-104: try-except-pass detected, consider logging the exception
(S110)
103-103: Do not catch blind exception: Exception
(BLE001)
128-128: subprocess call: check for execution of untrusted input
(S603)
134-134: Do not catch blind exception: Exception
(BLE001)
136-138: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
139-139: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
156-156: Use of exec detected
(S102)
157-157: Do not catch blind exception: Exception
(BLE001)
162-162: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
183-183: Do not catch blind exception: Exception
(BLE001)
189-189: Consider moving this statement to an else block
(TRY300)
191-191: Do not catch blind exception: Exception
(BLE001)
194-196: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
197-197: Use explicit conversion flag
Replace with conversion flag
(RUF010)
204-204: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
205-205: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
206-206: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
207-207: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
208-208: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
238-238: subprocess call: check for execution of untrusted input
(S603)
244-244: Do not catch blind exception: Exception
(BLE001)
245-247: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
270-270: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
287-287: subprocess call: check for execution of untrusted input
(S603)
295-297: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
307-307: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
308-308: Do not catch blind exception: Exception
(BLE001)
321-321: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
server/mem_agent/utils.py
70-70: Do not catch blind exception: Exception
(BLE001)
115-115: f-string without any placeholders
Remove extraneous f prefix
(F541)
130-130: Consider moving this statement to an else block
(TRY300)
131-131: Do not use bare except
(E722)
131-133: try-except-pass detected, consider logging the exception
(S110)
144-144: Consider moving this statement to an else block
(TRY300)
146-146: Do not catch blind exception: Exception
(BLE001)
146-146: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
server/mlx_runner.py
72-72: Consider moving this statement to an else block
(TRY300)
110-110: Avoid specifying long messages outside the exception class
(TRY003)
115-115: Consider moving this statement to an else block
(TRY300)
142-143: try-except-pass detected, consider logging the exception
(S110)
142-142: Do not catch blind exception: Exception
(BLE001)
180-180: Avoid specifying long messages outside the exception class
(TRY003)
221-221: Possible hardcoded password assigned to: "token_content"
(S105)
235-235: Possible hardcoded password assigned to: "token_content"
(S105)
300-300: Do not use bare except
(E722)
300-301: try-except-pass detected, consider logging the exception
(S110)
348-349: try-except-pass detected, consider logging the exception
(S110)
348-348: Do not catch blind exception: Exception
(BLE001)
421-421: Avoid specifying long messages outside the exception class
(TRY003)
611-611: Avoid specifying long messages outside the exception class
(TRY003)
761-761: Do not catch blind exception: Exception
(BLE001)
787-787: Consider moving this statement to an else block
(TRY300)
788-788: Do not catch blind exception: Exception
(BLE001)
829-829: Do not catch blind exception: Exception
(BLE001)
854-854: Unpacked variable before_reasoning is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
876-878: try-except-pass detected, consider logging the exception
(S110)
876-876: Do not catch blind exception: Exception
(BLE001)
1047-1047: Do not catch blind exception: Exception
(BLE001)
server/mem_agent/tools.py
26-26: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
40-40: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
49-49: Avoid specifying long messages outside the exception class
(TRY003)
88-88: Abstract raise to an inner function
(TRY301)
88-88: Create your own exception
(TRY002)
88-88: Avoid specifying long messages outside the exception class
(TRY003)
89-89: Do not catch blind exception: Exception
(BLE001)
94-94: Do not catch blind exception: Exception
(BLE001)
95-95: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
95-95: Create your own exception
(TRY002)
95-95: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
96-96: Create your own exception
(TRY002)
96-96: Avoid specifying long messages outside the exception class
(TRY003)
110-110: Consider moving this statement to an else block
(TRY300)
111-111: Do not catch blind exception: Exception
(BLE001)
178-178: Consider moving this statement to an else block
(TRY300)
182-182: Do not catch blind exception: Exception
(BLE001)
183-183: Use explicit conversion flag
Replace with conversion flag
(RUF010)
207-207: Do not catch blind exception: Exception
(BLE001)
230-230: Unused function argument: is_last
(ARG001)
277-277: Do not catch blind exception: Exception
(BLE001)
292-292: Consider moving this statement to an else block
(TRY300)
293-293: Do not catch blind exception: Exception
(BLE001)
328-328: Do not catch blind exception: Exception
(BLE001)
server/cache_utils.py
155-155: Unpacked variable found_model_dir is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
249-249: Unpacked variable pipeline is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
252-253: try-except-pass detected, consider logging the exception
(S110)
252-252: Do not catch blind exception: Exception
(BLE001)
266-266: Do not catch blind exception: Exception
(BLE001)
283-284: try-except-pass detected, consider logging the exception
(S110)
283-283: Do not catch blind exception: Exception
(BLE001)
294-295: try-except-pass detected, consider logging the exception
(S110)
294-294: Do not catch blind exception: Exception
(BLE001)
315-315: Do not catch blind exception: Exception
(BLE001)
376-376: Do not catch blind exception: Exception
(BLE001)
389-389: Do not catch blind exception: Exception
(BLE001)
458-460: try-except-continue detected, consider logging the exception
(S112)
458-458: Do not catch blind exception: Exception
(BLE001)
525-525: Do not catch blind exception: Exception
(BLE001)
572-572: Do not catch blind exception: Exception
(BLE001)
745-745: Starting a process with a shell, possible injection detected
(S605)
971-971: Do not catch blind exception: Exception
(BLE001)
1035-1035: Do not catch blind exception: Exception
(BLE001)
1054-1054: Do not catch blind exception: Exception
(BLE001)
1091-1091: Do not catch blind exception: Exception
(BLE001)
server/api.py
110-110: Unused method argument: memory_path
(ARG002)
110-110: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
112-112: Unused method argument: model
(ARG002)
112-112: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
113-113: Unused method argument: predetermined_memory_path
(ARG002)
114-114: Unused method argument: model_cache
(ARG002)
114-114: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
115-115: Unused method argument: current_model_path
(ARG002)
116-116: Unused method argument: default_max_tokens
(ARG002)
141-141: Unpacked variable commit_hash is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
143-143: Abstract raise to an inner function
(TRY301)
144-144: Do not catch blind exception: Exception
(BLE001)
145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
145-145: Use explicit conversion flag
Replace with conversion flag
(RUF010)
161-162: try-except-pass detected, consider logging the exception
(S110)
161-161: Do not catch blind exception: Exception
(BLE001)
203-203: Consider moving this statement to an else block
(TRY300)
204-204: Do not catch blind exception: Exception
(BLE001)
205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
274-274: Local variable thoughts is assigned to but never used
Remove assignment to unused variable thoughts
(F841)
311-311: Do not catch blind exception: Exception
(BLE001)
312-312: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (5)
server/__init__.py (1)
1-1: File structure and package initialization look good.

An empty `__init__.py` correctly marks this directory as a Python package. Consider whether public APIs from submodules (e.g., from `api.py`, `config.py`) should be re-exported here for improved usability, depending on your module design intent. For now, this approach is reasonable and allows users to import explicitly.

src/main.rs (1)
22-29: Async entrypoint looks good.

Tokio main + awaiting `run` is correct and keeps the original flow intact.
b.modelfile (1)
1-1: No issue found—intentional separation of execution paths.

The repository correctly uses two different modelfiles for two different execution strategies: `memgpt.modelfile` (with the mem-agent prefix) routes to server execution, while `b.modelfile` (with a different model) routes to subprocess execution. The gating logic at line 11 of `src/runner/mlx.rs` is functioning as designed, not misconfigured.

Likely an incorrect or invalid review comment.
src/runner/mlx.rs (1)
9-16: No issues found with mem-agent gating or FROM handling.

The code is safe by design: modelfile.build()? is called in create_modelfile before returning Ok, and build() returns Err if from.is_none(). This guarantee ensures that when `mlx::run()` receives a Modelfile, `from` is always Some, making all three unwraps safe.

Verification confirms:
- mem-agent gating limited to one location (src/runner/mlx.rs:11)
- No other hardcoded model name checks found
- Missing FROM fields already validated before run() is invoked
server/reasoning_utils.py (1)
260-414: Verify streaming parser state machine for edge cases.

The `StreamingReasoningParser.process_token` method has complex state machine logic with buffering, pattern matching, and conditional token skipping. The logic for detecting partial pattern matches (lines 296-317) and conditional skip tokens (lines 392-400) could have edge cases that cause tokens to be incorrectly buffered or dropped.

Consider testing scenarios:
- Reasoning markers split across multiple tokens
- Skip tokens appearing in the middle of content (not just at boundaries)
- Very short tokens that match prefixes of multiple patterns
- Nested or malformed reasoning/final sections
The state machine should handle these gracefully without losing tokens or entering invalid states.
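One way to exercise the split-marker case is a token-feeding harness like the sketch below. The constructor argument, the `<think>`/`</think>` markers, and the assumption that `process_token`/`finalize` return displayable strings (or None) are all placeholders to adapt to the real API.

```python
# Hypothetical pytest-style harness for a reasoning marker split across tokens.
# Assumes process_token(token) returns emitted text (possibly None) and
# finalize() flushes any buffered remainder — adjust to the actual signatures.
from server.reasoning_utils import StreamingReasoningParser

def test_marker_split_across_tokens():
    parser = StreamingReasoningParser(model_type="gpt-oss")  # assumed ctor arg
    # The opening marker arrives split across two tokens.
    tokens = ["<th", "ink>", "step 1 ", "</think>", "final ", "answer"]
    emitted = "".join(parser.process_token(t) or "" for t in tokens)
    emitted += parser.finalize() or ""
    # No visible content should be dropped; markers should not leak through.
    assert "answer" in emitted
    assert "<think>" not in emitted
```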
```python
_runner: MLXRunner = {}
_max_tool_turns = 5
```
Fix invalid globals: _runner type/init.
Initialize to None with Optional type, not {}.
```diff
-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
```
🤖 Prompt for AI Agents
In server/api.py around lines 47 to 49, the global _runner is incorrectly
typed/initialized as "{}"; change its declaration to use an Optional type and
initialize it to None (e.g., _runner: Optional[MLXRunner] = None), and ensure
Optional is imported from typing (and MLXRunner is in scope or imported) so the
global reflects a nullable runner rather than an empty dict.
```python
agent: Agent()


def get_or_load_model(model_spec: str, verbose: bool = False) -> MLXRunner:
```
This line executes Agent() at import time and leaves ‘agent’ unbound.
Change to a proper variable or remove.
```diff
-agent: Agent()
+# Optional: instantiate later if needed
+agent: Optional[Agent] = None
```
🤖 Prompt for AI Agents
In server/api.py around lines 131-133, the line "agent: Agent()" is
invalid/undesirable because it invokes Agent() at import time and doesn't
properly bind the variable; replace it with either a proper type annotation and
lazy init like "agent: Optional[Agent] = None" and instantiate Agent() inside
the function that needs it (e.g., get_or_load_model) or if a module-level
singleton is intended, change it to an explicit assignment "agent = Agent()".
Ensure imports for Optional are added if using the first option.
```python
        model_path, model_name, commit_hash = get_model_path(model_spec)
        if not model_path.exists():
            raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
    except Exception as e:
        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")

    # Check if it's an MLX model

    model_path_str = str(model_path)

    print(_current_model_path)
```
Robust model path existence check and remove prints.
Avoid AttributeError when get_model_path returns None; drop noisy prints.
```diff
-    print(model_spec)
@@
-    model_path, model_name, commit_hash = get_model_path(model_spec)
-    if not model_path.exists():
+    model_path, model_name, commit_hash = get_model_path(model_spec)
+    if not model_path or not Path(model_path).exists():
         raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
@@
-    print(_current_model_path)
-    print(model_path_str)
```
🧰 Tools
🪛 Ruff (0.14.1)
141-141: Unpacked variable commit_hash is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
143-143: Abstract raise to an inner function
(TRY301)
144-144: Do not catch blind exception: Exception
(BLE001)
145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
145-145: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
In server/api.py around lines 141 to 151, get_model_path may return None causing
an AttributeError when calling model_path.exists(), and there is a stray
print(_current_model_path); change the logic to first verify get_model_path
returned a non-None tuple and that model_path is not None before calling
.exists(), raise a 404 with a clear message if model_path is None or does not
exist, catch only expected exceptions (or re-raise unexpected ones) rather than
blanket-catching Exception for control flow, and remove the print statement
(replace with appropriate logging if needed).
```python
@app.get("/ping")
async def ping():
    return {"message": "Badda-Bing Badda-Bang"}


@app.post("/start")
async def start_model(request: StartRequest):
    """Load the model and start the agent"""
    global _messages, _runner
    print(str(request))
    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]

    try:
        _runner = get_or_load_model(request.model)
        return {"message": "Model loaded"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
/start resets global _messages; drop shared state and PII prints.
Globals will mix conversations across clients and leak content. Maintain history per request instead.
```diff
-    global _messages, _runner
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner
     try:
         _runner = get_or_load_model(request.model)
         return {"message": "Model loaded"}
```

🧰 Tools
🪛 Ruff (0.14.1)
203-203: Consider moving this statement to an else block
(TRY300)
204-204: Do not catch blind exception: Exception
(BLE001)
205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In server/api.py around lines 190 to 206, the start_model endpoint currently
prints the incoming request (potential PII) and resets the global _messages
which can mix conversations across clients; remove the print(str(request)) call
and do not mutate or reset any global conversation state in this handler.
Instead, keep model loading separate (e.g., set _runner =
get_or_load_model(request.model) only) and move conversation/history management
to per-request or per-session storage (pass history in request, store keyed by
session id, or use a request-scoped dependency) so each client gets isolated
history; ensure the endpoint returns model load status without touching or
clearing _messages and continue to surface errors via HTTPException as before.
```python
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest):
    """Create a chat completion."""
    global _messages, _max_tool_turns
    try:
        runner = get_or_load_model(request.model)

        # if request.stream:
        #     # Streaming response
        #     return StreamingResponse(
        #         generate_chat_stream(runner, request.messages, request),
        #         media_type="text/plain",
        #         headers={"Cache-Control": "no-cache"}
        #     )
        # else:
        # Non-streaming response
        completion_id = f"chatcmpl-{uuid.uuid4()}"
        created = int(time.time())

        # Convert messages to dict format for runner
        # _messages.append(system_message)
        _messages.extend(request.messages)
        message_dicts = format_chat_messages_for_runner(_messages)
        # Let the runner format with chat templates
        prompt = runner._format_conversation(message_dicts, use_chat_template=True)

        generated_text = runner.generate_batch(
            prompt=prompt,
            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
            temperature=request.temperature,
            top_p=request.top_p,
            repetition_penalty=request.repetition_penalty,
            use_chat_template=False  # Already applied in _format_conversation
        )
```
Don’t block the event loop; offload generation to threadpool.
MLX generation is CPU-bound; use run_in_threadpool.
```diff
-from fastapi.responses import StreamingResponse
+from fastapi.responses import StreamingResponse
+from starlette.concurrency import run_in_threadpool
@@
-    completion_id = f"chatcmpl-{uuid.uuid4()}"
+    completion_id = f"chatcmpl-{uuid.uuid4()}"
     created = int(time.time())
@@
-    _messages.extend(request.messages)
-    message_dicts = format_chat_messages_for_runner(_messages)
+    # Build per-request message history (avoid shared globals)
+    messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+    message_dicts = format_chat_messages_for_runner(messages)
@@
-    generated_text = runner.generate_batch(
-        prompt=prompt,
-        max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-        temperature=request.temperature,
-        top_p=request.top_p,
-        repetition_penalty=request.repetition_penalty,
-        use_chat_template=False  # Already applied in _format_conversation
-    )
+    generated_text = await run_in_threadpool(
+        runner.generate_batch,
+        prompt,
+        runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+        request.temperature,
+        request.top_p,
+        request.repetition_penalty,
+        False,  # use_chat_template already applied
+        False,  # interactive
+    )
```
🤖 Prompt for AI Agents
In server/api.py around lines 207 to 241, the code calls runner.generate_batch
(and potentially runner._format_conversation) directly which blocks the asyncio
event loop for CPU-bound ML generation; fix it by importing and using
FastAPI/Starlette's run_in_threadpool to offload the blocking work: move any
CPU/blocking preparation (formatting) into a sync function if needed, then await
run_in_threadpool(...) around the blocking generate_batch call (and formatting
if heavy) so generation runs on a worker thread and the event loop remains
responsive; ensure the call returns the generated_text to use in the response.
```diff
 pub async fn run(modelfile: &str) {
     match modelfile::parse_from_file(modelfile) {
         Ok(modelfile) => {
-            mlx::run(modelfile);
+            mlx::run(modelfile).await;
         }
```
Validate Modelfile before running to avoid panic.
Runner unwraps FROM; call build() (or check FROM) here and fail fast with a clear message.
```diff
-pub async fn run(modelfile: &str) {
-    match modelfile::parse_from_file(modelfile) {
-        Ok(modelfile) => {
-            mlx::run(modelfile).await;
+pub async fn run(modelfile: &str) {
+    match modelfile::parse_from_file(modelfile) {
+        Ok(mut modelfile) => {
+            if let Err(e) = modelfile.build() {
+                eprintln!("{}", e);
+                return;
+            }
+            mlx::run(modelfile).await;
         }
         Err(err) => println!("{}", err),
     }
 }
```
🤖 Prompt for AI Agents
In src/commands/mod.rs around lines 11-15, after successfully parsing the
modelfile you must validate/build it before passing to mlx::run to avoid a panic
when the runner later unwraps FROM; call the modelfile.build() (or the
appropriate validate/build method) and handle its Result, returning or logging a
clear error message and aborting if build fails, otherwise pass the
built/validated modelfile into mlx::run; ensure error paths produce a
user-friendly message that explains the invalid FROM.
```rust
pub async fn run(modelfile: Modelfile) {
    let model = modelfile.from.as_ref().unwrap();
    if model.starts_with("driaforall/mem-agent") {
        let _res = run_model_with_server(modelfile).await;
    } else {
        run_model_by_sub_process(modelfile);
    }
}
```
Guard against missing FROM and relax mem‑agent detection.
Avoid panics and make the mem‑agent switch less brittle.
```diff
-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(mut modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_ref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Error: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.contains("mem-agent")
+}
```
In src/runner/mlx.rs around lines 9 to 16, avoid unwrapping modelfile.from
(which can panic) and make the mem-agent check more flexible: replace the unwrap
with an if let / match that handles None by logging or returning early, obtain a
&str via as_deref(), normalize to lowercase, and use contains("mem-agent") (or a
more specific substring if desired) instead of the strict starts_with; then call
run_model_with_server(modelfile).await when the match hits and handle/log its
result, otherwise call run_model_by_sub_process(modelfile) as before.
```rust
async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
    // println!("gonna ping");
    // let _ = ping().await;
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    // loading the model from mem-agent via daeomn server
    let modelname = modelfile.from.as_ref().unwrap();
    load_model(&modelname).await.unwrap();
    println!("Running in interactive mode");
    loop {
        print!(">> ");
        stdout.flush().unwrap();
        let mut input = String::new();
        stdin.read_line(&mut input).unwrap();
        let input = input.trim();
        match input {
            "exit" => {
                println!("Exiting interactive mode");
                break;
            }
            _ => {
                if let Ok(response) = chat(input, &modelname).await {
                    println!(">> {}", response)
                } else {
                    println!(">> failed to respond")
                }
            }
        }
    }
    Ok(())
}
```
Don’t unwrap in the interactive server path; handle failures gracefully.
Unwraps can panic if the server is down or returns non‑JSON. Propagate errors and keep the CLI alive.
```diff
-    let stdin = io::stdin();
-    let mut stdout = io::stdout();
-    // loading the model from mem-agent via daeomn server
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(&modelname).await.unwrap();
+    let stdin = io::stdin();
+    let mut stdout = io::stdout();
+    // load the model via daemon server
+    let modelname = modelfile.from.as_ref().unwrap().to_owned();
+    if let Err(e) = load_model(&modelname).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }
@@
-    if let Ok(response) = chat(input, &modelname).await {
+    if let Ok(response) = chat(input, &modelname).await {
         println!(">> {}", response)
     } else {
         println!(">> failed to respond")
     }
```
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 73 to 103, several unwrap() calls in the
interactive server path (loading the model, flushing stdout, reading stdin, and
awaiting chat) can panic if the daemon/server is down or returns bad data;
replace those unwraps with proper error handling: handle load_model errors by
returning or logging a clear error and keep the CLI running (do not panic),
check and handle stdout.flush() and stdin.read_line() errors with a warning and
continue the loop, and handle chat(...) errors by logging the error details and
printing a friendly failure message while continuing the interactive loop;
ensure the function still returns a Result and propagate non-recoverable errors
upward but avoid panics inside the interactive loop.
src/runner/mlx.rs (Outdated)
```rust
async fn load_model(model_name: &str) -> Result<(), String> {
    let client = Client::new();
    let body = json!({
        "model": model_name
    });
    let res = client
        .post("http://127.0.0.1:6969/start")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        Ok(())
    } else {
        Err(String::from("request failed"))
    }
}
```
🛠️ Refactor suggestion | 🟠 Major
Return proper HTTP errors; add timeout and is_success.
Network failures should not panic. Include status in errors.
```diff
 async fn load_model(model_name: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name
     });
-    let res = client
-        .post("http://127.0.0.1:6969/start")
+    let res = client
+        .post(format!("{}/start", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
-    // println!("{:?}", res);
-    if res.status() == 200 {
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }
```
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 112-129, replace the unwrap and simplistic
status check with proper error handling and a request timeout: construct the
reqwest Client with a reasonable timeout, await the send() call without unwrap
and match on its Result to convert network errors into Err(String) (include the
error message), then check res.status().is_success() and on failure return Err
including the numeric status and optionally the response text; ensure no panics
and that timeouts/network failures become descriptive Err(String) returns
instead of unwrapping.
```rust
async fn chat(input: &str, model_name: &str) -> Result<String, String> {
    let client = Client::new();
    let body = json!({
        "model": model_name,
        "messages": [{"role": "user", "content": input}]
    });
    let res = client
        .post("http://127.0.0.1:6969/v1/chat/completions")
        .json(&body)
        .send()
        .await
        .unwrap();
    // println!("{:?}", res);
    if res.status() == 200 {
        let text = res.text().await.unwrap();
        let v: Value = serde_json::from_str(&text).unwrap();
        let content = v["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("<no content>");
        Ok(content.to_owned())
    } else {
        Err(String::from("request failed"))
    }
}
```
Avoid unwraps; parse response safely and surface non‑200s.
Also add a longer timeout for generation requests.
```diff
 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
-    let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+    let res = client
+        .post(format!("{}/v1/chat/completions", BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
         let content = v["choices"][0]["message"]["content"]
             .as_str()
             .unwrap_or("<no content>");
         Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        Err(format!("request failed: {}", res.status()))
     }
 }
```
🤖 Prompt for AI Agents
In src/runner/mlx.rs around lines 131 to 154, replace the current chat
function's unwraps and silent failure on non-200 by building the reqwest Client
with a longer timeout, using awaited results with ?/map_err (or match) to
propagate errors instead of unwrap, checking response.status().is_success() and
on non-success returning an Err that includes the status code and response body,
and parsing JSON with serde_json::from_str handling its Result so missing fields
produce a clear Err; finally extract the content by safely indexing with
get(...) and as_str().map(|s| s.to_owned()) or return an Err if absent—this
ensures timeouts are extended and all network/parse errors are surfaced instead
of panicking.
While loading the model we create the necessary files that track the memory_path, and then pass the memory_path to the Python server via the /start API, instead of the Python server handling the creation of memory_path. This allows us to configure the memory_path later via the CLI (see the request sketch below).
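A minimal sketch of what that flow implies for the /start request, assuming `StartRequest` exposes a `memory_path` field as described (the path shown is hypothetical):

```python
# Hypothetical /start payload once the CLI owns memory_path creation.
# Assumes StartRequest has a memory_path field, per the description above.
import requests

resp = requests.post(
    "http://127.0.0.1:6969/start",
    json={
        "model": "driaforall/mem-agent",
        "memory_path": "/Users/me/tiles_memory",  # created and tracked by the CLI beforehand
    },
)
resp.raise_for_status()
```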
Actionable comments posted: 4
♻️ Duplicate comments (9)
src/runner/mlx.rs (4)
11-18: Guard against missing FROM and relax mem‑agent detection.

The `unwrap()` at line 12 will panic if `modelfile.from` is `None`. The exact prefix match at line 13 is also brittle and will fail if the model name changes slightly.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.
75-106: Don't unwrap in the interactive server path; handle failures gracefully.

Multiple `unwrap()` calls at lines 81, 82, 83, 87, and 89 can panic if the server is down, the modelfile is missing FROM, or IO operations fail. This will crash the CLI instead of allowing graceful recovery.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.
115-132: Return proper HTTP errors; add timeout and use `is_success()`.

Line 126 unwraps the network request which can panic. Line 127 uses exact status comparison instead of `is_success()`. No timeout is configured, so requests can hang indefinitely.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.
134-157: Avoid unwraps; parse response safely and surface non‑200s.

Multiple `unwrap()` calls at lines 145, 148, and 149 can panic on network failures or malformed JSON. Line 147 uses exact status comparison. No timeout is configured for what could be a long-running generation request.

This issue was previously flagged. Please refer to the past review comment for the suggested fix.
server/api.py (5)
46-46: Critical: Fix invalid type/initialization for _runner.

This issue was flagged in previous reviews but remains unresolved. The global `_runner` is typed as `MLXRunner` but initialized to `{}` (an empty dict), which will cause type errors. Apply this diff:

```diff
-_runner: MLXRunner = {}
+from typing import Optional
+_runner: Optional[MLXRunner] = None
```
132-132: Critical: Fix invalid agent instantiation syntax.

This issue was flagged in previous reviews. Line 132 `agent: Agent()` is invalid—it invokes `Agent()` at import time without properly binding the variable. Apply this diff:

```diff
-agent: Agent()
+agent: Optional[Agent] = None
```

Then instantiate the agent inside the function that needs it (e.g., within `start_model`) or as a module-level singleton `agent = Agent()`.
134-177: Fix model path validation and remove debug prints.

Several issues remain from previous reviews:

- Lines 138, 152-153: Debug print statements should be removed (potential PII).
- Line 143: `model_path.exists()` can raise `AttributeError` if `get_model_path` returns `None` for `model_path` (per `cache_utils.py` line 186, it can return `(None, model_name, commit_hash)`).
- Line 142: `commit_hash` is unpacked but never used.
- Lines 145-146: Broad exception handling without proper chaining.

Apply this diff:

```diff
-    print(model_spec)
     # Use the existing model path resolution from cache_utils
     try:
-        model_path, model_name, commit_hash = get_model_path(model_spec)
-        if not model_path.exists():
+        model_path, model_name, _commit_hash = get_model_path(model_spec)
+        if not model_path or not Path(model_path).exists():
             raise HTTPException(status_code=404, detail=f"Model {model_spec} not found in cache")
     except Exception as e:
-        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}")
+        raise HTTPException(status_code=404, detail=f"Model {model_spec} not found: {str(e)}") from e

     # Check if it's an MLX model
     model_path_str = str(model_path)
-    print(_current_model_path)
-    print(model_path_str)

     # Check if we need to load a different model
```

Also add the `Path` import at the top: `from pathlib import Path`
195-207: Remove PII prints and avoid global conversation state.

This endpoint has issues flagged in previous reviews:

- Line 199: `print(str(request))` leaks PII and should be removed.
- Line 200: Resetting global `_messages` will cause conversations from concurrent requests to interfere with each other.

Apply this diff:

```diff
-    global _messages, _runner, _memory_path
-    print(str(request))
-    _messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]
+    global _runner, _memory_path
     _memory_path = request.memory_path
     try:
         _runner = get_or_load_model(request.model)
```

Note: Conversation history should be managed per-request or per-session (e.g., passed in the request body or stored with a session ID) rather than in global state; a minimal sketch follows.
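A minimal sketch of session-keyed history, assuming a client-supplied `session_id` field (not present in the current request models) and the module-level `ChatMessage`/`SYSTEM_PROMPT` names used elsewhere in server/api.py:

```python
# Hypothetical per-session history store; session_id is an assumed field,
# not part of the current StartRequest/ChatCompletionRequest models.
from collections import defaultdict

_histories: dict[str, list[ChatMessage]] = defaultdict(
    lambda: [ChatMessage(role="system", content=SYSTEM_PROMPT)]
)

def history_for(session_id: str) -> list[ChatMessage]:
    """Return the isolated message history for one client session."""
    return _histories[session_id]
```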
208-313: Critical: Fix blocking calls, global state, and PII leaks.

This endpoint has multiple unresolved issues from previous reviews:

- Lines 234-241, 270-272: `runner.generate_batch()` is CPU-bound and blocks the asyncio event loop. This will make the server unresponsive during generation.
- Lines 229, 266, 279: Mutating global `_messages` causes conversations from concurrent clients to interfere with each other.
- Lines 251, 261-262, 273: Debug print statements leak PII (generated text, results).
- Line 275: `thoughts` is assigned but never used.
- Line 301: No fallback when `reply` is empty, resulting in blank content.

Apply this comprehensive diff:

```diff
+from starlette.concurrency import run_in_threadpool
+
 @app.post("/v1/chat/completions")
 async def create_chat_completion(request: ChatCompletionRequest):
     """Create a chat completion."""
-    global _messages, _max_tool_turns, _memory_path
+    global _max_tool_turns, _memory_path
     try:
         runner = get_or_load_model(request.model)

         # Non-streaming response
         completion_id = f"chatcmpl-{uuid.uuid4()}"
         created = int(time.time())

-        # Convert messages to dict format for runner
-        # _messages.append(system_message)
-        _messages.extend(request.messages)
-        message_dicts = format_chat_messages_for_runner(_messages)
+        # Build per-request message history (avoid shared globals)
+        messages = [ChatMessage(role="system", content=SYSTEM_PROMPT), *request.messages]
+        message_dicts = format_chat_messages_for_runner(messages)
         # Let the runner format with chat templates
         prompt = runner._format_conversation(message_dicts, use_chat_template=True)

-        generated_text = runner.generate_batch(
-            prompt=prompt,
-            max_tokens=runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
-            temperature=request.temperature,
-            top_p=request.top_p,
-            repetition_penalty=request.repetition_penalty,
-            use_chat_template=False  # Already applied in _format_conversation
-        )
+        generated_text = await run_in_threadpool(
+            runner.generate_batch,
+            prompt,
+            runner.get_effective_max_tokens(request.max_tokens or _default_max_tokens, interactive=False),
+            request.temperature,
+            request.top_p,
+            request.repetition_penalty,
+            False,  # use_chat_template already applied
+            False,  # interactive
+        )

-        thoughts = extract_thoughts(generated_text)
         reply = extract_reply(generated_text)
         python_code = extract_python_code(generated_text)
-        print(generated_text)
         result = ({}, "")
         if python_code:
             create_memory_if_not_exists()
@@
             import_module="server.mem_agent.tools",
         )
-        print(reply)
-        print(str(result))
         remaining_tool_turns = _max_tool_turns
         while remaining_tool_turns > 0 and not reply:
-            _messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
-            message_dicts = format_chat_messages_for_runner(_messages)
+            messages.append(ChatMessage(role="user", content=format_results(result[0], result[1])))
+            message_dicts = format_chat_messages_for_runner(messages)
             # Let the runner format with chat templates
             prompt = runner._format_conversation(message_dicts, use_chat_template=True)
-            generated_text = runner.generate_batch(
-                prompt=prompt
-            )
-            print(generated_text)
+            generated_text = await run_in_threadpool(
+                runner.generate_batch,
+                prompt,
+            )
             # Extract the thoughts, reply and python code from the response
-            thoughts = extract_thoughts(generated_text)
             reply = extract_reply(generated_text)
             python_code = extract_python_code(generated_text)
-            _messages.append(ChatMessage(role="assistant", content=generated_text))
+            messages.append(ChatMessage(role="assistant", content=generated_text))
             if python_code:
                 create_memory_if_not_exists()
                 result = execute_sandboxed_code(
@@
                 {
                     "index": 0,
                     "message": {
                         "role": "assistant",
-                        "content": reply
+                        "content": reply or generated_text.strip()
                     },
                     "finish_reason": "stop"
                 }
```
🧹 Nitpick comments (1)
src/runner/mlx.rs (1)
108-113: Remove or document the unused `ping` function.

The `ping` function is not called anywhere in the codebase. If it's intended for future use or debugging, consider marking it with `#[allow(dead_code)]` and adding a doc comment explaining its purpose. Otherwise, remove it to reduce maintenance burden.
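A minimal sketch of that option; the signature and body below are placeholders for illustration, not the actual implementation in src/runner/mlx.rs:

```rust
/// Health probe against the local mem-agent daemon.
/// Kept for future debugging; not wired into any call site yet.
#[allow(dead_code)]
async fn ping() -> bool {
    // Placeholder body: the real helper would issue an HTTP request
    // to the daemon and report whether it is reachable.
    true
}
```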
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- server/api.py (1 hunks)
- src/runner/mlx.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}
⚙️ CodeRabbit configuration file
Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.
Files:
src/runner/mlx.rs
🧬 Code graph analysis (2)
src/runner/mlx.rs (3)
- src/commands/mod.rs (1): `run` (11-18)
- src/core/modelfile.rs (4): `new` (88-90), `new` (107-119), `from_str` (64-71), `from_str` (235-237)
- server/api.py (1): `ping` (192-193)
server/api.py (5)
- server/cache_utils.py (1): `get_model_path` (182-201)
- server/mlx_runner.py (5): `cleanup` (321-357), `load_model` (128-180), `_format_conversation` (765-794), `generate_batch` (584-689), `get_effective_max_tokens` (359-389)
- server/mem_agent/utils.py (5): `extract_python_code` (152-172), `extract_reply` (175-182), `extract_thoughts` (185-192), `create_memory_if_not_exists` (57-71), `format_results` (195-203)
- server/mem_agent/engine.py (1): `execute_sandboxed_code` (200-314)
- src/runner/mlx.rs (2): `load_model` (115-132), `ping` (108-113)
🪛 Ruff (0.14.1)
server/api.py
111-111: Unused method argument: memory_path
(ARG002)
111-111: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
113-113: Unused method argument: model
(ARG002)
113-113: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
114-114: Unused method argument: predetermined_memory_path
(ARG002)
115-115: Unused method argument: model_cache
(ARG002)
115-115: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
116-116: Unused method argument: current_model_path
(ARG002)
117-117: Unused method argument: default_max_tokens
(ARG002)
142-142: Unpacked variable commit_hash is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
144-144: Abstract raise to an inner function
(TRY301)
145-145: Do not catch blind exception: Exception
(BLE001)
146-146: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
146-146: Use explicit conversion flag
Replace with conversion flag
(RUF010)
162-163: try-except-pass detected, consider logging the exception
(S110)
162-162: Do not catch blind exception: Exception
(BLE001)
204-204: Consider moving this statement to an else block
(TRY300)
205-205: Do not catch blind exception: Exception
(BLE001)
206-206: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
275-275: Local variable thoughts is assigned to but never used
Remove assignment to unused variable thoughts
(F841)
312-312: Do not catch blind exception: Exception
(BLE001)
313-313: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
```python
class Agent:
    def __init__(
        self,
        max_tool_turns: int = 20,
        memory_path: str = None,
        use_vllm: bool = False,
        model: str = None,
        predetermined_memory_path: bool = False,
        model_cache: Dict[str, MLXRunner] = {},
        current_model_path: Optional[str] = None,
        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default

    ):
        # Load the system prompt and add it to the conversation history
        self.system_prompt = SYSTEM_PROMPT
        self.messages: list[ChatMessage] = [
            ChatMessage(role="system", content=self.system_prompt)
        ]

        # Set the maximum number of tool turns and use_vllm flag
        self.max_tool_turns = max_tool_turns
        self.use_vllm = use_vllm
```
🛠️ Refactor suggestion | 🟠 Major
Remove unused Agent parameters or complete implementation.
The `Agent` class declares several unused parameters: `memory_path`, `model`, `predetermined_memory_path`, `model_cache`, `current_model_path`, and `default_max_tokens`. Additionally, line 115 uses a mutable default argument, which is an anti-pattern in Python.
Apply this diff to clean up the unused parameters and fix the mutable default:

```diff
 class Agent:
     def __init__(
         self,
         max_tool_turns: int = 20,
-        memory_path: str = None,
         use_vllm: bool = False,
-        model: str = None,
-        predetermined_memory_path: bool = False,
-        model_cache: Dict[str, MLXRunner] = {},
-        current_model_path: Optional[str] = None,
-        default_max_tokens: Optional[int] = None  # Use dynamic model-aware limits by default
-
     ):
         # Load the system prompt and add it to the conversation history
         self.system_prompt = SYSTEM_PROMPT
         self.messages: list[ChatMessage] = [
             ChatMessage(role="system", content=self.system_prompt)
         ]
         # Set the maximum number of tool turns and use_vllm flag
         self.max_tool_turns = max_tool_turns
         self.use_vllm = use_vllm
```

If these parameters are intended for future use, implement them in the body; otherwise, remove them to avoid confusion.
🧰 Tools
🪛 Ruff (0.14.1)
111-111: Unused method argument: memory_path
(ARG002)
111-111: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
113-113: Unused method argument: model
(ARG002)
113-113: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
114-114: Unused method argument: predetermined_memory_path
(ARG002)
115-115: Unused method argument: model_cache
(ARG002)
115-115: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
116-116: Unused method argument: current_model_path
(ARG002)
117-117: Unused method argument: default_max_tokens
(ARG002)
🤖 Prompt for AI Agents
In server/api.py around lines 107 to 129, the Agent __init__ declares unused
parameters (memory_path, model, predetermined_memory_path, model_cache,
current_model_path, default_max_tokens) and uses a mutable default for
model_cache; either remove the unused parameters or wire them to instance state,
and fix the mutable default by changing model_cache: Optional[Dict[str,
MLXRunner]] = None and then inside the body set self.model_cache = model_cache
or {}. If you keep parameters for future use, assign them to self (e.g.,
self.memory_path = memory_path, self.model = model, etc.) and ensure
default_max_tokens is stored or documented; otherwise drop them from the
signature to avoid confusion.
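A minimal sketch of the pattern that prompt describes, assuming the parameters are kept and wired to instance state (the quoted `MLXRunner` is a forward reference standing in for the real import):

```python
from typing import Dict, Optional


class Agent:
    def __init__(
        self,
        max_tool_turns: int = 20,
        memory_path: Optional[str] = None,
        use_vllm: bool = False,
        model_cache: Optional[Dict[str, "MLXRunner"]] = None,
    ):
        self.max_tool_turns = max_tool_turns
        self.memory_path = memory_path
        self.use_vllm = use_vllm
        # A fresh dict per instance avoids the shared-mutable-default bug (B006).
        self.model_cache = model_cache if model_cache is not None else {}
```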
```rust
    // build the arg list from modelfile
    let mut args: Vec<String> = vec![];
    args.push("--model".to_owned());
    args.push(modelfile.from.unwrap());
```
Don't unwrap `modelfile.from`; handle the missing FROM case.

Line 24 will panic if `modelfile.from` is `None`, which is inconsistent with the good error handling elsewhere in this function.

Apply this diff to handle the missing FROM case gracefully:
```diff
-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    args.push(modelfile.from.unwrap());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
```
🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 24, don't unwrap modelfile.from; instead handle
the None case gracefully: check whether modelfile.from is Some and push its
value into args, or if None return/propagate a proper error (with a clear
message consistent with other errors in this function) rather than panicking;
use pattern matching (if let/ match) or map_err to convert the missing FROM into
the function's error type and include context about the missing FROM field.
```rust
}

fn get_memory_path() -> Result<String> {
    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
```
Replace deprecated `env::home_dir()` with the `dirs` crate.

`std::env::home_dir()` has been deprecated since Rust 1.29. It can produce incorrect results on Windows and has platform-specific issues. Use the `dirs` crate (specifically `dirs::home_dir()`) or the `home` crate instead.
First, add the dependency to `Cargo.toml`:

```toml
dirs = "5.0"
```

Then apply this diff:

```diff
 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {
```

And update the imports:

```diff
-use std::{env, fs};
+use std::{fs, env};
```

🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 160, replace the deprecated
std::env::home_dir() call with dirs::home_dir(): add the dirs = "5.0" dependency
to Cargo.toml, update the imports to use dirs::home_dir (and remove or stop
using std::env::home_dir), call dirs::home_dir().context("Failed to fetch
$HOME")? in place of the old call, and adjust any use of the returned
PathBuf/Option handling if necessary to preserve the existing error path.
```rust
    fs::create_dir_all(&tiles_config_dir).context("Failed to create tiles config directory")?;
    fs::write(
        tiles_config_dir.join(".memory_path"),
        memory_path.to_str().unwrap(),
```
Don't unwrap `to_str()`; handle non-UTF-8 paths.

Line 190 will panic if the memory path contains invalid UTF-8 characters. Use `to_string_lossy()` (as done correctly at line 193) or propagate an error instead.

Apply this diff:
```diff
 fs::write(
     tiles_config_dir.join(".memory_path"),
-    memory_path.to_str().unwrap(),
+    memory_path.to_string_lossy().as_ref(),
 )
 .context("Failed to write the default path to .memory_path")?;
```

🤖 Prompt for AI Agents
In src/runner/mlx.rs around line 190, the call memory_path.to_str().unwrap() can
panic on non‑UTF8 paths; replace it with a non‑panicking approach (use
memory_path.to_string_lossy() like at line 193, or propagate a Result error) so
invalid UTF‑8 is handled safely—update the argument usage to accept the lossy
string (or change the function signature to return/propagate an error) to avoid
unwrap() panics.
- Added rust-toolchain.toml for a consistent Rust runtime
- Added justfile for easily running commands
- Updated CI to support the toolchain and just, and added caching
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/runner/mlx.rs (2)
55-73: Do not `std::process::exit` in library code; return early instead.

Exiting the process here harms callers and tests. Log and return.
```diff
         Err(e) => {
             if e.kind() == std::io::ErrorKind::NotFound {
                 eprintln!("❌ Error: mlx_lm.chat command not found");
                 eprintln!("💡 Hint: Install mlx-lm by running: pip install mlx-lm");
                 eprintln!("📝 Note: mlx-lm is only available on macOS with Apple Silicon");
-                std::process::exit(1);
+                return;
             } else {
                 eprintln!("❌ Error: Failed to spawn mlx_lm.chat: {}", e);
-                std::process::exit(1);
+                return;
             }
         }
```
11-106: Address panic paths and deprecated APIs in `src/runner/mlx.rs`.

Verification confirms multiple issues that require fixes:

- 11 `unwrap()` calls panic on error (lines 12, 24, 81, 82, 83, 87, 89, 126, 145, 148, 149, 190): replace with proper error handling using `?`, `.context()`, or `.map_err()` (a minimal sketch follows this list).
- Deprecated `env::home_dir()` (line 160): replace with `dirs::home_dir()` or `std::env::var("HOME")`.
- Hard-coded daemon URLs (lines 110, 122, 141 with `http://127.0.0.1:6969`): extract to a constant or configuration.
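A minimal sketch of the `?`/`.context()` pattern from the first bullet, using the `anyhow` crate this file already imports; the helper name and messages are hypothetical:

```rust
use anyhow::{Context, Result};
use std::path::Path;

// Hypothetical helper: read a config file without panicking.
fn read_memory_path(path: &Path) -> Result<String> {
    // `?` propagates the failure; `.context()` attaches a readable message
    // to the error chain instead of unwrapping and aborting.
    let content = std::fs::read_to_string(path)
        .context("Failed to read .memory_path")?;
    Ok(content.trim().to_string())
}
```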
♻️ Duplicate comments (7)
src/runner/mlx.rs (7)
20-26: Avoid unwrap on FROM in subprocess path.

Gracefully handle missing FROM rather than panicking.
```diff
-    args.push("--model".to_owned());
-    args.push(modelfile.from.unwrap());
+    args.push("--model".to_owned());
+    let Some(from) = modelfile.from else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    args.push(from);
```
11-18: Handle missing FROM and relax mem-agent detection; don't drop errors.

Unwrap can panic; make detection flexible and surface server errors.
```diff
-pub async fn run(modelfile: Modelfile) {
-    let model = modelfile.from.as_ref().unwrap();
-    if model.starts_with("driaforall/mem-agent") {
-        let _res = run_model_with_server(modelfile).await;
-    } else {
-        run_model_by_sub_process(modelfile);
-    }
-}
+pub async fn run(modelfile: Modelfile) {
+    let Some(model) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return;
+    };
+    if is_mem_agent(model) {
+        if let Err(e) = run_model_with_server(modelfile).await {
+            eprintln!("❌ Server path failed: {}", e);
+        }
+    } else {
+        run_model_by_sub_process(modelfile);
+    }
+}
+
+fn is_mem_agent(model: &str) -> bool {
+    model.to_ascii_lowercase().contains("mem-agent")
+}
```
75-106: Interactive server path: replace unwraps, surface errors, and keep session alive.

Multiple unwraps can panic; also prefer `anyhow::Result` for consistency.
```diff
-async fn run_model_with_server(modelfile: Modelfile) -> reqwest::Result<()> {
+async fn run_model_with_server(modelfile: Modelfile) -> Result<()> {
     let stdin = io::stdin();
     let mut stdout = io::stdout();

     // loading the model from mem-agent via daemon server
-    let memory_path = get_memory_path()
-        .context("Retrieving memory_path failed")
-        .unwrap();
-    let modelname = modelfile.from.as_ref().unwrap();
-    load_model(modelname, &memory_path).await.unwrap();
+    let memory_path = get_memory_path().context("Retrieving memory_path failed")?;
+    let Some(modelname) = modelfile.from.as_deref() else {
+        eprintln!("❌ Error: Modelfile is missing a FROM instruction");
+        return Ok(());
+    };
+    if let Err(e) = load_model(modelname, &memory_path).await {
+        eprintln!("❌ Failed to start model on server: {}", e);
+        return Ok(());
+    }

     println!("Running in interactive mode");
     loop {
         print!(">> ");
-        stdout.flush().unwrap();
+        if let Err(e) = stdout.flush() {
+            eprintln!("⚠️ flush failed: {}", e);
+        }

         let mut input = String::new();
-        stdin.read_line(&mut input).unwrap();
+        if let Err(e) = stdin.read_line(&mut input) {
+            eprintln!("⚠️ read_line failed: {}", e);
+            continue;
+        }
         let input = input.trim();

         match input {
             "exit" => {
                 println!("Exiting interactive mode");
                 break;
             }
             _ => {
-                if let Ok(response) = chat(input, modelname).await {
-                    println!(">> {}", response)
-                } else {
-                    println!(">> failed to respond")
-                }
+                match chat(input, modelname).await {
+                    Ok(response) => println!(">> {}", response),
+                    Err(e) => println!(">> failed to respond: {}", e),
+                }
             }
         }
     }
     Ok(())
 }
```
115-132: HTTP start: add timeout, check `is_success`, remove unwraps, include status/body.

Avoid panics and provide actionable errors.
```diff
 async fn load_model(model_name: &str, memory_path: &str) -> Result<(), String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(30))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "memory_path": memory_path
     });
     let res = client
-        .post("http://127.0.0.1:6969/start")
+        .post(format!("{}/start", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    if res.status() == 200 {
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
         Ok(())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }
```
134-157: HTTP chat: add timeout, check `is_success`, parse JSON safely, remove unwraps.

Prevent panics; return clear errors on malformed responses.
```diff
 async fn chat(input: &str, model_name: &str) -> Result<String, String> {
-    let client = Client::new();
+    let client = reqwest::Client::builder()
+        .timeout(Duration::from_secs(60))
+        .build()
+        .map_err(|e| e.to_string())?;
     let body = json!({
         "model": model_name,
         "messages": [{"role": "user", "content": input}]
     });
     let res = client
-        .post("http://127.0.0.1:6969/v1/chat/completions")
+        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
         .json(&body)
         .send()
-        .await
-        .unwrap();
-    // println!("{:?}", res);
-    if res.status() == 200 {
-        let text = res.text().await.unwrap();
-        let v: Value = serde_json::from_str(&text).unwrap();
-        let content = v["choices"][0]["message"]["content"]
-            .as_str()
-            .unwrap_or("<no content>");
-        Ok(content.to_owned())
+        .await
+        .map_err(|e| e.to_string())?;
+    if res.status().is_success() {
+        let text = res.text().await.map_err(|e| e.to_string())?;
+        let v: Value = serde_json::from_str(&text).map_err(|e| e.to_string())?;
+        let content = v
+            .get("choices").and_then(|c| c.get(0))
+            .and_then(|c0| c0.get("message"))
+            .and_then(|m| m.get("content"))
+            .and_then(|c| c.as_str())
+            .ok_or_else(|| "missing choices[0].message.content".to_string())?;
+        Ok(content.to_owned())
     } else {
-        Err(String::from("request failed"))
+        let status = res.status();
+        let body = res.text().await.unwrap_or_default();
+        Err(format!("request failed: {} - {}", status, body))
     }
 }
```
159-165: Replace deprecated `env::home_dir()` with `dirs::home_dir()`.

Avoid deprecated API and platform quirks.

```diff
 fn get_memory_path() -> Result<String> {
-    let home_dir = env::home_dir().context("Failed to fetch $HOME")?;
+    let home_dir = dirs::home_dir().context("Failed to fetch home directory")?;
     let config_dir = match env::var("XDG_CONFIG_HOME") {
```

Add to `Cargo.toml`:

```toml
[dependencies]
dirs = "5"
```
189-191: Don't unwrap `to_str()`; handle non-UTF-8 paths.

Use `to_string_lossy()` like you already do on Line 193.

```diff
 fs::write(
     tiles_config_dir.join(".memory_path"),
-    memory_path.to_str().unwrap(),
+    memory_path.to_string_lossy().as_ref(),
 )
 .context("Failed to write the default path to .memory_path")?;
```
🧹 Nitpick comments (5)
justfile (1)
9-12: Consider adding Python checks for the new server.

The PR introduces a FastAPI server (server/), but the justfile only validates Rust code. Consider adding targets for Python linting, formatting (e.g., `ruff`, `black`), and type checking (e.g., `mypy`) to maintain consistency across the polyglot codebase.
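A minimal sketch of what those targets could look like, assuming `ruff` and `mypy` are available on PATH; the recipe names are hypothetical:

```just
# Hypothetical Python checks for server/; assumes ruff and mypy are installed.
py-lint:
    ruff check server/

py-fmt:
    ruff format --check server/

py-types:
    mypy server/
```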
.github/workflows/rust.yml (1)
19-19: Pin the rust-toolchain action to a stable version.

Using `@master` makes the workflow vulnerable to unexpected breaking changes. Pin to a specific version tag or use a stable channel reference.

Apply this diff to pin to a stable version:
```diff
-      - uses: dtolnay/rust-toolchain@master
+      - uses: dtolnay/rust-toolchain@stable
```

Alternatively, if you need a specific toolchain version (e.g., from rust-toolchain.toml), you can use:

```yaml
      - uses: dtolnay/[email protected]
        with:
          toolchain: stable
```

src/runner/mlx.rs (3)
1-7: Introduce a default BASE_URL and request timeouts; avoid magic strings.

Centralize the daemon URL and prepare for timeouts in HTTP helpers.
```diff
 use anyhow::{Context, Result};
 use reqwest::Client;
 use serde_json::{Value, json};
 use std::io::Write;
 use std::path::PathBuf;
-use std::{env, fs};
+use std::{env, fs};
+use std::time::Duration;
 use std::{io, process::Command};
+
+// Default mem-agent daemon URL; allow overriding via env in helpers if desired.
+const DEFAULT_BASE_URL: &str = "http://127.0.0.1:6969";
```
176-184: Trim newline when reading .memory_path.

`fs::read_to_string` includes a trailing newline; trim to avoid invalid paths.
```diff
-    if tiles_config_dir.is_dir()
-        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
-    {
-        memory_path = content;
+    if tiles_config_dir.is_dir()
+        && let Ok(content) = fs::read_to_string(tiles_config_dir.join(".memory_path"))
+    {
+        memory_path = content.trim().to_string();
         is_memory_path_found = true;
     }
```
115-157: Optional: allow overriding base URL via env.

Keep `DEFAULT_BASE_URL` but let users set `TILES_MEM_AGENT_URL` externally.
```diff
-        .post(format!("{}/start", DEFAULT_BASE_URL))
+        .post(format!("{}/start", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))
@@
-        .post(format!("{}/v1/chat/completions", DEFAULT_BASE_URL))
+        .post(format!("{}/v1/chat/completions", env::var("TILES_MEM_AGENT_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())))
```
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/rust.yml(1 hunks)justfile(1 hunks)rust-toolchain.toml(1 hunks)src/runner/mlx.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{rs,toml}
⚙️ CodeRabbit configuration file
Review the Rust code for conformity with best practices in Rust, Systems programming. Highlight any deviations.
Files:
rust-toolchain.tomlsrc/runner/mlx.rs
🧬 Code graph analysis (1)
src/runner/mlx.rs (2)
src/commands/mod.rs (1)
run(11-18)src/core/modelfile.rs (2)
new(88-90)new(107-119)
🔇 Additional comments (4)
rust-toolchain.toml (1)
1-3: Rust 1.90.0 is a valid release. Rust 1.90.0 is an official stable release (released September 18, 2025). The toolchain configuration is well-formed, follows Rust best practices for reproducible builds, and specifies appropriate standard components.justfile (1)
1-12: LGTM! Clean and idiomatic justfile structure.The justfile provides a standard Rust development workflow with appropriate settings for CI:
- Format checking with
--checkflag- Clippy with warnings-as-errors
- Sequential execution that fails fast
.github/workflows/rust.yml (2)
20-30: LGTM! Cargo caching properly configured.The cache configuration follows best practices by caching cargo binaries, registry, git dependencies, and build artifacts. The cache key based on
Cargo.lockensures cache invalidation when dependencies change, and the restore-keys provide good fallback behavior.
32-36: LGTM! Integration with justfile is clean.The installation check for
justand subsequent execution ofjust checkproperly centralizes the build/test logic, making it consistent between local development and CI.
No description provided.