Recover from SSD streaming errors without crashing #38
Merged
Convert `ThreadSafeError.check()` from `fatalError` (which crashes the entire app) to a global `SSDStreamingErrorLatch` pattern. When a `pread` I/O error occurs on truncated or corrupted safetensors files, the error is now posted to the latch instead of killing the process.

The generation loop in Evaluate.swift checks the latch:
- After `model.prepare()` during prefill (catches errors during prompt processing)
- After each token in the generation loop (catches errors during decoding)

This allows the consuming code (InferenceEngine) to surface the error gracefully in the UI and prompt the user to re-download the model.

Also adds `SSDStreamingError` and `SSDStreamingErrorLatch` as public types for downstream consumers.
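For readers unfamiliar with the failure mode, here is a minimal sketch of a positional read that reports a truncated file as an error value instead of trapping. `readExact` and `RangeReadError` are hypothetical names for illustration; the PR's actual `pread` path is not shown on this page.

```swift
import Foundation

/// Hypothetical error type for this sketch (not part of the PR).
enum RangeReadError: Error, Equatable {
    case io(errno: Int32)
    case truncated(expected: Int, actual: Int)
}

/// Reads exactly `count` bytes starting at `offset` with pread(2).
/// A file that ends early yields `.truncated` instead of a trap.
func readExact(fd: Int32, offset: Int, count: Int) -> Result<[UInt8], RangeReadError> {
    var buffer = [UInt8](repeating: 0, count: count)
    var total = 0
    while total < count {
        let n = buffer.withUnsafeMutableBytes { raw in
            pread(fd, raw.baseAddress! + total, count - total, off_t(offset + total))
        }
        if n < 0 {
            let code = errno
            if code == EINTR { continue }  // interrupted: retry the same range
            return .failure(.io(errno: code))
        }
        if n == 0 {
            // End of file before `count` bytes arrived: the file is truncated.
            return .failure(.truncated(expected: count, actual: total))
        }
        total += n
    }
    return .success(buffer)
}
```

Returning a `Result` keeps the decision about what to do with the failure (crash, latch, retry) with the caller, which is the property the latch pattern relies on.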
Pull request overview
This PR changes MLXLMCommon’s SSD expert streaming error handling from a process-terminating crash to a recoverable, latched error that can be detected during prompt prefill and token generation.
Changes:
- Introduces `SSDStreamingError` and a global `SSDStreamingErrorLatch` to record/consume streaming I/O failures from non-throwing paths.
- Adds latch checks during `TokenIterator.prepare(...)` (prefill/logits) to throw early if streaming errors occurred.
- Adds a per-token latch check in the async generation loop to stop generation when a streaming error is detected.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| Libraries/MLXLMCommon/Evaluate.swift | Checks the SSD streaming error latch during prefill and during async generation iteration. |
| Libraries/MLXLMCommon/ConcurrentError.swift | Replaces fatalError with a recoverable error latch and introduces a typed SSDStreamingError. |
Comment on lines +65 to 76

      /// Check if any error was recorded during concurrent I/O.
      ///
      /// Instead of calling `fatalError` (which crashes the entire app), this
      /// posts the error to the global `SSDStreamingErrorLatch` so the generation
      /// loop can detect it after the current token and surface it gracefully
      /// in the UI (e.g., prompting a re-download).
      package func check() {
          if let error = error {
    -         fatalError("MLX SSD Streaming Error: \(error.localizedDescription). (The model safetensors file may be corrupted, truncated, or incomplete).")
    +         SSDStreamingErrorLatch.shared.set(
    +             SSDStreamingError(underlyingError: error)
    +         )
          }
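The record-then-check flow in this hunk can be sketched as a lock-protected error box that concurrent workers write into, plus a non-throwing `check()` that forwards to a latch. `ErrorBox`, `record(_:)`, `BoxLatch`, and `ChunkReadFailed` are hypothetical stand-ins; the internals of the PR's `ThreadSafeError` are not visible in this excerpt, and first-error-wins is an assumption.

```swift
import Foundation

/// Hypothetical stand-in for the PR's latch; only `set` and `consume` are sketched.
final class BoxLatch: @unchecked Sendable {
    private let lock = NSLock()
    private var stored: Error?

    func set(_ error: Error) {
        lock.lock(); defer { lock.unlock() }
        if stored == nil { stored = error }  // assumed: first error wins
    }

    func consume() -> Error? {
        lock.lock(); defer { lock.unlock() }
        let error = stored
        stored = nil
        return error
    }
}

/// Hypothetical collector for errors raised by concurrent I/O workers.
final class ErrorBox: @unchecked Sendable {
    private let lock = NSLock()
    private var error: Error?
    private let latch: BoxLatch

    init(latch: BoxLatch) { self.latch = latch }

    /// Called from worker threads when a read fails.
    func record(_ newError: Error) {
        lock.lock(); defer { lock.unlock() }
        if error == nil { error = newError }
    }

    /// Non-throwing: posts any recorded error to the latch instead of crashing.
    func check() {
        lock.lock()
        let captured = error
        lock.unlock()
        if let captured = captured { latch.set(captured) }
    }
}

struct ChunkReadFailed: Error { let chunk: Int }

/// Simulates eight parallel chunk reads where chunk 3 fails.
func runConcurrentReads(latch: BoxLatch) {
    let box = ErrorBox(latch: latch)
    DispatchQueue.concurrentPerform(iterations: 8) { chunk in
        if chunk == 3 { box.record(ChunkReadFailed(chunk: chunk)) }
    }
    box.check()
}
```

After `runConcurrentReads(latch:)` returns, a throwing caller further up the stack can call `consume()` on the same latch and decide how to report the failure.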
Comment on lines +653 to +662

    // Check for SSD streaming errors that occurred during prefill.
    // The MoE expert pread path uses a non-throwing callAsFunction,
    // so errors are posted to the global latch instead.
    try SSDStreamingErrorLatch.shared.throwIfSet()

    // evaluate the remainder of the prompt -- this primes the pump
    let token = step(previous: y)

    // Check again after step() which also runs through MoE layers
    try SSDStreamingErrorLatch.shared.throwIfSet()
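The reason for checking the latch right after a forward pass can be shown with a small sketch: a layer whose `callAsFunction` cannot throw posts its failure to a latch, and the throwing caller converts that into a thrown error. `PrefillLatch`, `DemoExpertLayer`, `ExpertReadError`, and `prepare(layer:prompt:)` are hypothetical; the sketch also assumes that `throwIfSet()` clears the latch, which this excerpt does not confirm.

```swift
import Foundation

struct ExpertReadError: Error {}

/// Hypothetical latch with the two operations this sketch needs.
final class PrefillLatch: @unchecked Sendable {
    private let lock = NSLock()
    private var stored: Error?

    func set(_ error: Error) {
        lock.lock(); defer { lock.unlock() }
        if stored == nil { stored = error }
    }

    /// Throws the stored error, if any, and clears it (assumed behavior).
    func throwIfSet() throws {
        lock.lock()
        let error = stored
        stored = nil
        lock.unlock()
        if let error = error { throw error }
    }
}

/// Stand-in for a layer whose call operator cannot throw.
struct DemoExpertLayer {
    let latch: PrefillLatch
    let failing: Bool

    func callAsFunction(_ x: [Float]) -> [Float] {
        if failing {
            latch.set(ExpertReadError())
            // Placeholder output; the caller must not trust it.
            return [Float](repeating: 0, count: x.count)
        }
        return x.map { $0 * 2 }
    }
}

/// Throwing caller: runs the layer, then surfaces any latched error.
func prepare(layer: DemoExpertLayer, prompt: [Float]) throws -> [Float] {
    let output = layer(prompt)
    try layer.latch.throwIfSet()
    return output
}
```

The check has to come after the call that may have posted the error, which is why the hunk above repeats it after `step(previous:)`.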
Comment on lines +1723 to +1726

    if let ssdError = SSDStreamingErrorLatch.shared.consume() {
        print("[MLXLMCommon] SSD streaming error detected: \(ssdError.localizedDescription)")
        stopReason = .cancelled
        break
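A per-token check of this shape can be sketched as a plain loop that consumes the latch after every step. The sketch below is an assumption-laden variant, not the PR's loop: `generate`, `LoopLatch`, `DemoStopReason`, and `DecodeReadError` are hypothetical, and returning the error alongside the stop reason is one possible design rather than what the excerpt does (the excerpt prints the error and sets `.cancelled`).

```swift
import Foundation

struct DecodeReadError: Error {}

/// Hypothetical latch; `consume()` returns and clears the stored error.
final class LoopLatch: @unchecked Sendable {
    private let lock = NSLock()
    private var stored: Error?

    func set(_ error: Error) {
        lock.lock(); defer { lock.unlock() }
        if stored == nil { stored = error }
    }

    func consume() -> Error? {
        lock.lock(); defer { lock.unlock() }
        let error = stored
        stored = nil
        return error
    }
}

enum DemoStopReason: Equatable { case maxTokens, cancelled }

/// Runs `step` up to `maxTokens` times and stops as soon as the latch holds an error.
/// The token produced by the failing step is dropped, since it may be based on bad reads.
func generate(
    maxTokens: Int, latch: LoopLatch, step: (Int) -> Int
) -> (tokens: [Int], stop: DemoStopReason, error: Error?) {
    var tokens: [Int] = []
    var stop = DemoStopReason.maxTokens
    var failure: Error?
    for index in 0..<maxTokens {
        let token = step(index)
        if let error = latch.consume() {
            failure = error
            stop = .cancelled
            break
        }
        tokens.append(token)
    }
    return (tokens, stop, failure)
}
```

Because `consume()` clears the latch, a later generation on the same process starts clean; handing the error back to the caller would let a UI distinguish a streaming failure from a user cancellation.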
Comment on lines +17 to +21

    public final class SSDStreamingErrorLatch: @unchecked Sendable {
        public static let shared = SSDStreamingErrorLatch()
        private let lock = NSLock()
        private var _error: Error?
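The excerpt stops before the latch's methods. One way the type could be completed, using only the names that appear elsewhere in this PR (`set`, `consume`, `throwIfSet`, `SSDStreamingError(underlyingError:)`), is sketched below. The method bodies, the first-error-wins policy, the clearing behavior of `throwIfSet()`, and the `LocalizedError` conformance are all assumptions, not the merged implementation.

```swift
import Foundation

/// Sketch of the wrapper error; only the `underlyingError:` label is visible in the PR.
public struct SSDStreamingError: LocalizedError {
    public let underlyingError: Error

    public init(underlyingError: Error) {
        self.underlyingError = underlyingError
    }

    public var errorDescription: String? {
        "SSD streaming error: \(underlyingError.localizedDescription)"
    }
}

/// Sketch of a lock-protected, single-slot error latch.
public final class SSDStreamingErrorLatch: @unchecked Sendable {
    public static let shared = SSDStreamingErrorLatch()
    private let lock = NSLock()
    private var _error: Error?

    /// Records an error; later errors are ignored until the slot is cleared (assumed policy).
    public func set(_ error: Error) {
        lock.lock(); defer { lock.unlock() }
        if _error == nil { _error = error }
    }

    /// Returns the stored error, if any, and clears the slot.
    public func consume() -> Error? {
        lock.lock(); defer { lock.unlock() }
        let error = _error
        _error = nil
        return error
    }

    /// Throws the stored error, if any, clearing the slot first (assumed behavior).
    public func throwIfSet() throws {
        if let error = consume() { throw error }
    }
}
```

`@unchecked Sendable` is reasonable here only because every access to `_error` goes through the lock; the compiler does not verify that.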