fix(logger): preserve traceback in JSON sink #2594
Open
mikasenghaas wants to merge 3 commits into
loguru's RecordException.__reduce__ unconditionally drops the traceback object before pickling, so records dequeued by the enqueue=True worker have exc.traceback=None and the JSON sink emits only the exception type + message - no frames. Pre-format the traceback into extra["_traceback"] via a loguru patcher that runs at log time, while the live traceback is still attached. The string survives pickling, and build_log_entry reads it back into the "exception" field. enqueue=True is kept because the trainer's torchrun workers all write JSON lines to a shared inherited stdout, and loguru's docs flag this as the case where the multiprocessing-safe queue is required to avoid interleaved/corrupt output.
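The pickling behavior described above can be checked with the standard library alone. The following is a minimal sketch, not loguru's code: it pickles a bare exception as a stand-in for the queue crossing (an assumption for illustration), and the helper name `capture_exception` is hypothetical.

```python
import pickle
import traceback


def capture_exception():
    """Raise and catch an error, returning it with its live traceback attached."""
    try:
        raise RuntimeError("boom")
    except RuntimeError as exc:
        return exc


exc = capture_exception()

# Formatting at log time: the traceback is still attached, so frames are rendered.
at_log_time = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))

# Traceback objects themselves cannot be pickled.
try:
    pickle.dumps(exc.__traceback__)
    traceback_is_picklable = True
except (TypeError, pickle.PicklingError):
    traceback_is_picklable = False

# Stand-in for the queue crossing: the exception value survives, its traceback does not.
restored = pickle.loads(pickle.dumps(exc))
after_pickle = "".join(
    traceback.format_exception(type(restored), restored, restored.__traceback__)
)

# A plain string round-trips unchanged, which is what the fix relies on.
survived = pickle.loads(pickle.dumps({"traceback": at_log_time}))["traceback"]

print(after_pickle)  # only the exception type and message remain
print(survived)      # full pre-formatted traceback
```

Formatting before the pickle boundary trades a small cost on every logged exception for a string that any downstream sink can use.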
Summary
- Pre-format the traceback into `record["extra"]["traceback"]` via a `traceback_patcher` that runs at log time, before `enqueue=True` pickles the record. The JSON sink reads it back into the `"exception"` field, restoring full stack frames in JSON logs.
- Keep `enqueue=True`, which is required for multiprocess safety of trainer torchrun workers sharing stdout.
Why
When `--log.json_logging` is enabled, exception logs lose their stack frames. Reproduction with the orchestrator entrypoint produced:
{"level": "ERROR", "message": "Fatal error in orchestrate", "exception": "RuntimeError: ...\n"}
No `Traceback (most recent call last):` header, no frames; just the exception type and message.
Root cause
`loguru/_recattrs.py:68`: `RecordException.__reduce__` unconditionally sets the traceback to `None` before pickling, because traceback objects aren't picklable.
With `enqueue=True` the record is pickled to cross the queue, so by the time `json_sink` receives it both `exc.traceback` and `exc.value.__traceback__` are `None`, and `traceback.format_exception(...)` renders only the head line.
Verification
Injected `raise RuntimeError("...")` at the start of `orchestrate()`, ran `orchestrator @ configs/debug/orch.toml --log.json_logging`:
Before:
{"level": "ERROR", "exception": "RuntimeError: ...\n"}
After:
{"level": "ERROR", "exception": "Traceback (most recent call last):\n File \".../utils.py\", line 59, in async_wrapper\n ret = await func(*args, **kwargs)\n File \".../orchestrator.py\", line 96, in orchestrate\n raise RuntimeError(...)\nRuntimeError: ...\n"}
Why not drop `enqueue=True`?
Per the loguru docs, `enqueue=True` passes messages through a multiprocessing-safe queue before they reach the sink.
The trainer's torchrun workers all write JSON lines to a shared inherited stdout. Without the queue, concurrent writes from different processes can interleave at byte boundaries and corrupt JSON lines. Loguru's per-sink lock handles intra-process threading but is not multiprocess-safe.
Note
Medium Risk
Changes how exceptions are captured and serialized in JSON logging mode (via Loguru patchers and `extra` mutation), which could affect error visibility and structured log fields across async/multiprocess logging.
Overview
Fixes JSON logging so exception stack traces are preserved when the sink runs with `enqueue=True`.
This pre-formats the traceback at log time via a new `traceback_patcher` (stored in `record["extra"]["traceback"]`) and updates `build_log_entry()` to prefer that preformatted traceback when populating the JSON `exception` field, falling back to formatting `record["exception"]` only when needed.
Reviewed by Cursor Bugbot for commit 7d6b55b.
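The patcher and fallback described in the overview could look roughly like the sketch below. It is not the code from this PR. The function bodies, the plain-dict record, the string `level` field, and the `ExcInfo` and `make_record` helpers are assumptions for illustration; only the names `traceback_patcher`, `build_log_entry`, and the `record["extra"]["traceback"]` key come from the description.

```python
import json
import traceback
from collections import namedtuple

# Hypothetical stand-in for the (type, value, traceback) triple in record["exception"].
ExcInfo = namedtuple("ExcInfo", ("type", "value", "traceback"))


def traceback_patcher(record):
    """Pre-format the traceback at log time, while it is still attached."""
    exc = record["exception"]
    if exc is not None and exc.traceback is not None:
        record["extra"]["traceback"] = "".join(
            traceback.format_exception(exc.type, exc.value, exc.traceback)
        )


def build_log_entry(record):
    """Build the JSON payload, preferring the pre-formatted traceback."""
    entry = {"level": record["level"], "message": record["message"]}
    preformatted = record["extra"].get("traceback")
    exc = record["exception"]
    if preformatted is not None:
        entry["exception"] = preformatted
    elif exc is not None:
        # Fallback: format whatever exception info is left on the record.
        entry["exception"] = "".join(
            traceback.format_exception(exc.type, exc.value, exc.traceback)
        )
    return entry


def make_record():
    """Hypothetical helper: build a record dict carrying a live traceback."""
    try:
        raise RuntimeError("boom")
    except RuntimeError as err:
        return {
            "level": "ERROR",
            "message": "Fatal error in orchestrate",
            "exception": ExcInfo(type(err), err, err.__traceback__),
            "extra": {},
        }


record = make_record()
traceback_patcher(record)
# Simulate what the sink sees after the queue: traceback gone, extra intact.
record["exception"] = ExcInfo(RuntimeError, RuntimeError("boom"), None)
line = json.dumps(build_log_entry(record))
print(line)
```

With loguru, a patcher of this shape can be attached through `logger.configure(patcher=...)` or `logger.patch(...)`; which of the two this PR uses is not shown in the description.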