
fix: consistent eprintln! lifecycle messages and scope filtered_urls in process.rs #62

Open
jmagar wants to merge 1 commit into main from bd-work/fix-process-logging

Conversation

Owner

@jmagar jmagar commented Apr 8, 2026

Summary

  • Align cancel/cache-hit completion messages to use eprintln! with symbol_for_status/muted styling, matching the existing start/complete messages
  • Keep log_warn for failure path (goes to structured logs AND visible at default WARN threshold)
  • Remove unused log_done import
  • Move filtered_urls computation inside the DB write branch where it is consumed
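The lifecycle-message pattern the bullets describe can be sketched as below. The helper bodies here are assumptions for illustration — the real symbol_for_status and muted live in this repo's styling module, and the specific symbols and ANSI codes are placeholders:

```rust
// Sketch of the eprintln!-based lifecycle message pattern. Helper bodies
// are hypothetical stand-ins for the repo's real styling helpers.
fn symbol_for_status(status: &str) -> &'static str {
    // Assumed symbol mapping; the actual glyphs may differ.
    match status {
        "completed" => "✓",
        "canceled" => "⊘",
        "failed" => "✗",
        _ => "•",
    }
}

fn muted(s: &str) -> String {
    // ANSI "dim" styling; a real implementation may also check for a TTY.
    format!("\x1b[2m{s}\x1b[0m")
}

fn report_canceled(id: u64) {
    // Written to stderr unconditionally, so it is visible regardless of
    // the tracing console threshold.
    eprintln!(
        "{} crawl job {} canceled",
        symbol_for_status("canceled"),
        muted(&id.to_string()),
    );
}

fn main() {
    report_canceled(42);
}
```

Because eprintln! bypasses the tracing subscriber, these messages never reach the structured log file — which is why the failure path keeps log_warn.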

Beads

  • axon_rust-1s0: Fix inconsistent logging style at job lifecycle boundaries
  • axon_rust-vcz: Scope filtered_urls to DB write branch in spawn_progress_task

Testing

  • All pre-commit hooks pass (rustfmt, clippy, monolith, 1692 tests)
  • No behavioral changes — purely consistency and scoping fixes

Summary by cubic

Make job lifecycle logs consistent by using eprintln! with symbol_for_status/muted for cancel and cache-hit paths, while keeping log_warn for failures. Also scope filtered_urls to the DB write branch to avoid unnecessary work. Addresses axon_rust-1s0 and axon_rust-vcz.

  • Bug Fixes
    • Standardized cancel and cache-hit messages to eprintln!; failures still use log_warn.
    • Removed unused log_done import.
    • Compute filtered_urls only when writing progress to DB in spawn_progress_task.

Written for commit 863c768. Summary will update on new commits.

Summary by CodeRabbit

  • Chores

    • Updated job status output formatting for canceled and completed crawl jobs.
  • Performance

    • Optimized crawl job processing efficiency by refining computation timing.

…ages and scope filtered_urls

- Align cancel/cache-hit completion messages to use eprintln! with
  symbol_for_status/muted styling, matching start/complete messages
- Keep log_warn for failure path (structured log + always visible)
- Remove unused log_done import
- Move filtered_urls computation inside the DB write branch where it
  is actually consumed
Copilot AI review requested due to automatic review settings April 8, 2026 20:38

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

Modified logging infrastructure in the crawl worker process by replacing log macros with stderr output and eprintln! calls, while optimizing filtered URL computation to occur only when database writes are needed.

Changes

Cohort / File(s) Summary
Logging Refactor
crates/jobs/crawl/runtime/worker/process.rs
Removed log_done import and replaced logging calls with eprintln! output using status symbols and formatting helpers (symbol_for_status, muted). Updated completion, cancellation, and failure paths to use stderr-based messaging.
Computation Optimization
crates/jobs/crawl/runtime/worker/process.rs
Moved filtered_urls computation into conditional block to execute only when database writes are due, eliminating unnecessary calculations on cache hits.
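The scoping change can be sketched as follows. All names here (filter_urls, tick, should_write_db) are illustrative, not the crate's real API — the point is only that the filtered list is computed inside the branch that consumes it:

```rust
// Sketch: compute filtered_urls only on the DB-write path, so cache hits
// (and other non-write ticks) skip the work entirely. Names are hypothetical.
fn filter_urls(urls: &[String]) -> Vec<String> {
    urls.iter()
        .filter(|u| u.starts_with("https://"))
        .cloned()
        .collect()
}

fn tick(urls: &[String], should_write_db: bool) -> Option<Vec<String>> {
    if should_write_db {
        // Computed here, where it is consumed.
        let filtered_urls = filter_urls(urls);
        Some(filtered_urls) // stand-in for the actual progress DB write
    } else {
        // No DB write due: no filtering work performed.
        None
    }
}
```

This is a pure scoping/efficiency change — behavior on the write path is unchanged.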

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A Worker's Tale

The logs now speak in stderr's voice,
With symbols bright, we made a choice,
Computed URLs with careful grace—
Only when they find their place! 📝✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: consistent eprintln! lifecycle messages and the scoping of filtered_urls in process.rs, which are the primary modifications in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



Contributor

Copilot AI left a comment


Pull request overview

This PR standardizes crawl job lifecycle messaging by switching certain boundary messages to styled eprintln! output, and narrows the scope of filtered_urls to where it’s used in the progress DB update path.

Changes:

  • Replace cancel/failure/cache-hit completion messages with styled eprintln! output and remove the unused log_done import.
  • Keep log_warn on the failure path while adjusting the visible failure/cancel messaging.
  • Move filtered_urls computation into the DB-write branch in spawn_progress_task.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +102 to +106
eprintln!(
"{} crawl job {} failed",
symbol_for_status("failed"),
muted(&id.to_string()),
);

Copilot AI Apr 8, 2026


The failure path now writes to stderr twice: once via eprintln! (this new message) and again via log_warn, whose console layer is configured to write WARN events to stderr. This will produce duplicate visible "failed" lines for the same job. Consider emitting only one user-visible line (e.g., keep log_warn for structured+console output and drop the eprintln!, or keep eprintln! and route the structured log somewhere that doesn’t also print to stderr).

Suggested change (remove the duplicate stderr line):
- eprintln!(
-     "{} crawl job {} failed",
-     symbol_for_status("failed"),
-     muted(&id.to_string()),
- );

eprintln!(
    "{} crawl job {} canceled",
    symbol_for_status("canceled"),
    muted(&id.to_string()),
);

Copilot AI Apr 8, 2026


This change removes the log_info call for canceled jobs. Unlike eprintln!, log_info is captured by tracing (including the JSON log file configured in crates/core/logging.rs). If cancellation events are important to retain in structured logs, consider keeping a tracing log for this path (in addition to or instead of the styled eprintln!).

Suggested change (keep a tracing log alongside the styled output):
      );
+     log_info(&format!("worker canceled crawl job {id}"));
Comment on lines 246 to +251
mark_job_completed(pool, TABLE, id, Some(&result_json)).await?;
log_done(&format!("worker completed crawl job {id} (cache hit)"));
eprintln!(
"{} crawl job {} done (cache hit)",
symbol_for_status("completed"),
muted(&id.to_string()),
);

Copilot AI Apr 8, 2026


Switching the cache-hit completion from log_done to eprintln! means this completion event is no longer emitted through tracing (and therefore won’t appear in the JSON log file). If downstream monitoring/analytics relies on structured "done" events, consider keeping a tracing log (e.g., log_done/log_info) alongside the styled eprintln!.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/jobs/crawl/runtime/worker/process.rs`:
- Around line 96-106: Replace the direct eprintln! lifecycle prints with
structured logging: use log_info to emit the "canceled" and cache-hit completion
messages and keep failures using log_warn; call log_info!(...) or log_warn!(...)
with the same formatted message using symbol_for_status(...) and
muted(&id.to_string()) so the content remains identical, and update the other
eprintln! occurrences (the similar messages later in the file) the same way;
locate instances by searching for eprintln!(..., symbol_for_status("canceled") /
symbol_for_status("failed") and replace accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ccaddc1b-f4e1-40fa-a7af-b332a0bdb441

📥 Commits

Reviewing files that changed from the base of the PR and between 317af8a and 863c768.

📒 Files selected for processing (1)
  • crates/jobs/crawl/runtime/worker/process.rs

Comment on lines +96 to +106
eprintln!(
"{} crawl job {} canceled",
symbol_for_status("canceled"),
muted(&id.to_string()),
);
} else {
eprintln!(
"{} crawl job {} failed",
symbol_for_status("failed"),
muted(&id.to_string()),
);


⚠️ Potential issue | 🟠 Major

Use structured logger calls for lifecycle events, not eprintln!.

These new lifecycle messages bypass structured logging and log-level routing. In this worker module, emit canceled/cache-hit completion through log_info (and keep failures in log_warn) instead of eprintln!.

Proposed fix
             if is_canceled {
-                eprintln!(
-                    "{} crawl job {} canceled",
-                    symbol_for_status("canceled"),
-                    muted(&id.to_string()),
-                );
+                log_info(&format!("crawl job {id} canceled"));
             } else {
-                eprintln!(
-                    "{} crawl job {} failed",
-                    symbol_for_status("failed"),
-                    muted(&id.to_string()),
-                );
-                log_warn(&format!("worker failed crawl job {id}"));
+                log_warn(&format!("worker failed crawl job {id}: {err}"));
             }
@@
-    eprintln!(
-        "{} crawl job {} done (cache hit)",
-        symbol_for_status("completed"),
-        muted(&id.to_string()),
-    );
+    log_info(&format!("crawl job {id} done (cache hit)"));

As per coding guidelines, "Use structured log output via log_info and log_warn instead of println! in library code".

Also applies to: 247-251


Owner Author

@jmagar jmagar left a comment


The eprintln! usage here is intentional. log_info routes through tracing::info!, which is suppressed at the default console threshold (WARN). These job lifecycle boundary messages — start, complete, cancel, cache-hit — need to be always visible to operators without requiring RUST_LOG=info. The eprintln! approach mirrors what the progress task already uses for in-flight crawl status, ensuring consistent operator UX across all lifecycle events.
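The threshold argument above can be modeled in a few lines. This is a simplified stand-in for tracing's level filtering, not the repo's actual logging setup — it only shows why info-level events vanish at a WARN console threshold while eprintln! does not:

```rust
// Minimal model of level filtering: a console layer with a WARN threshold
// drops INFO events, while eprintln! writes to stderr unconditionally.
// `Level` and `console_emits` are illustrative, not tracing's real API.
#[derive(Debug, PartialEq, PartialOrd)]
enum Level {
    Info, // declared first, so Info < Warn under derived ordering
    Warn,
}

fn console_emits(event: Level, threshold: Level) -> bool {
    event >= threshold
}

fn main() {
    // log_info at the default WARN console threshold: suppressed.
    assert!(!console_emits(Level::Info, Level::Warn));
    // log_warn at the same threshold: visible.
    assert!(console_emits(Level::Warn, Level::Warn));
    // eprintln! bypasses the filter entirely, so lifecycle messages
    // stay visible without RUST_LOG=info.
}
```

The trade-off the reviewers raise still stands: what eprintln! gains in unconditional visibility, it loses in structured-log capture.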
