fix: consistent eprintln! lifecycle messages and scope filtered_urls in process.rs#62
…ages and scope filtered_urls
- Align cancel/cache-hit completion messages to use eprintln! with symbol_for_status/muted styling, matching start/complete messages
- Keep log_warn for failure path (structured log + always visible)
- Remove unused log_done import
- Move filtered_urls computation inside the DB write branch where it is actually consumed
Walkthrough
Modified logging infrastructure in the crawl worker process by replacing log macros with stderr output and …
Pull request overview
This PR standardizes crawl job lifecycle messaging by switching certain boundary messages to styled eprintln! output, and narrows the scope of filtered_urls to where it’s used in the progress DB update path.
Changes:
- Replace cancel/failure/cache-hit completion messages with styled eprintln! output and remove the unused log_done import.
- Keep log_warn on the failure path while adjusting the visible failure/cancel messaging.
- Move filtered_urls computation into the DB-write branch in spawn_progress_task.
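The third change can be illustrated with a minimal sketch. Everything in it is hypothetical: `Progress`, `filter_urls`, `progress_tick`, the `should_write_db` flag, and the filtering rule are stand-ins, since the real `spawn_progress_task` is not shown on this page. It only shows the general idea that the filtered list is built inside the branch that consumes it.

```rust
/// Hypothetical progress snapshot; the real type in process.rs is not shown here.
struct Progress {
    urls: Vec<String>,
}

/// Hypothetical filter that drops empty entries. The real rule is an assumption.
fn filter_urls(urls: &[String]) -> Vec<String> {
    urls.iter().filter(|u| !u.is_empty()).cloned().collect()
}

/// Returns the URL list that would be written to the DB on this tick, if any.
fn progress_tick(progress: &Progress, should_write_db: bool) -> Option<Vec<String>> {
    if should_write_db {
        // Computed only here, inside the branch where it is consumed.
        let filtered_urls = filter_urls(&progress.urls);
        Some(filtered_urls)
    } else {
        // No DB write on this tick, so no filtering work is done.
        None
    }
}
```

With this shape, ticks that skip the DB write never allocate or iterate the URL list.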
eprintln!(
    "{} crawl job {} failed",
    symbol_for_status("failed"),
    muted(&id.to_string()),
);
The failure path now writes to stderr twice: once via eprintln! (this new message) and again via log_warn, whose console layer is configured to write WARN events to stderr. This will produce duplicate visible "failed" lines for the same job. Consider emitting only one user-visible line (e.g., keep log_warn for structured+console output and drop the eprintln!, or keep eprintln! and route the structured log somewhere that doesn’t also print to stderr).
- eprintln!(
-     "{} crawl job {} failed",
-     symbol_for_status("failed"),
-     muted(&id.to_string()),
- );
    "{} crawl job {} canceled",
    symbol_for_status("canceled"),
    muted(&id.to_string()),
);
This change removes the log_info call for canceled jobs. Unlike eprintln!, log_info is captured by tracing (including the JSON log file configured in crates/core/logging.rs). If cancellation events are important to retain in structured logs, consider keeping a tracing log for this path (in addition to or instead of the styled eprintln!).
- );
+ );
+ log_info(&format!("worker canceled crawl job {id}"));
  mark_job_completed(pool, TABLE, id, Some(&result_json)).await?;
- log_done(&format!("worker completed crawl job {id} (cache hit)"));
+ eprintln!(
+     "{} crawl job {} done (cache hit)",
+     symbol_for_status("completed"),
+     muted(&id.to_string()),
+ );
Switching the cache-hit completion from log_done to eprintln! means this completion event is no longer emitted through tracing (and therefore won’t appear in the JSON log file). If downstream monitoring/analytics relies on structured "done" events, consider keeping a tracing log (e.g., log_done/log_info) alongside the styled eprintln!.
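One way to read this suggestion is to emit the styled operator line and a structured message side by side. The sketch below is an assumption-heavy illustration: `symbol_for_status` and `muted` are hypothetical stand-ins whose real signatures and output are not shown on this page, `emit_cache_hit` is an invented name, and two generic writers stand in for stderr and the tracing-backed logger.

```rust
use std::io::Write;

/// Hypothetical stand-in; the real symbols used by the project are unknown.
fn symbol_for_status(status: &str) -> &'static str {
    match status {
        "completed" => "✓",
        "failed" => "✗",
        "canceled" => "⊘",
        _ => "•",
    }
}

/// Hypothetical stand-in: wraps text in the ANSI "dim" escape sequence.
fn muted(text: &str) -> String {
    format!("\x1b[2m{text}\x1b[0m")
}

/// Writes the styled line to `console` and a plain message to `structured`.
fn emit_cache_hit<C: Write, S: Write>(
    console: &mut C,
    structured: &mut S,
    id: &str,
) -> std::io::Result<()> {
    // Always-visible operator line (stderr in the real worker).
    writeln!(
        console,
        "{} crawl job {} done (cache hit)",
        symbol_for_status("completed"),
        muted(id)
    )?;
    // Message for the structured log (log_done / log_info in the real worker).
    writeln!(structured, "worker completed crawl job {id} (cache hit)")?;
    Ok(())
}
```

In the worker itself, `console` would correspond to `std::io::stderr()` and the second write to a `log_done` or `log_info` call. Whether the duplicate console output noted for the failure path would also apply here depends on the configured console threshold.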
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/jobs/crawl/runtime/worker/process.rs`:
- Around line 96-106: Replace the direct eprintln! lifecycle prints with
structured logging: use log_info to emit the "canceled" and cache-hit completion
messages and keep failures using log_warn; call log_info!(...) or log_warn!(...)
with the same formatted message using symbol_for_status(...) and
muted(&id.to_string()) so the content remains identical, and update the other
eprintln! occurrences (the similar messages later in the file) the same way;
locate instances by searching for eprintln!(..., symbol_for_status("canceled") /
symbol_for_status("failed") and replace accordingly.
📒 Files selected for processing (1)
crates/jobs/crawl/runtime/worker/process.rs
    eprintln!(
        "{} crawl job {} canceled",
        symbol_for_status("canceled"),
        muted(&id.to_string()),
    );
} else {
    eprintln!(
        "{} crawl job {} failed",
        symbol_for_status("failed"),
        muted(&id.to_string()),
    );
Use structured logger calls for lifecycle events, not eprintln!.
These new lifecycle messages bypass structured logging and log-level routing. In this worker module, emit canceled/cache-hit completion through log_info (and keep failures in log_warn) instead of eprintln!.
Proposed fix
if is_canceled {
- eprintln!(
- "{} crawl job {} canceled",
- symbol_for_status("canceled"),
- muted(&id.to_string()),
- );
+ log_info(&format!("crawl job {id} canceled"));
} else {
- eprintln!(
- "{} crawl job {} failed",
- symbol_for_status("failed"),
- muted(&id.to_string()),
- );
- log_warn(&format!("worker failed crawl job {id}"));
+ log_warn(&format!("worker failed crawl job {id}: {err}"));
}
@@
- eprintln!(
- "{} crawl job {} done (cache hit)",
- symbol_for_status("completed"),
- muted(&id.to_string()),
- );
+ log_info(&format!("crawl job {id} done (cache hit)"));
As per coding guidelines, "Use structured log output via log_info and log_warn instead of println! in library code".
Also applies to: 247-251
jmagar left a comment:
The eprintln! usage here is intentional. log_info routes through tracing::info!, which is suppressed at the default console threshold (WARN). These job lifecycle boundary messages — start, complete, cancel, cache-hit — need to be always visible to operators without requiring RUST_LOG=info. The eprintln! approach mirrors what the progress task already uses for in-flight crawl status, ensuring consistent operator UX across all lifecycle events.
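The visibility argument can be modeled with a small sketch. It does not use the tracing crate; `Level`, `Channel`, and `visible_on_console` are hypothetical names for a simplified model in which a filtered console layer drops events below its threshold while a direct stderr write is never filtered.

```rust
/// Severity levels, ordered from least to most severe (variant order matters
/// because the comparison below relies on the derived ordering).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Info,
    Warn,
}

/// How a message reaches the operator. These variants are illustrative labels.
enum Channel {
    /// Routed through a level-filtered console layer (like log_info / log_warn).
    Filtered(Level),
    /// Written straight to stderr (like eprintln!), bypassing any threshold.
    Direct,
}

/// Returns true if the message would appear on the console at `threshold`.
fn visible_on_console(channel: &Channel, threshold: Level) -> bool {
    match channel {
        Channel::Filtered(level) => *level >= threshold,
        Channel::Direct => true,
    }
}
```

Under this model, with the threshold at `Level::Warn`, a `Filtered(Level::Info)` message is hidden, while `Filtered(Level::Warn)` and `Direct` messages are shown, which matches the behavior described in the comment above.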
Summary
- Align cancel/cache-hit completion messages to use eprintln! with symbol_for_status/muted styling, matching the existing start/complete messages
- Keep log_warn for failure path (goes to structured logs AND visible at default WARN threshold)
- Remove unused log_done import
- Move filtered_urls computation inside the DB write branch where it is consumed
Beads
- axon_rust-1s0: Fix inconsistent logging style at job lifecycle boundaries
- axon_rust-vcz: Scope filtered_urls to DB write branch in spawn_progress_task
Testing
Summary by cubic
Make job lifecycle logs consistent by using eprintln! with symbol_for_status/muted for cancel and cache-hit paths, while keeping log_warn for failures. Also scope filtered_urls to the DB write branch to avoid unnecessary work. Addresses axon_rust-1s0 and axon_rust-vcz.
- … eprintln!; failures still use log_warn.
- … log_done import.
- … filtered_urls only when writing progress to DB in spawn_progress_task.
Written for commit 863c768.
Summary by CodeRabbit
Chores
Performance