22 changes: 18 additions & 4 deletions crates/jobs/crawl/runtime/worker/process.rs
@@ -1,6 +1,6 @@
 use crate::crates::core::config::{Config, RenderMode};
 use crate::crates::core::http::validate_url;
-use crate::crates::core::logging::{log_done, log_info, log_warn};
+use crate::crates::core::logging::{log_info, log_warn};
 use crate::crates::core::ui::{accent, muted, symbol_for_status};
 use crate::crates::crawl::engine::{CrawlSummary, run_crawl_once, should_fallback_to_chrome};
 use crate::crates::jobs::common::{JobTable, mark_job_completed, spawn_heartbeat_task};
@@ -93,8 +93,17 @@ pub(super) async fn process_job(
         .execute(pool)
         .await?;
     if is_canceled {
-        log_info(&format!("worker canceled crawl job {id}"));
+        eprintln!(
+            "{} crawl job {} canceled",
+            symbol_for_status("canceled"),
+            muted(&id.to_string()),
+        );
Copilot AI Apr 8, 2026
This change removes the log_info call for canceled jobs. Unlike eprintln!, log_info is captured by tracing (including the JSON log file configured in crates/core/logging.rs). If cancellation events are important to retain in structured logs, consider keeping a tracing log for this path (in addition to or instead of the styled eprintln!).

Suggested change
-        );
+        );
+        log_info(&format!("worker canceled crawl job {id}"));
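The dual-emit pattern this comment suggests, a structured record for tracing plus the styled stderr line, can be sketched as follows. The helpers here are stdlib stand-ins for the crate's `log_info`, `symbol_for_status`, and `muted`, so the tag format, symbols, and ANSI styling are assumptions, not the crate's real output:

```rust
// Sketch only: stand-ins for the crate's helpers, not the real implementations.
fn log_info(msg: &str) {
    // In the real crate this would route through `tracing` and so also land
    // in the JSON log file; here we just tag the console line.
    eprintln!("INFO {msg}");
}

fn symbol_for_status(status: &str) -> &'static str {
    // Hypothetical symbol mapping.
    match status {
        "canceled" => "✖",
        "completed" => "✔",
        _ => "•",
    }
}

fn muted(s: &str) -> String {
    format!("\x1b[2m{s}\x1b[0m") // dim ANSI styling
}

// Cancellation path that keeps both outputs, as the comment proposes:
// one structured event plus one styled user-visible line.
fn report_canceled(id: i64) {
    log_info(&format!("worker canceled crawl job {id}"));
    eprintln!(
        "{} crawl job {} canceled",
        symbol_for_status("canceled"),
        muted(&id.to_string()),
    );
}

fn main() {
    report_canceled(42);
}
```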

     } else {
+        eprintln!(
+            "{} crawl job {} failed",
+            symbol_for_status("failed"),
+            muted(&id.to_string()),
+        );
Comment on lines +102 to +106
Copilot AI Apr 8, 2026

The failure path now writes to stderr twice: once via eprintln! (this new message) and again via log_warn, whose console layer is configured to write WARN events to stderr. This will produce duplicate visible "failed" lines for the same job. Consider emitting only one user-visible line (e.g., keep log_warn for structured+console output and drop the eprintln!, or keep eprintln! and route the structured log somewhere that doesn’t also print to stderr).

Suggested change
-        eprintln!(
-            "{} crawl job {} failed",
-            symbol_for_status("failed"),
-            muted(&id.to_string()),
-        );
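To make the duplication concrete, here is a stdlib-only sketch. `log_warn_line` is a hypothetical stand-in for the stderr line the crate's `log_warn` console layer emits (the real formatting lives in crates/core/logging.rs and is not shown in this diff):

```rust
// Hypothetical stand-in for what `log_warn`'s console layer writes to stderr.
fn log_warn_line(msg: &str) -> String {
    format!("WARN {msg}")
}

// The failure path as written in the diff: two stderr lines for one job.
fn failure_stderr_lines(id: i64) -> Vec<String> {
    vec![
        // The new styled eprintln! added by this PR.
        format!("✖ crawl job {id} failed"),
        // log_warn's own console output, also routed to stderr.
        log_warn_line(&format!("worker failed crawl job {id}")),
    ]
}

fn main() {
    for line in failure_stderr_lines(42) {
        eprintln!("{line}");
    }
}
```

Dropping either the `eprintln!` or the console side of `log_warn` collapses this to the single visible line the comment asks for.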

Comment on lines +96 to +106

⚠️ Potential issue | 🟠 Major

Use structured logger calls for lifecycle events, not eprintln!.

These new lifecycle messages bypass structured logging and log-level routing. In this worker module, emit canceled/cache-hit completion through log_info (and keep failures in log_warn) instead of eprintln!.

Proposed fix
             if is_canceled {
-                eprintln!(
-                    "{} crawl job {} canceled",
-                    symbol_for_status("canceled"),
-                    muted(&id.to_string()),
-                );
+                log_info(&format!("crawl job {id} canceled"));
             } else {
-                eprintln!(
-                    "{} crawl job {} failed",
-                    symbol_for_status("failed"),
-                    muted(&id.to_string()),
-                );
-                log_warn(&format!("worker failed crawl job {id}"));
+                log_warn(&format!("worker failed crawl job {id}: {err}"));
             }
@@
-    eprintln!(
-        "{} crawl job {} done (cache hit)",
-        symbol_for_status("completed"),
-        muted(&id.to_string()),
-    );
+    log_info(&format!("crawl job {id} done (cache hit)"));

As per coding guidelines, "Use structured log output via log_info and log_warn instead of println! in library code".

Also applies to: 247-251

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/jobs/crawl/runtime/worker/process.rs` around lines 96-106, replace the direct eprintln! lifecycle prints with structured logging: use log_info to emit the "canceled" and cache-hit completion messages, and keep failures on log_warn. Call log_info(...) or log_warn(...) with the same formatted message, using symbol_for_status(...) and muted(&id.to_string()) so the content remains identical, and update the similar eprintln! occurrences later in the file the same way. Locate them by searching for eprintln!(..., symbol_for_status("canceled")) and symbol_for_status("failed") and replace accordingly.

         log_warn(&format!("worker failed crawl job {id}"));
     }
 }
@@ -235,7 +244,11 @@ async fn maybe_complete_cache_hit(
         "audit_report_path": report_path.to_string_lossy(),
     });
     mark_job_completed(pool, TABLE, id, Some(&result_json)).await?;
-    log_done(&format!("worker completed crawl job {id} (cache hit)"));
+    eprintln!(
+        "{} crawl job {} done (cache hit)",
+        symbol_for_status("completed"),
+        muted(&id.to_string()),
+    );
Comment on lines 246 to +251
Copilot AI Apr 8, 2026

Switching the cache-hit completion from log_done to eprintln! means this completion event is no longer emitted through tracing (and therefore won’t appear in the JSON log file). If downstream monitoring/analytics relies on structured "done" events, consider keeping a tracing log (e.g., log_done/log_info) alongside the styled eprintln!.

     Ok(true)
 }

@@ -261,7 +274,6 @@ fn spawn_progress_task(
             continue; // drain channel, skip both DB write and log
         }
         let pages_crawled = progress.pages_seen as u64;
-        let filtered_urls = pages_crawled.saturating_sub(progress.markdown_files as u64);
 
         if elapsed_log >= Duration::from_secs(5) {
             eprintln!(
@@ -277,6 +289,8 @@
     }
 
     if elapsed_db >= Duration::from_millis(500) {
+        let filtered_urls =
+            pages_crawled.saturating_sub(progress.markdown_files as u64);
         let progress_json = serde_json::json!({
             "phase": "crawling",
             "md_created": progress.markdown_files,
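The relocated `filtered_urls` computation relies on `saturating_sub` to keep the unsigned arithmetic safe. A minimal sketch of that behavior, with the field semantics borrowed from the diff:

```rust
// filtered_urls = pages crawled minus markdown files written, clamped at 0,
// so a transient markdown_files > pages_seen state cannot underflow u64.
fn filtered_urls(pages_crawled: u64, markdown_files: u64) -> u64 {
    pages_crawled.saturating_sub(markdown_files)
}

fn main() {
    println!("{}", filtered_urls(10, 3)); // normal case: 7
    println!("{}", filtered_urls(3, 10)); // clamped to 0 instead of wrapping
}
```

Moving the computation inside the 500 ms database branch also means it only runs when the progress row is actually written, rather than on every drained channel message.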