fix(process-pdf-job): discard on RuntimeError and guard failed status #2453
ahnv wants to merge 1 commit into
Add discard_on(RuntimeError) to drop the job immediately instead of exhausting 25 Sidekiq retries on deterministic errors. The block logs job_id and message for observability. Widen the early-return guard from status == "complete" to also cover "failed", preventing re-processing of already-failed imports.
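To make the difference between discarding and retrying concrete, here is a minimal plain-Ruby model. It does not use ActiveJob or Sidekiq; MiniRunner, MAX_RETRIES, and the error message are hypothetical names chosen for illustration, and the loop only approximates "one first attempt plus 25 retries".

```ruby
# Plain-Ruby model of "discard vs. retry". Hypothetical stand-in, not ActiveJob's implementation.
class MiniRunner
  MAX_RETRIES = 25 # retry count quoted in the PR description

  def initialize(discard_on:)
    @discard_on = discard_on # list of exception classes that should be dropped immediately
  end

  # Runs the block until it succeeds, is discarded, or retries are exhausted.
  # Returns [outcome, number_of_attempts].
  def run
    attempts = 0
    loop do
      attempts += 1
      begin
        yield
        return [:ok, attempts]
      rescue StandardError => e
        # A matching class is dropped on the spot, with no further attempts.
        return [:discarded, attempts] if @discard_on.any? { |klass| e.is_a?(klass) }
        # Otherwise keep retrying until the first attempt plus MAX_RETRIES retries are used up.
        return [:dead, attempts] if attempts > MAX_RETRIES
      end
    end
  end
end

# A deterministic failure: it raises the same RuntimeError on every attempt.
without_discard = MiniRunner.new(discard_on: []).run { raise "simulated unreadable PDF" }
with_discard    = MiniRunner.new(discard_on: [RuntimeError]).run { raise "simulated unreadable PDF" }

puts without_discard.inspect
puts with_discard.inspect
```

In this model the deterministic failure consumes 26 attempts without the discard rule and exactly one with it, which is the saving the PR is after.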
Superagent didn't find any vulnerabilities or security issues in this PR.
I think @JSONbored worked on this feature, maybe your review is also beneficial on this PR
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8e9796a73
  return unless pdf_import.is_a?(PdfImport)
  return reset_processing_claim(pdf_import) unless pdf_import.pdf_uploaded?
- return if pdf_import.status == "complete"
+ return if pdf_import.status.in?(%w[complete failed])
Keep queued retries from short-circuiting
When a non-RuntimeError is raised after processing starts, the rescue below updates the import to failed before re-raising. ActiveJob/Sidekiq then deserializes that same PdfImport on the retry, but this new guard returns immediately because the status is already failed, so transient failures such as API/Faraday errors during extraction get only the first attempt instead of the intended queued retries.
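The interaction described in this comment can be reproduced with a small plain-Ruby simulation. Everything here is a hypothetical stand-in (FakeImport, TransientError, perform_once, run_with_retries); it only mirrors the shape of "guard, work, mark failed on error, re-raise" and is not the actual job code.

```ruby
# Hypothetical transient failure, e.g. a provider outage that clears after two attempts.
class TransientError < StandardError; end

# Hypothetical stand-in for the PdfImport record; only status and attempt count matter here.
FakeImport = Struct.new(:status, :attempts)

# One job execution: early-return guard, then work, then mark failed and re-raise on error.
def perform_once(import, guard_statuses)
  return :skipped if guard_statuses.include?(import.status)

  import.attempts += 1
  raise TransientError, "simulated provider outage" if import.attempts < 3

  import.status = "complete"
  :processed
rescue TransientError
  import.status = "failed" # the failure is recorded before the error propagates to the queue
  raise
end

# Re-runs perform_once the way a retrying queue would, up to max_runs executions.
def run_with_retries(import, guard_statuses, max_runs: 5)
  max_runs.times do
    begin
      return perform_once(import, guard_statuses)
    rescue TransientError
      next
    end
  end
  :exhausted
end

old_guard = FakeImport.new("pending", 0)
new_guard = FakeImport.new("pending", 0)
puts run_with_retries(old_guard, %w[complete]).inspect        # retried until the outage clears
puts run_with_retries(new_guard, %w[complete failed]).inspect # second run returns early
```

With the wider guard, the second execution returns before doing any work, so the simulated outage never gets a second attempt. Whether the real job behaves this way depends on when the status is written relative to the retry, which this sketch assumes rather than verifies.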
  class ProcessPdfJob < ApplicationJob
    queue_as :medium_priority

+   discard_on(RuntimeError) do |job, err|
Narrow the RuntimeError discard to permanent PDF failures
This treats every bare raise "..." as non-retryable, but PdfImport#process_with_ai and #extract_transactions turn any unsuccessful provider response into exactly that by raising the response message. Since the providers also wrap API/Faraday failures into unsuccessful responses, a transient LLM outage or rate limit is now discarded after one attempt and left failed; use a narrower permanent-error class for bad PDFs/passwords.
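One way to picture the narrower class suggested here is a small error hierarchy in which only permanent failures are eligible for discard. This is a sketch under assumptions: PdfProcessingError, PermanentPdfError, TransientPdfError, and disposition are hypothetical names that do not appear in the PR.

```ruby
# Hypothetical error hierarchy; none of these class names come from the PR.
class PdfProcessingError < StandardError; end
class PermanentPdfError < PdfProcessingError; end # e.g. password-protected or corrupt file
class TransientPdfError < PdfProcessingError; end # e.g. rate limit or provider outage

# Decides what a queue should do with an error: only known-permanent failures are dropped.
def disposition(error)
  case error
  when PermanentPdfError then :discard
  else :retry
  end
end

puts disposition(PermanentPdfError.new("simulated locked PDF")).inspect
puts disposition(TransientPdfError.new("simulated rate limit")).inspect
puts disposition(RuntimeError.new("untyped failure")).inspect
```

Under this scheme an untyped RuntimeError falls back to the retry path, so an unclassified failure costs extra attempts rather than being silently dropped. In the job itself the equivalent would presumably be a discard_on keyed on the permanent class instead of RuntimeError.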
jjmata left a comment
Targeted and clean. Two small things:
Breadth of discard_on(RuntimeError): RuntimeError is Ruby's default error class (raise "message" produces one), so this will catch most untyped errors from PDF parsing libraries. That's probably the intent. But if any infrastructure-level error happens to be raised as a bare RuntimeError (rather than as a more specific named error class), it would also be permanently discarded. It's worth checking what the upstream PDF/AI library raises for hard failures: if it uses a named subclass, narrowing the discard to that class would be safer and make the intent more explicit.
status.in?(%w[complete failed]) guard: Good catch. This prevents a job whose import is already in the failed state (set by another code path, e.g. AI processing marking it failed before the job retries) from redundantly re-processing. The original check only covered "complete", which meant a failed import could be re-attempted on every queued retry. This is the right fix.
Generated by Claude Code
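The breadth concern in this review can be checked directly in plain Ruby: a bare raise with a string produces a RuntimeError, and anything keyed on RuntimeError also matches its subclasses, such as the built-in FrozenError. The helper names raised_class and matched_by_runtime_error? are hypothetical and exist only for this illustration.

```ruby
# Returns the class of the exception raised by the given block, or nil if nothing was raised.
def raised_class
  yield
  nil
rescue StandardError => e
  e.class
end

# True when a rescue or discard keyed on RuntimeError would match this error class.
def matched_by_runtime_error?(error_class)
  error_class.ancestors.include?(RuntimeError)
end

puts raised_class { raise "untyped failure" }.inspect # class produced by a bare raise
puts raised_class { [1].freeze << 2 }.inspect         # a built-in RuntimeError subclass
puts matched_by_runtime_error?(FrozenError).inspect
puts matched_by_runtime_error?(ArgumentError).inspect
```

Errors that sit beside RuntimeError under StandardError, such as ArgumentError or IOError, are not matched, so they would still follow the normal retry path.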
Hey @jjmata, based on your initial review, I dug into the actual error path and there's a subtlety worth flagging before we finalize. The thing is, the typed error never actually makes it to the job. The provider does raise a proper named error (
    # app/models/pdf_import.rb
    unless response.success?
      error_message = response.error&.message || "Unknown PDF processing error"
      raise error_message # raise <String> => plain RuntimeError
    end
Once you
And that leads to the part that worries me a bit more: since every provider failure goes through that same
In other words, this PR fixes the password-protected PDF case (great), but it also quietly stops retrying genuinely transient API failures, which used to get Sidekiq's 25 retries. Same fix, two very different failure modes caught in the same net. A few ways we could go:
On the
I'm leaning B. Happy to fold it into this PR if that sounds right to you.
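The type loss described in this comment is easy to reproduce in isolation. The sketch below is illustrative only: ProviderError, Response, raise_message, and raise_original are hypothetical stand-ins rather than the app's classes, and raise_original is shown purely for comparison, not as any of the options discussed in the thread.

```ruby
# Hypothetical stand-ins: ProviderError and Response are not the app's real classes.
class ProviderError < StandardError; end

Response = Struct.new(:error) do
  def success?
    error.nil?
  end
end

# Mirrors the quoted snippet: raising only the message string yields a plain RuntimeError.
def raise_message(response)
  return if response.success?

  raise(response.error&.message || "Unknown PDF processing error")
end

# For comparison: re-raising the original error object keeps its class.
def raise_original(response)
  return if response.success?

  raise(response.error || RuntimeError.new("Unknown PDF processing error"))
end

# Returns the class of the exception raised by the block, or nil if nothing was raised.
def raised_class
  yield
  nil
rescue StandardError => e
  e.class
end

failed = Response.new(ProviderError.new("simulated rate limit"))
puts raised_class { raise_message(failed) }.inspect  # the typed error is reduced to RuntimeError
puts raised_class { raise_original(failed) }.inspect # the typed error survives
```

Once the class is gone, a handler keyed on RuntimeError has no way to tell a rate limit from a locked PDF, which is the "same net" problem raised above.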
I think B makes sense, yeah ... let's go that route! 👍
What
Two small hardening changes to ProcessPdfJob:
- discard_on(RuntimeError): permanently discards the job when a RuntimeError is raised, instead of letting Sidekiq retry it 25 times over ~21 days. The discard block logs the job_id and error message so failures stay observable without spamming retries.
- Guard failed status: the early-return guard that previously only covered "complete" now also covers "failed". This prevents a re-enqueued or accidentally re-triggered job from attempting to reprocess an import that already failed.
Incident context
A ProcessPdfJob has been failing and retrying since June 7. By June 22 it had reached retry_count=22. It fails every time with:
This is a password-protected PDF: a deterministic, unrecoverable error. Each retry triggers PDF→image conversion via poppler/mini_magick, a known memory-leak risk on repeated failures (image buffers are allocated and abandoned). With 3 more retries pending before Sidekiq gives up, it would have continued retrying and leaking.
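For a sense of scale, the total retry window can be estimated from the default backoff formula given in Sidekiq's documentation, (retry_count ** 4) + 15 seconds plus a small random jitter. The sketch below is a rough check that assumes this formula applies to the installed Sidekiq version and ignores both the jitter and the time each attempt spends running; the method names are hypothetical.

```ruby
# Deterministic part of the documented default backoff: (retry_count ** 4) + 15 seconds.
# The random jitter term is omitted, so real schedules run slightly longer.
def base_delay_seconds(retry_count)
  (retry_count**4) + 15
end

# Sum of the delays before retries 1..max_retries (retry_count runs from 0 to max_retries - 1).
def total_backoff_seconds(max_retries = 25)
  (0...max_retries).sum { |n| base_delay_seconds(n) }
end

def total_backoff_days(max_retries = 25)
  total_backoff_seconds(max_retries) / 86_400.0
end

puts total_backoff_seconds
puts total_backoff_days.round(1)
```

Under these assumptions the 25 retries span roughly 20 days of waiting, in line with the ~21 days quoted in the description, and the final delay alone is close to four days.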
Why
- RuntimeError in this job is almost always deterministic (e.g. invalid state, bad PDF format, wrong password). Retrying 25 times wastes resources and delays queue throughput with no chance of success.
- Without the failed guard, a duplicate enqueue or a background retry could clobber the failed status and re-run expensive AI processing on an import whose failure was already recorded and surfaced to the user.
Retry behaviour after this change
- RuntimeError (or subclass)
- ActiveJob::DeserializationError (ApplicationJob)
- ActiveRecord::Deadlocked (ApplicationJob)
- StandardError
Testing
- RuntimeError is discarded and not retried.
- failed import returns early without changing status.
- ProcessPdfJob tests should remain green.