Add async suggestion jobs for manual processing#984
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 WalkthroughWalkthroughThis PR converts synchronous document suggestion generation into an asynchronous job system with in-memory job storage, worker-pool processing, per-job cancellation/timeouts, sequential per-document generation with progress updates, HTTP job lifecycle endpoints, frontend polling and UI integration, tests, and small docs/build updates. ChangesAsynchronous Suggestion Job System
Sequence Diagram(s)sequenceDiagram
participant User
participant Frontend as DocumentProcessor
participant API as HTTP Handlers
participant Queue as suggestionJobQueue
participant Worker as SuggestionWorker
participant LLM as LLM
participant Store as SuggestionJobStore
User->>Frontend: Select docs, click "Generate Suggestions"
Frontend->>API: POST /api/jobs/suggestions {docs, options}
API->>Store: Add job (status=pending)
API-->>Frontend: 202 {job_id}
Frontend->>Frontend: Poll /api/jobs/suggestions/:job_id periodically
API->>Queue: Enqueue job
Worker->>Queue: Dequeue job
Worker->>Store: startPending -> set in_progress
loop Per document
Worker->>LLM: Generate suggestions for document
Worker->>Store: updateProgress(docs_done, current_document)
end
alt Success
Worker->>Store: complete(job, results)
else Timeout
Worker->>Store: updateStatus(failed, "Suggestion job timed out")
else Cancelled
Worker->>Store: updateStatus(cancelled, error)
end
Frontend->>API: GET /api/jobs/suggestions/:job_id
API-->>Frontend: {status, progress, result/error}
opt User stops job
Frontend->>API: POST /api/jobs/suggestions/:job_id/stop
API->>Store: cancelPending or invoke cancel func
API-->>Frontend: 204
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app_http_handlers.go`:
- Around line 224-225: Replace the blocking send to suggestionJobQueue (the line
with "suggestionJobQueue <- job") with a non-blocking enqueue: attempt to push
the job into suggestionJobQueue using a select/default pattern (or equivalent)
and, if the queue is full, fail fast by responding from the HTTP handler with an
appropriate status (503 or 429) or mark the job as failed in suggestionJobStore
(use the same job ID returned by suggestionJobStore.addJob(job)); ensure you
still call suggestionJobStore.addJob(job) before attempting the non-blocking
enqueue so the job state can be updated when enqueue fails.
In `@app_llm.go`:
- Around line 506-510: The current implementation always prefetches
tags/correspondents/document types by calling GetAllTags, GetAllCorrespondents,
and GetAllDocumentTypes inside prepareSuggestionGenerationContext even when the
user requested only titles or dates; change prepareSuggestionGenerationContext
to accept the GenerateSuggestionsRequest (pass it from
generateDocumentSuggestionsSequential and other callers) and only call
GetAllTags/GetAllCorrespondents/GetAllDocumentTypes when the corresponding
suggestion types are requested in the GenerateSuggestionsRequest (e.g., check
request fields for tag/correspondent/documentType suggestions before fetching),
and update every other caller of prepareSuggestionGenerationContext similarly so
unrelated endpoint failures don't abort title-only/date-only requests.
In `@suggestion_jobs.go`:
- Around line 62-66: getJob currently returns pointers to SuggestionJob while
holding the lock then unlocks, causing data races because callers marshal or
read those jobs after the lock; do one of two fixes: inside
SuggestionJobStore.getJob (and the analogous getAllSuggestionJobs method) create
and return an immutable deep copy/snapshot of the SuggestionJob(s) while the
lock is held (or construct a response DTO under the lock) so callers receive
non-mutated data after unlock; update callers to expect the copied struct
instead of the internal pointer (keep function names getJob and
getAllSuggestionJobs but change their return to the copied value or DTO) to
eliminate concurrent mutation races.
- Around line 97-107: The cancelPending race occurs because the
pending→in_progress flip is not done atomically with respect to cancelPending
(SuggestionJobStore.cancelPending). Fix by implementing a compare-and-set under
the same store mutex: add a method on SuggestionJobStore (e.g.,
SetStatusIf(jobID string, expected string, next string) bool) or modify the
job-starting code to acquire store.Lock(), check job.Status == "pending", set
job.Status = "in_progress" and update UpdatedAt before releasing the lock, and
only proceed to start work when that CAS returns true; keep cancelPending
unchanged but ensure it uses the same mutex (store.Lock()) so only one of
cancelPending or the new CAS can win.
In `@web-app/src/DocumentProcessor.tsx`:
- Around line 99-106: The current mapping in processedSuggestions
unconditionally overwrites suggestion.suggested_custom_fields[*].name with
customFieldMap.get(cf.id) || 'Unknown Field', which discards the API-provided
cf.name when the local allCustomFields map is missing an entry; change the logic
in the processedSuggestions transformation so name = customFieldMap.has(cf.id) ?
customFieldMap.get(cf.id) : cf.name (preserving the backend-provided name from
CustomFieldSuggestion) and avoid defaulting to 'Unknown Field' unless both
sources lack a name.
- Around line 174-177: The catch block in the suggestion polling logic currently
treats any error as terminal and calls clearActiveSuggestionJob(), which
discards job_id on transient failures; instead, preserve the active job_id on
generic/network/5xx errors and only clear it when the backend indicates the job
is gone/terminal (e.g., HTTP 404/410 or an explicit "job not found/terminated"
error). Update the catch in the function that calls setError and logs the error
(the block using console.error("Error checking suggestion job status:", err),
setError("Failed to check suggestion job status."), clearActiveSuggestionJob())
to inspect the error (e.g., err.response?.status or error code), call
clearActiveSuggestionJob() only for terminal statuses, and otherwise keep the
job_id and allow the polling retry/schedule to continue.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 03a4b56f-502a-4919-99a4-1f45cc15bb9e
📒 Files selected for processing (8)
DockerfileREADME.mdapp_http_handlers.goapp_llm.gomain.gosuggestion_jobs.gosuggestion_jobs_test.goweb-app/src/DocumentProcessor.tsx
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
suggestion_jobs_test.go (2)
255-255:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove redundant
updateStatuscall.The job is already added with
Status: "cancelled"andError: "Job cancelled by user"at line 254. Line 255 sets the same values again, which is unnecessary.🧹 Proposed fix
} suggestionJobStore.addJob(job) - suggestionJobStore.updateStatus(job.ID, "cancelled", "Job cancelled by user") processSuggestionJob(app, job)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@suggestion_jobs_test.go` at line 255, Remove the redundant status update: since the test creates the job with Status: "cancelled" and Error: "Job cancelled by user", delete the extra call to suggestionJobStore.updateStatus(job.ID, "cancelled", "Job cancelled by user") so the test doesn't repeat setting identical state (locate the code around the job creation and the suggestionJobStore.updateStatus invocation).
273-273:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove redundant
os.Unsetenvcall.
t.Setenvat line 267 automatically restores the environment variable after the test completes (since Go 1.17). The manualos.Unsetenvin the cleanup function is redundant.🧹 Proposed fix
titleTemplate = template.Must(template.New("title").Parse("{{.Content}}")) t.Cleanup(func() { titleTemplate = previousTitleTemplate - _ = os.Unsetenv("SUGGESTION_JOB_TIMEOUT_SECONDS") })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@suggestion_jobs_test.go` at line 273, Remove the redundant os.Unsetenv call in the test cleanup: since t.Setenv (used at the test setup around the line with t.Setenv) automatically restores the environment variable, delete the explicit os.Unsetenv("SUGGESTION_JOB_TIMEOUT_SECONDS") statement in suggestion_jobs_test.go (the cleanup block) so the test relies on t.Setenv's automatic restoration and avoid double-unsetting.
🧹 Nitpick comments (1)
app_llm_test.go (1)
431-466: 💤 Low valueLGTM!
The test effectively verifies that
prepareSuggestionGenerationContextrespects the generate flags and only fetches metadata when requested. The two-scenario structure (negative + positive cases) provides good coverage.Optional: Consider safer type assertions.
Lines 446 and 459 use bare type assertions without checking the
okvalue. While this is a common pattern in test code where setup is controlled, a more defensive approach would prevent panics if the mock is accidentally misconfigured:♻️ Optional refactor for type-assertion safety
- client := app.Client.(*mockPaperlessClient) + client, ok := app.Client.(*mockPaperlessClient) + require.True(t, ok, "Client should be *mockPaperlessClient") assert.Zero(t, client.GetAllTagsCalls)Apply the same pattern at line 459.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app_llm_test.go` around lines 431 - 466, The test uses bare type assertions for app.Client.(*mockPaperlessClient) at the two spots where client is assigned; change these to safe comma-ok checks (e.g. client, ok := app.Client.(*mockPaperlessClient)) and assert/require that ok is true before using client so a misconfigured test setup doesn't panic; apply this fix for both occurrences around prepareSuggestionGenerationContext and the subsequent assertions involving GetAllTagsCalls, GetAllCorrespondentsCalls, GetAllDocumentTypesCalls, availableTagNames, availableCorrespondentNames, and availableDocumentTypeNames.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@suggestion_jobs_test.go`:
- Line 255: Remove the redundant status update: since the test creates the job
with Status: "cancelled" and Error: "Job cancelled by user", delete the extra
call to suggestionJobStore.updateStatus(job.ID, "cancelled", "Job cancelled by
user") so the test doesn't repeat setting identical state (locate the code
around the job creation and the suggestionJobStore.updateStatus invocation).
- Line 273: Remove the redundant os.Unsetenv call in the test cleanup: since
t.Setenv (used at the test setup around the line with t.Setenv) automatically
restores the environment variable, delete the explicit
os.Unsetenv("SUGGESTION_JOB_TIMEOUT_SECONDS") statement in
suggestion_jobs_test.go (the cleanup block) so the test relies on t.Setenv's
automatic restoration and avoid double-unsetting.
---
Nitpick comments:
In `@app_llm_test.go`:
- Around line 431-466: The test uses bare type assertions for
app.Client.(*mockPaperlessClient) at the two spots where client is assigned;
change these to safe comma-ok checks (e.g. client, ok :=
app.Client.(*mockPaperlessClient)) and assert/require that ok is true before
using client so a misconfigured test setup doesn't panic; apply this fix for
both occurrences around prepareSuggestionGenerationContext and the subsequent
assertions involving GetAllTagsCalls, GetAllCorrespondentsCalls,
GetAllDocumentTypesCalls, availableTagNames, availableCorrespondentNames, and
availableDocumentTypeNames.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 924f3b03-bcf2-48d2-9803-43d9c3134fd4
📒 Files selected for processing (6)
app_http_handlers.goapp_llm.goapp_llm_test.gosuggestion_jobs.gosuggestion_jobs_test.goweb-app/src/DocumentProcessor.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
- app_llm.go
- web-app/src/DocumentProcessor.tsx
f783901 to
0530dd7
Compare
|
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
Summary
This adds an asynchronous job flow for manual suggestion generation so the web UI no longer depends on one long-running browser POST.
The existing synchronous
/api/generate-suggestionsendpoint remains available for compatibility, but the manual UI now submits suggestion work through/api/jobs/suggestions, polls job status, and renders the completed suggestions when the worker finishes.Why
Manual suggestion generation can take longer than browser, proxy, or deployment timeouts when local or slower LLM backends are used. In that case the old flow could lose the result even when the backend was still working. Moving the UI onto an async job flow makes long-running processing observable and lets the user resume polling after a reload.
Changes
SuggestionJob, an in-memory job store, queue, and worker pool.generateDocumentSuggestionsSequential.SUGGESTION_WORKERSwith default1.SUGGESTION_JOB_TIMEOUT_SECONDS, disabled by default.localStorage.Validation
go test ./...npm run buildnpx eslint src/DocumentProcessor.tsxNotes
The new job store follows the existing in-memory OCR job pattern. Job persistence can be added later if the project wants job state to survive process restarts.
Summary by CodeRabbit
New Features
UI
Documentation
Tests
Chores