feat: iOS OCR Server with hOCR support + preserve document owner, permissions and document_type #973
vistalba wants to merge 26 commits into
Add a new OCR provider (ios_ocr) that sends images to the iOS OCR Server app running on iPhone via its /upload API and extracts the recognized text.
- New environment variables: IOS_OCR_SERVER_URL (required) and IOS_OCR_SERVER_TIMEOUT (optional, default 60s)
- Supports image processing mode only
- Returns plain OCR text, mirroring all existing provider behavior
- Full test coverage following existing patterns
Bump musl-dev to 1.2.5-r11 and mupdf/mupdf-dev to 1.24.10-r1 to match current Alpine 3.21 repository.
Removes pinned package versions from Dockerfile to avoid build failures when Alpine repos update.
This reverts commit bc6cbf8.
…POST retries
Also adds a docstring to newIosOcrProvider to satisfy the docstring coverage threshold. The background loop in background.go already handles document-level retry via exponential backoff, so HTTP-level retry is redundant and harmful for the non-idempotent POST /upload.
Implements the HOCRCapable interface on IosOcrProvider to translate the iOS OCR Server's ocr_boxes array into hOCR Page structures.
- Adds IosOcrBox struct for typed box parsing
- Adds buildHOCRPage helper: sorts words top-to-bottom, groups into lines by Y-proximity, builds hOCR hierarchy (Page → Lines → Words)
- Adds parseOcrBoxes helper for safe interface{} parsing via marshal/unmarshal round-trip
- Implements IsHOCREnabled, GetHOCRPages, GetHOCRDocument, ResetHOCR for the HOCRCapable interface
- hOCR page creation is gated by config.EnableHOCR (ENABLE_HOCR env var)
- Includes unit tests for hOCR page building, box parsing, and end-to-end ProcessImage hOCR flow
Follow-up to the base ios_ocr provider PR.
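The line-grouping step described for buildHOCRPage can be sketched roughly as follows. This is not the PR's code: the `wordBox` type, its field names, and the "half a box height" proximity threshold are assumptions made for illustration, since the real IosOcrBox layout is not shown in this conversation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// wordBox is a hypothetical stand-in for the PR's IosOcrBox.
type wordBox struct {
	Text       string
	X, Y, W, H float64
}

// groupIntoLines sorts boxes top-to-bottom, then starts a new line whenever a
// box's vertical centre is further than half its height from the centre of the
// box that opened the current line. The threshold is an assumption.
func groupIntoLines(boxes []wordBox) [][]wordBox {
	sorted := append([]wordBox(nil), boxes...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Y < sorted[j].Y })

	var lines [][]wordBox
	var anchor float64 // vertical centre of the first box in the current line
	for _, b := range sorted {
		centre := b.Y + b.H/2
		if len(lines) == 0 || math.Abs(centre-anchor) > b.H/2 {
			lines = append(lines, nil)
			anchor = centre
		}
		lines[len(lines)-1] = append(lines[len(lines)-1], b)
	}
	// Within each line, order words left-to-right.
	for _, line := range lines {
		sort.SliceStable(line, func(i, j int) bool { return line[i].X < line[j].X })
	}
	return lines
}

// lineText joins the words of one line with single spaces.
func lineText(line []wordBox) string {
	words := make([]string, len(line))
	for i, b := range line {
		words[i] = b.Text
	}
	return strings.Join(words, " ")
}

func main() {
	boxes := []wordBox{
		{Text: "world", X: 60, Y: 11, W: 40, H: 10},
		{Text: "Hello", X: 10, Y: 10, W: 40, H: 10},
		{Text: "Next", X: 10, Y: 30, W: 40, H: 10},
	}
	for _, line := range groupIntoLines(boxes) {
		fmt.Println(lineText(line))
	}
}
```

A real implementation would then wrap each group in an hOCR line element and each word in a word element under a single page.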
The := at line 239 created a local imagePaths variable that shadowed the outer declaration (line 137), causing processed_pages=0 at line 366. This prevented PDF generation in image mode for all hOCR-capable providers. Changed to pre-declare imgPageCount and err, then use = for assignment, so the outer imagePaths is used consistently. This is a behavior change: PDF generation in image mode now works when it was previously silently broken.
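The shadowing pattern described in this commit can be reproduced in a few lines. The sketch below is a reduced illustration, not the code from ocr.go: `renderPages`, `countShadowed`, and `countFixed` are hypothetical names, while `imagePaths`, `imgPageCount`, and `err` follow the commit message.

```go
package main

import "fmt"

// renderPages is a hypothetical stand-in for the image-mode rendering step.
func renderPages() ([]string, int, error) {
	return []string{"page1.png", "page2.png"}, 2, nil
}

// countShadowed reproduces the bug: := inside the block declares a NEW
// imagePaths, so the outer slice stays empty.
func countShadowed(imageMode bool) int {
	var imagePaths []string
	if imageMode {
		imagePaths, imgPageCount, err := renderPages() // shadows the outer imagePaths
		if err != nil || imgPageCount != len(imagePaths) {
			return 0
		}
	}
	return len(imagePaths) // outer variable was never assigned
}

// countFixed pre-declares the extra variables and assigns with =, so the
// outer imagePaths receives the result.
func countFixed(imageMode bool) int {
	var imagePaths []string
	if imageMode {
		var imgPageCount int
		var err error
		imagePaths, imgPageCount, err = renderPages()
		if err != nil || imgPageCount != len(imagePaths) {
			return 0
		}
	}
	return len(imagePaths)
}

func main() {
	fmt.Println("shadowed:", countShadowed(true))
	fmt.Println("fixed:", countFixed(true))
}
```

Both versions compile without warnings from the compiler itself, which is why this kind of bug tends to surface only as a wrong count downstream.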
… parseOcrBoxes, README contradictions
… related_document
… owner and permissions on upload
- New env var PDF_PRESERVE_OWNER_PERMISSIONS (bool, default false)
- Poll task + PATCH owner/permissions independently of ReplaceOriginal
- Deletion still gated by ReplaceOriginal only
- Added validation: PreserveOwnerPermissions requires UploadPDF=true
- Updated all OCROptions construction sites and tests
- Replace synchronous 60s timeout polling with async background queue
- New file permissions.go with enqueuePermissionRestore and processPendingPermissionRestores
- uploadProcessedPDF enqueues restore task when PreserveOwnerPermissions=true
- Background loop processes queue every ~10s, retries indefinitely with 24h expiry
- ReplaceOriginal block unchanged (sync poll + PATCH + delete for crash-safety)
- extractDocIDFromTask / buildPatchFields / patchNewDocumentPermissions helpers added
- PR note: queue is in-memory, lost on pod restart during the upload-to-PATCH window
… and helpers
- PendingPermissionRestore.Permissions was *PermissionSet, should be *Permissions
- Added missing 'time' import to types.go
- Fixed enqueuePermissionRestore, buildPatchFields, patchNewDocumentPermissions signatures
- New env var table entry
- Trade-off warning in Metadata Copying Limitations section
- Added success log in processPendingPermissionRestores
📝 Walkthrough
Adds an iOS OCR Server provider with optional hOCR, threads owner/permissions through types and the Paperless client, enqueues in-memory async permission-restore jobs after processed-PDF uploads, patches permissions on ReplaceOriginal success, and wires config, background processing, tests, and README updates.
Changes
iOS OCR Server & Permission Restoration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
types.go (1)
79-79: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Make document_type nullable in these structs.
These fields are decoded directly from the Paperless document payload, and this file already hints that document_type may be absent/null. If that happens, Go's encoding/json does not fail when decoding null into int; it silently leaves 0, so both Line 79 and Line 102 lose the distinction between “unset” and a real ID. Using *int here is safer for the new preserve-document-type flow.
💡 Suggested change
 type GetDocumentApiResponse struct {
     ID int `json:"id"`
     Correspondent int `json:"correspondent"`
-    DocumentType int `json:"document_type"`
+    DocumentType *int `json:"document_type"`
     Title string `json:"title"`
     Content string `json:"content"`
     Tags []int `json:"tags"`
     CreatedDate string `json:"created_date"`
     OriginalFileName string `json:"original_file_name"`
     Notes []interface{} `json:"notes"`
     CustomFields []CustomFieldResponse `json:"custom_fields"`
     Owner *int `json:"owner"`
     Permissions *Permissions `json:"permissions,omitempty"`
 }

 type Document struct {
     ID int `json:"id"`
     Title string `json:"title"`
     Content string `json:"content"`
     Tags []string `json:"tags"`
     Correspondent string `json:"correspondent"`
     CreatedDate string `json:"created_date"`
     OriginalFileName string `json:"original_file_name"`
     DocumentTypeName string `json:"document_type_name"`
-    DocumentType int `json:"document_type"`
+    DocumentType *int `json:"document_type"`
     CustomFields []CustomFieldResponse `json:"custom_fields"`
     Owner *int `json:"owner"`
     Permissions *Permissions `json:"permissions,omitempty"`
 }
Also applies to: 102-102
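To make the int versus *int distinction concrete, the following sketch decodes the same payload into both shapes. The struct and function names are hypothetical and reduced to the one field under discussion. One detail worth checking against the review text: as documented for Go's encoding/json, a JSON null decoded into a plain int is left at its zero value without an error, so the practical problem is the lost "unset" state rather than a decode failure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withInt and withPtr are hypothetical, reduced versions of the structs in
// types.go, keeping only the document_type field.
type withInt struct {
	DocumentType int `json:"document_type"`
}

type withPtr struct {
	DocumentType *int `json:"document_type"`
}

// decodeBoth decodes the same payload into both shapes so the results can be
// compared side by side.
func decodeBoth(payload string) (plain int, ptr *int, err error) {
	var a withInt
	if err = json.Unmarshal([]byte(payload), &a); err != nil {
		return 0, nil, err
	}
	var b withPtr
	if err = json.Unmarshal([]byte(payload), &b); err != nil {
		return 0, nil, err
	}
	return a.DocumentType, b.DocumentType, nil
}

func main() {
	for _, payload := range []string{`{"document_type": null}`, `{"document_type": 7}`} {
		plain, ptr, err := decodeBoth(payload)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		if ptr == nil {
			fmt.Printf("%s -> int=%d, *int=nil (unset)\n", payload, plain)
		} else {
			fmt.Printf("%s -> int=%d, *int=%d\n", payload, plain, *ptr)
		}
	}
}
```

With the pointer form, callers in the preserve-document-type flow need a nil check before dereferencing.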
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@types.go` at line 79, change the DocumentType fields from int to *int in the affected structs so JSON null/absent values stay distinguishable from a real ID: locate the DocumentType field declarations (currently `DocumentType int \`json:"document_type"\``) and change their type to `*int`; update any places that read/write those fields (e.g., the preserve-document-type flow and any code that compares or assigns DocumentType) to handle nil vs non-nil (use nil-checks before dereferencing or treat nil as “unset”) and keep the json tag intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ocr_test.go`:
- Line 92: The for-loop range uses the undefined identifier `tc`; change both
occurrences of `for _, tc := range tc {` in ocr_test.go to iterate over the
actual slice `testCases` (i.e., `for _, tc := range testCases {`) so the loop
variable `tc` is defined; update both instances (around the loops that iterate
testCases in the test functions) to fix the compile error.
In `@ocr.go`:
- Around line 633-643: The function extractDocIDFromTask currently falls back to
reading taskStatus["id"] (the task's internal ID) which is incorrect; change it
to only parse related_document and support both string and numeric forms: remove
the fallback branch that checks taskStatus["id"], keep and expand the existing
related_document handling in extractDocIDFromTask to accept numeric types (e.g.,
float64/int) as well as strings, converting them to int and returning (id,
true); if related_document is absent or cannot be parsed return (0, false).
In `@ocr/iosocr_provider.go`:
- Around line 234-252: GetHOCRDocument currently uses p.hocrPages in append
order which can be out-of-order; after copying p.hocrPages into the local pages
slice (inside GetHOCRDocument) sort the pages by the PageNumber field (e.g.,
using sort.Slice or sort.SliceStable on pages comparing pages[i].PageNumber <
pages[j].PageNumber) before constructing the hocr.HOCR struct so the resulting
document pages are in numeric order; ensure you add the sort import if missing
and reference p.hocrPages and the local pages slice in your change.
In `@permissions.go`:
- Around line 71-78: The loop currently calls
app.patchNewDocumentPermissions(ctx, taskStatus, entry.Owner, entry.Permissions,
logger) but always marks the queue entry as processed (processed++ and
logger.Info) even if patching failed; change patchNewDocumentPermissions to
return an error (or success bool) instead of swallowing failures, then check
that return value in the SUCCESS case before calling logger.Info and
incrementing processed; if the call returns an error, log the error via the
existing logger and do not increment processed so the entry remains for retry.
Ensure you update callers and the patchNewDocumentPermissions signature
accordingly.
In `@README.md`:
- Around line 388-390: The README now documents a fifth provider ("### 5. iOS
OCR Server") but the earlier description still says the project supports "four
different OCR providers"; update that sentence to either increment the count to
five or, better, rephrase it to avoid hardcoding a number (e.g., "multiple OCR
providers" or "several OCR providers") so the text stays correct when providers
are added; look for the sentence that contains "four different OCR providers"
and change it accordingly.
- Around line 397-407: The README update about iOS hOCR/searchable PDF is
missing the prerequisite flag; update the relevant sections (around the iOS OCR
configuration block and the later Enhanced OCR Features paragraph) to explicitly
state that ENABLE_HOCR must be set to true (e.g., "ENABLE_HOCR=true") for hOCR
output to be produced by the iOS OCR provider, and add it as an item in the
Configuration block alongside OCR_PROVIDER/IOS_OCR_SERVER_URL and in the
explanation near the Enhanced OCR Features reference so users know the gate is
required.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3b3b10c4-3f0f-44be-8d02-aec1aca5a167
📒 Files selected for processing (14)
README.md, app_llm_test.go, background.go, background_test.go, jobs.go, main.go, ocr.go, ocr/iosocr_provider.go, ocr/iosocr_provider_test.go, ocr/provider.go, ocr_test.go, paperless.go, permissions.go, types.go
- Rename tc to testCases in ocr_test.go for clarity
- Remove taskStatus[id] fallback in extractDocIDFromTask
- Make patchNewDocumentPermissions return error for retry logic
- Sort hocrPages by PageNumber in GetHOCRDocument
- Fix 'four different OCR providers' in README
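The behaviour requested for extractDocIDFromTask (read only related_document, accept string or numeric forms, never fall back to the task's own id) might look roughly like this. It is a sketch under the assumption that the task status arrives as a `map[string]interface{}` decoded by encoding/json; the function name `extractDocID` is hypothetical and the committed code may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// extractDocID reads the new document's ID from related_document only.
// It deliberately ignores the "id" key, which is the task record's own ID.
func extractDocID(taskStatus map[string]interface{}) (int, bool) {
	switch v := taskStatus["related_document"].(type) {
	case string:
		id, err := strconv.Atoi(v)
		if err != nil {
			return 0, false
		}
		return id, true
	case float64: // encoding/json decodes JSON numbers into float64
		if v != float64(int(v)) {
			return 0, false // reject non-integral values
		}
		return int(v), true
	case int:
		return v, true
	default: // absent, null, or an unexpected type
		return 0, false
	}
}

func main() {
	task := map[string]interface{}{"id": 9999, "related_document": "42"}
	id, ok := extractDocID(task)
	fmt.Println(id, ok)
}
```

Keeping `id` and `related_document` distinct in test fixtures, as a later review comment asks, is what lets a test catch a regression to the fallback.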
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ocr_test.go`:
- Around line 200-206: The mock task currently sets "id" equal to documentID so
the test won't catch regressions in extractDocIDFromTask; change the "id" field
in the json.NewEncoder(w).Encode payload to use the distinct taskID (e.g. "id":
fmt.Sprintf("%d", taskID) or the taskID variable) while keeping
"related_document" as fmt.Sprintf("%d", documentID) so id and related_document
differ and the test actually verifies extractDocIDFromTask's behavior.
In `@ocr.go`:
- Around line 607-609: The replacement flow always calls
patchNewDocumentPermissions causing owner/permission restoration even when
PreserveOwnerPermissions is false; guard that call by checking the
PreserveOwnerPermissions flag before invoking app.patchNewDocumentPermissions
(use the same ctx, taskStatus, originalDoc.Owner, originalDoc.Permissions,
logger parameters), so patchNewDocumentPermissions runs only when
PreserveOwnerPermissions is true and otherwise is skipped.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dde6162c-1c36-49ed-aa10-770bd800ad7a
📒 Files selected for processing (5)
README.md, ocr.go, ocr/iosocr_provider.go, ocr_test.go, permissions.go
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- permissions.go
- ocr/iosocr_provider.go
json.NewEncoder(w).Encode([]map[string]interface{}{
    {
        "id": documentID,
        "status": "SUCCESS",
        "task_id": taskID,
        "related_document": fmt.Sprintf("%d", documentID),
        "result": fmt.Sprintf("Success. New document id %d created", documentID),
Make the mocked task id differ from related_document.
This fixture would still pass if extractDocIDFromTask regressed to using id, because both fields resolve to the same value here. Give id a distinct task-record value so the test actually guards the bug this PR fixed.
Suggested fix
json.NewEncoder(w).Encode([]map[string]interface{}{
{
- "id": documentID,
+ "id": 9999,
"status": "SUCCESS",
"task_id": taskID,
"related_document": fmt.Sprintf("%d", documentID),
"result": fmt.Sprintf("Success. New document id %d created", documentID),
},
✅ Addressed in commit 957f65d
// Restore owner and permissions on the new document
// Error is logged by patchNewDocumentPermissions — still proceed with deletion
_ = app.patchNewDocumentPermissions(ctx, taskStatus, originalDoc.Owner, originalDoc.Permissions, logger)
Gate the replacement-time permission PATCH behind PreserveOwnerPermissions.
This runs on every successful replacement, even when PreserveOwnerPermissions is false. If the source document has owner/permission fields, the new option becomes ineffective and replacement silently preserves them anyway.
Suggested fix
- // Restore owner and permissions on the new document
- // Error is logged by patchNewDocumentPermissions — still proceed with deletion
- _ = app.patchNewDocumentPermissions(ctx, taskStatus, originalDoc.Owner, originalDoc.Permissions, logger)
+ if options.PreserveOwnerPermissions {
+ // Restore owner and permissions on the new document
+ // Error is logged by patchNewDocumentPermissions — still proceed with deletion
+ _ = app.patchNewDocumentPermissions(ctx, taskStatus, originalDoc.Owner, originalDoc.Permissions, logger)
+ }
✅ Addressed in commit 957f65d
…ssions flag; distinguish mock task id from document id in test
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ocr.go (1)
583-625: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Only delete the source document after a confirmed SUCCESS.
Line 624 still deletes the original when polling times out or when status couldn't be confirmed. UploadDocument only means Paperless accepted the task; it does not guarantee the new document was created successfully. In a slow or delayed-processing case, this can drop the only copy of the document.
Suggested fix
- for i := 0; i < maxRetries; i++ {
+ deleteOriginal := false
+ for i := 0; i < maxRetries; i++ {
      taskStatus, err := app.Client.GetTaskStatus(ctx, taskID)
      if err != nil {
-         logger.WithError(err).Warn("Failed to check task status, proceeding with deletion anyway")
+         logger.WithError(err).Warn("Failed to check task status, keeping original document")
          break
      }
@@
      if status == "SUCCESS" {
          logger.Info("Document processing completed successfully")
+         deleteOriginal = true
          if options.PreserveOwnerPermissions {
              _ = app.patchNewDocumentPermissions(ctx, taskStatus, originalDoc.Owner, originalDoc.Permissions, logger)
          }
          break
      }
@@
- // Delete original document (even if poll timed out — upload was successful)
- if err := app.Client.DeleteDocument(ctx, documentID); err != nil {
+ if !deleteOriginal {
+     return fmt.Errorf("document processing did not reach SUCCESS, not deleting original document")
+ }
+ if err := app.Client.DeleteDocument(ctx, documentID); err != nil {
      return fmt.Errorf("error deleting original document: %w", err)
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ocr.go` around lines 583 - 625, Current logic may call app.Client.DeleteDocument even when polling timed out or task status is unknown; change the flow so DeleteDocument is only invoked after a confirmed "SUCCESS" status from app.Client.GetTaskStatus. Specifically, in the ReplaceOriginal branch (where options.ReplaceOriginal is true) update the polling loop around app.Client.GetTaskStatus/taskStatus so that you set a local flag (e.g., processingSucceeded) when status == "SUCCESS" (and run app.patchNewDocumentPermissions there if PreserveOwnerPermissions), return an error immediately if status == "FAILURE", and after the loop only call app.Client.DeleteDocument(ctx, documentID) if processingSucceeded is true; otherwise skip deletion and return a non-destructive error indicating the document was not confirmed successful.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ca7cc2d-4aba-499e-b5b9-c8c61703ffa7
📒 Files selected for processing (2)
ocr.go, ocr_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- ocr_test.go
NOTE (not addressed in this PR): |
Raise maxRetries from 12 to 720 (5s intervals) so that document processing has up to ~1 hour to complete before giving up on deleting the original. This accommodates slower paperless-ngx processing common with mobile uploads, AI-based classification, or large documents.
Previously, ReplaceOriginal would delete the original document even if the processing task timed out or returned an unknown status. This could cause data loss in edge cases where the new document was not fully processed. Now a deleteOriginal flag is set only on confirmed SUCCESS. The original document is preserved and an error is returned if processing did not reach SUCCESS.
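The two commits above combine into a "poll, then delete only on confirmed SUCCESS" flow. The sketch below illustrates that control flow in isolation; `waitForSuccess`, `replaceOriginal`, and the injected `getStatus` / `deleteDoc` callbacks are hypothetical stand-ins for the Paperless client calls, and the real code in ocr.go handles status errors with a log and break rather than a return. With 720 retries at 5s intervals the loop waits up to 3600s, which matches the roughly one hour mentioned.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForSuccess polls getStatus up to maxRetries times, sleeping interval
// between attempts, and reports whether SUCCESS was confirmed.
func waitForSuccess(getStatus func() (string, error), maxRetries int, interval time.Duration) (bool, error) {
	for i := 0; i < maxRetries; i++ {
		status, err := getStatus()
		if err != nil {
			return false, fmt.Errorf("checking task status: %w", err)
		}
		switch status {
		case "SUCCESS":
			return true, nil
		case "FAILURE":
			return false, errors.New("document processing failed")
		}
		time.Sleep(interval)
	}
	return false, nil // timed out without confirmation
}

// replaceOriginal deletes the original only after a confirmed SUCCESS.
func replaceOriginal(getStatus func() (string, error), deleteDoc func() error, maxRetries int, interval time.Duration) error {
	ok, err := waitForSuccess(getStatus, maxRetries, interval)
	if err != nil {
		return err
	}
	if !ok {
		return errors.New("document processing did not reach SUCCESS, not deleting original document")
	}
	return deleteDoc()
}

func main() {
	statuses := []string{"PENDING", "STARTED", "SUCCESS"}
	next := 0
	getStatus := func() (string, error) {
		s := statuses[next]
		if next < len(statuses)-1 {
			next++
		}
		return s, nil
	}
	deleteDoc := func() error {
		fmt.Println("original deleted")
		return nil
	}
	// The PR uses 720 retries at 5s; a short interval keeps the demo fast.
	if err := replaceOriginal(getStatus, deleteDoc, 720, time.Millisecond); err != nil {
		fmt.Println("kept original:", err)
	}
}
```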
17fcfc6 to d486e45
Summary
Two features plus a bug fix:
- ios_ocr provider with hOCR support for searchable PDF generation. Image-mode only.
- PDF_PRESERVE_OWNER_PERMISSIONS env var restores the original document's owner and permissions on uploaded PDFs via an async background queue.
- Fix for the variable shadowing at ocr.go:239 that silently skipped PDF generation for all hOCR-capable providers in image mode.
Files Changed (14 files, +1076 / -41)
- ocr/iosocr_provider.go
- ocr/iosocr_provider_test.go
- ocr/provider.go
- permissions.go
- paperless.go: PatchDocument, GetTaskStatus array fix, full_perms=true
- types.go: PendingPermissionRestore struct
- ocr.go
- main.go
- background.go / jobs.go
- README.md
Design Decision & Trade-off
The permission restore queue is in-memory. If paperless-gpt crashes between
upload and restore, the new document keeps default permissions. Acceptable
because the vulnerability window is small, the API key holder already has full
access, and the rest of the OCR pipeline is also in-memory.
Configuration
- IOS_OCR_SERVER_URL: required (for ios_ocr)
- IOS_OCR_SERVER_TIMEOUT: optional, default 60s
- PDF_PRESERVE_OWNER_PERMISSIONS: bool, default false