Bug: AUTO_TAG auto-tagging loops indefinitely on any Paperless PATCH 4xx — LLM-side equivalent of #949

## Summary
paperless-gpt's auto-tagging path (`processAutoTagDocuments` in the background task) forwards every LLM-suggested value to Paperless via PATCH without intermediate validation. If Paperless rejects the request with a 4xx — for *any* reason — the auto-tag path exits with an error and leaves `AUTO_TAG` (`paperless-gpt-auto`) on the document. The background poller picks the document up again ~9 s later, re-runs the full ~21 s LLM pipeline (6 calls on OpenAI), gets the same rejection, and loops indefinitely.

This is the LLM-side equivalent of #949, which addresses the same failure pattern on the OCR path. The defect is architectural and type-agnostic: any Paperless-validated field type can trigger it (Date, Select, Monetary, Integer, Float, URL, Boolean, oversize String). Related issues that report the same underlying bug under different triggers:

- #871 — generic report, OCR path, Mistral page-limit error
- #956 — auto-tagging path, Select-type custom field with freetext value from LLM
- (this issue) — auto-tagging path, Date-type custom field with out-of-range value from LLM

## Environment
- paperless-gpt: v0.25.1 (image: `ghcr.io/icereed/paperless-gpt:latest`, built 2026-02-26)
- paperless-ngx: 2.20 (latest)
- LLM provider: OpenAI, model `gpt-5.4-mini`, `LLM_TEMPERATURE=1.0`
- `AUTO_TAG`: `paperless-gpt-auto` (default)
- Polling interval: ~9 s (default)

## Reproducer (the simplest of many)
1. In Paperless, create a custom field of type **Date** and include its id in paperless-gpt's `custom_fields_selected_ids` so the LLM tries to fill it.
2. Add a document whose OCR text contains a date with an out-of-range day or month — for example the literal string `Datum 79.01.2023` (easy to provoke by hand-typing such a string into a test PDF before submitting).
3. Add `paperless-gpt-auto` to the document.

Other validated field types reproduce the same loop with different triggers — e.g. #956 demonstrates it for type Select with a freetext LLM output. A Monetary field fed a non-numeric value, an Integer field fed a string, a URL field fed something without a scheme, etc., would all behave the same way.

## Observed
- The LLM emits a value that Paperless's serializer rejects (in the reproducer: the ISO-formatted but invalid date string `2023-01-79`).
- paperless-gpt PATCHes Paperless. Paperless replies 400, e.g. (for the Date case):

  ```
  400 {"custom_fields":[...,{"non_field_errors":[
        "Date has wrong format. Use one of these formats instead: YYYY-MM-DD."]},...]}
  ```

- paperless-gpt logs:

  ```
  level=error msg="Error in background tagging: error in processAutoTagDocuments:
        error updating document N: ..."
  ```

- **`AUTO_TAG` is NOT removed.** Next poll re-runs the full LLM pipeline.
- Paperless's audit log confirms a 94-minute observed loop on a single document in our environment: ~270 cycles × ~6 LLM calls ≈ ~1,600 billed calls.

## Expected
Symmetric to #949 on the OCR path: whenever `processAutoTagDocuments` exits with an error, regardless of where it failed (LLM call, JSON parse, Paperless PATCH 4xx, etc.), `AUTO_TAG` should be swapped for a configurable failure tag (default `paperless-gpt-failed`). That breaks the loop after one wasted cycle and makes failed documents easy for the user to find and re-process manually.

## Suggested fix
Mirror #949's pattern in the LLM auto-tagging path. The fix is structural and type-agnostic — it does not need to know which field validation tripped; it only needs to react to any error exit from `processAutoTagDocuments`:

- Add `FAIL_TAG` env var (default: `paperless-gpt-failed`), validated and exported.
- In `background.go` (or wherever `processAutoTagDocuments` exits on error), call `UpdateDocuments` to swap `AUTO_TAG → FAIL_TAG` before continuing.
- Mirror #949's test (`TestProcessAutoOcrTagDocuments_FailureRemovesTag`) for the AUTO_TAG path; include at least one case per failure class (PATCH 4xx, LLM error, JSON parse error).

A complementary improvement (a separate PR, out of scope for this one) would be to validate LLM output against the destination Paperless schema before PATCHing and to drop fields that won't pass validation, rather than discarding the whole document update over one bad value. The loop-break remains the essential fix.
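For illustration, pre-PATCH validation might be sketched as below. Everything here is an assumption: `FieldValue`, `validValue`, and `dropInvalid` are hypothetical names, the data-type strings (`"date"`, `"integer"`, `"url"`) are assumed to match what Paperless reports for custom fields, and the local checks only approximate the server-side serializer, which remains the authority.

```go
package main

import (
	"net/url"
	"strconv"
	"time"
)

// FieldValue is a hypothetical pairing of an LLM-suggested value with the
// data type of the Paperless custom field it targets.
type FieldValue struct {
	FieldID  int
	DataType string // assumed type names: "date", "integer", "url", ...
	Value    string
}

// validValue performs a cheap local approximation of server-side validation.
// Unknown types pass through so the server can decide.
func validValue(f FieldValue) bool {
	switch f.DataType {
	case "date":
		// time.Parse rejects out-of-range days and months such as 2023-01-79.
		_, err := time.Parse("2006-01-02", f.Value)
		return err == nil
	case "integer":
		_, err := strconv.Atoi(f.Value)
		return err == nil
	case "url":
		u, err := url.Parse(f.Value)
		return err == nil && u.Scheme != "" && u.Host != ""
	default:
		return true
	}
}

// dropInvalid splits fields into those that look safe to PATCH and those
// that should be omitted (and logged) instead of failing the whole update.
func dropInvalid(fields []FieldValue) (kept, dropped []FieldValue) {
	for _, f := range fields {
		if validValue(f) {
			kept = append(kept, f)
		} else {
			dropped = append(dropped, f)
		}
	}
	return kept, dropped
}
```

With the reproducer's value, `dropInvalid` would put `2023-01-79` in `dropped` and let the remaining fields through, so one bad value no longer blocks the whole update.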

Happy to submit a PR if useful. Either way, I'd keep this fix separate from #944 (queue rewrite) since it's a small, focused change that can land independently.

