`[cco-service]` Implement Background Worker for Metadata Enrichment

**Description**
We need to implement the asynchronous worker to handle high-latency tasks. As defined in **ADR-001**, the system must support processing large batches (1000+) of citations without blocking the main API thread or timing out the user interface.

**Context**
When a user uploads a raw list of 1,000 DOIs (via the `IdentifierListIngestor` from Issue #3), the system creates "hollow" objects. We need a background process that iterates through these IDs, queries external APIs (Crossref/DataCite) for metadata (title, author, year), and "enriches" the CCO without blocking the request that started it.

**Requirements**

* **Queue Infrastructure:** Set up a task queue (recommend **Celery** or **RQ**) that connects to the existing Redis instance.
* **Worker Container:** Add a service to `docker-compose.yml` that runs the worker process.
* **Enrichment Task:** Implement a task (e.g., `enrich_cco_metadata(session_id)`) that:
    1. Retrieves the current CCO from Redis.
    2. Iterates through `AggregatedResources` that are missing metadata.
    3. Fetches metadata from public APIs (Crossref/DataCite).
    4. Updates the CCO object in Redis.

* **Status Tracking:** The task should update a status key in Redis (e.g., `processing_status: { percent: 50, status: "running" }`) so the Frontend can poll for progress.
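
As a starting point for discussion, the enrichment loop and status updates above could look roughly like the sketch below. It is not a settled design. The key names (`cco:<session_id>`, `processing_status:<session_id>`), the JSON field names (`aggregated_resources`, `doi`, `title`), and the final `"complete"` status are all assumptions, and `store` stands in for a thin wrapper around the Redis client so the logic can be exercised without a running Redis.

```python
import json
from typing import Callable, MutableMapping, Optional


# Hypothetical key layout; the real key names are not defined in this issue.
def cco_key(session_id: str) -> str:
    return f"cco:{session_id}"


def status_key(session_id: str) -> str:
    return f"processing_status:{session_id}"


def enrich_cco_metadata(
    session_id: str,
    store: MutableMapping[str, str],
    fetch_metadata: Callable[[str], Optional[dict]],
) -> dict:
    """Enrich hollow resources in the stored CCO and report progress."""
    cco = json.loads(store[cco_key(session_id)])

    # Assumption: a resource is "hollow" when it has no title yet.
    pending = [r for r in cco["aggregated_resources"] if not r.get("title")]
    total = len(pending)

    status = {"percent": 0 if total else 100, "status": "running"}
    store[status_key(session_id)] = json.dumps(status)

    for done, resource in enumerate(pending, start=1):
        metadata = fetch_metadata(resource["doi"])
        if metadata:
            resource.update(metadata)
        # Persist after every item so partial results survive a worker crash.
        store[cco_key(session_id)] = json.dumps(cco)
        status = {"percent": done * 100 // total, "status": "running"}
        store[status_key(session_id)] = json.dumps(status)

    # "complete" is an assumed terminal value; only "running" appears above.
    status = {"percent": 100, "status": "complete"}
    store[status_key(session_id)] = json.dumps(status)
    return status
```

If Celery is chosen, a function like this could be registered with the `@app.task` decorator, with `store` and `fetch_metadata` bound inside the task body rather than passed as arguments.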

**Technical Scope (Suggested)**

* Use the same `cco-core` library model definitions to ensure data consistency.
* **Important:** Ensure the worker handles API rate limits (e.g., add rudimentary sleeps between requests, or retry after a delay when Crossref throttles a request).
* The API (Issue #4) will need an endpoint to *trigger* this task, but this issue focuses on the worker infrastructure and the task logic itself.
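
One possible shape for the rate-limit handling mentioned above is a small retry wrapper with exponential backoff. This is only a sketch: `RateLimited` is a hypothetical exception that the real fetcher would need to raise when an API signals throttling, and the attempt count and delays are placeholder values, not tuned recommendations.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error raised by a fetcher when the API throttles us."""


def fetch_with_backoff(
    fetch: Callable[[str], Optional[dict]],
    doi: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(doi), waiting longer after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch(doi)
        except RateLimited:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller decide how to record this.
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

The `sleep` parameter is injectable so tests do not have to wait in real time. If Celery is adopted, its built-in task retry support may make a hand-rolled helper like this unnecessary.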

**Acceptance Criteria**

* [ ] Worker container is running and connected to Redis.
* [ ] A "dummy task" can be triggered that writes to logs (to prove infrastructure works).
* [ ] The `enrich_cco_metadata` task successfully fetches a title for a given DOI and saves it back to the Redis CCO object.
* [ ] Basic error handling (e.g., what happens if a DOI is invalid?) is implemented so the worker doesn't crash.
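
To make the last criterion concrete, here is a rough sketch of per-DOI error handling that returns a result record instead of raising, so one bad identifier cannot take down the whole batch. The record shape (`ok`, `error`, `metadata`) and the helper name `safe_enrich_one` are hypothetical, and the regular expression is a common heuristic for modern DOIs rather than a complete validator.

```python
import re
from typing import Callable, Optional

# Heuristic syntax check for modern DOIs; some legacy DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def safe_enrich_one(doi: str, fetch: Callable[[str], Optional[dict]]) -> dict:
    """Look up one DOI and always return a result record, never raise."""
    cleaned = doi.strip()
    if not DOI_PATTERN.match(cleaned):
        return {"doi": doi, "ok": False, "error": "invalid DOI syntax"}
    try:
        metadata = fetch(cleaned)
    except Exception as exc:  # Broad on purpose: the worker must keep going.
        return {"doi": doi, "ok": False, "error": f"lookup failed: {exc}"}
    if not metadata:
        return {"doi": doi, "ok": False, "error": "no metadata found"}
    return {"doi": doi, "ok": True, "metadata": metadata}
```

Failed records could be stored alongside the CCO so the Frontend can show which identifiers were skipped and why.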

**References**

* **ADR-001 Section 2:** [Container Design (Worker)](https://github.com/clnsmth/cco-generator/blob/development/docs/ADR-001-Architecture.md#container-design)
* **ADR-001 Section 6:** [Consequences (Async UX)](https://github.com/clnsmth/cco-generator/blob/development/docs/ADR-001-Architecture.md#6-consequences)
* **Requirements Section 2:** [Scalability (1000+ citations)](https://github.com/clnsmth/cco-generator/blob/development/docs/requirements.md#2-non-functional-requirements)



