[cco-service] Implement Background Worker for Metadata Enrichment #7

@clnsmth

Description

We need to implement the asynchronous worker that handles high-latency tasks. As defined in ADR-001, the system must support processing large batches (1,000+) of citations without blocking the main API thread or timing out the user interface.

Context
When a user uploads a raw list of 1,000 DOIs (via the IdentifierListIngestor from Issue #3), the system creates "hollow" objects that carry an identifier but no metadata. We need a background process that iterates through these IDs, queries external APIs (Crossref/DataCite) for metadata (Title, Author, Year), and "enriches" the CCO.

Requirements

  • Queue Infrastructure: Set up a task queue (recommend Celery or RQ) that connects to the existing Redis instance.
  • Worker Container: Add a service to docker-compose.yml that runs the worker process.
  • Enrichment Task: Implement a task (e.g., enrich_cco_metadata(session_id)) that:
      1. Retrieves the current CCO from Redis.
      2. Iterates through AggregatedResources that are missing metadata.
      3. Fetches metadata from public APIs (Crossref/DataCite).
      4. Updates the CCO object in Redis.
  • Status Tracking: The task should update a status key in Redis (e.g., processing_status: { percent: 50, status: "running" }) so the Frontend can poll for progress.
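To make the enrichment steps and the status key concrete, here is a minimal sketch of the task logic. It is not the final implementation: the key names (`cco:<session_id>`, `processing_status:<session_id>`), the JSON layout of the CCO (`aggregated_resources`, `doi`, `title`), and the `"complete"` status value are all assumptions for illustration. A plain dict stands in for Redis and the fetcher is injected, so the logic can be tested without a running queue or network access.

```python
import json
from typing import Callable, MutableMapping, Optional

# Hypothetical fetcher type: takes a DOI, returns metadata or None if not found.
Fetcher = Callable[[str], Optional[dict]]


def cco_key(session_id: str) -> str:
    # Assumed key layout; the real service may use a different scheme.
    return f"cco:{session_id}"


def status_key(session_id: str) -> str:
    return f"processing_status:{session_id}"


def write_status(store: MutableMapping, session_id: str,
                 done: int, total: int, status: str) -> None:
    """Write the progress record that the frontend polls."""
    percent = 100 if total == 0 else int(done * 100 / total)
    store[status_key(session_id)] = json.dumps(
        {"percent": percent, "status": status}
    )


def enrich_cco_metadata(session_id: str, store: MutableMapping,
                        fetch_metadata: Fetcher) -> int:
    """Fill in missing metadata; return how many resources were enriched."""
    # 1. Retrieve the current CCO.
    cco = json.loads(store[cco_key(session_id)])
    # 2. Select the resources that are missing metadata (here: no title).
    pending = [r for r in cco["aggregated_resources"] if not r.get("title")]
    total = len(pending)
    write_status(store, session_id, 0, total, "running")

    enriched = 0
    for done, resource in enumerate(pending, start=1):
        # 3. Fetch metadata from the external API.
        metadata = fetch_metadata(resource["doi"])
        if metadata:
            resource.update(metadata)
            enriched += 1
        # 4. Save the CCO after each item so partial progress survives a crash.
        store[cco_key(session_id)] = json.dumps(cco)
        write_status(store, session_id, done, total, "running")

    write_status(store, session_id, total, total, "complete")
    return enriched
```

In the worker, this function would be registered as a Celery task or RQ job, with `store` replaced by a thin adapter over the Redis client's get/set calls and `fetch_metadata` by a Crossref/DataCite client.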

Technical Scope (Suggested)

  • Use the same cco-core library model definitions to ensure data consistency.
  • Important: Ensure the worker handles API rate limits (e.g., add rudimentary sleeps or retries if hitting Crossref too hard).
  • The API (Issue #4, "[cco-core] Implement Repository Adapter Pattern (Zenodo Integration)") will need an endpoint to trigger this task, but this issue covers only the worker infrastructure and the task logic itself.
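One way to handle rate limits is a retry wrapper with exponential backoff. The sketch below assumes the fetcher raises a hypothetical `RateLimited` exception when the upstream API answers with HTTP 429; the exception name, the default attempt count, and the delays are placeholders rather than values taken from Crossref or DataCite documentation.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical signal raised by a fetcher when the API returns HTTP 429."""


def fetch_with_retry(fetch: Callable[[str], Optional[dict]], doi: str,
                     max_attempts: int = 4, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch(doi), backing off 1s, 2s, 4s, ... between rate-limited attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(doi)
        except RateLimited:
            # Give up after the final attempt and let the caller decide.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    return None  # Unreachable; keeps type checkers satisfied.
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting. Celery and RQ also offer their own task-level retry mechanisms, which may be a better fit once the queue choice is settled.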

Acceptance Criteria

  • Worker container is running and connected to Redis.
  • A "dummy task" can be triggered that writes to logs (to prove infrastructure works).
  • The enrich_cco_metadata task successfully fetches a title for a given DOI and saves it back to the Redis CCO object.
  • Basic error handling (e.g., what happens if a DOI is invalid?) is implemented so the worker doesn't crash.
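For the error-handling criterion, one possible approach is to record a per-resource error instead of raising, so a single bad DOI cannot abort the batch. The sketch below is an assumption-heavy illustration: the `error` field, its values, and the simplified DOI pattern are hypothetical and not part of the cco-core models.

```python
import re
from typing import Callable, Optional

# Simplified DOI shape check (assumption): "10.", a 4-9 digit prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def enrich_resource_safely(resource: dict,
                           fetch_metadata: Callable[[str], Optional[dict]]) -> dict:
    """Enrich one resource in place; on failure, set an 'error' field and continue."""
    doi = resource.get("doi") or ""
    if not DOI_PATTERN.match(doi):
        # Malformed identifier: skip the network call entirely.
        resource["error"] = "invalid_doi"
        return resource
    try:
        metadata = fetch_metadata(doi)
    except Exception as exc:
        # Any fetch failure is recorded, not propagated, so the worker keeps going.
        resource["error"] = f"fetch_failed: {exc}"
        return resource
    if metadata is None:
        resource["error"] = "not_found"
    else:
        resource.update(metadata)
    return resource
```

Storing the failure on the resource also gives the frontend something to show the user for DOIs that could not be resolved.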
