Description
We need to implement an asynchronous worker to handle high-latency tasks. As defined in ADR-001, the system must be able to process large batches (1,000+) of citations without blocking the main API thread or causing user-interface requests to time out.
Context
When a user uploads a raw list of 1,000 DOIs (via the IdentifierListIngestor from Issue #3), the system creates "hollow" objects: resources that carry an identifier but no metadata yet. We need a background process that iterates through these IDs, queries external APIs (Crossref/DataCite) for metadata (Title, Author, Year), and writes the results back into ("enriches") the CCO.
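The Crossref lookup can be illustrated with a minimal Python sketch. It fetches one DOI from the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and reduces the response to title, authors, and year. The response fields it reads (`title`, `author`, `issued.date-parts`) follow Crossref's work schema. The function names, the output dictionary shape, and the User-Agent string are assumptions made for illustration and are not taken from the cco-core model. DataCite is not covered here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def parse_crossref_work(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Extract title, authors and year from a Crossref /works/{doi} response."""
    message = payload.get("message", {})
    titles: List[str] = message.get("title") or []

    authors: List[str] = []
    for person in message.get("author", []):
        # Personal authors have given/family names; organisations only have "name".
        if "family" in person:
            parts = (person.get("given"), person["family"])
            authors.append(" ".join(p for p in parts if p))
        elif "name" in person:
            authors.append(person["name"])

    # "issued" holds e.g. {"date-parts": [[2021, 4, 2]]}; the year may be null.
    year: Optional[int] = None
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    if date_parts[0] and date_parts[0][0] is not None:
        year = int(date_parts[0][0])

    return {"title": titles[0] if titles else None, "authors": authors, "year": year}


def fetch_crossref_metadata(doi: str, mailto: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Look up one DOI on Crossref and return the parsed metadata."""
    url = CROSSREF_WORKS_URL + urllib.parse.quote(doi)  # keeps the "/" in the DOI
    # Crossref asks clients to identify themselves. Including a mailto in the
    # User-Agent routes requests to its "polite" pool.
    # The agent name below is hypothetical.
    request = urllib.request.Request(
        url, headers={"User-Agent": f"cco-enrichment-worker (mailto:{mailto})"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_crossref_work(json.load(response))
```

Only `parse_crossref_work` is pure, so it can be unit-tested against a saved Crossref response without network access.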
Requirements
Queue Infrastructure: Set up a task queue (Celery or RQ recommended) that connects to the existing Redis instance.
Worker Container: Add a service to docker-compose.yml that runs the worker process.
Enrichment Task: Implement a task (e.g., enrich_cco_metadata(session_id)) that:
  1. Retrieves the current CCO from Redis.
  2. Iterates through the AggregatedResources that are missing metadata.
  3. Fetches their metadata from public APIs (Crossref/DataCite).
  4. Updates the CCO object in Redis.
Status Tracking: The task should update a status key in Redis (e.g., processing_status: { percent: 50, status: "running" }) so the Frontend can poll for progress.
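The task steps and status key above could look roughly like the following sketch. It makes several hypothetical assumptions:

- The CCO is stored as JSON under `cco:{session_id}` with an `aggregated_resources` list.
- A resource is missing metadata when it has a DOI but no `title`.
- The status lives under `processing_status:{session_id}`.

The real layout should come from the cco-core models. The store only needs `get` and `set`, both of which `redis.Redis` provides. The metadata lookup is passed in as any callable that maps a DOI to a dict of fields.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional, Protocol

logger = logging.getLogger(__name__)


class KeyValueStore(Protocol):
    """The two Redis operations the task needs; redis.Redis provides both."""

    def get(self, name: str) -> Optional[Any]: ...

    def set(self, name: str, value: str) -> Any: ...


def _is_missing_metadata(resource: Dict[str, Any]) -> bool:
    # Hypothetical rule: a resource with a DOI but no title is still "hollow".
    return bool(resource.get("doi")) and not resource.get("title")


def _write_status(store: KeyValueStore, session_id: str, percent: int, status: str) -> None:
    # Same shape as the issue's example: {"percent": 50, "status": "running"}.
    payload = {"percent": percent, "status": status}
    store.set(f"processing_status:{session_id}", json.dumps(payload))


def enrich_cco_metadata(
    session_id: str,
    store: KeyValueStore,
    fetch_metadata: Callable[[str], Dict[str, Any]],
) -> None:
    """Fill in metadata for the hollow resources of one session's CCO."""
    cco_key = f"cco:{session_id}"  # hypothetical key layout
    raw = store.get(cco_key)
    if raw is None:
        _write_status(store, session_id, 0, "failed")
        return

    cco = json.loads(raw)  # also accepts the bytes redis-py returns by default
    pending = [r for r in cco.get("aggregated_resources", []) if _is_missing_metadata(r)]
    _write_status(store, session_id, 0, "running")

    for done, resource in enumerate(pending, start=1):
        try:
            # `resource` is the same dict object held inside `cco`,
            # so updating it in place updates the CCO.
            resource.update(fetch_metadata(resource["doi"]))
        except Exception:  # one bad DOI should not abort the whole batch
            logger.warning("Metadata lookup failed for %s", resource["doi"], exc_info=True)
        # Persist after every item so progress survives a worker crash.
        store.set(cco_key, json.dumps(cco))
        _write_status(store, session_id, done * 100 // len(pending), "running")

    _write_status(store, session_id, 100, "completed")
```

Taking the Redis client and the fetcher as parameters keeps this logic testable with an in-memory stand-in. In the worker itself, a thin Celery task (decorated with `@app.task`) or an RQ job function would build the client with `redis.Redis.from_url(...)` and then call `enrich_cco_metadata`.

This read-modify-write of the CCO can overwrite concurrent changes made by the API. If that can happen, a Redis `WATCH`/`MULTI` transaction or a per-session lock would be needed.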
Technical Scope (Suggested)
Use the same cco-core library model definitions to ensure data consistency.
Important: Ensure the worker handles API rate limits (e.g., pause between requests, and retry with backoff when Crossref responds with HTTP 429 Too Many Requests).
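One way to handle rate limits is a retry wrapper with exponential backoff. The sketch below assumes HTTP calls are made with `urllib.request`, which raises `urllib.error.HTTPError` for error statuses. The function name, the set of retryable status codes, and the default delays are illustrative choices, not values prescribed by Crossref.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# 429 = Too Many Requests (rate limited); 5xx = usually transient server errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying rate-limit and transient server errors.

    The wait doubles after each failed attempt (1s, 2s, 4s, ... by default)
    and is raised to the server's Retry-After value when one is given in seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors (e.g. 404 for an unknown DOI)
            # or once the attempt budget is spent.
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.strip().isdigit():
                delay = max(delay, float(retry_after))
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

Each lookup would be wrapped in a zero-argument callable, for example `with_retries(lambda: lookup(doi))`, where `lookup` stands for whatever function performs the request. The `sleep` parameter exists so tests can record delays instead of waiting. Adding a short fixed `time.sleep` between DOIs provides the simple pacing the issue mentions.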
Note: An endpoint to trigger this task will be needed (see [cco-core] Implement Repository Adapter Pattern (Zenodo Integration) #4), but this issue focuses on the worker infrastructure and the task logic itself.
Acceptance Criteria
The enrich_cco_metadata task successfully fetches a title for a given DOI and saves it back to the Redis CCO object.
References