A Cloudflare Workers-based system that automatically schedules and publishes AEM Edge Delivery snapshots.
This system consists of four main components that work together to manage scheduled snapshot publishing:
- Register - Registers org/site combinations for snapshot scheduling and manages schedule data
- Cron - Monitors schedule data and queues snapshots for publishing
- Publish - Publishes snapshots and manages completion tracking
- DLQ - Handles failed snapshots for investigation and recovery
The register service handles both registration and schedule management for org/site combinations.
To register an org/site combination for snapshot scheduling:
curl -X POST https://helix-snapshot-scheduler-ci.adobeaem.workers.dev/register \
-H "Content-Type: application/json" \
-H "Authorization: token <your-token>" \
-d '{"org": "your-org", "site": "your-site", "apiKey": "your-api-key"}'
Response:
- 200 OK - Registration successful or already registered
- 400 Bad Request - Missing org or site in request body
- 401 Unauthorized - Invalid or missing authorization token
To schedule a snapshot for publishing at a specific time:
curl -X POST https://helix-snapshot-scheduler-ci.adobeaem.workers.dev/schedule \
-H "Content-Type: application/json" \
-H "Authorization: token <your-token>" \
-d '{
"org": "your-org",
"site": "your-site",
"snapshotId": "snapshot-123"
}'
Response:
- 200 OK - Schedule updated successfully
- 400 Bad Request - Missing required fields or invalid date format
- 401 Unauthorized - Invalid or missing authorization token
- 404 Not Found - Org/site not registered for scheduled publishing
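For callers that script this endpoint rather than using curl, the request and the documented status codes can be sketched in TypeScript. The URL, headers, and body fields come from the curl example above; the helper names `buildScheduleRequest` and `describeScheduleStatus` are hypothetical and not part of the service.

```typescript
// Base URL taken from the curl example (CI environment).
const SCHEDULER_BASE = "https://helix-snapshot-scheduler-ci.adobeaem.workers.dev";

interface ScheduleRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the same request the curl example sends.
function buildScheduleRequest(
  org: string,
  site: string,
  snapshotId: string,
  token: string,
): ScheduleRequest {
  return {
    url: `${SCHEDULER_BASE}/schedule`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `token ${token}`,
      },
      body: JSON.stringify({ org, site, snapshotId }),
    },
  };
}

// Hypothetical helper: maps the documented status codes to their meaning.
function describeScheduleStatus(status: number): string {
  switch (status) {
    case 200:
      return "Schedule updated successfully";
    case 400:
      return "Missing required fields or invalid date format";
    case 401:
      return "Invalid or missing authorization token";
    case 404:
      return "Org/site not registered for scheduled publishing";
    default:
      return `Unexpected status ${status}`;
  }
}
```

The returned `url` and `init` can be passed to `fetch(url, init)`, and the response's `status` to `describeScheduleStatus`.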
To retrieve schedule data for a specific org/site:
curl -X GET https://helix-snapshot-scheduler-ci.adobeaem.workers.dev/schedule/your-org/your-site \
-H "Authorization: token <your-token>"
Response:
{
"your-org--your-site": {
"snapshot-123": "2025-01-15T10:30:00Z",
"snapshot-456": "2025-01-16T14:00:00Z"
}
}
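A client will often want this response as a list ordered by publish time. The following is a minimal sketch that assumes the response shape shown above; `upcomingSnapshots` is a hypothetical helper name.

```typescript
// Response shape from the example: "org--site" -> { snapshotId: ISO timestamp }.
type ScheduleResponse = Record<string, Record<string, string>>;

interface UpcomingSnapshot {
  snapshotId: string;
  publishAt: Date;
}

// Hypothetical helper: returns the snapshots for one org/site, earliest first.
function upcomingSnapshots(
  response: ScheduleResponse,
  org: string,
  site: string,
): UpcomingSnapshot[] {
  const entries = response[`${org}--${site}`] ?? {};
  return Object.entries(entries)
    .map(([snapshotId, iso]) => ({ snapshotId, publishAt: new Date(iso) }))
    .sort((a, b) => a.publishAt.getTime() - b.publishAt.getTime());
}
```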
To check if an org/site is registered:
curl -X GET https://helix-snapshot-scheduler-ci.adobeaem.workers.dev/register/your-org/your-site \
-H "Authorization: token <your-token>"
Response:
{
"registered": true
}
The cron worker runs every 5 minutes and performs the following:
- Reads schedule data: Loads the centralized schedule.json from the R2 bucket
- Filters by timing: Identifies snapshots scheduled for publishing in the next 5 minutes
- Queues for publishing: Adds eligible snapshots to the publish-queue with exact delay timing
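The filtering and delay steps above can be sketched as a pure function. This is an illustration under stated assumptions, not the worker's actual code: the function name `selectDueSnapshots` is hypothetical, the window is assumed to equal the 5-minute cron interval, and overdue entries are assumed to be queued with no delay.

```typescript
// schedule.json shape: "org--site" -> { snapshotId: ISO timestamp }.
type Schedule = Record<string, Record<string, string>>;

interface QueuedSnapshot {
  org: string;
  site: string;
  snapshotId: string;
  scheduledPublish: string;
  delaySeconds: number;
}

// Assumption: the look-ahead window equals the cron interval.
const WINDOW_MS = 5 * 60 * 1000;

function selectDueSnapshots(
  schedule: Schedule,
  now: Date,
  windowMs: number = WINDOW_MS,
): QueuedSnapshot[] {
  const due: QueuedSnapshot[] = [];
  for (const [key, snapshots] of Object.entries(schedule)) {
    const sep = key.indexOf("--");
    if (sep < 0) continue; // skip keys that are not "org--site"
    const org = key.slice(0, sep);
    const site = key.slice(sep + 2);
    for (const [snapshotId, scheduledPublish] of Object.entries(snapshots)) {
      const msUntil = Date.parse(scheduledPublish) - now.getTime();
      // Skip unparseable dates and anything at or beyond the window;
      // those are left for a later cron run.
      if (Number.isNaN(msUntil) || msUntil >= windowMs) continue;
      // Assumption: overdue snapshots are queued immediately (delay 0).
      const delaySeconds = Math.max(0, Math.ceil(msUntil / 1000));
      due.push({ org, site, snapshotId, scheduledPublish, delaySeconds });
    }
  }
  return due;
}
```

In a Worker, each result could then be sent with the Cloudflare Queues producer API, for example `env.PUBLISH_QUEUE.send(body, { delaySeconds })`, so the message becomes visible to the consumer at the scheduled time.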
When the publish-queue processes a batch of snapshots:
- Publishes snapshots: Calls the AEM Admin API to publish each snapshot in the batch
- Batch optimization: Updates schedule and completed data once per batch (not per snapshot)
- Updates schedule: Removes all published snapshots from schedule.json in a single operation
- Tracks completion: Moves completed snapshot data to completed/YYYY-MM-DD.json for audit trail
- Retry mechanism: Automatically retries failed publishes (5 attempts with exponential backoff)
- Dead Letter Queue: After max retries, failed snapshots are sent to DLQ for investigation
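The once-per-batch update described above can be sketched as follows. This is a minimal illustration, not the worker's implementation: `applyBatch` is a hypothetical name, and it assumes the org/site key is kept in the schedule even when its last snapshot is removed, since the source does not say whether empty entries are pruned.

```typescript
type Schedule = Record<string, Record<string, string>>;

interface PublishedSnapshot {
  org: string;
  site: string;
  snapshotId: string;
  scheduledPublish: string;
  publishedAt: string;
}

interface CompletedRecord extends PublishedSnapshot {
  publishedBy: string;
}

// Hypothetical helper: computes the new schedule and the completed records
// for a whole batch, so each file is written once per batch.
function applyBatch(
  schedule: Schedule,
  published: PublishedSnapshot[],
): { schedule: Schedule; completed: CompletedRecord[] } {
  // Copy one level deep so the caller's object is not mutated.
  const next: Schedule = {};
  for (const [key, snapshots] of Object.entries(schedule)) {
    next[key] = { ...snapshots };
  }
  const completed: CompletedRecord[] = [];
  for (const p of published) {
    const entry = next[`${p.org}--${p.site}`];
    // Assumption: the org/site key stays in place even if it becomes empty.
    if (entry) delete entry[p.snapshotId];
    completed.push({ ...p, publishedBy: "scheduled-snapshot-publisher" });
  }
  return { schedule: next, completed };
}
```

Computing both results in memory first keeps the storage traffic at one schedule write and one completed-file write per batch, regardless of batch size.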
When snapshots fail after all retry attempts:
- Logs failures: Records detailed error information for each failed snapshot
- Stores for investigation: Saves failed snapshot data to failed/YYYY-MM-DD.json in R2
- Enables recovery: Failed snapshots can be manually retried or investigated
- Prevents message loss: Ensures no snapshots are silently dropped
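The date-based file names can be derived and appended to as in the sketch below. The source does not state which time zone the date uses or how records are merged into an existing file, so this assumes the UTC calendar date and a read-append-write of a JSON array; `dailyKey` and `appendRecords` are hypothetical names.

```typescript
// Hypothetical helper: builds "failed/YYYY-MM-DD.json" or
// "completed/YYYY-MM-DD.json". Assumption: the date is taken in UTC.
function dailyKey(prefix: "completed" | "failed", when: Date): string {
  return `${prefix}/${when.toISOString().slice(0, 10)}.json`;
}

// Hypothetical helper: appends records to the JSON array already stored
// under a key (or starts a new array when nothing is stored yet).
function appendRecords<T>(existingJson: string | null, records: T[]): string {
  const existing: T[] = existingJson ? (JSON.parse(existingJson) as T[]) : [];
  return JSON.stringify([...existing, ...records], null, 2);
}
```

With an R2 binding, the existing content could be read with `R2_BUCKET.get(key)` followed by `.text()` on a non-null result, and the merged JSON written back with `R2_BUCKET.put(key, json)`.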
The register service stores schedule data in the R2 bucket as schedule.json with the following structure:
{
"org1--site1": {
"snapshotId1": "2025-01-15T10:30:00Z",
"snapshotId2": "2025-01-16T14:00:00Z"
},
"org1--site2": {
"snapshotId1": "2025-01-17T09:15:00Z"
},
"org2--site1": {
"snapshotId1": "2025-01-18T16:45:00Z"
}
}
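A sketch of how an entry could be added to this structure, including the date validation behind the 400 response. It is an illustration only: `upsertSnapshot` is a hypothetical name, normalizing to seconds-precision UTC is an assumption based on the examples, and registration checks are out of scope here.

```typescript
type Schedule = Record<string, Record<string, string>>;

// Hypothetical helper: returns a new schedule with the snapshot added or
// updated. Throws RangeError when the timestamp cannot be parsed.
function upsertSnapshot(
  schedule: Schedule,
  org: string,
  site: string,
  snapshotId: string,
  when: string,
): Schedule {
  const ms = Date.parse(when);
  if (Number.isNaN(ms)) {
    throw new RangeError(`Invalid date: ${when}`);
  }
  // Assumption: timestamps are stored as UTC without milliseconds,
  // matching the "2025-01-15T10:30:00Z" form used in the examples.
  const iso = new Date(ms).toISOString().replace(/\.\d{3}Z$/, "Z");
  const key = `${org}--${site}`;
  return {
    ...schedule,
    [key]: { ...(schedule[key] ?? {}), [snapshotId]: iso },
  };
}
```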
The publish worker tracks completed snapshots in date-based JSON files:
[
{
"org": "org1",
"site": "site1",
"snapshotId": "snapshot-123",
"scheduledPublish": "2025-01-15T10:30:00Z",
"publishedAt": "2025-01-15T10:30:15Z",
"publishedBy": "scheduled-snapshot-publisher"
}
]
The DLQ worker stores failed snapshots for investigation:
[
{
"org": "org1",
"site": "site1",
"snapshotId": "snapshot-456",
"scheduledPublish": "2025-01-15T10:30:00Z",
"messageId": "abc-123",
"timestamp": 1696412100000,
"failedAt": "2025-01-15T10:35:45Z",
"reason": "exceeded-max-retries"
}
]
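A record like the one above could be built from a dead-lettered queue message roughly as follows. The field mapping is an assumption: the source does not say where `messageId` and `timestamp` come from, so this sketch takes them from the message's `id` and `timestamp` properties, which Cloudflare Queues messages expose. `QueueMessageLike` is a local stand-in type and `toFailedRecord` is a hypothetical name.

```typescript
interface SnapshotMessageBody {
  org: string;
  site: string;
  snapshotId: string;
  scheduledPublish: string;
}

// Minimal structural stand-in for a queue message.
interface QueueMessageLike<T> {
  id: string;
  timestamp: Date;
  body: T;
}

interface FailedRecord extends SnapshotMessageBody {
  messageId: string;
  timestamp: number;
  failedAt: string;
  reason: string;
}

// Hypothetical helper. Assumptions: messageId is the queue message id and
// timestamp is the message's send time in epoch milliseconds.
function toFailedRecord(
  msg: QueueMessageLike<SnapshotMessageBody>,
  failedAt: Date,
): FailedRecord {
  return {
    ...msg.body,
    messageId: msg.id,
    timestamp: msg.timestamp.getTime(),
    failedAt: failedAt.toISOString().replace(/\.\d{3}Z$/, "Z"),
    reason: "exceeded-max-retries",
  };
}
```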
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Register │ │ Cron │ │ Publish │ │ DLQ │
│ │ │ │ │ │ │ │
│ POST / │ │ Every 5 min │ │ Queue Consumer │ │ Failed Msgs │
│ register │ │ → Read │ │ → Batch Process │ │ → Log & │
│ POST / │ │ schedule │ │ → Retry 5x │ │ Store │
│ schedule │ │ → Queue │ │ → Admin API │ │ Failures │
│ GET / │ │ snapshots │ │ → Update R2 │ │ │
│ schedule │ │ │ │ │ │ │
└─────────────┘ └──────────────┘ └─────────────────┘ └─────────────┘
│ │ │ ▲
│ │ │ │
▼ ▼ ▼ │ (max retries)
┌───────────────────────────────────────────────────────────┐ │
│ R2 Bucket Storage │ │
│ • schedule.json - Current scheduled snapshots │ │
│ • completed/YYYY-MM-DD.json - Successfully published │ │
│ • failed/YYYY-MM-DD.json - Failed after retries ◄─────┘ │
└───────────────────────────────────────────────────────────┘
- R2_BUCKET: Cloudflare R2 bucket for storing schedule data, completed snapshots, and failed snapshots
- SCHEDULER_KV: Cloudflare KV namespace for storing API tokens
- PUBLISH_QUEUE: Cloudflare Queue for snapshot publishing with retry mechanism
- DLQ: Dead Letter Queue for failed snapshots after max retries
All register service endpoints require proper authorization:
- Admin access (for registration): Requires access to AEM Admin API site configuration
- Basic author access (for scheduling): Requires access to AEM Snapshot List API
The service validates authorization by making test calls to:
- https://admin.hlx.page/config/{org}/sites/{site}.json (for admin access)
- https://admin.hlx.page/snapshot/{org}/{site}/main (for snapshot access)
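The two test calls can be sketched as below. The URL templates are the ones listed above; everything else is an assumption for illustration: the names `authCheckUrl` and `hasAccess`, forwarding the caller's Authorization header unchanged, and treating only a 200 status as proof of access. The HTTP call is injected as a function so the sketch stays independent of any particular fetch implementation.

```typescript
type AccessLevel = "admin" | "author";

// Builds the test-call URL for the requested access level.
function authCheckUrl(level: AccessLevel, org: string, site: string): string {
  const o = encodeURIComponent(org);
  const s = encodeURIComponent(site);
  return level === "admin"
    ? `https://admin.hlx.page/config/${o}/sites/${s}.json`
    : `https://admin.hlx.page/snapshot/${o}/${s}/main`;
}

// Injected HTTP call: resolves to the response status code.
type StatusFetcher = (
  url: string,
  headers: Record<string, string>,
) => Promise<number>;

// Hypothetical check. Assumptions: the Authorization header is forwarded
// as-is, and only HTTP 200 counts as having access.
async function hasAccess(
  level: AccessLevel,
  org: string,
  site: string,
  authorization: string,
  fetchStatus: StatusFetcher,
): Promise<boolean> {
  const url = authCheckUrl(level, org, site);
  const status = await fetchStatus(url, { Authorization: authorization });
  return status === 200;
}
```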
Each component is deployed as a separate Cloudflare Worker automatically via GitHub Actions:
Branch Deployments (CI Environment)
- When you push code to any branch (except main), the build.yml workflow runs
- Tests are executed and all 4 workers are deployed to the CI environment using wrangler-ci.toml configs
- CI workers are available at: *-ci.adobeaem.workers.dev
Production Deployments (Main Branch)
- When code is merged to main, the semantic-release.yml workflow runs
- Semantic-release analyzes commit messages (using conventional commits: feat:, fix:, etc.)
- If a release is warranted, it:
  - Creates a new version and updates the CHANGELOG
  - Deploys all 4 workers atomically to production using wrangler.toml configs
  - Tags the release in GitHub
  - Sends notifications to Coralogix and Slack
Worker Components:
- register/ - HTTP endpoint for registration and schedule management
- cron/ - Scheduled worker (runs every 5 minutes) that reads schedule data and queues snapshots
- publish/ - Queue worker for publishing with automatic retry mechanism
- dlq/ - Dead Letter Queue consumer for handling permanently failed snapshots
Manual Deployment:
# Deploy to production
cd <worker-directory>
npm run deploy
# Deploy to CI
npm run deploy-ci
See the individual wrangler.toml and wrangler-ci.toml files for deployment configuration.