
@dependabot

@dependabot dependabot bot commented on behalf of github Jan 28, 2026

Bumps next from 16.0.10 to 16.1.5.

Release notes

Sourced from next's releases.

v16.1.5

Please refer to the following changelogs for more information about this security release:

https://vercel.com/changelog/summaries-of-cve-2025-59471-and-cve-2025-59472
https://vercel.com/changelog/summary-of-cve-2026-23864

v16.1.4

[!NOTE] This release is backporting bug fixes. It does not include all pending features/changes on canary.

Core Changes

  • Only filter next config if experimental flag is enabled (#88733)

Credits

Huge thanks to @​mischnic for helping!

v16.1.3

[!NOTE] This release is backporting bug fixes. It does not include all pending features/changes on canary.

Core Changes

  • Fix linked list bug in LRU deleteFromLru (#88652)
  • Fix relative same host redirects in node middleware (#88253)

Credits

Huge thanks to @​acdlite and @​ijjk for helping!

v16.1.2

[!NOTE] This release is backporting bug fixes. It does not include all pending features/changes on canary.

Core Changes

  • Turbopack: Update to swc_core v50.2.3 (#87841) (#88296)
    • Fixes a crash when processing mdx files with multibyte characters. (#87713)
  • Turbopack: mimalloc upgrade and enabling it on musl (#88503) (#87815) (#88426)
    • Fixes a significant performance issue on musl-based Linux distributions (e.g. Alpine in Docker) related to musl's allocator.
    • Other platforms have always used mimalloc, but we previously did not use mimalloc on musl because of compilation issues that have since been resolved.

Credits

Huge thanks to @​mischnic for helping!

v16.1.1

[!NOTE] This release is backporting bug fixes. It does not include all pending features/changes on canary.

... (truncated)

Commits
  • acba4a6 v16.1.5
  • e1d1fc6 Add maximum size limit for postponed body parsing (#88175)
  • 500ec83 fetch(next/image): reduce maximumResponseBody from 300MB to 50MB (#88588)
  • 1caaca3 feat(next/image)!: add images.maximumResponseBody config (#88183)
  • 522ed84 Sync DoS mitigations for React Flight
  • 8cad197 [backport][cna] Ensure created app is not considered the workspace root in pn...
  • 2718661 Backport/docs fixes (#89031)
  • 5333625 Backport/docs fixes 16.1.5 (#88916)
  • 60de6c2 v16.1.4
  • 5f75d22 backport: Only filter next config if experimental flag is enabled (#88733) (#...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

@dependabot dependabot bot added dependencies Pull requests that update a dependency file javascript Pull requests that update javascript code labels Jan 28, 2026
@dependabot dependabot bot force-pushed the dependabot/npm_and_yarn/next-16.1.5 branch from 6fa292a to 4be0b81 Compare January 29, 2026 22:45
@github-actions

github-actions bot commented Feb 3, 2026

🚀 PR Environment Deployed

Your PR environment has been successfully deployed!

Environment Details:

Components:

  • ✅ Keeperhub Application
  • ✅ PostgreSQL Database (isolated instance)
  • ✅ LocalStack (SQS emulation)

The environment will be automatically cleaned up when this PR is closed or merged.

@suisuss

suisuss commented Feb 3, 2026

Workflow Package Upgrade Required

The Next.js 16.1.5 upgrade is incompatible with workflow@4.0.1-beta.17 due to a missing module:

Error: Cannot find module 'next/dist/lib/server-external-packages.json'

To fix this, I upgraded the workflow package to 4.1.0-beta.51 (commit 75f2bd0).

Breaking Changes in workflow@4.1.0

However, the preview deploy shows workflows getting stuck (trigger node stays in "running" state). This is likely due to breaking changes in the new version:

  1. Input/Output now binary (Uint8Array) - User input/output at the World interface changed from plain objects to Uint8Array. This may affect how triggerInput is passed to start().

  2. Event-sourced architecture - Storage interface is now read-only; all mutations go through events.create(). Runs, steps, and hooks are now materializations of event logs. Our custom execution tracking via Drizzle may conflict with this.

  3. Deprecated events removed - workflow_completed, workflow_failed, workflow_started replaced with run_completed, run_failed, run_started.

  4. Spec version compatibility - PR #894 added backwards compatibility for v1 runs from v2 runtime, but new runs created with v2 may have different expectations.
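If any of our code matches SDK events by name (for example in logging or status sync), one translation table keeps the old and new names together while runs from both versions may be around. This is a minimal sketch assuming events are identified by plain string names; `LEGACY_EVENT_NAMES` and `normalizeEventName` are hypothetical helpers, not part of the workflow SDK.

```typescript
// Hypothetical helper: maps event names removed in 4.1.0 to their replacements.
// A Map (rather than an object literal) avoids matching inherited keys like "toString".
const LEGACY_EVENT_NAMES: ReadonlyMap<string, string> = new Map([
  ["workflow_started", "run_started"],
  ["workflow_completed", "run_completed"],
  ["workflow_failed", "run_failed"],
]);

// Returns the 4.1.0 name for a legacy event; any other name passes through unchanged.
function normalizeEventName(name: string): string {
  return LEGACY_EVENT_NAMES.get(name) ?? name;
}
```

For example, `normalizeEventName("workflow_failed")` yields `"run_failed"`, while `"run_completed"` is returned as-is.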

Next Steps

Need to investigate:

  • Check preview deploy logs for serialization errors
  • Verify if our custom logStepStartDb/logStepCompleteDb functions interfere with the SDK's event-sourcing
  • Test if the start() function needs different input handling

Reference: workflow@4.1.0-beta.51 release notes

@suisuss

suisuss commented Feb 3, 2026

Bump is incompatible with the current vercel workflow package, and upgrading it causes breaking changes - will need to circle back to this imo.

Upstream hasn't patched its Next.js version either. Both keeperhub and workflow_builder_template are vulnerable to the Next.js CVEs, and upstream hasn't upgraded to the latest vercel/workflow package.

The upstream vercel-labs/workflow-builder-template is on next@16.0.10 and is vulnerable to:

These are the same security issues that the dependabot PR #229 fixes by upgrading to next@16.1.5.
They'll need to upgrade Next.js for security, which will force them to upgrade workflow to 4.1.0+, which has breaking changes. It's the same situation you're dealing with now.

dependabot bot and others added 2 commits February 5, 2026 16:23
Bumps [next](https://github.com/vercel/next.js) from 16.0.10 to 16.1.5.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](vercel/next.js@v16.0.10...v16.1.5)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.1.5
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@suisuss suisuss force-pushed the dependabot/npm_and_yarn/next-16.1.5 branch from 75f2bd0 to 3cb0c95 Compare February 5, 2026 05:33
@suisuss

suisuss commented Feb 5, 2026

Workflow Package 4.1.0-beta.51 Breaking Change Fix

Problem

The start() function from workflow/api in version 4.1.0-beta.51 now returns Promise<Run<TResult>> and must be awaited. The current code calls start() without awaiting, causing workflows to not trigger properly.

Root Cause

// Current (broken) - lines 42-50 and 98-106
start(executeWorkflow, [{ nodes, edges, triggerInput, executionId, workflowId }]);
// The promise floats: errors are lost and the workflow may not start

// Required (4.1.0-beta.51)
const run = await start(executeWorkflow, [{ nodes, edges, triggerInput, executionId, workflowId }]);
// run.runId is available for tracking
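The difference can be reproduced without the SDK. A minimal sketch, assuming only that `start()` returns a promise that may reject; `triggerFloating`, `triggerAwaited`, and `failingStart` are hypothetical stand-ins, not the real API.

```typescript
// What a route would report back for a trigger attempt.
type Outcome = { status: "running" | "error"; runId?: string; error?: string };

// Fire-and-forget: start() is not awaited, so this try/catch can never observe
// its rejection. The empty .catch() only keeps this demo from crashing Node with
// an unhandled rejection; in the original route nothing observed the promise.
function triggerFloating(start: () => Promise<{ runId: string }>): Outcome {
  try {
    start().catch(() => {});
    return { status: "running" };
  } catch (err) {
    return { status: "error", error: String(err) };
  }
}

// Awaited (required for 4.1.0-beta.51): the rejection propagates into the
// catch block, so the failure can be logged and the execution marked as failed.
async function triggerAwaited(start: () => Promise<{ runId: string }>): Promise<Outcome> {
  try {
    const run = await start();
    return { status: "running", runId: run.runId };
  } catch (err) {
    return { status: "error", error: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical stand-in for start() that fails the way a filesystem write can.
const failingStart = async (): Promise<{ runId: string }> => {
  throw new Error("ENOENT: no such file or directory");
};
```

With the floating call the route reports `running` even though the run never started; the awaited call lets the existing error handling mark the execution as failed.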

Files to Modify

1. app/api/workflow/[workflowId]/execute/route.ts

Function: executeWorkflowBackground() (lines 21-70)

Change: Add await before start() call at line 42

// Before (line 42-50)
start(executeWorkflow, [
  {
    nodes,
    edges,
    triggerInput: input,
    executionId,
    workflowId,
  },
]);

// After
const run = await start(executeWorkflow, [
  {
    nodes,
    edges,
    triggerInput: input,
    executionId,
    workflowId,
  },
]);

console.log("[Workflow Execute] Workflow started, runId:", run.runId);

2. app/api/workflows/[workflowId]/webhook/route.ts

Function: executeWorkflowBackground() (lines 81-125)

Change: Add await before start() call at line 98

// Before (line 98-106)
start(executeWorkflow, [
  {
    nodes,
    edges,
    triggerInput: input,
    executionId,
    workflowId,
  },
]);

// After
const run = await start(executeWorkflow, [
  {
    nodes,
    edges,
    triggerInput: input,
    executionId,
    workflowId,
  },
]);

console.log("[Webhook] Workflow started, runId:", run.runId);

What NOT to Change

  • Keep POST handlers NOT awaiting executeWorkflowBackground() - Intentional fire-and-forget for background execution
  • Keep current API response format - { executionId, status: "running" }
  • Keep current executionId generation - Created before start() is called
  • No database schema changes - Our execution tracking is sufficient
  • Don't store run.runId - Our executionId is primary, runId is just for logging
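Taken together, the plan keeps the outer call fire-and-forget and awaits only the inner start(). A minimal sketch of that shape, assuming the SDK call and the DB update are injected as parameters so it runs without Next.js; `Deps`, `markExecutionError`, and `handlePost` are hypothetical names, not the actual route code.

```typescript
type WorkflowArgs = {
  nodes: unknown[];
  edges: unknown[];
  triggerInput: unknown;
  executionId: string;
  workflowId: string;
};

// Hypothetical injected dependencies standing in for start() from workflow/api,
// the existing "mark execution as error" DB update, and console.log.
type Deps = {
  start: (args: WorkflowArgs) => Promise<{ runId: string }>;
  markExecutionError: (executionId: string, message: string) => Promise<void>;
  log: (message: string) => void;
};

// Inner function: awaits start() so a failed start reaches the catch block.
async function executeWorkflowBackground(deps: Deps, args: WorkflowArgs): Promise<void> {
  try {
    const run = await deps.start(args);
    deps.log(`[Workflow Execute] Workflow started, runId: ${run.runId}`);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    await deps.markExecutionError(args.executionId, message);
  }
}

// Outer handler: intentionally not awaited, so the response returns immediately
// in the existing { executionId, status: "running" } format.
function handlePost(deps: Deps, args: WorkflowArgs): { executionId: string; status: "running" } {
  executeWorkflowBackground(deps, args).catch((err: unknown) => {
    // Last-resort guard in case the error path itself fails.
    deps.log(`[Workflow Execute] background failure: ${String(err)}`);
  });
  return { executionId: args.executionId, status: "running" };
}
```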

Design Decisions

Decision | Choice | Rationale
Use Run.runId? | Log only, don't store | Our executionId is primary; a schema change is not worth it
Store Run object? | No | We have our own status tracking
API response change? | No | Would break clients
Error handling | Keep existing | Already updates the execution to error state

Phase 1: Refactor

Address the issue described above by applying the proposed changes. Then I can go ahead and...

Phase 2: Verification

Run the full test suite:

pnpm check           # Lint
pnpm type-check      # TypeScript
pnpm test:unit       # Unit tests
pnpm test:integration # Integration tests
pnpm build           # Production build

Phase 3: Manual Testing

  1. Create a test workflow with multiple steps
  2. Execute via API endpoint
  3. Verify execution logs show correct status
  4. Test webhook trigger
  5. Verify codegen output includes correct imports

Phase 4: Deployment

  1. Deploy to staging environment
  2. Run smoke tests
  3. Monitor for errors in Sentry
  4. Deploy to production
  5. Monitor workflow execution metrics

Version Considerations

Version | Pros | Cons
4.1.0-beta.51 | Already tested in this PR; CI passes | Not the latest
4.1.0-beta.52 | Fixes Bun circular dependency | May introduce untested changes

Recommendation: Start with 4.1.0-beta.51. Consider upgrading to beta.52 in a follow-up PR if Bun compatibility is needed.

Rollback Plan

If issues are discovered post-deployment:

  1. Revert the commit: git revert <commit-hash>
  2. Deploy the revert
  3. Investigate issues in staging environment

Note: The security vulnerabilities primarily affect self-hosted deployments. Vercel-hosted applications have WAF protection as an interim measure.

Monitoring

Post-deployment, monitor:

  • Workflow execution success rate
  • API response times for /api/workflow/*/execute
  • Memory usage (CVE fixes relate to DoS via memory exhaustion)
  • Build times (withWorkflow plugin performance)
  • Sentry for new error patterns


@github-actions

github-actions bot commented Feb 5, 2026

🚀 PR Environment Deployed


@github-actions

github-actions bot commented Feb 5, 2026

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • 🗑️ Namespace: pr-229
  • 🗑️ Keeperhub Application
  • 🗑️ PostgreSQL Database (including data)
  • 🗑️ LocalStack
  • 🗑️ All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@github-actions

github-actions bot commented Feb 5, 2026

🚀 PR Environment Deployed


@joelorzet

joelorzet commented Feb 5, 2026

Thanks for the detailed breakdown. I agree with the proposed plan and the approach for handling the breaking change in 4.1.0-beta.51.

So far, I've completed Phase 1 (Refactor) and tested the new implementation using workflow@4.1.0-beta.51.

Testing results:

  • Local environment:
    All changes are working as expected. Workflows start correctly with the awaited start() call, and executions complete without issues.

  • PR environment:
    I observed two cases where the workflow run got stuck (execution remained in a running state and did not progress). I wasn't able to reproduce this consistently, and it did not occur locally.

At this point, I’m aligned with moving forward with the plan, but I want to be cautious, as this change could introduce a significant breaking behavior.

Given that I observed a couple of stuck runs in the PR environment (while local testing was stable), we should closely validate the execution lifecycle and background processing behavior under repeated and concurrent runs.

Once we’re confident the PR environment behaves consistently, I’m comfortable continuing with the remaining phases.

Evidence

Workflows remain stuck in the Run state
Screenshot from 2026-02-05 14-47-03

Edit:

The issue persists after restarting the PR environment. I will run local tests to try to reproduce the error.

Evidence

Persistent issue

Edit 2:

I noticed that the updates had not yet been introduced in this PR, so this behavior is expected.

@github-actions

github-actions bot commented Feb 5, 2026

🧹 PR Environment Cleaned Up


@eskp eskp requested review from a team, OleksandrUA, eskp and taitsengstock and removed request for a team February 5, 2026 23:11
…cing

pnpm strict mode keeps transitive deps in .pnpm/ where Next.js
standalone output tracing globs cannot reach them. The .pnpm/ glob
approach causes symlink-vs-directory conflicts during copy.

Add .npmrc with public-hoist-pattern to promote the full dependency
tree of @workflow/world-postgres to top-level node_modules, and
list every transitive package in serverExternalPackages and
outputFileTracingIncludes so the standalone build includes them.
@github-actions

github-actions bot commented Feb 9, 2026

🚀 PR Environment Deployed


@suisuss

suisuss commented Feb 9, 2026

Fix: MODULE_NOT_FOUND for pg-boss (and all world-postgres transitive deps)

Problem

PR deploys were failing with MODULE_NOT_FOUND: pg-boss at runtime. The app worked fine in local dev (pnpm dev) but broke in Docker/standalone builds.

Root Cause

Three layers of issues compounding:

  1. Dynamic require in the Workflow SDK — The SDK loads the world implementation via require(process.env.WORKFLOW_TARGET_WORLD), which resolves to @workflow/world-postgres at runtime. Next.js standalone output tracing (@vercel/nft) cannot follow dynamic requires where the target is an env var, so the entire package and its dependency tree are invisible to the tracer.

  2. serverExternalPackages prevents internal tracing — The original config correctly listed @workflow/world-postgres in serverExternalPackages and outputFileTracingIncludes, but marking a package as server-external tells nft "don't look inside this package at all." So even though @workflow/world-postgres/dist/index.js has a static import PgBoss from 'pg-boss' on line 1, nft never sees it.

  3. pnpm strict mode — Transitive deps live in .pnpm/ and are NOT hoisted to top-level node_modules/. The outputFileTracingIncludes glob ./node_modules/pg-boss/**/* matched nothing because pg-boss wasn't at that path. Attempting .pnpm/ path globs instead caused symlink-vs-directory conflicts (ENOTDIR/ENOENT) because pnpm's internal symlink farms don't survive being copied into standalone output.

Why It Worked Locally

pnpm dev runs against the full node_modules/ directory — all packages are present and Node resolves them normally. The bug only manifests in environments running from the standalone output (Docker, PR deploys, production).

Fix (commits 5aa3c2f, 9608918)

.npmrc — Added public-hoist-pattern entries to force pnpm to hoist all transitive deps of @workflow/world-postgres to top-level node_modules/:

public-hoist-pattern[]=pg-boss
public-hoist-pattern[]=@workflow/*
public-hoist-pattern[]=@vercel/queue
public-hoist-pattern[]=cbor-x
public-hoist-pattern[]=cbor-extract
public-hoist-pattern[]=ulid
public-hoist-pattern[]=async-sema
public-hoist-pattern[]=cron-parser

next.config.ts — Listed the full transitive dependency tree in both serverExternalPackages and outputFileTracingIncludes with simple ./node_modules/<pkg>/**/* globs (which now resolve correctly thanks to hoisting).
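Because the same package list has to appear in .npmrc, serverExternalPackages, and outputFileTracingIncludes, deriving all three from one array keeps them from drifting apart. A minimal sketch: `buildTracingConfig` is a hypothetical helper, and using "/*" as the outputFileTracingIncludes route key is an assumption about how broadly the includes should apply.

```typescript
type TracingConfig = {
  serverExternalPackages: string[];
  outputFileTracingIncludes: Record<string, string[]>;
  npmrcLines: string[];
};

// Hypothetical helper: derives all three lists from one package array.
// The "/*" default route key is an assumption; adjust to the routes that need the deps.
function buildTracingConfig(packages: readonly string[], routeGlob = "/*"): TracingConfig {
  const unique = Array.from(new Set(packages)).sort();
  return {
    // Keep these out of the server bundle; they are loaded from node_modules at runtime.
    serverExternalPackages: unique,
    // Plain ./node_modules/<pkg> globs only match after pnpm has hoisted the packages.
    outputFileTracingIncludes: {
      [routeGlob]: unique.map((pkg) => `./node_modules/${pkg}/**/*`),
    },
    npmrcLines: unique.map((pkg) => `public-hoist-pattern[]=${pkg}`),
  };
}
```

The result would be spread into the object exported from next.config.ts. public-hoist-pattern also accepts globs such as @workflow/*, as the .npmrc above does, so the generated lines could be collapsed by scope.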

Full Dependency Tree of @workflow/world-postgres

Traced by reading the compiled .js files in the package:

File or package | Import | Already hoisted?
dist/index.js | pg-boss | no
dist/index.js | postgres | yes (direct dep)
dist/queue.js | @vercel/queue | no
dist/queue.js | @workflow/world-local | no
dist/queue.js | ulid | no
dist/storage.js | @workflow/errors | no
dist/storage.js | @workflow/world | no
dist/storage.js | ulid | no
dist/drizzle/*.js | cbor-x | no
dist/drizzle/*.js | drizzle-orm | yes (direct dep)
@workflow/world-local | async-sema | no
@workflow/world-local | @workflow/utils | no
pg-boss | cron-parser | yes (but included for safety)

Verified

Docker build with --target builder completes with zero Failed to copy traced files errors.

Note on Vercel SDK Design

The dynamic require(env_var) pattern works seamlessly on Vercel's platform where they control the runtime. @workflow/world-postgres is the self-hosted escape hatch — the standalone tracing gap is a known pain point for Docker/K8s deployments. All @workflow/* packages come from https://github.com/vercel/workflow.

@vercel/queue imports @vercel/oidc and mixpart — another layer of
transitive deps invisible to the standalone tracer.
@suisuss

suisuss commented Feb 9, 2026

Whack-a-mole continues — @vercel/queue imports @vercel/oidc and mixpart

@github-actions

github-actions bot commented Feb 9, 2026

🚀 PR Environment Deployed


postgres, drizzle-orm, zod, dotenv are direct app deps that nft
traces from the app code path — but serverExternalPackages prevents
nft from seeing them as deps of world-postgres too. Module resolution
from inside the externalized package can't find them in standalone.
@github-actions

🚀 PR Environment Deployed


Walk the full dependency tree (34 hard deps) instead of discovering
them one runtime error at a time. Generated by reading package.json
dependencies recursively from @workflow/world-postgres through the
pnpm store. Includes pg and its sub-packages (pg-pool, pg-protocol,
etc.), undici, luxon, and all other leaf deps.
@github-actions

🚀 PR Environment Deployed


Walks the full dependency tree of @workflow/world-postgres and outputs
the package lists needed for .npmrc and next.config.ts standalone tracing.
@github-actions

🧹 PR Environment Cleaned Up


Production was missing WORKFLOW_TARGET_WORLD, WORKFLOW_POSTGRES_URL,
and workflow-postgres-setup in the init container. Without these,
prod defaults to Local World (filesystem-based) instead of Postgres
World. Aligns prod with staging and PR environment configuration.
@suisuss

suisuss commented Feb 10, 2026

KEEP-1371: Next.js 16.0.10 -> 16.1.5 Upgrade

Summary

PR #229 upgrades Next.js from 16.0.10 to 16.1.5 (security patches) and bumps the Vercel Workflow SDK from 4.0.1-beta.17 to 4.1.0-beta.51 (breaking change). This document covers the deployment architecture relevant to validating the PR, what the PR environment can and cannot test, and the changes made.

Security CVEs Addressed

Breaking Change: Workflow SDK start() Now Returns Promise<Run>

In workflow SDK 4.1.0-beta.51, start() changed from fire-and-forget to returning Promise<Run>. Calls must now be awaited.

Files changed:

File | Change
app/api/workflow/[workflowId]/execute/route.ts | start(...) -> const run = await start(...)
app/api/workflows/[workflowId]/webhook/route.ts | Same

The outer executeWorkflowBackground() function is intentionally NOT awaited in the POST handler (fire-and-forget pattern preserved). Only the inner start() call needed the fix.

SDK Local World: .workflow-data/ Filesystem Requirement

The Problem

The start() function relies on the SDK's "World" system for storage and queuing. On EKS (not Vercel), the SDK defaults to the Local World (@workflow/world-local), which persists run state as JSON files in .workflow-data/runs/.

On SDK 4.0.1-beta.17, start() was fire-and-forget. The filesystem write to .workflow-data/ would fail silently as an unhandled promise rejection — the workflow still executed because it runs in-process. On 4.1.0-beta.51, start() is awaited, so the ENOENT error propagates and kills the execution before the workflow runs:

Error: ENOENT: no such file or directory, open '.workflow-data/runs/wrun_....json.tmp...'

In the K8s container, this directory didn't exist and the non-root nextjs user (uid 1001) couldn't create it under /app (owned by root).

The Fix

Added mkdir -p .workflow-data/runs && chown -R nextjs:nodejs .workflow-data in the Dockerfile runner stage, before the USER nextjs switch. Also added .workflow-data/ to .gitignore for local dev.
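The Dockerfile change is the fix itself. As an optional complement, a startup check can replace the late ENOENT from inside start() with an immediate, explicit error. A minimal sketch using Node's fs module; `ensureWritableDir` is a hypothetical helper that would run once at boot, not part of the SDK or this PR.

```typescript
import { accessSync, constants, mkdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical boot-time check: creates <dataDir>/runs if missing and verifies the
// current user can write there, so a permissions problem surfaces at startup
// rather than as ENOENT inside start().
function ensureWritableDir(dataDir: string): string {
  const runsDir = join(dataDir, "runs");
  try {
    mkdirSync(runsDir, { recursive: true }); // no-op if the directory already exists
    accessSync(runsDir, constants.W_OK); // throws if this user cannot write to it
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Workflow data directory ${runsDir} is not writable: ${reason}`);
  }
  return runsDir;
}
```

For example, `ensureWritableDir(process.env.WORKFLOW_LOCAL_DATA_DIR ?? ".workflow-data")`, matching the default data directory listed in the World Configuration Reference.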

Why executeWorkflow() Can't Be Called Directly

Investigated bypassing start() entirely — calling executeWorkflow() directly, the same pattern used by scripts/workflow-runner.ts. This does not work in the Next.js context. The "use workflow" directive is a compile-time transformation: Next.js replaces the function body with a guard that throws if called directly:

Error: You attempted to execute workflow executeWorkflow function directly.
To start a workflow, use start(executeWorkflow) from workflow/api

The workflow-runner script avoids this because it runs via tsx outside the Next.js compiler — the "use workflow" directive is not processed.

Local World Limitations (Resolved: Now Using Postgres World)

The Local World is documented as "designed for development, not production":

  • In-memory queue — steps don't persist across server restarts
  • Filesystem storage — JSON files in .workflow-data/
  • Single instance — cannot handle distributed deployments

This has been resolved by switching to the Postgres World (WORKFLOW_TARGET_WORLD=@workflow/world-postgres). The Helm values for both staging and PR environments set this env var, and instrumentation.ts initializes the postgres world at startup. The .workflow-data/ directory fix above is still needed as a fallback (the SDK creates it regardless of world backend), but run state is now persisted in PostgreSQL via pg-boss. See the "Standalone Output Tracing" section below for the dependency management this required.

World Configuration Reference

Env var | Purpose | Default
WORKFLOW_TARGET_WORLD | Select world backend | Auto-detect (Local in non-Vercel)
WORKFLOW_LOCAL_DATA_DIR | Local World data directory | .workflow-data/
WORKFLOW_LOCAL_BASE_URL | Local World base URL | http://localhost:{port}
WORKFLOW_LOCAL_QUEUE_CONCURRENCY | Max concurrent queue workers | 100
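The defaults in this table can be mirrored in a small resolver, which also makes a missing WORKFLOW_TARGET_WORLD in production (the gap fixed for prod earlier in this PR) fail loudly instead of silently falling back to the Local World. A minimal sketch covering only non-Vercel deployments: `resolveWorldConfig` and `assertNotLocalInProduction` are hypothetical, this is not how the SDK itself auto-detects the world, and requiring WORKFLOW_POSTGRES_URL with the Postgres World follows the production env fix in this PR.

```typescript
type Env = Record<string, string | undefined>;

type WorldConfig =
  | { kind: "local"; dataDir: string; baseUrl: string; queueConcurrency: number }
  | { kind: "postgres"; postgresUrl: string }
  | { kind: "other"; target: string };

// Hypothetical resolver mirroring the documented defaults (non-Vercel only).
function resolveWorldConfig(env: Env, port: number): WorldConfig {
  const target = env.WORKFLOW_TARGET_WORLD;
  if (!target) {
    // Unset outside Vercel: fall back to the filesystem-based Local World.
    const concurrency = Number(env.WORKFLOW_LOCAL_QUEUE_CONCURRENCY ?? "100");
    return {
      kind: "local",
      dataDir: env.WORKFLOW_LOCAL_DATA_DIR ?? ".workflow-data/",
      baseUrl: env.WORKFLOW_LOCAL_BASE_URL ?? `http://localhost:${port}`,
      // Ignore non-numeric or non-positive overrides and keep the documented default.
      queueConcurrency: Number.isInteger(concurrency) && concurrency > 0 ? concurrency : 100,
    };
  }
  if (target === "@workflow/world-postgres") {
    const postgresUrl = env.WORKFLOW_POSTGRES_URL;
    if (!postgresUrl) {
      throw new Error("WORKFLOW_POSTGRES_URL must be set when using @workflow/world-postgres");
    }
    return { kind: "postgres", postgresUrl };
  }
  return { kind: "other", target };
}

// Fail fast instead of silently running production on the dev-only Local World.
function assertNotLocalInProduction(env: Env, config: WorldConfig): void {
  if (env.NODE_ENV === "production" && config.kind === "local") {
    throw new Error("WORKFLOW_TARGET_WORLD is unset in production; Local World is dev-only");
  }
}
```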

PR Environment Architecture

PR environments deploy to AWS EKS in an isolated namespace (pr-${PR_NUMBER}) via the deploy-pr-environment label.

What Gets Deployed

Component | Image target | Deployed in PR? | Notes
Next.js app | runner | Yes | Single replica via Helm
DB Migrator | migrator | Yes | Init container, runs db:push + db:seed
PostgreSQL | - | Yes | Isolated CNPG cluster per PR
LocalStack (SQS) | - | Yes | Emulated SQS queue
Schedule Dispatcher | scheduler | No | Image built but no CronJob deployed
Job Spawner | scheduler | No | Image built but no Deployment deployed
Workflow Runner | workflow-runner | No | Image built but nothing spawns K8s Jobs

Why Scheduler/Events Are Missing

The PR Helm values template (deploy/pr-environment/values.template.yaml) only defines:

  • A single Deployment (the app)
  • An init container (the migrator)

It does not include:

  • A CronJob for the schedule dispatcher
  • A Deployment for the job spawner
  • Any connection to the events service or MCP service

The external service API keys (SCHEDULER_SERVICE_API_KEY, EVENTS_SERVICE_API_KEY, MCP_SERVICE_API_KEY) are configured and pulled from staging Parameter Store, but the services that use those keys to call the PR app are not pointing at the PR environment -- they point at the staging app.

Infrastructure Per PR

Namespace: pr-${PR_NUMBER}
├── Deployment: keeperhub-pr-${PR_NUMBER}  (Next.js app)
│   └── Init Container: db-migration       (migrator)
├── Service: keeperhub-pr-${PR_NUMBER}     (ClusterIP :3000)
├── Ingress: app-pr-${PR_NUMBER}.keeperhub.com
├── PostgreSQL: keeperhub-pr-${PR_NUMBER}-db-rw
├── LocalStack: localstack.pr-${PR_NUMBER} (:4566)
│   └── SQS Queue: keeperhub-workflow-queue
└── ServiceAccount: keeperhub-pr-${PR_NUMBER}
    └── RBAC: batch/jobs (create,get,list,watch,delete), pods (get,list,watch), pods/log (get)

Execution Paths and PR Testability

Path 1: Manual Test Run (Testable)

User clicks "Run" in UI
  -> POST /api/workflow/[workflowId]/execute (session auth)
    -> Creates workflowExecutions record
    -> await start(executeWorkflow, [...])   <-- THE BREAKING CHANGE
    -> Returns executionId

This is the primary path affected by the SDK upgrade. It runs entirely within the Next.js app process. Fully testable in PR environment.

Path 2: Webhook Trigger (Testable)

External HTTP request
  -> POST /api/workflows/[workflowId]/webhook
    -> Validates webhook config
    -> Creates workflowExecutions record
    -> await start(executeWorkflow, [...])   <-- THE BREAKING CHANGE
    -> Returns executionId

Also runs within the Next.js process. Fully testable in PR environment.

Path 3: Schedule Trigger

CronJob (every minute)
  -> schedule-dispatcher queries workflow_schedules (innerJoin workflows, both enabled)
    -> Evaluates cron expressions with cron-parser (timezone-aware)
    -> Sends SQS message for due workflows
      -> Job Spawner polls SQS (long-poll 20s, up to 10 msgs)
        -> Validates workflow + schedule still enabled
        -> Creates workflowExecutions record (status: "pending")
        -> Creates K8s Job (workflow-runner image)
          -> scripts/workflow-runner.ts entry point (tsx)
            -> Fetches workflow from DB (nodes, edges as JSONB)
            -> Validates integration ownership
            -> Calls executeWorkflow() DIRECTLY (no start())
            -> Updates workflowExecutions + workflowSchedules in DB

Entry point: scripts/workflow-runner.ts (CMD ["tsx", "scripts/workflow-runner.ts"] in Dockerfile workflow-runner target).

SDK surface: The runner does NOT call start() from workflow/api. It calls executeWorkflow() directly from lib/workflow-executor.workflow.ts, which uses "use workflow" and "use step" directives (Workflow DevKit runtime, not the start() API). The start() breaking change in 4.1.0-beta.51 does not affect this path.

Residual risk: The "use workflow" / "use step" directive runtime behavior could theoretically change in the SDK bump (4.0.1 -> 4.1.0), but all 491 unit tests and 91 integration tests pass against executeWorkflow() with the new SDK, which exercises this code path.

Path 4: Internal Service Trigger

MCP / Events / Scheduler service
  -> POST /api/workflow/[workflowId]/execute
    -> Header: Authorization: Bearer ${SERVICE_API_KEY}
    -> authenticateInternalService() validates key
    -> Same execution as Path 1

The services that would call this endpoint point at staging, not the PR app.
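For reference, a bearer-key check of the kind `authenticateInternalService()` performs can be sketched as below. This is a hypothetical illustration, not the project's implementation: the header format comes from the diagram above, while `isValidServiceBearer`, the list of accepted keys, and the constant-time comparison are assumptions.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// timingSafeEqual throws on buffers of different length, so compare fixed-length
// SHA-256 digests instead of the raw strings.
function constantTimeEquals(a: string, b: string): boolean {
  const da = createHash("sha256").update(a).digest();
  const db = createHash("sha256").update(b).digest();
  return timingSafeEqual(da, db);
}

// Hypothetical check: accepts an Authorization header value of the form
// "Bearer <key>" when <key> matches one of the configured service keys.
function isValidServiceBearer(authHeader: string | null, serviceKeys: readonly string[]): boolean {
  if (authHeader === null || !authHeader.startsWith("Bearer ")) return false;
  const token = authHeader.slice("Bearer ".length).trim();
  if (token.length === 0) return false;
  let matched = false;
  for (const key of serviceKeys) {
    // Compare against every key without returning early.
    if (key.length > 0 && constantTimeEquals(token, key)) matched = true;
  }
  return matched;
}
```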

Validation Strategy

Given the above, the PR should be validated by:

  1. Unit tests (491 passed) - Core logic
  2. Integration tests (91 passed) - API routes, DB operations
  3. E2E tests (11 passed, 1 skipped) - Full UI flows including auth, invitations, org management
  4. Manual test run in PR environment - Click "Run" on a workflow, verify execution completes
  5. Webhook test in PR environment - Trigger a webhook-enabled workflow via curl
  6. Staging deployment - Full validation of all paths including schedules and events

The await start() change is the only code change beyond dependency bumps. It affects Paths 1 and 2 (both testable). Path 3 (schedules) calls executeWorkflow() directly via scripts/workflow-runner.ts -- it never calls start() from workflow/api, so the breaking change does not apply. The same executeWorkflow() function is used by all paths, so test coverage of that function applies regardless of trigger mechanism.

Standalone Output Tracing: MODULE_NOT_FOUND for Workflow World Dependencies

The Problem

PR deploys failed at runtime with MODULE_NOT_FOUND for pg-boss, then @vercel/oidc, then postgres, then undici — each discovered only after deploying. The app worked in local dev (pnpm dev) but broke in Docker/standalone builds.

Root Cause (Three Layers)

  1. Dynamic require in the Workflow SDK — The SDK loads the world implementation via require(process.env.WORKFLOW_TARGET_WORLD), resolving to @workflow/world-postgres at runtime. Next.js standalone output tracing (@vercel/nft) cannot follow dynamic requires where the target is an environment variable, so the package and its entire dependency tree are invisible to the tracer.

  2. serverExternalPackages prevents internal tracing — The config listed @workflow/world-postgres in serverExternalPackages, which tells nft "don't look inside this package." So even though @workflow/world-postgres/dist/index.js has import PgBoss from 'pg-boss' on line 1, nft never sees it. Every transitive dependency must be explicitly listed.

  3. pnpm strict mode — Transitive deps live in .pnpm/ and are NOT hoisted to top-level node_modules/. The outputFileTracingIncludes globs like ./node_modules/pg-boss/**/* match nothing because the package isn't at that path. Attempting .pnpm/ path globs causes symlink-vs-directory conflicts (ENOTDIR/ENOENT) because pnpm's internal symlink farms don't survive being copied into standalone.
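The following TypeScript sketch illustrates layer 1: a loader that takes its module specifier from an environment variable. `loadWorld` is a hypothetical name, and this is not the SDK's actual source. The sketch only assumes the SDK does something equivalent to `require(process.env.WORKFLOW_TARGET_WORLD)`. Because the argument is a runtime value rather than a string literal, a static tracer has nothing to follow. A literal `require("@workflow/world-postgres")` would be traced.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// createRequire only uses the directory of this path; the file need not exist.
const requireFromCwd = createRequire(path.join(process.cwd(), "noop.js"));

// Hypothetical stand-in for the SDK's world loader (not its actual source).
// The specifier is only known at runtime, so a static tracer such as
// @vercel/nft cannot tell which package, or which of its dependencies,
// has to be copied into .next/standalone/.
export function loadWorld(env: NodeJS.ProcessEnv = process.env): unknown {
  const specifier = env.WORKFLOW_TARGET_WORLD;
  if (!specifier) {
    throw new Error("WORKFLOW_TARGET_WORLD is not set");
  }
  return requireFromCwd(specifier);
}
```

Using `createRequire` keeps the sketch working under both CommonJS and ESM. The tracing problem is the same either way.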

Why It Works in Development Mode but Not in Production Mode

  • pnpm dev: runs against the full node_modules/ — everything is there, Node resolves normally
  • next build (output: "standalone"): produces a minimal .next/standalone/ containing only traced files. Anything nft missed is gone.
  • The bug only manifests in environments running from the standalone output (Docker, PR deploys, production)

The Fix

Two-part solution:

.npmrc with public-hoist-pattern — Forces pnpm to hoist specific transitive deps to top-level node_modules/, making them reachable by the outputFileTracingIncludes globs.

next.config.ts — Lists the full transitive dependency tree (34 packages) in both serverExternalPackages and outputFileTracingIncludes. Generated by walking package.json dependencies recursively from @workflow/world-postgres through the pnpm store (via node scripts/list-world-deps.mjs).
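The same package list has to appear in `.npmrc` and in both `next.config.ts` options. One way to keep the three in sync is to derive all of them from a single array. This is a sketch, not the project's actual config, and it makes these assumptions:

- `WORLD_PACKAGES`, `buildTracingConfig`, and `buildNpmrcHoistLines` are hypothetical names.
- The array shows only a few of the 34 names.
- The `/*` route key for `outputFileTracingIncludes` is an assumption. Check the Next.js docs for the route glob the app needs.

The `.npmrc` side uses pnpm's `public-hoist-pattern[]=<name>` syntax.

```typescript
// Hypothetical single source of truth; the real list has 34 entries.
export const WORLD_PACKAGES = [
  "@workflow/world-postgres",
  "pg-boss",
  "@vercel/oidc",
  "postgres",
  "undici",
] as const;

// Same shape as the two next.config.ts options described above.
export interface TracingConfig {
  serverExternalPackages: string[];
  outputFileTracingIncludes: Record<string, string[]>;
}

export function buildTracingConfig(pkgs: readonly string[]): TracingConfig {
  return {
    serverExternalPackages: [...pkgs],
    outputFileTracingIncludes: {
      // Assumption: "/*" applies the includes to every route.
      "/*": pkgs.map((name) => `./node_modules/${name}/**/*`),
    },
  };
}

// .npmrc lines that make pnpm hoist the same packages to node_modules/,
// which is what lets the ./node_modules/<name> globs above match anything.
export function buildNpmrcHoistLines(pkgs: readonly string[]): string {
  return pkgs.map((name) => `public-hoist-pattern[]=${name}`).join("\n") + "\n";
}
```

In `next.config.ts` this would be spread into the exported config, e.g. `{ output: "standalone", ...buildTracingConfig(WORLD_PACKAGES) }`. The `.npmrc` lines could be regenerated, or checked in CI, whenever the list changes.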

Shared Deps Trap

Even packages the app already uses directly (postgres, drizzle-orm, zod, dotenv) must be listed in outputFileTracingIncludes. nft traces them from the app code, so they exist in standalone — but module resolution from inside @workflow/world-postgres is path-based and may not find them at the expected location in standalone output.

What NOT To Do

  • Don't glob into .pnpm/ — symlink farms cause ENOTDIR/ENOENT conflicts during copy
  • Don't only include the top-level package — its transitive deps won't be traced since the whole package is externalized
  • Don't discover deps one deploy at a time — walk the full tree upfront with a script
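The upfront walk can be sketched as a breadth-first traversal of `dependencies`, starting from the real (symlink-resolved) directory of `@workflow/world-postgres`. This is a hypothetical reimplementation, not the contents of `scripts/list-world-deps.mjs`. It behaves as follows:

- It looks each dependency up the way Node's resolver does, checking `<dir>/node_modules/<name>` in each ancestor. This also works inside pnpm's `.pnpm/` layout.
- It skips peer, dev, and optional dependencies.
- It keeps one entry per name, so multiple installed versions of the same package collapse into one.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface PackageJson {
  name?: string;
  dependencies?: Record<string, string>;
}

// Look for `dep` from `fromDir` the way Node does: try <dir>/node_modules/<dep>
// for each ancestor, skipping directories that are themselves node_modules.
function findPackageDir(dep: string, fromDir: string): string | null {
  let dir = fromDir;
  for (;;) {
    if (path.basename(dir) !== "node_modules") {
      const candidate = path.join(dir, "node_modules", dep);
      if (fs.existsSync(path.join(candidate, "package.json"))) {
        // pnpm links packages into .pnpm/; continue from the real directory.
        return fs.realpathSync(candidate);
      }
    }
    const parent = path.dirname(dir);
    if (parent === dir) return null;
    dir = parent;
  }
}

// Breadth-first walk over `dependencies` only (no peer/dev/optional).
export function listTransitiveDeps(rootDir: string): { found: string[]; missing: string[] } {
  const seen = new Map<string, string>(); // package name -> real directory
  const missing = new Set<string>();
  const queue: string[] = [fs.realpathSync(rootDir)];
  while (queue.length > 0) {
    const dir = queue.shift()!;
    const pkg = JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8")) as PackageJson;
    for (const dep of Object.keys(pkg.dependencies ?? {})) {
      if (seen.has(dep) || missing.has(dep)) continue;
      const depDir = findPackageDir(dep, dir);
      if (depDir === null) {
        missing.add(dep);
        continue;
      }
      seen.set(dep, depDir);
      queue.push(depDir);
    }
  }
  return { found: [...seen.keys()].sort(), missing: [...missing].sort() };
}
```

Run it from the app root as something like `listTransitiveDeps("node_modules/@workflow/world-postgres")`. Anything in `missing` is a dependency the walk could not locate. Investigate those before they turn into a runtime MODULE_NOT_FOUND.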

@vercel/queue Internals

@vercel/queue (0.0.0-alpha.36) is a private Vercel package with no public repo. Source is readable at node_modules/.pnpm/@vercel+queue@*/node_modules/@vercel/queue/dist/index.mjs (~1500 lines). It's a REST queue client for vercel-queue.com with three external imports: @vercel/oidc (OIDC auth tokens), mixpart (multipart stream parsing), and Node builtins. In the postgres world path, it's used mainly for its JsonTransport serializer — the actual queuing goes through pg-boss locally.

Complete Transitive Dependency Tree

All 34 hard dependencies of @workflow/world-postgres (excludes peer deps):

@vercel/oidc          @vercel/queue         @workflow/errors
@workflow/utils       @workflow/world       @workflow/world-local
async-sema            cbor-x                cron-parser
dotenv                drizzle-orm           luxon
mixpart               ms                    pg
pg-boss               pg-connection-string  pg-int8
pg-pool               pg-protocol           pg-types
pgpass                postgres              postgres-array
postgres-bytea        postgres-date         postgres-interval
serialize-error       split2                type-fest
ulid                  undici                xtend
zod

Sustainability Assessment

This approach requires updating three places across two files (.npmrc, plus serverExternalPackages and outputFileTracingIncludes in next.config.ts) every time the workflow SDK updates. The build succeeds even when deps are missing, so failures only show up at runtime.
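Because the build passes even when a dependency is missing, a cheap guard is to run a check after `next build`. It asks Node's own resolver whether each listed package can be found from inside the standalone copy of the world package. This is a sketch, assuming a CI step with access to `.next/standalone/` and that the world package sits under its `node_modules/`. `findUnresolvable` is a hypothetical helper. It only treats MODULE_NOT_FOUND as missing. A package that exists but exposes no CommonJS entry throws a different error code and counts as present.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// Hypothetical post-build check: which of `names` can Node NOT resolve when
// starting from `fromDir`? This is the same lookup that fails at runtime with
// MODULE_NOT_FOUND in the standalone image.
export function findUnresolvable(fromDir: string, names: readonly string[]): string[] {
  // createRequire needs an absolute path; only its directory is used, so the
  // file itself does not have to exist.
  const req = createRequire(path.resolve(fromDir, "noop.js"));
  return names.filter((name) => {
    try {
      req.resolve(name);
      return false;
    } catch (err) {
      // Other codes (e.g. ERR_PACKAGE_PATH_NOT_EXPORTED for an ESM-only
      // package) mean the package was found, so only MODULE_NOT_FOUND counts.
      return (err as NodeJS.ErrnoException).code === "MODULE_NOT_FOUND";
    }
  });
}
```

For example, the pipeline could fail whenever `findUnresolvable(".next/standalone/node_modules/@workflow/world-postgres", WORLD_DEPS)` returns a non-empty array. Here `WORLD_DEPS` stands for whatever list the config uses. This turns a runtime-only failure into a build-time one.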

More sustainable alternatives:

  • node-linker=hoisted in .npmrc — makes pnpm behave like npm, eliminates hoisting issues entirely. Tradeoff: loses pnpm strict isolation.
  • Dockerfile post-build copy — copy needed packages into standalone after build. Decouples from tracer. Tradeoff: fragile across Next.js versions.
  • Upstream fix: serverExternalPackages should recursively include transitive deps. That's a Next.js issue.
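A small script could drive the post-build copy alternative instead of hand-written Dockerfile COPY lines. This sketch assumes the package directories were found by walking the dependency tree, as scripts/list-world-deps.mjs does. `copyIntoStandalone` and `ResolvedPackage` are hypothetical. The script dereferences pnpm symlinks so real files land in `.next/standalone/node_modules/`. It also flattens everything to one version per package name, which is the main way this approach can go wrong.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// A package name plus the directory it was found in (e.g. inside .pnpm/).
export interface ResolvedPackage {
  name: string;
  dir: string;
}

// Hypothetical post-build step: copy each package into the standalone
// output's top-level node_modules/, one version per name.
export function copyIntoStandalone(standaloneDir: string, pkgs: readonly ResolvedPackage[]): void {
  for (const { name, dir } of pkgs) {
    const dest = path.join(standaloneDir, "node_modules", name);
    // Replace whatever tracing produced (a partial copy or a dangling link).
    fs.rmSync(dest, { recursive: true, force: true });
    fs.mkdirSync(path.dirname(dest), { recursive: true });
    // dereference: copy real files instead of pnpm's symlinks.
    fs.cpSync(fs.realpathSync(dir), dest, { recursive: true, dereference: true });
  }
}
```

This would run after `next build` and before the image copies `.next/standalone/`.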

Why Vercel Designed It This Way

The dynamic require(env_var) pattern works seamlessly on Vercel's platform where they control the runtime. @workflow/world-postgres is the self-hosted escape hatch. The standalone tracing gap is a known pain point for Docker/K8s deployments but not Vercel's priority use case.

All @workflow/* packages: https://github.com/vercel/workflow

Package Sources

| Package | Repository |
|---|---|
| @workflow/* | https://github.com/vercel/workflow (monorepo) |
| @vercel/queue | Private (no public repo) |
| @vercel/oidc | Private (no public repo) |
| pg-boss | https://github.com/timgit/pg-boss |
| cbor-x | https://github.com/kriszyp/cbor-x |
| cbor-extract | https://github.com/kriszyp/cbor-extract |
| ulid | https://github.com/ulid/javascript |
| async-sema | https://github.com/vercel/async-sema |
| cron-parser | https://github.com/harrisiirak/cron-parser |
| mixpart | Unknown (private) |

Scheduler Service: No Changes Required for Postgres Worlds

Architecture

keeperhub-scheduler (git@github.com:suisuss/keeperhub-scheduler.git) is a separate microservice that evaluates cron schedules and spawns K8s Jobs to execute workflows. It does NOT use the Vercel Workflow SDK or @workflow/world-postgres. Its dependency footprint is minimal and entirely independent:

| Dependency | Purpose |
|---|---|
| @aws-sdk/client-sqs | SQS queue polling |
| @kubernetes/client-node | K8s Job creation |
| cron-parser | Cron expression parsing |
| drizzle-orm + postgres | Direct DB access (shared KeeperHub DB) |
| nanoid | ID generation |

Two-Service Pattern

  1. Schedule Dispatcher — K8s CronJob, runs every minute, evaluates cron expressions against workflow_schedules table, sends SQS messages for due workflows
  2. Job Spawner — Long-running Deployment, polls SQS (long-poll 20s, batch 10), creates workflow_executions records, spawns K8s Jobs using the workflow-runner container image
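The dispatcher's per-minute check boils down to "does this cron expression match the current minute?" The service uses cron-parser for this. The sketch below is a hand-rolled stand-in, not that library's API, and it makes these assumptions:

- `ScheduleRow`, `cronMatches`, and `dueSchedules` are hypothetical, and the row shape is assumed.
- Times are evaluated in UTC.
- Only `*`, numbers, ranges, steps, and comma lists are supported. Month and weekday names are not.

It follows the classic cron rule that when both day fields are restricted, either one matching is enough.

```typescript
// Hypothetical row shape; the real workflow_schedules columns may differ.
export interface ScheduleRow {
  id: string;
  workflowId: string;
  cron: string; // 5 fields: minute hour day-of-month month day-of-week
  enabled: boolean;
}

// Does one cron field (e.g. "*/15", "1-5", "0,30", "5/10") match `value`?
function fieldMatches(field: string, value: number, min: number, max: number): boolean {
  return field.split(",").some((part) => {
    const pieces = part.split("/");
    const range = pieces[0];
    const stepText: string | undefined = pieces[1];
    const step = stepText === undefined ? 1 : Number(stepText);
    let lo: number;
    let hi: number;
    if (range === "*") {
      lo = min;
      hi = max;
    } else if (range.includes("-")) {
      [lo, hi] = range.split("-").map(Number);
    } else {
      lo = Number(range);
      // "5/10" means "from 5 to max, every 10"; a bare "5" means exactly 5.
      hi = stepText === undefined ? lo : max;
    }
    return value >= lo && value <= hi && (value - lo) % step === 0;
  });
}

export function cronMatches(expr: string, at: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields: ${expr}`);
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  const timeOk =
    fieldMatches(minute, at.getUTCMinutes(), 0, 59) &&
    fieldMatches(hour, at.getUTCHours(), 0, 23) &&
    fieldMatches(month, at.getUTCMonth() + 1, 1, 12);
  const domOk = fieldMatches(dayOfMonth, at.getUTCDate(), 1, 31);
  const dowOk = fieldMatches(dayOfWeek, at.getUTCDay(), 0, 6);
  // Classic cron: if both day fields are restricted, either may match.
  const bothRestricted = !dayOfMonth.startsWith("*") && !dayOfWeek.startsWith("*");
  const dayOk = bothRestricted ? domOk || dowOk : domOk && dowOk;
  return timeOk && dayOk;
}

// What a once-a-minute dispatcher would turn into SQS messages.
export function dueSchedules(rows: readonly ScheduleRow[], now: Date): ScheduleRow[] {
  return rows.filter((row) => row.enabled && cronMatches(row.cron, now));
}
```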

Why No Changes Are Needed

The scheduler's workflow-runner path is completely decoupled from the world system:

| Aspect | Next.js App (Paths 1 & 2) | K8s Runner (Path 3) |
|---|---|---|
| Entry point | Next.js server process | tsx scripts/workflow-runner.ts |
| How workflows execute | await start(executeWorkflow, [...]) | executeWorkflow() called directly |
| World system | Yes: instrumentation.ts loads postgres world | No: runner doesn't use start() or any world |
| State persistence | SDK writes via world backend | Runner writes directly to DB via Drizzle |
| WORKFLOW_TARGET_WORLD | Required (set in Helm values) | Not passed, not needed |
| Standalone tracing | Affected (needs all 34 deps hoisted) | Not affected (tsx runs against full node_modules/) |

The "use workflow" directive in lib/workflow-executor.workflow.ts is a compile-time transformation that Next.js applies. When run via tsx, the directive is ignored — executeWorkflow() is callable directly without start().
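The following is a minimal illustration of why direct calls work. It assumes nothing about the SDK beyond what is stated above. Without the build-time transform, `"use workflow"` is just an unused string in the function's directive prologue, so the function behaves like any other async function. `executeWorkflow` here is a hypothetical stand-in with a made-up input type, not the code in `lib/workflow-executor.workflow.ts`.

```typescript
// Hypothetical stand-in for the executor in lib/workflow-executor.workflow.ts.
// Without the SDK's build-time transform, the directive below is an ordinary
// string expression that JavaScript ignores, so this is a plain async function
// that can be awaited directly, with no start() and no world backend.
export async function executeWorkflow(input: { workflowId: string }): Promise<string> {
  "use workflow";
  return `executed ${input.workflowId}`;
}
```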

Env Vars Passed to Runner K8s Jobs

The job-spawner passes these env vars to spawned containers (src/job-spawner.ts:141-152):

WORKFLOW_ID, EXECUTION_ID, SCHEDULE_ID, WORKFLOW_INPUT,
DATABASE_URL, INTEGRATION_ENCRYPTION_KEY
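The following sketch shows how the spawner could assemble that list for the Job's container spec. `buildRunnerEnv` and `RunnerJobInput` are hypothetical, and JSON-encoding `WORKFLOW_INPUT` is an assumption. The real `src/job-spawner.ts` may source the secrets differently, for example via Kubernetes secret references rather than literal values. The `{ name, value }` shape matches a Kubernetes container `env` entry.

```typescript
// Shape of a Kubernetes container env entry with a literal value.
export interface EnvVar {
  name: string;
  value: string;
}

// Hypothetical inputs; the real spawner gets these from the SQS message,
// the new workflow_executions row, and its own environment.
export interface RunnerJobInput {
  workflowId: string;
  executionId: string;
  scheduleId: string;
  input: unknown; // JSON-encoded into WORKFLOW_INPUT (assumption)
}

export function buildRunnerEnv(job: RunnerJobInput, env: NodeJS.ProcessEnv = process.env): EnvVar[] {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`${name} must be set in the spawner's environment`);
    return value;
  };
  // Deliberately no WORKFLOW_TARGET_WORLD or WORKFLOW_POSTGRES_URL: the runner
  // calls executeWorkflow() directly and never touches the SDK's world system.
  return [
    { name: "WORKFLOW_ID", value: job.workflowId },
    { name: "EXECUTION_ID", value: job.executionId },
    { name: "SCHEDULE_ID", value: job.scheduleId },
    { name: "WORKFLOW_INPUT", value: JSON.stringify(job.input) },
    { name: "DATABASE_URL", value: required("DATABASE_URL") },
    { name: "INTEGRATION_ENCRYPTION_KEY", value: required("INTEGRATION_ENCRYPTION_KEY") },
  ];
}
```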

Notably absent: WORKFLOW_TARGET_WORLD, WORKFLOW_POSTGRES_URL. The runner doesn't need them because it doesn't use the SDK's world system at all.

PR Environment Gaps

The PR environment builds the scheduler image but does not deploy it (no CronJob or Deployment in values.template.yaml). Scheduled workflow execution (Path 3) can only be validated in staging or production.

Events Service: No Changes Required

Architecture

keeperhub-events (git@github.com:techops-services/keeperhub-events.git) is a separate microservice that monitors blockchain smart contract events via WebSocket and triggers workflow executions via HTTP. It does NOT use the Vercel Workflow SDK or @workflow/world-postgres. It consists of two services:

| Service | Purpose |
|---|---|
| sc-event-tracker | Spawns child processes per workflow, each opening a WebSocket to an EVM node. Listens for contract events matching the workflow's configured event name, contract address, and ABI. |
| sc-event-worker | Express server (:3010) that periodically fetches active event-triggered workflows from KeeperHub (GET /api/workflows/events?active=true), exposes them to the tracker via /data, and proxies workflow executions to KeeperHub (POST /api/workflow/:id/execute). |

Dependencies

| Package | Purpose |
|---|---|
| ethers | WebSocket provider, ABI parsing, event filtering |
| ioredis | Cross-container process synchronization, transaction deduplication (24h TTL) |
| axios | HTTP calls to KeeperHub API |
| express | Worker HTTP server |
| deep-diff | Detecting workflow configuration changes for process restarts |
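The 24h transaction deduplication is the classic Redis "set if not exists, with expiry" pattern. `SET key value EX 86400 NX` succeeds only for the first writer, and the key disappears on its own after a day. In ioredis that is typically written as `redis.set(key, "1", "EX", 86400, "NX")`, which resolves to `"OK"` for the first caller and `null` afterwards. The sketch below hides that call behind a hypothetical `DedupStore` interface, with an in-memory stand-in so it runs without Redis. The key format (`dedup:<workflowId>:<txHash>`) is an assumption, not the service's real key.

```typescript
// Atomic "set if absent, with expiry"; resolves true only for the first caller.
export interface DedupStore {
  setIfAbsent(key: string, ttlSeconds: number): Promise<boolean>;
}

// In-memory stand-in for tests. Against Redis this would be a single
// SET key 1 EX ttl NX call, which is atomic across containers.
export class MemoryDedupStore implements DedupStore {
  private readonly expiresAt = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async setIfAbsent(key: string, ttlSeconds: number): Promise<boolean> {
    const t = this.now();
    const existing = this.expiresAt.get(key);
    if (existing !== undefined && existing > t) return false;
    this.expiresAt.set(key, t + ttlSeconds * 1000);
    return true;
  }
}

export const DEDUP_TTL_SECONDS = 24 * 60 * 60;

// Hypothetical key format; lowercasing keeps 0xABC and 0xabc from counting twice.
export function shouldTrigger(store: DedupStore, workflowId: string, txHash: string): Promise<boolean> {
  return store.setIfAbsent(`dedup:${workflowId}:${txHash.toLowerCase()}`, DEDUP_TTL_SECONDS);
}
```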

Execution Path (Path 4 in this document)

Blockchain emits event
  → sc-event-tracker child process receives log via WebSocket
    → Validates event name match, deduplicates via Redis
    → POST sc-event-worker:3010/workflow/:id/execute  { ...eventPayload }
      → Worker wraps as { input: payload }
      → POST KEEPERHUB_API_URL/api/workflow/:id/execute
        → Headers: Authorization: Bearer ${JWT}, X-Internal-Token: ${API_KEY},
                   X-Service-Key: ${API_KEY}, X-Internal-Execution: true
        → KeeperHub: authenticateInternalService() → await start(executeWorkflow, [...])
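The worker's hop in this flow can be sketched as building the request shown above. The real worker uses axios. This hypothetical `buildExecuteRequest` produces an equivalent fetch-style request without sending it, and the `Content-Type` header is an assumption. The header names and the `{ input: payload }` wrapping follow the flow.

```typescript
export interface ExecuteRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical mirror of the worker -> KeeperHub call (not the worker's code).
export function buildExecuteRequest(
  baseUrl: string, // KEEPERHUB_API_URL
  workflowId: string,
  eventPayload: Record<string, unknown>,
  jwt: string,
  apiKey: string, // KEEPERHUB_API_KEY
): ExecuteRequest {
  // Trim trailing slashes so any path prefix in baseUrl is preserved.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/workflow/${encodeURIComponent(workflowId)}/execute`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${jwt}`,
        "X-Internal-Token": apiKey,
        "X-Service-Key": apiKey,
        "X-Internal-Execution": "true",
      },
      // The worker wraps the raw event payload as { input: payload }.
      body: JSON.stringify({ input: eventPayload }),
    },
  };
}
```

Sending it would then be `const { url, init } = buildExecuteRequest(...)` followed by `await fetch(url, init)`.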

Why No Changes Are Needed

The events service is a pure HTTP client to KeeperHub. Every aspect affected by this PR is internal to the Next.js process:

| Aspect | KeeperHub Next.js App | Events Service |
|---|---|---|
| Workflow SDK | Yes: start(), "use workflow", world system | No: never imported |
| await start() breaking change | Affected (Paths 1, 2, 4) | Not affected: calls HTTP endpoint, not SDK |
| World system (postgres/local) | Yes: instrumentation.ts loads world | No involvement |
| Standalone output tracing | Affected (34 deps) | Not affected: plain Node.js, no Next.js build |
| HTTP API contract | Unchanged: POST /api/workflow/:id/execute | Unchanged: same endpoint, same payload shape |
| Response behavior | Unchanged: executeWorkflowBackground() still fire-and-forget in POST handler | Unchanged: receives same 200 response |

Auth Pattern Note

The events worker authenticates with a JWT obtained via username/password (POST /auth/token), plus X-Internal-Token / X-Service-Key headers carrying KEEPERHUB_API_KEY. KeeperHub references this key as EVENTS_SERVICE_API_KEY on its side. This PR did not change the auth mechanism. If authenticateInternalService() is ever refactored to drop support for these headers in favor of a different pattern, the events worker's http-service.js will need a corresponding update.
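For reference when that refactor happens, this is roughly the compatibility surface the events worker depends on. It is a sketch only, with these caveats:

- `isInternalServiceRequest` is hypothetical and is not the body of `authenticateInternalService()`.
- It ignores the JWT entirely.
- It assumes lowercase header names, as Node's HTTP server provides them.

It accepts either header and compares it against `EVENTS_SERVICE_API_KEY` in constant time.

```typescript
import { timingSafeEqual } from "node:crypto";

// Constant-time comparison; lengths are checked first because
// timingSafeEqual throws on buffers of different length.
function safeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  return ab.length === bb.length && timingSafeEqual(ab, bb);
}

// Hypothetical approximation: accept the request if either header the events
// worker sends carries the shared key.
export function isInternalServiceRequest(
  headers: Record<string, string | undefined>,
  expectedKey: string | undefined,
): boolean {
  if (!expectedKey) return false; // fail closed if the key is not configured
  const presented = [headers["x-internal-token"], headers["x-service-key"]];
  return presented.some((value) => value !== undefined && safeEqual(value, expectedKey));
}
```

It would be called with the request's headers and `process.env.EVENTS_SERVICE_API_KEY`. Dropping either header name from a check like this is exactly the change that would break the events worker.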

PR Environment Gaps

The events service is not deployed in PR environments. Event-triggered workflow execution (Path 4) can only be validated in staging or production.

@suisuss marked this pull request as ready for review February 10, 2026 01:50
@eskp merged commit 0822e9a into staging Feb 10, 2026
3 checks passed
@eskp deleted the dependabot/npm_and_yarn/next-16.1.5 branch February 10, 2026 01:56
@github-actions

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • 🗑️ Namespace: pr-229
  • 🗑️ Keeperhub Application
  • 🗑️ PostgreSQL Database (including data)
  • 🗑️ LocalStack
  • 🗑️ All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@github-actions

ℹ️ No PR Environment to Clean Up

No PR environment was found for this PR. This is expected if:

  • The PR never had the deploy-pr-environment label
  • The environment was already cleaned up
  • The deployment never completed successfully


Labels

dependencies Pull requests that update a dependency file javascript Pull requests that update javascript code

3 participants