[P2] onboarding clarity — Dima's approach (supersedes #866) by Ryanchen911 · Pull Request #1261 · gonka-ai/gonka

Ryanchen911 · 2026-05-27T10:25:29Z

Summary

Supersedes #866. Implements the architecture reviewers asked for: keep the broker untouched, derive onboarding UX at the handler layer, and run PoC-wiring assertions in a separate api selfcheck mode against a mocked chain.

Two commits:

feat(admin): onboarding clarity at handler layer — pure-function helpers (TimingCalculator, StatusReporter, OnboardingStateManager) + ParticipantActivityTracker (30s cached "is active" — zero per-request chain RPCs). GET /admin/v1/nodes returns an enriched NodeWithOnboarding envelope. broker.NodeState is unchanged; one log line in node_worker_commands.go is demoted Error→Info because "no epoch models" is normal during onboarding.
feat(selfcheck): isolated PoC-wiring self-test + manual mlnode test endpoint — api selfcheck subcommand spins up the real broker with a MockChainBridge and reports per-stage PASS/FAIL (exit 0/1). New POST /admin/v1/nodes/:id/test runs a one-shot MLnode validation (synchronous, no broker mutation, per-node mutex).

Why this PR instead of #866

#866 added UX state to broker.NodeState and new broker commands. Reviewers (@patimen, @DimaOrekhovPS) flagged this as adding UX complexity to a critical/already-complex code path. This PR delivers the same user-facing features without touching broker internals.

	#866	This PR
broker.NodeState new fields	6	0
broker new commands	2	0
Per-request chain RPCs in `GET /nodes`	N+1	0 (30s cache)
Location of UX state computation	broker writes back	handler derives

See proposals/onboarding-clarity-v1/README.md for product context.

Test plan

go build ./...
go test ./broker/... ./internal/server/admin/... ./selfcheck/... ./participant/... — all green
api selfcheck smoke test — 3 stages PASS (node-registered, epoch-models-populated, hardware-diff-submitted), exit 0
Manual: hit POST /admin/v1/nodes/:id/test on a live testnet node
Manual: confirm GET /admin/v1/nodes response shape (added optional onboarding field) doesn't break existing admin UI consumers

Compute onboarding UX (participant state, MLnode state, timing, human-friendly messages) in the admin handler rather than on broker state. The broker is unchanged; admin GET /nodes returns an enriched NodeWithOnboarding envelope that wraps the broker's NodeResponse with an optional Onboarding block. New components: - participant.ActivityTracker: background goroutine caches whether the participant is in the current active set (refreshed every 30s); admin handlers read IsActive() without per-request chain RPCs. - admin pure helpers (OnboardingStateManager, TimingCalculator, StatusReporter): no state, no broker references, no I/O. Outputs derive only from inputs. - apiconfig timing constants: shared defaults (block time, alert lead, auto-test slack). Also demote the "No epoch models available" log in node_worker_commands from Error to Info: this is normal during onboarding when the participant has not yet joined the active set, and Error here was the loudest source of confusion in user reports. See proposals/onboarding-clarity-v1/README.md for product context. Replaces the broker-coupled approach from gonka-ai#866 with the separation suggested by reviewers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ndpoint Implements the approach suggested by reviewers on PR gonka-ai#866: run the real broker against a mocked chain in a separate mode to verify the PoC lifecycle is wired up, with production code paths untouched. Two additions: 1) admin POST /admin/v1/nodes/:id/test Synchronous one-shot validation against a configured MLnode. For each model: InferenceUp + load-time measurement; then InferenceHealth probe; then Stop. Returns a TestResult with per- model load times and overall pass/fail. Per-node mutex prevents concurrent tests against the same node (returns 409). The broker is not involved — the result is reported back to the caller, not written to broker state. 2) `api selfcheck` subcommand Builds a real broker with a MockChainBridge (implements broker.BrokerChainBridge with hardcoded single-participant/single- model epoch data) and a MockClientFactory for MLnode I/O. The EventDriver advances the phase tracker and triggers the same broker commands OnNewBlockDispatcher would normally queue. The Evaluator inspects broker state via the public GetNodes API and produces a Report (PASS/FAIL per stage). Exits 0 on PASS, 1 on FAIL; the report goes to stderr. Initial stages: - node-registered - epoch-models-populated (chain bridge wiring works end-to-end) - hardware-diff-submitted (broker reports hardware on startup) Full PoC artifact protocol simulation is intentionally out of scope; the foundation is here for follow-up PRs to add more stages (StartPoC -> POC transition, validation, inference resume). Tests: - admin/mlnode_tester_test.go covers NodeNotFound, success, model-load failure, concurrent rejection. - selfcheck/run_test.go runs the full selfcheck and asserts PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds onboarding UX improvements and a standalone “selfcheck” mode to reduce operator confusion during the PoC wait period and validate broker wiring without modifying production broker paths.

Changes:

Introduces a selfcheck package that runs the real broker against a mocked chain and reports PASS/FAIL with staged diagnostics.
Enhances the admin /nodes response with derived onboarding/timing fields and adds a manual node test endpoint.
Adds a participant ActivityTracker to cache “active set” membership and reduces noisy broker logging during non-active onboarding.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
proposals/onboarding-clarity-v1/README.md	Product proposal describing onboarding UX + testing goals.
decentralized-api/selfcheck/run.go	Implements selfcheck runner wiring a real broker to mock chain/events.
decentralized-api/selfcheck/run_test.go	Adds regression test ensuring `selfcheck.Run` passes on the synthetic setup.
decentralized-api/selfcheck/mock_chain_bridge.go	Mocks `BrokerChainBridge` with single model/node/participant and captures hardware diffs.
decentralized-api/selfcheck/event_driver.go	Drives synthetic chainphase transitions and broker commands.
decentralized-api/selfcheck/evaluator.go	Evaluates broker state via public APIs and produces a staged report.
decentralized-api/selfcheck/config.go	Creates temp-file-backed config manager for selfcheck isolation.
decentralized-api/participant/activity_tracker.go	Adds cached “participant active” checker refreshed in background.
decentralized-api/main.go	Adds `selfcheck` CLI mode and wires `ActivityTracker` into admin server startup.
decentralized-api/internal/server/admin/server.go	Extends admin server with phase tracker, activity tracker, and ML node tester + route.
decentralized-api/internal/server/admin/server_test.go	Updates server test setup for new `NewServer` parameters.
decentralized-api/internal/server/admin/node_handlers.go	Enriches `/nodes` response with onboarding status and adds `/nodes/:id/test`.
decentralized-api/internal/server/admin/onboarding_state_manager.go	Adds onboarding state derivation helpers and enums.
decentralized-api/internal/server/admin/onboarding_state_manager_test.go	Adds unit tests for onboarding state derivation and messages.
decentralized-api/internal/server/admin/timing_calculator.go	Adds timing derivation (blocks/seconds to next PoC) + duration formatting.
decentralized-api/internal/server/admin/status_reporter.go	Builds human-facing MLnode/participant guidance strings.
decentralized-api/internal/server/admin/mlnode_tester.go	Implements synchronous node validation (model load + health) with concurrency guard.
decentralized-api/internal/server/admin/mlnode_tester_test.go	Adds tests for MLNodeTester success/failure/concurrency.
decentralized-api/broker/node_worker_commands.go	Downgrades a noisy “no epoch models” log from Error to Info with context.
decentralized-api/apiconfig/constants.go	Adds shared timing constants used by admin UX and selfcheck.

+	var seconds int64
+	if timing != nil {
+		seconds = timing.SecondsUntilNextPoC
+	}
+
+	testFailed := n.State.FailureReason != "" &&
+		n.State.CurrentStatus == types.HardwareNodeStatus_FAILED
+	mlState, _ := DeriveMLNodeState(OnboardingStateInputs{
+		ParticipantActive:   active,
+		IsTesting:           false,
+		TestFailed:          testFailed,
+		SecondsUntilNextPoC: seconds,
+	})
+
+	var userMsg, guidance string
+	if active {
+		userMsg = BuildMLNodeMessage(mlState, seconds, "")
+		guidance = BuildParticipantMessage(participantState)
+	} else {
+		userMsg = BuildParticipantMessage(participantState)
+		guidance = BuildInactiveGuidance(seconds)
+	}


+	if os.Getenv("ENFORCED_MODEL_ID") == "" {
+		os.Setenv("ENFORCED_MODEL_ID", "disabled")


+	}
+}
+
+// staticParticipant is a CurrenParticipantInfo whose values never


+// do automatically.
+func BuildInactiveGuidance(secondsUntilNextPoC int64) string {
+	if secondsUntilNextPoC > apiconfig.AutoTestMinSecondsBeforePoC {
+		return "MLnode will be tested automatically when there is more than 1 hour until next PoC"


+	for modelId, modelCfg := range cfg.Models {
+		modelStart := time.Now()
+		if err := client.InferenceUp(ctx, modelId, modelCfg.Args); err != nil {
+			result.Status = TestFailed
+			result.FailingModel = modelId
+			result.Error = err.Error()
+			result.LoadMs[modelId] = time.Since(modelStart).Milliseconds()
+			return result
+		}
+		result.LoadMs[modelId] = time.Since(modelStart).Milliseconds()
+	}


+			// Info, not Error: this state is normal when the participant
+			// has not yet joined the active set for the current epoch.
+			// A genuine misconfiguration shows up downstream as a
+			// FAILED node status; logging at Error here only spams
+			// healthy onboarding flows.
+			logging.Info(result.Error+" (participant may not be active in current epoch)", types.Nodes, "node_id", worker.nodeId)


Trim and reorganize dev_notes/onboarding_clarity_changes.md into three parts: requirements (Issue gonka-ai#326 + proposal), what was done (gonka-ai#1261 baseline + our additions), and gaps vs the requirements with brief reasons. Records that notifying the MLnode of TEST_FAILED is intentionally not done (would require MLnode server changes the proposal says aren't needed). Plain text and tables only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add dev_notes/onboarding_clarity_pr_description.md — a ready-to-paste PR body modeled on gonka-ai#1261 (Summary / Why-not-gonka-ai#866 table / Test plan / Follow-ups), covering the full feature set on this branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- dev_notes/onboarding_clarity_changes.md: requirements (Issue gonka-ai#326 + proposal), what was done (gonka-ai#1261 baseline + our additions), and gaps with brief reasons. - dev_notes/onboarding_clarity_pr_description.md: a ready-to-paste PR body (Summary / Why-not-gonka-ai#866 / Test plan / Follow-ups), with the test plan reflecting local verification and the real-MLnode gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ryanchen911 and others added 2 commits May 27, 2026 18:11

Copilot AI review requested due to automatic review settings May 27, 2026 10:25

Copilot AI reviewed May 27, 2026

View reviewed changes

tcharchian linked an issue May 27, 2026 that may be closed by this pull request

[P2] Improve onboarding experience #326

Open

4 tasks

tcharchian requested review from DimaOrekhovPS and patimen May 27, 2026 23:07

tcharchian added this to the v0.2.14 milestone May 27, 2026

This was referenced Jun 3, 2026

[P2] Onboarding Clarity: handler-layer UX + selfcheck, Dima's approach (supersedes #1261, #866) #1292

Closed

[P2] Onboarding Clarity: handler-layer UX + selfcheck (supersedes #1261, #866) #1296

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[P2] onboarding clarity — Dima's approach (supersedes #866)#1261

[P2] onboarding clarity — Dima's approach (supersedes #866)#1261
Ryanchen911 wants to merge 2 commits into
gonka-ai:upgrade-v0.2.13from
Ryanchen911:jf/onboarding_clarity_v2_dima_approach

Ryanchen911 commented May 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		if os.Getenv("ENFORCED_MODEL_ID") == "" {
		os.Setenv("ENFORCED_MODEL_ID", "disabled")

Conversation

Ryanchen911 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this PR instead of #866

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Ryanchen911 commented May 27, 2026 •

edited

Loading