devshard v2 (v0.2.13-devshard-v2) by a-kuprin · Pull Request #1289 · gonka-ai/gonka

a-kuprin · 2026-06-01T15:14:46Z

This PR prepares the devshard v2 release.

This is the first devshard-only upgrade, which operates independently of usual chain upgrades. Once approved, v2 will run in parallel with the existing v1 devshard runtime.

See the upgrade design doc and the versioned/ package for details.

Upgrade process

1. Release the devshardd binary as a Gonka release artifact
2. Submit a governance proposal to register the new supported version in DevshardEscrowParams.approved_versions (defining the name, binary download URL, and sha256 hash)
3. If the proposal is approved, versiond automatically downloads the binary and serves it under the /devshard/v2 prefix
4. Once /devshard/v2 is available, contributors can test it before gateways switch primary traffic to v2

No manual host steps are expected during this type of upgrade.
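The hash registered in the governance proposal is what lets a host trust an automatically downloaded binary. The following is a minimal sketch of that check, not the versiond implementation; `verifyBinary` and its signature are hypothetical names used only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifyBinary reports whether data hashes to expectedHex, a hex-encoded
// sha256 digest such as the one registered in approved_versions.
// Hypothetical helper: the real versiond code may be structured differently.
func verifyBinary(data []byte, expectedHex string) bool {
	sum := sha256.Sum256(data)
	want := strings.ToLower(strings.TrimSpace(expectedHex))
	return hex.EncodeToString(sum[:]) == want
}

func main() {
	binary := []byte("pretend this is the devshardd binary")
	sum := sha256.Sum256(binary)
	approved := hex.EncodeToString(sum[:])

	fmt.Println(verifyBinary(binary, approved))            // true: matches the registered hash
	fmt.Println(verifyBinary(append(binary, 0), approved)) // false: tampered download
}
```

A download that fails this comparison would presumably be discarded rather than served under the /devshard/v2 prefix.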

devshard

- Prune old epoch storage on epoch changes, move SQLite/Postgres schema setup out of hot paths, and select exactly one storage backend per process
- Remove the seed reveal round, seal completed inference stats, and prune payloads so long-running sessions do not keep all served inferences in RAM or state
- Re-gossip stale MsgFinishInference transactions so the sequencer can pick them up from another host's mempool
- Enforce the governance-controlled maximum nonce limit on hosts to reject invalid requests before settlement
- Separate devshard runtime version from state-root protocol version and stamp protocol v2 at build time
- Create sessions from on-chain escrow fee snapshots and runtime config instead of hardcoded values (with direct chain fallback until mainnet has the matching NodeManager runtime-config endpoint)
- Store per-inference validation counters outside the state root in SQLite/Postgres and expose per-slot totals through devshard stats endpoints after inference pruning
- Add internal devshard traces and metrics through OpenTelemetry and Prometheus
- Return typed devshard errors for disabled, initializing, and non-retryable states instead of generic failures

decentralized-api

The changes in the decentralized-api/ module are fully backward compatible and do not need to be activated before the next mainnet release.

- Serve chain-backed devshard runtime config through the NodeManager GetRuntimeConfig gRPC long-poll
- Add dapi traces and metrics for public inference requests, event listening, validation, chain queries, transaction broadcasts, and ML node calls
- Propagate trace context across executor forwarding, validation payload fetches, and ML node calls

inference-chain

The changes in the inference-chain/ module are wire-compatible and do not need to be activated before the next mainnet release.

- Rename the version field to state_root_and_protocol_version in the devshard settlement message proto
- Move devshard session timeouts, fees, validation rates, vote threshold factor, and grace periods to governance-controlled DevshardEscrowParams
- Add create_devshard_fee and fee_per_nonce to DevshardEscrow to snapshot active fees at escrow creation

deploy

- Add join-stack observability with Grafana, Jaeger, Prometheus, Loki, Promtail, and cAdvisor
- Add dashboards for devshard sessions, chain health, query latency, storage, containers, and node health

Proposed Bounties

| Bounty ID | Sum USDT | Bounty Explanation | GitHub ID |
|---|---|---|---|
| PR #1114, PR #1115 | 3000 | Certik security audit fixes (GEB-62, GEB-59, GEB-60), reported in Issue #1109 | @x0152 |
| Issue #1135 | 30000 | PoC Decode. So far, PoC validation has only covered the prefill step, but most of the real computation in inference happens during decode, which goes unverified. PoC-decode extends it to every decode step, so a node running a different/cheaper model gets caught. It closes the biggest open gap in the network's PoC validation mechanism. spec | Axel-t |
| PR #1035 | 100 | fix(subnetctl): propagate fatal HTTP errors instead of waiting on timeout | @unameisfine |
| PR #1298 | 17000 | Devshard 0.2.13 v2 - release implementation and management | @akup |
| PR #1046 | 4000 | Observability implementation | @qdanik |
| PR #1046 | 2000 | Observability implementation | @blizko |
| branch | 7000 | Emergency troubleshooting | @qdanik |
| -- | 3000 | Gateway - implementation work | @qdanik |
| report | 7000 | Emergency troubleshooting - schema bomb and B200 investigation | kaitaku.ai |
| MiniMax, Additional benchm | 10000 | MiniMax integration + post-deploy bug-fixing + additional benchmarks + community FAQ | kaitaku.ai |
| Issue #1026 | 5000 | VLM inference and validation in Gonka - testing VLM serving validation and adding the necessary tools/scripts (inference + validation for visual language models, threshold calibration across honest/fraud scenarios) | @fedor-konovalenko, MIL team |
| Issue #34 | 5000 | TOPLOC as a validation mechanism. Evaluated using topic to reduce artifact size. The original paper reported near-100% accuracy, but only on small models (Llama-8B); Experiment results matched the paper for small models, while accuracy dropped on large models (235B). | @fedor-konovalenko, MIL team |
| docs#1093, docs#1134, docs#992, docs#1094 | 500 | docs: restructure governance section and expand guidance; add MiniMax-M2.7 and Kimi K2.6 model licenses; update host hardware specifications | @Dolper |

Co-authored-by: Cursor <cursoragent@cursor.com>

Sets DevshardEscrowParams.MaxEscrowsPerEpoch to 500_000.

Skip startup only when the port is set negative; treat 0 as unset and fall back to 9400. Wire the same default into the join compose file via NODE_MANAGER_GRPC_PORT so devshard reaches the API without manual config.
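The port rule in this commit message can be sketched as a small pure function. This is an illustration under assumptions: `resolveGRPCPort` and `defaultNodeManagerGRPCPort` are hypothetical names, and only the three cases (negative, zero, positive) and the 9400 default come from the text above.

```go
package main

import "fmt"

// defaultNodeManagerGRPCPort is the fallback named in the commit message.
const defaultNodeManagerGRPCPort = 9400

// resolveGRPCPort applies the described rule:
//   - negative: the gRPC server is skipped entirely
//   - zero: treated as unset, falls back to the default
//   - positive: used as configured
func resolveGRPCPort(configured int) (port int, enabled bool) {
	switch {
	case configured < 0:
		return 0, false
	case configured == 0:
		return defaultNodeManagerGRPCPort, true
	default:
		return configured, true
	}
}

func main() {
	for _, p := range []int{-1, 0, 9500} {
		port, enabled := resolveGRPCPort(p)
		fmt.Printf("configured=%d -> port=%d enabled=%t\n", p, port, enabled)
	}
}
```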

A participant restored to ACTIVE inherited the prior ConsecutiveInvalidInferences, so a single new failure could re-invalidate them immediately. Zero the counter when transitioning to INVALID and at every upcoming-to-effective promotion.

Replace the hardcoded keeper.DevshardMaxNonce constant with a governance parameter on DevshardEscrowParams. VerifyDevshardSettlement now receives the bound from params; the settle msg server reads it before verifying. The v0.2.13 upgrade handler raises MaxNonce to 1_000_000 and bundles the existing MaxEscrowsPerEpoch=500_000 bump into the same step.

…2.13 v0.2.12 added MsgRespondDealerComplaints to InferenceOperationKeyPerms but did not migrate existing cold-to-warm grants, leaving pre-v0.2.12 DAPIs unable to respond to dealer complaints. Walk authz grants, key each pair off its MsgStartInference grant, and add the missing authorization with the source grant's expiration. Idempotent.

Wire CreateUpgradeHandler with InferenceKeeper and AuthzKeeper so the chain runs the v0.2.13 migrations at the upgrade height. No module ConsensusVersion bump: the handler edits existing collections, no inference store schema change.

# Devshard storage: Postgres backend + epoch pruning

Drop-in replacement for the unbounded single-file SQLite store on `main`. SQLite-only deployments need no config change; new binaries auto-migrate the legacy DB on first boot.

## Architecture

```
HostManager
  -> ManagedStorage        // 30s pruner, retain N=3 epochs
       -> SQLite           // PGHOST unset
       -> HybridStorage    // PGHOST set
            -> Postgres    // primary, sticky per-escrow
            -> SQLite      // local fallback while PG is down
```

Storage is partitioned by `epoch_id` (= `DevshardEscrow.epoch_index`):

- Postgres: `devshard_sessions`, `devshard_diffs`, `devshard_signatures` each `PARTITION BY RANGE (epoch_id)`. Partitions are created lazily; pruning is `DROP TABLE`.
- SQLite: one `epoch_<N>.db` per epoch plus a `_meta.db` routing index; pruning closes the pool and removes the file.
- Hybrid: per-escrow stickiness keeps a session on one backend.

`ManagedStorage` ticks every 30s, computes `cutoff = max_observed_epoch + 1 - retain`, and prunes everything older. An `EpochProvider` advances the cutoff on quiet hosts.

## Drop-in guarantees

- `PGHOST` unset -> SQLite-only, identical to before.
- `PGHOST` set -> hybrid mode, same env vars as `payloadstorage`.
- Legacy `/root/.dapi/data/devshard.db` is migrated to `/root/.dapi/data/devshard/` on first boot, then renamed `*.migrated.<unix>`. Idempotent across restarts.
- Per-host storage. No schema, proto, HTTP, or gossip changes.

## Tradeoffs

For simplicity, partitioning is by `epoch_id` only, not `(epoch_id, escrow_id)`. Loading a session reads its diffs from the shared epoch partition (indexed on `escrow_id`). The next step is per-escrow state snapshots (data + additions) so readers skip the diff replay.
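As a worked example of the cutoff rule `cutoff = max_observed_epoch + 1 - retain`, here is a minimal sketch. `pruneCutoff` and `epochsToPrune` are hypothetical helpers, not functions from `ManagedStorage`; the clamp to 0 for young chains is an assumption added to keep the unsigned arithmetic safe.

```go
package main

import "fmt"

// pruneCutoff returns the first epoch that must be kept. Epochs strictly
// below the cutoff are eligible for pruning. The clamp to 0 is an
// assumption for the case where fewer than `retain` epochs exist.
func pruneCutoff(maxObservedEpoch, retain uint64) uint64 {
	if maxObservedEpoch+1 <= retain {
		return 0
	}
	return maxObservedEpoch + 1 - retain
}

// epochsToPrune filters the stored epochs down to those older than cutoff.
func epochsToPrune(stored []uint64, cutoff uint64) []uint64 {
	var out []uint64
	for _, e := range stored {
		if e < cutoff {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	stored := []uint64{5, 6, 7, 8, 9, 10}
	cutoff := pruneCutoff(10, 3)
	fmt.Println(cutoff)                        // 8
	fmt.Println(epochsToPrune(stored, cutoff)) // [5 6 7]
}
```

With `retain = 3` and a highest observed epoch of 10, epochs 8, 9, and 10 survive, which matches "retain N=3 epochs" in the architecture diagram.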

…poch Reuses the v0.2.10 grace-epoch primitive with UpgradeProtectionWindow=3000.

The pruning test queried latestEpoch at the very end and asserted that its session partition existed. But the advance-epochs loop exits via waitForNextEpoch after the last write, so by the time the assertion runs the chain's current epoch has no devshard activity and therefore no partition. Capture the epochIndex of the last tick's escrow during the loop and assert against that partition instead.

Problem: API startup waited for devshard legacy migration and full session replay before starting the ML/admin servers. On large devshard state this delayed port 9100 by minutes even though most endpoints did not need recovered devshard sessions.

Solution: Gate devshard session routes with a 503 initializing response, run legacy migration in the background, then mark devshard ready and recover sessions asynchronously. Requests after migration still lazily recover a single escrow before serving it.

Flow:

  startup -> register gated routes -> start servers -> migrate legacy DB -> mark ready -> background recovery
  request -> ready? no  -> 503 initializing
  request -> ready? yes -> session cached? yes -> serve
  request -> ready? yes -> session cached? no  -> recover escrow -> serve
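The gating step can be illustrated with a small readiness wrapper. This is a sketch using the standard `net/http` package as a stand-in, not the actual route registration in the API; `readinessGate`, `wrap`, and `demoStatuses` are hypothetical names, and the lazy per-escrow recovery is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readinessGate answers 503 until the background migration marks it ready.
type readinessGate struct{ ready atomic.Bool }

// wrap gates next behind the readiness flag.
func (g *readinessGate) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.ready.Load() {
			http.Error(w, "devshard initializing", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoStatuses returns the status code seen before and after the gate opens.
func demoStatuses() (before, after int) {
	gate := &readinessGate{}
	h := gate.wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	call := func() int {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/devshard/session", nil))
		return rec.Code
	}
	before = call()       // migration still running
	gate.ready.Store(true) // what "mark ready" would do after migration
	after = call()
	return before, after
}

func main() {
	before, after := demoStatuses()
	fmt.Println(before, after) // 503 200
}
```

Because the flag is a single atomic value, the servers can start immediately and the migration goroutine flips it once, without any lock on the request path.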

* devshard snapshots for hosts
* devshards recoversessions parallel workers
* devshard host snapshot on settlement

---------

Co-authored-by: David and Daniil Liberman <da@liberman.net>

a-kuprin · 2026-06-09T10:01:37Z

Added #1326, which fixes an issue we found:

Hosts could diverge from the user on SealedAcc / post_state_root because sealing used a wall-clock grace gate outside the signed diff

* Move devshard inference sealing into deterministic state-machine auto-seal. Host-local wall-clock prune tiers made seal timing node-dependent and risked diverging state roots. Fold eligible inferences during diff apply using nonce and ConfirmedAt-derived state clock gates, and have the host emit payload-prune events only after the machine seals them.
* Add a short path for sealing an inference: if the inference is validated or invalidated, don't wait for the grace period and seal it immediately. As an additional check before sealing, the inference must have one of the following statuses: StatusFinished, StatusValidated, StatusInvalidated, StatusTimedOut.

---------

Co-authored-by: akup <ak@neonavigation.com>
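A rough sketch of the seal decision described in this commit message follows. It is an illustration under stated assumptions, not the state-machine code: the `inference` struct, `canSeal`, the `StatusStarted` placeholder, and the choice to require both the nonce gate and the state-clock gate are all hypothetical. Only the four sealable status names and the short path come from the text.

```go
package main

import "fmt"

type Status int

const (
	StatusStarted Status = iota // hypothetical placeholder for any non-sealable status
	StatusFinished
	StatusValidated
	StatusInvalidated
	StatusTimedOut
)

// inference holds only the fields this sketch needs.
type inference struct {
	Status      Status
	Nonce       uint64
	ConfirmedAt int64
}

// canSeal decides sealing from values carried by signed diffs only
// (latest nonce and a ConfirmedAt-derived state clock), never from the
// host's wall clock, so every node reaches the same answer.
func canSeal(inf inference, latestNonce uint64, stateClock int64, graceNonce uint64, graceTimeout int64) bool {
	switch inf.Status {
	case StatusValidated, StatusInvalidated:
		return true // short path: no grace period
	case StatusFinished, StatusTimedOut:
		nonceOK := latestNonce >= inf.Nonce+graceNonce
		clockOK := stateClock-inf.ConfirmedAt >= graceTimeout
		return nonceOK && clockOK // assumption: both gates must pass
	default:
		return false
	}
}

func main() {
	validated := inference{Status: StatusValidated, Nonce: 10, ConfirmedAt: 100}
	finished := inference{Status: StatusFinished, Nonce: 10, ConfirmedAt: 100}
	fmt.Println(canSeal(validated, 11, 101, 150, 60)) // true: short path
	fmt.Println(canSeal(finished, 11, 101, 150, 60))  // false: still inside the grace window
	fmt.Println(canSeal(finished, 160, 160, 150, 60)) // true: both gates passed
}
```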

…ead code

0xMayoor · 2026-06-11T13:01:40Z

devshardAssignedUpperBoundForSlot (devshard_settlement.go) is documented as "the maximum number of inference IDs that could have been assigned to a slot" — an upper bound, 1 + (nonce-firstAssigned)/slotCount. but the settle handler uses it as the actual completed count: assignedToSlot, _ := devshardAssignedUpperBoundForSlot(msg.Nonce, ...) → AggregateDevshardHostStatsIntoCurrentEpochStats(participant, *hs, assignedToSlot), which credits completed = assignedPerSlot - missed straight into CurrentEpochStats.InferenceCount. so the credited inference count comes from the settlement nonce, not from work the hosts actually attested.

the nonce isn't bound to real work. in applyCore (devshard/state/machine.go) an empty diff (or MsgFinalizeRound) advances LatestNonce with no StartInference, and the per-nonce fee is only charged in the Active phase — so once you're in Finalizing/Settlement you can advance the nonce up to the max for free. the new host-side max-nonce limit caps the magnitude (~MaxNonce/groupSize per slot, ~1250 at the defaults) but doesn't change that the count is decoupled from work. hosts still sign those empty roots — the only acceptance checker withholds on a stale mempool, not on an inference-less diff — and HostStats.Missed/Cost stay 0 since nothing finished or timed out. so an all-zero HostStats settlement at a high nonce is a valid quorum-signed payload, and each occupied slot's participant gets credited ~1250 "completed".

that's the same counter the downtime punishment reads (accountsettle.go, total = InferenceCount + MissedRequests). a participant who's genuinely down — say 50 served / 50 missed, normally zeroed by MissedStatTest — can settle one max-nonce escrow, fabricate ~1250 completed, drop their apparent miss-rate under p0, and keep the full reward. the same counters also feed getDynamicP0, so a large zero-missed contribution pulls the network-wide baseline down and tightens p0 for everyone.

create/settle is permissionless by default (AllowedCreatorAddresses empty) and slots are sampled from the epoch group, so any active participant can land a slot — one is enough. i have a small go test that runs the real devshardAssignedUpperBoundForSlot → AggregateDevshardHostStatsIntoCurrentEpochStats → CheckAndPunishForDowntime path and shows that same 50/50 participant flip from reward 0 to full reward; happy to share.

not prescribing a fix since that's your design, but the root is using the nonce-derived upper bound as the actual completed count — binding the credit to signed per-slot completed work (or cross-checking against Cost/validations at settle) would close it.
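To make the arithmetic in this report concrete, here is a minimal sketch that reproduces the quoted upper-bound formula `1 + (nonce-firstAssigned)/slotCount` and the miss-rate denominator `InferenceCount + MissedRequests`. The function names and the sample values (nonce 9999, 8 slots) are illustrative assumptions chosen to yield 1250, not the network defaults.

```go
package main

import "fmt"

// assignedUpperBound mirrors the formula quoted from the report. It is a
// re-implementation for illustration, not the function in devshard_settlement.go.
func assignedUpperBound(nonce, firstAssigned, slotCount uint64) uint64 {
	if slotCount == 0 || nonce < firstAssigned {
		return 0
	}
	return 1 + (nonce-firstAssigned)/slotCount
}

// missRate uses the denominator described in the report:
// total = completed (InferenceCount) + missed (MissedRequests).
func missRate(completed, missed uint64) float64 {
	total := completed + missed
	if total == 0 {
		return 0
	}
	return float64(missed) / float64(total)
}

func main() {
	honest := missRate(50, 50)                 // the 50 served / 50 missed participant
	credited := assignedUpperBound(9999, 0, 8) // credit derived purely from the nonce
	diluted := missRate(50+credited, 50)       // same participant after one such settlement

	fmt.Printf("credited=%d honest=%.3f diluted=%.3f\n", credited, honest, diluted)
	// credited=1250 honest=0.500 diluted=0.037
}
```

The miss rate drops from 0.5 to roughly 0.037 without any additional work being served, which is the dilution effect the report describes.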

0xMayoor · 2026-06-11T13:08:18Z

two more verification gaps in the v2 runtime this PR ships — both the same "sibling verifies, twin doesn't" shape, and i've got fixes open against main for each:

fetchSignature (devshard/user/session.go) stores the bytes a host returns from GET /signatures keyed by slot, with only a slot-ownership check and no RecoverAddress — so a host can hand back arbitrary bytes that then get counted toward quorum. its sibling processResponse recovers and matches the address before storing. fix: #1311.

HandleGossipTxs (devshard/transport/server.go) forwards gossiped txs into the mempool after only a group-membership check, with no per-tx proposer-sig verification — so a group member can inject forged txs the host then trusts (e.g. a forged validation vote that suppresses the host's own validation via the mempool oracle). its sibling HandleGossipNonce does RecoverAddress + slot match before storing. fix: #1312.

both are still present on devshard-0.2.13-v2 at the current head — flagging here since they ride along in the code under review.

a-kuprin · 2026-06-11T15:34:03Z

@0xMayoor

I've seen both, and they are candidates for the next release in 1 or 2 weeks. We just need to finalize this release.

* Parameters naming and inferenceSealGraceNonce, inferenceSealGraceTimeout moved to EscrowStart
* Don't seal inferences when stateClock is undefined (no confirmedAt value in latest inferences)

It is set in the escrow start message and cannot be changed during the escrow session. The default is 150. It is required for the e2e testermint test to pass; that test checks that autodealing works.

blizko · 2026-06-12T13:44:22Z

@@ -4,7 +4,7 @@ go 1.24.2

 replace (
 	cosmossdk.io/store => github.com/gonka-ai/cosmos-sdk/store v1.1.2-ps1
-	github.com/cosmos/cosmos-sdk => github.com/gonka-ai/cosmos-sdk v0.53.3-ps17
+	github.com/cosmos/cosmos-sdk => github.com/gonka-ai/cosmos-sdk v0.53.3-ps17-observability


Are we planning to include this as a stable version rather than a feature branch?

blizko · 2026-06-12T13:44:48Z

@@ -788,8 +790,8 @@ github.com/golangci/revgrep v0.5.3 h1:3tL7c1XBMtWHHqVpS5ChmiAAoe4PF/d5+ULzV9sLAz
 github.com/golangci/revgrep v0.5.3/go.mod h1:U4R/s9dlXZsg8uJmaR1GrloUr14D7qDl8gi2iPXJH8k=
 github.com/golangci/unconvert v0.0.0-20240309020433-c5143eacb3ed h1:IURFTjxeTfNFP0hTEi1YKjB/ub8zkpaOqFFMApi2EAs=
 github.com/golangci/unconvert v0.0.0-20240309020433-c5143eacb3ed/go.mod h1:XLXN8bNw4CGRPaqgl3bv/lhz7bsGPh4/xSaMTbo2vkQ=
-github.com/gonka-ai/cosmos-sdk v0.53.3-ps17 h1:xw8ssDJDfl+/TnD9QMq/EZGzjnoh+6cvROqZE/MwNzU=
-github.com/gonka-ai/cosmos-sdk v0.53.3-ps17/go.mod h1:90S054hIbadFB1MlXVZVC5w0QbKfd1P4b79zT+vvJxw=
+github.com/gonka-ai/cosmos-sdk v0.53.3-ps17-observability h1:vWph4b1Xzvwj9jV3BVD6RXQLqRmCsGNyPAxePlFIU0Q=


Are we planning to include this as a stable version rather than a feature branch?

stable version, not a feature branch.
Do you have any concerns on this?

The naming v0.53.3-ps17-observability breaks semantic versioning.

a-kuprin · 2026-06-12T14:18:59Z

@0xMayoor

> so the credited inference count comes from the settlement nonce, not from work the hosts actually attested.

Basically, in devshard nonceId == inferenceId, but you are right that there are service nonces, like the one carrying MsgFinalizeRound.
devshard is designed to serve a lot of inferences, so this doesn't break the stats.

But again, you are right that we should add a -1.

0xMayoor · 2026-06-12T17:01:58Z

yeah fair @a-kuprin , the active-phase fee bounds it so it's not free like i implied, my bad.
the gap's bigger than -1 though — once finalizing starts the nonce keeps advancing with no fee till
LatestNonce >= FinalizeNonce +len(Group), so it's the whole finalize window not one service nonce.
and that count lands in CurrentEpochStats.InferenceCount which feeds the downtime punishment denom and the dynamicP0 baseline, so it shifts the miss-rate test a bit, not just a display stat.
might be small in normal runs, you'd know better — figured worth subtracting the window not just 1.

a-kuprin · 2026-06-13T09:38:17Z

Could you please dig it deeper and prepare PR with a fair fix?

0xMayoor · 2026-06-13T18:59:11Z

ok dug in, it's bigger than the -1 we landed on — a node that's genuinely down keeps its full epoch reward.

empty diffs are the lever: they bump the nonce with no inference in them, so settlement credits the slot ~nonce/groupsize "completed" off pure nonce while Missed stays 0. run it to max, settle all-zero hoststats, and a 50-done/50-missed node that should be zeroed gets buried under ~1250 fake completed. no collusion needed either — stale mempool is the only thing that makes a host withhold its sig, and an empty session never has pending txs, so honest hosts sign it fine. costs ~2e7 in per-nonce fees, nothing next to the reward it saves.

and it beats both downtime gates, not just the epoch one — the per-block slashing check too. devshard misses only hit that SPRT batched at settlement, so front-load the empty one and it never trips. inactive = zero reward, so that's the part that actually keeps the money. all confirmed with PoCs on the v2 head.

finalize-window subtraction won't close it btw — the empty active nonces survive that, the count has to come from real work not the nonce.
PR coming. @a-kuprin

a-kuprin · 2026-06-14T08:21:03Z

@0xMayoor

> empty diffs are the lever

Actually, an empty diff is very undesirable for the protocol and normally shouldn't appear. Another attack with empty diffs is skipping slots and sending real work only to a chosen host in the devshard.

There is still a lot of work to do on devshards to make them stable at the protocol level. I don't think we need to push it right now, as devshard is currently used primarily for gateway stabilisation.
But the intent is that every nonce should carry an inference, so I think the settlement solution was designed with this intent in mind, since this host-skipping attack should be prevented by the protocol anyway.

So the main point here is that an empty diff is not what the target version of the protocol expects, but the current state is mostly experimental for gateway purposes, and every gateway is currently whitelisted and trusted.

Of course we can add real inference counting, but this should also be added to the settlement message.

On the one hand, we would be changing part of the protocol to legitimize the host-skip attack I've described. But we need a legitimate way to skip inferences for some hosts anyway (for example during cPoC, see https://github.com/a-kuprin/gonka/blob/devshard-testenv/devshard/docs/proposals/CPOC_PROTOCOL.md).

So adding a real inference count to the settlement message is what we should have, and it is very easy to add.

0xMayoor · 2026-06-14T09:19:19Z

yeah, attested per-slot count in the settlement message is the move — only thing i'd watch is it rides in the signed host_stats, not a value the settler passes in, otherwise you've just moved the trust. nonce stays as the cap.
you tackling the pr or should i?

Ryanchen911 · 2026-06-15T07:43:21Z

I found three settlement issues in devshard_settlement.go; all of them seem able to lock or drain escrow funds.

1. Protocol tag validated against the binary-version allowlist (:103 → :33)
   msg.StateRootAndProtocolVersion (the protocol tag, constant "v2" from domain.go:15) is checked against params.ApprovedVersions[].Name (versiond binary names). These are different concepts; your protocol-version.md says "do not assume it must equal an approved_versions.name entry." It only passes today because the allowlist is empty (len(approved)==0 → nil) or happens to contain "v2". Once approved_versions holds anything else (legacy {name:"v1"}, or a future {name:"v3"} bugfix binary with no protocol bump), every settlement is rejected, escrow.Settled is never set, and funds lock.
   Fix: drop the check (the tag is already bound into the state root via versionHash + quorum sigs), or validate against a separate protocol-version list.
2. msg.Nonce checked against live MaxNonce, not a per-escrow snapshot (:95)
   Fees are snapshotted at creation (create_devshard_fee/fee_per_nonce), but max_nonce is read live. A session created at max_nonce=1000 that legitimately ran to nonce 800 becomes unsettleable if governance later lowers max_nonce to 500 (800 > 500 → reject) → funds locked/forfeited.
   Fix: snapshot max_nonce onto the escrow at creation and validate msg.Nonce against the snapshot.
3. Snapshotted fee schedule is never enforced against msg.Fees (:209)
   The only check is totalCost + msg.Fees ≤ escrow.Amount; msg.Fees is never compared to create_devshard_fee + fee_per_nonce * msg.Nonce. So a colluding/buggy host quorum can sign a settlement with Fees inflated up to the full escrow, overpaying validators and eating the creator's refund. The snapshotted schedule is currently decorative.
   Fix: compute expectedFees = create_devshard_fee + fee_per_nonce * msg.Nonce (with overflow checks) and require msg.Fees == expectedFees.
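The fee check proposed in the third item could look roughly like the sketch below. It assumes all three values fit in `uint64` (the real fields may use a different numeric type), and `expectedFees`, `checkSettlementFees`, and `errFeeOverflow` are hypothetical names rather than code from devshard_settlement.go.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

var errFeeOverflow = errors.New("fee computation overflows uint64")

// expectedFees computes createFee + feePerNonce*nonce with explicit
// overflow checks on both the multiplication and the addition.
func expectedFees(createFee, feePerNonce, nonce uint64) (uint64, error) {
	hi, lo := bits.Mul64(feePerNonce, nonce)
	if hi != 0 {
		return 0, errFeeOverflow
	}
	sum, carry := bits.Add64(createFee, lo, 0)
	if carry != 0 {
		return 0, errFeeOverflow
	}
	return sum, nil
}

// checkSettlementFees requires the settled fees to equal the snapshotted schedule.
func checkSettlementFees(msgFees, createFee, feePerNonce, nonce uint64) error {
	want, err := expectedFees(createFee, feePerNonce, nonce)
	if err != nil {
		return err
	}
	if msgFees != want {
		return fmt.Errorf("fees mismatch: got %d, want %d", msgFees, want)
	}
	return nil
}

func main() {
	fmt.Println(checkSettlementFees(16100, 100, 20, 800)) // <nil>
	fmt.Println(checkSettlementFees(50000, 100, 20, 800)) // fees mismatch: got 50000, want 16100
}
```

An exact-equality check like this would reject an inflated `Fees` value even when it still fits under the escrow amount.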

x0152 · 2026-06-15T13:30:33Z

+	sm.mu.Lock()
+	defer sm.mu.Unlock()
+
+	if err := sm.inferenceStore.DeleteSealedInferences(sm.state.EscrowID); err != nil {


I might be missing some context, but why do we prune sealed tables at startup?
This might cause data loss after restart in some scenarios

It is the session recovery.

The source of truth is diffs here not the local data. Also sealed inferences are only for observability

Not a blocker

interface.go:84 says these counters survive restarts, but sealed ones don't: recovery calls DeleteSealedInferences() (clears sealed_validation_obs) and never rebuilds them from diffs. So a session's sealed validation data is lost after restart

x0152 · 2026-06-15T13:48:50Z

DB write error here can mutate state but drop tx from diff which may cause checksum mismatches and reject diffs

Note: I don't know why, but github doesn't show the exact line

	var applied []*types.DevshardTx
	for _, tx := range txs {
		if err := sm.applyTx(tx); err != nil {
			if tx.GetStartInference() != nil {
				sm.restoreMutable(snap)
				return nil, nil, fmt.Errorf("mandatory start inference: %w", err)
			}
			continue // <--
		}
		applied = append(applied, tx)
		...

valid concern

x0152 · 2026-06-15T14:00:03Z

 		SELECT c.relname
 		FROM pg_class c
 		JOIN pg_inherits i ON i.inhrelid = c.oid
 		JOIN pg_class p ON p.oid = i.inhparent
-		WHERE p.relname IN ('devshard_sessions', 'devshard_diffs', 'devshard_signatures', 'devshard_snapshots')


ensurePartition creates 8 partitions per epoch, but this pruneBefore query only lists 5 parents - leading to unbounded storage growth over time

valid, should be fixed

x0152 · 2026-06-15T14:44:35Z

+		return c.JSON(http.StatusOK, []prometheusTargetGroup{})
+	}
+
+	versions := s.configManager.GetDevshardVersions().Versions


Small fix to avoid a data race:

versions := slices.Clone(s.configManager.GetDevshardVersions().Versions)
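For context on why a copy helps, here is a small sketch of the pattern. It is a variation on the suggestion, since it takes the copy under a lock inside the manager rather than at the call site; `configManager`, `versionsSnapshot`, and `replaceAt` are hypothetical stand-ins, not the project's types.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// configManager is a stand-in that guards a slice of version names.
type configManager struct {
	mu       sync.RWMutex
	versions []string
}

// setVersions stores its own copy so the caller's slice is not shared.
func (m *configManager) setVersions(v []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.versions = slices.Clone(v)
}

// replaceAt mutates the stored slice in place, as a config reload might.
func (m *configManager) replaceAt(i int, v string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.versions[i] = v
}

// versionsSnapshot returns a copy taken under the read lock, so callers
// can iterate over it without racing with later in-place writes.
func (m *configManager) versionsSnapshot() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return slices.Clone(m.versions)
}

func main() {
	m := &configManager{}
	m.setVersions([]string{"v1", "v2"})

	snap := m.versionsSnapshot()
	m.replaceAt(0, "v3") // later update does not touch the snapshot

	fmt.Println(snap)                 // [v1 v2]
	fmt.Println(m.versionsSnapshot()) // [v3 v2]
}
```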

gmorgachev · 2026-06-15T16:25:34Z

@0xMayoor @Ryanchen911 @x0152
Thanks for the feedback. Let's definitely include all of it in the next devshard release; I hope we'll have it in the next 1-2 weeks.

For v0.2.13-v2, created the release from the current state, going to propose it today

a-kuprin · 2026-06-15T17:14:22Z

> Protocol tag validated against the binary-version allowlist (:103 → :33)
> msg.StateRootAndProtocolVersion (the protocol tag, constant "v2" from domain.go:15) is checked against params.ApprovedVersions[].Name (versiond binary names). These are different concepts; your protocol-version.md says "do not assume it must equal an approved_versions.name entry." It only passes today because the allowlist is empty (len(approved)==0 → nil) or happens to contain "v2". Once approved_versions holds anything else (legacy {name:"v1"}, or a future {name:"v3"} bugfix binary with no protocol bump), every settlement is rejected, escrow.Settled is never set, and funds lock.
> Fix: drop the check (the tag is already bound into the state root via versionHash + quorum sigs), or validate against a separate protocol-version list.
