Commit a971fa1

Author: NightCrawler
fix: re-enable legacy metrics reporter for audit bootstrap
The audit module's epoch-end recovery requires peer observations from active probers. When the module was first activated on testnet, all supernodes running v2.4.5-testnet had already been POSTPONED by the legacy staleness handler (they stopped submitting MsgReportSupernodeMetrics ~500 blocks after upgrading, before the chain upgrade).

This created a deadlock:
- Recovery needs peer observations from active probers
- No active probers exist (empty active_supernode_accounts in every anchor)
- POSTPONED SNs submit epoch reports but cannot recover
- The 3 SNs on old releases bounce ACTIVE↔POSTPONED via legacy metrics but are always POSTPONED at epoch start (anchor freeze time)

Fix: run the legacy metrics reporter alongside the audit host_reporter. Legacy MsgReportSupernodeMetrics recovers POSTPONED SNs to ACTIVE mid-epoch. Since they also submit audit epoch reports, the audit EndBlocker won't re-postpone them (report exists, host minimums are disabled, no peer-port streak). They survive the epoch end and appear ACTIVE in the next epoch anchor, bootstrapping the peer-observation cycle for all remaining POSTPONED SNs.

Once the active set stabilizes, the legacy reporter can be removed in a future release.
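The deadlock and the bootstrap described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the chain's actual logic: the `Node` type, `runEpoch`, and the rule that audit recovery succeeds whenever at least one prober was ACTIVE at anchor freeze are all hypothetical simplifications. It also assumes that nodes on old releases are re-postponed at epoch end because they submit no audit epoch report, which the message implies but does not state.

```go
package main

import "fmt"

// State is a simplified supernode state; the names mirror the commit message.
type State int

const (
	Active State = iota
	Postponed
)

// Node is a hypothetical, simplified supernode used only for this sketch.
type Node struct {
	Name          string
	State         State
	LegacyMetrics bool // submits MsgReportSupernodeMetrics mid-epoch
	AuditReport   bool // submits an audit epoch report
}

// activeCount returns how many nodes are currently ACTIVE.
func activeCount(nodes []Node) int {
	n := 0
	for _, node := range nodes {
		if node.State == Active {
			n++
		}
	}
	return n
}

// runEpoch models one epoch and returns the number of probers that were
// frozen into the anchor at epoch start.
func runEpoch(nodes []Node) int {
	// Anchor freeze: only nodes ACTIVE at epoch start act as probers.
	probers := activeCount(nodes)

	// Mid-epoch: a legacy metrics report recovers a POSTPONED node.
	for i := range nodes {
		if nodes[i].State == Postponed && nodes[i].LegacyMetrics {
			nodes[i].State = Active
		}
	}

	// Epoch end (assumed rules): no audit report means re-postponement;
	// a POSTPONED reporter recovers only if peer observations exist.
	for i := range nodes {
		switch {
		case !nodes[i].AuditReport:
			nodes[i].State = Postponed
		case nodes[i].State == Postponed && probers > 0:
			nodes[i].State = Active
		}
	}
	return probers
}

func main() {
	// Before the fix: audit-only nodes plus one old-release node.
	before := []Node{
		{Name: "new-1", State: Postponed, AuditReport: true},
		{Name: "new-2", State: Postponed, AuditReport: true},
		{Name: "old-1", State: Postponed, LegacyMetrics: true},
	}
	for epoch := 1; epoch <= 3; epoch++ {
		fmt.Println("before fix, epoch", epoch, "probers:", runEpoch(before))
	}

	// After the fix: one node runs both reporters and bootstraps the rest.
	after := []Node{
		{Name: "fixed-1", State: Postponed, LegacyMetrics: true, AuditReport: true},
		{Name: "new-2", State: Postponed, AuditReport: true},
		{Name: "new-3", State: Postponed, AuditReport: true},
	}
	for epoch := 1; epoch <= 3; epoch++ {
		fmt.Println("after fix, epoch", epoch, "probers:", runEpoch(after))
	}
}
```

Under these assumptions the first scenario never has a prober at anchor freeze, while in the second a single node running both reporters survives the first epoch end and lets the remaining nodes recover in the following epoch.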
1 parent 5517a57 commit a971fa1

1 file changed: +18 -16 lines changed


supernode/cmd/start.go

Lines changed: 18 additions & 16 deletions
@@ -26,9 +26,7 @@ import (
 	hostReporterService "github.com/LumeraProtocol/supernode/v2/supernode/host_reporter"
 	statusService "github.com/LumeraProtocol/supernode/v2/supernode/status"
 	storageChallengeService "github.com/LumeraProtocol/supernode/v2/supernode/storage_challenge"
-	// Legacy supernode metrics reporter (MsgReportSupernodeMetrics) has been superseded by
-	// epoch-scoped audit reporting in `x/audit`.
-	// supernodeMetrics "github.com/LumeraProtocol/supernode/v2/supernode/supernode_metrics"
+	supernodeMetrics "github.com/LumeraProtocol/supernode/v2/supernode/supernode_metrics"
 	"github.com/LumeraProtocol/supernode/v2/supernode/transport/gateway"
 	cascadeRPC "github.com/LumeraProtocol/supernode/v2/supernode/transport/grpc/cascade"
 	server "github.com/LumeraProtocol/supernode/v2/supernode/transport/grpc/status"
@@ -173,18 +171,22 @@ The supernode will connect to the Lumera network and begin participating in the
 		logtrace.Fatal(ctx, "Failed to initialize host reporter", logtrace.Fields{"error": err.Error()})
 	}
 
-	// Legacy on-chain supernode metrics reporting has been superseded by `x/audit`.
-	// metricsCollector := supernodeMetrics.NewCollector(
-	// statusSvc,
-	// lumeraClient,
-	// appConfig.SupernodeConfig.Identity,
-	// Version,
-	// kr,
-	// appConfig.SupernodeConfig.Port,
-	// appConfig.P2PConfig.Port,
-	// appConfig.SupernodeConfig.GatewayPort,
-	// )
-	// logtrace.Info(ctx, "Metrics collection enabled", logtrace.Fields{})
+	// Legacy on-chain supernode metrics reporting (MsgReportSupernodeMetrics)
+	// runs alongside the audit epoch reporter. It is needed to recover
+	// POSTPONED supernodes via the supernode module's instant-recovery
+	// path so they appear ACTIVE in the next epoch anchor — which
+	// bootstraps the audit peer-observation cycle.
+	metricsCollector := supernodeMetrics.NewCollector(
+		statusSvc,
+		lumeraClient,
+		appConfig.SupernodeConfig.Identity,
+		Version,
+		kr,
+		appConfig.SupernodeConfig.Port,
+		appConfig.P2PConfig.Port,
+		appConfig.SupernodeConfig.GatewayPort,
+	)
+	logtrace.Info(ctx, "Legacy metrics collection enabled (audit bootstrap)", logtrace.Fields{})
 
 	// Storage challenge history DB (shared by the gRPC handler and runner).
 	historyStore, err := queries.OpenHistoryDB()
@@ -253,7 +255,7 @@ The supernode will connect to the Lumera network and begin participating in the
 	// Start the services using the standard runner and capture exit
 	servicesErr := make(chan error, 1)
 	go func() {
-		services := []service{grpcServer, cService, p2pService, gatewayServer, hostReporter}
+		services := []service{grpcServer, cService, p2pService, gatewayServer, hostReporter, metricsCollector}
 		if storageChallengeRunner != nil {
 			services = append(services, storageChallengeRunner)
 		}
