This repository was archived by the owner on Jan 27, 2026. It is now read-only.

imp(general): Orchestrator Discovery Sync, Worker Timeouts #578

Merged
JannikSt merged 2 commits into develop from imp/orchestrator-discovery-sync+misc on Jun 24, 2025

Conversation

@JannikSt
Member

This PR includes small improvements that address log noise we're seeing on our testnet run.
Nothing critical, mainly a way to get rid of some unnecessary logs.

@JannikSt JannikSt requested a review from Copilot June 24, 2025 03:02
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refines timeout and expiry settings and enhances discovery logging in the worker and orchestrator components.

  • Increased HTTP client timeout and shortened Redis heartbeat TTL for more resilient retries.
  • Replaced hardcoded expiry values with a shared constant in the signature middleware.
  • Added a 5-minute grace period and extra compute‐spec change logging in the discovery monitor.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/worker/src/operations/heartbeat/service.rs | Increased client timeout from 5 s to 20 s |
| crates/shared/src/security/auth_signature_middleware.rs | Introduced REQUEST_EXPIRY_SECS and replaced hardcoded 10 s |
| crates/orchestrator/src/store/domains/heartbeat_store.rs | Reduced Redis expiry from 180 s to 90 s |
| crates/orchestrator/src/discovery/monitor.rs | Added 5 min grace period before marking nodes inactive and log compute spec changes |
Comments suppressed due to low confidence (2)

crates/shared/src/security/auth_signature_middleware.rs:35

  • [nitpick] Add a doc comment explaining the rationale behind the 300 s expiry (e.g., aligning rate limit windows) to clarify its intended usage.
const REQUEST_EXPIRY_SECS: u64 = 300;
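
To illustrate the nitpick above, here is a minimal sketch of how a documented expiry constant might be used in a freshness check. Only `REQUEST_EXPIRY_SECS` and its value come from the PR; the helper `is_request_fresh`, its parameters, and the choice to tolerate clock skew in both directions are assumptions made for illustration, not the middleware's actual logic.

```rust
/// Maximum age, in seconds, of a signed request before it is rejected.
/// The PR does not state why 300 s was chosen; a real doc comment would
/// record that rationale here.
const REQUEST_EXPIRY_SECS: u64 = 300;

/// Hypothetical helper (not from the PR): returns true when the request
/// timestamp lies within the expiry window around `now_secs`.
/// Both arguments are Unix timestamps in seconds.
fn is_request_fresh(request_ts_secs: u64, now_secs: u64) -> bool {
    // abs_diff avoids underflow and treats a timestamp slightly in the
    // future (clock skew) the same as one slightly in the past.
    // This symmetric treatment is an assumption of the sketch.
    now_secs.abs_diff(request_ts_secs) <= REQUEST_EXPIRY_SECS
}
```

Under these assumptions, a request signed 10 s ago passes, while one signed 301 s ago is rejected.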

crates/orchestrator/src/discovery/monitor.rs:323

  • Add unit tests covering both branches of the should_mark_inactive check (immediate ejection vs. grace-period wait) to ensure this logic behaves as intended.
                    if should_mark_inactive {
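
The kind of test coverage suggested above could look roughly like the sketch below. The PR page does not show how `should_mark_inactive` is computed, so the function here is a hypothetical stand-in: it assumes a node with no recorded last-update time is ejected immediately, while a node with a known last update is only ejected once the 5-minute grace period has elapsed. The constant name `INACTIVE_GRACE_PERIOD` is also an assumption.

```rust
use std::time::Duration;

/// Grace period before a node missing from discovery is marked inactive.
/// The 5-minute value is from the PR description; the name is hypothetical.
const INACTIVE_GRACE_PERIOD: Duration = Duration::from_secs(5 * 60);

/// Hypothetical stand-in for the check in monitor.rs.
/// `since_last_update` is None when no last-update time is recorded.
fn should_mark_inactive(since_last_update: Option<Duration>, grace: Duration) -> bool {
    match since_last_update {
        // Assumed branch 1: nothing recorded, eject immediately.
        None => true,
        // Assumed branch 2: wait until the grace period has fully elapsed.
        Some(elapsed) => elapsed > grace,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn ejects_immediately_without_last_update() {
        assert!(should_mark_inactive(None, INACTIVE_GRACE_PERIOD));
    }

    #[test]
    fn waits_during_grace_period_then_ejects() {
        let within = Some(Duration::from_secs(60));
        let beyond = Some(Duration::from_secs(301));
        assert!(!should_mark_inactive(within, INACTIVE_GRACE_PERIOD));
        assert!(should_mark_inactive(beyond, INACTIVE_GRACE_PERIOD));
    }
}
```

Keeping the decision in a pure function like this, separate from the store and network calls, is one way to make both branches testable without a running orchestrator.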

Comment thread crates/worker/src/operations/heartbeat/service.rs
Comment thread crates/orchestrator/src/store/domains/heartbeat_store.rs
Comment thread crates/orchestrator/src/discovery/monitor.rs
@JannikSt JannikSt merged commit c55590c into develop Jun 24, 2025
1 check passed
JannikSt added a commit that referenced this pull request Jun 24, 2025
* add additional status update loop metrics (#575)

* increase heartbeat timeout (#574)

* switch to sequential validation as quickfix for nonce issues (#576)

* imp(worker): host nw mode with ability to switch networking config (#577)

* host nw mode with ability to switch networking config

* imp(general): Orchestrator Discovery Sync, Worker Timeouts (#578)

* fix race condition in discovery sync, decrease heartbeat ttl, increase general auth mw request timeout

* increase heartbeat timeout

* release 0.3.7
@JannikSt JannikSt deleted the imp/orchestrator-discovery-sync+misc branch June 25, 2025 12:27