crdant
Member

@crdant crdant commented Oct 9, 2025

TL;DR

Aligns the Python SDK with endpoints that are available or soon to be deployed in Vandoor.

Details

The Python SDK previously called several endpoints that don't exist in Vandoor, causing instance tracking and metrics reporting to fail.

This change adopts the telemetry-based architecture used by the Go SDK. Instance metadata flows through telemetry headers to /kots_metrics/license_instance/info rather than explicit CRUD operations.

Authentication now correctly distinguishes between publishable keys (sent with a Bearer prefix) and service tokens (sent as the raw token, without a prefix), matching Vandoor's middleware expectations. Custom metrics use the {"data": {name: value}} structure and are sent to the correct endpoint, /application/custom-metrics.

Co-Authored-By: @jpshackelford
Co-Authored-By: Open Hands [email protected]
Co-Authored-by: Claude Code [email protected]

crdant and others added 11 commits October 9, 2025 11:03
Changes instance management to match Go SDK architecture and align with Vandoor endpoints:
- Generates deterministic instance IDs client-side from machine fingerprints (no API calls)
- Adds _report_instance() methods that send telemetry to /kots_metrics/license_instance/info
- Fixes metrics format to use {"data": {name: value}} and correct endpoint /application/custom-metrics
- Removes delete_metric(), set_status(), and set_version() methods (endpoints don't exist in Vandoor)

This fixes endpoint alignment issues between the Python SDK and Vandoor.
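
For illustration, the metrics format described above can be sketched as follows. This is a minimal sketch, not the SDK's code: `build_metrics_payload` is a hypothetical helper name, and only the `{"data": {name: value}}` envelope and the `/application/custom-metrics` path come from this PR.

```python
import json

# Endpoint path quoted from the PR description; base URL and HTTP transport
# are deliberately left out of this sketch.
CUSTOM_METRICS_PATH = "/application/custom-metrics"


def build_metrics_payload(metrics):
    """Wrap metric name/value pairs in the {"data": {...}} envelope.

    Hypothetical helper: the name and signature are assumptions.
    """
    return {"data": dict(metrics)}


# Example: a single metric serialized as the request body.
payload = build_metrics_payload({"active_users": 42})
body = json.dumps(payload)
```

The body would then be sent with a POST to `CUSTOM_METRICS_PATH` on whatever base URL the client is configured with.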

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Removes usage of set_status() and set_version() methods that were deleted in the telemetry migration, and removes unused InstanceStatus import.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Wraps telemetry calls in try/except to prevent 401 errors from
blocking metric sends. Telemetry is optional - metrics should
still work even if instance reporting fails.
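
A minimal sketch of this best-effort pattern, assuming the instance report and the metric send are passed in as callables. The function name `send_metric` and both parameters are hypothetical, chosen only to keep the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def send_metric(name, value, report_instance, post_metric):
    """Send one metric; instance reporting is treated as optional.

    `report_instance` and `post_metric` are hypothetical callables standing
    in for the SDK's telemetry call and metrics request.
    """
    try:
        report_instance()
    except Exception as exc:  # for example, a 401 from the telemetry endpoint
        # Telemetry failure must not block the metric itself.
        logger.debug("instance reporting failed, continuing: %s", exc)
    return post_metric({"data": {name: value}})


def failing_report():
    # Simulates the telemetry call being rejected.
    raise RuntimeError("401 Unauthorized")


sent = []
send_metric("active_users", 42, failing_report, sent.append)
```

Catching a broad `Exception` is a design choice for the sketch; a real client would probably narrow it to its HTTP error type.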

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Service tokens should be sent as raw Authorization header values
(not Bearer tokens). Vandoor's PreloadKotsLicenseFromToken middleware
expects raw token values for service account lookup.

- Publishable keys: "Bearer <key>" (for customer creation)
- Service tokens: "<token>" (raw value, no Bearer prefix)
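
The two header shapes above could be produced by something like the following. This is a sketch under assumptions: the function name and the `is_publishable_key` flag are hypothetical, and how the SDK actually tells the two credential types apart is not stated here.

```python
def authorization_header(credential, *, is_publishable_key):
    """Build the Authorization header for either credential type.

    Hypothetical helper: publishable keys get a "Bearer " prefix, service
    tokens are sent as the raw value, as described in the commit message.
    """
    if is_publishable_key:
        return {"Authorization": f"Bearer {credential}"}
    return {"Authorization": credential}


# Placeholder credential values, for illustration only.
publishable = authorization_header("pk_example", is_publishable_key=True)
service = authorization_header("st_example", is_publishable_key=False)
```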

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Sends hostname via X-Replicated-InstanceTagData header as a tag
named "name". This allows Vandoor to display a human-readable
identifier for each instance.

- Gets hostname via socket.gethostname()
- Encodes as base64 JSON: {"name": "hostname"}
- Falls back to "unknown" if hostname unavailable
- Applied to both sync and async implementations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The instance tag data must be in the format:
{"force": bool, "tags": {"key": "value"}}

Not just the raw tags object. This matches the format expected
by Vandoor's InstanceTagData schema.
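
As a sketch of the envelope described above, the header value could be built like this. The helper names are hypothetical, and the standard base64 alphabet is an assumption (the commits only say "base64 JSON").

```python
import base64
import json
import socket

# Header name quoted from the commit message.
TAG_HEADER = "X-Replicated-InstanceTagData"


def get_hostname():
    """Return the local hostname, or "unknown" if it cannot be determined."""
    try:
        return socket.gethostname() or "unknown"
    except OSError:
        return "unknown"


def encode_instance_tags(tags, force=False):
    """Encode tags as base64 JSON in the {"force": ..., "tags": {...}} form.

    Hypothetical helper; standard (not URL-safe) base64 is assumed.
    """
    envelope = {"force": force, "tags": dict(tags)}
    raw = json.dumps(envelope).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


headers = {TAG_HEADER: encode_instance_tags({"name": get_hostname()})}
```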

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Vandoor's reporting-api automatically creates a "name" tag using
the service account name (sdk-{timestamp}). To override this with
the actual hostname, we need to use force=true in the tag data.

The reconcileTags function checks: if (!dbKeys[key] || forced)
This ensures our hostname tag replaces the auto-generated one.
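
To show why force=true matters, here is a Python rendering of the quoted condition. It is only an illustration of the rule `if (!dbKeys[key] || forced)`; the function below is hypothetical and is not Vandoor's reconcileTags implementation.

```python
def reconcile_tags(db_tags, incoming_tags, forced):
    """Apply incoming tags when the key is missing or the update is forced.

    Hypothetical model of the quoted check, not the server's actual code.
    """
    result = dict(db_tags)
    for key, value in incoming_tags.items():
        if key not in result or forced:
            result[key] = value
    return result


# The auto-generated name survives an unforced update...
existing = {"name": "sdk-1760000000"}
kept = reconcile_tags(existing, {"name": "web-01"}, forced=False)
# ...and is replaced when force is set.
replaced = reconcile_tags(existing, {"name": "web-01"}, forced=True)
```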

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Cleans up all debug logging that was added for troubleshooting
authentication and telemetry issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Missed one debug print statement in AsyncCustomerService.get_or_create.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
crdant added 7 commits October 9, 2025 15:42
- Use package metadata for version with "+python" suffix to distinguish from Go SDK
- Send SDK version via User-Agent header as "Replicated-SDK/{version}"
- Add set_status() method to allow dynamic app status changes
- Replace hardcoded "ready" status with configurable _status field
- Remove hardcoded SDK version from telemetry headers (now sent via User-Agent)
- Move __version__ import inside _build_headers() to avoid circular import
- Make AsyncInstance.set_status() async to match await usage
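
The version and User-Agent changes in this list could look roughly like the sketch below. The helper names, the package name passed in, and the "0.0.0" fallback for an uninstalled package are all assumptions; only the "+python" suffix and the "Replicated-SDK/{version}" format come from the commit message.

```python
from importlib.metadata import PackageNotFoundError, version


def sdk_version(package_name):
    """Read the installed package version and append the "+python" suffix.

    Hypothetical helper; falling back to "0.0.0" is an assumption.
    """
    try:
        base = version(package_name)
    except PackageNotFoundError:
        base = "0.0.0"
    return f"{base}+python"


def user_agent(sdk_ver):
    """Format the User-Agent value as "Replicated-SDK/{version}"."""
    return f"Replicated-SDK/{sdk_ver}"


# "replicated" is a placeholder distribution name for this example.
headers = {"User-Agent": user_agent(sdk_version("replicated"))}
```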
Member Author

crdant commented Oct 10, 2025

ID creation is back on the server side. I've added one of the missing endpoints with replicatedhq/vandoor#8163 and continue to use the same endpoints as the Replicated SDK for metrics and status reporting.

@crdant crdant merged commit b604bcd into main Oct 13, 2025
5 checks passed
crdant added a commit that referenced this pull request Oct 14, 2025
TL;DR
-----

Implements machine fingerprint-based cluster identification that aligns with Vandoor's telemetry architecture requirements for tracking application instances across machine boundaries.

Details
--------

Generates a stable machine fingerprint using platform-specific identifiers (IOPlatformUUID on macOS, D-Bus machine-id on Linux, MachineGuid on Windows) and hashes them with SHA256 for privacy. This fingerprint initializes once at client creation and propagates to all instances, ensuring consistent cluster identification throughout the client's lifetime.
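
A rough sketch of this approach follows. The function names, file paths, `ioreg` invocation, and registry key are assumptions about where these identifiers usually live, not a copy of the SDK's code.

```python
import hashlib
import platform
import subprocess
from pathlib import Path


def hash_identifier(raw_id):
    """Hash a raw machine identifier with SHA256 so it is not sent in clear."""
    return hashlib.sha256(raw_id.strip().encode("utf-8")).hexdigest()


def _linux_machine_id():
    # Common locations for the D-Bus / systemd machine-id (assumed paths).
    for candidate in ("/var/lib/dbus/machine-id", "/etc/machine-id"):
        path = Path(candidate)
        if path.is_file():
            return path.read_text().strip()
    return None


def _macos_platform_uuid():
    out = subprocess.run(
        ["ioreg", "-rd1", "-c", "IOPlatformExpertDevice"],
        capture_output=True, text=True, check=False,
    ).stdout
    for line in out.splitlines():
        if "IOPlatformUUID" in line:
            # Line looks like: "IOPlatformUUID" = "<uuid>"
            return line.split('"')[-2]
    return None


def _windows_machine_guid():
    import winreg  # only importable on Windows

    key_path = r"SOFTWARE\Microsoft\Cryptography"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _ = winreg.QueryValueEx(key, "MachineGuid")
    return value


def machine_fingerprint():
    """Return the hashed identifier for this machine, or None if unavailable."""
    readers = {
        "Linux": _linux_machine_id,
        "Darwin": _macos_platform_uuid,
        "Windows": _windows_machine_guid,
    }
    reader = readers.get(platform.system())
    try:
        raw = reader() if reader else None
    except OSError:
        raw = None
    return hash_identifier(raw) if raw else None


fingerprint = machine_fingerprint()
```

Computing the value once and storing it on the client, as the description says, avoids repeated subprocess or registry calls and keeps the identifier stable for the client's lifetime.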

This change completes the telemetry architecture alignment started in PR #5, ensuring the Python SDK matches the behavior established by the Go SDK and meets Vandoor's data model requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

3 participants