design(protocol): schema-first WebSocket protocol to eliminate cross-language duplication #31

@SystemicVoid

Issue

WebSocket message schemas are duplicated between the Python daemon and the Rust client, which creates a maintenance burden and a risk of protocol drift.

Problem

The WebSocket protocol between parakeet-stt-daemon (Python) and parakeet-ptt (Rust) is defined independently in both codebases:

Python: parakeet-stt-daemon/src/parakeet_stt_daemon/messages.py (lines 44-191)

  • ClientMessage: StartSession, StopSession, AbortSession
  • ServerMessage: SessionStarted, FinalResult, Error, Status, InterimState, InterimText, AudioLevel, SessionEnded

Rust: parakeet-ptt/src/protocol.rs (lines 9-86)

  • Same message structures with similar field names

Key Differences:

  • Python uses datetime for timestamps, Rust uses String (RFC3339)
  • Python uses UUID type, Rust uses uuid::Uuid
  • Python uses float for numeric fields, Rust uses u64/u32
  • Python uses Literal types for validation, Rust uses serde string enums
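
To make the numeric and timestamp differences concrete, here is a minimal sketch of what Python's standard json module emits for float-valued and datetime fields. The field names come from the FinalResult example in this issue; the "final_result" tag, the INT_FIELDS set, and the normalize helper are hypothetical and are not taken from messages.py.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: field names follow FinalResult in this issue,
# the values and the "final_result" tag are made up for illustration.
payload = {
    "type": "final_result",
    "latency_ms": 120.0,  # Python float
    "audio_ms": 3500.0,   # Python float
    "timestamp": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}

# Assumption: these are the fields the Rust side declares as u64/u32.
INT_FIELDS = {"latency_ms", "audio_ms"}


def normalize(message: dict) -> dict:
    """Coerce values into the wire types the Rust structs are said to expect."""
    out = {}
    for key, value in message.items():
        if isinstance(value, datetime):
            # isoformat() on an aware datetime yields an RFC 3339 string.
            value = value.astimezone(timezone.utc).isoformat()
        elif key in INT_FIELDS:
            value = int(round(value))
        out[key] = value
    return out


# Without normalization a float is written as 120.0 rather than 120,
# and json.dumps cannot serialize a datetime at all.
float_wire = json.dumps({"latency_ms": payload["latency_ms"]})
wire = json.dumps(normalize(payload))
```

Here float_wire carries 120.0 instead of 120. A strict integer deserializer on the Rust side (for example serde_json targeting a u64 field, if that is what the client uses) would be expected to reject it, which is the kind of mismatch a single schema could surface at build time rather than at runtime.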

DRY Violation

The knowledge of "the protocol contract" has no single authoritative representation. Any protocol change requires manual updates to both files, risking:

  • Type mismatches causing runtime errors
  • Field name inconsistencies
  • Inconsistent validation logic
  • Drift between implementations over time

Proposed Fix

Adopt a schema-first approach with code generation:

Option A: Protocol Buffers (protobuf)

message StartSession {
  string session_id = 1;
  string timestamp = 2;  // RFC3339
  string mode = 3;
  string preferred_lang = 4;
}

message FinalResult {
  string session_id = 1;
  string text = 2;
  uint64 latency_ms = 3;
  uint32 audio_ms = 4;
  string lang = 5;
  float confidence = 6;
}
// ... (all messages)

Option B: JSON Schema with Type Definitions

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Parakeet WebSocket Protocol",
  "definitions": {
    "StartSession": {
      "type": "object",
      "properties": {
        "type": {"enum": ["start_session"]},
        "session_id": {"type": "string", "format": "uuid"},
        "timestamp": {"type": "string", "format": "date-time"}
      }
    }
  }
}
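
As a rough illustration of what "code generation" from this schema could mean on the Python side, the sketch below turns the StartSession definition into dataclass source and executes it. The render_dataclass function and the PY_TYPES mapping are hypothetical; an existing generator would normally replace them, and the "format" hints (uuid, date-time) are ignored here.

```python
import json

# The StartSession definition from Option B, embedded so the sketch is self-contained.
SCHEMA = json.loads("""
{
  "definitions": {
    "StartSession": {
      "type": "object",
      "properties": {
        "type": {"enum": ["start_session"]},
        "session_id": {"type": "string", "format": "uuid"},
        "timestamp": {"type": "string", "format": "date-time"}
      }
    }
  }
}
""")

# Illustrative mapping from JSON Schema primitive types to Python annotations.
PY_TYPES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def render_dataclass(name: str, definition: dict) -> str:
    """Return Python source for a frozen dataclass built from one definition."""
    plain, tagged = [], []
    for prop, spec in definition["properties"].items():
        if "enum" in spec:
            # A single-value enum is treated as the message discriminator.
            tag = spec["enum"][0]
            tagged.append(f"    {prop}: Literal[{tag!r}] = {tag!r}")
        else:
            plain.append(f"    {prop}: {PY_TYPES[spec['type']]}")
    # Fields with defaults must follow fields without them.
    return "\n".join(["@dataclass(frozen=True)", f"class {name}:", *plain, *tagged])


header = "from dataclasses import dataclass\nfrom typing import Literal\n\n"
source = header + "\n\n".join(
    render_dataclass(name, definition)
    for name, definition in SCHEMA["definitions"].items()
)

# Execute the generated source to show that it is valid Python.
namespace: dict = {"__name__": "generated_protocol"}
exec(source, namespace)
StartSession = namespace["StartSession"]
message = StartSession(
    session_id="123e4567-e89b-12d3-a456-426614174000",
    timestamp="2024-01-02T03:04:05Z",
)
```

In a real build the generated source would be written to a file and checked in or regenerated on each build; which of those fits better is part of the build system question listed under Tasks.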

Option C: TypeScript Interface (for documentation + validation)

interface StartSession {
  type: "start_session";
  session_id: string;  // UUID
  timestamp: string;  // ISO 8601
  mode: "push_to_talk" | "continuous";
  preferred_lang?: string;
}
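
Since neither the daemon nor the client is written in TypeScript, Option C would need a translation step into each language. A rough sketch of how the interface above might map onto Python follows; the parse_start_session helper and its error messages are hypothetical. Literal is only enforced by static type checkers, so the runtime checks have to be written separately, which is one instance of the "type systems differ" concern.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args
from uuid import UUID

Mode = Literal["push_to_talk", "continuous"]


@dataclass(frozen=True)
class StartSession:
    session_id: UUID
    timestamp: str  # ISO 8601, kept as a string as in the interface
    mode: Mode
    preferred_lang: Optional[str] = None  # mirrors `preferred_lang?`
    type: Literal["start_session"] = "start_session"


def parse_start_session(raw: dict) -> StartSession:
    """Validate a decoded JSON object and build a StartSession from it."""
    if raw.get("type") != "start_session":
        raise ValueError(f"unexpected message type: {raw.get('type')!r}")
    if raw.get("mode") not in get_args(Mode):
        raise ValueError(f"unknown mode: {raw.get('mode')!r}")
    return StartSession(
        session_id=UUID(raw["session_id"]),  # raises ValueError on malformed UUIDs
        timestamp=raw["timestamp"],
        mode=raw["mode"],
        preferred_lang=raw.get("preferred_lang"),
    )


ok = parse_start_session({
    "type": "start_session",
    "session_id": "123e4567-e89b-12d3-a456-426614174000",
    "timestamp": "2024-01-02T03:04:05Z",
    "mode": "push_to_talk",
})
```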

Tasks

This is a significant architectural change. Tasks include:

  1. Research & Decision:

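    • Evaluate code generation tools for Rust (e.g., prost for protobuf; tonic only matters if gRPC is adopted; serde-codegen has been superseded by serde_derive)
    • Evaluate code generation tools for Python (e.g., protoc with mypy-protobuf stubs; dataclasses or pydantic as target model types)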
    • Decide on schema format (protobuf, JSON Schema, TypeScript, custom)
    • Consider build system integration
  2. Implementation:

    • Create schema file in a shared top-level directory (e.g., protocol/parakeet.proto)
    • Set up code generation in Python build
    • Set up code generation in Rust build
    • Generate message types for Python
    • Generate message types for Rust
    • Migrate Python daemon to use generated types
    • Migrate Rust client to use generated types
    • Remove hand-written message definitions
  3. Validation & Testing:

    • Ensure backward compatibility with existing protocol
    • Run full integration tests
    • Verify message serialization/deserialization works
    • Update documentation to reference schema file
  4. Documentation:

    • Update docs/SPEC.md to reference schema
    • Document how to add new messages
    • Document code generation process
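
The backward-compatibility and serialization checks under Validation & Testing could take the form of a golden-message round-trip test. The sketch below is one possible shape, under several assumptions: the FinalResult dataclass stands in for a generated type, the "final_result" discriminator value is guessed by analogy with "start_session", and the GOLDEN fixture is invented rather than captured from the current daemon.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FinalResult:
    """Stand-in for a generated type; fields follow the FinalResult example."""
    session_id: str
    text: str
    latency_ms: int
    audio_ms: int
    lang: str
    confidence: float
    type: str = "final_result"  # assumed discriminator value


# Hypothetical fixture: in practice this would be captured from the current daemon.
GOLDEN = json.dumps({
    "type": "final_result",
    "session_id": "123e4567-e89b-12d3-a456-426614174000",
    "text": "hello world",
    "latency_ms": 120,
    "audio_ms": 3500,
    "lang": "en",
    "confidence": 0.5,
})


def round_trip(wire: str) -> dict:
    """Decode into the typed message, re-encode, and return the decoded result."""
    decoded = FinalResult(**json.loads(wire))
    return json.loads(json.dumps(asdict(decoded)))


def test_final_result_round_trip() -> None:
    # The regenerated types must reproduce the existing wire format exactly.
    assert round_trip(GOLDEN) == json.loads(GOLDEN)


test_final_result_round_trip()
```

Running the same fixtures through the Rust client's deserializer would give both implementations a shared set of wire-level examples, independent of which schema format is chosen.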

Tradeoffs

| Schema Format | Pros | Cons | Effort |
|---|---|---|---|
| Protocol Buffers | Industry standard, efficient binary, codegen mature | Learning curve, build complexity, less human-readable | High |
| JSON Schema | Human-readable, Python/Rust both have good support | No binary format, less performant | Medium |
| TypeScript | Great documentation, readable | No native codegen, type systems differ | Medium-High |
| Custom YAML/JSON | Maximum flexibility, simple | No tooling, must write own codegen | Very High |

Questions for Consideration

  1. Should we support both JSON and binary serialization formats?
  2. Should the schema include versioning for future protocol evolution?
  3. Should error codes be defined in the same schema file?
  4. How do we handle optional vs required fields in the schema?

AI-Generated Disclaimer: This issue was generated by AI analysis and may contain errors. Please thoroughly verify the findings before implementing any changes. Review the code directly and test any modifications. This is a significant architectural change that requires careful planning and testing.
