design(protocol): schema-first WebSocket protocol to eliminate cross-language duplication

## Issue
WebSocket message schemas are duplicated between the Python daemon and the Rust client, creating a maintenance burden and a risk of protocol drift.

## Problem

The WebSocket protocol between `parakeet-stt-daemon` (Python) and `parakeet-ptt` (Rust) is defined independently in both codebases:

**Python**: `parakeet-stt-daemon/src/parakeet_stt_daemon/messages.py` (lines 44-191)
- `ClientMessage`: StartSession, StopSession, AbortSession
- `ServerMessage`: SessionStarted, FinalResult, Error, Status, InterimState, InterimText, AudioLevel, SessionEnded

**Rust**: `parakeet-ptt/src/protocol.rs` (lines 9-86)
- Same message structures with similar field names

**Key Differences**:
- Python uses `datetime` for timestamps, Rust uses `String` (RFC3339)
- Python uses `UUID` type, Rust uses `uuid::Uuid`
- Python uses `float` for numeric fields, Rust uses `u64`/`u32`
- Python uses `Literal` types for validation, Rust uses serde string enums
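
To make the timestamp, UUID, and numeric differences concrete, here is a minimal standard-library sketch. The field names follow the `FinalResult` example in this issue; the helper `to_wire` and the sample values are hypothetical and are not taken from `messages.py`.

```python
import json
import uuid
from datetime import datetime, timezone


def to_wire(message: dict) -> str:
    """Serialize a message dict to JSON, converting rich Python types to strings."""

    def default(value):
        if isinstance(value, datetime):
            # For timezone-aware values this produces an RFC 3339-compatible string.
            return value.isoformat()
        if isinstance(value, uuid.UUID):
            return str(value)
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(message, default=default)


session_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
timestamp = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# A float latency is written as 12.0, which an integer-typed reader may not accept.
as_float = to_wire({"session_id": session_id, "timestamp": timestamp, "latency_ms": 12.0})
# Converting to int before serialization writes 12 instead.
as_int = to_wire({"session_id": session_id, "timestamp": timestamp, "latency_ms": int(12.0)})

print(as_float)
print(as_int)
```

Nothing in either codebase forces the two sides to agree on `12.0` versus `12`; today that agreement lives only in the hand-written definitions.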

## DRY Violation

The knowledge of "the protocol contract" has no single authoritative representation. Any protocol change requires manual updates to both files, risking:
- Type mismatches causing runtime errors
- Field name inconsistencies
- Inconsistent validation logic
- Drift between implementations over time

## Proposed Fix

Adopt a schema-first approach with code generation:

### Option A: Protocol Buffers (protobuf)
```protobuf
syntax = "proto3";

message StartSession {
  string session_id = 1;
  string timestamp = 2;  // RFC3339
  string mode = 3;
  string preferred_lang = 4;
}

message FinalResult {
  string session_id = 1;
  string text = 2;
  uint64 latency_ms = 3;
  uint32 audio_ms = 4;
  string lang = 5;
  float confidence = 6;
}
// ... (all messages)
```

### Option B: JSON Schema with Type Definitions
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Parakeet WebSocket Protocol",
  "definitions": {
    "StartSession": {
      "type": "object",
      "properties": {
        "type": {"enum": ["start_session"]},
        "session_id": {"type": "string", "format": "uuid"},
        "timestamp": {"type": "string", "format": "date-time"}
      }
    }
  }
}
```
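
As a rough illustration of what this schema would enforce at runtime, the sketch below hand-checks the three `StartSession` properties using only the Python standard library. It is a stand-in, not a JSON Schema implementation: a real setup would more likely hand the schema file to a validator library. The function name `validate_start_session` is hypothetical, and treating all three fields as required is an assumption, since the schema above does not yet declare a `required` list.

```python
import uuid
from datetime import datetime


def validate_start_session(message: dict) -> list[str]:
    """Return a list of problems; an empty list means the message passed these checks."""
    errors = []
    if message.get("type") != "start_session":
        errors.append("type must be 'start_session'")
    try:
        # uuid.UUID is a loose check: it also accepts a few non-canonical spellings.
        uuid.UUID(str(message.get("session_id")))
    except ValueError:
        errors.append("session_id must be a UUID string")
    try:
        # fromisoformat() rejects a trailing 'Z' before Python 3.11, so normalize it.
        datetime.fromisoformat(str(message.get("timestamp")).replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp must be a date-time string")
    return errors


good = {
    "type": "start_session",
    "session_id": "12345678-1234-5678-1234-567812345678",
    "timestamp": "2024-01-01T12:00:00Z",
}
bad = {"type": "stop_session", "session_id": "nope", "timestamp": "yesterday"}

print(validate_start_session(good))
print(validate_start_session(bad))
```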

### Option C: TypeScript Interface (for documentation + validation)
```typescript
interface StartSession {
  type: "start_session";
  session_id: string;  // UUID
  timestamp: string;  // RFC 3339
  mode: "push_to_talk" | "continuous";
  preferred_lang?: string;
}
```
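
Whichever format is chosen, the common idea is that one definition drives both languages. The toy generator below sketches that idea under stated assumptions: the `SCHEMA` dict format, the abstract type names, and the two type mappings are all invented for illustration, and the emitted code is much simpler than what `prost` or a JSON Schema generator would produce.

```python
# Hypothetical schema format: message name -> ordered {field name: abstract type}.
SCHEMA = {
    "FinalResult": {
        "session_id": "uuid",
        "text": "string",
        "latency_ms": "u64",
        "audio_ms": "u32",
    }
}

# Per-language spellings of each abstract type (assumed mapping, for illustration only).
PYTHON_TYPES = {"uuid": "UUID", "string": "str", "u64": "int", "u32": "int"}
RUST_TYPES = {"uuid": "Uuid", "string": "String", "u64": "u64", "u32": "u32"}


def emit_python(name: str, fields: dict) -> str:
    """Render a Python dataclass definition as source text."""
    lines = ["@dataclass", f"class {name}:"]
    lines += [f"    {field}: {PYTHON_TYPES[kind]}" for field, kind in fields.items()]
    return "\n".join(lines)


def emit_rust(name: str, fields: dict) -> str:
    """Render a Rust struct with serde derives as source text."""
    lines = ["#[derive(Serialize, Deserialize)]", f"pub struct {name} {{"]
    lines += [f"    pub {field}: {RUST_TYPES[kind]}," for field, kind in fields.items()]
    lines.append("}")
    return "\n".join(lines)


for name, fields in SCHEMA.items():
    print(emit_python(name, fields))
    print(emit_rust(name, fields))
```

Because `latency_ms` is declared once as `u64`, the Python side gets `int` and the Rust side gets `u64`, so the two sides cannot drift apart on that field's type.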

## Tasks

This is a significant architectural change. Tasks include:

1. **Research & Decision**:
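   - [ ] Evaluate code generation tools for Rust (`prost`/`prost-build` for protobuf, `typify` for JSON Schema, etc.; `tonic` targets gRPC rather than WebSocket)
   - [ ] Evaluate code generation tools for Python (`protoc` with `mypy-protobuf` for protobuf; `datamodel-code-generator` for pydantic models or dataclasses from JSON Schema)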
   - [ ] Decide on schema format (protobuf, JSON Schema, TypeScript, custom)
   - [ ] Consider build system integration

2. **Implementation**:
   - [ ] Create the schema file in a shared top-level directory (e.g., `protocol/parakeet.proto` if protobuf is chosen)
   - [ ] Set up code generation in Python build
   - [ ] Set up code generation in Rust build
   - [ ] Generate message types for Python
   - [ ] Generate message types for Rust
   - [ ] Migrate Python daemon to use generated types
   - [ ] Migrate Rust client to use generated types
   - [ ] Remove hand-written message definitions

3. **Validation & Testing**:
   - [ ] Ensure backward compatibility with existing protocol
   - [ ] Run full integration tests
   - [ ] Verify message serialization/deserialization works
   - [ ] Update documentation to reference schema file

4. **Documentation**:
   - [ ] Update `docs/SPEC.md` to reference schema
   - [ ] Document how to add new messages
   - [ ] Document code generation process
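
One way to approach the backward-compatibility and serialization tasks above is a golden-fixture test: capture messages from the existing protocol and assert that the generated types decode and re-encode them unchanged. The sketch below assumes a hand-written `StartSession` dataclass as a stand-in for a generated type, and a made-up fixture string; in practice the fixtures might live in a shared directory that the Rust tests read as well.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StartSession:
    """Stand-in for a generated type; the real one would come from the schema."""

    type: str
    session_id: str
    timestamp: str


# Hypothetical golden fixture representing a message from the existing protocol.
GOLDEN_START_SESSION = (
    '{"type": "start_session", '
    '"session_id": "12345678-1234-5678-1234-567812345678", '
    '"timestamp": "2024-01-01T12:00:00Z"}'
)


def round_trips(raw: str) -> bool:
    """Decode a fixture into the typed message, re-encode it, and compare as JSON values."""
    original = json.loads(raw)
    # Constructing the dataclass raises TypeError on unknown or missing fields.
    message = StartSession(**original)
    return asdict(message) == original


print(round_trips(GOLDEN_START_SESSION))
```

A fixture with an unexpected or missing field fails loudly at construction time, which is the kind of drift this issue is trying to catch before it reaches runtime.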

## Tradeoffs

| Schema Format | Pros | Cons | Effort |
|---------------|------|------|--------|
| **Protocol Buffers** | Industry standard, efficient binary, codegen mature | Learning curve, build complexity, less human-readable | High |
| **JSON Schema** | Human-readable, Python/Rust both have good support | No binary format, less performant | Medium |
| **TypeScript** | Great documentation, readable | No native codegen, type systems differ | Medium-High |
| **Custom YAML/JSON** | Maximum flexibility, simple | No tooling, must write own codegen | Very High |

## Related Issues

- #29: Session state consolidation (should use same schema)
- #30: Network config consolidation (config may influence endpoint schema)

## Questions for Consideration

1. Should we support both JSON and binary serialization formats?
2. Should the schema include versioning for future protocol evolution?
3. Should error codes be defined in the same schema file?
4. How do we handle optional vs required fields in the schema?

---

**AI-Generated Disclaimer**: This issue was generated by AI analysis and may contain errors. Please thoroughly verify the findings before implementing any changes. Review the code directly and test any modifications. This is a significant architectural change that requires careful planning and testing.
