perf: attestation optimizations + persistent GPU worker + e2e benchmarks#70
Conversation
…ilities

- aws-lc-rs 1.15.4 → 1.16.2 (pulls aws-lc-sys 0.37.0 → 0.39.0)
  Fixes RUSTSEC-2026-0044 through 0048 (X.509 bypass, PKCS7 bypasses, AES-CCM timing, CRL scope check)
- rustls-webpki 0.103.9 → 0.103.10
  Fixes RUSTSEC-2026-0049 (CRL Distribution Point matching bypass)
Summary of Changes

This pull request addresses critical security vulnerabilities by upgrading key cryptographic and TLS-related dependencies. The updates ensure the application benefits from the latest security patches, maintaining the integrity of certificate validation and cryptographic operations.
Code Review
This pull request updates aws-lc-rs and rustls-webpki dependencies to address several security vulnerabilities. The changes are confined to Cargo.lock, reflecting the updated versions of these transitive dependencies. The changes are consistent with the stated goal of resolving the listed RUSTSEC advisories. No issues were found in this update.
Attestation generation optimization:

- Parallelize TDX quote + GPU evidence collection with tokio::try_join! (previously sequential — these are independent: TDX talks to the dstack socket, GPU evidence spawns a Python subprocess)
- Cache dstack info() with OnceCell (static data that never changes during the process lifetime — was re-fetched on every attestation)

New benchmark suite (benches/e2e.rs):

- End-to-end JSON completion flow (1/5/20 messages)
- End-to-end streaming completion flow (5/20/50 chunks)
- Attestation cache operations (hit/miss/set/semaphore)
- Attestation response serialization
- Request body processing pipeline (SHA256, JSON parse, reserialize)
- Response signing full pipeline (parse → hash → sign → cache)
- Streaming SSE parse+hash pipeline
- Auth token constant-time comparison
- JSON body round-trip (parse, modify, reserialize)
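The proxy itself is Rust (tokio::try_join!), but the pattern is language-agnostic. A minimal asyncio sketch of the same idea — run the two independent collection steps concurrently and propagate the first failure — with hypothetical stand-in functions (the real calls talk to the dstack socket and the GPU worker):

```python
import asyncio

async def fetch_tdx_quote() -> dict:
    # hypothetical stand-in: in the real proxy this talks to the dstack socket
    await asyncio.sleep(0.01)
    return {"quote": "tdx"}

async def collect_gpu_evidence() -> dict:
    # hypothetical stand-in: in the real proxy this drives the Python GPU worker
    await asyncio.sleep(0.01)
    return {"evidence": ["gpu0"]}

async def generate_attestation() -> dict:
    # gather() with the default return_exceptions=False raises the first
    # exception it sees, which is the behavior tokio::try_join! gives the
    # Rust side; both steps run concurrently instead of back-to-back.
    quote, gpu = await asyncio.gather(fetch_tdx_quote(), collect_gpu_evidence())
    return {**quote, **gpu}
```

Because the two steps overlap, the attestation latency approaches max(tdx, gpu) rather than their sum.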
Replace subprocess-per-call with a long-running Python process that keeps
the interpreter, verifier module imports, and NVML driver initialized
across requests. Communication via JSON lines over stdin/stdout pipes.
Before: each attestation spawns python3, imports verifier/cc_admin,
calls nvmlInit(), collects evidence, exits. ~0.5-2s overhead per call
just from Python startup + module loading + NVML initialization.
After: worker spawns once, stays alive, processes nonce requests via
pipe. Only the actual GPU evidence collection time remains (~1-5s
depending on GPU load). Python startup + import + nvmlInit amortized
to zero after first call.
Design:
- GpuEvidenceWorker struct manages the child process lifecycle
- Worker sends {"ready": true} on stdout after initialization
- Requests: {"nonce": "<hex>", "no_gpu_mode": bool}
- Responses: {"ok": true, "evidence": [...]} or {"ok": false, "error": "..."}
- Auto-restart on worker death with one retry
- Falls back to subprocess-per-call if worker can't spawn
- All access serialized by existing gpu_semaphore (NVML constraint)
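The protocol bullets above can be sketched as a minimal worker loop. This is an illustration under the stated protocol, not the real gpu_evidence_worker.py (which also initializes NVML and redirects stdout); collect_evidence is a hypothetical stand-in for the NVIDIA verifier call:

```python
import json
import sys

def collect_evidence(nonce: str, no_gpu_mode: bool) -> list:
    # hypothetical stand-in for the real verifier/cc_admin evidence collection
    return [{"nonce": nonce, "no_gpu_mode": no_gpu_mode}]

def serve(inp, out):
    """One JSON object per line in, one per line out."""
    # signal readiness once initialization is done
    out.write(json.dumps({"ready": True}) + "\n")
    out.flush()
    for line in inp:
        req = json.loads(line)
        try:
            evidence = collect_evidence(req["nonce"], req.get("no_gpu_mode", False))
            resp = {"ok": True, "evidence": evidence}
        except Exception as exc:  # report errors in-band, never crash the loop
            resp = {"ok": False, "error": str(exc)}
        out.write(json.dumps(resp) + "\n")
        out.flush()

if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

Keeping the loop alive is what amortizes Python startup, module imports, and nvmlInit to zero after the first call.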
Async benchmark for attestation and completion endpoints with configurable concurrency and duration.

Tests:
- Attestation cached (no nonce) — measures cache hit path
- Attestation fresh (with nonce) — forces GPU evidence + TDX quote
- Chat completion (non-streaming and streaming)

Reports p50/p90/p99 latencies, throughput, error rates.

Usage: uv run scripts/bench_live.py <endpoint> -c 20 -d 60
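The p50/p90/p99 figures come from the per-request latency samples; one common way to compute them is the nearest-rank percentile, sketched below (the exact method bench_live.py uses is not shown in this PR):

```python
import math

def percentile(latencies: list, p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Nearest-rank always returns an observed sample (no interpolation), which keeps tail percentiles honest for skewed latency distributions.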
…ption

The NVIDIA verifier library (cc_admin) prints info messages directly to stdout (e.g. "Number of GPUs available : 8"), corrupting the JSON line protocol. Fix by dup'ing the real stdout fd at startup, redirecting sys.stdout to stderr, and using the saved fd for protocol messages.

Also fix fallback: when the worker spawns but evidence collection fails on both attempts, it now falls back to subprocess instead of returning an error.
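A minimal sketch of that fd dance, assuming it runs once at worker startup (helper names are hypothetical; this is not the actual gpu_evidence_worker.py code):

```python
import json
import os
import sys

def open_protocol_channel():
    """Reserve the real stdout for protocol messages.

    Duplicates the stdout fd into a private line-buffered writer, then
    points sys.stdout at stderr so any library print() (e.g. cc_admin's
    "Number of GPUs available : 8") lands on stderr instead of
    corrupting the JSON line stream.
    """
    saved_fd = os.dup(sys.stdout.fileno())         # private copy of real stdout
    proto = os.fdopen(saved_fd, "w", buffering=1)  # line-buffered protocol writer
    sys.stdout = sys.stderr                        # stray prints go to stderr now
    return proto

def send(proto, obj):
    # one JSON object per line on the saved fd
    proto.write(json.dumps(obj) + "\n")
    proto.flush()
```

After this, only send() ever writes to the original stdout, so the Rust side can parse every line it reads as JSON.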
…ialization
Three improvements for attestation latency:
1. Cache pre-serialized JSON bytes instead of structs
- Cache hit now returns bytes::Bytes directly (zero-copy clone)
- Eliminates report.clone() + AttestationResponse construction +
serde_json::to_value + Json() serialization on every cached request
- 297KB response was being fully re-serialized on every hit
2. Remove wide semaphore — use worker Mutex only
- Previously: semaphore wrapped entire generate_attestation_inner()
(TDX quote + GPU evidence + dstack info)
- Now: only GPU evidence is serialized (via worker Mutex)
- Concurrent fresh attestation requests can overlap TDX quotes
3. Return AttestationResult enum from generate_attestation
- CachedBytes: pre-serialized bytes sent directly to client
- Fresh: report that needs one-time serialization
- Route handler branches on variant, avoids redundant work
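The bytes-caching idea in item 1 is language-agnostic; a minimal Python sketch of a single-slot cache that serializes once at insert time (the real code caches bytes::Bytes in Rust, where clone() is a cheap refcount bump, and is not shown here):

```python
import json

class AttestationCache:
    """Single-slot cache holding the response pre-serialized as bytes.

    set() pays the serialization cost once; every hit returns the same
    immutable bytes object, so a ~297KB report is never re-serialized
    per request.
    """

    def __init__(self):
        self._bytes = None

    def set(self, report: dict) -> None:
        self._bytes = json.dumps(report).encode("utf-8")

    def get(self):
        # None on a miss; otherwise the pre-serialized response bytes
        return self._bytes
```

A cache hit is now a pointer handoff instead of a struct clone plus a full JSON serialization.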
Summary
Attestation performance overhaul + security dependency updates. Tested on live endpoints (gpu07, gpu23, gpu11).
Security fixes
Attestation optimizations
- tokio::try_join! instead of sequential
- OnceCell, static data fetched once per process
- bytes::Bytes directly on cache hit, no clone/serialize of 297KB struct

Live benchmark results (gpu07, concurrency=20, 60s)
Cached attestation bytes optimization not yet benchmarked on live (this PR).
New files
- gpu_evidence_worker.py — persistent Python worker (JSON line protocol over stdin/stdout)
- benches/e2e.rs — end-to-end benchmarks (attestation, completions, proxy overhead)
- scripts/bench_live.py — live endpoint benchmark tool (uv run scripts/bench_live.py <url>)

Worker design
- sys.stdout redirected to stderr to prevent NVIDIA verifier library prints from corrupting the JSON protocol

Test plan