
perf: attestation optimizations + persistent GPU worker + e2e benchmarks #70

Merged
Evrard-Nil merged 7 commits into main from fix/security-audit-aws-lc-rustls-webpki on Mar 24, 2026

Conversation

@Evrard-Nil (Contributor) commented Mar 23, 2026

Summary

Attestation performance overhaul + security dependency updates. Tested on live endpoints (gpu07, gpu23, gpu11).

Security fixes

  • aws-lc-sys 0.37.0 → 0.39.0 (RUSTSEC-2026-0044 through 0048)
  • rustls-webpki 0.103.9 → 0.103.10 (RUSTSEC-2026-0049)

Attestation optimizations

| Optimization | Expected savings |
| --- | --- |
| Persistent Python worker — keeps interpreter, verifier imports, and NVML initialized across requests instead of spawning python3 per call | ~0.5-2s/call |
| Parallelize TDX quote + GPU evidence — tokio::try_join! instead of sequential | ~0.3-1s/call |
| Cache dstack info — OnceCell, static data fetched once per process | ~0.1-0.3s/call |
| Cache pre-serialized bytes — return bytes::Bytes directly on cache hit, no clone/serialize of 297KB struct | ~1.5s at c=20 |
| Narrow GPU serialization — only GPU evidence uses Mutex, TDX quotes run concurrently across requests | Reduced queuing |
Live benchmark results (gpu07, concurrency=20, 60s)

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Fresh attestation p50 | 17,668ms | 7,035ms | -60% |
| Fresh attestation p90 | 17,932ms | 7,637ms | -57% |
| Fresh attestation throughput | 67 req/60s | 144 req/60s | +115% |

The cached attestation bytes optimization has not yet been benchmarked on live endpoints as of this PR.

New files

  • gpu_evidence_worker.py — persistent Python worker (JSON line protocol over stdin/stdout)
  • benches/e2e.rs — end-to-end benchmarks (attestation, completions, proxy overhead)
  • scripts/bench_live.py — live endpoint benchmark tool (uv run scripts/bench_live.py <url>)

Worker design

  • Spawns once, stays alive, processes nonce requests via stdin/stdout pipes
  • sys.stdout redirected to stderr to prevent NVIDIA verifier library prints from corrupting JSON protocol
  • Auto-restart on worker death, falls back to subprocess-per-call if worker can't spawn

Test plan

  • 94 tests pass
  • Clippy clean
  • CI passes (all 5 checks)
  • Worker protocol tested locally
  • Deployed and benchmarked on gpu07, gpu23, gpu11
  • Fresh attestation p50 dropped 60% on gpu07
  • Verify cached bytes path after deployment (expect <100ms at c=20)

…ilities

- aws-lc-rs 1.15.4 → 1.16.2 (pulls aws-lc-sys 0.37.0 → 0.39.0)
  Fixes RUSTSEC-2026-0044 through 0048 (X.509 bypass, PKCS7 bypasses,
  AES-CCM timing, CRL scope check)
- rustls-webpki 0.103.9 → 0.103.10
  Fixes RUSTSEC-2026-0049 (CRL Distribution Point matching bypass)
@gemini-code-assist commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical security vulnerabilities by upgrading key cryptographic and TLS-related dependencies. The updates ensure the application benefits from the latest security patches, preventing potential exploits and maintaining the integrity of certificate validation and cryptographic operations. This change is crucial for passing security audits and enhancing the overall robustness of the system.

Highlights

  • Dependency Updates: Updated aws-lc-rs from 1.15.4 to 1.16.2, which in turn updates aws-lc-sys from 0.37.0 to 0.39.0. Also updated rustls-webpki from 0.103.9 to 0.103.10.
  • Security Vulnerability Fixes: Resolved multiple security vulnerabilities including X.509 Name Constraints Bypass, Timing Side-Channel in AES-CCM, PKCS7 certificate and signature validation bypasses, and CRL Distribution Point logic errors, as well as a CRL Distribution Point matching bypass.

@gemini-code-assist bot left a comment
Code Review

This pull request updates aws-lc-rs and rustls-webpki dependencies to address several security vulnerabilities. The changes are confined to Cargo.lock, reflecting the updated versions of these transitive dependencies. The changes are consistent with the stated goal of resolving the listed RUSTSEC advisories. No issues were found in this update.

Attestation generation optimization:
- Parallelize TDX quote + GPU evidence collection with tokio::try_join!
  (previously sequential — these are independent: TDX talks to dstack
  socket, GPU evidence spawns a Python subprocess)
- Cache dstack info() with OnceCell (static data, never changes during
  process lifetime — was re-fetched on every attestation)

New benchmark suite (benches/e2e.rs):
- End-to-end JSON completion flow (1/5/20 messages)
- End-to-end streaming completion flow (5/20/50 chunks)
- Attestation cache operations (hit/miss/set/semaphore)
- Attestation response serialization
- Request body processing pipeline (SHA256, JSON parse, reserialize)
- Response signing full pipeline (parse → hash → sign → cache)
- Streaming SSE parse+hash pipeline
- Auth token constant-time comparison
- JSON body round-trip (parse, modify, reserialize)

Replace subprocess-per-call with a long-running Python process that keeps
the interpreter, verifier module imports, and NVML driver initialized
across requests. Communication via JSON lines over stdin/stdout pipes.

Before: each attestation spawns python3, imports verifier/cc_admin,
calls nvmlInit(), collects evidence, exits. ~0.5-2s overhead per call
just from Python startup + module loading + NVML initialization.

After: worker spawns once, stays alive, processes nonce requests via
pipe. Only the actual GPU evidence collection time remains (~1-5s
depending on GPU load). Python startup + import + nvmlInit amortized
to zero after first call.

Design:
- GpuEvidenceWorker struct manages the child process lifecycle
- Worker sends {"ready": true} on stdout after initialization
- Requests: {"nonce": "<hex>", "no_gpu_mode": bool}
- Responses: {"ok": true, "evidence": [...]} or {"ok": false, "error": "..."}
- Auto-restart on worker death with one retry
- Falls back to subprocess-per-call if worker can't spawn
- All access serialized by existing gpu_semaphore (NVML constraint)
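The request/response protocol described above can be sketched as follows; `collect` is a hypothetical callable standing in for the actual GPU evidence collection:

```python
import json
import sys

def handle_line(line: str, collect) -> str:
    """Turn one JSON-line request into a JSON-line response.

    Requests look like {"nonce": "<hex>", "no_gpu_mode": bool}; responses
    are {"ok": true, "evidence": [...]} or {"ok": false, "error": "..."}.
    """
    try:
        req = json.loads(line)
        evidence = collect(req["nonce"], req.get("no_gpu_mode", False))
        return json.dumps({"ok": True, "evidence": evidence})
    except Exception as exc:
        # Report failures in-band so the worker loop stays alive.
        return json.dumps({"ok": False, "error": str(exc)})

def main_loop(collect):
    # Announce readiness, then serve requests until stdin closes.
    print(json.dumps({"ready": True}), flush=True)
    for line in sys.stdin:
        print(handle_line(line, collect), flush=True)
```

Keeping errors in-band (rather than raising and exiting) is what lets the Rust side distinguish "this request failed" from "the worker died and needs a restart".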
@Evrard-Nil changed the title from "fix: update aws-lc-sys and rustls-webpki to resolve security vulnerabilities" to "perf: attestation optimization + e2e benchmarks + persistent GPU worker" on Mar 23, 2026
Async benchmark for attestation and completion endpoints with
configurable concurrency and duration. Tests:
- Attestation cached (no nonce) — measures cache hit path
- Attestation fresh (with nonce) — forces GPU evidence + TDX quote
- Chat completion (non-streaming and streaming)

Reports p50/p90/p99 latencies, throughput, error rates.

Usage: uv run scripts/bench_live.py <endpoint> -c 20 -d 60
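The reported p50/p90/p99 and throughput numbers boil down to something like the following (a sketch using nearest-rank percentiles; function names are hypothetical, not necessarily what bench_live.py uses):

```python
def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of latency samples (in ms)."""
    s = sorted(samples)
    idx = min(len(s) - 1, max(0, round(p / 100 * (len(s) - 1))))
    return s[idx]

def summarize(latencies_ms: list, duration_s: float, errors: int = 0) -> dict:
    total = len(latencies_ms) + errors
    return {
        "p50": percentile(latencies_ms, 50),
        "p90": percentile(latencies_ms, 90),
        "p99": percentile(latencies_ms, 99),
        "throughput_rps": len(latencies_ms) / duration_s,
        "error_rate": errors / total if total else 0.0,
    }
```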
…ption

The NVIDIA verifier library (cc_admin) prints info messages directly to
stdout (e.g. "Number of GPUs available : 8"), corrupting the JSON line
protocol. Fix by dup'ing the real stdout fd at startup, redirecting
sys.stdout to stderr, and using the saved fd for protocol messages.

Also fix the fallback: when the worker spawns but evidence collection fails
on both attempts, it now falls back to subprocess-per-call instead of returning an error.
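The fd juggling described above looks roughly like this (a sketch, not the file's exact code):

```python
import os
import sys

def protect_protocol_stdout():
    """Keep a private handle on the real stdout for protocol messages,
    then point sys.stdout at stderr so library print() calls (e.g. the
    NVIDIA verifier's "Number of GPUs available" messages) cannot corrupt
    the JSON line protocol."""
    saved_fd = os.dup(sys.stdout.fileno())                # duplicate real stdout fd
    protocol_out = os.fdopen(saved_fd, "w", buffering=1)  # line-buffered writer
    sys.stdout = sys.stderr                               # stray prints -> stderr
    return protocol_out
```

Protocol responses are then written via the returned handle (e.g. `protocol_out.write(json.dumps(resp) + "\n")`), while anything imported libraries print lands on stderr.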
…ialization

Three improvements for attestation latency:

1. Cache pre-serialized JSON bytes instead of structs
   - Cache hit now returns bytes::Bytes directly (zero-copy clone)
   - Eliminates report.clone() + AttestationResponse construction +
     serde_json::to_value + Json() serialization on every cached request
   - 297KB response was being fully re-serialized on every hit

2. Remove wide semaphore — use worker Mutex only
   - Previously: semaphore wrapped entire generate_attestation_inner()
     (TDX quote + GPU evidence + dstack info)
   - Now: only GPU evidence is serialized (via worker Mutex)
   - Concurrent fresh attestation requests can overlap TDX quotes

3. Return AttestationResult enum from generate_attestation
   - CachedBytes: pre-serialized bytes sent directly to client
   - Fresh: report that needs one-time serialization
   - Route handler branches on variant, avoids redundant work
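A Python analogue of improvements 1 and 3 together (names are hypothetical; the Rust code uses an AttestationResult enum and bytes::Bytes, sketched here with dataclasses and plain bytes):

```python
import json
from dataclasses import dataclass

@dataclass
class CachedBytes:
    body: bytes   # pre-serialized JSON, sent to the client as-is

@dataclass
class Fresh:
    report: dict  # needs one serialization; its bytes are then cached

_cache = {"attestation": None}

def generate_attestation(build_report):
    """Return CachedBytes on a hit, Fresh otherwise (hypothetical shape)."""
    if _cache["attestation"] is not None:
        return CachedBytes(_cache["attestation"])  # no re-serialization
    return Fresh(build_report())

def respond(result) -> bytes:
    # Route handler branches on the variant.
    if isinstance(result, CachedBytes):
        return result.body
    body = json.dumps(result.report).encode()
    _cache["attestation"] = body  # cache the bytes, not the struct
    return body
```

Because bytes are immutable, a cache hit can hand out the same buffer to every client; the expensive serialize of the large report happens exactly once per cache fill.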
@Evrard-Nil changed the title from "perf: attestation optimization + e2e benchmarks + persistent GPU worker" to "perf: attestation optimizations + persistent GPU worker + e2e benchmarks" on Mar 24, 2026
@Evrard-Nil merged commit e6108d0 into main on Mar 24, 2026 (6 checks passed)
