BERDataLakehouse/spark_connect_proxy


spark-connect-proxy

Badges: Tests | codecov | Python 3.13

gRPC proxy for multi-user Spark Connect access via KBase authentication.

Routes PySpark Spark Connect gRPC traffic from a single endpoint to the correct user's notebook pod based on the KBase token in the request metadata.
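On the client side, the KBase token has to reach the proxy as `x-kbase-token` gRPC metadata. The sketch below assumes that PySpark's Spark Connect client forwards any `sc://` connection-string parameter it does not itself recognise (such as `use_ssl`) as a gRPC metadata header. The helper names `build_remote_url` and `connect` are hypothetical and not part of this repository.

```python
# Sketch: build a Spark Connect URL that carries the KBase token as gRPC metadata.
# Assumption: PySpark passes unrecognised ";key=value" URL parameters through as
# gRPC metadata, so "x-kbase-token=..." arrives at the proxy as a request header.

PROXY_HOST = "spark.berdl.kbase.us"  # public ingress host from the Architecture section
PROXY_PORT = 443                     # the ingress terminates TLS on 443


def build_remote_url(token: str, host: str = PROXY_HOST, port: int = PROXY_PORT,
                     use_ssl: bool = True) -> str:
    """Return an sc:// connection string with the token as an extra parameter."""
    # ';' and '=' would break the ;key=value parameter syntax, so reject them.
    if not token or ";" in token or "=" in token:
        raise ValueError("expected a non-empty KBase token without ';' or '='")
    params = ["use_ssl=true"] if use_ssl else []
    params.append(f"x-kbase-token={token}")
    return f"sc://{host}:{port}/;" + ";".join(params)


def connect(token: str):
    """Open a Spark Connect session through the proxy (needs pyspark with Connect support)."""
    from pyspark.sql import SparkSession  # lazy import: build_remote_url works without pyspark

    return SparkSession.builder.remote(build_remote_url(token)).getOrCreate()
```

For a local test against `uv run python -m spark_connect_proxy`, something like `build_remote_url(token, host="localhost", port=15002, use_ssl=False)` would skip TLS, assuming the local proxy listens without it.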

Architecture

User (PySpark) → Ingress (spark.berdl.kbase.us:443) → Spark Connect Proxy → jupyter-{username}:15002
  1. User sends gRPC request with x-kbase-token metadata
  2. Proxy validates token, resolves username
  3. Proxy forwards gRPC to jupyter-{username}.jupyterhub-prod.svc.cluster.local:15002
  4. Responses stream back to user transparently
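Step 2, combined with the `TOKEN_CACHE_TTL` setting, can be pictured as a small TTL cache in front of the auth lookup. This is an illustration, not the proxy's actual code: it assumes KBase Auth2 answers `GET {KBASE_AUTH_URL}api/V2/token` (token in the `Authorization` header) with JSON containing a `user` field, and `fetch_username` and `TokenCache` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Tuple

KBASE_AUTH_URL = "https://kbase.us/services/auth/"  # default from the Configuration table
TOKEN_CACHE_TTL = 300.0                             # seconds, default from the Configuration table


def fetch_username(token: str, auth_url: str = KBASE_AUTH_URL) -> str:
    """Ask KBase Auth2 who owns the token (assumed endpoint: GET api/V2/token)."""
    req = urllib.request.Request(auth_url.rstrip("/") + "/api/V2/token",
                                 headers={"Authorization": token})
    # An invalid token is expected to produce an HTTP error, which urlopen raises.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["user"]


class TokenCache:
    """Maps token -> username, re-validating once an entry is older than ttl seconds."""

    def __init__(self, lookup: Callable[[str], str] = fetch_username,
                 ttl: float = TOKEN_CACHE_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # token -> (username, expiry)

    def username_for(self, token: str) -> str:
        now = self._clock()
        hit = self._entries.get(token)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: skip the auth round trip
        username = self._lookup(token)  # raises on an invalid token, so nothing is stored
        self._entries[token] = (username, now + self._ttl)
        return username
```

Only successful lookups are cached, so a rejected token is checked again on every request, while a valid one costs at most one auth call per TTL window.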

Configuration

| Env Var | Description | Default |
|---|---|---|
| KBASE_AUTH_URL | KBase Auth2 service URL | https://kbase.us/services/auth/ |
| PROXY_LISTEN_PORT | Port the proxy listens on | 15002 |
| BACKEND_PORT | Spark Connect port on notebooks | 15002 |
| BACKEND_NAMESPACE | K8s namespace for notebooks | jupyterhub-prod |
| SERVICE_TEMPLATE | Backend service pattern | jupyter-{username}.{namespace}.svc.cluster.local |
| TOKEN_CACHE_TTL | Token cache TTL (seconds) | 300 |
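One possible way to load these settings and render the backend address used in step 3 of the Architecture section is sketched below. The `ProxyConfig` class and its method names are illustrative assumptions. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProxyConfig:
    """Hypothetical settings holder; defaults mirror the Configuration table."""
    kbase_auth_url: str = "https://kbase.us/services/auth/"
    proxy_listen_port: int = 15002
    backend_port: int = 15002
    backend_namespace: str = "jupyterhub-prod"
    service_template: str = "jupyter-{username}.{namespace}.svc.cluster.local"
    token_cache_ttl: int = 300

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ProxyConfig":
        """Read each variable from env, falling back to the table's default."""
        d = cls()
        return cls(
            kbase_auth_url=env.get("KBASE_AUTH_URL", d.kbase_auth_url),
            proxy_listen_port=int(env.get("PROXY_LISTEN_PORT", d.proxy_listen_port)),
            backend_port=int(env.get("BACKEND_PORT", d.backend_port)),
            backend_namespace=env.get("BACKEND_NAMESPACE", d.backend_namespace),
            service_template=env.get("SERVICE_TEMPLATE", d.service_template),
            token_cache_ttl=int(env.get("TOKEN_CACHE_TTL", d.token_cache_ttl)),
        )

    def backend_target(self, username: str) -> str:
        """Fill SERVICE_TEMPLATE and append BACKEND_PORT to get a host:port target."""
        host = self.service_template.format(username=username,
                                            namespace=self.backend_namespace)
        return f"{host}:{self.backend_port}"
```

With no variables set, `ProxyConfig.from_env({}).backend_target("alice")` yields `jupyter-alice.jupyterhub-prod.svc.cluster.local:15002`, matching the target in step 3.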

Development

# Install
uv sync --dev

# Run tests
uv run pytest

# Run locally
uv run python -m spark_connect_proxy
