Commit 079c8f8

chore: pre-release readiness (#313)
1 parent 2e5e806 commit 079c8f8

13 files changed: +506 -188 lines changed

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ openshell doctor exec -- kubectl -n kube-system logs -l job-name=helm-install-op
 Common issues:

 - **Replicas 0/0**: The StatefulSet has been scaled to zero — no pods are running. This can happen after a failed deploy, manual scale-down, or Helm values misconfiguration. Fix: `openshell doctor exec -- kubectl -n openshell scale statefulset openshell --replicas=1`
-- **ImagePullBackOff**: The component image failed to pull. In `internal` mode, verify internal registry readiness and pushed image tags (Step 5). In `external` mode, check `/etc/rancher/k3s/registries.yaml` credentials/endpoints and DNS (Step 8). Default external registry is `ghcr.io/nvidia/openshell/`. Ensure a valid `--registry-token` (or `OPENSHELL_REGISTRY_TOKEN`) was provided during deploy.
+- **ImagePullBackOff**: The component image failed to pull. In `internal` mode, verify internal registry readiness and pushed image tags (Step 5). In `external` mode, check `/etc/rancher/k3s/registries.yaml` credentials/endpoints and DNS (Step 8). Default external registry is `ghcr.io/nvidia/openshell/` (public, no auth required). If using a private registry, ensure `--registry-username` and `--registry-token` (or `OPENSHELL_REGISTRY_USERNAME`/`OPENSHELL_REGISTRY_TOKEN`) were provided during deploy.
 - **CrashLoopBackOff**: The server is crashing. Check pod logs for the actual error.
 - **Pending**: Insufficient resources or scheduling constraints.

README.md

Lines changed: 2 additions & 17 deletions
@@ -21,25 +21,10 @@ Want to run on cloud compute? [Launch on Brev](https://brev.nvidia.com/launchabl

 ### Install

-**Binary (recommended — requires [GitHub CLI](https://cli.github.com)):**
+**Binary (recommended):**

 ```bash
-sh -c 'ARCH=$(uname -m); OS=$(uname -s); \
-case "${OS}-${ARCH}" in \
-Linux-x86_64) ASSET="openshell-x86_64-unknown-linux-musl.tar.gz" ;; \
-Linux-aarch64) ASSET="openshell-aarch64-unknown-linux-musl.tar.gz" ;; \
-Darwin-arm64) ASSET="openshell-aarch64-apple-darwin.tar.gz" ;; \
-*) echo "Unsupported platform: ${OS}-${ARCH}" >&2; exit 1 ;; \
-esac; \
-gh release download devel --repo NVIDIA/OpenShell --pattern "${ASSET}" -O - \
-| tar xz \
-&& sudo install -m 755 openshell /usr/local/bin/openshell'
-```
-
-Or use the install script from the repository:
-
-```bash
-./install.sh
+curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
 ```

 **From PyPI (requires [uv](https://docs.astral.sh/uv/)):**

architecture/gateway-single-node.md

Lines changed: 3 additions & 3 deletions
@@ -264,7 +264,7 @@ Falls back to `8.8.8.8` / `8.8.4.4` if iptables detection fails.

 ### Registry configuration

-Writes `/etc/rancher/k3s/registries.yaml` from `REGISTRY_HOST`, `REGISTRY_ENDPOINT`, `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`, and `REGISTRY_INSECURE` environment variables so that k3s/containerd can authenticate when pulling component images at runtime.
+Writes `/etc/rancher/k3s/registries.yaml` from `REGISTRY_HOST`, `REGISTRY_ENDPOINT`, `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`, and `REGISTRY_INSECURE` environment variables so that k3s/containerd can authenticate when pulling component images at runtime. When no explicit credentials are provided (the default for public GHCR repos), the auth block is omitted and images are pulled anonymously.

 ### Manifest injection

@@ -392,8 +392,8 @@ Variables set on the container by `ensure_container()` in `docker.rs`:
 | `REGISTRY_INSECURE` | `"true"` or `"false"` | Always |
 | `IMAGE_REPO_BASE` | `{registry_host}/{namespace}` (or `IMAGE_REPO_BASE`/`OPENSHELL_IMAGE_REPO_BASE` override) | Always |
 | `REGISTRY_ENDPOINT` | Custom endpoint URL | When `OPENSHELL_REGISTRY_ENDPOINT` is set |
-| `REGISTRY_USERNAME` | Registry auth username | When credentials available |
-| `REGISTRY_PASSWORD` | Registry auth password | When credentials available |
+| `REGISTRY_USERNAME` | Registry auth username | When explicit credentials provided via `--registry-username`/`--registry-token` or env vars |
+| `REGISTRY_PASSWORD` | Registry auth password | When explicit credentials provided via `--registry-username`/`--registry-token` or env vars |
 | `EXTRA_SANS` | Comma-separated extra TLS SANs | When extra SANs computed |
 | `SSH_GATEWAY_HOST` | Resolved remote hostname/IP | Remote deploys only |
 | `SSH_GATEWAY_PORT` | Configured host port (default `8080`) | Remote deploys only |
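
Note on the documented behaviour: the rule "omit the auth block when no explicit credentials are provided" can be sketched as below. This is a minimal illustration under stated assumptions, not the project's entrypoint script: `render_registries_yaml` and its parameters are hypothetical names, the output follows the general k3s `registries.yaml` layout (`mirrors` plus `configs.<host>.auth`), and `REGISTRY_INSECURE` handling and YAML escaping are deliberately left out.

```rust
/// Hypothetical helper: render a k3s `registries.yaml` body.
/// `auth` is `None` for anonymous pulls, in which case no `configs:`
/// auth block is written at all.
fn render_registries_yaml(host: &str, endpoint: &str, auth: Option<(&str, &str)>) -> String {
    // The mirror entry is always present.
    let mut out = format!("mirrors:\n  \"{host}\":\n    endpoint:\n      - \"{endpoint}\"\n");
    // The auth block is appended only when explicit credentials exist.
    if let Some((username, password)) = auth {
        out.push_str(&format!(
            "configs:\n  \"{host}\":\n    auth:\n      username: \"{username}\"\n      password: \"{password}\"\n"
        ));
    }
    out
}

fn main() {
    // Anonymous pull (the default for public GHCR repos): mirror entry only.
    print!("{}", render_registries_yaml("ghcr.io", "https://ghcr.io", None));
    // Explicit credentials: an auth block is added for the same host.
    print!(
        "{}",
        render_registries_yaml("ghcr.io", "https://ghcr.io", Some(("myuser", "example-token")))
    );
}
```

Modelling the credentials as a single `Option` of a pair keeps the "username without password" state unrepresentable at this layer, which matches the env-var table above where both variables are set together or not at all.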

crates/openshell-bootstrap/src/docker.rs

Lines changed: 17 additions & 29 deletions
@@ -3,9 +3,7 @@

 use crate::RemoteOptions;
 use crate::constants::{container_name, network_name, volume_name};
-use crate::image::{
-    self, DEFAULT_IMAGE_REPO_BASE, DEFAULT_REGISTRY, DEFAULT_REGISTRY_USERNAME, parse_image_ref,
-};
+use crate::image::{self, DEFAULT_IMAGE_REPO_BASE, DEFAULT_REGISTRY, parse_image_ref};
 use bollard::API_DEFAULT_VERSION;
 use bollard::Docker;
 use bollard::errors::Error as BollardError;
@@ -403,6 +401,7 @@ pub async fn ensure_volume(docker: &Docker, name: &str) -> Result<()> {
 pub async fn ensure_image(
     docker: &Docker,
     image_ref: &str,
+    registry_username: Option<&str>,
     registry_token: Option<&str>,
 ) -> Result<()> {
     match docker.inspect_image(image_ref).await {
@@ -423,9 +422,10 @@ pub async fn ensure_image(

     let (repo, tag) = parse_image_ref(image_ref);

-    // Use GHCR credentials (explicit or built-in default) for ghcr.io images.
+    // Use explicit GHCR credentials when provided for ghcr.io images.
+    // Public repos are pulled without authentication by default.
     let credentials = if repo.starts_with("ghcr.io/") {
-        image::ghcr_credentials(registry_token)
+        image::ghcr_credentials(registry_username, registry_token)
     } else {
         None
     };
@@ -452,6 +452,7 @@ pub async fn ensure_container(
     gateway_port: u16,
     disable_tls: bool,
     disable_gateway_auth: bool,
+    registry_username: Option<&str>,
     registry_token: Option<&str>,
     gpu: bool,
 ) -> Result<()> {
@@ -586,15 +587,17 @@ pub async fn ensure_container(

     // Credential priority:
     // 1. OPENSHELL_REGISTRY_USERNAME/PASSWORD env vars (power-user override)
-    // 2. registry_token from --registry-token / OPENSHELL_REGISTRY_TOKEN
-    // 3. Built-in default XOR-decoded token
-    let registry_username = env_non_empty("OPENSHELL_REGISTRY_USERNAME")
-        .or_else(|| Some(DEFAULT_REGISTRY_USERNAME.to_string()));
-    let registry_password = env_non_empty("OPENSHELL_REGISTRY_PASSWORD").or_else(|| {
+    // 2. registry_username/registry_token from CLI flags / env vars
+    // No built-in default — GHCR repos are public and pull without auth.
+    let effective_username = env_non_empty("OPENSHELL_REGISTRY_USERNAME").or_else(|| {
+        registry_username
+            .filter(|u| !u.is_empty())
+            .map(ToString::to_string)
+    });
+    let effective_password = env_non_empty("OPENSHELL_REGISTRY_PASSWORD").or_else(|| {
         registry_token
             .filter(|t| !t.is_empty())
             .map(ToString::to_string)
-            .or_else(|| Some(image::default_registry_token()))
     });

     let mut env_vars: Vec<String> = vec![
@@ -606,28 +609,13 @@ pub async fn ensure_container(
     if let Some(endpoint) = registry_endpoint {
         env_vars.push(format!("REGISTRY_ENDPOINT={endpoint}"));
     }
-    if let (Some(username), Some(password)) = (registry_username, registry_password) {
+    if let Some(password) = effective_password {
+        // Default to __token__ when only a password/token is provided.
+        let username = effective_username.unwrap_or_else(|| "__token__".to_string());
         env_vars.push(format!("REGISTRY_USERNAME={username}"));
         env_vars.push(format!("REGISTRY_PASSWORD={password}"));
     }

-    // When the primary registry is NOT ghcr.io (e.g. a local registry in
-    // push-mode), we still need containerd credentials for the community
-    // registry so that community sandbox images
-    // (ghcr.io/nvidia/openshell-community/sandboxes/*) can be pulled at
-    // runtime. Pass community registry credentials as a separate set of
-    // env vars so the entrypoint can add a second block to registries.yaml.
-    if registry_host != DEFAULT_REGISTRY {
-        env_vars.push(format!("COMMUNITY_REGISTRY_HOST={DEFAULT_REGISTRY}"));
-        env_vars.push(format!(
-            "COMMUNITY_REGISTRY_USERNAME={DEFAULT_REGISTRY_USERNAME}"
-        ));
-        env_vars.push(format!(
-            "COMMUNITY_REGISTRY_PASSWORD={}",
-            image::default_registry_token()
-        ));
-    }
-
     if !extra_sans.is_empty() {
         env_vars.push(format!("EXTRA_SANS={}", extra_sans.join(",")));
     }
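
Note on the credential priority in `ensure_container()`: the new resolution order (env var override first, then CLI value, no built-in fallback) can be exercised in isolation with a sketch like the one below. `resolve_registry_auth`, `non_empty`, and the injectable `lookup` parameter are hypothetical names introduced for illustration. The local `env_non_empty` is only a guess at what the crate's helper of the same name does (treat unset and empty alike), since its body is not part of this diff.

```rust
use std::env;

/// Assumed behaviour of the crate's `env_non_empty`: unset and empty are both `None`.
fn env_non_empty(key: &str) -> Option<String> {
    env::var(key).ok().filter(|v| !v.is_empty())
}

/// Hypothetical helper: drop empty CLI values and convert to an owned string.
fn non_empty(value: Option<&str>) -> Option<String> {
    value.filter(|s| !s.is_empty()).map(ToString::to_string)
}

/// Returns the (username, password) pair to export as
/// REGISTRY_USERNAME / REGISTRY_PASSWORD, or `None` for an anonymous pull.
fn resolve_registry_auth(
    lookup: impl Fn(&str) -> Option<String>,
    cli_username: Option<&str>,
    cli_token: Option<&str>,
) -> Option<(String, String)> {
    // 1. Env var override wins; 2. otherwise the CLI-provided value.
    let username = lookup("OPENSHELL_REGISTRY_USERNAME").or_else(|| non_empty(cli_username));
    // Without a password/token there is nothing to authenticate with.
    let password = lookup("OPENSHELL_REGISTRY_PASSWORD").or_else(|| non_empty(cli_token))?;
    // A token without a username falls back to the `__token__` convention.
    Some((username.unwrap_or_else(|| "__token__".to_string()), password))
}

fn main() {
    match resolve_registry_auth(env_non_empty, None, Some("example-token")) {
        Some((user, _)) => println!("authenticated pull as {user}"),
        None => println!("anonymous pull"),
    }
}
```

Passing the lookup as a parameter is a design choice of this sketch only: it lets the priority rules be tested without mutating the process environment.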

crates/openshell-bootstrap/src/errors.rs

Lines changed: 5 additions & 2 deletions
@@ -241,15 +241,18 @@ fn diagnose_image_pull_auth_failure(_gateway_name: &str) -> GatewayFailureDiagno
     GatewayFailureDiagnosis {
         summary: "Registry authentication failed".to_string(),
         explanation: "Could not authenticate with the container registry. The image may not \
-                      exist, or you may not have permission to access it."
+                      exist, or you may not have permission to access it. Public GHCR repos \
+                      should not require authentication — if you see this error with the default \
+                      registry, it may indicate the image does not exist or a network issue."
             .to_string(),
         recovery_steps: vec![
             RecoveryStep::with_command(
                 "Verify the image exists and you have access",
                 "docker pull ghcr.io/nvidia/openshell/cluster:latest",
             ),
             RecoveryStep::new(
-                "If using a private registry, ensure OPENSHELL_REGISTRY_TOKEN is set",
+                "If using a private registry, set OPENSHELL_REGISTRY_USERNAME and OPENSHELL_REGISTRY_TOKEN \
+                 (or use --registry-username and --registry-token)",
             ),
             RecoveryStep::with_command("Check your Docker login", "docker login ghcr.io"),
         ],

crates/openshell-bootstrap/src/image.rs

Lines changed: 30 additions & 70 deletions
@@ -42,42 +42,7 @@ pub const DEFAULT_GATEWAY_IMAGE: &str = "ghcr.io/nvidia/openshell/cluster";
 ///
 /// GHCR accepts any non-empty username when authenticating with a PAT;
 /// `__token__` is a common convention for token-based OCI registry auth.
-pub const DEFAULT_REGISTRY_USERNAME: &str = "__token__";
-
-// ---------------------------------------------------------------------------
-// XOR-obfuscated default registry token
-// ---------------------------------------------------------------------------
-// A read-only GHCR PAT is XOR-encoded so it doesn't appear as plaintext in
-// the compiled binary. This is a lightweight deterrent against casual
-// inspection — it is NOT a security boundary. The `--registry-token` flag
-// (or `OPENSHELL_REGISTRY_TOKEN` env var) overrides this default.
-
-/// XOR key used to decode the default registry token.
-const XOR_KEY: [u8; 32] = [
-    0x9c, 0x87, 0xc1, 0x0c, 0x00, 0xe2, 0x59, 0x14, 0x98, 0xb8, 0xa5, 0x45, 0x48, 0x40, 0x3e, 0x92,
-    0x62, 0x41, 0xfe, 0x5e, 0xd4, 0x09, 0x23, 0xe6, 0x85, 0xa7, 0x94, 0xab, 0xb8, 0x15, 0xcd, 0x45,
-];
-
-/// XOR-encoded default GHCR registry token.
-const DEFAULT_REGISTRY_TOKEN_ENC: [u8; 40] = [
-    0xfb, 0xef, 0xb1, 0x52, 0x45, 0xb5, 0x6c, 0x70, 0xd0, 0xf0, 0xd1, 0x15, 0x09, 0x39, 0x72, 0xd7,
-    0x29, 0x36, 0xb7, 0x69, 0xe5, 0x64, 0x55, 0xaf, 0xee, 0xd2, 0xc0, 0xd2, 0xd1, 0x5b, 0x81, 0x0e,
-    0xd1, 0xf5, 0xf2, 0x5a, 0x6b, 0xa3, 0x14, 0x46,
-];
-
-/// Decode an XOR-encoded byte slice using [`XOR_KEY`].
-fn xor_decode(encoded: &[u8]) -> String {
-    encoded
-        .iter()
-        .enumerate()
-        .map(|(i, b)| (b ^ XOR_KEY[i % XOR_KEY.len()]) as char)
-        .collect()
-}
-
-/// Default GHCR registry token, decoded at runtime.
-pub(crate) fn default_registry_token() -> String {
-    xor_decode(&DEFAULT_REGISTRY_TOKEN_ENC)
-}
+const DEFAULT_REGISTRY_USERNAME: &str = "__token__";

 /// Parse an image reference into (repository, tag).
 ///
@@ -150,18 +115,22 @@ pub async fn pull_image(
     Ok(())
 }

-/// Build [`DockerCredentials`] for ghcr.io from a registry token.
+/// Build [`DockerCredentials`] for ghcr.io from explicit credentials.
 ///
-/// When `token` is `None` or empty, falls back to the built-in default
-/// token (XOR-decoded at runtime). Always returns `Some`.
-#[allow(clippy::unnecessary_wraps)]
-pub(crate) fn ghcr_credentials(token: Option<&str>) -> Option<DockerCredentials> {
-    let effective_token = token
-        .filter(|t| !t.is_empty())
-        .map_or_else(default_registry_token, ToString::to_string);
+/// Returns `None` when `token` is `None` or empty — the default GHCR repos
+/// are public and do not require authentication. When a token is provided,
+/// uses the given `username` (falling back to `__token__` if `None`/empty).
+pub(crate) fn ghcr_credentials(
+    username: Option<&str>,
+    token: Option<&str>,
+) -> Option<DockerCredentials> {
+    let token = token.filter(|t| !t.is_empty())?;
+    let username = username
+        .filter(|u| !u.is_empty())
+        .unwrap_or(DEFAULT_REGISTRY_USERNAME);
     Some(DockerCredentials {
-        username: Some(DEFAULT_REGISTRY_USERNAME.to_string()),
-        password: Some(effective_token),
+        username: Some(username.to_string()),
+        password: Some(token.to_string()),
         serveraddress: Some(DEFAULT_REGISTRY.to_string()),
         ..Default::default()
     })
@@ -182,6 +151,7 @@ pub(crate) fn ghcr_credentials(token: Option<&str>) -> Option<DockerCredentials>
 pub async fn pull_remote_image(
     remote: &Docker,
     image_ref: &str,
+    registry_username: Option<&str>,
     registry_token: Option<&str>,
     mut on_progress: impl FnMut(String) + Send + 'static,
 ) -> Result<()> {
@@ -213,7 +183,7 @@ pub async fn pull_remote_image(
     );
     on_progress(format!("[progress] Pulling {platform_str} image"));

-    let credentials = ghcr_credentials(registry_token);
+    let credentials = ghcr_credentials(registry_username, registry_token);

     let options = CreateImageOptions {
         from_image: Some(registry_image_base),
@@ -351,8 +321,8 @@ mod tests {
     }

     #[test]
-    fn ghcr_credentials_with_token() {
-        let creds = ghcr_credentials(Some("ghp_test123"));
+    fn ghcr_credentials_with_token_default_username() {
+        let creds = ghcr_credentials(None, Some("ghp_test123"));
         assert!(creds.is_some());
         let creds = creds.unwrap();
         assert_eq!(creds.username.as_deref(), Some("__token__"));
@@ -361,31 +331,21 @@ mod tests {
     }

     #[test]
-    fn ghcr_credentials_without_token_uses_default() {
-        // When no explicit token is provided, the built-in default is used.
-        let creds = ghcr_credentials(None).unwrap();
-        assert_eq!(creds.username.as_deref(), Some("__token__"));
+    fn ghcr_credentials_with_custom_username() {
+        let creds = ghcr_credentials(Some("myuser"), Some("ghp_test123"));
+        assert!(creds.is_some());
+        let creds = creds.unwrap();
+        assert_eq!(creds.username.as_deref(), Some("myuser"));
+        assert_eq!(creds.password.as_deref(), Some("ghp_test123"));
         assert_eq!(creds.serveraddress.as_deref(), Some("ghcr.io"));
-        // The password should be the decoded default token (non-empty).
-        assert!(creds.password.is_some());
-        assert!(!creds.password.as_ref().unwrap().is_empty());
-
-        // Same for empty string.
-        let creds2 = ghcr_credentials(Some("")).unwrap();
-        assert_eq!(creds2.password, creds.password);
     }

     #[test]
-    fn xor_decode_default_token() {
-        let token = default_registry_token();
-        assert!(
-            !token.is_empty(),
-            "default token should decode to non-empty"
-        );
-        assert!(
-            token.chars().all(|c| c.is_ascii_graphic()),
-            "default token should be printable ASCII"
-        );
+    fn ghcr_credentials_without_token_returns_none() {
+        // No token means unauthenticated (public repos).
+        assert!(ghcr_credentials(None, None).is_none());
+        assert!(ghcr_credentials(None, Some("")).is_none());
+        assert!(ghcr_credentials(Some("myuser"), None).is_none());
     }

     #[test]
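
Note on `ghcr_credentials`: the new contract (no token means no credentials, an empty username falls back to `__token__`) can be tried outside the crate with the sketch below. The `Credentials` struct is a hypothetical stand-in for bollard's `DockerCredentials`, reduced to the three fields this diff sets. The value `"ghcr.io"` for `DEFAULT_REGISTRY` is inferred from the `serveraddress` assertions in the tests above.

```rust
/// Hypothetical stand-in for bollard's `DockerCredentials` (three fields only).
#[derive(Debug, PartialEq)]
struct Credentials {
    username: Option<String>,
    password: Option<String>,
    serveraddress: Option<String>,
}

const DEFAULT_REGISTRY: &str = "ghcr.io";
const DEFAULT_REGISTRY_USERNAME: &str = "__token__";

/// Same shape as the function in the diff: `None` unless a non-empty token is given.
fn ghcr_credentials(username: Option<&str>, token: Option<&str>) -> Option<Credentials> {
    // Early return: a missing or empty token means an anonymous pull.
    let token = token.filter(|t| !t.is_empty())?;
    // A missing or empty username falls back to the `__token__` convention.
    let username = username
        .filter(|u| !u.is_empty())
        .unwrap_or(DEFAULT_REGISTRY_USERNAME);
    Some(Credentials {
        username: Some(username.to_string()),
        password: Some(token.to_string()),
        serveraddress: Some(DEFAULT_REGISTRY.to_string()),
    })
}

fn main() {
    // A username alone is not enough: without a token the pull is anonymous.
    assert!(ghcr_credentials(Some("myuser"), None).is_none());
    // An empty username with a token is a case the commit's tests do not cover.
    let creds = ghcr_credentials(Some(""), Some("ghp_test123")).unwrap();
    assert_eq!(creds.username.as_deref(), Some("__token__"));
    println!("{creds:?}");
}
```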
