S3 FUSE home directories and API-based credential fetching#138

Open
Tianhao-Gu wants to merge 6 commits into main from feature/s3_mount

Conversation

@Tianhao-Gu
Contributor

Summary

  • Mount user home directories via s3fs-fuse (cdm-lake/users-general-warehouse/{user}/workspace/home/{user}/) in entrypoint.sh
  • Fetch MinIO credentials from governance API (GET /credentials) in entrypoint instead of reading from local file cache — needed because the FUSE mount requires credentials before the home directory exists
  • Skip recursive chown on FUSE-mounted home directory (s3fs handles ownership via uid/gid mount options)
  • Update refresh.py to use _get_credentials_cache_path() helper instead of inline Path.home() / CREDENTIALS_CACHE_FILE
  • Remove unused credential cache tests (cache functions still exist but are no longer the primary credential path)
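Taken together, the startup flow in these bullets could be sketched roughly as below. This is an illustrative sketch, not the PR's exact code: the helper names (`parse_cred`, `fetch_credentials`, `mount_home`) are hypothetical, and the credentials payload is assumed to be JSON with `access_key`/`secret_key` fields as described above.

```shell
# Illustrative sketch of the entrypoint flow (hypothetical helper names).
# Assumes GOVERNANCE_API_URL, KBASE_AUTH_TOKEN, NB_USER, MINIO_URL are set.

# Pull one field out of the JSON credentials payload.
parse_cred() {
  python3 -c "import sys, json; print(json.load(sys.stdin).get('$1', ''))"
}

fetch_credentials() {
  # Retry a few times; the governance service may still be starting.
  local body attempt
  for attempt in 1 2 3 4 5; do
    if body=$(curl -fsSL -H "Authorization: Bearer ${KBASE_AUTH_TOKEN}" \
        "${GOVERNANCE_API_URL}/credentials"); then
      export AWS_ACCESS_KEY_ID=$(printf '%s' "$body" | parse_cred access_key)
      export AWS_SECRET_ACCESS_KEY=$(printf '%s' "$body" | parse_cred secret_key)
      return 0
    fi
    echo "Attempt ${attempt} failed; retrying in 3s..." >&2
    sleep 3
  done
  return 1
}

mount_home() {
  # s3fs maps ownership via uid/gid mount options, so no recursive chown.
  printf '%s:%s\n' "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > /tmp/.s3fs-passwd
  chmod 600 /tmp/.s3fs-passwd
  s3fs "cdm-lake:/users-general-warehouse/${NB_USER}/workspace/home/${NB_USER}" \
    "/home/${NB_USER}" \
    -o passwd_file=/tmp/.s3fs-passwd,url="${MINIO_URL}",use_path_request_style \
    -o uid="$(id -u "$NB_USER")",gid="$(id -g "$NB_USER")"
}
```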

Test plan

  • Verify entrypoint.sh fetches credentials from governance API with retry logic
  • Verify s3fs mount succeeds with correct permissions (uid/gid)
  • Verify notebook startup works with FUSE-mounted home directory
  • Verify refresh_spark_environment() clears credential cache correctly
  • Run full test suite: cd notebook_utils && uv run pytest (839 passed)

Takes main's credential caching (file-lock based), relative imports, error handling,
deprecation warnings, list_user_names, and regenerate_policies. Removes Polaris-specific
code (polaris credential cache, get_polaris_credentials, httpx-based rotate) that is
not in main's governance client v0.0.14.
Copilot AI review requested due to automatic review settings March 16, 2026 02:43
@github-actions

github-actions bot commented Mar 16, 2026

Coverage report

This PR does not seem to contain any modification to coverable code.


Copilot AI left a comment


Pull request overview

This PR updates notebook startup to support S3 FUSE–backed home directories by fetching MinIO credentials from the governance API early in container startup, and adjusts related utilities/tests to align with the new startup flow.

Changes:

  • Update scripts/entrypoint.sh to fetch credentials from the governance API with retries and mount /home/$NB_USER (and a shared subdir) via s3fs.
  • Remove recursive chown of the home directory to avoid expensive operations on FUSE mounts.
  • Update refresh/test code to use the credential cache path helper and simplify credential tests.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
scripts/entrypoint.sh Fetch credentials from governance API, set AWS endpoint env, and mount S3-backed home/share via s3fs.
configs/jupyter_docker_stacks_hooks/11-setup_env.sh Remove recursive chown -R of the home directory to avoid slow FUSE traversal.
notebook_utils/berdl_notebook_utils/minio_governance/operations.py Update get_minio_credentials() docstring to reflect API-based credential sourcing.
notebook_utils/berdl_notebook_utils/refresh.py Use _get_credentials_cache_path() helper when clearing credential cache.
notebook_utils/tests/minio_governance/test_operations.py Remove credential cache helper tests and adjust get_minio_credentials() tests toward API behavior.


@codecov

codecov bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Resolve conflicts in test_operations.py:
- Keep main's comprehensive test classes for coverage
- Fix list_user_names tests to patch correct module-level function
- Add missing Path and _write_credentials_cache imports

Resolve conflicts in operations.py, test_operations.py, and
entrypoint.sh by keeping the feature/s3_mount versions (retry logic
for credential fetch, s3fs FUSE mounts).
Copilot AI review requested due to automatic review settings March 23, 2026 18:36

Copilot AI left a comment


Pull request overview

This PR updates the notebook container startup flow to (1) fetch MinIO credentials early via the governance API and (2) mount S3-backed “home” directories via s3fs-fuse, avoiding slow recursive ownership fixes on FUSE mounts.

Changes:

  • Add governance API credential fetch + retry logic to entrypoint.sh and use it to drive s3fs mounts for /home/$NB_USER and a global shared directory.
  • Remove recursive chown of the home directory in the docker-stacks hook to avoid expensive FUSE/S3 object traversal.
  • Introduce/adjust local credential cache helpers + locking in minio_governance/operations.py, and update/add tests around cache/lock behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 14 comments.

File Description
scripts/entrypoint.sh Fetches credentials from governance API with retries and mounts S3 paths into home via s3fs.
configs/jupyter_docker_stacks_hooks/11-setup_env.sh Skips recursive chown to avoid FUSE-triggered S3 HEAD storms.
notebook_utils/berdl_notebook_utils/minio_governance/operations.py Adds credential cache path/helpers and file-locking around cache usage.
notebook_utils/tests/minio_governance/test_operations.py Updates credential tests and adds coverage for cache/lock error handling.


Comment on lines +84 to +86
local s3fs_opts="passwd_file=/tmp/.s3fs-passwd"
s3fs_opts="${s3fs_opts},url=${MINIO_URL}"
s3fs_opts="${s3fs_opts},use_path_request_style"

Copilot AI Mar 23, 2026


s3fs is configured with url=${MINIO_URL}, but MINIO_URL is constructed earlier in the script by defaulting to http:// when no scheme is provided. Since that scheme selection currently ignores MINIO_SECURE, secure deployments that set MINIO_SECURE=true but provide a bare host/port will still use http here. Consider deriving MINIO_URL’s scheme from MINIO_SECURE (as done in configs/jupyter_server_config.py) so s3fs/boto3 consistently use https when configured.
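One way to derive the scheme as the comment suggests, sketched as a standalone helper (the function name `minio_url` and the truthy values accepted for MINIO_SECURE are assumptions, not the repo's actual code):

```shell
# Hypothetical helper: build MINIO_URL from an endpoint that may lack a
# scheme, using MINIO_SECURE to decide between https and http.
minio_url() {
  local endpoint="$1" secure="${2:-true}"
  case "$endpoint" in
    http://*|https://*) printf '%s\n' "$endpoint"; return ;;  # scheme given
  esac
  case "$(printf '%s' "$secure" | tr '[:upper:]' '[:lower:]')" in
    true|1|yes) printf 'https://%s\n' "$endpoint" ;;
    *)          printf 'http://%s\n' "$endpoint" ;;
  esac
}
```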

Comment on lines +109 to +112
# Create s3fs password file
echo "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" > /tmp/.s3fs-passwd
chmod 600 /tmp/.s3fs-passwd


Copilot AI Mar 23, 2026


/tmp/.s3fs-passwd contains long-lived access credentials; it’s created with 0600 but never removed. If s3fs doesn’t need it after the mount is established, remove it (or install an EXIT trap to clean it up) to reduce the window for credential exposure in the container filesystem.
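A minimal cleanup sketch along those lines, assuming s3fs no longer needs the file once the mount is established (true for a plain passwd_file mount, but worth verifying against the s3fs version in use). The dummy credential defaults exist only so the sketch is self-contained:

```shell
# Write the s3fs password file, guard it with an EXIT trap, and remove it
# eagerly once it has served its purpose.
PASSWD_FILE="/tmp/.s3fs-passwd.$$"
trap 'rm -f "$PASSWD_FILE"' EXIT
printf '%s:%s\n' "${AWS_ACCESS_KEY_ID:-AK}" "${AWS_SECRET_ACCESS_KEY:-SK}" > "$PASSWD_FILE"
chmod 600 "$PASSWD_FILE"
# ... s3fs "$bucket" "$mountpoint" -o passwd_file="$PASSWD_FILE",... would run here ...
rm -f "$PASSWD_FILE"   # s3fs has read it by now; the trap is a safety net
```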

Comment on lines +62 to 63
# requests for every object in S3.


Copilot AI Mar 23, 2026


The finalization step now skips the recursive chown unconditionally. If the S3 FUSE mount is not active (e.g., missing creds, mount failure, or FUSE disabled), files/directories created in /home/$NB_USER by this hook may remain root-owned and break runtime writes. Consider conditionally chown only when $HOME_DIRECTORY is not a FUSE mount (e.g., via findmnt/stat -f -c %T) or only chown the specific local paths created here when not mounted.

Suggested change
# requests for every object in S3.
# requests for every object in S3.
#
# However, if $HOME_DIRECTORY is *not* a FUSE mount (for example, when the S3
# mount is disabled or failed), files and directories created by this hook may
# be owned by root and thus unwritable by $NB_USER. In that case, fix ownership
# only for the specific paths created here, not the entire home directory.
# Detect filesystem type of the home directory. On FUSE-based mounts, %T
# typically reports a value starting with "fuse" (e.g., "fuseblk", "fuse").
FS_TYPE="$(stat -f -c %T "$HOME_DIRECTORY" 2>/dev/null || echo unknown)"
# If the home directory is not on a FUSE filesystem, correct ownership of the
# specific paths created/managed by this hook.
case "$FS_TYPE" in
fuse* )
# On FUSE (e.g., s3fs), rely on mount-time uid/gid options; do not chown.
;;
* )
NB_UID="$(id -u "$NB_USER")"
NB_GID="$(id -g "$NB_USER")"
# Only chown paths this script created or modified, to avoid expensive
# operations on large home directories.
chown "$NB_UID:$NB_GID" \
"$HOME_DIRECTORY/.bash_profile" \
"$CUSTOM_PROFILE"
chown -R "$NB_UID:$NB_GID" \
"$JUPYTER_DIR" \
"$IPYTHON_STARTUP_DIR"
;;
esac

Comment on lines +180 to +200
cache_path = _get_credentials_cache_path()
lock_path = cache_path.with_suffix(".lock")

# Use file locking to prevent concurrent access
with open(lock_path, "w") as lock_file:
try:
# Acquire exclusive lock (blocks until available)
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)

# Try to load from cache first (double-check after acquiring lock)
cached_credentials = _read_cached_credentials(cache_path)
if cached_credentials:
credentials = cached_credentials
else:
# No cache or cache corrupted, fetch fresh credentials
client = get_governance_client()
api_response = get_credentials_credentials_get.sync(client=client)
if isinstance(api_response, CredentialsResponse):
credentials = api_response
_write_credentials_cache(cache_path, credentials)
else:

Copilot AI Mar 23, 2026


PR description says credentials are fetched from the governance API "instead of reading from local file cache" and that the API is the cache. However get_minio_credentials() now reads/writes a local credentials cache file and only calls the API on a cache miss. Either update the PR description to reflect that a local cache is still used, or remove the local cache path from this function to match the stated design.

Comment on lines +245 to +256
with open(lock_path, "w") as lock_file:
try:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
_write_credentials_cache(cache_path, api_response)
finally:
pass

try:
lock_path.unlink(missing_ok=True)
except OSError:
pass


Copilot AI Mar 23, 2026


rotate_minio_credentials() also unconditionally opens the lock file for writing. If the cache directory/home is not writable, rotation will raise after the API call succeeds, preventing env vars from being updated. Similar to get_minio_credentials(), consider handling OSError around lock/cache operations and still updating env vars even when caching/locking fails.

Suggested change
with open(lock_path, "w") as lock_file:
try:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
_write_credentials_cache(cache_path, api_response)
finally:
pass
try:
lock_path.unlink(missing_ok=True)
except OSError:
pass
try:
with open(lock_path, "w") as lock_file:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
_write_credentials_cache(cache_path, api_response)
try:
lock_path.unlink(missing_ok=True)
except OSError:
# Failure to remove the lock file is non-fatal
pass
except OSError as exc:
# If we cannot write to the cache/lock, log and continue so env vars are still updated
logging.warning(
"Failed to update MinIO credentials cache at %s: %s",
cache_path,
exc,
)

Comment on lines +170 to +171
Calls GET /credentials on the governance API, which returns DB-cached credentials
without rotating. No local file cache needed — the API is the cache.

Copilot AI Mar 23, 2026


The get_minio_credentials docstring says "No local file cache needed — the API is the cache", but the implementation now explicitly reads/writes a local credentials cache file before calling the API. Please update the docstring (and/or the implementation) so the documented behavior matches the actual caching logic.

Suggested change
Calls GET /credentials on the governance API, which returns DB-cached credentials
without rotating. No local file cache needed — the API is the cache.
Credentials are read from a local cache file when available; otherwise, the
function calls GET /credentials on the governance API (which returns
DB-cached credentials without rotating) and updates the local cache.

Comment on lines +122 to +124
with open(cache_path, "w") as f:
json.dump(credentials.to_dict(), f)
except (OSError, TypeError):

Copilot AI Mar 23, 2026


The credentials cache file stores the MinIO secret key in plaintext, but _write_credentials_cache doesn’t set restrictive permissions (e.g., 0600). Depending on the container/user umask this can create group/world-readable credentials. Consider writing with explicit mode 0o600 (or chmod after write) and/or placing it under a private directory like ~/.cache with controlled permissions.

Suggested change
with open(cache_path, "w") as f:
json.dump(credentials.to_dict(), f)
except (OSError, TypeError):
# Use os.open with explicit 0o600 mode to avoid relying on the process umask
fd = os.open(str(cache_path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w") as f:
json.dump(credentials.to_dict(), f)
except (OSError, TypeError):
# On any error, silently skip caching to preserve existing behavior

Comment on lines +183 to +204
# Use file locking to prevent concurrent access
with open(lock_path, "w") as lock_file:
try:
# Acquire exclusive lock (blocks until available)
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)

# Try to load from cache first (double-check after acquiring lock)
cached_credentials = _read_cached_credentials(cache_path)
if cached_credentials:
credentials = cached_credentials
else:
# No cache or cache corrupted, fetch fresh credentials
client = get_governance_client()
api_response = get_credentials_credentials_get.sync(client=client)
if isinstance(api_response, CredentialsResponse):
credentials = api_response
_write_credentials_cache(cache_path, credentials)
else:
raise RuntimeError("Failed to fetch credentials from API")
finally:
# Lock is automatically released when file is closed
pass

Copilot AI Mar 23, 2026


get_minio_credentials() always opens the lock file with open(lock_path, "w") before any API call. If the home directory is not writable (read-only, permission issues, etc.), this will raise and prevent credential fetching entirely, even though the API call could still succeed. Consider catching OSError around lock/cache setup and falling back to a no-cache/no-lock path (fetch from API directly) when locking isn’t possible.

Suggested change
# Use file locking to prevent concurrent access
with open(lock_path, "w") as lock_file:
try:
# Acquire exclusive lock (blocks until available)
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
# Try to load from cache first (double-check after acquiring lock)
cached_credentials = _read_cached_credentials(cache_path)
if cached_credentials:
credentials = cached_credentials
else:
# No cache or cache corrupted, fetch fresh credentials
client = get_governance_client()
api_response = get_credentials_credentials_get.sync(client=client)
if isinstance(api_response, CredentialsResponse):
credentials = api_response
_write_credentials_cache(cache_path, credentials)
else:
raise RuntimeError("Failed to fetch credentials from API")
finally:
# Lock is automatically released when file is closed
pass
credentials: CredentialsResponse
try:
# Use file locking to prevent concurrent access
with open(lock_path, "w") as lock_file:
try:
# Acquire exclusive lock (blocks until available)
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
# Try to load from cache first (double-check after acquiring lock)
cached_credentials = _read_cached_credentials(cache_path)
if cached_credentials:
credentials = cached_credentials
else:
# No cache or cache corrupted, fetch fresh credentials
client = get_governance_client()
api_response = get_credentials_credentials_get.sync(client=client)
if isinstance(api_response, CredentialsResponse):
credentials = api_response
_write_credentials_cache(cache_path, credentials)
else:
raise RuntimeError("Failed to fetch credentials from API")
finally:
# Lock is automatically released when file is closed
pass
except OSError:
# If we cannot create or lock the cache file (e.g. read-only home directory),
# fall back to fetching credentials directly from the API without using
# any local cache or file locking.
client = get_governance_client()
api_response = get_credentials_credentials_get.sync(client=client)
if isinstance(api_response, CredentialsResponse):
credentials = api_response
else:
raise RuntimeError("Failed to fetch credentials from API")

Comment on lines 106 to 127
  class TestGetMinioCredentials:
      """Tests for get_minio_credentials function."""

-     @patch("berdl_notebook_utils.minio_governance.operations.get_settings")
      @patch("berdl_notebook_utils.minio_governance.operations.os")
      @patch("berdl_notebook_utils.minio_governance.operations.get_credentials_credentials_get")
      @patch("berdl_notebook_utils.minio_governance.operations.get_governance_client")
-     def test_fetches_credentials_from_api(self, mock_get_client, mock_get_creds, mock_os, mock_get_settings):
+     def test_fetches_credentials_from_api(self, mock_get_client, mock_get_creds, mock_os):
          """Test fetches credentials from API and sets env vars."""
          mock_client = Mock()
          mock_get_client.return_value = mock_client

          mock_creds = Mock(spec=CredentialsResponse)
-         mock_creds.access_key = "new_key"
-         mock_creds.secret_key = "new_secret"
+         mock_creds.access_key = "api_key"
+         mock_creds.secret_key = "api_secret"
          mock_get_creds.sync.return_value = mock_creds

          result = get_minio_credentials()

          assert result == mock_creds
          mock_get_creds.sync.assert_called_once_with(client=mock_client)
-         mock_os.environ.__setitem__.assert_any_call("MINIO_ACCESS_KEY", "new_key")
-         mock_os.environ.__setitem__.assert_any_call("MINIO_SECRET_KEY", "new_secret")
-         mock_get_settings.cache_clear.assert_called_once()
+         mock_os.environ.__setitem__.assert_any_call("MINIO_ACCESS_KEY", "api_key")
+         mock_os.environ.__setitem__.assert_any_call("MINIO_SECRET_KEY", "api_secret")


Copilot AI Mar 23, 2026


TestGetMinioCredentials.test_fetches_credentials_from_api doesn’t isolate the on-disk cache path (Path.home() / .berdl_minio_credentials). Since get_minio_credentials() now reads the cache before calling the API, the test can become order/environment dependent if that file exists on the runner. Patch _get_credentials_cache_path to a tmp_path and _read_cached_credentials to None (or ensure the cache file is removed) so this test deterministically exercises the API-fetch branch.

Comment on lines +12 to +26
if [ -n "${GOVERNANCE_API_URL:-}" ] && [ -n "${KBASE_AUTH_TOKEN:-}" ]; then
# Fetch per-user credentials from governance API (idempotent — returns cached).
# Retry up to 5 times — the service may still be starting.
echo "Fetching MinIO credentials from governance API..."
CRED_FETCHED=false
for attempt in 1 2 3 4 5; do
CRED_RESPONSE=$(curl -sL -w "\n%{http_code}" \
-H "Authorization: Bearer ${KBASE_AUTH_TOKEN}" \
"${GOVERNANCE_API_URL}/credentials" 2>/dev/null)
HTTP_CODE=$(echo "$CRED_RESPONSE" | tail -1)
BODY=$(echo "$CRED_RESPONSE" | sed '$d')
if [ "$HTTP_CODE" = "200" ] && [ -n "$BODY" ]; then
export AWS_ACCESS_KEY_ID=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('access_key',''))")
export AWS_SECRET_ACCESS_KEY=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('secret_key',''))")
echo "Loaded credentials from governance API"

Copilot AI Mar 23, 2026


set -x combined with the credential fetch/export will log the auth token and the AWS secret/access keys (both the curl command and the export ... expansions). This is a credential leak in container logs; disable xtrace around the credential fetch/parsing (or run those commands in a subshell with set +x), and avoid placing secrets on the command line where possible.
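One way to scope that, sketched with a hypothetical wrapper that disables xtrace only around the sensitive command and restores the previous state afterwards:

```shell
# Run a command with xtrace disabled, then restore the prior xtrace state,
# so secrets expanded inside the command never reach the trace log.
with_xtrace_off() {
  local had_x=0 rc=0
  case "$-" in *x*) had_x=1 ;; esac
  set +x
  "$@" || rc=$?
  if [ "$had_x" -eq 1 ]; then set -x; fi
  return "$rc"
}
# In the entrypoint one might wrap the whole credential fetch:
#   with_xtrace_off fetch_credentials
```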

Tianhao-Gu and others added 2 commits March 23, 2026 13:44
Align with main branch (PR #142): remove local .berdl_minio_credentials
file cache and file locking from get_minio_credentials() and
rotate_minio_credentials(). The governance API already caches
credentials server-side in PostgreSQL.
Copilot AI review requested due to automatic review settings March 26, 2026 21:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.



Comment on lines +24 to +30
export AWS_ACCESS_KEY_ID=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('access_key',''))")
export AWS_SECRET_ACCESS_KEY=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('secret_key',''))")
echo "Loaded credentials from governance API"
CRED_FETCHED=true
break
fi
echo " Attempt $attempt: HTTP $HTTP_CODE — retrying in 3s..." >&2

Copilot AI Mar 26, 2026


Credential parsing marks CRED_FETCHED=true solely based on HTTP 200 and non-empty body. If the body is invalid JSON or missing keys, the python json.load calls will fail (or return empty strings) but the script will still treat the fetch as successful and proceed with empty AWS_* values. Please validate JSON parsing succeeded and that both access_key/secret_key are non-empty before setting CRED_FETCHED=true / breaking the loop.

Suggested change
export AWS_ACCESS_KEY_ID=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('access_key',''))")
export AWS_SECRET_ACCESS_KEY=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('secret_key',''))")
echo "Loaded credentials from governance API"
CRED_FETCHED=true
break
fi
echo " Attempt $attempt: HTTP $HTTP_CODE — retrying in 3s..." >&2
ACCESS_KEY=$(echo "$BODY" | python3 -c "import sys, json; data = json.load(sys.stdin); print(data.get('access_key', '')) if isinstance(data, dict) else None" 2>/dev/null || echo "")
SECRET_KEY=$(echo "$BODY" | python3 -c "import sys, json; data = json.load(sys.stdin); print(data.get('secret_key', '')) if isinstance(data, dict) else None" 2>/dev/null || echo "")
if [ -n "$ACCESS_KEY" ] && [ -n "$SECRET_KEY" ]; then
export AWS_ACCESS_KEY_ID="$ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$SECRET_KEY"
echo "Loaded credentials from governance API"
CRED_FETCHED=true
break
else
echo " Attempt $attempt: HTTP $HTTP_CODE — received invalid or incomplete credentials payload — retrying in 3s..." >&2
fi
else
echo " Attempt $attempt: HTTP $HTTP_CODE — retrying in 3s..." >&2
fi

Comment on lines +100 to +105
echo "Mounting s3fs: ${bucket_prefix} -> ${mountpoint}"
if s3fs "${bucket_prefix}" "${mountpoint}" -o "${s3fs_opts}" ${S3FS_SSL_OPTS} ${extra_opts}; then
echo "Successfully mounted ${mountpoint}"
else
echo "ERROR: Failed to mount ${mountpoint} — falling back to empty directory" >&2
fi

Copilot AI Mar 26, 2026


If s3fs fails to mount, the script only logs an error and continues, leaving the mountpoint as a root-owned local directory. Subsequent startup steps (e.g., setup_env hook) will create files under /home/$NB_USER that may then be unwritable by the notebook user. Please make mount_s3fs return non-zero on failure and either (a) chown the mountpoint to NB_USER on failure, or (b) abort startup / switch to a clear local-home fallback path.
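A sketch of that shape, with the mount command passed in as a parameter so the failure path is visible (the helper name `mount_or_fallback` is hypothetical; in the real entrypoint the first argument would simply be `s3fs`):

```shell
# Try to mount; on failure, report it, optionally repair ownership of the
# local fallback directory, and propagate a non-zero status to the caller.
mount_or_fallback() {
  local mnt_cmd="$1" bucket="$2" mountpoint="$3"; shift 3
  if "$mnt_cmd" "$bucket" "$mountpoint" "$@" 2>/dev/null; then
    echo "Successfully mounted ${mountpoint}"
    return 0
  fi
  echo "ERROR: failed to mount ${mountpoint}; leaving local directory" >&2
  # In the real entrypoint one might also fix ownership here, e.g.:
  #   chown "$(id -u "$NB_USER"):$(id -g "$NB_USER")" "$mountpoint"
  return 1
}
```

Because the helper returns non-zero, the caller can decide whether to abort startup or continue with a clearly local home.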

Comment on lines +18 to +20
CRED_RESPONSE=$(curl -sL -w "\n%{http_code}" \
-H "Authorization: Bearer ${KBASE_AUTH_TOKEN}" \
"${GOVERNANCE_API_URL}/credentials" 2>/dev/null)

Copilot AI Mar 26, 2026


With set -x enabled, this curl invocation will log the expanded Authorization: Bearer ${KBASE_AUTH_TOKEN} header to stdout/stderr, potentially exposing the user token in container logs. Please disable xtrace (e.g., set +x) around the credential fetch (and re-enable after) or remove set -x entirely.

Comment on lines +85 to +86
s3fs_opts="${s3fs_opts},url=${MINIO_URL}"
s3fs_opts="${s3fs_opts},use_path_request_style"

Copilot AI Mar 26, 2026


s3fs_opts uses url=${MINIO_URL}, but MINIO_URL is currently built with an unconditional http:// prefix when MINIO_ENDPOINT_URL lacks a scheme. That ignores MINIO_SECURE and can cause HTTPS MinIO endpoints to be contacted over HTTP. Please derive the scheme from MINIO_SECURE when constructing MINIO_URL so s3fs/boto3 use the correct protocol.

Comment on lines +69 to +70
if [[ "${MINIO_SECURE:-True}" != "True" ]]; then
S3FS_SSL_OPTS="-o no_check_certificate"

Copilot AI Mar 26, 2026


MINIO_SECURE handling for s3fs is inconsistent and likely inverted: it defaults to "True", compares case-sensitively to "True", and when it’s not exactly "True" it sets no_check_certificate (which disables TLS cert validation). This can unexpectedly disable certificate validation for values like "true"/"1"/"yes", and it doesn’t clearly separate "use https" vs "skip cert verify". Please parse MINIO_SECURE case-insensitively (as done elsewhere in the repo) and only apply no_check_certificate when using HTTPS and you explicitly want to skip verification.

Suggested change
if [[ "${MINIO_SECURE:-True}" != "True" ]]; then
S3FS_SSL_OPTS="-o no_check_certificate"
# Normalize MINIO_SECURE (case-insensitive); default to secure/verified HTTPS.
MINIO_SECURE_NORMALIZED="${MINIO_SECURE:-true}"
MINIO_SECURE_NORMALIZED="${MINIO_SECURE_NORMALIZED,,}"
# Only disable certificate verification when:
# 1) The endpoint URL is HTTPS, and
# 2) MINIO_SECURE explicitly requests insecure/skip-verify behavior.
if [[ "${MINIO_URL}" =~ ^https:// ]]; then
case "${MINIO_SECURE_NORMALIZED}" in
skip-verify|skip_verify|insecure|allow-insecure|allow_insecure)
S3FS_SSL_OPTS="-o no_check_certificate"
;;
*)
S3FS_SSL_OPTS=""
;;
esac

Comment on lines +116 to +118
s3 = boto3.client('s3', endpoint_url='${MINIO_URL}',
aws_access_key_id='${AWS_ACCESS_KEY_ID}', aws_secret_access_key='${AWS_SECRET_ACCESS_KEY}',
region_name='us-east-1')

Copilot AI Mar 26, 2026


The inline python3 -c embeds AWS access/secret keys directly in the command string (aws_access_key_id=..., aws_secret_access_key=...). That can leak credentials via process listings and shell debug output. Please rely on the exported AWS_* env vars (boto3 picks them up automatically) or pass credentials via a file/FD with restricted permissions rather than putting secrets in the command line.

Suggested change
s3 = boto3.client('s3', endpoint_url='${MINIO_URL}',
aws_access_key_id='${AWS_ACCESS_KEY_ID}', aws_secret_access_key='${AWS_SECRET_ACCESS_KEY}',
region_name='us-east-1')
s3 = boto3.client('s3', endpoint_url='${MINIO_URL}', region_name='us-east-1')
