-
Notifications
You must be signed in to change notification settings - Fork 83
feat(clp-package)!: Containerize the scripts in sbin via Docker Compose; Remove redundant assets from the package (resolves #1358).
#1512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAll host filesystem paths used by package utilities are resolved into the container-mounted host path before I/O and validation. CLI/sbin scripts now source a shared .common-env.sh and invoke Python modules inside a new clp-runtime container. Packaging and CI were adjusted to produce python-libs and include WebUI assets. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Shell as CLI script
participant Env as .common-env.sh
participant Compose as docker compose
participant Runtime as clp-runtime
participant Module as Python module
User->>Shell: invoke script (e.g., compress.sh)
Shell->>Env: source .common-env.sh
Env-->>Shell: export CLP_HOME, CLP_PWD_HOST, image ref, UID:GID, plugin/sock paths
Shell->>Compose: docker compose run --rm clp-runtime python3 -m <module> ...
Compose->>Runtime: start container (volumes & user mapping)
Runtime->>Module: run module
Module->>Module: call resolve_host_path_in_container() for host file ops (mkdir/open/chown/unlink)
Module-->>Runtime: exit with status
Runtime-->>Compose: container exits
Compose-->>Shell: return status
Shell-->>User: exit
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related issues
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| ) | ||
|
|
||
| necessary_mounts: List[Optional[DockerMount]] = [ | ||
| mounts.clp_home, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mounts like this have to be removed, or they conflict with the contents inside the docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can you explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, that comment was so unclear that it took me time to recall why it was done -
on the host, clp_home is essentially the clp-package directory root. previously we were mounting the package root into the container so the python modules in lib/python3/site-packages can be accessed in the container. however, we have now removed such contents from the package (recall "now the package is only 180K"); if we mount an empty directory into /opt/clp in the container, anything in /opt/clp is essentially gone. we should just use the assets in the docker image and remove such mounts then
| file_extraction_parser.add_argument( | ||
| "-f", "--files-from", help="A file listing all files to extract." | ||
| ) | ||
| default_extraction_dir = pathlib.Path(os.environ.get("CLP_PWD_HOST", ".")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't explicitly set and read this, . resolves to /opt/clp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really have to use Docker Compose for this container definition, though Docker Compose helps it keep structured and avoids duplicate code.
…ainer environments. Update components to utilize resolve_host_path for consistent path handling between host and container; adjust validations to support use of host-mounted paths.
…meter; unify with resolve_host_path.
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make sure if we don't miss resolving any paths and to avoid showing the prefixes in the error messages which may confuse users, we only
- "resolve" any host path right before their consumptions (e.g.,
.mkdir(),open(),.exists(),isdir()) - "resolve" only the paths that can be outside of the CLP_HOME directory, since we map CLP_HOME into the runtime container anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
tools/docker-images/clp-package/Dockerfile (1)
20-24: Split ENV directives into separate lines for readability.Per the learning from PR #1414, ENV directives should be split into separate lines rather than consolidated. This improves readability and aligns with project conventions, especially when later assignments depend on earlier ones (e.g., PATH references CLP_HOME).
ENV CLP_HOME="/opt/clp" -ENV LD_LIBRARY_PATH="${CLP_HOME}/lib" \ - PATH="${CLP_HOME}/sbin:${CLP_HOME}/bin:${PATH}" \ - PYTHONPATH="${CLP_HOME}/lib/python3/site-packages" \ - USER="clp-user" +ENV LD_LIBRARY_PATH="${CLP_HOME}/lib" +ENV PATH="${CLP_HOME}/sbin:${CLP_HOME}/bin:${PATH}" +ENV PYTHONPATH="${CLP_HOME}/lib/python3/site-packages" +ENV USER="clp-user"components/clp-py-utils/clp_py_utils/clp_config.py (1)
697-731: Use keyword for boolean param and preserve exception context.Name the boolean argument for readability and chain original exceptions.
Apply this diff:
@@ def validate_archive_output_config(self): - try: - validate_path_could_be_dir(self.archive_output.get_directory(), True) - except ValueError as ex: - raise ValueError(f"archive_output.storage's directory is invalid: {ex}") + try: + validate_path_could_be_dir(self.archive_output.get_directory(), use_host_mount=True) + except ValueError as ex: + raise ValueError("archive_output.storage's directory is invalid") from ex @@ def validate_stream_output_config(self): - try: - validate_path_could_be_dir(self.stream_output.get_directory(), True) - except ValueError as ex: - raise ValueError(f"stream_output.storage's directory is invalid: {ex}") + try: + validate_path_could_be_dir(self.stream_output.get_directory(), use_host_mount=True) + except ValueError as ex: + raise ValueError("stream_output.storage's directory is invalid") from ex @@ def validate_data_dir(self): - try: - validate_path_could_be_dir(self.data_directory, True) - except ValueError as ex: - raise ValueError(f"data_directory is invalid: {ex}") + try: + validate_path_could_be_dir(self.data_directory, use_host_mount=True) + except ValueError as ex: + raise ValueError("data_directory is invalid") from ex @@ def validate_logs_dir(self): - try: - validate_path_could_be_dir(self.logs_directory, True) - except ValueError as ex: - raise ValueError(f"logs_directory is invalid: {ex}") + try: + validate_path_could_be_dir(self.logs_directory, use_host_mount=True) + except ValueError as ex: + raise ValueError("logs_directory is invalid") from ex @@ def validate_tmp_dir(self): - try: - validate_path_could_be_dir(self.tmp_directory, True) - except ValueError as ex: - raise ValueError(f"tmp_directory is invalid: {ex}") + try: + validate_path_could_be_dir(self.tmp_directory, use_host_mount=True) + except ValueError as ex: + raise ValueError("tmp_directory is invalid") from excomponents/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (3)
225-233: Ensure logs dir exists before writing temp files.dump_container_config writes under logs_directory; if the directory is missing, writes fail.
Apply this diff near config load success:
@@ try: config_file_path = pathlib.Path(parsed_args.config) clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) - clp_config.validate_logs_dir() + clp_config.validate_logs_dir() + # Ensure host logs directory exists for generated files + resolve_host_path(clp_config.logs_directory).mkdir(parents=True, exist_ok=True)
305-314: Propagate AWS env vars when using env-vars auth.docker run only receives envs explicitly passed. Forward standard AWS vars if present.
Apply this diff:
@@ - extra_env_vars = { + extra_env_vars = { CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, } + # Forward AWS env vars when set (env_vars auth or general SDK usage) + import os + for key in ( + "AWS_ACCESS_KEY_ID", + "AWS_SECRET_ACCESS_KEY", + "AWS_SESSION_TOKEN", + "AWS_PROFILE", + "AWS_DEFAULT_REGION", + "AWS_REGION", + ): + val = os.getenv(key) + if val: + extra_env_vars[key] = val
37-51: Param naming no longer reflects usage.container_url_list_path now receives a host-resolved path. Rename to url_list_path for clarity.
-def _generate_url_list( - subcommand: str, - container_url_list_path: pathlib.Path, +def _generate_url_list( + subcommand: str, + url_list_path: pathlib.Path, @@ - with open(container_url_list_path, "w") as url_list_file: + with open(url_list_path, "w") as url_list_file:components/clp-package-utils/clp_package_utils/scripts/compress.py (2)
231-236: Ensure logs dir exists before writing logs-list/config.Without creating the host logs directory, subsequent writes may fail on fresh installs.
Apply this diff after dump_container_config:
@@ generated_config_path_on_container, generated_config_path_on_host = dump_container_config( container_clp_config, clp_config, get_container_config_filename(container_name) ) + # Ensure host logs directory exists for generated files + resolve_host_path(clp_config.logs_directory).mkdir(parents=True, exist_ok=True)
37-50: Param naming mismatch.container_logs_list_path now points to a host-resolved path. Rename for clarity.
-def _generate_logs_list( - container_logs_list_path: pathlib.Path, +def _generate_logs_list( + logs_list_path: pathlib.Path, @@ - with open(container_logs_list_path, "w") as container_logs_list_file: + with open(logs_list_path, "w") as container_logs_list_file:components/clp-package-utils/clp_package_utils/general.py (1)
775-792: Use keyword for boolean arg and chain exceptions in validators.Improves readability and preserves root cause.
Apply this diff:
@@ def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None: - try: - validate_path_could_be_dir(data_dir, True) - except ValueError as ex: - raise ValueError(f"{component_name} data directory is invalid: {ex}") + try: + validate_path_could_be_dir(data_dir, use_host_mount=True) + except ValueError as ex: + raise ValueError(f"{component_name} data directory is invalid") from ex @@ def _validate_log_directory(logs_dir: pathlib.Path, component_name: str): - try: - validate_path_could_be_dir(logs_dir, True) - except ValueError as ex: - raise ValueError(f"{component_name} logs directory is invalid: {ex}") + try: + validate_path_could_be_dir(logs_dir, use_host_mount=True) + except ValueError as ex: + raise ValueError(f"{component_name} logs directory is invalid") from ex
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
305-312: Good: default extraction dir honours CLP_PWD_HOST.This addresses the earlier issue where "." resolved to /opt/clp inside containers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (24)
components/clp-package-utils/clp_package_utils/controller.py(17 hunks)components/clp-package-utils/clp_package_utils/general.py(10 hunks)components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py(4 hunks)components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py(4 hunks)components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py(8 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py(2 hunks)components/clp-py-utils/clp_py_utils/clp_config.py(5 hunks)components/clp-py-utils/clp_py_utils/core.py(2 hunks)components/package-template/src/sbin/.common-env.sh(1 hunks)components/package-template/src/sbin/admin-tools/archive-manager.sh(1 hunks)components/package-template/src/sbin/admin-tools/dataset-manager.sh(1 hunks)components/package-template/src/sbin/compress-from-s3.sh(1 hunks)components/package-template/src/sbin/compress.sh(1 hunks)components/package-template/src/sbin/decompress.sh(1 hunks)components/package-template/src/sbin/search.sh(1 hunks)components/package-template/src/sbin/start-clp.sh(1 hunks)components/package-template/src/sbin/stop-clp.sh(1 hunks)taskfile.yaml(6 hunks)taskfiles/docker-images.yaml(1 hunks)tools/deployment/package/docker-compose.runtime.yaml(1 hunks)tools/docker-images/clp-package/Dockerfile(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
PR: y-scope/clp#1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.
Applied to files:
components/package-template/src/sbin/admin-tools/dataset-manager.sh
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
tools/docker-images/clp-package/Dockerfile
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-09-25T19:26:32.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.
Applied to files:
taskfiles/docker-images.yaml
🧬 Code graph analysis (10)
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
components/clp-py-utils/clp_py_utils/core.py (2)
resolve_host_path(68-85)validate_path_could_be_dir(88-95)
components/clp-package-utils/clp_package_utils/controller.py (3)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)components/clp-package-utils/clp_package_utils/general.py (1)
validate_webui_config(587-597)components/clp-py-utils/clp_py_utils/clp_config.py (2)
get_directory(551-552)get_directory(562-563)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)components/clp-py-utils/clp_py_utils/clp_config.py (2)
get_directory(551-552)get_directory(562-563)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (2)
resolve_host_path(68-85)validate_path_could_be_dir(88-95)
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path(68-85)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/core.py (4)
resolve_host_path(68-85)read_yaml_config_file(59-65)make_config_path_absolute(45-56)validate_path_could_be_dir(88-95)
🪛 Checkov (3.2.334)
tools/docker-images/clp-package/Dockerfile
[low] 1-44: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
🪛 Ruff (0.14.2)
components/clp-package-utils/clp_package_utils/scripts/compress.py
61-61: Unnecessary mode argument
Remove mode argument
(UP015)
components/clp-package-utils/clp_package_utils/scripts/decompress.py
84-84: Boolean positional value in function call
(FBT003)
components/clp-package-utils/clp_package_utils/controller.py
835-835: Unnecessary mode argument
Remove mode argument
(UP015)
components/clp-py-utils/clp_py_utils/clp_config.py
678-678: Avoid specifying long messages outside the exception class
(TRY003)
697-697: Boolean positional value in function call
(FBT003)
711-711: Boolean positional value in function call
(FBT003)
713-713: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
713-713: Avoid specifying long messages outside the exception class
(TRY003)
717-717: Boolean positional value in function call
(FBT003)
719-719: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
719-719: Avoid specifying long messages outside the exception class
(TRY003)
723-723: Boolean positional value in function call
(FBT003)
725-725: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
725-725: Avoid specifying long messages outside the exception class
(TRY003)
729-729: Boolean positional value in function call
(FBT003)
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
61-61: Unnecessary mode argument
Remove mode argument
(UP015)
components/clp-py-utils/clp_py_utils/core.py
88-88: Boolean-typed positional argument in function definition
(FBT001)
components/clp-package-utils/clp_package_utils/general.py
479-479: Avoid specifying long messages outside the exception class
(TRY003)
776-776: Boolean positional value in function call
(FBT003)
790-790: Boolean positional value in function call
(FBT003)
🪛 Shellcheck (0.11.0)
components/package-template/src/sbin/compress.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/admin-tools/dataset-manager.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/.common-env.sh
[info] 23-23: Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
(SC2317)
[warning] 30-30: Declare and assign separately to avoid masking return values.
(SC2155)
components/package-template/src/sbin/search.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/stop-clp.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/decompress.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/start-clp.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/admin-tools/archive-manager.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
components/package-template/src/sbin/compress-from-s3.sh
[warning] 5-5: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
🔇 Additional comments (30)
components/package-template/src/sbin/compress.sh (1)
4-10: Approved—wrapper pattern aligns with containerization design.The sourcing of .common-env.sh and docker compose invocation follow the containerization pattern established across CLI tools. The ShellCheck warning SC1090 is a false positive; the script_dir assignment is necessarily dynamic.
components/package-template/src/sbin/search.sh (1)
4-10: Approved—consistent containerization pattern.Mirrors the established pattern from compress.sh and other wrapper scripts. No additional concerns.
components/package-template/src/sbin/compress-from-s3.sh (1)
4-10: Approved—follows established wrapper pattern.tools/docker-images/clp-package/Dockerfile (1)
30-44: Approved—explicit COPY structure supports maintainability.The granular COPY commands for each component (binaries, dependencies, spider, node, Python libs, WebUI) provide clarity over consolidated COPYs and align with the build restructuring described in the PR objectives.
tools/deployment/package/docker-compose.runtime.yaml (1)
1-19: Approved—comprehensive runtime configuration with appropriate defaults.The docker-compose service is well-structured with:
- Sensible defaults for image reference and user mapping
- Proper volume bindings for Docker-in-Docker capability and host file access
- Environment variables propagated from the host
- Read-only protections on critical mounts (plugin dir, docker binary)
components/package-template/src/sbin/.common-env.sh (2)
6-24: Approved—robust image reference resolution.Image reference fallback chain (clp-package-image.id → VERSION → error) is correct. The dual-mode error handling on line 23 (
return 1 2>/dev/null || exit 1) is intentional and necessary for scripts that may be sourced or executed directly; ShellCheck warning SC2317 is a false positive here.
32-53: Approved—defensive Docker plugin and socket discovery.The plugin directory search is well-ordered, and the socket path detection correctly verifies the socket exists before exporting. The warning message appropriately informs users if the plugin directory cannot be found. Error handling for missing docker binary (redirected to /dev/null) is acceptable; docker compose will fail gracefully if unavailable.
components/package-template/src/sbin/stop-clp.sh (1)
4-10: Approved—follows established containerization pattern.Consistent with other wrapper scripts (compress.sh, search.sh). No additional concerns.
components/package-template/src/sbin/admin-tools/archive-manager.sh (1)
4-10: Approved—correct relative path for subdirectory location.The relative path
$script_dir/../.common-env.shcorrectly references the common environment script from the admin-tools/ subdirectory. Follows the established pattern with appropriate path adjustment for directory depth.components/package-template/src/sbin/start-clp.sh (1)
4-10: LGTM! Clean containerization of the start-clp script.The script now sources a common environment configuration and delegates execution to the Docker Compose-managed runtime. This is consistent with the PR's objective of containerizing CLP tools.
Note: The ShellCheck SC1090 warning about non-constant source is expected and can be safely ignored—this is a standard pattern for shell wrappers.
components/package-template/src/sbin/decompress.sh (1)
4-10: LGTM! Consistent containerization pattern.The changes mirror the approach used in other scripts, sourcing the common environment and delegating to the containerized runtime. The implementation is clean and consistent with the PR's containerization objectives.
taskfile.yaml (3)
58-58: Verify the default task dependency change.The default task now depends on
packageinstead ofdocker-images:package. This shifts the top-level build workflow to produce the package directory structure rather than just the Docker image.Please confirm this workflow change is intentional and aligns with the expected developer experience when running
taskwithout arguments.
106-152: LGTM! Package task properly orchestrates containerized artifacts.The restructured package task now:
- Depends on
docker-images:packageto ensure the container image is built- Copies webui settings and the image ID into the package structure
- Uses rsync with proper flags for directory synchronization
This aligns well with the PR's objective to containerize tools while maintaining the package structure for deployment.
472-510: LGTM! Python libraries task is well-structured.The new
python-libsinternal task:
- Properly depends on all Python component builds and the package venv
- Uses checksum-based incremental builds
- Installs wheels to an isolated output directory with
-tflag- Follows the same pattern as other build tasks in the file
This cleanly separates Python library packaging from the Docker image build.
taskfiles/docker-images.yaml (1)
4-13: LGTM! Docker image build properly depends on individual components.The restructured task now:
- Depends directly on
:init,:core,:deps:spider,:python-libs, and:webui- Removes the circular dependency pattern where the package depended on the image
- Delegates artifact assembly to
build.shThis inversion of dependencies is cleaner and allows the Docker image to be composed directly from build outputs, consistent with the PR's containerization approach.
components/package-template/src/sbin/admin-tools/dataset-manager.sh (1)
4-10: LGTM! Consistent containerization with correct relative path.The script properly sources the common environment from the parent directory (
../.common-env.sh) since it resides in theadmin-toolssubdirectory. The Docker Compose invocation follows the established pattern.components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)
17-17: LGTM! Host path resolution import added.The
resolve_host_pathimport is correctly used at line 284 to translate the generated config path before cleanup.
223-226: LGTM! Removed clp_home mount to avoid conflicts.Removing
mounts.clp_homefrom the necessary mounts list prevents conflicts with the Docker image contents, as noted in past review feedback. The container now only mounts the logs and archives directories that it needs to access.
284-285: LGTM! Host path resolution before cleanup.The code correctly resolves the generated config path to its container-mount equivalent before unlinking, ensuring the cleanup operation targets the correct filesystem location in the containerized runtime.
components/clp-package-utils/clp_package_utils/scripts/search.py (3)
16-16: LGTM! Host path resolution import added.The import is correctly used at line 184 for path resolution before cleanup.
132-132: LGTM! Streamlined mounts list.Removing
mounts.clp_homefrom the necessary mounts list follows the same pattern as other scripts in this PR, avoiding conflicts with containerized resources.
184-185: LGTM! Consistent host path resolution pattern.The cleanup logic matches the pattern used in archive_manager.py and other scripts, ensuring correct path resolution in the containerized environment.
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (3)
17-17: LGTM! Host path resolution import added.The import is used consistently with other scripts at line 188 for cleanup operations.
135-135: LGTM! Consistent mount configuration.The necessary mounts list initialization follows the same pattern as search.py and archive_manager.py, starting with only
logs_dirand conditionally adding other mounts as needed.
188-189: LGTM! Standard cleanup pattern.The cleanup logic uses the same host path resolution approach as the other Python scripts, ensuring consistency across the codebase.
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
7-7: LGTM on host-path resolution import.Importing resolve_host_path here is consistent with the containerised runtime path model.
components/clp-py-utils/clp_py_utils/clp_config.py (1)
676-681: Correctly validating logs_input against the host-mounted path.Using resolve_host_path for existence/type checks fixes container/host path mismatches.
components/clp-package-utils/clp_package_utils/controller.py (2)
446-452: Validation path usage is consistent.Passing original paths to validate_webui_config is fine since that helper resolves to host‑mounted paths internally.
123-129: Verification confirms bind-mount configuration is correct; code pattern approved.The clp-runtime service correctly bind-mounts the host root at
/mnt/host, confirming thatresolve_host_path()operations (mkdir, chown) properly affect the Docker host as intended. The selective application of_chown_paths_if_root()to infrastructure services only is appropriate and well-scoped.components/clp-py-utils/clp_py_utils/core.py (1)
7-7: Mount configuration verified: /mnt/host binding is present and correct.The docker-compose.runtime.yaml correctly defines the required volume mount at line 18:
- "/:/mnt/host", which binds the entire host filesystem to /mnt/host in the clp-runtime container. The CONTAINER_HOST_ROOT_DIR constant is properly aligned with this runtime configuration, and resolve_host_path operations will function as intended.
| resolved_client_settings_json_path, client_settings_json_updates | ||
| ) | ||
| with open(client_settings_json_path, "w") as client_settings_json_file: | ||
| with open(resolved_client_settings_json_path, "w") as client_settings_json_file: | ||
| client_settings_json_file.write(json.dumps(client_settings_json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Prefer json.dump with indentation for readability.
Small DX improvement for troubleshooting settings.
- with open(resolved_client_settings_json_path, "w") as client_settings_json_file:
- client_settings_json_file.write(json.dumps(client_settings_json))
+ with open(resolved_client_settings_json_path, "w") as client_settings_json_file:
+ json.dump(client_settings_json, client_settings_json_file, indent=2, sort_keys=True)
@@
- with open(resolved_server_settings_json_path, "w") as settings_json_file:
- settings_json_file.write(json.dumps(server_settings_json))
+ with open(resolved_server_settings_json_path, "w") as settings_json_file:
+ json.dump(server_settings_json, settings_json_file, indent=2, sort_keys=True)Also applies to: 536-537
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines
479-482 (and also apply the same change at lines ~536-537), replace the manual
json.dumps write with json.dump using an indent parameter for readability; open
the file as currently done but call json.dump(client_settings_json,
client_settings_json_file, indent=2) (or another suitable indent) so the written
JSON is pretty-printed and easier to troubleshoot.
| instance_id_file_path = clp_config.logs_directory / "instance-id" | ||
| resolved_instance_id_file_path = resolve_host_path(instance_id_file_path) | ||
|
|
||
| if instance_id_file_path.exists(): | ||
| with open(instance_id_file_path, "r") as f: | ||
| if resolved_instance_id_file_path.exists(): | ||
| with open(resolved_instance_id_file_path, "r") as f: | ||
| instance_id = f.readline() | ||
| else: | ||
| instance_id = str(uuid.uuid4())[-4:] | ||
| with open(instance_id_file_path, "w") as f: | ||
| with open(resolved_instance_id_file_path, "w") as f: | ||
| f.write(instance_id) | ||
|
|
||
| return instance_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Trim minor style issue and reduce collision risk for instance IDs.
- open(..., "r") mode is redundant; default is read (UP015).
- Optional: use 8 hex chars to reduce collisions.
- if resolved_instance_id_file_path.exists():
- with open(resolved_instance_id_file_path, "r") as f:
+ if resolved_instance_id_file_path.exists():
+ with open(resolved_instance_id_file_path) as f:
instance_id = f.readline()
else:
- instance_id = str(uuid.uuid4())[-4:]
+ instance_id = uuid.uuid4().hex[-8:]
with open(resolved_instance_id_file_path, "w") as f:
f.write(instance_id)🧰 Tools
🪛 Ruff (0.14.2)
835-835: Unnecessary mode argument
Remove mode argument
(UP015)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines 831
to 842, remove the redundant "r" mode when opening the instance-id file and trim
the read value, and generate a longer 8-hex-character instance id to reduce
collision risk: when reading, open with default read mode (omit "r") and use
.readline().strip() to remove trailing newlines; when creating a new id, use
uuid.uuid4().hex and take the last 8 characters (or otherwise produce 8 hex
chars) before writing it to the resolved path using the default write mode.
| resolved_config_file_path_on_host = resolve_host_path(config_file_path_on_host) | ||
| with open(resolved_config_file_path_on_host, "w") as f: | ||
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create parent directory before writing the container config file.
Without this, first-run writes fail if the logs dir hierarchy is missing.
Apply this diff:
- resolved_config_file_path_on_host = resolve_host_path(config_file_path_on_host)
- with open(resolved_config_file_path_on_host, "w") as f:
+ resolved_config_file_path_on_host = resolve_host_path(config_file_path_on_host)
+ resolved_config_file_path_on_host.parent.mkdir(parents=True, exist_ok=True)
+ with open(resolved_config_file_path_on_host, "w") as f:
yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resolved_config_file_path_on_host = resolve_host_path(config_file_path_on_host) | |
| with open(resolved_config_file_path_on_host, "w") as f: | |
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) | |
| resolved_config_file_path_on_host = resolve_host_path(config_file_path_on_host) | |
| resolved_config_file_path_on_host.parent.mkdir(parents=True, exist_ok=True) | |
| with open(resolved_config_file_path_on_host, "w") as f: | |
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 359 to
361, the code opens the resolved config file path for writing without ensuring
its parent directory exists; create the parent directory before opening the file
by resolving the path (use Path or os.path), computing parent_dir =
resolved_path.parent (or os.path.dirname), and calling os.makedirs(parent_dir,
exist_ok=True) (skip if parent_dir is empty or root), then open and write the
YAML as before so first-run writes won't fail when the logs dir hierarchy is
missing.
| if not resolved_credentials_file_path.exists(): | ||
| if ( | ||
| make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH) | ||
| == credentials_file_path | ||
| and generate_default_file | ||
| ): | ||
| generate_credentials_file(credentials_file_path) | ||
| generate_credentials_file(resolved_credentials_file_path) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also mkdir for generated default credentials file.
Generating the default file will fail if its parent dir does not exist.
Apply this diff:
- resolved_credentials_file_path = resolve_host_path(credentials_file_path)
+ resolved_credentials_file_path = resolve_host_path(credentials_file_path)
@@
- generate_credentials_file(resolved_credentials_file_path)
+ resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True)
+ generate_credentials_file(resolved_credentials_file_path)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not resolved_credentials_file_path.exists(): | |
| if ( | |
| make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH) | |
| == credentials_file_path | |
| and generate_default_file | |
| ): | |
| generate_credentials_file(credentials_file_path) | |
| generate_credentials_file(resolved_credentials_file_path) | |
| else: | |
| if not resolved_credentials_file_path.exists(): | |
| if ( | |
| make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH) | |
| == credentials_file_path | |
| and generate_default_file | |
| ): | |
| resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True) | |
| generate_credentials_file(resolved_credentials_file_path) | |
| else: |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 471 to
478, the code generates a default credentials file but doesn't ensure its parent
directory exists; before calling
generate_credentials_file(resolved_credentials_file_path) create the parent
directory for resolved_credentials_file_path (e.g.,
resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True)) so
file creation won't fail when the directory is missing.
| resolved_inputs_from_path = resolve_host_path(pathlib.Path(parsed_args.inputs_from)) | ||
| with open(resolved_inputs_from_path, "r") as input_file: | ||
| for line in input_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Good: read inputs_from via resolved host path.
This aligns file I/O with the container/host mapping. Minor: open(..., "r") is redundant; default is read.
🧰 Tools
🪛 Ruff (0.14.2)
61-61: Unnecessary mode argument
Remove mode argument
(UP015)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
around lines 60 to 62, the open(...) call uses an explicit "r" mode which is
redundant because open defaults to read mode; change the file open to omit the
mode argument (e.g., with open(resolved_inputs_from_path) as input_file:) to
simplify the code while keeping the resolved host path logic intact.
| def resolve_host_path(path: pathlib.Path) -> pathlib.Path: | ||
| """ | ||
| Translates a host path to its container-mount equivalent. | ||
| :param path: The host path. | ||
| :return: The translated container path. | ||
| """ | ||
| path = path.absolute() | ||
| resolved = CONTAINER_HOST_ROOT_DIR / path.relative_to("/") | ||
|
|
||
| try: | ||
| if resolved.is_symlink(): | ||
| target_path = (resolved.parent / resolved.readlink()).resolve() | ||
| resolved = CONTAINER_HOST_ROOT_DIR / target_path.relative_to("/") | ||
| except OSError: | ||
| pass | ||
|
|
||
| return resolved | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Container‑only semantics and resilience of resolve_host_path.
- Works assuming /mnt/host is mounted. Consider documenting this explicitly in the docstring.
- Swallowing OSError hides resolution issues; log at debug at least.
- If portability matters, consider making the mount root configurable via an env (e.g., CLP_HOST_MOUNT_ROOT), defaulting to /mnt/host.
def resolve_host_path(path: pathlib.Path) -> pathlib.Path:
- """
- Translates a host path to its container-mount equivalent.
+ """
+ Translates a host path to its container-mount equivalent.
+ Requires the container to bind‑mount the Docker host root at CONTAINER_HOST_ROOT_DIR.
@@
- try:
- if resolved.is_symlink():
- target_path = (resolved.parent / resolved.readlink()).resolve()
- resolved = CONTAINER_HOST_ROOT_DIR / target_path.relative_to("/")
- except OSError:
- pass
+ try:
+ if resolved.is_symlink():
+ target_path = (resolved.parent / resolved.readlink()).resolve()
+ resolved = CONTAINER_HOST_ROOT_DIR / target_path.relative_to("/")
+ except OSError as e:
+ # Best‑effort: unresolved symlinks shouldn't break call sites
+ import logging
+ logging.getLogger(__name__).debug("resolve_host_path symlink check failed: %s", e)| def validate_path_could_be_dir(path: pathlib.Path, use_host_mount: bool): | ||
| part = resolve_host_path(path) if use_host_mount else path | ||
| while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Avoid boolean positional; make keyword‑only parameter (fixes FBT001/FBT003).
Make use_host_mount keyword‑only (with default), and update callers to pass by name.
-def validate_path_could_be_dir(path: pathlib.Path, use_host_mount: bool):
+def validate_path_could_be_dir(path: pathlib.Path, *, use_host_mount: bool = False):
part = resolve_host_path(path) if use_host_mount else pathFollow‑up in callers (e.g., decompress.py): validate_path_could_be_dir(extraction_dir, use_host_mount=True).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def validate_path_could_be_dir(path: pathlib.Path, use_host_mount: bool): | |
| part = resolve_host_path(path) if use_host_mount else path | |
| while True: | |
| def validate_path_could_be_dir(path: pathlib.Path, *, use_host_mount: bool = False): | |
| part = resolve_host_path(path) if use_host_mount else path | |
| while True: |
🧰 Tools
🪛 Ruff (0.14.2)
88-88: Boolean-typed positional argument in function definition
(FBT001)
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/core.py around lines 88 to 90, the
function validate_path_could_be_dir currently accepts use_host_mount as a
positional boolean; change the signature to make use_host_mount a keyword-only
parameter with a default (e.g., *, use_host_mount: bool = False) so callers must
pass it by name. Update all call sites (e.g., decompress.py) to pass
use_host_mount=True or False by name (validate_path_could_be_dir(extraction_dir,
use_host_mount=True)) and run tests/lint to ensure FBT001/FBT003 violations are
resolved.
| resolved = CONTAINER_HOST_ROOT_DIR / path.relative_to("/") | ||
|
|
||
| try: | ||
| if resolved.is_symlink(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is needed because somehow my WSL Ubuntu has this
(clp) junhao@ASUS-X870E:~$ ls -l .aws
lrwxrwxrwx 1 junhao junhao 24 Aug 26 12:07 .aws -> /mnt/c/Users/junhao/.aws
/mnt/c/Users/junhao/.aws is certainly not a valid path in the runtime container, to access, the bootstrap scripts should use the prefix + /mnt/c/Users/junhao/.aws
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a complete review but some things I noticed while reading so far.
Some of the shell scripts have lines longer than 100 chars as well as shellcheck warnings. Can you address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/core.py (1)
87-89: Log OSError at debug level instead of silently swallowing.Silent exception handling masks resolution issues. Past review comments suggested logging at debug level to aid troubleshooting.
Apply this diff to add debug logging:
+ import logging + try: if not translated_path.is_symlink(): return translated_path link_target = translated_path.readlink() if link_target.is_absolute(): return CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/") else: # If the symlink points to a relative path, resolve it relative to the symlink's parent. return (translated_path.parent / link_target).resolve() - except OSError: + except OSError as e: # Ignore if reading the symlink fails (e.g., broken link or permission error). - pass + logging.getLogger(__name__).debug( + "resolve_host_path_in_container: symlink resolution failed for %s: %s", + translated_path, + e + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/core.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/clp-py-utils/clp_py_utils/core.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: ubuntu-jammy-deps-image
- GitHub Check: musllinux_1_2-x86_64-deps-image
- GitHub Check: centos-stream-9-deps-image
- GitHub Check: manylinux_2_28-x86_64-deps-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/clp-py-utils/clp_py_utils/core.py (2)
6-7: LGTM! Clear constant definition.The constant name follows the naming suggestion from past reviews and clearly indicates the container directory for the host root mount.
94-101: Verification complete — all call sites properly updated.All eight call sites of
validate_path_could_be_dirare using a single argument, and each correctly invokesresolve_host_path_in_container()before passing the path to the validation function. The removal of theuse_host_mountparameter from the function signature has been fully propagated across the codebase:
- In
clp_config.py, the validation methods conditionally resolve paths based on their ownuse_host_mountparameter before calling the validator.- In
general.pyanddecompress.py, paths are resolved unconditionally before validation.The code changes are sound and complete.
| try: | ||
| if not translated_path.is_symlink(): | ||
| return translated_path | ||
| link_target = translated_path.readlink() | ||
| if link_target.is_absolute(): | ||
| return CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/") | ||
| else: | ||
| # If the symlink points to a relative path, resolve it relative to the symlink's parent. | ||
| return (translated_path.parent / link_target).resolve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Inconsistent symlink resolution depth.
The handling differs for absolute vs. relative symlink targets:
- Absolute targets (lines 82-83): translated directly without following further symlinks.
- Relative targets (line 86):
.resolve()is called, which follows all symlinks in the target path.
This inconsistency means a relative symlink pointing to another symlink will be fully resolved, but an absolute symlink pointing to another symlink won't. Consider applying .resolve() consistently in both branches or documenting this behavior explicitly.
For consistent single-level resolution, you could refactor to avoid .resolve() and handle path normalization separately:
if link_target.is_absolute():
return CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/")
else:
- # If the symlink points to a relative path, resolve it relative to the symlink's parent.
- return (translated_path.parent / link_target).resolve()
+ # If the symlink points to a relative path, normalize it relative to the symlink's parent.
+ # Note: this doesn't follow further symlinks, only normalizes . and .. components.
+ normalized = (translated_path.parent / link_target)
+ # Since Path doesn't provide non-symlink-following normalization, we accept that
+ # relative symlinks may not be fully canonicalized or we accept full resolution.
+ return normalized.resolve()Alternatively, document clearly that relative symlinks undergo full resolution while absolute ones do not.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
| # Validate and load config file. | ||
| config_file_path = pathlib.Path(parsed_args.config) | ||
| clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) | ||
| clp_config = load_config_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside load_config_file, we print one of the arguments in a log, which I think is wrong, right? I'm fine leaving it as is and opening an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the log would be confusing to general users when load_config_file is used in the "bootstrap" scripts. I didn't modify load_config_file directly because the "native" scripts also uses load_config_file - it's possible to use a flag to determine whether the method is called in a bootstrap script or a native script though.
@coderabbitai please create an issue for the inaccurate log and we will fix it as we clean up the bootstrap / native scripts in a future PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue for inaccurate log in load_config_file
gh issue create \
--repo y-scope/clp \
--title "Inaccurate log message in load_config_file when used in bootstrap scripts" \
--body "## Description
When \`load_config_file\` is called from bootstrap scripts (containerized CLI wrappers), the log messages print container paths instead of host paths, which can confuse users.
## Context
The bootstrap scripts now resolve host paths before passing them to \`load_config_file\`, but the function logs one of the arguments directly. This creates misleading output for users when the scripts run inside containers.
The native scripts also use \`load_config_file\`, so a flag could be introduced to determine whether the method is called in a bootstrap script or a native script to control the logging behaviour.
## References
- Pull Request: https://github.com/y-scope/clp/pull/1512
- Comment: https://github.com/y-scope/clp/pull/1512#discussion_r2490812330
- Requested by: @junhaoliao
## Follow-up
This will be addressed in a future PR when cleaning up the bootstrap/native scripts." \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1548 to track the inaccurate log message in load_config_file when used in bootstrap scripts. The issue has been assigned to you and includes references to this PR and comment thread.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
| ) | ||
|
|
||
| necessary_mounts: List[Optional[DockerMount]] = [ | ||
| mounts.clp_home, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can you explain more?
Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (6)
components/clp-package-utils/clp_package_utils/controller.py (5)
445-450: Consider moving these declarations closer to their use.The resolved path variables are declared here but not used until lines 482 and 537, which reduces code clarity.
482-485: Prefer json.dump with indentation for readability.Using
json.dumpwith anindentparameter would make the output more human-readable for troubleshooting.
536-540: Prefer json.dump with indentation for readability.Using
json.dumpwith anindentparameter would make the output more human-readable for troubleshooting.
571-573: Remove unnecessary blank line for consistency.Line 571 contains an unnecessary blank line. Other similar methods in this file don't have a blank line before resolving the logs directory.
841-852: Address the redundant mode, short instance ID, and missing strip.Three issues:
- The "r" mode at line 845 is redundant (default is read)
- Using only 4 hex characters for instance ID increases collision risk; consider 8
- Should strip whitespace when reading the instance ID at line 846
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
179-179: Pass boolean as keyword argument for clarity.Using a keyword argument makes the intent explicit and avoids the FBT003 style violation.
Apply this diff:
- clp_config.validate_logs_dir(True) + clp_config.validate_logs_dir(use_host_mount=True)As per coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/controller.py(16 hunks)components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(3 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.pycomponents/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.pycomponents/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2024-11-15T16:21:52.122Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-10-20T20:19:40.504Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1436
File: components/clp-mcp-server/clp_mcp_server/clp_mcp_server.py:22-27
Timestamp: 2025-10-20T20:19:40.504Z
Learning: In the clp-mcp-server project (components/clp-mcp-server/clp_mcp_server/clp_mcp_server.py), the --config Click option should use exists=True to validate the configuration file path at option processing time, even though this requires the default path to exist on all machines.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/controller.py (2)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path_in_container(67-91)components/clp-package-utils/clp_package_utils/general.py (1)
validate_webui_config(581-591)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)
components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path_in_container(67-91)components/clp-py-utils/clp_py_utils/clp_config.py (1)
validate_logs_dir(737-743)
🪛 Ruff (0.14.3)
components/clp-package-utils/clp_package_utils/controller.py
845-845: Unnecessary mode argument
Remove mode argument
(UP015)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
179-179: Boolean positional value in function call
(FBT003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: manylinux_2_28-x86_64-deps-image
- GitHub Check: ubuntu-jammy-deps-image
- GitHub Check: musllinux_1_2-x86_64-deps-image
- GitHub Check: centos-stream-9-deps-image
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (13)
components/clp-package-utils/clp_package_utils/controller.py (11)
40-40: LGTM!The import of
resolve_host_path_in_containeris necessary for the containerization changes and is used consistently throughout the file.
123-128: LGTM!The host-path resolution pattern is correctly applied for the database component: paths are resolved, directories created on resolved paths, and ownership adjusted for the infrastructure service.
173-175: LGTM!The host-path resolution pattern is correctly applied for the queue component.
212-217: LGTM!The host-path resolution pattern is correctly applied for the Redis component.
263-268: LGTM!The host-path resolution pattern is correctly applied for the results cache component.
305-306: LGTM!The host-path resolution pattern is correctly applied for the compression scheduler component. No ownership change is needed for worker log directories.
329-330: LGTM!The host-path resolution pattern is correctly applied for the query scheduler component.
352-353: LGTM!The host-path resolution pattern is correctly applied for the compression worker component.
382-383: LGTM!The host-path resolution pattern is correctly applied for the query worker component.
410-411: LGTM!The host-path resolution pattern is correctly applied for the reducer component.
610-611: LGTM!The host-path resolution pattern is correctly applied for the garbage collector component.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)
225-228: Correct removal of clp_home from mounts.The exclusion of
clp_homefrom necessary mounts aligns with the PR's containerization strategy, where CLP assets are embedded in the Docker image rather than mounted from the host.
286-289: Host path resolution for cleanup is correct.Resolving the generated config path before unlinking ensures the cleanup operation targets the correct mounted location when running inside the container.
| resolve_host_path_in_container(config_file_path), | ||
| resolve_host_path_in_container(default_config_file_path), | ||
| clp_home, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard config path resolution to preserve container-local defaults.
The current implementation unconditionally translates both config_file_path and default_config_file_path to host-mounted paths. This breaks when the default config exists only inside the container (e.g., /opt/clp/etc/clp-config.yml), because translating it yields a non-existent /mnt/host/opt/clp/etc/clp-config.yml path, causing load_config_file to fail.
For config_file_path: check whether the path exists in the container before resolving; if it exists, use it as-is (supports both container-local defaults and host overrides).
For default_config_file_path: apply the same pattern—preserve the path if it exists in the container, otherwise resolve it.
Apply this diff:
+ import os
+
# Validate and load config file
try:
config_file_path: Path = Path(parsed_args.config)
+ # Check if config_file_path exists in container; if not, resolve it as a host path
+ if config_file_path.exists():
+ resolved_config_file_path = config_file_path
+ else:
+ resolved_config_file_path = resolve_host_path_in_container(config_file_path)
+
+ # For default config, use container path if it exists; otherwise resolve as host path
+ if default_config_file_path.exists():
+ resolved_default_config_file_path = default_config_file_path
+ else:
+ resolved_default_config_file_path = resolve_host_path_in_container(
+ default_config_file_path
+ )
+
clp_config: CLPConfig = load_config_file(
- resolve_host_path_in_container(config_file_path),
- resolve_host_path_in_container(default_config_file_path),
+ resolved_config_file_path,
+ resolved_default_config_file_path,
clp_home,
)Based on learnings.
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
around lines 175 to 178, the code always calls resolve_host_path_in_container on
both config_file_path and default_config_file_path which breaks when a default
config only exists inside the container; instead, check whether each path exists
in the container first (e.g., os.path.exists(container_path)); if it exists,
pass the original container-local path through, otherwise call
resolve_host_path_in_container to convert it to the host-mounted path; apply
this logic to both config_file_path and default_config_file_path before passing
them (and leave clp_home unchanged).
… code review Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tools/deployment/package/docker-compose.runtime.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
tools/deployment/package/docker-compose.runtime.yaml
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package/docker-compose.runtime.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package/docker-compose.runtime.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: ubuntu-jammy-deps-image
- GitHub Check: manylinux_2_28-x86_64-deps-image
- GitHub Check: musllinux_1_2-x86_64-deps-image
- GitHub Check: centos-stream-9-deps-image
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/deployment/package/docker-compose.runtime.yaml (1)
1-38: Well-structured clp-runtime service definition.The Docker Compose service is clearly configured with appropriate environment variable forwarding (HOME, CLP_HOME, CLP_PWD_HOST) and defensive defaults/validation. The volume strategy correctly forwards the Docker daemon bridge for in-container compose operations and binds CLP_HOME for state and configs.
| - type: "bind" | ||
| source: "/" | ||
| target: "/mnt/host" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider documenting the root filesystem mount strategy.
Mounting the entire root filesystem (/) to /mnt/host is a significant security and design choice that enables flexible host path resolution but provides broad container access. The PR objectives indicate this is intentional for resolving arbitrary host paths during I/O operations, but consider adding a deployment-guide note or inline comment clarifying that paths must be validated in calling code to prevent unintended access.
For future clarity, consider adding a comment like:
# Host filesystem
+ # NOTE: Mounting "/" → "/mnt/host" enables flexible host path access within the container.
+ # Calling code must validate and sanitize paths to prevent unintended cross-mount access.
- type: "bind"
source: "/"
target: "/mnt/host"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - type: "bind" | |
| source: "/" | |
| target: "/mnt/host" | |
| # Host filesystem | |
| # NOTE: Mounting "/" → "/mnt/host" enables flexible host path access within the container. | |
| # Calling code must validate and sanitize paths to prevent unintended cross-mount access. | |
| - type: "bind" | |
| source: "/" | |
| target: "/mnt/host" |
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.runtime.yaml around lines 36-38, add
an inline comment next to the bind mount explaining why the host root (/) is
mounted, the intended use (resolving arbitrary host paths for I/O), and the
security implications; explicitly instruct callers to validate/whitelist host
paths before using them and reference the repo deployment guide or README where
a short section documents the mount strategy, risk mitigation, and recommended
validation patterns.
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat(clp-package)!: Containerize the scripts in `sbin` via Docker Compose; Remove redundant assets from the package (resolves #1358).
sbin via Docker Compose; Remove redundant assets from the package (resolves #1358).
| for compose_plugin_dir in \ | ||
| "$HOME/.docker/cli-plugins" \ | ||
| "/mnt/wsl/docker-desktop/cli-tools/usr/local/lib/docker/cli-plugins" \ | ||
| "/usr/local/lib/docker/cli-plugins" \ | ||
| "/usr/libexec/docker/cli-plugins"; do | ||
|
|
||
| compose_plugin_path="$compose_plugin_dir/docker-compose" | ||
| if [[ -f "$compose_plugin_path" ]]; then | ||
| export CLP_DOCKER_PLUGIN_DIR="$compose_plugin_dir" | ||
| break | ||
| fi | ||
| done | ||
| if [[ -z "${CLP_DOCKER_PLUGIN_DIR:-}" ]]; then | ||
| echo >&2 "Warning: Docker plugin directory not found;" \ | ||
| "Docker Compose may not work inside container." | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plugin upgrades that are installed through Docker Desktop on Windows seems to create symlinks in $HOME/.docker/cli-plugins. The symlinks can link to some Windows host path, such as /mnt/c/Users/junhao/.docker/modules/cli-plugins-wsl/docker-compose. If we don't map the link target's parent path into the container, the compose plugin would not be found in the container.
@coderabbitai create an issue to track this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the Docker Compose plugin symlink resolution problem
gh issue create \
--repo y-scope/clp \
--title "Docker Compose plugin symlinks not resolved when link targets are outside mounted paths" \
--body "## Problem Description
When using Docker Desktop on Windows with WSL, Docker Compose plugin upgrades create symlinks in \`\$HOME/.docker/cli-plugins\` that point to Windows host paths (e.g., \`/mnt/c/Users/junhao/.docker/modules/cli-plugins-wsl/docker-compose\`).
Currently, if the symlink target's parent path is not mapped into the container, the compose plugin will not be found inside the container runtime.
## Context
This issue was discovered during the containerization of CLP tools in PR #1512.
## Example
- Symlink location: \`\$HOME/.docker/cli-plugins/docker-compose\`
- Symlink target: \`/mnt/c/Users/junhao/.docker/modules/cli-plugins-wsl/docker-compose\`
- Issue: The target path \`/mnt/c/Users/junhao/.docker/modules/cli-plugins-wsl/\` needs to be mounted for the plugin to work inside the container.
## Impact
The Docker Compose plugin may not be available inside the container when users have upgraded their plugins through Docker Desktop on Windows.
---
Reported by @junhaoliao in https://github.com/y-scope/clp/pull/1512#discussion_r2493058818" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1554 to track the Docker Compose plugin symlink resolution problem on Windows with Docker Desktop. The issue documents the scenario where symlinks in $HOME/.docker/cli-plugins point to Windows host paths that need to be mounted for the plugin to work inside the container.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
| @@ -1,9 +1,12 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should
set -o errexit
set -o nounset
set -o pipefail
or
set -eu
set -o pipefail
so the scripts early exit on any error or on any unset variables, and make a pipeline fail if any command in it fails
@coderabbitai create an issue to track the improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue to track adding error handling flags to shell scripts
gh issue create \
--title "Add error handling flags to shell scripts in package" \
--body "## Context
User @junhaoliao requested tracking an improvement to add error handling flags to shell scripts in the package template.
## Description
Shell scripts in \`components/package-template/src/sbin/\` should include error handling flags at the beginning to make them more robust:
\`\`\`bash
set -eu
set -o pipefail
\`\`\`
These flags will:
- Exit early on any error (\`-e\` / \`errexit\`)
- Exit on unset variables (\`-u\` / \`nounset\`)
- Make pipelines fail if any command in them fails (\`pipefail\`)
Note: Use separate \`set -eu\` and \`set -o pipefail\` commands for consistency with the established pattern in CLP project build scripts.
## Affected Scripts
Scripts in \`components/package-template/src/sbin/\` directory, including:
- admin-tools/archive-manager.sh
- admin-tools/dataset-manager.sh
- compress-from-s3.sh
- compress.sh
- decompress.sh
- search.sh
- start-clp.sh
- stop-clp.sh
- .common-env.sh
## References
- PR: https://github.com/y-scope/clp/pull/1512
- Comment: https://github.com/y-scope/clp/pull/1512#discussion_r2493126446
- Requested by: @junhaoliao" \
--assignee junhaoliao \
--repo y-scope/clp
Length of output: 42
@junhaoliao, I've created issue #1556 to track adding error handling flags to the shell scripts in the package template. The issue covers all the scripts in components/package-template/src/sbin/ and follows the established CLP project pattern of using separate set -eu and set -o pipefail commands for consistency with existing build scripts.
🧠 Learnings used
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/general.py:0-0
Timestamp: 2025-08-19T14:41:28.901Z
Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure the .resolve() calls in this script can properly resolve relative and symlinks in the host path space; otherwise, the "mounted_path"s might point to some invalid file in the Package services containers
@coderabbitai create an issue to track this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the path resolution concern
gh issue create \
--repo y-scope/clp \
--title "Ensure .resolve() calls in compress.py properly handle host path resolution" \
--body "## Description
In the containerized runtime environment introduced in PR #1512, the \`.resolve()\` calls in \`components/clp-package-utils/clp_package_utils/scripts/compress.py\` need to properly resolve relative paths and symlinks in the host path space. Otherwise, the \`mounted_path\`s might point to invalid files in the Package services containers.
## Context
The \`_generate_logs_list\` function uses \`pathlib.Path(...).resolve()\` to convert user-provided paths into absolute paths before mounting them in containers. However, in the containerized runtime, these resolve calls may operate in the container's filesystem space rather than the host's filesystem space.
## Related
- PR: https://github.com/y-scope/clp/pull/1512
- Comment: https://github.com/y-scope/clp/pull/1512#discussion_r2495876730
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1558 to track this concern about ensuring .resolve() calls in compress.py properly handle host path resolution.
You're right to flag this—the .resolve() calls at lines 48 and 69 (in _generate_logs_list) currently resolve paths in whatever filesystem context the script runs in. Since the script now executes inside the container, these calls would resolve symlinks and relative paths against the container's filesystem instead of the host's, which could lead to incorrect mounted_path values that don't correspond to valid files in the container mounts.
🧬 Code Graph Analysis Results
[content below identifies relevant snippets to address the user comment; do not perform the user comment analysis here]
-
File: components/clp-py-utils/clp_py_utils/core.py
Lines: 67-92
Content:
def resolve_host_path_in_container(host_path: pathlib.Path) -> pathlib.Path:
"""
Translates a host path to its container-mount equivalent. It also resolves a single level of
symbolic link if the host path itself is a symlink.:param host_path: The host path. :return: The translated path. """ host_path = host_path.absolute() translated_path = CONTAINER_DIR_FOR_HOST_ROOT / host_path.relative_to("/") try: if not translated_path.is_symlink(): return translated_path link_target = translated_path.readlink() if link_target.is_absolute(): return CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/") else: # If the symlink points to a relative path, resolve it relative to the symlink's parent. return (translated_path.parent / link_target).resolve() except OSError: # Ignore if reading the symlink fails (e.g., broken link or permission error). pass return translated_path -
File: components/clp-package-utils/clp_package_utils/scripts/compress.py
Lines: 31-54 (approximate)
Content (relevant excerpt):
def _generate_logs_list(
container_logs_list_path: pathlib.Path,
parsed_args: argparse.Namespace,
) -> bool:
"""
Generates logs list file for the native compression script.:param container_logs_list_path: Path to write logs list. :param parsed_args: Parsed command-line arguments. :return: Whether any paths were written to the logs list. """ host_logs_list_path = parsed_args.path_list with open(container_logs_list_path, "w") as container_logs_list_file: if host_logs_list_path is None: for path in parsed_args.paths: resolved_path = pathlib.Path(path).resolve() mounted_path = CONTAINER_INPUT_LOGS_ROOT_DIR / resolved_path.relative_to( resolved_path.anchor ) container_logs_list_file.write(f"{mounted_path}\n") return len(parsed_args.paths) != 0 no_path_found = True resolved_host_logs_list_path = resolve_host_path_in_container( pathlib.Path(host_logs_list_path) ) with open(resolved_host_logs_list_path, "r") as host_logs_list_file: for line in host_logs_list_file: stripped_path_str = line.rstrip() if "" == stripped_path_str: # Skip empty paths continue no_path_found = False resolved_path = pathlib.Path(stripped_path_str).resolve() mounted_path = CONTAINER_INPUT_LOGS_ROOT_DIR / resolved_path.relative_to( resolved_path.anchor ) container_logs_list_file.write(f"{mounted_path}\n") return not no_path_found
Note: The user comment mentions ensuring .resolve() calls properly resolve relative and symlinks in host path space so that "mounted_path"s are valid inside containers. The above snippets illustrate:
- How host paths are translated to container paths with potential single-level symlink handling (resolve_host_path_in_container).
- How the script resolves host paths on the host side and computes mounted container paths (Path.resolve() + relative_to("/")) to form the mounted paths written to the logs list. These are the relevant points to audit and adjust for host-space resolution behavior.
🧠 Learnings used
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host network mode should be used or the start_clp.py script cannot properly check what IP addresses it can bind to on the host.
@coderabbitai file an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for enabling host network mode in docker-compose.runtime.yaml
gh issue create \
--repo y-scope/clp \
--title "Enable host network mode for clp-runtime service in docker-compose.runtime.yaml" \
--body "## Problem
The \`clp-runtime\` service in \`tools/deployment/package/docker-compose.runtime.yaml\` should use host network mode to allow the \`start_clp.py\` script to properly check what IP addresses it can bind to on the host.
## Context
Currently, the service runs with bridge networking, which isolates the container's network stack from the host. This prevents \`start_clp.py\` from correctly detecting available host IP addresses for binding.
## Backlinks
- PR: https://github.com/y-scope/clp/pull/1512
- Comment: https://github.com/y-scope/clp/pull/1512#discussion_r2496275743
- Reported by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1560 to track enabling host network mode for the clp-runtime service. The issue includes the context about start_clp.py needing proper access to host IP addresses and references this PR and comment.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1368
File: components/clp-mcp-server/clp_mcp_server/__init__.py:11-15
Timestamp: 2025-10-02T15:48:58.961Z
Learning: In the clp-mcp-server component (components/clp-mcp-server/clp_mcp_server/__init__.py), the default host binding of 0.0.0.0 is intentional because the server is designed to be deployed in Docker containers where this binding is necessary to accept external connections.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
| stdin_open: true | ||
| tty: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's found that when requesting to detach stdin from the shell using &, no output is seen in the console. e.g.,
cd build/clp-package
./sbin/start-clp.sh
# this command prints compression speed to the console
./sbin/compress.sh ~/samples/hive-24hr
# with the `&` added, no command output is observed in the console
./sbin/compress.sh ~/samples/hive-24hr &
i believe we should simply remove tty: true and stdin_open: true from the docker-compose.runtime.yaml to avoid allocating a TTY. however, removing those lines have no effects - when & is used, there's still no output printed to the console.
It was later discovered that the docker compose run command overrides the tty and stdin_open options defined in service configurations (source), even when neither the --no-TTY nor the --interactive flag is specified. In other words, if these flags are omitted, docker compose run will automatically set tty: true when it detects a TTY-capable terminal, and will always set stdin_open: true, regardless of the values defined in docker-compose.yaml. (The official documentation is unfortunately confusing and misleading on this behavior.)
then i think we should remove these two lines (rather than changing them to false) in docker-compose.runtime.yaml to avoid confusions, and add --no-TTY to all of the sbin scripts.
@coderabbitai create an issue to track the fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the TTY handling fix
gh issue create \
--repo y-scope/clp \
--title "Fix TTY allocation causing missing output when backgrounding commands" \
--body "## Issue Description
When running CLP package commands in the background using \`&\`, no output is printed to the console. For example:
\`\`\`bash
./sbin/compress.sh ~/samples/hive-24hr &
\`\`\`
## Root Cause
The \`docker compose run\` command overrides the \`tty\` and \`stdin_open\` options defined in service configurations ([source](https://github.com/docker/compose/blob/c37ede62dbb6485c4d5e297db9134631e9b2095e/cmd/compose/run.go#L88-L89)), even when neither the \`--no-TTY\` nor the \`--interactive\` flag is specified. This means \`docker compose run\` automatically sets \`tty: true\` when it detects a TTY-capable terminal and always sets \`stdin_open: true\`, regardless of the values defined in \`docker-compose.yaml\`.
## Proposed Solution
1. Remove \`tty: true\` and \`stdin_open: true\` from \`tools/deployment/package/docker-compose.runtime.yaml\` to avoid confusion
2. Add \`--no-TTY\` flag to all \`docker compose run\` commands in the sbin scripts
## Context
Reported by @junhaoliao in PR #1512: https://github.com/y-scope/clp/pull/1512#discussion_r2501082935" \
--assignee junhaoliao
Length of output: 42
@junhaoliao I've created issue #1566 to track this fix. The issue documents the root cause (docker compose run overriding TTY settings) and the proposed solution (removing the lines from docker-compose.runtime.yaml and adding --no-TTY to all sbin scripts).
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Description
This change containerizes the package runtime, moving all user-facing CLI wrappers to run inside a Docker Compose–managed container. It introduces a unified environment bootstrap for consistent configuration and reduces package bloat by eliminating duplication of assets already included in the container image.
Summary of changes (2 BREAKING changes):
${CLP_HOME}/var/data,${CLP_HOME}/var/log,${CLP_HOME}/var/tmpwill now be created with ownership${USER}:dockerinstead of${USER}:<user's default group>.tools/deployment/package/docker-compose.runtime.yamldefining theclp-runtimeservice.HOMEinto the runtime environment for~inclp-config.yamlto work.components/package-template/src/sbin/.common-env.shthat:CLP_HOME, image ref (CLP_PACKAGE_CONTAINER_IMAGE_REF), andCLP_FIRST_PARTY_SERVICE_UID_GID.CLP_PWD_HOSTsrc/sbin/*.shwrappers (admin-tools and user tools) tosource.common-env.shand invoke:docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime python3 -m ...mounts.clp_homefromnecessary_mountsin:archive_manager.py,compress.py,compress_from_s3.py,dataset_manager.py,decompress.py(both handlers),search.py.decompress.py: default--extraction-dirtoCLP_PWD_HOST(falls back to".").clp-package/bin.taskfile.yaml:python-libsinternal task producingbuild/python-libs.docker-images:packagenow depends on image and copies settings/webui artifactsdefaultnow runs taskpackageagain.tools/docker-images/clp-package/Dockerfileto assemble image directly from build outputs:lib/python3/site-packages/, and webui tovar/www/webui/.docker-compose.runtime.yamland runs without a locally managedPYTHONPATH.Rationale:
Checklist
breaking change.
Validation performed
Observed size reduction of the package
taskto build the package.cd build/clp-package/ && ./sbin/start-clp.sh./sbin/compress.sh ~/samples/hive-24hr./sbin/search.sh 1which returned resultscd /tmp && ./sbin/decompress.sh xwhich extracted the logs into/tmpwith correct hierarchy set up.Summary by CodeRabbit
New Features
Chores
Bug Fixes
Documentation