Conversation

junhaoliao
Member

@junhaoliao junhaoliao commented Aug 8, 2025

Description

  • Docker Compose Implementation: Replaced the previous Docker container management approach with Docker Compose for orchestrating CLP services
    • Service Dependencies: Defined proper service dependencies in the Docker Compose file with health checks and startup conditions
    • Volume Management: Improved volume mounting configuration for persistent data and logs
    • Health Checks: Added health check configurations for critical services (database, queue, redis, results cache, schedulers, web UI)
  • New Controller Architecture: Introduced a new BaseController abstract class and a DockerComposeController implementation to handle service provisioning and deployment, enabling the orchestrator to be switched dynamically once Kubernetes (k8s) support is added
    • Environment-based Configuration: Created a system that generates .env files with environment variables for Docker Compose services
    • Component Provisioning Methods: Implemented dedicated provisioning methods for each CLP component (database, queue, redis, results cache, schedulers, workers, web UI, garbage collector)
    • Configuration Transformation: Added functionality to transform CLP configuration for containerized environments using generate_docker_compose_container_config
  • Simplified Start/Stop Scripts: Refactored start_clp.py and stop_clp.py to use the new Docker Compose controller instead of managing individual containers
  • (BREAKING?) Removed Legacy Individual Component Starting Mechanism: Eliminated component-specific target parsing in start/stop scripts
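The environment-based configuration step described above can be sketched as follows. The helper names and variable names are illustrative only, not the controller's actual API:

```python
from pathlib import Path


def render_env_file(env_vars: dict) -> str:
    # Render key=value lines for Docker Compose's .env file. Sorting keeps
    # the output deterministic across runs.
    return "".join(f"{key}={value}\n" for key, value in sorted(env_vars.items()))


def write_env_file(path: Path, env_vars: dict) -> None:
    # Hypothetical helper: the real controller assembles per-component
    # environments before writing the file.
    path.write_text(render_env_file(env_vars))
```

Docker Compose reads this file automatically when it sits next to the compose manifests, which is what lets the controller drive service configuration through generated variables.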

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

(To be updated with more details once everything is revalidated against the latest batch of refactoring changes.) Previously:

  1. All scripts in sbin:
    1. admin-tools/archive-manager.sh
    2. admin-tools/dataset-manager.sh
    3. compress.sh
    4. decompress.sh
    5. search.sh
    6. start-clp.sh
    7. stop-clp.sh
  2. And the webui:
    1. Viewing ingestion stats / jobs
    2. Submitting and viewing queries
    3. Extracting stream files and opening them in the log viewer
  3. With below config / use-case variations:
    1. storage_engine: clp / clp-s
    2. with clp-s:
      1. s3 input
      2. s3 archive output
      3. s3 stream output
      4. non-default dataset
    3. garbage collector enablement

Summary by CodeRabbit

  • New Features

    • Docker Compose deployment controller with automated provisioning, health checks, shared .env export and automatic worker sizing.
    • Packaged docker‑compose manifests for one‑step orchestration.
  • Changes

    • start/stop unified to single config‑driven commands with upfront validation, unconditional credential loading and directory preparation.
    • Configuration defaults and container paths updated for container usage; deployment constants relocated.
  • Documentation

    • Added Docker Compose design and build guidance; multi‑node guide temporarily suspended.
  • Chores

    • Packaging now includes deployment assets.

Contributor

coderabbitai bot commented Aug 8, 2025

Walkthrough

Adds container-focused deployment: new BaseController and DockerComposeController, container-aware CLPConfig transforms and constants, Docker Compose manifests and packaging, start/stop scripts refactored to use the controller, validators updated, and documentation extended for Docker Compose.

Changes

Cohort / File(s) Summary
Controller and deployment orchestration
components/clp-package-utils/clp_package_utils/controller.py
New BaseController (abstract) and DockerComposeController (concrete) implementing provisioning, per-component environment assembly, .env export, docker compose up/down, helpers for chown and IP resolution, and dynamic WebUI settings.json updates.
General utilities and validation
components/clp-package-utils/clp_package_utils/general.py
Added Docker Compose checks (is_docker_compose_running, check_docker_dependencies), container config generator (generate_docker_compose_container_config), centralized _validate_log_directory, updated validator signatures to require base_config, and replaced check_dependencies with compose-aware logic.
Core config model (container transforms & constants)
components/clp-py-utils/clp_py_utils/clp_config.py
Added container path/constants (CONTAINER_*, CLP_DEFAULT_*), DeploymentType enum, per-component transform_for_container methods, container-aware storage/ingestion transforms, updated serialization (dump_to_primitive_dict) and path handling, and exposed CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.
Startup/Shutdown scripts (Compose-based)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py, components/clp-package-utils/clp_package_utils/scripts/stop_clp.py
Replaced per-component orchestration with DockerComposeController flows; unified validation and directory prep; simplified CLI; stop_clp.py now uses main() and delegates to controller stop().
Script imports: default config constant relocation
components/clp-package-utils/clp_package_utils/scripts/*.py, components/clp-package-utils/clp_package_utils/scripts/native/*.py
CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH import sites moved from clp_package_utils.general to clp_py_utils.clp_config; usage unchanged across affected scripts (archive_manager, compress, dataset_manager, decompress, search and native variants).
Docker Compose definitions
tools/deployment/package/docker-compose.base.yaml, tools/deployment/package/docker-compose.yaml
New base and overlay Compose files defining services, anchors for defaults/healthchecks, networks, volumes, secrets, dependencies, health checks, and service/one-time init job definitions for CLP packaging.
Packaging tasks
taskfile.yaml
Added tools/deployment/package/**/* to package sources and rsync steps to include deployment assets in the package output.
Documentation updates
docs/src/dev-docs/design-docker-compose.md, docs/src/dev-docs/building-package.md, docs/src/dev-docs/index.md, docs/src/user-docs/guides-multi-node.md, docs/src/user-docs/guides-overview.md
Added Docker Compose design doc and toctree entry; updated build docs to mention Compose; replaced multi-node guide content with a warning about temporary removal; minor formatting fix.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant StartScript as start_clp.py
  participant Config as CLPConfig
  participant Controller as DockerComposeController
  participant Utils as general.py
  participant Compose as "docker compose"

  User->>StartScript: run --config
  StartScript->>Config: load_config_file(...)
  StartScript->>Config: transform_for_container()
  StartScript->>Controller: deploy(clp_config)
  Controller->>Utils: check_docker_dependencies(should_compose_run=true)
  Controller->>Controller: _provision() -> compute env, write .env, prep dirs
  Controller->>Compose: up -d (uses .env + compose files)
  Compose-->>Controller: services started / healthy
  Controller-->>StartScript: success
  StartScript-->>User: exit 0
sequenceDiagram
  autonumber
  actor User
  participant StopScript as stop_clp.py
  participant Controller as DockerComposeController
  participant Utils as general.py
  participant Compose as "docker compose"

  User->>StopScript: run
  StopScript->>Controller: stop()
  Controller->>Utils: check_docker_dependencies(should_compose_run=false)
  Controller->>Compose: down
  Compose-->>Controller: stopped
  Controller-->>StopScript: success
  StopScript-->>User: exit 0
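The two flows above ultimately reduce to a pair of Compose invocations. A minimal sketch of the command construction (project name handling and file selection are assumptions based on the diagrams and the controller snippets reviewed below):

```python
def build_up_cmd(project_name: str, base_only: bool = False) -> list:
    # Mirrors the deploy path: select the base manifest for a BASE
    # deployment, then start all services detached.
    cmd = ["docker", "compose", "--project-name", project_name]
    if base_only:
        cmd += ["--file", "docker-compose.base.yaml"]
    return cmd + ["up", "--detach"]


def build_down_cmd(project_name: str) -> list:
    # Mirrors the stop path.
    return ["docker", "compose", "--project-name", project_name, "down"]
```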

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title succinctly and accurately describes migrating the package orchestration to Docker Compose, which is the core change in the pull request.
  • Docstring Coverage (✅ Passed): No functions found in the changes; the docstring coverage check was skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cba044a and f41e22a.

📒 Files selected for processing (1)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose.base.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: lint-check (macos-15)


@junhaoliao junhaoliao changed the title feat(deployment): Migrate package orchestration to Docker Compose (resolves #1177). DO NOT MERGE (pending many other dependency PRs) - feat(deployment): Migrate package orchestration to Docker Compose (resolves #1177). Aug 8, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c91bf3 and 2cb1807.

📒 Files selected for processing (4)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • docs/src/dev-docs/design-docker-compose.md (1 hunks)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
  • tools/deployment/package/docker-compose.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-09-25T05:13:13.282Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.282Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
  • tools/deployment/package/docker-compose.base.yaml
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
🧬 Code graph analysis (1)
components/clp-package-utils/clp_package_utils/controller.py (3)
components/clp-py-utils/clp_py_utils/clp_config.py (9)
  • CLPConfig (838-1048)
  • DeploymentType (66-68)
  • QueryEngine (76-79)
  • StorageEngine (71-73)
  • StorageType (82-84)
  • get_clp_connection_params_and_type (203-224)
  • get_directory (728-729)
  • get_directory (750-751)
  • get_deployment_type (980-984)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (3)
  • get_archives_table_name (299-300)
  • get_datasets_table_name (307-308)
  • get_files_table_name (311-312)
components/clp-package-utils/clp_package_utils/general.py (9)
  • check_docker_dependencies (151-175)
  • dump_shared_container_config (359-369)
  • generate_docker_compose_container_config (310-320)
  • get_clp_home (107-123)
  • validate_db_config (497-507)
  • validate_queue_config (510-513)
  • validate_redis_config (516-526)
  • validate_results_cache_config (540-554)
  • validate_webui_config (569-578)
🪛 markdownlint-cli2 (0.18.1)
docs/src/dev-docs/design-docker-compose.md

21-21: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


23-23: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


136-136: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


136-136: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


136-136: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


149-149: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


149-149: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


149-149: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


156-156: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


161-161: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


166-166: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🔇 Additional comments (25)
components/clp-package-utils/clp_package_utils/controller.py (7)

514-521: Ensure at least one worker on 1‑CPU hosts

cpu_count() // 2 can be 0.

-        return multiprocessing.cpu_count() // 2
+        return max(1, multiprocessing.cpu_count() // 2)
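The edge case is easy to verify directly; a sketch of the guarded sizing logic with an injectable count for testing:

```python
import multiprocessing


def default_num_workers(cpu_count: int = None) -> int:
    # Half the available CPUs, but never zero: on a 1-CPU host,
    # cpu_count() // 2 would otherwise yield 0 workers.
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return max(1, cpu_count // 2)
```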

89-96: Fix abstract method contract for _provision

Base declares a return value but subclass returns None and caller ignores it. Align to None.

 class BaseController(ABC):
@@
-    @abstractmethod
-    def _provision(self) -> EnvVarsDict:
+    @abstractmethod
+    def _provision(self) -> None:
         """
         Prepares all components with orchestrator-specific logic.
 
-        :return: Dictionary of environment variables to be used by the orchestrator.
+        :return: None
         """
         pass
-    def _provision(self):
+    def _provision(self) -> None:
@@
-        :return: Dictionary of all environment variables.
+        :return: None

Also applies to: 522-531


124-127: Make DB image configurable and loosen the default pin

Hard-coding ties you to stale tags and blocks environment overrides.

-            "CLP_DB_IMAGE": (
-                "mysql:8.0.23" if "mysql" == self.clp_config.database.type else "mariadb:10-jammy"
-            ),
+            "CLP_DB_IMAGE": os.getenv(
+                "CLP_DB_IMAGE",
+                "mysql:8" if "mysql" == self.clp_config.database.type else "mariadb:10",
+            ),

217-223: Apply mode on existing log file

Path.touch(mode=...) only sets mode at creation. Ensure mode is applied when file exists.

         logs_file = self.clp_config.logs_directory / f"{component_name}.log"
-        logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.chmod(LOGS_FILE_MODE)

234-240: Apply mode on existing log file (query scheduler)

Mirror the chmod fix here.

         logs_file = self.clp_config.logs_directory / f"{component_name}.log"
-        logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.chmod(LOGS_FILE_MODE)
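The behaviour behind both fixes can be demonstrated directly: Path.touch's mode argument only applies when the file is created (and is then masked by the umask), so an explicit chmod is needed for pre-existing files. LOGS_FILE_MODE below is an assumed value for illustration:

```python
import tempfile
from pathlib import Path

LOGS_FILE_MODE = 0o644  # assumed value, for illustration only

with tempfile.TemporaryDirectory() as tmp:
    logs_file = Path(tmp) / "component.log"
    logs_file.touch(mode=0o600)  # created with 0o600 (masked by the umask)
    # The mode argument is ignored here because the file already exists:
    logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
    mode_after_touch = logs_file.stat().st_mode & 0o777
    # An explicit chmod applies the mode regardless of prior state:
    logs_file.chmod(LOGS_FILE_MODE)
    mode_after_chmod = logs_file.stat().st_mode & 0o777
```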

618-620: Use portable chown flag

Prefer -R over GNU‑specific --recursive.

-    chown_cmd = ["chown", "--recursive", f"{user_id}:{group_id}", str(path)]
+    chown_cmd = ["chown", "-R", f"{user_id}:{group_id}", str(path)]

622-629: Hostname resolution: add IPv6 support and context on failure

socket.gethostbyname is IPv4‑only and exceptions lack context.

-def _get_ip_from_hostname(hostname: str) -> str:
+def _get_ip_from_hostname(hostname: str) -> str:
@@
-    return socket.gethostbyname(hostname)
+    try:
+        infos = socket.getaddrinfo(hostname, None)
+        # Prefer IPv4 if available; otherwise take the first result
+        for family, _, _, _, sockaddr in infos:
+            if family == socket.AF_INET:
+                return sockaddr[0]
+        return infos[0][4][0]
+    except socket.gaierror as ex:
+        raise ValueError(f"Failed to resolve hostname '{hostname}': {ex}") from ex
docs/src/dev-docs/design-docker-compose.md (5)

52-66: Align Mermaid node IDs with service keys

Use hyphenated identifiers to match compose keys.

-  results_cache["results-cache (MongoDB)"]
-  compression_scheduler["compression-scheduler"]
-  query_scheduler["query-scheduler"]
-  compression_worker["compression-worker"]
-  query_worker["query-worker"]
+  results-cache["results-cache (MongoDB)"]
+  compression-scheduler["compression-scheduler"]
+  query-scheduler["query-scheduler"]
+  compression-worker["compression-worker"]
+  query-worker["query-worker"]
@@
-    results_cache
+    results-cache
@@
-    compression_scheduler
-    query_scheduler
+    compression-scheduler
+    query-scheduler
@@
-    compression_worker
-    query_worker
+    compression-worker
+    query-worker

Also applies to: 87-93, 99-109


155-168: Surround fenced blocks in lists with blank lines

Improves rendering and satisfies markdownlint MD031.

-1. Check service status:
-   ```bash
+1. Check service status:
+
+   ```bash
    docker compose ps
    ```
 
-2. View service logs:
-   ```bash
+2. View service logs:
+
+   ```bash
    docker compose logs <service-name>
    ```
 
-3. Validate configuration:
-   ```bash
+3. Validate configuration:
+
+   ```bash
    docker compose config
    ```

41-47: Script name and compose wait flag

Use start_clp.py and consider --wait.

-The `start-clp.py` script performs the following steps:
+The `start_clp.py` script performs the following steps:
@@
-4. **Docker Compose Execution**: `docker compose up -d` is executed to start all services.
+4. **Docker Compose Execution**: `docker compose up -d --wait` is executed to start all services.

21-24: Remove trailing spaces and add a security note for .env

Trim whitespace and warn about secrets in .env.

-1. **Provisioning Methods**: Each CLP component has a dedicated provisioning method in the 
+1. **Provisioning Methods**: Each CLP component has a dedicated provisioning method in the
    controller: `provision_<component-name>()`.
-2. **Environment Generation**: The controller generates a `.env` file with all necessary environment 
+2. **Environment Generation**: The controller generates a `.env` file with all necessary environment
    variables for Docker Compose.
+   - Note: `.env` may contain secrets. Do not commit it; add it to `.gitignore` and store securely.

30-38: Add prerequisites section

Call out required Docker/Compose versions.

 ## Docker Compose File
@@
 The `docker-compose.yaml` file defines all services with:
+
+Prerequisites
+
+- Docker Engine 24+ (or a compatible release)
+- Docker Compose v2 (invoked as `docker compose`)
tools/deployment/package/docker-compose.base.yaml (10)

4-7: Do not apply a global user override to all services

Vendor images (MySQL/RabbitMQ/Redis/Mongo) often break with arbitrary UIDs. Drop from defaults and set per app service only.

 x-service-defaults: &service_defaults
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

41-48: DB healthcheck leaks password; read from secret file

Avoid exposing secrets in argv/inspect.

-      test: [
-        "CMD",
-        "mysqladmin", "ping",
-        "--silent",
-        "-h", "127.0.0.1",
-        "-u", "${CLP_DB_USER}",
-        "--password=${CLP_DB_PASS}"
-      ]
+      test: >
+        bash -lc 'mysqladmin ping --silent -h 127.0.0.1 -u "${CLP_DB_USER}" -p"$(cat /run/secrets/CLP_DB_PASS_FILE)"'

21-39: Give stateful services enough time to shut down

Increase stop_grace_period to ≥30s for DB.

   database:
     <<: *service_defaults
+    stop_grace_period: "30s"

118-138: Extend stop_grace_period for MongoDB

Same rationale as DB.

   results-cache:
     <<: *service_defaults
+    stop_grace_period: "30s"

74-75: Bump RabbitMQ to a supported tag

3.9.8 is EOL with known CVEs.

-    image: "rabbitmq:3.9.8"
+    image: "rabbitmq:3.13.8"

124-125: Fix results-cache default host port

6379 (Redis) collides; MongoDB is 27017.

-      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-6379}:27017"
+      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-27017}:27017"

170-174: Compose doesn’t expand ~; use $HOME

Avoid mounting literal "~/.aws".

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"

206-214: Avoid mapping the same host dir for staged and final archives

Introduce CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST.

-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
+      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
+      - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"

171-173: Avoid backslash continuations in quoted scalars

Keep each volume entry on one line for parser compatibility.

-      - "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}\
-:/var/log/compression_scheduler.log"
+      - "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}:/var/log/compression_scheduler.log"
@@
-      - "${CLP_COMPRESSION_WORKER_LOGS_DIR_HOST:-./var/log/compression_worker}:\
-/var/log/compression_worker"
+      - "${CLP_COMPRESSION_WORKER_LOGS_DIR_HOST:-./var/log/compression_worker}:/var/log/compression_worker"
@@
-      - "./var/www/webui/server/dist/settings.json\
-:/opt/clp/var/www/webui/server/dist/settings.json:ro"
+      - "./var/www/webui/server/dist/settings.json:/opt/clp/var/www/webui/server/dist/settings.json:ro"

Also applies to: 203-204, 245-246


174-175: Consider scoping the “/” bind mount

Mounting the host root is high-privilege; gate behind a profile or narrow to needed dirs.

Also applies to: 213-214

tools/deployment/package/docker-compose.yaml (3)

6-9: Do not apply a global user override to all services

As in base file, remove from defaults and set on app containers only.

 x-service-defaults: &service_defaults
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

34-35: Avoid backslash continuation in volume entry

Keep as single-line scalar.

-      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:\
-/var/log/query_scheduler.log"
+      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:/var/log/query_scheduler.log"

76-82: Fix staged/final streams mounts and tilde expansion

Use $HOME and a dedicated staged streams env.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"
@@
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"
+      - "${CLP_STAGED_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
+      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"

Comment on lines +479 to +483
cmd = ["docker", "compose", "--project-name", self._project_name]
if deployment_type == DeploymentType.BASE:
cmd += ["--file", "docker-compose.base.yaml"]
cmd += ["up", "--detach"]
try:
🧹 Nitpick

Optionally wait for health before returning

docker compose up -d --wait improves determinism with healthchecks.

-        cmd += ["up", "--detach"]
+        cmd += ["up", "--detach", "--wait"]

Please confirm your target Docker Compose v2 baseline supports --wait.

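One way to gate the flag is to parse `docker compose version` output and only append `--wait` when the release is new enough. The output shape and the minimum version below are assumptions to confirm against the Compose release notes:

```python
import re


def parse_compose_version(version_output: str):
    # Extract (major, minor, patch) from `docker compose version` output,
    # e.g. "Docker Compose version v2.20.2".
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", version_output)
    return None if match is None else tuple(int(p) for p in match.groups())


def should_add_wait_flag(version_output: str) -> bool:
    # Append `--wait` only for sufficiently new Compose v2 releases; omit it
    # (the current behaviour) when the version cannot be determined. The
    # minimum version here is a placeholder.
    version = parse_compose_version(version_output)
    return version is not None and version >= (2, 1, 1)
```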

Comment on lines 535 to 563
env_dict = {
"CLP_PACKAGE_STORAGE_ENGINE": self.clp_config.package.storage_engine,
# User and group IDs
"CLP_UID_GID": DEFAULT_UID_GID,
"CLP_SERVICE_CONTAINER_UID_GID": (
SERVICE_CONTAINER_UID_GID if os.geteuid() == 0 else DEFAULT_UID_GID
),
# Package container
"CLP_PACKAGE_CONTAINER": self.clp_config.execution_container,
# Global paths
"CLP_DATA_DIR_HOST": str(self.clp_config.data_directory),
"CLP_LOGS_DIR_HOST": str(self.clp_config.logs_directory),
"CLP_ARCHIVE_OUTPUT_DIR_HOST": str(self.clp_config.archive_output.get_directory()),
"CLP_STREAM_OUTPUT_DIR_HOST": str(self.clp_config.stream_output.get_directory()),
# AWS credentials
"CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
"CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
**self._set_up_env_for_database(),
**self._set_up_env_for_queue(),
**self._set_up_env_for_redis(),
**self._set_up_env_for_results_cache(),
**self._set_up_env_for_compression_scheduler(),
**self._set_up_env_for_query_scheduler(),
**self._set_up_env_for_compression_worker(num_workers),
**self._set_up_env_for_query_worker(num_workers),
**self._set_up_env_for_reducer(num_workers),
**self._set_up_env_for_webui(container_clp_config),
**self._set_up_env_for_garbage_collector(),
}
⚠️ Potential issue

Add staged directory envs to prevent mount collisions

Compose mounts reuse CLP_ARCHIVE_OUTPUT_DIR_HOST/CLP_STREAM_OUTPUT_DIR_HOST for both staged and final paths, collapsing directories. Provide distinct envs.

         env_dict = {
@@
-            "CLP_ARCHIVE_OUTPUT_DIR_HOST": str(self.clp_config.archive_output.get_directory()),
-            "CLP_STREAM_OUTPUT_DIR_HOST": str(self.clp_config.stream_output.get_directory()),
+            "CLP_ARCHIVE_OUTPUT_DIR_HOST": str(self.clp_config.archive_output.get_directory()),
+            # New: dedicated staged dirs to avoid overlap with final paths
+            "CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST": str(
+                (self.clp_config.archive_output.get_directory().parent / "staged-archives").resolve()
+            ),
+            "CLP_STREAM_OUTPUT_DIR_HOST": str(self.clp_config.stream_output.get_directory()),
+            "CLP_STAGED_STREAM_OUTPUT_DIR_HOST": str(
+                (self.clp_config.stream_output.get_directory().parent / "staged-streams").resolve()
+            ),

Follow up: update compose files to use the new variables (see related comments).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
env_dict = {
"CLP_PACKAGE_STORAGE_ENGINE": self.clp_config.package.storage_engine,
# User and group IDs
"CLP_UID_GID": DEFAULT_UID_GID,
"CLP_SERVICE_CONTAINER_UID_GID": (
SERVICE_CONTAINER_UID_GID if os.geteuid() == 0 else DEFAULT_UID_GID
),
# Package container
"CLP_PACKAGE_CONTAINER": self.clp_config.execution_container,
# Global paths
"CLP_DATA_DIR_HOST": str(self.clp_config.data_directory),
"CLP_LOGS_DIR_HOST": str(self.clp_config.logs_directory),
"CLP_ARCHIVE_OUTPUT_DIR_HOST": str(self.clp_config.archive_output.get_directory()),
"CLP_STREAM_OUTPUT_DIR_HOST": str(self.clp_config.stream_output.get_directory()),
# AWS credentials
"CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
"CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
**self._set_up_env_for_database(),
**self._set_up_env_for_queue(),
**self._set_up_env_for_redis(),
**self._set_up_env_for_results_cache(),
**self._set_up_env_for_compression_scheduler(),
**self._set_up_env_for_query_scheduler(),
**self._set_up_env_for_compression_worker(num_workers),
**self._set_up_env_for_query_worker(num_workers),
**self._set_up_env_for_reducer(num_workers),
**self._set_up_env_for_webui(container_clp_config),
**self._set_up_env_for_garbage_collector(),
}
env_dict = {
"CLP_PACKAGE_STORAGE_ENGINE": self.clp_config.package.storage_engine,
# User and group IDs
"CLP_UID_GID": DEFAULT_UID_GID,
"CLP_SERVICE_CONTAINER_UID_GID": (
SERVICE_CONTAINER_UID_GID if os.geteuid() == 0 else DEFAULT_UID_GID
),
# Package container
"CLP_PACKAGE_CONTAINER": self.clp_config.execution_container,
# Global paths
"CLP_DATA_DIR_HOST": str(self.clp_config.data_directory),
"CLP_LOGS_DIR_HOST": str(self.clp_config.logs_directory),
"CLP_ARCHIVE_OUTPUT_DIR_HOST": str(self.clp_config.archive_output.get_directory()),
# New: dedicated staged dirs to avoid overlap with final paths
"CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST": str(
(self.clp_config.archive_output.get_directory().parent / "staged-archives").resolve()
),
"CLP_STREAM_OUTPUT_DIR_HOST": str(self.clp_config.stream_output.get_directory()),
"CLP_STAGED_STREAM_OUTPUT_DIR_HOST": str(
(self.clp_config.stream_output.get_directory().parent / "staged-streams").resolve()
),
# AWS credentials
"CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
"CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
**self._set_up_env_for_database(),
**self._set_up_env_for_queue(),
**self._set_up_env_for_redis(),
**self._set_up_env_for_results_cache(),
**self._set_up_env_for_compression_scheduler(),
**self._set_up_env_for_query_scheduler(),
**self._set_up_env_for_compression_worker(num_workers),
**self._set_up_env_for_query_worker(num_workers),
**self._set_up_env_for_reducer(num_workers),
**self._set_up_env_for_webui(container_clp_config),
**self._set_up_env_for_garbage_collector(),
}
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines 535
to 563, the environment dict currently exposes CLP_ARCHIVE_OUTPUT_DIR_HOST and
CLP_STREAM_OUTPUT_DIR_HOST but not separate staged-path variables, causing
compose mounts to collapse staged and final directories; add two new env entries
(e.g., CLP_ARCHIVE_STAGED_DIR_HOST and CLP_STREAM_STAGED_DIR_HOST) that point to
the staged subdirectories (derived from the existing archive_output and
stream_output configs), include them in env_dict alongside the existing vars,
and then update related compose files to consume these new variables for staged
mounts to prevent mount collisions.
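The staged-path derivation used in the suggested env entries can be sketched as a small helper (the `staged_sibling` name is hypothetical; the parent/sibling layout follows the suggestion above):

```python
from pathlib import Path

def staged_sibling(final_dir: Path, staged_name: str) -> Path:
    """Return a staged directory that is a sibling of the final output
    directory, so staged and final mounts never collapse onto one path."""
    return (final_dir.parent / staged_name).resolve()

archives_dir = Path("var/data/archives")
staged_dir = staged_sibling(archives_dir, "staged-archives")
```

Keeping the staged directory as a sibling (rather than a subdirectory) of the final path avoids one bind mount shadowing the other in Compose.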

Comment on lines +549 to +551
# AWS credentials
"CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
"CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
Contributor

⚠️ Potential issue

Lock down .env permissions and avoid writing empty AWS creds

Secrets are written world-readable by default and blank keys get emitted. Create .env with 0600 and only include AWS creds if set.

-            # AWS credentials
-            "CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
-            "CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
+            # (conditionally added below) AWS credentials
@@
-        with open(f"{self._clp_home}/.env", "w") as env_file:
-            for key, value in env_dict.items():
-                env_file.write(f"{key}={value}\n")
+        # Conditionally include AWS creds
+        aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")
+        aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+        if aws_access_key and aws_secret_key:
+            env_dict["CLP_AWS_ACCESS_KEY_ID"] = aws_access_key
+            env_dict["CLP_AWS_SECRET_ACCESS_KEY"] = aws_secret_key
+
+        # Write .env with strict permissions (0600)
+        env_path = f"{self._clp_home}/.env"
+        fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        with os.fdopen(fd, "w") as env_file:
+            for key, value in env_dict.items():
+                env_file.write(f"{key}={value}\n")

Also applies to: 568-571

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines
549-551 and 568-571, the code writes a .env file that can be world-readable and
emits blank AWS credential entries; change the .env creation to set file
permissions to 0600 and only write the CLP_AWS_ACCESS_KEY_ID and
CLP_AWS_SECRET_ACCESS_KEY lines into the .env if their environment values are
non-empty (skip emitting those keys when empty) so secrets are not exposed or
written as blank values.
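A minimal, standalone sketch of the suggested fix, assuming a plain `KEY=value` .env format (the `write_env_file` helper name is hypothetical):

```python
import os
import tempfile

def write_env_file(env_dict: dict, env_path: str) -> None:
    """Create the .env with 0600 permissions and skip empty values so
    blank secrets are never emitted."""
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as env_file:
        for key, value in env_dict.items():
            if value:
                env_file.write(f"{key}={value}\n")

env_path = os.path.join(tempfile.mkdtemp(), ".env")
write_env_file(
    {"CLP_DB_USER": "clp", "CLP_AWS_ACCESS_KEY_ID": ""},  # blank key is skipped
    env_path,
)
mode = os.stat(env_path).st_mode & 0o777
contents = open(env_path).read()
```

Using `os.open` with an explicit mode sets the permissions atomically at creation time, rather than creating a world-readable file and chmod-ing it afterwards.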

Comment on lines 91 to 117
redis:
<<: *service_defaults
container_name: "redis"
image: "redis:7.2.4"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
ports:
- "${CLP_REDIS_HOST:-127.0.0.1}:${CLP_REDIS_PORT:-6379}:6379"
volumes:
- "${CLP_REDIS_CONF_FILE_HOST:-./etc/redis/redis.conf}:/usr/local/etc/redis/redis.conf:ro"
- "${CLP_REDIS_DATA_DIR_HOST:-./var/data/redis}:/data"
- "${CLP_REDIS_LOGS_DIR_HOST:-./var/log/redis}:/var/log/redis"
healthcheck:
<<: *healthcheck_defaults
test: [
"CMD",
"redis-cli",
"-h", "127.0.0.1",
"-p", "6379",
"-a", "${CLP_REDIS_PASS}",
"PING"
]
command: [
"redis-server",
"/usr/local/etc/redis/redis.conf",
"--requirepass", "${CLP_REDIS_PASS}"
]

Contributor

🧹 Nitpick

Redis: move password to a secret for healthcheck/command

Avoid exposing CLP_REDIS_PASS via env/argv.

 secrets:
   CLP_DB_PASS_FILE:
     environment: "CLP_DB_PASS"
+  CLP_REDIS_PASS_FILE:
+    environment: "CLP_REDIS_PASS"
@@
   redis:
@@
-    healthcheck:
-      <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "redis-cli",
-        "-h", "127.0.0.1",
-        "-p", "6379",
-        "-a", "${CLP_REDIS_PASS}",
-        "PING"
-      ]
+    secrets: ["CLP_REDIS_PASS_FILE"]
+    healthcheck:
+      <<: *healthcheck_defaults
+      test: >
+        bash -lc 'redis-cli -h 127.0.0.1 -p 6379 -a "$(cat /run/secrets/CLP_REDIS_PASS_FILE)" PING'
@@
-    command: [
-      "redis-server",
-      "/usr/local/etc/redis/redis.conf",
-      "--requirepass", "${CLP_REDIS_PASS}"
-    ]
+    command: >
+      bash -lc 'exec redis-server /usr/local/etc/redis/redis.conf --requirepass "$(cat /run/secrets/CLP_REDIS_PASS_FILE)"'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
redis:
<<: *service_defaults
container_name: "redis"
image: "redis:7.2.4"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
ports:
- "${CLP_REDIS_HOST:-127.0.0.1}:${CLP_REDIS_PORT:-6379}:6379"
volumes:
- "${CLP_REDIS_CONF_FILE_HOST:-./etc/redis/redis.conf}:/usr/local/etc/redis/redis.conf:ro"
- "${CLP_REDIS_DATA_DIR_HOST:-./var/data/redis}:/data"
- "${CLP_REDIS_LOGS_DIR_HOST:-./var/log/redis}:/var/log/redis"
healthcheck:
<<: *healthcheck_defaults
test: [
"CMD",
"redis-cli",
"-h", "127.0.0.1",
"-p", "6379",
"-a", "${CLP_REDIS_PASS}",
"PING"
]
command: [
"redis-server",
"/usr/local/etc/redis/redis.conf",
"--requirepass", "${CLP_REDIS_PASS}"
]
redis:
<<: *service_defaults
container_name: "redis"
image: "redis:7.2.4"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
ports:
- "${CLP_REDIS_HOST:-127.0.0.1}:${CLP_REDIS_PORT:-6379}:6379"
volumes:
- "${CLP_REDIS_CONF_FILE_HOST:-./etc/redis/redis.conf}:/usr/local/etc/redis/redis.conf:ro"
- "${CLP_REDIS_DATA_DIR_HOST:-./var/data/redis}:/data"
- "${CLP_REDIS_LOGS_DIR_HOST:-./var/log/redis}:/var/log/redis"
secrets: ["CLP_REDIS_PASS_FILE"]
healthcheck:
<<: *healthcheck_defaults
test: >
bash -lc 'redis-cli -h 127.0.0.1 -p 6379 -a "$(cat /run/secrets/CLP_REDIS_PASS_FILE)" PING'
command: >
bash -lc 'exec redis-server /usr/local/etc/redis/redis.conf --requirepass "$(cat /run/secrets/CLP_REDIS_PASS_FILE)"'
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 91 to 117, the
service currently injects CLP_REDIS_PASS directly into env/argv for healthcheck
and command; change this to use a Docker secret (e.g., redis_password) and
reference the secret from the service so the plain password is not exposed. Add
a secrets definition and attach it to the redis service, ensure the secret is
mounted at /run/secrets/redis_password, then update the healthcheck to run a
shell command that reads the password from that file (e.g., sh -c 'redis-cli -h
127.0.0.1 -p 6379 -a "$(cat /run/secrets/redis_password)" PING') and update the
command to set the requirepass by reading the secret (e.g., sh -c 'redis-server
/usr/local/etc/redis/redis.conf --requirepass "$(cat
/run/secrets/redis_password)"'), removing direct use of CLP_REDIS_PASS from
ports/command/healthcheck.

Comment on lines 18 to 28
services:
query-scheduler:
<<: *service_defaults
container_name: "query_scheduler"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_LOGGING_LEVEL: "${CLP_QUERY_SCHEDULER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log"
Contributor

🧹 Nitpick

Health‑gated startup for query scheduler

Optionally add --wait to the docs/flow and ensure healthcheck is effective; the file already defines a healthcheck.

Do you want me to add --wait to your start path (controller.deploy) to block until services are healthy?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/deployment/package/docker-compose.base.yaml (1)

20-289: Add restart policies for long-running services.

Improves resilience across reboots/transient failures. Apply restart: unless-stopped to database, queue, redis, results-cache, compression-scheduler, compression-worker, webui, garbage-collector. Keep one-shot initializers without restart.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb1807 and e657feb.

📒 Files selected for processing (1)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose.base.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-static-linked-bins
🔇 Additional comments (11)
tools/deployment/package/docker-compose.base.yaml (11)

4-7: Remove global user override from shared defaults.

This forces vendor images (MySQL, RabbitMQ, Redis, MongoDB) to run as an arbitrary UID/GID and can break their init/startup. Set users only on CLP-owned services that require it.

 x-service-defaults: &service_defaults
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

22-39: MySQL: allow clean shutdown and avoid overriding container user.

  • Give MySQL ≥30s to stop.
  • Don’t set a custom user for the vendor image; let the image manage permissions and init.
   database:
     <<: *service_defaults
     container_name: "database"
     image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
+    stop_grace_period: "30s"

41-48: Keep the DB password out of argv; read from the secret file.

Avoid leaking via docker inspect/process args.

-      test: [
-        "CMD",
-        "mysqladmin", "ping",
-        "--silent",
-        "-h", "127.0.0.1",
-        "-u", "${CLP_DB_USER}",
-        "--password=${CLP_DB_PASS}"
-      ]
+      test: >
+        bash -lc 'mysqladmin ping --silent -h 127.0.0.1 -u "${CLP_DB_USER}" -p"$(cat /run/secrets/CLP_DB_PASS_FILE)"'

16-19: Add a secret for Redis password (to use in healthcheck/command).

Don’t pass CLP_REDIS_PASS in env/argv.

 secrets:
   CLP_DB_PASS_FILE:
     environment: "CLP_DB_PASS"
+  CLP_REDIS_PASS_FILE:
+    environment: "CLP_REDIS_PASS"

71-76: RabbitMQ: upgrade tag and drop user override.

  • 3.9.8 is EoL; bump to a maintained 3.x (e.g., 3.13.8).
  • Don’t override the user for the vendor image.
   queue:
     <<: *service_defaults
     container_name: "queue"
-    image: "rabbitmq:3.9.8"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
+    image: "rabbitmq:3.13.8"

91-117: Redis: drop user override and move password to a secret for healthcheck/command.

   redis:
     <<: *service_defaults
     container_name: "redis"
     image: "redis:7.2.4"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
+    secrets: ["CLP_REDIS_PASS_FILE"]
@@
-    healthcheck:
-      <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "redis-cli",
-        "-h", "127.0.0.1",
-        "-p", "6379",
-        "-a", "${CLP_REDIS_PASS}",
-        "PING"
-      ]
+    healthcheck:
+      <<: *healthcheck_defaults
+      test: >
+        bash -lc 'redis-cli -h 127.0.0.1 -p 6379 -a "$(cat /run/secrets/CLP_REDIS_PASS_FILE)" PING'
@@
-    command: [
-      "redis-server",
-      "/usr/local/etc/redis/redis.conf",
-      "--requirepass", "${CLP_REDIS_PASS}"
-    ]
+    command: >
+      bash -lc 'exec redis-server /usr/local/etc/redis/redis.conf --requirepass "$(cat /run/secrets/CLP_REDIS_PASS_FILE)"'

118-138: MongoDB: fix port mapping default, add stop_grace_period, and drop user override.

  • Default host port collides with Redis (6379); use 27017.
  • Give MongoDB ≥30s to stop.
  • Don’t override user for the vendor image.
   results-cache:
     <<: *service_defaults
     container_name: "results_cache"
     image: "mongo:7.0.1"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
+    stop_grace_period: "30s"
     ports:
-      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-6379}:27017"
+      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-27017}:27017"

170-175: Use $HOME instead of ~ in default host paths; Compose doesn’t expand ~.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"

206-214: Separate staged/final archive dirs and avoid ~ expansion.

  • Introduce CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST for staged path.
  • Use $HOME for AWS config path.
       - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"

242-247: Avoid ~ in webui AWS config mount; use $HOME.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"

9-14: Verify Compose support for start_interval.

Some Compose/runtime versions ignore start_interval. Confirm your target Compose supports it; otherwise drop it or rely on start_period.

Comment on lines 156 to 169
compression-scheduler:
<<: *service_defaults
container_name: "compression_scheduler"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_SCHEDULER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
Contributor

🛠️ Refactor suggestion | 🟠 Major

Explicitly set user for CLP-owned compression-scheduler after removing the global default.

Scheduler should run as CLP_UID_GID (host user), per our runtime model.

   compression-scheduler:
     <<: *service_defaults
     container_name: "compression_scheduler"
     image: "${CLP_PACKAGE_CONTAINER}"
+    user: "${CLP_UID_GID:-1000:1000}"

Based on learnings

📝 Committable suggestion


Suggested change
compression-scheduler:
<<: *service_defaults
container_name: "compression_scheduler"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_SCHEDULER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
compression-scheduler:
<<: *service_defaults
container_name: "compression_scheduler"
image: "${CLP_PACKAGE_CONTAINER}"
user: "${CLP_UID_GID:-1000:1000}"
environment:
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_SCHEDULER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 156-169, the
compression-scheduler service no longer inherits a global default user and must
explicitly run as the host CLP user; add a user entry to the service like user:
"${CLP_UID_GID}" directly under the service definition (e.g., after image:) so
the container runs as the CLP host UID/GID at runtime and ensure the CLP_UID_GID
env var is provided in the environment or .env.

Comment on lines 189 to 205
compression-worker:
<<: *service_defaults
container_name: "compression_worker"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
AWS_ACCESS_KEY_ID: "${CLP_AWS_ACCESS_KEY_ID}"
AWS_SECRET_ACCESS_KEY: "${CLP_AWS_SECRET_ACCESS_KEY}"
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_CONFIG_PATH: "/etc/clp-config.yml"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_WORKER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/compression_worker"
CLP_WORKER_LOG_PATH: "/var/log/compression_worker/worker.log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
Contributor

🛠️ Refactor suggestion | 🟠 Major

Explicitly set user for compression-worker after removing the global default.

   compression-worker:
     <<: *service_defaults
     container_name: "compression_worker"
     image: "${CLP_PACKAGE_CONTAINER}"
+    user: "${CLP_UID_GID:-1000:1000}"

Based on learnings

📝 Committable suggestion


Suggested change
compression-worker:
<<: *service_defaults
container_name: "compression_worker"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
AWS_ACCESS_KEY_ID: "${CLP_AWS_ACCESS_KEY_ID}"
AWS_SECRET_ACCESS_KEY: "${CLP_AWS_SECRET_ACCESS_KEY}"
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_CONFIG_PATH: "/etc/clp-config.yml"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_WORKER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/compression_worker"
CLP_WORKER_LOG_PATH: "/var/log/compression_worker/worker.log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
compression-worker:
<<: *service_defaults
container_name: "compression_worker"
image: "${CLP_PACKAGE_CONTAINER}"
user: "${CLP_UID_GID:-1000:1000}"
environment:
AWS_ACCESS_KEY_ID: "${CLP_AWS_ACCESS_KEY_ID}"
AWS_SECRET_ACCESS_KEY: "${CLP_AWS_SECRET_ACCESS_KEY}"
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_CONFIG_PATH: "/etc/clp-config.yml"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_COMPRESSION_WORKER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/compression_worker"
CLP_WORKER_LOG_PATH: "/var/log/compression_worker/worker.log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_COMPRESSION_BACKEND_DB:-1}
volumes:
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 189 to 205,
the compression-worker service no longer inherits the removed global default
user, so add an explicit user key to the service (e.g. user: "1000:1000" or the
named service user used elsewhere) so the container does not run as root; ensure
the chosen UID:GID or username matches files/volume ownership (adjust
CLP_LOGS_DIR and mounted volumes permissions if needed) and add the user entry
directly under the compression-worker service definition.

Comment on lines 265 to 276
garbage-collector:
<<: *service_defaults
container_name: "garbage_collector"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_GC_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/garbage_collector"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
volumes:
Contributor

🛠️ Refactor suggestion | 🟠 Major

Explicitly set user for garbage-collector after removing the global default.

   garbage-collector:
     <<: *service_defaults
     container_name: "garbage_collector"
     image: "${CLP_PACKAGE_CONTAINER}"
+    user: "${CLP_UID_GID:-1000:1000}"

Based on learnings

📝 Committable suggestion


Suggested change
garbage-collector:
<<: *service_defaults
container_name: "garbage_collector"
image: "${CLP_PACKAGE_CONTAINER}"
environment:
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_GC_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/garbage_collector"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
volumes:
garbage-collector:
<<: *service_defaults
container_name: "garbage_collector"
image: "${CLP_PACKAGE_CONTAINER}"
user: "${CLP_UID_GID:-1000:1000}"
environment:
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_GC_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/garbage_collector"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
volumes:
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 265 to 276,
the garbage-collector service no longer inherits the global default user and
therefore needs an explicit user set; add a user entry for the garbage-collector
service (matching the non-root UID:GID used by other services or an env-backed
value like CLP_USER/CLP_GROUP) so the container runs as the intended non-root
user and update any related volume permissions if necessary.

"CMD",
"bash",
"-c",
"< /dev/tcp/query_scheduler/7000"
Member Author

@gibber9809 @kirkrodrigues

(For background: this specifies a health-check method for a service container. If the check fails three consecutive times, the service will be restarted.)

Here we open a TCP connection to the query scheduler server to check whether it's alive, similar to a telnet probe. Do you think this is the right method to check the query scheduler's liveness?

If so, I might need to update the query scheduler code a bit to handle empty requests gracefully. Currently, for every empty request, this is printed to the console:

future: <Task finished name='Task-16' coro=<handle_reducer_connection() done, defined at /opt/clp/lib/python3/site-packages/job_orchestration/scheduler/query/reducer_handler.py:104> exception=IncompleteReadError('0 bytes read on a total of 8 expected bytes')>
Traceback (most recent call last):
  File "/opt/clp/lib/python3/site-packages/job_orchestration/scheduler/query/reducer_handler.py", line 110, in handle_reducer_connection
    message_bytes = await _recv_msg_from_reducer(reader)
  File "/opt/clp/lib/python3/site-packages/job_orchestration/scheduler/query/reducer_handler.py", line 87, in _recv_msg_from_reducer
    msg_size_bytes = await reader.readexactly(8)
  File "/usr/lib/python3.10/asyncio/streams.py", line 706, in readexactly
    raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 8 expected bytes
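One way to make the scheduler tolerate such probes is to treat an immediate EOF as a benign no-op rather than an error. A self-contained sketch, assuming the 8-byte length-prefixed protocol implied by the traceback (the handler name, byte order, and probe simulation are illustrative, not the actual scheduler code):

```python
import asyncio

async def handle_probe_tolerant_connection(reader, writer):
    """Read one length-prefixed message; an immediate EOF (e.g. a TCP
    health-check probe that connects and disconnects) is treated as benign."""
    try:
        size_bytes = await reader.readexactly(8)
    except asyncio.IncompleteReadError as e:
        if not e.partial:  # 0 bytes read: a probe, not a protocol error
            writer.close()
            return None
        raise  # a truncated message is still an error
    msg = await reader.readexactly(int.from_bytes(size_bytes, "little"))
    writer.close()
    return msg

async def main():
    results = []

    async def handler(reader, writer):
        results.append(await handle_probe_tolerant_connection(reader, writer))

    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Simulate the health check: open a TCP connection, then close it.
    _, probe_writer = await asyncio.open_connection("127.0.0.1", port)
    probe_writer.close()
    await probe_writer.wait_closed()

    await asyncio.sleep(0.2)  # let the handler observe the EOF
    server.close()
    await server.wait_closed()
    return results

probe_results = asyncio.run(main())
```

Distinguishing on `e.partial` keeps real protocol errors (a partially received size prefix) loud while silencing only the zero-byte case produced by connect-and-close probes.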

# Conflicts:
#	components/clp-package-utils/clp_package_utils/scripts/start_clp.py
#	components/clp-py-utils/clp_py_utils/clp_config.py
#	docs/src/dev-docs/building-package.md
#	tools/docker-images/clp-package/build.sh
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (11)
taskfile.yaml (2)

55-55: Verify CI workflows and docs reflect the new default task.

The default task now depends on docker-images:package, which aligns with the Docker Compose orchestration migration. However, existing CI workflows and documentation may reference the old default behavior.

Based on past review comments, the following locations should be verified:

  • .github/workflows/clp-artifact-build.yaml (around line 540): ensure task invocations are correct
  • docs/src/dev-docs/building-package.md (around line 49): update examples to reflect new default

Run the following script to identify references that may need updating:

#!/bin/bash
# Find references to 'task package' or assumptions about default task behavior
rg -n -C3 -g '!**/node_modules/**' -g '!**/build/**' '\b(task\s+package\b|task\s+default\b)'

# Check CI workflows specifically
rg -n -C3 -g '.github/workflows/*.yaml' '\btask\b'

# Check documentation
rg -n -C3 -g 'docs/**/*.md' '\btask\b.*\bpackage\b'

172-175: Consider adding Docker Compose YAML validation.

The rsync correctly copies deployment artifacts into the package. However, validating the Docker Compose YAML before packaging would help catch configuration errors early.

Add a validation step before line 172 (after npm ci):

       cd "{{.OUTPUT_DIR}}/var/www/webui"
       PATH="{{.G_NODEJS_22_BIN_DIR}}":$PATH npm ci --omit=dev
+      # Validate Docker Compose configuration
+      - |-
+        if [ "${CLP_VALIDATE_COMPOSE:-0}" = "1" ]; then
+          docker compose -f "tools/deployment/package/docker-compose.yml" config -q
+        fi
       - >-
         rsync -a
         "tools/deployment/package/"
         "{{.OUTPUT_DIR}}"

This validation is gated by CLP_VALIDATE_COMPOSE=1 so CI runners without Docker can skip it.

components/clp-package-utils/clp_package_utils/general.py (6)

134-149: Fix error message to reference "Docker Compose" correctly.

The error message on line 148 mentions "docker-compose" but the code checks for Docker Compose v2 using the docker compose command. This inconsistency could confuse users.

Apply this diff to fix the error message:

-        raise EnvironmentError("docker-compose is not installed or not functioning properly.")
+        raise EnvironmentError("Docker Compose is not installed or not functioning properly.")

171-175: Critical: Restore idempotency for stop operations.

Lines 172-173 raise an error when Docker Compose is not running and should_compose_run is False. This breaks idempotency: calling stop_clp.py twice will fail on the second invocation. Stop operations should be safe to repeat.

Apply this diff to fix idempotency:

     is_running = is_docker_compose_running(project_name)
     if should_compose_run and not is_running:
         raise EnvironmentError("docker-compose is not running.")
-    if not should_compose_run and is_running:
-        raise EnvironmentError("docker-compose is already running.")
+    if not should_compose_run:
+        if is_running:
+            raise EnvironmentError("docker-compose is already running.")
+        # Already stopped; idempotent no-op
+        return

310-320: Fix docstring to accurately describe return value.

The docstring on line 315 states the function returns "The container config and the mounts" but the function only returns the transformed CLPConfig (line 320).

Apply this diff to fix the docstring:

 def generate_docker_compose_container_config(clp_config: CLPConfig) -> CLPConfig:
     """
-    Copies the given config and transforms mount paths and hosts for Docker Compose.
+    Copies the given config and transforms paths and hosts for containerized execution.
 
     :param clp_config:
-    :return: The container config and the mounts.
+    :return: The transformed container config.
     """

497-505: Add file type validation for database base_config.

Line 500 checks base_config.exists() but doesn't verify it's a file. A directory with the same name would pass this check but cause issues downstream.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )

516-524: Add file type validation for Redis base_config.

Line 519 checks base_config.exists() but doesn't verify it's a file. Consistent with the database config validation, this should also verify the path is a regular file.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )

540-548: Add file type validation for results cache base_config.

Line 543 checks base_config.exists() but doesn't verify it's a file. For consistency and robustness, add the same file type check as recommended for the other components.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )
components/clp-py-utils/clp_py_utils/clp_config.py (1)

685-685: Default directory "/" risks scanning entire filesystem.

FsIngestionConfig defaults directory to pathlib.Path("/"), which is dangerous if used directly on the host before container transformation. Consider defaulting to a safe sentinel (e.g., pathlib.Path("input-logs") or making it Optional[pathlib.Path] = None) and adding validation to require explicit user configuration, while still allowing transform_for_container() to override it to /mnt/logs.

 class FsIngestionConfig(FsStorage):
-    directory: pathlib.Path = pathlib.Path("/")
+    directory: pathlib.Path = pathlib.Path("input-logs")  # Safe default; container transform overrides

Based on past review comments.
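The guard this suggests can be sketched standalone with the stdlib; `validate_ingestion_directory` is an illustrative helper, not part of CLP's API:

```python
import pathlib


def validate_ingestion_directory(directory: pathlib.Path) -> pathlib.Path:
    """Refuse the filesystem root so ingestion must be configured explicitly."""
    if directory == pathlib.Path("/"):
        raise ValueError(
            "fs ingestion directory must be explicitly configured; "
            "refusing to scan the filesystem root"
        )
    return directory
```

In the actual model, the same check could live in a Pydantic validator so misconfiguration fails at load time rather than at scan time.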

docs/src/dev-docs/building-package.md (2)

89-90: Remove trailing space (MD009) and keep as a single wrapped paragraph.

Tidy formatting and avoid a mid‑sentence line break.

-The package includes a `docker-compose.yaml` file that can be used to deploy CLP using Docker Compose.
-If you want to manually deploy with Docker Compose instead of using the package scripts, see the 
-[Docker Compose design][docker-compose-design] for more information.
+The package includes a `docker-compose.yaml` file that can be used to deploy CLP using Docker Compose. If you want to manually deploy with Docker Compose instead of using the package scripts, see the [Docker Compose design][docker-compose-design] for more information.

88-91: Add a minimal “manual Compose” example for quick start.

Improves skimmability; shows exact commands without context switching.

 The package includes a `docker-compose.yaml` file that can be used to deploy CLP using Docker Compose. If you want to manually deploy with Docker Compose instead of using the package scripts, see the [Docker Compose design][docker-compose-design] for more information.
+
+Example:
+
+```bash
+# From the package root where the Compose file resides
+docker compose up -d
+# Tear down when done (preserves volumes; append -v to also remove volumes)
+docker compose down
+```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e657feb and c79259a.

📒 Files selected for processing (7)
  • components/clp-package-utils/clp_package_utils/general.py (6 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/decompress.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (18 hunks)
  • docs/src/dev-docs/building-package.md (1 hunks)
  • taskfile.yaml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T19:26:32.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.

Applied to files:

  • docs/src/dev-docs/building-package.md
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (14)
  • CLPConfig (906-1117)
  • transform_for_container (265-267)
  • transform_for_container (331-333)
  • transform_for_container (399-401)
  • transform_for_container (438-440)
  • transform_for_container (491-493)
  • transform_for_container (540-542)
  • transform_for_container (631-632)
  • transform_for_container (687-688)
  • transform_for_container (694-695)
  • transform_for_container (701-702)
  • transform_for_container (708-709)
  • transform_for_container (715-716)
  • transform_for_container (1097-1117)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
  • make_config_path_absolute (42-53)
🪛 markdownlint-cli2 (0.18.1)
docs/src/dev-docs/building-package.md

89-89: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (28)
components/clp-py-utils/clp_py_utils/clp_config.py (19)

4-4: LGTM: ClassVar import added appropriately.

The import supports the new DEFAULT_PORT ClassVar declarations throughout the file.


48-62: LGTM: Container path constants are well-defined.

The centralized container path constants follow a consistent pattern and align with the Docker Compose architecture.


72-75: LGTM: DeploymentType enum supports deployment variants.

The enum cleanly captures the BASE/FULL deployment distinction used by the Docker Compose controller.


152-156: LGTM: DEFAULT_PORT ClassVars improve maintainability.

Using ClassVar prevents these constants from being captured as instance attributes by Pydantic, and the pattern is consistently applied across all service classes.


265-268: Verify transform pattern consistency.

The transform logic correctly updates host/port for containerized execution. Ensure the pattern is applied uniformly across all service classes.


631-633: LGTM: No-op transform for S3 ingestion is appropriate.

S3 ingestion configuration requires no container-specific adjustments.


687-717: LGTM: Storage transform methods handle container paths correctly.

Each storage type appropriately transforms paths for containerized execution, mapping host directories to their container mount points.


929-934: LGTM: Directory defaults and private attributes are well-structured.

The private attributes for image ID and version paths correctly use PrivateAttr to exclude them from Pydantic model serialization.


951-952: LGTM: Private paths made absolute consistently.

The image ID and version file paths are resolved relative to clp_home, matching the pattern for other configuration paths.


1045-1052: LGTM: Container image resolution logic is sound.

The fallback from image.id to version-based tag provides flexibility for different build scenarios.


1057-1062: LGTM: Deployment type determination is correct.

The method correctly maps Presto to BASE deployment and other engines to FULL deployment.


1064-1072: LGTM: Using set for custom serialized fields is idiomatic.

The set literal provides clearer intent and slightly better lookup performance for membership checks.


1097-1118: LGTM: Container transformation orchestrates all service configs.

The method systematically transforms paths and delegates to each service component's transform method, ensuring consistent containerization across the configuration.


46-50: Container path constants defined clearly.

The container-side paths are well-defined. Note that CONTAINER_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws" uses /.aws, which may not match the runtime user's home. Past review comments suggested using the container user's actual home directory (e.g., /root/.aws for root, or dynamically resolving Path.home() / ".aws"). If containers run as root, consider /root/.aws; if non-root, align with that user's home.

Based on past review comments.


72-75: LGTM: DeploymentType enum introduced.

The enum clearly distinguishes BASE and FULL deployment types, used in get_deployment_type() to determine deployment mode based on query engine.


152-156: LGTM: DEFAULT_PORT as ClassVar for Database.

Using ClassVar prevents Pydantic from treating this as an instance field, keeping it as a class-level constant. This is correctly applied across Database, QueryScheduler, Redis, Reducer, ResultsCache, and Queue.


929-934: LGTM: Default paths use centralized constants.

The defaults for data_directory, logs_directory, _image_id_path, and _version_file_path now reference the constants defined at the top of the file, improving maintainability.


1057-1062: LGTM: get_deployment_type method correctly maps query engine to deployment type.

Returns DeploymentType.BASE for Presto (scheduler-only), otherwise DeploymentType.FULL (full pipeline). This aligns with the PR objectives to support different deployment configurations.


1097-1118: Container transform comprehensively updates config for containerized execution.

The method transforms paths to container-absolute paths and updates all service hostnames/ports to container service names. This is well-structured. Consider returning self for consistency, as noted in a previous comment.

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)

12-12: LGTM: Import relocation aligns with config centralization.

The constant is now correctly imported from its new location in clp_py_utils.clp_config, maintaining consistency across the codebase.


12-12: LGTM: Import relocation to centralize config constants.

CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH is now imported from clp_py_utils.clp_config, aligning with the PR's centralization of configuration constants. Usage remains unchanged.

components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1)

13-13: LGTM: Import updated to reflect constant relocation.

Consistent with the broader refactoring to centralize configuration constants in clp_py_utils.clp_config.

components/clp-package-utils/clp_package_utils/scripts/decompress.py (2)

12-12: LGTM: Import source updated for relocated constant.

The import now references the canonical location in clp_py_utils.clp_config, ensuring consistency across all scripts.


12-12: LGTM: Import relocation to centralize config constants.

CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH is now imported from clp_py_utils.clp_config, consistent with the centralization effort across the codebase. No functional changes.

components/clp-package-utils/clp_package_utils/general.py (2)

3-3: LGTM!

The json import and CONTAINER_* constants are necessary for the new Docker Compose workflow and are used appropriately throughout the file.

Also applies to: 20-22


178-189: LGTM!

The refactored _validate_log_directory helper with its clear docstring improves code organization and readability.

taskfile.yaml (1)

121-121: LGTM!

Adding tools/deployment/package/**/* to the package sources correctly tracks Docker Compose deployment artifacts as dependencies for the packaging step.

docs/src/dev-docs/building-package.md (1)

88-88: Standardise Compose filename to match the packaged artefact

Confirm whether the package ships a docker-compose.yml or docker-compose.yaml and update the docs to use the exact filename.

Comment on lines +265 to +268
    def transform_for_container(self):
        self.host = DB_COMPONENT_NAME
        self.port = self.DEFAULT_PORT

🧹 Nitpick | 🔵 Trivial

Transform methods should return self for chaining.

All transform_for_container() methods (Database, QueryScheduler, Redis, Reducer, ResultsCache, Queue, storage classes, CLPConfig) should return self to support method chaining and consistency. Past review comments requested this.

Apply this pattern to all transform methods:

 def transform_for_container(self):
     self.host = DB_COMPONENT_NAME
     self.port = self.DEFAULT_PORT
+    return self

Also applies to: 331-334, 399-402, 438-441, 491-494, 540-543, 631-633, 687-689, 694-696, 701-703, 708-710, 715-717, 1097-1118

Based on past review comments.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 265-268 (and
also update the other listed ranges: 331-334, 399-402, 438-441, 491-494,
540-543, 631-633, 687-689, 694-696, 701-703, 708-710, 715-717, 1097-1118), the
transform_for_container() methods currently set attributes but do not return
anything; update each transform_for_container() implementation to return self at
the end so callers can chain calls. Locate each method in the specified line
ranges and append a final "return self" statement (keeping existing logic
unchanged) to ensure consistent fluent API behavior across all transform
methods.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (9)
components/clp-package-utils/clp_package_utils/controller.py (9)

445-455: Minor: Use local variable in assignment.

Line 454 accesses updates[key] even though value is already bound in the loop.

Apply this diff:

             if isinstance(value, dict):
                 self._update_settings_object(f"{parent_key_prefix}{key}.", settings[key], value)
             else:
-                settings[key] = updates[key]
+                settings[key] = value

Based on past review comments.


513-520: Worker count can be zero on single-CPU hosts.

The calculation multiprocessing.cpu_count() // 2 returns 0 when cpu_count() is 1, which would prevent workers from starting.

Apply this diff:

     @staticmethod
     def _get_num_workers() -> int:
         """
         Gets the parallelism number for worker components.
         TODO: Revisit after moving from single-container to multi-container workers.
         :return: Number of worker processes.
         """
-        return multiprocessing.cpu_count() // 2
+        return max(1, multiprocessing.cpu_count() // 2)

Based on past review comments.


479-482: Optionally add --wait flag for health check synchronization.

Docker Compose v2 supports --wait to block until health checks pass, improving deployment determinism. Consider adding this flag when supported.

Apply this diff:

-        cmd += ["up", "--detach"]
+        cmd += ["up", "--detach", "--wait"]

Verify your target Docker Compose baseline supports --wait before applying.

Based on past review comments.


606-619: Improve chown portability.

The --recursive flag at line 618 is GNU-specific and may fail on BSD systems. Use the portable -R flag instead.

Apply this diff:

-    chown_cmd = ["chown", "--recursive", f"{user_id}:{group_id}", str(path)]
+    chown_cmd = ["chown", "-R", f"{user_id}:{group_id}", str(path)]

Alternatively, implement recursion in Python using os.walk and os.chown for full portability.

Based on past review comments.
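The pure-Python alternative can be sketched as follows; `chown_recursive` is an illustrative name, not an existing CLP helper:

```python
import os


def chown_recursive(path: str, uid: int, gid: int) -> None:
    """Portable replacement for `chown -R` / `chown --recursive`."""
    os.chown(path, uid, gid)
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            # follow_symlinks=False avoids changing ownership of link targets.
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
```

This sidesteps the GNU/BSD flag difference entirely at the cost of a Python-side directory walk.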


622-629: Hostname resolution lacks error handling and IPv6 support.

socket.gethostbyname is IPv4-only and will raise socket.gaierror without context on failures. Consider using socket.getaddrinfo for IPv6 compatibility and adding error handling.

Apply this diff:

 def _get_ip_from_hostname(hostname: str) -> str:
     """
     Resolves a hostname to an IP address.
 
     :param hostname: The hostname to resolve.
     :return: The resolved IP address.
     """
-    return socket.gethostbyname(hostname)
+    try:
+        # Prefer IPv4, but fall back to any address returned
+        infos = socket.getaddrinfo(hostname, None)
+        for info in infos:
+            if info[0] == socket.AF_INET:
+                return info[4][0]
+        # If no IPv4, return first address
+        return infos[0][4][0]
+    except socket.gaierror as ex:
+        raise ValueError(f"Failed to resolve hostname '{hostname}': {ex}") from ex

Based on past review comments.


63-96: Refine the abstract method contract for _provision.

The base class declares _provision returns EnvVarsDict, but the subclass implementation writes to a file and doesn't return anything meaningful to the caller. The deploy() method also ignores the return value. Update the signature to return None or make it optional.

Apply this diff:

     @abstractmethod
-    def _provision(self) -> EnvVarsDict:
+    def _provision(self) -> None:
         """
         Prepares all components with orchestrator-specific logic.
 
-        :return: Dictionary of environment variables to be used by the orchestrator.
         """
         pass

Based on past review comments.


208-223: Ensure log file mode is applied to existing files.

Path.touch(mode=LOGS_FILE_MODE, exist_ok=True) only sets the mode when creating a new file. If the log file already exists, the mode is not updated. Explicitly call chmod after touch to ensure the correct permissions regardless of whether the file was created or already existed.

Apply this diff:

         logs_file = self.clp_config.logs_directory / f"{component_name}.log"
         logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.chmod(LOGS_FILE_MODE)

Apply the same fix to the query scheduler at lines 234-235.

Based on past review comments.
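The touch-then-chmod behaviour can be demonstrated in isolation; `ensure_log_file` is an illustrative helper, not CLP code:

```python
import pathlib

LOGS_FILE_MODE = 0o644


def ensure_log_file(logs_file: pathlib.Path) -> None:
    # touch() applies the mode only when it creates the file; the explicit
    # chmod() covers files that already exist with different permission bits.
    logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
    logs_file.chmod(LOGS_FILE_MODE)
```

Without the chmod, a pre-existing 0o600 log file would silently keep its old mode.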


402-414: Consider applying ownership changes to garbage collector log directory.

Similar to worker components, the garbage collector creates a log directory but does not call _chown_paths_if_root, creating an inconsistency with infrastructure services.

Apply this diff:

         logs_dir = self.clp_config.logs_directory / component_name
         logs_dir.mkdir(parents=True, exist_ok=True)
+        _chown_paths_if_root(logs_dir)

98-127: Consider making database images configurable.

The database image tags are hardcoded to mysql:8.0.23 and mariadb:10-jammy at lines 124-126. Tightly pinned versions can become stale, and different environments may require different images. Consider reading from an environment variable or configuration with sensible defaults.

Apply this diff:

             "CLP_DB_IMAGE": (
-                "mysql:8.0.23" if "mysql" == self.clp_config.database.type else "mariadb:10-jammy"
+                os.getenv(
+                    "CLP_DB_IMAGE",
+                    "mysql:8" if "mysql" == self.clp_config.database.type else "mariadb:10",
+                )
             ),

Based on past review comments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c79259a and af96fc8.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
🧬 Code graph analysis (1)
components/clp-package-utils/clp_package_utils/controller.py (3)
components/clp-py-utils/clp_py_utils/clp_config.py (10)
  • AwsAuthType (93-97)
  • CLPConfig (906-1117)
  • DeploymentType (72-74)
  • QueryEngine (82-85)
  • StorageEngine (77-79)
  • StorageType (88-90)
  • get_clp_connection_params_and_type (220-241)
  • get_directory (797-798)
  • get_directory (820-821)
  • get_deployment_type (1057-1061)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (3)
  • get_archives_table_name (299-300)
  • get_datasets_table_name (307-308)
  • get_files_table_name (311-312)
components/clp-package-utils/clp_package_utils/general.py (9)
  • check_docker_dependencies (151-175)
  • dump_shared_container_config (359-369)
  • generate_docker_compose_container_config (310-320)
  • get_clp_home (107-123)
  • validate_db_config (497-507)
  • validate_queue_config (510-513)
  • validate_redis_config (516-526)
  • validate_results_cache_config (540-554)
  • validate_webui_config (569-578)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/controller.py (6)

1-60: LGTM: Imports and constants are well-structured.

The imports, type alias, and constants are organized logically. The LOGS_FILE_MODE (0o644) and service container UID/GID constants (999:999) align with the containerized deployment architecture.


129-206: LGTM: Infrastructure service provisioning follows consistent pattern.

The queue, Redis, and results cache provisioning methods correctly validate configurations, create necessary directories, apply ownership changes when running as root, and return appropriate environment variables. The naming convention uses _HOST suffix for host paths and _FILE_HOST for file paths, maintaining consistency.


300-400: LGTM: WebUI provisioning handles complex configuration correctly.

The method properly transforms configuration for different storage types (S3 vs filesystem) and query engines (Presto vs native), updates both client and server settings files, and returns appropriate environment variables. The use of container_clp_config for container-specific paths is correct.


543-543: Package container image is configurable via config.

The CLP_PACKAGE_CONTAINER value is read from self.clp_config.execution_container, which is loaded from the configuration file. While past comments suggested making this configurable via environment variable, the current approach allows configuration through the CLP config file. If runtime override is needed, consider reading from os.getenv("CLP_PACKAGE_CONTAINER", self.clp_config.execution_container).


573-590: LGTM: Instance ID generation is straightforward.

The function correctly reads an existing instance ID or generates a new one using the last 4 characters of a UUID. The persistence in the logs directory is appropriate.


593-603: LGTM: Conditional chown logic is correct.

The function correctly checks if running as root before attempting ownership changes to the service container UID/GID (999:999).

Comment on lines +242 to +298
    def _set_up_env_for_compression_worker(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the compression worker component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of compression worker-related environment variables.
        """
        component_name = COMPRESSION_WORKER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")

        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)

        return {
            "CLP_COMPRESSION_WORKER_CONCURRENCY": str(num_workers),
            "CLP_COMPRESSION_WORKER_LOGGING_LEVEL": self.clp_config.compression_worker.logging_level,
            "CLP_COMPRESSION_WORKER_LOGS_DIR_HOST": str(logs_dir),
        }

    def _set_up_env_for_query_worker(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the query worker component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of component-related environment variables.
        """
        component_name = QUERY_WORKER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")

        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)

        return {
            "CLP_QUERY_WORKER_LOGGING_LEVEL": self.clp_config.query_worker.logging_level,
            "CLP_QUERY_WORKER_LOGS_DIR_HOST": str(logs_dir),
            "CLP_QUERY_WORKER_CONCURRENCY": str(num_workers),
        }

    def _set_up_env_for_reducer(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the reducer component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of component-related environment variables.
        """
        component_name = REDUCER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")

        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)

        return {
            "CLP_REDUCER_LOGGING_LEVEL": self.clp_config.reducer.logging_level,
            "CLP_REDUCER_LOGS_DIR_HOST": str(logs_dir),
            "CLP_REDUCER_CONCURRENCY": str(num_workers),
            "CLP_REDUCER_UPSERT_INTERVAL": str(self.clp_config.reducer.upsert_interval),
        }
🧹 Nitpick | 🔵 Trivial

Consider applying ownership changes to worker log directories for consistency.

The compression worker, query worker, and reducer methods create log directories but do not call _chown_paths_if_root. While these components may run as the current user, applying _chown_paths_if_root would maintain consistency with infrastructure services (database, queue, Redis, results cache) and prevent potential permission issues in edge cases.

Apply this diff to each worker provisioning method:

         logs_dir = self.clp_config.logs_directory / component_name
         logs_dir.mkdir(parents=True, exist_ok=True)
+        _chown_paths_if_root(logs_dir)

Apply to lines 252-253, 271-272, and 290-291.

📝 Committable suggestion

Suggested change
    def _set_up_env_for_compression_worker(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the compression worker component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of compression worker-related environment variables.
        """
        component_name = COMPRESSION_WORKER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")
        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)
        _chown_paths_if_root(logs_dir)
        return {
            "CLP_COMPRESSION_WORKER_CONCURRENCY": str(num_workers),
            "CLP_COMPRESSION_WORKER_LOGGING_LEVEL": self.clp_config.compression_worker.logging_level,
            "CLP_COMPRESSION_WORKER_LOGS_DIR_HOST": str(logs_dir),
        }

    def _set_up_env_for_query_worker(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the query worker component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of component-related environment variables.
        """
        component_name = QUERY_WORKER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")
        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)
        _chown_paths_if_root(logs_dir)
        return {
            "CLP_QUERY_WORKER_LOGGING_LEVEL": self.clp_config.query_worker.logging_level,
            "CLP_QUERY_WORKER_LOGS_DIR_HOST": str(logs_dir),
            "CLP_QUERY_WORKER_CONCURRENCY": str(num_workers),
        }

    def _set_up_env_for_reducer(self, num_workers: int) -> EnvVarsDict:
        """
        Prepares environment variables for the reducer component.
        :param num_workers: Number of worker processes to run.
        :return: Dictionary of component-related environment variables.
        """
        component_name = REDUCER_COMPONENT_NAME
        logger.info(f"Setting up environment for {component_name}...")
        logs_dir = self.clp_config.logs_directory / component_name
        logs_dir.mkdir(parents=True, exist_ok=True)
        _chown_paths_if_root(logs_dir)
        return {
            "CLP_REDUCER_LOGGING_LEVEL": self.clp_config.reducer.logging_level,
            "CLP_REDUCER_LOGS_DIR_HOST": str(logs_dir),
            "CLP_REDUCER_CONCURRENCY": str(num_workers),
            "CLP_REDUCER_UPSERT_INTERVAL": str(self.clp_config.reducer.upsert_interval),
        }
🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/controller.py lines 242-298: the
three worker env setup methods create logs_dir but don't call
_chown_paths_if_root, leading to inconsistent ownership compared to other
services; after logs_dir.mkdir(parents=True, exist_ok=True) in
_set_up_env_for_compression_worker (around lines 252-253),
_set_up_env_for_query_worker (around lines 271-272), and _set_up_env_for_reducer
(around lines 290-291), call the module-level _chown_paths_if_root with the created logs_dir
so the directory ownership is fixed when running as root, preserving consistency
and preventing permission issues.

Comment on lines +550 to +551
"CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
"CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
⚠️ Potential issue | 🟠 Major

Secure .env file permissions and avoid writing empty AWS credentials.

The .env file can contain secrets but is created with default permissions (typically world-readable). Additionally, empty AWS credential strings are written even when not set, which clutters the file.

Apply this diff:

-            # AWS credentials
-            "CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
-            "CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),

And update the file writing logic:

+        # Conditionally include AWS credentials only if set
+        aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")
+        aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+        if aws_access_key and aws_secret_key:
+            env_dict["CLP_AWS_ACCESS_KEY_ID"] = aws_access_key
+            env_dict["CLP_AWS_SECRET_ACCESS_KEY"] = aws_secret_key
+
         if self.clp_config.aws_config_directory is not None:
             env_dict["CLP_AWS_CONFIG_DIR_HOST"] = str(self.clp_config.aws_config_directory)
 
-        with open(f"{self._clp_home}/.env", "w") as env_file:
+        # Write .env with strict permissions (0o600)
+        env_path = f"{self._clp_home}/.env"
+        fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        with os.fdopen(fd, "w") as env_file:
             for key, value in env_dict.items():
                 env_file.write(f"{key}={value}\n")

Based on past review comments.

Also applies to: 568-570

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines
550-551 and 568-570, the .env writer currently writes empty AWS credential keys
and creates the file with default (potentially world-readable) permissions;
change the logic to only include CLP_AWS_ACCESS_KEY_ID and
CLP_AWS_SECRET_ACCESS_KEY in the .env content when their corresponding
environment variables are non-empty, and ensure the file is created with
restrictive permissions (e.g., open/create with mode 0o600 or write then
os.chmod to 0o600) so secrets are not world-readable; keep the rest of the .env
keys behavior the same.

# Conflicts:
#	components/clp-package-utils/clp_package_utils/scripts/start_clp.py
#	components/clp-py-utils/clp_py_utils/clp_config.py
#	taskfiles/docker-images.yaml
@junhaoliao junhaoliao requested review from kirkrodrigues and removed request for kirkrodrigues October 2, 2025 01:13
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

283-290: Port validation allows port 0, inconsistent with other validators.

The _validate_port function allows port = 0 (min_valid_port = 0), but this is inconsistent with field validators like QueryScheduler.validate_port which typically expect ports in range 1-65535. Port 0 is reserved and not valid for most services.

Apply this diff to enforce standard port range 1-65535:

 def _validate_port(cls, value):
-    min_valid_port = 0
+    min_valid_port = 1
     max_valid_port = 2**16 - 1
     if min_valid_port > value or max_valid_port < value:
         raise ValueError(
             f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}."
         )

Based on past review comments.

♻️ Duplicate comments (21)
components/clp-py-utils/clp_py_utils/clp_config.py (3)

265-268: All transform_for_container() methods must return self for chaining.

Past review comments (marked "✅ Addressed") requested that every transform_for_container() method return self, but the change was never implemented; returning self supports method chaining and keeps a consistent fluent API across all config classes.

Apply this pattern to all transform_for_container() methods in this file (Database line 265-268, QueryScheduler 331-334, Redis 399-402, Reducer 438-441, ResultsCache 491-494, Queue 540-543, S3IngestionConfig 631-633, FsIngestionConfig 687-689, ArchiveFsStorage 694-696, StreamFsStorage 701-703, ArchiveS3Storage 708-710, StreamS3Storage 715-717, CLPConfig 1100-1121):

 def transform_for_container(self):
     self.host = DB_COMPONENT_NAME
     self.port = self.DEFAULT_PORT
+    return self

Repeat for all other transform methods listed above.

Based on past review comments.


46-50: Container AWS config path /.aws is non-standard and will break credential lookups.

Past review comments (marked "✅ Addressed in commits 21b1856 to c79259a") suggested using the container runtime user's home directory (e.g., /root/.aws for root, or pathlib.Path.home() / ".aws"). The current code still uses pathlib.Path("/") / ".aws" which maps to /.aws, an unusual location that AWS SDKs won't check by default.

Replace the hardcoded /.aws with the runtime user's home:

-CONTAINER_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws"
+CONTAINER_AWS_CONFIG_DIRECTORY = pathlib.Path.home() / ".aws"

Ensure each container's HOME environment variable is set correctly (e.g., /root for root user) and update Docker Compose volume mounts to bind the host's ~/.aws to the container user's home .aws directory.

Based on past review comments.


631-633: S3IngestionConfig.transform_for_container() is a no-op; it should return self.

Past review comments noted this method was missing; now it exists but is a no-op pass. For consistency and to enable method chaining, it must return self.

Apply this diff:

 def transform_for_container(self):
-    pass
+    return self  # No container-specific changes for S3 ingestion

Based on past review comments.

components/clp-package-utils/clp_package_utils/controller.py (10)

89-96: Verify abstract method return type consistency.

The base class declares _provision returns EnvVarsDict, but DockerComposeController._provision (line 522) doesn't return a value. Align the contract: either update the base to return None or have the implementation return the env dict.

Based on past review comments.


618-618: Verify chown flag was updated to portable -R.

Past review comment noted that --recursive is GNU-only and commit 0066386 supposedly addressed this, but the code still shows --recursive. Verify the fix was applied.

Expected: "chown", "-R", ...

Based on past review comments.
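
A portable alternative is to avoid shelling out entirely and walk the tree in Python; the helper below is a sketch (the function name is ours, not the codebase's) that mirrors chown -R without depending on GNU coreutils flags:

```python
import os


def chown_recursive(root: str, uid: int, gid: int) -> None:
    """Portably change ownership of a directory tree, mirroring `chown -R`."""
    os.chown(root, uid, gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # follow_symlinks=False avoids changing ownership of link targets.
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
```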


124-126: Make database images configurable.

Hard-coded image tags "mysql:8.0.23" and "mariadb:10-jammy" are tightly pinned. Consider reading from environment variables with sensible defaults for flexibility.

Based on past review comments.
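
A minimal sketch of the env-var approach (the variable names and helper are illustrative; the defaults mirror the currently pinned tags):

```python
import os

# Hypothetical env-var override scheme; defaults match the pinned tags.
DB_IMAGE_DEFAULTS = {
    "mysql": "mysql:8.0.23",
    "mariadb": "mariadb:10-jammy",
}


def get_db_image(db_type: str) -> str:
    """Resolve the database image tag, allowing an environment override."""
    env_key = f"CLP_{db_type.upper()}_IMAGE"
    return os.getenv(env_key, DB_IMAGE_DEFAULTS[db_type])
```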


217-218: Apply file mode explicitly for scheduler log file.

Path.touch(mode=...) only sets mode on creation. Explicitly call chmod to ensure the mode is applied even if the file already exists.

Apply this diff:

         logs_file = self.clp_config.logs_directory / f"{component_name}.log"
         logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.chmod(LOGS_FILE_MODE)

Based on past review comments and learnings about scheduler user context.
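
The distinction is easy to demonstrate: Path.touch(mode=...) applies the mode only when it creates the file, so a pre-existing file keeps its old permissions until an explicit chmod. A small self-contained check (paths are temporary; the mode constant mirrors the review's LOGS_FILE_MODE assumption):

```python
import tempfile
from pathlib import Path

LOGS_FILE_MODE = 0o644  # stand-in for the controller's LOGS_FILE_MODE

log_file = Path(tempfile.mkdtemp()) / "scheduler.log"
# Simulate a pre-existing log file with restrictive permissions.
log_file.touch(mode=0o600)
log_file.chmod(0o600)  # normalize regardless of umask

# touch(mode=...) is a no-op on an existing file's permissions.
log_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
assert (log_file.stat().st_mode & 0o777) == 0o600

# An explicit chmod applies the mode to pre-existing files as well.
log_file.chmod(LOGS_FILE_MODE)
assert (log_file.stat().st_mode & 0o777) == 0o644
```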


234-235: Apply file mode explicitly for query scheduler log file.

Same issue as compression scheduler: ensure mode is applied to existing files.

Apply this diff:

         logs_file = self.clp_config.logs_directory / f"{component_name}.log"
         logs_file.touch(mode=LOGS_FILE_MODE, exist_ok=True)
+        logs_file.chmod(LOGS_FILE_MODE)

Based on past review comments.


445-454: Minor: use local variable in assignment.

Line 454 could use the loop variable value instead of updates[key] for clarity.

Apply this diff:

             if isinstance(value, dict):
                 self._update_settings_object(f"{parent_key_prefix}{key}.", settings[key], value)
             else:
-                settings[key] = updates[key]
+                settings[key] = value

Based on past review comments.


513-520: Ensure at least one worker on low-CPU systems.

On single-CPU hosts, cpu_count() // 2 returns 0. Return at least 1 worker.

Apply this diff:

     @staticmethod
     def _get_num_workers() -> int:
-        return multiprocessing.cpu_count() // 2
+        return max(1, multiprocessing.cpu_count() // 2)

Based on past review comments.


550-551: Avoid writing empty AWS credentials to .env file.

Empty credential strings are written even when AWS credentials are not set, cluttering the file and potentially causing issues.

Only include AWS credentials in env_dict if they are non-empty:

-            # AWS credentials
-            "CLP_AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
-            "CLP_AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),

Then conditionally add them after the dict is created:

+        # Conditionally include AWS credentials only if set
+        aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")
+        aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+        if aws_access_key and aws_secret_key:
+            env_dict["CLP_AWS_ACCESS_KEY_ID"] = aws_access_key
+            env_dict["CLP_AWS_SECRET_ACCESS_KEY"] = aws_secret_key
+
         if self.clp_config.aws_config_directory is not None:

Based on past review comments.


568-570: Secure .env file with restrictive permissions.

The .env file can contain secrets but is created with default permissions (potentially world-readable). Create it with mode 0o600.

Apply this diff:

-        with open(f"{self._clp_home}/.env", "w") as env_file:
+        env_path = f"{self._clp_home}/.env"
+        fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        with os.fdopen(fd, "w") as env_file:
             for key, value in env_dict.items():
                 env_file.write(f"{key}={value}\n")

Based on past review comments.


622-629: Improve hostname resolution with error handling and IPv6 support.

socket.gethostbyname is IPv4-only and raises gaierror without context on failures. Consider using socket.getaddrinfo with proper error handling.

Based on past review comments.
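
A sketch of that approach (the helper name is ours): socket.getaddrinfo handles both address families, and wrapping gaierror adds context to the failure:

```python
import socket


def resolve_host(hostname: str) -> str:
    """Resolve a hostname to an IP address, preferring IPv4 over IPv6."""
    try:
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        raise ValueError(f"Failed to resolve host '{hostname}': {err}") from err
    # Prefer the first IPv4 result; otherwise fall back to the first address.
    for family, _, _, _, sockaddr in infos:
        if family == socket.AF_INET:
            return sockaddr[0]
    return infos[0][4][0]
```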

taskfile.yaml (2)

55-55: Verify documentation and CI updated for new default task.

The default task now references docker-images:package. Ensure README, contributor docs, and CI workflows are updated accordingly.

Based on past review comments.


170-173: Consider validating Docker Compose files during packaging.

Adding a pre-check to validate Docker Compose YAML (e.g., docker compose config -q) before packaging can catch errors early.

Based on past review comments.
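
One way to wire such a pre-check in is a small Taskfile task (the task name and wiring are illustrative, not from the repo) that the package task could depend on:

```yaml
  validate-compose:
    desc: "Fail fast if a bundled Docker Compose file is invalid."
    cmds:
      - "docker compose -f tools/deployment/package/docker-compose.base.yaml config -q"
      - "docker compose -f tools/deployment/package/docker-compose.yaml config -q"
```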

components/clp-package-utils/clp_package_utils/general.py (6)

134-149: Fix error message to match Docker Compose v2 command.

The error message references "docker-compose", but the code checks for Docker Compose v2 via the docker compose subcommand. Update the message for consistency.

Apply this diff:

     except subprocess.CalledProcessError:
-        raise EnvironmentError("docker-compose is not installed or not functioning properly.")
+        raise EnvironmentError("Docker Compose is not installed or not functioning properly.")

171-175: Maintain idempotency for stop operations.

The current logic raises an error when should_compose_run=False and Compose is not running, breaking idempotency of stop operations. When Compose is already down, stop_clp.py should succeed as a no-op.

Apply this diff to allow redundant stops:

     is_running = is_docker_compose_running(project_name)
-    if should_compose_run and not is_running:
-        raise EnvironmentError("docker-compose is not running.")
-    if not should_compose_run and is_running:
-        raise EnvironmentError("docker-compose is already running.")
+    if should_compose_run:
+        if not is_running:
+            raise EnvironmentError("Docker Compose is not running.")
+    elif is_running:
+        raise EnvironmentError("Docker Compose is already running.")

310-320: Fix docstring: function returns only transformed config.

The docstring incorrectly states the function returns "the container config and the mounts" but it only returns the transformed CLPConfig.

Apply this diff:

 def generate_docker_compose_container_config(clp_config: CLPConfig) -> CLPConfig:
     """
-    Copies the given config and transforms mount paths and hosts for Docker Compose.
-
+    Copies the given config and transforms paths and hosts for containerized execution.
+
     :param clp_config:
-    :return: The container config and the mounts.
+    :return: The transformed container-side CLPConfig.
     """

497-505: Add file type validation for database base config.

The validation checks if base_config.exists() but doesn't verify it's a file. A directory with the same name would pass but cause issues later.

Apply this diff:

     if not base_config.exists():
         raise ValueError(
             f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )

517-524: Add file type validation for Redis base config.

Consistent with database config validation, verify that base_config is a file, not just that it exists.

Apply this diff:

     if not base_config.exists():
         raise ValueError(
             f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )

540-548: Add file type validation for results cache base config.

Match other config validations by verifying base_config is a file.

Apply this diff:

     if not base_config.exists():
         raise ValueError(
             f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af96fc8 and 320dd11.

📒 Files selected for processing (9)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/general.py (6 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/compress.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/decompress.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/search.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (17 hunks)
  • taskfile.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
PR: y-scope/clp#1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (14)
  • CLPConfig (906-1120)
  • transform_for_container (265-267)
  • transform_for_container (331-333)
  • transform_for_container (399-401)
  • transform_for_container (438-440)
  • transform_for_container (491-493)
  • transform_for_container (540-542)
  • transform_for_container (631-632)
  • transform_for_container (687-688)
  • transform_for_container (694-695)
  • transform_for_container (701-702)
  • transform_for_container (708-709)
  • transform_for_container (715-716)
  • transform_for_container (1100-1120)
components/clp-package-utils/clp_package_utils/controller.py (3)
components/clp-py-utils/clp_py_utils/clp_config.py (10)
  • AwsAuthType (93-97)
  • CLPConfig (906-1120)
  • DeploymentType (72-74)
  • QueryEngine (82-85)
  • StorageEngine (77-79)
  • StorageType (88-90)
  • get_clp_connection_params_and_type (220-241)
  • get_directory (797-798)
  • get_directory (820-821)
  • get_deployment_type (1060-1064)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (3)
  • get_archives_table_name (299-300)
  • get_datasets_table_name (307-308)
  • get_files_table_name (311-312)
components/clp-package-utils/clp_package_utils/general.py (9)
  • check_docker_dependencies (151-175)
  • dump_shared_container_config (359-369)
  • generate_docker_compose_container_config (310-320)
  • get_clp_home (107-123)
  • validate_db_config (497-507)
  • validate_queue_config (510-513)
  • validate_redis_config (516-526)
  • validate_results_cache_config (540-554)
  • validate_webui_config (569-578)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: build (macos-15)
🔇 Additional comments (19)
components/clp-py-utils/clp_py_utils/clp_config.py (3)

72-75: LGTM: DeploymentType enum added.

The new DeploymentType enum (BASE, FULL) is correctly defined using KebabCaseStrEnum and auto(), consistent with other enums in this file.


152-156: LGTM: Database.DEFAULT_PORT ClassVar and default port.

The introduction of DEFAULT_PORT: ClassVar[int] = 3306 and updating port: int = DEFAULT_PORT is correct. Using a ClassVar prevents the constant from being captured as an instance attribute by Pydantic.


1060-1065: LGTM: get_deployment_type() method added.

The method correctly returns DeploymentType.BASE when Presto is the query engine, otherwise DeploymentType.FULL. Logic is sound.

components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)

9-16: LGTM: Import relocation of CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.

The import source for CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH has been correctly moved from clp_package_utils.general to clp_py_utils.clp_config. The constant's usage at line 40 remains unchanged, and this aligns with the broader refactor moving config constants to the clp_py_utils.clp_config module.

components/clp-package-utils/clp_package_utils/scripts/search.py (1)

8-15: LGTM: Import relocation of CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.

The import source for CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH has been correctly moved from clp_package_utils.general to clp_py_utils.clp_config. The constant's usage at line 35 remains unchanged, and this aligns with the broader refactor.

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

9-16: LGTM: Import relocation of CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.

The import source for CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH has been correctly moved from clp_package_utils.general to clp_py_utils.clp_config. The constant's usage at line 59 remains unchanged.

components/clp-package-utils/clp_package_utils/scripts/compress.py (1)

10-16: LGTM: Import relocation of CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.

The import source for CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH has been correctly moved from clp_package_utils.general to clp_py_utils.clp_config. The constant's usage at line 137 remains unchanged.

components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)

9-17: LGTM: Import relocation of CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH.

The import source for CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH has been correctly moved from clp_package_utils.general to clp_py_utils.clp_config. The constant's usage at line 276 remains unchanged.

components/clp-package-utils/clp_package_utils/general.py (3)

3-3: LGTM: JSON import for Docker Compose detection.

The json import is appropriately used to parse docker compose ls output in the new is_docker_compose_running function.


20-22: LGTM: Container path constants imported for config transformation.

These constants support the new generate_docker_compose_container_config function and align with the containerized deployment workflow.


178-189: LGTM: Log directory validation helper reduces duplication.

The extracted _validate_log_directory function provides consistent validation across components and improves maintainability.

components/clp-package-utils/clp_package_utils/controller.py (7)

54-54: LGTM: Log file mode definition is clear and correct.

While the past review suggested using octal notation (0o644), the explicit stat constants are more self-documenting and equally valid.


198-206: LGTM: Results cache env var naming is consistent.

The environment key CLP_RESULTS_CACHE_CONF_FILE_HOST correctly indicates a file path, addressing the past review concern.


242-298: LGTM: Worker log directory provisioning.

The worker provisioning methods create log directories without applying ownership changes, which is consistent with workers running as the current user context.


300-400: LGTM: Web UI provisioning handles complex configuration updates.

The implementation correctly reads, updates, and writes both client and server settings.json files, mapping host config to container config appropriately.


402-414: LGTM: Garbage collector provisioning.

The GC provisioning creates necessary directories without ownership changes, consistent with the deployment model.


466-492: LGTM: Deploy method with pre-flight checks.

The dependency check prevents double-starts while allowing idempotent docker compose up -d operations. Past concerns have been addressed.


543-543: LGTM: Container image from config.

The container image reference is read from clp_config.container_image_ref, addressing the past concern about hard-coded values.

taskfile.yaml (1)

120-120: LGTM: Package sources include deployment files.

Adding "tools/deployment/package/**/*" ensures the package task rebuilds when Docker Compose or deployment files change.

Comment on lines 670 to +689
class FsIngestionConfig(FsStorage):
directory: pathlib.Path = pathlib.Path("/")

def transform_for_container(self):
self.directory = CONTAINER_INPUT_LOGS_ROOT_DIR


⚠️ Potential issue | 🔴 Critical

Dangerous default: FsIngestionConfig.directory = pathlib.Path("/") risks scanning the entire filesystem.

Past review comments (marked as "⚠️ Potential issue") flagged this as a critical risk. Defaulting directory to / means ingestion could scan the entire root filesystem by mistake. This must fail closed with a safe placeholder or explicit validation.

Change the field to an explicit "unset" sentinel and add validation:

 class FsIngestionConfig(FsStorage):
-    directory: pathlib.Path = pathlib.Path("/")
+    directory: Optional[pathlib.Path] = None
+
+    @field_validator("directory")
+    @classmethod
+    def validate_directory_set(cls, value):
+        if value is None:
+            raise ValueError("logs_input.directory must be explicitly set; defaulting to '/' is unsafe.")
+        _validate_directory(value)
+        return value
 
     def transform_for_container(self):
-        self.directory = CONTAINER_INPUT_LOGS_ROOT_DIR
+        # Both the unset sentinel and an explicit host path map to the same
+        # container mount point.
+        self.directory = CONTAINER_INPUT_LOGS_ROOT_DIR
+        return self

Alternatively, use a clearly invalid placeholder like pathlib.Path("") and validate it's changed before use.

Based on past review comments.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/clp_config.py lines 684-689: the field
defaulting directory to pathlib.Path("/") is dangerous because it can cause
accidental root filesystem scans; change the default to an explicit "unset"
sentinel (e.g., None or pathlib.Path("")) and update the type to
Optional[pathlib.Path] (or use a sentinel constant), ensure
transform_for_container still sets the container path, and add a validation step
(constructor or a validate() method) that raises a clear exception if directory
is unset, empty, or still "/" before any ingestion operation so the code fails
closed.

Comment on lines +1100 to +1121
def transform_for_container(self):
"""
Adjusts paths and service hosts for containerized execution.
Converts all relevant directories to absolute paths inside the container
and updates service hostnames/ports to their container service names.
"""
self.data_directory = pathlib.Path("/") / CLP_DEFAULT_DATA_DIRECTORY_PATH
self.logs_directory = pathlib.Path("/") / CLP_DEFAULT_LOG_DIRECTORY_PATH
if self.aws_config_directory is not None:
self.aws_config_directory = CONTAINER_AWS_CONFIG_DIRECTORY
self.logs_input.transform_for_container()
self.archive_output.storage.transform_for_container()
self.stream_output.storage.transform_for_container()

self.database.transform_for_container()
self.queue.transform_for_container()
self.redis.transform_for_container()
self.results_cache.transform_for_container()
self.query_scheduler.transform_for_container()
self.reducer.transform_for_container()


🛠️ Refactor suggestion | 🟠 Major

CLPConfig.transform_for_container() should bind WebUI to 0.0.0.0 and return self.

Past review comments (marked "🛠️ Refactor suggestion") requested:

  1. Set self.webui.host = "0.0.0.0" so the WebUI listens on all interfaces inside the container (not just loopback).
  2. Return self for chaining.

Neither change is present in the current code.

Apply this diff:

 def transform_for_container(self):
     """
     Adjusts paths and service hosts for containerized execution.
 
     Converts all relevant directories to absolute paths inside the container
     and updates service hostnames/ports to their container service names.
     """
     self.data_directory = pathlib.Path("/") / CLP_DEFAULT_DATA_DIRECTORY_PATH
     self.logs_directory = pathlib.Path("/") / CLP_DEFAULT_LOG_DIRECTORY_PATH
     if self.aws_config_directory is not None:
         self.aws_config_directory = CONTAINER_AWS_CONFIG_DIRECTORY
+    # Ensure WebUI listens on all interfaces within the container
+    self.webui.host = "0.0.0.0"
     self.logs_input.transform_for_container()
     self.archive_output.storage.transform_for_container()
     self.stream_output.storage.transform_for_container()
 
     self.database.transform_for_container()
     self.queue.transform_for_container()
     self.redis.transform_for_container()
     self.results_cache.transform_for_container()
     self.query_scheduler.transform_for_container()
     self.reducer.transform_for_container()
+    return self

Based on past review comments.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 1100 to 1121,
transform_for_container currently updates many subcomponents but does not bind
the WebUI to all interfaces nor return self; set self.webui.host = "0.0.0.0" (so
the WebUI listens on 0.0.0.0 inside the container) before returning, and add a
final line returning self to allow method chaining; ensure this change is made
after other transforms and handles a missing self.webui gracefully if necessary.

junhaoliao and others added 2 commits October 2, 2025 12:16
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (12)
tools/deployment/package/docker-compose.base.yaml (8)

4-7: Drop the global user: override from shared defaults.

Reusing vendor images (MySQL, RabbitMQ, Redis, MongoDB) with a forced ${CLP_UID_GID} user breaks their init scripts. Remove this user: from x-service-defaults and only set it on CLP-owned services (scheduler, workers, webui, GC) as needed.


23-37: Bump the MySQL default image tag.

mysql:8.0.23 is long out of support and accumulates CVEs. Update the fallback to a currently supported 8.x tag (e.g., 8.0.39) while keeping CLP_DB_IMAGE overrideable, then revalidate.


40-48: Keep DB password out of argv; read from the secret file.

The healthcheck currently invokes mysqladmin with --password=${CLP_DB_PASS}, exposing the secret via inspect/process args. Switch to CMD-SHELL and cat /run/secrets/CLP_DB_PASS_FILE instead.
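
A sketch of that healthcheck, assuming the password secret is mounted at /run/secrets/CLP_DB_PASS_FILE; the root user and timing values are placeholders:

```yaml
    healthcheck:
      # CMD-SHELL reads the secret at check time, so the password never
      # appears in the container's argv or `docker inspect` output.
      test:
        [
          "CMD-SHELL",
          'mysqladmin ping --silent -h localhost -uroot -p"$$(cat /run/secrets/CLP_DB_PASS_FILE)"',
        ]
      interval: 10s
      timeout: 5s
      retries: 3
```

Note the doubled $$ so Compose passes a literal $ through to the shell instead of interpolating it.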


71-90: Upgrade RabbitMQ image to a supported release.

rabbitmq:3.9.8 is EOL and carries known CVEs. Move the default to a maintained 3.13.x tag (e.g., 3.13.8) to stay patched.


118-138: Fix results-cache host port default and stop grace.

The published port defaults to 6379 (Redis) so MongoDB collides with Redis when env vars are unset. Default it to 27017 and give this stateful service a ≥30 s stop_grace_period to avoid forced shutdowns.


170-175: Compose won’t expand ~; use $HOME in defaults.

The volume fallback mounts literal ~/.aws. Replace it with ${HOME}/.aws (and mirror for other occurrences in this file).


189-214: Separate staged archive host path.

Both archive mounts use CLP_ARCHIVE_OUTPUT_DIR_HOST, so staged and final archives collide when customised. Introduce CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST (with matching env/docs).


156-188: Restore non-root user for CLP-owned services after dropping the global override.

Once x-service-defaults.user is removed, compression-scheduler must explicitly run as ${CLP_UID_GID} per our runtime model. Please add user: "${CLP_UID_GID:-1000:1000}" (same applies to compression-worker, webui, garbage-collector). Based on learnings.

tools/deployment/package/docker-compose.yaml (4)

6-9: Remove the global user: override here as well.

Same rationale as the base file: vendor dependencies inherit this anchor, so drop user: from x-service-defaults and set it only on CLP-owned services.


33-36: Avoid backslash continuations and keep the volume on one line.

The split string relies on YAML quirks; replace it with a single-line entry without \ to keep parsers happy.


59-82: Fix staged stream volume collision and home expansion.

query-worker mounts both staged and final streams from CLP_STREAM_OUTPUT_DIR_HOST, so customising the env collapses them. Add a dedicated CLP_STAGED_STREAM_OUTPUT_DIR_HOST. Also, defaults using ~ won’t expand; switch to ${HOME}.


18-56: Re-add per-service user: after removing the global override.

Once the shared user: is gone, ensure query-scheduler (and other CLP services in this file) explicitly set user: "${CLP_UID_GID:-1000:1000}" to keep host UID semantics.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08644f4 and dd56d04.

📒 Files selected for processing (2)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
  • tools/deployment/package/docker-compose.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose.base.yaml
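The split this learning describes could surface in the controller's generated environment roughly as follows (the function name and the 999:999 value are illustrative, taken from the learning above rather than from the actual controller code):

```python
import os


def generate_user_env_vars() -> dict:
    """Illustrative sketch: CLP-owned services run as the invoking user's
    UID:GID, while infrastructure services run as a fixed service-container
    account."""
    return {
        # Compression scheduler, workers, web UI, garbage collector.
        "CLP_UID_GID": f"{os.getuid()}:{os.getgid()}",
        # Database, queue, redis, results cache.
        "CLP_SERVICE_CONTAINER_UID_GID": "999:999",
    }
```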
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)

Comment on lines +91 to +116
redis:
<<: *service_defaults
image: "redis:7.2.4"
hostname: "redis"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
ports:
- "${CLP_REDIS_HOST:-127.0.0.1}:${CLP_REDIS_PORT:-6379}:6379"
volumes:
- "${CLP_REDIS_CONF_FILE_HOST:-./etc/redis/redis.conf}:/usr/local/etc/redis/redis.conf:ro"
- "${CLP_REDIS_DATA_DIR_HOST:-./var/data/redis}:/data"
- "${CLP_REDIS_LOGS_DIR_HOST:-./var/log/redis}:/var/log/redis"
healthcheck:
<<: *healthcheck_defaults
test: [
"CMD",
"redis-cli",
"-h", "127.0.0.1",
"-p", "6379",
"-a", "${CLP_REDIS_PASS}",
"PING"
]
command: [
"redis-server",
"/usr/local/etc/redis/redis.conf",
"--requirepass", "${CLP_REDIS_PASS}"
]

⚠️ Potential issue | 🟠 Major

Store the Redis password in a secret.

CLP_REDIS_PASS is injected into the healthcheck and command, leaking credentials. Define a secret (e.g., CLP_REDIS_PASS_FILE), attach it to the service, and read from it in both the healthcheck and startup command.

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 91 to 116, the
Redis password is inlined via CLP_REDIS_PASS and leaks into healthcheck and
command; instead declare a Docker secret (e.g., CLP_REDIS_PASS) in the top-level
secrets section, reference that secret in the redis service so it is mounted
into the container (conventionally under /run/secrets/CLP_REDIS_PASS), remove
any direct CLP_REDIS_PASS interpolation from ports/command/healthcheck, and
change the healthcheck and startup command to use the secret file by executing a
shell that reads the secret file (e.g., sh -c "redis-cli -h 127.0.0.1 -p 6379 -a
$(cat /run/secrets/CLP_REDIS_PASS) PING") and similarly pass the password to
redis-server via reading the secret file in a shell wrapper; ensure the secret
is marked external or points to the file path you want and do not expose the
password as an environment variable or in-line string.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (10)
tools/deployment/package/docker-compose.base.yaml (7)

4-8: Remove global user override; set users only on CLP-owned services.

Vendor images (MySQL/RabbitMQ/Redis/Mongo) fail init under arbitrary UID/GID. Drop the shared user and explicitly set user for CLP app containers.

 x-service-defaults: &service_defaults
   image: "${CLP_PACKAGE_CONTAINER}"
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

Also remove per‑infra user overrides and add user for CLP services:

@@
   database:
@@
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
   queue:
@@
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
   redis:
@@
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
   results-cache:
@@
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
   compression-scheduler:
@@
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   compression-worker:
@@
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   webui:
@@
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   garbage-collector:
@@
+    user: "${CLP_UID_GID:-1000:1000}"

Based on learnings


71-76: Do not set user for RabbitMQ; also update image tag.

   queue:
     <<: *service_defaults
-    image: "rabbitmq:3.9.8"
+    image: "rabbitmq:3.13.8"
     hostname: "queue"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"

118-138: Fix Mongo host port default; add longer stop_grace_period; remove trailing comma.

   results-cache:
     <<: *service_defaults
     image: "mongo:7.0.1"
     hostname: "results_cache"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
+    # runs with image default user
+    stop_grace_period: "30s"
     ports:
-      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-6379}:27017"
+      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-27017}:27017"
@@
     command: [
       "--config", "/etc/mongo/mongod.conf",
-      "--bind_ip", "0.0.0.0",
+      "--bind_ip", "0.0.0.0"
     ]

168-172: Compose doesn’t expand ~; also avoid backslash continuations and fix staged-archives var.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"
-      - "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}\
-:/var/log/compression_scheduler.log"
+      - "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}:/var/log/compression_scheduler.log"
@@
-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
-      - "${CLP_COMPRESSION_WORKER_LOGS_DIR_HOST:-./var/log/compression_worker}:\
-/var/log/compression_worker"
+      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
+      - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"
+      - "${CLP_COMPRESSION_WORKER_LOGS_DIR_HOST:-./var/log/compression_worker}:/var/log/compression_worker"
@@
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"
-      - "./var/www/webui/server/dist/settings.json\
-:/opt/clp/var/www/webui/server/dist/settings.json:ro"
+      - "./var/www/webui/server/dist/settings.json:/opt/clp/var/www/webui/server/dist/settings.json:ro"

Ensure CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST is added to env generation/docs.

Also applies to: 203-211, 238-242


21-40: Increase stop_grace_period for stateful stores.

3s is unsafe. Give DB at least 30s to shut down cleanly. Do the same for Mongo below.

   database:
     <<: *service_defaults
     image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
+    stop_grace_period: "30s"

91-117: Move Redis password to a secret (healthcheck/command).

Top-level secrets:

 secrets:
   CLP_DB_PASS_FILE:
     environment: "CLP_DB_PASS"
+  CLP_REDIS_PASS_FILE:
+    environment: "CLP_REDIS_PASS"

Service changes:

   redis:
     <<: *service_defaults
@@
-    healthcheck:
-      <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "redis-cli",
-        "-h", "127.0.0.1",
-        "-p", "6379",
-        "-a", "${CLP_REDIS_PASS}",
-        "PING"
-      ]
+    secrets: ["CLP_REDIS_PASS_FILE"]
+    healthcheck:
+      <<: *healthcheck_defaults
+      test: >
+        bash -lc 'redis-cli -h 127.0.0.1 -p 6379 -a "$(cat /run/secrets/CLP_REDIS_PASS_FILE)" PING'
@@
-    command: [
-      "redis-server",
-      "/usr/local/etc/redis/redis.conf",
-      "--requirepass", "${CLP_REDIS_PASS}"
-    ]
+    command: >
+      bash -lc 'exec redis-server /usr/local/etc/redis/redis.conf --requirepass "$(cat /run/secrets/CLP_REDIS_PASS_FILE)"'

41-49: DB healthcheck leaks password; read from secret file.

     healthcheck:
       <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "mysqladmin", "ping",
-        "--silent",
-        "-h", "127.0.0.1",
-        "-u", "${CLP_DB_USER}",
-        "--password=${CLP_DB_PASS}"
-      ]
+      test: >
+        bash -lc 'mysqladmin ping --silent -h 127.0.0.1 -u "${CLP_DB_USER}" -p"$(cat /run/secrets/CLP_DB_PASS_FILE)"'
tools/deployment/package/docker-compose.yaml (3)

5-10: Remove global user override; set per-service user on CLP apps.

 x-service-defaults: &service_defaults
   image: "${CLP_PACKAGE_CONTAINER}"
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

Then add user to app services:

   query-scheduler:
@@
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   query-worker:
@@
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   reducer:
@@
+    user: "${CLP_UID_GID:-1000:1000}"

75-81: Use $HOME instead of ~; split staged vs final streams.

       - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro"
@@
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"
+      - "${CLP_STAGED_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
+      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"

Ensure CLP_STAGED_STREAM_OUTPUT_DIR_HOST is added to env generation/docs.


33-36: Avoid backslash line continuations in volume entries.

-      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:\
-/var/log/query_scheduler.log"
+      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:/var/log/query_scheduler.log"
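Several comments above ask for the new CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST and CLP_STAGED_STREAM_OUTPUT_DIR_HOST variables to be threaded through the controller's .env generation. A hedged sketch of what that generation step might look like — the helper name and the default paths are assumptions, not the controller's actual code:

```python
from pathlib import Path


def write_env_file(path: Path, env_vars: dict) -> None:
    """Serialize key/value pairs into a Docker Compose .env file,
    one KEY=VALUE entry per line."""
    lines = [f"{key}={value}" for key, value in sorted(env_vars.items())]
    path.write_text("\n".join(lines) + "\n")


# Hypothetical staged-vs-final split so the two mounts no longer collide.
env_vars = {
    "CLP_ARCHIVE_OUTPUT_DIR_HOST": "./var/data/archives",
    "CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST": "./var/data/staged-archives",
    "CLP_STREAM_OUTPUT_DIR_HOST": "./var/data/streams",
    "CLP_STAGED_STREAM_OUTPUT_DIR_HOST": "./var/data/staged-streams",
}
```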
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd56d04 and 5298540.

📒 Files selected for processing (2)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
  • tools/deployment/package/docker-compose.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose.base.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (1)
tools/deployment/package/docker-compose.base.yaml (1)

24-24: Upgrade default MySQL image tag (security)
Bump from mysql:8.0.23 to mysql:8.0.43.

-    image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
+    image: "${CLP_DB_IMAGE:-mysql:8.0.43}"

Verify compatibility in CI and update documentation.

Comment on lines +172 to +173
- "/:/mnt/logs:ro"
depends_on:

🧹 Nitpick | 🔵 Trivial

Scope the “/” host mount.

Mounting root “/” is overly broad; restrict to needed paths or gate via a profile.

-      - "/:/mnt/logs:ro"
+# Consider: - "${CLP_MNT_LOGS_HOST:-/var/log}:/mnt/logs:ro"
+# Or guard via profiles: ["host-root-mount"]

Also applies to: 209-210

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 172-173 (and
also at 209-210) the service mounts the host root ("/:/mnt/logs:ro"), which is
too broad; replace that bind with only the specific host directory paths
required for logs (e.g., /var/log or the app-specific log path) or move the root
mount behind a clearly named compose profile so it’s only enabled intentionally;
update the volume entry to reference the scoped host path(s) and adjust any
service or documentation to reflect the new path(s).

Comment on lines +49 to +57
healthcheck:
<<: *healthcheck_defaults
# FIXME: need to suppress warnings in the scheduler for reading 0 out of 8 expected bytes
test: [
"CMD",
"bash",
"-c",
"< /dev/tcp/query_scheduler/7000"
]

🧹 Nitpick | 🔵 Trivial

Simplify the healthcheck and avoid a DNS lookup of the service's own hostname.

-      test: [
-        "CMD",
-        "bash",
-        "-c",
-        "< /dev/tcp/query_scheduler/7000"
-      ]
+      test: ["CMD-SHELL", "< /dev/tcp/127.0.0.1/7000"]

(Optional: expose a lightweight /health endpoint to remove socket warnings.)

Committable suggestion skipped: line range outside the PR's diff.
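The bash `/dev/tcp` redirection in the suggested healthcheck is just a plain TCP connect; for reference, the equivalent probe in Python (illustrative, not part of the PR):

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True iff a TCP connection to host:port succeeds — the same
    signal bash's `< /dev/tcp/host/port` redirection gives."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```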

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (9)
tools/deployment/package/docker-compose.base.yaml (7)

24-24: Update the default MySQL image to a maintained tag

mysql:8.0.23 is long out of support and carries known CVEs. Please bump the default to a currently supported 8.0.x/8.4.x tag (e.g., 8.0.39) so new deployments aren’t born vulnerable.

-    image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
+    image: "${CLP_DB_IMAGE:-mysql:8.0.39}"

4-7: Remove the global user override from shared defaults

This forces vendor images (MySQL, RabbitMQ, Redis, MongoDB) to run as an arbitrary UID/GID, breaking their init scripts and permission setup. Drop the user from the shared defaults so only CLP-owned services opt in.

 x-service-defaults: &service_defaults
   image: "${CLP_PACKAGE_CONTAINER:-clp-package}"
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"

203-204: Use a distinct host path for staged archives

Both mounts expand to CLP_ARCHIVE_OUTPUT_DIR_HOST, collapsing staged and final archives onto the same host directory. Introduce a dedicated env var for staged archives so the pipeline doesn’t overwrite itself.

-      - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"
+      - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/staged-archives}:/var/data/staged-archives"

73-76: Upgrade RabbitMQ to a supported tag

rabbitmq:3.9.8 is EOL with multiple CVEs. Please move to a maintained 3.x release such as 3.13.8.

-    image: "rabbitmq:3.9.8"
+    image: "rabbitmq:3.13.8"

40-49: Keep the DB password out of the healthcheck command line

Passing --password=${CLP_DB_PASS} leaks secrets via docker inspect and process args. Read from the mounted secret file instead.

     healthcheck:
       <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "mysqladmin", "ping",
-        "--silent",
-        "-h", "127.0.0.1",
-        "-u", "${CLP_DB_USER}",
-        "--password=${CLP_DB_PASS}"
-      ]
+      test: >
+        bash -lc 'mysqladmin ping --silent -h 127.0.0.1 -u "${CLP_DB_USER}" -p"$(cat /run/secrets/CLP_DB_PASS_FILE)"'

124-124: Fix the MongoDB port mapping default

The fallback host port is 6379 (Redis), causing a collision when env vars are unset. Default this mapping to Mongo’s 27017.

-      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-6379}:27017"
+      - "${CLP_RESULTS_CACHE_HOST:-127.0.0.1}:${CLP_RESULTS_CACHE_PORT:-27017}:27017"

155-186: Add explicit users for CLP-owned services after dropping the global default

Once the shared user override is gone, the CLP app containers (compression-scheduler, compression-worker, webui, garbage-collector) must set user: "${CLP_UID_GID:-1000:1000}" so they continue to run as the host UID/GID, matching file ownership and expectations. Based on learnings.

   compression-scheduler:
     <<: *service_defaults
     hostname: "compression_scheduler"
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   compression-worker:
     <<: *service_defaults
     hostname: "compression_worker"
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   webui:
     <<: *service_defaults
     hostname: "webui"
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   garbage-collector:
     <<: *service_defaults
     hostname: "garbage_collector"
+    user: "${CLP_UID_GID:-1000:1000}"

Also applies to: 187-223, 224-252, 261-283

tools/deployment/package/docker-compose.yaml (2)

76-76: Avoid using ~ in defaults

Compose leaves ~ untouched, so this resolves to a literal ./~/.aws. Use ${HOME} instead.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-${HOME}/.aws}:/.aws:ro"

6-9: Remove the global user override here as well

Same problem as the base file: this anchors every service to the host UID/GID, breaking vendor containers inherited from the base compose. Drop it.

 x-service-defaults: &service_defaults
   image: "${CLP_PACKAGE_CONTAINER:-clp-package}"
   stop_grace_period: "3s"
-  user: "${CLP_UID_GID:-1000:1000}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5298540 and 68586f0.

📒 Files selected for processing (2)
  • tools/deployment/package/docker-compose.base.yaml (1 hunks)
  • tools/deployment/package/docker-compose.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose.base.yaml

Comment on lines +17 to +20
secrets:
CLP_DB_PASS_FILE:
environment: "CLP_DB_PASS"


⚠️ Potential issue | 🟠 Major

Move the Redis password into a secret

CLP_REDIS_PASS is exposed in env/argv for both the healthcheck and startup command. Define a secret, mount it, and read the password from the file to avoid leaking credentials.

 secrets:
   CLP_DB_PASS_FILE:
     environment: "CLP_DB_PASS"
+  CLP_REDIS_PASS_FILE:
+    environment: "CLP_REDIS_PASS"
@@
   redis:
     <<: *service_defaults
     image: "redis:7.2.4"
     hostname: "redis"
-    ports:
+    secrets: ["CLP_REDIS_PASS_FILE"]
+    ports:
@@
-    healthcheck:
-      <<: *healthcheck_defaults
-      test: [
-        "CMD",
-        "redis-cli",
-        "-h", "127.0.0.1",
-        "-p", "6379",
-        "-a", "${CLP_REDIS_PASS}",
-        "PING"
-      ]
-    command: [
-      "redis-server",
-      "/usr/local/etc/redis/redis.conf",
-      "--requirepass", "${CLP_REDIS_PASS}"
-    ]
+    healthcheck:
+      <<: *healthcheck_defaults
+      test: >
+        bash -lc 'redis-cli -h 127.0.0.1 -p 6379 -a "$(cat /run/secrets/CLP_REDIS_PASS_FILE)" PING'
+    command: >
+      bash -lc 'exec redis-server /usr/local/etc/redis/redis.conf --requirepass "$(cat /run/secrets/CLP_REDIS_PASS_FILE)"'

Also applies to: 91-116

Comment on lines +23 to +38
<<: *service_defaults
image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
hostname: "database"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
environment:
MYSQL_DATABASE: "${CLP_DB_NAME}"
MYSQL_PASSWORD_FILE: "/run/secrets/CLP_DB_PASS_FILE"
MYSQL_ROOT_PASSWORD_FILE: "/run/secrets/CLP_DB_PASS_FILE"
MYSQL_USER: "${CLP_DB_USER}"
secrets:
- "CLP_DB_PASS_FILE"
ports:
- "${CLP_DB_HOST:-127.0.0.1}:${CLP_DB_PORT:-3306}:3306"
volumes:
- "${CLP_DB_CONF_FILE_HOST:-./etc/mysql/conf.d/logging.cnf}:/etc/mysql/conf.d/logging.cnf:ro"
- "${CLP_DB_DATA_DIR_HOST:-./var/data/database}:/var/lib/mysql"

⚠️ Potential issue | 🟠 Major

Give stateful stores enough time to shut down

Three seconds isn’t enough for MySQL or MongoDB to flush data safely; containers will be SIGKILLed and risk corruption. Set a ≥30 s stop_grace_period on both services.

   database:
     <<: *service_defaults
+    stop_grace_period: "30s"
@@
   results-cache:
     <<: *service_defaults
+    stop_grace_period: "30s"

Also applies to: 118-138

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 23 to 38 (and
also apply same change to lines 118 to 138), the MySQL service lacks a
sufficient shutdown grace period; increase or add stop_grace_period to at least
30s (e.g., "30s") for the database service so containers get enough time to
flush data and shut down cleanly; make the identical change for the
MongoDB/service definition at lines 118-138 to ensure both stateful stores are
protected from SIGKILL during short shutdowns.

<<: *service_defaults
image: "${CLP_DB_IMAGE:-mysql:8.0.23}"
hostname: "database"
user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"

⚠️ Potential issue | 🔴 Critical

Let vendor images run with their packaged users

Even after removing the global default, these per-service overrides still pin infra containers to the host UID/GID, which breaks startup. Please delete the user lines on database, queue, redis, and results-cache so they run with their baked-in accounts.

     hostname: "database"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
     hostname: "queue"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
     hostname: "redis"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"
@@
     hostname: "results_cache"
-    user: "${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}"

Also applies to: 75-75, 95-95, 122-122

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 26, 75, 95,
and 122, per-service user overrides (user:
"${CLP_SERVICE_CONTAINER_UID_GID:-1000:1000}") are forcing infra containers to
run as the host UID/GID and breaking startup; remove the user: lines for the
database, queue, redis, and results-cache service blocks so those services run
with their vendor-packaged users (delete the entire user: line from each listed
service).

Comment on lines +168 to +172
- "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
- "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}\
:/var/log/compression_scheduler.log"
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "/:/mnt/logs:ro"

⚠️ Potential issue | 🟠 Major

Replace ~ defaults with $HOME

Compose doesn’t expand ~, so these mounts resolve to literal ./~/.aws. Use $HOME (or ${HOME}) to pick up the user’s home directory.

-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-${HOME}/.aws}:/.aws:ro"
@@
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-${HOME}/.aws}:/.aws:ro"
@@
-      - "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
+      - "${CLP_AWS_CONFIG_DIR_HOST:-${HOME}/.aws}:/.aws:ro"

Also applies to: 205-205, 238-239

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 168-172 (and
also update the occurrences at lines 205 and 238-239), replace default paths
that use "~" with "$HOME" or "${HOME}" so Docker Compose expands the user's home
directory correctly; update each mount default like
"${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro" to use
"${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro" (and similarly for other
entries) ensuring the syntax remains ${VAR:-default} and preserving existing
mount targets and options.
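As a sanity check on the behavior described above, a toy resolver (not Compose's real interpolation engine) makes the difference concrete: `${VAR:-default}` substitution happens, tilde expansion never does:

```python
import re


def interpolate(value: str, env: dict) -> str:
    """Toy Compose-style interpolation: resolves ${VAR:-default} and bare
    $VAR references from `env`, but performs no tilde expansion."""
    def _braced(m):
        var, default = m.group(1), m.group(2)
        return env.get(var) or (default if default is not None else "")
    value = re.sub(r"\$\{(\w+)(?::-([^}]*))?\}", _braced, value)
    # Second pass for bare $VAR references that appeared inside a default.
    return re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), ""), value)


env = {"HOME": "/home/clp"}
# '~' in a default survives as a literal path component...
print(interpolate("${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro", env))
# → ~/.aws:/.aws:ro
# ...whereas a $HOME-based default resolves to a real directory.
print(interpolate("${CLP_AWS_CONFIG_DIR_HOST:-$HOME/.aws}:/.aws:ro", env))
# → /home/clp/.aws:/.aws:ro
```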

- "${CLP_COMPRESSION_SCHEDULER_LOGS_FILE_HOST:-./var/log/compression_scheduler.log}\
:/var/log/compression_scheduler.log"
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "/:/mnt/logs:ro"

⚠️ Potential issue | 🟠 Major

Stop mounting the entire host filesystem

- "/:/mnt/logs:ro" grants every container root-level read access, which is an unnecessary and risky privilege escalation. Scope the bind to the actual log directory you need (or gate it behind a profile).

Also applies to: 210-210

Comment on lines +19 to +92
services:
query-scheduler:
<<: *service_defaults
hostname: "query_scheduler"
environment:
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_USER: "${CLP_DB_USER}"
CLP_LOGGING_LEVEL: "${CLP_QUERY_SCHEDULER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_QUERY_BACKEND_DB:-0}
volumes:
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:\
/var/log/query_scheduler.log"
depends_on:
db-table-creator:
condition: "service_completed_successfully"
queue:
condition: "service_healthy"
redis:
condition: "service_healthy"
command: [
"python3",
"-u",
"-m", "job_orchestration.scheduler.query.query_scheduler",
"--config", "/etc/clp-config.yml"
]
healthcheck:
<<: *healthcheck_defaults
# FIXME: need to suppress warnings in the scheduler for reading 0 out of 8 expected bytes
test: [
"CMD",
"bash",
"-c",
"< /dev/tcp/query_scheduler/7000"
]

query-worker:
<<: *service_defaults
hostname: "query_worker"
environment:
AWS_ACCESS_KEY_ID: "${CLP_AWS_ACCESS_KEY_ID}"
AWS_SECRET_ACCESS_KEY: "${CLP_AWS_SECRET_ACCESS_KEY}"
BROKER_URL: "amqp://${CLP_QUEUE_USER}:${CLP_QUEUE_PASS}@queue:5672"
CLP_CONFIG_PATH: "/etc/clp-config.yml"
CLP_HOME: "/opt/clp"
CLP_LOGGING_LEVEL: "${CLP_QUERY_WORKER_LOGGING_LEVEL:-INFO}"
CLP_LOGS_DIR: "/var/log/query_worker"
CLP_WORKER_LOG_PATH: "/var/log/query_worker/worker.log"
PYTHONPATH: "/opt/clp/lib/python3/site-packages"
RESULT_BACKEND: >-
redis://default:${CLP_REDIS_PASS}@redis:6379/${CLP_REDIS_QUERY_BACKEND_DB:-0}
volumes:
- "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-./var/data/archives}:/var/data/archives"
- "${CLP_AWS_CONFIG_DIR_HOST:-~/.aws}:/.aws:ro"
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "${CLP_QUERY_WORKER_LOGS_DIR_HOST:-./var/log/query_worker}:/var/log/query_worker"
- "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
- "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"
command: [
"python3",
"-u",
"/opt/clp/lib/python3/site-packages/bin/celery",
"-A", "job_orchestration.executor.query",
"worker",
"--concurrency", "${CLP_QUERY_WORKER_CONCURRENCY:-1}",
"--loglevel", "${CLP_QUERY_WORKER_LOGGING_LEVEL:-INFO}",
"-f", "/var/log/query_worker/worker.log",
"-Q", "query",
"-n", "query-worker"
]

⚠️ Potential issue | 🟠 Major

Reintroduce explicit users for CLP services after removing the global default

Once the shared user is gone, the query-scheduler and query-worker need user: "${CLP_UID_GID:-1000:1000}" to keep matching host file ownership; otherwise they fall back to root. Please add it to query-scheduler, query-worker, and reducer. Based on learnings.

   query-scheduler:
     <<: *service_defaults
     hostname: "query_scheduler"
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   query-worker:
     <<: *service_defaults
     hostname: "query_worker"
+    user: "${CLP_UID_GID:-1000:1000}"
@@
   reducer:
     <<: *service_defaults
     hostname: "reducer"
+    user: "${CLP_UID_GID:-1000:1000}"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.yaml around lines 19 to 92, after
removal of the global default user the query-scheduler, query-worker (and the
reducer service elsewhere) fall back to root; add a per-service user entry to
preserve host file ownership by adding a line user: "${CLP_UID_GID:-1000:1000}"
under each service definition (same indentation level as hostname) for
query-scheduler, query-worker, and reducer so containers run with the expected
UID:GID.

Comment on lines +33 to +35
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:\
/var/log/query_scheduler.log"
Contributor

⚠️ Potential issue | 🟡 Minor

Clean up volume entries (single line, no backslash continuations)

Backslash continuations inside quoted scalars confuse Compose tooling; keep each bind on one line.

-      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:\
-/var/log/query_scheduler.log"
+      - "${CLP_QUERY_SCHEDULER_LOGS_FILE_HOST:-./var/log/query_scheduler.log}:/var/log/query_scheduler.log"
@@
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
-      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"
+      - "${CLP_STAGED_STREAM_OUTPUT_DIR_HOST:-./var/data/staged-streams}:/var/data/staged-streams"
+      - "${CLP_STREAM_OUTPUT_DIR_HOST:-./var/data/streams}:/var/data/streams"
@@
-      - "${CLP_REDUCER_LOGS_DIR_HOST:-./var/log/reducer}:/var/log/reducer"
+      - "${CLP_REDUCER_LOGS_DIR_HOST:-./var/log/reducer}:/var/log/reducer"

Also applies to: 75-80, 103-104

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.yaml around lines 33-35 (and also
apply the same fix to lines 75-80 and 103-104), the volume bindings use
backslash line continuations inside quoted scalars which breaks Compose parsing;
replace each continued scalar with a single-line volume entry so each bind is
one complete string on its own line, remove the backslashes and any stray
whitespace/newline characters, and ensure each volume mapping remains properly
quoted and follows YAML list item syntax.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/src/dev-docs/design-docker-compose.md (2)

80-138: Align Mermaid service IDs with docker-compose keys

The Mermaid diagram still uses underscored identifiers (results_cache, compression_scheduler, etc.), but the compose file defines these services with hyphenated keys (results-cache, compression-scheduler, db, …). Readers cross-referencing the manifest will be misled unless the diagram matches the real service names. Please update each node/edge identifier and visible label to use the compose keys verbatim.


148-161: Update services table to the actual compose service names

The table entries still reference database, compression_scheduler, results_cache, etc., while tools/deployment/package/docker-compose.yml now exposes them as db, compression-scheduler, results-cache, and so on. This discrepancy will cause operators to look for non-existent services. Please rename the table entries to the exact compose service keys.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68586f0 and cba044a.

📒 Files selected for processing (1)
  • docs/src/dev-docs/design-docker-compose.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/src/dev-docs/design-docker-compose.md

161-161: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


161-161: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


161-161: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


174-174: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


174-174: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


174-174: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


181-181: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


186-186: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


191-191: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


195-195: Files should end with a single newline character

(MD047, single-trailing-newline)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)

Comment on lines +180 to +193
1. Check service status:
```bash
docker compose ps
```

2. View service logs:
```bash
docker compose logs <service-name>
```

3. Validate configuration:
```bash
docker compose config
```
Contributor

⚠️ Potential issue | 🟡 Minor

Add blank lines before fenced code blocks in lists

Markdownlint MD031 flags these list items because the ```bash fences are not separated from the list text by a blank line. Insert an empty line between each numbered item description and its code block to keep rendering correct.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

181-181: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


186-186: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


191-191: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
In docs/src/dev-docs/design-docker-compose.md around lines 180 to 193, the
numbered list items place fenced code blocks immediately after the list text
which triggers Markdownlint MD031; insert a single blank line between each list
item description line and its following ```bash fenced code block so each code
block is separated by an empty line from the list text, then run markdownlint to
confirm the MD031 warning is resolved.

command: [
"python3",
"-u",
"-m", "clp_py_utils.create-db-tables",
Member Author

The `[]` flow-style array syntax does not enforce one string per line, which (subjectively speaking) improves the readability of argument key-value pairs like these.
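For illustration, the two equivalent YAML sequence styles being discussed; the command shown here is abbreviated, not the full entry from the compose file:

```yaml
# Flow style: related arguments can be grouped on one line.
command: [
  "python3", "-u",
  "-m", "clp_py_utils.create-db-tables",
]

# Block style: the formatting itself forces one string per line.
command:
  - "python3"
  - "-u"
  - "-m"
  - "clp_py_utils.create-db-tables"
```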

Member

@kirkrodrigues kirkrodrigues left a comment

Another partial review.

clp_config.validate_logs_dir()
clp_config.validate_aws_config_dir()

# Create necessary directories
Member

Have you thought about moving this into a method of `CLPConfig`?

Comment on lines +54 to +57
clp_config.data_directory.mkdir(parents=True, exist_ok=True)
clp_config.logs_directory.mkdir(parents=True, exist_ok=True)
clp_config.archive_output.get_directory().mkdir(parents=True, exist_ok=True)
clp_config.stream_output.get_directory().mkdir(parents=True, exist_ok=True)
Member

The log message of this try-except block doesn't make sense if one of these calls fails (a failure to load the config doesn't imply anything about creating directories). Can we move these calls into their own try-except with a more appropriate log message?
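A minimal sketch of the suggested split, using hypothetical names (`load_config`, `create_necessary_dirs`) and placeholder paths rather than the actual start_clp.py code:

```python
import logging
import tempfile
from pathlib import Path
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def load_config() -> SimpleNamespace:
    # Stand-in for the real CLP config loader; paths are placeholders.
    base = Path(tempfile.mkdtemp())
    return SimpleNamespace(data_directory=base / "data", logs_directory=base / "log")


def create_necessary_dirs(dirs) -> None:
    """Create each directory (and its parents) if it doesn't already exist."""
    for d in dirs:
        Path(d).mkdir(parents=True, exist_ok=True)


def start() -> bool:
    try:
        clp_config = load_config()
    except Exception:
        logger.exception("Failed to load CLP config.")
        return False

    # Directory creation gets its own try-except so a failure here is
    # logged as what it actually is, rather than as a config-loading error.
    try:
        create_necessary_dirs([clp_config.data_directory, clp_config.logs_directory])
    except OSError:
        logger.exception("Failed to create necessary directories.")
        return False

    return True
```

With this split, a permissions problem surfaces as "Failed to create necessary directories" instead of being misattributed to config loading.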

Comment on lines +13 to +14
# Type alias for environment variables dictionary.
EnvVarsDict = Dict[str, str]
Member

Can we move this after all imports?

validate_webui_config,
)

LOGS_FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH
Member

How about `LOG_FILE_ACCESS_MODE`?

Comment on lines +65 to +67
Abstract base controller for preparing and deploying CLP components.
Provides common logic for preparing environment variables, directories,
and configuration files for each service.
Member

Comments should be wrapped to 100 chars.

"--stream-collection", "${CLP_RESULTS_CACHE_STREAM_COLLECTION_NAME:-stream-files}",
]

compression-scheduler:
Member

Does this not need a health_check?
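A minimal sketch of what such a check could look like (note that Compose spells the key `healthcheck`); the probe command, file path, and timings below are illustrative assumptions, not taken from this PR:

```yaml
  compression-scheduler:
    # ...existing service definition...
    healthcheck:
      # Illustrative liveness probe: assumes the scheduler periodically
      # touches this file; replace with a real readiness check.
      test: ["CMD", "test", "-f", "/tmp/compression_scheduler_alive"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
```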

"--config", "/etc/clp-config.yml"
]

compression-worker:
Member

health_check?

Comment on lines +192 to +193
AWS_ACCESS_KEY_ID: "${CLP_AWS_ACCESS_KEY_ID}"
AWS_SECRET_ACCESS_KEY: "${CLP_AWS_SECRET_ACCESS_KEY}"
Member

In the previous code, the credentials and the AWS config dir mount were mutually exclusive. That doesn't seem to be the case anymore? Perhaps things still work, but can you explain a bit?

/var/log/compression_worker"
- "${CLP_DATA_DIR_HOST:-./var/data}:/var/data"
- "${CLP_LOGS_DIR_HOST:-./var/log}/.clp-config.yml:/etc/clp-config.yml:ro"
- "/:/mnt/logs:ro"
Member

Shouldn't this be `logs_input.directory` (at least in the case of `logs_input.type == fs`)?

"-A", "job_orchestration.executor.compress",
"worker",
"--concurrency", "${CLP_COMPRESSION_WORKER_CONCURRENCY:-1}",
"--loglevel", "${CLP_COMPRESSION_WORKER_LOGGING_LEVEL:-INFO}",
Member

This might be noisy (IIRC, that's why we set it to WARN before).

2 participants