Open
145 commits
fa2aff7
feat(deployment): Migrate package orchestration to Docker Compose (re…
junhaoliao Aug 20, 2025
32ea452
Add garbage collector service to Docker Compose configuration
junhaoliao Aug 22, 2025
217c63f
fix s3 support and various issues
junhaoliao Aug 24, 2025
c229fe0
lint
junhaoliao Aug 24, 2025
a850f69
add Docker Compose launch
junhaoliao Aug 24, 2025
1763e17
Merge branch 'main' into docker-compose
junhaoliao Aug 24, 2025
b283231
remove unused import
junhaoliao Aug 24, 2025
743268c
add proper stop support; reduce stop_grace_period from 10s to 3s
junhaoliao Aug 24, 2025
ee1b9f2
de-duplicate docker-compose.yml
junhaoliao Aug 24, 2025
ec0cfa5
replace hardcoded image name with environment variable for container …
junhaoliao Aug 24, 2025
ab68c25
remove FIXME
junhaoliao Aug 24, 2025
934a83c
remove FIXME
junhaoliao Aug 24, 2025
5bf23c2
Merge branch 'main' into docker-compose
junhaoliao Aug 24, 2025
83cc9d1
reformat
junhaoliao Aug 24, 2025
82abc07
reformat
junhaoliao Aug 24, 2025
0fb2294
Update garbage collector logs directory mapping
junhaoliao Aug 24, 2025
c8ffb94
Remove unused component argument parsers
junhaoliao Aug 24, 2025
83e902a
Remove unused component argument parsers
junhaoliao Aug 24, 2025
5f2e5cd
Refactor dependency checks to include docker-compose status validation
junhaoliao Aug 24, 2025
edfa9c9
Refactor log directory handling to use constant path definitions
junhaoliao Aug 24, 2025
7b3965e
Add constants for archive and stream directory paths
junhaoliao Aug 24, 2025
cd84be8
remove unused component groups and functions
junhaoliao Aug 24, 2025
aa12bdb
Remove unused CONTROLLER_TARGET_NAME constant from start_clp.py
junhaoliao Aug 24, 2025
5365722
fix staging dirs
junhaoliao Aug 24, 2025
21ef703
fix: update command to check if Docker Compose is running
junhaoliao Aug 24, 2025
d2cdfbc
add AWS env credentials support
junhaoliao Aug 24, 2025
4f56709
Merge branch 'main' into docker-compose
junhaoliao Aug 25, 2025
f0db07f
Update container image name in start_clp.py
junhaoliao Aug 26, 2025
655600d
Merge remote-tracking branch 'origin/main' into docker-compose
junhaoliao Sep 3, 2025
5f24ce7
add support for configurable CLP WebUI rate limiting
junhaoliao Sep 3, 2025
3df20fc
update WebUI server path in start_clp.py and docker-compose configura…
junhaoliao Sep 3, 2025
c6f81ad
copy docker-compose.yml in package task
junhaoliao Sep 3, 2025
db9c20f
use absolute paths in archive and stream storage configurations
junhaoliao Sep 3, 2025
ea03e17
refactor: centralize environment variable management and enhance vali…
junhaoliao Sep 4, 2025
60994ee
fix: use List[str] type hint for command parameter in start_clp.py
junhaoliao Sep 4, 2025
7e25d75
refactor: remove `dump_to_env_vars_dict` methods and centralize envir…
junhaoliao Sep 4, 2025
3e24e4e
lint
junhaoliao Sep 4, 2025
cbb9ce1
refactor: modularize and simplify start_clp.py by introducing DockerC…
junhaoliao Sep 4, 2025
3eb8dfe
refactor: remove obsolete node-specific directory configuration comments
junhaoliao Sep 4, 2025
669fa9c
refactor: remove redundant `conf_dir` parameter and use centralized c…
junhaoliao Sep 4, 2025
acee071
refactor: rename `controllers` module to `controller` and update impo…
junhaoliao Sep 4, 2025
3c45cfa
refactor: make `validate_log_directory` private and update references…
junhaoliao Sep 4, 2025
ed20110
refactor: rename `conf_dir` to `_conf_dir` and update references to r…
junhaoliao Sep 4, 2025
4d1f5aa
refactor: make `get_ip_from_hostname` private and update all references
junhaoliao Sep 4, 2025
3a698bb
refactor: extract `transform_for_container_config` method to simplify…
junhaoliao Sep 4, 2025
cca84f4
refactor: centralize and simplify path definitions and container conf…
junhaoliao Sep 4, 2025
537398a
refactor: streamline container service configuration with `transform_…
junhaoliao Sep 4, 2025
42442b8
fix imports
junhaoliao Sep 4, 2025
a3288ae
reorganize imports
junhaoliao Sep 4, 2025
5b36779
fix: adjust volume path formatting in docker-compose
junhaoliao Sep 4, 2025
93882af
fix: correct typo in comment for scheduler healthcheck in docker-compose
junhaoliao Sep 4, 2025
a4d8e1f
refactor: unify volume path definitions and remove redundant staging …
junhaoliao Sep 4, 2025
63e6d72
remove comment
junhaoliao Sep 4, 2025
0531a7b
feat: implement `stop` method in `DockerComposeController` and update…
junhaoliao Sep 4, 2025
f93aec7
refactor: reorder volume definitions in docker-compose for consistenc…
junhaoliao Sep 4, 2025
6a20628
revert error message in start_clp.py
junhaoliao Sep 4, 2025
0935658
refactor: standardize logger messages from "Initializing" to "Provisi…
junhaoliao Sep 4, 2025
ad6192a
remove comment
junhaoliao Sep 4, 2025
9469db4
docs: update multi-node deployment guide and add Docker Compose desig…
junhaoliao Sep 4, 2025
110e9fa
docs: refine Docker Compose design doc for clarity and consistency
junhaoliao Sep 4, 2025
045dde6
refactor: update CLP container configuration to use execution contain…
junhaoliao Sep 4, 2025
fa87d92
refactor(controller): move directory creation logic from `_provision`…
junhaoliao Sep 4, 2025
b6ac2c6
feat(controller): add function to dump shared container configuration…
junhaoliao Sep 4, 2025
bfd8c7b
lint
junhaoliao Sep 4, 2025
2b7959b
refactor(clp-py-utils): update staging directory handling in S3 stora…
junhaoliao Sep 4, 2025
f9eb88c
docs(design): Enhance Docker Compose design doc with diagrams and det…
junhaoliao Sep 4, 2025
0f5bd45
fix title case
junhaoliao Sep 4, 2025
0656928
Update references to docker-compose.yml to docker-compose.yaml.
junhaoliao Sep 4, 2025
66dcbb2
feat(docker): add project name to docker-compose and enhance running …
junhaoliao Sep 10, 2025
09ef298
fix lint
junhaoliao Sep 10, 2025
d3b6a67
fix(taskfile): update default task to `docker-images:package`
junhaoliao Sep 10, 2025
6148a65
feat: reset default ports for container configs.
junhaoliao Sep 17, 2025
0ab99f0
fix(docker): update MongoDB connection string to use internal port
junhaoliao Sep 17, 2025
263be6f
fix(controller): update webui configuration to use container-specific…
junhaoliao Sep 17, 2025
51d1b55
add ownership management for data and logs directories when running a…
junhaoliao Sep 17, 2025
0066386
fix(controller, docker-compose): update user and group ID handling fo…
junhaoliao Sep 17, 2025
bc9ab99
fix(search): enable direct connection for MongoDB client in search.py
junhaoliao Sep 20, 2025
8008138
revert direct connection option for MongoDB client in search.py
junhaoliao Sep 20, 2025
34b60bd
refactor(controller): rename provision methods to set_up_env_for_* to…
junhaoliao Sep 22, 2025
9f63481
reorder private functions
junhaoliao Sep 22, 2025
2344db9
add documentation
junhaoliao Sep 22, 2025
2ba93d7
change visibility of `set_up_env_for` function from public to private…
junhaoliao Sep 22, 2025
ce94225
Improve type hints.
junhaoliao Sep 22, 2025
5225870
docs(clp-package-utils): add docstring for is_docker_compose_running …
junhaoliao Sep 22, 2025
cecf5b2
docs(clp-package-utils): add docstring for check_docker_dependencies …
junhaoliao Sep 22, 2025
0c0c562
docs(clp-package-utils): add docstring for _validate_log_directory fu…
junhaoliao Sep 22, 2025
762884b
remove `None` return type annotation from _validate_log_directory fun…
junhaoliao Sep 22, 2025
fc515db
Rename transform_for_container_config to transform_for_container; add…
junhaoliao Sep 22, 2025
268c144
add docs for x-service-defaults & x-healthcheck-defaults in docker-co…
junhaoliao Sep 22, 2025
c1d7b23
Merge branch 'main' into docker-compose
junhaoliao Sep 24, 2025
fc7aae3
remove garbage-collector's dependency condition on query-scheduler.
junhaoliao Sep 24, 2025
cda0348
Split base services into a separate docker-compose.base.yaml; Launch …
junhaoliao Sep 24, 2025
e506f6d
feat(controller): Add instance ID support for Docker Compose project …
junhaoliao Sep 24, 2025
93034fe
remove network to use default network with bridge driver
junhaoliao Sep 24, 2025
7ebdbb6
Merge branch 'main' into docker-compose
junhaoliao Sep 25, 2025
ca62009
fix(controller): update clp_home to be private variable.
junhaoliao Sep 25, 2025
82cc48c
Use unique ids to launch dev clp package.
junhaoliao Sep 25, 2025
8f6ba1b
remove trailing space
junhaoliao Sep 25, 2025
2aff301
remove `execution_container` field from clp-config.yml template
junhaoliao Sep 25, 2025
5c91bf3
fix(docker-compose): update `include` array syntax to use [] format.
junhaoliao Sep 25, 2025
049e269
update service name from `db` to `database`.
junhaoliao Sep 25, 2025
fc70c05
fix(docker-compose, controller): correct configuration file variable …
junhaoliao Sep 25, 2025
2cb1807
add _HOST postfix to path env var names for consistency
junhaoliao Sep 25, 2025
08472d1
Merge branch 'main' into docker-compose
junhaoliao Sep 25, 2025
09658e5
Merge branch 'main' into docker-compose
junhaoliao Sep 26, 2025
e657feb
fix: the webui service should depend on db-table-creator rather than …
junhaoliao Sep 26, 2025
c79259a
Merge branch 'main' into docker-compose
junhaoliao Oct 1, 2025
af96fc8
lint
junhaoliao Oct 1, 2025
320dd11
Merge branch 'main' into docker-compose
junhaoliao Oct 2, 2025
08644f4
Merge branch 'main' into docker-compose
junhaoliao Oct 2, 2025
dd56d04
fix: Replace container_name with hostname for services to avoid name …
junhaoliao Oct 2, 2025
5298540
refactor(docker-compose): move image definition to service defaults.
junhaoliao Oct 2, 2025
140187f
refactor(docker-compose): Remove image entry from reducer service.
junhaoliao Oct 2, 2025
68586f0
refactor(docker-compose): add default fallback for image and storage …
junhaoliao Oct 2, 2025
cba044a
improve docs
junhaoliao Oct 3, 2025
8a6790f
Merge branch 'main' into docker-compose
junhaoliao Oct 3, 2025
f41e22a
refactor(docker-compose): use list syntax for healthcheck test command.
junhaoliao Oct 3, 2025
da6c17f
Merge branch 'main' into docker-compose
junhaoliao Oct 6, 2025
02fcbe7
fix(docker-compose): revert workers log level to hardcoded "warning".
junhaoliao Oct 6, 2025
6b0e1f2
fix(docker-compose): Fix environment variable name for results cache …
junhaoliao Oct 6, 2025
ddcf760
refactor(clp-package-utils): Move EnvVarsDict type alias below imports.
junhaoliao Oct 6, 2025
4134b1a
refactor(clp-package-utils): Rename LOGS_FILE_MODE to LOG_FILE_ACCESS…
junhaoliao Oct 6, 2025
f483abd
reflow comments to 100 char per line
junhaoliao Oct 6, 2025
7d4f47d
refactor(clp-package-utils): Rename deploy method to start and adjust…
junhaoliao Oct 6, 2025
064f365
refactor(clp-package-utils): Rename _provision method to _set_up_env …
junhaoliao Oct 6, 2025
7beb199
docs(clp-package-utils): Update docstrings to use “sets up” phrasing.
junhaoliao Oct 6, 2025
3507a1b
docs(clp-package-utils): Standardize return docstrings; Remove return…
junhaoliao Oct 6, 2025
eb2754a
refactor(clp-package-utils): Add spacing before log and data director…
junhaoliao Oct 6, 2025
cae66df
refactor(clp-package-utils): Move logs directory assignment after dat…
junhaoliao Oct 6, 2025
dffc2f2
refactor(clp-package-utils): Rename logs_file -> log_file
junhaoliao Oct 6, 2025
c0ef4ff
refactor(clp-package-utils): Rename LOGS_FILE variables to LOG_FILE i…
junhaoliao Oct 6, 2025
95404d0
refactor(clp-package-utils): Remove redundant flush after writing ins…
junhaoliao Oct 6, 2025
e1db192
Clarify resolving to IPv4 in the docstring - Apply suggestions from c…
junhaoliao Oct 6, 2025
630329b
Remove redundant param description in `_get_ip_from_hostname()` - App…
junhaoliao Oct 6, 2025
d9da763
refactor(clp-package-utils): Rename db logging config variable and en…
junhaoliao Oct 6, 2025
c200502
use long form `:readonly` instead of short form `:ro` in binding moun…
junhaoliao Oct 6, 2025
5751ddf
revert unintentional change
junhaoliao Oct 6, 2025
3e9734e
chore(deployment): Add local logging driver to service defaults.
junhaoliao Oct 6, 2025
ab64f4a
Improve docs - Apply suggestions from code review
junhaoliao Oct 7, 2025
2d9906f
chore(deployment): Adjust healthcheck defaults (start interval, perio…
junhaoliao Oct 7, 2025
3635883
refactor(deployment): Use long form volume definitions and use bind o…
junhaoliao Oct 7, 2025
5e0d0d2
refactor(deployment): Use long form port definitions; fix CLP_RESULTS…
junhaoliao Oct 7, 2025
e771800
refactor(deployment): Reorder port definitions to long form order.
junhaoliao Oct 7, 2025
6e878cf
refactor(clp-package-utils): Replace deprecated Pydantic (V2) copy() …
junhaoliao Oct 7, 2025
f2b0967
feat(deployment): Add support for configurable logs input directory m…
junhaoliao Oct 7, 2025
637 changes: 637 additions & 0 deletions components/clp-package-utils/clp_package_utils/controller.py

Large diffs are not rendered by default.

136 changes: 75 additions & 61 deletions components/clp-package-utils/clp_package_utils/general.py
Member Author

@kirkrodrigues, is there any reason we didn't make the validate_and_load_* functions instance methods?

@@ -1,5 +1,6 @@
 import enum
 import errno
+import json
 import os
 import pathlib
 import re
@@ -15,6 +16,9 @@
     CLP_DEFAULT_CREDENTIALS_FILE_PATH,
     CLP_SHARED_CONFIG_FILENAME,
     CLPConfig,
+    CONTAINER_AWS_CONFIG_DIRECTORY,
+    CONTAINER_CLP_HOME,
+    CONTAINER_INPUT_LOGS_ROOT_DIR,
     DB_COMPONENT_NAME,
     QueryEngine,
     QUEUE_COMPONENT_NAME,
@@ -42,12 +46,6 @@
 EXTRACT_IR_CMD = "i"
 EXTRACT_JSON_CMD = "j"

-# Paths
Member Author

Had to move these constants to clp_config.py to avoid a circular import.

-CONTAINER_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws"
-CONTAINER_CLP_HOME = pathlib.Path("/") / "opt" / "clp"
-CONTAINER_INPUT_LOGS_ROOT_DIR = pathlib.Path("/") / "mnt" / "logs"
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH = pathlib.Path("etc") / "clp-config.yml"
-
 DOCKER_MOUNT_TYPE_STRINGS = ["bind"]


@@ -132,59 +130,58 @@ def generate_container_name(job_type: str) -> str:
     return f"clp-{job_type}-{str(uuid.uuid4())[-4:]}"


-def check_dependencies():
+def is_docker_compose_running(project_name: str) -> bool:
+    """
+    Checks if a Docker Compose project is running.
+
+    :param project_name:
+    :return: True if at least one instance is running, else False.
+    :raises EnvironmentError: If Docker Compose is not installed or fails.
+    """
+    cmd = ["docker", "compose", "ls", "--format", "json", "--filter", f"name={project_name}"]
+    try:
+        output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
+        running_instances = json.loads(output)
+        return len(running_instances) >= 1
+    except subprocess.CalledProcessError:
+        raise EnvironmentError("docker-compose is not installed or not functioning properly.")
+
+
+def check_docker_dependencies(should_compose_run: bool, project_name: str):
+    """
+    Checks if Docker and Docker Compose are installed, and whether Docker Compose is running or not.
+
+    :param should_compose_run:
+    :param project_name: The Docker Compose project name to check.
+    :raises EnvironmentError: If any Docker dependency is not installed or Docker Compose state
+    does not match expectation.
+    """
     try:
         subprocess.run(
             "command -v docker",
             shell=True,
-            stdout=subprocess.PIPE,
+            stdout=subprocess.DEVNULL,
             stderr=subprocess.STDOUT,
             check=True,
         )
     except subprocess.CalledProcessError:
         raise EnvironmentError("docker is not installed or available on the path")
-    try:
-        subprocess.run(
-            ["docker", "ps"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True
-        )
-    except subprocess.CalledProcessError:
-        raise EnvironmentError("docker cannot run without superuser privileges (sudo).")
-
-
-def is_container_running(container_name):
-    # fmt: off
-    cmd = [
-        "docker", "ps",
-        # Only return container IDs
-        "--quiet",
-        "--filter", f"name={container_name}"
-    ]
-    # fmt: on
-    proc = subprocess.run(cmd, stdout=subprocess.PIPE)
-    if proc.stdout.decode("utf-8"):
-        return True
-
-    return False
+
+    is_running = is_docker_compose_running(project_name)
+    if should_compose_run and not is_running:
+        raise EnvironmentError("docker-compose is not running.")
+    if not should_compose_run and is_running:
+        raise EnvironmentError("docker-compose is already running.")
Comment on lines +170 to +174
Contributor

⚠️ Potential issue

Stop command must stay idempotent.

Line 171
Raising here means DockerComposeController.stop() (components/clp-package-utils/clp_package_utils/controller.py, Lines 490-517) now fails whenever the project is already down, so stop_clp.py exits with an error on a second run. The old flow tolerated redundant stops, and we should keep that contract. Let a “not running” project short-circuit instead of hard failing.

-    is_running = is_docker_compose_running(project_name)
-    if should_compose_run and not is_running:
-        raise EnvironmentError("docker-compose is not running.")
-    if not should_compose_run and is_running:
-        raise EnvironmentError("docker-compose is already running.")
+    is_running = is_docker_compose_running(project_name)
+    if should_compose_run:
+        if not is_running:
+            return
+    elif is_running:
+        raise EnvironmentError("docker-compose is already running.")
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 171 to
175, the current code raises an EnvironmentError when Compose is expected to be
running but is not, which breaks the idempotency of
DockerComposeController.stop(). Treat a "not running" project as a no-op
instead: replace the first raise (when should_compose_run and not is_running)
with an early return (or a harmless log and return) so that stop() does not
error on redundant stops. Keep the error raise only for the case where Compose
is expected to be stopped but is already running.
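
The behaviour requested in this comment can be sketched independently of Docker. This is a minimal illustration, not the PR's code: `ComposeStateError` and `should_proceed` are hypothetical names, and the caller is assumed to obtain `is_running` from `is_docker_compose_running()`.

```python
class ComposeStateError(EnvironmentError):
    """Hypothetical error type for a Compose project in an unexpected state."""


def should_proceed(should_compose_run: bool, is_running: bool) -> bool:
    """
    Decides whether the caller should continue with its start/stop action.

    :param should_compose_run: True for commands that need a running project (e.g. stop).
    :param is_running: Whether the Compose project is currently running.
    :return: False if there is nothing to do (a redundant stop), True otherwise.
    :raises ComposeStateError: If a start is requested while the project is already running.
    """
    if should_compose_run:
        # A stop on an already-stopped project is a no-op rather than an error.
        return is_running
    if is_running:
        raise ComposeStateError("docker-compose is already running.")
    return True
```

Returning a flag instead of `None` lets a `stop()` implementation skip the teardown entirely when the project is already down, which an early `return` inside the check alone would not communicate to the caller.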


-def is_container_exited(container_name):
-    # fmt: off
-    cmd = [
-        "docker", "ps",
-        # Only return container IDs
-        "--quiet",
-        "--filter", f"name={container_name}",
-        "--filter", "status=exited"
-    ]
-    # fmt: on
-    proc = subprocess.run(cmd, stdout=subprocess.PIPE)
-    if proc.stdout.decode("utf-8"):
-        return True
-
-    return False
-
+def _validate_log_directory(logs_dir: pathlib.Path, component_name: str):
+    """
+    Validate that a log directory path of a component is valid.

-def validate_log_directory(logs_dir: pathlib.Path, component_name: str) -> None:
+    :param logs_dir:
+    :param component_name:
+    :raises ValueError: If the path is invalid or not a directory.
+    """
     try:
         validate_path_could_be_dir(logs_dir)
     except ValueError as ex:
@@ -309,6 +306,19 @@ def generate_container_config(
     return container_clp_config, docker_mounts


+def generate_docker_compose_container_config(clp_config: CLPConfig) -> CLPConfig:
Member Author

generate_container_config() should eventually be deprecated once we migrate the other scripts (compress / decompress / search / management) to Docker Compose.

+    """
+    Copies the given config and transforms mount paths and hosts for Docker Compose.
+
+    :param clp_config:
+    :return: The container config and the mounts.
+    """
+    container_clp_config = clp_config.model_copy(deep=True)
+    container_clp_config.transform_for_container()
+
+    return container_clp_config
+
+
 def generate_worker_config(clp_config: CLPConfig) -> WorkerConfig:
     worker_config = WorkerConfig()
     worker_config.package = clp_config.package.model_copy(deep=True)
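
The copy-then-transform pattern in `generate_docker_compose_container_config()` can be illustrated without Pydantic. The sketch below is an assumption-laden stand-in: `PackageStub`, `ConfigStub`, and the paths are hypothetical, and `copy.deepcopy` plays the role of `model_copy(deep=True)`.

```python
import copy
import pathlib
from dataclasses import dataclass, field


@dataclass
class PackageStub:
    """Hypothetical nested section of the config."""

    logs_directory: pathlib.Path = pathlib.Path("/var/log/clp")


@dataclass
class ConfigStub:
    """Hypothetical stand-in for CLPConfig; not the real class."""

    package: PackageStub = field(default_factory=PackageStub)

    def transform_for_container(self) -> None:
        # Rewrite host paths to illustrative in-container paths, in place.
        self.package.logs_directory = pathlib.Path("/opt/clp/var/log")


def generate_container_config_stub(config: ConfigStub) -> ConfigStub:
    # Deep-copy first so the in-place transform cannot leak into the host config.
    container_config = copy.deepcopy(config)
    container_config.transform_for_container()
    return container_config
```

A shallow copy would share the nested `package` object, so the transform would silently rewrite the host config as well; the deep copy is what keeps the two independent.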
@@ -431,11 +441,6 @@ def load_config_file(
     validate_path_for_container_mount(clp_config.data_directory)
     validate_path_for_container_mount(clp_config.logs_directory)

-    # Make data and logs directories node-specific
Member Author

BREAKING

-    hostname = socket.gethostname()
-    clp_config.data_directory /= hostname
-    clp_config.logs_directory /= hostname
-
     return clp_config
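
To make the breaking change concrete, here is a small sketch of the path resolution before and after the removed lines. The function names, the directory `var/data`, and the hostname `node-1` are illustrative assumptions, not values from the PR.

```python
import pathlib


def resolve_dir_before(base_dir: pathlib.Path, hostname: str) -> pathlib.Path:
    """Mirrors the removed lines: the hostname was appended to the configured directory."""
    return base_dir / hostname


def resolve_dir_after(base_dir: pathlib.Path) -> pathlib.Path:
    """After this change the configured directory is used as-is."""
    return base_dir


configured = pathlib.Path("var") / "data"
old_path = resolve_dir_before(configured, "node-1")
new_path = resolve_dir_after(configured)
```

Under these assumptions `old_path` is `var/data/node-1` and `new_path` is `var/data`, so data written by an earlier release would presumably not be found at the new location unless it is moved or the config is pointed at the old host-specific directory.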


@@ -488,35 +493,40 @@ def validate_and_load_redis_credentials_file(
     clp_config.redis.load_credentials_from_file(clp_config.credentials_file_path)


-def validate_db_config(clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path):
+def validate_db_config(
+    clp_config: CLPConfig, base_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path
+):
+    if not base_config.exists():
+        raise ValueError(
+            f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
+        )
Comment on lines +496 to +502
Contributor

🧹 Nitpick (assertive)

Consider validating base_config is a file, not just that it exists.

The function checks if base_config.exists() but doesn't verify it's actually a file. A directory with the same name would pass this check but likely cause issues later.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{DB_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 473 to
479, the check only verifies base_config.exists() but not that it's a regular
file; update the validation to ensure base_config.is_file() (or fail if it
exists but is not a file) and raise a ValueError with a clear message if the
path is missing or not a file so directories don't pass this check.
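
Since the same check is suggested for the database, Redis, and results cache validators, one way to apply it is a shared helper. This is only a sketch: `validate_base_config_file` is a hypothetical name that does not exist in `clp_package_utils`.

```python
import pathlib


def validate_base_config_file(base_config: pathlib.Path, component_name: str) -> None:
    """
    Validates that a component's base configuration path exists and is a regular file.

    :param base_config: Path to the component's base configuration.
    :param component_name: Name used in the error message.
    :raises ValueError: If the path is missing or is not a file (e.g. a directory).
    """
    if not base_config.exists():
        raise ValueError(f"{component_name} base configuration at {base_config} is missing.")
    if not base_config.is_file():
        raise ValueError(f"{component_name} base configuration at {base_config} is not a file.")
```

Each validator could then call the helper with its own component name constant instead of repeating the two checks.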

     _validate_data_directory(data_dir, DB_COMPONENT_NAME)
-    validate_log_directory(logs_dir, DB_COMPONENT_NAME)
+    _validate_log_directory(logs_dir, DB_COMPONENT_NAME)

     validate_port(f"{DB_COMPONENT_NAME}.port", clp_config.database.host, clp_config.database.port)


 def validate_queue_config(clp_config: CLPConfig, logs_dir: pathlib.Path):
-    validate_log_directory(logs_dir, QUEUE_COMPONENT_NAME)
+    _validate_log_directory(logs_dir, QUEUE_COMPONENT_NAME)

     validate_port(f"{QUEUE_COMPONENT_NAME}.port", clp_config.queue.host, clp_config.queue.port)


 def validate_redis_config(
-    clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path, base_config: pathlib.Path
+    clp_config: CLPConfig, base_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path
 ):
-    _validate_data_directory(data_dir, REDIS_COMPONENT_NAME)
-    validate_log_directory(logs_dir, REDIS_COMPONENT_NAME)
-
     if not base_config.exists():
         raise ValueError(
             f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
Comment on lines 517 to 521
Contributor

🧹 Nitpick (assertive)

Consider validating base_config is a file for Redis config.

Same issue as the database config validation - should verify it's a file, not just that it exists.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 494 to
498, the check only verifies base_config.exists(); update it to also verify
base_config.is_file() similar to the database config validation. If not
base_config.is_file() raise a ValueError stating the Redis base configuration at
{path} is missing or not a file (or split into two checks: exists and is_file
with clear messages). Ensure the new check runs before using the path so callers
get a clear error when a directory or non-file path is provided.

+    _validate_data_directory(data_dir, REDIS_COMPONENT_NAME)
+    _validate_log_directory(logs_dir, REDIS_COMPONENT_NAME)

     validate_port(f"{REDIS_COMPONENT_NAME}.port", clp_config.redis.host, clp_config.redis.port)


 def validate_reducer_config(clp_config: CLPConfig, logs_dir: pathlib.Path, num_workers: int):
-    validate_log_directory(logs_dir, REDUCER_COMPONENT_NAME)
+    _validate_log_directory(logs_dir, REDUCER_COMPONENT_NAME)

     for i in range(0, num_workers):
         validate_port(
@@ -527,10 +537,14 @@ def validate_reducer_config(clp_config: CLPConfig, logs_dir: pathlib.Path, num_w


 def validate_results_cache_config(
-    clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path
+    clp_config: CLPConfig, base_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path
 ):
+    if not base_config.exists():
+        raise ValueError(
+            f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
+        )
Comment on lines +540 to +545
Contributor

🧹 Nitpick (assertive)

Consider validating base_config is a file for results cache config.

Consistent with the other config validations, should verify it's a file.

Apply this diff to add file type validation:

     if not base_config.exists():
         raise ValueError(
             f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
         )
+    if not base_config.is_file():
+        raise ValueError(
+            f"{RESULTS_CACHE_COMPONENT_NAME} base configuration at {str(base_config)} is not a file."
+        )
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 517 to
522, the current validation only checks base_config.exists(); update it to also
verify base_config.is_file() and raise a ValueError if it's not a regular file.
Specifically, after checking existence, add a conditional that raises a clear
error mentioning that the results cache base configuration path must be a file
(include the path in the message) to match the other config validations.

     _validate_data_directory(data_dir, RESULTS_CACHE_COMPONENT_NAME)
-    validate_log_directory(logs_dir, RESULTS_CACHE_COMPONENT_NAME)
+    _validate_log_directory(logs_dir, RESULTS_CACHE_COMPONENT_NAME)

     validate_port(
         f"{RESULTS_CACHE_COMPONENT_NAME}.port",
@@ -9,13 +9,13 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     StorageEngine,
     StorageType,
 )

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLPConfig,
     DockerMount,
     dump_container_config,
@@ -10,13 +10,13 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     StorageEngine,
 )
 from job_orchestration.scheduler.job_config import InputType

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CONTAINER_INPUT_LOGS_ROOT_DIR,
     dump_container_config,
     generate_container_config,
@@ -10,13 +10,13 @@
     ARCHIVE_MANAGER_ACTION_NAME,
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     StorageEngine,
     StorageType,
 )
 from clp_py_utils.s3_utils import generate_container_auth_options

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     dump_container_config,
     generate_container_config,
     generate_container_name,
@@ -9,14 +9,14 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     CLPConfig,
     StorageEngine,
     StorageType,
 )

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     DockerMount,
     DockerMountType,
     dump_container_config,
@@ -7,15 +7,14 @@
from pathlib import Path
from typing import Any, List, Optional

-from clp_py_utils.clp_config import Database
+from clp_py_utils.clp_config import CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH, Database
from clp_py_utils.clp_metadata_db_utils import (
delete_archives_from_metadata_db,
get_archives_table_name,
)
from clp_py_utils.sql_adapter import SQL_Adapter

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLPConfig,
get_clp_home,
load_config_file,
@@ -10,6 +10,7 @@
import brotli
import msgpack
from clp_py_utils.clp_config import (
+CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLPConfig,
COMPRESSION_JOBS_TABLE_NAME,
)
@@ -29,7 +30,6 @@
)

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CONTAINER_INPUT_LOGS_ROOT_DIR,
get_clp_home,
load_config_file,
@@ -6,7 +6,13 @@
from pathlib import Path
from typing import Dict, List

-from clp_py_utils.clp_config import ArchiveOutput, Database, S3Config, StorageType
+from clp_py_utils.clp_config import (
+ArchiveOutput,
+CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
+Database,
+S3Config,
+StorageType,
+)
from clp_py_utils.clp_metadata_db_utils import (
delete_dataset_from_metadata_db,
get_datasets_table_name,
@@ -15,7 +21,6 @@
from clp_py_utils.sql_adapter import SQL_Adapter

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLPConfig,
get_clp_home,
load_config_file,
@@ -12,6 +12,7 @@
from clp_py_utils.clp_config import (
CLP_DB_PASS_ENV_VAR_NAME,
CLP_DB_USER_ENV_VAR_NAME,
+CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLPConfig,
Database,
)
Comment on lines 12 to 18

💡 Verification agent

🧩 Analysis chain

Import relocation to clp_py_utils.clp_config looks correct; default path construction remains consistent.

No functional change; aligns with config centralization.

Run to confirm there are no lingering imports from the old location:

Also applies to: 281-284


🏁 Script executed:

#!/bin/bash
rg -nP --glob '!**/site-packages/**' $'from\\s+clp_package_utils\\.general\\s+import\\s+[^\\n]*\\bCLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH\\b'

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Search for any lingering imports of relocated constants from the old module
rg -nP --glob '!**/site-packages/**' $'from\\s+clp_package_utils\\.general\\s+import\\s+[^\\n]*(CLP_DB_PASS_ENV_VAR_NAME|CLP_DB_USER_ENV_VAR_NAME|CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH)\\b' -n

# Search for any import of the old general module
rg -nP --glob '!**/site-packages/**' 'from\s+clp_package_utils\.general\s+import' -n

# Search for any direct references to the old module namespace
rg -nP --glob '!**/site-packages/**' 'clp_package_utils\.general\.' -n

Length of output: 1994


Move any remaining imports of the relocated constants from clp_package_utils.general to clp_py_utils.clp_config.
The following files still import from clp_package_utils.general; check each one for constants that now live in clp_py_utils.clp_config:

  • components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:14
  • components/clp-package-utils/clp_package_utils/controller.py:35
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py:18
  • components/clp-package-utils/clp_package_utils/scripts/native/search.py:22
  • components/clp-package-utils/clp_package_utils/scripts/stop_clp.py:7
  • components/clp-package-utils/clp_package_utils/scripts/search.py:17
  • components/clp-package-utils/clp_package_utils/scripts/decompress.py:19
  • components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:28
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py:9
  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:19
  • components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:23
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py:17
  • components/clp-package-utils/clp_package_utils/scripts/native/compress.py:32
  • components/clp-package-utils/clp_package_utils/scripts/compress.py:19

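Where one of these imports names a relocated constant, import it from clp_py_utils.clp_config instead. Helpers that this diff still imports from clp_package_utils.general (for example get_clp_home and load_config_file) can stay as they are.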
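
The first ripgrep pattern above appears to match only when the constant sits on the same line as `from ... import`, so a parenthesized multi-line import (the style used throughout this diff) could slip past it. As a rough sketch rather than anything that ships with this PR, the same check can be done with Python's standard `ast` module, which handles both forms. `find_stale_imports` and `SAMPLE` are hypothetical names introduced here for illustration.

```python
import ast
from typing import Iterable, List, Tuple


def find_stale_imports(
    source: str, old_module: str, moved_names: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (line, name) pairs for moved names still imported from old_module.

    The line number is that of the `from ... import` statement itself.
    """
    moved = set(moved_names)
    stale = []
    for node in ast.walk(ast.parse(source)):
        # Only `from <old_module> import ...` statements are of interest.
        if isinstance(node, ast.ImportFrom) and node.module == old_module:
            for alias in node.names:
                if alias.name in moved:
                    stale.append((node.lineno, alias.name))
    return sorted(stale)


# Hypothetical file contents: the constant is imported from both modules.
SAMPLE = '''\
from clp_py_utils.clp_config import (
    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
    CLPConfig,
)
from clp_package_utils.general import (
    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
    get_clp_home,
    load_config_file,
)
'''

stale = find_stale_imports(
    SAMPLE,
    "clp_package_utils.general",
    {"CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH"},
)
print(stale)  # [(5, 'CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH')]
```

Only the moved constant is reported; `get_clp_home` and `load_config_file` are left alone because they are not in the set of moved names. To scan a repository, the function could be applied to the text of each `.py` file.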

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
around lines 12 to 18, ensure the import source is the new module
clp_py_utils.clp_config (not clp_package_utils.general); replace any occurrences
importing CLP_DB_PASS_ENV_VAR_NAME, CLP_DB_USER_ENV_VAR_NAME,
CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH, CLPConfig, Database from
clp_package_utils.general with equivalent imports from clp_py_utils.clp_config,
and apply the same replacement to the other files listed in the review (update
their import paths to clp_py_utils.clp_config and remove the old module
reference).

@@ -25,7 +26,6 @@
)

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
EXTRACT_FILE_CMD,
EXTRACT_IR_CMD,
EXTRACT_JSON_CMD,
@@ -12,6 +12,7 @@
import psutil
import pymongo
from clp_py_utils.clp_config import (
+CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
Database,
ResultsCache,
)
@@ -20,7 +21,6 @@
from job_orchestration.scheduler.job_config import AggregationConfig, SearchJobConfig

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
get_clp_home,
load_config_file,
)
@@ -8,13 +8,13 @@
from clp_py_utils.clp_config import (
CLP_DB_PASS_ENV_VAR_NAME,
CLP_DB_USER_ENV_VAR_NAME,
+CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLP_DEFAULT_DATASET_NAME,
StorageEngine,
StorageType,
)

from clp_package_utils.general import (
-CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
dump_container_config,
generate_container_config,
generate_container_name,