feat(clp-py-utils): Add database-backed AWS credential management system. #1486

Eden-D-Zhang · 2025-10-23T19:46:25Z

Description

This PR implements the database layer for AWS S3 credential management in the CLP package. Currently, S3 credentials are stored directly in clp-config.yml and job configurations, posing security risks as they can be exposed through the web UI. This change introduces a separate database-backed credential storage system and Pydantic models to represent credential entries.

Key Changes:

aws_credentials table: Stores long-term AWS credentials (access_key_id, secret_access_key pair) with optional role ARN for temporary role assumption.
aws_temporary_credentials table: Caches temporary credentials, tracking source (eg. role ARN) and expiry.
AwsCredential Model: Represents stored long-term credentials
TemporaryCredential Model: Represents cached session tokens
S3CredentialManager class: CRUD operations for database table interface and management of temporary credentials.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

Release Notes

New Features
- Introduced AWS credential management capabilities supporting both long-term and temporary credentials.
- Added credential validation, caching, and automatic expiry detection with configurable buffer times.
- Enhanced database initialization to support secure credential storage and retrieval.

…n methods

…ment

…er validation

…ge-s3-credentials

…nager

…-credentials

coderabbitai · 2025-10-23T19:46:35Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change adds comprehensive AWS credential management to the CLP Python utilities. It introduces new credential models, database table infrastructure for storing both long-term and temporary credentials, initialization logic for setting up credential tables, and a credential manager utility class that handles CRUD operations, session token caching, and expiry validation.

Changes

Cohort / File(s)	Summary
Credential Models `components/clp-py-utils/clp_py_utils/clp_config.py`	Added `AwsCredential` model with fields for ID, name (validated), access key, secret key, optional role ARN, and timestamps. Added `TemporaryCredential` model with fields for cached session tokens, expiry tracking, and source tracking. Both models provide `to_s3_credentials()` conversion. `TemporaryCredential` includes `is_expired()` method with configurable buffer.
Database Infrastructure `components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py`	Added AWS_CREDENTIALS_TABLE_SUFFIX and AWS_TEMPORARY_CREDENTIALS_TABLE_SUFFIX constants. Added internal functions `_create_aws_credentials_table()` and `_create_aws_temporary_credentials_table()`. Added public wrappers `create_aws_credentials_table()`, `create_aws_temporary_credentials_table()`, `get_aws_credentials_table_name()`, and `get_aws_temporary_credentials_table_name()`. Updated TABLE_SUFFIX_MAX_LEN to include new suffixes.
Database Initialization `components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py`	Added imports for AWS table creation functions. Integrated calls to `create_aws_credentials_table()` and `create_aws_temporary_credentials_table()` into the initialization flow after metadata database cursor establishment.
Credential Manager `components/clp-py-utils/clp_py_utils/s3_credential_manager.py`	New class `S3CredentialManager` for managing AWS credentials. Provides long-term credential CRUD: `create_credential()`, `list_credentials()`, `get_credential_by_id()`, `get_credential_by_name()`, `update_credential()`, `delete_credential()`. Provides temporary credential operations: `cache_session_token()`, `get_cached_session_token_by_source()`, `cleanup_expired_session_tokens()`. Includes private validation methods for credential fields.

Sequence Diagram(s)

sequenceDiagram
    actor Admin
    participant Manager as S3CredentialManager
    participant DB as Metadata Database
    participant Cache as Session Token<br/>Cache

    Admin->>Manager: create_credential(name, keys, role)
    Manager->>Manager: _validate_* inputs
    Manager->>DB: Check name uniqueness
    Manager->>DB: INSERT AwsCredential
    Manager->>Admin: Return credential_id

    Admin->>Manager: cache_session_token(key_id, keys, token, source, expires)
    Manager->>Manager: Validate fields
    Manager->>DB: INSERT TemporaryCredential
    Manager->>Cache: Cache stored
    Manager->>Admin: Return token_id

    Admin->>Manager: get_cached_session_token_by_source(source)
    Manager->>DB: Query recent non-expired tokens
    Manager->>Admin: Return TemporaryCredential (or None)

    Admin->>Manager: cleanup_expired_session_tokens()
    Manager->>DB: DELETE expired tokens
    Manager->>Admin: Return deleted count

    Admin->>Manager: update_credential(id, fields...)
    Manager->>DB: Check name uniqueness
    Manager->>DB: UPDATE AwsCredential
    Manager->>Admin: Return success boolean

    Admin->>Manager: delete_credential(id)
    Manager->>DB: DELETE AwsCredential (CASCADE)
    Manager->>Admin: Return success boolean

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

The changes span multiple files with heterogeneous concerns: new data models with validation rules, database schema and table management functions, initialization integration, and a feature-rich credential manager class with stateful CRUD operations, caching logic, and expiry validation. The credential manager alone requires careful review of input validation, SQL operations, transaction handling, and the correctness of filtering/expiry logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat(clp-py-utils): Add database-backed AWS credential management system." is fully aligned with the changeset. The title directly describes the primary purpose of the changes: introducing a database layer for AWS credential management. The changeset includes new Pydantic models (AwsCredential and TemporaryCredential), database table creation utilities, and a new S3CredentialManager class—all of which collectively implement exactly what the title states. The title is specific, concise, and clearly communicates the main change without vague terminology or unnecessary noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Eden-D-Zhang · 2025-10-23T19:47:31Z

@coderabbitai review

coderabbitai · 2025-10-23T19:48:23Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Eden-D-Zhang · 2025-10-23T20:12:59Z

@coderabbitai review

coderabbitai · 2025-10-23T20:13:26Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6abc934 and 99c1dbf.

📒 Files selected for processing (4)

components/clp-py-utils/clp_py_utils/clp_config.py (3 hunks)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (5 hunks)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2 hunks)
components/clp-py-utils/clp_py_utils/s3_credential_manager.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.14.1)

components/clp-py-utils/clp_py_utils/clp_config.py

442-442: datetime.datetime.now() called without a tz argument

(DTZ005)

443-443: datetime.datetime.now() called without a tz argument

(DTZ005)

505-505: datetime.datetime.now() called without a tz argument

(DTZ005)

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

5-5: typing.List is deprecated, use list instead

(UP035)

5-5: typing.Tuple is deprecated, use tuple instead

(UP035)

28-28: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)

63-63: Boolean positional value in function call

(FBT003)

67-67: Possible SQL injection vector through string-based query construction

(S608)

69-69: Avoid specifying long messages outside the exception class

(TRY003)

72-76: Possible SQL injection vector through string-based query construction

(S608)

82-82: Logging statement uses f-string

(G004)

94-94: Boolean positional value in function call

(FBT003)

98-102: Possible SQL injection vector through string-based query construction

(S608)

117-117: Boolean positional value in function call

(FBT003)

121-126: Possible SQL injection vector through string-based query construction

(S608)

154-154: Boolean positional value in function call

(FBT003)

158-163: Possible SQL injection vector through string-based query construction

(S608)

201-201: Avoid specifying long messages outside the exception class

(TRY003)

206-206: Boolean positional value in function call

(FBT003)

210-210: Possible SQL injection vector through string-based query construction

(S608)

219-219: Possible SQL injection vector through string-based query construction

(S608)

223-223: Avoid specifying long messages outside the exception class

(TRY003)

250-254: Possible SQL injection vector through string-based query construction

(S608)

259-259: Logging statement uses f-string

(G004)

275-275: Boolean positional value in function call

(FBT003)

278-278: Possible SQL injection vector through string-based query construction

(S608)

283-283: Logging statement uses f-string

(G004)

285-285: Logging statement uses f-string

(G004)

313-313: Avoid specifying long messages outside the exception class

(TRY003)

316-316: Avoid specifying long messages outside the exception class

(TRY003)

319-319: Avoid specifying long messages outside the exception class

(TRY003)

324-324: Boolean positional value in function call

(FBT003)

328-332: Possible SQL injection vector through string-based query construction

(S608)

346-346: Logging statement uses f-string

(G004)

366-366: Boolean positional value in function call

(FBT003)

370-375: Possible SQL injection vector through string-based query construction

(S608)

410-410: Boolean positional value in function call

(FBT003)

414-417: Possible SQL injection vector through string-based query construction

(S608)

423-423: Logging statement uses f-string

(G004)

439-439: Avoid specifying long messages outside the exception class

(TRY003)

442-442: Avoid specifying long messages outside the exception class

(TRY003)

445-445: Avoid specifying long messages outside the exception class

(TRY003)

449-451: Avoid specifying long messages outside the exception class

(TRY003)

461-461: Avoid specifying long messages outside the exception class

(TRY003)

471-471: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: package-image

🔇 Additional comments (3)

components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1)

66-67: Good call ordering and idempotent setup.

Creating the credentials table before the temporary-credentials table avoids FK issues; committing once is fine. No further changes needed.

components/clp-py-utils/clp_py_utils/s3_credential_manager.py (2)

60-68: Note on S608 warnings (table names in f-strings).

Table identifiers are built from a constant prefix and fixed suffix and are wrapped in backticks; values are still parameterised. This is acceptable. If you ever allow a user‑supplied prefix, validate against a strict pattern (e.g., r'^[a-zA-Z0-9_]+$').

Also applies to: 91-106, 114-129, 151-170, 203-247, 272-287, 321-341, 363-387, 407-427

350-399: Apply expiry buffer and verify UTC storage before implementation.

The suggestion to add a buffer and compute the cutoff in Python is valid, but requires careful verification:

Type consistency risk: TemporaryCredential.expires_at is a naive datetime. The proposed refactor uses datetime.now(timezone.utc), creating a mismatch between timezone-aware query values and the naive type. Ensure the DB stores/returns expires_at in UTC and that type annotations are consistent.

Verify DB schema: The suggestion requires confirmation that the expires_at column is defined in UTC or has proper timezone configuration. Database migration or schema files were not found in the codebase—manually verify your DB setup.

Review cleanup method: The cleanup_expired_session_tokens() method (line 416) uses the same NOW() pattern and should be refactored consistently for uniformity.

Confirm expires_at source: Determine where cache_session_token() receives the expires_at value (likely from AWS STS) to confirm it's already in UTC before applying the buffer.

The refactor is sound in principle but needs these validations to avoid subtle timezone bugs.

coderabbitai · 2025-10-23T20:23:27Z

components/clp-py-utils/clp_py_utils/clp_config.py

@@ -1,5 +1,6 @@
 import os
 import pathlib
+from datetime import datetime


⚠️ Potential issue | 🟠 Major

Fix tz-naive defaults and import; avoid import-time evaluation.

Use Field(default_factory=...) with UTC to prevent tz-naive values and import-time evaluation.

-from datetime import datetime +from datetime import datetime, timezone @@ - created_at: datetime = datetime.now() - updated_at: datetime = datetime.now() + created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc)) + updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

Also applies to: 442-443

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/clp_config.py around line 3 (and also lines 442-443), replace any direct datetime import/usages that create tz-naive values at import time with a timezone-aware default_factory: import datetime and timezone (from datetime import datetime, timezone) and update Field(default=...) or module-level datetime.utcnow() calls to Field(default_factory=lambda: datetime.now(timezone.utc)) so defaults are created at runtime and are UTC-aware; ensure you import Field from pydantic (or the correct config dataclass provider) if not already.

coderabbitai · 2025-10-23T20:23:27Z

components/clp-py-utils/clp_py_utils/clp_config.py

+    def is_expired(self, buffer_minutes: int = 5) -> bool:
+        """
+        Checks if credential is expired or expiring soon.
+
+        :param buffer_minutes: Minutes of buffer before expiration to consider credential expired.
+        :return: True if expired or expiring within buffer_minutes.
+        """
+        from datetime import timedelta
+
+        return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Make is_expired time-safe (UTC, tz-aware).

Comparing tz-naive datetimes risks errors and drift across hosts. Normalise to UTC and handle naive DB values.

- def is_expired(self, buffer_minutes: int = 5) -> bool: + def is_expired(self, buffer_minutes: int = 5) -> bool: """ Checks if credential is expired or expiring soon. @@ - from datetime import timedelta - - return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes) + from datetime import timedelta, timezone + now = datetime.now(timezone.utc) + exp = self.expires_at + if exp.tzinfo is None: + # Assume DB stores UTC; attach UTC tzinfo to compare safely. + exp = exp.replace(tzinfo=timezone.utc) + return now >= exp - timedelta(minutes=buffer_minutes)

If DB values are not UTC, document and convert at read time.

I'll verify this review comment by examining the codebase to understand the current implementation and the timezone handling.

🏁 Script executed:

#!/bin/bash # Find and examine the class definition containing is_expired fd -e py . | xargs rg -n "def is_expired|expires_at" --context=5 components/clp-py-utils/clp_py_utils/clp_config.py

Length of output: 17989

🏁 Script executed:

#!/bin/bash # Look at the full class structure around line 496 fd clp_config.py | xargs cat | head -550 | tail -100

Length of output: 3120

🏁 Script executed:

#!/bin/bash # Search for expires_at field definition and initialization fd -e py . | xargs rg -n "expires_at\s*=" --context=3 | head -40

Length of output: 740

🏁 Script executed:

#!/bin/bash # Check timezone imports and UTC usage patterns in the file fd clp_config.py | xargs rg -n "import.*timezone|from.*timezone|timezone\.utc|datetime\.now"

Length of output: 180

🏁 Script executed:

#!/bin/bash # Search for other datetime comparisons that might have similar issues fd -e py . | xargs rg -n "datetime\.now" --context=2 | head -30

Length of output: 3344

🏁 Script executed:

#!/bin/bash # Check if there are DB interactions or datetime conversions fd -e py . | xargs rg -n "tzinfo|utc|UTC" components/clp-py-utils/clp_py_utils/ | head -20

Length of output: 1970

🏁 Script executed:

#!/bin/bash # Look for any tests or usages of is_expired method fd -e py . | xargs rg -n "is_expired" | head -20

Length of output: 751

🏁 Script executed:

#!/bin/bash # Examine the class structure and field definitions fd clp_config.py | xargs ast-grep --pattern $'class $_ { $$$ expires_at $$$ }'

Length of output: 0

🏁 Script executed:

#!/bin/bash # Get the broader context around the is_expired method fd clp_config.py | xargs python3 << 'EOF' with open('components/clp-py-utils/clp_py_utils/clp_config.py', 'r') as f: lines = f.readlines() # Look for is_expired and surrounding context for i, line in enumerate(lines): if 'is_expired' in line or 'expires_at' in line: start = max(0, i - 10) end = min(len(lines), i + 15) print(f"Lines {start+1}-{end}:") for j in range(start, end): print(f"{j+1:4d}: {lines[j]}", end='') print("\n" + "="*80 + "\n") EOF

Length of output: 87

🏁 Script executed:

#!/bin/bash # Check the imports at the top of the file fd clp_config.py | xargs head -30

Length of output: 667

🏁 Script executed:

#!/bin/bash # Look at how datetime is imported and used fd clp_config.py | xargs rg -n "^from datetime|^import datetime" -A 5

Length of output: 172

🏁 Script executed:

#!/bin/bash # Check type hints for expires_at fd clp_config.py | xargs rg -n "expires_at.*:" --context=2

Length of output: 146

🏁 Script executed:

#!/bin/bash # Search for DB schema or ORM model definitions fd -e py . | xargs rg -n "expires_at" --context=4 components/

Length of output: 22398

🏁 Script executed:

#!/bin/bash # Check if datetime.now() returns naive or if tzinfo handling exists fd clp_config.py | xargs python3 << 'EOF' import re with open('components/clp-py-utils/clp_py_utils/clp_config.py', 'r') as f: content = f.read() # Find the class and is_expired method if 'def is_expired' in content: idx = content.find('def is_expired') print("is_expired method found at character index:", idx) print("\nContext (500 chars before and after):") print(content[max(0, idx-500):idx+500]) print("\n" + "="*80) # Check for timezone usage if 'timezone' in content: print("timezone import/usage detected") if 'utc' in content.lower(): print("UTC references detected") EOF

Length of output: 89

🏁 Script executed:

#!/bin/bash # Search for usage of is_expired method on TemporaryCredential fd -e py . | xargs rg -n "\.is_expired\(" --context=3 | grep -A3 -B3 "TemporaryCredential\|temporary_credential\|session_token"

Length of output: 0

🏁 Script executed:

#!/bin/bash # Look for where TemporaryCredential is instantiated and expires_at is set fd -e py . | xargs rg -n "TemporaryCredential\(" --context=2

Length of output: 874

🏁 Script executed:

#!/bin/bash # Check AWS credential manager to see how it populates expires_at fd s3_credential_manager.py | xargs rg -n "cache_session_token\|expires_at" --context=3 | head -50

Length of output: 0

🏁 Script executed:

#!/bin/bash # Look for datetime defaults or initialization in TemporaryCredential fd clp_config.py | xargs rg -n "created_at\|now\(" --context=2

Length of output: 0

🏁 Script executed:

#!/bin/bash # Check if MySQL connection has timezone configuration fd -e py . | xargs rg -n "autocommit=True\|timezone\|convert_tz" components/clp-py-utils/clp_py_utils/

Length of output: 0

🏁 Script executed:

#!/bin/bash # Check imports in s3_credential_manager to see how datetimes are handled fd s3_credential_manager.py | xargs head -50

Length of output: 1437

🏁 Script executed:

#!/bin/bash # Look at actual usage: where is_expired is called fd -e py . | xargs rg -B5 -A5 "\.is_expired\("

Length of output: 3668

🏁 Script executed:

#!/bin/bash # Comprehensive search: is_expired called anywhere on temporary_credential or TemporaryCredential fd -e py . | xargs rg -n "temporary_credential|temp_cred" --context=2 | head -30

Length of output: 2786

🏁 Script executed:

#!/bin/bash # Check if is_expired might be called indirectly or through method chaining fd -e py . | xargs rg -n "get_cached_session_token|TemporaryCredential" --context=3 | grep -A10 "\.is_expired\|check.*expir"

Length of output: 0

Implement timezone-safe expiration check using UTC.

The current is_expired() method compares naive datetimes, which risks errors across hosts in different timezones. While the method is not currently called in the codebase, it should follow the established pattern used elsewhere (e.g., session_manager.py lines 72, 150) to ensure correctness if invoked. Apply the suggested fix to handle naive database values and use datetime.now(timezone.utc) for safe comparison.

def is_expired(self, buffer_minutes: int = 5) -> bool: """ Checks if credential is expired or expiring soon. :param buffer_minutes: Minutes of buffer before expiration to consider credential expired. :return: True if expired or expiring within buffer_minutes. """ - from datetime import timedelta - - return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes) + from datetime import timedelta, timezone + now = datetime.now(timezone.utc) + exp = self.expires_at + if exp.tzinfo is None: + # Assume DB stores UTC; attach UTC tzinfo to compare safely. + exp = exp.replace(tzinfo=timezone.utc) + return now >= exp - timedelta(minutes=buffer_minutes)

🧰 Tools

🪛 Ruff (0.14.1)

505-505: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/clp_config.py around lines 496 to 505, the is_expired method compares naive datetimes; update it to be timezone-safe by using datetime.now(timezone.utc) and ensuring self.expires_at is timezone-aware (if naive, treat it as UTC via replace(tzinfo=timezone.utc) or use an aware conversion pattern like in session_manager.py lines 72/150) before subtracting timedelta(buffer_minutes); perform the comparison with both values in UTC to avoid cross-host timezone issues.

coderabbitai · 2025-10-23T20:23:28Z

components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py

+def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:
+    db_cursor.execute(
+        f"""
+        CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (
+            `id` INT NOT NULL AUTO_INCREMENT,
+            `name` VARCHAR(255) NOT NULL UNIQUE,
+            `access_key_id` VARCHAR(255) NOT NULL,
+            `secret_access_key` VARCHAR(255) NOT NULL,
+            `role_arn` VARCHAR(2048),
+            `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
+            `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+            PRIMARY KEY (`id`),
+            UNIQUE KEY `name_unique` (`name`)
+        ) ROW_FORMAT=DYNAMIC
+        """
+    )


🛠️ Refactor suggestion | 🟠 Major

Remove redundant unique definition on name.

You have both UNIQUE at column level and an explicit UNIQUE KEY. Keep one to avoid duplicate unique indexes and overhead.

CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` ( `id` INT NOT NULL AUTO_INCREMENT, - `name` VARCHAR(255) NOT NULL UNIQUE, + `name` VARCHAR(255) NOT NULL UNIQUE, `access_key_id` VARCHAR(255) NOT NULL, `secret_access_key` VARCHAR(255) NOT NULL, `role_arn` VARCHAR(2048), `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), - UNIQUE KEY `name_unique` (`name`) ) ROW_FORMAT=DYNAMIC

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:

db_cursor.execute(

f"""

CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (

`id` INT NOT NULL AUTO_INCREMENT,

`name` VARCHAR(255) NOT NULL UNIQUE,

`access_key_id` VARCHAR(255) NOT NULL,

`secret_access_key` VARCHAR(255) NOT NULL,

`role_arn` VARCHAR(2048),

`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,

`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

PRIMARY KEY (`id`),

UNIQUE KEY `name_unique` (`name`)

) ROW_FORMAT=DYNAMIC

"""

)

def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:

db_cursor.execute(

f"""

CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (

`id` INT NOT NULL AUTO_INCREMENT,

`name` VARCHAR(255) NOT NULL UNIQUE,

`access_key_id` VARCHAR(255) NOT NULL,

`secret_access_key` VARCHAR(255) NOT NULL,

`role_arn` VARCHAR(2048),

`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,

`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

PRIMARY KEY (`id`)

) ROW_FORMAT=DYNAMIC

"""

)

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py around lines 117 to 132, the CREATE TABLE defines `name` as VARCHAR(255) NOT NULL UNIQUE and also adds a separate UNIQUE KEY `name_unique`, creating redundant unique indexes; remove one of the duplicate constraints — either drop the column-level UNIQUE from `name` or delete the explicit UNIQUE KEY `name_unique` line — so only a single unique constraint remains, then run migrations/DDL to apply the updated table definition.

components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py

coderabbitai · 2025-10-23T20:23:28Z

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

+import os
+import re


🧹 Nitpick | 🔵 Trivial

Tidy imports and typing; modernise signature.

Remove unused os.

Prefer built-in generics.

Annotate __init__ return.

-import os import re from contextlib import closing -from datetime import datetime -from typing import List, Tuple +from datetime import datetime @@ -class S3CredentialManager: +class S3CredentialManager: @@ - def __init__(self, database_config: Database): + def __init__(self, database_config: Database) -> None: @@ - def list_credentials(self) -> List[Tuple[int, str, datetime]]: + def list_credentials(self) -> list[tuple[int, str, datetime]]:

Also applies to: 5-5, 85-91

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines 1-2 (and also applicable at lines 5 and 85-91), remove the unused "os" import, replace any typing.List/typing.Dict/typing.Optional generics with built-in generics (list, dict, Optional from typing if needed) and update the __init__ signature to include an explicit return annotation "-> None"; ensure imports reflect only what's used (e.g., import re and from typing import Optional if required) and adjust any function/type hints on lines 85-91 to use the built-in generic types.

coderabbitai · 2025-10-23T20:23:28Z

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

+        self.sql_adapter = SQL_Adapter(database_config)
+        conn_params = database_config.get_clp_connection_params_and_type()
+        self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)
+


🛠️ Refactor suggestion | 🟠 Major

Fail fast if DB creds are missing.

Call ensure_credentials_loaded() to avoid late failures when creating connections.

self.sql_adapter = SQL_Adapter(database_config) - conn_params = database_config.get_clp_connection_params_and_type() + database_config.ensure_credentials_loaded() + conn_params = database_config.get_clp_connection_params_and_type()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

self.sql_adapter = SQL_Adapter(database_config)

conn_params = database_config.get_clp_connection_params_and_type()

self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)

self.sql_adapter = SQL_Adapter(database_config)

database_config.ensure_credentials_loaded()

conn_params = database_config.get_clp_connection_params_and_type()

self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines 34 to 37, call ensure_credentials_loaded() on the database_config before attempting to create the SQL_Adapter or read connection params so missing DB credentials raise immediately; specifically, invoke database_config.ensure_credentials_loaded() first, then construct the SQL_Adapter and fetch conn_params/table_prefix as currently done.

coderabbitai · 2025-10-23T20:23:28Z

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

+                f"""
+                INSERT INTO `{table_name}`
+                (name, access_key_id, secret_access_key, role_arn)
+                VALUES (%s, %s, %s, %s)
+                """,
+                (name, access_key_id, secret_access_key, role_arn),
+            )
+            db_conn.commit()


⚠️ Potential issue | 🟠 Major

Plaintext secrets in DB: adopt encryption or external secret storage.

Long‑term keys and session tokens are stored unencrypted. This is a high‑impact risk (exfiltration, backups, replicas). Prefer one of:

Store only references (ARNs) in DB; keep material in AWS Secrets Manager/SSM Parameter Store.

Envelope encryption with KMS; encrypt/decrypt at the edge; store ciphertext + key metadata.

At minimum, enable MySQL TDE or use AES_ENCRYPT with server‑managed key rotation.

Please document threat model and rotation/TTL. I can help sketch an approach.

Also applies to: 328-340

🧰 Tools

🪛 Ruff (0.14.1)

72-76: Possible SQL injection vector through string-based query construction

(S608)

coderabbitai · 2025-10-23T20:23:28Z

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

+            logger.info(f"Created credential '{name}' with ID {credential_id}")
+            return credential_id


🧹 Nitpick | 🔵 Trivial

Use parameterised logging; avoid f-strings.

Leverage lazy formatting for performance and to prevent accidental string interpolation with secrets.

- logger.info(f"Created credential '{name}' with ID {credential_id}") + logger.info("Created credential '%s' with ID %s", name, credential_id) @@ - logger.info(f"Updated credential ID {credential_id}") + logger.info("Updated credential ID %s", credential_id) @@ - if deleted: - logger.info(f"Deleted credential ID {credential_id}") - else: - logger.warning(f"Credential ID {credential_id} not found for deletion") + if deleted: + logger.info("Deleted credential ID %s", credential_id) + else: + logger.warning("Credential ID %s not found for deletion", credential_id) @@ - logger.info( - f"Cached session token for source '{source}' with ID {credential_id}, expires at {expires_at}" - ) + logger.info( + "Cached session token for source '%s' with ID %s, expires at %s", + source, credential_id, expires_at, + ) @@ - logger.info(f"Cleaned up {deleted_count} expired session token(s)") + logger.info("Cleaned up %s expired session token(s)", deleted_count) - else: - logger.debug("No expired session tokens to clean up") + else: + logger.debug("No expired session tokens to clean up")

Also applies to: 259-260, 283-286, 346-347, 423-425

🧰 Tools

🪛 Ruff (0.14.1)

82-82: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents

In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines 82-83 (and similarly at 259-260, 283-286, 346-347, 423-425), replace f-string logging with parameterised logging calls to enable lazy formatting and avoid accidental interpolation of secrets; for each occurrence change calls like logger.info(f"...{var}...") to logger.info("...%s...", var) (or use comma-separated arguments matching placeholders) so values are passed as arguments instead of being formatted into the message string.

…-credentials

…ang/clp into package-s3-credentials

Eden-D-Zhang added 11 commits October 20, 2025 14:54

Add AWS credentials database tables for static and temporary credentials

f14178b

Initialize AWS credential tables in database setup script

08682e0

Add Pydantic models for AWS credentials with validation and conversio…

5e0793e

…n methods

Add S3CredentialManager class for database credential CRUD operations

a395256

Implement session token cache methods for temporary credential manage…

535203c

…ment

Align credential name validation pattern in Pydantic model with manag…

9cac4c3

…er validation

Merge branch 'main' of https://github.com/Eden-D-Zhang/clp into packa…

3906383

…ge-s3-credentials

Use modern type syntax and improve code organization in credential ma…

febfb75

…nager

Merge branch 'main' of https://github.com/y-scope/clp into package-s3…

5344d16

…-credentials

Use clp logger

85ce7f9

Remove unnecessary functions

517c995

Eden-D-Zhang requested a review from a team as a code owner October 23, 2025 19:46

Eden-D-Zhang added 3 commits October 23, 2025 15:49

Lint

b1f70d1

Remove unused import

e7f8853

Remove unused import again

99c1dbf

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

Eden-D-Zhang added 4 commits October 23, 2025 16:52

Coderabbit review

d3808b2

Lint

58f05e0

Merge branch 'main' of https://github.com/y-scope/clp into package-s3…

417d73e

…-credentials

Fix comment

2576b2e

Eden-D-Zhang requested review from gibber9809 and wraymo as code owners October 27, 2025 17:24

Eden-D-Zhang added 3 commits October 27, 2025 14:24

Clean up docstrings.

2391c19

Clean up docstrings.

4fc9a89

Merge branch 'package-s3-credentials' of https://github.com/Eden-D-Zh…

2b3258a

…ang/clp into package-s3-credentials

Eden-D-Zhang added 2 commits October 27, 2025 18:03

Docstrings again

36c9a65

Fix index length

51c92fc

		logger.info(f"Created credential '{name}' with ID {credential_id}")
		return credential_id

feat(clp-py-utils): Add database-backed AWS credential management system. #1486

Are you sure you want to change the base?

feat(clp-py-utils): Add database-backed AWS credential management system. #1486

Conversation

Eden-D-Zhang commented Oct 23, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Key Changes:

Checklist

Validation performed

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

Eden-D-Zhang commented Oct 23, 2025

Uh oh!

coderabbitai bot commented Oct 23, 2025

Uh oh!

Eden-D-Zhang commented Oct 23, 2025

Uh oh!

coderabbitai bot commented Oct 23, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Eden-D-Zhang commented Oct 23, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 23, 2025 •

edited

Loading