Skip to content

Conversation

@Eden-D-Zhang
Copy link
Contributor

@Eden-D-Zhang Eden-D-Zhang commented Oct 23, 2025

Description

This PR implements the database layer for AWS S3 credential management in the CLP package. Currently, S3 credentials are stored directly in clp-config.yml and job configurations, posing security risks as they can be exposed through the web UI. This change introduces a separate database-backed credential storage system and Pydantic models to represent credential entries.

Key Changes:

  • aws_credentials table: Stores long-term AWS credentials (access_key_id, secret_access_key pair) with optional role ARN for temporary role assumption.
  • aws_temporary_credentials table: Caches temporary credentials, tracking source (eg. role ARN) and expiry.
  • AwsCredential Model: Represents stored long-term credentials
  • TemporaryCredential Model: Represents cached session tokens
  • S3CredentialManager class: CRUD operations for database table interface and management of temporary credentials.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced AWS credential management capabilities supporting both long-term and temporary credentials.
    • Added credential validation, caching, and automatic expiry detection with configurable buffer times.
    • Enhanced database initialization to support secure credential storage and retrieval.

@Eden-D-Zhang Eden-D-Zhang requested a review from a team as a code owner October 23, 2025 19:46
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change adds comprehensive AWS credential management to the CLP Python utilities. It introduces new credential models, database table infrastructure for storing both long-term and temporary credentials, initialization logic for setting up credential tables, and a credential manager utility class that handles CRUD operations, session token caching, and expiry validation.

Changes

Cohort / File(s) Summary
Credential Models
components/clp-py-utils/clp_py_utils/clp_config.py
Added AwsCredential model with fields for ID, name (validated), access key, secret key, optional role ARN, and timestamps. Added TemporaryCredential model with fields for cached session tokens, expiry tracking, and source tracking. Both models provide to_s3_credentials() conversion. TemporaryCredential includes is_expired() method with configurable buffer.
Database Infrastructure
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py
Added AWS_CREDENTIALS_TABLE_SUFFIX and AWS_TEMPORARY_CREDENTIALS_TABLE_SUFFIX constants. Added internal functions _create_aws_credentials_table() and _create_aws_temporary_credentials_table(). Added public wrappers create_aws_credentials_table(), create_aws_temporary_credentials_table(), get_aws_credentials_table_name(), and get_aws_temporary_credentials_table_name(). Updated TABLE_SUFFIX_MAX_LEN to include new suffixes.
Database Initialization
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py
Added imports for AWS table creation functions. Integrated calls to create_aws_credentials_table() and create_aws_temporary_credentials_table() into the initialization flow after metadata database cursor establishment.
Credential Manager
components/clp-py-utils/clp_py_utils/s3_credential_manager.py
New class S3CredentialManager for managing AWS credentials. Provides long-term credential CRUD: create_credential(), list_credentials(), get_credential_by_id(), get_credential_by_name(), update_credential(), delete_credential(). Provides temporary credential operations: cache_session_token(), get_cached_session_token_by_source(), cleanup_expired_session_tokens(). Includes private validation methods for credential fields.

Sequence Diagram(s)

sequenceDiagram
    actor Admin
    participant Manager as S3CredentialManager
    participant DB as Metadata Database
    participant Cache as Session Token<br/>Cache

    Admin->>Manager: create_credential(name, keys, role)
    Manager->>Manager: _validate_* inputs
    Manager->>DB: Check name uniqueness
    Manager->>DB: INSERT AwsCredential
    Manager->>Admin: Return credential_id

    Admin->>Manager: cache_session_token(key_id, keys, token, source, expires)
    Manager->>Manager: Validate fields
    Manager->>DB: INSERT TemporaryCredential
    Manager->>Cache: Cache stored
    Manager->>Admin: Return token_id

    Admin->>Manager: get_cached_session_token_by_source(source)
    Manager->>DB: Query recent non-expired tokens
    Manager->>Admin: Return TemporaryCredential (or None)

    Admin->>Manager: cleanup_expired_session_tokens()
    Manager->>DB: DELETE expired tokens
    Manager->>Admin: Return deleted count

    Admin->>Manager: update_credential(id, fields...)
    Manager->>DB: Check name uniqueness
    Manager->>DB: UPDATE AwsCredential
    Manager->>Admin: Return success boolean

    Admin->>Manager: delete_credential(id)
    Manager->>DB: DELETE AwsCredential (CASCADE)
    Manager->>Admin: Return success boolean
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

The changes span multiple files with heterogeneous concerns: new data models with validation rules, database schema and table management functions, initialization integration, and a feature-rich credential manager class with stateful CRUD operations, caching logic, and expiry validation. The credential manager alone requires careful review of input validation, SQL operations, transaction handling, and the correctness of filtering/expiry logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat(clp-py-utils): Add database-backed AWS credential management system." is fully aligned with the changeset. The title directly describes the primary purpose of the changes: introducing a database layer for AWS credential management. The changeset includes new Pydantic models (AwsCredential and TemporaryCredential), database table creation utilities, and a new S3CredentialManager class—all of which collectively implement exactly what the title states. The title is specific, concise, and clearly communicates the main change without vague terminology or unnecessary noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Eden-D-Zhang
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Eden-D-Zhang
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6abc934 and 99c1dbf.

📒 Files selected for processing (4)
  • components/clp-py-utils/clp_py_utils/clp_config.py (3 hunks)
  • components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (5 hunks)
  • components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2 hunks)
  • components/clp-py-utils/clp_py_utils/s3_credential_manager.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
components/clp-py-utils/clp_py_utils/clp_config.py

442-442: datetime.datetime.now() called without a tz argument

(DTZ005)


443-443: datetime.datetime.now() called without a tz argument

(DTZ005)


505-505: datetime.datetime.now() called without a tz argument

(DTZ005)

components/clp-py-utils/clp_py_utils/s3_credential_manager.py

5-5: typing.List is deprecated, use list instead

(UP035)


5-5: typing.Tuple is deprecated, use tuple instead

(UP035)


28-28: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


63-63: Boolean positional value in function call

(FBT003)


67-67: Possible SQL injection vector through string-based query construction

(S608)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


72-76: Possible SQL injection vector through string-based query construction

(S608)


82-82: Logging statement uses f-string

(G004)


94-94: Boolean positional value in function call

(FBT003)


98-102: Possible SQL injection vector through string-based query construction

(S608)


117-117: Boolean positional value in function call

(FBT003)


121-126: Possible SQL injection vector through string-based query construction

(S608)


154-154: Boolean positional value in function call

(FBT003)


158-163: Possible SQL injection vector through string-based query construction

(S608)


201-201: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Boolean positional value in function call

(FBT003)


210-210: Possible SQL injection vector through string-based query construction

(S608)


219-219: Possible SQL injection vector through string-based query construction

(S608)


223-223: Avoid specifying long messages outside the exception class

(TRY003)


250-254: Possible SQL injection vector through string-based query construction

(S608)


259-259: Logging statement uses f-string

(G004)


275-275: Boolean positional value in function call

(FBT003)


278-278: Possible SQL injection vector through string-based query construction

(S608)


283-283: Logging statement uses f-string

(G004)


285-285: Logging statement uses f-string

(G004)


313-313: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


319-319: Avoid specifying long messages outside the exception class

(TRY003)


324-324: Boolean positional value in function call

(FBT003)


328-332: Possible SQL injection vector through string-based query construction

(S608)


346-346: Logging statement uses f-string

(G004)


366-366: Boolean positional value in function call

(FBT003)


370-375: Possible SQL injection vector through string-based query construction

(S608)


410-410: Boolean positional value in function call

(FBT003)


414-417: Possible SQL injection vector through string-based query construction

(S608)


423-423: Logging statement uses f-string

(G004)


439-439: Avoid specifying long messages outside the exception class

(TRY003)


442-442: Avoid specifying long messages outside the exception class

(TRY003)


445-445: Avoid specifying long messages outside the exception class

(TRY003)


449-451: Avoid specifying long messages outside the exception class

(TRY003)


461-461: Avoid specifying long messages outside the exception class

(TRY003)


471-471: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image
🔇 Additional comments (3)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1)

66-67: Good call ordering and idempotent setup.

Creating the credentials table before the temporary-credentials table avoids FK issues; committing once is fine. No further changes needed.

components/clp-py-utils/clp_py_utils/s3_credential_manager.py (2)

60-68: Note on S608 warnings (table names in f-strings).

Table identifiers are built from a constant prefix and fixed suffix and are wrapped in backticks; values are still parameterised. This is acceptable. If you ever allow a user‑supplied prefix, validate against a strict pattern (e.g., r'^[a-zA-Z0-9_]+$').

Also applies to: 91-106, 114-129, 151-170, 203-247, 272-287, 321-341, 363-387, 407-427


350-399: Apply expiry buffer and verify UTC storage before implementation.

The suggestion to add a buffer and compute the cutoff in Python is valid, but requires careful verification:

  1. Type consistency risk: TemporaryCredential.expires_at is a naive datetime. The proposed refactor uses datetime.now(timezone.utc), creating a mismatch between timezone-aware query values and the naive type. Ensure the DB stores/returns expires_at in UTC and that type annotations are consistent.

  2. Verify DB schema: The suggestion requires confirmation that the expires_at column is defined in UTC or has proper timezone configuration. Database migration or schema files were not found in the codebase—manually verify your DB setup.

  3. Review cleanup method: The cleanup_expired_session_tokens() method (line 416) uses the same NOW() pattern and should be refactored consistently for uniformity.

  4. Confirm expires_at source: Determine where cache_session_token() receives the expires_at value (likely from AWS STS) to confirm it's already in UTC before applying the buffer.

The refactor is sound in principle but needs these validations to avoid subtle timezone bugs.

@@ -1,5 +1,6 @@
import os
import pathlib
from datetime import datetime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix tz-naive defaults and import; avoid import-time evaluation.

Use Field(default_factory=...) with UTC to prevent tz-naive values and import-time evaluation.

-from datetime import datetime
+from datetime import datetime, timezone
@@
-    created_at: datetime = datetime.now()
-    updated_at: datetime = datetime.now()
+    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
+    updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

Also applies to: 442-443

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around line 3 (and also
lines 442-443), replace any direct datetime import/usages that create tz-naive
values at import time with a timezone-aware default_factory: import datetime and
timezone (from datetime import datetime, timezone) and update Field(default=...)
or module-level datetime.utcnow() calls to Field(default_factory=lambda:
datetime.now(timezone.utc)) so defaults are created at runtime and are
UTC-aware; ensure you import Field from pydantic (or the correct config
dataclass provider) if not already.

Comment on lines 496 to 505
def is_expired(self, buffer_minutes: int = 5) -> bool:
"""
Checks if credential is expired or expiring soon.
:param buffer_minutes: Minutes of buffer before expiration to consider credential expired.
:return: True if expired or expiring within buffer_minutes.
"""
from datetime import timedelta

return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Make is_expired time-safe (UTC, tz-aware).

Comparing tz-naive datetimes risks errors and drift across hosts. Normalise to UTC and handle naive DB values.

-    def is_expired(self, buffer_minutes: int = 5) -> bool:
+    def is_expired(self, buffer_minutes: int = 5) -> bool:
         """
         Checks if credential is expired or expiring soon.
@@
-        from datetime import timedelta
-
-        return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes)
+        from datetime import timedelta, timezone
+        now = datetime.now(timezone.utc)
+        exp = self.expires_at
+        if exp.tzinfo is None:
+            # Assume DB stores UTC; attach UTC tzinfo to compare safely.
+            exp = exp.replace(tzinfo=timezone.utc)
+        return now >= exp - timedelta(minutes=buffer_minutes)

If DB values are not UTC, document and convert at read time.


I'll verify this review comment by examining the codebase to understand the current implementation and the timezone handling.


🏁 Script executed:

#!/bin/bash
# Find and examine the class definition containing is_expired
fd -e py . | xargs rg -n "def is_expired|expires_at" --context=5 components/clp-py-utils/clp_py_utils/clp_config.py

Length of output: 17989


🏁 Script executed:

#!/bin/bash
# Look at the full class structure around line 496
fd clp_config.py | xargs cat | head -550 | tail -100

Length of output: 3120


🏁 Script executed:

#!/bin/bash
# Search for expires_at field definition and initialization
fd -e py . | xargs rg -n "expires_at\s*=" --context=3 | head -40

Length of output: 740


🏁 Script executed:

#!/bin/bash
# Check timezone imports and UTC usage patterns in the file
fd clp_config.py | xargs rg -n "import.*timezone|from.*timezone|timezone\.utc|datetime\.now" 

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for other datetime comparisons that might have similar issues
fd -e py . | xargs rg -n "datetime\.now\(\)" --context=2 | head -30

Length of output: 3344


🏁 Script executed:

#!/bin/bash
# Check if there are DB interactions or datetime conversions
fd -e py . | xargs rg -n "tzinfo|utc|UTC" components/clp-py-utils/clp_py_utils/ | head -20

Length of output: 1970


🏁 Script executed:

#!/bin/bash
# Look for any tests or usages of is_expired method
fd -e py . | xargs rg -n "is_expired" | head -20

Length of output: 751


🏁 Script executed:

#!/bin/bash
# Examine the class structure and field definitions
fd clp_config.py | xargs ast-grep --pattern $'class $_ {
  $$$
  expires_at
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the broader context around the is_expired method
fd clp_config.py | xargs python3 << 'EOF'
with open('components/clp-py-utils/clp_py_utils/clp_config.py', 'r') as f:
    lines = f.readlines()
    # Look for is_expired and surrounding context
    for i, line in enumerate(lines):
        if 'is_expired' in line or 'expires_at' in line:
            start = max(0, i - 10)
            end = min(len(lines), i + 15)
            print(f"Lines {start+1}-{end}:")
            for j in range(start, end):
                print(f"{j+1:4d}: {lines[j]}", end='')
            print("\n" + "="*80 + "\n")
EOF

Length of output: 87


🏁 Script executed:

#!/bin/bash
# Check the imports at the top of the file
fd clp_config.py | xargs head -30

Length of output: 667


🏁 Script executed:

#!/bin/bash
# Look at how datetime is imported and used
fd clp_config.py | xargs rg -n "^from datetime|^import datetime" -A 5

Length of output: 172


🏁 Script executed:

#!/bin/bash
# Check type hints for expires_at
fd clp_config.py | xargs rg -n "expires_at.*:" --context=2

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Search for DB schema or ORM model definitions
fd -e py . | xargs rg -n "expires_at" --context=4 components/

Length of output: 22398


🏁 Script executed:

#!/bin/bash
# Check if datetime.now() returns naive or if tzinfo handling exists
fd clp_config.py | xargs python3 << 'EOF'
import re
with open('components/clp-py-utils/clp_py_utils/clp_config.py', 'r') as f:
    content = f.read()
    # Find the class and is_expired method
    if 'def is_expired' in content:
        idx = content.find('def is_expired')
        print("is_expired method found at character index:", idx)
        print("\nContext (500 chars before and after):")
        print(content[max(0, idx-500):idx+500])
        print("\n" + "="*80)
    # Check for timezone usage
    if 'timezone' in content:
        print("timezone import/usage detected")
    if 'utc' in content.lower():
        print("UTC references detected")
EOF

Length of output: 89



🏁 Script executed:

#!/bin/bash
# Search for usage of is_expired method on TemporaryCredential
fd -e py . | xargs rg -n "\.is_expired\(" --context=3 | grep -A3 -B3 "TemporaryCredential\|temporary_credential\|session_token"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for where TemporaryCredential is instantiated and expires_at is set
fd -e py . | xargs rg -n "TemporaryCredential\(" --context=2

Length of output: 874


🏁 Script executed:

#!/bin/bash
# Check AWS credential manager to see how it populates expires_at
fd s3_credential_manager.py | xargs rg -n "cache_session_token\|expires_at" --context=3 | head -50

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for datetime defaults or initialization in TemporaryCredential
fd clp_config.py | xargs rg -n "created_at\|now\(" --context=2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if MySQL connection has timezone configuration
fd -e py . | xargs rg -n "autocommit=True\|timezone\|convert_tz" components/clp-py-utils/clp_py_utils/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check imports in s3_credential_manager to see how datetimes are handled
fd s3_credential_manager.py | xargs head -50

Length of output: 1437


🏁 Script executed:

#!/bin/bash
# Look at actual usage: where is_expired is called
fd -e py . | xargs rg -B5 -A5 "\.is_expired\(" 

Length of output: 3668



🏁 Script executed:

#!/bin/bash
# Comprehensive search: is_expired called anywhere on temporary_credential or TemporaryCredential
fd -e py . | xargs rg -n "temporary_credential|temp_cred" --context=2 | head -30

Length of output: 2786


🏁 Script executed:

#!/bin/bash
# Check if is_expired might be called indirectly or through method chaining
fd -e py . | xargs rg -n "get_cached_session_token|TemporaryCredential" --context=3 | grep -A10 "\.is_expired\|check.*expir"

Length of output: 0


Implement timezone-safe expiration check using UTC.

The current is_expired() method compares naive datetimes, which risks errors across hosts in different timezones. While the method is not currently called in the codebase, it should follow the established pattern used elsewhere (e.g., session_manager.py lines 72, 150) to ensure correctness if invoked. Apply the suggested fix to handle naive database values and use datetime.now(timezone.utc) for safe comparison.

    def is_expired(self, buffer_minutes: int = 5) -> bool:
        """
        Checks if credential is expired or expiring soon.

        :param buffer_minutes: Minutes of buffer before expiration to consider credential expired.
        :return: True if expired or expiring within buffer_minutes.
        """
-        from datetime import timedelta
-
-        return datetime.now() >= self.expires_at - timedelta(minutes=buffer_minutes)
+        from datetime import timedelta, timezone
+        now = datetime.now(timezone.utc)
+        exp = self.expires_at
+        if exp.tzinfo is None:
+            # Assume DB stores UTC; attach UTC tzinfo to compare safely.
+            exp = exp.replace(tzinfo=timezone.utc)
+        return now >= exp - timedelta(minutes=buffer_minutes)
🧰 Tools
🪛 Ruff (0.14.1)

505-505: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 496 to 505,
the is_expired method compares naive datetimes; update it to be timezone-safe by
using datetime.now(timezone.utc) and ensuring self.expires_at is timezone-aware
(if naive, treat it as UTC via replace(tzinfo=timezone.utc) or use an aware
conversion pattern like in session_manager.py lines 72/150) before subtracting
timedelta(buffer_minutes); perform the comparison with both values in UTC to
avoid cross-host timezone issues.

Comment on lines 117 to 132
def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:
db_cursor.execute(
f"""
CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL UNIQUE,
`access_key_id` VARCHAR(255) NOT NULL,
`secret_access_key` VARCHAR(255) NOT NULL,
`role_arn` VARCHAR(2048),
`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `name_unique` (`name`)
) ROW_FORMAT=DYNAMIC
"""
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove redundant unique definition on name.

You have both UNIQUE at column level and an explicit UNIQUE KEY. Keep one to avoid duplicate unique indexes and overhead.

         CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (
             `id` INT NOT NULL AUTO_INCREMENT,
-            `name` VARCHAR(255) NOT NULL UNIQUE,
+            `name` VARCHAR(255) NOT NULL UNIQUE,
             `access_key_id` VARCHAR(255) NOT NULL,
             `secret_access_key` VARCHAR(255) NOT NULL,
             `role_arn` VARCHAR(2048),
             `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
             `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
             PRIMARY KEY (`id`),
-            UNIQUE KEY `name_unique` (`name`)
         ) ROW_FORMAT=DYNAMIC
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:
db_cursor.execute(
f"""
CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL UNIQUE,
`access_key_id` VARCHAR(255) NOT NULL,
`secret_access_key` VARCHAR(255) NOT NULL,
`role_arn` VARCHAR(2048),
`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `name_unique` (`name`)
) ROW_FORMAT=DYNAMIC
"""
)
def _create_aws_credentials_table(db_cursor, aws_credentials_table_name: str) -> None:
db_cursor.execute(
f"""
CREATE TABLE IF NOT EXISTS `{aws_credentials_table_name}` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL UNIQUE,
`access_key_id` VARCHAR(255) NOT NULL,
`secret_access_key` VARCHAR(255) NOT NULL,
`role_arn` VARCHAR(2048),
`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ROW_FORMAT=DYNAMIC
"""
)
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py around lines
117 to 132, the CREATE TABLE defines `name` as VARCHAR(255) NOT NULL UNIQUE and
also adds a separate UNIQUE KEY `name_unique`, creating redundant unique
indexes; remove one of the duplicate constraints — either drop the column-level
UNIQUE from `name` or delete the explicit UNIQUE KEY `name_unique` line — so
only a single unique constraint remains, then run migrations/DDL to apply the
updated table definition.

Comment on lines 1 to 2
import os
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Tidy imports and typing; modernise signature.

  • Remove unused os.
  • Prefer built-in generics.
  • Annotate __init__ return.
-import os
 import re
 from contextlib import closing
-from datetime import datetime
-from typing import List, Tuple
+from datetime import datetime
@@
-class S3CredentialManager:
+class S3CredentialManager:
@@
-    def __init__(self, database_config: Database):
+    def __init__(self, database_config: Database) -> None:
@@
-    def list_credentials(self) -> List[Tuple[int, str, datetime]]:
+    def list_credentials(self) -> list[tuple[int, str, datetime]]:

Also applies to: 5-5, 85-91

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines
1-2 (and also applicable at lines 5 and 85-91), remove the unused "os" import,
replace any typing.List/typing.Dict/typing.Optional generics with built-in
generics (list, dict, Optional from typing if needed) and update the __init__
signature to include an explicit return annotation "-> None"; ensure imports
reflect only what's used (e.g., import re and from typing import Optional if
required) and adjust any function/type hints on lines 85-91 to use the built-in
generic types.

Comment on lines 34 to 37
self.sql_adapter = SQL_Adapter(database_config)
conn_params = database_config.get_clp_connection_params_and_type()
self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Fail fast if DB creds are missing.

Call ensure_credentials_loaded() to avoid late failures when creating connections.

         self.sql_adapter = SQL_Adapter(database_config)
-        conn_params = database_config.get_clp_connection_params_and_type()
+        database_config.ensure_credentials_loaded()
+        conn_params = database_config.get_clp_connection_params_and_type()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.sql_adapter = SQL_Adapter(database_config)
conn_params = database_config.get_clp_connection_params_and_type()
self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)
self.sql_adapter = SQL_Adapter(database_config)
database_config.ensure_credentials_loaded()
conn_params = database_config.get_clp_connection_params_and_type()
self.table_prefix = conn_params.get("table_prefix", CLP_METADATA_TABLE_PREFIX)
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines 34
to 37, call ensure_credentials_loaded() on the database_config before attempting
to create the SQL_Adapter or read connection params so missing DB credentials
raise immediately; specifically, invoke
database_config.ensure_credentials_loaded() first, then construct the
SQL_Adapter and fetch conn_params/table_prefix as currently done.

Comment on lines +72 to +79
f"""
INSERT INTO `{table_name}`
(name, access_key_id, secret_access_key, role_arn)
VALUES (%s, %s, %s, %s)
""",
(name, access_key_id, secret_access_key, role_arn),
)
db_conn.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Plaintext secrets in DB: adopt encryption or external secret storage.

Long‑term keys and session tokens are stored unencrypted. This is a high‑impact risk (exfiltration, backups, replicas). Prefer one of:

  • Store only references (ARNs) in DB; keep material in AWS Secrets Manager/SSM Parameter Store.
  • Envelope encryption with KMS; encrypt/decrypt at the edge; store ciphertext + key metadata.
  • At minimum, enable MySQL TDE or use AES_ENCRYPT with server‑managed key rotation.

Please document threat model and rotation/TTL. I can help sketch an approach.

Also applies to: 328-340

🧰 Tools
🪛 Ruff (0.14.1)

72-76: Possible SQL injection vector through string-based query construction

(S608)

Comment on lines 82 to 83
logger.info(f"Created credential '{name}' with ID {credential_id}")
return credential_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Use parameterised logging; avoid f-strings.

Leverage lazy formatting for performance and to prevent accidental string interpolation with secrets.

-            logger.info(f"Created credential '{name}' with ID {credential_id}")
+            logger.info("Created credential '%s' with ID %s", name, credential_id)
@@
-            logger.info(f"Updated credential ID {credential_id}")
+            logger.info("Updated credential ID %s", credential_id)
@@
-            if deleted:
-                logger.info(f"Deleted credential ID {credential_id}")
-            else:
-                logger.warning(f"Credential ID {credential_id} not found for deletion")
+            if deleted:
+                logger.info("Deleted credential ID %s", credential_id)
+            else:
+                logger.warning("Credential ID %s not found for deletion", credential_id)
@@
-            logger.info(
-                f"Cached session token for source '{source}' with ID {credential_id}, expires at {expires_at}"
-            )
+            logger.info(
+                "Cached session token for source '%s' with ID %s, expires at %s",
+                source, credential_id, expires_at,
+            )
@@
-                logger.info(f"Cleaned up {deleted_count} expired session token(s)")
+                logger.info("Cleaned up %s expired session token(s)", deleted_count)
-            else:
-                logger.debug("No expired session tokens to clean up")
+            else:
+                logger.debug("No expired session tokens to clean up")

Also applies to: 259-260, 283-286, 346-347, 423-425

🧰 Tools
🪛 Ruff (0.14.1)

82-82: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/s3_credential_manager.py around lines
82-83 (and similarly at 259-260, 283-286, 346-347, 423-425), replace f-string
logging with parameterised logging calls to enable lazy formatting and avoid
accidental interpolation of secrets; for each occurrence change calls like
logger.info(f"...{var}...") to logger.info("...%s...", var) (or use
comma-separated arguments matching placeholders) so values are passed as
arguments instead of being formatted into the message string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant