

@s-prosvirnin
Member

@s-prosvirnin s-prosvirnin commented Dec 29, 2025

Summary by CodeRabbit

  • New Features
    • Added a new job manager library with S3-backed storage, multi-worker orchestration, task lifecycle, retries, deadlines, caching, metrics, and graceful shutdown controls.
  • Tests
    • Added extensive integration and unit tests exercising concurrency, caching, deadline expiry, retries, dynamic tasks, iterations, and shutdown behavior.
  • Chores
    • Workspace updated with a new member, stricter lints, dependency version bumps, expanded Makefile targets, and .gitignore additions.


@coderabbitai

coderabbitai bot commented Dec 29, 2025

Walkthrough

Adds a new workspace crate icegate-jobmanager implementing job/task models, workers, S3-backed storage (with caching and retry), observability, examples, and many integration tests; also tightens workspace lints, bumps some dependency versions, and updates workspace Makefile/clippy targets.

Changes

Cohort / File(s) Summary
Workspace & build files
`.gitignore`, `Cargo.toml`, `Makefile`
Added ignore patterns (.tmp, *.iml); added icegate-jobmanager to workspace members; tightened clippy/lint rules and bumped dependency versions; Makefile clippy targets now run with --workspace.
New crate manifest & Makefile (icegate-jobmanager)
`crates/icegate-jobmanager/Cargo.toml`, `crates/icegate-jobmanager/Makefile`
New crate manifest with many dependencies (tokio, aws-sdk-s3, serde, tracing, thiserror, opentelemetry, uuid, etc.) and a Makefile exposing targets (examples infra up/down, test, clean, clippy).
Core domain modules
`crates/icegate-jobmanager/src/core/*`
Added core modules: error, task, job, registry. Defines public types/aliases (Error/Result, TaskCode/TaskStatus/TaskDefinition/ImmutableTask, JobCode/JobStatus/JobDefinition, TaskExecutorFn, JobRegistry) and implements the internal job/task lifecycle, state transitions, merge logic, and registry lookup.
Execution surface
`crates/icegate-jobmanager/src/execution/*`
New JobManager trait and JobManagerImpl, JobsManager (start/shutdown/handle), and Worker with WorkerConfig; worker loop, task selection, executor invocation, persistence and conflict-resolution logic added.
Storage abstraction & impls
`crates/icegate-jobmanager/src/storage/*`
New Storage trait, StorageError, JobMeta, JobDefinitionRegistry; added CachedStorage (DashMap-based cache with invalidation/metrics) and S3Storage (AWS SDK-backed, bucket provisioning, ETag/versioned reads/writes, atomic semantics, serialization, retrier).
Infrastructure utilities
`crates/icegate-jobmanager/src/infra/*`
Added Metrics (OpenTelemetry-backed) and Retrier/RetrierConfig with backoff/jitter, cancellation support, and metrics hooks.
Library exports
`crates/icegate-jobmanager/src/lib.rs`
Crate-root re-exports and compatibility modules for error, registry, and s3_storage; public exports for Job/Task/Registry/Manager/Worker/Metrics/Retrier and storage wrappers.
Examples & infra
`crates/icegate-jobmanager/examples/docker-compose.yml`, `crates/icegate-jobmanager/examples/*.rs`
Added docker-compose.yml (MinIO + mc) and example binaries demonstrating simple, sequential, and JSON-model workflows against S3-compatible storage.
Cached storage & helpers
`crates/icegate-jobmanager/src/storage/cached.rs`, `crates/icegate-jobmanager/src/storage/mod.rs`
New caching wrapper CachedStorage implementing Storage with per-job mutexed cache entries, cache-hit/miss logic, update-on-save, and conflict handling.
S3 storage implementation
`crates/icegate-jobmanager/src/storage/s3.rs`, `crates/icegate-jobmanager/src/tests/common/minio_env.rs`, ...
Full S3-backed storage implementation, serialization helpers, atomic write paths, error mapping, and tests/helpers for MinIO-based integration.
Tests & test helpers
`crates/icegate-jobmanager/src/tests/`, `crates/icegate-jobmanager/src/tests/common/`
Many new integration tests and helpers: InMemoryStorage, CountingStorage, ManagerEnv, MinIOEnv, cache invalidation, concurrency, deadline expiry, dynamic tasks, iterations, shutdown, simple job flows, and related test scaffolding.

Sequence Diagram(s)

sequenceDiagram
    participant Worker
    participant Storage
    participant JobRegistry
    participant TaskExecutor
    participant Metrics

    Worker->>Storage: find_job_meta(job_code)
    Storage-->>Worker: JobMeta
    alt cache miss / stale
      Worker->>Storage: get_job_by_meta(meta)
      Storage-->>Worker: Job
    else cache hit
      Worker-->>Worker: use cached Job
    end

    Worker->>Worker: pick_task_to_execute()
    Worker->>Worker: start_task(task, worker_id)
    Worker->>JobRegistry: get_task_executor(job_code, task_code)
    JobRegistry-->>Worker: TaskExecutorFn
    Worker->>TaskExecutor: executor(immutable_task, job_manager, cancel_token)
    TaskExecutor-->>Worker: Result<(), Error>

    alt Success
      Worker->>Worker: complete_task(task_id, output)
      Worker->>Metrics: record_task_processed(Completed)
    else Failure
      Worker->>Worker: fail_task(task_id, error_msg)
      Worker->>Metrics: record_task_processed(Failed)
    end

    Worker->>Storage: save_job(updated_job, cancel_token)
    alt Conflict (ConcurrentModification)
      Storage-->>Worker: ConcurrentModification
      Worker->>Storage: get_job_by_meta(updated_meta)
      Storage-->>Worker: Job
      Worker->>Worker: merge_with_processed_task()
      Worker->>Storage: save_job(merged_job)
    else Saved
      Storage-->>Worker: OK
      Worker->>Metrics: record_job_iteration_complete()
    end
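The save/conflict branch of the first sequence diagram is a form of optimistic concurrency control: the write is conditional on the version the worker read (the walkthrough mentions ETag/versioned reads and writes), and on ConcurrentModification the worker reloads the job, merges its processed task, and saves again. A minimal in-memory sketch of that loop, assuming a simplified `VersionedStore` and `Job` and a hypothetical `save_with_merge` helper (none of these are the crate's actual types or signatures):

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a persisted job: task id -> status string.
#[derive(Clone, Debug, PartialEq)]
pub struct Job {
    pub tasks: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum SaveError {
    /// The stored version changed since the caller read it.
    ConcurrentModification,
}

/// Hypothetical store with compare-and-swap writes on a version number,
/// standing in for conditional S3 writes keyed on an ETag.
pub struct VersionedStore {
    job: Job,
    version: u64,
}

impl VersionedStore {
    pub fn new(job: Job) -> Self {
        Self { job, version: 0 }
    }

    /// Returns the current job together with the version it was read at.
    pub fn load(&self) -> (Job, u64) {
        (self.job.clone(), self.version)
    }

    /// Writes only if `expected_version` still matches the stored version.
    pub fn save(&mut self, job: Job, expected_version: u64) -> Result<u64, SaveError> {
        if expected_version != self.version {
            return Err(SaveError::ConcurrentModification);
        }
        self.job = job;
        self.version += 1;
        Ok(self.version)
    }
}

/// Applies one processed task and saves; on conflict, reloads the latest job,
/// re-applies (merges) the task result into it, and retries up to `max_attempts` times.
pub fn save_with_merge(
    store: &mut VersionedStore,
    mut job: Job,
    mut version: u64,
    task_id: &str,
    status: &str,
    max_attempts: usize,
) -> Result<u64, SaveError> {
    for _ in 0..max_attempts {
        job.tasks.insert(task_id.to_string(), status.to_string());
        match store.save(job.clone(), version) {
            Ok(new_version) => return Ok(new_version),
            Err(SaveError::ConcurrentModification) => {
                // Another writer won: take its state and merge our task into it.
                let (latest, latest_version) = store.load();
                job = latest;
                version = latest_version;
            }
        }
    }
    Err(SaveError::ConcurrentModification)
}
```

Bounding the retries keeps a heavily contended job from looping indefinitely.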
sequenceDiagram
    participant JobsManager
    participant Worker1
    participant Worker2
    participant Cache
    participant Storage

    JobsManager->>JobsManager: start()
    JobsManager->>Worker1: spawn(...)
    JobsManager->>Worker2: spawn(...)

    par Worker1 loop
      Worker1->>Cache: get_job(job_code)
      alt Cache miss
        Worker1->>Storage: get_job_by_meta(meta)
        Storage-->>Worker1: Job
        Worker1->>Cache: update_cache(job)
      else Cache hit
        Cache-->>Worker1: job
      end
      Worker1->>Worker1: execute_task()
      Worker1->>Storage: save_job()
    and Worker2 loop
      Worker2->>Cache: get_job(job_code)
      alt Cache stale / conflict observed
        Worker2->>Storage: get_job_by_meta(meta)
        Storage-->>Worker2: Job
        Worker2->>Cache: invalidate & update
      else Cache valid
        Cache-->>Worker2: job
      end
      Worker2->>Worker2: execute_task()
      Worker2->>Storage: save_job()
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 22.37%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
Title check: ✅ Passed. The title 'GH-28: Add Job Manager' directly reflects the main objective of this pull request: introducing a comprehensive new job manager crate (icegate-jobmanager) with core job/task management, storage backends, worker orchestration, and integration tests.




@s-prosvirnin s-prosvirnin marked this pull request as ready for review December 31, 2025 10:05
@s-prosvirnin s-prosvirnin requested review from a team and frisbeeman December 31, 2025 10:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (18)
crates/icegate-jobmanager/examples/docker-compose.yml (1)

18-18: Consider pinning the minio/mc image version.

The MinIO server image is pinned to a specific release for reproducibility, but minio/mc:latest may drift over time. Consider pinning to a compatible release for consistent behavior across development environments.

crates/icegate-jobmanager/src/infra/retrier.rs (1)

90-105: Potential panic if delays is empty.

Default provides a non-empty delays vector, but nothing stops a caller from constructing RetrierConfig with an empty delays. In that case line 94's unwrap_or would return the fallback, but line 92's index access could still be a problem if attempt is 0 and delays is empty (in practice this should not happen, because attempt is already 1 after the increment).

Consider validating delays when RetrierConfig is constructed, or documenting that it must not be empty.
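A minimal sketch of constructor-time validation, assuming a simplified RetrierConfig whose only field is `delays: Vec<Duration>` and a hypothetical `delay_for_attempt` helper (the crate's real fields and method names may differ). Rejecting an empty vector up front makes every later lookup infallible, and clamping the index keeps it in bounds for any attempt number, including 0:

```rust
use std::time::Duration;

/// Hypothetical, simplified retry configuration: one delay per attempt.
#[derive(Debug, Clone)]
pub struct RetrierConfig {
    delays: Vec<Duration>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct EmptyDelaysError;

impl RetrierConfig {
    /// Rejects an empty delay list so later lookups can never index out of bounds.
    pub fn new(delays: Vec<Duration>) -> Result<Self, EmptyDelaysError> {
        if delays.is_empty() {
            return Err(EmptyDelaysError);
        }
        Ok(Self { delays })
    }

    /// Delay before retry `attempt` (1-based). Attempt 0 is treated like 1,
    /// and attempts past the end reuse the last configured delay.
    pub fn delay_for_attempt(&self, attempt: usize) -> Duration {
        // `len() - 1` cannot underflow: `new` guarantees at least one delay.
        let idx = attempt.saturating_sub(1).min(self.delays.len() - 1);
        self.delays[idx]
    }
}
```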

crates/icegate-jobmanager/src/tests/shutdown_test.rs (1)

25-30: Consider renaming shadowed variable for clarity.

Line 30 shadows started_tx from line 26. While this works correctly, using a distinct name (e.g., started_tx_clone) would improve readability and make the ownership flow clearer.

🔎 Suggested rename
     let (started_tx, started_rx) = oneshot::channel();
     let started_tx = Arc::new(Mutex::new(Some(started_tx)));
     let cancelled = Arc::new(AtomicBool::new(false));

     let cancelled_flag = Arc::clone(&cancelled);
-    let started_tx = Arc::clone(&started_tx);
+    let started_tx_clone = Arc::clone(&started_tx);
     let executor: TaskExecutorFn = Arc::new(move |task, _manager, cancel_token| {
         let cancelled_flag = Arc::clone(&cancelled_flag);
-        let started_tx = Arc::clone(&started_tx);
+        let started_tx = Arc::clone(&started_tx_clone);
crates/icegate-jobmanager/src/tests/dynamic_task_test.rs (1)

28-28: Consider using usize for dynamic_task_count.

The variable is used as a loop bound and for length comparisons (tasks.len()), which return usize. Using usize would eliminate the need for conversions at lines 125 and 141.

🔎 Suggested change
-    let dynamic_task_count = 5;
+    let dynamic_task_count: usize = 5;

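Then, if dynamic_tasks_executed is also switched to an AtomicUsize, the assertion needs no conversion at all. (A plain `dynamic_task_count as i32` cast would trip clippy's pedantic cast lints, e.g. cast_possible_truncation, which `make clippy` treats as errors.)

     assert_eq!(
         dynamic_tasks_executed.load(Ordering::SeqCst),
-        i32::try_from(dynamic_task_count)?,
+        dynamic_task_count,
         "all dynamic tasks should be executed"
     );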
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)

40-48: TCP readiness check may be insufficient.

The TCP connection check confirms the port is open but doesn't guarantee MinIO's S3 API is fully initialized. Consider adding an HTTP health check to /minio/health/live for more reliable readiness detection, especially under slow CI environments.

That said, the current approach works in practice, since WaitFor::seconds(1) provides an initial delay and subsequent S3 operations retry on failure.
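A std-only sketch of an HTTP readiness probe against MinIO's `/minio/health/live` endpoint, which answers 200 once the server is up. The helper names `is_live` and `wait_for_minio_live`, the 200 ms polling interval, and the raw HTTP/1.1 request are assumptions for illustration; a real helper in minio_env.rs would more likely reuse an HTTP client already in the dependency tree:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Sends one `GET /minio/health/live` and reports whether the status code is 200.
fn is_live(addr: &str) -> bool {
    let Ok(mut stream) = TcpStream::connect(addr) else {
        return false;
    };
    // Bound how long a half-started server can stall the probe.
    if stream.set_read_timeout(Some(Duration::from_secs(2))).is_err() {
        return false;
    }
    let request =
        format!("GET /minio/health/live HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // "HTTP/1.1 200" is 12 bytes; the status code occupies bytes 9..12.
    let mut status_line = [0u8; 12];
    stream.read_exact(&mut status_line).is_ok() && &status_line[9..12] == b"200"
}

/// Polls the health endpoint until it answers 200 or `timeout` elapses.
fn wait_for_minio_live(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if is_live(addr) {
            return true;
        }
        sleep(Duration::from_millis(200));
    }
    false
}
```

Unlike a bare TCP connect, this only succeeds once the HTTP layer is actually serving requests.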

crates/icegate-jobmanager/examples/json_model_job.rs (1)

107-112: Remove leftover comments.

Lines 107 and 112 contain cleanup notes that should be removed before merging.

🔎 Proposed fix
     let job_registry = Arc::new(JobRegistry::new(vec![job_def.clone()])?);
-    // retrier was unused.
     let s3_storage = Arc::new(S3Storage::new(s3_config, job_registry.clone(), Metrics::new_disabled()).await?);

     let cached_storage = Arc::new(CachedStorage::new(s3_storage, Metrics::new_disabled()));

-    // job_code was unused.
-
     // 2. Start Manager
crates/icegate-jobmanager/src/infra/metrics.rs (1)

77-108: Consider using Display instead of Debug for status labels.

Lines 85 and 105 use format!("{status:?}"), which produces Debug output (the variant name, e.g. Started) rather than the Display form (e.g. a lowercase started). Since JobStatus and TaskStatus implement Display (per the relevant code snippets), using status.to_string() (equivalent to format!("{status}")) would produce more consistent, human-readable metric labels.

🔎 Proposed fix
         self.job_duration.record(
             duration.as_secs_f64(),
             &[
                 KeyValue::new("code", code.to_string()),
-                KeyValue::new("status", format!("{status:?}")),
+                KeyValue::new("status", status.to_string()),
             ],
         );
         self.task_duration.record(
             duration.as_secs_f64(),
             &[
                 KeyValue::new("job_code", job_code.to_string()),
                 KeyValue::new("task_code", task_code.to_string()),
-                KeyValue::new("status", format!("{status:?}")),
+                KeyValue::new("status", status.to_string()),
             ],
         );
crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs (1)

43-46: Timing-sensitive test logic.

The test relies on specific timing (500ms sleep vs 100ms deadline) to trigger deadline expiry. While the margin is reasonable (5x), consider adding a comment explaining this relationship for maintainability.
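One way to make the relationship explicit is to derive the sleep from the deadline via named constants, so the 5x margin survives future edits. The constant names below are hypothetical:

```rust
use std::time::Duration;

/// Task deadline configured for the test job (hypothetical name).
const TASK_DEADLINE_MS: u64 = 100;
/// The executor must keep running past the deadline; a 5x margin ensures the
/// deadline has clearly passed even with coarse timers on a loaded CI host.
const DEADLINE_MARGIN: u64 = 5;

const TASK_DEADLINE: Duration = Duration::from_millis(TASK_DEADLINE_MS);
const EXECUTOR_SLEEP: Duration = Duration::from_millis(TASK_DEADLINE_MS * DEADLINE_MARGIN);
```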

crates/icegate-jobmanager/src/storage/cached.rs (1)

11-16: Acknowledged: TODO for cache eviction.

The TODO on line 15 about a TTL or LRU cache matters for production use: without eviction, the cache grows without bound. Consider creating an issue to track this.

Would you like me to open an issue to track implementing TTL or LRU eviction for the cache?
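A std-only sketch of TTL eviction, assuming an expired entry can simply be treated as a cache miss and re-read from storage. `TtlCache` is hypothetical; the real CachedStorage uses DashMap with per-job mutexes, so an actual change would more likely attach a timestamp to each existing entry or adopt an existing cache crate:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Minimal TTL map: entries older than `ttl` are treated as misses and evicted.
pub struct TtlCache<K, V> {
    ttl: Duration,
    entries: HashMap<K, (Instant, V)>,
}

impl<K: Eq + Hash, V: Clone> TtlCache<K, V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Inserts or refreshes an entry, restarting its TTL.
    pub fn insert(&mut self, key: K, value: V) {
        self.entries.insert(key, (Instant::now(), value));
    }

    /// Returns a clone of a fresh entry; evicts it and returns None if expired.
    pub fn get(&mut self, key: &K) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some((inserted_at, _)) => inserted_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, v)| v.clone())
    }

    /// Drops every expired entry; call periodically to bound memory use.
    pub fn purge_expired(&mut self) {
        let ttl = self.ttl;
        self.entries.retain(|_, (inserted_at, _)| inserted_at.elapsed() < ttl);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```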

crates/icegate-jobmanager/src/tests/common/manager_env.rs (1)

36-80: CancellationToken is local-only and cannot be externally triggered.

The cancel_token created at line 38 is only used to pass to get_job() calls but is never exposed for external cancellation. If you want to support cancelling the wait from outside (e.g., on test failure), consider accepting a CancellationToken parameter or checking manager_handle's cancellation status.

For a test helper, this is likely acceptable, but be aware that the only way to break out of this loop early is via timeout.

crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (1)

190-194: Consider clarifying the PUT count formula in the comment.

The formula (((secondary_task_count + 1) * 2) + 1) is correct (1 job creation + 2 PUTs per task), but the comment "1 PUT for create job, 2 PUT for each task" could explicitly mention this includes both task creation and completion PUTs. Minor clarity improvement.
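Naming each term makes the count self-documenting. A sketch with a hypothetical helper name; which task the `+ 1` accounts for is assumed here to be the one task beyond the secondary ones, rather than confirmed by the review. For example, 3 secondary tasks give 1 + (3 + 1) × 2 = 9 PUTs:

```rust
/// Expected number of S3 PUTs in the concurrent-workers test:
/// 1 PUT to create the job, plus 2 PUTs (task creation and task completion)
/// for each of the `secondary_task_count + 1` tasks.
fn expected_put_count(secondary_task_count: usize) -> usize {
    let total_tasks = secondary_task_count + 1;
    let job_creation_puts = 1;
    let puts_per_task = 2;
    job_creation_puts + total_tasks * puts_per_task
}
```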

crates/icegate-jobmanager/src/lib.rs (1)

1-3: Temporary lint allows noted.

The #![allow(missing_docs)], #![allow(dead_code)], and #![allow(clippy::redundant_pub_crate)] are reasonable for early development. Consider adding a TODO comment to track removal of these allows once the crate stabilizes.

crates/icegate-jobmanager/src/core/registry.rs (1)

80-82: Task executor key format could theoretically collide.

The key format "{job_code}:{task_code}" could have collisions if job or task codes contain the : delimiter. For example, job "a:b" with task "c" produces the same key as job "a" with task "b:c".

Consider using a delimiter unlikely to appear in codes, or validate that codes don't contain the delimiter.

🔎 Alternative delimiter approach
     fn task_executor_key(job_code: &JobCode, task_code: &TaskCode) -> String {
-        format!("{job_code}:{task_code}")
+        format!("{job_code}\0{task_code}")
     }

Or validate codes during registration to reject : characters.
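A quick demonstration of the collision, the validation option, and a third option not mentioned above: keying the map by a (job_code, task_code) tuple, which removes the delimiter entirely (the `\0` variant still collides if a code ever contains `\0`). Codes are simplified to `&str`/`String` here, and `validate_code` and `ExecutorMap` are hypothetical; the real JobCode/TaskCode would need `Eq + Hash` for the tuple approach:

```rust
use std::collections::HashMap;

/// The delimiter-based key from the review: ambiguous if codes contain ':'.
fn task_executor_key(job_code: &str, task_code: &str) -> String {
    format!("{job_code}:{task_code}")
}

/// Registration-time validation: reject codes that contain the delimiter.
fn validate_code(code: &str) -> Result<(), String> {
    if code.contains(':') {
        return Err(format!("code {code:?} must not contain ':'"));
    }
    Ok(())
}

/// Alternative: a (job_code, task_code) tuple key needs no delimiter at all.
type ExecutorMap<F> = HashMap<(String, String), F>;
```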

crates/icegate-jobmanager/src/execution/worker.rs (1)

462-484: Inconsistent handling of failed task save result.

When a task fails (line 462), the result of save_processed_task is assigned to `_`: the ? operator at line 484 still propagates errors, but the returned Job is discarded. In contrast, the success path (lines 494-514) uses the returned job for further processing.

This asymmetry may be intentional (failed task state doesn't need further processing), but consider adding a brief comment explaining why the returned job is unused here.

crates/icegate-jobmanager/src/core/task.rs (2)

249-255: Good defensive logic in can_be_picked_up, but note the TODO.

The method correctly allows picking up Todo, Failed, or expired Started tasks. The TODO at line 251 about limiting retry attempts is important for preventing infinite retry loops on consistently failing tasks.

Would you like me to open an issue to track implementing the attempt limit for task retries?
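A sketch of the attempt limit, assuming the task tracks an `attempts` counter incremented on each start and an optional deadline; the struct shape, field names, and `max_attempts` parameter are hypothetical. Tasks that exhaust their attempts would also need a terminal state so they do not sit in Failed (or expired Started) indefinitely:

```rust
use std::time::Instant;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Todo,
    Started,
    Completed,
    Failed,
}

/// Hypothetical task shape: only the fields needed for the pick-up decision.
pub struct Task {
    pub status: TaskStatus,
    pub attempts: u32,
    pub deadline: Option<Instant>,
}

impl Task {
    /// Todo tasks are always eligible; Failed tasks and Started tasks whose
    /// deadline has passed are eligible only while under the attempt limit.
    pub fn can_be_picked_up(&self, now: Instant, max_attempts: u32) -> bool {
        let retries_left = self.attempts < max_attempts;
        match self.status {
            TaskStatus::Todo => true,
            TaskStatus::Failed => retries_left,
            TaskStatus::Started => retries_left && self.deadline.is_some_and(|d| now >= d),
            TaskStatus::Completed => false,
        }
    }
}
```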


142-173: Consider a builder pattern for restore.

The restore function has 13 parameters, which is acknowledged with #[allow(clippy::too_many_arguments)]. For better maintainability, consider a builder pattern or a TaskSnapshot struct that captures persisted state.
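A sketch of the TaskSnapshot option with a reduced, hypothetical field set (status kept as a String for brevity); the real struct would carry all 13 persisted fields with their actual types:

```rust
/// Hypothetical persisted task state (a representative subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct TaskSnapshot {
    pub id: String,
    pub code: String,
    pub status: String,
    pub attempts: u32,
    pub output: Option<String>,
    pub error: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct Task {
    id: String,
    code: String,
    status: String,
    attempts: u32,
    output: Option<String>,
    error: Option<String>,
}

impl Task {
    /// Takes one named struct instead of a long positional list, so arguments
    /// cannot be swapped silently. Destructuring without `..` turns a newly
    /// added snapshot field into a compile error here until it is handled.
    pub fn restore(snapshot: TaskSnapshot) -> Self {
        let TaskSnapshot { id, code, status, attempts, output, error } = snapshot;
        Self { id, code, status, attempts, output, error }
    }

    pub fn id(&self) -> &str {
        &self.id
    }

    pub fn attempts(&self) -> u32 {
        self.attempts
    }
}
```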

crates/icegate-jobmanager/src/core/job.rs (1)

246-261: The next_iteration method's self-replacement pattern is unusual but functional.

The approach of calling Self::new(...) and then overwriting self works, but it's slightly confusing because id and iter_num are immediately overwritten after construction. Consider extracting the initialization logic into a helper or using a more explicit approach.

🔎 Alternative approach
     pub(crate) fn next_iteration(
         &mut self,
         tasks: Vec<Task>,
         worker_id: String,
         max_iterations: u64,
     ) -> Result<(), JobError> {
         if !self.is_ready_to_next_iteration() {
             return Err(JobError::Other("job is not ready to next iteration".into()));
         }

         self.status.transition_to(JobStatus::Started)?;

-        let old_id = self.id.clone();
-        let old_iter_num = self.iter_num;
-        let old_metadata = self.metadata.clone();
-
-        *self = Self::new(self.code.clone(), tasks, old_metadata, worker_id, max_iterations);
-        self.id = old_id;
-        // TODO(low): ...
-        self.iter_num = old_iter_num + 1;
-        self.started_at = Utc::now();
+        // Reset task state for new iteration
+        self.tasks_by_id.clear();
+        for task in tasks {
+            self.tasks_by_id.insert(task.id().to_string(), Arc::new(task));
+        }
+        self.iter_num += 1;
+        self.updated_by_worker_id = worker_id;
+        self.started_at = Utc::now();
+        self.running_at = None;
+        self.completed_at = None;
+        self.max_iterations = max_iterations;

         Ok(())
     }
crates/icegate-jobmanager/Cargo.toml (1)

44-49: Consider workspace-level dependency definitions for potential reuse.

The direct dependencies (futures-util, parking_lot, async-trait, dashmap) are reasonable choices for job orchestration. If these utilities are expected to be used by other workspace crates in the future, consider moving them to workspace dependencies for version consistency.

For now, crate-specific definitions are acceptable.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 238f3ab and c1e9706.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (40)
  • .gitignore
  • Cargo.toml
  • Makefile
  • crates/icegate-jobmanager/Cargo.toml
  • crates/icegate-jobmanager/Makefile
  • crates/icegate-jobmanager/examples/docker-compose.yml
  • crates/icegate-jobmanager/examples/json_model_job.rs
  • crates/icegate-jobmanager/examples/simple_job.rs
  • crates/icegate-jobmanager/examples/simple_sequence_job.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/job.rs
  • crates/icegate-jobmanager/src/core/mod.rs
  • crates/icegate-jobmanager/src/core/registry.rs
  • crates/icegate-jobmanager/src/core/task.rs
  • crates/icegate-jobmanager/src/execution/job_manager.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/execution/mod.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/infra/metrics.rs
  • crates/icegate-jobmanager/src/infra/mod.rs
  • crates/icegate-jobmanager/src/infra/retrier.rs
  • crates/icegate-jobmanager/src/lib.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs
  • crates/icegate-jobmanager/src/tests/common/manager_env.rs
  • crates/icegate-jobmanager/src/tests/common/minio_env.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
  • crates/icegate-jobmanager/src/tests/mod.rs
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/infra/mod.rs
  • crates/icegate-jobmanager/src/core/mod.rs
  • crates/icegate-jobmanager/src/tests/mod.rs
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/examples/simple_job.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/tests/common/manager_env.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/examples/json_model_job.rs
  • crates/icegate-jobmanager/src/tests/common/minio_env.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/Cargo.toml
  • crates/icegate-jobmanager/examples/simple_sequence_job.rs
  • crates/icegate-jobmanager/src/execution/mod.rs
  • crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/infra/metrics.rs
  • crates/icegate-jobmanager/src/execution/job_manager.rs
  • crates/icegate-jobmanager/src/core/registry.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
  • Cargo.toml
  • crates/icegate-jobmanager/src/core/job.rs
  • crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/task.rs
  • crates/icegate-jobmanager/src/lib.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
  • crates/icegate-jobmanager/src/infra/retrier.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/infra/mod.rs
  • crates/icegate-jobmanager/src/core/mod.rs
  • crates/icegate-jobmanager/src/tests/mod.rs
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/examples/simple_job.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/tests/common/manager_env.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/examples/json_model_job.rs
  • crates/icegate-jobmanager/src/tests/common/minio_env.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/examples/simple_sequence_job.rs
  • crates/icegate-jobmanager/src/execution/mod.rs
  • crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/infra/metrics.rs
  • crates/icegate-jobmanager/src/execution/job_manager.rs
  • crates/icegate-jobmanager/src/core/registry.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
  • crates/icegate-jobmanager/src/core/job.rs
  • crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/task.rs
  • crates/icegate-jobmanager/src/lib.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
  • crates/icegate-jobmanager/src/infra/retrier.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/infra/mod.rs
  • crates/icegate-jobmanager/src/core/mod.rs
  • crates/icegate-jobmanager/src/tests/mod.rs
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/examples/simple_job.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/tests/common/manager_env.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/examples/json_model_job.rs
  • crates/icegate-jobmanager/src/tests/common/minio_env.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/Cargo.toml
  • crates/icegate-jobmanager/examples/simple_sequence_job.rs
  • crates/icegate-jobmanager/src/execution/mod.rs
  • crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/infra/metrics.rs
  • crates/icegate-jobmanager/src/execution/job_manager.rs
  • crates/icegate-jobmanager/src/core/registry.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
  • Cargo.toml
  • crates/icegate-jobmanager/src/core/job.rs
  • crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/task.rs
  • crates/icegate-jobmanager/src/lib.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
  • crates/icegate-jobmanager/src/infra/retrier.rs
Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Files:

  • Cargo.toml
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain

Applied to files:

  • crates/icegate-jobmanager/src/core/mod.rs
  • crates/icegate-jobmanager/src/tests/mod.rs
  • crates/icegate-jobmanager/Makefile
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/examples/json_model_job.rs
  • crates/icegate-jobmanager/Cargo.toml
  • crates/icegate-jobmanager/src/execution/mod.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • Cargo.toml
  • crates/icegate-jobmanager/src/execution/worker.rs
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • Makefile
  • Cargo.toml
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make ci` to execute all CI checks (check, fmt, clippy, test, audit)

Applied to files:

  • Makefile
  • crates/icegate-jobmanager/Makefile
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make audit` to perform security audits and use `make install` to install cargo-audit

Applied to files:

  • Makefile
  • crates/icegate-jobmanager/Makefile
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make fmt` to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml

Applied to files:

  • Makefile
  • Cargo.toml
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to Cargo.toml : Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Applied to files:

  • Makefile
  • Cargo.toml
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to clippy.toml : Set clippy thresholds in clippy.toml: cognitive-complexity=30, too-many-arguments=8, too-many-lines=150

Applied to files:

  • Makefile
  • Cargo.toml
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.{rs,toml} : Use `cargo build` for debug builds, `cargo build --release` for release builds, and specific binary builds with `cargo build --bin <name>`

Applied to files:

  • Makefile
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use rustfmt for code formatting with configuration in rustfmt.toml

Applied to files:

  • Makefile
  • Cargo.toml
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run all tests with `cargo test`, specific tests with `cargo test test_name`, and use `--nocapture` flag to show test output

Applied to files:

  • Makefile
  • crates/icegate-jobmanager/Makefile
🧬 Code graph analysis (17)
crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs (3)
crates/icegate-jobmanager/src/infra/metrics.rs (2)
  • new (36-75)
  • new_disabled (22-34)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (4)
  • new (16-23)
  • version (25-27)
  • find_meta_calls (29-31)
  • get_by_meta_calls (33-35)
crates/icegate-jobmanager/src/tests/task_failure_test.rs (3)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (99-162)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)
  • new (20-56)
crates/icegate-jobmanager/examples/simple_job.rs (3)
crates/icegate-jobmanager/src/core/job.rs (4)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • task_executors (141-143)
crates/icegate-jobmanager/src/core/task.rs (3)
  • new (13-15)
  • new (71-76)
  • new (124-140)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (99-162)
crates/icegate-jobmanager/src/tests/dynamic_task_test.rs (5)
crates/icegate-jobmanager/src/core/job.rs (3)
  • new (15-17)
  • new (103-131)
  • new (168-195)
crates/icegate-jobmanager/src/core/task.rs (3)
  • new (13-15)
  • new (71-76)
  • new (124-140)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (99-162)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)
  • new (20-56)
crates/icegate-jobmanager/src/storage/mod.rs (2)
crates/icegate-jobmanager/src/core/error.rs (2)
  • cancelled (85-87)
  • max_attempts (89-91)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
  • cancelled (44-44)
  • max_attempts (45-45)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (6)
crates/icegate-jobmanager/src/execution/worker.rs (2)
  • new (98-115)
  • start (121-155)
crates/icegate-jobmanager/src/core/job.rs (3)
  • new (15-17)
  • new (103-131)
  • new (168-195)
crates/icegate-jobmanager/src/execution/job_manager.rs (1)
  • new (26-28)
crates/icegate-jobmanager/src/execution/jobs_manager.rs (2)
  • new (78-90)
  • start (93-124)
crates/icegate-jobmanager/src/infra/metrics.rs (2)
  • new (36-75)
  • new_disabled (22-34)
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)
  • new (20-56)
crates/icegate-jobmanager/src/storage/cached.rs (4)
crates/icegate-jobmanager/src/core/job.rs (9)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • code (133-135)
  • code (330-332)
  • id (326-328)
  • iter_num (334-336)
  • version (342-344)
  • tasks_as_string (505-524)
crates/icegate-jobmanager/src/infra/metrics.rs (3)
  • new (36-75)
  • record_cache_hit (123-128)
  • record_cache_miss (130-135)
crates/icegate-jobmanager/src/storage/s3.rs (5)
  • new (99-162)
  • get_job (399-426)
  • get_job_by_meta (428-466)
  • find_job_meta (468-508)
  • save_job (511-583)
crates/icegate-jobmanager/src/storage/mod.rs (5)
  • get_job (87-87)
  • get_job (102-102)
  • get_job_by_meta (90-90)
  • find_job_meta (93-93)
  • save_job (97-97)
crates/icegate-jobmanager/examples/json_model_job.rs (9)
crates/icegate-jobmanager/src/core/job.rs (5)
  • fmt (25-27)
  • fmt (55-62)
  • new (15-17)
  • new (103-131)
  • new (168-195)
crates/icegate-jobmanager/src/core/task.rs (7)
  • fmt (23-25)
  • fmt (51-58)
  • new (13-15)
  • new (71-76)
  • new (124-140)
  • input (83-85)
  • input (216-218)
crates/icegate-jobmanager/src/execution/jobs_manager.rs (2)
  • default (16-21)
  • new (78-90)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
  • default (19-36)
  • new (49-51)
crates/icegate-jobmanager/src/infra/metrics.rs (2)
  • new (36-75)
  • new_disabled (22-34)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (99-162)
crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (1)
  • new (16-23)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (1)
  • new (16-33)
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (6)
crates/icegate-jobmanager/src/core/job.rs (6)
  • max_iterations (145-147)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • from (31-33)
  • from (37-39)
crates/icegate-jobmanager/src/execution/worker.rs (2)
  • new (98-115)
  • default (33-40)
crates/icegate-jobmanager/src/infra/metrics.rs (2)
  • new (36-75)
  • new_disabled (22-34)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (99-162)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (1)
crates/icegate-jobmanager/src/storage/mod.rs (5)
  • get_job (87-87)
  • get_job (102-102)
  • get_job_by_meta (90-90)
  • find_job_meta (93-93)
  • save_job (97-97)
crates/icegate-jobmanager/src/execution/jobs_manager.rs (3)
crates/icegate-jobmanager/src/execution/worker.rs (2)
  • new (98-115)
  • start (121-155)
crates/icegate-jobmanager/src/infra/metrics.rs (1)
  • new (36-75)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
crates/icegate-jobmanager/src/infra/metrics.rs (3)
crates/icegate-jobmanager/src/core/job.rs (5)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • code (133-135)
  • code (330-332)
crates/icegate-jobmanager/src/core/task.rs (7)
  • new (13-15)
  • new (71-76)
  • new (124-140)
  • code (78-80)
  • code (95-95)
  • code (180-182)
  • code (315-317)
crates/icegate-jobmanager/src/storage/cached.rs (3)
  • new (26-32)
  • record_cache_hit (46-48)
  • record_cache_miss (50-52)
crates/icegate-jobmanager/src/execution/job_manager.rs (2)
crates/icegate-jobmanager/src/core/job.rs (10)
  • add_task (263-275)
  • complete_task (287-292)
  • fail_task (294-299)
  • get_task (492-495)
  • get_tasks_by_code (497-503)
  • code (133-135)
  • code (330-332)
  • new (15-17)
  • new (103-131)
  • new (168-195)
crates/icegate-jobmanager/src/core/task.rs (9)
  • output (220-222)
  • error_msg (224-226)
  • code (78-80)
  • code (95-95)
  • code (180-182)
  • code (315-317)
  • new (13-15)
  • new (71-76)
  • new (124-140)
crates/icegate-jobmanager/src/core/job.rs (3)
crates/icegate-jobmanager/src/core/task.rs (18)
  • new (13-15)
  • new (71-76)
  • new (124-140)
  • code (78-80)
  • code (95-95)
  • code (180-182)
  • code (315-317)
  • as_str (17-19)
  • from (29-31)
  • from (35-37)
  • id (94-94)
  • id (176-178)
  • id (311-313)
  • status (184-186)
  • started_at (204-206)
  • completed_at (208-210)
  • output (220-222)
  • error_msg (224-226)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
crates/icegate-jobmanager/src/core/error.rs (3)
  • from (60-62)
  • from (66-71)
  • from (75-81)
crates/icegate-jobmanager/src/core/error.rs (2)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
  • cancelled (44-44)
  • max_attempts (45-45)
crates/icegate-jobmanager/src/storage/mod.rs (2)
  • cancelled (63-65)
  • max_attempts (67-69)
crates/icegate-jobmanager/src/core/task.rs (1)
crates/icegate-jobmanager/src/core/error.rs (3)
  • from (60-62)
  • from (66-71)
  • from (75-81)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
crates/icegate-jobmanager/src/core/error.rs (2)
  • cancelled (85-87)
  • max_attempts (89-91)
crates/icegate-jobmanager/src/storage/mod.rs (2)
  • cancelled (63-65)
  • max_attempts (67-69)
🪛 checkmake (0.2.2)
crates/icegate-jobmanager/Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🔇 Additional comments (64)
.gitignore (1)

5-6: LGTM!

The additions appropriately ignore temporary development data (.tmp for MinIO storage) and IDE-specific files (*.iml for IntelliJ IDEA).

crates/icegate-jobmanager/examples/simple_job.rs (1)

1-101: Well-structured example demonstrating the job manager workflow.

The example clearly illustrates the complete lifecycle: task definition → executor mapping → job definition → registry → storage → manager → graceful shutdown. The hardcoded MinIO credentials are appropriate for a local development example.

crates/icegate-jobmanager/src/infra/retrier.rs (2)

53-88: Well-designed retry mechanism with proper cancellation support.

The implementation correctly handles the retry loop with:

  • Cancellation checks before and during sleep
  • Proper attempt counting and max attempts enforcement
  • Clean separation between retryable results and hard errors
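The loop described above can be sketched in a few lines. This is a minimal synchronous sketch: an `AtomicBool` stands in for the crate's `CancellationToken`, and `thread::sleep` stands in for the async sleep. The names `Attempt`, `RetryError` and `retry` are hypothetical and are not the crate's actual API. In the real async code the cancellation check "during sleep" would race the token against the timer. Here it is approximated by checking again after the sleep.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Result of a single attempt: either finished (ok or hard error) or asks for a retry.
pub enum Attempt<T, E> {
    Done(Result<T, E>),
    Retry,
}

#[derive(Debug, PartialEq)]
pub enum RetryError<E> {
    Cancelled,
    MaxAttempts,
    Hard(E),
}

/// Hypothetical retry loop: cancellation is checked before each attempt and after each sleep.
pub fn retry<T, E>(
    cancel: &AtomicBool,
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Attempt<T, E>,
) -> Result<T, RetryError<E>> {
    for attempt in 1..=max_attempts {
        if cancel.load(Ordering::SeqCst) {
            return Err(RetryError::Cancelled);
        }
        match op(attempt) {
            Attempt::Done(Ok(value)) => return Ok(value),
            // Hard errors are not retried.
            Attempt::Done(Err(e)) => return Err(RetryError::Hard(e)),
            Attempt::Retry => {}
        }
        if attempt < max_attempts {
            thread::sleep(delay);
            // Stand-in for "cancellation during sleep" in the async version.
            if cancel.load(Ordering::SeqCst) {
                return Err(RetryError::Cancelled);
            }
        }
    }
    Err(RetryError::MaxAttempts)
}
```

Keeping "retry" as a separate variant from hard errors is what lets the caller tell transient failures apart from failures that should stop the loop immediately.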

102-102: The rand API usage is compatible. The code at line 102 correctly uses rand::rng().random_range(0..max_jitter), which is the proper API for rand 0.9.2 as specified in the workspace Cargo.toml.

crates/icegate-jobmanager/src/infra/mod.rs (1)

1-2: LGTM!

Clean module structure exposing the infrastructure components.

crates/icegate-jobmanager/src/execution/mod.rs (1)

1-3: LGTM!

Clear module organization for the execution layer with well-named components. Note: This PR introduces icegate-jobmanager as a new crate beyond the original 4-crate workspace structure mentioned in learnings, which appears to be an intentional expansion.

crates/icegate-jobmanager/src/tests/common/mod.rs (1)

1-6: LGTM!

Well-organized test utilities with focused modules for different testing concerns: in-memory storage, manager lifecycle, MinIO environment, and storage instrumentation.

crates/icegate-jobmanager/src/tests/mod.rs (1)

1-14: Good test organization with comprehensive coverage.

The decision to place integration tests in src/ to access pub(crate) types is well-documented and justified. The test modules cover a good range of scenarios including concurrency, lifecycle, failure handling, and edge cases.

crates/icegate-jobmanager/examples/docker-compose.yml (1)

16-16: Fix the volume path typo.

The path contains a double slash (.//.tmp/minio_data) which appears to be a typo. While Docker may resolve this correctly, it should be cleaned up for clarity.

🔎 Proposed fix
-      - ./.tmp/minio_data:/data
+      - ./.tmp/minio_data:/data

Note: If the original has .//.tmp, change to ./.tmp.

Likely an incorrect or invalid review comment.

crates/icegate-jobmanager/src/tests/task_failure_test.rs (1)

1-115: LGTM!

The test correctly validates the retry mechanism for flaky tasks. The atomic counter properly tracks attempts across async boundaries with SeqCst ordering, the executor logic correctly fails on the first attempt and succeeds on subsequent attempts, and the assertions verify both the retry count and final job state.

crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs (1)

1-63: LGTM!

The test effectively validates cache invalidation behavior by tracking storage call counts. The sequence of operations correctly demonstrates:

  1. Cache miss on first access
  2. Cache hit when data is unchanged
  3. Cache invalidation when underlying storage is modified directly

The assertions on find_meta_calls and get_by_meta_calls counters properly verify that the cache avoids redundant S3 calls when versions match.

crates/icegate-jobmanager/src/tests/shutdown_test.rs (1)

22-83: LGTM!

The test correctly validates that shutdown cancellation propagates to running executors. The use of tokio::select! to race cancellation against timeout is idiomatic, and the AtomicBool flag properly captures whether cancellation was observed.

crates/icegate-jobmanager/src/core/error.rs (1)

1-92: LGTM!

The error hierarchy is well-structured with clean separation between public (Error) and internal (InternalError, JobError) types. The From implementations correctly propagate errors while preserving the Cancelled variant for control flow, and the RetryError trait implementation enables the retry infrastructure.

crates/icegate-jobmanager/src/core/mod.rs (1)

1-4: LGTM!

The module structure cleanly organizes the core domain logic into logical submodules. Note: Based on learnings, the original workspace plan mentioned 4 crates (icegate-common, icegate-query, icegate-ingest, icegate-maintain), and this PR adds icegate-jobmanager as an additional crate, which appears to be an intentional expansion.

crates/icegate-jobmanager/examples/simple_sequence_job.rs (2)

1-144: Well-structured example demonstrating sequential job workflow.

The example effectively demonstrates:

  • Multi-step job definition with dynamic task creation
  • Flaky task simulation for retry testing
  • S3 storage configuration
  • Graceful shutdown handling via Ctrl+C

The code is well-commented and serves as a good reference for users of the library.


46-46: No issues found with the rand API usage.

The rand::rng().random_bool(0.3) call is valid. The random_bool method exists on the Rng trait and accepts a probability parameter. The code correctly simulates a 30% failure chance.

crates/icegate-jobmanager/src/tests/dynamic_task_test.rs (1)

21-146: LGTM!

The test effectively validates dynamic task creation by having an initial task spawn multiple dynamic tasks at runtime. The assertions properly verify that all tasks executed and the job completed successfully.

crates/icegate-jobmanager/src/tests/two_jobs_test.rs (1)

21-185: LGTM!

Comprehensive integration test for concurrent job execution. The test correctly:

  • Runs two distinct jobs with separate execution counters
  • Uses payload-based routing in a shared executor
  • Validates execution counts match expected iterations × tasks
  • Verifies final job states, iteration counts, and zero timeouts

The multi-worker configuration with 10 tokio threads properly exercises concurrency.

crates/icegate-jobmanager/src/tests/job_iterations_test.rs (1)

21-113: LGTM! Well-structured iteration test.

The test correctly validates the job iteration mechanism by:

  • Tracking iterations with an atomic counter
  • Completing tasks to trigger automatic restarts
  • Verifying both the iteration count and final job state

The use of SeqCst ordering is appropriate for test verification where strong consistency is needed.

crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)

58-74: LGTM!

Clean accessor methods and appropriate documentation. The automatic container cleanup via Drop is correctly noted.

crates/icegate-jobmanager/examples/json_model_job.rs (1)

21-140: Good example demonstrating JSON workflow.

The example effectively showcases:

  • Task data serialization/deserialization with serde
  • Two-step sequential task execution
  • Storage layer composition (S3 + cache)
  • Manager lifecycle management

The unique job code using UUID (line 100) is a good pattern for examples that may be run multiple times.

crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs (2)

55-61: Inconsistent counting for get_job_by_meta.

get_job increments list_and_get_successes on success, but get_job_by_meta does not. If this is intentional (perhaps to avoid double-counting when get_job internally calls get_job_by_meta), consider adding a comment. Otherwise, add the counter increment for consistency.


45-92: LGTM! Useful test instrumentation.

The CountingStorage wrapper provides good visibility into storage operations for test assertions. The logging with job code, iteration, and version is helpful for debugging concurrency scenarios.

crates/icegate-jobmanager/src/infra/metrics.rs (1)

21-34: LGTM! Clean no-op metrics pattern.

The new_disabled() constructor creates valid instrument handles without exporting. The enabled flag ensures no overhead during recording operations.
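Here is a rough illustration of the "valid handles, but no-op when disabled" pattern. Plain `AtomicU64` counters stand in for the OpenTelemetry instruments. Only `new_disabled`, `record_cache_hit` and `record_cache_miss` come from the review. The struct layout and the getter methods are assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the crate's metrics type; AtomicU64 replaces the real instruments.
pub struct Metrics {
    enabled: bool,
    cache_hits: AtomicU64,
    cache_misses: AtomicU64,
}

impl Metrics {
    pub fn new() -> Self {
        Self { enabled: true, cache_hits: AtomicU64::new(0), cache_misses: AtomicU64::new(0) }
    }

    /// Same (valid) handles, but every record_* call returns immediately.
    pub fn new_disabled() -> Self {
        Self { enabled: false, ..Self::new() }
    }

    pub fn record_cache_hit(&self) {
        if self.enabled {
            self.cache_hits.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn record_cache_miss(&self) {
        if self.enabled {
            self.cache_misses.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn cache_hits(&self) -> u64 {
        self.cache_hits.load(Ordering::Relaxed)
    }

    pub fn cache_misses(&self) -> u64 {
        self.cache_misses.load(Ordering::Relaxed)
    }
}
```

Callers never need to branch on whether metrics are on. The single `enabled` check inside each recorder is the only cost when metrics are disabled.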

crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs (1)

21-129: LGTM! Good coverage of deadline expiry behavior.

The test effectively validates that:

  • Tasks exceeding their deadline are re-picked by other workers
  • The attempt counter tracks multiple executions
  • Final job state reflects successful completion after retries

The multi-thread runtime and multiple workers ensure realistic concurrency.

crates/icegate-jobmanager/src/tests/simple_job_test.rs (2)

21-121: LGTM! Clean basic execution test.

The test properly validates:

  • Task execution triggers the executor
  • Input data flows correctly to the task
  • Job completes with expected state

Using parking_lot::Mutex for the captured input is appropriate for the simple synchronization needs.


123-236: LGTM! Good test of dynamic task creation.

The multi-task sequence test demonstrates:

  • Dynamic task injection via manager.add_task()
  • Data flow between tasks through task input
  • Sequential task execution ordering

The assertion inside the async block (line 163) will cause a panic on failure, which is acceptable for tests.

crates/icegate-jobmanager/src/storage/cached.rs (2)

34-44: Cache update logic is correct.

The update_cache_if_newer function correctly handles the "equal or newer iteration" case, which allows cache updates for version changes within the same iteration.


55-105: Well-structured caching implementation.

The Storage implementation correctly:

  • Validates cache against latest metadata before serving
  • Holds the cache lock during fetch to prevent races
  • Invalidates cache on conflicts
  • Records metrics for cache hits/misses

The use of Arc<Mutex<CachedJob>> per entry allows concurrent access to different jobs while serializing access to the same job.

Also applies to: 107-148, 150-190
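The caching strategy described above, with a cheap metadata check on every read and a full fetch only on a version change, can be sketched with std types. `Backend`, `MemBackend` and the single `u64` version are hypothetical simplifications: the real storage is async, S3-backed and keyed by `JobCode`, and it also handles conflicts and metrics, which are omitted here.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
pub struct Job {
    pub version: u64,
    pub payload: String,
}

/// Hypothetical storage interface: cheap metadata lookup plus a full fetch.
pub trait Backend {
    fn find_meta(&self, code: &str) -> Option<u64>;
    fn get_by_meta(&self, code: &str, version: u64) -> Option<Job>;
}

pub struct CachedStorage<B> {
    pub inner: B,
    // One lock per job code, so different jobs never contend with each other.
    entries: Mutex<HashMap<String, Arc<Mutex<Option<Job>>>>>,
}

impl<B: Backend> CachedStorage<B> {
    pub fn new(inner: B) -> Self {
        Self { inner, entries: Mutex::new(HashMap::new()) }
    }

    pub fn get_job(&self, code: &str) -> Option<Job> {
        // Get or create the per-job entry; the map lock is released at the end of this statement.
        let entry = Arc::clone(self.entries.lock().unwrap().entry(code.to_string()).or_default());
        // Hold the per-job lock across validation and fetch so readers of the same job don't race.
        let mut cached = entry.lock().unwrap();
        let latest = self.inner.find_meta(code)?;
        if let Some(job) = cached.as_ref() {
            if job.version == latest {
                return Some(job.clone()); // cache hit: versions match
            }
        }
        // Miss or stale entry: fetch exactly the version just observed.
        let fresh = self.inner.get_by_meta(code, latest)?;
        *cached = Some(fresh.clone());
        Some(fresh)
    }
}

/// Toy single-job backend that counts full fetches (illustration only).
pub struct MemBackend {
    pub job: Mutex<Job>,
    pub get_calls: AtomicUsize,
}

impl Backend for MemBackend {
    fn find_meta(&self, _code: &str) -> Option<u64> {
        Some(self.job.lock().unwrap().version)
    }

    fn get_by_meta(&self, _code: &str, version: u64) -> Option<Job> {
        self.get_calls.fetch_add(1, Ordering::SeqCst);
        let job = self.job.lock().unwrap();
        if job.version == version { Some(job.clone()) } else { None }
    }
}
```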

crates/icegate-jobmanager/src/tests/common/manager_env.rs (1)

96-101: LGTM!

The Drop implementation correctly ensures the manager is aborted if the test doesn't explicitly call stop(), preventing resource leaks in test scenarios.

crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (2)

53-73: Primary executor closure captures executed_primary_tasks_counter correctly.

The executor properly clones the Arc before the async block and tracks execution. The pattern of creating secondary tasks and then completing is well-structured.


37-50: LGTM!

The test setup properly initializes tracking structures and the MinIO environment. Using DashMap for concurrent task execution tracking is appropriate given the multi-threaded test context.

crates/icegate-jobmanager/src/execution/jobs_manager.rs (2)

37-66: LGTM!

The JobsManagerHandle provides clean lifecycle management with graceful shutdown() and forced abort() paths. The wait() method properly drains the JoinSet and handles panics appropriately.


68-72: LGTM!

The Drop implementation ensures workers are aborted if the handle is dropped without explicit shutdown, preventing resource leaks.

crates/icegate-jobmanager/src/lib.rs (1)

21-44: LGTM!

The re-export structure is well-organized with clear separation between public API (pub use) and crate-internal types (pub(crate) use). The compatibility modules provide clean namespacing for related types.

crates/icegate-jobmanager/src/storage/mod.rs (2)

48-60: LGTM!

The StorageError helper methods (is_retryable, is_conflict, is_cancelled) provide clean APIs for error classification. The is_retryable() correctly identifies transient errors that should trigger retry logic.
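Helpers of this kind are typically thin `matches!` wrappers. The following is a minimal sketch. `ConcurrentModification` and `Cancelled` are named in the review, while `Timeout`, `ServiceUnavailable` and `Serialization` are hypothetical variants chosen to illustrate transient and non-transient errors.

```rust
#[derive(Debug)]
pub enum StorageError {
    ConcurrentModification,
    Cancelled,
    // Hypothetical variants for illustration.
    Timeout,
    ServiceUnavailable,
    Serialization(String),
}

impl StorageError {
    /// Transient failures that a retry loop may try again.
    pub const fn is_retryable(&self) -> bool {
        matches!(self, Self::Timeout | Self::ServiceUnavailable)
    }

    /// Optimistic-locking conflict: the caller should reload/merge, not blindly retry.
    pub const fn is_conflict(&self) -> bool {
        matches!(self, Self::ConcurrentModification)
    }

    pub const fn is_cancelled(&self) -> bool {
        matches!(self, Self::Cancelled)
    }
}
```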


82-103: LGTM!

The Storage and JobDefinitionRegistry traits are well-designed. All async methods properly accept CancellationToken for cooperative cancellation, enabling graceful shutdown behavior.

crates/icegate-jobmanager/src/execution/job_manager.rs (2)

31-62: LGTM!

The JobManagerImpl correctly uses write() locks for mutating operations (add_task, complete_task, fail_task) and read() locks for read-only operations (get_task, get_tasks_by_code). The error mapping to Error::Other provides a simplified interface for executors.


10-17: LGTM!

The JobManager trait provides a clean, thread-safe interface (Send + Sync) for task executors. The method signatures are consistent and appropriate for the use case.

crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (2)

80-88: save_job doesn't implement optimistic locking.

Unlike what the Storage trait doc suggests ("Save job with optimistic locking"), this implementation always succeeds without checking if the stored job's version matches the incoming job's version. This may be intentional for a simplified test helper, but could mask concurrency issues in tests that rely on version conflict detection.

If this is intentional for simplicity, consider adding a brief comment noting it's a simplified implementation.
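If version checking is wanted in the test helper, a compare-and-swap on a version number is usually enough. The following is a hypothetical sketch. For brevity it stores only the version per job code. It assumes `job.version` holds the version the caller last read, with 0 meaning "never saved". None of these names are the helper's actual fields.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub code: String,
    pub version: u64, // version last read by the caller; 0 = new job
}

#[derive(Debug, PartialEq)]
pub enum SaveError {
    ConcurrentModification { expected: u64, actual: u64 },
}

#[derive(Default)]
pub struct InMemoryStorage {
    // Only the stored version is kept; a real helper would store the whole job.
    versions: Mutex<HashMap<String, u64>>,
}

impl InMemoryStorage {
    pub fn save_job(&self, job: &mut Job) -> Result<(), SaveError> {
        let mut versions = self.versions.lock().unwrap();
        let stored = versions.get(&job.code).copied().unwrap_or(0);
        // Reject the write if someone else saved since this job was read.
        if stored != job.version {
            return Err(SaveError::ConcurrentModification { expected: job.version, actual: stored });
        }
        job.version += 1;
        versions.insert(job.code.clone(), job.version);
        Ok(())
    }
}
```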


38-78: LGTM!

The Storage implementation correctly handles cancellation checks first in all methods and properly implements atomic counter increments for tracking method calls. The version matching logic in get_job_by_meta is correct.

crates/icegate-jobmanager/src/core/registry.rs (2)

29-58: LGTM!

The JobRegistry::new() constructor has comprehensive validation: non-empty jobs list, non-empty job codes, and duplicate detection. The error messages are descriptive.
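For readers unfamiliar with the pattern, the three checks can be sketched as follows. `JobDefinition` is reduced to just its code, and `RegistryError` is a hypothetical error type. The crate's real constructor returns its own error with descriptive messages.

```rust
use std::collections::HashSet;

pub struct JobDefinition {
    pub code: String,
}

#[derive(Debug, PartialEq)]
pub enum RegistryError {
    Empty,
    EmptyCode,
    Duplicate(String),
}

pub struct JobRegistry {
    jobs: Vec<JobDefinition>,
}

impl JobRegistry {
    pub fn new(jobs: Vec<JobDefinition>) -> Result<Self, RegistryError> {
        if jobs.is_empty() {
            return Err(RegistryError::Empty);
        }
        {
            // Scoped so the borrowed &str keys are gone before `jobs` is moved.
            let mut seen = HashSet::new();
            for job in &jobs {
                if job.code.is_empty() {
                    return Err(RegistryError::EmptyCode);
                }
                if !seen.insert(job.code.as_str()) {
                    return Err(RegistryError::Duplicate(job.code.clone()));
                }
            }
        }
        Ok(Self { jobs })
    }

    pub fn get(&self, code: &str) -> Option<&JobDefinition> {
        self.jobs.iter().find(|j| j.code == code)
    }
}
```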


7-19: LGTM!

The TaskExecutorFn type alias is well-designed with appropriate lifetime handling via HRTB (for<'a>). The doc comments clearly explain the design decisions: borrowing JobManager prevents moving it into background tasks, and CancellationToken enables early shutdown.

crates/icegate-jobmanager/src/execution/worker.rs (1)

538-596: The save_job_state method is well-structured with proper retry semantics.

The retry logic correctly handles:

  • Job ownership transfer via Arc<Mutex<Option<Job>>>
  • Conflict detection with merge handler callback
  • Lock scope minimization (not held across await points)

The pattern of taking ownership before await and restoring after is a good approach for the retry loop.
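The take/restore pattern can be illustrated without async. In this synchronous sketch, the `save` closure stands in for the awaited storage write and returns either the new version or the latest stored job on conflict. The `merge` closure stands in for the merge handler. Both signatures and the name `save_job_state` in this form are assumptions. The point is that the mutex is held only while taking or putting back the job, never across the (would-be) await.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub version: u64,
    pub tasks: Vec<String>,
}

/// Hypothetical synchronous version of the save-with-merge retry loop.
pub fn save_job_state(
    slot: &Arc<Mutex<Option<Job>>>,
    max_attempts: u32,
    mut save: impl FnMut(&Job) -> Result<u64, Job>, // Ok(new version) or Err(latest stored job)
    merge: impl Fn(Job, Job) -> Job,                // (local, latest) -> merged
) -> Result<(), &'static str> {
    for _ in 0..max_attempts {
        // Take ownership; the guard is dropped at the end of this statement.
        let mut job = slot.lock().unwrap().take().ok_or("job already taken")?;
        // In async code this would be an `.await`; no lock is held here.
        match save(&job) {
            Ok(new_version) => {
                job.version = new_version;
                *slot.lock().unwrap() = Some(job); // restore after success
                return Ok(());
            }
            Err(latest) => {
                // Conflict: merge local changes onto the latest state, put it back, retry.
                *slot.lock().unwrap() = Some(merge(job, latest));
            }
        }
    }
    Err("max attempts exceeded")
}
```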

crates/icegate-jobmanager/src/core/job.rs (3)

65-89: Well-designed status transition validation.

The can_transition_to method provides clear, explicit transition rules with proper error reporting. The transition graph is intuitive:

  • Started → Running | Failed
  • Running → Running | Completed | Failed
  • Completed/Failed → Started (for next iteration)
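The transition graph above maps directly onto a single `matches!` over `(from, to)` pairs. This is a minimal sketch that returns `bool`. The crate's `can_transition_to` reportedly returns an error describing the rejected transition instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Started,
    Running,
    Completed,
    Failed,
}

impl JobStatus {
    /// Allowed transitions, one arm per row of the graph above.
    pub const fn can_transition_to(self, next: Self) -> bool {
        matches!(
            (self, next),
            (Self::Started, Self::Running | Self::Failed)
                | (Self::Running, Self::Running | Self::Completed | Self::Failed)
                | (Self::Completed | Self::Failed, Self::Started)
        )
    }
}
```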

102-131: JobDefinition validation is thorough.

Good defensive validation ensuring:

  • Non-empty initial tasks
  • Non-empty executors
  • Every initial task has a corresponding executor
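Below is a rough sketch of those three checks as a standalone function. `TaskDefinition`, the plain `fn` pointer `Executor` type, and the `String` error are hypothetical simplifications of the crate's async executor map and error type.

```rust
use std::collections::HashMap;

pub struct TaskDefinition {
    pub code: String,
}

/// Simplified, synchronous executor type (the real one is async and takes a JobManager).
pub type Executor = fn(&str) -> Result<String, String>;

pub fn validate_job_definition(
    initial_tasks: &[TaskDefinition],
    executors: &HashMap<String, Executor>,
) -> Result<(), String> {
    if initial_tasks.is_empty() {
        return Err("job must have at least one initial task".to_string());
    }
    if executors.is_empty() {
        return Err("job must have at least one task executor".to_string());
    }
    // Every initial task must be runnable by some registered executor.
    for task in initial_tasks {
        if !executors.contains_key(&task.code) {
            return Err(format!("no executor for initial task '{}'", task.code));
        }
    }
    Ok(())
}
```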

477-481: Boundary condition is correct and intentional.

The condition iter_num >= max_iterations with iter_num starting at 1 means max_iterations directly specifies the iteration count. With max_iterations=1, the job runs exactly 1 iteration (checked after completion, preventing a second iteration). This is confirmed by the test case in job_iterations_test.rs, which expects exactly 3 iterations with max_iterations=3 and verifies iter_num==3 at completion.
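To check the boundary arithmetic, here is a tiny simulation of the rule as the review describes it: iterations are 1-based, and `iter_num >= max_iterations` is checked after each iteration completes. The function name is hypothetical.

```rust
/// Returns (iterations executed, iter_num at completion).
pub fn simulate_iterations(max_iterations: u64) -> (u64, u64) {
    let mut executed = 0;
    let mut iter_num = 1; // iterations start at 1
    loop {
        executed += 1; // run this iteration's tasks to completion
        // Checked after completion: stop once iter_num reaches max_iterations.
        if iter_num >= max_iterations {
            return (executed, iter_num);
        }
        iter_num += 1; // Completed -> Started for the next iteration
    }
}
```

With `max_iterations = 3` this yields `(3, 3)`, matching the expectation in job_iterations_test.rs. With `max_iterations = 1` it yields `(1, 1)`.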

crates/icegate-jobmanager/src/storage/s3.rs (3)

206-217: Clever use of inverted iteration number for S3 listing optimization.

Using u64::MAX - iter_num ensures that the most recent iteration always appears first in lexicographic order, making the max_keys(1) LIST operation efficient. Well-documented with the comment explaining the rationale.
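As an illustration of why the inversion works, S3 LIST returns keys in ascending UTF-8 binary order. A fixed-width, zero-padded `u64::MAX - iter_num` therefore puts the newest iteration first. The `jobs/<code>/` key layout below is a hypothetical example, not the crate's actual layout.

```rust
/// Hypothetical object key for one job iteration.
pub fn iteration_key(job_code: &str, iter_num: u64) -> String {
    let inverted = u64::MAX - iter_num;
    // u64::MAX has 20 decimal digits; fixed width keeps string order equal to numeric order.
    format!("jobs/{job_code}/{inverted:020}")
}
```

A `max_keys(1)` listing under `jobs/<code>/` then returns the latest iteration's key without paging through older ones.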


510-583: The save_job implementation handles both new and existing iterations correctly.

The retry logic properly distinguishes between retryable errors and conflicts, and the ETag is correctly propagated back to the job for subsequent operations.


337-393: Atomic write semantics correctly implemented with conditional requests.

The code properly uses:

  • if_none_match("*") for new iterations (create-if-not-exists)
  • if_match(version) for current iteration updates (compare-and-swap)

This provides the optimistic concurrency control needed for distributed workers.

The TODO at lines 369–370 about multipart uploads and ETags is a valid concern. S3 multipart ETags are composite values: the MD5 of the concatenated part MD5 digests, followed by a "-" and the number of parts. They therefore differ from single-part ETags. If the SDK were to switch to multipart for larger objects, the ETag format could change between writes, potentially affecting conditional request matching. It is worth investigating whether multipart should be explicitly disabled to keep ETags stable.
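The two conditional-write modes can be modelled in memory to make the semantics concrete. `FakeBucket` and its ETag scheme are hypothetical. On real S3, a failed `If-None-Match: *` or `If-Match` precondition is reported as HTTP 412 Precondition Failed, which this sketch represents as `PutError::PreconditionFailed`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precondition<'a> {
    /// `If-None-Match: *`: create only if the key does not exist (new iteration).
    IfNoneMatchAny,
    /// `If-Match: <etag>`: overwrite only if the current ETag matches (compare-and-swap).
    IfMatch(&'a str),
}

#[derive(Debug, PartialEq)]
pub enum PutError {
    PreconditionFailed, // S3 answers 412 in this case
}

/// Hypothetical in-memory bucket: key -> (etag, body).
#[derive(Default)]
pub struct FakeBucket {
    objects: HashMap<String, (String, Vec<u8>)>,
    next_etag: u64,
}

impl FakeBucket {
    pub fn put(&mut self, key: &str, body: &[u8], cond: Precondition<'_>) -> Result<String, PutError> {
        let current = self.objects.get(key).map(|(etag, _)| etag.as_str());
        let allowed = match cond {
            Precondition::IfNoneMatchAny => current.is_none(),
            Precondition::IfMatch(expected) => current == Some(expected),
        };
        if !allowed {
            return Err(PutError::PreconditionFailed);
        }
        self.next_etag += 1;
        let etag = format!("\"etag-{}\"", self.next_etag);
        self.objects.insert(key.to_string(), (etag.clone(), body.to_vec()));
        Ok(etag) // caller keeps this ETag for its next If-Match write
    }
}
```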

crates/icegate-jobmanager/src/core/task.rs (1)

245-247: This is valid Rust code and compiles successfully.

The is_processed method is correctly marked as const fn and can call other const fn methods (is_completed() and is_failed()) through self. Modern Rust (1.57+) fully supports const fn method calls in const contexts, so the compilation concern raised is unfounded.

Likely an incorrect or invalid review comment.

Cargo.toml (6)

9-9: LGTM! New workspace member added correctly.

The icegate-jobmanager crate is properly added to the workspace members, aligning with the PR objective to introduce job orchestration functionality.


23-24: LGTM! Stricter Rust lints improve code quality.

Setting unused_imports and unused_variables to deny helps maintain a cleaner codebase by preventing dead code accumulation.


28-45: LGTM! Significantly stricter clippy configuration enforces best practices.

The upgraded lint levels (pedantic, nursery, perf, cargo to deny) and style lints (dbg_macro, todo, unimplemented, print_stdout, print_stderr to deny) align with the coding guidelines for strict linting. The single_match_else = "allow" is a pragmatic exception to avoid overly verbose code.

Note that these changes apply workspace-wide and require all existing crates to be compliant.


128-128: No action required. Cargo.toml already ends with a newline character and complies with the coding guidelines.


61-61: UUID version 1.19 with features v7, v4, and serde is valid and supported by the crate.


126-127: Both AWS SDK versions exist and are compatible. The specifications aws-config = "1.1" and aws-sdk-s3 = "1.11" are valid semver constraints that resolve to available releases (aws-config 1.1.0–1.1.10+ and aws-sdk-s3 1.11.0+) and are compatible with each other.

crates/icegate-jobmanager/Cargo.toml (3)

1-14: LGTM! Package metadata properly configured.

The package metadata correctly uses workspace inheritance for version, edition, authors, license, and repository. The readme path pointing to the root README and workspace lint inheritance ensure consistency across the workspace.


16-50: LGTM! Workspace dependencies correctly utilized.

The crate properly leverages workspace dependencies for error handling, logging, async runtime, AWS S3, serialization, and utilities, ensuring version consistency across the workspace.


53-55: Dev dependencies properly configured.

The testcontainers dependency correctly uses workspace inheritance with the blocking feature, appropriate for synchronous test setup. File ends with a newline as required.

Makefile (2)

22-22: LGTM! Workspace-wide clippy ensures comprehensive linting.

The addition of --workspace to the clippy command ensures that all workspace members, including the new icegate-jobmanager crate, are linted consistently with the strict lint configuration.


25-25: LGTM! Consistent workspace-wide clippy fix capability.

The --workspace flag ensures automatic fixes can be applied across all workspace members, maintaining consistency with the clippy check target.

crates/icegate-jobmanager/Makefile (2)

3-7: LGTM! Example infrastructure targets properly configured.

The targets provide convenient commands to spin up and tear down example infrastructure (likely MinIO for S3 testing). The --detach flag for the up command is appropriate for background operation.


9-16: LGTM! Standard Cargo targets properly implemented.

The test, clean, and clippy targets are correctly implemented. The clippy target appropriately scopes to the icegate-jobmanager package with strict warning-as-error enforcement (-D warnings), consistent with workspace linting policy.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
crates/icegate-jobmanager/src/storage/mod.rs (1)

95-96: Doc comment issue resolved.

The doc comment for save_job now correctly states it "Returns ConcurrentModification if version mismatch." This addresses the previous review feedback.

crates/icegate-jobmanager/src/execution/jobs_manager.rs (1)

84-87: Worker count validation now present.

The validation for worker_count == 0 has been added in new(), returning an appropriate error. This addresses the previous review concern about an empty JobsManagerHandle being returned silently.

crates/icegate-jobmanager/src/execution/worker.rs (2)

169-169: Typo fixed: "skipped" is now correct.

The previous review flagged "skiped" → "skipped". This has been corrected.


145-155: Jitter logic appears inverted when randomization is disabled.

When poll_interval_randomization.is_zero(), the code sets jitter_ms = self.config.poll_interval (line 147), then wait_duration = jitter_ms + poll_interval (line 155). This means the worker waits 2 * poll_interval when randomization is disabled, which seems unintentional.

If randomization is disabled, jitter should be zero:

🔎 Proposed fix
             // Reduce strong concurrency between workers
             let jitter_ms = if self.config.poll_interval_randomization.is_zero() {
-                self.config.poll_interval
+                Duration::ZERO
             } else {
                 #[allow(clippy::cast_possible_truncation)]
                 Duration::from_millis(
                     rand::rng().random_range(0..self.config.poll_interval_randomization.as_millis() as u64),
                 )
             };

             wait_duration = jitter_ms + poll_interval;
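The intended behaviour can be sketched as a pure function: no jitter when randomization is disabled, and a random delay below the configured randomization otherwise. The function name is hypothetical, and `random_below` stands in for `rand::rng().random_range(0..n)` so the sketch needs no external crate. It also guards against sub-millisecond randomization, which would otherwise produce the empty range `0..0`.

```rust
use std::time::Duration;

/// Hypothetical helper: poll interval plus optional jitter.
/// `random_below(n)` must return a value in 0..n (stand-in for rand's random_range).
pub fn wait_duration(
    poll_interval: Duration,
    randomization: Duration,
    random_below: impl FnOnce(u64) -> u64,
) -> Duration {
    let max_ms = u64::try_from(randomization.as_millis()).unwrap_or(u64::MAX);
    let jitter = if max_ms == 0 {
        // Randomization disabled (or below 1ms): no jitter, wait exactly poll_interval.
        Duration::ZERO
    } else {
        Duration::from_millis(random_below(max_ms))
    };
    poll_interval + jitter
}
```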
crates/icegate-jobmanager/src/storage/s3.rs (2)

138-150: Bucket creation error handling now distinguishes 404 from other errors.

The code now uses pattern matching on SdkError::ServiceError with status code 404 check before attempting bucket creation. Other errors (auth failures, etc.) are properly propagated. This addresses the previous review concern.


24-47: created_by_worker is now persisted correctly.

The TaskJson struct includes created_by_worker (line 29), and both task_to_json (line 266) and task_from_json (line 306) handle this field. This addresses the previous review concern about losing creator information during S3 round-trips.

Also applies to: 260-276, 300-316

🧹 Nitpick comments (12)
crates/icegate-jobmanager/Makefile (1)

9-10: Consider scoping the test target to this package for consistency.

The clippy target is scoped to icegate-jobmanager with the -p flag, but test runs all workspace tests. For consistency and faster local development feedback, consider scoping the test target similarly.

🔎 Proposed change for package-scoped tests
 test:
-	cargo test
+	cargo test -p icegate-jobmanager
crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs (1)

31-62: Consider adding comments to clarify the cache invalidation strategy.

The test exercises a specific cache invalidation pattern where:

  • find_meta is called on every get_job to check for version changes
  • get_by_meta is only called when the cache is stale or empty

Adding comments before each test phase would improve maintainability and help future readers understand the expected behavior.

💡 Suggested comment structure
+    // Phase 1: Direct storage save (bypasses cache)
     storage.save_job(&mut job, &cancel_token).await?;
     assert_eq!(storage.version(), 1);

+    // Phase 2: First cached get (cache miss, populates cache)
     let job_from_cache = cached_storage.get_job(&job_code, &cancel_token).await?;
     assert_eq!(storage.find_meta_calls(), 1);
     assert_eq!(storage.get_by_meta_calls(), 1);
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (1)

22-22: Reduce excessive worker threads allocation.

The test configures 10 tokio worker threads but only uses 2 manager workers. This wastes resources and could slow down the test suite.

🔎 Proposed fix
-#[tokio::test(flavor = "multi_thread", worker_threads = 10)]
+#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
 async fn test_two_jobs_concurrent() -> Result<(), Box<dyn std::error::Error>> {

Using 4 threads provides sufficient parallelism (2× the number of manager workers) while being more resource-efficient.

crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs (2)

23-23: Reduce excessive worker threads allocation.

The test configures 10 tokio worker threads but only uses 3 manager workers. This wastes resources and could slow down the test suite.

🔎 Proposed fix
-#[tokio::test(flavor = "multi_thread", worker_threads = 10)]
+#[tokio::test(flavor = "multi_thread", worker_threads = 6)]
 async fn test_task_deadline_expiry() -> Result<(), Box<dyn std::error::Error>> {

Using 6 threads provides sufficient parallelism (2× the number of manager workers) while being more resource-efficient.


45-48: Clarify the deadline expiry strategy.

The test sleeps for 500ms (5× the 100ms deadline) on each of the first 3 attempts. While this works, a shorter sleep (e.g., 150ms, i.e. 1.5× the deadline) would still exceed the deadline while making the test faster.

💡 Alternative approach
             if attempt <= 3 {
-                // First attempt: exceed deadline so another worker can re-pick.
-                tokio::time::sleep(Duration::from_millis(500)).await;
+                // First 3 attempts: exceed deadline so another worker can re-pick
+                tokio::time::sleep(Duration::from_millis(150)).await;
             }

This reduces the total sleep time from 1500ms to 450ms, making the test ~1 second faster.

crates/icegate-jobmanager/src/storage/cached.rs (2)

11-16: TODO acknowledged: consider TTL/LRU cache to bound memory.

The TODO on line 15 is a valid concern. Without eviction, the cache will grow unboundedly with each unique JobCode. For production use, consider using a crate like moka or quick_cache that provides TTL/LRU semantics out of the box.

Would you like me to open an issue to track implementing TTL/LRU eviction for the job cache?
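If pulling in moka or quick_cache is not desired yet, a bounded cache can be sketched with std alone. The sketch below is hypothetical (the names TtlCache and Entry are not from the crate): entries expire after a TTL, and when the cache is full the oldest entry is evicted. Eviction here is an O(n) scan, which is one reason a dedicated crate is preferable in production.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical cache entry: the value plus its insertion time.
struct Entry<V> {
    value: V,
    inserted_at: Instant,
}

/// Minimal TTL + capacity-bounded cache sketch (not moka/quick_cache).
pub struct TtlCache<K, V> {
    entries: HashMap<K, Entry<V>>,
    ttl: Duration,
    capacity: usize,
}

impl<K: Hash + Eq + Clone, V: Clone> TtlCache<K, V> {
    pub fn new(ttl: Duration, capacity: usize) -> Self {
        Self { entries: HashMap::new(), ttl, capacity }
    }

    /// Returns a clone of the value if present and not expired;
    /// expired entries are removed on access.
    pub fn get(&mut self, key: &K) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some(e) => e.inserted_at.elapsed() > self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|e| e.value.clone())
    }

    /// Inserts a value; when adding a new key to a full cache, the entry
    /// with the oldest insertion time is evicted first (O(n) scan).
    pub fn insert(&mut self, key: K, value: V) {
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.inserted_at)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        self.entries.insert(key, Entry { value, inserted_at: Instant::now() });
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```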


36-44: Consider checking version for staleness, not just iteration.

update_cache_if_newer only compares iter_num. If two updates occur within the same iteration (e.g., task state changes), the cache might hold a stale job with an older version. Consider also comparing versions when iter_num is equal.

🔎 Proposed refinement
 fn update_cache_if_newer(storage_job: &Job, cached_job: &mut CachedJob) {
     if let Some(ref current_job) = cached_job.job {
-        if storage_job.iter_num() >= current_job.iter_num() {
+        if storage_job.iter_num() > current_job.iter_num()
+            || (storage_job.iter_num() == current_job.iter_num()
+                && storage_job.version() != current_job.version())
+        {
             cached_job.job = Some(storage_job.clone());
         }
     } else {
         cached_job.job = Some(storage_job.clone());
     }
 }
crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (2)

194-198: Clarify the PUT count formula in comments.

The formula (((secondary_task_count + 1) * 2) + 1) + timeouts accounts for:

  • 1 PUT for job creation
  • 2 PUTs per task (start + complete)
  • Additional PUTs for timeout retries

Consider adding a more detailed comment explaining each component for future maintainability.

🔎 Suggested comment improvement
     assert_eq!(
         counting_storage.put_successes(),
-        (((secondary_task_count + 1) * 2) + 1) as u64 + timeouts, // 1 PUT for create job, 2 PUT for each task
+        // Expected PUTs:
+        // - 1 for initial job creation
+        // - 2 per task (1 to start, 1 to complete) × (primary + secondary tasks)
+        // - additional PUTs for task timeout retries
+        (((secondary_task_count + 1) * 2) + 1) as u64 + timeouts,
         "all tasks must be executed"
     );
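The same formula can be expressed as a small helper so the arithmetic is spelled out. This is a hypothetical sketch that mirrors the assertion: with secondary_task_count = 10 and no timeouts, it gives (10 + 1) × 2 + 1 = 23 PUTs.

```rust
/// Hypothetical helper mirroring the assertion's formula:
/// 1 PUT to create the job, 2 PUTs per task (start + complete) for the
/// primary task plus every secondary task, and 1 extra PUT per timeout retry.
fn expected_puts(secondary_task_count: u64, timeouts: u64) -> u64 {
    let tasks = secondary_task_count + 1; // primary + secondaries
    1 + tasks * 2 + timeouts
}
```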

41-44: Magic numbers could be extracted as constants.

The test configuration uses several magic numbers (10, 30, 100, 20). Consider extracting key values as named constants at the top of the function for clarity.

🔎 Suggested refactor
 async fn run_concurrent_workers_test(use_cached_storage: bool) -> Result<(), Box<dyn std::error::Error>> {
-    let secondary_task_count = 10;
-    let max_iterations = 1u64;
-    let workers_cnt = 10;
+    const SECONDARY_TASK_COUNT: usize = 10;
+    const MAX_ITERATIONS: u64 = 1;
+    const WORKER_COUNT: usize = 10;
+    const COMPLETION_TIMEOUT: Duration = Duration::from_secs(30);
+
+    let secondary_task_count = SECONDARY_TASK_COUNT;
+    let max_iterations = MAX_ITERATIONS;
+    let workers_cnt = WORKER_COUNT;
crates/icegate-jobmanager/src/execution/worker.rs (2)

66-75: Consider moving panic_payload_to_string into a shared helper.

The panic_payload_to_string function is straightforward. Consider placing it in a shared utility module if panic handling is needed elsewhere.


91-91: TODO: Task timeout cancellation is not implemented.

Line 91 notes TODO(med): cancel task if timeout. Currently, tasks that exceed their deadline are logged (line 457) but not cancelled mid-execution. This could lead to resource exhaustion if tasks hang.

Would you like me to open an issue to track implementing task timeout cancellation?
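In a tokio-based worker the natural tool would be tokio::time::timeout combined with a CancellationToken. To show the idea without a runtime, here is a std-only sketch under stated assumptions (run_with_deadline and RunOutcome are hypothetical): the task runs on its own thread, and if it misses the deadline a shared flag is set. Cancellation is cooperative, so the task must poll the flag to actually stop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Outcome of running a task under a deadline (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RunOutcome {
    Completed(u32),
    TimedOut,
    Failed, // task thread ended (e.g. panicked) without reporting a result
}

/// Runs `task` on a separate thread; if no result arrives within `deadline`,
/// sets the shared cancel flag so a cooperative task can stop early.
pub fn run_with_deadline<F>(task: F, deadline: Duration) -> RunOutcome
where
    F: FnOnce(Arc<AtomicBool>) -> u32 + Send + 'static,
{
    let cancel = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    let task_cancel = Arc::clone(&cancel);
    thread::spawn(move || {
        let result = task(task_cancel);
        let _ = tx.send(result); // receiver may already be gone after a timeout
    });
    match rx.recv_timeout(deadline) {
        Ok(v) => RunOutcome::Completed(v),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            // Signal cancellation; the task must check the flag to stop.
            cancel.store(true, Ordering::SeqCst);
            RunOutcome::TimedOut
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => RunOutcome::Failed,
    }
}
```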

crates/icegate-jobmanager/src/storage/s3.rs (1)

18-19: TODOs noted for cleanup and network testing.

The TODOs on lines 18-19 (cleaning up old job states and Toxiproxy-based network testing) are valid future work items. Consider tracking them as issues.

Would you like me to open issues to track:

  1. Mechanism to clean up old job states
  2. Toxiproxy-based network failure testing
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1e9706 and f39ab2b.

📒 Files selected for processing (16)
  • crates/icegate-jobmanager/Makefile
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/icegate-jobmanager/src/tests/shutdown_test.rs
  • crates/icegate-jobmanager/src/tests/task_failure_test.rs
  • crates/icegate-jobmanager/src/tests/simple_job_test.rs
  • crates/icegate-jobmanager/src/tests/job_iterations_test.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/dynamic_task_test.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/tests/deadline_expiry_test.rs
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
  • crates/icegate-jobmanager/src/storage/s3.rs
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain

Applied to files:

  • crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs
  • crates/icegate-jobmanager/src/tests/common/mod.rs
  • crates/icegate-jobmanager/src/execution/worker.rs
  • crates/icegate-jobmanager/src/storage/mod.rs
  • crates/icegate-jobmanager/src/execution/jobs_manager.rs
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • crates/icegate-jobmanager/Makefile
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make ci` to execute all CI checks (check, fmt, clippy, test, audit)

Applied to files:

  • crates/icegate-jobmanager/Makefile
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make audit` to perform security audits and use `make install` to install cargo-audit

Applied to files:

  • crates/icegate-jobmanager/Makefile
🧬 Code graph analysis (7)
crates/icegate-jobmanager/src/tests/cache_invalidation_test.rs (3)
crates/icegate-jobmanager/src/tests/common/mod.rs (1)
  • init_tracing (12-19)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (4)
  • new (16-23)
  • version (25-27)
  • find_meta_calls (29-31)
  • get_by_meta_calls (33-35)
crates/icegate-jobmanager/src/tests/dynamic_task_test.rs (2)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)
  • new (20-56)
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (6)
crates/icegate-jobmanager/src/core/job.rs (7)
  • max_iterations (145-147)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • status (338-340)
  • from (31-33)
  • from (37-39)
crates/icegate-jobmanager/src/execution/worker.rs (2)
  • new (98-115)
  • default (33-40)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (100-162)
crates/icegate-jobmanager/src/core/task.rs (6)
  • new (13-15)
  • new (71-76)
  • new (124-140)
  • status (184-186)
  • from (29-31)
  • from (35-37)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (3)
crates/icegate-jobmanager/src/tests/common/mod.rs (1)
  • init_tracing (12-19)
crates/icegate-jobmanager/src/tests/common/minio_env.rs (1)
  • new (20-56)
crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs (3)
  • put_attempts (32-34)
  • put_successes (36-38)
  • list_and_get_successes (40-42)
crates/icegate-jobmanager/src/execution/worker.rs (3)
crates/icegate-jobmanager/src/core/job.rs (11)
  • id (326-328)
  • new (15-17)
  • new (103-131)
  • new (168-195)
  • code (133-135)
  • code (330-332)
  • updated_by_worker_id (366-368)
  • iter_num (334-336)
  • from (31-33)
  • from (37-39)
  • version (342-344)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (100-162)
crates/icegate-jobmanager/src/storage/cached.rs (4)
crates/icegate-jobmanager/src/storage/s3.rs (5)
  • new (100-162)
  • get_job (400-427)
  • get_job_by_meta (429-467)
  • find_job_meta (469-509)
  • save_job (512-584)
crates/icegate-jobmanager/src/infra/metrics.rs (3)
  • new (36-75)
  • record_cache_hit (123-128)
  • record_cache_miss (130-135)
crates/icegate-jobmanager/src/tests/common/in_memory_storage.rs (6)
  • new (16-23)
  • get_job (40-49)
  • version (25-27)
  • get_job_by_meta (51-62)
  • find_job_meta (64-78)
  • save_job (80-88)
crates/icegate-jobmanager/src/storage/mod.rs (5)
  • get_job (87-87)
  • get_job (101-101)
  • get_job_by_meta (90-90)
  • find_job_meta (93-93)
  • save_job (96-96)
crates/icegate-jobmanager/src/execution/jobs_manager.rs (5)
crates/icegate-jobmanager/src/execution/worker.rs (3)
  • default (33-40)
  • new (98-115)
  • start (121-157)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
  • default (19-36)
  • new (49-51)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/infra/metrics.rs (1)
  • new (36-75)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
🪛 checkmake (0.2.2)
crates/icegate-jobmanager/Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Security Audit
  • GitHub Check: Build Release
🔇 Additional comments (16)
crates/icegate-jobmanager/Makefile (2)

1-7: LGTM! Infrastructure targets are well-defined.

The .PHONY declaration correctly lists all targets, and the docker compose commands appropriately manage example infrastructure for local development and testing.


15-16: LGTM! Clippy target follows project conventions.

The clippy target correctly scopes to the icegate-jobmanager package and treats warnings as errors, aligning with the project's linting standards.

crates/icegate-jobmanager/src/tests/common/mod.rs (1)

12-19: LGTM!

The init_tracing function correctly uses OnceLock to ensure the subscriber is initialized only once across all tests in the same test binary, and the fallback filter level is appropriate for test debugging.

crates/icegate-jobmanager/src/tests/two_jobs_test.rs (1)

168-169: The timeout calculation logic is correct.

Lines 168-169 and 183-184 calculate timeouts by summing (attempt - 1) for all tasks. The attempt field starts at 0 when a task is created, increments to 1 on the first start() call (first execution attempt), and increments further for each retry due to timeout. This means (attempt - 1) correctly counts the number of retries/timeouts: a fresh task with no timeouts has attempt = 1, yielding (1 - 1) = 0. The test expectation is valid.
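The counting rule above can be checked in isolation. The helper below is a hypothetical sketch: it sums attempt - 1 over all tasks, using saturating_sub as a defensive choice (not taken from the test) so that a task with attempt = 0 cannot underflow.

```rust
/// Counts timeout-driven retries from per-task attempt counters:
/// a task that ran once without timing out has attempt == 1 and contributes 0.
fn count_timeouts(attempts: &[u32]) -> u32 {
    // saturating_sub guards tasks that were never started (attempt == 0).
    attempts.iter().map(|a| a.saturating_sub(1)).sum()
}
```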

crates/icegate-jobmanager/src/storage/cached.rs (1)

93-107: Retry loop is now bounded—good fix.

The previous unbounded loop has been replaced with a bounded for _ in 0..2 loop (max 2 attempts). This addresses the earlier concern about indefinite spinning under contention. The fallback to StorageError::ConcurrentModification on exhaustion is appropriate.

crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (1)

1-201: LGTM: Comprehensive concurrent worker test.

The test thoroughly validates concurrent task processing with proper tracking, cleanup, and assertions. Good coverage for both S3 and cached storage variants.

crates/icegate-jobmanager/src/storage/mod.rs (2)

11-46: Well-structured error enum with appropriate variants.

StorageError covers the essential failure modes (not found, conflicts, timeouts, auth, rate limiting, etc.) with clear helper predicates. The thiserror derivation provides good error messages.
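For readers unfamiliar with the pattern, here is a reduced, dependency-free sketch of an error enum with helper predicates. Only ConcurrentModification and is_conflict appear in this PR; the other variant names and is_retryable are hypothetical, and Display is written by hand instead of deriving thiserror::Error.

```rust
use std::fmt;

/// Hypothetical, reduced error enum in the spirit of StorageError.
#[derive(Debug)]
pub enum StorageError {
    NotFound,
    ConcurrentModification,
    Timeout,
    RateLimited,
    Unauthorized,
}

impl StorageError {
    /// Optimistic-lock conflicts: re-read the latest state and retry.
    pub fn is_conflict(&self) -> bool {
        matches!(self, StorageError::ConcurrentModification)
    }

    /// Transient failures that a retrier could back off on (hypothetical policy).
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            StorageError::Timeout | StorageError::RateLimited | StorageError::ConcurrentModification
        )
    }
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            StorageError::NotFound => "job not found",
            StorageError::ConcurrentModification => "concurrent modification",
            StorageError::Timeout => "storage operation timed out",
            StorageError::RateLimited => "rate limited by storage",
            StorageError::Unauthorized => "unauthorized",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for StorageError {}
```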


82-97: Storage trait design is clean and async-friendly.

The trait methods accept CancellationToken for cooperative cancellation and return StorageResult. The separation of get_job vs get_job_by_meta vs find_job_meta provides flexibility for callers to optimize storage access patterns.

crates/icegate-jobmanager/src/execution/jobs_manager.rs (3)

68-72: Drop implementation provides safety net.

The Drop impl calling abort() ensures workers are cancelled even if the handle is dropped without explicit shutdown. This prevents resource leaks and orphaned tasks.
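The safety-net idea can be illustrated with std threads. This is a hypothetical sketch, not the crate's JobsManagerHandle: tokio's abort() is forceful, whereas here workers cooperatively poll a stop flag. shutdown() signals and joins; Drop only signals, so dropping the handle never blocks.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle whose workers poll a shared stop flag.
pub struct WorkersHandle {
    stop: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl WorkersHandle {
    pub fn start(n: usize) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let workers = (0..n)
            .map(|_| {
                let stop = Arc::clone(&stop);
                thread::spawn(move || {
                    while !stop.load(Ordering::SeqCst) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        Self { stop, workers }
    }

    pub fn stop_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.stop)
    }

    /// Graceful path: signal, then wait for every worker to exit.
    pub fn shutdown(mut self) {
        self.stop.store(true, Ordering::SeqCst);
        for w in self.workers.drain(..) {
            let _ = w.join();
        }
    }
}

impl Drop for WorkersHandle {
    fn drop(&mut self) {
        // Safety net: signal stop even if shutdown() was never called.
        self.stop.store(true, Ordering::SeqCst);
    }
}
```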


117-125: Worker panic handling needs attention (per TODO).

The TODO on line 118 notes that worker panics are not handled gracefully—the worker dies but the manager continues. Consider whether panics should trigger a restart or escalate to the manager level.

Is the current behavior (worker dies silently, others continue) acceptable for production, or should there be a restart mechanism or escalation policy?
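One possible restart-then-escalate policy, sketched with std threads under stated assumptions (supervise is hypothetical; the real workers are async tasks): a panicking worker is restarted up to a budget, after which an error is returned so the manager can escalate instead of silently losing capacity.

```rust
use std::thread;

/// Hypothetical supervisor: runs `worker(run_index)` on a thread and restarts
/// it after a panic, up to `max_restarts` times. Returns the number of
/// restarts on success, or Err once the budget is exhausted.
pub fn supervise<F>(worker: F, max_restarts: u32) -> Result<u32, String>
where
    F: Fn(u32) + Send + Sync + Clone + 'static,
{
    let mut restarts = 0;
    loop {
        let w = worker.clone();
        let run = restarts;
        // JoinHandle::join returns Err when the thread panicked.
        match thread::spawn(move || w(run)).join() {
            Ok(()) => return Ok(restarts),
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Err(format!("worker panicked {} times; escalating", restarts + 1)),
        }
    }
}
```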


37-66: JobsManagerHandle lifecycle API is well-designed.

The separation of shutdown() (graceful) vs abort() (forceful) with proper wait() semantics provides good control over worker lifecycle. The error logging for panics in wait() is appropriate.

crates/icegate-jobmanager/src/execution/worker.rs (2)

527-599: Complex but well-structured retry logic in save_job_state.

The save_job_state method handles optimistic locking with a merge callback pattern. The use of Arc<Mutex<Option<Job>>> to manage job ownership across async retry boundaries is sound. The MergeDecision enum cleanly separates retry vs done outcomes.
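The optimistic-lock-plus-merge pattern can be reduced to a synchronous sketch. Everything here is hypothetical (VersionedStore, save_with_merge, Vec<String> as the state): a conditional put succeeds only if the expected version matches, like an ETag-checked PUT; on conflict, the merge callback sees the latest stored state and returns either a merged state to retry with or Done. The real code shares the job via Arc<Mutex<Option<Job>>> across async retries; this sketch uses &mut for simplicity.

```rust
/// Hypothetical versioned store with a conditional write.
pub struct VersionedStore {
    value: Vec<String>,
    version: u64,
}

pub enum MergeDecision {
    Retry(Vec<String>), // retry the save with this merged state
    Done,               // nothing left to write
}

impl VersionedStore {
    pub fn new() -> Self {
        Self { value: Vec::new(), version: 0 }
    }

    pub fn value(&self) -> &[String] {
        &self.value
    }

    /// Writes only if `expected` matches; otherwise returns the current version.
    fn put_if_version(&mut self, expected: u64, value: Vec<String>) -> Result<u64, u64> {
        if expected != self.version {
            return Err(self.version);
        }
        self.value = value;
        self.version += 1;
        Ok(self.version)
    }
}

/// Saves `local` (read at `read_version`); on conflict, asks `merge` what to do.
pub fn save_with_merge<M>(
    store: &mut VersionedStore,
    mut local: Vec<String>,
    mut read_version: u64,
    max_attempts: u32,
    merge: M,
) -> Result<u64, &'static str>
where
    M: Fn(&[String], &[String]) -> MergeDecision,
{
    for _ in 0..max_attempts {
        match store.put_if_version(read_version, local.clone()) {
            Ok(v) => return Ok(v),
            Err(current_version) => match merge(store.value(), &local) {
                MergeDecision::Retry(merged) => {
                    local = merged;
                    read_version = current_version;
                }
                MergeDecision::Done => return Ok(current_version),
            },
        }
    }
    Err("max attempts reached")
}
```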


431-440: Good panic handling for task executors.

Wrapping the executor in AssertUnwindSafe and catching panics prevents a single misbehaving task from crashing the worker. The panic payload is converted to a string for logging.
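A synchronous sketch of the same technique with std::panic::catch_unwind (the crate wraps an async executor; run_executor is a hypothetical name). It also shows what a panic_payload_to_string helper typically handles: a plain panic!("literal") produces a &'static str payload, while a formatted panic produces a String.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Turns a panic payload into a loggable message.
fn panic_payload_to_string(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// Runs an executor closure; a panic becomes an Err instead of
/// unwinding into the caller (the worker loop).
fn run_executor<F: FnOnce() -> Result<(), String>>(executor: F) -> Result<(), String> {
    match panic::catch_unwind(AssertUnwindSafe(executor)) {
        Ok(result) => result,
        Err(payload) => Err(format!(
            "task executor panicked: {}",
            panic_payload_to_string(&*payload)
        )),
    }
}
```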

crates/icegate-jobmanager/src/storage/s3.rs (3)

206-217: Inverted iteration numbering for S3 listing is clever.

Using u64::MAX - iter_num ensures newer iterations appear first in S3's lexicographic ordering, making max_keys(1) efficient for fetching the latest state. Good design choice documented with comments.
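A minimal sketch of the key scheme, assuming a hypothetical jobs/<code>/ prefix (the actual key layout in s3.rs is not shown here). Lexicographic order matches numeric order only for fixed-width keys. The inversion already keeps realistic values at the 20-digit width of u64::MAX, and zero-padding with {:020} makes that width explicit for every possible value.

```rust
/// Hypothetical key layout: newest iteration sorts first under plain
/// lexicographic (S3 listing) order.
fn iteration_key(job_code: &str, iter_num: u64) -> String {
    // 20 = number of decimal digits in u64::MAX.
    format!("jobs/{}/{:020}", job_code, u64::MAX - iter_num)
}
```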


370-371: TODO: Verify multipart upload behavior with ETag.

The comment on lines 370-371 notes potential issues with multipart uploads affecting ETag atomicity. This is worth investigating for large job states.

Consider testing with job states large enough to trigger multipart uploads to verify ETag behavior remains correct for optimistic locking.


99-162: S3 client initialization is thorough.

The setup includes proper timeout configuration, path-style access forcing, and bucket existence check with creation fallback. The credentials are passed explicitly which is appropriate for non-AWS S3-compatible services.

frisbeeman
frisbeeman previously approved these changes Jan 3, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (3)

15-15: Noted: TODO for error log checking.

This is a valid concern. Consider using a scoped tracing subscriber per test (e.g., tracing_test crate) or capturing logs with a custom layer that filters by span context to isolate errors per test.

Would you like me to open an issue to track this enhancement?


56-77: Minor: Consider adding a compile-time assertion for the cast safety.

The #[allow(clippy::cast_possible_truncation)] is fine for the current secondary_task_count = 10, but if this constant is later increased beyond 256, the i as u8 cast would silently truncate and corrupt task data.

Also, line 75: task_id.clone() looks redundant but is required, because task_id is borrowed again by complete_task on the next line; dropping the clone would mean reordering the two calls.

🔎 Suggested improvement
 let primary_executor: TaskExecutorFn = Arc::new(move |task, manager, _cancel_token| {
     let executed = Arc::clone(&executed_primary_tasks_counter);
     let task_id = task.id().to_string();

     Box::pin(async move {
         // Create multiple work tasks
-        #[allow(clippy::cast_possible_truncation)]
         for i in 0..secondary_task_count {
+            #[allow(clippy::cast_possible_truncation)]
             let secondary_task_def = TaskDefinition::new(
                 TaskCode::new("secondary_task"),
                 vec![i as u8],
                 ChronoDuration::seconds(1),
             )?;
             manager.add_task(secondary_task_def)?;
         }

-        executed.insert(task_id.clone(), true);
+        executed.insert(task_id.clone(), true);  // clone needed here

-        manager.complete_task(&task_id, Vec::new())
+        manager.complete_task(&task_id, Vec::new())  // borrow is fine
     })
 });
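The compile-time assertion mentioned above could look like this sketch, assuming the count is lifted into a constant (SECONDARY_TASK_COUNT and task_payload are hypothetical names). A const assert! breaks the build if the range no longer fits in a u8, and u8::try_from is a runtime alternative that reports overflow instead of truncating.

```rust
/// Hypothetical constant standing in for the test's secondary_task_count.
const SECONDARY_TASK_COUNT: usize = 10;

// Compile-time guard: indices 0..SECONDARY_TASK_COUNT must fit in a u8,
// so raising the constant above 256 fails the build.
const _: () = assert!(SECONDARY_TASK_COUNT <= 256);

/// Runtime alternative: returns an error instead of truncating.
fn task_payload(i: usize) -> Result<Vec<u8>, std::num::TryFromIntError> {
    Ok(vec![u8::try_from(i)?])
}
```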

201-202: Ensure file ends with a newline.

As per coding guidelines, each .rs file should end with a trailing newline. Please verify this is present.

crates/icegate-jobmanager/src/core/job.rs (1)

197-233: Consider extracting common task collection logic.

Both new() (lines 175-178) and restore() (lines 213-216) have identical logic for converting Vec<Task> to HashMap<String, Arc<Task>>. This could be extracted to a helper, though it's minor.

🔎 Optional: Extract helper function
fn tasks_to_map(tasks: Vec<Task>) -> HashMap<String, Arc<Task>> {
    tasks.into_iter()
        .map(|task| (task.id().to_string(), Arc::new(task)))
        .collect()
}
crates/icegate-jobmanager/src/storage/cached.rs (2)

15-15: TODO: Consider prioritizing cache eviction strategy.

The unbounded cache growth is a known issue. Depending on job cardinality and update frequency, this could become a memory concern in production. Consider prioritizing TTL or LRU implementation, especially if jobs are numerous or short-lived.


92-111: Retry logic improved but has minor inefficiency.

The bounded retry loop addresses the previous concern about unbounded retries. However, if the second get_job_by_meta attempt fails with a conflict (line 100), the code fetches fresh meta (line 102) but never uses it since the loop terminates. This wastes a storage call.

Consider restructuring to avoid the unnecessary third find_job_meta:

🔎 Suggested refactor
-        let mut job = None;
-        for _ in 0..2 {
+        const MAX_RETRIES: usize = 2;
+        let job = 'retry: {
+            for attempt in 0..MAX_RETRIES {
                 match self.inner.get_job_by_meta(&meta, cancel_token).await {
                     Ok(found) => {
-                        job = Some(found);
-                        break;
+                        break 'retry found;
                     }
                     Err(e) if e.is_conflict() => {
+                        if attempt + 1 < MAX_RETRIES {
                             debug!("Retry find job {job_code} meta in storage");
                             meta = self.inner.find_job_meta(job_code, cancel_token).await?;
+                        }
                     }
                     Err(e) => return Err(e),
                 }
             }
-        }
-
-        let job = job.ok_or(StorageError::ConcurrentModification)?;
+            return Err(StorageError::ConcurrentModification);
+        };
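To check the restructured control flow in isolation, here is a self-contained sketch against a hypothetical mock storage (MockStorage, Conflict and the call counters are illustrative, not the crate's types). It uses a labeled block (Rust 1.65+) and only refreshes meta when another attempt will use it, so exhausting the retries costs two find_meta calls instead of three.

```rust
#[derive(Debug, PartialEq)]
enum StorageError {
    Conflict,
    ConcurrentModification,
}

/// Hypothetical mock: get_by_meta conflicts for the first `conflicts` calls.
struct MockStorage {
    conflicts: u32,
    get_calls: u32,
    meta_calls: u32,
}

impl MockStorage {
    fn find_meta(&mut self) -> u32 {
        self.meta_calls += 1;
        self.meta_calls
    }

    fn get_by_meta(&mut self, meta: u32) -> Result<String, StorageError> {
        self.get_calls += 1;
        if self.get_calls <= self.conflicts {
            Err(StorageError::Conflict)
        } else {
            Ok(format!("job@{meta}"))
        }
    }
}

const MAX_ATTEMPTS: usize = 2;

fn get_job(storage: &mut MockStorage) -> Result<String, StorageError> {
    let mut meta = storage.find_meta();
    let job = 'retry: {
        for attempt in 0..MAX_ATTEMPTS {
            match storage.get_by_meta(meta) {
                Ok(found) => break 'retry found,
                // Refresh meta only if another attempt will use it.
                Err(StorageError::Conflict) if attempt + 1 < MAX_ATTEMPTS => {
                    meta = storage.find_meta();
                }
                Err(StorageError::Conflict) => {}
                Err(e) => return Err(e),
            }
        }
        return Err(StorageError::ConcurrentModification);
    };
    Ok(job)
}
```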
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f39ab2b and a40aeb3.

📒 Files selected for processing (4)
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/job.rs
  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/job.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/job.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-jobmanager/src/storage/cached.rs
  • crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs
  • crates/icegate-jobmanager/src/core/error.rs
  • crates/icegate-jobmanager/src/core/job.rs
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain
🧬 Code graph analysis (4)
crates/icegate-jobmanager/src/storage/cached.rs (1)
crates/icegate-jobmanager/src/storage/mod.rs (5)
  • get_job (87-87)
  • get_job (101-101)
  • get_job_by_meta (90-90)
  • find_job_meta (93-93)
  • save_job (96-96)
crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (6)
crates/icegate-jobmanager/src/tests/common/mod.rs (1)
  • init_tracing (12-19)
crates/icegate-jobmanager/src/storage/cached.rs (1)
  • new (26-32)
crates/icegate-jobmanager/src/storage/s3.rs (1)
  • new (100-162)
crates/icegate-jobmanager/src/infra/metrics.rs (2)
  • new (36-75)
  • new_disabled (22-34)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/tests/common/storage_wrapper.rs (3)
  • put_attempts (32-34)
  • put_successes (36-38)
  • list_and_get_successes (40-42)
crates/icegate-jobmanager/src/core/error.rs (2)
crates/icegate-jobmanager/src/storage/mod.rs (2)
  • cancelled (63-65)
  • max_attempts (67-69)
crates/icegate-jobmanager/src/infra/retrier.rs (2)
  • cancelled (44-44)
  • max_attempts (45-45)
crates/icegate-jobmanager/src/core/job.rs (2)
crates/icegate-jobmanager/src/core/task.rs (19)
  • new (13-15)
  • new (71-76)
  • new (124-140)
  • code (78-80)
  • code (95-95)
  • code (180-182)
  • code (315-317)
  • as_str (17-19)
  • from (29-31)
  • from (35-37)
  • id (94-94)
  • id (176-178)
  • id (311-313)
  • status (184-186)
  • started_at (204-206)
  • completed_at (208-210)
  • output (220-222)
  • error_msg (224-226)
  • fail (293-306)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Security Audit
  • GitHub Check: Benchmark
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Build Release
🔇 Additional comments (26)
crates/icegate-jobmanager/src/tests/concurrent_workers_test.rs (8)

1-13: LGTM!

Imports are well-organized and appropriate for the test functionality.


17-39: LGTM!

Test functions are properly structured with appropriate async runtime configuration. The 10 worker threads align with the workers_cnt used in the test logic.


41-54: LGTM!

Test setup is well-structured. Using DashMap for concurrent execution tracking is appropriate, and the Arc cloning pattern for closure sharing is correct.


79-93: LGTM!

Secondary executor correctly simulates work and tracks execution. The pattern is consistent with the primary executor.


95-109: LGTM!

Job and task definitions are properly configured. The JobRegistry creation with the cloned job definition ensures proper registration.


111-139: LGTM!

Storage layering is correct. The CountingStorage wrapper properly sits beneath CachedStorage when enabled, allowing metrics to reflect actual S3 operations versus cached reads.


141-156: LGTM!

Manager configuration is appropriate for the concurrent test scenario. The 30-second timeout provides sufficient margin for the test workload.


158-199: Verification logic is thorough.

The assertions correctly validate:

  • Job completion state and iteration count
  • Task creation counts (primary + secondary)
  • Execution tracking for both task types

The commented-out assertion (lines 195-199) with the TODO is appropriately documented regarding test flakiness due to race conditions.

crates/icegate-jobmanager/src/core/error.rs (3)

1-22: LGTM! Well-structured public error types.

The public Error enum with thiserror derive provides clear, user-facing error variants. The #[error(transparent)] on Serialization correctly forwards the underlying serde_json::Error message. The Result<T> type alias is a clean convenience for the crate's public API.


24-57: LGTM! Internal error types are well-designed.

The separation between InternalError (crate-level) and JobError (job-specific) provides good modularity. The InvalidStatusTransition variant with from/to fields is particularly useful for debugging state machine issues.


59-92: LGTM! Error conversions and trait implementation are correct.

The From implementations properly propagate the Cancelled variant across error boundaries. The RetryError implementation provides semantic variants (Cancelled, MaxAttemptsReached) that align with the trait contract defined in infra/retrier.rs.

crates/icegate-jobmanager/src/core/job.rs (11)

9-40: LGTM! Clean newtype pattern for JobCode.

The #[serde(transparent)] ensures JSON compatibility while maintaining type safety. Validation of empty codes is appropriately deferred to JobRegistry::new() as shown in the registry.rs snippet.


65-89: LGTM! State machine transitions are well-defined.

The transition rules correctly model the job lifecycle:

  • Started can begin work (Running) or fail immediately
  • Running can stay running (idempotent), complete, or fail
  • Completed/Failed can transition back to Started for next iteration

The separation of can_transition_to (validation) and transition_to (mutation) follows a clean pattern.
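The transition table described above can be sketched as follows. This mirrors the listed rules but is not the crate's code: the real transition_to returns JobError::InvalidStatusTransition, while this sketch returns a String for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Started,
    Running,
    Completed,
    Failed,
}

impl JobStatus {
    /// Allowed edges: Started -> Running/Failed, Running -> Running/Completed/Failed,
    /// Completed/Failed -> Started (next iteration).
    pub fn can_transition_to(self, to: JobStatus) -> bool {
        use JobStatus::*;
        matches!(
            (self, to),
            (Started, Running)
                | (Started, Failed)
                | (Running, Running)
                | (Running, Completed)
                | (Running, Failed)
                | (Completed, Started)
                | (Failed, Started)
        )
    }

    /// Validates first, then mutates.
    pub fn transition_to(&mut self, to: JobStatus) -> Result<(), String> {
        if !self.can_transition_to(to) {
            return Err(format!("invalid status transition: {:?} -> {:?}", self, to));
        }
        *self = to;
        Ok(())
    }
}
```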


102-131: LGTM! Robust constructor validation.

The JobDefinition::new constructor properly validates invariants:

  1. At least one initial task required
  2. At least one executor required
  3. Every initial task must have a matching executor

This fail-fast approach prevents runtime errors during job execution.
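
Here is a sketch of the three fail-fast checks. It assumes tasks are identified by string codes and executors are boxed closures keyed by the same code. `TaskExecutor` is a hypothetical stand-in, not the crate's `TaskExecutorFn`, and the `String` error replaces the crate's error type.

```rust
use std::collections::HashMap;

/// Hypothetical executor type: maps a task's input bytes to its output bytes.
pub type TaskExecutor = Box<dyn Fn(&[u8]) -> Vec<u8>>;

pub struct JobDefinition {
    initial_tasks: Vec<String>,
    executors: HashMap<String, TaskExecutor>,
}

impl JobDefinition {
    /// Fail fast on definitions that could never run to completion.
    pub fn new(
        initial_tasks: Vec<String>,
        executors: HashMap<String, TaskExecutor>,
    ) -> Result<Self, String> {
        if initial_tasks.is_empty() {
            return Err("at least one initial task is required".to_string());
        }
        if executors.is_empty() {
            return Err("at least one executor is required".to_string());
        }
        // Every initial task code must resolve to a registered executor.
        if let Some(missing) = initial_tasks.iter().find(|code| !executors.contains_key(*code)) {
            return Err(format!("no executor registered for initial task `{missing}`"));
        }
        Ok(JobDefinition { initial_tasks, executors })
    }

    pub fn initial_tasks(&self) -> &[String] {
        &self.initial_tasks
    }

    pub fn executor_for(&self, code: &str) -> Option<&TaskExecutor> {
        self.executors.get(code)
    }
}
```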


150-165: LGTM! Well-structured internal job state.

Using Arc<Task> in the HashMap enables cheap cloning of job state while allowing copy-on-write semantics via Arc::make_mut. The comment on line 156 clearly documents this design decision.


236-261: LGTM! Next iteration logic correctly preserves identity.

The approach of creating a fresh Job and then restoring preserved fields (id, iter_num, metadata) ensures all transient state is properly reset. The increment of iter_num and fresh started_at correctly mark the new iteration.
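
Here is a sketch of the "fresh job, then restore" approach, using a trimmed-down `Job` whose fields (`tasks` in particular) are hypothetical. Anything not explicitly carried over, such as the per-iteration task list, is reset by construction.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

#[derive(Debug, Clone)]
pub struct Job {
    pub id: String,
    pub iter_num: u64,
    pub metadata: HashMap<String, String>,
    pub started_at: SystemTime,
    /// Per-iteration state that must not leak into the next iteration.
    pub tasks: Vec<String>,
}

impl Job {
    pub fn new(id: String) -> Self {
        Job {
            id,
            iter_num: 0,
            metadata: HashMap::new(),
            started_at: SystemTime::now(),
            tasks: Vec::new(),
        }
    }

    /// Build the next iteration from a fresh job and copy back only the preserved fields.
    pub fn next_iteration(&self) -> Job {
        let mut next = Job::new(self.id.clone()); // fresh transient state, new started_at
        next.iter_num = self.iter_num + 1;
        next.metadata = self.metadata.clone();
        next
    }
}
```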


277-299: LGTM! Task mutation uses copy-on-write correctly.

The Arc::make_mut pattern (lines 280, 290, 297) ensures efficient memory usage: tasks are only cloned when the Arc has multiple references. This is the idiomatic Rust approach for shared mutable state.
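
Here is a std-only sketch of the copy-on-write behaviour. Cloning the map only bumps reference counts, and `Arc::make_mut` clones a `Task` the first time it is mutated while another snapshot still shares it. The `Task` fields and the `complete` helper are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub status: String,
    pub attempt: u32,
}

/// Job-state task map: cloning it copies pointers, not tasks.
pub type Tasks = HashMap<String, Arc<Task>>;

/// Mark a task completed. `Arc::make_mut` mutates in place when this map holds the only
/// reference, and clones the task first when another snapshot still shares it.
pub fn complete(tasks: &mut Tasks, id: &str) -> bool {
    match tasks.get_mut(id) {
        Some(task) => {
            Arc::make_mut(task).status = "completed".to_string();
            true
        }
        None => false,
    }
}
```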


301-323: LGTM! Defensive check for invariant violation.

The error at lines 315-320 detects an inconsistent state: if all tasks are completed but the job is still Running, something went wrong in the completion flow. This is good defensive programming that surfaces bugs early.


370-393: LGTM! State check methods are well-implemented.

The is_ready_to_next_iteration() method correctly combines multiple conditions:

  1. Job must be in terminal state (Completed or Failed)
  2. Iteration limit not reached
  3. Scheduled time has passed (or no schedule set)



400-425: LGTM! State mutation methods enforce transitions.

All mutation methods delegate to status.transition_to(), which enforces the state machine rules. Reusing completed_at for both success and failure is reasonable, since it represents the terminal timestamp.


428-475: Merge logic is sound with proper ownership checks.

The worker ownership validation at line 442 prevents race conditions where a timed-out task might be overwritten by a late worker. The task merge criteria (lines 459-461) correctly capture both newly created tasks and processed tasks.

One minor note: the error wrapping at lines 447-450 converts the JobError to a string, which loses the structured error information. This is acceptable for debugging but could be improved if error handling becomes more sophisticated.
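
The sketch below covers both points. It rejects a merge from a worker that no longer owns the task, and it carries the `JobError` as a typed field (exposed through `std::error::Error::source`) instead of flattening it with `to_string()`. `Task`, `worker_id`, `MergeError`, and `merge_task` are hypothetical names, not the crate's API.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: String,
    pub worker_id: Option<String>,
    pub output: Option<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
pub enum JobError {
    TaskOwnership { task_id: String, owner: Option<String>, writer: Option<String> },
}

impl fmt::Display for JobError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JobError::TaskOwnership { task_id, owner, writer } => write!(
                f,
                "task {task_id} is owned by {owner:?}, rejecting update from {writer:?}"
            ),
        }
    }
}

impl Error for JobError {}

/// Wraps the job error as a value, so callers can still match on its variant.
#[derive(Debug, PartialEq)]
pub enum MergeError {
    Job(JobError),
}

impl fmt::Display for MergeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MergeError::Job(_) => f.write_str("failed to merge job state"),
        }
    }
}

impl Error for MergeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            MergeError::Job(inner) => Some(inner),
        }
    }
}

impl From<JobError> for MergeError {
    fn from(err: JobError) -> Self {
        MergeError::Job(err)
    }
}

/// Apply a worker's result only if that worker still owns the stored task.
pub fn merge_task(stored: &mut Task, incoming: &Task) -> Result<(), MergeError> {
    if stored.worker_id != incoming.worker_id {
        // A late worker whose task was reassigned after a timeout must not win.
        return Err(JobError::TaskOwnership {
            task_id: stored.id.clone(),
            owner: stored.worker_id.clone(),
            writer: incoming.worker_id.clone(),
        }
        .into());
    }
    stored.output = incoming.output.clone();
    Ok(())
}
```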


477-537: LGTM! Helper methods are well-implemented.

Key observations:

  • is_iteration_limit_reached() uses >= correctly (stops at the limit, not after)
  • all_tasks_completed() guards against empty task maps returning true
  • tasks_as_string() limits output to 3 tasks to prevent log flooding
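
Here is a sketch of the two task helpers. It uses a `BTreeMap` (instead of the crate's `HashMap`) so the log line is deterministic. It also takes the limit as a parameter (the crate uses 3) and appends an "N more" suffix, which is an addition of this sketch.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Todo,
    Started,
    Completed,
}

/// Guarded: `Iterator::all` on an empty map returns `true`, which must not count as "done".
pub fn all_tasks_completed(tasks: &BTreeMap<String, TaskStatus>) -> bool {
    !tasks.is_empty() && tasks.values().all(|status| *status == TaskStatus::Completed)
}

/// Render at most `limit` tasks for a log line, summarizing the rest.
pub fn tasks_as_string(tasks: &BTreeMap<String, TaskStatus>, limit: usize) -> String {
    let mut parts: Vec<String> = tasks
        .iter()
        .take(limit)
        .map(|(id, status)| format!("{id}:{status:?}"))
        .collect();
    if tasks.len() > limit {
        parts.push(format!("... (+{} more)", tasks.len() - limit));
    }
    parts.join(", ")
}
```
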

crates/icegate-jobmanager/src/storage/cached.rs (4)

59-88: Cache hit path looks correct.

The logic properly validates both iter_num and version before serving from cache (line 76), records metrics, and includes helpful debug logging. Holding the lock during storage fetch prevents get/save races, which is the right choice for correctness.
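
Here is a synchronous, std-only sketch of the hit path. It assumes storage exposes a cheap, uncached metadata lookup carrying `iter_num` and `version`. `Storage`, `JobMeta`, `MemStorage`, and the single-entry cache are hypothetical simplifications of the async S3-backed code. As described above, the lock is held across the fetch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: String,
    pub iter_num: u64,
    pub version: u64,
    pub payload: String,
}

/// Cheap metadata lookup result (never cached: it is what validates the cache).
#[derive(Debug, Clone, Copy)]
pub struct JobMeta {
    pub iter_num: u64,
    pub version: u64,
}

pub trait Storage {
    fn get_meta(&self, id: &str) -> JobMeta;
    fn get_job(&self, id: &str) -> Job;
}

/// In-memory backing store that counts full fetches.
pub struct MemStorage {
    pub job: Mutex<Job>,
    pub fetches: AtomicU64,
}

impl Storage for MemStorage {
    fn get_meta(&self, _id: &str) -> JobMeta {
        let job = self.job.lock().unwrap();
        JobMeta { iter_num: job.iter_num, version: job.version }
    }

    fn get_job(&self, _id: &str) -> Job {
        self.fetches.fetch_add(1, Ordering::Relaxed);
        self.job.lock().unwrap().clone()
    }
}

pub struct CachedStorage<S> {
    pub inner: S,
    cache: Mutex<Option<Job>>,
    pub hits: AtomicU64,
}

impl<S: Storage> CachedStorage<S> {
    pub fn new(inner: S) -> Self {
        CachedStorage { inner, cache: Mutex::new(None), hits: AtomicU64::new(0) }
    }

    pub fn get_job(&self, id: &str) -> Job {
        // Held for the whole call so a concurrent save cannot interleave with the fetch.
        let mut cached = self.cache.lock().unwrap();
        let meta = self.inner.get_meta(id);
        if let Some(job) = &*cached {
            // Serve from cache only if both iteration and version still match storage.
            if job.id == id && job.iter_num == meta.iter_num && job.version == meta.version {
                self.hits.fetch_add(1, Ordering::Relaxed);
                return job.clone();
            }
        }
        let fresh = self.inner.get_job(id);
        *cached = Some(fresh.clone());
        fresh
    }
}
```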


143-154: Conflict handling and cache invalidation are correct.

The code properly invalidates the cache on conflict (line 147), forcing the next read to fetch fresh state. The cache update uses update_cache_if_newer, which has the version comparison issue flagged earlier in this review.


165-197: Save implementation follows correct write-through pattern.

The method correctly updates the cache only after a successful save (line 179) and invalidates on conflict (line 192), ensuring cache consistency. The debug logging provides good observability.
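
Here is a matching sketch of the write path. The cache is written only after the backing store accepts the save, and it is cleared when the store reports a version conflict. `MemStorage` stands in for the S3 conditional write with a simple compare-and-set on `version`. All names are hypothetical.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub iter_num: u64,
    pub version: u64,
    pub payload: String,
}

#[derive(Debug, PartialEq)]
pub enum SaveError {
    Conflict,
}

/// Backing store with optimistic concurrency: a save must carry the current version.
pub struct MemStorage {
    pub current: Mutex<Job>,
}

impl MemStorage {
    pub fn save(&self, job: &Job) -> Result<Job, SaveError> {
        let mut current = self.current.lock().unwrap();
        if current.version != job.version {
            return Err(SaveError::Conflict);
        }
        let mut stored = job.clone();
        stored.version += 1; // every accepted write produces a new version
        *current = stored.clone();
        Ok(stored)
    }
}

pub struct CachedStorage {
    pub inner: MemStorage,
    pub cache: Mutex<Option<Job>>,
}

impl CachedStorage {
    pub fn save(&self, job: &Job) -> Result<Job, SaveError> {
        let mut cache = self.cache.lock().unwrap();
        match self.inner.save(job) {
            Ok(saved) => {
                // Write-through: the cache only ever reflects a state storage accepted.
                *cache = Some(saved.clone());
                Ok(saved)
            }
            Err(err) => {
                // Another writer won the race; drop our copy so the next read refetches.
                *cache = None;
                Err(err)
            }
        }
    }
}
```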


157-162: Correct passthrough implementation.

Not caching the meta is the right choice, as fresh meta is needed to validate cached jobs.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @crates/icegate-jobmanager/src/tests/two_jobs_test.rs:
- Around lines 43-62: The async block in the TaskExecutorFn closure is not
awaiting the future returned by manager.complete_task, so the task never
actually completes; update the async move block in the executor (the closure
assigned to TaskExecutorFn) to call manager.complete_task(&task_id,
b"done".to_vec()).await and handle or propagate the Result as appropriate (e.g.,
unwrap or map_err/log) so the completion future is executed.
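
The missing `.await` makes the task silently never complete because Rust futures are lazy and do nothing until polled. The std-only sketch below (Rust 1.68+ for `std::pin::pin!`) uses a hypothetical `Manager` stand-in whose `complete_task` only mimics the shape of the real call. A tiny busy-polling `block_on` takes the place of the tokio runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the job manager; only the call shape matches the test.
pub struct Manager {
    pub completed: AtomicBool,
}

impl Manager {
    pub async fn complete_task(&self, _task_id: &str, _output: Vec<u8>) -> Result<(), String> {
        // This body runs only when the returned future is polled.
        self.completed.store(true, Ordering::SeqCst);
        Ok(())
    }
}

/// Buggy: the future is built and immediately dropped, so `complete_task` never runs.
pub async fn executor_buggy(manager: &Manager, task_id: &str) {
    let _ = manager.complete_task(task_id, b"done".to_vec());
}

/// Fixed: `.await` drives the future, and the `Result` is propagated to the caller.
pub async fn executor_fixed(manager: &Manager, task_id: &str) -> Result<(), String> {
    manager.complete_task(task_id, b"done".to_vec()).await
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, adequate only for futures that never wait on I/O.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```

Depending on how the future is discarded, rustc's `unused_must_use` warning or Clippy's `let_underscore_future` lint may flag this class of bug.
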
🧹 Nitpick comments (1)
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (1)

167-168: Consider more descriptive variable naming.

The variables primary_job_timeouts and secondary_job_timeouts actually hold the number of retry attempts (attempt - 1, summed over all tasks) rather than timeouts per se. Consider renaming them to something like primary_job_retries or primary_job_extra_attempts for clarity.

🔎 Proposed refactor
-    let primary_job_timeouts: u64 = primary_job_state.tasks_as_iter().map(|t| u64::from(t.attempt() - 1)).sum();
-    assert_eq!(primary_job_timeouts, 0, "job A should not have timeouts");
+    let primary_job_retries: u64 = primary_job_state.tasks_as_iter().map(|t| u64::from(t.attempt() - 1)).sum();
+    assert_eq!(primary_job_retries, 0, "job A should not have retries");

And similarly for the secondary job (lines 182-183):

-    let secondary_job_timeouts: u64 = secondary_job_state.tasks_as_iter().map(|t| u64::from(t.attempt() - 1)).sum();
-    assert_eq!(secondary_job_timeouts, 0, "job B should not have timeouts");
+    let secondary_job_retries: u64 = secondary_job_state.tasks_as_iter().map(|t| u64::from(t.attempt() - 1)).sum();
+    assert_eq!(secondary_job_retries, 0, "job B should not have retries");

Also applies to: 182-183
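
Here is a small sketch of the computation being renamed. It assumes a hypothetical `Task` with a 1-based `attempt()` counter of type `u32`; the actual integer type is not shown in the diff. `saturating_sub` is a defensive choice of this sketch: with a plain `attempt() - 1`, an attempt of 0 would underflow and panic in debug builds.

```rust
/// Hypothetical task view with a 1-based attempt counter.
pub struct Task {
    attempt: u32,
}

impl Task {
    pub fn new(attempt: u32) -> Self {
        Task { attempt }
    }

    pub fn attempt(&self) -> u32 {
        self.attempt
    }
}

/// Extra attempts beyond the first, summed over all tasks of a job.
pub fn total_retries<'a>(tasks: impl Iterator<Item = &'a Task>) -> u64 {
    // `saturating_sub` maps a (hypothetical) attempt of 0 to 0 retries instead of underflowing.
    tasks.map(|t| u64::from(t.attempt().saturating_sub(1))).sum()
}
```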

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a40aeb3 and 1b1149a.

📒 Files selected for processing (1)
  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-jobmanager/src/tests/two_jobs_test.rs
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Organize code in Cargo workspace with 4 crates: icegate-common, icegate-query, icegate-ingest, and icegate-maintain
🧬 Code graph analysis (1)
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (4)
crates/icegate-jobmanager/src/execution/worker.rs (1)
  • new (98-115)
crates/icegate-jobmanager/src/execution/job_manager.rs (1)
  • new (26-28)
crates/icegate-jobmanager/src/tests/common/manager_env.rs (2)
  • new (16-33)
  • storage (90-92)
crates/icegate-jobmanager/src/core/registry.rs (1)
  • new (29-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Benchmark
  • GitHub Check: Build Release
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Security Audit
  • GitHub Check: Test beta on ubuntu-latest
🔇 Additional comments (1)
crates/icegate-jobmanager/src/tests/two_jobs_test.rs (1)

146-155: The >= assertions appear defensive but lack justification given the timeout verification.

The comment states that "actual task execution may exceed the expected one" due to concurrency, but the test verifies that primary_job_timeouts == 0 and secondary_job_timeouts == 0 (lines 167-168, 182-183). Since all tasks completed on first attempt with no retries, what concurrent scenario would cause execution counts to exceed expectations?

Either change the assertions to == (which should be true if no retries occurred), or explicitly document which concurrent modification edge case the >= is protecting against.

@s-prosvirnin s-prosvirnin merged commit 83f7b2f into main Jan 6, 2026
8 checks passed
3 participants