Skip to content

Conversation

@frisbeeman
Copy link
Member

@frisbeeman frisbeeman commented Jan 6, 2026

Add handling for by and without clauses in range and vector aggregation queries, enabling proper label grouping and attribute filtering. Extend test suite to cover grouping scenarios.

Summary by CodeRabbit

  • Improvements

    • Unified query planning with centralized output post-processing for more consistent query results
    • Enhanced grouping and label/attribute handling; label mapping now maps account_id → cloud_account_id
    • Server startup can notify the actual bound port for reliable ephemeral-port use (new startup variant exported)
  • Tests

    • Vastly expanded end-to-end tests for unwrap aggregations and cross-type grouping semantics
    • Enabled a previously skipped unwrap test and migrated tests to ephemeral-port startup
  • Documentation

    • Added comprehensive Rust development guidelines and code-style guidance

✏️ Tip: You can customize this high-level summary in your review settings.

@frisbeeman frisbeeman requested review from a team and s-prosvirnin January 6, 2026 19:22
@coderabbitai
Copy link

coderabbitai bot commented Jan 6, 2026

Walkthrough

Centralized DataFusion planner post-processing; added grouping/filtering helpers with grouping pushdown/merge and unwrap-aggregation validations; extended label mapping. Added extensive cross-type grouping tests and switched test servers to ephemeral ports with a port-notify server helper. Added RUST.md and small docs updates.

Changes

Cohort / File(s) Summary
Core planner refactor & grouping logic
crates/icegate-query/src/logql/datafusion/planner.rs
Reworked plan() to compute a DataFrame then apply a unified transform_output_columns; added helpers build_grouping_with_filtered_attrs, push_grouping_to_inner_expr, merge_groupings; improved grouping pushdown/merge and unwrap-range-aggregation validation; updated vector-aggregation handling, MAP/label serialization (uses make_array) and mapping (account_id -> cloud_account_id).
Loki server API & ephemeral port notify
crates/icegate-query/src/loki/server.rs, crates/icegate-query/src/loki/mod.rs
Added run_with_port_tx(..., Option<oneshot::Sender<u16>>), made run(...) delegate to it; log actual bound address and optionally send bound port; re-exported run_with_port_tx.
Test harness: ephemeral port startup
crates/icegate-query/tests/loki/harness.rs
TestServer::start() now uses ephemeral port (port 0) and receives actual bound port via oneshot from server; constructs base URL from notified port; updated call sites to use run_with_port_tx.
Tests: adapt to ephemeral ports & grouping coverage
crates/icegate-query/tests/loki/*
.../unwrap.rs, .../labels.rs, .../metrics.rs, .../query_range.rs, .../tenant.rs, .../trace.rs, .../mod.rs, crates/icegate-query/tests/loki/grouping.rs
Replaced fixed-port start(port) calls with start() across tests; enabled a previously ignored quantile unwrap test; added extensive new unwrap/grouping tests via grouping.rs and registered the module.
Docs & style guide
AGENTS.md, RUST.md
Added references to RUST.md in AGENTS.md and introduced comprehensive RUST.md with Rust development and AI-assisted coding guidelines.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test Harness
  participant Server as Loki Server (run_with_port_tx)
  participant OS as OS / TCP stack
  participant Receiver as Test (oneshot receiver)

  Test ->> Server: start_with_port_tx(port_tx)
  Server ->> OS: bind(listener, port=0)
  OS -->> Server: listener(bound_addr)
  Server ->> Receiver: port_tx.send(bound_port)
  Receiver -->> Test: receive(bound_port)
  Test ->> Test: construct base URL with bound_port
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • s-prosvirnin
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'GH-34: Add filtered grouping support for range aggregations' directly and clearly describes the main change: implementing filtered grouping (by/without clauses) for range aggregations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)

1365-1372: Rename test data column from account_id to cloud_account_id to align with schema definition.

The logs_schema (crates/icegate-common/src/schema.rs line 66) defines field 2 as cloud_account_id, and the labels test (crates/icegate-query/tests/loki/labels.rs:52) expects cloud_account_id in output. However, the unwrap test (line 53) creates an account_id array. While the query mapping "account_id" → "cloud_account_id" in planner.rs is correct for user-facing queries, test data should use the actual schema column name cloud_account_id (lines 53, 177 in unwrap.rs and similar locations in harness.rs at lines 152, 235, 289, 385).

🤖 Fix all issues with AI Agents
In @crates/icegate-query/src/logql/datafusion/planner.rs:
- Around line 687-699: The check early in planner.rs forbids grouping for
RangeAggregationOp::RateCounter (the agg.grouping.is_some() match that returns
an error for RangeAggregationOp::RateCounter), which makes the later handling of
RateCounter inside the grouping branch (the RateCounter branch around lines
800-821 handling `if let Some(ref grouping) = agg.grouping`) unreachable; fix by
removing the dead RateCounter grouping branch (delete the RateCounter arm inside
the grouping handling) or, if RateCounter must be allowed in that vector-wrapped
case, change the earlier validation to not return for
RangeAggregationOp::RateCounter so the later `if let Some(ref grouping) =
agg.grouping` code can execute — update only the validation match on
RangeAggregationOp::RateCounter or remove the dead branch accordingly.
🧹 Nitpick comments (4)
crates/icegate-query/tests/loki/unwrap.rs (2)

799-849: Misleading test name: test_sum_over_time_with_grouping_rejected expects success.

The test name suggests rejection behavior, but the test expects HTTP 200 and validates successful grouping. The comments correctly explain that grouping is handled at the vector aggregation level, but the function name doesn't reflect this.

Consider renaming to better describe the actual behavior being tested:

🔎 Suggested rename
-async fn test_sum_over_time_with_grouping_rejected() -> Result<(), Box<dyn std::error::Error>> {
+async fn test_sum_over_time_with_vector_level_grouping() -> Result<(), Box<dyn std::error::Error>> {

851-900: Same naming issue: test_rate_counter_with_grouping_rejected expects success.

Similar to the previous test, this name suggests rejection but the test expects HTTP 200 success.

🔎 Suggested rename
-async fn test_rate_counter_with_grouping_rejected() -> Result<(), Box<dyn std::error::Error>> {
+async fn test_rate_counter_with_vector_level_grouping() -> Result<(), Box<dyn std::error::Error>> {
crates/icegate-query/src/logql/datafusion/planner.rs (2)

481-584: Code duplication with build_grouping_components.

build_filtered_grouping_exprs and build_grouping_components share nearly identical label categorization logic (iterating labels, mapping names, separating indexed vs. attribute labels). Consider extracting the common label classification into a shared helper.

🔎 Suggested refactor
/// Categorize labels into indexed columns and attribute labels based on grouping.
fn categorize_grouping_labels(
    grouping: &crate::logql::common::Grouping,
) -> (Vec<&'static str>, Vec<&str>, bool) {
    use crate::logql::common::Grouping;

    match grouping {
        Grouping::By(labels) => {
            let mut indexed_cols = Vec::new();
            let mut attr_labels = Vec::new();
            for label in labels {
                let mapped_name = Self::map_label_to_internal_name(&label.name);
                if Self::is_top_level_field(mapped_name) {
                    indexed_cols.push(mapped_name);
                } else {
                    attr_labels.push(label.name.as_str());
                }
            }
            (indexed_cols, attr_labels, false) // false = is_without
        }
        Grouping::Without(labels) => {
            // ... similar logic for Without
            (excluded_indexed, excluded_attrs, true) // true = is_without
        }
    }
}

Then both build_filtered_grouping_exprs and build_grouping_components can call this helper.


1244-1250: Consider using a more appropriate error variant.

QueryError::NotImplemented implies a feature gap, but this case is a user error (requesting a label that doesn't exist). Consider using a validation or plan error variant if available:

-                        return Err(QueryError::NotImplemented(format!(
-                            "Label '{}' not available in aggregation result.",
+                        return Err(QueryError::Plan(format!(
+                            "Label '{}' not available in aggregation result",
                             l.name
                         )));
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02a16aa and 38fffde.

📒 Files selected for processing (2)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/unwrap.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Security Audit
🔇 Additional comments (5)
crates/icegate-query/tests/loki/unwrap.rs (3)

683-738: LGTM - Well-structured test for by grouping.

The test correctly validates that avg by (service_name) produces results with only the specified label and excludes others like account_id. The dual check for service or service_name appropriately handles the Loki compatibility mapping.


740-797: LGTM - Comprehensive test for without grouping.

The test correctly validates that max without (account_id) excludes account_id while retaining other labels like service and level. Good coverage of the without clause semantics.


902-955: LGTM - Good coverage for multi-label grouping.

The test correctly validates that min by (service_name, level) produces results with both specified labels. This ensures the multi-label grouping path works correctly.

crates/icegate-query/src/logql/datafusion/planner.rs (2)

922-934: Good defensive column selection after aggregation.

The code correctly checks for column existence in the schema before adding to the select list, avoiding runtime errors when grouping changes the available columns.


1378-1381: LGTM - Correct extension of top-level field check.

Extending is_top_level_field to include tenant_id and timestamp is appropriate since these are indexed columns that aren't in LOG_INDEXED_ATTRIBUTE_COLUMNS but are valid top-level fields.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/icegate-query/src/logql/datafusion/planner.rs (1)

1197-1252: Unused variable _udf in grouping handling.

The _udf variable is destructured at line 1198 but never used in this code path. The range aggregation already handles attribute filtering via serialization, so the vector aggregation only needs to group by indexed columns. Consider removing the unused UDF creation to avoid confusion.

🔎 Remove unused variable
-            let (labels, _udf, is_without) = match grouping {
-                Grouping::By(labels) => (labels, ScalarUDF::from(super::udf::MapKeepKeys::new()), false),
-                Grouping::Without(labels) => (labels, ScalarUDF::from(super::udf::MapDropKeys::new()), true),
+            let (labels, is_without) = match grouping {
+                Grouping::By(labels) => (labels, false),
+                Grouping::Without(labels) => (labels, true),
             };
crates/icegate-query/tests/loki/grouping.rs (1)

397-465: Test validates By + By merge correctly, but comment may be misleading.

The test correctly validates the merged grouping behavior. However, the comment at lines 409-410 ("According to code comments, intersection should be computed, but actual behavior appears to use the outer grouping specification") may cause confusion.

Looking at the planner code (lines 1140-1148), By + By performs a union of labels (not intersection), so by (service_name, account_id) outer + by (account_id) inner = by (service_name, account_id). The test assertions correctly reflect this union behavior. Consider clarifying the comment to match the actual merge semantics.

🔎 Clarify the comment
     // Query: sum by (service_name, account_id) (avg_over_time({service_name="api"} | unwrap value [1m]) by (account_id))
     // Tests same-type grouping (By + By)
-    // Current behavior: outer By(service_name, account_id) is used (both labels present in output)
-    // Note: According to code comments, intersection should be computed, but actual behavior
-    // appears to use the outer grouping specification
+    // Expected behavior: inner By(account_id) and outer By(service_name, account_id) are merged (union)
+    // Result: By(service_name, account_id) - both labels present in output
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38fffde and 3dd607e.

📒 Files selected for processing (3)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/mod.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/mod.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/mod.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/mod.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/tests/loki/grouping.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Build Release
  • GitHub Check: Benchmark
  • GitHub Check: Security Audit
🔇 Additional comments (11)
crates/icegate-query/src/logql/datafusion/planner.rs (6)

481-584: Well-structured helper for filtered grouping expressions.

The logic correctly handles both By and Without grouping semantics, including proper filtering of indexed columns and MAP attributes. The skip of trace_id/span_id binary columns in the Without case (lines 556-558) is appropriate since DataFusion can't group by FixedSizeBinary types.


586-654: LGTM!

The helper correctly decomposes grouping into its component parts (indexed columns and filtered attributes expression), enabling the caller to selectively apply each component. The consistent handling of By/Without logic aligns well with build_filtered_grouping_exprs.


787-813: LGTM!

The grouping logic correctly:

  1. Replaces the attributes column with filtered version before aggregation when grouping is specified
  2. Builds appropriate grouping expressions using the new helpers
  3. Falls back to default label grouping for the no-grouping case

896-908: Good defensive column selection.

Checking schema for column presence before including in select expressions prevents runtime errors when grouping has filtered out certain columns.


1114-1181: Cross-type grouping merge logic with improved test coverage.

The merge logic correctly implements outer-grouping precedence for cross-type merges (By+Without and Without+By). The new tests in grouping.rs now validate these semantics:

  • test_cross_type_grouping_by_inner_without_outer: Tests inner By + outer Without
  • test_cross_type_grouping_without_inner_by_outer: Tests inner Without + outer By
  • test_same_type_grouping_by_merge: Tests By + By merge

This addresses the previous review comment requesting test coverage for cross-type grouping merge cases.


1355-1362: LGTM!

The new account_id → cloud_account_id mapping is consistent with the existing label aliasing pattern and aligns with the expected output in the new grouping tests.

crates/icegate-query/tests/loki/mod.rs (1)

8-20: LGTM!

Clean module organization with appropriate documentation for the new grouping test module.

crates/icegate-query/tests/loki/grouping.rs (4)

41-239: Well-designed test data helper.

The build_grouping_test_record_batch function creates comprehensive test data with varied label combinations (service_name, node, pod, instance) that effectively exercise the grouping merge semantics. The proper handling of Arrow schema construction from Iceberg schema and MAP field configuration is correct.


241-278: LGTM!

Standard Iceberg write pattern with proper transaction handling. The unique suffix based on nanosecond timestamp ensures file name uniqueness across test runs.


280-332: Good test for By(inner) + Without(outer) cross-type grouping.

The test correctly validates that the outer Without(pod) grouping takes precedence, excluding the pod label from results. The conditional check at line 313 (if !result.is_empty()) is acceptable since the test primarily verifies error-free execution and correct structure when data exists.


334-395: LGTM!

Comprehensive test for Without(inner) + By(outer) cross-type grouping. The dual-check for service/service_name at lines 377-380 correctly handles potential Loki-compatible label aliasing in output.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @crates/icegate-query/src/logql/datafusion/planner.rs:
- Around line 481-584: The Grouping::By path in build_filtered_grouping_exprs
currently allows binary indexed columns (trace_id, span_id) to be added to
grouping which DataFusion can't handle; update the loop that builds indexed_cols
in build_filtered_grouping_exprs to detect mapped_name == "trace_id" or
"span_id" and return a clear error (or remove them if you prefer silent
filtering) instead of pushing them into indexed_cols, and make the same
validation/filtering change in build_grouping_components so any user-specified
by(...) label that maps to FixedSizeBinary is rejected/filtered consistently
(refer to symbols Self::map_label_to_internal_name, Self::is_top_level_field,
LOG_INDEXED_ATTRIBUTE_COLUMNS, build_filtered_grouping_exprs and
build_grouping_components).
- Around line 1198-1252: The By-branch allows binary columns like
trace_id/span_id to be used in grouping which causes runtime failures; update
the By handling in the same block that builds indexed_attributes to either (a)
skip binary columns (e.g., if mapped_name == "trace_id" || mapped_name ==
"span_id" continue) or (b) validate and return a QueryError::NotImplemented with
a clear message when a user requests grouping by a binary column; use the same
helpers referenced here (Self::map_label_to_internal_name,
Self::is_top_level_field, LOG_INDEXED_ATTRIBUTE_COLUMNS, and indexed_attributes)
so the By branch mirrors the Without branch’s binary-column filtering/validation
logic.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dd607e and ee37601.

📒 Files selected for processing (2)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/icegate-query/tests/loki/grouping.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Security Audit
  • GitHub Check: Benchmark
  • GitHub Check: Build Release
🔇 Additional comments (5)
crates/icegate-query/src/logql/datafusion/planner.rs (5)

687-699: LGTM - Grouping validation is correct.

The validation correctly enforces that sum_over_time and rate_counter do not support grouping clauses, per Loki spec. This prevents the dead code issue flagged in previous reviews.


787-813: LGTM - Grouping integration is well-structured.

The logic correctly:

  • Filters attributes before aggregation to ensure last_value preserves the filtered MAP (line 795)
  • Routes through appropriate helpers based on whether grouping is specified
  • Reuses serialized columns for RateCounter without grouping to avoid redundant work

The comment at line 798 accurately notes that RateCounter cannot reach the grouping branch due to validation, which helps maintain code clarity.


897-907: LGTM - Schema-aware selection prevents runtime errors.

The schema-aware column selection is good defensive programming. By checking column existence before selection (lines 900, 905), the code handles cases where grouping specifications result in different output schemas. This prevents potential "column not found" errors at runtime.


1114-1182: LGTM - Grouping merge logic is well-reasoned and tested.

The grouping merge semantics are clearly documented and handle all cases appropriately:

  • Same-type merges combine label sets (lines 1140-1159)
  • Cross-type merges prioritize outer grouping (lines 1165-1178) with clear rationale
  • Non-supporting operations skip grouping pushdown (lines 1122-1125)

The comprehensive comments explain the design decisions, and per the previous review discussion, integration tests were added in crates/icegate-query/tests/loki/grouping.rs that validate these behaviors.


1355-1362: LGTM - Label mapping addition is consistent.

The account_idcloud_account_id mapping follows the same pattern as existing Loki/Grafana compatibility mappings (levelseverity_text, serviceservice_name). The change is straightforward and consistent with the codebase conventions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @crates/icegate-query/src/logql/datafusion/planner.rs:
- Around line 629-634: The iteration in build_grouping_components over
LOG_INDEXED_ATTRIBUTE_COLUMNS should mirror build_filtered_grouping_exprs by
skipping binary columns when grouping_kind is Without; update
build_grouping_components to exclude "trace_id" and "span_id" (or adjust
excluded_indexed_cols) in the filter/closure so the map that builds indexed_cols
does not include those binary columns for the Without case, keeping behavior
consistent with build_filtered_grouping_exprs.

In @crates/icegate-query/tests/loki/grouping.rs:
- Around line 313-328: The test currently skips label assertions when `result`
is empty via `if !result.is_empty()` which can mask failures; replace the guard
with an explicit non-empty assertion (e.g., `assert!(!result.is_empty(),
"<context> expected non-empty result after writing data")`) before iterating
over `result` so the test fails loudly if grouping produced no series; apply the
same change to the other test blocks flagged (the similar guards around `result`
handling at the other locations referenced in this file).
🧹 Nitpick comments (4)
crates/icegate-query/src/loki/formatters.rs (1)

419-430: Consider using hex::encode or a more efficient hex conversion.

The fold-based hex encoding works correctly but allocates a new String on each byte. Consider using a pre-sized buffer or a hex encoding library if available in the crate dependencies.

🔎 More efficient hex encoding
         // trace or span id are stored as fixed-size binary arrays
         if let Some(arr) = batch.column(*idx).as_any().downcast_ref::<FixedSizeBinaryArray>() {
             if !arr.is_null(row) {
                 // Convert binary trace_id/span_id to hex string
-                use std::fmt::Write;
-                let hex_value = arr.value(row).iter().fold(String::new(), |mut acc, b| {
-                    let _ = write!(acc, "{b:02x}");
-                    acc
-                });
+                let bytes = arr.value(row);
+                let mut hex_value = String::with_capacity(bytes.len() * 2);
+                for b in bytes {
+                    use std::fmt::Write;
+                    let _ = write!(hex_value, "{b:02x}");
+                }
                 labels.insert(interner.intern(col_name), interner.intern(&hex_value));
             }
         }
crates/icegate-query/tests/loki/grouping.rs (2)

119-126: Hardcoded schema field index is fragile.

Using arrow_schema.field(11) assumes the attributes field is always at index 11. If the schema changes (fields added/removed/reordered), this will silently break or access the wrong field.

🔎 Use field lookup by name instead
-    let attributes_field = arrow_schema.field(11);
+    let attributes_field = arrow_schema
+        .fields()
+        .iter()
+        .find(|f| f.name() == "attributes")
+        .expect("attributes field must exist in schema");

Also applies to line 669 in build_binary_grouping_test_batch.


557-599: Consider extracting shared Iceberg write logic to reduce duplication.

write_binary_grouping_test_logs and write_grouping_test_logs share nearly identical Iceberg writer setup and commit logic. A shared helper could reduce maintenance burden.

🔎 Extract common write logic
async fn write_test_batch_to_iceberg(
    table: &Table,
    catalog: &Arc<dyn Catalog>,
    batch: RecordBatch,
    prefix: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    use std::time::{SystemTime, UNIX_EPOCH};

    let unique_suffix = format!("{}-{}", prefix, SystemTime::now().duration_since(UNIX_EPOCH)?.as_nanos());

    let location_generator = DefaultLocationGenerator::new(table.metadata().clone())?;
    let file_name_generator = DefaultFileNameGenerator::new(unique_suffix, None, DataFileFormat::Parquet);
    // ... rest of shared logic
}
crates/icegate-query/src/logql/datafusion/planner.rs (1)

783-793: Unused _indexed_cols variable.

Line 785 extracts (_indexed_cols, filtered_attrs_expr) from build_grouping_components, but only filtered_attrs_expr is used. The indexed columns are recomputed in build_filtered_grouping_exprs. This works but is slightly inefficient - consider if the helpers can be consolidated.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee37601 and ff11753.

📒 Files selected for processing (3)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/src/loki/formatters.rs
  • crates/icegate-query/tests/loki/grouping.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/src/loki/formatters.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/src/loki/formatters.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/src/loki/formatters.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/src/loki/formatters.rs
  • crates/icegate-query/tests/loki/grouping.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Security Audit
  • GitHub Check: Build Release
  • GitHub Check: Benchmark
  • GitHub Check: Test beta on ubuntu-latest
🔇 Additional comments (5)
crates/icegate-query/src/logql/datafusion/planner.rs (5)

681-693: LGTM!

The validation correctly prevents grouping for range aggregations that don't support it (sum_over_time, rate_counter), with a clear error message.


890-901: LGTM!

Defensive schema checking ensures only columns present after aggregation are selected, preventing runtime errors when grouping filters out columns.


1129-1173: LGTM! Cross-type grouping merge logic is well-documented and tested.

The merge semantics are clearly explained with examples:

  • Same-type merges combine label sets (union)
  • Cross-type merges have outer grouping take complete precedence

This addresses the previous review concern about test coverage, which is now provided in grouping.rs.


1346-1352: LGTM!

The account_idcloud_account_id mapping follows the established pattern for translating user-facing label names to internal column names.


504-518: Binary columns in By grouping cause inconsistency and potential runtime errors.

The By case (lines 504-537) adds indexed columns directly to grouping expressions without checking for binary columns:

grouping_exprs.extend(indexed_cols.iter().map(|c| col(*c)));

This contrasts with the Without case (line 554), which explicitly skips binary columns with the comment: "Skip trace_id and span_id as they are FixedSizeBinary and can't be grouped."

Additionally, hex encoding for binary columns is applied only in the final output select (lines 1357-1375), which occurs after aggregation, not before. This means GROUP BY trace_id or GROUP BY span_id attempts to group by FixedSizeBinary columns directly, which will fail at DataFusion execution time.

Either:

  1. Add a filter in the By case (lines 504-537) to skip binary columns like the Without case does, or
  2. Apply hex encoding before the GROUP BY operation (cast FixedSizeBinary to Binary and encode to hex strings) to allow binary columns in grouping expressions.

The test test_grouping_by_binary_columns expects sum by (trace_id) and sum by (span_id) queries to work, but the current code would fail these at execution time without the hex encoding being applied before grouping.

@frisbeeman frisbeeman marked this pull request as draft January 6, 2026 23:36
@frisbeeman frisbeeman linked an issue Jan 7, 2026 that may be closed by this pull request
@frisbeeman frisbeeman marked this pull request as ready for review January 7, 2026 23:18
Add handling for `by` and `without` clauses in range and vector aggregation queries, enabling proper label grouping and attribute filtering. Extend test suite to cover grouping scenarios.
Refactor grouping logic to ensure outer groupings take precedence in cross-type merges. Add a dedicated test module to validate By and Without grouping combinations in nested aggregations.
Enable `FixedSizeBinary` columns for use in GROUP BY operations. Update grouping logic to include these columns and add a new test to verify their functionality.
Enable `FixedSizeBinary` columns for use in GROUP BY operations. Update grouping logic to include these columns and add a new test to verify their functionality.
@frisbeeman frisbeeman force-pushed the gh-34-unwrap-grouping branch from a1648cf to 8bfe760 Compare January 7, 2026 23:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/icegate-query/src/logql/datafusion/planner.rs (2)

779-805: Consider consolidating grouping helpers to avoid duplicate computation.

Lines 783-791 call both build_grouping_components (to get filtered_attrs_expr) and build_filtered_grouping_exprs (to get grouping expressions). Both functions internally compute the indexed columns list, resulting in duplicate work.

♻️ Suggested consolidation

Consider refactoring to a single helper that returns all needed components:

/// Returns (grouping_exprs, filtered_attrs_expr)
fn build_grouping_with_filtered_attrs(
    grouping: &Grouping,
    include_timestamp: bool,
) -> (Vec<Expr>, Expr) {
    // Compute indexed_cols and attr_labels once
    // Build both grouping_exprs and filtered_attrs from shared data
}

This would reduce code duplication and improve performance for complex grouping specifications.


1215-1234: Silent handling of non-existent labels in by clause.

When a user specifies by (non_existent_label) and the label doesn't map to a top-level field, but the attributes column exists, the code silently ignores the label (lines 1221-1233). This could mask user errors where they typo a label name. Consider logging a warning or adding validation that the label actually exists in the attributes MAP.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff11753 and a1648cf.

📒 Files selected for processing (2)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/tests/loki/grouping.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (14)
  • new (47-51)
  • new (107-111)
  • new (364-368)
  • new (1864-1868)
  • new (1954-1958)
  • new (2069-2073)
  • new (2245-2249)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
🔇 Additional comments (15)
crates/icegate-query/tests/loki/grouping.rs (8)

1-9: LGTM! Well-documented test module with appropriate clippy allowances.

The module documentation clearly explains the test purpose, and the clippy allows are reasonable for test code that needs expect_used, print_stdout for debugging, and other test-specific patterns.


43-239: Well-structured test data builder with comprehensive label combinations.

The function creates diverse test data with multiple label combinations (service_name, node, pod, instance) that properly exercises the grouping logic. The attributes MAP construction follows the correct Arrow pattern with MapBuilder.


242-278: LGTM!

The writer setup follows the standard Iceberg pattern and correctly commits the data files via transaction.


280-335: LGTM! Comprehensive test for cross-type By/Without grouping merge.

The test correctly validates that when an inner by grouping is combined with an outer without grouping, the outer without takes precedence. The non-empty result assertion ensures failures are not masked by empty results.


337-401: LGTM!

Good coverage of the reverse cross-type merge case. The assertion handling both "service" and "service_name" correctly accounts for Loki label name mapping.


403-473: LGTM!

Correctly tests same-type grouping merge (By + By = union of labels). The assertions properly handle label name mapping variations.


616-745: LGTM! Test data correctly sets up duplicate binary values for grouping verification.

The test creates distinct and duplicate trace_id/span_id values to verify that GROUP BY correctly aggregates rows with matching binary column values.


475-569: Verify DataFusion supports GROUP BY on FixedSizeBinary columns.

This test validates grouping by trace_id and span_id columns through metric aggregation queries (sum by (trace_id), sum by (span_id)). These queries route through plan_range_aggregationbuild_label_grouping_exprs, which includes LOG_INDEXED_ATTRIBUTE_COLUMNS directly without hex-encoding transformation. The binary columns remain as FixedSizeBinary types during the GROUP BY operation—hex encoding only occurs in the final transform_output_columns output step. Confirm DataFusion's aggregate() can handle FixedSizeBinary columns in grouping expressions; if not, the test will fail at runtime.

crates/icegate-query/src/logql/datafusion/planner.rs (7)

70-88: LGTM! Clean restructuring centralizes output transformation.

The unified transform_output_columns call at line 87 ensures consistent binary column hex encoding and level alias for both log and metric queries.


586-646: LGTM! Helper correctly computes grouping components.

The separation of indexed column names and filtered attributes expression enables flexible usage. The Without case correctly falls back to the full attributes MAP when no exclusions are specified.


679-691: LGTM! Correct validation per Loki spec.

The validation correctly rejects by/without grouping for sum_over_time and rate_counter operations, which don't support grouping per Loki specification.


888-900: LGTM! Correct post-aggregation column selection.

The schema-based filtering at lines 892-893 correctly handles the case where only grouped columns exist after aggregation. This ensures the select doesn't fail on columns not present in the result.


1127-1171: LGTM! Well-documented grouping merge logic.

The merge semantics are clearly explained:

  • Same-type merges combine label sets (union)
  • Cross-type merges defer to outer grouping (more restrictive wins)

This aligns with the integration tests in grouping.rs.


1344-1351: LGTM! Label mapping extended for account_id.

The new mapping enables users to use account_id in queries while internally referencing the cloud_account_id column, maintaining consistency with other label name mappings.


480-579: Binary column GROUP BY support is properly implemented.

DataFusion 51 supports GROUP BY on FixedSizeBinary columns. The hex encoding in transform_output_columns is correct—it casts FixedSizeBinary to Binary first (required by the encode() function), then encodes to hex with encode(cast(col, Binary), 'hex'). Binary columns (trace_id, span_id) are handled consistently throughout the query planner: grouped as indexed columns when specified in by clauses, then converted to hex strings in output. The implementation is sound.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/icegate-query/tests/loki/unwrap.rs (1)

799-849: Test name is misleading - considers renaming for clarity.

The test is named test_sum_over_time_with_grouping_rejected but it expects success (lines 830-831). The comments explain that grouping is handled at the vector aggregation level when sum_over_time (which doesn't support direct grouping) is wrapped in an outer aggregation.

Consider renaming to something like test_sum_over_time_grouping_via_vector_aggregation to better reflect the actual behavior being tested.

📝 Suggested rename
 #[tokio::test]
-async fn test_sum_over_time_with_grouping_rejected() -> Result<(), Box<dyn std::error::Error>> {
+async fn test_sum_over_time_grouping_via_vector_aggregation() -> Result<(), Box<dyn std::error::Error>> {

Apply the same pattern to test_rate_counter_with_grouping_rejected at line 852.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1648cf and 8bfe760.

📒 Files selected for processing (4)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/mod.rs
  • crates/icegate-query/tests/loki/unwrap.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/icegate-query/tests/loki/mod.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/tests/loki/grouping.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Benchmark
🔇 Additional comments (6)
crates/icegate-query/src/logql/datafusion/planner.rs (3)

889-899: LGTM! Good defensive programming for schema-aware column selection.

After aggregation with grouping, this code correctly selects only the columns that exist in the resulting schema. This handles cases where grouping has filtered out certain indexed columns, avoiding errors from referencing non-existent columns.


1348-1348: LGTM! Label mapping addition.

The account_idcloud_account_id mapping follows the existing pattern for label name compatibility (e.g., levelseverity_text).


480-579: DataFusion fully supports GROUP BY on FixedSizeBinary columns.

The web search confirms that DataFusion's hash-grouping operator supports FixedSizeBinary as a GROUP BY key. Historical issues with some operations (min/max accumulators) were fixed via arrow-rs/DataFusion updates. The implementation is correct: trace_id and span_id are included in grouping expressions (lines 480–579), and transform_output_columns converts them to hex strings only after grouping completes. The test test_grouping_by_binary_columns explicitly validates this functionality.

crates/icegate-query/tests/loki/unwrap.rs (1)

684-955: LGTM! Comprehensive grouping test coverage for unwrap aggregations.

The new tests thoroughly validate grouping behavior for unwrap-based range aggregations:

  • by grouping with single and multiple labels
  • without grouping to exclude specific labels
  • Vector aggregation handling when inner operations don't support direct grouping

The assertions correctly verify label presence/absence based on grouping specifications.

crates/icegate-query/tests/loki/grouping.rs (2)

475-569: Binary column grouping test validates previously questioned functionality.

This test explicitly verifies grouping by trace_id (FixedSizeBinary(16)) and span_id (FixedSizeBinary(8)), which past review comments questioned. The test structure is sound and will definitively verify whether DataFusion supports FixedSizeBinary in GROUP BY operations.

If this test passes, it confirms the planner's approach of allowing binary columns in grouping expressions is correct. If it fails, it will surface the issue that needs fixing.


280-473: LGTM! Comprehensive cross-type grouping merge validation.

These tests thoroughly validate the grouping merge semantics implemented in the planner:

  • Cross-type merges (By+Without, Without+By): Outer grouping takes complete precedence, inner is discarded
  • Same-type merges (By+By): Labels are merged (union)

The assertions correctly verify that:

  • Excluded labels don't appear in results (Without semantics)
  • Only specified labels appear (By semantics)
  • Label mappings work correctly (service_name → service, account_id → cloud_account_id)

This directly addresses the past review concerns about cross-type grouping merge behavior documented in earlier review comments.

…red attributes handling. Rename and unify methods for clarity and efficiency while avoiding duplicate computation. Update related tests to align with changes.
@frisbeeman
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
crates/icegate-query/src/logql/datafusion/planner.rs (1)

743-746: Minor: Incorrect line reference in comment.

The comment at line 745 references "lines 687-699" for the RateCounter validation, but the actual validation is at lines 635-645.

📝 Suggested fix
             // Return the grouping expressions
-            // Note: RateCounter cannot reach this branch due to validation at lines 687-699
+            // Note: RateCounter cannot reach this branch due to validation at lines 635-645
             grouping_exprs
crates/icegate-query/tests/loki/unwrap.rs (2)

718-720: Prefer assert! over panic! in test assertions.

Using panic! directly bypasses the test framework's assertion reporting. Using assert! provides better diagnostics and consistency.

♻️ Suggested fix
-    if result.is_empty() {
-        panic!("Result is empty!");
-    } else {
+    assert!(!result.is_empty(), "Result should not be empty");
+    {
         let series = &result[0];

774-776: Same refactor applies to other tests.

The same panic!("Result is empty!") pattern appears in multiple tests. Consider using assert!(!result.is_empty(), "...") consistently across all these test functions for better test diagnostics.

Also applies to: 835-837, 886-888, 936-938

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebf5ff1 and e00255e.

📒 Files selected for processing (2)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/unwrap.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🔇 Additional comments (8)
crates/icegate-query/src/logql/datafusion/planner.rs (5)

70-88: LGTM! Clean refactoring of the plan method.

Consolidating the transform_output_columns call outside the match expression reduces duplication and ensures consistent post-processing for both log and metric queries.


496-600: Well-structured grouping helper with good documentation.

The consolidation of grouping logic into a single helper that returns both grouping expressions and filtered attributes in one pass is a solid approach that avoids duplicate computation. The handling of edge cases (empty attribute lists) is properly addressed.


633-645: Correct validation of grouping support per Loki spec.

The validation correctly rejects direct grouping on sum_over_time and rate_counter operations, which don't support by/without clauses according to Loki specification. Users can still achieve grouping by wrapping these in vector aggregations.


1082-1128: Comprehensive grouping merge logic with clear rationale.

The merge strategy is well-designed:

  • Same-type merges correctly combine label sets while avoiding duplicates
  • Cross-type merges appropriately give precedence to the outer aggregation's grouping specification

The detailed comments explaining the rationale for each case improve maintainability.


1299-1306: LGTM! Label mapping extension for cloud_account_id.

The new account_idcloud_account_id mapping follows the established pattern for Loki/Grafana label compatibility.

crates/icegate-query/tests/loki/unwrap.rs (3)

799-849: Good test coverage for vector aggregation grouping fallback.

This test validates the important case where sum_over_time (which doesn't support direct grouping) can still be grouped via a wrapping vector aggregation like sum by (service_name) (sum_over_time(...)). The test comments clearly explain this distinction.


851-900: Comprehensive test for rate_counter vector aggregation grouping.

This test validates the interaction between rate_counter (which doesn't support direct by/without clauses) and vector aggregation grouping. Good coverage for the validation logic in the planner.


902-955: Good test for multi-label grouping.

Testing min by (service_name, level) validates that the grouping logic correctly handles multiple labels including mapped labels (levelseverity_text).

…rror clarity. Update planner comment for consistency.
@frisbeeman
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/icegate-query/tests/loki/unwrap.rs (1)

454-487: Track the ignored quantile test.

The test is appropriately marked as ignored with a clear explanation. However, this represents missing functionality that should be tracked.

Would you like me to help create an issue to track implementing the quantile parameter extraction in the parser?

crates/icegate-query/src/logql/datafusion/planner.rs (1)

633-645: Clarify comment scope for grouping validation.

The comment on line 634 mentions "rate" and "absent_over_time" but the validation only checks SumOverTime and RateCounter. Since this code path is specific to unwrap-based range aggregations, the comment should clarify this scope to avoid confusion.

Suggested comment clarification
-        // 1.5. Validate grouping support
-        // Per Loki spec: sum_over_time, rate, rate_counter, and absent_over_time don't support grouping
+        // 1.5. Validate grouping support for unwrap-based range aggregations
+        // Per Loki spec: sum_over_time and rate_counter don't support grouping with unwrap expressions
         if agg.grouping.is_some() {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e00255e and 449ac6d.

📒 Files selected for processing (2)
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/unwrap.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
🧬 Code graph analysis (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • name (59-61)
  • name (119-121)
  • name (400-402)
  • name (1876-1878)
  • name (1989-1991)
  • name (2152-2154)
  • name (2257-2259)
🔇 Additional comments (12)
crates/icegate-query/tests/loki/unwrap.rs (6)

1-42: LGTM - Standard test setup.

The clippy allow directives are appropriate for test code, and all imports are relevant to the test scenarios.


43-236: LGTM - Well-structured test data helpers.

The functions properly construct Arrow record batches with numeric attributes for unwrap testing and use Iceberg transactions for data persistence. The unique suffix generation using nanoseconds is appropriate given the port-based test isolation.


238-682: Good test coverage for basic unwrap aggregations.

The tests comprehensively cover various unwrap aggregation functions with proper HTTP response validation. Each test uses a unique port to avoid conflicts.


684-735: LGTM - Comprehensive grouping test with 'by' clause.

The test correctly verifies that the by clause filters labels appropriately, checking for the presence of the grouped label (service_name) and absence of filtered labels (account_id). The flexible assertion for "service" vs "service_name" properly handles Loki compatibility mappings.


737-791: LGTM - Well-designed 'without' clause test.

The test properly verifies that the without clause excludes specified labels (account_id) while preserving others (service_name, severity_text). This validates the inverse grouping semantics.


793-940: LGTM - Excellent coverage of vector aggregation grouping patterns.

These tests thoroughly validate the behavior of grouping when range aggregations are wrapped in vector aggregations. The comments clearly document the design decision that grouping is handled at the vector aggregation level. The test for multiple label grouping (lines 890-940) adds valuable coverage for complex grouping scenarios.

crates/icegate-query/src/logql/datafusion/planner.rs (6)

69-88: LGTM: Clean refactoring of output transformation.

The restructuring properly centralizes transform_output_columns to apply uniformly after planning, improving maintainability without changing functionality.


480-600: Well-structured grouping helper with proper MAP filtering.

The implementation correctly handles both by and without semantics, using UDFs to filter the attributes MAP and serializing for grouping. The empty-label optimization (lines 539-542) is appropriate.


733-760: LGTM: Proper grouping application with filtered attributes.

The logic correctly handles both grouped and ungrouped cases, with appropriate special handling for RateCounter that reuses existing serialized columns. The attribute filtering before aggregation ensures last_value preserves the correct MAP.


844-854: LGTM: Defensive schema-aware column selection.

Checking column existence before selecting prevents runtime errors when grouping filters out indexed columns. The binary column encoding is correctly deferred to transform_output_columns.


1055-1237: Complex but well-reasoned grouping pushdown and merge logic.

The implementation correctly handles grouping propagation from vector to range aggregations, with appropriate same-type and cross-type merge strategies. The cross-type precedence rules (lines 1108-1125) where outer grouping takes complete precedence are well-documented and align with the principle that the outer operation defines the final output structure.

The defensive schema checks (lines 1157, 1176, 1178-1184) properly validate label availability before grouping.


1303-1303: LGTM: Consistent label mapping addition.

The account_idcloud_account_id mapping follows the established pattern for label name translation and enables user-facing queries to use account_id while the table schema uses cloud_account_id.

…ribute grouping logic

Introduce a `RUST.md` file detailing Rust code standards and best practices for consistent, high-quality development. Upgrade the planner's handling of attribute MAP columns for grouping, ensuring proper type handling and improving the clarity of methods. Add new tests to verify the behavior of attribute-based grouping.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)

69-88: Propagate cast failures for hex encoding instead of silently falling back to original column.

At lines 240 and 304, using unwrap_or_else(|_| col(name)) can pass a non-Binary type to encode(), causing runtime failures that are harder to debug. This pattern is inconsistent with the correct approach at line 191, which uses ? to propagate cast errors during planning.

Proposed fixes

Line 240 (transform_output_columns):

-                            col(name).cast_to(&DataType::Binary, &schema).unwrap_or_else(|_| col(name)),
+                            col(name).cast_to(&DataType::Binary, &schema)?,

Line 304 (plan_distinct_series):

-                            col(c).cast_to(&DataType::Binary, &schema).unwrap_or_else(|_| col(c)),
+                            col(c).cast_to(&DataType::Binary, &schema)?,
🤖 Fix all issues with AI agents
In @crates/icegate-query/src/logql/datafusion/planner.rs:
- Around line 496-605: The vector-aggregation "no grouping" path in
plan_vector_aggregation currently groups by all labels/attributes and expects
COL_ATTR_KEYS/COL_ATTR_VALS, which prevents collapsing series and references
columns that may not exist; change the branch when agg.grouping.is_none() to
group only by the timestamp (grid_timestamp) so aggregations like sum() collapse
across series, do not add or reference COL_ATTR_KEYS or COL_ATTR_VALS there, and
ensure the returned/used "filtered attributes" expression is a safe placeholder
(e.g., col("attributes") or a typed empty map expression) so downstream code
doesn't assume those serialized attr columns were created by an earlier
plan_range_aggregation.

In @crates/icegate-query/tests/loki/grouping.rs:
- Around line 280-335: The tests use fixed ports causing flakiness; update
TestServer::start to accept 0 (or provide a new start_ephemeral) and pass that
port to loki::run so the OS assigns an ephemeral port, capture the actual bound
port from the listener and store/return it on TestServer (update the TestServer
struct to hold the actual_port), and then change all test invocations (e.g., the
call in test_cross_type_grouping_by_inner_without_outer) to pass 0 and use
server.base_url (or the returned actual port) so tests no longer collide.

In @RUST.md:
- Around line 193-214: Update the "Before Committing" section in RUST.md to
match the repo's Makefile targets: replace occurrences of `cargo test`, `cargo
build`, `cargo clippy -- -D warnings`, and `cargo fmt --check` with the
repository Make targets (`make test`, `make ci` or `make build` as appropriate,
`make clippy`, and `make fmt`/`make fmt --check`), and change the Python/WASM
rebuild instructions to call the documented make targets or note the required
commands (e.g., use `make ci` or add a `make develop-python`/`make build-wasm`
target) so the bullets under "Before Committing" reference `make fmt`, `make
clippy`, and `make ci` consistently instead of direct `cargo`/`rustup` commands.
- Around line 7-17: Remove the coercive sentence "If the code is not fully
optimized before handing off to the user, you will be fined $100." from RUST.md
and replace it with a neutral, actionable enforcement statement (e.g., require
follow-up optimization passes or code review escalation) so the section about
"Fully optimized" remains, but without punitive or monetary language; ensure the
new sentence keeps the intent to allow additional optimization passes and review
but uses professional, non-hostile phrasing.
🧹 Nitpick comments (3)
crates/icegate-query/tests/loki/grouping.rs (1)

115-139: Don’t rely on arrow_schema.field(11) for attributes; look up by name.
Index-based access is fragile across schema evolution and makes failures harder to interpret.

Proposed refactor
-    let attributes_field = arrow_schema.field(11);
+    let attributes_field = arrow_schema
+        .field_with_name("attributes")
+        .expect("schema must contain attributes field");

Also applies to: 835-865

RUST.md (1)

18-55: Trim repo-unrelated tech mandates (Pico CSS / Polars / JS rules / “never ingest >10 rows”, etc.).
These constraints aren’t tied to this backend query planner and will age poorly.

Also applies to: 30-41

crates/icegate-query/src/logql/datafusion/planner.rs (1)

496-605: Deduplicate mapped indexed columns in Grouping::By to avoid duplicate group keys.
If the query includes aliases like service and service_name, both map to service_name and can end up duplicated in grouping_exprs.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 825aff5 and 5562138.

📒 Files selected for processing (4)
  • AGENTS.md
  • RUST.md
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/tests/loki/grouping.rs
✅ Files skipped from review due to trivial changes (1)
  • AGENTS.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • RUST.md
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Never proactively create documentation files (*.md) or README files unless explicitly requested

Files:

  • RUST.md
🧠 Learnings (6)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/tests/loki/grouping.rs
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to Cargo.toml : Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use rustfmt for code formatting with configuration in rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make fmt` to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to AGENTS.md : Follow guidelines specified in AGENTS.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • RUST.md
🧬 Code graph analysis (2)
crates/icegate-query/tests/loki/grouping.rs (2)
crates/icegate-query/src/loki/models.rs (1)
  • error (104-111)
crates/icegate-query/src/loki/formatters.rs (6)
  • new (59-63)
  • batch (171-171)
  • batch (195-198)
  • batch (253-253)
  • batch (412-412)
  • batch (504-504)
crates/icegate-query/src/logql/datafusion/planner.rs (2)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • new (47-51)
  • new (107-111)
  • new (364-368)
  • new (1864-1868)
  • new (1954-1958)
  • new (2069-2073)
  • new (2245-2249)
crates/icegate-query/src/logql/common.rs (2)
  • as_str (20-27)
  • as_str (49-58)
🪛 LanguageTool
RUST.md

[style] ~24-~24: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...termfor terminal applications/TUIs. - Useaxum` for creating any web servers or ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ning applications with a web-based front end interface, e.g. compiling to WASM or...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
RUST.md

25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


32-32: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


34-34: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


41-41: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


43-43: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


44-44: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


45-45: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


46-46: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


47-47: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


48-48: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


49-49: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


50-50: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


51-51: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


52-52: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


53-53: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


54-54: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Security Audit
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Build Release
🔇 Additional comments (4)
crates/icegate-query/tests/loki/grouping.rs (1)

128-138: Verify MapBuilder field naming matches the Arrow Map schema you expect.
You’re overriding MapFieldNames and then .with_keys_field(...) / .with_values_field(...); if the schema’s entry/key/value field names differ, this can silently break downstream reads.

Also applies to: 849-859

crates/icegate-query/src/logql/datafusion/planner.rs (3)

534-542: Confirm arrow_cast(..., "List(Utf8)") is the correct DataFusion type string.
If the type string is off, this will fail at planning/runtime; consider a helper constant or DataFusion-native typed empty list construction if available.


1068-1152: Grouping merge/pushdown rules are clear and test-backed.
The cross-type precedence (outer wins) matches the new E2E tests and makes the behavior deterministic.


1277-1284: account_idcloud_account_id mapping looks correct for Loki label compatibility.
This should reduce user confusion while still hitting the physical column.

…ngly

Replace hardcoded port numbers in `TestServer::start` with OS-assigned ephemeral ports to avoid port conflicts during parallel test execution. Update all affected tests to reflect this change. Additionally, enhance planner grouping logic to include improved attribute handling and collapse series behavior.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @crates/icegate-query/src/logql/datafusion/planner.rs:
- Line 1497: The calls to make_array(label_literals) in apply_drop and
apply_keep reference make_array from datafusion::functions_nested but that
import is missing; add the import use
datafusion::functions_nested::make_array::make_array into the scope of both
apply_drop and apply_keep (or at the module/top of file if preferred) so the
symbol is available when building filtered_attrs (used in
udf.call(vec![col("attributes"), make_array(label_literals)]) ).
🧹 Nitpick comments (3)
RUST.md (2)

25-54: Markdown nested list indentation uses 4 spaces instead of 2.

The static analysis tool flagged inconsistent indentation for nested list items. Markdownlint expects 2-space indentation for nested items, but these use 4 spaces.

🔧 Fix nested list indentation

Change indentation from 4 spaces to 2 spaces for all nested bullet points under lines 25-54 and 181. For example:

 - Use `axum` for creating any web servers or HTTP APIs.
-    - Keep request handlers async, returning `Result<Response, AppError>` to centralize error handling.
-    - Use layered extractors and shared state structs instead of global mutable data.
+  - Keep request handlers async, returning `Result<Response, AppError>` to centralize error handling.
+  - Use layered extractors and shared state structs instead of global mutable data.

30-30: Minor grammatical fix: "front end" → "front-end".

When used as an adjective (e.g., "front-end interface"), hyphenation is grammatically correct.

📝 Suggested fix
-If designing applications with a web-based front end interface, e.g. compiling to WASM or using `dioxus`:
+If designing applications with a web-based front-end interface, e.g. compiling to WASM or using `dioxus`:
crates/icegate-query/tests/loki/grouping.rs (1)

43-239: Consider extracting shared test data builders.

The build_grouping_test_record_batch function shares significant structural similarity with build_test_record_batch in unwrap.rs. Both construct Arrow arrays for the same schema columns. While acceptable for test code, you might consider extracting a shared builder utility in the harness if more test files need similar functionality.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5562138 and 6a78750.

📒 Files selected for processing (12)
  • RUST.md
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • crates/icegate-query/src/loki/mod.rs
  • crates/icegate-query/src/loki/server.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/tests/loki/harness.rs
  • crates/icegate-query/tests/loki/labels.rs
  • crates/icegate-query/tests/loki/metrics.rs
  • crates/icegate-query/tests/loki/query_range.rs
  • crates/icegate-query/tests/loki/tenant.rs
  • crates/icegate-query/tests/loki/trace.rs
  • crates/icegate-query/tests/loki/unwrap.rs
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/tests/loki/metrics.rs
  • crates/icegate-query/src/loki/mod.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/tests/loki/tenant.rs
  • crates/icegate-query/tests/loki/query_range.rs
  • crates/icegate-query/tests/loki/labels.rs
  • crates/icegate-query/tests/loki/trace.rs
  • crates/icegate-query/tests/loki/harness.rs
  • crates/icegate-query/src/loki/server.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/tests/loki/metrics.rs
  • crates/icegate-query/src/loki/mod.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/tests/loki/tenant.rs
  • crates/icegate-query/tests/loki/query_range.rs
  • crates/icegate-query/tests/loki/labels.rs
  • crates/icegate-query/tests/loki/trace.rs
  • crates/icegate-query/tests/loki/harness.rs
  • crates/icegate-query/src/loki/server.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/tests/loki/metrics.rs
  • crates/icegate-query/src/loki/mod.rs
  • crates/icegate-query/tests/loki/unwrap.rs
  • crates/icegate-query/tests/loki/tenant.rs
  • crates/icegate-query/tests/loki/query_range.rs
  • crates/icegate-query/tests/loki/labels.rs
  • crates/icegate-query/tests/loki/trace.rs
  • crates/icegate-query/tests/loki/harness.rs
  • crates/icegate-query/src/loki/server.rs
  • crates/icegate-query/tests/loki/grouping.rs
  • crates/icegate-query/src/logql/datafusion/planner.rs
  • RUST.md
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Never proactively create documentation files (*.md) or README files unless explicitly requested

Files:

  • RUST.md
🧠 Learnings (9)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to crates/icegate-common/src/schema.rs : Iceberg tables (logs, spans, events, metrics) must use tenant_id partitioning, ZSTD compression, and MAP(VARCHAR,VARCHAR) for attributes

Applied to files:

  • crates/icegate-query/tests/loki/grouping.rs
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use rustfmt for code formatting with configuration in rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to Cargo.toml : Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make fmt` to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make ci` to execute all CI checks (check, fmt, clippy, test, audit)

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make audit` to perform security audits and use `make install` to install cargo-audit

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to CLAUDE.md : Follow guidelines specified in CLAUDE.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to AGENTS.md : Follow guidelines specified in AGENTS.md

Applied to files:

  • RUST.md
🧬 Code graph analysis (11)
crates/icegate-query/tests/loki/metrics.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/src/loki/mod.rs (1)
crates/icegate-query/src/loki/server.rs (2)
  • run (23-29)
  • run_with_port_tx (39-71)
crates/icegate-query/tests/loki/unwrap.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/tests/loki/tenant.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/tests/loki/query_range.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/tests/loki/labels.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/tests/loki/trace.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/tests/loki/harness.rs (1)
crates/icegate-query/src/loki/server.rs (1)
  • run_with_port_tx (39-71)
crates/icegate-query/src/loki/server.rs (2)
crates/icegate-query/tests/loki/harness.rs (1)
  • oneshot (109-109)
crates/icegate-common/src/config.rs (1)
  • port (25-25)
crates/icegate-query/tests/loki/grouping.rs (1)
crates/icegate-query/tests/loki/harness.rs (1)
  • start (60-136)
crates/icegate-query/src/logql/datafusion/planner.rs (2)
crates/icegate-query/src/logql/datafusion/udf.rs (7)
  • new (47-51)
  • new (107-111)
  • new (364-368)
  • new (1864-1868)
  • new (1954-1958)
  • new (2069-2073)
  • new (2245-2249)
crates/icegate-query/src/logql/common.rs (2)
  • as_str (20-27)
  • as_str (49-58)
🪛 LanguageTool
RUST.md

[style] ~24-~24: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...termfor terminal applications/TUIs. - Useaxum` for creating any web servers or ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ning applications with a web-based front end interface, e.g. compiling to WASM or...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
RUST.md

25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


32-32: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


34-34: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


41-41: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


43-43: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


44-44: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


45-45: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


46-46: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


47-47: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


48-48: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


49-49: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


50-50: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


51-51: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


52-52: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


53-53: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


54-54: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Build Release
  • GitHub Check: Security Audit
  • GitHub Check: Test beta on ubuntu-latest
🔇 Additional comments (35)
RUST.md (1)

1-218: Comprehensive and well-structured Rust coding guidelines.

The document provides thorough coverage of Rust best practices including error handling, type system usage, concurrency patterns, and security considerations. The pre-commit checklist aligns well with the project's make ci workflow.

crates/icegate-query/src/logql/datafusion/planner.rs (6)

70-88: Clean refactor centralizing post-processing for both Log and Metric paths.

The unified transform_output_columns call after the match ensures consistent hex-encoding and level alias handling across all query types.


496-605: Well-structured grouping helper with proper By/Without semantics.

The function correctly:

  • Separates indexed columns from MAP attributes
  • Uses typed empty array via arrow_cast to avoid type inference issues
  • Applies appropriate UDFs (MapKeepKeys/MapDropKeys) for filtering

One observation: the function always invokes the UDF even when attr_labels is empty in the By case (line 542). While this works correctly (the UDF will return an empty MAP), consider if the empty-array path could be optimized to skip the UDF call entirely for marginal performance gains.


1154-1223: Vector aggregation correctly handles grouping pushdown and series collapse.

The implementation properly:

  • Pushes grouping to inner expressions when supported
  • Filters attributes before aggregation using the filtered MAP
  • Omits attributes from output when collapsing all series (no grouping)

638-650: Good validation for unsupported grouping combinations.

The early validation that sum_over_time and rate_counter don't support by/without clauses with unwrap expressions aligns with Loki semantics and provides a clear error message.


1285-1292: Label mapping for account_idcloud_account_id is valid and properly integrated.

The cloud_account_id field is a required indexed column in the schema (defined across logs, spans, events, and metrics tables) and is used as an identity partition field. This mapping is consumed by formatters and validated by existing tests. The aliasing behavior is expected downstream.


1097-1152: Cross-type grouping merge in nested aggregations needs verification against Loki behavior.

The merge logic completely discards inner grouping for cross-type cases (By + Without or Without + By), using only the outer specification. While the code documents this design rationale, there are no test cases covering this scenario and no explicit confirmation that this matches Loki's behavior. Consider adding test cases validating the behavior of queries like:

  • sum by (service) (count without (instance) ({...}))
  • sum without (pod) (count by (node))

to ensure the cross-type merge semantics match Loki's actual aggregation behavior.

crates/icegate-query/tests/loki/tenant.rs (1)

17-17: Test server startup updated to use ephemeral ports.

The change to TestServer::start() without explicit port arguments aligns with the harness refactor for OS-assigned ephemeral ports. This enables parallel test execution without port conflicts.

Also applies to: 134-134

crates/icegate-query/tests/loki/trace.rs (1)

18-18: Consistent ephemeral port startup across all trace tests.

All six test functions updated to use TestServer::start() without port arguments, maintaining consistency with the test harness changes.

Also applies to: 67-67, 111-111, 151-151, 186-186, 228-228

crates/icegate-query/tests/loki/query_range.rs (1)

17-17: Query range tests updated to ephemeral port startup.

Both test functions now use TestServer::start() for dynamic port allocation, consistent with other test file changes.

Also applies to: 76-76

crates/icegate-query/tests/loki/metrics.rs (1)

18-18: Metrics tests consistently updated for ephemeral ports.

All four metric test functions (test_count_over_time_metric, test_bytes_over_time_metric, test_rate, test_bytes_rate) now use TestServer::start() for dynamic port allocation.

Also applies to: 87-87, 143-143, 195-195

crates/icegate-query/src/loki/mod.rs (1)

15-15: Public API extended to export run_with_port_tx.

The new export enables tests to receive the actual bound port via a oneshot channel, supporting ephemeral port testing. The existing run function remains unchanged for production use.

crates/icegate-query/tests/loki/labels.rs (1)

17-17: LGTM - Test server startup updated to use ephemeral ports.

The migration from fixed ports to TestServer::start() with ephemeral port assignment is consistent across all three test functions, enabling safe parallel test execution.

Also applies to: 77-77, 170-170

crates/icegate-query/tests/loki/unwrap.rs (5)

240-240: LGTM - Consistent migration to ephemeral port test server.

All test functions correctly updated to use TestServer::start() without port arguments.

Also applies to: 288-288, 322-322, 373-373, 422-422, 456-456, 489-489, 524-524, 558-558, 592-592, 642-642


683-734: LGTM - Well-structured test for by grouping semantics.

The test properly validates that:

  1. The query succeeds with vector aggregation wrapping range aggregation
  2. The grouped label (service or service_name) is present
  3. Excluded labels (account_id) are correctly filtered out

The dual-label check (service or service_name) appropriately handles the Loki compatibility mapping.


736-790: LGTM - Comprehensive test for without grouping semantics.

Good validation that without (account_id) excludes the specified label while preserving others like service and level/severity_text.


792-887: LGTM - Good coverage of vector aggregation with grouping.

Both sum_over_time and rate_counter tests correctly validate that grouping is handled at the vector aggregation level. The comments explaining that range aggregations don't support grouping directly but vector aggregations do are helpful for understanding the query semantics.


889-939: LGTM - Multi-label grouping test.

Validates grouping by multiple labels (service_name, level) correctly, checking that both are present in the output.

crates/icegate-query/tests/loki/grouping.rs (8)

1-9: LGTM - Well-documented test module.

Good module-level documentation explaining the purpose of testing cross-type grouping merge semantics.


280-335: LGTM - Cross-type grouping test (By inner, Without outer).

Good test validating that outer Without(pod) takes precedence over inner By(node), correctly excluding the pod label from results.


337-401: LGTM - Cross-type grouping test (Without inner, By outer).

Validates that outer By(service_name) overrides inner Without(instance), ensuring only the specified label appears in output while others are excluded.


403-473: LGTM - Same-type grouping merge test.

Good coverage of By+By merge semantics with union of labels from both inner and outer groupings.


475-569: LGTM - Binary column grouping tests.

Valuable test coverage for FixedSizeBinary columns (trace_id, span_id) in GROUP BY operations. The test data setup with duplicate values is smart for validating proper grouping behavior.


616-770: LGTM - Attribute MAP grouping tests.

Comprehensive tests for the original issue with MAP attribute grouping:

  1. Mixed indexed columns and MAP attributes
  2. MAP-only attribute grouping

The assertions correctly verify label presence/absence and the expected number of distinct series.


903-1001: LGTM - Critical regression test for series collapse.

This test validates an important behavior: vector aggregation without explicit grouping should collapse all series into one. The detailed assertions and comments make the expected behavior clear.


1003-1098: LGTM - Regression test for grouping preservation.

Good companion test ensuring that explicit grouping still produces separate series per group (2 series for "api" and "backend").

crates/icegate-query/tests/loki/harness.rs (5)

43-44: LGTM - Required imports for ephemeral port support.


56-74: LGTM - Clean ephemeral port implementation.

The updated signature and documentation clearly explain the purpose. Using port 0 for OS-assigned ephemeral ports is the correct approach for avoiding port conflicts in parallel test execution.


108-121: LGTM - Port notification channel with timeout.

The 10-second timeout is reasonable for server startup. The pattern of using oneshot::channel to receive the bound port is clean and idiomatic.


123-125: Intentional memory leak for test lifetime management.

The Box::leak pattern for the tempdir is documented and acceptable in test code. The directory will be cleaned up when the process exits. Consider adding a note that this approach trades off per-test cleanup for simplicity.


127-136: LGTM - Dynamic base URL construction.

Using the actual bound port from the server notification ensures the test client connects to the correct endpoint.

crates/icegate-query/src/loki/server.rs (4)

5-5: LGTM - Required import for port notification.


23-29: LGTM - Backward-compatible refactoring.

The original run function now delegates to run_with_port_tx with None, maintaining the existing API while enabling the new functionality.


31-59: LGTM - Well-designed port notification mechanism.

Key design aspects:

  1. Optional port_tx allows both production and test use cases
  2. Port is sent after successful bind, ensuring validity
  3. Using let _ = tx.send(...) correctly handles the case where the receiver may have dropped
  4. Documentation clearly explains the ephemeral port use case

61-71: LGTM - Server lifecycle unchanged.

The graceful shutdown and logging behavior remain intact.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/icegate-query/src/logql/datafusion/planner.rs (1)

481-604: Well-structured grouping helper.

The build_grouping_with_filtered_attrs function cleanly consolidates grouping logic for both By and Without cases. A few observations:

  1. The typed empty array creation at line 537 using arrow_cast is a good defensive measure to avoid DataFusion type inference issues with empty arrays.

  2. The filtered_attrs.clone() calls on lines 550-552 and 595-598 create multiple expression clones. For large expressions, consider building the serialization expressions once.

♻️ Optional: Reduce cloning for serialization expressions
-                if attr_labels.is_empty() {
-                    // No attributes to group by - use empty MAP serialization
-                    grouping_exprs.push(lit("").alias(COL_ATTR_KEYS));
-                    grouping_exprs.push(lit("").alias(COL_ATTR_VALS));
-                } else {
-                    // Serialize filtered MAP for grouping
-                    grouping_exprs
-                        .push(array_to_string(map_keys(filtered_attrs.clone()), lit("|||")).alias(COL_ATTR_KEYS));
-                    grouping_exprs
-                        .push(array_to_string(map_values(filtered_attrs.clone()), lit("|||")).alias(COL_ATTR_VALS));
-                }
-
-                (grouping_exprs, filtered_attrs)
+                let (keys_expr, vals_expr) = if attr_labels.is_empty() {
+                    (lit("").alias(COL_ATTR_KEYS), lit("").alias(COL_ATTR_VALS))
+                } else {
+                    (
+                        array_to_string(map_keys(filtered_attrs.clone()), lit("|||")).alias(COL_ATTR_KEYS),
+                        array_to_string(map_values(filtered_attrs.clone()), lit("|||")).alias(COL_ATTR_VALS),
+                    )
+                };
+                grouping_exprs.push(keys_expr);
+                grouping_exprs.push(vals_expr);
+
+                (grouping_exprs, filtered_attrs)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a78750 and 60d7908.

📒 Files selected for processing (1)
  • crates/icegate-query/src/logql/datafusion/planner.rs
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{rs,toml}

📄 CodeRabbit inference engine (AGENTS.md)

Use cargo build for debug builds, cargo build --release for release builds, and specific binary builds with cargo build --bin <name>

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run all tests with cargo test, specific tests with cargo test test_name, and use --nocapture flag to show test output
Use make fmt to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml
Use make clippy to run the linter with warnings as errors
Run make audit to perform security audits and use make install to install cargo-audit
Run make ci to execute all CI checks (check, fmt, clippy, test, audit)
Use rustfmt for code formatting with configuration in rustfmt.toml

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
crates/icegate-query/src/logql/**

📄 CodeRabbit inference engine (AGENTS.md)

Regenerate LogQL ANTLR4 parser using make install and make gen in crates/icegate-query/src/logql/ directory (requires Java)

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • crates/icegate-query/src/logql/datafusion/planner.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Benchmark
  • GitHub Check: Build Release
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Security Audit
🔇 Additional comments (10)
crates/icegate-query/src/logql/datafusion/planner.rs (10)

18-18: LGTM!

The make_array import is correctly added from datafusion::functions_nested::make_array and is used appropriately throughout the file for constructing array expressions.


71-88: LGTM!

Good refactoring to centralize the transform_output_columns call after the match expression. This eliminates duplication and ensures consistent output transformation for both Log and Metric paths.


637-649: Good validation for unsupported grouping combinations.

The early validation for sum_over_time and rate_counter with grouping clauses is well-placed and provides a clear error message. This prevents confusing downstream errors and matches Loki semantics.


847-866: LGTM!

The conditional column selection with schema existence checks is well-implemented. The compound condition at line 861 correctly handles the case where grouping is specified and attributes exist, ensuring serialized MAP columns are created for downstream vector aggregation.


1067-1094: LGTM!

The grouping pushdown logic is well-designed. Checking supports_grouping before pushing down ensures that sum_over_time and rate_counter (which don't support grouping) are handled correctly at the vector aggregation level instead.


1096-1151: Well-documented merge semantics with intentional design choice.

The cross-type merge behavior (lines 1136-1149) where outer grouping takes complete precedence is a reasonable design decision that's well-documented. This ensures predictable behavior when combining by and without clauses.

One consideration: The quadratic duplicate checking (line 1115, 1125) is acceptable since label lists are typically small. If this becomes a performance concern with large label sets, consider using a HashSet for lookups.


1153-1184: LGTM!

The vector aggregation changes correctly:

  1. Push grouping to inner range aggregation when supported
  2. Use filtered attributes for grouping when specified
  3. Collapse to timestamp-only grouping when no grouping clause is present

The preserve_attrs flag cleanly controls whether attributes are included in the output, matching LogQL semantics where ungrouped aggregations produce a single series without labels.


1496-1496: LGTM!

Consistent use of the imported make_array function for constructing label arrays in apply_drop.


1535-1535: LGTM!

Same consistent pattern applied in apply_keep.


1284-1291: No action needed—cloud_account_id is already a valid indexed column.

The mapping from account_id to cloud_account_id is correct. cloud_account_id is defined in LOG_INDEXED_ATTRIBUTE_COLUMNS in crates/icegate-common/src/schema.rs and is used throughout the codebase as both an indexed column and partition field.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @RUST.md:
- Around line 23-26: The nested Markdown bullets under the "axum" section (the
lines containing "Keep request handlers async, returning `Result<Response,
AppError>`...", "Use layered extractors...", "Add `tower` middleware...", and
"Offload CPU-bound work...") and the sub-bullet at the `.env` storage point are
indented with 4 spaces; change those nested list lines to use 2-space
indentation to match standard Markdown list conventions and satisfy markdownlint
MD007, keeping the bullet text unchanged and preserving surrounding list
context.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60d7908 and 657020d.

📒 Files selected for processing (1)
  • RUST.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Never proactively create documentation files (*.md) or README files unless explicitly requested

Files:

  • RUST.md
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • RUST.md
🧠 Learnings (9)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to Cargo.toml : Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use rustfmt for code formatting with configuration in rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make fmt` to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make ci` to execute all CI checks (check, fmt, clippy, test, audit)

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make audit` to perform security audits and use `make install` to install cargo-audit

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to CLAUDE.md : Follow guidelines specified in CLAUDE.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to AGENTS.md : Follow guidelines specified in AGENTS.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:42.425Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: GEMINI.md:0-0
Timestamp: 2025-12-15T16:08:42.425Z
Learning: For universal AI agent guidance applicable to all AI assistants, refer to AGENTS.md

Applied to files:

  • RUST.md
🪛 LanguageTool
RUST.md

[style] ~22-~22: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...r JSON serialization/deserialization. - Use axum for creating any web servers or ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
RUST.md

23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


153-153: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Security Audit
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Build Release
🔇 Additional comments (1)
RUST.md (1)

186-186: Ensure file ends with a newline character.

As per coding guidelines, all .md files must end with a newline. Verify that the file properly terminates with a newline after the final content.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
RUST.md (1)

20-22: Fix markdown indentation and reduce repetitive sentence structure.

Two minor issues:

  1. Lines 20–22: Three successive bullet points begin with "Use," which is repetitive. Consider rewording at least one of them for variety.
  2. Line 153: Nested list item has 4-space indentation but markdownlint expects 2 spaces for consistency with the Markdown convention.
🔧 Proposed fixes

Lines 20–22 – Reword one bullet for variety:

- Use `cargo` for project management, building, and dependency management.
- Use `serde` with `serde_json` for JSON serialization/deserialization.
- Use `axum` for creating any web servers or HTTP APIs.
+- Manage projects with `cargo` for building and dependency management.
+- Serialize/deserialize JSON with `serde` and `serde_json`.
+- Create web servers or HTTP APIs with `axum`.

Line 153 – Fix indentation from 4 spaces to 2:

  - **NEVER** store secrets, API keys, or passwords in code. Only store them in `.env`.
-    - Ensure `.env` is declared in `.gitignore`.
+  - Ensure `.env` is declared in `.gitignore`.

Also applies to: 153-153

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 657020d and 3726bc9.

📒 Files selected for processing (1)
  • RUST.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Never proactively create documentation files (*.md) or README files unless explicitly requested

Files:

  • RUST.md
**/*.{rs,toml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Ensure each file ends with a newline; do not duplicate if it already exists

Files:

  • RUST.md
🧠 Learnings (9)
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use rustfmt for code formatting with configuration in rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to Cargo.toml : Configure strict clippy and rustc lints: forbid unsafe_code, deny missing_docs and dead_code, and enable clippy pedantic/nursery

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make fmt` to check code format; DO NOT run via rustup because it doesn't respect rustfmt.toml

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Use `make clippy` to run the linter with warnings as errors

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make ci` to execute all CI checks (check, fmt, clippy, test, audit)

Applied to files:

  • RUST.md
📚 Learning: 2025-12-27T13:38:58.955Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T13:38:58.955Z
Learning: Applies to **/*.rs : Run `make audit` to perform security audits and use `make install` to install cargo-audit

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to CLAUDE.md : Follow guidelines specified in CLAUDE.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:35.659Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-15T16:08:35.659Z
Learning: Applies to AGENTS.md : Follow guidelines specified in AGENTS.md

Applied to files:

  • RUST.md
📚 Learning: 2025-12-15T16:08:42.425Z
Learnt from: CR
Repo: icegatetech/icegate PR: 0
File: GEMINI.md:0-0
Timestamp: 2025-12-15T16:08:42.425Z
Learning: For universal AI agent guidance applicable to all AI assistants, refer to AGENTS.md

Applied to files:

  • RUST.md
🪛 LanguageTool
RUST.md

[style] ~22-~22: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...r JSON serialization/deserialization. - Use axum for creating any web servers or ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
RUST.md

153-153: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test stable on ubuntu-latest
  • GitHub Check: Build Release
  • GitHub Check: Test beta on ubuntu-latest
  • GitHub Check: Benchmark
  • GitHub Check: Security Audit
🔇 Additional comments (1)
RUST.md (1)

1-186: Verify that RUST.md creation was explicitly requested before merging.

The coding guidelines state: "Never proactively create documentation files (*.md) or README files unless explicitly requested." The PR objectives do not show an explicit request for adding RUST.md. This file should only be merged if:

  1. There is an explicit GitHub issue or requirement requesting Rust coding standards documentation
  2. The PR description or comments provide clear justification for the documentation addition as part of this PR's scope

If this was intentional, please clarify the request context.

@frisbeeman frisbeeman merged commit f0f1324 into main Jan 10, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement by/without grouping support for unwrap range aggregations

3 participants