refactor + test: query response serialization #1165
base: main
Conversation
Walkthrough

The changes update the error handling and response formatting in the query handling and response generation logic. A new error variant for Arrow errors is introduced in the `QueryError` enum.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/response.rs (2)
31-59: Potential performance improvements

Writing entire `RecordBatch` data into a single in-memory buffer with `arrow_json::ArrayWriter` may become memory-intensive for large result sets. Consider a streaming or chunk-based approach if you anticipate very large batches. Also note that setting `fill_null` to `true` will require iterating over all objects to find missing fields, which can increase overhead.
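As a rough illustration of the streaming idea (not the PR's code), one could emit newline-delimited JSON batch by batch with `arrow_json::LineDelimitedWriter`, so no single buffer has to hold the entire result set. The helper name and error handling below are assumptions, and the output shape changes from a single JSON array to NDJSON, so this is only a sketch of the approach:

```rust
use arrow_array::RecordBatch;
use arrow_json::LineDelimitedWriter;

/// Hypothetical helper: stream record batches as newline-delimited JSON so
/// only one batch's worth of JSON is materialized at a time.
fn write_batches_streaming<W: std::io::Write>(
    batches: &[RecordBatch],
    out: W,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut writer = LineDelimitedWriter::new(out);
    for batch in batches {
        // Each row is written out as its own JSON line as the batch is processed.
        writer.write(batch)?;
    }
    writer.finish()?;
    Ok(())
}
```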
62-219: Great test coverage

These tests thoroughly cover multiple scenarios, including handling of null values, empty batches, and field inclusion. However, there is repeated code for creating record batches. Consider refactoring common setup steps into a helper function to reduce duplication.
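For the duplication point, a small constructor along these lines (the schema and names are illustrative, not taken from the PR) could centralize the batch setup:

```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, Int64Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};

/// Hypothetical test helper: build a two-column batch so each test only
/// states the values it cares about.
fn sample_batch(ids: Vec<i64>, names: Vec<Option<&str>>) -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));
    RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int64Array::from(ids)) as ArrayRef,
            Arc::new(StringArray::from(names)),
        ],
    )
    .expect("columns must match the schema")
}
```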
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- src/handlers/http/query.rs (6 hunks)
- src/response.rs (2 hunks)
- src/utils/arrow/mod.rs (0 hunks)
💤 Files with no reviewable changes (1)
- src/utils/arrow/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (8)
src/response.rs (2)
19-19: No issues

The newly added import for `QueryError` looks correct.

21-21: Imports for JSON are appropriate

No concerns with the added `serde_json` imports.

src/handlers/http/query.rs (6)
22-22: Explicit import

Introducing `ArrowError` is consistent with the new `Arrow` error variant.
35-35: New debug logging

No concerns with adding `error` and `debug` from `tracing`.
105-105: Accurate timing logs for quick count queries

Recording and logging elapsed time here is beneficial for performance monitoring.

Also applies to: 126-132
143-143: End-to-end timing for standard queries

Consistent usage of timing metrics here as well ensures comprehensive performance tracking.

Also applies to: 145-145, 150-152
318-321: Enhanced error handling

Adding `Arrow` and `Json` variants to `QueryError` clearly distinguishes error origins and improves debuggability.
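For readers outside the diff, the two variants might look roughly like the sketch below; it assumes the enum derives `thiserror::Error` (the `#[from]` conversion mentioned later on this page suggests it does) and elides the existing variants:

```rust
use arrow_schema::ArrowError;

#[derive(Debug, thiserror::Error)]
pub enum QueryError {
    // ...existing variants (Execute, DataFusion, etc.) elided...
    /// Errors raised while converting record batches to JSON.
    #[error("Arrow error: {0}")]
    Arrow(#[from] ArrowError),
    /// Errors raised while serializing the final JSON payload.
    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),
}
```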
327-329: Status code alignment

Treating `QueryError::Json` and `QueryError::Arrow` as internal server errors follows the existing pattern for `Execute` errors and keeps error handling consistent.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/response.rs (2)
31-54: Consider adding error handling documentation

The `to_json` method propagates errors from the `record_batches_to_json` function, but there's no documentation about what specific error types might be returned. Consider adding documentation that explains the potential error scenarios.

```diff
 /// Convert into a JSON response
+///
+/// # Errors
+///
+/// Returns a `QueryError` if:
+/// - There's an error converting the record batches to JSON (Arrow error)
+/// - JSON serialization fails
 pub fn to_json(&self) -> Result<Value, QueryError> {
```
32-54: Consider adding support for custom null value representation

The method currently uses `Value::Null` for missing fields when `fill_null` is true. Consider extending this functionality to allow clients to specify a custom representation for null values (e.g., empty string, zero, etc.), which could be useful for certain data processing pipelines.
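One way that extension could look, sketched against `serde_json` only (the function and parameter names are made up for illustration):

```rust
use serde_json::{Map, Value};

/// Hypothetical variant of the fill step: missing fields are filled with a
/// caller-supplied value instead of always `Value::Null`.
fn fill_missing_fields(
    objects: &mut [Map<String, Value>],
    field_names: &[String],
    fill_with: &Value,
) {
    for object in objects.iter_mut() {
        for name in field_names {
            object
                .entry(name.clone())
                .or_insert_with(|| fill_with.clone());
        }
    }
}
```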
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- src/response.rs (2 hunks)
- src/utils/arrow/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/arrow/mod.rs
🧰 Additional context used
🧬 Code Definitions (1)
src/response.rs (2)
src/handlers/http/query.rs (1)
- `query` (70-154)

src/utils/arrow/mod.rs (1)
- `record_batches_to_json` (52-66)
🔇 Additional comments (11)
src/response.rs (11)
21-22: Updated import statements to align with new implementation

The imports have been updated to directly use `serde_json::Value` and the `record_batches_to_json` utility function, which is appropriate for the new JSON-focused implementation.
32-34: New `to_json` method correctly converts QueryResponse to serializable JSON

The method appropriately handles the conversion of Arrow `RecordBatch` data to JSON through the `record_batches_to_json` utility function. The return type of `Result<Value, QueryError>` is well-suited for JSON serialization, allowing for proper error propagation.
37-40: Null value handling is implemented correctly

When `fill_null` is true, the code properly iterates through each JSON object and ensures that any missing field gets a `Value::Null` entry. This maintains data consistency for clients expecting all fields to be present.
44-51: Response structure correctly implements the configured output format

The logic to conditionally include fields with the records (when `with_fields` is true) or return just the records is well-implemented. The use of the `json!` macro ensures proper serialization of both formats.
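The conditional structure being described could be summarized as follows, purely as an illustration (the key names `fields` and `records` mirror the review wording, not necessarily the actual payload):

```rust
use serde_json::{json, Map, Value};

/// Illustrative sketch of the two output shapes described above.
fn shape_response(
    records: Vec<Map<String, Value>>,
    fields: &[String],
    with_fields: bool,
) -> Value {
    let records: Vec<Value> = records.into_iter().map(Value::Object).collect();
    if with_fields {
        // Field names are listed alongside the records.
        json!({ "fields": fields, "records": records })
    } else {
        // Only the records array is returned.
        Value::Array(records)
    }
}
```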
57-66: Well-organized test module setup

The test module is properly structured with appropriate imports, including all the necessary Arrow types for comprehensive testing.
67-77: Test for empty record batches is comprehensive

This test correctly verifies that empty record batches convert to an empty array in JSON format, which is an important edge case to handle.
79-108: Thorough test for basic record batch conversion

The test creates a diverse set of data types (integers, floats, and strings) to ensure proper JSON conversion for various data types. The assertions correctly verify that the structure and values match the expected JSON output.
110-142: Test for including fields in the output is well-implemented

This test properly verifies the behavior when `with_fields` is set to true, ensuring that both the fields array and records are included in the response as expected.
144-177: Test for handling record batches with nullable fields correctly verifies behavior

This test thoroughly validates that nullable fields are handled correctly when `fill_null` is false. The test creates records with null values in the string array and verifies that these fields are omitted from the JSON output, which is the expected behavior.
179-211: Test for null value filling is comprehensive

This test validates that when `fill_null` is set to true, null values are explicitly included in the JSON output rather than being omitted. The assertion correctly checks for the presence of explicit null values in the output.
33-35: ❓ Verification inconclusive

Verify error propagation from `record_batches_to_json`

The code now directly propagates errors from `record_batches_to_json` with the `?` operator, but it's not clear how Arrow errors are mapped to `QueryError`. Ensure the `QueryError` enum has been updated to handle Arrow errors appropriately.

🏁 Script executed:

```shell
#!/bin/bash
# Check if QueryError has been updated to handle Arrow errors
rg -A 20 "pub enum QueryError" --type rust
```

Length of output: 1325
Clarify error mapping for Arrow errors

The updated code now uses the `?` operator to propagate errors from `record_batches_to_json` as-is. Please verify that any errors it may return, particularly those stemming from Arrow operations, are correctly converted into the appropriate `QueryError`. Currently, the `QueryError` enum (defined in `src/handlers/http/query.rs`) includes a variant for `DataFusionError` (with a `#[from]` conversion), which may already cover Arrow-related errors if they are wrapped in a `DataFusionError`. However, if `record_batches_to_json` can yield errors that are not encapsulated by `DataFusionError`, an explicit mapping (or a new `QueryError` variant) might be needed.

- Confirm that `record_batches_to_json` returns errors convertible into `QueryError` (e.g., via `DataFusionError`).
- Update `QueryError` to handle direct Arrow errors if such cases exist.
Fixes #XXXX.
Description

Improves the testability of `QueryResponse`.

This PR has:
Summary by CodeRabbit