-
Notifications
You must be signed in to change notification settings - Fork 379
feat: support WITHIN
filter
#5397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: support WITHIN
filter
#5397
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis pull request updates the sqlparser dependency in Cargo.toml and introduces new functionality for filtering operations. A new WithinFilterFunction is added to the math functions, and a within_filter module is integrated into the query engine. The planner now rewrites the logical plan with an asynchronous call to a range plan rewriter, and a new WithinFilterRule is applied to transform filter expressions. Additionally, test cases and SQL scripts are enhanced with new queries and assertions, reflecting these updates. Changes
Sequence Diagram(s)sequenceDiagram
participant Planner as DfLogicalPlanner
participant RangeRewriter as RangePlanRewriter
participant FilterRule as WithinFilterRule
participant Optimizer as Query Optimizer
Planner->>Planner: Convert SQL to initial plan (sql_to_rel)
Planner->>RangeRewriter: Await rewrite(original_plan)
RangeRewriter->>FilterRule: Analyze LogicalPlan for within_filter predicate
FilterRule-->>RangeRewriter: Return transformed LogicalPlan
RangeRewriter-->>Planner: Provide rewritten LogicalPlan
Planner->>Optimizer: Optimize LogicalPlan
Optimizer-->>Planner: Return optimized LogicalPlan
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
@killme2008 @waynexia My idea is
This idea is based on |
pub struct WithinFilterRule; | ||
|
||
impl ExtensionAnalyzerRule for WithinFilterRule { | ||
fn analyze( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert LogicalPlan::Filter(within(column, time))
to LogicalPlan::Filter(column >= start and column < end)
so that the optimizer can push down within
function.
@coderabbitai review |
🧩 Analysis chain🏁 Script executed:
Length of output: 2304 🌐 Web query:
💡 Result:
🏁 Script executed:
Length of output: 20487 @killme2008 Here's my review of the WITHIN filter implementation:
The approach of converting WITHIN to range filters looks good, but needs more work on documentation, error handling, and format support. Would you like me to provide specific code examples for any of these points? ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/log-query/src/log_query.rs (1)
352-361
:🧩 Analysis chain
Fix failing test case for year-only date format.
The test is failing with error "Invalid date format: 2023". While the code claims to support year-only format ("%Y"), the implementation seems to have a bug handling this case.
Let's verify the date parsing implementation:
🏁 Script executed:
#!/bin/bash # Description: Check if year-only format is properly handled in parse_datetime # Search for all test cases and date format handling rg -A 5 'parse_datetime|parse_from_str'Length of output: 10145
Bug in date parsing for year-only inputs in parse_datetime
The issue is confirmed. In src/log-query/src/log_query.rs, the parse_datetime function only computes the end date for the "%Y-%m-%d" case. The "%Y-%m" and "%Y" cases lack corresponding match arms for computing the end date, which leads to failing tests (e.g. with input "2023" causing "Invalid date format: 2023"). Please update the match statement to handle year-only (and year-month) formats appropriately—for example, computing the end date as start + 1 year for "%Y" and as start + duration corresponding to the month for "%Y-%m".
- Review and fix the match arms in src/log-query/src/log_query.rs within parse_datetime.
- Add the missing computations for end date when using "%Y" (and, if needed, "%Y-%m").
🧰 Tools
🪛 GitHub Actions: CI
[error] 359-359: called
Result::unwrap()
on anErr
value: 0: Invalid date format: 2023
🧹 Nitpick comments (4)
src/query/src/within_filter.rs (2)
31-32
: Avoid using unwrap() on the result of from_ymd_opt().
from_ymd_opt() returns an Option, which will be None for invalid dates. Unwrapping can cause runtime panics. Please handle potential None cases explicitly with an error or alternative behavior.- let timestamp = NaiveDate::from_ymd_opt(year, 1, 1).unwrap(); + if let Some(timestamp_naive) = NaiveDate::from_ymd_opt(year, 1, 1) { + let timestamp = Timestamp::from_chrono_date(timestamp_naive) + .map_err(|_| DataFusionError::Execution("Timestamp conversion error".to_string()))?; + ... + } else { + return Err(DataFusionError::Execution("Invalid date".to_string())); + }Also applies to: 46-47
81-84
: Provide a more descriptive error or handle multiple date/time formats.
Currently, the code only supports a simple 4-digit year, returning a "NotImplemented" error for other formats. To improve usability, handle or document additional date/time formats, or return a clearer error message.src/common/function/src/scalars/math.rs (1)
47-47
: Document the newly registered WithinFilterFunction.
While the function is registered here, it lacks explicit doc comments. Adding documentation clarifies its purpose and limitations for maintainers and users.tests/cases/standalone/common/basic.result (1)
34-40
: Make test outputs explicit for within_filter coverage.
The output matches the entire dataset, which might not be intuitive for a “within” single-day filter if the underlying logic eventually expands to more precise dates. Adding a second day or boundary records clarifies correctness.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (9)
Cargo.toml
(1 hunks)src/common/function/src/scalars/math.rs
(2 hunks)src/log-query/src/log_query.rs
(1 hunks)src/query/src/lib.rs
(1 hunks)src/query/src/planner.rs
(1 hunks)src/query/src/query_engine/state.rs
(2 hunks)src/query/src/within_filter.rs
(1 hunks)tests/cases/standalone/common/basic.result
(1 hunks)tests/cases/standalone/common/basic.sql
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/query/src/lib.rs
🧰 Additional context used
🪛 GitHub Actions: CI
src/log-query/src/log_query.rs
[error] 359-359: called Result::unwrap()
on an Err
value: 0: Invalid date format: 2023
src/query/src/within_filter.rs
[error] 1-1: Found missing header files. Please add the required header to the file.
🔇 Additional comments (5)
tests/cases/standalone/common/basic.result (1)
32-33
: Consider adding boundary tests for the within_filter usage.
You are only testing a single date "2022-11-03." Include boundary or multi-year queries to ensure correctness around year transitions and invalid inputs.src/query/src/planner.rs (1)
108-111
: LGTM!The integration of
RangePlanRewriter
is well-placed in the planning pipeline, occurring before optimization but after the initial plan creation.src/query/src/query_engine/state.rs (1)
99-106
: LGTM!The explicit type declaration for
extension_rules
improves code clarity. TheWithinFilterRule
is correctly added after theTypeConversionRule
.tests/cases/standalone/common/basic.sql (1)
20-20
: LGTM!The test case appropriately validates the new
WITHIN
filter functionality.Cargo.toml (1)
185-189
: SQLParser Dependency Update: Verify CompatibilityThe
sqlparser
dependency has been updated to use NiwakaDev's fork with the new revisionb3584823b236db3c244b70945008f0aae65c5e39
. Please verify that this revision fully supports all the features needed—especially the "visitor" and "serde" capabilities required for implementing the new WITHIN filter functionality—and that it does not introduce any breaking changes compared to the prior version. This change is critical as it underpins the parsing logic for the upcoming WITHIN filter feature.
src/query/src/within_filter.rs
Outdated
@@ -0,0 +1,96 @@ | |||
use chrono::NaiveDate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing license header or file header comment.
It appears that your CI pipeline flags this file for a missing header. Ensure you include the necessary headers (e.g., Apache license header or any project-specific header) at the top of the file.
🧰 Tools
🪛 GitHub Actions: CI
[error] 1-1: Found missing header files. Please add the required header to the file.
src/query/src/within_filter.rs
Outdated
)) | ||
} | ||
} else { | ||
todo!(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the todo!() macro in production code.
Relying on todo!() will cause an immediate panic if encountered. Use proper error handling or implement the logic to manage unsupported scenarios.
- todo!();
+ return Err(DataFusionError::Execution("Unsupported within_filter argument type".to_string()));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
todo!(); | |
return Err(DataFusionError::Execution("Unsupported within_filter argument type".to_string())); |
#[derive(Clone, Debug, Default)] | ||
struct WithinFilterFunction; | ||
|
||
impl fmt::Display for WithinFilterFunction { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
write!(f, "WithinFilterFunction") | ||
} | ||
} | ||
|
||
impl Function for WithinFilterFunction { | ||
fn name(&self) -> &str { | ||
"within_filter" | ||
} | ||
|
||
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> { | ||
Ok(ConcreteDataType::boolean_datatype()) | ||
} | ||
|
||
fn signature(&self) -> Signature { | ||
Signature::uniform( | ||
2, | ||
vec![ConcreteDataType::string_datatype()], | ||
Volatility::Immutable, | ||
) | ||
} | ||
|
||
fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> { | ||
Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Implement or refine the eval() method.
Currently, eval() always returns an internal error with "todo". This can break user queries expecting a result. Even if the final logic resides elsewhere, consider returning a more explicit message or partially implementing the function to avoid confusion.
- Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu)
+ Err(DataFusionError::NotImplemented(
+ "WithinFilterFunction eval() is not implemented yet".to_string()
+ ))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#[derive(Clone, Debug, Default)] | |
struct WithinFilterFunction; | |
impl fmt::Display for WithinFilterFunction { | |
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |
write!(f, "WithinFilterFunction") | |
} | |
} | |
impl Function for WithinFilterFunction { | |
fn name(&self) -> &str { | |
"within_filter" | |
} | |
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> { | |
Ok(ConcreteDataType::boolean_datatype()) | |
} | |
fn signature(&self) -> Signature { | |
Signature::uniform( | |
2, | |
vec![ConcreteDataType::string_datatype()], | |
Volatility::Immutable, | |
) | |
} | |
fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> { | |
Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu) | |
} | |
} | |
#[derive(Clone, Debug, Default)] | |
struct WithinFilterFunction; | |
impl fmt::Display for WithinFilterFunction { | |
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |
write!(f, "WithinFilterFunction") | |
} | |
} | |
impl Function for WithinFilterFunction { | |
fn name(&self) -> &str { | |
"within_filter" | |
} | |
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> { | |
Ok(ConcreteDataType::boolean_datatype()) | |
} | |
fn signature(&self) -> Signature { | |
Signature::uniform( | |
2, | |
vec![ConcreteDataType::string_datatype()], | |
Volatility::Immutable, | |
) | |
} | |
fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> { | |
- Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu) | |
+ Err(DataFusionError::NotImplemented( | |
+ "WithinFilterFunction eval() is not implemented yet".to_string() | |
+ )) | |
} | |
} |
if let Expr::ScalarFunction(func) = &filter.predicate | ||
&& func.func.name() == "within_filter" | ||
{ | ||
let column_name = func.args[0].clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ensure the argument number is 2?
I think it's great! Let's do it! |
93a3bad
to
e9754ed
Compare
@killme2008 |
|
||
#[tokio::test] | ||
async fn test_within_filter() { | ||
// TODO: test within filter with time zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QUESTION
How do we handle time zone in WITHIN
filter functions?
This PR doesn't support time zone yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set the timezone info in QueryEngineContext
, then we can test the queries with timezone.
See #5397 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What great work! Thanks. About the timezone, I commented on it above.
@evenyag Please take a look. It's a lovely feature.
} | ||
|
||
/// Infers the time range from a given timestamp string. | ||
fn try_to_infer_time_range(timestamp: &str) -> Option<(Timestamp, Timestamp)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! I think we must take care of the timezone. We can retrieve the timezone from QueryEngineContext -> QueryEngineContext -> pub fn timezone(&self) -> Timezone
. The QueryEngineContext
is already in analyze
signature.
I found that the NativeDataTime has a function to change timezone
https://docs.rs/chrono/latest/chrono/naive/struct.NaiveDateTime.html#method.and_local_timezone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 6 out of 12 changed files in this pull request and generated 2 comments.
Files not reviewed (6)
- tests/cases/standalone/common/function/within_filter.result: Language not supported
- tests/cases/standalone/common/function/within_filter.sql: Language not supported
- src/query/src/dist_plan/merge_scan.rs: Evaluated as low risk
- src/query/src/optimizer/windowed_sort.rs: Evaluated as low risk
- src/query/src/lib.rs: Evaluated as low risk
- src/query/src/planner.rs: Evaluated as low risk
parsed.set_month(1).unwrap(); | ||
parsed.set_day(1).unwrap(); | ||
Some(parsed.to_naive_date().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling should be added when setting month and day in try_parse_year to avoid potential panic.
parsed.set_month(1).unwrap(); | |
parsed.set_day(1).unwrap(); | |
Some(parsed.to_naive_date().unwrap()) | |
parsed.set_month(1).ok()?; | |
parsed.set_day(1).ok()?; | |
Some(parsed.to_naive_date().ok()?) |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may avoid unwrap()
in this function or document why we can use unwrap()
. I think ok()?
is fine since we deal with user inputs.
|
||
fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> { | ||
Err(DataFusionError::Internal( | ||
"within_filter function just a empty function, it should not be eval!".into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is unclear. It should be changed to 'The within_filter function is a placeholder and should not be evaluated.'
"within_filter function just a empty function, it should not be eval!".into(), | |
"The within_filter function is a placeholder and should not be evaluated.".into(), |
Copilot uses AI. Check for mistakes.
Great work! I plan to review this today |
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>> | ||
let mut extension_rules: Vec< | ||
Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>, | ||
> = Vec::new(); | ||
|
||
// The [`TypeConversionRule`] must be at first | ||
extension_rules.insert(0, Arc::new(TypeConversionRule) as _); | ||
extension_rules.push(Arc::new(WithinFilterRule::new())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>> | |
let mut extension_rules: Vec< | |
Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>, | |
> = Vec::new(); | |
// The [`TypeConversionRule`] must be at first | |
extension_rules.insert(0, Arc::new(TypeConversionRule) as _); | |
extension_rules.push(Arc::new(WithinFilterRule::new())); | |
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + Send + Sync)>> | |
let extension_rules: Vec<Arc<(dyn ExtensionAnalyzerRule + Send + Sync)>> = vec![ | |
// The [`TypeConversionRule`] must be at first | |
Arc::new(TypeConversionRule), | |
Arc::new(WithinFilterRule), | |
]; |
pub struct WithinFilterRule {} | ||
|
||
impl WithinFilterRule { | ||
pub fn new() -> Self { | ||
WithinFilterRule {} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub struct WithinFilterRule {} | |
impl WithinFilterRule { | |
pub fn new() -> Self { | |
WithinFilterRule {} | |
} | |
} | |
pub struct WithinFilterRule; |
@@ -188,7 +188,7 @@ smallvec = { version = "1", features = ["serde"] } | |||
snafu = "0.8" | |||
sysinfo = "0.30" | |||
# on branch v0.52.x | |||
sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "71dd86058d2af97b9925093d40c4e03360403170", features = [ | |||
sqlparser = { git = "https://github.com/NiwakaDev/sqlparser-rs", rev = "daf8c33234f1b4568824b32efcc5d611091c4834", features = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the repo now?
let end = if naive_date.month() == 12 { | ||
NaiveDate::from_ymd_opt(naive_date.year() + 1, 1, 1).unwrap() | ||
} else { | ||
NaiveDate::from_ymd_opt(naive_date.year(), naive_date.month() + 1, 1).unwrap() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we panic here if something overflows?
} | ||
|
||
/// Infers the time range from a given timestamp string. | ||
fn try_to_infer_time_range(timestamp: &str) -> Option<(Timestamp, Timestamp)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might write a unit test for different time units.
\n TableScan: test"; | ||
assert_eq!(expected, plan.to_string()); | ||
|
||
// 2025-12-1T01:12:00 <= timestamp < 2025-12-1T01:13:00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move some cases to the unit test for the try_to_infer_time_range()
. It will be easier to add more cases.
@NiwakaDev Hi, do you have time to resolve these comments? I think this PR is almost done, looks great, thank you. |
Sorry for the delay. I'll address the reviews this weekend. |
I have a few concerns about adding a new To achieve a similar goal, @sunng87 provides another alternative method that implements it as a Furthermore, we can support relative time filters, like But Really sorry to put this comment that late 🙇 It just came across to my mind on the drive |
It seems like questdb supports a similar feature: One thing I’m wondering: if we implement it as a function, is it pushed down like a normal filter? |
If you look at the PR in the parser GreptimeTeam/sqlparser-rs#16, You will find that it doesn't change the existing SQL parser code but adds new parsing logic. So maintenance isn't a problem here at all. This grammar is supported by QuestDB. I think it's friendlier than a SQL function. And of course, we can support the interval by extending select * from monitors where ts within '2024-04-19' in '1 day' Like that. |
Sure, we can do this just as other functions or operators
You could have a try yourself before you put "not a problem at all" :) RANGE clause doesn't change existing code either.
Do you really think the next example is easy to understand? |
The range query is worth the maintenance cost due to its usefulness and elegance, but I agree with you that Could you refactor this PR, @NiwakaDev? Please implement it as an SQL function. Most of the work is already done. Thanks! |
The new following approach:
I guess the second step is almost the same. What do you think? |
Yes, definitely. |
Thank you @NiwakaDev @killme2008 ! |
@NiwakaDev Sorry to disturb you, but do you have time to review this PR? It's nearly finished and would benefit our users. If not, I would love to help. |
Sorry for the delay. March was busy😅 I can finally start working on this PR this week if you don't mind. If I can't make good progress by next week, I'll ask for your help. Thank you! |
That would be great! Thank you. |
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
Part of #3755.
This PR supports
within
syntax like:They are converted to
What's changed and what's your intention?
PR Checklist
Please convert it to a draft if some of the following conditions are not met.
Summary by CodeRabbit
New Features
• Enhanced query filtering now accepts year-only date inputs, automatically converting them into valid date ranges.
• Expanded SQL functionality adds new aggregation and data selection options for analyzing system metrics.
Chores
• Updated an internal dependency for improved stability.
Tests
• Added new SQL queries and data insertion tests to validate the improved filtering and aggregation behaviors.