feat: support WITHIN filter #5397

Open
wants to merge 10 commits into
base: main

NiwakaDev
Collaborator

@NiwakaDev NiwakaDev commented Jan 18, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Part of #3755.

This PR supports within syntax like:

select * from monitors where ts within '2024';

select * from monitors where ts within '2024-04-19';

select * from monitors where ts within '2024-04-19 23:50';

They are converted to

select * from monitors where ts >= '2024-01-01 00:00:00' and ts < '2025-01-01 00:00:00';

select * from monitors where ts >= '2024-04-19 00:00:00' and ts < '2024-04-20 00:00:00';

select * from monitors where ts >= '2024-04-19 23:50:00' and ts < '2024-04-19 23:51:00';
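
The conversion above can be sketched in plain Rust. This is an illustrative, standard-library-only sketch and not the PR's implementation: `infer_range`, `days_in_month`, and `format_ts` are hypothetical names, time zones are ignored, and separator positions are not checked strictly. It finds the finest unit written in the literal and increments it, carrying into larger units, to get the exclusive upper bound.

```rust
/// Gregorian leap-year rule.
fn is_leap(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => {
            if is_leap(year) {
                29
            } else {
                28
            }
        }
    }
}

fn format_ts(t: [u32; 6]) -> String {
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        t[0], t[1], t[2], t[3], t[4], t[5]
    )
}

/// Hypothetical helper: accepts "YYYY[-MM[-DD[ HH[:MM[:SS]]]]]" and returns
/// the half-open range [start, end) covering the finest unit that was written.
fn infer_range(input: &str) -> Option<(String, String)> {
    // Split on '-', ' ' and ':'; any non-numeric field rejects the input.
    let fields = input
        .split(|c: char| matches!(c, '-' | ' ' | ':'))
        .map(|part| part.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    if fields.is_empty() || fields.len() > 6 {
        return None;
    }
    // Missing fields default to the start of the unit: month/day 1, time 0.
    let mut start = [0u32, 1, 1, 0, 0, 0];
    start[..fields.len()].copy_from_slice(&fields);
    let valid = start[0] <= 9999
        && (1..=12).contains(&start[1])
        && start[2] >= 1
        && start[2] <= days_in_month(start[0], start[1])
        && start[3] <= 23
        && start[4] <= 59
        && start[5] <= 59;
    if !valid {
        return None;
    }
    // Increment the finest given unit and carry into larger units.
    let mut end = start;
    let mut i = fields.len() - 1;
    loop {
        end[i] += 1;
        let overflow = match i {
            0 => false,
            1 => end[1] > 12,
            2 => end[2] > days_in_month(end[0], end[1]),
            3 => end[3] > 23,
            _ => end[i] > 59,
        };
        if !overflow {
            break;
        }
        // Month and day wrap to 1, time fields wrap to 0.
        end[i] = if i <= 2 { 1 } else { 0 };
        i -= 1;
    }
    Some((format_ts(start), format_ts(end)))
}
```

Calling `infer_range("2024-04-19 23:50")` yields the pair used in the third rewritten query; invalid input such as `"2024-02-30"` yields `None`.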

What's changed and what's your intention?

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

Summary by CodeRabbit

  • New Features
    • Enhanced query filtering now accepts year-only date inputs, automatically converting them into valid date ranges.
    • Expanded SQL functionality adds new aggregation and data selection options for analyzing system metrics.

  • Chores
    • Updated an internal dependency for improved stability.

  • Tests
    • Added new SQL queries and data insertion tests to validate the improved filtering and aggregation behaviors.

@NiwakaDev NiwakaDev requested review from evenyag and a team as code owners January 18, 2025 21:08
Contributor

coderabbitai bot commented Jan 18, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request updates the sqlparser dependency in Cargo.toml and introduces new functionality for filtering operations. A new WithinFilterFunction is added to the math functions, and a within_filter module is integrated into the query engine. The planner now rewrites the logical plan with an asynchronous call to a range plan rewriter, and a new WithinFilterRule is applied to transform filter expressions. Additionally, test cases and SQL scripts are enhanced with new queries and assertions, reflecting these updates.

Changes

| File(s) | Change Summary |
|---|---|
| Cargo.toml | Updated sqlparser dependency from the GreptimeTeam repository/revision to the NiwakaDev repository/revision with unchanged features |
| src/common/function/src/scalars/math.rs | Added new WithinFilterFunction with its structure, trait implementations (name, return_type, signature, eval) for boolean output and placeholder error message |
| src/log-query/src/log_query.rs | Modified test case in canonicalize: changed TimeFilter start date from "2023-10-01" to "2023" |
| src/query/src/{lib.rs, planner.rs, query_engine/state.rs, within_filter.rs} | Introduced within_filter module; added a new RangePlanRewriter step in plan_sql; updated QueryEngineState extension_rules with new WithinFilterRule; added WithinFilterRule implementation that transforms filter expressions |
| tests/cases/standalone/common/{basic.result, basic.sql} | Added new SQL queries for selecting, aggregating, and inserting data on the system_metrics table, including filtering by a within_filter predicate |

Sequence Diagram(s)

sequenceDiagram
    participant Planner as DfLogicalPlanner
    participant RangeRewriter as RangePlanRewriter
    participant FilterRule as WithinFilterRule
    participant Optimizer as Query Optimizer

    Planner->>Planner: Convert SQL to initial plan (sql_to_rel)
    Planner->>RangeRewriter: Await rewrite(original_plan)
    RangeRewriter->>FilterRule: Analyze LogicalPlan for within_filter predicate
    FilterRule-->>RangeRewriter: Return transformed LogicalPlan
    RangeRewriter-->>Planner: Provide rewritten LogicalPlan
    Planner->>Optimizer: Optimize LogicalPlan
    Optimizer-->>Planner: Return optimized LogicalPlan

Suggested reviewers

  • waynexia
  • zhongzc
  • tisonkun

Poem

I'm just a little rabbit, hopping through the code,
Changes bloom like carrots along my winding road.
With within_filter and plans that twist and turn,
I nibble on new features, ever eager to learn.
Coding carrots and crisp SQL—oh, what a joyful spree!



@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jan 18, 2025
@NiwakaDev NiwakaDev marked this pull request as draft January 18, 2025 21:09
@NiwakaDev
Collaborator Author

NiwakaDev commented Jan 18, 2025

@killme2008 @waynexia
I've created a rough implementation of the WITHIN filter:

My idea is:

  1. Parse WITHIN filters and treat them as functions in the parser phase (NiwakaDev/sqlparser-rs@b358482).
  2. Convert WITHIN filter functions to normal filters in the planner phase.

This idea is based on the RANGE query implementation.
If the approach seems good, I'd like to proceed with it. What do you think?

@NiwakaDev NiwakaDev changed the title fix: wip feat: support WITHIN filter Jan 18, 2025
pub struct WithinFilterRule;

impl ExtensionAnalyzerRule for WithinFilterRule {
fn analyze(
Collaborator Author

@NiwakaDev NiwakaDev Jan 19, 2025

Convert LogicalPlan::Filter(within(column, time)) to LogicalPlan::Filter(column >= start and column < end) so that the optimizer can push the filter down.
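
To illustrate the rewrite described in this comment, here is a toy sketch that does not use DataFusion. The `Expr` enum, `year_bounds`, and `rewrite_within` are hypothetical stand-ins for the real plan types, and only the year-only literal form is handled. A call to `within_filter(col, 'YYYY')` is replaced by a pair of plain comparisons; every other expression passes through unchanged.

```rust
/// Toy expression tree; a stand-in for the real logical expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// A scalar function call such as within_filter(ts, '2024').
    Call { name: String, args: Vec<Expr> },
    GtEq(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Returns (start, end) timestamp literals for a four-digit year.
fn year_bounds(year: &str) -> Option<(String, String)> {
    if year.len() != 4 || !year.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let y: u32 = year.parse().ok()?;
    Some((
        format!("{:04}-01-01 00:00:00", y),
        format!("{:04}-01-01 00:00:00", y + 1),
    ))
}

/// Rewrites within_filter(col, 'YYYY') into col >= start AND col < end.
/// Anything that does not match this shape is returned untouched.
fn rewrite_within(expr: Expr) -> Expr {
    if let Expr::Call { name, args } = &expr {
        if name.as_str() == "within_filter" && args.len() == 2 {
            if let (Expr::Column(_), Expr::Literal(time)) = (&args[0], &args[1]) {
                if let Some((start, end)) = year_bounds(time) {
                    let col = Box::new(args[0].clone());
                    let lower = Expr::GtEq(col.clone(), Box::new(Expr::Literal(start)));
                    let upper = Expr::Lt(col, Box::new(Expr::Literal(end)));
                    return Expr::And(Box::new(lower), Box::new(upper));
                }
            }
        }
    }
    expr
}
```

Because the result contains only ordinary comparison operators, later optimizer passes need no knowledge of the placeholder function.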

@killme2008
Contributor

@coderabbitai review

Contributor

coderabbitai bot commented Feb 7, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch PR details
gh pr view 5397 --json files,body,title,commits

# Search for test files related to WITHIN filter
rg -l "WITHIN" --type rust

# Check for documentation updates
fd -e md -e rs | xargs rg -l "WITHIN"

Length of output: 2304


🌐 Web query:

What is WITHIN filter in time series databases?

💡 Result:

While none of the provided sources explicitly mention a "WITHIN" filter operator, time series databases commonly use three equivalent filtering mechanisms that achieve similar spatial/temporal containment checks:

## Temporal Range Filtering
Used to select entries within a specific time window:
```sql
-- RavenDB example with BETWEEN
SELECT timeseries FROM HeartRates 
BETWEEN '2020-05-17T00:00:00' AND '2020-05-17T00:10:00' [1]

-- QuestDB's IN syntax for time containment
SELECT * FROM trades 
WHERE timestamp IN '2023-12-20;15s' -- 15-second window [5]
```

## Value Range Filtering
Filters numerical values within specified bounds:
```sql
-- InfluxQL example excluding outliers 
WHERE (154 - "value") BETWEEN 0 AND 760 [3]

-- RavenDB value threshold filtering
WHERE ts.Value > 75 [1]
```

## Tag Containment Filtering
Checks if tags exist in a specified set:
```csharp
// RavenDB tag whitelisting
.Where(ts => ts.Tag == "watches/fitbit" || 
             ts.Tag == "watches/apple") [1]
```

These patterns align with the core filtering capabilities described across sources:
1. Temporal containment through date ranges (`BETWEEN`, `IN`) [1][5]
2. Numerical range checks using comparison operators [1][3]
3. Set membership checks for tags/attributes [1][4]

While terminology varies between implementations (BETWEEN, IN, WHERE clauses), they all enable selecting data points that fall **within** specified temporal, numerical, or categorical boundaries - essentially implementing different flavors of spatial/temporal containment filters.

Citations:

- 1: https://ravendb.net/docs/article-page/6.2/csharp/document-extensions/timeseries/querying/filtering
- 2: https://docs.datomic.com/reference/filters.html
- 3: https://community.grafana.com/t/how-to-filter-noise-from-time-series-graph/108812
- 4: https://www.alibabacloud.com/blog/key-concepts-and-features-of-time-series-databases_594734
- 5: https://questdb.com/glossary/time-series-database/
- 6: https://devops.com/time-series-database-basics/
- 7: https://www.splunk.com/en_us/blog/learn/time-series-databases.html

🏁 Script executed:

#!/bin/bash
# Check the new within_filter implementation
cat src/query/src/within_filter.rs

# Check test cases
cat tests/cases/standalone/common/basic.sql
cat tests/cases/standalone/common/basic.result

# Check planner integration
cat src/query/src/planner.rs

Length of output: 20487


@killme2008 Here's my review of the WITHIN filter implementation:

  1. Documentation:

    • Please add rustdoc comments for the WithinFilterRule struct and its implementation
    • Document the supported date formats and examples in the comments
  2. Error Handling:

    • Replace todo!() macro with proper error handling for unsupported cases
    • Consider adding custom error types for invalid date formats
    • Add validation for the timestamp column type
  3. Date Format Support:

    • Current implementation only supports year format ('2022')
    • Consider supporting more formats like:
      • Full date: '2022-11-03'
      • Month: '2022-11'
      • Time ranges: '1h', '7d'
  4. Testing:

    • Add more test cases covering:
      • Different date formats
      • Invalid inputs
      • Edge cases (year boundaries)
      • Error scenarios

The approach of converting WITHIN to range filters looks good, but needs more work on documentation, error handling, and format support. Would you like me to provide specific code examples for any of these points?

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
src/log-query/src/log_query.rs (1)

352-361:

🧩 Analysis chain

Fix failing test case for year-only date format.

The test is failing with error "Invalid date format: 2023". While the code claims to support year-only format ("%Y"), the implementation seems to have a bug handling this case.

Let's verify the date parsing implementation:


🏁 Script executed:

#!/bin/bash
# Description: Check if year-only format is properly handled in parse_datetime

# Search for all test cases and date format handling
rg -A 5 'parse_datetime|parse_from_str'

Length of output: 10145


Bug in date parsing for year-only inputs in parse_datetime

The issue is confirmed. In src/log-query/src/log_query.rs, the parse_datetime function only computes the end date for the "%Y-%m-%d" case. The "%Y-%m" and "%Y" cases lack corresponding match arms for computing the end date, which leads to failing tests (e.g. with input "2023" causing "Invalid date format: 2023"). Please update the match statement to handle year-only (and year-month) formats appropriately—for example, computing the end date as start + 1 year for "%Y" and as start + duration corresponding to the month for "%Y-%m".

  • Review and fix the match arms in src/log-query/src/log_query.rs within parse_datetime.
  • Add the missing computations for end date when using "%Y" (and, if needed, "%Y-%m").
🧰 Tools
🪛 GitHub Actions: CI

[error] 359-359: called Result::unwrap() on an Err value: 0: Invalid date format: 2023

🧹 Nitpick comments (4)
src/query/src/within_filter.rs (2)

31-32: Avoid using unwrap() on the result of from_ymd_opt().
from_ymd_opt() returns an Option, which will be None for invalid dates. Unwrapping can cause runtime panics. Please handle potential None cases explicitly with an error or alternative behavior.

- let timestamp = NaiveDate::from_ymd_opt(year, 1, 1).unwrap();
+ if let Some(timestamp_naive) = NaiveDate::from_ymd_opt(year, 1, 1) {
+     let timestamp = Timestamp::from_chrono_date(timestamp_naive)
+         .map_err(|_| DataFusionError::Execution("Timestamp conversion error".to_string()))?;
+     ...
+ } else {
+     return Err(DataFusionError::Execution("Invalid date".to_string()));
+ }

Also applies to: 46-47


81-84: Provide a more descriptive error or handle multiple date/time formats.
Currently, the code only supports a simple 4-digit year, returning a "NotImplemented" error for other formats. To improve usability, handle or document additional date/time formats, or return a clearer error message.

src/common/function/src/scalars/math.rs (1)

47-47: Document the newly registered WithinFilterFunction.
While the function is registered here, it lacks explicit doc comments. Adding documentation clarifies its purpose and limitations for maintainers and users.

tests/cases/standalone/common/basic.result (1)

34-40: Make test outputs explicit for within_filter coverage.
The output matches the entire dataset, which might not be intuitive for a “within” single-day filter if the underlying logic eventually expands to more precise dates. Adding a second day or boundary records clarifies correctness.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d072801 and d2cffa8.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml (1 hunks)
  • src/common/function/src/scalars/math.rs (2 hunks)
  • src/log-query/src/log_query.rs (1 hunks)
  • src/query/src/lib.rs (1 hunks)
  • src/query/src/planner.rs (1 hunks)
  • src/query/src/query_engine/state.rs (2 hunks)
  • src/query/src/within_filter.rs (1 hunks)
  • tests/cases/standalone/common/basic.result (1 hunks)
  • tests/cases/standalone/common/basic.sql (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/query/src/lib.rs
🧰 Additional context used
🪛 GitHub Actions: CI
src/log-query/src/log_query.rs

[error] 359-359: called Result::unwrap() on an Err value: 0: Invalid date format: 2023

src/query/src/within_filter.rs

[error] 1-1: Found missing header files. Please add the required header to the file.

🔇 Additional comments (5)
tests/cases/standalone/common/basic.result (1)

32-33: Consider adding boundary tests for the within_filter usage.
You are only testing a single date "2022-11-03." Include boundary or multi-year queries to ensure correctness around year transitions and invalid inputs.

src/query/src/planner.rs (1)

108-111: LGTM!

The integration of RangePlanRewriter is well-placed in the planning pipeline, occurring before optimization but after the initial plan creation.

src/query/src/query_engine/state.rs (1)

99-106: LGTM!

The explicit type declaration for extension_rules improves code clarity. The WithinFilterRule is correctly added after the TypeConversionRule.

tests/cases/standalone/common/basic.sql (1)

20-20: LGTM!

The test case appropriately validates the new WITHIN filter functionality.

Cargo.toml (1)

185-189: SQLParser Dependency Update: Verify Compatibility

The sqlparser dependency has been updated to use NiwakaDev's fork with the new revision b3584823b236db3c244b70945008f0aae65c5e39. Please verify that this revision fully supports all the features needed—especially the "visitor" and "serde" capabilities required for implementing the new WITHIN filter functionality—and that it does not introduce any breaking changes compared to the prior version. This change is critical as it underpins the parsing logic for the upcoming WITHIN filter feature.

@@ -0,0 +1,96 @@
use chrono::NaiveDate;
Contributor

⚠️ Potential issue

Add missing license header or file header comment.
It appears that your CI pipeline flags this file for a missing header. Ensure you include the necessary headers (e.g., Apache license header or any project-specific header) at the top of the file.

🧰 Tools
🪛 GitHub Actions: CI

[error] 1-1: Found missing header files. Please add the required header to the file.

))
}
} else {
todo!();
Contributor

⚠️ Potential issue

Remove the todo!() macro in production code.
Relying on todo!() will cause an immediate panic if encountered. Use proper error handling or implement the logic to manage unsupported scenarios.

- todo!();
+ return Err(DataFusionError::Execution("Unsupported within_filter argument type".to_string()));

Comment on lines 92 to 121
#[derive(Clone, Debug, Default)]
struct WithinFilterFunction;

impl fmt::Display for WithinFilterFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "WithinFilterFunction")
}
}

impl Function for WithinFilterFunction {
fn name(&self) -> &str {
"within_filter"
}

fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::boolean_datatype())
}

fn signature(&self) -> Signature {
Signature::uniform(
2,
vec![ConcreteDataType::string_datatype()],
Volatility::Immutable,
)
}

fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> {
Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu)
}
}
Contributor

🛠️ Refactor suggestion

Implement or refine the eval() method.
Currently, eval() always returns an internal error with "todo". This can break user queries expecting a result. Even if the final logic resides elsewhere, consider returning a more explicit message or partially implementing the function to avoid confusion.

- Err(DataFusionError::Internal("todo".into())).context(GeneralDataFusionSnafu)
+ Err(DataFusionError::NotImplemented(
+     "WithinFilterFunction eval() is not implemented yet".to_string()
+ ))

if let Expr::ScalarFunction(func) = &filter.predicate
&& func.func.name() == "within_filter"
{
let column_name = func.args[0].clone();
Contributor

Do we need to ensure the argument number is 2?
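
One possible way to guard against this, sketched without the real DataFusion types: match on the argument slice so that any arity other than two becomes an error instead of an out-of-bounds index. `RewriteError` and `split_within_args` are hypothetical names used only for illustration.

```rust
/// Hypothetical error type for the rewrite step.
#[derive(Debug, PartialEq)]
enum RewriteError {
    WrongArity { expected: usize, found: usize },
}

/// Returns (column, time) only when exactly two arguments were supplied.
fn split_within_args<'a>(args: &[&'a str]) -> Result<(&'a str, &'a str), RewriteError> {
    match args {
        // The slice pattern matches only a two-element slice.
        [column, time] => Ok((*column, *time)),
        _ => Err(RewriteError::WrongArity {
            expected: 2,
            found: args.len(),
        }),
    }
}
```

A slice pattern keeps the length check and the destructuring in one place, so `args[0]` and `args[1]` are never indexed without validation.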

@killme2008
Contributor

@killme2008 @waynexia I've created a rough implementation of the WITHIN filter:

My idea is:

  1. Parse WITHIN filters and treat them as functions in the parser phase (NiwakaDev/sqlparser-rs@b358482).
  2. Convert WITHIN filter functions to normal filters in the planner phase.

This idea is based on the RANGE query implementation. If the approach seems good, I'd like to proceed with it. What do you think?

I think it's great! Let's do it!

@NiwakaDev NiwakaDev force-pushed the feat/support_within_filter branch from 93a3bad to e9754ed Compare February 9, 2025 11:13
@NiwakaDev NiwakaDev marked this pull request as ready for review February 9, 2025 11:14
@NiwakaDev
Collaborator Author

@killme2008
Marked this PR as ready for review.


#[tokio::test]
async fn test_within_filter() {
// TODO: test within filter with time zone
Collaborator Author

@NiwakaDev NiwakaDev Feb 9, 2025

QUESTION

How do we handle time zone in WITHIN filter functions?

This PR doesn't support time zone yet.

Contributor

Set the timezone info in QueryEngineContext, then we can test the queries with timezone.

See #5397 (comment)

Contributor

@killme2008 killme2008 left a comment

What great work! Thanks. About the timezone, I commented on it above.

@evenyag Please take a look. It's a lovely feature.

}

/// Infers the time range from a given timestamp string.
fn try_to_infer_time_range(timestamp: &str) -> Option<(Timestamp, Timestamp)> {
Contributor

Good job! I think we must take care of the timezone. We can retrieve the timezone from QueryEngineContext -> QueryEngineContext -> pub fn timezone(&self) -> Timezone. The QueryEngineContext is already in the analyze signature.

I found that NaiveDateTime has a function to change the timezone:

https://docs.rs/chrono/latest/chrono/naive/struct.NaiveDateTime.html#method.and_local_timezone
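
To make the timezone concern concrete, here is a standard-library-only sketch (it does not use chrono, and it is not the PR's code). It assumes the session timezone is a fixed UTC offset, which ignores daylight-saving transitions; `days_from_civil` and `local_day_range_utc` are hypothetical helper names, and the inputs are assumed to form a valid calendar date. The idea is that `within '2024-04-19'` in a +08:00 session should cover that local day, so both bounds shift by the offset when expressed in UTC.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Returns the half-open UTC range [start, end), in seconds since the Unix
/// epoch, covering one calendar day in a zone with the given fixed offset.
fn local_day_range_utc(year: i64, month: i64, day: i64, utc_offset_secs: i64) -> (i64, i64) {
    let local_midnight = days_from_civil(year, month, day) * 86_400;
    // Local time = UTC + offset, so UTC = local time - offset.
    let start = local_midnight - utc_offset_secs;
    (start, start + 86_400)
}
```

With a zero offset the range is the plain UTC day; with `8 * 3600` the same literal starts eight hours earlier in UTC.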

@killme2008 killme2008 mentioned this pull request Feb 11, 2025
26 tasks
@evenyag evenyag requested a review from waynexia February 12, 2025 06:02
@waynexia waynexia requested a review from Copilot February 12, 2025 06:38
Contributor

@Copilot Copilot AI left a comment

Copilot reviewed 6 out of 12 changed files in this pull request and generated 2 comments.

Files not reviewed (6)
  • tests/cases/standalone/common/function/within_filter.result: Language not supported
  • tests/cases/standalone/common/function/within_filter.sql: Language not supported
  • src/query/src/dist_plan/merge_scan.rs: Evaluated as low risk
  • src/query/src/optimizer/windowed_sort.rs: Evaluated as low risk
  • src/query/src/lib.rs: Evaluated as low risk
  • src/query/src/planner.rs: Evaluated as low risk

Comment on lines +127 to +129
parsed.set_month(1).unwrap();
parsed.set_day(1).unwrap();
Some(parsed.to_naive_date().unwrap())

Copilot AI Feb 12, 2025

Error handling should be added when setting month and day in try_parse_year to avoid potential panic.

Suggested change
parsed.set_month(1).unwrap();
parsed.set_day(1).unwrap();
Some(parsed.to_naive_date().unwrap())
parsed.set_month(1).ok()?;
parsed.set_day(1).ok()?;
Some(parsed.to_naive_date().ok()?)

Contributor

@evenyag evenyag Feb 14, 2025

We may avoid unwrap() in this function or document why we can use unwrap(). I think ok()? is fine since we deal with user inputs.
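
A small standard-library analogue of the `ok()?` pattern being discussed; `parse_year_start` is a hypothetical function and not the PR's `try_parse_year`, which works on chrono's parsed state. Each fallible step converts its `Result` to an `Option` and returns `None` early, so malformed user input never panics.

```rust
/// Parses a four-digit year and returns (year, month, day) for January 1st.
/// Returns None for anything that is not exactly four ASCII digits.
fn parse_year_start(input: &str) -> Option<(i32, u32, u32)> {
    if input.len() != 4 || !input.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // `ok()?` turns a parse error into an early `None` instead of a panic.
    let year = input.parse::<i32>().ok()?;
    Some((year, 1, 1))
}
```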


fn eval(&self, _func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> {
Err(DataFusionError::Internal(
"within_filter function just a empty function, it should not be eval!".into(),

Copilot AI Feb 12, 2025

The error message is unclear. It should be changed to 'The within_filter function is a placeholder and should not be evaluated.'

Suggested change
"within_filter function just a empty function, it should not be eval!".into(),
"The within_filter function is a placeholder and should not be evaluated.".into(),

@waynexia
Member

Great work! I plan to review this today

Comment on lines +97 to +104
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>>
let mut extension_rules: Vec<
Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>,
> = Vec::new();

// The [`TypeConversionRule`] must be at first
extension_rules.insert(0, Arc::new(TypeConversionRule) as _);
extension_rules.push(Arc::new(WithinFilterRule::new()));
Collaborator

Suggested change
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>>
let mut extension_rules: Vec<
Arc<(dyn ExtensionAnalyzerRule + std::marker::Send + Sync + 'static)>,
> = Vec::new();
// The [`TypeConversionRule`] must be at first
extension_rules.insert(0, Arc::new(TypeConversionRule) as _);
extension_rules.push(Arc::new(WithinFilterRule::new()));
// TODO: remove Vec<Arc<(dyn ExtensionAnalyzerRule + Send + Sync)>>
let extension_rules: Vec<Arc<(dyn ExtensionAnalyzerRule + Send + Sync)>> = vec![
// The [`TypeConversionRule`] must be at first
Arc::new(TypeConversionRule),
Arc::new(WithinFilterRule),
];
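
The shape of this suggestion can be tried in isolation. The sketch below uses a toy `AnalyzerRule` trait and empty rule structs as stand-ins (assumptions, not the real `ExtensionAnalyzerRule` API); it shows that a `vec!` literal with an annotated element type coerces each `Arc<ConcreteRule>` to the trait object and preserves the ordering requirement without `insert(0, ..)`.

```rust
use std::sync::Arc;

/// Toy stand-in for the real analyzer-rule trait.
trait AnalyzerRule {
    fn name(&self) -> &'static str;
}

struct TypeConversionRule;
struct WithinFilterRule;

impl AnalyzerRule for TypeConversionRule {
    fn name(&self) -> &'static str {
        "type_conversion"
    }
}

impl AnalyzerRule for WithinFilterRule {
    fn name(&self) -> &'static str {
        "within_filter"
    }
}

/// Builds the rule list in order; the type annotation drives the coercion
/// from Arc<ConcreteRule> to Arc<dyn AnalyzerRule + Send + Sync>.
fn extension_rules() -> Vec<Arc<dyn AnalyzerRule + Send + Sync>> {
    let rules: Vec<Arc<dyn AnalyzerRule + Send + Sync>> = vec![
        // The type conversion rule must come first.
        Arc::new(TypeConversionRule),
        Arc::new(WithinFilterRule),
    ];
    rules
}
```

Trait objects are `'static` by default inside `Arc`, which is why the explicit `'static` bound in the original declaration can be dropped.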

Comment on lines +67 to +73
pub struct WithinFilterRule {}

impl WithinFilterRule {
pub fn new() -> Self {
WithinFilterRule {}
}
}
Collaborator

Suggested change
pub struct WithinFilterRule {}
impl WithinFilterRule {
pub fn new() -> Self {
WithinFilterRule {}
}
}
pub struct WithinFilterRule;

@@ -188,7 +188,7 @@ smallvec = { version = "1", features = ["serde"] }
snafu = "0.8"
sysinfo = "0.30"
# on branch v0.52.x
sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "71dd86058d2af97b9925093d40c4e03360403170", features = [
sqlparser = { git = "https://github.com/NiwakaDev/sqlparser-rs", rev = "daf8c33234f1b4568824b32efcc5d611091c4834", features = [
Contributor

Can we change the repo now?

Comment on lines +161 to +165
let end = if naive_date.month() == 12 {
NaiveDate::from_ymd_opt(naive_date.year() + 1, 1, 1).unwrap()
} else {
NaiveDate::from_ymd_opt(naive_date.year(), naive_date.month() + 1, 1).unwrap()
};
Contributor

Will we panic here if something overflows?
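
A panic-free version of this month rollover can be sketched with checked arithmetic. This is a standard-library-only illustration using a hypothetical `first_day_of_next_month` helper on plain integers; the real code would still need to handle the `Option` returned by chrono's `from_ymd_opt`, which is `None` for dates outside its supported range.

```rust
/// Returns (year, month, day) for the first day of the following month,
/// or None if the month is invalid or the year would overflow.
fn first_day_of_next_month(year: i32, month: u32) -> Option<(i32, u32, u32)> {
    match month {
        1..=11 => Some((year, month + 1, 1)),
        // December rolls over into January of the next year.
        12 => Some((year.checked_add(1)?, 1, 1)),
        _ => None,
    }
}
```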

}

/// Infers the time range from a given timestamp string.
fn try_to_infer_time_range(timestamp: &str) -> Option<(Timestamp, Timestamp)> {
Contributor

We might write a unit test for different time units.

\n TableScan: test";
assert_eq!(expected, plan.to_string());

// 2025-12-1T01:12:00 <= timestamp < 2025-12-1T01:13:00
Contributor

We can move some cases to the unit test for the try_to_infer_time_range(). It will be easier to add more cases.
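A table-driven unit test is one way to make cases cheap to add. The sketch below does not reproduce `try_to_infer_time_range`. It uses a hypothetical `detect_granularity` function that only classifies a literal by its shape, on the assumption that the inferred range spans one unit of the finest field present. The `Granularity` enum, both function names, and the accepted separators are illustrative assumptions.

```rust
/// Hypothetical: the finest time field present in a literal.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Granularity {
    Year,
    Month,
    Day,
    Hour,
    Minute,
    Second,
}

/// Counts `sep`-separated fields, requiring each to be non-empty ASCII digits.
fn count_numeric_fields(s: &str, sep: char) -> Option<usize> {
    let mut count = 0;
    for field in s.split(sep) {
        if field.is_empty() || !field.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        count += 1;
    }
    Some(count)
}

/// Hypothetical classifier: looks only at the shape of the literal and does
/// not validate field ranges (a month of 13 is not rejected here).
fn detect_granularity(literal: &str) -> Option<Granularity> {
    // Split an optional time part off the date part at the first ' ' or 'T'.
    let (date, time) = match literal.split_once(|c: char| c == ' ' || c == 'T') {
        Some((d, t)) => (d, Some(t)),
        None => (literal, None),
    };
    let date_fields = count_numeric_fields(date, '-')?;
    let time_fields = match time {
        Some(t) => count_numeric_fields(t, ':')?,
        None => 0,
    };
    match (date_fields, time_fields) {
        (1, 0) => Some(Granularity::Year),
        (2, 0) => Some(Granularity::Month),
        (3, 0) => Some(Granularity::Day),
        (3, 1) => Some(Granularity::Hour),
        (3, 2) => Some(Granularity::Minute),
        (3, 3) => Some(Granularity::Second),
        _ => None,
    }
}

/// Each row is (input, expected); a new case is one more line in the table.
fn granularity_cases() -> Vec<(&'static str, Option<Granularity>)> {
    vec![
        ("2024", Some(Granularity::Year)),
        ("2024-04", Some(Granularity::Month)),
        ("2024-04-19", Some(Granularity::Day)),
        ("2024-04-19 23", Some(Granularity::Hour)),
        ("2024-04-19 23:50", Some(Granularity::Minute)),
        ("2024-04-19T23:50:01", Some(Granularity::Second)),
        ("2024-04 23:50", None),
        ("", None),
    ]
}
```

A test body would then loop over `granularity_cases()` and compare each expected value with `detect_granularity(input)`. The same table shape should carry over to expected `(start, end)` pairs for the real function.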

@killme2008
Contributor

@NiwakaDev Hi, do you have time to resolve these comments? I think this PR is almost done, looks great, thank you.

@NiwakaDev
Collaborator Author

@killme2008

> do you have time to resolve these comments?

Sorry for the delay. I'll address the reviews this weekend.

@waynexia
Member

waynexia commented Feb 24, 2025

I have a few concerns about adding a new within grammar. It adds to the burden of maintaining the sqlparser lib, and it is not generic enough (I did some searching and found no one else providing anything similar).

To achieve a similar goal, @sunng87 suggested an alternative that implements it as a within() function, like where within(ts, '2024') or where within(ts, '2024-04-19 23:50'). This is fully compatible with the current SQL grammar and with common practice.

Furthermore, we could support relative time filters, like where within(ts, '5d'::INTERVAL), or arbitrary time ranges, like where within(ts, '2024-02-24', '2024-03-15').

But WITHIN is already a keyword, so we need to pick another name, e.g., with_in, in_range, or time_filter. I don't have a good choice in mind and am open to suggestions.

Really sorry to post this comment so late 🙇 It just crossed my mind on the drive.

@NiwakaDev
Collaborator Author

NiwakaDev commented Feb 24, 2025

@waynexia

> I did some searching and found no one else providing anything similar

It seems that QuestDB supports a similar feature:
https://questdb.com/docs/reference/sql/where/#time-range-where-in.
But it might not be generic.

One thing I’m wondering:

if we implement it as a function, is it pushed down like a normal filter?

@killme2008
Contributor

> I have a few concerns about adding a new within grammar. It adds to the burden of maintaining the sqlparser lib, and it is not generic enough (I did some searching and found no one else providing anything similar).
>
> To achieve a similar goal, @sunng87 suggested an alternative that implements it as a within() function, like where within(ts, '2024') or where within(ts, '2024-04-19 23:50'). This is fully compatible with the current SQL grammar and with common practice.
>
> Furthermore, we could support relative time filters, like where within(ts, '5d'::INTERVAL), or arbitrary time ranges, like where within(ts, '2024-02-24', '2024-03-15').
>
> But WITHIN is already a keyword, so we need to pick another name, e.g., with_in, in_range, or time_filter. I don't have a good choice in mind and am open to suggestions.
>
> Really sorry to post this comment so late 🙇 It just crossed my mind on the drive.

If you look at the PR in the parser, GreptimeTeam/sqlparser-rs#16, you will find that it doesn't change the existing SQL parser code but adds new parsing logic. So maintenance isn't a problem here at all.

This grammar is supported by QuestDB. I think it's friendlier than a SQL function. And of course, we can support the interval by adding an in option, like this:

select * from monitors where ts within '2024-04-19' in '1 day'

@waynexia
Member

> if we implement it as a function, is it pushed down like a normal filter?

Sure, we can do this just as other functions or operators

> You will find that it doesn't change the existing SQL parser code but adds new parsing logic. So maintenance isn't a problem here at all.

You could try it yourself before you say "not a problem at all" :) The RANGE clause doesn't change existing code either.

> I think it's friendlier than a SQL function. And of course, we can support the interval by adding an in option

Do you really think the next example is easy to understand?

@killme2008
Contributor

The range query is worth the maintenance cost due to its usefulness and elegance, but I agree with you that within is unnecessary. @waynexia

Could you refactor this PR, @NiwakaDev? Please implement it as an SQL function. Most of the work is already done. Thanks!

@NiwakaDev
Collaborator Author

@killme2008 @waynexia

> Please implement it as an SQL function. Most of the work is already done. Thanks!

The new approach would be:

  1. Register a WITHIN function.
  2. Convert the WITHIN function call to a normal filter in the planner phase.

I guess the second step is almost the same as what this PR already does. What do you think?
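The second step can be sketched without any query-engine dependency. The toy `Expr` type, `rewrite_within`, `year_range`, and `to_sql` below are hypothetical and only illustrate the shape of the rewrite; they are not DataFusion or GreptimeDB APIs. The function name `within` is a placeholder, since the final name was still open in this thread, and only year literals are handled.

```rust
/// Toy expression tree, standing in for a real planner's expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Call { name: String, args: Vec<Expr> },
    Binary { left: Box<Expr>, op: &'static str, right: Box<Expr> },
}

/// Hypothetical: maps a four-digit year to the half-open range [start, end).
fn year_range(literal: &str) -> Option<(String, String)> {
    if literal.len() != 4 || !literal.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year: i32 = literal.parse().ok()?;
    let next = year.checked_add(1)?;
    Some((
        format!("{:04}-01-01 00:00:00", year),
        format!("{:04}-01-01 00:00:00", next),
    ))
}

fn binary(left: Expr, op: &'static str, right: Expr) -> Expr {
    Expr::Binary { left: Box::new(left), op, right: Box::new(right) }
}

/// Rewrites `within(col, 'YYYY')` into `col >= start AND col < end`.
/// Calls that do not match are returned unchanged.
fn rewrite_within(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, args } if name.eq_ignore_ascii_case("within") => {
            if let [col @ Expr::Column(_), Expr::Literal(lit)] = args.as_slice() {
                if let Some((start, end)) = year_range(lit) {
                    let lower = binary(col.clone(), ">=", Expr::Literal(start));
                    let upper = binary(col.clone(), "<", Expr::Literal(end));
                    return binary(lower, "AND", upper);
                }
            }
            Expr::Call { name, args }
        }
        // Recurse so that a call nested inside AND/OR is rewritten too.
        Expr::Binary { left, op, right } => {
            binary(rewrite_within(*left), op, rewrite_within(*right))
        }
        other => other,
    }
}

/// Renders the toy tree as SQL-like text, for inspection only.
fn to_sql(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Literal(value) => format!("'{}'", value),
        Expr::Call { name, args } => {
            let rendered: Vec<String> = args.iter().map(to_sql).collect();
            format!("{}({})", name, rendered.join(", "))
        }
        Expr::Binary { left, op, right } => {
            format!("{} {} {}", to_sql(left), op, to_sql(right))
        }
    }
}
```

Once the call is replaced by plain comparisons, the rest of the planner sees an ordinary range predicate, which is presumably what lets it be pushed down like any other filter.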

@killme2008
Contributor

> @killme2008 @waynexia
>
> > Please implement it as an SQL function. Most of the work is already done. Thanks!
>
> The new approach would be:
>
>   1. Register a WITHIN function.
>   2. Convert the WITHIN function call to a normal filter in the planner phase.
>
> I guess the second step is almost the same as what this PR already does. What do you think?

Yes, definitely.

@waynexia
Member

Thank you @NiwakaDev @killme2008 !

@killme2008
Contributor

@NiwakaDev Sorry to disturb you, but do you have time to review this PR? It's nearly finished and would benefit our users. If not, I would love to help.

@NiwakaDev
Collaborator Author

@killme2008

Sorry for the delay. March was busy😅

I can finally start working on this PR this week if you don't mind. If I can't make good progress by next week, I'll ask for your help.

Thank you!

@killme2008
Contributor

> @killme2008
>
> Sorry for the delay. March was busy😅
>
> I can finally start working on this PR this week if you don't mind. If I can't make good progress by next week, I'll ask for your help.
>
> Thank you!

That would be great! Thank you.

Labels
docs-not-required This change does not impact docs.
5 participants