Feature: Add Support for INTERVAL Data Type - Already Supported in Parquet & Arrow #16677

Open
inviscid opened this issue Oct 23, 2024 · 7 comments · May be fixed by #16990
Labels: A-query (Area: databend query), C-feature (Category: feature)


@inviscid

Summary

Databend already understands INTERVAL as a value type, since it is used in date arithmetic (for example, adding an interval to a date). However, there is currently no way to store an INTERVAL value in a column, as can be done in Postgres.

Snowflake and MySQL do not support an INTERVAL column type either, but Postgres does, and it makes life much easier because storing duration information is quite common. Both Parquet and Arrow support an Interval/Duration data type.

The Parquet standard does support an Interval data type as defined here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval
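To make the Parquet layout concrete: per the spec linked above, INTERVAL annotates a 12-byte FIXED_LEN_BYTE_ARRAY holding three little-endian unsigned 32-bit integers, in the order months, days, milliseconds. The Rust sketch below uses only the standard library; `ParquetInterval`, `to_bytes` and `from_bytes` are hypothetical names for illustration, not Databend or `parquet` crate APIs.

```rust
/// Hypothetical decoded form of a Parquet INTERVAL value.
/// All three components are unsigned 32-bit counts, as the spec requires.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParquetInterval {
    pub months: u32,
    pub days: u32,
    pub millis: u32,
}

impl ParquetInterval {
    /// Encode into the 12-byte layout: months, days, millis, each little-endian.
    pub fn to_bytes(self) -> [u8; 12] {
        let mut out = [0u8; 12];
        out[0..4].copy_from_slice(&self.months.to_le_bytes());
        out[4..8].copy_from_slice(&self.days.to_le_bytes());
        out[8..12].copy_from_slice(&self.millis.to_le_bytes());
        out
    }

    /// Decode from the 12-byte layout produced by `to_bytes`.
    pub fn from_bytes(b: &[u8; 12]) -> Self {
        // Read one little-endian u32 starting at byte offset `i`.
        let word = |i: usize| u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]);
        ParquetInterval {
            months: word(0),
            days: word(4),
            millis: word(8),
        }
    }
}
```

For example, 1 month, 2 days and 3,000 ms encode as `[1, 0, 0, 0, 2, 0, 0, 0, 0xB8, 0x0B, 0, 0]`, and `from_bytes` recovers the same triple.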

Arrow also supports a Duration type with several resolutions (s, ms, us, ns): https://arrow.apache.org/docs/python/generated/pyarrow.duration.html#pyarrow.duration. It would likely be safe to pick a reasonable default resolution for Arrow usage, and this conversion function in the Rust parquet crate suggests that it might be milliseconds: https://arrow.apache.org/rust/parquet/arrow/arrow_writer/fn.get_interval_dt_array_slice.html
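Arrow's Duration type is a single signed 64-bit count in one fixed unit, so it cannot carry a calendar month, whose length varies. Arrow also defines separate Interval types (for example `MONTH_DAY_NANO`) that keep the components apart, and the linked `get_interval_dt_array_slice` appears, judging by its name, to handle the day-time interval variant rather than Duration. The sketch below shows what collapsing a Parquet-style (months, days, millis) triple into one duration would involve. `TimeUnit` and `to_duration` are hypothetical stand-ins, not the arrow crate's types, and the code assumes every day is exactly 86,400,000 ms, which ignores daylight-saving transitions.

```rust
/// Hypothetical resolution enum mirroring the units Arrow's Duration offers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Assumption: a day is always 24 hours (not true across DST changes).
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Collapse a (months, days, millis) interval into a single count of `unit`.
/// Returns None when `months` is non-zero (a month has no fixed length)
/// or when the result does not fit in an i64.
pub fn to_duration(months: u32, days: u32, millis: u32, unit: TimeUnit) -> Option<i64> {
    if months != 0 {
        return None;
    }
    let total_ms = i64::from(days)
        .checked_mul(MILLIS_PER_DAY)?
        .checked_add(i64::from(millis))?;
    match unit {
        // Integer division truncates any sub-second remainder.
        TimeUnit::Second => Some(total_ms / 1_000),
        TimeUnit::Millisecond => Some(total_ms),
        TimeUnit::Microsecond => total_ms.checked_mul(1_000),
        TimeUnit::Nanosecond => total_ms.checked_mul(1_000_000),
    }
}
```

Returning `None` for non-zero months, instead of guessing a 30-day month, is one possible design choice; mapping to one of Arrow's Interval types avoids the question entirely.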

Ideally, an INTERVAL value could be marshalled to and unmarshalled from Parquet using the native Parquet and Arrow types.

@inviscid inviscid added the C-feature Category: feature label Oct 23, 2024
@sundy-li sundy-li added the A-query Area: databend query label Oct 24, 2024
@sundy-li
Member

New data types can be supported after #16610; we are still in the middle of a large refactoring.

@sundy-li sundy-li self-assigned this Oct 24, 2024
@BohuTANG
Member

#16610 has been merged, and this feature is now ready to be added to the work queue.

@sundy-li
Member

It would be better to do this after #16814.

@inviscid
Author

@sundy-li now that #16814 is complete, do you have an estimate on when this might get some focus?

@sundy-li
Member

@TCeason is already working on it.

We will support interval units via months, days, or microseconds.
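One way to read this plan is a Postgres-style value that keeps months, days, and microseconds as separate fields. Assuming that representation (the `Interval` struct, `ConvertError`, `to_parquet` and `from_parquet` below are hypothetical, not Databend's actual code), this sketch shows the round trip to Parquet's (months, days, millis) triple and where it can lose information: Parquet's fields are unsigned, so negative intervals do not fit, and the microsecond part must be a whole number of milliseconds.

```rust
/// Hypothetical interval value: months, days and microseconds kept separately,
/// because neither a month nor a local-time day has a fixed length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub months: i32,
    pub days: i32,
    pub micros: i64,
}

/// Reasons a value cannot be represented exactly in the other format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConvertError {
    /// Parquet INTERVAL fields are unsigned.
    Negative,
    /// Parquet stores milliseconds, so sub-millisecond precision would be lost.
    SubMillisecond,
    /// A component does not fit in the target integer type.
    Overflow,
}

/// Convert to Parquet's (months, days, millis) triple, refusing lossy cases.
pub fn to_parquet(iv: Interval) -> Result<(u32, u32, u32), ConvertError> {
    if iv.months < 0 || iv.days < 0 || iv.micros < 0 {
        return Err(ConvertError::Negative);
    }
    if iv.micros % 1_000 != 0 {
        return Err(ConvertError::SubMillisecond);
    }
    let millis = u32::try_from(iv.micros / 1_000).map_err(|_| ConvertError::Overflow)?;
    // months and days are non-negative i32 here, so these casts are lossless.
    Ok((iv.months as u32, iv.days as u32, millis))
}

/// Convert a Parquet triple back; exact whenever months and days fit in i32.
pub fn from_parquet(months: u32, days: u32, millis: u32) -> Result<Interval, ConvertError> {
    Ok(Interval {
        months: i32::try_from(months).map_err(|_| ConvertError::Overflow)?,
        days: i32::try_from(days).map_err(|_| ConvertError::Overflow)?,
        // u32::MAX * 1_000 fits comfortably in an i64.
        micros: i64::from(millis) * 1_000,
    })
}
```

Because Parquet stores milliseconds in a `u32`, the time part can hold only about 49.7 days. This sketch reports `Overflow` rather than folding the excess into `days`, since a day is not always 24 hours.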

@inviscid
Author

inviscid commented Dec 2, 2024

@TCeason Sorry to bother you on this one. We are trying to plan out migrations to Databend but we have a couple of projects that depend on the Interval data type being available. Would it be realistic to see that new data type this week or is that too soon? We would have to push the migrations to next year if we can't get testing done by next week.

Thanks...

@TCeason
Collaborator

TCeason commented Dec 2, 2024

I'm prioritizing this task this week. It is expected to be completed by the end of the week.

@TCeason TCeason linked a pull request Dec 5, 2024 that will close this issue