
Commit a3c0303

vinodkc authored and huangxiaopingRD committed
[SPARK-54442][SQL] Add numeric conversion functions for TIME type
### What changes were proposed in this pull request?

This PR adds six numeric conversion functions for the TIME type, mirroring the existing pattern for TIMESTAMP.

Constructor functions (numeric → TIME):
- `time_from_seconds(seconds)` - supports fractional seconds via NumericType input
- `time_from_millis(millis)` - IntegralType input
- `time_from_micros(micros)` - IntegralType input

Extractor functions (TIME → numeric):
- `time_to_seconds(time)` - returns DECIMAL(14,6) to preserve fractional seconds
- `time_to_millis(time)` - returns BIGINT
- `time_to_micros(time)` - returns BIGINT

For example:

```sql
-- Constructor functions (numeric -> TIME)
SELECT time_from_seconds(52200);        -- 14:30:00
SELECT time_from_seconds(52200.123456); -- 14:30:00.123456
SELECT time_from_millis(52200123);      -- 14:30:00.123
SELECT time_from_micros(52200123456);   -- 14:30:00.123456

-- Extractor functions (TIME -> numeric)
SELECT time_to_seconds(TIME'14:30:00.123456'); -- 52200.123456
SELECT time_to_millis(TIME'14:30:00.123');     -- 52200123
SELECT time_to_micros(TIME'14:30:00.123456');  -- 52200123456
```

### Why are the changes needed?

The TIME type lacks numeric conversion functions, making it difficult to:
- Create TIME values from numeric representations (common in data ingestion)
- Extract numeric values for calculations or integration with external systems

TIMESTAMP has equivalent functions (`timestamp_seconds()`, `unix_seconds()`, etc.), and TIME should achieve feature parity.

### Does this PR introduce _any_ user-facing change?

Yes, it adds the six new SQL functions listed above.

### How was this patch tested?

- Unit tests (`TimeExpressionsSuite.scala`)
- SQL integration tests (`time.sql`)

### Was this patch authored or co-authored using generative AI tooling?

Yes. Generated-by: Claude 3.5 Sonnet

AI assistance was used for:
- Code pattern analysis and design discussions
- Implementation guidance following Spark conventions
- Test case generation and organization
- Documentation and examples

### Additional context

Q: Why does `time_to_seconds()` return DECIMAL(14,6) instead of BIGINT?

A: To preserve fractional seconds and enable exact round-trip conversions:

```sql
SELECT time_to_seconds(time_from_seconds(52200.123456));
-- Returns: 52200.123456 (exact match)
```

If BIGINT were returned, fractional seconds would be lost.

Q: Why does `time_from_seconds()` accept NumericType instead of just IntegralType?

A: To support fractional seconds for maximum flexibility:

```sql
SELECT time_from_seconds(52200.5);        -- DECIMAL: 14:30:00.5
SELECT time_from_seconds(52200.5::float); -- FLOAT: 14:30:00.5
```

This mirrors `timestamp_seconds()`, which also accepts NumericType.

Closes apache#53147 from vinodkc/br_time_numeric_conversion.

Authored-by: vinodkc <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
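The conversion semantics described above can be sketched in plain Python (this is illustrative only, not Spark's implementation): a TIME value is, in effect, a count of microseconds since midnight, and the seconds extractor scales that count down by 10^6 exactly.

```python
from datetime import time
from decimal import Decimal


def time_from_micros(micros: int) -> time:
    """Build a time-of-day from microseconds since midnight (0..86399999999)."""
    if not 0 <= micros <= 86_399_999_999:
        raise ValueError("microseconds out of range for a time of day")
    seconds, us = divmod(micros, 1_000_000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return time(h, m, s, us)


def time_to_micros(t: time) -> int:
    """Inverse: microseconds since midnight for a time-of-day."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def time_to_seconds(t: time) -> Decimal:
    """Seconds since midnight as an exact decimal with 6 fractional digits."""
    return Decimal(time_to_micros(t)) / Decimal(1_000_000)


t = time_from_micros(52_200_123_456)
print(t)                   # 14:30:00.123456
print(time_to_seconds(t))  # 52200.123456
```

Because every conversion here is integer or exact decimal arithmetic, the round trip is lossless, which is the same motivation given above for returning DECIMAL(14,6) rather than BIGINT.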
1 parent: 58b074d

File tree

11 files changed, +1434 −2 lines

python/pyspark/sql/connect/functions/builtin.py

Lines changed: 42 additions & 0 deletions

```python
def time_from_seconds(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_from_seconds", col)


time_from_seconds.__doc__ = pysparkfuncs.time_from_seconds.__doc__


def time_from_millis(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_from_millis", col)


time_from_millis.__doc__ = pysparkfuncs.time_from_millis.__doc__


def time_from_micros(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_from_micros", col)


time_from_micros.__doc__ = pysparkfuncs.time_from_micros.__doc__


def time_to_seconds(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_to_seconds", col)


time_to_seconds.__doc__ = pysparkfuncs.time_to_seconds.__doc__


def time_to_millis(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_to_millis", col)


time_to_millis.__doc__ = pysparkfuncs.time_to_millis.__doc__


def time_to_micros(col: "ColumnOrName") -> Column:
    return _invoke_function_over_columns("time_to_micros", col)


time_to_micros.__doc__ = pysparkfuncs.time_to_micros.__doc__
```

(inserted after `make_time`, before the `make_timestamp` overloads)

python/pyspark/sql/functions/__init__.py

Lines changed: 6 additions & 0 deletions

```diff
     "timestamp_millis",
     "timestamp_seconds",
     "time_diff",
+    "time_from_micros",
+    "time_from_millis",
+    "time_from_seconds",
+    "time_to_micros",
+    "time_to_millis",
+    "time_to_seconds",
     "time_trunc",
     "to_date",
     "to_time",
```

python/pyspark/sql/functions/builtin.py

Lines changed: 156 additions & 0 deletions

```python
@_try_remote_functions
def time_from_seconds(col: "ColumnOrName") -> Column:
    """
    Creates a TIME value from seconds since midnight (supports fractional seconds).

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        Seconds since midnight (0 to 86399.999999).

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.createDataFrame([(52200.5,)], ['seconds'])
    >>> df.select(sf.time_from_seconds('seconds')).show()
    +--------------------------+
    |time_from_seconds(seconds)|
    +--------------------------+
    |                14:30:00.5|
    +--------------------------+
    """
    return _invoke_function_over_columns("time_from_seconds", col)


@_try_remote_functions
def time_from_millis(col: "ColumnOrName") -> Column:
    """
    Creates a TIME value from milliseconds since midnight.

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        Milliseconds since midnight (0 to 86399999).

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.createDataFrame([(52200500,)], ['millis'])
    >>> df.select(sf.time_from_millis('millis')).show()
    +------------------------+
    |time_from_millis(millis)|
    +------------------------+
    |              14:30:00.5|
    +------------------------+
    """
    return _invoke_function_over_columns("time_from_millis", col)


@_try_remote_functions
def time_from_micros(col: "ColumnOrName") -> Column:
    """
    Creates a TIME value from microseconds since midnight.

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        Microseconds since midnight (0 to 86399999999).

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.createDataFrame([(52200500000,)], ['micros'])
    >>> df.select(sf.time_from_micros('micros')).show()
    +------------------------+
    |time_from_micros(micros)|
    +------------------------+
    |              14:30:00.5|
    +------------------------+
    """
    return _invoke_function_over_columns("time_from_micros", col)


@_try_remote_functions
def time_to_seconds(col: "ColumnOrName") -> Column:
    """
    Extracts seconds from TIME value (returns DECIMAL to preserve fractional seconds).

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        TIME value to convert.

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT TIME'14:30:00.5' as time")
    >>> df.select(sf.time_to_seconds('time')).show()
    +---------------------+
    |time_to_seconds(time)|
    +---------------------+
    |         52200.500000|
    +---------------------+
    """
    return _invoke_function_over_columns("time_to_seconds", col)


@_try_remote_functions
def time_to_millis(col: "ColumnOrName") -> Column:
    """
    Extracts milliseconds from TIME value.

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        TIME value to convert.

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT TIME'14:30:00.5' as time")
    >>> df.select(sf.time_to_millis('time')).show()
    +--------------------+
    |time_to_millis(time)|
    +--------------------+
    |            52200500|
    +--------------------+
    """
    return _invoke_function_over_columns("time_to_millis", col)


@_try_remote_functions
def time_to_micros(col: "ColumnOrName") -> Column:
    """
    Extracts microseconds from TIME value.

    .. versionadded:: 4.2.0

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        TIME value to convert.

    Examples
    --------
    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT TIME'14:30:00.5' as time")
    >>> df.select(sf.time_to_micros('time')).show()
    +--------------------+
    |time_to_micros(time)|
    +--------------------+
    |         52200500000|
    +--------------------+
    """
    return _invoke_function_over_columns("time_to_micros", col)
```

(inserted after `make_time`, before the `make_timestamp` overloads)
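The docstrings above return DECIMAL(14,6) from `time_to_seconds`. A quick plain-Python check (assumptions: this is not Spark code, just the underlying arithmetic) shows why six fractional digits round-trip exactly: every whole microsecond count since midnight maps to a distinct decimal with scale 6.

```python
from decimal import Decimal


def seconds_to_micros(seconds: Decimal) -> int:
    """Exact scale-up: a scale-6 decimal times 10^6 is a whole integer."""
    return int(seconds * 1_000_000)


def micros_to_seconds(micros: int) -> Decimal:
    """Exact scale-down: shift the decimal point six places left."""
    return Decimal(micros).scaleb(-6)


# Round-trip is lossless across the whole valid range of a time of day.
for s in (Decimal("0"), Decimal("52200.123456"), Decimal("86399.999999")):
    assert micros_to_seconds(seconds_to_micros(s)) == s
print("round-trip exact")
```

Five integer digits (max 86399) plus six fractional digits need only 11 digits of precision, so DECIMAL(14,6) holds every representable value with headroom.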

sql/api/src/main/scala/org/apache/spark/sql/functions.scala

Lines changed: 49 additions & 0 deletions

```scala
  /**
   * Creates a TIME from the number of seconds since midnight.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_from_seconds(e: Column): Column = Column.fn("time_from_seconds", e)

  /**
   * Creates a TIME from the number of milliseconds since midnight.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_from_millis(e: Column): Column = Column.fn("time_from_millis", e)

  /**
   * Creates a TIME from the number of microseconds since midnight.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_from_micros(e: Column): Column = Column.fn("time_from_micros", e)

  /**
   * Extracts the number of seconds (including fractional seconds) from a TIME value. Returns a
   * DECIMAL(14,6) to preserve microsecond precision.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_to_seconds(e: Column): Column = Column.fn("time_to_seconds", e)

  /**
   * Extracts the number of milliseconds since midnight from a TIME value.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_to_millis(e: Column): Column = Column.fn("time_to_millis", e)

  /**
   * Extracts the number of microseconds since midnight from a TIME value.
   *
   * @group datetime_funcs
   * @since 4.2.0
   */
  def time_to_micros(e: Column): Column = Column.fn("time_to_micros", e)
```

(inserted in `object functions` after `time_trunc`)

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Lines changed: 6 additions & 0 deletions

```diff
     expression[MakeDate]("make_date"),
     expression[MakeTime]("make_time"),
     expression[TimeTrunc]("time_trunc"),
+    expression[TimeFromSeconds]("time_from_seconds"),
+    expression[TimeFromMillis]("time_from_millis"),
+    expression[TimeFromMicros]("time_from_micros"),
+    expression[TimeToSeconds]("time_to_seconds"),
+    expression[TimeToMillis]("time_to_millis"),
+    expression[TimeToMicros]("time_to_micros"),
     expressionBuilder("make_timestamp", MakeTimestampExpressionBuilder),
     expressionBuilder("try_make_timestamp", TryMakeTimestampExpressionBuilder),
     expression[MonthName]("monthname"),
```
