
[SPARK-51384][SQL] Support java.time.LocalTime as the external type of TimeType #50153

Closed · wants to merge 8 commits

Conversation

@MaxGekk (Member) commented Mar 4, 2025

### What changes were proposed in this pull request?

In the PR, I propose to support `java.time.LocalTime` as the external type of the new data type `TIME` introduced by #50103. After the changes, users can create Datasets with `TimeType` columns and collect them back as instances of `java.time.LocalTime`. For example:

```scala
scala> val df = Seq(LocalTime.of(12, 15)).toDF
val df: org.apache.spark.sql.DataFrame = [value: time(6)]

scala> df.printSchema
root
 |-- value: time(6) (nullable = true)

scala> df.first.getAs[LocalTime](0)
val res8: java.time.LocalTime = 12:15
```

By default, the external type is encoded to a `TIME` column with precision = 6 (microseconds).
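The microsecond precision suggests a straightforward encoding; as an illustrative sketch only (the helper names below are hypothetical, not Spark's actual API), assuming a `TIME` value with precision 6 is stored as microseconds since midnight, converting from the external `java.time.LocalTime` amounts to truncating its nanosecond-of-day:

```scala
import java.time.LocalTime

// Illustrative sketch, not Spark's actual internals: assume a TIME value
// with precision 6 is stored as microseconds since midnight.
def localTimeToMicros(t: LocalTime): Long =
  t.toNanoOfDay / 1000L  // truncate nanosecond-of-day to microsecond resolution

def microsToLocalTime(micros: Long): LocalTime =
  LocalTime.ofNanoOfDay(micros * 1000L)
```

Under this assumption, `LocalTime.of(12, 15)` maps to 44,100,000,000 microseconds, and the round trip back yields the same `12:15` shown above.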

### Why are the changes needed?

1. To allow creation of TIME columns via the public Scala/Java API; otherwise the new type is useless.
2. To be able to write tests when adding support for the new type in other parts of Spark SQL.

### Does this PR introduce _any_ user-facing change?

Yes, in some sense: the PR allows creating `TimeType` columns using the Scala/Java APIs.

### How was this patch tested?

By running the new tests:

```
$ build/sbt "test:testOnly *DateTimeUtilsSuite"
$ build/sbt "test:testOnly *CatalystTypeConvertersSuite"
$ build/sbt "test:testOnly *DatasetSuite"
```

and the modified one:

```
$ build/sbt "test:testOnly *DataTypeSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 4, 2025
@github-actions github-actions bot added the DOCS label Mar 4, 2025
@MaxGekk MaxGekk changed the title [WIP][SPARK-51384][SQL] Support java.time.LocalTime as the external type of TimeType [SPARK-51384][SQL] Support java.time.LocalTime as the external type of TimeType Mar 5, 2025
@MaxGekk MaxGekk marked this pull request as ready for review March 5, 2025 11:12
@dongjoon-hyun (Member) left a comment:
+1, LGTM.

@MaxGekk (Member, Author) commented Mar 6, 2025:

Merging to master. Thank you, @yaooqinn @dongjoon-hyun for review.

@MaxGekk MaxGekk closed this in 1054f0d Mar 6, 2025
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
Closes apache#50153 from MaxGekk/time-localtime.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
anoopj pushed a commit to anoopj/spark that referenced this pull request Mar 15, 2025
kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025