Conversation

@davidm-db
Contributor

@davidm-db davidm-db commented Dec 5, 2025

What changes were proposed in this pull request?

Introducing a new SQL config for the TIME type: spark.sql.timeType.enabled.

The default value is false and it is enabled only in tests.
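For context, boolean flags like this are typically declared in SQLConf with the buildConf builder. The sketch below follows that pattern; the doc string and the use of internal() are illustrative assumptions, not taken from this PR:

```scala
// Sketch only: the usual SQLConf entry shape for a boolean feature flag.
// Doc text and internal() below are assumptions for illustration.
val TIME_TYPE_ENABLED = buildConf("spark.sql.timeType.enabled")
  .internal()
  .doc("When true, the TIME data type is enabled; disabled by default " +
    "while support is incomplete.")
  .booleanConf
  .createWithDefault(false)
```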

Why are the changes needed?

TIME data type support is not complete, so we need to guard it until it is complete, especially ahead of the Spark 4.1 release.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Need to add tests for disabled config.

Was this patch authored or co-authored using generative AI tooling?

No.

@davidm-db davidm-db force-pushed the davidm-db/time-config branch from aa1ac43 to 5857c5d Compare December 5, 2025 15:39
Member

@dongjoon-hyun dongjoon-hyun left a comment

Could you make CI happy, @davidm-db ?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54609] Disable TIME type by default [SPARK-54609][SQL] Disable TIME type by default Dec 5, 2025
@dongjoon-hyun
Member

}

override def supportDataType(dataType: DataType): Boolean = dataType match {
case _: TimeType => SQLConf.get.isTimeTypeEnabled
Contributor

how do we block geo types for data sources?

Contributor Author

@davidm-db davidm-db Dec 8, 2025

Per offline discussion with @uros-db, blocking for Parquet should be sufficient for TIME.

Contributor Author

Added guards for all file formats that weren't previously explicitly marked as not supporting TIME.

@dongjoon-hyun
Member

Could you answer the above comments and make this PR pass the CIs for further discussion, @davidm-db ?

@yaooqinn
Member

yaooqinn commented Dec 7, 2025

TimeType is marked as Unstable. Is this short-term prohibition actually required?

@dongjoon-hyun
Member

Yes, we need this to protect the users from the accidental use of unfinished work, @yaooqinn .

TimeType is marked as Unstable. Is this short-term prohibition actually required?

@davidm-db davidm-db force-pushed the davidm-db/time-config branch from f58fbfa to 02c68f0 Compare December 8, 2025 10:16
@davidm-db davidm-db force-pushed the davidm-db/time-config branch from 02c68f0 to c861aa6 Compare December 8, 2025 10:33
val specialDate = convertSpecialDate(value, zoneId).map(Literal(_, DateType))
specialDate.getOrElse(toLiteral(stringToDate, DateType))
case TIME => toLiteral(stringToTime, TimeType())
case TIME if conf.isTimeTypeEnabled => toLiteral(stringToTime, TimeType())
Contributor

Since we have the check here, we don't need to complicate things by updating SqlBaseParser.g4.

Contributor Author

@davidm-db davidm-db Dec 9, 2025

I replicated what Max did internally. I think the reason for this is:

  • the code you are commenting on handles literal types (statement example: SELECT TIME'10:00:00') and is done this way to fit the existing error message format
  • the {time_type_enabled}? guard in SqlBaseParser.g4 guards references to TIME as a type and throws a different class of errors, i.e. unsupported data type (statement example: CREATE TABLE t(col TIME))

I don't know whether we want to change this behavior or not; please share your thoughts.
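The literal-side guard described above can be sketched as follows. This is a simplified, self-contained illustration, not the actual parser code: `parseTypedLiteral` and its `Either` result are stand-ins for the real AstBuilder logic and error framework, and `isTimeTypeEnabled` stands in for `SQLConf.get.isTimeTypeEnabled`:

```scala
// Simplified sketch of guarding a typed literal behind a config flag.
// Names and the Either-based error channel are illustrative assumptions.
object LiteralGuardSketch {
  sealed trait DataType
  case object DateType extends DataType
  case object TimeType extends DataType

  def parseTypedLiteral(typeName: String, isTimeTypeEnabled: Boolean): Either[String, DataType] =
    typeName match {
      case "DATE" => Right(DateType)
      // TIME literals are only recognized when the flag is on; otherwise they
      // fall through to the generic unsupported-typed-literal error.
      case "TIME" if isTimeTypeEnabled => Right(TimeType)
      case other => Left(s"UNSUPPORTED_TYPED_LITERAL: $other")
    }
}
```

With the flag off, `TIME'10:00:00'` would hit the same unsupported-literal path as any unknown literal type, which is why it fits the existing error message format.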

@cloud-fan
Contributor

cloud-fan commented Dec 8, 2025

can we add some test cases following the geo types blocking PRs?

Contributor

@uros-db uros-db left a comment

@davidm-db Should we add some tests, e.g.

  • e2e sql query tests with config turned off
  • blocking data sources like Parquet, CSV
  • data frames (classic and spark connect)
  • also, there are Scala suites for casting

Member

@dongjoon-hyun dongjoon-hyun left a comment

BTW, given @yaooqinn's comment, while working on this PR we need to build consensus on this by sending an email to the dev@spark mailing list, @davidm-db, @uros-db, and @cloud-fan.

It would be enough to reply to the RC2 email about the TIME type. Maybe you could send the decision out clearly to the mailing list, please, @cloud-fan, since @MaxGekk is not in the loop yet?

unsupportedType = ctx.literalType.getText,
supportedTypes =
// TODO: Remove TIME from the list.
Seq("DATE", "TIMESTAMP_NTZ", "TIMESTAMP_LTZ", "TIMESTAMP", "INTERVAL", "X", "TIME"),
Member

Maybe, the following style?

- Seq("DATE", "TIMESTAMP_NTZ", "TIMESTAMP_LTZ", "TIMESTAMP", "INTERVAL", "X", "TIME"),
+ Seq("DATE", "TIMESTAMP_NTZ", "TIMESTAMP_LTZ", "TIMESTAMP", "INTERVAL", "X") ++ (if (conf.isTimeTypeEnabled) Seq("TIME") else None)
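The suggested style can be sketched as a small self-contained helper. The function name is hypothetical (the real code builds this list inline when raising the error), and `Nil` is used instead of the reviewer's `None` for a plainer empty-collection reading:

```scala
// Hypothetical helper showing the reviewer's suggested style:
// advertise TIME among the supported literal types only when the flag is on.
object SupportedTypesSketch {
  def supportedLiteralTypes(isTimeTypeEnabled: Boolean): Seq[String] =
    Seq("DATE", "TIMESTAMP_NTZ", "TIMESTAMP_LTZ", "TIMESTAMP", "INTERVAL", "X") ++
      (if (isTimeTypeEnabled) Seq("TIME") else Nil)
}
```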

Contributor Author

Yeah, will do this definitely. There are a lot of dependencies and TIME needs to be guarded in a lot of places, so for now I'm just trying to figure out what's needed to make the CI pass. Afterwards, I'll sort out the TODO comment. Hope to finish everything tomorrow!

Contributor Author

Done.

@davidm-db davidm-db force-pushed the davidm-db/time-config branch from f2552ce to c9cf94c Compare December 8, 2025 23:26
@davidm-db
Contributor Author

@dongjoon-hyun @cloud-fan I think I've resolved all of the comments. I'll make sure tonight that after my latest changes all CIs are passing.
Tomorrow, I'll add some negative tests as Wenchen suggested (i.e. verifying that TIME really doesn't work when the flag is disabled), and I think that should be it for this PR.

@dongjoon-hyun
Member

Thank you, @davidm-db .

override def toString: String = "XML"

override def supportDataType(dataType: DataType): Boolean = dataType match {
case _: TimeType => SQLConf.get.isTimeTypeEnabled
Contributor Author

Is this covering everything? Isn't this only the write path? How do we handle blocking on the read path then?

I assume the idea is to have the check in a single place instead of in each FileFormat, which makes complete sense. I'm just wondering whether what you suggested covers the same scope as the per-FileFormat checks (the current state of the code)?

Contributor

we can also block read path in DataSource.resolveRelation

Contributor

oh DataSourceUtils.verifySchema is a better narrow waist for both read and write paths.

Contributor Author

@davidm-db davidm-db Dec 10, 2025

So I just have one additional question: if we go down this path (which makes sense at a high level), I think we might fail to block the type completely. For example, ParquetFileFormat#supportDataType recursively calls supportDataType when the root type is Array/Map/Struct; off the top of my head, the same holds for Xml, and maybe some others.

Am I missing something, or does what I just said make sense?

Contributor

DataSourceUtils.verifySchema gets the full schema and we can do whatever we want, e.g.

if (schema.existsRecursively(_.isInstanceOf[TimeType])) fail ...
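The reason a recursive check at this narrow waist also covers nested types can be sketched with a toy data-type ADT. This is an illustration only: the real `StructType.existsRecursively` and `DataSourceUtils.verifySchema` live in Spark, and the error below stands in for `QueryCompilationErrors.unsupportedTimeTypeError()`:

```scala
// Toy data-type ADT plus a recursive predicate, mirroring the idea behind
// schema.existsRecursively(_.isInstanceOf[TimeType]) suggested above.
object VerifySchemaSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object TimeType extends DataType
  case class ArrayType(element: DataType) extends DataType
  case class StructType(fields: Seq[DataType]) extends DataType

  // True if the predicate holds for this type or any type nested inside it.
  def existsRecursively(dt: DataType)(p: DataType => Boolean): Boolean =
    p(dt) || (dt match {
      case ArrayType(e)   => existsRecursively(e)(p)
      case StructType(fs) => fs.exists(f => existsRecursively(f)(p))
      case _              => false
    })

  // verifySchema-style guard: reject a schema containing TIME anywhere.
  def verifySchema(schema: DataType, isTimeTypeEnabled: Boolean): Unit =
    if (!isTimeTypeEnabled && existsRecursively(schema)(_ == TimeType))
      throw new IllegalArgumentException("UNSUPPORTED_TIME_TYPE")
}
```

Because the traversal descends into Array/Map/Struct, a TIME buried inside a nested struct is caught even if the per-FileFormat supportDataType checks were bypassed.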

Member

@dongjoon-hyun dongjoon-hyun left a comment

Is this still missing, @davidm-db and @cloud-fan ?

Need to add tests for disabled config.

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 10, 2025

I verified manually that it's blocked properly.

scala> spark.sql("CREATE TABLE t(c TIME)")
org.apache.spark.sql.catalyst.parser.ParseException:
[UNSUPPORTED_DATATYPE] Unsupported data type "TIME". SQLSTATE: 0A000
== SQL (line 1, position 18) ==
CREATE TABLE t(c TIME)
                 ^^^^

Given that, shall we proceed those test case as a follow-up, @davidm-db and @cloud-fan ?

@cloud-fan
Contributor

cloud-fan commented Dec 11, 2025

I removed all client-side checks as they're not very meaningful. People can use the TimeType class directly, so there is no point in blocking only SQL, and people can use their own Spark Connect client, which is out of our control. I think the server-side checks should be sufficient: we block time-related functions, and we block query results with the time type (either collecting the rows or writing to data sources). This is also how we block geo types.

@dongjoon-hyun
Member

So, is this the final status from your side, @cloud-fan ? Then, could you give your approval?

new GeometryConverter(g)
case DateType if SQLConf.get.datetimeJava8ApiEnabled => LocalDateConverter
case DateType => DateConverter
case _: TimeType if !SQLConf.get.isTimeTypeEnabled =>
Member

Just for the record, we don't have this for GeographyType|GeometryType.

Contributor

We have it here, a few lines above:

      case _ @ (_: GeographyType | _: GeometryType) if !SQLConf.get.geospatialEnabled =>
        throw new org.apache.spark.sql.AnalysisException(
          errorClass = "UNSUPPORTED_FEATURE.GEOSPATIAL_DISABLED",
          messageParameters = scala.collection.immutable.Map.empty)

def verifySchema(format: FileFormat, schema: StructType, readOnly: Boolean = false): Unit = {
if (!SQLConf.get.isTimeTypeEnabled && schema.existsRecursively(_.isInstanceOf[TimeType])) {
throw QueryCompilationErrors.unsupportedTimeTypeError()
}
Member

Ditto. We don't have this for Geo*Type.

Contributor

no data source supports geo types yet, so it's not needed for now. But to be future-proof we should check geo here as well.

"""
print("Enabling TIME data type")
jspark.sql("SET spark.sql.timeType.enabled = true")
Member

Do we have this for Geo*Type?

Contributor

I'm also curious about why geo didn't fail here...

Member

@dongjoon-hyun dongjoon-hyun left a comment

Got it.

+1, LGTM. Thank you so much, @cloud-fan .

@dongjoon-hyun
Member

Could you cancel all previously launched GitHub Action CI runs so that the last commit can run? It seems the last commit hasn't gotten resources yet.


@dongjoon-hyun
Member

I manually verified compilation, the Scala linter, and both the on and off behavior.

scala> sql("create table t(a TIME)").show()
org.apache.spark.sql.AnalysisException: [UNSUPPORTED_TIME_TYPE] The data type TIME is not supported. SQLSTATE: 0A000

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 11, 2025

Merged to master.

Could you make a backport PR, @davidm-db and @cloud-fan? There are conflicts in branch-4.1.

Never mind. I resolved the conflicts and am testing locally on branch-4.1 now.

dongjoon-hyun pushed a commit that referenced this pull request Dec 11, 2025
Introducing a new SQL config for the TIME type: `spark.sql.timeType.enabled`.

The default value is `false` and it is enabled only in tests.

TIME data type support is not complete, so we need to guard it until it is complete, especially ahead of the Spark 4.1 release.

No.

Need to add tests for disabled config.

No.

Closes #53344 from davidm-db/davidm-db/time-config.

Lead-authored-by: David Milicevic <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 18a9435)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

Merged to branch-4.1, too.
