Skip to content

[SPARK-52272][SQL] HiveExternalCatalog alter table should update column comments #50991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

szehon-ho
Copy link
Contributor

@szehon-ho szehon-ho commented May 22, 2025

This bug was noticed by @gengliangwang , thanks!

What changes were proposed in this pull request?

Improve HiveExternalCatalog.alterTable to update column comments if they are changed in the schema.

Why are the changes needed?

Users use this method assuming comments should be updated.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add test to HiveDDLSuite

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label May 22, 2025
@szehon-ho szehon-ho force-pushed the hive_alter_table_comment branch 2 times, most recently from 24fdfe6 to 2cbbaeb Compare May 22, 2025 22:13
@szehon-ho szehon-ho force-pushed the hive_alter_table_comment branch from 2cbbaeb to 05b278f Compare May 22, 2025 22:14
@szehon-ho szehon-ho changed the title [SPARK-52272][SQL] HiveExternalCatalog alter table loses comment updates [SPARK-52272][SQL] HiveExternalCatalog alter table should update column comments May 22, 2025
@@ -684,11 +684,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
} else {
(oldTableDef.schema, oldTableDef.partitionColumnNames)
}

// alter column comments
val newCommentMap = tableDefinition.schema.map(f => (f.name, f.getComment())).toMap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with nested columns. For example, a.b.c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let me add to this

// // Add old table's owner if we need to restore
val owner = Option(tableDefinition.owner).filter(_.nonEmpty).getOrElse(oldTableDef.owner)
val newDef = tableDefinition.copy(
storage = newStorage,
schema = newSchema,
schema = newSchemaWithComments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we can't use the schema from the input parameter tableDefinition directly. cc @cloud-fan @yaooqinn do you know the context?

Copy link
Contributor Author

@szehon-ho szehon-ho May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gengliangwang looking at this, actually there is a separate method for alter the schema https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L706 maybe we rushed a bit to solution and the problem is the caller should call this one instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is true, ideally this method should handle update the schema as well. Otherwise the caller needs to make two separate alter calls and its more expensive

@szehon-ho
Copy link
Contributor Author

szehon-ho commented May 24, 2025

As this look like its by design: #16944 (comment)

I close it in favor of : #51007

@szehon-ho szehon-ho closed this May 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants