Skip to content

Draft: Hive iceberg materialized view#6393

Draft
InvisibleProgrammer wants to merge 2 commits into
apache:masterfrom
InvisibleProgrammer:hive_iceberg_materialized_view
Draft

Draft: Hive iceberg materialized view#6393
InvisibleProgrammer wants to merge 2 commits into
apache:masterfrom
InvisibleProgrammer:hive_iceberg_materialized_view

Conversation

@InvisibleProgrammer

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@sonarqubecloud

sonarqubecloud Bot commented May 4, 2026

Copy link
Copy Markdown

@sonarqubecloud

sonarqubecloud Bot commented May 8, 2026

Copy link
Copy Markdown

return;
}

TableType tableType = Enum.valueOf(TableType.class, tableTypeValue);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the exception handled when tableTypeValue contains an unknown enum constant ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it is a leftover method. I removed it.

Comment on lines +456 to +475
if (!currentMetadata.uuid().equals(newMetadata.uuid())) {
throw new UnsupportedOperationException(
String.format("Cannot change iceberg table %s metadata location pointing to another table's metadata %s",
icebergTable.name(), newMetadataLocation)
);
}

String currentMetadataLocation = currentMetadata.metadataFileLocation();
if (!currentMetadataLocation.equals(newMetadataLocation) &&
!context.getProperties().containsKey(MANUAL_ICEBERG_METADATA_LOCATION_CHANGE)) {
// If we got here then this is an alter table operation where the table to be changed had an Iceberg commit
// meanwhile. The base metadata locations differ, while we know that this wasn't an intentional, manual
// metadata_location set by a user. To protect the interim commit we need to refresh the metadata file location.
tblParams.put(BaseMetastoreTableOperations.METADATA_LOCATION_PROP, currentMetadataLocation);
LOG.warn("Detected an alter table operation attempting to do alterations on an Iceberg table with a stale " +
"metadata_location. Considered base metadata_location: {}, Actual metadata_location: {}. Will override " +
"this request with the refreshed metadata_location in order to preserve the concurrent commit.",
newMetadataLocation, currentMetadataLocation);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these lines be extracted? They appear in assertNotCrossTableMetadataLocationChangeForTable too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Extracted.

Comment on lines +98 to +99
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using these?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

HiveConf.getVar(configuration, HiveConf.ConfVars.HIVE_ICEBERG_MATERIALIZEDVIEW_METADATA_LOCATION)
) &&
(
TableType.MATERIALIZED_VIEW.name().equalsIgnoreCase(hmsTable.getTableType()) ||

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have managed materialized views with iceberg metadata location?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Let's see the tests after I remove it.

Comment on lines +95 to +98
public static final String MATERIALIZED_VIEW_BASE_SNAPSHOT_PROPERTY_KEY_PREFIX =
"iceberg.base.snapshot.";
public static final String MATERIALIZED_VIEW_VERSION_PROPERTY_KEY =
"iceberg.materialized.view.version";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not used. Should we remove them?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thank you for spotting it.

"If this is set to true the URI for auth will have the default location masked with DEFAULT_TABLE_LOCATION"),
HIVE_ICEBERG_ALLOW_DATAFILES_IN_TABLE_LOCATION_ONLY("hive.iceberg.allow.datafiles.in.table.location.only", false,
"If this is set to true, then all the data files being read should be withing the table location"),
HIVE_ICEBERG_MATERIALIZEDVIEW_METADATA_LOCATION("hive.iceberg.materializedview.metadata.location", "metastore",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about dropping this config since we introduce external materialized views. We can extend the grammar to allow the external keyword in create materialized view statements.

create external materialized view mat_native stored by iceberg stored as orc tblproperties ('format-version'='1') as
select b, c from tbl_ice_native where c > 52;

For external MVs, store the MV metadata via the storage handler API, and keep the existing HMS storage logic for non-external MVs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'm not sure what is the proper approach: As HIVE-29578: Iceberg: add support for logical views is merged, it changes the whole picture:
This implementation uses a HiveConf setting.
That implementation uses a table property, 'view-format'='iceberg'.

My problem is if we introduce the external keyword here and don't change the existing view-format solution, we provide an inconsistent interface for the users.

But I also don't think if the view-format table property is a proper approach for materialized views: it works great until we stay within the Hive engine. But what happens if we not?
How we will handle the case when the user creates a materialized view (or a simple logical view) in Spark and we want to use it from Hive?
Should create external create the view if it doesn't exist and just fill the Hive side of metadata if the view physically exists but Hive doesn't know about it yet? Like how we do it with external tables?

Tagging @difin (as the author for Iceberg views) and @zabetak (as you are on PTO now) here to get their opinion to get a consistent Hive behavior.

Comment on lines +14094 to +14100
private static @NotNull TableType getViewType(String storageHandler) {
if ("org.apache.iceberg.mr.hive.HiveIcebergStorageHandler".equals(storageHandler)) {
return TableType.EXTERNAL_MATERIALIZED_VIEW;
}

return TableType.MATERIALIZED_VIEW;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding HiveIcebergStorageHandler is not a good approach. I think the presence of a storage handler should indicate that an external MV needs to be created. Alternatively, extending the grammar to allow the EXTERNAL keyword in CREATE MATERIALIZED VIEW statements would also be a good option.

Comment on lines +58 to +59
public static final String ICEBERG_STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler";
public static final String STORAGE_HANDLER_KEY = "storage_handler";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not referenced, please remove them.

Comment on lines +20 to +48
import com.google.common.base.Joiner;
import com.google.common.collect.Lists;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.StatsSetupConst;
import org.apache.hadoop.hive.common.TableName;
import org.apache.hadoop.hive.common.repl.ReplConst;
import org.apache.hadoop.hive.metastore.ColumnType;
import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.metastore.api.DatabaseType;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.GetPartitionsByNamesRequest;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.PartitionSpec;
import org.apache.hadoop.hive.metastore.api.PartitionsSpecByExprResult;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.metastore.api.WMPoolSchedulingPolicy;
import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.annotation.Nullable;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Comment on lines +295 to +298
StringUtils.isNotBlank(storageHandler) &&
TableType.EXTERNAL_MATERIALIZED_VIEW.equals(table.getTableType()) &&
"org.apache.iceberg.mr.hive.HiveIcebergStorageHandler".equals(storageHandler)
) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified to

if (TableType.EXTERNAL_MATERIALIZED_VIEW.toString().equals(table.getTableType())

?

In the current form it is always false: comparing an enum with a string

TableType.EXTERNAL_MATERIALIZED_VIEW.equals(table.getTableType())

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants