Skip to content

Conversation

kokila-19
Copy link
Contributor

What changes were proposed in this pull request?
Added Z-order support for Iceberg tables via CREATE TABLE DDL

Why are the changes needed?
To support zorder indexing which will improve data clustering and query performance on Iceberg tables.

Does this PR introduce any user-facing change?
Yes , new syntax support
CREATE TABLE test_zorder (
id int,
text string)
WRITE [LOCALLY] ORDERED BY ZORDER (id, text)
STORED BY iceberg
STORED As orc;

How was this patch tested?
qtest

…BLE DDL

- Supported WRITE [LOCALLY] ORDERED BY ZORDER (col1, col2, ...) syntax in CREATE TABLE to enable Z-order.
- Modified LOCALLY keyword to be optional in WRITE ORDERED BY SYNTAX.
- Persisted Z-order information in HMS for iceberg tables using sort.order and sort.columns.
- Implemented GenericUDFIcebergZorder UDF (iceberg_zorder) to compute Z-order values and sort data in ascending order.
- Ensured INSERT operations respect Z-order sorting.
Copy link

@kokila-19
Copy link
Contributor Author

@deniskuzZ @zhangbutao
I have modified the syntax and also made locally optional. Requesting your review.
Thanks !

private boolean isZOrderJSON(String jsonString) {
try {
JsonNode node = JSON_OBJECT_MAPPER.readTree(jsonString);
return node.has("zorderFields");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move this to constans?

.map(ZOrderFieldDesc::getColumnName)
.collect(Collectors.toList());

LOG.info("Setting Z-order sort order for columns: {}", columnNames);
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe LOG.debug(Applying Z-ordering to columns: {})


LOG.info("Setting Z-order sort order for columns: {}", columnNames);

properties.put(SORT_ORDER, "ZORDER");
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please introduce the enum:

enum SortType {
LEXICAL = 0,
ZORDER = 1
}

properties.put(SORT_ORDER, "ZORDER");
properties.put(SORT_COLUMNS, String.join(",", columnNames));

LOG.info("Z-order sort order configured for Iceberg table with columns: {}", columnNames);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is redundant


public static final String CATALOG_CONFIG_PREFIX = "iceberg.catalog.";

public static final String SORT_ORDER = "sort.order";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sortOrder is ASC or DESC, should we rename to SORT_TYPE?


// Even if table has no explicit sort order, honor z-order if configured
Map<String, String> props = table.properties();
if ("ZORDER".equalsIgnoreCase(props.getOrDefault(SORT_ORDER, ""))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use enum

* - Configures a single ASC sort key with NULLS FIRST and injects a custom key expression for
* Z-order
*/
private void createZOrderCustomSort(Map<String, String> props, DynamicPartitionCtx dpCtx, Table table,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe addZOrderExpr ?

org.apache.hadoop.hive.ql.metadata.Table hmsTable, Operation writeOperation) {
String colsProp = props.get(SORT_COLUMNS);
if (StringUtils.isNotBlank(colsProp)) {
List<String> zCols = Arrays.stream(colsProp.split(",")).map(String::trim)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List zCols = Arrays.stream(colsProp.split(","))
.map(String::trim)
.filter(Predicate.not(String::isEmpty))
.toList();

List<String> zCols = Arrays.stream(colsProp.split(",")).map(String::trim)
.filter(s -> !s.isEmpty()).collect(Collectors.toList());

Map<String, Integer> fieldOrderMap = Maps.newHashMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map<String, Integer> fieldOrderMap = new HashMap<>(fields.size());

Integer base = fieldOrderMap.get(col);
Preconditions.checkArgument(base != null, "Z-order column not found in schema: %s", col);
return base + offset;
}).collect(Collectors.toList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.toList()

return base + offset;
}).collect(Collectors.toList());

dpCtx.setCustomSortOrder(Lists.newArrayList(Collections.singletonList(1)));
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even need to set CustomSortOrder and CustomSortNullOrder for z-order?

dpCtx.setCustomSortNullOrder(Lists.newArrayList(Collections.singletonList(NullOrdering.NULLS_FIRST.getCode())));

dpCtx.addCustomSortExpressions(Collections.singletonList(allCols -> {
List<ExprNodeDesc> args = Lists.newArrayListWithExpectedSize(zIndices.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List args = zIndices.stream()
.map(allCols::get)
.toList();

}
try {
GenericUDF udf = new GenericUDFIcebergZorder();
return ExprNodeGenericFuncDesc.newInstance(udf, "iceberg_zorder", args);
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name is redundant?

return ExprNodeGenericFuncDesc.newInstance(new GenericUDFIcebergZorder(), args)

set hive.fetch.task.conversion=more;
select * from ice_orc_sorted;

-- Validates syntax without LOCALLY keyword
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no sure this should be done on qtest level. maybe keep the locally syntax here and drop in zorder q file

(Class<? extends GenericUDF>) Class.forName("org.apache.iceberg.mr.hive.udf.GenericUDFIcebergDay"));
system.registerGenericUDF("iceberg_hour",
(Class<? extends GenericUDF>) Class.forName("org.apache.iceberg.mr.hive.udf.GenericUDFIcebergHour"));
system.registerGenericUDF("iceberg_zorder",
Copy link
Member

@deniskuzZ deniskuzZ Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to register it when you create an explicit expression?

return ExprNodeGenericFuncDesc.newInstance(new GenericUDFIcebergZorder(), args)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants