Equality delete tests #2773

tomlarkworthy · 2025-11-20T16:11:53Z

I've successfully created a proof-of-concept demonstrating that PyIceberg already supports writing equality delete files via transactions, even though the read path is not yet implemented.

What I Discovered

No existing tests use real equality_ids values - every one sets the field to either [] or None
The write infrastructure is complete and working - All necessary components exist:
- DataFileContent.EQUALITY_DELETES enum
- equality_ids field in DataFile
- Snapshot tracking for equality deletes
- Manifest serialization
The key is using the transaction API directly:
    with table.transaction() as txn:
        update_snapshot = txn.update_snapshot()
        with update_snapshot.fast_append() as append_files:
            append_files.append_data_file(delete_file)  # Works for delete files!

Files Created

test_equality_delete_poc.py - Detailed standalone test with verbose output
test_add_equality_delete.py - Clean pytest suite with 2 passing tests:
- Single equality delete file
- Multiple delete files with different equality_ids
example_add_equality_delete.py - Complete working examples showing:
- Basic usage (single column)
- Composite keys (multiple columns)
- Multiple delete files in one transaction
EQUALITY_DELETE_POC_SUMMARY.md - Comprehensive documentation

Test Results

All tests pass successfully:
test_add_equality_delete.py::test_add_equality_delete_file_via_transaction PASSED
test_add_equality_delete.py::test_add_multiple_equality_delete_files_with_different_equality_ids PASSED
====== 2 passed in 1.06s ======

Key Takeaways

✅ You can write equality delete files today using the transaction API
✅ Single column deletes: equality_ids=[1]
✅ Composite key deletes: equality_ids=[1, 2]
✅ Multiple delete files can be added in one transaction
✅ Metadata tracking works correctly (snapshot summaries, manifests)
❌ Reading is blocked - raises ValueError when scanning tables with equality deletes

The write path works end to end in these tests. Users who generate equality delete files externally can add them to PyIceberg tables now, though they will need another engine (such as Spark) to read those tables.
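For readers unfamiliar with what equality_ids=[1] and equality_ids=[1, 2] mean, here is a minimal pure-Python sketch of the matching rule a reader has to apply. It is not PyIceberg code and not the read path this PR is missing. The schema (FIELD_NAMES), the sample rows, and the helper apply_equality_deletes are all hypothetical and chosen only for illustration. The sketch also ignores sequence numbers, which a real reader would use to decide which data files a delete file applies to.

```python
# Hypothetical schema for illustration: Iceberg field ID -> column name.
FIELD_NAMES = {1: "id", 2: "region", 3: "amount"}


def apply_equality_deletes(rows, delete_rows, equality_ids):
    """Return the rows that no delete row matches on the equality columns.

    A row counts as deleted when its values in every column named by
    equality_ids equal the values of some row in the delete file.
    """
    columns = [FIELD_NAMES[field_id] for field_id in equality_ids]
    deleted_keys = {tuple(d[c] for c in columns) for d in delete_rows}
    return [r for r in rows if tuple(r[c] for c in columns) not in deleted_keys]


rows = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 1, "region": "us", "amount": 20},
    {"id": 2, "region": "eu", "amount": 30},
]

# Single column: every row with id == 1 is removed.
single = apply_equality_deletes(rows, [{"id": 1}], equality_ids=[1])

# Composite key: only the row with id == 1 AND region == "eu" is removed.
composite = apply_equality_deletes(
    rows, [{"id": 1, "region": "eu"}], equality_ids=[1, 2]
)
```

With this sample data, the single-column delete leaves only the row with id 2, while the composite delete keeps the (1, "us") row because it matches on id but not on region.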

tomlarkworthy · 2025-11-20T16:15:33Z

this was a mistake!

Equality delete tests

a9720e1

tomlarkworthy closed this Nov 20, 2025

tomlarkworthy deleted the poc branch November 20, 2025 16:16

tomlarkworthy restored the poc branch November 20, 2025 20:21
