Contributor

@ForeverAngry ForeverAngry commented Sep 7, 2025

Closes #2433

Rationale for this change

This PR adds comprehensive branch merge strategies to PyIceberg, bringing Git-like branch merging capabilities to Iceberg table operations. This enhancement enables users to merge branches with different strategies depending on their workflow needs.

Feature Overview:
Apache Iceberg supports branch operations (create, delete, tag) but lacks merge capabilities between branches. This PR implements five merge strategies commonly used in version control systems. Note that there are differences between this and the Java implementation.

  1. MERGE: Classic three-way merge creating a merge commit that preserves history of both branches
  2. SQUASH: Condenses all commits from source branch into a single clean commit on target branch
  3. REBASE: Creates linear history by replaying commits from source branch on top of target branch
  4. CHERRY_PICK: Selects and applies specific individual commits from one branch to another
  5. FAST_FORWARD: Moves target branch pointer forward when no divergent commits exist (no merge commit needed)
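To make the FAST_FORWARD condition concrete, here is a minimal sketch on a toy history. It assumes each snapshot records a single parent ID, modeled as a plain dict. The names `ancestors_of` and `can_fast_forward` are hypothetical and are not the functions this PR adds.

```python
from typing import Dict, Iterator, Optional

# Toy history: snapshot ID -> parent snapshot ID (None for the root).
# Hypothetical data, not taken from a real table.
#   1 <- 2 <- 3
#         \
#          4
PARENTS: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 2}


def ancestors_of(snapshot_id: int, parents: Dict[int, Optional[int]]) -> Iterator[int]:
    """Yield snapshot_id, then each ancestor, by following parent pointers."""
    current: Optional[int] = snapshot_id
    while current is not None:
        yield current
        current = parents.get(current)


def can_fast_forward(
    target_head: int, source_head: int, parents: Dict[int, Optional[int]]
) -> bool:
    """True when the target head is the source head or one of its ancestors.

    In that case the target has no commits the source lacks, so the
    target pointer can simply move forward to the source head.
    """
    return target_head in ancestors_of(source_head, parents)
```

With `main` at snapshot 2 and `feature` at snapshot 3, the check passes. With `main` at snapshot 4, the branches have diverged and one of the other strategies would be needed.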

Implementation Details:

  • Strategy Pattern: Clean, extensible architecture with abstract base class and concrete implementations
  • Automatic Detection: Fast-forward possibility automatically detected and validated
  • Robust Utilities: Common ancestor finding, branch validation, and snapshot traversal utilities
  • Flexible API: Optional source branch deletion after successful merge
  • Error Handling: Comprehensive validation with clear error messages for invalid operations
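The strategy pattern described above could look roughly like the sketch below. Everything here is illustrative: `ToyStrategy`, `MergePlan`, `ToyMergeStrategy`, and `plan_merge` are hypothetical names, only two strategies are modeled, and the "plan" is reduced to a count of new snapshots rather than real table metadata.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ToyStrategy(Enum):
    """Hypothetical stand-in for the enum added in this PR (two members only)."""
    SQUASH = "squash"
    FAST_FORWARD = "fast_forward"


@dataclass(frozen=True)
class MergePlan:
    strategy: ToyStrategy
    new_snapshots: int  # snapshots the target branch would gain


class ToyMergeStrategy(ABC):
    """Abstract base class: each concrete strategy decides how to merge."""

    @abstractmethod
    def plan(self, source_commits: List[int]) -> MergePlan:
        ...


class SquashStrategy(ToyMergeStrategy):
    def plan(self, source_commits: List[int]) -> MergePlan:
        if not source_commits:
            raise ValueError("source branch has no commits to squash")
        # All source commits are condensed into a single commit.
        return MergePlan(ToyStrategy.SQUASH, new_snapshots=1)


class FastForwardStrategy(ToyMergeStrategy):
    def plan(self, source_commits: List[int]) -> MergePlan:
        # Only the branch pointer moves; no new commit is created.
        return MergePlan(ToyStrategy.FAST_FORWARD, new_snapshots=0)


# Registry mapping enum members to implementations; adding a strategy
# means adding one class and one entry here.
REGISTRY: Dict[ToyStrategy, ToyMergeStrategy] = {
    ToyStrategy.SQUASH: SquashStrategy(),
    ToyStrategy.FAST_FORWARD: FastForwardStrategy(),
}


def plan_merge(strategy: ToyStrategy, source_commits: List[int]) -> MergePlan:
    """Dispatch to the implementation registered for the chosen strategy."""
    return REGISTRY[strategy].plan(source_commits)
```

The caller only picks an enum member, so validation and error messages can live inside each strategy class.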

Use Cases:

  • Development Workflows: Feature branch integration with different merge policies
  • Data Pipeline Management: Merging experimental data processing branches back to production
  • Schema Evolution: Combining schema changes from different development branches

Are these changes tested?

Yes. Test coverage is reasonably comprehensive, with 35 tests across multiple categories.

Are there any user-facing changes?

Yes - New Feature Addition (No Breaking Changes)

New Public API:

from pyiceberg.table.update.snapshot import BranchMergeStrategy

# New enum with 5 merge strategies (I'm open to suggestions on this; I couldn't decide on the best approach)
BranchMergeStrategy.MERGE
BranchMergeStrategy.SQUASH
BranchMergeStrategy.REBASE
BranchMergeStrategy.CHERRY_PICK
BranchMergeStrategy.FAST_FORWARD

# New method on ManageSnapshots
table.manage_snapshots().merge_branch(
    source_branch="feature",
    target_branch="main",
    strategy=BranchMergeStrategy.SQUASH,
    delete_source_branch=False  # Optional: preserve or delete source branch
).commit()

- Implemented unit tests for various branch merge strategies including Merge, Squash, Rebase, Cherry-Pick, and Fast-Forward.
- Added tests for utility functions related to snapshot management and ancestor finding.
- Ensured coverage for edge cases such as missing snapshots, circular references, and validation errors during merges.
- Verified that all strategies return consistent structures and handle integration scenarios correctly.
- Included tests for error handling and behavior differences across strategies.
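As an illustration of the edge cases listed above (missing snapshots and circular references), here is a small self-contained sketch. `walk_ancestors` is a hypothetical helper written for this example, with history modeled as a dict of parent pointers. It is not the utility under test in this PR.

```python
from typing import Dict, List, Optional, Set


def walk_ancestors(start: int, parents: Dict[int, Optional[int]]) -> List[int]:
    """Return start and its ancestors, newest first.

    Raises KeyError if a referenced snapshot is missing from `parents`,
    and ValueError if the parent pointers form a cycle.
    """
    visited: Set[int] = set()
    order: List[int] = []
    current: Optional[int] = start
    while current is not None:
        if current in visited:
            raise ValueError(f"circular parent reference at snapshot {current}")
        if current not in parents:
            raise KeyError(f"missing snapshot: {current}")
        visited.add(current)
        order.append(current)
        current = parents[current]
    return order


def test_linear_history() -> None:
    assert walk_ancestors(3, {1: None, 2: 1, 3: 2}) == [3, 2, 1]


def test_missing_snapshot() -> None:
    # Snapshot 2 points at parent 1, which is absent from the metadata.
    try:
        walk_ancestors(2, {2: 1})
    except KeyError:
        return
    raise AssertionError("expected KeyError for missing snapshot")


def test_circular_reference() -> None:
    # 1 -> 2 -> 1 would loop forever without the visited-set guard.
    try:
        walk_ancestors(1, {1: 2, 2: 1})
    except ValueError:
        return
    raise AssertionError("expected ValueError for circular reference")
```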
@ForeverAngry
Contributor Author

@jayceslesar I know you have done a good bit of work in the ManageSnapshots class. Can you review this as well?

@ForeverAngry ForeverAngry marked this pull request as ready for review September 7, 2025 00:33
@ForeverAngry ForeverAngry changed the title from "Add comprehensive tests for branch merge strategies in pyiceberg" to "Add branch merge strategies" Sep 7, 2025
@ForeverAngry
Contributor Author

ForeverAngry commented Sep 19, 2025

@gabeiglio I noticed you had created a pretty great proposal for this work. It would be awesome if you wanted to review the implementation and help get it in good shape!

@gabeiglio
Contributor

Thanks for the PR! This is awesome. I will review it this week!
