Append-mode unique checks against existing target data (scalable strategy) #180

@malon64

Summary

Implement a scalable target-aware uniqueness check for append mode so duplicates can be detected against existing accepted data, not only within the incoming batch.

Context

The append and overwrite write modes already exist. Unique checks currently cover only the incoming batch, so they need a robust architecture for target-side comparisons at scale.

Scope

  • Define an abstraction/hook for target uniqueness lookups, implemented per sink format (Parquet/Delta/Iceberg)
  • Optimize key-only reads and partition pruning where possible
  • Keep overwrite mode behavior unchanged
  • Report duplicate source-vs-target outcomes clearly
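
To make the scope items above more concrete, here is a minimal sketch of what such a hook could look like. Everything in it is hypothetical and not taken from the codebase: the names `TargetKeyLookup`, `InMemoryLookup`, `UniqueCheckResult`, and `check_unique` are placeholders, and the in-memory lookup only stands in for a real Parquet/Delta/Iceberg reader that would do key-only, partition-pruned reads.

```python
from dataclasses import dataclass, field
from typing import Hashable, Iterable, Protocol, Sequence

# A key is the tuple of values from the unique-key columns of one row.
Key = tuple[Hashable, ...]


class TargetKeyLookup(Protocol):
    """Hypothetical hook: one implementation per sink format."""

    def existing_keys(self, candidates: set[Key]) -> set[Key]:
        """Return the subset of `candidates` already present in the target."""
        ...


@dataclass
class InMemoryLookup:
    """Stand-in for a real sink. It holds only key columns, which mimics
    a key-only read of the accepted data (assumption for illustration)."""

    keys: set[Key] = field(default_factory=set)

    def existing_keys(self, candidates: set[Key]) -> set[Key]:
        return self.keys & candidates


@dataclass
class UniqueCheckResult:
    accepted: list[dict]
    duplicate_in_batch: list[dict]   # duplicates within the incoming batch
    duplicate_in_target: list[dict]  # duplicates against existing target data


def check_unique(
    rows: Iterable[dict],
    key_columns: Sequence[str],
    lookup: TargetKeyLookup,
    mode: str = "append",
) -> UniqueCheckResult:
    keyed = [(tuple(row[c] for c in key_columns), row) for row in rows]

    # Only append mode consults the target; overwrite keeps batch-only checks.
    in_target: set[Key] = set()
    if mode == "append":
        in_target = lookup.existing_keys({key for key, _ in keyed})

    result = UniqueCheckResult([], [], [])
    seen: set[Key] = set()
    for key, row in keyed:
        if key in in_target:
            result.duplicate_in_target.append(row)
        elif key in seen:
            result.duplicate_in_batch.append(row)
        else:
            seen.add(key)
            result.accepted.append(row)
    return result
```

Calling `check_unique(rows, ["id"], lookup)` in append mode splits the batch into accepted rows, in-batch duplicates, and duplicates against the target, so the two kinds of duplicate can be reported separately. Passing `mode="overwrite"` skips the target lookup entirely, which leaves overwrite behavior unchanged. A real sink implementation would replace `InMemoryLookup` and decide how to prune partitions from the candidate keys.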

Acceptance criteria

  • Design/spec + incremental implementation plan
  • Initial implementation for at least one sink (future PRs)

Related: #159

Metadata

Assignees

No one assigned

Labels

area:io (IO formats/read/write), enhancement (New feature or request)
