Open
Description
Summary
The current Batch passes around the watermark with optional data.
#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    /// The data associated with the batch.
    pub(crate) data: Option<BatchInfo>,
    /// An indication that the batch stream has completed up to the given time.
    /// Any rows in future batches on this stream must have a time strictly
    /// greater than this.
    pub up_to_time: RowTime,
}
Many evaluators are thus forced to reason about the presence or absence of the watermark and data without really needing to. A good refactoring to simplify the logic and improve readability would be to separate the watermark from the batch and pass each only where it is needed.
Possible Solution
#[must_use]
pub struct Watermark(RowTime);

pub struct WatermarkedBatch {
    batch: Option<Batch>,
    watermark: Watermark,
}

impl WatermarkedBatch {
    pub fn take(self) -> (Option<Batch>, Watermark) { ... }
}
So:
- The only way to get the batch is to call take
- When you call take you get the Watermark
- Once you have the Watermark you must use it
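
To make the three points above concrete, here is a minimal, self-contained sketch of how the proposal might look once filled in. It is only an illustration under assumptions: `RowTime` is aliased to `i64`, `Batch` is reduced to a hypothetical `rows` field, and the `new` constructor and `evaluate` function are made up for the example; none of these reflect the crate's actual definitions.

```rust
// Hypothetical stand-in: the real RowTime lives in the crate.
type RowTime = i64;

// Hypothetical, simplified batch that only carries row times.
#[derive(Clone, PartialEq, Debug)]
pub struct Batch {
    pub rows: Vec<RowTime>,
}

// The watermark is its own type, so it can be passed separately from data.
#[must_use]
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Watermark(RowTime);

pub struct WatermarkedBatch {
    batch: Option<Batch>,
    watermark: Watermark,
}

impl WatermarkedBatch {
    // Hypothetical constructor, added for the example only.
    pub fn new(batch: Option<Batch>, up_to_time: RowTime) -> Self {
        Self { batch, watermark: Watermark(up_to_time) }
    }

    /// Consumes the wrapper. The fields are private, so outside this module
    /// this is the only way to reach the batch, and it always hands back
    /// the watermark alongside it.
    pub fn take(self) -> (Option<Batch>, Watermark) {
        (self.batch, self.watermark)
    }
}

/// Hypothetical evaluator step: counts rows if data is present and
/// forwards the watermark untouched.
fn evaluate(input: WatermarkedBatch) -> (usize, Watermark) {
    let (batch, watermark) = input.take();
    let row_count = batch.map_or(0, |b| b.rows.len());
    (row_count, watermark)
}
```

In this sketch an evaluator that only cares about data can work with the plain `Batch`, while the `Watermark` is forwarded as a separate value to whatever needs it.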