-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] - S3 Progress Tracker #3568
base: main
Are you sure you want to change the base?
Conversation
42596db
to
454b48b
Compare
pkg/sources/s3/progress_tracker.go
Outdated
type ProgressTracker struct { | ||
enabled bool | ||
|
||
isResuming bool // Indicates if the tracker is resuming a previous scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it gets set once during initialization and never updated. Is that right?
When I read it, I thought it was like a state flag, and once we finished resuming it would be set to false
, but I haven't reviewed the whole ProgressTracker
module yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is likely misleading—it was intended as a flag to indicate some progress, primarily used in the update flow for calculating SectionRemaining. However, I don’t think we need to manage progress at all, so I’ll likely remove this and any related logic.
ex:
pageSize := int32(len(pageContents))
if p.currentPageSize == 0 {
if !p.isResuming {
// Only add to the total if this is a fresh scan.
p.progress.SectionsRemaining += pageSize
}
p.currentPageSize = pageSize
}
pkg/sources/s3/progress_tracker.go
Outdated
if progress == nil { | ||
ctx.Logger().Info("Nil progress provided. Progress initialized.") | ||
progress = new(sources.Progress) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually allow progress tracking? I think not, because progress
is now only accessible internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh, I thought I removed this. 😢 Yea this doesn't work, we should just return an error here. If a source doesn't have Progress, we have bigger problems.
// enabling resumable scans by tracking which objects have been successfully processed. | ||
// It provides checkpoints that can be used to resume interrupted scans without missing objects. | ||
type ProgressTracker struct { | ||
enabled bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If enabled
does not change at runtime, have you considered a noop implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was being lazy, assuming this would only be temporary. Since progress tracking will eventually always be enabled, I just went with this approach. 😓
// Preserve existing progress counters while clearing resume state. | ||
p.progress.SetProgressComplete( | ||
int(p.progress.SectionsCompleted), | ||
int(p.progress.SectionsRemaining), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why preserve the counters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because Complete
is only called at the end of the scan. Since progress was tracked at the object level, there’s no need to set these values. I’ll be removing all progress logic. Progress tracking will still happen at the bucket level, but resuming will remain at the object level.
Description:
This PR adds a mechanism to track S3 scan progress, enabling resumption.
Checklist:
make test-community
)?make lint
this requires golangci-lint)?