
feat: update blockbuilder to use scheduler for fetching jobs #15224

Merged
merged 13 commits into from
Dec 4, 2024

Conversation

@ashwanthgoli (Contributor) commented Dec 3, 2024

What this PR does / why we need it:

  • Updates block builder to use scheduler APIs for getting jobs and updating their status
  • Adds a sync loop to periodically call syncJob to update the status of inflight jobs
  • Tries to rename all instances of slimgester to blockbuilder
  • Moves chunk appender code to a separate file appender.go
  • Removes controller as it is no longer required

Special notes for your reviewer:
I renamed slimgester.go to blockbuilder.go in the last commit, so GitHub no longer shows the diff and treats it as a new file. Please check the first two commits to view the new changes made in builder.go

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ashwanthgoli ashwanthgoli changed the base branch from main to refactor-kafka-reader December 3, 2024 06:35
@pull-request-size pull-request-size bot added size/L and removed size/XL labels Dec 3, 2024
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 3, 2024
@ashwanthgoli ashwanthgoli marked this pull request as ready for review December 3, 2024 10:23
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner December 3, 2024 10:23
Base automatically changed from refactor-kafka-reader to main December 4, 2024 04:40
@owen-d (Member) left a comment:

Left a few things to fix, but giving approval to unblock you.

@@ -29,6 +29,7 @@ message GetJobResponse {
message CompleteJobRequest {
string builder_id = 1;
Job job = 2;
int64 LastConsumedOffset = 3;
@owen-d (Member) commented:
Why do we need this? Jobs are completed all-or-nothing because TSDBs are built at the end, and the job itself contains the offset range.


lastConsumedOffset, err := i.processJob(ctx, job, logger)
// TODO: pass lastConsumedOffset as a separate field
job.Offsets.Max = lastConsumedOffset
@owen-d (Member) commented:
I think it's much simpler if the jobs are predetermined at the scheduler. This was the initial design and although it does introduce a bit of lag (we only process offsets known at the time of job creation), I think the simplicity & separation of concerns are more beneficial (at least for now).


exists, job, err := i.jobController.LoadJob(ctx)
func (i *BlockBuilder) runOne(ctx context.Context, workerID string) error {
// assuming GetJob blocks/polls until a job is available
@owen-d (Member) commented:
I suspect we'll need to retry when there are no jobs here, but as you said, it's also possible the transport handles this.

if err != nil {
return nil, err
readerFactory := func(partitionID int32) (partition.Reader, error) {
return partition.NewKafkaReader(
@owen-d (Member) commented:
This will panic b/c it's creating new clients each time, each of which uses the same metrics namespacing internally. Instead, we could create a single client which creates cheap copies via a WithPartition(x) -> Self or similar.
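The suggested pattern might look like this sketch. `kafkaClient`, `Reader`, and `WithPartition` here are illustrative stand-ins, not the actual kgo or Loki types:

```go
package main

import "fmt"

// kafkaClient stands in for the shared kgo.Client; in the suggestion it
// is created (and its metrics registered) exactly once.
type kafkaClient struct{ name string }

// Reader is a cheap per-partition view that shares one underlying client.
type Reader struct {
	client    *kafkaClient
	partition int32
}

// WithPartition returns a shallow copy bound to partition p, avoiding a
// second client (and the duplicate metrics registration that panics).
func (r Reader) WithPartition(p int32) Reader {
	r.partition = p
	return r
}

func main() {
	base := Reader{client: &kafkaClient{name: "shared"}}
	a, b := base.WithPartition(1), base.WithPartition(2)
	fmt.Println(a.client == b.client, a.partition, b.partition) // true 1 2
}
```

The reply below notes why this may not be safe for a real `kgo.Client`, which is mutated when setting consume offsets.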

@ashwanthgoli (Contributor, Author) replied:
Not sure if it's safe to make copies of kgo.Client; we'd need separate instances since we mutate it while setting the offset for consumption.

Working around this by registering metrics only once: 786186a.
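The workaround can be sketched with a `sync.Once` guard; the counter below is a placeholder for the real Prometheus collectors, whose duplicate registration would otherwise panic:

```go
package main

import (
	"fmt"
	"sync"
)

// registerOnce guards metric registration: constructing several readers
// must not register the same collectors twice.
var (
	registerOnce  sync.Once
	registrations int
)

func newReader() {
	registerOnce.Do(func() {
		// prometheus.MustRegister(collectors...) would go here; calling
		// it twice with the same collector panics, hence the Once.
		registrations++
	})
}

func main() {
	for i := 0; i < 3; i++ {
		newReader()
	}
	fmt.Println(registrations) // 1
}
```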

@ashwanthgoli ashwanthgoli merged commit 0d67831 into main Dec 4, 2024
59 checks passed
@ashwanthgoli ashwanthgoli deleted the blockbuilder-use-scheduler branch December 4, 2024 09:44