Allow configuration for loose-objects maintenance task #1885

Closed
5 changes: 5 additions & 0 deletions Documentation/config/maintenance.adoc
@@ -61,6 +61,11 @@ maintenance.loose-objects.auto::
 	loose objects is at least the value of `maintenance.loose-objects.auto`.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> The 'loose-objects' task of 'git maintenance run' first deletes loose
> objects that exit within packfiles and then collects loose objects into

"exit" -> "exist"?  It may read better to also do "collects" ->
"collects remaining".

> a packfile. This second step uses an implicit limit of fifty thousand
> that cannot be modified by users.
>
> Add a new config option that allows this limit to be adjusted or ignored
> entirely.
>
> While creating tests for this option, I noticed that actually there was
> an off-by-one error due to the strict comparison in the limit check.

Ahh, I, contrary to my usual routine, started reading from the code
change before reading the proposed log message and was wondering
about this exact point.  

> +	/* If batch_size is INT_MAX, then this will return 0 always. */

Cute ;-).

>  	return ++(d->count) > d->batch_size;
>  }

 	The default value is 100.
 
+maintenance.loose-objects.batchSize::
+	This integer config option controls the maximum number of loose objects
+	written into a packfile during the `loose-objects` task. The default is
+	fifty thousand. Use value `0` to indicate no limit.
+
 maintenance.incremental-repack.auto::
 	This integer config option controls how often the `incremental-repack`
 	task should be run as part of `git maintenance run --auto`. If zero,
18 changes: 11 additions & 7 deletions Documentation/git-maintenance.adoc
@@ -126,13 +126,17 @@ loose-objects::
 	objects that already exist in a pack-file; concurrent Git processes
 	will examine the pack-file for the object data instead of the loose
 	object. Second, it creates a new pack-file (starting with "loose-")
-	containing a batch of loose objects. The batch size is limited to 50
-	thousand objects to prevent the job from taking too long on a
-	repository with many loose objects. The `gc` task writes unreachable
-	objects as loose objects to be cleaned up by a later step only if
-	they are not re-added to a pack-file; for this reason it is not
-	advisable to enable both the `loose-objects` and `gc` tasks at the
-	same time.
+	containing a batch of loose objects.
++
+The batch size defaults to fifty thousand objects to prevent the job from
+taking too long on a repository with many loose objects. Use the
+`maintenance.loose-objects.batchSize` config option to adjust this size,
+including a value of `0` to remove the limit.
++
+The `gc` task writes unreachable objects as loose objects to be cleaned up
+by a later step only if they are not re-added to a pack-file; for this
+reason it is not advisable to enable both the `loose-objects` and `gc`
+tasks at the same time.
 
 incremental-repack::
 	The `incremental-repack` job repacks the object directory
20 changes: 20 additions & 0 deletions builtin/gc.c
@@ -1029,6 +1029,8 @@ static int run_write_commit_graph(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	return !!run_command(&child);
 }
@@ -1161,6 +1163,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid,
 
 	fprintf(d->in, "%s\n", oid_to_hex(oid));
 
+	/* If batch_size is INT_MAX, then this will return 0 always. */
 	return ++(d->count) > d->batch_size;
 }
 
@@ -1185,6 +1188,8 @@ static int pack_loose(struct maintenance_run_opts *opts)
 	strvec_push(&pack_proc.args, "pack-objects");
 	if (opts->quiet)
 		strvec_push(&pack_proc.args, "--quiet");
+	else
+		strvec_push(&pack_proc.args, "--no-quiet");
 	strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path);
 
 	pack_proc.in = -1;
@@ -1204,6 +1209,15 @@ static int pack_loose(struct maintenance_run_opts *opts)
 	data.count = 0;
 	data.batch_size = 50000;
 
+	repo_config_get_int(r, "maintenance.loose-objects.batchSize",
+			    &data.batch_size);
+
+	/* If configured as 0, then remove limit. */
+	if (!data.batch_size)
+		data.batch_size = INT_MAX;
+	else if (data.batch_size > 0)
+		data.batch_size--; /* Decrease for equality on limit. */
+
 	for_each_loose_file_in_objdir(r->objects->odb->path,
 				      write_loose_object_to_stdin,
 				      NULL,
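
To illustrate the off-by-one discussed in the review, here is a minimal standalone sketch of the limit logic from this hunk. It is not the code in `builtin/gc.c`: `struct batch_data`, `normalize_batch_size()`, `visit_object()`, `run_batch()`, and `objects_written()` are hypothetical names, and the plain loop stands in for `for_each_loose_file_in_objdir()`. Only the strict comparison and the handling of `0` and positive values are taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the callback data used by the patch. */
struct batch_data {
	int count;      /* objects written so far */
	int batch_size; /* effective limit used by the strict comparison */
};

/*
 * Same rule as the hunk above: 0 removes the limit, and a positive
 * value is decremented so that the strict '>' stops at exactly the
 * configured count. Negative values pass through unchanged, as in
 * the patch.
 */
int normalize_batch_size(int configured)
{
	if (!configured)
		return INT_MAX;
	if (configured > 0)
		return configured - 1;
	return configured;
}

/*
 * Same return expression as write_loose_object_to_stdin(): the object
 * is counted (written) first, and a nonzero result stops the iteration.
 */
int visit_object(struct batch_data *d)
{
	return ++(d->count) > d->batch_size;
}

/* Visit up to 'available' objects with an effective limit; return how many were written. */
int run_batch(int effective_limit, int available)
{
	struct batch_data d = { 0, effective_limit };
	int i;

	for (i = 0; i < available; i++)
		if (visit_object(&d))
			break;
	return d.count;
}

/* Same as run_batch(), but starting from the configured value. */
int objects_written(int configured, int available)
{
	return run_batch(normalize_batch_size(configured), available);
}
```

Under these assumptions, `run_batch(50, 102)` returns 51, which is the pre-patch behavior when the raw limit is compared directly, while `objects_written(50, 102)` returns 50 and `objects_written(0, 102)` returns 102.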
@@ -1263,6 +1277,8 @@ static int multi_pack_index_write(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	if (run_command(&child))
 		return error(_("failed to write multi-pack-index"));
@@ -1279,6 +1295,8 @@ static int multi_pack_index_expire(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	if (run_command(&child))
 		return error(_("'git multi-pack-index expire' failed"));
@@ -1335,6 +1353,8 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
 		     (uintmax_t)get_auto_pack_size());
28 changes: 28 additions & 0 deletions t/t7900-maintenance.sh
@@ -306,6 +306,34 @@ test_expect_success 'maintenance.loose-objects.auto' '
 	test_subcommand git prune-packed --quiet <trace-loC
 '
 
+test_expect_success 'maintenance.loose-objects.batchSize' '
+	git init loose-batch &&
+
+	# This creates three objects per commit.
+	test_commit_bulk -C loose-batch 34 &&
+	pack=$(ls loose-batch/.git/objects/pack/pack-*.pack) &&
+	index="${pack%pack}idx" &&
+	rm "$index" &&
+	git -C loose-batch unpack-objects <"$pack" &&
+	git -C loose-batch config maintenance.loose-objects.batchSize 50 &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 2, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	test_must_be_empty err
+'
+
 test_expect_success 'incremental-repack task' '
 	packDir=.git/objects/pack &&
 	for i in $(test_seq 1 5)
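
The progress counts expected by the `maintenance.loose-objects.batchSize` test above follow from simple arithmetic: 34 commits at three objects each give 102 loose objects, which a batch size of 50 splits into runs of 50, 50, and 2, leaving nothing for a fourth run. A small sketch of that arithmetic, assuming that objects packed by one run are no longer candidates in the next run and that `batch_size` is non-negative; `next_batch()` and `plan_runs()` are hypothetical helpers, not Git functions.

```c
#include <assert.h>

/* Objects packed by a single run; a batch_size of 0 means no limit. */
int next_batch(int remaining, int batch_size)
{
	if (batch_size == 0 || remaining < batch_size)
		return remaining;
	return batch_size;
}

/*
 * Record the batch size of each successive run in 'out' until no
 * loose objects remain (or 'max_runs' entries are filled), and return
 * the number of non-empty runs.
 */
int plan_runs(int total, int batch_size, int *out, int max_runs)
{
	int runs = 0;

	while (total > 0 && runs < max_runs) {
		int n = next_batch(total, batch_size);

		out[runs++] = n;
		total -= n;
	}
	return runs;
}
```

With `total = 34 * 3` and `batch_size = 50`, `plan_runs()` reports three non-empty runs of 50, 50, and 2 objects, matching the three `grep` checks; the fourth invocation in the test has nothing left to pack, which is consistent with `test_must_be_empty err`.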