Audit and document Scalar config #2010
base: master
f38ead9 to
18580f0
Compare
|
/submit |
|
Submitted as [email protected] To fetch this version into To fetch this version to local tag |
| #include "refs.h" | ||
| #include "dir.h" | ||
| #include "packfile.h" | ||
| #include "help.h" |
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> Add "# set by scalar" to the end of each config option to assist users
> in identifying why these config options were set in their repo.
The implementation is quite straight-forward, inlining expansion of
repo_config_set_gently() in the places that we want to add comment to.
If we had (a lot) more than two callsites, I would have suggested to
add a simple helper function, something like
static int scalar_config_set(struct repository *r, const char *key, const char *value)
{
char *file = repo_git_path(r, "config");
int res = repo_config_set_multivar_in_file_gently(r, file,
key, value, NULL, " # set by scalar", 0);
free(file);
return res;
}
and then the updates to the callers would have been absolute minimum.
Well, even with only two callsites, perhaps such a refactoring may
still have value in reducing the risk of typo in the comment.
> diff --git a/t/t9210-scalar.sh b/t/t9210-scalar.sh
> index bd6f0c40d2..43c210a23d 100755
> --- a/t/t9210-scalar.sh
> +++ b/t/t9210-scalar.sh
> @@ -210,6 +210,9 @@ test_expect_success 'scalar reconfigure' '
> GIT_TRACE2_EVENT="$(pwd)/reconfigure" scalar reconfigure -a &&
> test_path_is_file one/src/cron.txt &&
> test true = "$(git -C one/src config core.preloadIndex)" &&
> + test_grep "preloadIndex = true # set by scalar" one/src/.git/config &&
> + test_grep "excludeDecoration = refs/prefetch/\* # set by scalar" one/src/.git/config &&
> +
> test_subcommand git maintenance start <reconfigure &&
> test_subcommand ! git maintenance unregister --force <reconfigure &&
Looks good.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, Nov 26, 2025 at 03:55:10PM -0800, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
> > Add "# set by scalar" to the end of each config option to assist users
> > in identifying why these config options were set in their repo.
>
> The implementation is quite straight-forward, inlining expansion of
> repo_config_set_gently() in the places that we want to add comment to.
>
> If we had (a lot) more than two callsites, I would have suggested to
> add a simple helper function, something like
>
> static int scalar_config_set(struct repository *r, const char *key, const char *value)
> {
> char *file = repo_git_path(r, "config");
> int res = repo_config_set_multivar_in_file_gently(r, file,
> key, value, NULL, " # set by scalar", 0);
> free(file);
> return res;
> }
>
> and then the updates to the callers would have been absolute minimum.
>
> Well, even with only two callsites, perhaps such a refactoring may
> still have value in reducing the risk of typo in the comment.
Agreed, I think it's a good idea to provide such a function. The calls
to `repo_config_set_multivar_in_file_gently()` are quite verbose.
Patrick
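The shape of the line that such a helper produces can be sketched outside of Git. The block below is a minimal, standalone illustration: `format_config_line()` is a hypothetical function, not part of Git's config API, and it assumes the comment argument already carries its leading " # " as in the snippet quoted above. It only mirrors the `preloadIndex = true # set by scalar` text that the quoted `test_grep` lines look for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a config writer (not Git's own code):
 * formats one "<name> = <value><comment>" line into buf.
 * The comment, if any, must already start with " # ".
 * Returns 0 on success, -1 if the line does not fit into buf.
 */
int format_config_line(char *buf, size_t size, const char *name,
		       const char *value, const char *comment)
{
	int n;

	assert(buf && name && value);
	n = snprintf(buf, size, "\t%s = %s%s", name, value,
		     comment ? comment : "");
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping the comment text in one place, as the suggested helper does, means every caller ends up with the same suffix and a typo cannot creep into only one of them.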
scalar.c
Outdated
| #endif | ||
| { "core.logAllRefUpdates", "true", 1 }, | ||
| { "credential.https://dev.azure.com.useHttpPath", "true", 1 }, | ||
| { "credential.validate", "false", 1 }, /* GCM4W-only */ |
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> diff --git a/t/t9210-scalar.sh b/t/t9210-scalar.sh
> index 43c210a23d..91d5964b73 100755
> --- a/t/t9210-scalar.sh
> +++ b/t/t9210-scalar.sh
> @@ -246,6 +246,11 @@ test_expect_success 'scalar reconfigure --all with includeIf.onbranch' '
> '
>
> test_expect_success 'scalar reconfigure --all with detached HEADs' '
> + # This test demonstrates an issue with index.skipHash=true and
> + # this test variable for the split index. Disable the test variable.
> + GIT_TEST_SPLIT_INDEX= &&
> + export GIT_TEST_SPLIT_INDEX &&
Interesting. I would have expected to see a simple "sane_unset",
instead of exporting an empty setting explicitly.
> repos="two three four" &&
> for num in $repos
> do
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 11/26/2025 6:57 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
>> diff --git a/t/t9210-scalar.sh b/t/t9210-scalar.sh
>> index 43c210a23d..91d5964b73 100755
>> --- a/t/t9210-scalar.sh
>> +++ b/t/t9210-scalar.sh
>> @@ -246,6 +246,11 @@ test_expect_success 'scalar reconfigure --all with includeIf.onbranch' '
>> '
>>
>> test_expect_success 'scalar reconfigure --all with detached HEADs' '
>> + # This test demonstrates an issue with index.skipHash=true and
>> + # this test variable for the split index. Disable the test variable.
>> + GIT_TEST_SPLIT_INDEX= &&
>> + export GIT_TEST_SPLIT_INDEX &&
>
> Interesting. I would have expected to see a simple "sane_unset",
> instead of exporting an empty setting explicitly.
That's indeed a better way to do it. Will do in v2.
Thanks,
-Stolee
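The distinction behind this suggestion can be illustrated in isolation: exporting a variable with an empty value leaves it present in the environment, while unsetting removes it. The sketch below uses the POSIX `setenv()`, `unsetenv()` and `getenv()` calls; the helper names and the variable `DEMO_SPLIT_INDEX` are made up for illustration, and nothing here reflects how Git itself interprets `GIT_TEST_SPLIT_INDEX`.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() and unsetenv() */
#include <assert.h>
#include <stdlib.h>

/*
 * Classify how a process sees an environment variable:
 * 0 = not present at all, 1 = present but empty, 2 = present with content.
 */
int env_state(const char *name)
{
	const char *value;

	assert(name);
	value = getenv(name);
	if (!value)
		return 0;
	return *value ? 2 : 1;
}

/* Counterpart of `VAR= && export VAR`: the variable stays in the environment. */
int export_empty(const char *name)
{
	return setenv(name, "", 1);
}

/* Counterpart of unsetting: the variable is removed from the environment. */
int remove_var(const char *name)
{
	return unsetenv(name);
}
```

For example, after `export_empty("DEMO_SPLIT_INDEX")` the call `env_state("DEMO_SPLIT_INDEX")` returns 1, and after `remove_var("DEMO_SPLIT_INDEX")` it returns 0.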
|
|
||
| static int set_recommended_config(int reconfigure) | ||
| { | ||
| struct scalar_config config[] = { |
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> From: Derrick Stolee <[email protected]>
>
> These config values were added in the original Scalar contribution,
> d0feac4e8c (scalar: 'register' sets recommended config and starts
> maintenance, 2021-12-03), but were never fully checked for validity in
> the upstream Git project. At the time, Scalar was only intended for the
> contrib/ directory so did not have as rigorous of an investigation.
>
> Each config option has its own justification for removal:
>
> * core.preloadIndex: This value is true by default, now. Removing this
> causes some changes required to the tests that checked this config
> value. Use gui.gcwarning=false instead.
>
> * core.fscache: This config does not exist in the core Git project, but
> is instead a config option for a Git for Windows feature.
>
> * core.multiPackIndex: This config value is now enabled by default, so
> does not need to be called out specifically. It was originally
> included to make sure the background maintenance that created
> multi-pack-indexes would result in the expected performance
> improvements.
>
> * credential.validate: This option is not something specific to Git but
> instead an older version of Git Credential Manager for Windows. That
> software was replaced several years ago by the cross-platform Git
> Credential Manger so this option is no longer needed to help users who
> were on that older software.
>
> * pack.useSparse=true: This value is now Git's default as of de3a864114
> (config: set pack.useSparse=true by default, 2020-03-20) so we don't
> need it set by Scalar.
Thanks for a comprehensive list. Very well described.
| ~~~~~~ | ||
|
|
||
| delete <enlistment>:: | ||
| This subcommand lets you delete an existing Scalar enlistment from your |
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> +commitGraph.generationVersion=1::
> + While the preferred version is 2 for performance reasons, existing users
> + that had version 1 by default will need special care in upgrading to
> + version 2. This is likely to change in the future as the upgrade story
> + is solidifies.
"as the upgrade story solidifies"?
> +fetch.writeCommitGraph=false::
> + This config setting was created to help users automatically udpate their
> + commit-graph files as they perform fetches. However, this takes time
> + from foreground fetches and pulls and Scalar uses background maintenance
> + for this function instead.
"update their files".
> +index.threads=true::
> + This tells Git to automatically detect how many threads it should use
> + when reading the index in parallel due to the `core.preloadIndex=true`
> + setting.
Is "due to the `core.preloadIndex=true` setting" part of this
sentence still relevant?
Other than that, superbly written. Thanks, will queue.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 11/26/2025 7:09 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
>> +commitGraph.generationVersion=1::
>> + While the preferred version is 2 for performance reasons, existing users
>> + that had version 1 by default will need special care in upgrading to
>> + version 2. This is likely to change in the future as the upgrade story
>> + is solidifies.
>
> "as the upgrade story solidifies"?
That's better than what I was going for which was "is solidified". Will fix.
>> +fetch.writeCommitGraph=false::
>> + This config setting was created to help users automatically udpate their
>> + commit-graph files as they perform fetches. However, this takes time
>> + from foreground fetches and pulls and Scalar uses background maintenance
>> + for this function instead.
>
> "update their files".
Yes. thanks.
>> +index.threads=true::
>> + This tells Git to automatically detect how many threads it should use
>> + when reading the index in parallel due to the `core.preloadIndex=true`
>> + setting.
>
> Is "due to the `core.preloadIndex=true` setting" part of this
> sentence still relevant?
I should still include this, but mention that it is enabled by default and
still recommended.
> Other than that, superbly written. Thanks, will queue.
Thanks,
-Stolee
|
This patch series was integrated into seen via git@175d67a. |
|
This branch is now known as |
|
This patch series was integrated into seen via git@d20a0d3. |
|
There was a status update in the "Cooking" section about the branch Documentation updates. Expecting a reroll. cf. <[email protected]> source: <[email protected]> |
| fsm_settings__get_reason(the_repository) == FSMONITOR_REASON_OK; | ||
| } | ||
|
|
||
| static int set_recommended_config(int reconfigure) |
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, Nov 26, 2025 at 10:18:35PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <[email protected]>
>
> The config values set by Scalar went through an audit in the previous
> changes, so now reorganize the settings and simplify their purpose.
>
> First, alphabetize the config options, except put the platform-specific
> options at the end. This groups two Windows-specific settings and only
> one non-Windows setting.
>
> Also, this removes the 'overwrite_on_reconfigure' setting for many of
> these options. That setting made nearly all of these options "required"
> for scalar enlistments, restricting use for users. Instead, now nearly
> all options have removed this setting.
As far as I understand, this setting causes us to overwrite any
preexisting config values when reconfiguring Scalar? So with your
changes the effect is that we now don't do that anymore, which allows
the user to tune some of the configuration values to their liking after
having run `scalar init` for the first time. I guess that makes sense,
as it gives the user more flexibility.
It does make me wonder though: is it really the most sensible thing to
overwrite any keys that already exist in the configuration? We may end
up overwriting configuration specified by the user both in the case of
`scalar init` and `scalar reconfigure`. But arguably, we might want to
only ever write configuration that does _not_ yet have an explicit value
in the configuration file, regardless of whether or not we reconfigure.
> However, there is one setting that still has this, which is
> index.skipHash, which was previously being set to _false_ when we
> actually prefer the value of true. Keep the overwrite here to help
> Scalar users upgrade to the new version. We may remove that overwrite in
> the future once we belive that most of the users who have the false
> value have upgraded to a version that overwrites that to 'true'.
Makes sense. This has likely been a bug, and we now want to rectify that
bug.
Thanks!
Patrick
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 12/1/25 3:55 AM, Patrick Steinhardt wrote:
> On Wed, Nov 26, 2025 at 10:18:35PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <[email protected]>
>>
>> The config values set by Scalar went through an audit in the previous
>> changes, so now reorganize the settings and simplify their purpose.
>>
>> First, alphabetize the config options, except put the platform-specific
>> options at the end. This groups two Windows-specific settings and only
>> one non-Windows setting.
>>
>> Also, this removes the 'overwrite_on_reconfigure' setting for many of
>> these options. That setting made nearly all of these options "required"
>> for scalar enlistments, restricting use for users. Instead, now nearly
>> all options have removed this setting.
>
> As far as I understand, this setting causes us to overwrite any
> preexisting config values when reconfiguring Scalar? So with your
> changes the effect is that we now don't do that anymore, which allows
> the user to tune some of the configuration values to their liking after
> having run `scalar init` for the first time. I guess that makes sense,
> as it gives the user more flexibility.
Yes, that is correct.
> It does make me wonder though: is it really the most sensible thing to
> overwrite any keys that already exist in the configuration? We may end
> up overwriting configuration specified by the user both in the case of
> `scalar init` and `scalar reconfigure`. But arguably, we might want to
> only ever write configuration that does _not_ yet have an explicit value
> in the configuration file, regardless of whether or not we reconfigure.
I agree that this notion of forcing config is not optimal, and is a leftover
from VFS for Git where some of these config things were actually required
for the virtualization to work. Once that idea was in place, it was easy
to think "we'll make sure the repo is configured correctly" but that makes
much less sense in Scalar these days.
>> However, there is one setting that still has this, which is
>> index.skipHash, which was previously being set to _false_ when we
>> actually prefer the value of true. Keep the overwrite here to help
>> Scalar users upgrade to the new version. We may remove that overwrite in
>> the future once we belive that most of the users who have the false
>> value have upgraded to a version that overwrites that to 'true'.
>
> Makes sense. This has likely been a bug, and we now want to rectify that
> bug.
And hopefully this is the only reason we'd need this "overwrite" feature
from this point on.
Thanks,
-Stolee
| ~~~~~~ | ||
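The effect of the `overwrite_on_reconfigure` flag discussed in this exchange can be modeled in a few lines. This is a sketch under an assumption, not code copied from scalar.c: it assumes a recommended value is written when the key has no value yet, or when reconfiguring an entry whose flag is set. The struct layout follows the three-field initializers visible in the quoted hunks; `should_write()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the { key, value, flag } initializers seen in the quoted hunks. */
struct scalar_config {
	const char *key;
	const char *value;
	int overwrite_on_reconfigure;
};

/*
 * Assumed decision rule (hypothetical helper, not Git's implementation):
 * write the recommended value when the key is unset, or when reconfiguring
 * an entry that is flagged to overwrite. `existing` is NULL when unset.
 * Returns 1 to write, 0 to leave the current value alone.
 */
int should_write(const struct scalar_config *config, const char *existing,
		 int reconfigure)
{
	assert(config);
	if (!existing)
		return 1;
	return reconfigure && config->overwrite_on_reconfigure;
}
```

Under this assumed rule, an `index.skipHash` entry flagged with 1 replaces an existing `false` during a reconfigure, while an entry flagged with 0 leaves a value the user has already chosen untouched.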
|
|
||
| delete <enlistment>:: | ||
| This subcommand lets you delete an existing Scalar enlistment from your |
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, Nov 26, 2025 at 10:18:36PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/scalar.adoc b/Documentation/scalar.adoc
> index f81b2832f8..b34af225e6 100644
> --- a/Documentation/scalar.adoc
> +++ b/Documentation/scalar.adoc
> @@ -197,6 +197,164 @@ delete <enlistment>::
> This subcommand lets you delete an existing Scalar enlistment from your
> local file system, unregistering the repository.
>
> +REQUIRED AND RECOMMENDED CONFIG
> +-------------------------------
> +
> +As part of both `scalar clone` and `scalar register`, certain Git config
> +values are set to optimize for large repositories or cross-platform support.
> +These options are updated in new Git versions according to the best known
> +advice for large repositories, and users can get the latest recommendations
> +by running `scalar reconfigure [--all]`.
> +
> +This section lists justifications for the config values that are set in the
> +latest version.
> +
> +am.keepCR=true::
> + This setting is important for cross-platform development across Windows
> + and non-Windows platforms and keeping carriage return (`\r`) characters
> + in certain workflows.
> +
> +commitGraph.changedPaths=true::
> + This setting helps the background maintenance steps that compute the
> + serialized commit-graph to also store changed-path Bloom filters. This
> + accelerates file history commands and allows users to automatically
> + benefit without running a foreground command.
Is this something we also want to promote to "default" eventually? The
downside of course is that maintenance takes a bit longer, but given
that it runs in the background anyway this shouldn't really impact our
users all that much.
> +commitGraph.generationVersion=1::
> + While the preferred version is 2 for performance reasons, existing users
> + that had version 1 by default will need special care in upgrading to
> + version 2. This is likely to change in the future as the upgrade story
> + is solidifies.
Is that still the case? We _did_ have some bugs in the upgrade path in
the past, but I thought it got all sorted out by now?
[snip]
> +fetch.unpackLimit=1::
> + This setting prevents Git from unpacking packfiles into loose objects
> + as they are downloaded from the server. This feature was intended as a
> + way to prevent performance issues from too many packfiles, but Scalar
> + uses background maintenance to group packfiles and cover them with a
> + multi-pack-index, removing this issue.
The second sentence here reads as if "fetch.unpackLimit=1" was the
feature you are talking about, which led to some puzzlement at first.
But what you are talking about is the _default_ unpack limit of 100.
Maybe something like this reads better?
This setting prevents Git from unpacking packfiles into loose objects
as they are downloaded from the server. The default limit of 100
objects was intended as a way to prevent performance issues from too
many packfiles, but Scalar uses background maintenance to group
packfiles and cover them with a multi-pack-index, removing this
issue.
Patrick
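The reason a limit of 1 disables unpacking follows from the threshold rule described above: a fetched pack is turned into loose objects only when it holds fewer objects than the limit. The function below is a simplified model of that rule with a hypothetical name, not Git's actual fetch code.

```c
/*
 * Simplified model of the unpack threshold (hypothetical helper):
 * returns 1 when a fetched pack with nr_objects objects would be
 * unpacked into loose objects, 0 when the packfile would be kept.
 */
int unpack_into_loose(unsigned long nr_objects, unsigned long unpack_limit)
{
	return nr_objects < unpack_limit;
}
```

With the default limit of 100, a five-object fetch is unpacked into loose objects. With a limit of 1, no pack that contains at least one object is below the threshold, so every fetched pack is kept as a packfile.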
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 12/1/25 3:55 AM, Patrick Steinhardt wrote:
> On Wed, Nov 26, 2025 at 10:18:36PM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/Documentation/scalar.adoc b/Documentation/scalar.adoc
>> index f81b2832f8..b34af225e6 100644
>> --- a/Documentation/scalar.adoc
>> +++ b/Documentation/scalar.adoc
>> @@ -197,6 +197,164 @@ delete <enlistment>::
>> This subcommand lets you delete an existing Scalar enlistment from your
>> local file system, unregistering the repository.
>>
>> +REQUIRED AND RECOMMENDED CONFIG
>> +-------------------------------
>> +
>> +As part of both `scalar clone` and `scalar register`, certain Git config
>> +values are set to optimize for large repositories or cross-platform support.
>> +These options are updated in new Git versions according to the best known
>> +advice for large repositories, and users can get the latest recommendations
>> +by running `scalar reconfigure [--all]`.
>> +
>> +This section lists justifications for the config values that are set in the
>> +latest version.
>> +
>> +am.keepCR=true::
>> + This setting is important for cross-platform development across Windows
>> + and non-Windows platforms and keeping carriage return (`\r`) characters
>> + in certain workflows.
>> +
>> +commitGraph.changedPaths=true::
>> + This setting helps the background maintenance steps that compute the
>> + serialized commit-graph to also store changed-path Bloom filters. This
>> + accelerates file history commands and allows users to automatically
>> + benefit without running a foreground command.
>
> Is this something we also want to promote to "default" eventually? The
> downside of course is that maintenance takes a bit longer, but given
> that it runs in the background anyway this shouldn't really impact our
> users all that much.
I'm not sure, as this is a significant cost to the computation time. It will
impact foreground commands, as well. It increases the size of the file, too.
It's worth considering, but I don't think the answer is very simple.
>> +commitGraph.generationVersion=1::
>> + While the preferred version is 2 for performance reasons, existing users
>> + that had version 1 by default will need special care in upgrading to
>> + version 2. This is likely to change in the future as the upgrade story
>> + is solidifies.
>
> Is that still the case? We _did_ have some bugs in the upgrade path in
> the past, but I thought it got all sorted out by now?
This is very likely, but I haven't validated myself. I'd be interested to
double-check and update this setting in a later series. If we update to 2,
then this would be a good reason to overwrite the old config for a while.
> [snip]
>> +fetch.unpackLimit=1::
>> + This setting prevents Git from unpacking packfiles into loose objects
>> + as they are downloaded from the server. This feature was intended as a
>> + way to prevent performance issues from too many packfiles, but Scalar
>> + uses background maintenance to group packfiles and cover them with a
>> + multi-pack-index, removing this issue.
>
> The second sentence here reads as if "fetch.unpackLimit=1" was the
> feature you are talking about, which led to some puzzlement at first.
> But what you are talking about is the _default_ unpack limit of 100.
> Maybe something like this reads better?
>
> This setting prevents Git from unpacking packfiles into loose objects
> as they are downloaded from the server. The default limit of 100
> objects was intended as a way to prevent performance issues from too
> many packfiles, but Scalar uses background maintenance to group
> packfiles and cover them with a multi-pack-index, removing this
> issue.
Good catch. Thanks!
-Stolee

A repo may have config options set by 'scalar clone' or 'scalar register' and then updated by 'scalar reconfigure'. It can be helpful to point out which of those options were set by the latest scalar recommendations.

Add "# set by scalar" to the end of each config option to assist users in identifying why these config options were set in their repo. Use a new helper method to simplify the two callsites.

Co-authored-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>

The index.skipHash config option has been set to 'false' by Scalar since 4933152 (scalar: enable path-walk during push via config, 2025-05-16) but that commit message is trying to communicate the exact opposite: that the 'true' value is what we want instead. This means that we've been disabling this performance benefit for Scalar repos unintentionally. Fix this issue before we add justification for the config options set in this list.

Oddly, enabling index.skipHash causes a test issue during 'test_commit' in one of the Scalar tests when GIT_TEST_SPLIT_INDEX is enabled (as caught by the linux-test-vars build). I'm fixing the test by disabling the environment variable, but the issue should be resolved in a series focused on the split index.

Signed-off-by: Derrick Stolee <[email protected]>

These config values were added in the original Scalar contribution, d0feac4 (scalar: 'register' sets recommended config and starts maintenance, 2021-12-03), but were never fully checked for validity in the upstream Git project. At the time, Scalar was only intended for the contrib/ directory so did not have as rigorous of an investigation.

Each config option has its own justification for removal:

* core.preloadIndex: This value is true by default, now. Removing this causes some changes required to the tests that checked this config value. Use gui.gcwarning=false instead.
* core.fscache: This config does not exist in the core Git project, but is instead a config option for a Git for Windows feature.
* core.multiPackIndex: This config value is now enabled by default, so does not need to be called out specifically. It was originally included to make sure the background maintenance that created multi-pack-indexes would result in the expected performance improvements.
* credential.validate: This option is not something specific to Git but instead an older version of Git Credential Manager for Windows. That software was replaced several years ago by the cross-platform Git Credential Manger so this option is no longer needed to help users who were on that older software.
* pack.useSparse=true: This value is now Git's default as of de3a864 (config: set pack.useSparse=true by default, 2020-03-20) so we don't need it set by Scalar.

Signed-off-by: Derrick Stolee <[email protected]>

The config values set by Scalar went through an audit in the previous changes, so now reorganize the settings and simplify their purpose.

First, alphabetize the config options, except put the platform-specific options at the end. This groups two Windows-specific settings and only one non-Windows setting.

Also, this removes the 'overwrite_on_reconfigure' setting for many of these options. That setting made nearly all of these options "required" for scalar enlistments, restricting use for users. Instead, now nearly all options have removed this setting.

However, there is one setting that still has this, which is index.skipHash, which was previously being set to _false_ when we actually prefer the value of true. Keep the overwrite here to help Scalar users upgrade to the new version. We may remove that overwrite in the future once we belive that most of the users who have the false value have upgraded to a version that overwrites that to 'true'.

Signed-off-by: Derrick Stolee <[email protected]>
18580f0 to
70bdcf7
Compare
Add user-facing documentation that justifies the values being set by 'scalar clone', 'scalar register', and 'scalar reconfigure'.

Helped-by: Junio C Hamano <[email protected]>
Helped-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
70bdcf7 to
ac1627d
Compare
|
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Stolee,
On Wed, 26 Nov 2025, Derrick Stolee via GitGitGadget wrote:
> In September [1], we discussed that the Scalar config options could use some
> documented justification as well as some comments to the config file that
> they were set by Scalar. I was then immediately distracted by other work
> things and am finally here with a series to do just that.
Thank you for doing this, in particular the (quite long!) list of
explanations are excellent, especially when some user wonders why a
particular setting was chosen and wants to understand the reason.
>
> [1]
> https://lore.kernel.org/git/[email protected]/
>
> I have indeed used Patrick's idea to add '# set by scalar' to each line
> added by Scalar, it took a little more work for all the kinds of config set.
I am glad that the work I put in to optionally add comments pays off.
It's a bit sad that there is no well-designed bulk-edit "API" function
which therefore requires constructing and `free()`ing that `file` variable
many times, but that's not the fault of this series.
> I made myself a co-author.
>
> While working to justify each config option, I found some stale or incorrect
> config options. I also relaxed the override setting in most cases which gave
> me an opportunity to alphabetize the settings.
>
> There was at least one case (I'm thinking of core.fscache here) where the
> config doesn't even exist in core Git, but instead in Git for Windows. We'll
> need to adjust in that fork to reinclude it in the right place.
Thank you for calling this out! I will take care of this in Git for
Windows and also in Microsoft Git (which inherits this flag from Git for
Windows).
To be honest, I am not so certain that we want the FSCache to be enabled,
it does have long-standing bugs (introduced by the partial clone feature,
for example, where the FSCache continues to retain stale information about
which loose objects are present even after the missing ones have been
fetched). I guess we'll have to measure the actual performance benefits to
reassess whether the feature is worth the trouble.
Thank you for your diligent work, as always,
Johannes
>
> Thanks, -Stolee
>
> Derrick Stolee (5):
> scalar: annotate config file with "set by scalar"
> scalar: use index.skipHash=true for performance
> scalar: remove stale config values
> scalar: alphabetize and simplify config
> scalar: document config settings
>
> Documentation/scalar.adoc | 158 ++++++++++++++++++++++++++++++++++++++
> scalar.c | 81 ++++++++++---------
> t/t9210-scalar.sh | 26 ++++---
> 3 files changed, 218 insertions(+), 47 deletions(-)
>
>
> base-commit: 6ab38b7e9cc7adafc304f3204616a4debd49c6e9
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2010%2Fderrickstolee%2Fscalar-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2010/derrickstolee/scalar-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2010
> --
> gitgitgadget
> |
|
/submit |
|
Submitted as [email protected] To fetch this version into To fetch this version to local tag |
|
|
||
| static int set_recommended_config(int reconfigure) | ||
| { | ||
| struct scalar_config config[] = { |
On the Git mailing list, Matthew Hughes wrote (reply to this):
On Mon, Dec 01, 2025 at 04:50:45PM +0000, Derrick Stolee via GitGitGadget wrote:
> * core.preloadIndex: This value is true by default, now. Removing this
> causes some changes required to the tests that checked this config
> value. Use gui.gcwarning=false instead.
I was going to ask about if we could also rely on the default value of
index.threads like we do here, but then went and did some reading and realised
some config values, like index.recordOffsetTable, have their value set
according to whether index.threads was explicitly set, so I guess there's an
implicit reliance on that behaviour that we want to keep?
> * core.fscache: This config does not exist in the core Git project, but
> is instead a config option for a Git for Windows feature.
>
> * core.multiPackIndex: This config value is now enabled by default, so
> does not need to be called out specifically. It was originally
> included to make sure the background maintenance that created
> multi-pack-indexes would result in the expected performance
> improvements.
>
> * credential.validate: This option is not something specific to Git but
> instead an older version of Git Credential Manager for Windows. That
> software was replaced several years ago by the cross-platform Git
> Credential Manager so this option is no longer needed to help users who
> were on that older software.
>
> * pack.useSparse=true: This value is now Git's default as of de3a864114
> (config: set pack.useSparse=true by default, 2020-03-20) so we don't
> need it set by Scalar.
Thanks for the detail on all of these, very helpful
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Mon, Dec 01, 2025 at 05:46:46PM +0000, Matthew Hughes wrote:
> On Mon, Dec 01, 2025 at 04:50:45PM +0000, Derrick Stolee via GitGitGadget wrote:
> > * core.preloadIndex: This value is true by default, now. Removing this
> > causes some changes required to the tests that checked this config
> > value. Use gui.gcwarning=false instead.
>
> I was going to ask about if we could also rely on the default value of
> index.threads like we do here, but then went and did some reading and realised
> some config values, like index.recordOffsetTable, have their value set
> according to whether index.threads was explicitly set, so I guess there's an
> implicit reliance on that behaviour that we want to keep?
Wait. Are you saying that "index.recordOffsetTable" behaves differently
based on whether "index.threads" is implicitly enabled due to the
default value or explicitly enabled via the configuration? If so, that
smells like a plain bug to me.
Patrick
On the Git mailing list, Matthew Hughes wrote (reply to this):
On Tue, Dec 02, 2025 at 08:53:45AM +0100, Patrick Steinhardt wrote:
> Wait. Are you saying that "index.recordOffsetTable" behaves differently
> based on whether "index.threads" is implicitly enabled due to the
> default value or explicitly enabled via the configuration?
That was my understanding from a cursory read of the results of searching for
'index.threads' in git-config:
> index.recordEndOfIndexEntries
> ...
> Defaults to true if index.threads has been explicitly enabled, false
> otherwise
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Tue, Dec 02, 2025 at 07:04:24PM +0000, Matthew Hughes wrote:
> On Tue, Dec 02, 2025 at 08:53:45AM +0100, Patrick Steinhardt wrote:
> > Wait. Are you saying that "index.recordOffsetTable" behaves differently
> > based on whether "index.threads" is implicitly enabled due to the
> > default value or explicitly enabled via the configuration?
>
> That was my understanding from a cursory read of the results of searching for
> 'index.threads' in git-config:
>
> > index.recordEndOfIndexEntries
> > ...
> > Defaults to true if index.threads has been explicitly enabled, false
> > otherwise
Hm, true. At least that's a conscious decision then.
The logic around this was introduced in 2a9dedef2e (index: make
index.threads=true enable ieot and eoie, 2018-11-19), and the ultimate
reason for it seems to be backwards compatibility:
index.threads and index.recordOffsetTable unspecified: do not write
the offset table yet (to avoid alarming the user with "ignoring IEOT
extension" messages when an older version of Git accesses the
repository) but do make use of multiple threads to read the index if
the supporting offset table is present.
Older versions of Git complained when they saw unknown extensions, and
we didn't want to expose users to such warnings. That makes me wonder
whether it's time now to revisit that decision -- it's been 7 years
since then, I guess that many clients nowadays would understand the
extension.
The only (documented) downside should thus not be that important
anymore, but the upside is that reading the index would be faster if we
default-enable writing the extension.
Patrick
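The default resolution discussed above can be sketched as a small shell model. This is only an illustration of the documented wording quoted in the thread ("Defaults to true if index.threads has been explicitly enabled, false otherwise"), not Git's actual code: the function name `writes_offset_table` is hypothetical, and the inputs are reduced to the three states "unset", "true" and "false" (numeric thread counts are left out).

```shell
# Simplified model, not Git's implementation.
# $1: index.threads           ("unset", "true" or "false")
# $2: index.recordOffsetTable ("unset", "true" or "false")
# Prints whether the offset table extension would be written.
writes_offset_table () {
	index_threads=$1
	record_offset_table=$2
	case "$record_offset_table" in
	true|false)
		# An explicit index.recordOffsetTable always wins.
		echo "$record_offset_table"
		;;
	*)
		# Otherwise only an explicitly enabled index.threads turns it on;
		# the implicit default does not.
		if test "$index_threads" = true
		then
			echo true
		else
			echo false
		fi
		;;
	esac
}

for threads in unset true false
do
	printf 'index.threads=%s -> writes offset table: %s\n' \
		"$threads" "$(writes_offset_table "$threads" unset)"
done
```

The "unset" row is the case being debated: threaded reads are still used when a table is present, but no table is written unless the user opted in.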
User |
| ~~~~~~ | ||
|
|
||
| delete <enlistment>:: | ||
| This subcommand lets you delete an existing Scalar enlistment from your |
On the Git mailing list, Matthew Hughes wrote (reply to this):
On Mon, Dec 01, 2025 at 04:50:47PM +0000, Derrick Stolee via GitGitGadget wrote:
> Add user-facing documentation that justifies the values being set by
> 'scalar clone', 'scalar register', and 'scalar reconfigure'.
Thanks! This is exactly what I was hoping for.
> +REQUIRED AND RECOMMENDED CONFIG
> +-------------------------------
Would it be worth noting in scalar.c that the config options listed there are
documented here, so that a dev changing the list in the source will know to
also update this? I assume there's an understanding that if e.g. you update a
flag you should know to also update relevant docs, but perhaps this is a bit
more niche.
> +gc.auto=0::
> + This disables automatic garbage collection, since Scalar uses background
> + maintenance to keep the repository data in good shape.
Checking my understanding: this means there will be _no_ automatic GC in a
scalar repo? Since scalar calls 'maintenance register' which means
maintenance.strategy will be set to 'incremental' which won't schedule any gc
runs
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Mon, Dec 01, 2025 at 05:58:06PM +0000, Matthew Hughes wrote:
> On Mon, Dec 01, 2025 at 04:50:47PM +0000, Derrick Stolee via GitGitGadget wrote:
> > Add user-facing documentation that justifies the values being set by
> > 'scalar clone', 'scalar register', and 'scalar reconfigure'.
>
> Thanks! This is exactly what I was hoping for.
>
> > +REQUIRED AND RECOMMENDED CONFIG
> > +-------------------------------
>
> Would it be worth noting in scalar.c that the config options listed there are
> documented here, so that a dev changing the list in the source will know to
> also update this? I assume there's an understanding that if e.g. you update a
> flag you should know to also update relevant docs, but perhaps this is a bit
> more niche.
>
> > +gc.auto=0::
> > + This disables automatic garbage collection, since Scalar uses background
> > + maintenance to keep the repository data in good shape.
>
> Checking my understanding: this means there will be _no_ automatic GC in a
> scalar repo? Since scalar calls 'maintenance register' which means
> maintenance.strategy will be set to 'incremental' which won't schedule any gc
> runs
Yes, auto-garbage-collection is completely disabled in repositories
managed by Scalar. And I guess that made sense in the past:
auto-maintenance did not know about maintenance strategies at all, and
consequently it would still run git-gc(1). And that's not really
compatible with the "incremental" strategy that Scalar wants to use.
I changed that in Git 2.52 so that maintenance strategies now apply to
both scheduled and normal maintenance. But I was worried about backwards
compatibility for the "incremental" strategy, so I made the change in a
backwards compatible way so that normal maintenance still ends up using
git-gc(1).
Arguably though, we can now iterate on our infrastructure: if we were to
introduce an "incremental-v2" strategy we could adapt it to have proper
strategies for both scheduled and normal maintenance. And if so, we can
adapt Scalar in such a way that it doesn't have to disable auto
maintenance anymore.
I think that would be a reasonable thing to do. Scheduled maintenance
only runs once per hour, and in a high-activity repo a user may easily
generate tons of objects in that hour that make the repository perform
badly.
Patrick
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> -@@ scalar.c: static int set_scalar_config(const struct scalar_config *config, int reconfigure
> +@@ scalar.c: struct scalar_config {
> + int overwrite_on_reconfigure;
> + };
> +
> ++static int set_config_with_comment(const char *key, const char *value)
I do not care too deeply as this is a file-scope static that is
called only twice, but I would have preferred scalar_set_config()
which is a lot more specific to the purpose of this function (and the
comment "# set by scalar" is hardcoded constant in this function
that its callers cannot affect, so "with_comment" is not even a
statement that "the callers can add comment to their config
settings") which would have taken a bit shorter line to call.
> +fetch.unpackLimit=1::
> + This setting prevents Git from unpacking packfiles into loose objects
> -+ as they are downloaded from the server. This feature was intended as a
> -+ way to prevent performance issues from too many packfiles, but Scalar
> -+ uses background maintenance to group packfiles and cover them with a
> -+ multi-pack-index, removing this issue.
> ++ as they are downloaded from the server. The default limit of 100 was
> ++ intended as a way to prevent performance issues from too many packfiles,
> ++ but Scalar uses background maintenance to group packfiles and cover them
> ++ with a multi-pack-index, removing this issue.
Nicely explained.
Will replace (when I land).
Thanks.
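To make the effect of fetch.unpackLimit=1 concrete, here is a minimal shell model of the threshold. It assumes the documented rule that a fetched pack is unpacked into loose objects only when its object count is below the limit; the function name `unpacks_to_loose` is hypothetical and this is not Git's code.

```shell
# Illustrative model only: succeed when a fetched pack holding $1 objects
# would be exploded into loose objects under an unpack limit of $2.
unpacks_to_loose () {
	test "$1" -lt "$2"
}

for limit in 100 1
do
	for objects in 1 99 100 5000
	do
		if unpacks_to_loose "$objects" "$limit"
		then
			result="loose objects"
		else
			result="kept as packfile"
		fi
		printf 'limit=%s objects=%s -> %s\n' "$limit" "$objects" "$result"
	done
done
```

With a limit of 1 no non-empty pack is ever below the threshold, so every fetch keeps its packfile and leaves the grouping to background maintenance.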
|
| #include "refs.h" | ||
| #include "dir.h" | ||
| #include "packfile.h" | ||
| #include "help.h" |
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Mon, Dec 01, 2025 at 04:50:43PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/t9210-scalar.sh b/t/t9210-scalar.sh
> index bd6f0c40d2..43c210a23d 100755
> --- a/t/t9210-scalar.sh
> +++ b/t/t9210-scalar.sh
> @@ -210,6 +210,9 @@ test_expect_success 'scalar reconfigure' '
> GIT_TRACE2_EVENT="$(pwd)/reconfigure" scalar reconfigure -a &&
> test_path_is_file one/src/cron.txt &&
> test true = "$(git -C one/src config core.preloadIndex)" &&
> + test_grep "preloadIndex = true # set by scalar" one/src/.git/config &&
> + test_grep "excludeDecoration = refs/prefetch/\* # set by scalar" one/src/.git/config &&
> +
> test_subcommand git maintenance start <reconfigure &&
> test_subcommand ! git maintenance unregister --force <reconfigure &&
We _could_ make this a bit more solid by adding a test that:
1. Initializes a new repository.
2. Saves the configuration.
3. Performs `scalar reconfigure`.
4. Asserts that all new non-section-header lines in the configuration
have a trailing "# set by scalar" comment.
This would ensure that there is no callsite we forgot to add the new
annotation to, and that no future callsite gets added by somebody who
isn't aware of the comments.
I don't insist on such a test though, so please feel free to ignore this
suggestion.
Patrick
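The check in steps 2 to 4 above could look roughly like the following standalone sketch. It is not written against Git's test library: the helper names `unannotated_new_lines` and `check_all_annotated` are hypothetical, and the two config files are hand-written stand-ins for the saved configuration and the result of `scalar reconfigure`.

```shell
# Print every line of "$2" (config after reconfigure) that is absent from
# "$1" (config saved beforehand), is neither blank nor a section header,
# and lacks the trailing "# set by scalar" comment.
unannotated_new_lines () {
	grep -vxF -f "$1" "$2" |
	grep -v '^[[:space:]]*$' |
	grep -v '^[[:space:]]*\[' |
	grep -v '# set by scalar$'
}

# Succeed only if no unannotated new line exists.
check_all_annotated () {
	bad=$(unannotated_new_lines "$1" "$2" || true)
	test -z "$bad"
}

tmp=$(mktemp -d)
cat >"$tmp/before" <<\EOF
[core]
    repositoryformatversion = 0
    bare = false
EOF
cat >"$tmp/after" <<\EOF
[core]
    repositoryformatversion = 0
    bare = false
    preloadIndex = true # set by scalar
[log]
    excludeDecoration = refs/prefetch/* # set by scalar
EOF

if check_all_annotated "$tmp/before" "$tmp/after"
then
	echo "all new config lines are annotated"
else
	echo "unannotated lines found:"
	unannotated_new_lines "$tmp/before" "$tmp/after"
fi
rm -rf "$tmp"
```

One limitation of this line-based comparison: a new line whose text is identical to a line that already existed (for example the same key and value in another section) is not seen as new, so a real test may prefer to compare per-section output instead.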
In September [1], we discussed that the Scalar config options could use some documented justification as well as some comments to the config file that they were set by Scalar. I was then immediately distracted by other work things and am finally here with a series to do just that.
[1] https://lore.kernel.org/git/[email protected]/
I have indeed used Patrick's idea to add '# set by scalar' to each line added by Scalar; it took a little more work to cover all the kinds of config being set. I made myself a co-author.
While working to justify each config option, I found some stale or incorrect config options. I also relaxed the override setting in most cases which gave me an opportunity to alphabetize the settings.
There was at least one case (I'm thinking of `core.fscache` here) where the config doesn't even exist in core Git, but instead in Git for Windows. We'll need to adjust in that fork to reinclude it in the right place.
Updates in V2
Thanks,
-Stolee
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: Matthew Hughes [email protected]