[1/3] fsck: verify commit graph when implicitly enabled

Message ID	20210913181221.42635-2-chooglen@google.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Date: Mon, 13 Sep 2021 11:12:19 -0700 In-Reply-To: <20210913181221.42635-1-chooglen@google.com> Message-Id: <20210913181221.42635-2-chooglen@google.com> Mime-Version: 1.0 References: <20210913181221.42635-1-chooglen@google.com> Subject: [PATCH 1/3] fsck: verify commit graph when implicitly enabled From: Glen Choo <chooglen@google.com> To: git@vger.kernel.org Cc: Glen Choo <chooglen@google.com>, derrickstolee@github.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	Use default values from settings instead of config \| expand [0/3] Use default values from settings instead of config [1/3] fsck: verify commit graph when implicitly enabled [2/3] fsck: verify multi-pack-index when implictly enabled [3/3] gc: perform incremental repack when implictly enabled

Glen Choo Sept. 13, 2021, 6:12 p.m. UTC

the_repository->settings() is the preferred way to get certain
settings (such as "core.commitGraph") because it gets default values
from prepare_repo_settings(). However, cmd_fsck() reads the config
directly via git_config_get_bool(), which bypasses these default values.
This causes fsck to ignore the commit graph if "core.commitgraph" is not
explicitly set in the config, even though commit graph is enabled by
default.

Replace git_config_get_bool("core.commitgraph") in fsck with the
equivalent call to the_repository->settings.core_commit_graph.

The expected behavior is that fsck respects the config value when it is
set, but uses the default value when it is unset. For example, for
core.commitGraph, there are three cases:

- Config value is set to true -> enabled
- Config value is set to false -> disabled
- Config value is unset -> enabled

As such, tests cover all three cases.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 builtin/fsck.c          |  3 ++-
 t/t5318-commit-graph.sh | 23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

Taylor Blau Sept. 13, 2021, 7:29 p.m. UTC | #1

On Mon, Sep 13, 2021 at 11:12:19AM -0700, Glen Choo wrote:
> the_repository->settings() is the preferred way to get certain
> settings (such as "core.commitGraph") because it gets default values
> from prepare_repo_settings(). However, cmd_fsck() reads the config
> directly via git_config_get_bool(), which bypasses these default values.
> This causes fsck to ignore the commit graph if "core.commitgraph" is not
> explicitly set in the config, even though commit graph is enabled by
> default.

Small nit; "the_repository->settings()" should be spelled as
"the_repository->settings", since "settings" is not a function.

It may be worth noting that this was totally fine before
core.commitGraph's default changed to true. That happened in 31b1de6a09
(commit-graph: turn on commit-graph by default, 2019-08-13), which is
the first time this sort of regression would have appeared.

>  	if (connectivity_only) {
>  		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
> @@ -908,7 +909,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
>  	check_connectivity();
>
> -	if (!git_config_get_bool("core.commitgraph", &i) && i) {
> +	if (the_repository->settings.core_commit_graph) {
>  		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
>  		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };

Here's the main part of your change, which is obviously correct (I'm
glossing over the earlier part where you call prepare_repo_settings();
also required and obviously correct).

> +test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git fsck &&
> +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> +		"incorrect checksum" &&
> +	cp commit-graph-pre-write-test $objdir/info/commit-graph &&
> +	git -c core.commitGraph=false fsck

Nit; I recommend replacing the `-c` style configuration with
`test_config`, which modifies `$GIT_DIR/config` but only for the
duration of the sub-shell.

> +'
> +
> +test_expect_success 'git fsck (checks commit-graph when config unset)' '
> +	test_when_finished "git config core.commitGraph true" &&

... which would allow you to get rid of something like this (since you
can avoid modifying the state visible outside of this test).

> +	cd "$TRASH_DIRECTORY/full" &&
> +	git fsck &&
> +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> +		"incorrect checksum" &&
> +        git config --unset core.commitGraph &&

But I'm not aware of a way to temporarily unset a configuration variable
for the duration of a test, so here I would probably write:

    test_must_fail git -c core.commitGraph= fsck

which Git interprets as "pretend this variable is unset in-core".

Thanks,
Taylor

Eric Sunshine Sept. 13, 2021, 7:33 p.m. UTC | #2

On Mon, Sep 13, 2021 at 3:29 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Sep 13, 2021 at 11:12:19AM -0700, Glen Choo wrote:
> > +test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
> > +     cd "$TRASH_DIRECTORY/full" &&
> > +     git fsck &&
> > +     corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> > +             "incorrect checksum" &&
> > +     cp commit-graph-pre-write-test $objdir/info/commit-graph &&
> > +     git -c core.commitGraph=false fsck
>
> Nit; I recommend replacing the `-c` style configuration with
> `test_config`, which modifies `$GIT_DIR/config` but only for the
> duration of the sub-shell.

Small correction: There is no subshell here, so it would be more
accurate to say "... for the duration of the test".

Aside: In fact, test_config() can't be used in subshells due to
implementation limitations which I won't go into.

Taylor Blau Sept. 13, 2021, 7:36 p.m. UTC | #3

On Mon, Sep 13, 2021 at 03:33:40PM -0400, Eric Sunshine wrote:
> On Mon, Sep 13, 2021 at 3:29 PM Taylor Blau <me@ttaylorr.com> wrote:
> > On Mon, Sep 13, 2021 at 11:12:19AM -0700, Glen Choo wrote:
> > > +test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
> > > +     cd "$TRASH_DIRECTORY/full" &&
> > > +     git fsck &&
> > > +     corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> > > +             "incorrect checksum" &&
> > > +     cp commit-graph-pre-write-test $objdir/info/commit-graph &&
> > > +     git -c core.commitGraph=false fsck
> >
> > Nit; I recommend replacing the `-c` style configuration with
> > `test_config`, which modifies `$GIT_DIR/config` but only for the
> > duration of the sub-shell.
>
> Small correction: There is no subshell here, so it would be more
> accurate to say "... for the duration of the test".

Yes, sorry about that: I did not mean to say "subshell" here (Glen:
hopefully you didn't read too far into that, since test_config() does
not work in subshells).

Thanks for catching that, Eric.

Thanks,
Taylor

Glen Choo Sept. 13, 2021, 11:15 p.m. UTC | #4

Thanks for the thorough review! I really appreciate it :)

On Mon, Sep 13, 2021 at 03:29:25PM -0400, Taylor Blau wrote:
> Small nit; "the_repository->settings()" should be spelled as
> "the_repository->settings", since "settings" is not a function.

Oh that's a good catch. Thanks!

> It may be worth noting that this was totally fine before
> core.commitGraph's default changed to true. That happened in 31b1de6a09
> (commit-graph: turn on commit-graph by default, 2019-08-13), which is
> the first time this sort of regression would have appeared.

Sounds good, I'll note that down.

> Nit; I recommend replacing the `-c` style configuration with
> `test_config`, which modifies `$GIT_DIR/config` but only for the
> duration of the sub-shell.

Sounds good. This works great :)

> > +	cd "$TRASH_DIRECTORY/full" &&
> > +	git fsck &&
> > +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> > +		"incorrect checksum" &&
> > +        git config --unset core.commitGraph &&
> 
> But I'm not aware of a way to temporarily unset a configuration variable
> for the duration of a test, so here I would probably write:
> 
>     test_must_fail git -c core.commitGraph= fsck
> 
> which Git interprets as "pretend this variable is unset in-core".

From my testing, I suspect that git does not pretend the variable is
unset, but rather, it pretends that the variable is set to the empty
string. It seems that git behaves as if the variable is set to "false".
This is called out in Documentation/git.txt:

  Including the equals but with an empty value (like `git -c
  foo.bar= ...`) sets `foo.bar` to the empty string which `git config
  --type=bool` will convert to `false`.

If the variable really is set to false, how might we proceed here? Shall
we stick with test_when_finished?

Eric Sunshine Sept. 13, 2021, 11:32 p.m. UTC | #5

On Mon, Sep 13, 2021 at 7:19 PM Glen Choo <chooglen@google.com> wrote:
> On Mon, Sep 13, 2021 at 03:29:25PM -0400, Taylor Blau wrote:
> > > +        git config --unset core.commitGraph &&
> >
> > But I'm not aware of a way to temporarily unset a configuration variable
> > for the duration of a test, so here I would probably write:
> >
> >     test_must_fail git -c core.commitGraph= fsck
> >
> > which Git interprets as "pretend this variable is unset in-core".
>
> From my testing, I suspect that git does not pretend the variable is
> unset, but rather, it pretends that the variable is set to the empty
> string. It seems that git behaves as if the variable is set to "false".
> This is called out in Documentation/git.txt:
>
>   Including the equals but with an empty value (like `git -c
>   foo.bar= ...`) sets `foo.bar` to the empty string which `git config
>   --type=bool` will convert to `false`.
>
> If the variable really is set to false, how might we proceed here? Shall
> we stick with test_when_finished?

That's probably reasonable, however, for robustness, you should
probably use test_unconfig() rather than raw `git config --unset` to
clear the variable.

Aside: This certainly makes one wonder if we should have a new
function in t/test-lib-functions.sh which unsets a variable for the
duration of a test only. However, that's outside the scope of this
submission.

Taylor Blau Sept. 14, 2021, 1:07 a.m. UTC | #6

On Mon, Sep 13, 2021 at 04:15:09PM -0700, Glen Choo wrote:
> > > +	cd "$TRASH_DIRECTORY/full" &&
> > > +	git fsck &&
> > > +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> > > +		"incorrect checksum" &&
> > > +        git config --unset core.commitGraph &&
> >
> > But I'm not aware of a way to temporarily unset a configuration variable
> > for the duration of a test, so here I would probably write:
> >
> >     test_must_fail git -c core.commitGraph= fsck
> >
> > which Git interprets as "pretend this variable is unset in-core".
>
> From my testing, I suspect that git does not pretend the variable is
> unset, but rather, it pretends that the variable is set to the empty
> string. It seems that git behaves as if the variable is set to "false".
> This is called out in Documentation/git.txt:
>
>   Including the equals but with an empty value (like `git -c
>   foo.bar= ...`) sets `foo.bar` to the empty string which `git config
>   --type=bool` will convert to `false`.
>
> If the variable really is set to false, how might we proceed here? Shall
> we stick with test_when_finished?

Hmm, I thought that we did support `git -c foo.bar=` as shorthand to
unset the key `foo.bar`, but I must have been mistaken, because the
documentation there is quite clear.

In that case, I think your original approach to use test_when_finished
makes sense and is good. Thanks for double checking!

Thanks,
Taylor

Taylor Blau Sept. 14, 2021, 1:09 a.m. UTC | #7

On Mon, Sep 13, 2021 at 07:32:07PM -0400, Eric Sunshine wrote:
> On Mon, Sep 13, 2021 at 7:19 PM Glen Choo <chooglen@google.com> wrote:
> > On Mon, Sep 13, 2021 at 03:29:25PM -0400, Taylor Blau wrote:
> > > > +        git config --unset core.commitGraph &&
> > >
> > > But I'm not aware of a way to temporarily unset a configuration variable
> > > for the duration of a test, so here I would probably write:
> > >
> > >     test_must_fail git -c core.commitGraph= fsck
> > >
> > > which Git interprets as "pretend this variable is unset in-core".
> >
> > From my testing, I suspect that git does not pretend the variable is
> > unset, but rather, it pretends that the variable is set to the empty
> > string. It seems that git behaves as if the variable is set to "false".
> > This is called out in Documentation/git.txt:
> >
> >   Including the equals but with an empty value (like `git -c
> >   foo.bar= ...`) sets `foo.bar` to the empty string which `git config
> >   --type=bool` will convert to `false`.
> >
> > If the variable really is set to false, how might we proceed here? Shall
> > we stick with test_when_finished?
>
> That's probably reasonable, however, for robustness, you should
> probably use test_unconfig() rather than raw `git config --unset` to
> clear the variable.

Hmm. I'm not so sure, do other tests rely on the value of that variable?
If so, test_unconfig() won't restore them.

Even if we don't have any such tests now, it seems like we should err on
the side of leaving it alone (although I suppose that future tests could
set core.commitGraph to whatever value they need as long as they use the
test_config function).

> Aside: This certainly makes one wonder if we should have a new
> function in t/test-lib-functions.sh which unsets a variable for the
> duration of a test only. However, that's outside the scope of this
> submission.

:-). I thought the same thing to myself when reviewing earlier today.
That's why I recommended using test_when_finished upthread, but either
approach is fine (my comments are definitely cosmetic, and don't matter
to the substance of this thread, so ultimately I am fine with either).

Thanks,
Taylor

Eric Sunshine Sept. 14, 2021, 2:05 a.m. UTC | #8

On Mon, Sep 13, 2021 at 9:09 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Sep 13, 2021 at 07:32:07PM -0400, Eric Sunshine wrote:
> > On Mon, Sep 13, 2021 at 7:19 PM Glen Choo <chooglen@google.com> wrote:
> > > If the variable really is set to false, how might we proceed here? Shall
> > > we stick with test_when_finished?
> >
> > That's probably reasonable, however, for robustness, you should
> > probably use test_unconfig() rather than raw `git config --unset` to
> > clear the variable.
>
> Hmm. I'm not so sure, do other tests rely on the value of that variable?
> If so, test_unconfig() won't restore them.

There may be a misunderstanding. I wasn't saying that test_unconfig()
alone would work. My "That's probably reasonable" referred to Glen's
proposal of combining `git config --unset` with test_when_finished()
to restore the variable. In addition to that, I suggested
test_unconfig() as being a more robust choice in that recipe.

> > Aside: This certainly makes one wonder if we should have a new
> > function in t/test-lib-functions.sh which unsets a variable for the
> > duration of a test only. However, that's outside the scope of this
> > submission.
>
> :-). I thought the same thing to myself when reviewing earlier today.
> That's why I recommended using test_when_finished upthread, but either
> approach is fine (my comments are definitely cosmetic, and don't matter
> to the substance of this thread, so ultimately I am fine with either).

Yep.

[1/3] fsck: verify commit graph when implicitly enabled

Commit Message

Comments

Patch