diff mbox series

[2/3] setup: do not use invalid `repository_format`

Message ID 20181218072528.3870492-3-martin.agren@gmail.com (mailing list archive)
State New, archived
Series setup: add `clear_repository_format()`

Commit Message

Martin Ågren Dec. 18, 2018, 7:25 a.m. UTC
If `read_repository_format()` encounters an error, `format->version`
will be -1 and all other fields of `format` will be undefined. However,
in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
regardless of the value of `repo_fmt.version`.

This can be observed by adding this to the end of
`read_repository_format()`:

	if (format->version == -1)
		format->hash_algo = 0; /* no-one should peek at this! */

This causes, e.g., "git branch -m q q2 without config should succeed" in
t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
because it has moved .git/config out of the way and is now trying to use
a bad hash algorithm.

Check that `version` is non-negative before using `hash_algo`.

This patch adds no tests, but do note that if we skip this patch, the
next patch would cause existing tests to fail as outlined above.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I fully admit to not understanding all of this setup code, neither in
 its current incarnation, nor in terms of an ideal end game. This check
 seems like a good thing to do though.

 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
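The failure mode described in the commit message, and the guard that fixes it, can be modelled in a few lines of C. This is a minimal sketch, not git's code. `struct repo_format`, `struct repo`, `read_format()`, `apply_format()` and the `HASH_*` constants are hypothetical stand-ins for `struct repository_format`, `the_repository`, `read_repository_format()` and the guarded `repo_set_hash_algo()` call. The repository is also assumed to start out with SHA-1 as its default.

```c
#include <assert.h>

/* Hypothetical stand-ins for git's hash algorithm identifiers. */
enum { HASH_UNKNOWN = 0, HASH_SHA1 = 1 };

/* Simplified stand-in for struct repository_format. */
struct repo_format {
	int version;   /* -1 if reading the repository config failed */
	int hash_algo; /* meaningful only when version >= 0 */
};

/* Simplified stand-in for the_repository; assumed to default to SHA-1. */
struct repo {
	int hash_algo;
};

/*
 * Hypothetical reader. On error it marks the format invalid and poisons
 * the remaining field, like the debugging hunk in the commit message.
 */
static void read_format(struct repo_format *fmt, int config_found)
{
	if (!config_found) {
		fmt->version = -1;
		fmt->hash_algo = HASH_UNKNOWN; /* no-one should peek at this! */
		return;
	}
	fmt->version = 0;
	fmt->hash_algo = HASH_SHA1;
}

/* The patched logic: only copy hash_algo out of a valid format. */
static void apply_format(struct repo *r, const struct repo_format *fmt,
			 int have_repository)
{
	assert(r && fmt);
	if (have_repository && fmt->version > -1)
		r->hash_algo = fmt->hash_algo;
}
```

With the guard in place, a failed read (version -1) leaves the repository's default algorithm alone even though `hash_algo` was poisoned. This is what keeps the t3200 case described above from picking up a bogus algorithm when `.git/config` is missing.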

Comments

brian m. carlson Dec. 19, 2018, 12:18 a.m. UTC | #1
On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:
>  I fully admit to not understanding all of this setup code, neither in
>  its current incarnation, nor in terms of an ideal end game. This check
>  seems like a good thing to do though.

It's definitely complex.

> diff --git a/setup.c b/setup.c
> index 27747af7a3..52c3c9d31f 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1138,7 +1138,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
>  			setup_git_env(gitdir);
>  		}
> -		if (startup_info->have_repository)
> +		if (startup_info->have_repository && repo_fmt.version > -1)
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
>  	}

I think this change is fine, because we initialize the value in
the_repository elsewhere, and if there's no repository, this should
never have a value other than the default anyway.

I looked at the other patches in the series and thought they looked sane
as well.

Jeff King Dec. 19, 2018, 3:38 p.m. UTC | #2
On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:

> If `read_repository_format()` encounters an error, `format->version`
> will be -1 and all other fields of `format` will be undefined. However,
> in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
> regardless of the value of `repo_fmt.version`.
> 
> This can be observed by adding this to the end of
> `read_repository_format()`:
> 
> 	if (format->version == -1)
> 		format->hash_algo = 0; /* no-one should peek at this! */
> 
> This causes, e.g., "git branch -m q q2 without config should succeed" in
> t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
> because it has moved .git/config out of the way and is now trying to use
> a bad hash algorithm.
> 
> Check that `version` is non-negative before using `hash_algo`.
> 
> This patch adds no tests, but do note that if we skip this patch, the
> next patch would cause existing tests to fail as outlined above.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>

Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
preparation for us eventually respecting extensions.hashAlgorithm (or
whatever it's called).

Given what I said in my previous email about repos with a missing
"version" field, I wondered if this patch would be breaking config like:

  [core]
  # no repositoryformatversion!
  [extensions]
  hashAlgorithm = sha256

But I'd argue that:

  1. That's pretty dumb config that we shouldn't need to support. Even
     if we care about handling the missing version for historical repos,
     they wouldn't be talking sha256.

  2. Arguably we should not even look at extensions.* unless we see a
     version >= 1. But we do process them as we parse the config file.
     This is mostly an oversight, I think. We have to handle them as we
     see them, because they may come out of order with respect to the
     repositoryformatversion field. But we could put them into a
     string_list, and then only process them after we've decided which
     version we have.
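The second point, deferring `extensions.*` until the version is known, might look roughly like the following. This is a sketch under stated assumptions. `struct fmt_state`, `handle_config()`, `apply_pending()` and `parse_bool()` are hypothetical names. Config keys are assumed to arrive with section and variable names already lowercased, as git's config parser delivers them. A fixed-size array stands in for the `string_list` mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical record of one "extensions.<name> = <value>" entry. */
struct pending_ext {
	const char *name;
	const char *value;
};

/*
 * Hypothetical format state (not git's struct repository_format).
 * Pointers are borrowed, so the caller must keep key/value strings
 * alive; real code would copy them, e.g. into a string_list.
 */
struct fmt_state {
	int version; /* -1 until core.repositoryformatversion is seen */
	struct pending_ext pending[MAX_PENDING];
	int nr_pending;
	int precious_objects;
	int worktree_config;
};

/* Simplified boolean parsing: a bare key (NULL value) counts as true. */
static int parse_bool(const char *value)
{
	return !value || !strcmp(value, "true") || !strcmp(value, "1");
}

/* Config callback: record extensions instead of acting on them. */
static int handle_config(struct fmt_state *s, const char *key,
			 const char *value)
{
	const char *prefix = "extensions.";

	if (!strcmp(key, "core.repositoryformatversion")) {
		s->version = value ? atoi(value) : 0;
		return 0;
	}
	if (!strncmp(key, prefix, strlen(prefix))) {
		if (s->nr_pending == MAX_PENDING)
			return -1; /* fixed capacity is a limit of this sketch */
		s->pending[s->nr_pending].name = key + strlen(prefix);
		s->pending[s->nr_pending].value = value;
		s->nr_pending++;
	}
	return 0;
}

/* After parsing: honor extensions only when version >= 1. */
static int apply_pending(struct fmt_state *s)
{
	int i;

	assert(s != NULL);
	if (s->version < 1)
		return 0; /* version 0 or missing: ignore extensions.* */
	for (i = 0; i < s->nr_pending; i++) {
		const char *name = s->pending[i].name;
		const char *value = s->pending[i].value;

		if (!strcmp(name, "preciousobjects"))
			s->precious_objects = parse_bool(value);
		else if (!strcmp(name, "worktreeconfig"))
			s->worktree_config = parse_bool(value);
		else
			return -1; /* unknown extension: refuse the repository */
	}
	return 0;
}
```

Because extensions are only recorded while parsing, their position relative to `core.repositoryformatversion` no longer matters. A config with extensions but no version field, like the example above, simply has them ignored.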

So I think your patch is doing the right thing, and won't hurt any real
cases. But (of course) there are more opportunities to clean things up.

-Peff

Martin Ågren Dec. 19, 2018, 9:43 p.m. UTC | #3
On Wed, 19 Dec 2018 at 01:18, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> I think this change is fine, because we initialize the value in
> the_repository elsewhere, and if there's no repository, this should
> never have a value other than the default anyway.

Thanks, it feels good that this patch matches how you think about the
`hash_algo` field.

> I looked at the other patches in the series and thought they looked sane
> as well.

Thanks for the review; I appreciate it.


Martin

Martin Ågren Dec. 19, 2018, 9:46 p.m. UTC | #4
On Wed, 19 Dec 2018 at 16:38, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:
>
> > Check that `version` is non-negative before using `hash_algo`.

> Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
> GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
> preparation for us eventually respecting extensions.hashAlgorithm (or
> whatever it's called).

That was my understanding as well. Maybe I should have spelled it out.

I think of the diff of this patch as "let's check `foo->valid` before we
`use(foo->bar)`", which should only be able to regress in case foo isn't
valid. And ...

> Given what I said in my previous email about repos with a missing
> "version" field, I wondered if this patch would be breaking config like:
>
>   [core]
>   # no repositoryformatversion!
>   [extensions]
>   hashAlgorithm = sha256
>
> But I'd argue that:
>
>   1. That's pretty dumb config that we shouldn't need to support. Even
>      if we care about handling the missing version for historical repos,
>      they wouldn't be talking sha256.

... this matches my thinking.

>   2. Arguably we should not even look at extensions.* unless we see a
>      version >= 1. But we do process them as we parse the config file.
>      This is mostly an oversight, I think. We have to handle them as we
>      see them, because they may come out of order with respect to the
>      repositoryformatversion field. But we could put them into a
>      string_list, and then only process them after we've decided which
>      version we have.

I hadn't thought too much about this. I guess that for some simpler
extension/version dependencies it would be feasible to first parse
everything, then, depending on the version we've identified, forget
about any "irrelevant" extensions. Again, nothing I've thought much
about, and seems to be safely out of scope for this patch.


> So I think your patch is doing the right thing, and won't hurt any real
> cases. But (of course) there are more opportunities to clean things up.

Jeff King Dec. 19, 2018, 11:17 p.m. UTC | #5
On Wed, Dec 19, 2018 at 10:46:52PM +0100, Martin Ågren wrote:

> >   2. Arguably we should not even look at extensions.* unless we see a
> >      version >= 1. But we do process them as we parse the config file.
> >      This is mostly an oversight, I think. We have to handle them as we
> >      see them, because they may come out of order with respect to the
> >      repositoryformatversion field. But we could put them into a
> >      string_list, and then only process them after we've decided which
> >      version we have.
> 
> I hadn't thought too much about this. I guess that for some simpler
> extension/version dependencies it would be feasible to first parse
> everything, then, depending on the version we've identified, forget
> about any "irrelevant" extensions. Again, nothing I've thought much
> about, and seems to be safely out of scope for this patch.

The decision is actually pretty straightforward: if version < 1, ignore
extensions, otherwise respect them (and complain about any we don't know
about).

So I think we could just do in verify_repository_format() something
like:

  if (version < 1) {
    /* "undo" any extensions we might have parsed */
    data->precious_objects = 0;
    FREE_AND_NULL(data->partial_clone);
    data->worktree_config = 0;
    data->hash_algo = GIT_HASH_SHA1;
  } else {
    /* complain about unknown extension; we already do this! */
  }

It's a little ugly to have to know about all the extensions here, but we
already initialize them in read_repository_format(). We could probably
factor that out into a shared function.
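One way to factor out that shared reset is a single helper called both when initializing the format and from the version < 1 branch of verification. This is a sketch, not git's code. `struct format_exts`, `reset_extensions()`, `init_format()`, `verify_format()`, the `saw_unknown_ext` flag and the `HASH_*` constants are all hypothetical. `free()` followed by assigning `NULL` plays the role of git's `FREE_AND_NULL()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for git's hash algorithm identifiers. */
enum { HASH_SHA1 = 1, HASH_SHA256 = 2 };

/* Hypothetical subset of the extension-controlled fields listed above. */
struct format_exts {
	int version;
	int precious_objects;
	char *partial_clone; /* heap-allocated or NULL */
	int worktree_config;
	int hash_algo;
};

/* Shared helper: put every extension-controlled field back to its default. */
static void reset_extensions(struct format_exts *f)
{
	f->precious_objects = 0;
	free(f->partial_clone);
	f->partial_clone = NULL;
	f->worktree_config = 0;
	f->hash_algo = HASH_SHA1;
}

/* Initialization reuses the same helper instead of duplicating defaults. */
static void init_format(struct format_exts *f)
{
	f->version = -1;
	f->partial_clone = NULL; /* must be valid before reset frees it */
	reset_extensions(f);
}

/*
 * Verification: for version < 1, "undo" any extensions parsed so far.
 * Otherwise, fail if an unknown extension was seen while parsing.
 */
static int verify_format(struct format_exts *f, int saw_unknown_ext)
{
	assert(f != NULL);
	if (f->version < 1) {
		reset_extensions(f);
		return 0;
	}
	return saw_unknown_ext ? -1 : 0;
}
```

Keeping the defaults in one helper means a newly added extension only has to be reset in one place.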

-Peff

brian m. carlson Dec. 20, 2018, 12:21 a.m. UTC | #6
On Wed, Dec 19, 2018 at 10:38:41AM -0500, Jeff King wrote:
> Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
> GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
> preparation for us eventually respecting extensions.hashAlgorithm (or
> whatever it's called).

Yeah, it is.

I haven't tested, but since we just read the value of
extensions.objectFormat, this patch shouldn't have any effect on the
SHA-256 code. The default remains SHA-1 if a value isn't specified
somehow.
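The point that an unset `extensions.objectFormat` leaves SHA-1 in place can be illustrated with a small lookup. The function name `object_format_to_algo()` and the `HASH_*` values are hypothetical; git has its own hash identifiers and lookup helpers. The sketch only shows the defaulting behaviour.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for git's hash algorithm identifiers. */
enum { HASH_UNKNOWN = 0, HASH_SHA1 = 1, HASH_SHA256 = 2 };

/*
 * Map an extensions.objectFormat value to an algorithm.
 * NULL means the option was not set, which keeps the SHA-1 default.
 */
static int object_format_to_algo(const char *value)
{
	if (!value)
		return HASH_SHA1;
	if (!strcmp(value, "sha1"))
		return HASH_SHA1;
	if (!strcmp(value, "sha256"))
		return HASH_SHA256;
	return HASH_UNKNOWN; /* the caller should reject the repository */
}
```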

Patch

diff --git a/setup.c b/setup.c
index 27747af7a3..52c3c9d31f 100644
--- a/setup.c
+++ b/setup.c
@@ -1138,7 +1138,7 @@  const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository && repo_fmt.version > -1)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}