diff mbox series

[1/3] setup: drop return value from `read_repository_format()`

Message ID 20181218072528.3870492-2-martin.agren@gmail.com (mailing list archive)
State New, archived
Headers show
Series setup: add `clear_repository_format()` | expand

Commit Message

Martin Ågren Dec. 18, 2018, 7:25 a.m. UTC
No-one looks at the return value, so we might as well drop it. It's
still available as `format->version`.

In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format
reading and verification", 2016-03-11), this function actually had
return type "void", but that was changed in v2. Almost three years
later, no-one has used this return value.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I only discovered the full history after writing the patch. Had I known
 it from the beginning, maybe I'd have just skipped this step, but I was
 sufficiently disturbed by the redundant and unused return value that I
 dropped it before working on the actual meat of this series.

 cache.h | 7 +++----
 setup.c | 3 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

Comments

Jeff King Dec. 19, 2018, 3:27 p.m. UTC | #1
On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote:

> No-one looks at the return value, so we might as well drop it. It's
> still available as `format->version`.
> 
> In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format
> reading and verification", 2016-03-11), this function actually had
> return type "void", but that was changed in v2. Almost three years
> later, no-one has used this return value.

Hmm. If we have to pick one, I'd say that just returning a sane exit
value would be the more conventional thing to do. But looking at the
callers, many of them want to just pass the struct on to the verify
function.

That said, there is a long-standing curiosity here that we may want to
consider: read_repository_format() does not distinguish between these
three cases:

  1. the config file is missing

  2. the config file is present, but does not have a version field

  3. the config file is malformed, or we experience an I/O error
     (although I think there are still some cases that cause the config
     parser to die(), which may be sufficient)

The comment in check_repository_format_gently() implies that git-init
needs (1) to be a non-error for historical reasons. We could probably
tighten this up for other callers.

I think (2) probably should be an error, but note that it makes t1300
very unhappy, since it stomps all over .git/config. I'm not sure if any
real-world cases would be affected.

Case (3) I think we probably ought to do a better job of diagnosing. So
I wonder if the rule should be:

  - if we encounter a real error reading the config,
    read_repository_format() should return -1. Most callers should
    detect this and complain.

  - otherwise, a missing config returns 0 but puts "-1" into the version
    field

  - possibly verify_repository_format() should issue a warning when it
    sees "version == -1" and then rewrite the result into "0"

I dunno. This is one of those dark corners of the code where we appear
to do the wrong thing, but nobody seems to have noticed or cared much,
and changing it runs the risk of breaking some obscure cases. I'm not
sure if we should bite the bullet and try to address that, or just back
away slowly and pretend we never looked at it. ;)

> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  I only discovered the full history after writing the patch. Had I known
>  it from the beginning, maybe I'd have just skipped this step, but I was
>  sufficiently disturbed by the redundant and unused return value that I
>  dropped it before working on the actual meat of this series.
> 
>  cache.h | 7 +++----
>  setup.c | 3 +--
>  2 files changed, 4 insertions(+), 6 deletions(-)

FWIW, the patch itself seems fine, and obviously doesn't make anything
worse on its own. The question is just whether we want to do more
cleanup here.

-Peff
Martin Ågren Dec. 19, 2018, 9:42 p.m. UTC | #2
On Wed, 19 Dec 2018 at 16:27, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote:
>
> > No-one looks at the return value, so we might as well drop it. It's
> > still available as `format->version`.
>
> Hmm. If we have to pick one, I'd say that just returning a sane exit
> value would be the more conventional thing to do. But looking at the
> callers, many of them want to just pass the struct on to the verify
> function.
>
> That said, there is a long-standing curiosity here that we may want to
> consider: read_repository_format() does not distinguish between these
> three cases:

[snip several valuable insights]

> I dunno. This is one of those dark corners of the code where we appear
> to do the wrong thing, but nobody seems to have noticed or cared much,
> and changing it runs the risk of breaking some obscure cases. I'm not
> sure if we should bite the bullet and try to address that, or just back
> away slowly and pretend we never looked at it. ;)

That was my reaction when I first dug into this. :-/ It's only now that
I've been brave enough to even try to dig through the first layer.

> FWIW, the patch itself seems fine, and obviously doesn't make anything
> worse on its own. The question is just whether we want to do more
> cleanup here.

Well, if we do want to make more cleanups around here, one of the more
obvious ideas is to actually use the return value. If such a cleanup is
going to start with a (moral) revert of this patch, then we're probably
better off just dropping this patch.

Thanks for your thoughtful response


Martin
brian m. carlson Dec. 20, 2018, 12:17 a.m. UTC | #3
On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> I dunno. This is one of those dark corners of the code where we appear
> to do the wrong thing, but nobody seems to have noticed or cared much,
> and changing it runs the risk of breaking some obscure cases. I'm not
> sure if we should bite the bullet and try to address that, or just back
> away slowly and pretend we never looked at it. ;)

I will point out that with the SHA-256 work, reading the config file
becomes essential for SHA-256 repositories, because we need to know the
object format. Removing the config file leads to things blowing up in a
bad way (what specific bad way I don't remember).

That may influence the direction we want to take in this work, or not.
Jeff King Dec. 20, 2018, 2:52 a.m. UTC | #4
On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote:

> On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> > I dunno. This is one of those dark corners of the code where we appear
> > to do the wrong thing, but nobody seems to have noticed or cared much,
> > and changing it runs the risk of breaking some obscure cases. I'm not
> > sure if we should bite the bullet and try to address that, or just back
> > away slowly and pretend we never looked at it. ;)
> 
> I will point out that with the SHA-256 work, reading the config file
> becomes essential for SHA-256 repositories, because we need to know the
> object format. Removing the config file leads to things blowing up in a
> bad way (what specific bad way I don't remember).
> 
> That may influence the direction we want to take in this work, or not.

Wouldn't we just treat that the same way we do now? I.e., assume the
default of sha1, just like we assume repositoryformatversion==0?

-Peff
brian m. carlson Dec. 20, 2018, 3:45 a.m. UTC | #5
On Wed, Dec 19, 2018 at 09:52:12PM -0500, Jeff King wrote:
> On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote:
> 
> > On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> > > I dunno. This is one of those dark corners of the code where we appear
> > > to do the wrong thing, but nobody seems to have noticed or cared much,
> > > and changing it runs the risk of breaking some obscure cases. I'm not
> > > sure if we should bite the bullet and try to address that, or just back
> > > away slowly and pretend we never looked at it. ;)
> > 
> > I will point out that with the SHA-256 work, reading the config file
> > becomes essential for SHA-256 repositories, because we need to know the
> > object format. Removing the config file leads to things blowing up in a
> > bad way (what specific bad way I don't remember).
> > 
> > That may influence the direction we want to take in this work, or not.
> 
> Wouldn't we just treat that the same way we do now? I.e., assume the
> default of sha1, just like we assume repositoryformatversion==0?

Yeah, we'll default to SHA-1, but the repository will be broken. HEAD
can't be read. Trying to run git status dies with "fatal: Unknown index
entry format". And so on. We've written data with 64-character object
IDs, which can't be read by Git in SHA-1 mode.

My point is essentially that in an SHA-256 repository, the config file
isn't optional anymore. We probably need to consider that and error out
in more situations (e.g. unreadable file or I/O error) instead of
silently falling back to the defaults, since failing loudly in a visible
way is better than having the user try to figure out why the index is
suddenly "corrupt".
Jeff King Dec. 20, 2018, 2:53 p.m. UTC | #6
On Thu, Dec 20, 2018 at 03:45:55AM +0000, brian m. carlson wrote:

> > > I will point out that with the SHA-256 work, reading the config file
> > > becomes essential for SHA-256 repositories, because we need to know the
> > > object format. Removing the config file leads to things blowing up in a
> > > bad way (what specific bad way I don't remember).
> > > 
> > > That may influence the direction we want to take in this work, or not.
> > 
> > Wouldn't we just treat that the same way we do now? I.e., assume the
> > default of sha1, just like we assume repositoryformatversion==0?
> 
> Yeah, we'll default to SHA-1, but the repository will be broken. HEAD
> can't be read. Trying to run git status dies with "fatal: Unknown index
> entry format". And so on. We've written data with 64-character object
> IDs, which can't be read by Git in SHA-1 mode.

Oh, I see. Yes, if you have a SHA-256 repository and you don't tell
anybody (via a config entry), then everything will fail to work. That
seems like a perfectly reasonable outcome to me.

> My point is essentially that in an SHA-256 repository, the config file
> isn't optional anymore. We probably need to consider that and error out
> in more situations (e.g. unreadable file or I/O error) instead of
> silently falling back to the defaults, since failing loudly in a visible
> way is better than having the user try to figure out why the index is
> suddenly "corrupt".

Yes, I agree that ideally we'd produce a better error message. I'd just
be wary of breaking compatibility for the existing cases by making new
requirements when we don't yet suspect the repo is SHA-256.

When we see such a corruption, would it be possible to poke at the data
as if it were the old SHA-1 format, and if _that_ looks sane, suggest to
the user what the problem might be? That would help a number of cases
beyond this one (i.e., you're missing config, you have config but it has
the wrong repo format, you're missing the correct extensions field,
etc).

-Peff
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index ca36b44ee0..8b9e592c65 100644
--- a/cache.h
+++ b/cache.h
@@ -974,11 +974,10 @@  struct repository_format {
 
 /*
  * Read the repository format characteristics from the config file "path" into
- * "format" struct. Returns the numeric version. On error, -1 is returned,
- * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * "format" struct. On error, format->version is set to -1, and all other
+ * fields in the struct are undefined.
  */
-int read_repository_format(struct repository_format *format, const char *path);
+void read_repository_format(struct repository_format *format, const char *path);
 
 /*
  * Verify that the repository described by repository_format is something we
diff --git a/setup.c b/setup.c
index 1be5037f12..27747af7a3 100644
--- a/setup.c
+++ b/setup.c
@@ -509,7 +509,7 @@  static int check_repository_format_gently(const char *gitdir, struct repository_
 	return 0;
 }
 
-int read_repository_format(struct repository_format *format, const char *path)
+void read_repository_format(struct repository_format *format, const char *path)
 {
 	memset(format, 0, sizeof(*format));
 	format->version = -1;
@@ -517,7 +517,6 @@  int read_repository_format(struct repository_format *format, const char *path)
 	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
-	return format->version;
 }
 
 int verify_repository_format(const struct repository_format *format,