Message ID | 20181218072528.3870492-2-martin.agren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | setup: add `clear_repository_format()` | expand |
On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote: > No-one looks at the return value, so we might as well drop it. It's > still available as `format->version`. > > In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format > reading and verification", 2016-03-11), this function actually had > return type "void", but that was changed in v2. Almost three years > later, no-one has used this return value. Hmm. If we have to pick one, I'd say that just returning a sane exit value would be the more conventional thing to do. But looking at the callers, many of them want to just pass the struct on to the verify function. That said, there is a long-standing curiosity here that we may want to consider: read_repository_format() does not distinguish between these three cases: 1. the config file is missing 2. the config file is present, but does not have a version field 3. the config file is malformed, or we experience an I/O error (although I think there are still some cases that cause the config parser to die(), which may be sufficient) The comment in check_repository_format_gently() implies that git-init needs (1) to be a non-error for historical reasons. We could probably tighten this up for other callers. I think (2) probably should be an error, but note that it makes t1300 very unhappy, since it stomps all over .git/config. I'm not sure if any real-world cases would be affected. Case (3) I think we probably ought to do a better job of diagnosing. So I wonder if the rule should be: - if we encounter a real error reading the config, read_repository_format() should return -1. Most callers should detect this and complain. - otherwise, a missing config returns 0 but puts "-1" into the version field - possibly verify_repository_format() should issue a warning when it sees "version == -1" and then rewrite the result into "0" I dunno. This is one of those dark corners of the code where we appear to do the wrong thing, but nobody seems to have noticed or cared much, and changing it runs the risk of breaking some obscure cases. I'm not sure if we should bite the bullet and try to address that, or just back away slowly and pretend we never looked at it. ;) > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > I only discovered the full history after writing the patch. Had I known > it from the beginning, maybe I'd have just skipped this step, but I was > sufficiently disturbed by the redundant and unused return value that I > dropped it before working on the actual meat of this series. > > cache.h | 7 +++---- > setup.c | 3 +-- > 2 files changed, 4 insertions(+), 6 deletions(-) FWIW, the patch itself seems fine, and obviously doesn't make anything worse on its own. The question is just whether we want to do more cleanup here. -Peff
On Wed, 19 Dec 2018 at 16:27, Jeff King <peff@peff.net> wrote: > > On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote: > > > No-one looks at the return value, so we might as well drop it. It's > > still available as `format->version`. > > Hmm. If we have to pick one, I'd say that just returning a sane exit > value would be the more conventional thing to do. But looking at the > callers, many of them want to just pass the struct on to the verify > function. > > That said, there is a long-standing curiosity here that we may want to > consider: read_repository_format() does not distinguish between these > three cases: [snip several valuable insights] > I dunno. This is one of those dark corners of the code where we appear > to do the wrong thing, but nobody seems to have noticed or cared much, > and changing it runs the risk of breaking some obscure cases. I'm not > sure if we should bite the bullet and try to address that, or just back > away slowly and pretend we never looked at it. ;) That was my reaction when I first dug into this. :-/ It's only now that I've been brave enough to even try to dig through the first layer. > FWIW, the patch itself seems fine, and obviously doesn't make anything > worse on its own. The question is just whether we want to do more > cleanup here. Well, if we do want to make more cleanups around here, one of the more obvious ideas is to actually use the return value. If such a cleanup is going to start with a (moral) revert of this patch, then we're probably better off just dropping this patch. Thanks for your thoughtful response Martin
On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote: > I dunno. This is one of those dark corners of the code where we appear > to do the wrong thing, but nobody seems to have noticed or cared much, > and changing it runs the risk of breaking some obscure cases. I'm not > sure if we should bite the bullet and try to address that, or just back > away slowly and pretend we never looked at it. ;) I will point out that with the SHA-256 work, reading the config file becomes essential for SHA-256 repositories, because we need to know the object format. Removing the config file leads to things blowing up in a bad way (what specific bad way I don't remember). That may influence the direction we want to take in this work, or not.
On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote: > On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote: > > I dunno. This is one of those dark corners of the code where we appear > > to do the wrong thing, but nobody seems to have noticed or cared much, > > and changing it runs the risk of breaking some obscure cases. I'm not > > sure if we should bite the bullet and try to address that, or just back > > away slowly and pretend we never looked at it. ;) > > I will point out that with the SHA-256 work, reading the config file > becomes essential for SHA-256 repositories, because we need to know the > object format. Removing the config file leads to things blowing up in a > bad way (what specific bad way I don't remember). > > That may influence the direction we want to take in this work, or not. Wouldn't we just treat that the same way we do now? I.e., assume the default of sha1, just like we assume repositoryformatversion==0? -Peff
On Wed, Dec 19, 2018 at 09:52:12PM -0500, Jeff King wrote: > On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote: > > > On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote: > > > I dunno. This is one of those dark corners of the code where we appear > > > to do the wrong thing, but nobody seems to have noticed or cared much, > > > and changing it runs the risk of breaking some obscure cases. I'm not > > > sure if we should bite the bullet and try to address that, or just back > > > away slowly and pretend we never looked at it. ;) > > > > I will point out that with the SHA-256 work, reading the config file > > becomes essential for SHA-256 repositories, because we need to know the > > object format. Removing the config file leads to things blowing up in a > > bad way (what specific bad way I don't remember). > > > > That may influence the direction we want to take in this work, or not. > > Wouldn't we just treat that the same way we do now? I.e., assume the > default of sha1, just like we assume repositoryformatversion==0? Yeah, we'll default to SHA-1, but the repository will be broken. HEAD can't be read. Trying to run git status dies with "fatal: Unknown index entry format". And so on. We've written data with 64-character object IDs, which can't be read by Git in SHA-1 mode. My point is essentially that in an SHA-256 repository, the config file isn't optional anymore. We probably need to consider that and error out in more situations (e.g. unreadable file or I/O error) instead of silently falling back to the defaults, since failing loudly in a visible way is better than having the user try to figure out why the index is suddenly "corrupt".
On Thu, Dec 20, 2018 at 03:45:55AM +0000, brian m. carlson wrote: > > > I will point out that with the SHA-256 work, reading the config file > > > becomes essential for SHA-256 repositories, because we need to know the > > > object format. Removing the config file leads to things blowing up in a > > > bad way (what specific bad way I don't remember). > > > > > > That may influence the direction we want to take in this work, or not. > > > > Wouldn't we just treat that the same way we do now? I.e., assume the > > default of sha1, just like we assume repositoryformatversion==0? > > Yeah, we'll default to SHA-1, but the repository will be broken. HEAD > can't be read. Trying to run git status dies with "fatal: Unknown index > entry format". And so on. We've written data with 64-character object > IDs, which can't be read by Git in SHA-1 mode. Oh, I see. Yes, if you have a SHA-256 repository and you don't tell anybody (via a config entry), then everything will fail to work. That seems like a perfectly reasonable outcome to me. > My point is essentially that in an SHA-256 repository, the config file > isn't optional anymore. We probably need to consider that and error out > in more situations (e.g. unreadable file or I/O error) instead of > silently falling back to the defaults, since failing loudly in a visible > way is better than having the user try to figure out why the index is > suddenly "corrupt". Yes, I agree that ideally we'd produce a better error message. I'd just be wary of breaking compatibility for the existing cases by making new requirements when we don't yet suspect the repo is SHA-256. When we see such a corruption, would it be possible to poke at the data as if it were the old SHA-1 format, and if _that_ looks sane, suggest to the user what the problem might be? That would help a number of cases beyond this one (i.e., you're missing config, you have config but it has the wrong repo format, you're missing the correct extensions field, etc). -Peff
diff --git a/cache.h b/cache.h index ca36b44ee0..8b9e592c65 100644 --- a/cache.h +++ b/cache.h @@ -974,11 +974,10 @@ struct repository_format { /* * Read the repository format characteristics from the config file "path" into - * "format" struct. Returns the numeric version. On error, -1 is returned, - * format->version is set to -1, and all other fields in the struct are - * undefined. + * "format" struct. On error, format->version is set to -1, and all other + * fields in the struct are undefined. */ -int read_repository_format(struct repository_format *format, const char *path); +void read_repository_format(struct repository_format *format, const char *path); /* * Verify that the repository described by repository_format is something we diff --git a/setup.c b/setup.c index 1be5037f12..27747af7a3 100644 --- a/setup.c +++ b/setup.c @@ -509,7 +509,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ return 0; } -int read_repository_format(struct repository_format *format, const char *path) +void read_repository_format(struct repository_format *format, const char *path) { memset(format, 0, sizeof(*format)); format->version = -1; @@ -517,7 +517,6 @@ int read_repository_format(struct repository_format *format, const char *path) format->hash_algo = GIT_HASH_SHA1; string_list_init(&format->unknown_extensions, 1); git_config_from_file(check_repo_format, path, format); - return format->version; } int verify_repository_format(const struct repository_format *format,
No-one looks at the return value, so we might as well drop it. It's still available as `format->version`. In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format reading and verification", 2016-03-11), this function actually had return type "void", but that was changed in v2. Almost three years later, no-one has used this return value. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- I only discovered the full history after writing the patch. Had I known it from the beginning, maybe I'd have just skipped this step, but I was sufficiently disturbed by the redundant and unused return value that I dropped it before working on the actual meat of this series. cache.h | 7 +++---- setup.c | 3 +-- 2 files changed, 4 insertions(+), 6 deletions(-)