Message ID | 7ded51bbce1b23cf4110e3bf0abb7579efd4d344.1718095090.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] config: fix segfault when parsing "core.abbrev" without repo | expand |
Patrick Steinhardt <ps@pks.im> writes: > Fix both of these issues by not making it an error anymore when the > given length exceeds the hash length. Instead, if we have a repository, > then we truncate the length to the maximum length of `the_hash_algo`. > Otherwise, we simply leave the abbreviated length intact and store it > as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is > handled just fine by `repo_find_unique_abbrev_r()`. In practice, we > should never even end up using `default_abbrev` without a repository > anyway given that abbreviating object IDs to unique prefixes requires us > to have access to an object database. Makes sense. > diff --git a/config.c b/config.c > index abce05b774..ab2844d9e1 100644 > --- a/config.c > +++ b/config.c > @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value, > if (!strcasecmp(value, "auto")) > default_abbrev = -1; > else if (!git_parse_maybe_bool_text(value)) > - default_abbrev = the_hash_algo->hexsz; > + default_abbrev = startup_info->have_repository ? > + the_hash_algo->hexsz : GIT_MAX_HEXSZ; We will need to have some code that further adjusts overly long default_abbrev when we really have to abbreviate (at which time, hopefully we are already aware of the real hash algorithm used in the repository, and that may be SHA-1) anyway. So do we even need the conditional here? Can't we just set it to GIT_MAX_HEXSZ here unconditionally? > diff --git a/t/t4202-log.sh b/t/t4202-log.sh > index 86c695eb0a..99c063e4cd 100755 > --- a/t/t4202-log.sh > +++ b/t/t4202-log.sh > @@ -1237,6 +1237,12 @@ test_expect_success 'log.abbrevCommit configuration' ' > test_cmp expect.whatchanged.full actual > ' > > +test_expect_success 'log.abbrevCommit with --abbrev=9000' ' > + git log --no-abbrev >expect && > + git log --abbrev-commit --abbrev=9000 >actual && > + test_cmp expect actual > +' Interesting. We didn't have coverage for "we want to see full object names" case. > +test_expect_success 'output from clone with core.abbrev does not crash' ' > + rm -fr dst && > + echo "Cloning into ${SQ}dst${SQ}..." >expect && > + git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 && > + test_cmp expect actual > +' OK.
On Tue, Jun 11, 2024 at 10:54:23AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Fix both of these issues by not making it an error anymore when the > > given length exceeds the hash length. Instead, if we have a repository, > > then we truncate the length to the maximum length of `the_hash_algo`. > > Otherwise, we simply leave the abbreviated length intact and store it > > as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is > > handled just fine by `repo_find_unique_abbrev_r()`. In practice, we > > should never even end up using `default_abbrev` without a repository > > anyway given that abbreviating object IDs to unique prefixes requires us > > to have access to an object database. > > Makes sense. > > > diff --git a/config.c b/config.c > > index abce05b774..ab2844d9e1 100644 > > --- a/config.c > > +++ b/config.c > > @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value, > > if (!strcasecmp(value, "auto")) > > default_abbrev = -1; > > else if (!git_parse_maybe_bool_text(value)) > > - default_abbrev = the_hash_algo->hexsz; > > + default_abbrev = startup_info->have_repository ? > > + the_hash_algo->hexsz : GIT_MAX_HEXSZ; > > We will need to have some code that further adjusts overly long > default_abbrev when we really have to abbreviate (at which time, > hopefully we are already aware of the real hash algorithm used in > the repository, and that may be SHA-1) anyway. > > So do we even need the conditional here? Can't we just set it to > GIT_MAX_HEXSZ here unconditionally? Not really. I was erring on the safe side here to retain the status quo for all relevant cases. As explained in the commit message, the length is only relevant when we have a repository because we otherwise wouldn't be able to abbreviate anything anyway. So essentially, this change is a functional no-op. There's also the question of the second commit, where we only handle `abbrev == the_hash_algo->size`, but not `abbrev > the_hash_algo->size`. It works, but is slower when not truncating the length until the second patch fixes it. So yes, we can set this unconditionally to `GIT_MAX_HEXSZ`, but out of abundance of caution I decided to make this conditional. Patrick
diff --git a/config.c b/config.c index abce05b774..ab2844d9e1 100644 --- a/config.c +++ b/config.c @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value, if (!strcasecmp(value, "auto")) default_abbrev = -1; else if (!git_parse_maybe_bool_text(value)) - default_abbrev = the_hash_algo->hexsz; + default_abbrev = startup_info->have_repository ? + the_hash_algo->hexsz : GIT_MAX_HEXSZ; else { int abbrev = git_config_int(var, value, ctx->kvi); - if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz) + if (abbrev < minimum_abbrev) return error(_("abbrev length out of range: %d"), abbrev); + else if (startup_info->have_repository && abbrev > the_hash_algo->hexsz) + abbrev = the_hash_algo->hexsz; default_abbrev = abbrev; } return 0; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 86c695eb0a..99c063e4cd 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -1237,6 +1237,12 @@ test_expect_success 'log.abbrevCommit configuration' ' test_cmp expect.whatchanged.full actual ' +test_expect_success 'log.abbrevCommit with --abbrev=9000' ' + git log --no-abbrev >expect && + git log --abbrev-commit --abbrev=9000 >actual && + test_cmp expect actual +' + test_expect_success 'show added path under "--follow -M"' ' # This tests for a regression introduced in v1.7.2-rc0~103^2~2 test_create_repo regression && diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index cc0b953f14..5d7ea147f1 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -46,6 +46,13 @@ test_expect_success 'output from clone' ' test $(grep Clon output | wc -l) = 1 ' +test_expect_success 'output from clone with core.abbrev does not crash' ' + rm -fr dst && + echo "Cloning into ${SQ}dst${SQ}..." >expect && + git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 && + test_cmp expect actual +' + test_expect_success 'clone does not keep pack' ' rm -fr dst &&
The "core.abbrev" config allows the user to specify the minimum length when abbreviating object hashes. Next to the values "auto" and "no", this config also accepts a concrete length that needs to be bigger or equal to the minimum length and smaller or equal to the hash algorithm's hex length. While the former condition is trivial, the latter depends on the object format used by the current repository. It is thus a variable upper boundary that may either be 40 (SHA-1) or 64 (SHA-256). This has two major downsides. First, the user that specifies this config must be aware of the object hashes that its repository use. If they want to configure the value globally, then they cannot pick any value in the range `[41, 64]` if they have any repository that uses SHA-1. If they did, Git would error out when parsing the config. Second, and more importantly, parsing "core.abbrev" crashes when outside of a Git repository because we dereference `the_hash_algo` to figure out its hex length. Starting with c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) though, we stopped initializing `the_hash_algo` outside of Git repositories. Fix both of these issues by not making it an error anymore when the given length exceeds the hash length. Instead, if we have a repository, then we truncate the length to the maximum length of `the_hash_algo`. Otherwise, we simply leave the abbreviated length intact and store it as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is handled just fine by `repo_find_unique_abbrev_r()`. In practice, we should never even end up using `default_abbrev` without a repository anyway given that abbreviating object IDs to unique prefixes requires us to have access to an object database. Reported-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- config.c | 7 +++++-- t/t4202-log.sh | 6 ++++++ t/t5601-clone.sh | 7 +++++++ 3 files changed, 18 insertions(+), 2 deletions(-)