diff mbox series

[1/2] config: fix segfault when parsing "core.abbrev" without repo

Message ID 7ded51bbce1b23cf4110e3bf0abb7579efd4d344.1718095090.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series [1/2] config: fix segfault when parsing "core.abbrev" without repo | expand

Commit Message

Patrick Steinhardt June 11, 2024, 8:42 a.m. UTC
The "core.abbrev" config allows the user to specify the minimum length
when abbreviating object hashes. Next to the values "auto" and "no",
this config also accepts a concrete length that needs to be bigger or
equal to the minimum length and smaller or equal to the hash algorithm's
hex length. While the former condition is trivial, the latter depends on
the object format used by the current repository. It is thus a variable
upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).

This has two major downsides. First, the user that specifies this config
must be aware of the object hashes that its repository use. If they want
to configure the value globally, then they cannot pick any value in the
range `[41, 64]` if they have any repository that uses SHA-1. If they
did, Git would error out when parsing the config.

Second, and more importantly, parsing "core.abbrev" crashes when outside
of a Git repository because we dereference `the_hash_algo` to figure out
its hex length. Starting with c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) though, we stopped initializing
`the_hash_algo` outside of Git repositories.

Fix both of these issues by not making it an error anymore when the
given length exceeds the hash length. Instead, if we have a repository,
then we truncate the length to the maximum length of `the_hash_algo`.
Otherwise, we simply leave the abbreviated length intact and store it
as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
should never even end up using `default_abbrev` without a repository
anyway given that abbreviating object IDs to unique prefixes requires us
to have access to an object database.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 config.c         | 7 +++++--
 t/t4202-log.sh   | 6 ++++++
 t/t5601-clone.sh | 7 +++++++
 3 files changed, 18 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 11, 2024, 5:54 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> Fix both of these issues by not making it an error anymore when the
> given length exceeds the hash length. Instead, if we have a repository,
> then we truncate the length to the maximum length of `the_hash_algo`.
> Otherwise, we simply leave the abbreviated length intact and store it
> as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
> handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
> should never even end up using `default_abbrev` without a repository
> anyway given that abbreviating object IDs to unique prefixes requires us
> to have access to an object database.

Makes sense.

> diff --git a/config.c b/config.c
> index abce05b774..ab2844d9e1 100644
> --- a/config.c
> +++ b/config.c
> @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value,
>  		if (!strcasecmp(value, "auto"))
>  			default_abbrev = -1;
>  		else if (!git_parse_maybe_bool_text(value))
> -			default_abbrev = the_hash_algo->hexsz;
> +			default_abbrev = startup_info->have_repository ?
> +				the_hash_algo->hexsz : GIT_MAX_HEXSZ;

We will need to have some code that further adjusts overly long
default_abbrev when we really have to abbreviate (at which time,
hopefully we are already aware of the real hash algorithm used in
the repository, and that may be SHA-1) anyway.

So do we even need the conditional here?  Can't we just set it to
GIT_MAX_HEXSZ here unconditionally?

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 86c695eb0a..99c063e4cd 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1237,6 +1237,12 @@ test_expect_success 'log.abbrevCommit configuration' '
>  	test_cmp expect.whatchanged.full actual
>  '
>  
> +test_expect_success 'log.abbrevCommit with --abbrev=9000' '
> +	git log --no-abbrev >expect &&
> +	git log --abbrev-commit --abbrev=9000 >actual &&
> +	test_cmp expect actual
> +'

Interesting.  We didn't have coverage for "we want to see full
object names" case.

> +test_expect_success 'output from clone with core.abbrev does not crash' '
> +	rm -fr dst &&
> +	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
> +	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
> +	test_cmp expect actual
> +'

OK.
Patrick Steinhardt June 12, 2024, 5:55 a.m. UTC | #2
On Tue, Jun 11, 2024 at 10:54:23AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Fix both of these issues by not making it an error anymore when the
> > given length exceeds the hash length. Instead, if we have a repository,
> > then we truncate the length to the maximum length of `the_hash_algo`.
> > Otherwise, we simply leave the abbreviated length intact and store it
> > as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
> > handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
> > should never even end up using `default_abbrev` without a repository
> > anyway given that abbreviating object IDs to unique prefixes requires us
> > to have access to an object database.
> 
> Makes sense.
> 
> > diff --git a/config.c b/config.c
> > index abce05b774..ab2844d9e1 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value,
> >  		if (!strcasecmp(value, "auto"))
> >  			default_abbrev = -1;
> >  		else if (!git_parse_maybe_bool_text(value))
> > -			default_abbrev = the_hash_algo->hexsz;
> > +			default_abbrev = startup_info->have_repository ?
> > +				the_hash_algo->hexsz : GIT_MAX_HEXSZ;
> 
> We will need to have some code that further adjusts overly long
> default_abbrev when we really have to abbreviate (at which time,
> hopefully we are already aware of the real hash algorithm used in
> the repository, and that may be SHA-1) anyway.
> 
> So do we even need the conditional here?  Can't we just set it to
> GIT_MAX_HEXSZ here unconditionally?

Not really. I was erring on the safe side here to retain the status quo
for all relevant cases. As explained in the commit message, the length
is only relevant when we have a repository because we otherwise wouldn't
be able to abbreviate anything anyway. So essentially, this change is a
functional no-op.

There's also the question of the second commit, where we only handle
`abbrev == the_hash_algo->size`, but not `abbrev > the_hash_algo->size`.
It works, but is slower when not truncating the length until the second
patch fixes it.

So yes, we can set this unconditionally to `GIT_MAX_HEXSZ`, but out of
abundance of caution I decided to make this conditional.

Patrick
diff mbox series

Patch

diff --git a/config.c b/config.c
index abce05b774..ab2844d9e1 100644
--- a/config.c
+++ b/config.c
@@ -1460,11 +1460,14 @@  static int git_default_core_config(const char *var, const char *value,
 		if (!strcasecmp(value, "auto"))
 			default_abbrev = -1;
 		else if (!git_parse_maybe_bool_text(value))
-			default_abbrev = the_hash_algo->hexsz;
+			default_abbrev = startup_info->have_repository ?
+				the_hash_algo->hexsz : GIT_MAX_HEXSZ;
 		else {
 			int abbrev = git_config_int(var, value, ctx->kvi);
-			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
+			if (abbrev < minimum_abbrev)
 				return error(_("abbrev length out of range: %d"), abbrev);
+			else if (startup_info->have_repository && abbrev > the_hash_algo->hexsz)
+				abbrev = the_hash_algo->hexsz;
 			default_abbrev = abbrev;
 		}
 		return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 86c695eb0a..99c063e4cd 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1237,6 +1237,12 @@  test_expect_success 'log.abbrevCommit configuration' '
 	test_cmp expect.whatchanged.full actual
 '
 
+test_expect_success 'log.abbrevCommit with --abbrev=9000' '
+	git log --no-abbrev >expect &&
+	git log --abbrev-commit --abbrev=9000 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'show added path under "--follow -M"' '
 	# This tests for a regression introduced in v1.7.2-rc0~103^2~2
 	test_create_repo regression &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index cc0b953f14..5d7ea147f1 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -46,6 +46,13 @@  test_expect_success 'output from clone' '
 	test $(grep Clon output | wc -l) = 1
 '
 
+test_expect_success 'output from clone with core.abbrev does not crash' '
+	rm -fr dst &&
+	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
+	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
+	test_cmp expect actual
+'
+
 test_expect_success 'clone does not keep pack' '
 
 	rm -fr dst &&