diff mbox series

[2/2] refs.c: unify '--exclude' behavior between files and packed backends

Message ID 7e6a5e020bad14b782a8c85518289220579fae64.1741223981.git.me@ttaylorr.com (mailing list archive)
State New
Headers show
Series refs: a couple of --exclude fixes | expand

Commit Message

Taylor Blau March 6, 2025, 1:19 a.m. UTC
In the packed-refs backend, our implementation of '--exclude' (dating
back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
excluded pattern(s), 2023-07-10)) considers, for example:

    $ git for-each-ref --exclude=refs/heads/ba

to exclude "refs/heads/bar", "refs/heads/baz", and so on.

The files backend, which does not implement '--exclude' (and relies on
the caller to cull out results that don't match) naturally will
enumerate "refs/heads/bar" and so on.

So in the above example, 'for-each-ref' will try and see if
"refs/heads/ba" matches "refs/heads/bar" (since the files backend simply
enumerated every loose reference), and, realizing that it does not
match, output the reference as expected. (A caller that did want to
exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git
for-each-ref --exclude='refs/heads/ba*'").

This can lead to strange behavior, like seeing a different set of
references advertised via 'upload-pack' depending on what set of
references were loose versus packed.

So there is a subtle bug with '--exclude' which is that in the
packed-refs backend we will consider "refs/heads/bar" to be a pattern
match against "refs/heads/ba" when we shouldn't. Likewise, the reftable
backend (which in this case is bug-compatible with the packed backend)
exhibits the same broken behavior.

There are a few ways to fix this. One is to tighten the rules in
cmp_record_to_refname(), which is used to determine the start/end-points
of the jump list used by the packed backend. In this new "strict" mode,
the comparison function would handle the case where we've reached the
end of the pattern by introducing a new check like so:

    while (1) {
        if (*r1 == '\n')
            return *r2 ? -1 : 0;
        if (!*r2)
            if (strict && *r1 != '/')        /* <- here */
                return 1;
            return start ? 1 : -1;
        if (*r1 != *r2)
            return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
        r1++;
        r2++;
    }

(eliding out the rest of cmp_record_to_refname()). Equivalently, we
could teach refs/packed-backend::populate_excluded_jump_list() to append
a trailing '/' if one does not already exist, forcing an exclude pattern
like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth.

But since the same problem exists in reftable, we can fix both at once
by performing this pre-processing step one layer up in refs.c at the
common entrypoint for the two, which is 'refs_ref_iterator_begin()'.

Since that solution is both the simplest and only requires modification
in one spot, let's normalize exclude patterns so that they end with a
trailing slash. This causes us to unify the behavior between all three
backends.

There is some minor test fallout in the "overlapping excluded regions"
test, which happens to use 'refs/ba' as an exclude pattern, and expects
references under the "refs/heads/bar/*" and "refs/heads/baz/*"
hierarchies to be excluded from the results.

But that test fallout is expected, because the test was codifying the
buggy behavior to begin with, and should have never been written that
way.

Reported-by: SURA <surak8806@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c                  | 6 +++++-
 t/t1419-exclude-refs.sh | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

Comments

Patrick Steinhardt March 6, 2025, 8:47 a.m. UTC | #1
On Wed, Mar 05, 2025 at 08:19:53PM -0500, Taylor Blau wrote:

Nit: I'd reword the commit subject to not only talk about "files" and
"packed" backends. Or maybe not mention the backend at all, as it is the
same bug for every backend that implements excludes. Something like
"refs: stop matching non-directory prefixes in exclude patterns", for
example.

[snip]
> But since the same problem exists in reftable, we can fix both at once
> by performing this pre-processing step one layer up in refs.c at the
> common entrypoint for the two, which is 'refs_ref_iterator_begin()'.
> 
> Since that solution is both the simplest and only requires modification
> in one spot, let's normalize exclude patterns so that they end with a
> trailing slash. This causes us to unify the behavior between all three
> backends.

Nice.

> There is some minor test fallout in the "overlapping excluded regions"
> test, which happens to use 'refs/ba' as an exclude pattern, and expects
> references under the "refs/heads/bar/*" and "refs/heads/baz/*"
> hierarchies to be excluded from the results.

Yup, I noticed that this test is asserting the broken behaviour.

> diff --git a/refs.c b/refs.c
> index 17d3840aff..2d9a1b51f4 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin(
>  			if (!len)
>  				continue;
>  
> -			strvec_push(&normalized_exclude_patterns, pattern);
> +			if (pattern[len - 1] == '/')
> +				strvec_push(&normalized_exclude_patterns, pattern);
> +			else
> +				strvec_pushf(&normalized_exclude_patterns, "%s/",
> +					     pattern);
>  		}
>  
>  		exclude_patterns = normalized_exclude_patterns.v;

This looks exactly as expected.

> diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
> index fd58260a24..d955cf9541 100755
> --- a/t/t1419-exclude-refs.sh
> +++ b/t/t1419-exclude-refs.sh
> @@ -101,7 +101,7 @@ test_expect_success 'adjacent, non-overlapping excluded regions' '
>  
>  test_expect_success 'overlapping excluded regions' '
>  	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
> -	for_each_ref refs/heads/foo refs/heads/quux >expect &&
> +	for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect &&
>  
>  	test_cmp expect actual &&
>  	assert_jumps 1 perf

I was wondering whether this still tests the right thing. But the ranges
still are overlapping, as "refs/heads" and "refs/heads/baz" are. So
judging by the test description it seems to still do what's advertised.

Patrick
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 17d3840aff..2d9a1b51f4 100644
--- a/refs.c
+++ b/refs.c
@@ -1708,7 +1708,11 @@  struct ref_iterator *refs_ref_iterator_begin(
 			if (!len)
 				continue;
 
-			strvec_push(&normalized_exclude_patterns, pattern);
+			if (pattern[len - 1] == '/')
+				strvec_push(&normalized_exclude_patterns, pattern);
+			else
+				strvec_pushf(&normalized_exclude_patterns, "%s/",
+					     pattern);
 		}
 
 		exclude_patterns = normalized_exclude_patterns.v;
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index fd58260a24..d955cf9541 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -101,7 +101,7 @@  test_expect_success 'adjacent, non-overlapping excluded regions' '
 
 test_expect_success 'overlapping excluded regions' '
 	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
-	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+	for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect &&
 
 	test_cmp expect actual &&
 	assert_jumps 1 perf