diff mbox series

[06/15] builtin/for-each-ref.c: add `--exclude` option

Message ID ea5c0ddc10cd484c2af8505b8078697896ff3bad.1683581621.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series refs: implement skip lists for packed backend | expand

Commit Message

Taylor Blau May 8, 2023, 9:59 p.m. UTC
When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain parts of the references.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/'

Which is do-able, but requires processing a potentially large quantity
of references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-for-each-ref.txt |  6 +++++
 builtin/for-each-ref.c             |  2 ++
 ref-filter.c                       | 13 +++++++++++
 ref-filter.h                       |  6 +++++
 t/t6300-for-each-ref.sh            | 35 ++++++++++++++++++++++++++++++
 5 files changed, 62 insertions(+)

Comments

Junio C Hamano May 8, 2023, 11:22 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index c01fa6fefe..449da61e11 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -14,6 +14,7 @@ static char const * const for_each_ref_usage[] = {
>  	N_("git for-each-ref [--points-at <object>]"),
>  	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
>  	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> +	N_("git for-each-ref [--exclude=<pattern> ...]"),
>  	NULL
>  };

I think the original is already wrong, but the easiest thing we can
do in order to avoid making it worse is to drop this hunk, as the
existing usage is this:

static char const * const for_each_ref_usage[] = {
	N_("git for-each-ref [<options>] [<pattern>]"),
	N_("git for-each-ref [--points-at <object>]"),
	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
	NULL
};

and this series merely adds a new "--exclude=<pattern>" as one of
the "<options>". 

As we can see from the fact that for example

 $ git for-each-ref --no-merged next refs/heads/\?\?/\*

works just fine, exactly the same thing can be said about the other
--points-at/--merged/--no-merged/--contains/--no-contains options.

The SYNOPSIS section of the manual page is fine.

> diff --git a/ref-filter.c b/ref-filter.c
> index 6c5eed144f..93dc9b331f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2169,6 +2169,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
>  	return match_pattern(filter, filter->name_patterns, refname);
>  }
>  
> +static int filter_exclude_match(struct ref_filter *filter, const char *refname)
> +{
> +	if (!filter->exclude.nr)
> +		return 0;
> +	if (filter->match_as_path)
> +		return match_name_as_path(filter, filter->exclude.v, refname);
> +	return match_pattern(filter, filter->exclude.v, refname);
> +}

Earlier I made a comment about .name_patterns member becoming
unnecessary, but I think what should need to happen is instead
match_pattern() and match_name_as_path() to lose the "filter"
parameter, and take a boolean "ignore_case" instead.

>  struct ref_filter {
>  	const char **name_patterns;
> +	struct strvec exclude;

At some point after the dust settles, we may want to tweak
name_patterns so that these two appear to complement each other more
clearly, e.g. use "struct strvec include" to replace "name_patterns"
or something.  But that is an unrelated tangent.

In any case, there wasn't anything surprising or unexpected in the
code.  Looking good.

> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 5c00607608..7e8d578522 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
>  	test_cmp expected actual
>  '
>  
> +cat >expected <<\EOF
> +refs/tags/bar
> +refs/tags/baz
> +refs/tags/testtag
> +EOF
> +
> +test_expect_success 'exercise patterns with prefix exclusions' '
> +	for tag in foo/one foo/two foo/three bar baz
> +	do
> +		git tag "$tag" || return 1
> +	done &&
> +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> +	git for-each-ref --format="%(refname)" \
> +		refs/tags/ --exclude=refs/tags/foo >actual &&
> +	test_cmp expected actual
> +'
> +
> +cat >expected <<\EOF
> +refs/tags/bar
> +refs/tags/baz
> +refs/tags/foo/one
> +refs/tags/testtag
> +EOF
> +
> +test_expect_success 'exercise patterns with pattern exclusions' '
> +	for tag in foo/one foo/two foo/three bar baz
> +	do
> +		git tag "$tag" || return 1
> +	done &&
> +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> +	git for-each-ref --format="%(refname)" \
> +		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
> +	test_cmp expected actual
> +'

These are doing as Romans do, so I won't comment on the outdated
pattern of preparing the expectation outside the test script.  After
the dust settles, somebody needs to go in and clean it up.

Thanks.
Taylor Blau May 9, 2023, 8:22 p.m. UTC | #2
On Mon, May 08, 2023 at 04:22:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index c01fa6fefe..449da61e11 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -14,6 +14,7 @@ static char const * const for_each_ref_usage[] = {
> >  	N_("git for-each-ref [--points-at <object>]"),
> >  	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
> >  	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> > +	N_("git for-each-ref [--exclude=<pattern> ...]"),
> >  	NULL
> >  };
>
> I think the original is already wrong, but the easiest thing we can
> do in order to avoid making it worse is to drop this hunk, as the
> existing usage is this:
>
> static char const * const for_each_ref_usage[] = {
> 	N_("git for-each-ref [<options>] [<pattern>]"),
> 	N_("git for-each-ref [--points-at <object>]"),
> 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
> 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> 	NULL
> };
>
> and this series merely adds a new "--exclude=<pattern>" as one of
> the "<options>".
>
> As we can see from the fact that for example
>
>  $ git for-each-ref --no-merged next refs/heads/\?\?/\*
>
> works just fine, exactly the same thing can be said about the other
> --points-at/--merged/--no-merged/--contains/--no-contains options.
>
> The SYNOPSIS section of the manual page is fine.

Good point, will tweak; thanks.

> > @@ -2169,6 +2169,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
> >  	return match_pattern(filter, filter->name_patterns, refname);
> >  }
> >
> > +static int filter_exclude_match(struct ref_filter *filter, const char *refname)
> > +{
> > +	if (!filter->exclude.nr)
> > +		return 0;
> > +	if (filter->match_as_path)
> > +		return match_name_as_path(filter, filter->exclude.v, refname);
> > +	return match_pattern(filter, filter->exclude.v, refname);
> > +}
>
> Earlier I made a comment about .name_patterns member becoming
> unnecessary, but I think what should need to happen is instead
> match_pattern() and match_name_as_path() to lose the "filter"
> parameter, and take a boolean "ignore_case" instead.

Agreed.

> > +cat >expected <<\EOF
> > +refs/tags/bar
> > +refs/tags/baz
> > +refs/tags/foo/one
> > +refs/tags/testtag
> > +EOF
> > +
> > +test_expect_success 'exercise patterns with pattern exclusions' '
> > +	for tag in foo/one foo/two foo/three bar baz
> > +	do
> > +		git tag "$tag" || return 1
> > +	done &&
> > +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> > +	git for-each-ref --format="%(refname)" \
> > +		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
> > +	test_cmp expected actual
> > +'
>
> These are doing as Romans do, so I won't comment on the outdated
> pattern of preparing the expectation outside the test script.  After
> the dust settles, somebody needs to go in and clean it up.

Yeah, I figured that this series was already getting pretty long, but
that it would be expedient to propagate forward this pattern. But it
should be cleaned up. Let's tag it with #leftoverbits accordingly.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 1e215d4e73..5743eb5def 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -14,6 +14,7 @@  SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
+		   [--exclude=<pattern> ...]
 
 DESCRIPTION
 -----------
@@ -102,6 +103,11 @@  OPTIONS
 	Do not print a newline after formatted refs where the format expands
 	to the empty string.
 
+--exclude=<pattern>::
+	If one or more patterns are given, only refs which do not match
+	any excluded pattern(s) are shown. Matching is done using the
+	same rules as `<pattern>` above.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index c01fa6fefe..449da61e11 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -14,6 +14,7 @@  static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--exclude=<pattern> ...]"),
 	NULL
 };
 
@@ -47,6 +48,7 @@  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
 			     N_("object"), N_("print only refs which points at the given object"),
diff --git a/ref-filter.c b/ref-filter.c
index 6c5eed144f..93dc9b331f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2169,6 +2169,15 @@  static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	return match_pattern(filter, filter->name_patterns, refname);
 }
 
+static int filter_exclude_match(struct ref_filter *filter, const char *refname)
+{
+	if (!filter->exclude.nr)
+		return 0;
+	if (filter->match_as_path)
+		return match_name_as_path(filter, filter->exclude.v, refname);
+	return match_pattern(filter, filter->exclude.v, refname);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2336,6 +2345,9 @@  static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	if (!filter_pattern_match(filter, refname))
 		return 0;
 
+	if (filter_exclude_match(filter, refname))
+		return 0;
+
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
 		return 0;
 
@@ -2875,6 +2887,7 @@  void ref_filter_init(struct ref_filter *filter)
 
 void ref_filter_clear(struct ref_filter *filter)
 {
+	strvec_clear(&filter->exclude);
 	oid_array_clear(&filter->points_at);
 	free_commit_list(filter->with_commit);
 	free_commit_list(filter->no_commit);
diff --git a/ref-filter.h b/ref-filter.h
index 160b807224..1524bc463a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -6,6 +6,7 @@ 
 #include "refs.h"
 #include "commit.h"
 #include "string-list.h"
+#include "strvec.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -59,6 +60,7 @@  struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
 	struct commit_list *no_commit;
@@ -94,6 +96,7 @@  struct ref_format {
 
 #define REF_FILTER_INIT { \
 	.points_at = OID_ARRAY_INIT, \
+	.exclude = STRVEC_INIT, \
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
@@ -112,6 +115,9 @@  struct ref_format {
 #define OPT_REF_SORT(var) \
 	OPT_STRING_LIST(0, "sort", (var), \
 			N_("key"), N_("field name to sort on"))
+#define OPT_REF_FILTER_EXCLUDE(var) \
+	OPT_STRVEC(0, "exclude", &(var)->exclude, \
+		   N_("pattern"), N_("exclude refs which match pattern"))
 
 /*
  * API for filtering a set of refs. Based on the type of refs the user
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 5c00607608..7e8d578522 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -447,6 +447,41 @@  test_expect_success 'exercise glob patterns with prefixes' '
 	test_cmp expected actual
 '
 
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with prefix exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude=refs/tags/foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/foo/one
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with pattern exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<\EOF
 'refs/heads/main'
 'refs/remotes/origin/main'