[v2,8/8] rev-list: allow filtering of provided items

Message ID 0e26fee8b31e46e87fb9fa1ac599506502a9d622.1615813673.git.ps@pks.im (mailing list archive)
State Superseded
Series rev-parse: implement object type filter

Commit Message

Patrick Steinhardt March 15, 2021, 1:15 p.m. UTC

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, they will still see the
commits of provided references.

Improve this by introducing a new `--filter-provided` option to the
git-rev-list(1) command. If given, then all user-provided references
will be subject to filtering.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-list.c                  | 14 +++++++++++
 list-objects-filter-options.c       |  4 ++++
 list-objects-filter-options.h       |  6 +++++
 pack-bitmap.c                       |  3 ++-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
 6 files changed, 90 insertions(+), 1 deletion(-)
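
As a rough illustration of the behaviour the commit message describes, the
following self-contained sketch models how an `object:type` filter treats
user-provided objects with and without the new option. The names
`model_object`, `model_should_show`, and the `user_given` field are
hypothetical and do not appear in Git's sources; Git itself tracks this with
the `NOT_USER_GIVEN` object flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified object kinds; Git's real enum object_type has more members. */
enum model_type { MODEL_COMMIT, MODEL_TREE, MODEL_BLOB, MODEL_TAG };

/* Hypothetical stand-in for an object reached during traversal. */
struct model_object {
	enum model_type type;
	bool user_given; /* named directly on the command line, e.g. HEAD */
};

/*
 * Decide whether an object:type=<wanted> filter lets the object through.
 * Without filter_provided, user-given objects bypass the filter entirely;
 * with it, they are checked like any other object.
 */
static bool model_should_show(const struct model_object *obj,
			      enum model_type wanted, bool filter_provided)
{
	assert(obj != NULL);
	if (obj->user_given && !filter_provided)
		return true;
	return obj->type == wanted;
}
```

Under this model, a commit given as `HEAD` passes a blob-only filter by
default and is dropped once the option is set, which matches the motivation
stated above.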

Comments

Jeff King April 6, 2021, 6:04 p.m. UTC | #1
On Mon, Mar 15, 2021 at 02:15:05PM +0100, Patrick Steinhardt wrote:

> When providing an object filter, it is currently impossible to also
> filter provided items. E.g. when executing `git rev-list HEAD`, the
> commit this reference points to will be treated as user-provided and is
> thus excluded from the filtering mechanism. This makes it harder than
> necessary to properly use the new `--filter=object:type` filter given
> that even if the user wants to only see blobs, they will still see the
> commits of provided references.
> 
> Improve this by introducing a new `--filter-provided` option to the
> git-rev-list(1) command. If given, then all user-provided references
> will be subject to filtering.

I think this option is a good thing to have.

The name seems a little confusing to me, as I can read it as both
"please filter the provided objects" and "a filter has been provided".
I guess "--filter-print-provided" would be more clear. And also the
default, so you'd want "--no-filter-print-provided". That's kind of
clunky, though. Maybe "--filter-omit-provided"?

> @@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  			return show_bisect_vars(&info, reaches, all);
>  	}
>  
> +	if (filter_options.filter_wants) {
> +		struct commit_list *c;
> +		for (i = 0; i < revs.pending.nr; i++) {
> +			struct object_array_entry *pending = revs.pending.objects + i;
> +			pending->item->flags |= NOT_USER_GIVEN;
> +		}
> +		for (c = revs.commits; c; c = c->next)
> +			c->item->object.flags |= NOT_USER_GIVEN;
> +	}

You store the flag inside the filter_options struct, which implies to me
that it's something that could be applied per-filter (at least in
theory; the command line option doesn't allow us to distinguish).

But here you treat it as a global flag that munges the NOT_USER_GIVEN
flags. Given that it's inside the filter_options struct, and that you
propagate it via transform_to_combine_type(), I'd have expected the LOFC
code to look at the flag and decide to ignore the whole user-given
concept completely.

To be clear, I don't mind at all having it as a global that applies to
all filters. I don't think the flexibility buys us anything. But since
it only applies to rev-list, why not just make it a global option within
rev-list?
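
A minimal sketch of what that could look like, using simplified stand-in
types rather than Git's real `struct rev_info`: the option becomes a
file-local flag, and the marking loop consults it directly instead of going
through `filter_options`. The names `sketch_entry`, `sketch_parse_arg`,
`sketch_mark_provided`, and the numeric value of `SKETCH_NOT_USER_GIVEN` are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder bit; the real NOT_USER_GIVEN is defined in Git's sources. */
#define SKETCH_NOT_USER_GIVEN (1u << 0)

/* Stand-in for an entry in revs.pending. */
struct sketch_entry {
	unsigned int flags;
};

/* File-local option state, as opposed to a member of the filter struct. */
static int sketch_filter_provided;

/* Returns 1 if the argument was consumed as the option, 0 otherwise. */
static int sketch_parse_arg(const char *arg)
{
	if (!strcmp(arg, "--filter-provided")) {
		sketch_filter_provided = 1;
		return 1;
	}
	return 0;
}

/*
 * Mark every provided entry as not user-given so that filters treat it
 * like any other object. Does nothing unless the option was seen.
 */
static void sketch_mark_provided(struct sketch_entry *entries, size_t nr)
{
	size_t i;

	if (!sketch_filter_provided)
		return;
	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		entries[i].flags |= SKETCH_NOT_USER_GIVEN;
}
```

With the state kept local like this, nothing has to be copied between
sub-filters when several `--filter` options are combined.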

And then these hunks:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index bb6f6577d5..2877aa9e96 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -242,6 +242,7 @@ static void transform_to_combine_type(
>  		memset(filter_options, 0, sizeof(*filter_options));
>  		filter_options->sub = sub_array;
>  		filter_options->sub_alloc = initial_sub_alloc;
> +		filter_options->filter_wants = sub_array[0].filter_wants;
>  	}
>  	filter_options->sub_nr = 1;
>  	filter_options->choice = LOFC_COMBINE;
> @@ -290,6 +291,9 @@ void parse_list_objects_filter(
>  		parse_error = gently_parse_list_objects_filter(
>  			&filter_options->sub[filter_options->sub_nr - 1], arg,
>  			&errbuf);
> +		if (!parse_error)
> +			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
> +				filter_options->filter_wants;
>  	}
>  	if (parse_error)
>  		die("%s", errbuf.buf);
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 4d0d0588cc..5e609e307a 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -42,6 +42,12 @@ struct list_objects_filter_options {
>  	 */
>  	enum list_objects_filter_choice choice;
>  
> +	/*
> +	 * "--filter-provided" was given by the user, instructing us to also
> +	 * filter all explicitly provided objects.
> +	 */
> +	unsigned int filter_wants : 1;
> +
>  	/*
>  	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
>  	 */

would not be needed at all.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index e33805e076..5ff800316b 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1101,7 +1101,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
>  	if (haves_bitmap)
>  		bitmap_and_not(wants_bitmap, haves_bitmap);
>  
> -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
> +	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
> +		      wants_bitmap, filter);
>  
>  	bitmap_git->result = wants_bitmap;
>  	bitmap_git->haves = haves_bitmap;

I guess we'd need to pass that flag into prepare_bitmap_walk() here so
it knows not to bother with the wants-filtering. But that seems less bad
than stuffing it into the filter struct.
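
Roughly, and with bitmaps reduced to a single 64-bit word for illustration,
passing the flag down might look like the sketch below. The function name
`sketch_filter_by_type` and its parameters are hypothetical; the real
filter_bitmap() operates on bitmap structures and a list of tip objects.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the objects whose bit is set in type_mask. Unless
 * filter_provided is set, objects in the wants word are exempt from
 * filtering and survive even when their type does not match.
 */
static uint64_t sketch_filter_by_type(uint64_t result, uint64_t type_mask,
				      uint64_t wants, int filter_provided)
{
	uint64_t keep = type_mask;

	if (!filter_provided)
		keep |= wants;
	return result & keep;
}

/* Small self-check: bit 0 is a wanted tip, bits 1 and 2 match the type. */
static void sketch_filter_selfcheck(void)
{
	assert(sketch_filter_by_type(0xF, 0x6, 0x1, 0) == 0x7);
	assert(sketch_filter_by_type(0xF, 0x6, 0x1, 1) == 0x6);
}
```

The caller decides whether the wants are exempt, so the filter description
itself carries no extra state.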

-Peff

Patrick Steinhardt April 9, 2021, 10:59 a.m. UTC | #2
On Tue, Apr 06, 2021 at 02:04:15PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:15:05PM +0100, Patrick Steinhardt wrote:
> 
> > When providing an object filter, it is currently impossible to also
> > filter provided items. E.g. when executing `git rev-list HEAD`, the
> > commit this reference points to will be treated as user-provided and is
> > thus excluded from the filtering mechanism. This makes it harder than
> > necessary to properly use the new `--filter=object:type` filter given
> > that even if the user wants to only see blobs, they will still see the
> > commits of provided references.
> > 
> > Improve this by introducing a new `--filter-provided` option to the
> > git-rev-list(1) command. If given, then all user-provided references
> > will be subject to filtering.
> 
> I think this option is a good thing to have.
> 
> The name seems a little confusing to me, as I can read it as both
> "please filter the provided objects" and "a filter has been provided".
> I guess "--filter-print-provided" would be more clear. And also the
> default, so you'd want "--no-filter-print-provided". That's kind of
> clunky, though. Maybe "--filter-omit-provided"?

Hum, "--filter-omit-provided" doesn't sound good to me, either. Omit to
me sounds like it'd omit filtering provided items, but we're doing
the reverse thing.

How about "--filter-provided-revisions"? Verbose, but at least it cannot
be confused with a filter being provided.

> > @@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >  			return show_bisect_vars(&info, reaches, all);
> >  	}
> >  
> > +	if (filter_options.filter_wants) {
> > +		struct commit_list *c;
> > +		for (i = 0; i < revs.pending.nr; i++) {
> > +			struct object_array_entry *pending = revs.pending.objects + i;
> > +			pending->item->flags |= NOT_USER_GIVEN;
> > +		}
> > +		for (c = revs.commits; c; c = c->next)
> > +			c->item->object.flags |= NOT_USER_GIVEN;
> > +	}
> 
> You store the flag inside the filter_options struct, which implies to me
> that it's something that could be applied per-filter (at least in
> theory; the command line option doesn't allow us to distinguish).
> 
> But here you treat it as a global flag that munges the NOT_USER_GIVEN
> flags. Given that it's inside the filter_options struct, and that you
> propagate it via transform_to_combine_type(), I'd have expected the LOFC
> code to look at the flag and decide to ignore the whole user-given
> concept completely.
> 
> To be clear, I don't mind at all having it as a global that applies to
> all filters. I don't think the flexibility buys us anything. But since
> it only applies to rev-list, why not just make it a global option within
> rev-list?
[snip]

Fair point. This probably stems from the confusion where I initially
didn't realize that the filter_options is not a "global" options
structure, but in fact the filter itself already. That's also why there
had been the initial bug where converting filter options into a combined
filter led to `filter_wants` being dropped.

In any case, the resulting code with it being global to rev-list.c
instead of part of the options is a lot cleaner.

Patrick

Jeff King April 9, 2021, 3:58 p.m. UTC | #3
On Fri, Apr 09, 2021 at 12:59:41PM +0200, Patrick Steinhardt wrote:

> > The name seems a little confusing to me, as I can read it as both
> > "please filter the provided objects" and "a filter has been provided".
> > I guess "--filter-print-provided" would be more clear. And also the
> > default, so you'd want "--no-filter-print-provided". That's kind of
> > clunky, though. Maybe "--filter-omit-provided"?
> 
> Hum, "--filter-omit-provided" doesn't sound good to me, either. Omit to
> me sounds like it'd omit filtering provided items, but we're doing
> the reverse thing.

Yeah, I can see that.

> How about "--filter-provided-revisions"? Verbose, but at least it cannot
> be confused with a filter being provided.

Yes, that works for me. Maybe "--filter-provided-objects", since you
could also provide a non-revision on the command line (though I think
other parts of the docs are happy to refer to "revisions" or "commits"
on the command line, even though you can clearly provide non-commits
when used with --objects).

-Peff

Patch

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..0f959b266d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -599,6 +599,10 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided")) {
+			filter_options.filter_wants = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -694,6 +698,16 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_options.filter_wants) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index bb6f6577d5..2877aa9e96 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -242,6 +242,7 @@  static void transform_to_combine_type(
 		memset(filter_options, 0, sizeof(*filter_options));
 		filter_options->sub = sub_array;
 		filter_options->sub_alloc = initial_sub_alloc;
+		filter_options->filter_wants = sub_array[0].filter_wants;
 	}
 	filter_options->sub_nr = 1;
 	filter_options->choice = LOFC_COMBINE;
@@ -290,6 +291,9 @@  void parse_list_objects_filter(
 		parse_error = gently_parse_list_objects_filter(
 			&filter_options->sub[filter_options->sub_nr - 1], arg,
 			&errbuf);
+		if (!parse_error)
+			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
+				filter_options->filter_wants;
 	}
 	if (parse_error)
 		die("%s", errbuf.buf);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 4d0d0588cc..5e609e307a 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,6 +42,12 @@  struct list_objects_filter_options {
 	 */
 	enum list_objects_filter_choice choice;
 
+	/*
+	 * "--filter-provided" was given by the user, instructing us to also
+	 * filter all explicitly provided objects.
+	 */
+	unsigned int filter_wants : 1;
+
 	/*
 	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
 	 */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e33805e076..5ff800316b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1101,7 +1101,8 @@  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..47c558ab0e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@  test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..9053ac5059 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@  test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
@@ -105,4 +127,18 @@  test_expect_success 'combine filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
+	while read objecttype objectsize
+	do
+		test "$objecttype" = blob || return 1
+		test "$objectsize" -le 1000 || return 1
+	done <objects
+'
+
 test_done