diff mbox series

[v1] load_ref_decorations(): fix decoration with tags

Message ID 20210712224152.2698500-1-martin.agren@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v1] load_ref_decorations(): fix decoration with tags | expand

Commit Message

Martin Ågren July 12, 2021, 10:41 p.m. UTC
Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()` directly, call `oid_object_info()`
and then either return early or go on to call `lookup_object_by_type()`
using the type just discovered. As detailed in the commit message, this
provides a significant time saving.

Unfortunately, it also changes the behavior. As an example, on git.git,

  git log --oneline --decorate origin/master | grep '(.*tag:.*)'

returns zero hits after 88473c8bae. Before it, we have around 800 hits.
What happens is, all annotated tags are lost from the decoration.

Let's do a partial revert of 88473c8bae: Keep doing that early
`oid_object_info()`, but after that, go on with the full
`parse_object()`. This restores the pre-88473c8bae behavior. We clearly
have lacking test coverage here, so make sure to add a test. Without
this fix to log-tree.c, the git-log invocation in this new test does
pick up the lightweight tags involved, but misses out on the annotated
tag.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Cc-ing Peff, the author of 88473c8bae, and Taylor, who made a comment
 regarding the peeling of tags [1], which might be related.

 I'm genuinely unconvinced that my proposed fix is the best possible one.
 Or maybe trying a more surgical fix around annotated tags misses a
 whole bunch of *other* special cases and we really should go for the
 full `parse_object()` to cover all possibilities.

 In my brief testing (time git log -1 --decorate on git.git), the time
 savings from 88473c8bae seem to be gone. So maybe this should be a full
 revert, rather than a partial one. (Plus the test.) Let's hope that
 won't be necessary.

 Also, I'm not sure whether the test really needs to worry about the
 order of the decorations suddenly changing -- maybe it's supposed to be
 stable.

 [1] https://lore.kernel.org/git/YNKgkGkPiMgNubNE@nand.local/

 log-tree.c     |  6 ++----
 t/t4202-log.sh | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

Comments

Jeff King July 13, 2021, 12:06 a.m. UTC | #1
On Tue, Jul 13, 2021 at 12:41:52AM +0200, Martin Ågren wrote:

> Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
> objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
> Rather than calling `parse_object()` directly, call `oid_object_info()`
> and then either return early or go on to call `lookup_object_by_type()`
> using the type just discovered. As detailed in the commit message, this
> provides a significant time saving.
> 
> Unfortunately, it also changes the behavior. As an example, on git.git,
> 
>   git log --oneline --decorate origin/master | grep '(.*tag:.*)'
> 
> returns zero hits after 88473c8bae. Before it, we have around 800 hits.
> What happens is, all annotated tags are lost from the decoration.

Eek. Thanks for catching this.

I wondered how I could have missed this, but it does work if somebody
else happens to have parsed it. For example:

  $ git log -1 --oneline --decorate v5.11
  f40ddce88593 (tag: v5.11) Linux 5.11

works because we'll already have parsed it as a traversal tip.

> Let's do a partial revert of 88473c8bae: Keep doing that early
> `oid_object_info()`, but after that, go on with the full
> `parse_object()`. This restores the pre-88473c8bae behavior.

Your fix is _almost_ there. There's not much point in doing
oid_object_info() if we're just going to parse unconditionally. But we
would want to parse only tags.

And we even do parse tags recursively in the peeling loop. The trouble
is that we do so after realizing we need to recurse. We just need to
bump it up in the loop, like:

diff --git a/log-tree.c b/log-tree.c
index 4f69ed176d..6dc4412268 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -174,11 +174,11 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 
 	add_name_decoration(deco_type, refname, obj);
 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);
 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;
-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);
 	}
 	return 0;

That's the minimum needed to unbreak things. I think we could do even
better, though. There is no need for us to parse a commit object pointed
to by a tag here. We should only be parsing tags we see (whether at the
top-level or recursively).

Do you want to incorporate the fix above, or would you prefer me to pick
it up from here?

> We clearly
> have lacking test coverage here, so make sure to add a test. Without
> this fix to log-tree.c, the git-log invocation in this new test does
> pick up the lightweight tags involved, but misses out on the annotated
> tag.

Yeah, definitely.

>  In my brief testing (time git log -1 --decorate on git.git), the time
>  savings from 88473c8bae seem to be gone. So maybe this should be a full
>  revert, rather than a partial one. (Plus the test.) Let's hope that
>  won't be necessary.

Right. After your patch, the oid_object_info() is pointless, because
we're still parsing everything (the "< 0" error check would only trigger
in a corrupted repo). And it adds some overhead, so it may even be
slightly slower than the original code. :)

The timings I get for "git log -1 --decorate" on git.git are:

  -  before either patch: 27.5ms
  - with my broken patch:  5.9ms
  - with the patch above: 11.3ms

which makes sense. I have a bunch of branches, and now we don't parse
them. We do still have to parse tags. On linux.git, where it's almost
entirely tags, most of the advantage dries up (but it would probably be
helped a bit by the further suggestion I gave above to avoid parsing
tagged commits).

On my big ~220k ref test case (where it's mostly non-tags), the timings
are:

  -  before either patch: 2.945s
  - with my broken patch: 0.707s
  - with the patch above: 0.788s

so the savings are retained.

>  Also, I'm not sure whether the test really needs to worry about the
>  order of the decorations suddenly changing -- maybe it's supposed to be
>  stable.

I think it's probably OK to count on it being stable. We iterate the
refs in a stable order to insert them, and then store the result as a
linked list. If that strategy ever changed, I think we'd end up doing a
manual sort on them to get a stable order anyway.

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 350cfa3593..3aa5451913 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1905,6 +1905,20 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' '
>  	test_must_fail git log --exclude-promisor-objects source-a
>  '
>  
> +test_expect_success 'log --decorate includes, e.g., all kinds of tags' '
> +	git log --oneline --decorate >log &&
> +	test_line_count = 2 log &&
> +	grep "^1ac6c77 (tag: one) one$" log &&

This presumably breaks when the tests are run in sha256 mode. Coupled
with the ordering simplification, maybe:

        cat >expect <<-\EOF &&
        three HEAD -> source-b, tag: three, tag: source-tag
        one tag: one
        EOF
        git log --format="%s %D" >actual &&
        test_cmp expect actual

(or maybe %d is more readable in the output; it doesn't matter much if
we're matching it verbatim).

Thanks again for noticing this.

-Peff
diff mbox series

Patch

diff --git a/log-tree.c b/log-tree.c
index 4f69ed176d..0b638d2e3c 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -134,7 +134,6 @@  static int add_ref_decoration(const char *refname, const struct object_id *oid,
 			      int flags, void *cb_data)
 {
 	struct object *obj;
-	enum object_type objtype;
 	enum decoration_type deco_type = DECORATION_NONE;
 	struct decoration_filter *filter = (struct decoration_filter *)cb_data;
 
@@ -156,10 +155,9 @@  static int add_ref_decoration(const char *refname, const struct object_id *oid,
 		return 0;
 	}
 
-	objtype = oid_object_info(the_repository, oid, NULL);
-	if (objtype < 0)
+	if (oid_object_info(the_repository, oid, NULL) < 0)
 		return 0;
-	obj = lookup_object_by_type(the_repository, oid, objtype);
+	obj = parse_object(the_repository, oid);
 
 	if (starts_with(refname, "refs/heads/"))
 		deco_type = DECORATION_REF_LOCAL;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 350cfa3593..3aa5451913 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1905,6 +1905,20 @@  test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git log --exclude-promisor-objects source-a
 '
 
+test_expect_success 'log --decorate includes, e.g., all kinds of tags' '
+	git log --oneline --decorate >log &&
+	test_line_count = 2 log &&
+	grep "^1ac6c77 (tag: one) one$" log &&
+	grep HEAD log | sed -e "s/^.*(\(.*\)).*$/\1/" -e "s/, /\n/g" |
+		sort >actual &&
+	cat >expect <<-\EOF &&
+		HEAD -> source-b
+		tag: source-tag
+		tag: three
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'log --end-of-options' '
        git update-ref refs/heads/--source HEAD &&
        git log --end-of-options --source >actual &&