[v2] load_ref_decorations(): fix decoration with tags

Message ID	20210713074018.232372-1-martin.agren@gmail.com (mailing list archive)
State	New, archived
Headers	Return-Path: <git-owner@kernel.org>
	From: Martin Ågren <martin.agren@gmail.com>
	To: git@vger.kernel.org
	Cc: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
	Subject: [PATCH v2] load_ref_decorations(): fix decoration with tags
	Date: Tue, 13 Jul 2021 09:40:18 +0200
	Message-Id: <20210713074018.232372-1-martin.agren@gmail.com>
	In-Reply-To: <YOzY+qNFM2GsgKMO@coredump.intra.peff.net>
	References: <YOzY+qNFM2GsgKMO@coredump.intra.peff.net>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	Precedence: bulk

Martin Ågren July 13, 2021, 7:40 a.m. UTC

Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()`, we go for `oid_object_info()` and
then `lookup_object_by_type()` using the type just discovered. As
detailed in the commit message, this provides a significant time saving.

Unfortunately, it also changes the behavior: We lose all annotated tags
from the decoration.

The reason this happens is in the loop where we try to peel the tags, we
won't necessarily have parsed that first object. If we haven't, its
`tag` will be NULL, so nothing will be displayed, and its `tagged` will
also be NULL, so we won't peel any further.

Make sure to parse the tag object at the top of the peeling loop. This
effectively restores the pre-88473c8bae parsing -- but only of tags,
allowing us to keep most of the possible speedup from 88473c8bae. Jeff
King reports:

  On my big ~220k ref test case (where it's mostly non-tags), the
  timings [using "git log -1 --decorate"] are:

    - before either patch: 2.945s
    - with my broken patch: 0.707s
    - with [this patch]: 0.788s

Note how this commit could have been done as an optimization before
88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
object only to immediately end the loop.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 On Tue, 13 Jul 2021 at 02:06, Jeff King <peff@peff.net> wrote:
 >
 > Your fix is _almost_ there.

 It's very kind of you to put it like that. I've picked up your
 suggestions and have tried to summarize my understanding of the issue
 and the fix in the commit message.

 > That's the minimum needed to unbreak things. I think we could do even
 > better, though. There is no need for us to parse a commit object pointed
 > to by a tag here. We should only be parsing tags we see (whether at the
 > top-level or recursively).

 Maybe you wrote this before circling back and actually writing that
 "even better" thing? Because it seems to me like that's what you did.
 Maybe I'm still missing something.

 Thank you for your insightful and helpful comments.

 log-tree.c     | 4 ++--
 t/t4202-log.sh | 9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

Jeff King July 13, 2021, 7:52 a.m. UTC | #1

[+cc Junio; this patch looks good to me, and should go on top of
	    jk/log-decorate-optim, which is in 'next' and has a pretty
	    ugly regression]

On Tue, Jul 13, 2021 at 09:40:18AM +0200, Martin Ågren wrote:

> Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
> objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
> Rather than calling `parse_object()`, we go for `oid_object_info()` and
> then `lookup_object_by_type()` using the type just discovered. As
> detailed in the commit message, this provides a significant time saving.
> 
> Unfortunately, it also changes the behavior: We lose all annotated tags
> from the decoration.
> 
> The reason this happens is in the loop where we try to peel the tags, we
> won't necessarily have parsed that first object. If we haven't, its
> `tag` will be NULL, so nothing will be displayed, and its `tagged` will
> also be NULL, so we won't peel any further.

Thanks, nicely explained.

> Note how this commit could have been done as an optimization before
> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
> object only to immediately end the loop.

Yep, thanks for mentioning this, as it's somewhat subtle.

>  On Tue, 13 Jul 2021 at 02:06, Jeff King <peff@peff.net> wrote:
>  >
>  > Your fix is _almost_ there.
> 
>  It's very kind of you to put it like that. I've picked up your
>  suggestions and have tried to summarize my understanding of the issue
>  and the fix in the commit message.

When I wrote that, I thought the fix would just be:

  if (obj_type == OBJ_TAG)
	parse_object(...);

which really would put it only one line off of your fix. :)

>  > That's the minimum needed to unbreak things. I think we could do even
>  > better, though. There is no need for us to parse a commit object pointed
>  > to by a tag here. We should only be parsing tags we see (whether at the
>  > top-level or recursively).
> 
>  Maybe you wrote this before circling back and actually writing that
>  "even better" thing? Because it seems to me like that's what you did.
>  Maybe I'm still missing something.

Nope, I'm just dumb. I wrote what I sent in the other email (rather than
just adding the "if" as above) because it only involved having a single
parse_object() call in the function. To my credit, I did realize about
an hour after sending the other email that I had in fact done the
"better thing" quite accidentally. But I really like how you explained
it in the commit message here, which I had not quite thought through.

>  log-tree.c     | 4 ++--
>  t/t4202-log.sh | 9 +++++++++

Patch looks good. Thanks for noticing the problem and cleaning up my
mess.

-Peff

Martin Ågren July 13, 2021, 8:47 a.m. UTC | #2

On Tue, 13 Jul 2021 at 09:52, Jeff King <peff@peff.net> wrote:
>
> [+cc Junio; this patch looks good to me, and should go on top of
>             jk/log-decorate-optim, which is in 'next' and has a pretty
>             ugly regression]

Thanks for calling out the branch name when I failed to. That's indeed
where my copy of the patch sits and where I've tested this.

> >  Maybe you wrote this before circling back and actually writing that
> >  "even better" thing? Because it seems to me like that's what you did.
> >  Maybe I'm still missing something.
>
> Nope, I'm just dumb. I wrote what I sent in the other email (rather than
> just adding the "if" as above) because it only involved having a single
> parse_object() call in the function. To my credit, I did realize about
> an hour after sending the other email that I had in fact done the
> "better thing" quite accidentally. But I really like how you explained
> it in the commit message here, which I had not quite thought through.

That's the best kind of hindsight: the sudden realization that you
actually got it right.

> Patch looks good. Thanks for noticing the problem and cleaning up my
> mess.

Thanks a lot for your comments and all the help. I'm glad you liked my
explanation in the posted patch.

Martin

Junio C Hamano July 13, 2021, 9:17 p.m. UTC | #3

Jeff King <peff@peff.net> writes:

>> The reason this happens is in the loop where we try to peel the tags, we
>> won't necessarily have parsed that first object. If we haven't, its
>> `tag` will be NULL, so nothing will be displayed, and its `tagged` will
>> also be NULL, so we won't peel any further.
>
> Thanks, nicely explained.

Yup, nicely explained indeed.

>> Note how this commit could have been done as an optimization before
>> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
>> object only to immediately end the loop.
>
> Yep, thanks for mentioning this, as it's somewhat subtle.

It is so subtle that I am not sure what the paragraph wants to say.

Before 88473c8b, we had a fully parsed object in obj and entered the
while() loop iff the outermost object is a tag, then we find the
underlying object via obj->tagged.  We parse that underlying object
to find if it is a tag, and break out if it is not.

By "this commit", I assume that the above means the change in this
fix, i.e. parse 'obj' if it has not been parsed before looking at
its tagged field.  But I am not sure how that would have been an
optimization before 88473c8b that gave a parsed tag object 'obj'
upon entry to the loop.

Puzzled.

In any case, let's talk about this patch in the context to which it
is designed to be applied, i.e. post 88473c8bae.

When we come here, we have done oid_object_info() plus
lookup_object_by_type() to obtain 'obj' and we know its type.
Then we enter the loop.

 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);

And we parse if it hasn't been parsed.  That is why we can ...

 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;

... look at its tagged member.

-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);

And the updated 'obj' (i.e. direct referent of the tag object) is
fed to add_name_decoration().  And then we move to the next
iteration.

Now, do we know what type of object 'obj' is at this point?  We
did parse the outermost tag upon entry to this loop, we replaced
'obj' variable with the referent by following the .tagged member,
but we haven't parsed that object or run oid_object_info() on it.

Puzzled.

 	}

Jeff King July 13, 2021, 9:27 p.m. UTC | #4

On Tue, Jul 13, 2021 at 02:17:53PM -0700, Junio C Hamano wrote:

> >> Note how this commit could have been done as an optimization before
> >> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
> >> object only to immediately end the loop.
> >
> > Yep, thanks for mentioning this, as it's somewhat subtle.
> 
> It is so subtle that I am not sure what the paragraph wants to say.
> 
> Before 88473c8b, we had a fully parsed object in obj and entered the
> while() loop iff the outermost object is a tag, then we find the
> underlying object via obj->tagged.  We parse that underlying object
> to find if it is a tag, and break out if it is not.
> 
> By "this commit", I assume that the above means the change in this
> fix, i.e. parse 'obj' if it has not been parsed before looking at
> its tagged field.  But I am not sure how that would have been an
> optimization before 88473c8b that gave a parsed tag object 'obj'
> upon entry to the loop.
> 
> Puzzled.

The optimization is that we are parsing tags before looking at their
structs, instead of always parsing the thing that the tag points to.

So in the old loop (pseudo-code for clarity):

  parse_object(obj);
  while (obj->type == OBJ_TAG) {
          obj = obj->tagged;
          parse_object(obj);
  }

if we had a tag pointing to a commit, we'd parse the commit. But we
don't need to. We just need to know that it exists and is a commit.

In the new code, we parse only when we need to look at obj->tagged:

  while (obj->type == OBJ_TAG) {
          parse_object(obj);
          obj = obj->tagged;
  }

So we must "somehow" know the type of "obj" in the first place, as well
as the type of every obj->tagged we look at. And that leads into your
question here:

> In any case, let's talk about this patch in the context to which it
> is designed to be applied, i.e. post 88473c8bae.
> 
> When we come here, we have done oid_object_info() plus
> lookup_object_by_type() to obtain 'obj' and we know its type.
> Then we enter the loop.
> 
>  	while (obj->type == OBJ_TAG) {
> +		if (!obj->parsed)
> +			parse_object(the_repository, &obj->oid);
> 
> And we parse if it hasn't been parsed.  That is why we can ...
> 
>  		obj = ((struct tag *)obj)->tagged;
>  		if (!obj)
>  			break;
> 
> ... look at its tagged member.
> 
> -		if (!obj->parsed)
> -			parse_object(the_repository, &obj->oid);
>  		add_name_decoration(DECORATION_REF_TAG, refname, obj);
> 
> And the updated 'obj' (i.e. direct referent of the tag object) is
> fed to add_name_decoration().  And then we move to the next
> iteration.
> 
> Now, do we know what type of object 'obj' is at this point?  We
> did parse the outermost tag upon entry to this loop, we replaced
> 'obj' variable with the referent by following the .tagged member,
> but we haven't parsed that object or run oid_object_info() on it.
> 
> Puzzled.

...and the answer is that we don't need to parse it. The tag object
mentions the type of what it points to, and we use lookup_commit(), etc,
to create the object pointed to by its "tagged" field. If we can't do
that (say, because the tag is missing the type field, or we previously
saw the object as a different type, etc), then obj->tagged would be NULL
and we'd break out of the loop.

-Peff

Junio C Hamano July 13, 2021, 9:40 p.m. UTC | #5

Jeff King <peff@peff.net> writes:

>> Puzzled.
>
> ...and the answer is that we don't need to parse it. The tag object
> mentions the type of what it points to, and we use lookup_commit(), etc,
> to create the object pointed to by its "tagged" field.

Ahh, parse_object() on the outer tag, when instantiating the in-core
obj, allocated an in-core object and that instance is already given
a type from the tag object and .tagged member points at that
object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).

Totally forgot about that one; thanks.

Martin Ågren July 13, 2021, 9:52 p.m. UTC | #6

Earlier, Junio C Hamano <gitster@pobox.com> wrote:

>>> Note how this commit could have been done as an optimization before
>>> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
>>> object only to immediately end the loop.
>>
>> Yep, thanks for mentioning this, as it's somewhat subtle.
>
> It is so subtle that I am not sure what the paragraph wants to say.

Then:

> Jeff King <peff@peff.net> writes:
>
> >> Puzzled.
> >
> > ...and the answer is that we don't need to parse it. The tag object
> > mentions the type of what it points to, and we use lookup_commit(), etc,
> > to create the object pointed to by its "tagged" field.
>
> Ahh, parse_object() on the outer tag, when instantiating the in-core
> obj, allocated an in-core object and that instance is already given
> a type from the tag object and .tagged member points at that
> object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).
>
> Totally forgot about that one; thanks.

Do you have any suggestions for how this could be explained better? I
waffled on whether to add that paragraph to the commit message and when
I finally did, it seems it got a little bit too succinct.

I'm about to check out for today. Maybe in the morning I can think of
some clarification.

Martin

Jeff King July 13, 2021, 10:22 p.m. UTC | #7

On Tue, Jul 13, 2021 at 11:52:53PM +0200, Martin Ågren wrote:

> > >> Puzzled.
> > >
> > > ...and the answer is that we don't need to parse it. The tag object
> > > mentions the type of what it points to, and we use lookup_commit(), etc,
> > > to create the object pointed to by its "tagged" field.
> >
> > Ahh, parse_object() on the outer tag, when instantiating the in-core
> > obj, allocated an in-core object and that instance is already given
> > a type from the tag object and .tagged member points at that
> > object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).
> >
> > Totally forgot about that one; thanks.
> 
> Do you have any suggestions for how this could be explained better? I
> waffled on whether to add that paragraph to the commit message and when
> I finally did, it seems it got a little bit too succinct.
> 
> I'm about to check out for today. Maybe in the morning I can think of
> some clarification.

My attempt is below. Most of the new explanation is near the end, but I
tweaked a few other things.

Your original said:

  The reason this happens is in the loop where we try to peel the tags,
  we won't necessarily have parsed that first object. If we haven't, its
  `tag` will be NULL, so nothing will be displayed, and its `tagged`
  will also be NULL, so we won't peel any further.

and my earlier explanations were not thinking of the "tag" field at all,
which made me worried there was another subtle bug in not parsing the
tag earlier. But I don't think so. We don't look at the "tag" field for
setting the annotation; it always comes from the refname. So the
paragraph above should not mention "tag" at all.

I also beefed up the test a bit. All this talk of parsing made me want
to make sure we were covering tags-of-tags correctly (which I think we
are both before and after the patch). After adding that, the expected
decoration output was getting quite cluttered. So I tweaked the test to
make a new commit, give the tags sensible names, and just look at that
one commit.

Here it is.

-- >8 --
From: Martin Ågren <martin.agren@gmail.com>
Subject: load_ref_decorations(): fix decoration with tags

Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()`, we go for `oid_object_info()` and
then `lookup_object_by_type()` using the type just discovered. As
detailed in the commit message, this provides a significant time saving.

Unfortunately, it also changes the behavior: We lose all annotated tags
from the decoration.

This happens because, in the loop where we try to peel the tags, we
won't necessarily have parsed that first object. If we haven't, its
`tagged` field will be NULL, so we won't actually add a decoration for
the pointed-to object.

Make sure to parse the tag object at the top of the peeling loop. This
effectively restores the pre-88473c8bae parsing -- but only of tags,
allowing us to keep most of the possible speedup from 88473c8bae. Jeff
King reports:

  On my big ~220k ref test case (where it's mostly non-tags), the
  timings [using "git log -1 --decorate"] are:

    - before either patch: 2.945s
    - with my broken patch: 0.707s
    - with [this patch]: 0.788s

The simplest way to do this is to just conditionally parse before the
loop:

  if (obj->type == OBJ_TAG)
          parse_object(the_repository, &obj->oid);

But we can observe that our tag-peeling loop needs to peel already, to
examine recursive tags-of-tags. So instead of introducing a new call to
parse_object(), we can simply move the parsing higher in the loop:
instead of parsing the new object before we loop, parse each tag object
before we look at its "tagged" field.

This has another beneficial side effect: if a tag points at a commit (or
other non-tag type), we do not bother to parse the commit at all now.
And we know it is a commit without calling oid_object_info(), because
parsing the surrounding tag object will have created the correct in-core
object based on the "type" field of the tag.

Our test coverage for --decorate was obviously not good, since we missed
this quite-basic regression. The new test covers an annotated tag
(showing the fix), but also that we correctly show annotations for
lightweight tags and double-annotated tag-of-tags.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 log-tree.c     |  4 ++--
 t/t4202-log.sh | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 4f69ed176d..6dc4412268 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -174,11 +174,11 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,

 	add_name_decoration(deco_type, refname, obj);
 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);
 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;
-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);
 	}
 	return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 350cfa3593..fe8f5e2067 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1905,6 +1905,20 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git log --exclude-promisor-objects source-a
 '

+test_expect_success 'log --decorate includes all levels of tag annotated tags' '
+	git checkout -b branch &&
+	git commit --allow-empty -m "new commit" &&
+	git tag lightweight HEAD &&
+	git tag -m annotated annotated HEAD &&
+	git tag -m double-0 double-0 HEAD &&
+	git tag -m double-1 double-1 double-0 &&
+	cat >expect <<-\EOF &&
+	HEAD -> branch, tag: lightweight, tag: double-1, tag: double-0, tag: annotated
+	EOF
+	git log -1 --format="%D" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'log --end-of-options' '
        git update-ref refs/heads/--source HEAD &&
        git log --end-of-options --source >actual &&

Martin Ågren July 14, 2021, 8:13 a.m. UTC | #8

On Wed, 14 Jul 2021 at 00:22, Jeff King <peff@peff.net> wrote:
>
> On Tue, Jul 13, 2021 at 11:52:53PM +0200, Martin Ågren wrote:
>
> > > Totally forgot about that one; thanks.
> >
> > Do you have any suggestions for how this could be explained better? I
> > waffled on whether to add that paragraph to the commit message and when
> > I finally did, it seems it got a little bit too succinct.
> >
> > I'm about to check out for today. Maybe in the morning I can think of
> > some clarification.
>
> My attempt is below. Most of the new explanation is near the end, but I
> tweaked a few other things.
>
> Your original said:
>
>   The reason this happens is in the loop where we try to peel the tags,
>   we won't necessarily have parsed that first object. If we haven't, its
>   `tag` will be NULL, so nothing will be displayed, and its `tagged`
>   will also be NULL, so we won't peel any further.
>
> and my earlier explanations were not thinking of the "tag" field at all,
> which made me worried there was another subtle bug in not parsing the
> tag earlier. But I don't think so. We don't look at the "tag" field for
> setting the annotation; it always comes from the refname. So the
> paragraph above should not mention "tag" at all.

Thanks for correcting that. The parsed-ness of "obj" affects whether the
decoration is shown at all. I originally concluded that when the
decorations are eventually displayed, it was something like "if ->tag is
non-NULL, display it". But that's obviously not the case. It's more like
"->tagged is NULL, so I have no idea where to place this decoration".

> I also beefed up the test a bit. All this talk of parsing made me want
> to make sure we were covering tags-of-tags correctly (which I think we
> are both before and after the patch). After adding that, the expected
> decoration output was getting quite cluttered. So I tweaked the test to
> make a new commit, give the tags sensible names, and just look at that
> one commit.

> From: Martin Ågren <martin.agren@gmail.com>

At this point, I think it's fair to say that you've done most of the
authoring here. I wouldn't be at all offended if you took the credit for
this patch. It's your code diff, it's your test, and now it's even your
*updated* test, plus half the commit message. :)

Here's that added half of the message:

> The simplest way to do this is to just conditionally parse before the
> loop:
>
>   if (obj->type == OBJ_TAG)
>           parse_object(the_repository, &obj->oid);
>
> But we can observe that our tag-peeling loop needs to peel already, to
> examine recursive tags-of-tags. So instead of introducing a new call to
> parse_object(), we can simply move the parsing higher in the loop:
> instead of parsing the new object before we loop, parse each tag object
> before we look at its "tagged" field.
>
> This has another beneficial side effect: if a tag points at a commit (or
> other non-tag type), we do not bother to parse the commit at all now.
> And we know it is a commit without calling oid_object_info(), because
> parsing the surrounding tag object will have created the correct in-core
> object based on the "type" field of the tag.
>
> Our test coverage for --decorate was obviously not good, since we missed
> this quite-basic regression. The new test covers an annotated tag
> (showing the fix), but also that we correctly show annotations for
> lightweight tags and double-annotated tag-of-tags.

Very well described.

> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>

If you take authorship of this, I think this could be something like

Reported-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Martin Ågren <martin.agren@gmail.com>

Martin
