Message ID | 20210713074018.232372-1-martin.agren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] load_ref_decorations(): fix decoration with tags |
[+cc Junio; this patch looks good to me, and should go on top of
 jk/log-decorate-optim, which is in 'next' and has a pretty ugly
 regression]

On Tue, Jul 13, 2021 at 09:40:18AM +0200, Martin Ågren wrote:

> Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
> objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
> Rather than calling `parse_object()`, we go for `oid_object_info()` and
> then `lookup_object_by_type()` using the type just discovered. As
> detailed in the commit message, this provides a significant time saving.
>
> Unfortunately, it also changes the behavior: We lose all annotated tags
> from the decoration.
>
> The reason this happens is in the loop where we try to peel the tags, we
> won't necessarily have parsed that first object. If we haven't, its
> `tag` will be NULL, so nothing will be displayed, and its `tagged` will
> also be NULL, so we won't peel any further.

Thanks, nicely explained.

> Note how this commit could have been done as an optimization before
> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
> object only to immediately end the loop.

Yep, thanks for mentioning this, as it's somewhat subtle.

> On Tue, 13 Jul 2021 at 02:06, Jeff King <peff@peff.net> wrote:
> >
> > Your fix is _almost_ there.
>
> It's very kind of you to put it like that. I've picked up your
> suggestions and have tried to summarize my understanding of the issue
> and the fix in the commit message.

When I wrote that, I thought the fix would just be:

    if (obj_type == OBJ_TAG)
            parse_object(...);

which really would put it only one line off of your fix. :)

> > That's the minimum needed to unbreak things. I think we could do even
> > better, though. There is no need for us to parse a commit object pointed
> > to by a tag here. We should only be parsing tags we see (whether at the
> > top-level or recursively).
>
> Maybe you wrote this before circling back and actually writing that
> "even better" thing? Because it seems to me like that's what you did.
> Maybe I'm still missing something.

Nope, I'm just dumb. I wrote what I sent in the other email (rather than
just adding the "if" as above) because it only involved having a single
parse_object() call in the function. To my credit, I did realize about
an hour after sending the other email that I had in fact done the
"better thing" quite accidentally. But I really like how you explained
it in the commit message here, which I had not quite thought through.

>  log-tree.c     | 4 ++--
>  t/t4202-log.sh | 9 +++++++++

Patch looks good. Thanks for noticing the problem and cleaning up my
mess.

-Peff
On Tue, 13 Jul 2021 at 09:52, Jeff King <peff@peff.net> wrote:
>
> [+cc Junio; this patch looks good to me, and should go on top of
>  jk/log-decorate-optim, which is in 'next' and has a pretty
>  ugly regression]

Thanks for calling out the branch name when I failed to. That's indeed
where my copy of the patch sits and where I've tested this.

> > Maybe you wrote this before circling back and actually writing that
> > "even better" thing? Because it seems to me like that's what you did.
> > Maybe I'm still missing something.
>
> Nope, I'm just dumb. I wrote what I sent in the other email (rather than
> just adding the "if" as above) because it only involved having a single
> parse_object() call in the function. To my credit, I did realize about
> an hour after sending the other email that I had in fact done the
> "better thing" quite accidentally. But I really like how you explained
> it in the commit message here, which I had not quite thought through.

That's the best kind of hindsight: the sudden realization that you
actually got it right.

> Patch looks good. Thanks for noticing the problem and cleaning up my
> mess.

Thanks a lot for your comments and all the help. I'm glad you liked my
explanation in the posted patch.

Martin
Jeff King <peff@peff.net> writes:

>> The reason this happens is in the loop where we try to peel the tags, we
>> won't necessarily have parsed that first object. If we haven't, its
>> `tag` will be NULL, so nothing will be displayed, and its `tagged` will
>> also be NULL, so we won't peel any further.
>
> Thanks, nicely explained.

Yup, nicely explained indeed.

>> Note how this commit could have been done as an optimization before
>> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
>> object only to immediately end the loop.
>
> Yep, thanks for mentioning this, as it's somewhat subtle.

It is so subtle that I am not sure what the paragraph wants to say.

Before 88473c8b, we had a fully parsed object in obj and entered the
while() loop iff the outermost object is a tag, then we find the
underlying object via obj->tagged. We parse that underlying object
to find if it is a tag, and break out if it is not.

By "this commit", I assume that the above means the change in this
fix, i.e. parse 'obj' if it has not been parsed before looking at
its tagged field. But I am not sure how that would have been an
optimization before 88473c8b that gave a parsed tag object 'obj'
upon entry to the loop.

Puzzled.

In any case, let's talk about this patch in the context to which it
is designed to be applied, i.e. post 88473c8b.

When we come here, we have done oid_object_info() plus
lookup_object_by_type() to obtain 'obj' and we know its type.
Then we enter the loop.

 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);

And we parse if it hasn't been parsed. That is why we can ...

 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;

... look at its tagged member.

-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);

And the updated 'obj' (i.e. direct referent of the tag object) is
fed to add_name_decoration(). And then we move to the next
iteration.

Now, do we know what type of object 'obj' is at this point? We
did parse the outermost tag upon entry to this loop, we replaced
'obj' variable with the referent by following the .tagged member,
but we haven't parsed that object or run oid_object_info() on it.

Puzzled.

 	}
On Tue, Jul 13, 2021 at 02:17:53PM -0700, Junio C Hamano wrote:

> >> Note how this commit could have been done as an optimization before
> >> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
> >> object only to immediately end the loop.
> >
> > Yep, thanks for mentioning this, as it's somewhat subtle.
>
> It is so subtle that I am not sure what the paragraph wants to say.
>
> Before 88473c8b, we had a fully parsed object in obj and entered the
> while() loop iff the outermost object is a tag, then we find the
> underlying object via obj->tagged. We parse that underlying object
> to find if it is a tag, and break out if it is not.
>
> By "this commit", I assume that the above means the change in this
> fix, i.e. parse 'obj' if it has not been parsed before looking at
> its tagged field. But I am not sure how that would have been an
> optimization before 88473c8b that gave a parsed tag object 'obj'
> upon entry to the loop.
>
> Puzzled.

The optimization is that we are parsing tags before looking at their
structs, instead of always parsing the thing that the tag points to.

So in the old loop (pseudo-code for clarity):

    parse_object(obj);
    while (obj->type == OBJ_TAG) {
            obj = obj->tagged;
            parse_object(obj);
    }

if we had a tag pointing to a commit, we'd parse the commit. But we
don't need to. We just need to know that it exists and is a commit.

In the new code, we parse only when we need to look at obj->tagged:

    while (obj->type == OBJ_TAG) {
            parse_object(obj);
            obj = obj->tagged;
    }

So we must "somehow" know the type of "obj" in the first place, as well
as the type of every obj->tagged we look at. And that leads into your
question here:

> In any case, let's talk about this patch in the context to which it
> is designed to be applied, i.e. post 88473c8b.
>
> When we come here, we have done oid_object_info() plus
> lookup_object_by_type() to obtain 'obj' and we know its type.
> Then we enter the loop.
>
>  	while (obj->type == OBJ_TAG) {
> +		if (!obj->parsed)
> +			parse_object(the_repository, &obj->oid);
>
> And we parse if it hasn't been parsed. That is why we can ...
>
>  		obj = ((struct tag *)obj)->tagged;
>  		if (!obj)
>  			break;
>
> ... look at its tagged member.
>
> -		if (!obj->parsed)
> -			parse_object(the_repository, &obj->oid);
>  		add_name_decoration(DECORATION_REF_TAG, refname, obj);
>
> And the updated 'obj' (i.e. direct referent of the tag object) is
> fed to add_name_decoration(). And then we move to the next
> iteration.
>
> Now, do we know what type of object 'obj' is at this point? We
> did parse the outermost tag upon entry to this loop, we replaced
> 'obj' variable with the referent by following the .tagged member,
> but we haven't parsed that object or run oid_object_info() on it.
>
> Puzzled.

...and the answer is that we don't need to parse it. The tag object
mentions the type of what it points to, and we use lookup_commit(), etc,
to create the object pointed to by its "tagged" field.

If we can't do that (say, because the tag is missing the type field, or
we previously saw the object as a different type, etc), then obj->tagged
would be NULL and we'd break out of the loop.

-Peff
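
[Editorial sketch, not part of the thread] The three loop orderings
discussed above (the old loop, the broken shortcut from 88473c8bae, and
the fixed loop) can be compared with a small toy model. Everything here
is an assumption made for illustration: `toy_object`, `toy_parse()` and
the `peel_*()` helpers are hypothetical stand-ins, not Git's real
structs or functions. The only behavior borrowed from the thread is that
a tag's `tagged` field stays NULL until the tag is parsed, and that the
referent's type is known without parsing the referent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Git's object types; not the real definitions. */
enum toy_type { TOY_COMMIT, TOY_TAG };

struct toy_object {
	enum toy_type type;        /* known as soon as the object is looked up */
	int parsed;                /* has the object body been read yet? */
	struct toy_object *target; /* "on-disk" referent; only used by tags */
	struct toy_object *tagged; /* in-core referent; NULL until parsed */
};

static int parse_calls; /* counts how many (expensive) parses happen */

/* Parsing a tag fills in ->tagged; the referent already has its type. */
static void toy_parse(struct toy_object *obj)
{
	parse_calls++;
	obj->parsed = 1;
	if (obj->type == TOY_TAG)
		obj->tagged = obj->target;
}

/* Old ordering: parse the starting object, then parse every referent. */
static int peel_old(struct toy_object *obj)
{
	int decorations = 0;

	if (!obj->parsed)
		toy_parse(obj);
	while (obj->type == TOY_TAG) {
		obj = obj->tagged;
		if (!obj)
			break;
		if (!obj->parsed)
			toy_parse(obj);
		decorations++;
	}
	return decorations;
}

/* Broken shortcut: nothing parses the first tag, so ->tagged is NULL. */
static int peel_broken(struct toy_object *obj)
{
	int decorations = 0;

	while (obj->type == TOY_TAG) {
		obj = obj->tagged;
		if (!obj)
			break;
		if (!obj->parsed)
			toy_parse(obj);
		decorations++;
	}
	return decorations;
}

/* Fixed ordering: parse a tag only right before reading ->tagged. */
static int peel_new(struct toy_object *obj)
{
	int decorations = 0;

	while (obj->type == TOY_TAG) {
		if (!obj->parsed)
			toy_parse(obj);
		obj = obj->tagged;
		if (!obj)
			break;
		decorations++;
	}
	return decorations;
}
```

In this model, a tag-of-a-tag-of-a-commit chain costs three parses in
the old ordering and two in the fixed ordering (the commit is never
parsed), while the broken ordering stops immediately and adds no
decoration for the peeled objects.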
Jeff King <peff@peff.net> writes:

>> Puzzled.
>
> ...and the answer is that we don't need to parse it. The tag object
> mentions the type of what it points to, and we use lookup_commit(), etc,
> to create the object pointed to by its "tagged" field.

Ahh, parse_object() on the outer tag, when instantiating the in-core
obj, allocated an in-core object and that instance is already given
a type from the tag object and .tagged member points at that
object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).

Totally forgot about that one; thanks.
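
[Editorial sketch, not part of the thread] The point about the referent
already having a type can be illustrated with a toy "lookup by type"
table. This is a minimal sketch under stated assumptions, not Git's
object store: `stub_object`, `stub_table` and `stub_lookup_typed()` are
hypothetical names, and integer ids stand in for object ids. It models
only the two behaviors described above: looking up an unseen id creates
an unparsed in-core object that already carries the requested type, and
looking up an id previously seen as a different type yields NULL (which
is what would leave `tagged` NULL and end the peeling loop).

```c
#include <assert.h>
#include <stddef.h>

/* Toy object types; hypothetical, not Git's enum object_type. */
enum stub_type { STUB_COMMIT, STUB_TREE, STUB_BLOB, STUB_TAG };

struct stub_object {
	int id;              /* stands in for an object id */
	enum stub_type type; /* set at lookup time, before any parsing */
	int parsed;          /* stays 0 until someone reads the body */
};

#define STUB_TABLE_SIZE 16
static struct stub_object stub_table[STUB_TABLE_SIZE];
static int stub_count;

/*
 * Return the in-core object for `id` with the given type.
 * - Unknown id: create an unparsed stub that already has its type.
 * - Known id, same type: return the existing object.
 * - Known id, different type (or table full): return NULL.
 */
static struct stub_object *stub_lookup_typed(int id, enum stub_type type)
{
	int i;

	for (i = 0; i < stub_count; i++) {
		if (stub_table[i].id != id)
			continue;
		return stub_table[i].type == type ? &stub_table[i] : NULL;
	}
	if (stub_count == STUB_TABLE_SIZE)
		return NULL;
	stub_table[stub_count].id = id;
	stub_table[stub_count].type = type;
	stub_table[stub_count].parsed = 0;
	return &stub_table[stub_count++];
}
```

A tag parser built on such a helper could hand back a correctly typed,
still unparsed referent, so the caller can test its type without ever
reading the referent's contents.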
Earlier, Junio C Hamano <gitster@pobox.com> wrote:

>>> Note how this commit could have been done as an optimization before
>>> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
>>> object only to immediately end the loop.
>>
>> Yep, thanks for mentioning this, as it's somewhat subtle.
>
> It is so subtle that I am not sure what the paragraph wants to say.

Then:

> Jeff King <peff@peff.net> writes:
>
> >> Puzzled.
> >
> > ...and the answer is that we don't need to parse it. The tag object
> > mentions the type of what it points to, and we use lookup_commit(), etc,
> > to create the object pointed to by its "tagged" field.
>
> Ahh, parse_object() on the outer tag, when instantiating the in-core
> obj, allocated an in-core object and that instance is already given
> a type from the tag object and .tagged member points at that
> object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).
>
> Totally forgot about that one; thanks.

Do you have any suggestions for how this could be explained better? I
waffled on whether to add that paragraph to the commit message and when
I finally did, it seems it got a little bit too succinct.

I'm about to check out for today. Maybe in the morning I can think of
some clarification.

Martin
On Tue, Jul 13, 2021 at 11:52:53PM +0200, Martin Ågren wrote:

> > >> Puzzled.
> > >
> > > ...and the answer is that we don't need to parse it. The tag object
> > > mentions the type of what it points to, and we use lookup_commit(), etc,
> > > to create the object pointed to by its "tagged" field.
> >
> > Ahh, parse_object() on the outer tag, when instantiating the in-core
> > obj, allocated an in-core object and that instance is already given
> > a type from the tag object and .tagged member points at that
> > object, so it is not an "unknown" object (tag.c::parse_tag_buffer()).
> >
> > Totally forgot about that one; thanks.
>
> Do you have any suggestions for how this could be explained better? I
> waffled on whether to add that paragraph to the commit message and when
> I finally did, it seems it got a little bit too succinct.
>
> I'm about to check out for today. Maybe in the morning I can think of
> some clarification.

My attempt is below. Most of the new explanation is near the end, but I
tweaked a few other things.

Your original said:

  The reason this happens is in the loop where we try to peel the tags,
  we won't necessarily have parsed that first object. If we haven't, its
  `tag` will be NULL, so nothing will be displayed, and its `tagged`
  will also be NULL, so we won't peel any further.

and my earlier explanations were not thinking of the "tag" field at all,
which made me worried there was another subtle bug in not parsing the
tag earlier. But I don't think so. We don't look at the "tag" field for
setting the annotation; it always comes from the refname. So the
paragraph above should not mention "tag" at all.

I also beefed up the test a bit. All this talk of parsing made me want
to make sure we were covering tags-of-tags correctly (which I think we
are both before and after the patch). After adding that, the expected
decoration output was getting quite cluttered. So I tweaked the test to
make a new commit, give the tags sensible names, and just look at that
one commit.

Here it is.

-- >8 --
From: Martin Ågren <martin.agren@gmail.com>
Subject: load_ref_decorations(): fix decoration with tags

Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()`, we go for `oid_object_info()` and
then `lookup_object_by_type()` using the type just discovered. As
detailed in the commit message, this provides a significant time saving.

Unfortunately, it also changes the behavior: We lose all annotated tags
from the decoration.

The reason this happens is in the loop where we try to peel the tags, we
won't necessarily have parsed that first object. If we haven't, its
`tagged` field will be NULL, so we won't actually add a decoration for
the pointed-to object.

Make sure to parse the tag object at the top of the peeling loop. This
effectively restores the pre-88473c8bae parsing -- but only of tags,
allowing us to keep most of the possible speedup from 88473c8bae.

Jeff King reports:

  On my big ~220k ref test case (where it's mostly non-tags), the
  timings [using "git log -1 --decorate"] are:

    - before either patch: 2.945s
    - with my broken patch: 0.707s
    - with [this patch]: 0.788s

The simplest way to do this is to just conditionally parse before the
loop:

  if (obj->type == OBJ_TAG)
          parse_object(the_repository, &obj->oid);

But we can observe that our tag-peeling loop needs to peel already, to
examine recursive tags-of-tags. So instead of introducing a new call to
parse_object(), we can simply move the parsing higher in the loop:
instead of parsing the new object before we loop, parse each tag object
before we look at its "tagged" field.

This has another beneficial side effect: if a tag points at a commit (or
other non-tag type), we do not bother to parse the commit at all now.
And we know it is a commit without calling oid_object_info(), because
parsing the surrounding tag object will have created the correct in-core
object based on the "type" field of the tag.

Our test coverage for --decorate was obviously not good, since we missed
this quite-basic regression. The new test covers an annotated tag
(showing the fix), but also checks that we correctly show annotations
for lightweight tags and double-annotated tag-of-tags.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 log-tree.c     |  4 ++--
 t/t4202-log.sh | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 4f69ed176d..6dc4412268 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -174,11 +174,11 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 
 	add_name_decoration(deco_type, refname, obj);
 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);
 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;
-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);
 	}
 	return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 350cfa3593..fe8f5e2067 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1905,6 +1905,20 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git log --exclude-promisor-objects source-a
 '
 
+test_expect_success 'log --decorate includes all levels of tag annotated tags' '
+	git checkout -b branch &&
+	git commit --allow-empty -m "new commit" &&
+	git tag lightweight HEAD &&
+	git tag -m annotated annotated HEAD &&
+	git tag -m double-0 double-0 HEAD &&
+	git tag -m double-1 double-1 double-0 &&
+	cat >expect <<-\EOF &&
+	HEAD -> branch, tag: lightweight, tag: double-1, tag: double-0, tag: annotated
+	EOF
+	git log -1 --format="%D" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'log --end-of-options' '
 	git update-ref refs/heads/--source HEAD &&
 	git log --end-of-options --source >actual &&
On Wed, 14 Jul 2021 at 00:22, Jeff King <peff@peff.net> wrote:
>
> On Tue, Jul 13, 2021 at 11:52:53PM +0200, Martin Ågren wrote:
>
> > > Totally forgot about that one; thanks.
> >
> > Do you have any suggestions for how this could be explained better? I
> > waffled on whether to add that paragraph to the commit message and when
> > I finally did, it seems it got a little bit too succinct.
> >
> > I'm about to check out for today. Maybe in the morning I can think of
> > some clarification.
>
> My attempt is below. Most of the new explanation is near the end, but I
> tweaked a few other things.
>
> Your original said:
>
>   The reason this happens is in the loop where we try to peel the tags,
>   we won't necessarily have parsed that first object. If we haven't, its
>   `tag` will be NULL, so nothing will be displayed, and its `tagged`
>   will also be NULL, so we won't peel any further.
>
> and my earlier explanations were not thinking of the "tag" field at all,
> which made me worried there was another subtle bug in not parsing the
> tag earlier. But I don't think so. We don't look at the "tag" field for
> setting the annotation; it always comes from the refname. So the
> paragraph above should not mention "tag" at all.

Thanks for correcting that. The parsed-ness of "obj" affects whether the
decoration is shown at all. I originally concluded that when the
decorations are eventually displayed, it was something like "if ->tag is
non-NULL, display it". But that's obviously not the case. It's more like
"->tagged is NULL, so I have no idea where to place this decoration".

> I also beefed up the test a bit. All this talk of parsing made me want
> to make sure we were covering tags-of-tags correctly (which I think we
> are both before and after the patch). After adding that, the expected
> decoration output was getting quite cluttered. So I tweaked the test to
> make a new commit, give the tags sensible names, and just look at that
> one commit.

> From: Martin Ågren <martin.agren@gmail.com>

At this point, I think it's fair to say that you've done most of the
authoring here. I wouldn't be at all offended if you took the credit for
this patch. It's your code diff, it's your test, and now it's even your
*updated* test, plus half the commit message. :) Here's that added half
of the message:

> The simplest way to do this is to just conditionally parse before the
> loop:
>
>   if (obj->type == OBJ_TAG)
>           parse_object(the_repository, &obj->oid);
>
> But we can observe that our tag-peeling loop needs to peel already, to
> examine recursive tags-of-tags. So instead of introducing a new call to
> parse_object(), we can simply move the parsing higher in the loop:
> instead of parsing the new object before we loop, parse each tag object
> before we look at its "tagged" field.
>
> This has another beneficial side effect: if a tag points at a commit (or
> other non-tag type), we do not bother to parse the commit at all now.
> And we know it is a commit without calling oid_object_info(), because
> parsing the surrounding tag object will have created the correct in-core
> object based on the "type" field of the tag.
>
> Our test coverage for --decorate was obviously not good, since we missed
> this quite-basic regression. The new test covers an annotated tag
> (showing the fix), but also checks that we correctly show annotations
> for lightweight tags and double-annotated tag-of-tags.

Very well described.

> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>

If you take authorship of this, I think this could be something like

  Reported-by: Martin Ågren <martin.agren@gmail.com>
  Signed-off-by: Martin Ågren <martin.agren@gmail.com>
  Signed-off-by: Jeff King <peff@peff.net>
  Reviewed-by: Martin Ågren <martin.agren@gmail.com>

Martin
diff --git a/log-tree.c b/log-tree.c
index 4f69ed176d..6dc4412268 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -174,11 +174,11 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 
 	add_name_decoration(deco_type, refname, obj);
 	while (obj->type == OBJ_TAG) {
+		if (!obj->parsed)
+			parse_object(the_repository, &obj->oid);
 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)
 			break;
-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
 		add_name_decoration(DECORATION_REF_TAG, refname, obj);
 	}
 	return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 350cfa3593..536b1eef42 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1905,6 +1905,15 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git log --exclude-promisor-objects source-a
 '
 
+test_expect_success 'log --decorate includes lightweight and annotated tags' '
+	cat >expect <<-\EOF &&
+	three HEAD -> source-b, tag: three, tag: source-tag
+	one tag: one
+	EOF
+	git log --format="%s %D" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'log --end-of-options' '
 	git update-ref refs/heads/--source HEAD &&
 	git log --end-of-options --source >actual &&