Message ID | 20210712224152.2698500-1-martin.agren@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v1] load_ref_decorations(): fix decoration with tags | expand |
On Tue, Jul 13, 2021 at 12:41:52AM +0200, Martin Ågren wrote: > Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag > objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`: > Rather than calling `parse_object()` directly, call `oid_object_info()` > and then either return early or go on to call `lookup_object_by_type()` > using the type just discovered. As detailed in the commit message, this > provides a significant time saving. > > Unfortunately, it also changes the behavior. As an example, on git.git, > > git log --oneline --decorate origin/master | grep '(.*tag:.*)' > > returns zero hits after 88473c8bae. Before it, we have around 800 hits. > What happens is, all annotated tags are lost from the decoration. Eek. Thanks for catching this. I wondered how I could have missed this, but it does work if somebody else happens to have parsed it. For example: $ git log -1 --oneline --decorate v5.11 f40ddce88593 (tag: v5.11) Linux 5.11 works because we'll already have parsed it as a traversal tip. > Let's do a partial revert of 88473c8bae: Keep doing that early > `oid_object_info()`, but after that, go on with the full > `parse_object()`. This restores the pre-88473c8bae behavior. Your fix is _almost_ there. There's not much point in doing oid_object_info() if we're just going to parse unconditionally. But we would want to parse only tags. And we even do parse tags recursively in the peeling loop. The trouble is that we do so after realizing we need to recurse. We just need to bump it up in the loop, like: diff --git a/log-tree.c b/log-tree.c index 4f69ed176d..6dc4412268 100644 --- a/log-tree.c +++ b/log-tree.c @@ -174,11 +174,11 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid, add_name_decoration(deco_type, refname, obj); while (obj->type == OBJ_TAG) { + if (!obj->parsed) + parse_object(the_repository, &obj->oid); obj = ((struct tag *)obj)->tagged; if (!obj) break; - if (!obj->parsed) - parse_object(the_repository, &obj->oid); add_name_decoration(DECORATION_REF_TAG, refname, obj); } return 0; That's the minimum needed to unbreak things. I think we could do even better, though. There is no need for us to parse a commit object pointed to by a tag here. We should only be parsing tags we see (whether at the top-level or recursively). Do you want to incorporate the fix above, or would you prefer me to pick it up from here? > We clearly > have lacking test coverage here, so make sure to add a test. Without > this fix to log-tree.c, the git-log invocation in this new test does > pick up the lightweight tags involved, but misses out on the annotated > tag. Yeah, definitely. > In my brief testing (time git log -1 --decorate on git.git), the time > savings from 88473c8bae seem to be gone. So maybe this should be a full > revert, rather than a partial one. (Plus the test.) Let's hope that > won't be necessary. Right. After your patch, the oid_object_info() is pointless, because we're still parsing everything (the "< 0" error check would only trigger in a corrupted repo). And it adds some overhead, so it may even be slightly slower than the original code. :) The timings I get for "git log -1 --decorate" on git.git are: - before either patch: 27.5ms - with my broken patch: 5.9ms - with the patch above: 11.3ms which makes sense. I have a bunch of branches, and now we don't parse them. We do still have to parse tags. On linux.git, where it's almost entirely tags, most of the advantage dries up (but it would probably be helped a bit by the further suggestion I gave above to avoid parsing tagged commits). On my big ~220k ref test case (where it's mostly non-tags), the timings are: - before either patch: 2.945s - with my broken patch: 0.707s - with the patch above: 0.788s so the savings are retained. > Also, I'm not sure whether the test really needs to worry about the > order of the decorations suddenly changing -- maybe it's supposed to be > stable. I think it's probably OK to count on it being stable. We iterate the refs in a stable order to insert them, and then store the result as a linked list. If that strategy ever changed, I think we'd end up doing a manual sort on them to get a stable order anyway. > diff --git a/t/t4202-log.sh b/t/t4202-log.sh > index 350cfa3593..3aa5451913 100755 > --- a/t/t4202-log.sh > +++ b/t/t4202-log.sh > @@ -1905,6 +1905,20 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' ' > test_must_fail git log --exclude-promisor-objects source-a > ' > > +test_expect_success 'log --decorate includes, e.g., all kinds of tags' ' > + git log --oneline --decorate >log && > + test_line_count = 2 log && > + grep "^1ac6c77 (tag: one) one$" log && This presumably breaks when the tests are run in sha256 mode. Coupled with the ordering simplification, maybe: cat >expect <<-\EOF && three HEAD -> source-b, tag: three, tag: source-tag one tag: one EOF git log --format="%s %D" >actual && test_cmp expect actual (or maybe %d is more readable in the output; it doesn't matter much if we're matching it verbatim). Thanks again for noticing this. -Peff
diff --git a/log-tree.c b/log-tree.c index 4f69ed176d..0b638d2e3c 100644 --- a/log-tree.c +++ b/log-tree.c @@ -134,7 +134,6 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid, int flags, void *cb_data) { struct object *obj; - enum object_type objtype; enum decoration_type deco_type = DECORATION_NONE; struct decoration_filter *filter = (struct decoration_filter *)cb_data; @@ -156,10 +155,9 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid, return 0; } - objtype = oid_object_info(the_repository, oid, NULL); - if (objtype < 0) + if (oid_object_info(the_repository, oid, NULL) < 0) return 0; - obj = lookup_object_by_type(the_repository, oid, objtype); + obj = parse_object(the_repository, oid); if (starts_with(refname, "refs/heads/")) deco_type = DECORATION_REF_LOCAL; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 350cfa3593..3aa5451913 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -1905,6 +1905,20 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' ' test_must_fail git log --exclude-promisor-objects source-a ' +test_expect_success 'log --decorate includes, e.g., all kinds of tags' ' + git log --oneline --decorate >log && + test_line_count = 2 log && + grep "^1ac6c77 (tag: one) one$" log && + grep HEAD log | sed -e "s/^.*(\(.*\)).*$/\1/" -e "s/, /\n/g" | + sort >actual && + cat >expect <<-\EOF && + HEAD -> source-b + tag: source-tag + tag: three + EOF + test_cmp expect actual +' + test_expect_success 'log --end-of-options' ' git update-ref refs/heads/--source HEAD && git log --end-of-options --source >actual &&
Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`: Rather than calling `parse_object()` directly, call `oid_object_info()` and then either return early or go on to call `lookup_object_by_type()` using the type just discovered. As detailed in the commit message, this provides a significant time saving. Unfortunately, it also changes the behavior. As an example, on git.git, git log --oneline --decorate origin/master | grep '(.*tag:.*)' returns zero hits after 88473c8bae. Before it, we have around 800 hits. What happens is, all annotated tags are lost from the decoration. Let's do a partial revert of 88473c8bae: Keep doing that early `oid_object_info()`, but after that, go on with the full `parse_object()`. This restores the pre-88473c8bae behavior. We clearly have lacking test coverage here, so make sure to add a test. Without this fix to log-tree.c, the git-log invocation in this new test does pick up the lightweight tags involved, but misses out on the annotated tag. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- Cc-ing Peff, the author of 88473c8bae, and Taylor, who made a comment regarding the peeling of tags [1], which might be related. I'm genuinely unconvinced that my proposed fix is the best possible one. Or maybe trying a more surgical fix around annotated tags misses a whole bunch of *other* special cases and we really should go for the full `parse_object()` to cover all possibilities. In my brief testing (time git log -1 --decorate on git.git), the time savings from 88473c8bae seem to be gone. So maybe this should be a full revert, rather than a partial one. (Plus the test.) Let's hope that won't be necessary. Also, I'm not sure whether the test really needs to worry about the order of the decorations suddenly changing -- maybe it's supposed to be stable. [1] https://lore.kernel.org/git/YNKgkGkPiMgNubNE@nand.local/ log-tree.c | 6 ++---- t/t4202-log.sh | 14 ++++++++++++++ 2 files changed, 16 insertions(+), 4 deletions(-)