diff mbox series

[v3,2/4] revision: stop retrieving reference twice

Message ID db854806498b316b4f59c33d70a1ea9c1401fdb7.1627896460.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Speed up connectivity checks | expand

Commit Message

Patrick Steinhardt Aug. 2, 2021, 9:38 a.m. UTC
When queueing up references for the revision walk, `handle_one_ref()`
will resolve the reference's object ID and then queue the ID as pending
object via `add_pending_oid()`. But `add_pending_oid()` will again try
to resolve the object ID to an object, effectively duplicating the work
its caller already did before.

Fix the issue by instead calling `add_pending_object()`, which takes the
already-resolved object as input. In a repository with lots of refs,
this translates in a nearly 10% speedup:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      5.015 s ±  0.038 s    [User: 4.698 s, System: 0.316 s]
      Range (min … max):    4.970 s …  5.089 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.606 s ±  0.029 s    [User: 4.260 s, System: 0.345 s]
      Range (min … max):    4.565 s …  4.657 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 revision.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 2, 2021, 12:53 p.m. UTC | #1
On Mon, Aug 02 2021, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> When queueing up references for the revision walk, `handle_one_ref()`
> will resolve the reference's object ID and then queue the ID as pending
> object via `add_pending_oid()`. But `add_pending_oid()` will again try
> to resolve the object ID to an object, effectively duplicating the work
> its caller already did before.
>
> Fix the issue by instead calling `add_pending_object()`, which takes the
> already-resolved object as input. In a repository with lots of refs,
> this translates in a nearly 10% speedup:
>
>     Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
>       Time (mean ± σ):      5.015 s ±  0.038 s    [User: 4.698 s, System: 0.316 s]
>       Range (min … max):    4.970 s …  5.089 s    10 runs
>
>     Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
>       Time (mean ± σ):      4.606 s ±  0.029 s    [User: 4.260 s, System: 0.345 s]
>       Range (min … max):    4.565 s …  4.657 s    10 runs
>
>     Summary
>       'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
>         1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

It might be worth calling out explicitly that it's not just
"effectively", but that add_pending_oid() is just a thin wrapper for
get_reference() followed by add_pending_object(), so we're guaranteed to
get the exact same result here as before, just without the duplicate
work.

I.e. we're not going down some other lookup path that uses object
lookups with different flags or whatever as a result of this change.
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 7eb09009ba..f06a5d63a3 100644
--- a/revision.c
+++ b/revision.c
@@ -1534,7 +1534,7 @@  static int handle_one_ref(const char *path, const struct object_id *oid,
 
 	object = get_reference(cb->all_revs, path, oid, cb->all_flags);
 	add_rev_cmdline(cb->all_revs, object, path, REV_CMD_REF, cb->all_flags);
-	add_pending_oid(cb->all_revs, path, oid, cb->all_flags);
+	add_pending_object(cb->all_revs, object, path);
 	return 0;
 }