Message ID | 20200917164743.GA3731633@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 67bb65de5ddf008cb39206354ae4b7af66c05b6c |
Series | packfile: actually set approximate_object_count_valid |
On Thu, Sep 17, 2020 at 12:47:43PM -0400, Jeff King wrote:
> So here it is wrapped up as a patch. I think it's worth fixing (as
> opposed to dropping the unused flag code). Thanks for finding it.

Yup, after reading the patch and performance timings below, I agree that
this is worth fixing and keeping instead of dropping.

> It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
> speedup when there are 1000 packs (5303.12). That's a modest speedup for
> a case that's already slow and we'd hope to avoid in general (note how
> slow it is even after, because we have to look in each of those packs
> for abbreviations). But it's a one-line change that clearly matches the
> original intent, so it seems worth doing.

Excellent.

> The included perf test may also be useful for keeping an eye on any
> regressions in the overall abbreviation code.
>
> Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  packfile.c                 | 1 +
>  t/perf/p5303-many-packs.sh | 4 ++++
>  2 files changed, 5 insertions(+)
>
> diff --git a/packfile.c b/packfile.c
> index 9ef27508f2..e69012e7f2 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -923,6 +923,7 @@ unsigned long repo_approximate_object_count(struct repository *r)
>  			count += p->num_objects;
>  		}
>  		r->objects->approximate_object_count = count;
> +		r->objects->approximate_object_count_valid = 1;
>  	}
>  	return r->objects->approximate_object_count;
>  }
> diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh
> index 7ee791669a..f4c2ab0584 100755
> --- a/t/perf/p5303-many-packs.sh
> +++ b/t/perf/p5303-many-packs.sh
> @@ -73,6 +73,10 @@ do
>  		git rev-list --objects --all >/dev/null
>  	'
>
> +	test_perf "abbrev-commit ($nr_packs)" '
> +		git rev-list --abbrev-commit HEAD >/dev/null
> +	'
> +
>  	# This simulates the interesting part of the repack, which is the
>  	# actual pack generation, without smudging the on-disk setup
>  	# between trials.
> --
> 2.28.0.982.gdd163d6eb1

Looks all very good to me. Thanks.

Thanks,
Taylor
Jeff King <peff@peff.net> writes:

> Subject: [PATCH] packfile: actually set approximate_object_count_valid
>
> The approximate_object_count() function tries to compute the count only
> once per process. But ever since it was introduced in 8e3f52d778
> (find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
> failed to actually set the "valid" flag, meaning we'd compute it fresh
> on every call.
> ...
>   Test                            origin              HEAD
>   ----------------------------------------------------------------------------
>   5303.3: rev-list (1)            28.91(28.46+0.44)   29.03(28.65+0.38) +0.4%
>   5303.4: abbrev-commit (1)       1.18(1.06+0.11)     1.17(1.02+0.14) -0.8%
>   5303.7: rev-list (50)           28.95(28.56+0.38)   29.50(29.17+0.32) +1.9%
>   5303.8: abbrev-commit (50)      3.67(3.56+0.10)     3.57(3.42+0.15) -2.7%
>   5303.11: rev-list (1000)        30.34(29.89+0.43)   30.82(30.35+0.46) +1.6%
>   5303.12: abbrev-commit (1000)   86.82(86.52+0.29)   77.82(77.59+0.22) -10.4%

Yuk, this is quite extreme.

>   5303.15: load 10,000 packs      0.08(0.02+0.05)     0.08(0.02+0.06) +0.0%
>
> It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
> speedup when there are 1000 packs (5303.12). That's a modest speedup for
> a case that's already slow and we'd hope to avoid in general (note how
> slow it is even after, because we have to look in each of those packs
> for abbreviations). But it's a one-line change that clearly matches the
> original intent, so it seems worth doing.
>
> The included perf test may also be useful for keeping an eye on any
> regressions in the overall abbreviation code.
>
> Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  packfile.c                 | 1 +
>  t/perf/p5303-many-packs.sh | 4 ++++
>  2 files changed, 5 insertions(+)

Thanks for finding and fixing. I agree that this is worth doing.
diff --git a/packfile.c b/packfile.c
index 9ef27508f2..e69012e7f2 100644
--- a/packfile.c
+++ b/packfile.c
@@ -923,6 +923,7 @@ unsigned long repo_approximate_object_count(struct repository *r)
 			count += p->num_objects;
 		}
 		r->objects->approximate_object_count = count;
+		r->objects->approximate_object_count_valid = 1;
 	}
 	return r->objects->approximate_object_count;
 }
diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh
index 7ee791669a..f4c2ab0584 100755
--- a/t/perf/p5303-many-packs.sh
+++ b/t/perf/p5303-many-packs.sh
@@ -73,6 +73,10 @@ do
 		git rev-list --objects --all >/dev/null
 	'
 
+	test_perf "abbrev-commit ($nr_packs)" '
+		git rev-list --abbrev-commit HEAD >/dev/null
+	'
+
 	# This simulates the interesting part of the repack, which is the
 	# actual pack generation, without smudging the on-disk setup
 	# between trials.
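
For readers who want to see the compute-once pattern in isolation, here is a minimal standalone sketch. It is not git's code: `struct object_store`, `approximate_count()`, `pack_sizes`, and the `recompute_calls` counter are hypothetical stand-ins invented for illustration, and the real function walks the repository's packs rather than a plain array. The sketch only tries to show why a cache guarded by a "valid" flag never takes effect when the flag is not set after the computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-repository object store (not git's struct). */
struct object_store {
	unsigned long approximate_object_count;
	unsigned approximate_object_count_valid : 1;
	const unsigned long *pack_sizes;   /* illustrative: objects per pack */
	size_t nr_packs;
	unsigned long recompute_calls;     /* demo-only counter, assumed for this sketch */
};

/* Return the cached total, summing over all packs only on the first call. */
static unsigned long approximate_count(struct object_store *o)
{
	assert(o != NULL);

	if (!o->approximate_object_count_valid) {
		unsigned long count = 0;
		size_t i;

		o->recompute_calls++;
		for (i = 0; i < o->nr_packs; i++)
			count += o->pack_sizes[i];
		o->approximate_object_count = count;
		/* Without this assignment the loop above reruns on every call. */
		o->approximate_object_count_valid = 1;
	}
	return o->approximate_object_count;
}
```

Calling `approximate_count()` twice on the same store should leave `recompute_calls` at 1. Removing the flag assignment would make every call repeat the loop over all packs, which matches the description in the commit message of the count being computed fresh on every call.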