Message ID | 7687dedd4722c39b5ecef2c2165147c25d16b8d9.1624858240.git.ps@pks.im (mailing list archive)
---|---
State | New, archived
Series | Speed up connectivity checks via bitmaps
On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote:

> As expected, performance doesn't change in cases where we do not have a
> bitmap available given that the old code path still kicks in. In case we
> do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
> is slower in a "normal" clone of linux.git, it is significantly faster
> for a clone with lots of references. The slowness can potentially be
> explained by the overhead of loading the bitmap. On the other hand, the
> new code is faster as expected in repos which have lots of references
> given that we do not have to mark all negative references anymore.

I haven't had a chance to look closely at your patches yet, but I like
the idea of using an object's presence in the reachability bitmap to
perform the connectivity checks.

I have wondered how much performance we could eke out by being able to
load the .bitmap file without having to read each individual bitmap
contained in it. (I believe Peff mentioned this elsewhere, but) I would
be interested in something as simple as an optional .bitmap extension
which indicates the list of commits which have a bitmap, and their
offset within the bitmap.

I'll try this out myself and see if it's worth it. (As an aside, I'll be
offline next week, so it may take me a little while to post something to
the list).
On Mon, Jun 28, 2021 at 04:23:23PM -0400, Taylor Blau wrote:

> I'll try this out myself and see if it's worth it. (As an aside, I'll be
> offline next week, so it may take me a little while to post something to
> the list).

I gave implementing this a shot and it seems to have produced some good
improvements, although there are definitely some areas where it does
better than others.

Here are some results running on linux.git with a cold cache, counting
objects for commit 2ab38c17aa, which I picked deliberately since I know
it has a bitmap:

    $ hyperfine \
        'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
        'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
        --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

    Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
      Time (mean ± σ):     141.1 ms ±   2.5 ms    [User: 13.0 ms, System: 64.3 ms]
      Range (min … max):   136.2 ms … 143.4 ms    10 runs

    Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
      Time (mean ± σ):      28.7 ms ±   3.2 ms    [User: 6.5 ms, System: 10.0 ms]
      Range (min … max):    22.0 ms …  31.0 ms    21 runs

    Summary
      'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' ran
        4.91 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2'

That's sort of a best-case scenario, because we're not doing any
traversal between the bitmapped commits and the traversal tips. But even
if we do have some traversal, the results are still pretty good.
Swapping out 2ab38c17aa for `--branches` yields a 5.02x improvement from
141.0ms down to 28.1ms.

Adding in `--tags` basically negates any improvement (having the commit
table extension ekes out a 1.03x improvement from 645.7ms down to
626.0ms). `perf record` shows that 30% of time is spent outside of the
bitmap code.

If you want to give this a try yourself, I highly recommend generating
your bitmap while packing with `-c pack.writeReverseIndex`. Building a
reverse index on-the-fly also seems to negate any performance
improvements here, so having an on-disk reverse index is more or less a
prerequisite to testing this out.

Extremely gross and inscrutable code can be found on the
'tb/bitmap-commit-table' branch of my fork [1].

Thanks,
Taylor

[1]: https://github.com/git/git/compare/master...ttaylorr:tb/bitmap-commit-table
On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote: > As expected, performance doesn't change in cases where we do not have a > bitmap available given that the old code path still kicks in. In case we > do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1) > is slower in a "normal" clone of linux.git, it is significantly faster > for a clone with lots of references. The slowness can potentially be > explained by the overhead of loading the bitmap. On the other hand, the > new code is faster as expected in repos which have lots of references > given that we do not have to mark all negative references anymore. Hmm. We _do_ still have to mark those negative references now, though (the bitmap code still considers each as a reachability tip for the "have" side of the traversal). It's just that we may have to do less traversal on them, if they're mentioned by other bitmaps. So in that sense I don't think your "a ref for every commit" cases are all that interesting. Any bitmap near the tip of history is going to include a bit for all those old commits, because our fake set of refs are all reachable. A much more interesting history is when you have a bunch of little unreachable spikes coming off the main history. This is common if you have a lot of branches in the repo, but also if you maintain a lot of book-keeping refs (like the refs/pull/* we do at GitHub; I assume GitLab does something similar). Here are some real-world numbers from one of the repos that gives us frequent problems with bitmaps. refs/pull/9937/head in this case is an unmerged PR with 8 commits on it. [without bitmaps, full check but with count to suppress output] $ time git rev-list --count refs/pull/9937/head --objects --not --all 0 real 0m1.280s user 0m1.131s sys 0m0.148s [same, but with bitmaps] $ time git rev-list --count refs/pull/9937/head --objects --not --all --use-bitmap-index 0 real 1m38.146s user 1m30.015s sys 0m3.443s Yikes. Now this is a pretty extreme case, as it has a lot of bookkeeping refs. If we limited ourselves to just the branches (in which case our unmerged PR will appear to have a couple new commits), though, we still get: $ time git rev-list --count refs/pull/9937/head --objects --not --branches 64 real 0m0.366s user 0m0.272s sys 0m0.093s $ time git rev-list --count refs/pull/9937/head --objects --not --branches --use-bitmap-index 61 real 0m10.372s user 0m9.633s sys 0m0.736s which is still a pretty bad regression (the difference in the output is expected; the regular traversal is not as thorough at finding objects which appear in non-contiguous sections of history). Again, this is one of the repositories that routinely gives us problems. But even on git/git, which is usually not a problematic repo, I get: $ time git rev-list refs/pull/986/head --objects --not --all real 0m0.121s user 0m0.024s sys 0m0.097s $ time git rev-list refs/pull/986/head --objects --not --all --use-bitmap-index real 0m12.406s user 0m5.843s sys 0m0.734s So even if this tradeoff might help on balance, it really makes some cases pathologically bad. -Peff
On Tue, Jun 29, 2021 at 06:44:03PM -0400, Taylor Blau wrote: > $ hyperfine \ > 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \ > 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \ > --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' > > Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 > Time (mean ± σ): 141.1 ms ± 2.5 ms [User: 13.0 ms, System: 64.3 ms] > Range (min … max): 136.2 ms … 143.4 ms 10 runs > > Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 > Time (mean ± σ): 28.7 ms ± 3.2 ms [User: 6.5 ms, System: 10.0 ms] > Range (min … max): 22.0 ms … 31.0 ms 21 runs > > Summary > 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' ran > 4.91 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' I was curious why your machine is so much slower than mine. With the current bitmap format, I can run that command pretty consistently in ~22ms. But I think the trick here is the cache-dropping. The cold-cache performance is going to be very dependent on faulting in the extra bytes (and you can see that the actual CPU time in the first case is much smaller than the runtime, so it really is waiting on the disk). In the warm-cache case, the improvement seems to go away (or maybe I'm holding it wrong; even in the cold-cache case I don't get anywhere near as impressive a speedup as you showed above). Which isn't to say that cold-cache isn't sometimes important, and this may still be worth doing. But it really seems like the CPU involve in walking over the file isn't actually that much. I got an even more curious result when adding in "--not --all" (which the connectivity check under discussion would do). There the improvement from your patch should be even less, because we'd end up reading most of the bitmaps anyway. But I got: $ hyperfine \ 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' \ 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' \ --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all Time (mean ± σ): 4.197 s ± 0.823 s [User: 284.7 ms, System: 734.5 ms] Range (min … max): 2.612 s … 5.009 s 10 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all Time (mean ± σ): 4.498 s ± 0.612 s [User: 315.3 ms, System: 829.7 ms] Range (min … max): 3.010 s … 5.072 s 10 runs Summary 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' ran 1.07 ± 0.26 times faster than 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' which was actually faster _without_ the extra table. 
7% isn't a lot, especially for a cold-cache test, so that may just be noise. But it doesn't seem like a clear win to me. -Peff
On Tue, Jun 29, 2021 at 10:04:56PM -0400, Jeff King wrote: > In the warm-cache case, the improvement seems to go away (or maybe I'm > holding it wrong; even in the cold-cache case I don't get anywhere near > as impressive a speedup as you showed above). Which isn't to say that > cold-cache isn't sometimes important, and this may still be worth doing. > But it really seems like the CPU involve in walking over the file isn't > actually that much. Hmm. I think that you might be holding it wrong, or at least I'm able to reproduce some substantial improvements in the warm cache case with limited traversals. Here are a few runs of the same hyperfine invocation, just swapping the `--prepare` which drops the disk cache with `--warmup 3` which populates them. $ hyperfine \ 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \ 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \ --warmup 3 Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 Time (mean ± σ): 23.1 ms ± 6.4 ms [User: 9.4 ms, System: 13.6 ms] Range (min … max): 13.8 ms … 35.8 ms 161 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 Time (mean ± σ): 11.2 ms ± 1.8 ms [User: 7.5 ms, System: 3.7 ms] Range (min … max): 4.7 ms … 12.6 ms 238 runs Swapping just loading an individual commit to look at all branches, I get the following 2.01x improvement: Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index --branches Time (mean ± σ): 22.5 ms ± 5.8 ms [User: 8.5 ms, System: 14.0 ms] Range (min … max): 14.1 ms … 34.9 ms 157 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index --branches Time (mean ± σ): 11.2 ms ± 1.9 ms [User: 7.1 ms, System: 4.1 ms] Range (min … max): 4.7 ms … 13.4 ms 239 runs But there are some diminishing returns when I include --tags, too, since I assume that there is some more traversal involved in older parts of the kernel's history which aren't as well covered by bitmaps. But it's still an improvement of 1.17x (give or take .31x, according to hyperfine). Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index --branches --tags Time (mean ± σ): 66.8 ms ± 12.4 ms [User: 43.6 ms, System: 23.1 ms] Range (min … max): 54.4 ms … 92.3 ms 39 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index --branches --tags Time (mean ± σ): 57.3 ms ± 10.9 ms [User: 37.5 ms, System: 19.8 ms] Range (min … max): 44.0 ms … 79.5 ms 45 runs > I got an even more curious result when adding in "--not --all" (which > the connectivity check under discussion would do). There the improvement > from your patch should be even less, because we'd end up reading most of > the bitmaps anyway. But I got: Interesting. Discussion about that series aside, I go from 28.6ms without reading the table to 35.1ms reading it, which is much better in absolute timings, but much worse relatively speaking. I can't quite seem to make sense of the perf report on that command. Most of the time is spent faulting pages in, but most of the time measured in the "git" object is in ubc_check. I don't really know how to interpret that, but I'd be curious if you had any thoughts. 
I was just looking at:

    $ GIT_READ_COMMIT_TABLE=1 perf record -F997 -g \
        git.compile rev-list --count --objects \
        --use-bitmap-index 2ab38c17aac --not --all
    $ perf report

Thanks,
Taylor
On Tue, Jun 29, 2021 at 11:07:47PM -0400, Taylor Blau wrote:

> On Tue, Jun 29, 2021 at 10:04:56PM -0400, Jeff King wrote:
> > In the warm-cache case, the improvement seems to go away (or maybe I'm
> > holding it wrong; even in the cold-cache case I don't get anywhere near
> > as impressive a speedup as you showed above). Which isn't to say that
> > cold-cache isn't sometimes important, and this may still be worth doing.
> > But it really seems like the CPU involve in walking over the file isn't
> > actually that much.
>
> Hmm. I think that you might be holding it wrong, or at least I'm able to
> reproduce some substantial improvements in the warm cache case with
> limited traversals.

OK, I definitely was holding it wrong. It turns out that it helps to run
the custom version of Git when passing in the pack.writebitmapcomittable
option. (I regret there is no portable way to communicate a facepalm
image over plain text).

So that helped, but I did see some other interesting bits. Here's my
cold-cache time for the commit you selected:

    $ export commit=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
    $ hyperfine \
        -L table 0,1 \
        'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \
        --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

    Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
      Time (mean ± σ):      1.420 s ±  0.162 s    [User: 196.1 ms, System: 293.7 ms]
      Range (min … max):    1.083 s …  1.540 s    10 runs

    Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
      Time (mean ± σ):      1.319 s ±  0.256 s    [User: 171.1 ms, System: 237.1 ms]
      Range (min … max):    0.773 s …  1.588 s    10 runs

    Summary
      'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
        1.08 ± 0.24 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

So better, but well within the noise (and I had a few runs where it
actually performed worse). But you picked that commit because you knew
it was bitmapped, and it's not in my repo.
If I switch to a commit that is covered in my repo, then I get similar results to yours: $ export commit=9b1ea29bc0d7b94d420f96a0f4121403efc3dd85 $ hyperfine \ -L table 0,1 \ 'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \ --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit Time (mean ± σ): 309.3 ms ± 61.0 ms [User: 19.4 ms, System: 79.0 ms] Range (min … max): 154.4 ms … 386.7 ms 10 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit Time (mean ± σ): 33.7 ms ± 2.5 ms [User: 3.3 ms, System: 3.6 ms] Range (min … max): 31.5 ms … 46.5 ms 33 runs Summary 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran 9.19 ± 1.94 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit' And the effect continues in the warm cache case, though the absolute numbers are much tinier: Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit Time (mean ± σ): 12.0 ms ± 0.3 ms [User: 4.6 ms, System: 7.4 ms] Range (min … max): 11.4 ms … 13.2 ms 219 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit Time (mean ± σ): 3.0 ms ± 0.4 ms [User: 2.3 ms, System: 0.8 ms] Range (min … max): 2.6 ms … 5.5 ms 851 runs Summary 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran 3.95 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit' That implies to me that yes, it really is saving time, especially in the cold-cache case. But if you have to do any actual fill-in traversal, the benefits get totally lost in the noise. _Especially_ in the cold-cache case, because then we're faulting in the actual object data, etc. By the way, one other thing I noticed is that having a fully-build commit-graph also made a big difference (much bigger than this patch), especially for the non-bitmapped commit. Which makes sense, since it is saving us from loading commit objects from disk during fill-in traversal. So I dunno. There's absolutely savings for some cases, but I suspect in practice it's not going to really be noticeable. Part of me says "well, if it ever provides a benefit and there isn't a downside, why not?". So just devil's advocating on downsides for a moment: - there's some extra complexity in the file format and code to read and write these (and still fall back to the old system when they're absent). I don't think it's a deal-breaker, as it's really not that complicated a feature. - we're using extra bytes on disk (and the associated cold block cache effects there). It's not very many bytes, though (I guess 20 for the hash, plus a few offset integers; if we wanted to really penny-pinch, we could probably store 32-bit pointers to the hashes in the associated .idx file, at the cost of an extra level of indirection while binary searching). But that is only for a few hundred commits that are bitmapped, not all of them. And it's balanced by not needing to allocate a similar in-memory lookup table in each command. So it's probably a net win. > > I got an even more curious result when adding in "--not --all" (which > > the connectivity check under discussion would do). 
There the improvement > > from your patch should be even less, because we'd end up reading most of > > the bitmaps anyway. But I got: > > Interesting. Discussion about that series aside, I go from 28.6ms > without reading the table to 35.1ms reading it, which is much better in > absolute timings, but much worse relatively speaking. I suspect that's mostly noise. With "--not --all" in a regular linux.git repo, I don't find any statistical difference. In a fake repo with one ref per commit, everything is even more lost in the noise (because we spend like 2000ms loading up all the tip commits). I suspect it's worse on a real repo with lots of refs (the "spiky branches" thing I mentioned earlier in the thread), since there we'd have to do even more fill-in traversal. > I can't quite seem to make sense of the perf report on that command. > Most of the time is spent faulting pages in, but most of the time > measured in the "git" object is in ubc_check. I don't really know how to > interpret that, but I'd be curious if you had any thoughts. ubc_check() is basically "computing sha1" (we check the sha1 on every object we call parse_object() for). I'd guess it's just time spent loading the negative tips (commits if you don't have a commit graph, plus tags to peel, though I guess we should be using the packed-refs peel optimization here). -Peff
On Wed, Jun 30, 2021 at 01:45:03AM -0400, Jeff King wrote: > That implies to me that yes, it really is saving time, especially in the > cold-cache case. But if you have to do any actual fill-in traversal, the > benefits get totally lost in the noise. _Especially_ in the cold-cache > case, because then we're faulting in the actual object data, etc. That's definitely true. I would say that any patches in this direction would have the general sense of "this helps in some cases where we don't have to do much traversal by eliminating an unnecessarily eager part of loading bitmaps, and does not make anything worse when the bitmap coverage is incomplete (and requires traversing)." I would add that these effects change with the size of the bitmap. Let's just consider the "count the number of objects in a bitmapped commit". On my local copy of the kernel, I see a relatively modest improvement: $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2 $ hyperfine \ 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \ 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \ --warmup=3 Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 21.5 ms ± 5.6 ms [User: 8.7 ms, System: 12.7 ms] Range (min … max): 12.4 ms … 34.2 ms 170 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 10.6 ms ± 1.6 ms [User: 7.1 ms, System: 3.5 ms] Range (min … max): 4.5 ms … 11.9 ms 258 runs but on my copy of the kernel's fork network repo (that containing all of torvalds/linux's objects, as well as all of its fork's objects, too), the magnitude of the effect is much bigger: Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 332.3 ms ± 12.6 ms [User: 210.4 ms, System: 121.8 ms] Range (min … max): 322.7 ms … 362.4 ms 10 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 260.0 ms ± 9.3 ms [User: 191.0 ms, System: 69.0 ms] Range (min … max): 250.4 ms … 272.8 ms 11 runs That's a more modest 1.28x improvement (versus 2.03x in just linux.git), but the overall magnitude is much bigger. This is basically an effect of the bitmaps themselves. In the latter example, there are more bitmaps (around 1.6k of them, versus just over 500 in my copy of just the kernel), and each of them are much wider (because there are far more objects, 40.2M versus just 7.8M). So there is more work to do, and the page cache is less efficient for us because we can't fit as much of the .bitmap file in the page cache at once. > By the way, one other thing I noticed is that having a fully-build > commit-graph also made a big difference (much bigger than this patch), > especially for the non-bitmapped commit. Which makes sense, since it is > saving us from loading commit objects from disk during fill-in > traversal. Likewise having an reverse index helps a lot, too. That radix sort scales linearly with the number of objects in the bitmapped pack (plus you're paying the cost to allocate more heap, etc). This clouded up some of my timings in p5310, which made me think that it would be a good idea to `git config pack.writeReverseIndex true` in the setup for those tests, but an even better direction would be to change the default of pack.writeReverseIndex to true everywhere. > So I dunno. 
> There's absolutely savings for some cases, but I suspect in
> practice it's not going to really be noticeable. Part of me says "well,
> if it ever provides a benefit and there isn't a downside, why not?". So
> just devil's advocating on downsides for a moment:
>
>   - there's some extra complexity in the file format and code to read
>     and write these (and still fall back to the old system when they're
>     absent). I don't think it's a deal-breaker, as it's really not that
>     complicated a feature.

I agree with both of these. The complexity is manageable, I think,
especially since I dropped support for the extended offset table (having
a bitmap file that is >2GiB seems extremely unlikely to me, and it's
possible to add support for it in the future) and the fanout table
(there are usually fewer than 1k commits with bitmaps, so a 256-entry
fanout table doesn't seem to help much in benchmarking).

So what's left of the format is really just:

  - a table of object id's
  - a table of (uint32_t, uint32_t) tuples describing the (short) offset
    of the bitmap, and an index position of the xor'd bitmap (if one
    exists).

> - we're using extra bytes on disk (and the associated cold block cache
>   effects there). It's not very many bytes, though (I guess 20 for the
>   hash, plus a few offset integers; if we wanted to really
>   penny-pinch, we could probably store 32-bit pointers to the hashes
>   in the associated .idx file, at the cost of an extra level of
>   indirection while binary searching). But that is only for a few
>   hundred commits that are bitmapped, not all of them. And it's
>   balanced by not needing to allocate a similar in-memory lookup table
>   in each command. So it's probably a net win.

Worth benchmarking, at least. I'll be offline for the next ~week and a
half for my wedding, but I'll post some patches to the list shortly
after I get back.

Thanks,
Taylor
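[Editor's note: to make the shape of the proposed extension concrete, here is a
rough in-tree C sketch of an in-memory view of the two tables Taylor describes
above. The struct and function names are made up for illustration, and the
"no xor base" sentinel is an assumption; the actual series may lay this out
differently.]

	#include "cache.h"

	/*
	 * Hypothetical in-memory view of the proposed lookup-table extension:
	 * a sorted table of bitmapped commit OIDs plus, per commit, the offset
	 * of its bitmap within the .bitmap file and the table position of the
	 * bitmap it is xor'd against (if any).
	 */
	struct bitmap_lookup_entry {
		uint32_t bitmap_offset;	/* offset of the bitmap in the .bitmap file */
		uint32_t xor_pos;	/* position of the xor'd bitmap, or 0xffffffff for "none" */
	};

	struct bitmap_lookup_table {
		uint32_t nr;				/* number of bitmapped commits */
		const struct object_id *oids;		/* sorted table of commit OIDs */
		const struct bitmap_lookup_entry *entries;
	};

	/* Binary search for a commit; returns its table position, or -1 if not bitmapped. */
	static int bitmap_lookup_pos(const struct bitmap_lookup_table *table,
				     const struct object_id *oid)
	{
		uint32_t lo = 0, hi = table->nr;

		while (lo < hi) {
			uint32_t mid = lo + (hi - lo) / 2;
			int cmp = oidcmp(oid, &table->oids[mid]);

			if (!cmp)
				return mid;
			else if (cmp < 0)
				hi = mid;
			else
				lo = mid + 1;
		}
		return -1;
	}

The point of such a table is that a reader can find (and fault in) only the
bitmaps it actually needs, instead of walking every entry in the .bitmap file
up front.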
On Fri, Jul 02, 2021 at 01:44:12PM -0400, Taylor Blau wrote: > I would add that these effects change with the size of the bitmap. > Let's just consider the "count the number of objects in a bitmapped > commit". On my local copy of the kernel, I see a relatively modest > improvement: > > $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2 > $ hyperfine \ > 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \ > 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \ > --warmup=3 > Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip > Time (mean ± σ): 21.5 ms ± 5.6 ms [User: 8.7 ms, System: 12.7 ms] > Range (min … max): 12.4 ms … 34.2 ms 170 runs > > Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip > Time (mean ± σ): 10.6 ms ± 1.6 ms [User: 7.1 ms, System: 3.5 ms] > Range (min … max): 4.5 ms … 11.9 ms 258 runs > > but on my copy of the kernel's fork network repo (that containing all of > torvalds/linux's objects, as well as all of its fork's objects, too), > the magnitude of the effect is much bigger: > > Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip > Time (mean ± σ): 332.3 ms ± 12.6 ms [User: 210.4 ms, System: 121.8 ms] > Range (min … max): 322.7 ms … 362.4 ms 10 runs > > Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip > Time (mean ± σ): 260.0 ms ± 9.3 ms [User: 191.0 ms, System: 69.0 ms] > Range (min … max): 250.4 ms … 272.8 ms 11 runs > > That's a more modest 1.28x improvement (versus 2.03x in just linux.git), > but the overall magnitude is much bigger. Thanks, this is much more compelling. 70ms is a lot of startup time to save. I am a little surprised that a no-traversal bitmap query like this would still take 300ms. I wonder if 2ab38c17aac actually got a bitmap in your second example (and if not, then there are probably cases where the relative speedup would be even more impressive). > This clouded up some of my timings in p5310, which made me think that it > would be a good idea to `git config pack.writeReverseIndex true` in the > setup for those tests, but an even better direction would be to change > the default of pack.writeReverseIndex to true everywhere. Yes, I'd be in favor of that. IMHO the reason to make it configurable at all was not because it's ever a bad idea, but just to phase it in and get experience with it (and to give an escape hatch for debugging it). It's probably _less_ useful for local clones that are not serving fetches. But every push is already generating the same thing in-memory, so it seems like a good tradeoff to just use it everywhere. > > - there's some extra complexity in the file format and code to read > > and write these (and still fall back to the old system when they're > > absent). I don't think it's a deal-breaker, as it's really not that > > complicated a feature. > > I agree with both of these. The complexity is manageable, I think, > especially since I dropped support for the extended offset table (having > a bitmap file that is >2GiB seems extremely unlikely to me, and it's > possible to add support for it in the future) and > fanout table (there are usually less than <1k commits with bitmaps, so > a 256-entry fanout table doesn't seem to help much in benchmarking). 
> > So what's left of the format is really just: > > - a table of object id's > - a table of (uint32_t, uint32_t) tuples describing the (short) offset > of the bitmap, and an index position of the xor'd bitmap (if one > exists). Yeah, that really seems quite simple. I'd have to judge after seeing the cleaned up code, but I suspect it's not going to be a burden. > I'll be offline for the next ~week and a half for my wedding, but I'll > post some patches to the list shortly after I get back. Yep, no rush. Thanks for looking into this. -Peff
On Tue, Jun 29, 2021 at 09:51:33PM -0400, Jeff King wrote:

> On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote:
>
> > As expected, performance doesn't change in cases where we do not have a
> > bitmap available given that the old code path still kicks in. In case we
> > do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
> > is slower in a "normal" clone of linux.git, it is significantly faster
> > for a clone with lots of references. The slowness can potentially be
> > explained by the overhead of loading the bitmap. On the other hand, the
> > new code is faster as expected in repos which have lots of references
> > given that we do not have to mark all negative references anymore.
>
> Hmm. We _do_ still have to mark those negative references now, though
> (the bitmap code still considers each as a reachability tip for the
> "have" side of the traversal). It's just that we may have to do less
> traversal on them, if they're mentioned by other bitmaps.
>
> So in that sense I don't think your "a ref for every commit" cases are
> all that interesting. Any bitmap near the tip of history is going to
> include a bit for all those old commits, because our fake set of refs
> are all reachable. A much more interesting history is when you have a
> bunch of little unreachable spikes coming off the main history.
>
> This is common if you have a lot of branches in the repo, but also if
> you maintain a lot of book-keeping refs (like the refs/pull/* we do at
> GitHub; I assume GitLab does something similar).
>
> Here are some real-world numbers from one of the repos that gives us
> frequent problems with bitmaps. refs/pull/9937/head in this case is an
> unmerged PR with 8 commits on it.

Yeah, this kind of brings us back to the old topic of pathological
performance combined with bitmaps. As I said in the cover letter, I
haven't been particularly happy with the results of this version, but
rather intended it as an RFC.

Taylor's extension does look quite interesting, but ultimately I'm not
sure whether we want to use bitmaps for connectivity checks. Your
spiky-branches example neatly highlights that it cannot really work in
the general case.

I wonder where that leaves us. I'm out of ideas on how to solve this in
the general case for any push/connectivity check, so I guess that any
alternative approach would instead make use of heuristics. In the
current context, I care mostly about the user-side context, which is
interactive pushes. Without knowing the numbers, my bet is that the most
frequent use case here is the push of a single branch with only a bunch
of commits. If the pushed commit is a descendant of any existing commit,
then we can limit the connectivity check to `git rev-list --objects
$newoid --not $oldoid` instead of `--not --all`. There are two issues:

  - "descendant of any existing commit" is again the same territory
    performance-wise as `--all`. So we can heuristically limit this
    either to the to-be-updated target reference if it exists, or HEAD.

  - Calculating ancestry can be expensive if there are too many commits
    in between or if history is unrelated. We may limit this check to a
    small number, like only checking the most recent 16 commits.

If these conditions hold, then we can do the above optimized check,
otherwise we fall back to the old check. Do we actually gain anything by
this? The following was executed with linux.git and stable tags.
afeb391 is an empty commit on top of master.
    Benchmark #1: git rev-list afeb391 --not --all
      Time (mean ± σ):      64.1 ms ±   8.0 ms    [User: 52.8 ms, System: 11.1 ms]
      Range (min … max):    58.2 ms …  79.5 ms    37 runs

    Benchmark #2: git rev-list afeb391 --not master
      Time (mean ± σ):       1.6 ms ±   0.5 ms    [User: 1.0 ms, System: 1.0 ms]
      Range (min … max):     0.4 ms …   2.2 ms    1678 runs

Obviously not a real-world example, but it serves as a hint that it
would help in some cases and potentially pay out quite well.

Patrick
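[Editor's note: a minimal C sketch of the heuristic Patrick describes above,
written against Git's in-tree commit API. The helper name, the first-parent
restriction, and the 16-commit limit passed by the caller are assumptions for
illustration only, not part of the series.]

	#include "cache.h"
	#include "commit.h"

	/*
	 * Hypothetical helper: check whether "old_oid" is an ancestor of
	 * "new_oid" within the most recent `limit` commits, walking
	 * first-parent history only to keep the check cheap. If it is, a
	 * caller could run the connectivity check as
	 * "rev-list --objects $new --not $old" instead of "--not --all";
	 * otherwise it would fall back to the full check.
	 */
	static int is_recent_descendant(struct repository *repo,
					const struct object_id *new_oid,
					const struct object_id *old_oid,
					unsigned int limit)
	{
		struct commit *commit = lookup_commit_reference_gently(repo, new_oid, 1);

		while (commit && limit--) {
			if (oideq(&commit->object.oid, old_oid))
				return 1;
			if (repo_parse_commit(repo, commit) || !commit->parents)
				break;
			commit = commit->parents->item; /* first parent only */
		}
		return 0;
	}

A push path could then call, say, is_recent_descendant(the_repository,
&new_oid, &old_oid, 16) against the to-be-updated ref (or HEAD) before
deciding which rev-list invocation to use.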
diff --git a/connected.c b/connected.c
index b18299fdf0..474275a52f 100644
--- a/connected.c
+++ b/connected.c
@@ -6,6 +6,127 @@
 #include "transport.h"
 #include "packfile.h"
 #include "promisor-remote.h"
+#include "object.h"
+#include "tree-walk.h"
+#include "commit.h"
+#include "tag.h"
+#include "progress.h"
+#include "oidset.h"
+#include "pack-bitmap.h"
+
+#define QUEUED (1u<<0)
+
+static int queue_object(struct repository *repo,
+			struct bitmap_index *bitmap,
+			struct object_array *pending,
+			const struct object_id *oid)
+{
+	struct object *object;
+
+	/*
+	 * If the object ID is part of the bitmap, then we know that it must by
+	 * definition be reachable in the target repository and be fully
+	 * connected. We can thus skip checking the objects' referenced
+	 * objects.
+	 */
+	if (bitmap_position(bitmap, oid) >= 0)
+		return 0;
+
+	/* Otherwise the object is queued up for a connectivity check. */
+	object = parse_object(repo, oid);
+	if (!object) {
+		/* Promisor objects do not need to be traversed. */
+		if (is_promisor_object(oid))
+			return 0;
+		return -1;
+	}
+
+	/*
+	 * If the object has been queued before already, then we don't need to
+	 * do so again.
+	 */
+	if (object->flags & QUEUED)
+		return 0;
+	object->flags |= QUEUED;
+
+	/*
+	 * We do not need to queue up blobs given that they don't reference any
+	 * other objects anyway.
+	 */
+	if (object->type == OBJ_BLOB)
+		return 0;
+
+	add_object_array(object, NULL, pending);
+
+	return 0;
+}
+
+static int check_object(
+	struct repository *repo,
+	struct bitmap_index *bitmap,
+	struct object_array *pending,
+	const struct object *object)
+{
+	int ret = 0;
+
+	if (object->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *)object;
+		struct tree_desc tree_desc;
+		struct name_entry entry;
+
+		if (init_tree_desc_gently(&tree_desc, tree->buffer, tree->size))
+			return error(_("cannot parse tree entry"));
+		while (tree_entry_gently(&tree_desc, &entry))
+			ret |= queue_object(repo, bitmap, pending, &entry.oid);
+
+		free_tree_buffer(tree);
+	} else if (object->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) object;
+		struct commit_list *parents;
+
+		for (parents = commit->parents; parents; parents = parents->next)
+			ret |= queue_object(repo, bitmap, pending, &parents->item->object.oid);
+
+		free_commit_buffer(repo->parsed_objects, commit);
+	} else if (object->type == OBJ_TAG) {
+		ret |= queue_object(repo, bitmap, pending, get_tagged_oid((struct tag *) object));
+	} else {
+		return error(_("unexpected object type"));
+	}
+
+	return ret;
+}
+
+/*
+ * If the target repository has a bitmap, then we treat all objects reachable
+ * via the bitmap as fully connected. Bitmapped objects thus serve as the
+ * boundary between old and new objects.
+ */
+static int check_connected_with_bitmap(struct repository *repo,
+				       struct bitmap_index *bitmap,
+				       oid_iterate_fn fn, void *cb_data,
+				       struct check_connected_options *opt)
+{
+	struct object_array pending = OBJECT_ARRAY_INIT;
+	struct progress *progress = NULL;
+	size_t checked_objects = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	if (opt->progress)
+		progress = start_delayed_progress("Checking connectivity", 0);
+
+	while (!fn(cb_data, &oid))
+		ret |= queue_object(repo, bitmap, &pending, &oid);
+	while (pending.nr) {
+		ret |= check_object(repo, bitmap, &pending, object_array_pop(&pending));
+		display_progress(progress, ++checked_objects);
+	}
+
+	stop_progress(&progress);
+	object_array_clear(&pending);
+	return ret;
+}
 
 /*
  * If we feed all the commits we want to verify to this command
@@ -28,12 +149,27 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	int err = 0;
 	struct packed_git *new_pack = NULL;
 	struct transport *transport;
+	struct bitmap_index *bitmap;
 	size_t base_len;
 
 	if (!opt)
 		opt = &defaults;
 	transport = opt->transport;
 
+	bitmap = prepare_bitmap_git(the_repository);
+	if (bitmap) {
+		/*
+		 * If we have a bitmap, then we can reuse the bitmap as
+		 * boundary between old and new objects.
+		 */
+		err = check_connected_with_bitmap(the_repository, bitmap,
+						  fn, cb_data, opt);
+		if (opt->err_fd)
+			close(opt->err_fd);
+		free_bitmap_index(bitmap);
+		return err;
+	}
+
 	if (fn(cb_data, &oid)) {
 		if (opt->err_fd)
 			close(opt->err_fd);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..d88a882ee1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -418,8 +418,8 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
-static int bitmap_position(struct bitmap_index *bitmap_git,
-			   const struct object_id *oid)
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid)
 {
 	int pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..7b4b386107 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -63,6 +63,8 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping
 void free_bitmap_index(struct bitmap_index *);
 int bitmap_walk_contains(struct bitmap_index *, struct bitmap *bitmap,
 			 const struct object_id *oid);
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be
The connectivity checks are currently implemented via git-rev-list(1): we
simply ignore any objects which are reachable from preexisting refs via
`--not --all`, and pass all new refs which are to be checked via its
stdin. While this works well enough for repositories which do not have a
lot of references, it's clear that `--not --all` will scale linearly with
the number of refs which do exist: for each reference, we'll walk through
its commit as well as its five parent commits (defined via `SLOP`). Given
that many major hosting sites which use a pull/merge request workflow
keep refs to the request's HEAD, this effectively means that `--not
--all` will do a nearly complete graph walk.

This commit implements an alternate strategy if the target repository
has bitmaps. Objects referenced by a bitmap are by definition always
fully connected, so they form a natural boundary between old revisions
and new revisions. With this alternate strategy, we walk all given new
object IDs. Whenever we hit any object which is covered by the bitmap,
we stop the walk.

The logic only kicks in if we have a bitmap in the repository. If not,
we wouldn't be able to efficiently abort the walk because we cannot
easily tell whether an object already exists in the target repository
and, if it does, whether it's fully connected. As a result, we'd always
have to do a graph walk in this case. Naturally, we could do the same
thing the previous git-rev-list(1) implementation did in that case and
just use preexisting branch tips as boundaries. But for now, we just
keep the old implementation for simplicity's sake given that performance
characteristics are likely not significantly different.

An easier solution may have been to simply add `--use-bitmap-index` to
the git-rev-list(1) invocation, but benchmarks have shown that this is
not effective: performance characteristics remain the same except for
some cases where the bitmap walk performs much worse compared to the
non-bitmap walk.

The following benchmarks have been performed in linux.git:

    Test                                               origin/master          HEAD
    -------------------------------------------------------------------------------------------------
    5400.4:  empty receive-pack updated:new            176.02(387.28+175.12)  176.86(388.75+175.51)  +0.5%
    5400.7:  clone receive-pack updated:new            0.10(0.09+0.01)        0.08(0.06+0.03)        -20.0%
    5400.9:  clone receive-pack updated:main           0.09(0.08+0.01)        0.09(0.06+0.03)        +0.0%
    5400.11: clone receive-pack main~10:main           0.14(0.11+0.03)        0.13(0.11+0.02)        -7.1%
    5400.13: clone receive-pack :main                  0.01(0.01+0.00)        0.02(0.01+0.00)        +100.0%
    5400.16: clone_bitmap receive-pack updated:new     0.10(0.09+0.01)        0.28(0.13+0.15)        +180.0%
    5400.18: clone_bitmap receive-pack updated:main    0.10(0.08+0.02)        0.28(0.12+0.16)        +180.0%
    5400.20: clone_bitmap receive-pack main~10:main    0.13(0.11+0.02)        0.26(0.12+0.14)        +100.0%
    5400.22: clone_bitmap receive-pack :main           0.01(0.01+0.00)        0.01(0.01+0.00)        +0.0%
    5400.25: extrarefs receive-pack updated:new        32.14(20.76+11.59)     32.35(20.52+12.03)     +0.7%
    5400.27: extrarefs receive-pack updated:main       32.08(20.54+11.75)     32.10(20.78+11.53)     +0.1%
    5400.29: extrarefs receive-pack main~10:main       32.14(20.66+11.68)     32.27(20.65+11.83)     +0.4%
    5400.31: extrarefs receive-pack :main              7.09(3.56+3.53)        7.10(3.70+3.40)        +0.1%
    5400.34: extrarefs_bitmap receive-pack updated:new 32.41(20.59+12.02)     7.36(3.76+3.60)        -77.3%
    5400.36: extrarefs_bitmap receive-pack updated:main 32.26(20.53+11.95)    7.34(3.56+3.78)        -77.2%
    5400.38: extrarefs_bitmap receive-pack main~10:main 32.44(20.77+11.90)    7.40(3.70+3.70)        -77.2%
    5400.40: extrarefs_bitmap receive-pack :main       7.09(3.62+3.46)        7.17(3.79+3.38)        +1.1%

As expected, performance doesn't change in cases where we do not have a
bitmap available given that the old code path still kicks in. In case we
do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
is slower in a "normal" clone of linux.git, it is significantly faster
for a clone with lots of references. The slowness can potentially be
explained by the overhead of loading the bitmap. On the other hand, the
new code is faster as expected in repos which have lots of references
given that we do not have to mark all negative references anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 connected.c   | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.c |   4 +-
 pack-bitmap.h |   2 +
 3 files changed, 140 insertions(+), 2 deletions(-)
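[Editor's note: for context on how the patched code is driven, here is a
hedged caller-side sketch (not part of the patch) of the check_connected()
interface the diff above hooks into. The callback hands back one object ID per
call and returns non-zero once the input is exhausted, which is the contract
check_connected_with_bitmap() relies on in its first loop; the oid_iter and
verify_new_tips names below are made up for illustration.]

	#include "cache.h"
	#include "connected.h"
	#include "oid-array.h"

	/* Hypothetical iterator state: a plain array of newly received tips. */
	struct oid_iter {
		struct oid_array *oids;
		size_t pos;
	};

	static int next_new_oid(void *cb_data, struct object_id *oid)
	{
		struct oid_iter *iter = cb_data;

		if (iter->pos >= iter->oids->nr)
			return 1;	/* non-zero signals "no more objects" */
		oidcpy(oid, &iter->oids->oid[iter->pos++]);
		return 0;
	}

	static int verify_new_tips(struct oid_array *new_tips)
	{
		struct check_connected_options opt = CHECK_CONNECTED_INIT;
		struct oid_iter iter = { new_tips, 0 };

		opt.progress = 1;
		return check_connected(next_new_oid, &iter, &opt);
	}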