Message ID | 08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628085347.git.ps@pks.im (mailing list archive)
State      | Accepted
Commit     | 3e5e6c6e94f09973d3a72049aad09fd2131a3648
Series     | fetch-pack: speed up loading of refs via commit graph
On 8/4/2021 9:56 AM, Patrick Steinhardt wrote:
> When doing reference negotiation, git-fetch-pack(1) is loading all refs
> from disk in order to determine which commits it has in common with the
> remote repository. This can be quite expensive in repositories with many
> references though: in a real-world repository with around 2.2 million
> refs, fetching a single commit by its ID takes around 44 seconds.
>
> Dominating the loading time is decompression and parsing of the objects
> which are referenced by commits. Given the fact that we only care about
> commits (or tags which can be peeled to one) in this context, there is
> thus an easy performance win by switching the parsing logic to make use
> of the commit graph in case we have one available.

Nice find!

> Like this, we avoid
> hitting the object database to parse these commits but instead only load
> them from the commit-graph. This results in a significant performance
> boost when executing git-fetch in said repository with 2.2 million refs:
>
> Benchmark #1: HEAD~: git fetch $remote $commit
>   Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
>   Range (min … max):   43.565 s … 44.577 s    10 runs
>
> Benchmark #2: HEAD: git fetch $remote $commit
>   Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
>   Range (min … max):   18.629 s … 20.454 s    10 runs
>
> Summary
>   'HEAD: git fetch $remote $commit' ran
>     2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

That's a great improvement. I'm sure that the remaining time is
dominated by ref parsing.

> -	if (type == OBJ_COMMIT)
> -		return (struct commit *) parse_object(the_repository, oid);
> +
> +	if (type == OBJ_COMMIT) {
> +		struct commit *commit = lookup_commit(the_repository, oid);
> +		if (!commit || repo_parse_commit(the_repository, commit))
> +			return NULL;
> +		return commit;
> +	}
> +

And this change looks obviously correct to me. I'm glad that the
implementation actually doesn't care about the commit-graph, but instead
cares about using the "standard" parsing approach instead of
side-stepping the commit-graph via parse_object().

I took a quick look for other instances where we use parse_object() but
also know to expect a commit, but did not find one.

Thanks,
-Stolee
Patrick Steinhardt <ps@pks.im> writes:

> -	if (type == OBJ_COMMIT)
> -		return (struct commit *) parse_object(the_repository, oid);
> +
> +	if (type == OBJ_COMMIT) {
> +		struct commit *commit = lookup_commit(the_repository, oid);
> +		if (!commit || repo_parse_commit(the_repository, commit))
> +			return NULL;
> +		return commit;
> +	}

OK. Asking parse_object() to turn an object name into an in-core
instance of the object is pretty much the standard and generic idiom,
but in this codepath we have already asked for the type and know it must
be a commit, so asking the more specialized parse_commit() makes quite a
lot of sense.

Thanks.
On Wed, Aug 04, 2021 at 03:56:11PM +0200, Patrick Steinhardt wrote:

> When doing reference negotiation, git-fetch-pack(1) is loading all refs
> from disk in order to determine which commits it has in common with the
> remote repository. This can be quite expensive in repositories with many
> references though: in a real-world repository with around 2.2 million
> refs, fetching a single commit by its ID takes around 44 seconds.
>
> Dominating the loading time is decompression and parsing of the objects
> which are referenced by commits. Given the fact that we only care about
> commits (or tags which can be peeled to one) in this context, there is
> thus an easy performance win by switching the parsing logic to make use
> of the commit graph in case we have one available. Like this, we avoid
> hitting the object database to parse these commits but instead only load
> them from the commit-graph. This results in a significant performance
> boost when executing git-fetch in said repository with 2.2 million refs:
>
> Benchmark #1: HEAD~: git fetch $remote $commit
>   Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
>   Range (min … max):   43.565 s … 44.577 s    10 runs
>
> Benchmark #2: HEAD: git fetch $remote $commit
>   Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
>   Range (min … max):   18.629 s … 20.454 s    10 runs
>
> Summary
>   'HEAD: git fetch $remote $commit' ran
>     2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

Nice. I've sometimes wondered if parse_object() should be doing this
optimization itself. Though we'd possibly still want callers (like this
one) to give us more hints, since we already know the type is
OBJ_COMMIT. Whereas parse_object() would have to discover that itself
(though we already incur the extra type lookup there to handle blobs).

I wonder where the remaining 20s is going. Do you have a lot of tags in
your repository? We'll still parse all of those, which could be
expensive. There might be some benefit to using peel_iterated_ref(),
which will make use of packed-refs' peel hints, but:

  - you'd want to double check that we always call this during ref
    iteration (it looks like we do, and I think peel_iterated_ref()
    falls back to a normal peel otherwise)

  - for a tag-of-tag-of-X, that will give us the complete peel to X. But
    it looks like deref_without_lazy_fetch() marks intermediate tags
    with the COMPLETE flag, too. I'm not sure how important that is
    (i.e., is it necessary for correctness, or just an optimization, in
    which case we might be better off guessing that tags are
    single-layer, as it's by far the common case).

If we don't go that route, there's another possible speedup: after
parsing a tag, the type of tag->tagged (if it is not NULL) will be known
from the tag's contents, and we can avoid the oid_object_info_extended()
type lookup. It might need some extra surgery to convince the tag-parse
not to fetch promisor objects, though.

I'm not sure it would make that big a difference, though. If we save one
type-lookup per parsed tag, then the tag parsing is likely to dwarf it.
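To make the tag->tagged idea concrete, here is a rough sketch modeled on
the deref_without_lazy_fetch() loop as it existed at the time. It is
illustrative only, not code from the thread; the _sketch suffix marks it
as hypothetical:

/*
 * Rough sketch (not from the thread): peel tags using the type already
 * recorded in the tag object, so oid_object_info_extended() runs only
 * once per chain. Two caveats from the thread apply: parse_object() may
 * lazily fetch promisor objects, and a tag's recorded type can lie; the
 * repo_parse_commit() failure path at the end catches the latter.
 */
static struct commit *deref_without_lazy_fetch_sketch(const struct object_id *oid,
						      int mark_tags_complete)
{
	enum object_type type;
	struct object_info info = { .typep = &type };

	if (oid_object_info_extended(the_repository, oid, &info,
				     OBJECT_INFO_SKIP_FETCH_OBJECT |
				     OBJECT_INFO_QUICK))
		return NULL;

	while (type == OBJ_TAG) {
		struct tag *tag = (struct tag *)
			parse_object(the_repository, oid);

		if (!tag || !tag->tagged)
			return NULL;
		if (mark_tags_complete)
			tag->object.flags |= COMPLETE;
		oid = &tag->tagged->oid;
		/* the tag names its target's type; skip the odb lookup */
		type = tag->tagged->type;
	}

	if (type == OBJ_COMMIT) {
		struct commit *commit = lookup_commit(the_repository, oid);
		if (!commit || repo_parse_commit(the_repository, commit))
			return NULL;
		return commit;
	}
	return NULL;
}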
> diff --git a/fetch-pack.c b/fetch-pack.c
> index b0c7be717c..0bf7ed7e47 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -137,8 +137,14 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>  			break;
>  		}
>  	}
> -	if (type == OBJ_COMMIT)
> -		return (struct commit *) parse_object(the_repository, oid);
> +
> +	if (type == OBJ_COMMIT) {
> +		struct commit *commit = lookup_commit(the_repository, oid);
> +		if (!commit || repo_parse_commit(the_repository, commit))
> +			return NULL;
> +		return commit;
> +	}

Looks correct. You're using lookup_commit(), so we'll auto-create the
struct as necessary. If there's any kind of type mismatch (say,
previously we saw that oid as a non-commit), we'll get NULL there and
bail, which makes sense.

I think the original code could produce undefined behavior there if
parse_object() found something other than "type", though in practice
that is quite unlikely (since oid_object_info() would have just gone to
the on-disk odb to get the type itself).

-Peff
Jeff King <peff@peff.net> writes:

> Nice. I've sometimes wondered if parse_object() should be doing this
> optimization itself. Though we'd possibly still want callers (like this
> one) to give us more hints, since we already know the type is
> OBJ_COMMIT. Whereas parse_object() would have to discover that itself
> (though we already incur the extra type lookup there to handle blobs).

Ahh, you read one step further than I did ;-) Yes, if we are already
inspecting the type there, doing this optimization inside parse_object()
becomes much easier to justify.
On Wed, Aug 04, 2021 at 04:59:40PM -0400, Jeff King wrote:
> On Wed, Aug 04, 2021 at 03:56:11PM +0200, Patrick Steinhardt wrote:
>
> > When doing reference negotiation, git-fetch-pack(1) is loading all refs
> > from disk in order to determine which commits it has in common with the
> > remote repository. This can be quite expensive in repositories with many
> > references though: in a real-world repository with around 2.2 million
> > refs, fetching a single commit by its ID takes around 44 seconds.
> >
> > Dominating the loading time is decompression and parsing of the objects
> > which are referenced by commits. Given the fact that we only care about
> > commits (or tags which can be peeled to one) in this context, there is
> > thus an easy performance win by switching the parsing logic to make use
> > of the commit graph in case we have one available. Like this, we avoid
> > hitting the object database to parse these commits but instead only load
> > them from the commit-graph. This results in a significant performance
> > boost when executing git-fetch in said repository with 2.2 million refs:
> >
> > Benchmark #1: HEAD~: git fetch $remote $commit
> >   Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
> >   Range (min … max):   43.565 s … 44.577 s    10 runs
> >
> > Benchmark #2: HEAD: git fetch $remote $commit
> >   Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
> >   Range (min … max):   18.629 s … 20.454 s    10 runs
> >
> > Summary
> >   'HEAD: git fetch $remote $commit' ran
> >     2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'
>
> Nice. I've sometimes wondered if parse_object() should be doing this
> optimization itself. Though we'd possibly still want callers (like this
> one) to give us more hints, since we already know the type is
> OBJ_COMMIT. Whereas parse_object() would have to discover that itself
> (though we already incur the extra type lookup there to handle blobs).

Would certainly make it much harder to hit this pitfall. The only thing
one needs to be cautious about is that we need to somehow assert that
the object still exists in our ODB. Otherwise you may look up a commit
via the commit-graph even though the commit doesn't exist anymore.

> Do you have a lot of tags in your repository?

No, it's only about 2000 tags.

> I wonder where the remaining 20s is going.

Rebasing this commit on top of my git-rev-list(1) series [1] for the
connectivity check gives another 25% speedup, going down from 20s to 14s
(numbers are a bit different given that I'm on a different machine right
now). From here on, it's multiple things which take time:

  - 20% of the time is spent sorting the refs in
    `mark_complete_and_common_ref()`. This time around I feel less
    comfortable just disabling sorting, given that it may impact
    correctness.

  - 30% of the time is spent looking up object types via
    `oid_object_info_extended()`, where 75% of these lookups come from
    `deref_without_lazy_fetch()`. This can be improved a bit by doing
    the `lookup_unknown_object()` dance, buying a modest speedup of
    ~8%. But this again has memory tradeoffs, given that we must
    allocate the object such that all types would fit.

Other than that I don't see any obvious things in the flame graphs. In
case anybody is interested, I've posted flame graphs in our GitLab issue
at [2], with the state before this patch, with this patch, and in
combination with [1].

[1]: http://public-inbox.org/git/cover.1627896460.git.ps@pks.im/
[2]: https://gitlab.com/gitlab-org/gitlab/-/issues/336657#note_642957933

> - you'd want to double check that we always call this during ref
>   iteration (it looks like we do, and I think peel_iterated_ref()
>   falls back to a normal peel otherwise)
>
> - for a tag-of-tag-of-X, that will give us the complete peel to X. But
>   it looks like deref_without_lazy_fetch() marks intermediate tags
>   with the COMPLETE flag, too. I'm not sure how important that is
>   (i.e., is it necessary for correctness, or just an optimization, in
>   which case we might be better off guessing that tags are
>   single-layer, as it's by far the common case).
>
> If we don't go that route, there's another possible speedup: after
> parsing a tag, the type of tag->tagged (if it is not NULL) will be known
> from the tag's contents, and we can avoid the oid_object_info_extended()
> type lookup. It might need some extra surgery to convince the tag-parse
> not to fetch promisor objects, though.
>
> I'm not sure it would make that big a difference, though. If we save one
> type-lookup per parsed tag, then the tag parsing is likely to dwarf it.

Yeah, I'd assume the same. And in any case, our repo doesn't really have
any problems with tags given that there's so few of them. So I wouldn't
really have the data to back up any performance improvements here.

Patrick
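Taken together, the parse_object() idea and the ODB-existence caveat
might look like the rough sketch below. This assumes a hypothetical
commit_graph_contains() helper that merely reports whether an oid has a
commit-graph entry; no such public API existed at the time:

/*
 * Rough sketch, not a real patch: let parse_object() itself take the
 * commit-graph fast path. commit_graph_contains() is hypothetical, and
 * it must be checked *before* lookup_commit(), since lookup_commit()
 * would otherwise pre-create an in-core commit for an oid that may turn
 * out to be a blob or tree.
 */
struct object *parse_object_sketch(struct repository *r,
				   const struct object_id *oid)
{
	if (commit_graph_contains(r, oid)) {	/* hypothetical helper */
		struct commit *commit = lookup_commit(r, oid);

		/* Guard against commits pruned from the ODB but still
		 * listed in a stale commit-graph file. */
		if (!commit || repo_parse_commit(r, commit) ||
		    !repo_has_object_file(r, oid))
			return NULL;
		return &commit->object;
	}

	return parse_object(r, oid);	/* unchanged slow path */
}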
On Thu, Aug 05, 2021 at 08:04:53AM +0200, Patrick Steinhardt wrote:
> On Wed, Aug 04, 2021 at 04:59:40PM -0400, Jeff King wrote:
> > On Wed, Aug 04, 2021 at 03:56:11PM +0200, Patrick Steinhardt wrote:

[snip]

> > I wonder where the remaining 20s is going.
>
> Rebasing this commit on top of my git-rev-list(1) series [1] for the
> connectivity check gives another 25% speedup, going down from 20s to 14s
> (numbers are a bit different given that I'm on a different machine right
> now). From here on, it's multiple things which take time:
>
>   - 20% of the time is spent sorting the refs in
>     `mark_complete_and_common_ref()`. This time around I feel less
>     comfortable just disabling sorting, given that it may impact
>     correctness.
>
>   - 30% of the time is spent looking up object types via
>     `oid_object_info_extended()`, where 75% of these lookups come from
>     `deref_without_lazy_fetch()`. This can be improved a bit by doing
>     the `lookup_unknown_object()` dance, buying a modest speedup of
>     ~8%. But this again has memory tradeoffs, given that we must
>     allocate the object such that all types would fit.
>
> Other than that I don't see any obvious things in the flame graphs. In
> case anybody is interested, I've posted flame graphs in our GitLab issue
> at [2], with the state before this patch, with this patch, and in
> combination with [1].
>
> [1]: http://public-inbox.org/git/cover.1627896460.git.ps@pks.im/
> [2]: https://gitlab.com/gitlab-org/gitlab/-/issues/336657#note_642957933

I've put some more time into this. If rebased on top of v4 of [1], then
we can also use `parse_commit_in_graph_gently()` to further speed this
up from 15.8 seconds to 11.6 seconds with the patch below. It's the same
memory/speed tradeoff as I'm doing in [1].

I guess I'd still like to treat both series separately for now, given
that [1] is more involved compared to this patch series here. I'll then
do a follow-up when (if?) both series have landed.

Patrick

diff --git a/fetch-pack.c b/fetch-pack.c
index 0bf7ed7e47..cc8b2ffa6c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -119,6 +119,17 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
+	struct object *object = lookup_unknown_object(the_repository, oid);
+
+	if (object->type == OBJ_COMMIT)
+		return (struct commit *) object;
+
+	if (object->type == OBJ_NONE &&
+	    parse_commit_in_graph_gently(the_repository, object) == 0) {
+		if (!repo_has_object_file(the_repository, oid))
+			return NULL;
+		return (struct commit *) object;
+	}
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
Patrick Steinhardt <ps@pks.im> writes:

> I've put some more time into this. If rebased on top of v4 of [1], then
> we can also use `parse_commit_in_graph_gently()` to further speed this
> up from 15.8 seconds to 11.6 seconds with the patch below. It's the same
> memory/speed tradeoff as I'm doing in [1].

Sounds good.

> I guess I'd still like to treat both series separately for now, given
> that [1] is more involved compared to this patch series here. I'll then
> do a follow-up when (if?) both series have landed.

During the pre-release freeze (cf. https://tinyurl.com/gitCal), new
topics will not "land" anywhere, so you have two weeks or so to
stabilize the base topic. I would prefer not to have to worry about
inter-dependent multiple topics anyway, especially the ones that are not
relevant to the upcoming release during a pre-release period.

Thanks.
On Wed, Aug 04 2021, Jeff King wrote:

> [...]
> If we don't go that route, there's another possible speedup: after
> parsing a tag, the type of tag->tagged (if it is not NULL) will be known
> from the tag's contents, and we can avoid the oid_object_info_extended()
> type lookup. It might need some extra surgery to convince the tag-parse
> not to fetch promisor objects, though.
>
> I'm not sure it would make that big a difference, though. If we save one
> type-lookup per parsed tag, then the tag parsing is likely to dwarf it.

Except that the tag type can be a lie, as our tests and some recent
object/fsck series (not yet integrated) from me show. It can be relied
upon for >99.99% of cases, but code like that needs to make sure it's
not segfaulting or whatever if that relationship doesn't hold.
On Thu, Aug 05, 2021 at 09:05:59PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Aug 04 2021, Jeff King wrote:
>
> > [...]
> > If we don't go that route, there's another possible speedup: after
> > parsing a tag, the type of tag->tagged (if it is not NULL) will be known
> > from the tag's contents, and we can avoid the oid_object_info_extended()
> > type lookup. It might need some extra surgery to convince the tag-parse
> > not to fetch promisor objects, though.
> >
> > I'm not sure it would make that big a difference, though. If we save one
> > type-lookup per parsed tag, then the tag parsing is likely to dwarf it.
>
> Except that the tag type can be a lie, as our tests and some recent
> object/fsck series (not yet integrated) from me show. It can be relied
> upon for >99.99% of cases, but code like that needs to make sure it's
> not segfaulting or whatever if that relationship doesn't hold.

Yes, but we'd discover that when we called repo_parse_commit(), and
Patrick's patch already handles failure there.

-Peff
On Thu, Aug 05, 2021 at 08:04:51AM +0200, Patrick Steinhardt wrote:

> > Nice. I've sometimes wondered if parse_object() should be doing this
> > optimization itself. Though we'd possibly still want callers (like this
> > one) to give us more hints, since we already know the type is
> > OBJ_COMMIT. Whereas parse_object() would have to discover that itself
> > (though we already incur the extra type lookup there to handle blobs).
>
> Would certainly make it much harder to hit this pitfall. The only thing
> one needs to be cautious about is that we need to somehow assert that
> the object still exists in our ODB. Otherwise you may look up a commit
> via the commit-graph even though the commit doesn't exist anymore.

True, though what I really wonder is what exactly people are depending
on parse_object() for. I.e., how many callers care about making sure it
exists, and how many would be happy to have things magically go faster?
I'm not sure of a good way to answer that question, but I agree it's the
sticking point on pushing the optimization down to that lower level.

(In case it's not clear, this is all a question for the future, and
shouldn't hold up your patch.)

> > Do you have a lot of tags in your repository?
>
> No, it's only about 2000 tags.

Hmm. In that case, I wonder if we could be using the ref peel data in
the opposite direction:

  0. With modern packed-refs, the file tells us definitively whether an
     object was peel-able or not.

  1. If it is peel-able, then do the same tag-peeling we do now. This
     handles the intermediate stages, etc., and the optimization doesn't
     help us.

  2. If it isn't peel-able, then we know it's not a tag. We can _guess_
     that it's a commit, because most refs that point to non-tags are,
     and try to look it up in the commit graph.

  3. If we do find it in the commit graph, we win. We saved having to
     call oid_object_info() to get the type.

  4. If we didn't, then we have to get the real type (it might simply be
     a commit that isn't in the graph files), and we lose. We did a
     pointless lookup in the graph file.

I think in general we'd win on balance, because most refs do point to
commits. But I'm not sure how big the win would be. You'd have to time
it on your weird 2M-ref case (though your numbers below suggest it may
be a few seconds).

(And I know I'm spouting a lot of micro-optimization ideas here; I won't
be offended if you don't feel like following up on them.)

> > I wonder where the remaining 20s is going.
>
> Rebasing this commit on top of my git-rev-list(1) series [1] for the
> connectivity check gives another 25% speedup, going down from 20s to 14s
> (numbers are a bit different given that I'm on a different machine right
> now). From here on, it's multiple things which take time:
>
>   - 20% of the time is spent sorting the refs in
>     `mark_complete_and_common_ref()`. This time around I feel less
>     comfortable just disabling sorting, given that it may impact
>     correctness.
>
>   - 30% of the time is spent looking up object types via
>     `oid_object_info_extended()`, where 75% of these lookups come from
>     `deref_without_lazy_fetch()`. This can be improved a bit by doing
>     the `lookup_unknown_object()` dance, buying a modest speedup of
>     ~8%. But this again has memory tradeoffs, given that we must
>     allocate the object such that all types would fit.

Thanks, that all makes sense. I think the suggestion I made above is
similar to what you're thinking with lookup_unknown_object(), but would
avoid the extra allocations.

-Peff
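To make steps 2-4 concrete, a rough sketch follows. parse_commit_in_graph()
is the real commit-graph API of the time; the function itself and its place
in fetch-pack.c are assumptions, not code from the thread:

/*
 * Sketch of steps 2-4 above (not from the thread). Only reached when
 * the packed-refs peel data says the ref is not peelable, i.e. it
 * cannot be a tag. Note the same hazard as with parse_object() applies:
 * lookup_commit() pre-creates an in-core commit even if the oid later
 * turns out to name a tree or blob.
 */
static struct commit *guess_commit_via_graph(const struct object_id *oid)
{
	/* step 2: speculatively treat the oid as a commit */
	struct commit *commit = lookup_commit(the_repository, oid);

	/* step 3: a commit-graph hit spares us the odb type lookup */
	if (commit && parse_commit_in_graph(the_repository, commit))
		return commit;

	/*
	 * step 4: we lost the bet; the caller falls back to
	 * oid_object_info_extended() to learn the real type (it may
	 * still be a commit that simply is not in the graph files).
	 */
	return NULL;
}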
On Thu, Aug 05, 2021 at 01:53:50PM +0200, Patrick Steinhardt wrote:

> I've put some more time into this. If rebased on top of v4 of [1], then
> we can also use `parse_commit_in_graph_gently()` to further speed this
> up from 15.8 seconds to 11.6 seconds with the patch below. It's the same
> memory/speed tradeoff as I'm doing in [1].
>
> I guess I'd still like to treat both series separately for now, given
> that [1] is more involved compared to this patch series here. I'll then
> do a follow-up when (if?) both series have landed.

Heh, I should have read this before writing my other response. Your
strategy here is what I imagined. If you split the find/fill steps from
parse_commit_in_graph(), then you should be able to speculatively ask
"if this is a commit in the graph, fill it in, otherwise do nothing".
Which would solve the memory tradeoff.

-Peff
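One shape such a find/fill split might take is sketched below. Both
helper signatures are hypothetical; no such public API existed in
git.git at the time of this thread:

/*
 * Hypothetical split of parse_commit_in_graph() into a find step and a
 * fill step, so callers can probe the graph without pre-allocating an
 * in-core commit for an oid that may not be a commit at all.
 */
int find_commit_pos_in_graph(struct repository *r,
			     const struct object_id *oid,
			     uint32_t *pos);	/* assumed: probe only */
int fill_commit_in_graph_at(struct repository *r,
			    struct commit *commit,
			    uint32_t pos);	/* assumed: fill step */

static struct commit *try_graph_commit(struct repository *r,
				       const struct object_id *oid)
{
	uint32_t pos;
	struct commit *commit;

	if (!find_commit_pos_in_graph(r, oid, &pos))
		return NULL;		/* not in the graph; no allocation */
	commit = lookup_commit(r, oid);	/* safe: the graph says it's a commit */
	if (!commit || !fill_commit_in_graph_at(r, commit, pos))
		return NULL;
	return commit;
}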
diff --git a/fetch-pack.c b/fetch-pack.c
index b0c7be717c..0bf7ed7e47 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -137,8 +137,14 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 			break;
 		}
 	}
-	if (type == OBJ_COMMIT)
-		return (struct commit *) parse_object(the_repository, oid);
+
+	if (type == OBJ_COMMIT) {
+		struct commit *commit = lookup_commit(the_repository, oid);
+		if (!commit || repo_parse_commit(the_repository, commit))
+			return NULL;
+		return commit;
+	}
+
 	return NULL;
 }
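For context on why this small change reaches the commit-graph at all,
here is an abridged view of the parsing call chain as of this patch
(the function names are from git.git; the abridgement is an editorial
sketch, not part of the patch):

/*
 * Abridged call chain showing where the commit-graph gets consulted:
 *
 *   repo_parse_commit(r, commit)
 *     -> repo_parse_commit_gently(r, commit, 0)
 *       -> parse_commit_internal(r, commit, 0, 1)
 *         -> parse_commit_in_graph(r, commit)  // hit: no odb access
 *         -> repo_read_object_file(...)        // miss: decompress + parse
 *
 * parse_object(), by contrast, always decompresses and parses the full
 * object from the odb (and verifies its hash) -- the cost the commit
 * message describes as dominating the 44-second case.
 */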
When doing reference negotiation, git-fetch-pack(1) is loading all refs
from disk in order to determine which commits it has in common with the
remote repository. This can be quite expensive in repositories with many
references though: in a real-world repository with around 2.2 million
refs, fetching a single commit by its ID takes around 44 seconds.

Dominating the loading time is decompression and parsing of the objects
which are referenced by commits. Given the fact that we only care about
commits (or tags which can be peeled to one) in this context, there is
thus an easy performance win by switching the parsing logic to make use
of the commit graph in case we have one available. Like this, we avoid
hitting the object database to parse these commits but instead only load
them from the commit-graph. This results in a significant performance
boost when executing git-fetch in said repository with 2.2 million refs:

Benchmark #1: HEAD~: git fetch $remote $commit
  Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
  Range (min … max):   43.565 s … 44.577 s    10 runs

Benchmark #2: HEAD: git fetch $remote $commit
  Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
  Range (min … max):   18.629 s … 20.454 s    10 runs

Summary
  'HEAD: git fetch $remote $commit' ran
    2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 fetch-pack.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)