Message ID | ccd03e685af0f5cf25c68272a758fc88d115e37a.1629899211.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | f54b9f21cab23209697d9985c16a7bf4ee7b4241 |
Series | ls-refs: reuse buffer when sending refs |
On Wed, Aug 25, 2021 at 03:49:51PM +0200, Patrick Steinhardt wrote:
> In the initial reference advertisement, the Git server will first
> announce all of its references to the client. The logic is handled in
> `send_ref()`, which will allocate a new buffer for each refline it is
> about to send. This is quite wasteful: instead of allocating a new
> buffer each time, we can just reuse a buffer.
>
> Improve this by passing in a buffer via the `ls_refs_data` struct which
> is then reused on each reference. In a repository with about 2.3M refs,
> this speeds up local mirror fetches by about 2%:
>
>   Benchmark #1: HEAD~: git-fetch
>     Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
>     Range (min … max):   25.240 s … 25.543 s    5 runs
>
>   Benchmark #2: HEAD: git-fetch
>     Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
>     Range (min … max):   24.825 s … 25.081 s    5 runs
>
>   Summary
>     'HEAD: git-fetch' ran
>       1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
>
> Signed-off-by: Patrick Steinhardt <ps@kps.im>

I accidentally mistyped my mail address here. It really shows that I
should be using `--signoff` instead of manually signing things. Anyway,
I'll fix this in case I need to re-roll.

Patrick
On 8/25/2021 9:49 AM, Patrick Steinhardt wrote:
> In the initial reference advertisement, the Git server will first
> announce all of its references to the client. The logic is handled in
> `send_ref()`, which will allocate a new buffer for each refline it is
> about to send. This is quite wasteful: instead of allocating a new
> buffer each time, we can just reuse a buffer.

Reusing a buffer makes perfect sense and is a clear improvement.

> Improve this by passing in a buffer via the `ls_refs_data` struct which
> is then reused on each reference. In a repository with about 2.3M refs,
> this speeds up local mirror fetches by about 2%:
>
>   Benchmark #1: HEAD~: git-fetch
>     Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
>     Range (min … max):   25.240 s … 25.543 s    5 runs
>
>   Benchmark #2: HEAD: git-fetch
>     Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
>     Range (min … max):   24.825 s … 25.081 s    5 runs
>
>   Summary
>     'HEAD: git-fetch' ran
>       1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
>
> Signed-off-by: Patrick Steinhardt <ps@kps.im>
> ---
>
> Note that while this topic applies on top of "master", I've done the
> benchmark on top of my other optimizations for fetches. It's cheating a
> bit, but it's easier to see that the optimization does something when
> the remaining constant part is lower.

I don't mind demonstrating an optimization using the other work.

Perhaps this would be better grouped with those other changes?
I know that the text is independent and merges cleanly without it,
but it can be helpful to think about the effort as one unified
topic instead of juggling multiple, especially because I don't
see the other one needing many revisions.

> -	struct strbuf refline = STRBUF_INIT;
> +
> +	strbuf_reset(&data->buf);

It's nice that this is the only _real_ change, and everything else
is a find-and-replace.

> @@ -145,6 +146,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  
>  	memset(&data, 0, sizeof(data));
>  	strvec_init(&data.prefixes);
> +	strbuf_init(&data.buf, 0);
>  
>  	ensure_config_read();
>  	git_config(ls_refs_config, NULL);
> @@ -173,6 +175,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  			     send_ref, &data, 0);
>  	packet_flush(1);
>  	strvec_clear(&data.prefixes);
> +	strbuf_release(&data.buf);
>  	return 0;
>  }

Except, of course, these two lines.

I think this patch is good to go!

Thanks,
-Stolee
On Wed, Aug 25, 2021 at 10:50:30AM -0400, Derrick Stolee wrote:
> On 8/25/2021 9:49 AM, Patrick Steinhardt wrote:
> > Improve this by passing in a buffer via the `ls_refs_data` struct which
> > is then reused on each reference. In a repository with about 2.3M refs,
> > this speeds up local mirror fetches by about 2%:
> >
> >   Benchmark #1: HEAD~: git-fetch
> >     Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
> >     Range (min … max):   25.240 s … 25.543 s    5 runs
> >
> >   Benchmark #2: HEAD: git-fetch
> >     Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
> >     Range (min … max):   24.825 s … 25.081 s    5 runs
> >
> >   Summary
> >     'HEAD: git-fetch' ran
> >       1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
> >
> > Signed-off-by: Patrick Steinhardt <ps@kps.im>
> > ---
> >
> > Note that while this topic applies on top of "master", I've done the
> > benchmark on top of my other optimizations for fetches. It's cheating a
> > bit, but it's easier to see that the optimization does something when
> > the remaining constant part is lower.
>
> I don't mind demonstrating an optimization using the other work.
>
> Perhaps this would be better grouped with those other changes?
> I know that the text is independent and merges cleanly without it,
> but it can be helpful to think about the effort as one unified
> topic instead of juggling multiple, especially because I don't
> see the other one needing many revisions.

I don't know. I just happen to revisit this topic every few days, and
every time I stumble upon some more performance improvements. It would
feel wrong to shift the goalposts of the other series every time I find
something new, so I instead opt for separate patch series.

If this proves to be annoying for reviewers, then feel free to shout at
me and I'll change my approach.

Thanks for your review!

Patrick
Patrick Steinhardt <ps@pks.im> writes:

> I don't know. I just happen to revisit this topic every few days, and
> every time I stumble upon some more performance improvements. It would
> feel wrong to shift the goalposts of the other series every time I find
> something new, so I instead opt for separate patch series.
>
> If this proves to be annoying for reviewers, then feel free to shout at
> me and I'll change my approach.

The easiest would be to keep them independent and to justify them
independently. There would be no shifting of the goalposts involved if
you did so. Of course, you wouldn't be able to include the improvements
the follow-on topics make as part of the "advertising material" to sell
the benefit of the base series, though. It can only work when the
follow-on topic is truly independent and is a good idea by itself.

Another is to keep these follow-on topics unpublished before the base
topic graduates, and pretend that you came up with them much later than
you originally did. Nobody would notice or mind, as the base topic
would be cast in stone by that time.

At the receiving end, what is most irritating is a series of topics
that pretend to be about different things but depend on each other to
function well [*]. I would imagine that it would be a lot more trivial
and pleasant to handle if none of the patches involved had come in
before the author had already thought of all of these follow-on topics,
and they instead came in a single well-structured topic, but we do not
live in a perfect world ;-).

In the case of this particular patch, I think the logic behind the
change makes sense by itself, so if I were doing it, I'd probably
choose to sell it as an independent change unrelated to the other
topic.

Thanks.

[Footnote]

* Any time the base topic gets rerolled, I'd be the one who ends up
  having to remember which other topics that did not get rerolled
  depend on it and in what order, rebase them correctly, and then
  replace the follow-on topics that have been rerolled. Even a single
  missed subtopic will cause the day's integration work to be redone,
  and all that robs time and concentration that the topics by other
  contributors need from me, which would make me grumpy X-<.
diff --git a/ls-refs.c b/ls-refs.c
index 88f6c3f60d..84021416ca 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -65,6 +65,7 @@ struct ls_refs_data {
 	unsigned peel;
 	unsigned symrefs;
 	struct strvec prefixes;
+	struct strbuf buf;
 	unsigned unborn : 1;
 };
 
@@ -73,7 +74,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	struct ls_refs_data *data = cb_data;
 	const char *refname_nons = strip_namespace(refname);
-	struct strbuf refline = STRBUF_INIT;
+
+	strbuf_reset(&data->buf);
 
 	if (ref_is_hidden(refname_nons, refname))
 		return 0;
@@ -82,9 +84,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		return 0;
 
 	if (oid)
-		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+		strbuf_addf(&data->buf, "%s %s", oid_to_hex(oid), refname_nons);
 	else
-		strbuf_addf(&refline, "unborn %s", refname_nons);
+		strbuf_addf(&data->buf, "unborn %s", refname_nons);
 	if (data->symrefs && flag & REF_ISSYMREF) {
 		struct object_id unused;
 		const char *symref_target = resolve_ref_unsafe(refname, 0,
@@ -94,20 +96,19 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		if (!symref_target)
 			die("'%s' is a symref but it is not?", refname);
 
-		strbuf_addf(&refline, " symref-target:%s",
+		strbuf_addf(&data->buf, " symref-target:%s",
 			    strip_namespace(symref_target));
 	}
 
 	if (data->peel && oid) {
 		struct object_id peeled;
 		if (!peel_iterated_oid(oid, &peeled))
-			strbuf_addf(&refline, " peeled:%s", oid_to_hex(&peeled));
+			strbuf_addf(&data->buf, " peeled:%s", oid_to_hex(&peeled));
 	}
 
-	strbuf_addch(&refline, '\n');
-	packet_write(1, refline.buf, refline.len);
+	strbuf_addch(&data->buf, '\n');
+	packet_write(1, data->buf.buf, data->buf.len);
 
-	strbuf_release(&refline);
 	return 0;
 }
 
@@ -145,6 +146,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
+	strbuf_init(&data.buf, 0);
 
 	ensure_config_read();
 	git_config(ls_refs_config, NULL);
@@ -173,6 +175,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
 			     send_ref, &data, 0);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
+	strbuf_release(&data.buf);
 	return 0;
 }
 
In the initial reference advertisement, the Git server will first
announce all of its references to the client. The logic is handled in
`send_ref()`, which will allocate a new buffer for each refline it is
about to send. This is quite wasteful: instead of allocating a new
buffer each time, we can just reuse a buffer.

Improve this by passing in a buffer via the `ls_refs_data` struct which
is then reused on each reference. In a repository with about 2.3M refs,
this speeds up local mirror fetches by about 2%:

  Benchmark #1: HEAD~: git-fetch
    Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
    Range (min … max):   25.240 s … 25.543 s    5 runs

  Benchmark #2: HEAD: git-fetch
    Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
    Range (min … max):   24.825 s … 25.081 s    5 runs

  Summary
    'HEAD: git-fetch' ran
      1.02 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@kps.im>
---

Note that while this topic applies on top of "master", I've done the
benchmark on top of my other optimizations for fetches. It's cheating a
bit, but it's easier to see that the optimization does something when
the remaining constant part is lower.

 ls-refs.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
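For readers who want the shape of the change without Git's tree at hand,
here is a minimal, standalone C sketch of the same allocate-once /
reset-per-item / release-once pattern. It deliberately does not use
Git's strbuf or ls-refs APIs; the mini_buf type, emit_ref() and the
sample refs[] array are made up purely for illustration.

    /*
     * Sketch of the allocate-once / reset-per-item / release-once pattern.
     * mini_buf, emit_ref() and refs[] are illustrative stand-ins, not Git's
     * strbuf or ls-refs API; error handling is omitted for brevity.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct mini_buf {
            char *buf;
            size_t len, alloc;
    };

    /* Drop the contents but keep the allocation for the next item. */
    static void buf_reset(struct mini_buf *b)
    {
            b->len = 0;
    }

    /* Append a string, growing the backing allocation only when needed. */
    static void buf_addstr(struct mini_buf *b, const char *s)
    {
            size_t n = strlen(s);
            if (b->len + n + 1 > b->alloc) {
                    b->alloc = 2 * (b->len + n + 1);
                    b->buf = realloc(b->buf, b->alloc);
            }
            memcpy(b->buf + b->len, s, n + 1);
            b->len += n;
    }

    struct emit_data {
            struct mini_buf line;   /* reused for every ref, like data->buf */
    };

    /* Called once per ref; formats into the shared buffer, no fresh allocation. */
    static void emit_ref(const char *refname, struct emit_data *data)
    {
            buf_reset(&data->line);
            buf_addstr(&data->line, "0000000000000000000000000000000000000000 ");
            buf_addstr(&data->line, refname);
            buf_addstr(&data->line, "\n");
            fwrite(data->line.buf, 1, data->line.len, stdout);
    }

    int main(void)
    {
            const char *refs[] = { "refs/heads/main", "refs/tags/v1.0" };
            struct emit_data data = { { NULL, 0, 0 } };

            for (size_t i = 0; i < sizeof(refs) / sizeof(*refs); i++)
                    emit_ref(refs[i], &data);

            free(data.line.buf);    /* single release after the iteration */
            return 0;
    }

The point mirrored from the patch is the lifetime split: the buffer is
set up once before the iteration, reset (length dropped, allocation
kept) at the top of every callback, and released exactly once after the
loop, so the allocator is no longer hit once per advertised ref.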