Message ID | 68c60aff0c77c562aba5613ccbb9ab33ad8e0e08.1621451532.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Series | Speed up connectivity checks via quarantine dir |
Patrick Steinhardt wrote:
> In the case where git-receive-pack(1) receives only commands which
> delete references, then per technical specification the client MUST NOT
> send a packfile. As a result, we know that no new objects have been
> received, which makes it a moot point to check whether all received
> objects are fully connected.

I don't know if this is related but yesterday I decided to delete a
bunch of refs from a forked repo in GitHub. I did it naively with a for
loop and so it was doing a bunch of `git push myrepo :ref`.

It was unbearably slow.

Sure, it was a stupid thing to do, but maybe it can help you do some
tests.

Cheers.

On Fri, May 21, 2021 at 01:53:49PM -0500, Felipe Contreras wrote:

> Patrick Steinhardt wrote:
> > In the case where git-receive-pack(1) receives only commands which
> > delete references, then per technical specification the client MUST NOT
> > send a packfile. As a result, we know that no new objects have been
> > received, which makes it a moot point to check whether all received
> > objects are fully connected.
>
> I don't know if this is related but yesterday I decided to delete a
> bunch of refs from a forked repo in GitHub. I did it naively with a for
> loop and so it was doing a bunch of `git push myrepo :ref`.
>
> It was unbearably slow.
>
> Sure, it was a stupid thing to do, but maybe it can help you do some
> tests.

Patrick's patch might help some, as it would avoid calling rev-list at
all. But we wouldn't do any traversal in that command if there are no
positive tips anyway, so it is really just saving the startup overhead
of iterating the ref tips to add them to the traversal.

In the case of GitHub, the problem is much more likely outside of Git's
immediate control. Every push will run GitHub-specific hooks for things
like branch protections, etc, and there's a lot of overhead there.

-Peff

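The point about negative tips can be illustrated with a toy model. The sketch below is not Git's revision walker: `count_walked`, the flag names, and the array-based history are all hypothetical, and it assumes every commit has at most one parent with a smaller index than its own. It only shows why a walk seeded with nothing but negative tips ends before visiting any commit, so the remaining cost is queueing the tips.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS   64
#define QUEUED        (1u << 0)
#define UNINTERESTING (1u << 1)

/*
 * Toy history: parent[i] is the index of commit i's only parent, or -1
 * for a root. Assumption: parent[i] < i, so a higher index means "newer".
 * Returns how many interesting commits the walk visits.
 */
int count_walked(const int *parent, int n,
		 const int *positive, int npos,
		 const int *negative, int nneg)
{
	unsigned flags[MAX_COMMITS] = { 0 };
	int walked = 0, i;

	assert(n >= 0 && n <= MAX_COMMITS);

	/* Startup cost: every tip is queued, positive or negative. */
	for (i = 0; i < npos; i++) {
		assert(positive[i] >= 0 && positive[i] < n);
		flags[positive[i]] |= QUEUED;
	}
	for (i = 0; i < nneg; i++) {
		assert(negative[i] >= 0 && negative[i] < n);
		flags[negative[i]] |= QUEUED | UNINTERESTING;
	}

	for (;;) {
		int next = -1, interesting_left = 0;

		/* Pick the newest queued commit; note whether any
		 * interesting commit is still waiting. */
		for (i = n - 1; i >= 0; i--) {
			if (!(flags[i] & QUEUED))
				continue;
			if (next < 0)
				next = i;
			if (!(flags[i] & UNINTERESTING))
				interesting_left = 1;
		}
		/* Nothing interesting queued: the walk is over. With only
		 * negative tips this happens on the first iteration. */
		if (!interesting_left)
			break;

		flags[next] &= ~QUEUED;
		if (!(flags[next] & UNINTERESTING))
			walked++;
		/* Queue the parent, passing on the uninteresting mark. */
		if (parent[next] >= 0)
			flags[parent[next]] |= QUEUED |
				(flags[next] & UNINTERESTING);
	}
	return walked;
}
```

On a linear five-commit history, a positive tip at the newest commit visits all five commits, while the same tip passed as a negative visits none.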
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a34742513a..b9263cec15 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1918,11 +1918,8 @@ static void execute_commands(struct command *commands,
 			     struct shallow_info *si,
 			     const struct string_list *push_options)
 {
-	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 	struct command *cmd;
 	struct iterate_data data;
-	struct async muxer;
-	int err_fd = 0;
 	int run_proc_receive = 0;
 
 	if (unpacker_error) {
@@ -1931,25 +1928,39 @@ static void execute_commands(struct command *commands,
 		return;
 	}
 
-	if (use_sideband) {
-		memset(&muxer, 0, sizeof(muxer));
-		muxer.proc = copy_to_sideband;
-		muxer.in = -1;
-		if (!start_async(&muxer))
-			err_fd = muxer.in;
-		/* ...else, continue without relaying sideband */
-	}
-
 	data.cmds = commands;
 	data.si = si;
-	opt.err_fd = err_fd;
-	opt.progress = err_fd && !quiet;
-	opt.env = tmp_objdir_env(tmp_objdir);
-	if (check_connected(iterate_receive_command_list, &data, &opt))
-		set_connectivity_errors(commands, si);
 
-	if (use_sideband)
-		finish_async(&muxer);
+	/*
+	 * If received commands only consist of deletions, then the client MUST
+	 * NOT send a packfile because there cannot be any new objects in the
+	 * first place. As a result, we do not set up a quarantine environment
+	 * because we know no new objects will be received. And that in turn
+	 * means that we can skip connectivity checks here.
+	 */
+	if (tmp_objdir) {
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+		struct async muxer;
+		int err_fd = 0;
+
+		if (use_sideband) {
+			memset(&muxer, 0, sizeof(muxer));
+			muxer.proc = copy_to_sideband;
+			muxer.in = -1;
+			if (!start_async(&muxer))
+				err_fd = muxer.in;
+			/* ...else, continue without relaying sideband */
+		}
+
+		opt.err_fd = err_fd;
+		opt.progress = err_fd && !quiet;
+		opt.env = tmp_objdir_env(tmp_objdir);
+		if (check_connected(iterate_receive_command_list, &data, &opt))
+			set_connectivity_errors(commands, si);
+
+		if (use_sideband)
+			finish_async(&muxer);
+	}
 
 	reject_updates_to_hidden(commands);
 

In the case where git-receive-pack(1) receives only commands which
delete references, then per technical specification the client MUST NOT
send a packfile. As a result, we know that no new objects have been
received, which makes it a moot point to check whether all received
objects are fully connected.

Fix this by not doing a connectivity check in case there were no pushed
objects. Given that git-rev-list(1) with only negative references will
not do any graph walk, no performance improvements are to be expected.
Conceptually, it is still the right thing to do though.

The following tests were executed on linux.git and back up the above
expectation:

Test                                     v2.32.0-rc0               HEAD
--------------------------------------------------------------------------------------------
5400.3: receive-pack clone create        1.27(1.11+0.16)           1.26(1.12+0.14) -0.8%
5400.5: receive-pack clone update        1.27(1.13+0.13)           1.27(1.11+0.16) +0.0%
5400.7: receive-pack clone reset         0.13(0.11+0.02)           0.14(0.11+0.02) +7.7%
5400.9: receive-pack clone delete        0.02(0.01+0.01)           0.02(0.00+0.01) +0.0%
5400.11: receive-pack extrarefs create   33.01(18.80+14.43)        32.63(18.52+14.24) -1.2%
5400.13: receive-pack extrarefs update   33.13(18.85+14.50)        32.82(18.85+14.29) -0.9%
5400.15: receive-pack extrarefs reset    32.90(18.82+14.32)        32.70(18.76+14.20) -0.6%
5400.17: receive-pack extrarefs delete   9.13(4.35+4.77)           8.99(4.28+4.70) -1.5%
5400.19: receive-pack empty create       223.35(640.63+127.74)     226.96(655.16+131.93) +1.6%

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/receive-pack.c | 49 ++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 19 deletions(-)
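
The condition described above can be sketched in isolation. The patch itself keys off whether `tmp_objdir` was set up. The code below instead models the underlying rule directly, under the assumption that a deletion is a command whose new object ID is all zeros. `struct ref_command`, `is_delete_command` and `needs_connectivity_check` are hypothetical names for illustration, not Git's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one ref update; not Git's struct command. */
struct ref_command {
	const char *refname;
	const char *old_oid_hex;
	const char *new_oid_hex;
};

/* Assumption: a deletion is signalled by an all-zero new object ID. */
int is_delete_command(const struct ref_command *cmd)
{
	size_t len = strlen(cmd->new_oid_hex);

	return len > 0 && strspn(cmd->new_oid_hex, "0") == len;
}

/*
 * Returns 1 if at least one command creates or updates a ref, meaning a
 * packfile with new objects may have been sent and connectivity must be
 * checked. Returns 0 if every command is a deletion.
 */
int needs_connectivity_check(const struct ref_command *cmds, size_t n)
{
	size_t i;

	assert(cmds || !n);
	for (i = 0; i < n; i++)
		if (!is_delete_command(&cmds[i]))
			return 1;
	return 0;
}
```

A push consisting only of `:ref` deletions would make `needs_connectivity_check` return 0, whereas mixing in a single create or update makes it return 1.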