Message ID | pull.1263.git.1655291320433.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | send-pack.c: add config push.useBitmaps | expand |
On 6/15/22 7:08 AM, Kyle Zhao via GitGitGadget wrote: > From: Kyle Zhao <kylezhao@tencent.com> > > This allows you to disabled bitmaps for "git push". Default is false. s/disabled/disable/ > Bitmaps are designed to speed up the "counting objects" phase of > subsequent packs created for clones and fetches. > But in some cases, turning bitmaps on does horrible things for "push" > performance[1]. I would rephrase this message body as follows: Reachability bitmaps are designed to speed up the "counting objects" phase of generating a pack during a clone or fetch. They are not optimized for Git clients sending a small topic branch via "git push". In some cases (see [1]), using reachability bitmaps during "git push" can cause significant performance regressions. Add a new "push.useBitmaps" config option to disable reachability bitmaps during "git push". This allows reachability bitmaps to still be used in other areas, such as "git rev-list --use-bitmap-index". > [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/ > + > +push.useBitmaps:: > + If this config and `pack.useBitmaps` are both "true", git will > + use pack bitmaps (if available) when git push. Default is false. Rewording slightly: If this config and `pack.useBitmaps` are both `true`, then Git will use reachability bitmaps during `git push`, if available (disabled by default). > \ No newline at end of file Please fix this missing newline. I'm glad that this references `pack.useBitmaps`. I wonder if that config is sufficient for your purposes: do you expect to use your bitmaps to generate pack-files in any other way than "git push"? That is: are you serving remote requests from your repo with your bitmaps while also using "git push"? Or, are you using bitmaps only for things like "git rev-list --use-bitmap-index"? I just want to compare this with `pack.useSparse` which targets "git push", but focuses entirely on the pack-creation part of the operation. Since the existence of reachability bitmaps overrides the sparse algorithm, this does not affect Git servers (who should have a reachability bitmap). I just want to be sure that using pack.useBitmaps=false would not be sufficient for your situation (and probably most situations). > diff --git a/send-pack.c b/send-pack.c > index bc0fcdbb000..d6091571caa 100644 > --- a/send-pack.c > +++ b/send-pack.c > @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised, > strvec_push(&po.args, "--progress"); > if (is_repository_shallow(the_repository)) > strvec_push(&po.args, "--shallow"); > + if (!args->use_bitmaps) > + strvec_push(&po.args, "--no-use-bitmap-index"); Here is where you specify the lower-level pack creation only when sending a pack. It is very focused. Good. > + int use_bitmaps = 0; > + git_config_get_bool("push.usebitmaps", &use_bitmaps); > + if (use_bitmaps) > + args->use_bitmaps = 1; You can instead write this as: if (!git_config_get_bool("push.usebitmaps", &use_bitmaps)) args->use_bitmaps = use_bitmaps; > diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh > index dedca106a7a..ee0b912a5e8 100755 > --- a/t/t5516-fetch-push.sh > +++ b/t/t5516-fetch-push.sh > @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' ' > test_line_count = 1 warnings > ' > > +test_expect_success 'push with config push.useBitmaps' ' > + mk_test testrepo heads/main && > + git checkout main && > + GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr && Just use "err" instead of "stderr". > + grep "no-use-bitmap-index" stderr && > + > + git config push.useBitmaps true && > + GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr && > + ! grep "no-use-bitmap-index" stderr > +' I believe this test is correct, but it might be worth looking into test_subcommand if you can determine the exact subcommand arguments you are looking for. Thanks, -Stolee
On Wed, Jun 15 2022, Kyle Zhao via GitGitGadget wrote: [CC'd Taylor, who's worked a lot on bitmaps] > From: Kyle Zhao <kylezhao@tencent.com> > > This allows you to disabled bitmaps for "git push". Default is false. Thanks for following up. Refresh my memory, we always use them if we find them now, correct? I.e. if repack.writebitmaps=true This doesn't make it clear: Are we now going to ignore them for "push" by default, even if they exist? I.e. a change in the current default. I think I recall from the previous discussions that it was a bit of hit & miss, maybe we're confident that they're almost (or always?) bad for "push", but I think there *are* cases where they help. > Bitmaps are designed to speed up the "counting objects" phase of > subsequent packs created for clones and fetches. > But in some cases, turning bitmaps on does horrible things for "push" > performance[1]. > > [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/ > > Signed-off-by: Kyle Zhao <kylezhao@tencent.com> > --- > send-pack.c: add config push.useBitmaps > > This patch add config push.useBitmaps to prevent git push using bitmap. > > The origin discussion is here: > https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/ > > Thanks, Kyle > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1263%2Fkeyu98%2Fkz%2Fpush-usebitmps-config-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1263/keyu98/kz/push-usebitmps-config-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1263 > > Documentation/config/push.txt | 4 ++++ > send-pack.c | 6 ++++++ > send-pack.h | 3 ++- > t/t5516-fetch-push.sh | 11 +++++++++++ > 4 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt > index e32801e6c91..d8fb0bd1414 100644 > --- a/Documentation/config/push.txt > +++ b/Documentation/config/push.txt > @@ -137,3 +137,7 @@ push.negotiate:: > server attempt to find commits in common. If "false", Git will > rely solely on the server's ref advertisement to find commits > in common. > + > +push.useBitmaps:: > + If this config and `pack.useBitmaps` are both "true", git will > + use pack bitmaps (if available) when git push. Default is false. > \ No newline at end of file "git diff" is telling you something is wrong here :) hint: missing \n. > diff --git a/send-pack.c b/send-pack.c > index bc0fcdbb000..d6091571caa 100644 > --- a/send-pack.c > +++ b/send-pack.c > @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised, > strvec_push(&po.args, "--progress"); > if (is_repository_shallow(the_repository)) > strvec_push(&po.args, "--shallow"); > + if (!args->use_bitmaps) > + strvec_push(&po.args, "--no-use-bitmap-index"); > po.in = -1; > po.out = args->stateless_rpc ? -1 : fd; > po.git_cmd = 1; > @@ -482,6 +484,7 @@ int send_pack(struct send_pack_args *args, > int use_push_options = 0; > int push_options_supported = 0; > int object_format_supported = 0; > + int use_bitmaps = 0; > unsigned cmds_sent = 0; > int ret; > struct async demux; > @@ -497,6 +500,9 @@ int send_pack(struct send_pack_args *args, > git_config_get_bool("push.negotiate", &push_negotiate); > if (push_negotiate) > get_commons_through_negotiation(args->url, remote_refs, &commons); > + git_config_get_bool("push.usebitmaps", &use_bitmaps); > + if (use_bitmaps) > + args->use_bitmaps = 1; [A bit rambling, sorry] A bit off of a use of the API, can't we just do: git_config_get_bool("push.usebitmaps", &args->use_bitmaps); And drop the local variable? I.e. if we have a config variable we'll write the value to it. But then again that goes with the suggested default, i.e. I think we should probably use them by default, and provide an out, unless we're *really* sure they're useless for "push". In this case I found the code a bit odd, which took me a moment to put my finger on. In builtin/send-pack.c we initialize "struct send_pack_args" in the file scope, so it's zero'd out. So all parameters are 0'd by default. So your new "use_bitmaps" is born false. Then when we get here you assign 0 to use_bitmaps, and only if it's true do you use it. Which to me is a couple of layers in to something that's less clear than it could be, i.e. we're making the hard assumption here that the default in the struct is false. So we should really just do: int tmp; if (!git_config_get_bool("push.usebitmaps", &tmp)) args->use_bitmaps = tmp; So don't care what the default was before, we have an explicit config variable we're going to use, if we saw push.usebitmaps let's use its value. This way the default can flip, and this code downstream of that will still do what's intended. FWIW that "if" is redundant, but it's the general idiom, but we *can* do it the way I suggested above, but I think it's clearer to go with the second form, which reads more obviously as "if the variable exists, set it to ...". For config.c in particular the "without the if" works, but I think that's relying on an implementation detail, i.e. we have a few other APIs that "zero out" the parameter you put in as the first thing they do. So if they also return "I didn't error" you want to use a temp variable, and only assign it to your "real" variable if the return value was "OK". > > git_config_get_bool("transfer.advertisesid", &advertise_sid); > > diff --git a/send-pack.h b/send-pack.h > index e148fcd9609..f7af1b0353e 100644 > --- a/send-pack.h > +++ b/send-pack.h > @@ -26,7 +26,8 @@ struct send_pack_args { > /* One of the SEND_PACK_PUSH_CERT_* constants. */ > push_cert:2, > stateless_rpc:1, > - atomic:1; > + atomic:1, > + use_bitmaps:1; > const struct string_list *push_options; > }; > > diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh > index dedca106a7a..ee0b912a5e8 100755 > --- a/t/t5516-fetch-push.sh > +++ b/t/t5516-fetch-push.sh > @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' ' > test_line_count = 1 warnings > ' > > +test_expect_success 'push with config push.useBitmaps' ' > + mk_test testrepo heads/main && > + git checkout main && > + GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr && We generally don't >/dev/null in tests, just emit the output, and if we run with -v you'll see it. In this case though you want just: GIT_TRACE="$PWD/trace.log" [...] Without any redirection, Also, you probably want GIT_TRACE2_EVENT=$PWD/trace.json, and see "test_subcommand". We have a handy helper for finding if we have trace2 regions. I'm assuming we have some region for "used bitmaps" already, but I didn't check...
On Wed, Jun 15, 2022 at 09:09:35AM -0400, Derrick Stolee wrote: > I just want to be sure that using pack.useBitmaps=false would not > be sufficient for your situation (and probably most situations). I think the only other affected scenario on the client side would be repacking. And in theory most clients are repacking in the background anyways, so any speed-ups from using bitmaps wouldn't be noticeable anyway. I think just relying on the existing pack.useBitmaps config should be sufficient here. Thanks, Taylor
On Wed, Jun 15, 2022 at 09:47:58PM +0200, Ævar Arnfjörð Bjarmason wrote: > Refresh my memory, we always use them if we find them now, correct? > I.e. if repack.writebitmaps=true Kind of. `rev-list` has an opt-in `--use-bitmap-index` option, and `pack-objects` has `pack.useBitmaps` which controls whether or not bitmaps are read. So `git rev-list --objects HEAD` won't use any bitmaps, even if they exist, but `git rev-list --objects --use-bitmap-index HEAD` will. There there's a good reason not to use bitmaps by default, which is that they (effectively) randomize the ordering of your output. `pack-objects` behaves slightly differently, cf. its `use_bitmap_index{,default}` variables to see how that works. > This doesn't make it clear: Are we now going to ignore them for "push" > by default, even if they exist? I.e. a change in the current default. > > I think I recall from the previous discussions that it was a bit of hit > & miss, maybe we're confident that they're almost (or always?) bad for > "push", but I think there *are* cases where they help. Yeah. In theory they should help most of the time as long as the bitmaps are kept reasonably up-to-date. There's a tradeoff between using bitmaps and how much walking between bitmap and ref-tips is required. Since every object we encounter between the most recent bitmap and the thing we're trying to generate a bitmap for has to incur extra overhead in order to find and mark its position in the bitmap being generated, there's a certain point at which it would have been faster to just walk down to the roots and avoid bitmaps altogether. But I suspect that in this case the bitmaps are just simply stale, and that a "git repack -adb" or more aggressive background maintenance would make things quite a bit faster. Thanks, Taylor
> > I'm glad that this references `pack.useBitmaps`. I wonder if that config is sufficient for your purposes: do you expect to use your bitmaps to generate pack-files in any other way > > I just want to be sure that using pack.useBitmaps=false would not be > > sufficient for your situation (and probably most situations). > > I think the only other affected scenario on the client side would be repacking. And in theory most clients are repacking in the background anyways, so any speed-ups from using bitmaps wouldn't be noticeable anyway. > > I think just relying on the existing pack.useBitmaps config should be sufficient here. In fact ,we also use "git push" on our server side. Each git repositories have multiple replicas on our servers to improve system disaster tolerance and read performance. For example, a git repo will be distributed on multiple servers (like server-1, server-2, server-3). If user pushes the pack to server-1, then server-1 will call "git push" to transfer the objects data to server-2 and server-3. And users can clone from all the server mentioned above. Under such a process, our system works well most of the time. If we set pack.useBitmaps=false, "git upload-pack" will also be affected. For my situation, it would be sufficient when set both pack.useBitmaps=true and push.useBitmaps=false. > But I suspect that in this case the bitmaps are just simply stale, and > that a "git repack -adb" or more aggressive background maintenance would > make things quite a bit faster. It doesn't seem to be the reason. I have already called "git repack -adb" in this case[1] but that didn't seem to fix the push performance issue. You can see that the git repo had only one pack at that time. [1] https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/ Thanks, - Kyle
diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt index e32801e6c91..d8fb0bd1414 100644 --- a/Documentation/config/push.txt +++ b/Documentation/config/push.txt @@ -137,3 +137,7 @@ push.negotiate:: server attempt to find commits in common. If "false", Git will rely solely on the server's ref advertisement to find commits in common. + +push.useBitmaps:: + If this config and `pack.useBitmaps` are both "true", git will + use pack bitmaps (if available) when git push. Default is false. \ No newline at end of file diff --git a/send-pack.c b/send-pack.c index bc0fcdbb000..d6091571caa 100644 --- a/send-pack.c +++ b/send-pack.c @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised, strvec_push(&po.args, "--progress"); if (is_repository_shallow(the_repository)) strvec_push(&po.args, "--shallow"); + if (!args->use_bitmaps) + strvec_push(&po.args, "--no-use-bitmap-index"); po.in = -1; po.out = args->stateless_rpc ? -1 : fd; po.git_cmd = 1; @@ -482,6 +484,7 @@ int send_pack(struct send_pack_args *args, int use_push_options = 0; int push_options_supported = 0; int object_format_supported = 0; + int use_bitmaps = 0; unsigned cmds_sent = 0; int ret; struct async demux; @@ -497,6 +500,9 @@ int send_pack(struct send_pack_args *args, git_config_get_bool("push.negotiate", &push_negotiate); if (push_negotiate) get_commons_through_negotiation(args->url, remote_refs, &commons); + git_config_get_bool("push.usebitmaps", &use_bitmaps); + if (use_bitmaps) + args->use_bitmaps = 1; git_config_get_bool("transfer.advertisesid", &advertise_sid); diff --git a/send-pack.h b/send-pack.h index e148fcd9609..f7af1b0353e 100644 --- a/send-pack.h +++ b/send-pack.h @@ -26,7 +26,8 @@ struct send_pack_args { /* One of the SEND_PACK_PUSH_CERT_* constants. */ push_cert:2, stateless_rpc:1, - atomic:1; + atomic:1, + use_bitmaps:1; const struct string_list *push_options; }; diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index dedca106a7a..ee0b912a5e8 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' ' test_line_count = 1 warnings ' +test_expect_success 'push with config push.useBitmaps' ' + mk_test testrepo heads/main && + git checkout main && + GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr && + grep "no-use-bitmap-index" stderr && + + git config push.useBitmaps true && + GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr && + ! grep "no-use-bitmap-index" stderr +' + test_done