diff mbox series

[v2] send-pack.c: add config push.useBitmaps

Message ID pull.1263.v2.git.1655350617442.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] send-pack.c: add config push.useBitmaps | expand

Commit Message

Kyle Zhao June 16, 2022, 3:36 a.m. UTC
From: Kyle Zhao <kylezhao@tencent.com>

This allows you to disable bitmaps for "git push". Default is false.

Reachability bitmaps are designed to speed up the "counting objects"
phase of generating a pack during a clone or fetch. They are not
optimized for Git clients sending a small topic branch via "git push".
In some cases (see [1]), using reachability bitmaps during "git push"
can cause significant performance regressions.

Add a new "push.useBitmaps" config option to disable reachability
bitmaps during "git push". This allows reachability bitmaps to still
be used in other areas, such as "git rev-list --use-bitmap-index".

[1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
---
    send-pack.c: add config push.useBitmaps
    
    This patch add config push.useBitmaps to prevent git push using bitmap.
    
    The origin discussion is here:
    https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
    
    Thanks, -Kyle
    
    Changes since v1:
    
     * changed the commit message
     * modified and added missing \n to push.txt
     * used test_subcommand for test
     * modified "if" statement for "git_config_get_bool()" in send-pack.c

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1263%2Fkeyu98%2Fkz%2Fpush-usebitmps-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1263/keyu98/kz/push-usebitmps-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1263

Range-diff vs v1:

 1:  000d033584b ! 1:  42e0b4845b2 send-pack.c: add config push.useBitmaps
     @@ Metadata
       ## Commit message ##
          send-pack.c: add config push.useBitmaps
      
     -    This allows you to disabled bitmaps for "git push". Default is false.
     +    This allows you to disable bitmaps for "git push". Default is false.
      
     -    Bitmaps are designed to speed up the "counting objects" phase of
     -    subsequent packs created for clones and fetches.
     -    But in some cases, turning bitmaps on does horrible things for "push"
     -    performance[1].
     +    Reachability bitmaps are designed to speed up the "counting objects"
     +    phase of generating a pack during a clone or fetch. They are not
     +    optimized for Git clients sending a small topic branch via "git push".
     +    In some cases (see [1]), using reachability bitmaps during "git push"
     +    can cause significant performance regressions.
     +
     +    Add a new "push.useBitmaps" config option to disable reachability
     +    bitmaps during "git push". This allows reachability bitmaps to still
     +    be used in other areas, such as "git rev-list --use-bitmap-index".
      
          [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
      
     @@ Documentation/config/push.txt: push.negotiate::
       	in common.
      +
      +push.useBitmaps::
     -+	If this config and `pack.useBitmaps` are both "true", git will
     -+	use pack bitmaps (if available) when git push. Default is false.
     - \ No newline at end of file
     ++    If this config and `pack.useBitmaps` are both `true`, then Git will
     ++    use reachability bitmaps during `git push`, if available (disabled
     ++    by default).
      
       ## send-pack.c ##
      @@ send-pack.c: static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
     @@ send-pack.c: static int pack_objects(int fd, struct ref *refs, struct oid_array
       	po.out = args->stateless_rpc ? -1 : fd;
       	po.git_cmd = 1;
      @@ send-pack.c: int send_pack(struct send_pack_args *args,
     - 	int use_push_options = 0;
     - 	int push_options_supported = 0;
     - 	int object_format_supported = 0;
     -+	int use_bitmaps = 0;
     - 	unsigned cmds_sent = 0;
     - 	int ret;
       	struct async demux;
     + 	const char *push_cert_nonce = NULL;
     + 	struct packet_reader reader;
     ++	int use_bitmaps;
     + 
     + 	if (!remote_refs) {
     + 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
      @@ send-pack.c: int send_pack(struct send_pack_args *args,
     - 	git_config_get_bool("push.negotiate", &push_negotiate);
       	if (push_negotiate)
       		get_commons_through_negotiation(args->url, remote_refs, &commons);
     -+	git_config_get_bool("push.usebitmaps", &use_bitmaps);
     -+	if (use_bitmaps)
     -+		args->use_bitmaps = 1;
       
     ++	if (!git_config_get_bool("push.usebitmaps", &use_bitmaps))
     ++		args->use_bitmaps = use_bitmaps;
     ++
       	git_config_get_bool("transfer.advertisesid", &advertise_sid);
       
     + 	/* Does the other end support the reporting? */
      
       ## send-pack.h ##
      @@ send-pack.h: struct send_pack_args {
     @@ t/t5516-fetch-push.sh: test_expect_success 'push warns or fails when using usern
      +test_expect_success 'push with config push.useBitmaps' '
      +	mk_test testrepo heads/main &&
      +	git checkout main &&
     -+	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&
     -+	grep "no-use-bitmap-index" stderr &&
     ++	GIT_TRACE2_EVENT="$PWD/default" \
     ++	git push testrepo main:test &&
     ++	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
     ++	--thin --delta-base-offset -q --no-use-bitmap-index <default &&
      +
      +	git config push.useBitmaps true &&
     -+	GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr &&
     -+	! grep "no-use-bitmap-index" stderr
     ++	GIT_TRACE2_EVENT="$PWD/true" \
     ++	git push testrepo main:test2 &&
     ++	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
     ++	--thin --delta-base-offset -q <true &&
     ++
     ++	git config push.useBitmaps false &&
     ++	GIT_TRACE2_EVENT="$PWD/false" \
     ++	git push testrepo main:test3 &&
     ++	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
     ++	--thin --delta-base-offset -q --no-use-bitmap-index <false
      +'
      +
       test_done


 Documentation/config/push.txt |  5 +++++
 send-pack.c                   |  6 ++++++
 send-pack.h                   |  3 ++-
 t/t5516-fetch-push.sh         | 21 +++++++++++++++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)


base-commit: 8168d5e9c23ed44ae3d604f392320d66556453c9

Comments

Derrick Stolee June 16, 2022, 1:02 p.m. UTC | #1
On 6/15/2022 11:36 PM, Kyle Zhao via GitGitGadget wrote:
> From: Kyle Zhao <kylezhao@tencent.com>
> 
> This allows you to disable bitmaps for "git push". Default is false.
> 
> Reachability bitmaps are designed to speed up the "counting objects"
> phase of generating a pack during a clone or fetch. They are not
> optimized for Git clients sending a small topic branch via "git push".
> In some cases (see [1]), using reachability bitmaps during "git push"
> can cause significant performance regressions.
> 
> Add a new "push.useBitmaps" config option to disable reachability
> bitmaps during "git push". This allows reachability bitmaps to still
> be used in other areas, such as "git rev-list --use-bitmap-index".
> 
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
> 
> Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
> ---
>     send-pack.c: add config push.useBitmaps
>     
>     This patch add config push.useBitmaps to prevent git push using bitmap.
>     
>     The origin discussion is here:
>     https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
>     
>     Thanks, -Kyle
>     
>     Changes since v1:
>     
>      * changed the commit message
>      * modified and added missing \n to push.txt
>      * used test_subcommand for test
>      * modified "if" statement for "git_config_get_bool()" in send-pack.c

This version satisfies the recommended changes I had from the last
version. As long as we think there is a benefit to having this
additional knob over just pack.useBitmaps, then this is good to go.

For my part, I think having the additional knob is less complicated
than requiring all users in this situation to shift from "push" to
a "fetch" (from the other side of the connection).

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason June 16, 2022, 1:38 p.m. UTC | #2
On Thu, Jun 16 2022, Kyle Zhao via GitGitGadget wrote:

> From: Kyle Zhao <kylezhao@tencent.com>
>
> This allows you to disable bitmaps for "git push". Default is false.
>
> Reachability bitmaps are designed to speed up the "counting objects"
> phase of generating a pack during a clone or fetch. They are not
> optimized for Git clients sending a small topic branch via "git push".
> In some cases (see [1]), using reachability bitmaps during "git push"
> can cause significant performance regressions.
>
> Add a new "push.useBitmaps" config option to disable reachability
> bitmaps during "git push". This allows reachability bitmaps to still
> be used in other areas, such as "git rev-list --use-bitmap-index".
>
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
> [...]
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE2_EVENT="$PWD/default" \
> +	git push testrepo main:test &&
> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
> +	--thin --delta-base-offset -q --no-use-bitmap-index <default &&

Nit: We tend to indent these ase we wrap, so e.g.:

	test_subcommand git ... \
		--thin --delta [...]

The rest all looks good as far as the diff goes, if what we want to do
is to disable this by default, and this isn't worth a re-roll in itself.

But I still think that completely disabling bitmaps might be premature
here, especially per Taylor's comment on v1 (which I understand to mean
that they should help some of the time, even with push).

And since this is linking to that old thread I started in 2019 I really
think the commit message should be exploring and justifying why this
might be slower in all cases, or at least a sufficient number of cases
to flip the default.

if it's not are we throwing out a generally useful optimization (by
default) due to some edge cases we've found?

E.g. have you tried to follow-up on this comment by Jeff King?
https://lore.kernel.org/git/20190523113031.GA17448@sigill.intra.peff.net/

At the time I didn't, because as noted in a follow-up I'd lost my test
case by the time I read that, but it seems you haven't, and have a
current test case.

If you apply that patch on that old git version at the time (as it
probably won't apply cleanly), does it make a difference for your case?

To be clear I'm *not* confident that this isn't the right thing to do as
far as the diff is concerned. But I *am* confident that the proposed
commit message isn't selling me on the idea.

I think a really good start would be to come up with some benchmarks for
a few common cases, e.g. how do bitmaps do if we have a big repo but few
refs, how about a lot of refs? Does their stale-ness make a difference
or not (per previous discussions) etc. etc.

Or, if you don't have time for any of that I think a good change for now
would be to add the flag, but not flip the default, and say "we don't
have enough data to say if bitmaps are overall worse for pushes, but
here's an opt-opt".

But the change as it stands is effectively saying "bitmaps on push
hinder more than they help, and turning them on for push was a mistake".

Maybe that's true, but I don't think we've got any data to support
that. Even though I've got one of those anecdotes (and from occasional
investigations of slow "git push" I'm pretty sure this would help my
use-cases more than it would harm them) the plural of anecdote isn't
data.

I've only personally run into this with repos with a large number (>30k
at least) number of refs, and where the bitmaps were in some state of
staleness (this usually being my working copy, where I rely on "git gc
--auto").
Kyle Zhao June 16, 2022, 3:11 p.m. UTC | #3
> And since this is linking to that old thread I started in 2019 I really
>think the commit message should be exploring and justifying why this
> might be slower in all cases, or at least a sufficient number of cases
> to flip the default.
> 
> if it's not are we throwing out a generally useful optimization (by
> default) due to some edge cases we've found?

I'm not sure if it's an edge case.
However, if a git repo has this problem, it's "git push" will always be very slow
instead of going fast and slow.

> E.g. have you tried to follow-up on this comment by Jeff King?
> https://lore.kernel.org/git/20190523113031.GA17448@sigill.intra.peff.net/

Not yet, I'll try it later.

> At the time I didn't, because as noted in a follow-up I'd lost my test
> case by the time I read that, but it seems you haven't, and have a
> current test case.

I tried to find a test case in open-source projects, and finally found one.
$ git clone https://github.com/JetBrains/intellij-community.git --bare
$ cd intellij-community.git
$ git repack -adb
$ GIT_TRACE=1 git push . master:test1

Then it would get stuck in "git pack-objects" for about 10s. (OS: Windows)
After I deleted the bitmap, "git push" took less than 1s.


> I think a really good start would be to come up with some benchmarks for
> a few common cases, e.g. how do bitmaps do if we have a big repo but few
> refs, how about a lot of refs? Does their stale-ness make a difference
> or not (per previous discussions) etc. etc.

According to my observation, the problem always occur on git repos which have 
a lot of refs and objects.
Also, "git push"  takes much time on "find _objects(..) for haves_bitmap".

Hope the information above can help find the cause of the problem.

Thanks,
- Kyle
Junio C Hamano June 16, 2022, 6:12 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	mk_test testrepo heads/main &&
>> +	git checkout main &&
>> +	GIT_TRACE2_EVENT="$PWD/default" \
>> +	git push testrepo main:test &&
>> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
>> +	--thin --delta-base-offset -q --no-use-bitmap-index <default &&
>
> Nit: We tend to indent these ase we wrap, so e.g.:
>
> 	test_subcommand git ... \
> 		--thin --delta [...]
>
> The rest all looks good as far as the diff goes, if what we want to do
> is to disable this by default, and this isn't worth a re-roll in itself.
>
> But I still think that completely disabling bitmaps might be premature
> here, especially per Taylor's comment on v1 (which I understand to mean
> that they should help some of the time, even with push).

The usual way to move is to move slowly and carefully.

It may well be the case that disabling bitmaps gives users a better
default, but that is not even proven and is hard to prove.  How many
users of Git do we have?  Those silently using it happily will by
definition complain here or elsewhere, and the complaints "X is slow
with Y, so Y should be disabled when doing X" we hear tend to be
louder than "I am happily doing X with Y".

I have different problems with this patch, though.  It can use a bit
more honesty.  If you introduce a new knob and sell it as a knob
that allows disabling, be honest and keep its behaviour as
advertised.

As posted, IIUC, the patch does something quite different.  It
disables by default, and have a knob to allow it enabled again.

So, perhaps make it default on to keep the historical behaviour, and
document it as "setting it false may improve push performance without
affecting use of the reachability bitmaps for other operations.
Default is true."

Thanks.
Jeff King June 16, 2022, 9:04 p.m. UTC | #5
On Thu, Jun 16, 2022 at 03:38:36PM +0200, Ævar Arnfjörð Bjarmason wrote:

> But the change as it stands is effectively saying "bitmaps on push
> hinder more than they help, and turning them on for push was a mistake".
> 
> Maybe that's true, but I don't think we've got any data to support
> that. Even though I've got one of those anecdotes (and from occasional
> investigations of slow "git push" I'm pretty sure this would help my
> use-cases more than it would harm them) the plural of anecdote isn't
> data.

Yeah, I'm not convinced they hinder more than they help on net. And any
problems on pushing are also things that _could_ happen on fetching, so
I think this is really a band-aid. But:

  - even if the average is improved, it's reasonable to want to avoid
    the most pathological cases

  - it's reasonable for it to be the user's choice to make, and right
    now the config isn't expressive enough to allow the choice they want

My biggest fear would be that the pathological cases are improved
(perhaps with the approach of mine that you linked earlier; thanks for
that), but that people erroneously continue to think that turning off
push.useBitmaps is a good idea due to inertia or superstition. But that
seems like a minor problem that can be addressed later if need be.

-Peff
Taylor Blau June 16, 2022, 9:17 p.m. UTC | #6
On Thu, Jun 16, 2022 at 03:11:09PM +0000, kylezhao(赵柯宇) wrote:
> > At the time I didn't, because as noted in a follow-up I'd lost my test
> > case by the time I read that, but it seems you haven't, and have a
> > current test case.
>
> I tried to find a test case in open-source projects, and finally found one.
> $ git clone https://github.com/JetBrains/intellij-community.git --bare
> $ cd intellij-community.git
> $ git repack -adb
> $ GIT_TRACE=1 git push . master:test1

I wouldn't expect this to push any objects at all, since you're pushing
to a repository that already has all of the objects contained in
`master`.

A more representative test might be something like:

    $ git clone https://github.com/JetBrains/intellij-community.git --bare
    $ cd intellij-community.git
    $ git repack -adb
    $ git rev-parse HEAD >in
    $ time git pack-objects --revs --stdout <in >/dev/null
    # move the bitmap away so we don't use it
    $ mv objects/pack/pack-*.bitmap{,.bak}
    $ time git pack-objects --revs --stdout <in >/dev/null

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
index e32801e6c91..3f3ff66fe7c 100644
--- a/Documentation/config/push.txt
+++ b/Documentation/config/push.txt
@@ -137,3 +137,8 @@  push.negotiate::
 	server attempt to find commits in common. If "false", Git will
 	rely solely on the server's ref advertisement to find commits
 	in common.
+
+push.useBitmaps::
+    If this config and `pack.useBitmaps` are both `true`, then Git will
+    use reachability bitmaps during `git push`, if available (disabled
+    by default).
diff --git a/send-pack.c b/send-pack.c
index bc0fcdbb000..627e79d7623 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -84,6 +84,8 @@  static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
 		strvec_push(&po.args, "--progress");
 	if (is_repository_shallow(the_repository))
 		strvec_push(&po.args, "--shallow");
+	if (!args->use_bitmaps)
+		strvec_push(&po.args, "--no-use-bitmap-index");
 	po.in = -1;
 	po.out = args->stateless_rpc ? -1 : fd;
 	po.git_cmd = 1;
@@ -487,6 +489,7 @@  int send_pack(struct send_pack_args *args,
 	struct async demux;
 	const char *push_cert_nonce = NULL;
 	struct packet_reader reader;
+	int use_bitmaps;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
@@ -498,6 +501,9 @@  int send_pack(struct send_pack_args *args,
 	if (push_negotiate)
 		get_commons_through_negotiation(args->url, remote_refs, &commons);
 
+	if (!git_config_get_bool("push.usebitmaps", &use_bitmaps))
+		args->use_bitmaps = use_bitmaps;
+
 	git_config_get_bool("transfer.advertisesid", &advertise_sid);
 
 	/* Does the other end support the reporting? */
diff --git a/send-pack.h b/send-pack.h
index e148fcd9609..f7af1b0353e 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -26,7 +26,8 @@  struct send_pack_args {
 		/* One of the SEND_PACK_PUSH_CERT_* constants. */
 		push_cert:2,
 		stateless_rpc:1,
-		atomic:1;
+		atomic:1,
+		use_bitmaps:1;
 	const struct string_list *push_options;
 };
 
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index dedca106a7a..0d416d1474f 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1865,4 +1865,25 @@  test_expect_success 'push warns or fails when using username:password' '
 	test_line_count = 1 warnings
 '
 
+test_expect_success 'push with config push.useBitmaps' '
+	mk_test testrepo heads/main &&
+	git checkout main &&
+	GIT_TRACE2_EVENT="$PWD/default" \
+	git push testrepo main:test &&
+	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
+	--thin --delta-base-offset -q --no-use-bitmap-index <default &&
+
+	git config push.useBitmaps true &&
+	GIT_TRACE2_EVENT="$PWD/true" \
+	git push testrepo main:test2 &&
+	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
+	--thin --delta-base-offset -q <true &&
+
+	git config push.useBitmaps false &&
+	GIT_TRACE2_EVENT="$PWD/false" \
+	git push testrepo main:test3 &&
+	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
+	--thin --delta-base-offset -q --no-use-bitmap-index <false
+'
+
 test_done