diff mbox series

Re* using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42

Message ID xmqqzft6aozg.fsf_-_@gitster.g (mailing list archive)
State Accepted
Commit 51441e6460b505c07b4a8a6deeaa7de4bf6e8e33
Headers show
Series Re* using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 | expand

Commit Message

Junio C Hamano May 3, 2024, 3:34 p.m. UTC
Dhruva Krishnamurthy <dhruvakm@gmail.com> writes:

> On Thu, May 2, 2024 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>> We could drop [1/2] from the series in the meantime to make it a
>> GitLab installation specific issue where they explicitly use
>> attr.tree to point at HEAD ;-) That is not solving anything for
>> those who set attr.tree (in a sense, they are buying the feature
>> with overhead of reading attributes from the named tree), but at
>> least for most people who are used to seeing the bare repository
>> ignoring the attributes, it would be an improvement to drop the
>> "bare repositories the tree of the HEAD commit is used to look up
>> attributes files by default" half from the series.
>>
>
> A hack (without knowing side effects if any) is to use an empty tree
> for attr source:
> $ git config --add attr.tree $(git hash-object -t tree /dev/null)
>
> This gives me performance comparable to git 2.42

That is clever.  Instead of crawling a potentially large tree that
is at the HEAD of the main project payload to find ".gitattributes"
files that may be relevant (and often not), folks can set an empty
tree to attr.tree to the configuration until this gets corrected.

And for folks who had been happy with the pre 2.42 behaviour,
we could do something like the attached as the first step to a real fix.

----- >8 --------- >8 --------- >8 --------- >8 -----
Subject: [PATCH] stop using HEAD for attributes in bare repository by default

With 23865355 (attr: read attributes from HEAD when bare repo,
2023-10-13), we started to use the HEAD tree as the default
attribute source in a bare repository.  One argument for such a
behaviour is that it would make things like "git archive" run in
bare and non-bare repositories for the same commit consistent.
This changes was merged to Git 2.43 but without an explicit mention
in its release notes.

It turns out that this change destroys performance of shallowly
cloning from a bare repository.  As the "server" installations are
expected to be mostly bare, and "git pack-objects", which is the
core of driving the other side of "git clone" and "git fetch" wants
to see if a path is set not to delta with blobs from other paths via
the attribute system, the change forces the server side to traverse
the tree of the HEAD commit needlessly to find if each and every
paths the objects it sends out has the attribute that controls the
deltification.  Given that (1) most projects do not configure such
an attribute, and (2) it is dubious for the server side to honor
such an end-user supplied attribute anyway, this was a poor choice
of the default.

To mitigate the current situation, let's revert the change that uses
the tree of HEAD in a bare repository by default as the attribute
source.  This will help most people who have been happy with the
behaviour of Git 2.42 and before.

Two things to note:

 * If you are stuck with versions of Git 2.43 or newer, that is
   older than the release this fix appears in, you can explicitly
   set the attr.tree configuration variable to point at an empty
   tree object, i.e.

	$ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

 * If you like the behaviour we are reverting, you can explicitly
   set the attr.tree configuration variable to HEAD, i.e.

	$ git config attr.tree HEAD

The right fix for this is to optimize the code paths that allow
accesses to attributes in tree objects, but that is a much more
involved change and is left as a longer-term project, outside the
scope of this "first step" fix.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 attr.c                  |  7 -------
 t/t0003-attributes.sh   | 10 ++++++++--
 t/t5001-archive-attr.sh |  3 ++-
 3 files changed, 10 insertions(+), 10 deletions(-)

Comments

Jeff King May 3, 2024, 5:46 p.m. UTC | #1
On Fri, May 03, 2024 at 08:34:27AM -0700, Junio C Hamano wrote:

> And for folks who had been happy with the pre 2.42 behaviour,
> we could do something like the attached as the first step to a real fix.

It looks like lots of discussion happened with out me, and everybody
already posted all of the responses I was going to. Good. :)

In particular...

> ----- >8 --------- >8 --------- >8 --------- >8 -----
> Subject: [PATCH] stop using HEAD for attributes in bare repository by default
> [...]
> The right fix for this is to optimize the code paths that allow
> accesses to attributes in tree objects, but that is a much more
> involved change and is left as a longer-term project, outside the
> scope of this "first step" fix.

...this was the exact first step I was going to suggest. And your patch
looks correct to me. I assume you'd target this for 'maint'. The
regression goes back to v2.43.0, so it's not exactly new, but given the
severity in some cases it seems like it's worth getting it into a
release sooner rather than later.

I am mildly surprised nobody noticed the issue until now. I wonder if
t/perf would notice it and nobody is running it, or if this is a gap in
our coverage there. If the latter, it might be worth adding such a
script, which should be able to show off that your change here takes us
back to the v2.42 state.

Running the perf suite against linux.git between 2.42 and 2.43 would
answer the "is this a gap" question, but I haven't had a chance to do
so, and it takes a while.

-Peff
Taylor Blau May 6, 2024, 8:28 p.m. UTC | #2
On Fri, May 03, 2024 at 01:46:53PM -0400, Jeff King wrote:
> On Fri, May 03, 2024 at 08:34:27AM -0700, Junio C Hamano wrote:
>
> > And for folks who had been happy with the pre 2.42 behaviour,
> > we could do something like the attached as the first step to a real fix.
>
> It looks like lots of discussion happened with out me, and everybody
> already posted all of the responses I was going to. Good. :)
>
> In particular...
>
> > ----- >8 --------- >8 --------- >8 --------- >8 -----
> > Subject: [PATCH] stop using HEAD for attributes in bare repository by default
> > [...]
> > The right fix for this is to optimize the code paths that allow
> > accesses to attributes in tree objects, but that is a much more
> > involved change and is left as a longer-term project, outside the
> > scope of this "first step" fix.
>
> ...this was the exact first step I was going to suggest. And your patch
> looks correct to me. I assume you'd target this for 'maint'. The
> regression goes back to v2.43.0, so it's not exactly new, but given the
> severity in some cases it seems like it's worth getting it into a
> release sooner rather than later.

For what it's worth, I am in favor of the patch that Junio proposed
here.

Thanks,
Taylor
John Cai May 13, 2024, 8:16 p.m. UTC | #3
On 3 May 2024, at 11:34, Junio C Hamano wrote:

> Dhruva Krishnamurthy <dhruvakm@gmail.com> writes:
>
>> On Thu, May 2, 2024 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>>> We could drop [1/2] from the series in the meantime to make it a
>>> GitLab installation specific issue where they explicitly use
>>> attr.tree to point at HEAD ;-) That is not solving anything for
>>> those who set attr.tree (in a sense, they are buying the feature
>>> with overhead of reading attributes from the named tree), but at
>>> least for most people who are used to seeing the bare repository
>>> ignoring the attributes, it would be an improvement to drop the
>>> "bare repositories the tree of the HEAD commit is used to look up
>>> attributes files by default" half from the series.
>>>
>>
>> A hack (without knowing side effects if any) is to use an empty tree
>> for attr source:
>> $ git config --add attr.tree $(git hash-object -t tree /dev/null)
>>
>> This gives me performance comparable to git 2.42
>
> That is clever.  Instead of crawling a potentially large tree that
> is at the HEAD of the main project payload to find ".gitattributes"
> files that may be relevant (and often not), folks can set an empty
> tree to attr.tree to the configuration until this gets corrected.
>
> And for folks who had been happy with the pre 2.42 behaviour,
> we could do something like the attached as the first step to a real fix.
>
> ----- >8 --------- >8 --------- >8 --------- >8 -----
> Subject: [PATCH] stop using HEAD for attributes in bare repository by default
>
> With 23865355 (attr: read attributes from HEAD when bare repo,
> 2023-10-13), we started to use the HEAD tree as the default
> attribute source in a bare repository.  One argument for such a
> behaviour is that it would make things like "git archive" run in
> bare and non-bare repositories for the same commit consistent.
> This changes was merged to Git 2.43 but without an explicit mention
> in its release notes.
>
> It turns out that this change destroys performance of shallowly
> cloning from a bare repository.  As the "server" installations are
> expected to be mostly bare, and "git pack-objects", which is the
> core of driving the other side of "git clone" and "git fetch" wants
> to see if a path is set not to delta with blobs from other paths via
> the attribute system, the change forces the server side to traverse
> the tree of the HEAD commit needlessly to find if each and every
> paths the objects it sends out has the attribute that controls the
> deltification.  Given that (1) most projects do not configure such
> an attribute, and (2) it is dubious for the server side to honor
> such an end-user supplied attribute anyway, this was a poor choice
> of the default.
>
> To mitigate the current situation, let's revert the change that uses
> the tree of HEAD in a bare repository by default as the attribute
> source.  This will help most people who have been happy with the
> behaviour of Git 2.42 and before.

This change makes sense to me, and glad it got uncovered. Thanks to all who
chimed in with this root cause analysis and the proposed patches. Sorry I
haven't replied until now-I was traveling the past two weeks.

thanks
John

>
> Two things to note:
>
>  * If you are stuck with versions of Git 2.43 or newer, that is
>    older than the release this fix appears in, you can explicitly
>    set the attr.tree configuration variable to point at an empty
>    tree object, i.e.
>
> 	$ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>
>  * If you like the behaviour we are reverting, you can explicitly
>    set the attr.tree configuration variable to HEAD, i.e.
>
> 	$ git config attr.tree HEAD
>
> The right fix for this is to optimize the code paths that allow
> accesses to attributes in tree objects, but that is a much more
> involved change and is left as a longer-term project, outside the
> scope of this "first step" fix.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  attr.c                  |  7 -------
>  t/t0003-attributes.sh   | 10 ++++++++--
>  t/t5001-archive-attr.sh |  3 ++-
>  3 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git c/attr.c w/attr.c
> index 679e42258c..6af7151088 100644
> --- c/attr.c
> +++ w/attr.c
> @@ -1223,13 +1223,6 @@ static void compute_default_attr_source(struct object_id *attr_source)
>  		ignore_bad_attr_tree = 1;
>  	}
>
> -	if (!default_attr_source_tree_object_name &&
> -	    startup_info->have_repository &&
> -	    is_bare_repository()) {
> -		default_attr_source_tree_object_name = "HEAD";
> -		ignore_bad_attr_tree = 1;
> -	}
> -
>  	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
>  		return;
>
> diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh
> index 774b52c298..d755cc3c29 100755
> --- c/t/t0003-attributes.sh
> +++ w/t/t0003-attributes.sh
> @@ -398,13 +398,19 @@ test_expect_success 'bad attr source defaults to reading .gitattributes file' '
>  	)
>  '
>
> -test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' '
> +test_expect_success 'bare repo no longer defaults to reading .gitattributes from HEAD' '
>  	test_when_finished rm -rf test bare_with_gitattribute &&
>  	git init test &&
>  	test_commit -C test gitattributes .gitattributes "f/path test=val" &&
>  	git clone --bare test bare_with_gitattribute &&
> -	echo "f/path: test: val" >expect &&
> +
> +	echo "f/path: test: unspecified" >expect &&
>  	git -C bare_with_gitattribute check-attr test -- f/path >actual &&
> +	test_cmp expect actual &&
> +
> +	echo "f/path: test: val" >expect &&
> +	git -C bare_with_gitattribute -c attr.tree=HEAD \
> +		check-attr test -- f/path >actual &&
>  	test_cmp expect actual
>  '
>
> diff --git c/t/t5001-archive-attr.sh w/t/t5001-archive-attr.sh
> index eaf959d8f6..7310774af5 100755
> --- c/t/t5001-archive-attr.sh
> +++ w/t/t5001-archive-attr.sh
> @@ -133,7 +133,8 @@ test_expect_success 'git archive vs. bare' '
>  '
>
>  test_expect_success 'git archive with worktree attributes, bare' '
> -	(cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar &&
> +	(cd bare &&
> +	git -c attr.tree=HEAD archive --worktree-attributes HEAD) >bare-worktree.tar &&
>  	(mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar
>  '
diff mbox series

Patch

diff --git c/attr.c w/attr.c
index 679e42258c..6af7151088 100644
--- c/attr.c
+++ w/attr.c
@@ -1223,13 +1223,6 @@  static void compute_default_attr_source(struct object_id *attr_source)
 		ignore_bad_attr_tree = 1;
 	}
 
-	if (!default_attr_source_tree_object_name &&
-	    startup_info->have_repository &&
-	    is_bare_repository()) {
-		default_attr_source_tree_object_name = "HEAD";
-		ignore_bad_attr_tree = 1;
-	}
-
 	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
 		return;
 
diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh
index 774b52c298..d755cc3c29 100755
--- c/t/t0003-attributes.sh
+++ w/t/t0003-attributes.sh
@@ -398,13 +398,19 @@  test_expect_success 'bad attr source defaults to reading .gitattributes file' '
 	)
 '
 
-test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' '
+test_expect_success 'bare repo no longer defaults to reading .gitattributes from HEAD' '
 	test_when_finished rm -rf test bare_with_gitattribute &&
 	git init test &&
 	test_commit -C test gitattributes .gitattributes "f/path test=val" &&
 	git clone --bare test bare_with_gitattribute &&
-	echo "f/path: test: val" >expect &&
+
+	echo "f/path: test: unspecified" >expect &&
 	git -C bare_with_gitattribute check-attr test -- f/path >actual &&
+	test_cmp expect actual &&
+
+	echo "f/path: test: val" >expect &&
+	git -C bare_with_gitattribute -c attr.tree=HEAD \
+		check-attr test -- f/path >actual &&
 	test_cmp expect actual
 '
 
diff --git c/t/t5001-archive-attr.sh w/t/t5001-archive-attr.sh
index eaf959d8f6..7310774af5 100755
--- c/t/t5001-archive-attr.sh
+++ w/t/t5001-archive-attr.sh
@@ -133,7 +133,8 @@  test_expect_success 'git archive vs. bare' '
 '
 
 test_expect_success 'git archive with worktree attributes, bare' '
-	(cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar &&
+	(cd bare &&
+	git -c attr.tree=HEAD archive --worktree-attributes HEAD) >bare-worktree.tar &&
 	(mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar
 '