Message ID | xmqqzft6aozg.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | Accepted |
Commit | 51441e6460b505c07b4a8a6deeaa7de4bf6e8e33 |
Headers | show |
Series | Re* using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 | expand |
On Fri, May 03, 2024 at 08:34:27AM -0700, Junio C Hamano wrote: > And for folks who had been happy with the pre 2.42 behaviour, > we could do something like the attached as the first step to a real fix. It looks like lots of discussion happened with out me, and everybody already posted all of the responses I was going to. Good. :) In particular... > ----- >8 --------- >8 --------- >8 --------- >8 ----- > Subject: [PATCH] stop using HEAD for attributes in bare repository by default > [...] > The right fix for this is to optimize the code paths that allow > accesses to attributes in tree objects, but that is a much more > involved change and is left as a longer-term project, outside the > scope of this "first step" fix. ...this was the exact first step I was going to suggest. And your patch looks correct to me. I assume you'd target this for 'maint'. The regression goes back to v2.43.0, so it's not exactly new, but given the severity in some cases it seems like it's worth getting it into a release sooner rather than later. I am mildly surprised nobody noticed the issue until now. I wonder if t/perf would notice it and nobody is running it, or if this is a gap in our coverage there. If the latter, it might be worth adding such a script, which should be able to show off that your change here takes us back to the v2.42 state. Running the perf suite against linux.git between 2.42 and 2.43 would answer the "is this a gap" question, but I haven't had a chance to do so, and it takes a while. -Peff
On Fri, May 03, 2024 at 01:46:53PM -0400, Jeff King wrote: > On Fri, May 03, 2024 at 08:34:27AM -0700, Junio C Hamano wrote: > > > And for folks who had been happy with the pre 2.42 behaviour, > > we could do something like the attached as the first step to a real fix. > > It looks like lots of discussion happened with out me, and everybody > already posted all of the responses I was going to. Good. :) > > In particular... > > > ----- >8 --------- >8 --------- >8 --------- >8 ----- > > Subject: [PATCH] stop using HEAD for attributes in bare repository by default > > [...] > > The right fix for this is to optimize the code paths that allow > > accesses to attributes in tree objects, but that is a much more > > involved change and is left as a longer-term project, outside the > > scope of this "first step" fix. > > ...this was the exact first step I was going to suggest. And your patch > looks correct to me. I assume you'd target this for 'maint'. The > regression goes back to v2.43.0, so it's not exactly new, but given the > severity in some cases it seems like it's worth getting it into a > release sooner rather than later. For what it's worth, I am in favor of the patch that Junio proposed here. Thanks, Taylor
On 3 May 2024, at 11:34, Junio C Hamano wrote: > Dhruva Krishnamurthy <dhruvakm@gmail.com> writes: > >> On Thu, May 2, 2024 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote: >>> We could drop [1/2] from the series in the meantime to make it a >>> GitLab installation specific issue where they explicitly use >>> attr.tree to point at HEAD ;-) That is not solving anything for >>> those who set attr.tree (in a sense, they are buying the feature >>> with overhead of reading attributes from the named tree), but at >>> least for most people who are used to seeing the bare repository >>> ignoring the attributes, it would be an improvement to drop the >>> "bare repositories the tree of the HEAD commit is used to look up >>> attributes files by default" half from the series. >>> >> >> A hack (without knowing side effects if any) is to use an empty tree >> for attr source: >> $ git config --add attr.tree $(git hash-object -t tree /dev/null) >> >> This gives me performance comparable to git 2.42 > > That is clever. Instead of crawling a potentially large tree that > is at the HEAD of the main project payload to find ".gitattributes" > files that may be relevant (and often not), folks can set an empty > tree to attr.tree to the configuration until this gets corrected. > > And for folks who had been happy with the pre 2.42 behaviour, > we could do something like the attached as the first step to a real fix. > > ----- >8 --------- >8 --------- >8 --------- >8 ----- > Subject: [PATCH] stop using HEAD for attributes in bare repository by default > > With 23865355 (attr: read attributes from HEAD when bare repo, > 2023-10-13), we started to use the HEAD tree as the default > attribute source in a bare repository. One argument for such a > behaviour is that it would make things like "git archive" run in > bare and non-bare repositories for the same commit consistent. > This changes was merged to Git 2.43 but without an explicit mention > in its release notes. > > It turns out that this change destroys performance of shallowly > cloning from a bare repository. As the "server" installations are > expected to be mostly bare, and "git pack-objects", which is the > core of driving the other side of "git clone" and "git fetch" wants > to see if a path is set not to delta with blobs from other paths via > the attribute system, the change forces the server side to traverse > the tree of the HEAD commit needlessly to find if each and every > paths the objects it sends out has the attribute that controls the > deltification. Given that (1) most projects do not configure such > an attribute, and (2) it is dubious for the server side to honor > such an end-user supplied attribute anyway, this was a poor choice > of the default. > > To mitigate the current situation, let's revert the change that uses > the tree of HEAD in a bare repository by default as the attribute > source. This will help most people who have been happy with the > behaviour of Git 2.42 and before. This change makes sense to me, and glad it got uncovered. Thanks to all who chimed in with this root cause analysis and the proposed patches. Sorry I haven't replied until now-I was traveling the past two weeks. thanks John > > Two things to note: > > * If you are stuck with versions of Git 2.43 or newer, that is > older than the release this fix appears in, you can explicitly > set the attr.tree configuration variable to point at an empty > tree object, i.e. > > $ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > > * If you like the behaviour we are reverting, you can explicitly > set the attr.tree configuration variable to HEAD, i.e. > > $ git config attr.tree HEAD > > The right fix for this is to optimize the code paths that allow > accesses to attributes in tree objects, but that is a much more > involved change and is left as a longer-term project, outside the > scope of this "first step" fix. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > attr.c | 7 ------- > t/t0003-attributes.sh | 10 ++++++++-- > t/t5001-archive-attr.sh | 3 ++- > 3 files changed, 10 insertions(+), 10 deletions(-) > > diff --git c/attr.c w/attr.c > index 679e42258c..6af7151088 100644 > --- c/attr.c > +++ w/attr.c > @@ -1223,13 +1223,6 @@ static void compute_default_attr_source(struct object_id *attr_source) > ignore_bad_attr_tree = 1; > } > > - if (!default_attr_source_tree_object_name && > - startup_info->have_repository && > - is_bare_repository()) { > - default_attr_source_tree_object_name = "HEAD"; > - ignore_bad_attr_tree = 1; > - } > - > if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) > return; > > diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh > index 774b52c298..d755cc3c29 100755 > --- c/t/t0003-attributes.sh > +++ w/t/t0003-attributes.sh > @@ -398,13 +398,19 @@ test_expect_success 'bad attr source defaults to reading .gitattributes file' ' > ) > ' > > -test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' ' > +test_expect_success 'bare repo no longer defaults to reading .gitattributes from HEAD' ' > test_when_finished rm -rf test bare_with_gitattribute && > git init test && > test_commit -C test gitattributes .gitattributes "f/path test=val" && > git clone --bare test bare_with_gitattribute && > - echo "f/path: test: val" >expect && > + > + echo "f/path: test: unspecified" >expect && > git -C bare_with_gitattribute check-attr test -- f/path >actual && > + test_cmp expect actual && > + > + echo "f/path: test: val" >expect && > + git -C bare_with_gitattribute -c attr.tree=HEAD \ > + check-attr test -- f/path >actual && > test_cmp expect actual > ' > > diff --git c/t/t5001-archive-attr.sh w/t/t5001-archive-attr.sh > index eaf959d8f6..7310774af5 100755 > --- c/t/t5001-archive-attr.sh > +++ w/t/t5001-archive-attr.sh > @@ -133,7 +133,8 @@ test_expect_success 'git archive vs. bare' ' > ' > > test_expect_success 'git archive with worktree attributes, bare' ' > - (cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar && > + (cd bare && > + git -c attr.tree=HEAD archive --worktree-attributes HEAD) >bare-worktree.tar && > (mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar > '
diff --git c/attr.c w/attr.c index 679e42258c..6af7151088 100644 --- c/attr.c +++ w/attr.c @@ -1223,13 +1223,6 @@ static void compute_default_attr_source(struct object_id *attr_source) ignore_bad_attr_tree = 1; } - if (!default_attr_source_tree_object_name && - startup_info->have_repository && - is_bare_repository()) { - default_attr_source_tree_object_name = "HEAD"; - ignore_bad_attr_tree = 1; - } - if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) return; diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh index 774b52c298..d755cc3c29 100755 --- c/t/t0003-attributes.sh +++ w/t/t0003-attributes.sh @@ -398,13 +398,19 @@ test_expect_success 'bad attr source defaults to reading .gitattributes file' ' ) ' -test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' ' +test_expect_success 'bare repo no longer defaults to reading .gitattributes from HEAD' ' test_when_finished rm -rf test bare_with_gitattribute && git init test && test_commit -C test gitattributes .gitattributes "f/path test=val" && git clone --bare test bare_with_gitattribute && - echo "f/path: test: val" >expect && + + echo "f/path: test: unspecified" >expect && git -C bare_with_gitattribute check-attr test -- f/path >actual && + test_cmp expect actual && + + echo "f/path: test: val" >expect && + git -C bare_with_gitattribute -c attr.tree=HEAD \ + check-attr test -- f/path >actual && test_cmp expect actual ' diff --git c/t/t5001-archive-attr.sh w/t/t5001-archive-attr.sh index eaf959d8f6..7310774af5 100755 --- c/t/t5001-archive-attr.sh +++ w/t/t5001-archive-attr.sh @@ -133,7 +133,8 @@ test_expect_success 'git archive vs. bare' ' ' test_expect_success 'git archive with worktree attributes, bare' ' - (cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar && + (cd bare && + git -c attr.tree=HEAD archive --worktree-attributes HEAD) >bare-worktree.tar && (mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar '