diff mbox series

object-name: fix reversed ordering with magic pathspecs

Message ID 20241206-pks-rev-parse-fix-reversed-list-v1-1-95a96564a4d7@pks.im (mailing list archive)
State New
Headers show
Series object-name: fix reversed ordering with magic pathspecs | expand

Commit Message

Patrick Steinhardt Dec. 6, 2024, 9:51 a.m. UTC
It was reported that magic pathspecs started to return results in
reverse recency order starting with Git v2.47.0. This behaviour bisects
to 57fb139b5e (object-name: fix leaking commit list items, 2024-08-01),
which has refactored `get_oid_oneline()` to plug a couple of memory
leaks.

As part of that refactoring we have started to create a copy of the
passed-in list of commits and work on that list instead. The intent of
this was to stop modifying the caller-provided commit list such that the
caller can properly free all commit list items, including those that we
might potentially drop.

We already knew to create such a copy beforehand with the `backup` list,
which was used to clear the `ONELINE_SEEN` commit mark after we were
done. So the refactoring simply renamed that list to `copy` and started
to operate on that list instead. There is a gotcha though: the backup
list, and thus now also the copied list, is always being prepended to,
so the resulting list is in reverse order! The end result is that we
pop commits from the wrong end of the commit list, returning commits in
reverse recency order.

Fix the bug by appending to the list instead.

Reported-by: Aarni Koskela <aarni@valohai.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
This patch applies on top of v2.47.0, which is the first version which
had this regression.
---
 object-name.c                 |  4 ++--
 t/t4208-log-magic-pathspec.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)


---
base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
change-id: 20241206-pks-rev-parse-fix-reversed-list-0f94a20a6165

Comments

Kristoffer Haugsbakk Dec. 6, 2024, 11:20 a.m. UTC | #1
Hi

On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote:
> It was reported

It would be nice with a hyperlink since this email is not part of the
email thread for the report.

https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29

> It was reported that magic pathspecs started to return results in

I’m not used to this being called “magic pathspecs” as a user.
gitglossary(7) talks about (magic) pathspecs as filepaths.

(c.f. gitrevisions(7) where this revision selection syntax
is discussed.)

> reverse recency order starting with Git v2.47.0. This behaviour bisects
> to 57fb139b5e (object-name: fix leaking commit list items, 2024-08-01),

Nit: Can it be simply be asserted that it is caused by that commit?
Because that would make for a simpler/more assertive narrative.

“This bisects to” can be a nice phrase when it is followed by a “but”
subclause,[1] i.e. when the straightforward troubleshooting can turn up
misleading leads.

† 1: 615d2de3b45 (config.c: avoid segfault with --fixed-value and valueless
    config, 2024-08-01)

> which has refactored `get_oid_oneline()` to plug a couple of memory
> leaks.
>
> As part of that refactoring we have started to create a copy of the
> passed-in list of commits and work on that list instead. The intent of
> this was to stop modifying the caller-provided commit list such that the

Nit: s/The intent of this was/The intent was/

> caller can properly free all commit list items, including those that we
> might potentially drop.
>
> We already knew to create such a copy beforehand with the `backup` list,
> which was used to clear the `ONELINE_SEEN` commit mark after we were
> done. So the refactoring simply renamed that list to `copy` and started
> to operate on that list instead. There is a gotcha though: the backup
> list, and thus now also the copied list, is always being prepended to,
> so the resulting list is in reverse order! The end result is that we
> pop commits from the wrong end of the commit list, returning commits in
> reverse recency order.

Nice explanation.

>
> Fix the bug by appending to the list instead.
>
> Reported-by: Aarni Koskela <aarni@valohai.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> [snip]
> diff --git a/t/t4208-log-magic-pathspec.sh
> b/t/t4208-log-magic-pathspec.sh

Yes, so here is that magic pathspec name.  But this test file has a lot
of tests that test positional argument ambiguity.  Which seems very
relevant to pathspecs in particular.  And revision selection syntax
seems to be used to test how things are interpreted.  Not really how
things are ultimately processed (that seems secondary).

The tests involving `:/` in particular seem to only be about
ambiguity testing.

Is this the correct test file?

> index
> 2a46eb6bedbe283a08fd77917b7fb17da155b027..2d80b9044bcf9ec8e2f11b6afd2b0fe8e2fcabfd
> 100755
> --- a/t/t4208-log-magic-pathspec.sh
> +++ b/t/t4208-log-magic-pathspec.sh
> @@ -58,6 +58,19 @@ test_expect_success '"git log :/any/path/" should
> not segfault' '
>  	test_must_fail git log :/any/path/
>  '
>
> +test_expect_success ':/ favors more recent matching commits' '

This wasn’t mentioned in the report but `HEAD^{/}` is a similar syntax.
That one is more controllable since you provide a ref yourself
(`:/` returns the youngest commit from any ref).

I have indeed noticed that `HEAD^{/}` returns a sensible thing while
`:/` does something strange like finding the root commit.  (Then I
shrugged and half-assumed that I hadn’t read some fine print)

gitrevisions(7) calls out the relation between these two. It could be
nice for a regression test to assert that these two syntaxes return the
same commit. I.e. when you have just made a commit, `:/<search>` and
`HEAD^{/<search>}` return the same commit, and that commit is
the youngest.

> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit common-old &&
> +		test_commit --no-tag common-new &&
> +		git rev-parse HEAD >expect &&
> +		git rev-parse :/common >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  # This differs from the ":/a" check above in that :/in looks like a pathspec,
>  # but doesn't match an actual file.
>  test_expect_success '"git log :/in" should not be ambiguous' '
>
> ---
> base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
> change-id: 20241206-pks-rev-parse-fix-reversed-list-0f94a20a6165
Junio C Hamano Dec. 6, 2024, 11:57 a.m. UTC | #2
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:

> On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote:
>> It was reported
>
> It would be nice with a hyperlink since this email is not part of the
> email thread for the report.
>
> https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29
>
>> It was reported that magic pathspecs started to return results in
>
> I’m not used to this being called “magic pathspecs” as a user.
> gitglossary(7) talks about (magic) pathspecs as filepaths.

Thanks for catching the mistaken phrasing.  It would have caused
unnecessary confusion later to "git log" readers.

The syntax to say that the following path is from the top-level of
the working tree even when you are running the command from a
subdirectory, e.g.

    cd Documentation
    git log :/t

is described in gitglossary(7):

    A pathspec that begins with a colon `:` has special meaning.  In the
    short form, the leading colon `:` is followed by zero or more "magic
    signature" letters (which optionally is terminated by another colon `:`),
    and the remainder is the pattern to match against the path.
    The "magic signature" consists of ASCII symbols that are neither
    alphanumeric, glob, regex special characters nor colon.
    The optional colon that terminates the "magic signature" can be
    omitted if the pattern begins with a character that does not belong to
    "magic signature" symbol set and is not a colon.

and even though the word "magic" signature is used, there is no
definition given for the entire construct, i.e. the pathspec that
uses a special meaning introduced by the use of "colon followed by
one or more magic signature letters".  We probably should add a
sentence to officially define it, if only to reduce the need for
exchanges like this ;-)

    ... and is not a colon.  Such a pathspec that uses these "magic"
    meaning is called "magic pathspec".

But more importantly, the syntax that triggered this topic in

<CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>

is *NOT* a magic pathspec at all.  It is a revision syntax to name
the first commit that is reachable from the current HEAD with a
commit log message, that matches the given patterh, i.e.

    git show ":/my message"

which is a (rather lousy) short-hand for a more general

    git show "HEAD^{/my message}"

i.e. <startingRev>^{/<pattern>}.  There is no specific name for
this syntax.

I suspect that "filepath" you mentioned may be something some folks
(but not me or any other long timers) would call yet another syntax,
which is :<path>, that names the object sitting at <path> in the
index.  Because ":/myMessage" look so similar to ":myFile", yet
they mean so different things, as I said, ":/myMessage" is a rather
lousy short-hand for the more general "^{/<pattern>}" suffix that
is less ambiguous.

Patrick, let's not use a wrong word.  This is not about pathspec at
all, and is a revision syntax.  As there is no specific jargon for
the syntax, here is what I would write, if I were explaining the
problem being solved:

    Recently it was reported [*1*] that "look for the youngest
    reachable commit with log message that match the given pattern"
    syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern})
    started to show older commit.

    [Footnote]
    *1* <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>

Thanks.
Patrick Steinhardt Dec. 6, 2024, 12:05 p.m. UTC | #3
On Fri, Dec 06, 2024 at 08:57:04PM +0900, Junio C Hamano wrote:
> "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:
> 
> > On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote:
> >> It was reported
> >
> > It would be nice with a hyperlink since this email is not part of the
> > email thread for the report.
> >
> > https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29
> >
> >> It was reported that magic pathspecs started to return results in
> >
> > I’m not used to this being called “magic pathspecs” as a user.
> > gitglossary(7) talks about (magic) pathspecs as filepaths.
> 
> Thanks for catching the mistaken phrasing.  It would have caused
> unnecessary confusion later to "git log" readers.
> 
> The syntax to say that the following path is from the top-level of
> the working tree even when you are running the command from a
> subdirectory, e.g.
> 
>     cd Documentation
>     git log :/t
> 
> is described in gitglossary(7):
> 
>     A pathspec that begins with a colon `:` has special meaning.  In the
>     short form, the leading colon `:` is followed by zero or more "magic
>     signature" letters (which optionally is terminated by another colon `:`),
>     and the remainder is the pattern to match against the path.
>     The "magic signature" consists of ASCII symbols that are neither
>     alphanumeric, glob, regex special characters nor colon.
>     The optional colon that terminates the "magic signature" can be
>     omitted if the pattern begins with a character that does not belong to
>     "magic signature" symbol set and is not a colon.
> 
> and even though the word "magic" signature is used, there is no
> definition given for the entire construct, i.e. the pathspec that
> uses a special meaning introduced by the use of "colon followed by
> one or more magic signature letters".  We probably should add a
> sentence to officially define it, if only to reduce the need for
> exchanges like this ;-)
> 
>     ... and is not a colon.  Such a pathspec that uses these "magic"
>     meaning is called "magic pathspec".
> 
> But more importantly, the syntax that triggered this topic in
> 
> <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>
> 
> is *NOT* a magic pathspec at all.  It is a revision syntax to name
> the first commit that is reachable from the current HEAD with a
> commit log message, that matches the given patterh, i.e.
> 
>     git show ":/my message"
> 
> which is a (rather lousy) short-hand for a more general
> 
>     git show "HEAD^{/my message}"
> 
> i.e. <startingRev>^{/<pattern>}.  There is no specific name for
> this syntax.

Hm, yeah. I think I read pathspec somewhere and just went with it
without thinking.

> I suspect that "filepath" you mentioned may be something some folks
> (but not me or any other long timers) would call yet another syntax,
> which is :<path>, that names the object sitting at <path> in the
> index.  Because ":/myMessage" look so similar to ":myFile", yet
> they mean so different things, as I said, ":/myMessage" is a rather
> lousy short-hand for the more general "^{/<pattern>}" suffix that
> is less ambiguous.
> 
> Patrick, let's not use a wrong word.  This is not about pathspec at
> all, and is a revision syntax.  As there is no specific jargon for
> the syntax, here is what I would write, if I were explaining the
> problem being solved:
> 
>     Recently it was reported [*1*] that "look for the youngest
>     reachable commit with log message that match the given pattern"
>     syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern})
>     started to show older commit.
> 
>     [Footnote]
>     *1* <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>

Will update in v2. Thanks!

Patrick
Kristoffer Haugsbakk Dec. 6, 2024, 12:25 p.m. UTC | #4
On Fri, Dec 6, 2024, at 12:57, Junio C Hamano wrote:
> [snip]
>     Recently it was reported [*1*] that "look for the youngest
>     reachable commit with log message that match the given pattern"
>     syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern})

But these are slightly different since `:/` searches in all refs.
They are only equivalent (i.e. with Patrick’s fix) when you have
just made a commit (like in Patrick’s test).

And the longer syntax doesn’t seem to be affected by any regressions.
In this repo:

    $ # wrong
    $ git log --format=reference -1 :/Merge
    3857284f7b8 (Merge refs/heads/master from ., 2005-08-24)
    $ # correct (looks correct)
    $ git log --format=reference -1 HEAD^{/Merge}
    4a611ee7ebd (Merge branch 'kn/ref-transaction-hook-with-reflog', 2024-11-27)

>     started to show older commit.
Patrick Steinhardt Dec. 6, 2024, 12:25 p.m. UTC | #5
On Fri, Dec 06, 2024 at 12:20:45PM +0100, Kristoffer Haugsbakk wrote:
> > diff --git a/t/t4208-log-magic-pathspec.sh
> > b/t/t4208-log-magic-pathspec.sh
> 
> Yes, so here is that magic pathspec name.  But this test file has a lot
> of tests that test positional argument ambiguity.  Which seems very
> relevant to pathspecs in particular.  And revision selection syntax
> seems to be used to test how things are interpreted.  Not really how
> things are ultimately processed (that seems secondary).
> 
> The tests involving `:/` in particular seem to only be about
> ambiguity testing.
> 
> Is this the correct test file?

Probably not. I couldn't really find any cases where we explicitly
verify this syntax, which explains why the regression was not found.
I've added it to t1500 now, which is at least better than t4208.

Thanks!

Patrick
Junio C Hamano Dec. 6, 2024, 10:40 p.m. UTC | #6
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:

> And the longer syntax doesn’t seem to be affected by any regressions.

That is curious.  I wonder if we have two redundant code paths and
the regression hit only one of them?

Thanks.
diff mbox series

Patch

diff --git a/object-name.c b/object-name.c
index c892fbe80aa7173dfcc1995de5a75bc322c6adb7..34433d2a01d6a23ce6b4ca19b85c53b7b82fd0e5 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1401,7 +1401,7 @@  static int get_oid_oneline(struct repository *r,
 			   const char *prefix, struct object_id *oid,
 			   const struct commit_list *list)
 {
-	struct commit_list *copy = NULL;
+	struct commit_list *copy = NULL, **copy_tail = &copy;
 	const struct commit_list *l;
 	int found = 0;
 	int negative = 0;
@@ -1423,7 +1423,7 @@  static int get_oid_oneline(struct repository *r,
 
 	for (l = list; l; l = l->next) {
 		l->item->object.flags |= ONELINE_SEEN;
-		commit_list_insert(l->item, &copy);
+		copy_tail = &commit_list_insert(l->item, copy_tail)->next;
 	}
 	while (copy) {
 		const char *p, *buf;
diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh
index 2a46eb6bedbe283a08fd77917b7fb17da155b027..2d80b9044bcf9ec8e2f11b6afd2b0fe8e2fcabfd 100755
--- a/t/t4208-log-magic-pathspec.sh
+++ b/t/t4208-log-magic-pathspec.sh
@@ -58,6 +58,19 @@  test_expect_success '"git log :/any/path/" should not segfault' '
 	test_must_fail git log :/any/path/
 '
 
+test_expect_success ':/ favors more recent matching commits' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit common-old &&
+		test_commit --no-tag common-new &&
+		git rev-parse HEAD >expect &&
+		git rev-parse :/common >actual &&
+		test_cmp expect actual
+	)
+'
+
 # This differs from the ":/a" check above in that :/in looks like a pathspec,
 # but doesn't match an actual file.
 test_expect_success '"git log :/in" should not be ambiguous' '