Message ID | 20241206-pks-rev-parse-fix-reversed-list-v1-1-95a96564a4d7@pks.im (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | object-name: fix reversed ordering with magic pathspecs | expand |
Hi On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote: > It was reported It would be nice with a hyperlink since this email is not part of the email thread for the report. https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29 > It was reported that magic pathspecs started to return results in I’m not used to this being called “magic pathspecs” as a user. gitglossary(7) talks about (magic) pathspecs as filepaths. (c.f. gitrevisions(7) where this revision selection syntax is discussed.) > reverse recency order starting with Git v2.47.0. This behaviour bisects > to 57fb139b5e (object-name: fix leaking commit list items, 2024-08-01), Nit: Can it be simply be asserted that it is caused by that commit? Because that would make for a simpler/more assertive narrative. “This bisects to” can be a nice phrase when it is followed by a “but” subclause,[1] i.e. when the straightforward troubleshooting can turn up misleading leads. † 1: 615d2de3b45 (config.c: avoid segfault with --fixed-value and valueless config, 2024-08-01) > which has refactored `get_oid_oneline()` to plug a couple of memory > leaks. > > As part of that refactoring we have started to create a copy of the > passed-in list of commits and work on that list instead. The intent of > this was to stop modifying the caller-provided commit list such that the Nit: s/The intent of this was/The intent was/ > caller can properly free all commit list items, including those that we > might potentially drop. > > We already knew to create such a copy beforehand with the `backup` list, > which was used to clear the `ONELINE_SEEN` commit mark after we were > done. So the refactoring simply renamed that list to `copy` and started > to operate on that list instead. There is a gotcha though: the backup > list, and thus now also the copied list, is always being prepended to, > so the resulting list is in reverse order! The end result is that we > pop commits from the wrong end of the commit list, returning commits in > reverse recency order. Nice explanation. > > Fix the bug by appending to the list instead. > > Reported-by: Aarni Koskela <aarni@valohai.com> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > [snip] > diff --git a/t/t4208-log-magic-pathspec.sh > b/t/t4208-log-magic-pathspec.sh Yes, so here is that magic pathspec name. But this test file has a lot of tests that test positional argument ambiguity. Which seems very relevant to pathspecs in particular. And revision selection syntax seems to be used to test how things are interpreted. Not really how things are ultimately processed (that seems secondary). The tests involving `:/` in particular seem to only be about ambiguity testing. Is this the correct test file? > index > 2a46eb6bedbe283a08fd77917b7fb17da155b027..2d80b9044bcf9ec8e2f11b6afd2b0fe8e2fcabfd > 100755 > --- a/t/t4208-log-magic-pathspec.sh > +++ b/t/t4208-log-magic-pathspec.sh > @@ -58,6 +58,19 @@ test_expect_success '"git log :/any/path/" should > not segfault' ' > test_must_fail git log :/any/path/ > ' > > +test_expect_success ':/ favors more recent matching commits' ' This wasn’t mentioned in the report but `HEAD^{/}` is a similar syntax. That one is more controllable since you provide a ref yourself (`:/` returns the youngest commit from any ref). I have indeed noticed that `HEAD^{/}` returns a sensible thing while `:/` does something strange like finding the root commit. (Then I shrugged and half-assumed that I hadn’t read some fine print) gitrevisions(7) calls out the relation between these two. It could be nice for a regression test to assert that these two syntaxes return the same commit. I.e. when you have just made a commit, `:/<search>` and `HEAD^{/<search>}` return the same commit, and that commit is the youngest. > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + test_commit common-old && > + test_commit --no-tag common-new && > + git rev-parse HEAD >expect && > + git rev-parse :/common >actual && > + test_cmp expect actual > + ) > +' > + > # This differs from the ":/a" check above in that :/in looks like a pathspec, > # but doesn't match an actual file. > test_expect_success '"git log :/in" should not be ambiguous' ' > > --- > base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2 > change-id: 20241206-pks-rev-parse-fix-reversed-list-0f94a20a6165
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes: > On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote: >> It was reported > > It would be nice with a hyperlink since this email is not part of the > email thread for the report. > > https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29 > >> It was reported that magic pathspecs started to return results in > > I’m not used to this being called “magic pathspecs” as a user. > gitglossary(7) talks about (magic) pathspecs as filepaths. Thanks for catching the mistaken phrasing. It would have caused unnecessary confusion later to "git log" readers. The syntax to say that the following path is from the top-level of the working tree even when you are running the command from a subdirectory, e.g. cd Documentation git log :/t is described in gitglossary(7): A pathspec that begins with a colon `:` has special meaning. In the short form, the leading colon `:` is followed by zero or more "magic signature" letters (which optionally is terminated by another colon `:`), and the remainder is the pattern to match against the path. The "magic signature" consists of ASCII symbols that are neither alphanumeric, glob, regex special characters nor colon. The optional colon that terminates the "magic signature" can be omitted if the pattern begins with a character that does not belong to "magic signature" symbol set and is not a colon. and even though the word "magic" signature is used, there is no definition given for the entire construct, i.e. the pathspec that uses a special meaning introduced by the use of "colon followed by one or more magic signature letters". We probably should add a sentence to officially define it, if only to reduce the need for exchanges like this ;-) ... and is not a colon. Such a pathspec that uses these "magic" meaning is called "magic pathspec". But more importantly, the syntax that triggered this topic in <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com> is *NOT* a magic pathspec at all. It is a revision syntax to name the first commit that is reachable from the current HEAD with a commit log message, that matches the given patterh, i.e. git show ":/my message" which is a (rather lousy) short-hand for a more general git show "HEAD^{/my message}" i.e. <startingRev>^{/<pattern>}. There is no specific name for this syntax. I suspect that "filepath" you mentioned may be something some folks (but not me or any other long timers) would call yet another syntax, which is :<path>, that names the object sitting at <path> in the index. Because ":/myMessage" look so similar to ":myFile", yet they mean so different things, as I said, ":/myMessage" is a rather lousy short-hand for the more general "^{/<pattern>}" suffix that is less ambiguous. Patrick, let's not use a wrong word. This is not about pathspec at all, and is a revision syntax. As there is no specific jargon for the syntax, here is what I would write, if I were explaining the problem being solved: Recently it was reported [*1*] that "look for the youngest reachable commit with log message that match the given pattern" syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern}) started to show older commit. [Footnote] *1* <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com> Thanks.
On Fri, Dec 06, 2024 at 08:57:04PM +0900, Junio C Hamano wrote: > "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes: > > > On Fri, Dec 6, 2024, at 10:51, Patrick Steinhardt wrote: > >> It was reported > > > > It would be nice with a hyperlink since this email is not part of the > > email thread for the report. > > > > https://lore.kernel.org/git/Z1LJSADiStlFicTL@pks.im/T/#mae906ec74d5657e702165e29b90927f730280c29 > > > >> It was reported that magic pathspecs started to return results in > > > > I’m not used to this being called “magic pathspecs” as a user. > > gitglossary(7) talks about (magic) pathspecs as filepaths. > > Thanks for catching the mistaken phrasing. It would have caused > unnecessary confusion later to "git log" readers. > > The syntax to say that the following path is from the top-level of > the working tree even when you are running the command from a > subdirectory, e.g. > > cd Documentation > git log :/t > > is described in gitglossary(7): > > A pathspec that begins with a colon `:` has special meaning. In the > short form, the leading colon `:` is followed by zero or more "magic > signature" letters (which optionally is terminated by another colon `:`), > and the remainder is the pattern to match against the path. > The "magic signature" consists of ASCII symbols that are neither > alphanumeric, glob, regex special characters nor colon. > The optional colon that terminates the "magic signature" can be > omitted if the pattern begins with a character that does not belong to > "magic signature" symbol set and is not a colon. > > and even though the word "magic" signature is used, there is no > definition given for the entire construct, i.e. the pathspec that > uses a special meaning introduced by the use of "colon followed by > one or more magic signature letters". We probably should add a > sentence to officially define it, if only to reduce the need for > exchanges like this ;-) > > ... and is not a colon. Such a pathspec that uses these "magic" > meaning is called "magic pathspec". > > But more importantly, the syntax that triggered this topic in > > <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com> > > is *NOT* a magic pathspec at all. It is a revision syntax to name > the first commit that is reachable from the current HEAD with a > commit log message, that matches the given patterh, i.e. > > git show ":/my message" > > which is a (rather lousy) short-hand for a more general > > git show "HEAD^{/my message}" > > i.e. <startingRev>^{/<pattern>}. There is no specific name for > this syntax. Hm, yeah. I think I read pathspec somewhere and just went with it without thinking. > I suspect that "filepath" you mentioned may be something some folks > (but not me or any other long timers) would call yet another syntax, > which is :<path>, that names the object sitting at <path> in the > index. Because ":/myMessage" look so similar to ":myFile", yet > they mean so different things, as I said, ":/myMessage" is a rather > lousy short-hand for the more general "^{/<pattern>}" suffix that > is less ambiguous. > > Patrick, let's not use a wrong word. This is not about pathspec at > all, and is a revision syntax. As there is no specific jargon for > the syntax, here is what I would write, if I were explaining the > problem being solved: > > Recently it was reported [*1*] that "look for the youngest > reachable commit with log message that match the given pattern" > syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern}) > started to show older commit. > > [Footnote] > *1* <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com> Will update in v2. Thanks! Patrick
On Fri, Dec 6, 2024, at 12:57, Junio C Hamano wrote: > [snip] > Recently it was reported [*1*] that "look for the youngest > reachable commit with log message that match the given pattern" > syntax (e.g. :/myMessagePattern, or HEAD^{/myMessagePattern}) But these are slightly different since `:/` searches in all refs. They are only equivalent (i.e. with Patrick’s fix) when you have just made a commit (like in Patrick’s test). And the longer syntax doesn’t seem to be affected by any regressions. In this repo: $ # wrong $ git log --format=reference -1 :/Merge 3857284f7b8 (Merge refs/heads/master from ., 2005-08-24) $ # correct (looks correct) $ git log --format=reference -1 HEAD^{/Merge} 4a611ee7ebd (Merge branch 'kn/ref-transaction-hook-with-reflog', 2024-11-27) > started to show older commit.
On Fri, Dec 06, 2024 at 12:20:45PM +0100, Kristoffer Haugsbakk wrote: > > diff --git a/t/t4208-log-magic-pathspec.sh > > b/t/t4208-log-magic-pathspec.sh > > Yes, so here is that magic pathspec name. But this test file has a lot > of tests that test positional argument ambiguity. Which seems very > relevant to pathspecs in particular. And revision selection syntax > seems to be used to test how things are interpreted. Not really how > things are ultimately processed (that seems secondary). > > The tests involving `:/` in particular seem to only be about > ambiguity testing. > > Is this the correct test file? Probably not. I couldn't really find any cases where we explicitly verify this syntax, which explains why the regression was not found. I've added it to t1500 now, which is at least better than t4208. Thanks! Patrick
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:
> And the longer syntax doesn’t seem to be affected by any regressions.
That is curious. I wonder if we have two redundant code paths and
the regression hit only one of them?
Thanks.
diff --git a/object-name.c b/object-name.c index c892fbe80aa7173dfcc1995de5a75bc322c6adb7..34433d2a01d6a23ce6b4ca19b85c53b7b82fd0e5 100644 --- a/object-name.c +++ b/object-name.c @@ -1401,7 +1401,7 @@ static int get_oid_oneline(struct repository *r, const char *prefix, struct object_id *oid, const struct commit_list *list) { - struct commit_list *copy = NULL; + struct commit_list *copy = NULL, **copy_tail = © const struct commit_list *l; int found = 0; int negative = 0; @@ -1423,7 +1423,7 @@ static int get_oid_oneline(struct repository *r, for (l = list; l; l = l->next) { l->item->object.flags |= ONELINE_SEEN; - commit_list_insert(l->item, ©); + copy_tail = &commit_list_insert(l->item, copy_tail)->next; } while (copy) { const char *p, *buf; diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh index 2a46eb6bedbe283a08fd77917b7fb17da155b027..2d80b9044bcf9ec8e2f11b6afd2b0fe8e2fcabfd 100755 --- a/t/t4208-log-magic-pathspec.sh +++ b/t/t4208-log-magic-pathspec.sh @@ -58,6 +58,19 @@ test_expect_success '"git log :/any/path/" should not segfault' ' test_must_fail git log :/any/path/ ' +test_expect_success ':/ favors more recent matching commits' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit common-old && + test_commit --no-tag common-new && + git rev-parse HEAD >expect && + git rev-parse :/common >actual && + test_cmp expect actual + ) +' + # This differs from the ":/a" check above in that :/in looks like a pathspec, # but doesn't match an actual file. test_expect_success '"git log :/in" should not be ambiguous' '
It was reported that magic pathspecs started to return results in reverse recency order starting with Git v2.47.0. This behaviour bisects to 57fb139b5e (object-name: fix leaking commit list items, 2024-08-01), which has refactored `get_oid_oneline()` to plug a couple of memory leaks. As part of that refactoring we have started to create a copy of the passed-in list of commits and work on that list instead. The intent of this was to stop modifying the caller-provided commit list such that the caller can properly free all commit list items, including those that we might potentially drop. We already knew to create such a copy beforehand with the `backup` list, which was used to clear the `ONELINE_SEEN` commit mark after we were done. So the refactoring simply renamed that list to `copy` and started to operate on that list instead. There is a gotcha though: the backup list, and thus now also the copied list, is always being prepended to, so the resulting list is in reverse order! The end result is that we pop commits from the wrong end of the commit list, returning commits in reverse recency order. Fix the bug by appending to the list instead. Reported-by: Aarni Koskela <aarni@valohai.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- This patch applies on top of v2.47.0, which is the first version which had this regression. --- object-name.c | 4 ++-- t/t4208-log-magic-pathspec.sh | 13 +++++++++++++ 2 files changed, 15 insertions(+), 2 deletions(-) --- base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2 change-id: 20241206-pks-rev-parse-fix-reversed-list-0f94a20a6165