Message ID | 20230715025512.7574-1-jacobabel@nullpo.dev (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | t2400: Fix test failures when using grep 2.5 | expand |
Hi Jocab On 15/07/2023 03:55, Jacob Abel wrote: > Replace all cases of `\s` with `[[:space:]]` as older versions of GNU > grep (and from what it seems most versions of BSD grep) do not handle > `\s`. > > For the same reason all cases of `\S` are replaced with `[^[:space:]]`. > Replacing `\S` also needs to occur as `\S` is technically PCRE and not > part of ERE even though most modern versions of grep accept it as ERE. Thanks for working on this fix. Having looked at the changes I think it would be better just be using a space character in a lot of these expressions - see below. > Signed-off-by: Jacob Abel <jacobabel@nullpo.dev> > --- > This patch is in response to build failures on GGG's Cirrus CI > freebsd_12 build jobs[1] and was prompted by a discussion thread [2]. > > These failures seem to be caused by the behavior outlined in [3]. > Weirdly however they only seem to occur on the FreeBSD CI but not the > Mac OS CI for some reason despite Mac OS using FreeBSD grep. > > 1. https://github.com/gitgitgadget/git/pull/1550/checks?check_run_id=14949695859 > 2. https://lore.kernel.org/git/CALnO6CDryTsguLshcQxx97ZxyY42Twu2hC2y1bLOsS-9zbqXMA@mail.gmail.com/ > 3. https://stackoverflow.com/questions/4233159/grep-regex-whitespace-behavior > > t/t2400-worktree-add.sh | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh > index 0ac468e69e..7f19bdabff 100755 > --- a/t/t2400-worktree-add.sh > +++ b/t/t2400-worktree-add.sh > @@ -417,9 +417,9 @@ test_wt_add_orphan_hint () { > grep "hint: If you meant to create a worktree containing a new orphan branch" actual && > if [ $use_branch -eq 1 ] > then > - grep -E "^hint:\s+git worktree add --orphan -b \S+ \S+\s*$" actual > + grep -E "^hint:[[:space:]]+git worktree add --orphan -b [^[:space:]]+ [^[:space:]]+[[:space:]]*$" actual We know that "hint:" is followed by a single space and all we're really interested in is that we print something after the "-b " so we can simplify this to grep "^hint: git worktree add --orphan -b [^ ]" I think the same applies to most of the other expressions changed in this patch. > else > - grep -E "^hint:\s+git worktree add --orphan \S+\s*$" actual > + grep -E "^hint:[[:space:]]+git worktree add --orphan [^[:space:]]+[[:space:]]*$" actual > fi > > ' > @@ -709,7 +709,7 @@ test_dwim_orphan () { > local info_text="No possible source branch, inferring '--orphan'" && > local fetch_error_text="fatal: No local or remote refs exist despite at least one remote" && > local orphan_hint="hint: If you meant to create a worktree containing a new orphan branch" && > - local invalid_ref_regex="^fatal: invalid reference:\s\+.*" && > + local invalid_ref_regex="^fatal: invalid reference:[[:space:]]\+.*" && > local bad_combo_regex="^fatal: '[a-z-]\+' and '[a-z-]\+' cannot be used together" && > > local git_ns="repo" && > @@ -998,8 +998,8 @@ test_dwim_orphan () { > headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && I'm a bit confused by the --sq here - why does it need to be shell quoted when it is always used inside double quotes? Also when the reftable backend is used I'm not sure that HEAD is actually a file in $GIT_DIR anymore (that's less of an issue at the moment as that backend is not is use yet). > headcontents=$(cat "$headpath") && > grep "HEAD points to an invalid (or orphaned) reference" actual && > - grep "HEAD path:\s*.$headpath." actual && > - grep "HEAD contents:\s*.$headcontents." actual && > + grep "HEAD path:[[:space:]]*.$headpath." actual && > + grep "HEAD contents:[[:space:]]*.$headcontents." actual && Using grep like this makes it harder to debug test failures as one has to run the test with "-x" in order to try and figure out which grep actually failed. I think here we can replace the sequence of "grep"s with "test_cmp" cat >expect <<-EOF && HEAD points to an invalid (or orphaned) reference HEAD path: $headpath HEAD contents: $headcontents EOF test_cmp expect actual Best Wishes Phillip > grep "$orphan_hint" actual && > ! grep "$info_text" actual > fi && > > base-commit: 830b4a04c45bf0a6db26defe02ed1f490acd18ee
On 23/07/15 09:59AM, Phillip Wood wrote: > Hi Jocab > > On 15/07/2023 03:55, Jacob Abel wrote: > > [...] > > Thanks for working on this fix. Having looked at the changes I think it > would be better just be using a space character in a lot of these > expressions - see below. > > > [...] > > > > - grep -E "^hint:\s+git worktree add --orphan -b \S+ \S+\s*$" actual > > + grep -E "^hint:[[:space:]]+git worktree add --orphan -b [^[:space:]]+ [^[:space:]]+[[:space:]]*$" actual > > We know that "hint:" is followed by a single space and all we're really > interested in is that we print something after the "-b " so we can > simplify this to > > grep "^hint: git worktree add --orphan -b [^ ]" > > I think the same applies to most of the other expressions changed in > this patch. This wouldn't work as it's `hint: ` followed by a `\t` as the command is indented in the text block. So I just went with `[[:space:]]+` as I didn't want to have to worry about whether some platforms expand the tab to spaces or how many spaces. I'll make the rest of the suggested changes though. > > [...] > > @@ -998,8 +998,8 @@ test_dwim_orphan () { > > headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && > > I'm a bit confused by the --sq here - why does it need to be shell > quoted when it is always used inside double quotes? To be honest I can't remember if this specifically needs to be in quotes or not however I had a lot of trouble during the development of that patchset with things escaping quotes and causing breakages in the tests so if it isn't currently harmful I'd personally prefer to leave it as is. > Also when the reftable backend is used I'm not sure that HEAD is > actually a file in $GIT_DIR anymore (that's less of an issue at the > moment as that backend is not is use yet). If there is documentation (or discussions) on how to use this backend properly I'd appreciate a link and I can try workshopping a better solution then. The warning included in the original patchset reads from that HEAD file as well so it would also need to be adapted. The reason I did it this way is because I didn't see any easy way to get the raw contents of the HEAD when it was invalid. If there is a cleaner/safer/more portable way to view those contents when HEAD points to an invalid or unborn reference, I'd be willing to work on a followup patch down the line. > > [...] > > Using grep like this makes it harder to debug test failures as one has > to run the test with "-x" in order to try and figure out which grep > actually failed. I think here we can replace the sequence of "grep"s > with "test_cmp" > > cat >expect <<-EOF && > HEAD points to an invalid (or orphaned) reference > HEAD path: $headpath > HEAD contents: $headcontents > EOF > > test_cmp expect actual I'll make these changes. > [...]
On 23/07/15 07:15PM, Jacob Abel wrote: > On 23/07/15 09:59AM, Phillip Wood wrote: > > > > [...] > > [...] > > > > > Using grep like this makes it harder to debug test failures as one has > > to run the test with "-x" in order to try and figure out which grep > > actually failed. I think here we can replace the sequence of "grep"s > > with "test_cmp" > > > > cat >expect <<-EOF && > > HEAD points to an invalid (or orphaned) reference > > HEAD path: $headpath > > HEAD contents: $headcontents > > EOF > > > > test_cmp expect actual > > I'll make these changes. Actually now that I'm sitting down to make these changes, I'm a bit hesitant as to whether this would be a cleaner solution than what is currently there. I say this as the contents of `actual` are about 10-12 lines in length and some of those lines would vary between the different tests that use this test function. Those specific details don't need to be tested for as they are validated in prior tests and building a perfectly matching `expected` would further complicate that fairly large test function. > > > [...]
Jacob Abel <jacobabel@nullpo.dev> writes: >> > @@ -998,8 +998,8 @@ test_dwim_orphan () { >> > headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && >> >> I'm a bit confused by the --sq here - why does it need to be shell >> quoted when it is always used inside double quotes? > > To be honest I can't remember if this specifically needs to be in > quotes or not however I had a lot of trouble during the development of > that patchset with things escaping quotes and causing breakages in the > tests so if it isn't currently harmful I'd personally prefer to leave > it as is. Quoting is sometimes tricky enough that "this happens to work for me but I do not know why it works" is asking for trouble in somebody else's environment. If the form in the patch is correct, but tricky for others to understand, you'd need to pick it apart and document how it works (and if you cannot do so, ask for help by somebody who can, or simplify it enough so that you can explain it yourself). headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && In this case, "--sq" is a noop that only confuses readers, I think, and I would drop it if I were you. "--git-path HEAD" is given by this call chain: builtin/rev-parse.c:cmd_rev_parse() -> builtin/rev-parse.c:print_path() -> transform path depending on the path format -> puts() and nowhere in this chain "output_sq" (which is set by "--sq") is even checked. The transformations are all about relative, prefix, etc., and never about quoting. The original test script t2400 (before your patch) does look crappy with full of long lines and coding style violations (none of which is your fault), and it may need to be cleaned up once this patch settles. Thanks.
On 23/07/15 06:08PM, Junio C Hamano wrote: > Jacob Abel <jacobabel@nullpo.dev> writes: > > >> > @@ -998,8 +998,8 @@ test_dwim_orphan () { > >> > headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && > >> > >> I'm a bit confused by the --sq here - why does it need to be shell > >> quoted when it is always used inside double quotes? > > > > To be honest I can't remember if this specifically needs to be in > > quotes or not however I had a lot of trouble during the development of > > that patchset with things escaping quotes and causing breakages in the > > tests so if it isn't currently harmful I'd personally prefer to leave > > it as is. > > Quoting is sometimes tricky enough that "this happens to work for me > but I do not know why it works" is asking for trouble in somebody > else's environment. If the form in the patch is correct, but tricky > for others to understand, you'd need to pick it apart and document > how it works (and if you cannot do so, ask for help by somebody who > can, or simplify it enough so that you can explain it yourself). Yes Apologies. That was kind of a cop-out on my part as I was hesitant to add additional changes that could potentially introduce new issues to this patch as it is already addressing a fairly obscure issue. > > headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && > > In this case, "--sq" is a noop that only confuses readers, I think, > and I would drop it if I were you. "--git-path HEAD" is given by > this call chain: > > builtin/rev-parse.c:cmd_rev_parse() > -> builtin/rev-parse.c:print_path() > -> transform path depending on the path format > -> puts() > > and nowhere in this chain "output_sq" (which is set by "--sq") is > even checked. The transformations are all about relative, prefix, > etc., and never about quoting. Understood. I tried running it with `--sq` removed and it seems to work as you and Phillip expected so I'm adding that to v2. > > The original test script t2400 (before your patch) does look crappy > with full of long lines and coding style violations (none of which > is your fault), and it may need to be cleaned up once this patch > settles. > > Thanks. I may give that cleanup a shot some time down the line if nobody else takes a crack at it first.
Hi Jacob On 16/07/2023 00:15, Jacob Abel wrote: > On 23/07/15 09:59AM, Phillip Wood wrote: >> Hi Jocab >> >> On 15/07/2023 03:55, Jacob Abel wrote: >>> [...] >> >> Thanks for working on this fix. Having looked at the changes I think it >> would be better just be using a space character in a lot of these >> expressions - see below. One thing I forgot to mention was that I think it would be better to explain in the commit message that "\s" etc. are not part of POSIX EREs and that is why they do not work. >>> [...] >>> >>> - grep -E "^hint:\s+git worktree add --orphan -b \S+ \S+\s*$" actual >>> + grep -E "^hint:[[:space:]]+git worktree add --orphan -b [^[:space:]]+ [^[:space:]]+[[:space:]]*$" actual >> >> We know that "hint:" is followed by a single space and all we're really >> interested in is that we print something after the "-b " so we can >> simplify this to >> >> grep "^hint: git worktree add --orphan -b [^ ]" >> >> I think the same applies to most of the other expressions changed in >> this patch. > > This wouldn't work as it's `hint: ` followed by a `\t` as the command > is indented in the text block. Oh so we need to search for a space followed by a tab after "hint:" then. As an aside we often just use four spaces to indent commands in advice messages (see the output of git -C .. grep '" git' \*.c) > So I just went with `[[:space:]]+` as I > didn't want to have to worry about whether some platforms expand the > tab to spaces or how many spaces. Is that a thing? > I'll make the rest of the suggested > changes though. > >>> [...] >>> @@ -998,8 +998,8 @@ test_dwim_orphan () { >>> headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && >> >> I'm a bit confused by the --sq here - why does it need to be shell >> quoted when it is always used inside double quotes? > > To be honest I can't remember if this specifically needs to be in > quotes or not however I had a lot of trouble during the development of > that patchset with things escaping quotes and causing breakages in the > tests so if it isn't currently harmful I'd personally prefer to leave > it as is. > >> Also when the reftable backend is used I'm not sure that HEAD is >> actually a file in $GIT_DIR anymore (that's less of an issue at the >> moment as that backend is not is use yet). > > If there is documentation (or discussions) on how to use this backend > properly I'd appreciate a link and I can try workshopping a better > solution then. The warning included in the original patchset reads > from that HEAD file as well so it would also need to be adapted. I'm afraid I don't have anything specific, there were some patches a while ago such as dd8468ef00 (t5601: read HEAD using rev-parse, 2021-05-31) that stopped reading HEAD from the filesystem. > The reason I did it this way is because I didn't see any easy way to > get the raw contents of the HEAD when it was invalid. If there is a > cleaner/safer/more portable way to view those contents when HEAD > points to an invalid or unborn reference, I'd be willing to work on a > followup patch down the line. I think it might be better to just diagnose if HEAD is a dangling symbolic-ref or contains an invalid oid and leave it at that. See the documentation in refs.h for refs_resolve_ref_unsafe() for how to check if HEAD is a dangling symbolic ref - if rego_get_oid(repo, "HEAD") fails and it is not a dangling symbolic ref then it contains an invalid oid. Best Wishes Phillip >>> [...] >> >> Using grep like this makes it harder to debug test failures as one has >> to run the test with "-x" in order to try and figure out which grep >> actually failed. I think here we can replace the sequence of "grep"s >> with "test_cmp" >> >> cat >expect <<-EOF && >> HEAD points to an invalid (or orphaned) reference >> HEAD path: $headpath >> HEAD contents: $headcontents >> EOF >> >> test_cmp expect actual > > I'll make these changes. > >> [...] >
Phillip Wood <phillip.wood123@gmail.com> writes: > One thing I forgot to mention was that I think it would be better to > explain in the commit message that "\s" etc. are not part of POSIX > EREs and that is why they do not work. Yes, that is a very good point. We have been burned by regular expression implementations that use or do not use "enhanced" bit in the recent past, IIRC. > I think it might be better to just diagnose if HEAD is a dangling > symbolic-ref or contains an invalid oid and leave it at that. See the > documentation in refs.h for refs_resolve_ref_unsafe() for how to check > if HEAD is a dangling symbolic ref - if rego_get_oid(repo, "HEAD") > fails and it is not a dangling symbolic ref then it contains an > invalid oid. Sounds doable and sensible. If we can easily test it without peeking into the filesystem, that would be very good. Thanks.
On 23/07/16 04:34PM, Phillip Wood wrote: > Hi Jacob > > [...] > > One thing I forgot to mention was that I think it would be better to > explain in the commit message that "\s" etc. are not part of POSIX EREs > and that is why they do not work. Noted. Will do. > [...] > > Oh so we need to search for a space followed by a tab after "hint:" > then. Okay. I think `\t` is PCRE so I'll just update the string in `builtin/worktree.c` so we can just do `[ ]+` instead. > As an aside we often just use four spaces to indent commands in > advice messages (see the output of git -C .. grep '" git' \*.c) Apologies. When writing up that original patchset I based the formatting of the advice based on the ones in `builtin/add.c` which seems to also use `\t`. > > > So I just went with `[[:space:]]+` as I > > didn't want to have to worry about whether some platforms expand the > > tab to spaces or how many spaces. > > Is that a thing? It might be? I know copying text through tmux tends to expand tabs to spaces for me so I figured some other tools or those same tools on different platforms might do things like that as well. To be honest I have no idea and figured that I'd just CYA by making it work in the case that it did than trying to guarantee that it wouldn't happen. > > [...] > > > > If there is documentation (or discussions) on how to use this backend > > properly I'd appreciate a link and I can try workshopping a better > > solution then. The warning included in the original patchset reads > > from that HEAD file as well so it would also need to be adapted. > > I'm afraid I don't have anything specific, there were some patches a > while ago such as dd8468ef00 (t5601: read HEAD using rev-parse, > 2021-05-31) that stopped reading HEAD from the filesystem. Noted. > > [...] > > I think it might be better to just diagnose if HEAD is a dangling > symbolic-ref or contains an invalid oid and leave it at that. See the > documentation in refs.h for refs_resolve_ref_unsafe() for how to check > if HEAD is a dangling symbolic ref - if rego_get_oid(repo, "HEAD") fails > and it is not a dangling symbolic ref then it contains an invalid oid. Understood. I'll start working on a separate patch to update that warning once this patch settles then. > > [...]
On 18/07/2023 01:44, Jacob Abel wrote: > On 23/07/16 04:34PM, Phillip Wood wrote: >> Oh so we need to search for a space followed by a tab after "hint:" >> then. > > Okay. I think `\t` is PCRE so I'll just update the string in > `builtin/worktree.c` so we can just do `[ ]+` instead. > >> As an aside we often just use four spaces to indent commands in >> advice messages (see the output of git -C .. grep '" git' \*.c) > > Apologies. When writing up that original patchset I based the > formatting of the advice based on the ones in `builtin/add.c` which > seems to also use `\t`. The existing code is not consistent on this point but I think there are more instances of " " than "\t". Using " " makes the indentation consistent as the "hint: " prefix is translated so we don't know how far the next tab stop will be from the end of the prefix. >> >>> So I just went with `[[:space:]]+` as I >>> didn't want to have to worry about whether some platforms expand the >>> tab to spaces or how many spaces. >> >> Is that a thing? > > It might be? I know copying text through tmux tends to expand tabs to > spaces for me so I figured some other tools or those same tools on > different platforms might do things like that as well. To be honest I > have no idea and figured that I'd just CYA by making it work in the > case that it did than trying to guarantee that it wouldn't happen. In the test we are redirecting the output to a file so things like tmux do not come into play. I think it would be a bit odd for the system libc to convert tabs to spaces. >>> [...] >> >> I think it might be better to just diagnose if HEAD is a dangling >> symbolic-ref or contains an invalid oid and leave it at that. See the >> documentation in refs.h for refs_resolve_ref_unsafe() for how to check >> if HEAD is a dangling symbolic ref - if rego_get_oid(repo, "HEAD") fails >> and it is not a dangling symbolic ref then it contains an invalid oid. > > Understood. I'll start working on a separate patch to update that > warning once this patch settles then. That's great. I think just telling the user something like branch 'main' does not exist when HEAD contains the dangling symbolic ref "refs/heads/main" and HEAD is corrupt when it is not a symbolic ref and repo_get_oid() fails would be fine. Best Wishes Phillip
On 23/07/18 02:36PM, Phillip Wood wrote: > [...] > > The existing code is not consistent on this point but I think there are > more instances of " " than "\t". Using " " makes the indentation > consistent as the "hint: " prefix is translated so we don't know how far > the next tab stop will be from the end of the prefix. Agreed. > > [...] > > In the test we are redirecting the output to a file so things like tmux > do not come into play. I think it would be a bit odd for the system libc > to convert tabs to spaces. Understood. It was just a bit of paranoia on my part then. > [...] > > > Understood. I'll start working on a separate patch to update that > > warning once this patch settles then. > > That's great. I think just telling the user something like > > branch 'main' does not exist > > when HEAD contains the dangling symbolic ref "refs/heads/main" and > > HEAD is corrupt > > when it is not a symbolic ref and repo_get_oid() fails would be fine. Noted. I'll workshop it a bit before I put v1 of that patch out.
diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh index 0ac468e69e..7f19bdabff 100755 --- a/t/t2400-worktree-add.sh +++ b/t/t2400-worktree-add.sh @@ -417,9 +417,9 @@ test_wt_add_orphan_hint () { grep "hint: If you meant to create a worktree containing a new orphan branch" actual && if [ $use_branch -eq 1 ] then - grep -E "^hint:\s+git worktree add --orphan -b \S+ \S+\s*$" actual + grep -E "^hint:[[:space:]]+git worktree add --orphan -b [^[:space:]]+ [^[:space:]]+[[:space:]]*$" actual else - grep -E "^hint:\s+git worktree add --orphan \S+\s*$" actual + grep -E "^hint:[[:space:]]+git worktree add --orphan [^[:space:]]+[[:space:]]*$" actual fi ' @@ -709,7 +709,7 @@ test_dwim_orphan () { local info_text="No possible source branch, inferring '--orphan'" && local fetch_error_text="fatal: No local or remote refs exist despite at least one remote" && local orphan_hint="hint: If you meant to create a worktree containing a new orphan branch" && - local invalid_ref_regex="^fatal: invalid reference:\s\+.*" && + local invalid_ref_regex="^fatal: invalid reference:[[:space:]]\+.*" && local bad_combo_regex="^fatal: '[a-z-]\+' and '[a-z-]\+' cannot be used together" && local git_ns="repo" && @@ -998,8 +998,8 @@ test_dwim_orphan () { headpath=$(git $dashc_args rev-parse --sq --path-format=absolute --git-path HEAD) && headcontents=$(cat "$headpath") && grep "HEAD points to an invalid (or orphaned) reference" actual && - grep "HEAD path:\s*.$headpath." actual && - grep "HEAD contents:\s*.$headcontents." actual && + grep "HEAD path:[[:space:]]*.$headpath." actual && + grep "HEAD contents:[[:space:]]*.$headcontents." actual && grep "$orphan_hint" actual && ! grep "$info_text" actual fi &&
Replace all cases of `\s` with `[[:space:]]` as older versions of GNU grep (and from what it seems most versions of BSD grep) do not handle `\s`. For the same reason all cases of `\S` are replaced with `[^[:space:]]`. Replacing `\S` also needs to occur as `\S` is technically PCRE and not part of ERE even though most modern versions of grep accept it as ERE. Signed-off-by: Jacob Abel <jacobabel@nullpo.dev> --- This patch is in response to build failures on GGG's Cirrus CI freebsd_12 build jobs[1] and was prompted by a discussion thread [2]. These failures seem to be caused by the behavior outlined in [3]. Weirdly however they only seem to occur on the FreeBSD CI but not the Mac OS CI for some reason despite Mac OS using FreeBSD grep. 1. https://github.com/gitgitgadget/git/pull/1550/checks?check_run_id=14949695859 2. https://lore.kernel.org/git/CALnO6CDryTsguLshcQxx97ZxyY42Twu2hC2y1bLOsS-9zbqXMA@mail.gmail.com/ 3. https://stackoverflow.com/questions/4233159/grep-regex-whitespace-behavior t/t2400-worktree-add.sh | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) base-commit: 830b4a04c45bf0a6db26defe02ed1f490acd18ee