diff mbox series

[v2] parallel-checkout: avoid dash local bug in tests

Message ID 472c1411-fcf8-862b-cef9-52c2c994914b@web.de (mailing list archive)
State Superseded
Headers show
Series [v2] parallel-checkout: avoid dash local bug in tests | expand

Commit Message

René Scharfe June 5, 2021, 6:11 p.m. UTC
The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
reporting the following error:

   ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name

That's because wc's output contains leading spaces and this version of
dash erroneously expands the variable declaration as "local workers= 0",
i.e. it tries to set the "workers" variable to the empty string and also
declare a variable named "0", which not a valid name.  This is a known
dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).

Work around it by passing the command output directly to test instead of
storing it in a variable first.  While at it, let grep count the number
of lines instead of piping its output to wc, which is a bit shorter and
more efficient.

Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Changes since v1:
- Explain the root cause.
- Get rid of the local variable "workers".
- Adjust title accordingly.
- Still use grep -c, though.
- Remove input redirection.

 t/lib-parallel-checkout.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--
2.31.1

Comments

SZEDER Gábor June 5, 2021, 7:09 p.m. UTC | #1
On Sat, Jun 05, 2021 at 08:11:24PM +0200, René Scharfe wrote:
> The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
> reporting the following error:
> 
>    ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
> 
> That's because wc's output contains leading spaces and this version of
> dash erroneously expands the variable declaration as "local workers= 0",
> i.e. it tries to set the "workers" variable to the empty string and also
> declare a variable named "0", which not a valid name.  This is a known
> dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).

Perhaps a more accurate wording for this bug would be:

  ... and even fairly recent versions of dash erroneously perform
  field splitting on the expansion of the command substitution before
  assigning it to a local variable.

I think the relevant part of POSIX is section 2.9.1 Simple Commands:

  4. Each variable assignment shall be expanded for tilde expansion,
     parameter expansion, command substitution, arithmetic expansion,
     and quote removal prior to assigning the value.

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01

Note that it didn't mention field splitting; though POSIX doesn't
specifies local variables in the first place, so...

Anyway, this bug has been fixed in v0.5.11 (2020-06-01).
This is an old bug, it was already present in v0.5.5 (2009-01-13); I
didn't check earlier versions.

> Work around it by passing the command output directly to test instead of
> storing it in a variable first.  While at it, let grep count the number
> of lines instead of piping its output to wc, which is a bit shorter and
> more efficient.

A more debug-friendly alternative would be to save 'grep's output to a
temporary file and use 'test_line_count = $expected_workers'.

> Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Changes since v1:
> - Explain the root cause.
> - Get rid of the local variable "workers".
> - Adjust title accordingly.
> - Still use grep -c, though.
> - Remove input redirection.
> 
>  t/lib-parallel-checkout.sh | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
> index 21f5759732..66350d5207 100644
> --- a/t/lib-parallel-checkout.sh
> +++ b/t/lib-parallel-checkout.sh
> @@ -27,8 +27,7 @@ test_checkout_workers () {
>  	rm -f "$trace_file" &&
>  	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
> 
> -	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
> -	test $workers -eq $expected_workers &&
> +	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
>  	rm "$trace_file"
>  } 8>&2 2>&4
> 
> --
> 2.31.1
>
Ævar Arnfjörð Bjarmason June 5, 2021, 7:56 p.m. UTC | #2
On Sat, Jun 05 2021, René Scharfe wrote:

> The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
> reporting the following error:
>
>    ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
>
> That's because wc's output contains leading spaces and this version of
> dash erroneously expands the variable declaration as "local workers= 0",
> i.e. it tries to set the "workers" variable to the empty string and also
> declare a variable named "0", which not a valid name.  This is a known
> dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).
>
> Work around it by passing the command output directly to test instead of
> storing it in a variable first.  While at it, let grep count the number
> of lines instead of piping its output to wc, which is a bit shorter and
> more efficient.
>
> Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Changes since v1:
> - Explain the root cause.
> - Get rid of the local variable "workers".
> - Adjust title accordingly.
> - Still use grep -c, though.
> - Remove input redirection.
>
>  t/lib-parallel-checkout.sh | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
> index 21f5759732..66350d5207 100644
> --- a/t/lib-parallel-checkout.sh
> +++ b/t/lib-parallel-checkout.sh
> @@ -27,8 +27,7 @@ test_checkout_workers () {
>  	rm -f "$trace_file" &&
>  	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
>
> -	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
> -	test $workers -eq $expected_workers &&
> +	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
>  	rm "$trace_file"
>  } 8>&2 2>&4

I'd find this thing much clearer if the v2 just narrowly focused on
avoiding the "local", and thus demonstrated the non-portable shell
issue, and perhaps with something like:
	
	diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
	index fd3303552be..aad6f3e2bf1 100755
	--- a/t/check-non-portable-shell.pl
	+++ b/t/check-non-portable-shell.pl
	@@ -48,6 +48,7 @@ sub err {
	 	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
	 	/^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
	 		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
	+	/\blocal\b \S+=\$\(.*?\|\s*\bwc -l\)/ and err 'whitespace handling in local=$(... | wc -l) differs in some dash versions';
	 	$line = '';
	 	# this resets our $. for each file
	 	close ARGV if eof;


The let's do grep -c while we're at it part of this IMO just adds
confusion while skimming future portability issues with --grep=dash or
--grep=POSIX in the future, and looking at the history in v1 it's just
there because in v1 the root cause wasn't fully understood.

If we're doing a general cleanup of that pattern it would seem to be
better to search-replace this with the rest of them in another commit:

    $ git grep '\$\(grep.*\| wc -l' -- t | wc -l
    27
Matheus Tavares June 5, 2021, 10:17 p.m. UTC | #3
On Sat, Jun 5, 2021 at 5:03 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Sat, Jun 05 2021, René Scharfe wrote:
>
> > The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
> > reporting the following error:
> >
> >    ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
> >
> > That's because wc's output contains leading spaces and this version of
> > dash erroneously expands the variable declaration as "local workers= 0",
> > i.e. it tries to set the "workers" variable to the empty string and also
> > declare a variable named "0", which not a valid name.  This is a known
> > dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).
> >
> > Work around it by passing the command output directly to test instead of
> > storing it in a variable first.  While at it, let grep count the number
> > of lines instead of piping its output to wc, which is a bit shorter and
> > more efficient.
> >
> > Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
> > Signed-off-by: René Scharfe <l.s.r@web.de>
> > ---
> > Changes since v1:
> > - Explain the root cause.
> > - Get rid of the local variable "workers".
> > - Adjust title accordingly.
> > - Still use grep -c, though.
> > - Remove input redirection.
> >
> >  t/lib-parallel-checkout.sh | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
> > index 21f5759732..66350d5207 100644
> > --- a/t/lib-parallel-checkout.sh
> > +++ b/t/lib-parallel-checkout.sh
> > @@ -27,8 +27,7 @@ test_checkout_workers () {
> >       rm -f "$trace_file" &&
> >       GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
> >
> > -     local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
> > -     test $workers -eq $expected_workers &&
> > +     test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
> >       rm "$trace_file"
> >  } 8>&2 2>&4
>
> I'd find this thing much clearer if the v2 just narrowly focused on
> avoiding the "local", and thus demonstrated the non-portable shell
> issue,

I don't have any strong preference, but if we are leaving the "grep |
wc -l" -> "grep -c" conversion to a followup patch, perhaps the
simplest change focusing on the dash issue would be to quote the
right-hand side of the "local" assignment:

-     local workers=$(grep "child_start\[..*\] git checkout--worker"
"$trace_file" | wc -l) &&
+     local workers="$(grep "child_start\[..*\] git checkout--worker"
"$trace_file" | wc -l)" &&

(René, could you confirm if this works to make the test pass on dash?)

Alternatively, we could use `test_line_count` as SZEDER Gábor
suggested in a parallel reply.
Matheus Tavares June 5, 2021, 10:21 p.m. UTC | #4
On Sat, Jun 5, 2021 at 7:17 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> I don't have any strong preference, but if we are leaving the "grep |
> wc -l" -> "grep -c" conversion to a followup patch, perhaps the
> simplest change focusing on the dash issue would be to quote the
> right-hand side of the "local" assignment:
>
> -     local workers=$(grep "child_start\[..*\] git checkout--worker"
> "$trace_file" | wc -l) &&
> +     local workers="$(grep "child_start\[..*\] git checkout--worker"
> "$trace_file" | wc -l)" &&

Oops, sorry for the odd line breaking. I forgot that Gmail's web
client would break these lines.
René Scharfe June 6, 2021, 1:01 a.m. UTC | #5
Am 05.06.21 um 21:56 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Jun 05 2021, René Scharfe wrote:
>
>> The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
>> reporting the following error:
>>
>>    ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
>>
>> That's because wc's output contains leading spaces and this version of
>> dash erroneously expands the variable declaration as "local workers= 0",
>> i.e. it tries to set the "workers" variable to the empty string and also
>> declare a variable named "0", which not a valid name.  This is a known
>> dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).
>>
>> Work around it by passing the command output directly to test instead of
>> storing it in a variable first.  While at it, let grep count the number
>> of lines instead of piping its output to wc, which is a bit shorter and
>> more efficient.
>>
>> Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> Changes since v1:
>> - Explain the root cause.
>> - Get rid of the local variable "workers".
>> - Adjust title accordingly.
>> - Still use grep -c, though.
>> - Remove input redirection.
>>
>>  t/lib-parallel-checkout.sh | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
>> index 21f5759732..66350d5207 100644
>> --- a/t/lib-parallel-checkout.sh
>> +++ b/t/lib-parallel-checkout.sh
>> @@ -27,8 +27,7 @@ test_checkout_workers () {
>>  	rm -f "$trace_file" &&
>>  	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
>>
>> -	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
>> -	test $workers -eq $expected_workers &&
>> +	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
>>  	rm "$trace_file"
>>  } 8>&2 2>&4
>
> I'd find this thing much clearer if the v2 just narrowly focused on
> avoiding the "local", and thus demonstrated the non-portable shell
> issue,

I was not aiming for a minimal fix and I don't think the patch above is
too complex, but you're right that at this point in the release cycle a
duct-tape-style patch would be better.

> and perhaps with something like:
>
> 	diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> 	index fd3303552be..aad6f3e2bf1 100755
> 	--- a/t/check-non-portable-shell.pl
> 	+++ b/t/check-non-portable-shell.pl
> 	@@ -48,6 +48,7 @@ sub err {
> 	 	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
> 	 	/^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
> 	 		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
> 	+	/\blocal\b \S+=\$\(.*?\|\s*\bwc -l\)/ and err 'whitespace handling in local=$(... | wc -l) differs in some dash versions';

Any command can output whitespace, it's not limited to wc -l.  So a
better rule might be to always quote command substitutions in local
variable declarations (local foo="$(...)").  Or to disallow assignments
with local altogether, like we already do for export, but that might be
a bit much.

> 	 	$line = '';
> 	 	# this resets our $. for each file
> 	 	close ARGV if eof;
>
>
> The let's do grep -c while we're at it part of this IMO just adds
> confusion while skimming future portability issues with --grep=dash or
> --grep=POSIX in the future, and looking at the history in v1 it's just
> there because in v1 the root cause wasn't fully understood.

True, I was still gripping the "use grep -c instead of grep | wc -l"
hammer.  I better rewatch https://www.youtube.com/watch?v=Yv4tI6939q0

>
> If we're doing a general cleanup of that pattern it would seem to be
> better to search-replace this with the rest of them in another commit:
>
>     $ git grep '\$\(grep.*\| wc -l' -- t | wc -l
>     27
>
René Scharfe June 6, 2021, 1:01 a.m. UTC | #6
Am 05.06.21 um 21:09 schrieb SZEDER Gábor:
> On Sat, Jun 05, 2021 at 08:11:24PM +0200, René Scharfe wrote:
>> The parallel checkout tests fail when run with /bin/dash on MacOS 11.4,
>> reporting the following error:
>>
>>    ./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
>>
>> That's because wc's output contains leading spaces and this version of
>> dash erroneously expands the variable declaration as "local workers= 0",
>> i.e. it tries to set the "workers" variable to the empty string and also
>> declare a variable named "0", which not a valid name.  This is a known
>> dash bug (https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097).
>
> Perhaps a more accurate wording for this bug would be:
>
>   ... and even fairly recent versions of dash erroneously perform
>   field splitting on the expansion of the command substitution before
>   assigning it to a local variable.

OK.

>
> I think the relevant part of POSIX is section 2.9.1 Simple Commands:
>
>   4. Each variable assignment shall be expanded for tilde expansion,
>      parameter expansion, command substitution, arithmetic expansion,
>      and quote removal prior to assigning the value.
>
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01
>
> Note that it didn't mention field splitting; though POSIX doesn't
> specifies local variables in the first place, so...
>
> Anyway, this bug has been fixed in v0.5.11 (2020-06-01).
> This is an old bug, it was already present in v0.5.5 (2009-01-13); I
> didn't check earlier versions.
>
>> Work around it by passing the command output directly to test instead of
>> storing it in a variable first.  While at it, let grep count the number
>> of lines instead of piping its output to wc, which is a bit shorter and
>> more efficient.
>
> A more debug-friendly alternative would be to save 'grep's output to a
> temporary file and use 'test_line_count = $expected_workers'.

Yes, but that would cement the use of grep and wc -l and I still can't
let go of the idea that grep -c would be slightly quicker.  And it would
add file I/O.

Something like this would be more efficient in the expected case:

	if test $expected_count -ne $(grep -c -e "$pattern" "$file")
	then
		echo "Expected $expected_count lines matching $patter, but got:"
		grep -e "$pattern" "$file"
		return 1
	fi

I have no performance numbers to show, just the vague feeling that the
test suite takes way too long already.

Anyway, for now a minimal fix should do.  Debug features can be added to
this case and several similar ones later.

>
>> Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> Changes since v1:
>> - Explain the root cause.
>> - Get rid of the local variable "workers".
>> - Adjust title accordingly.
>> - Still use grep -c, though.
>> - Remove input redirection.
>>
>>  t/lib-parallel-checkout.sh | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
>> index 21f5759732..66350d5207 100644
>> --- a/t/lib-parallel-checkout.sh
>> +++ b/t/lib-parallel-checkout.sh
>> @@ -27,8 +27,7 @@ test_checkout_workers () {
>>  	rm -f "$trace_file" &&
>>  	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
>>
>> -	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
>> -	test $workers -eq $expected_workers &&
>> +	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
>>  	rm "$trace_file"
>>  } 8>&2 2>&4
>>
>> --
>> 2.31.1
>>
Junio C Hamano June 6, 2021, 1:28 a.m. UTC | #7
René Scharfe <l.s.r@web.de> writes:

> Am 05.06.21 um 21:56 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Jun 05 2021, René Scharfe wrote:
>>
>>> -	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
>>> -	test $workers -eq $expected_workers &&
>>> +	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
>>>  	rm "$trace_file"
>>>  } 8>&2 2>&4
>>
>> I'd find this thing much clearer if the v2 just narrowly focused on
>> avoiding the "local", and thus demonstrated the non-portable shell
>> issue,
>
> I was not aiming for a minimal fix and I don't think the patch above is
> too complex, but you're right that at this point in the release cycle a
> duct-tape-style patch would be better.

Sorry for being late to the party but I tend to disagree.  "grep -c"
is used elsewhere in the tests, we know it is safe.

IOW, the above change is quite straight-forward.  It is much more
obvious and close to "duct-tape-style" fix, than some altermatives
like enclosing the whole command substitution in a pair of
double-quotes and rely on $workers always getting used without
double-quotes around it.  To me, that looks more subtle and brittle.

>> and perhaps with something like:
>>
>> 	diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
>> 	index fd3303552be..aad6f3e2bf1 100755
>> 	--- a/t/check-non-portable-shell.pl
>> 	+++ b/t/check-non-portable-shell.pl
>> 	@@ -48,6 +48,7 @@ sub err {
>> 	 	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
>> 	 	/^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
>> 	 		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
>> 	+	/\blocal\b \S+=\$\(.*?\|\s*\bwc -l\)/ and err 'whitespace handling in local=$(... | wc -l) differs in some dash versions';
>
> Any command can output whitespace, it's not limited to wc -l.  So a
> better rule might be to always quote command substitutions in local
> variable declarations (local foo="$(...)").  Or to disallow assignments
> with local altogether, like we already do for export, but that might be
> a bit much.

I actually do not mind "no assignments with local decl" that much.
I agree that is outside the scope of this particular fix.
diff mbox series

Patch

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index 21f5759732..66350d5207 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -27,8 +27,7 @@  test_checkout_workers () {
 	rm -f "$trace_file" &&
 	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&

-	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
-	test $workers -eq $expected_workers &&
+	test $(grep -c "child_start\[..*\] git checkout--worker" "$trace_file") -eq $expected_workers &&
 	rm "$trace_file"
 } 8>&2 2>&4