Message ID | 20240722065915.80760-4-ericsunshine@charter.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | improve one-shot variable detection with shell function | expand |
On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote: > From: Eric Sunshine <sunshine@sunshineco.com> > > Unlike "VAR=val cmd" one-shot environment variable assignments which > exist only for the invocation of 'cmd', those assigned by "VAR=val > shell-func" exist within the running shell and continue to do so until > the process exits. check-non-portable-shell.pl warns when it detects > such usage since, more often than not, the author who writes such an > invocation is unaware of the undesirable behavior. > > However, a limitation of the check is that it only detects such > invocations when variable assignment (i.e. `VAR=val`) is the first > thing on the line. Thus, it can easily be fooled by an invocation such > as: > > echo X | VAR=val shell-func > > Address this shortcoming by loosening the check so that the variable > assignment can be recognized even when not at the beginning of the line. > > Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> > --- > t/check-non-portable-shell.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl > index b2b28c2ced..44b23d6ddd 100755 > --- a/t/check-non-portable-shell.pl > +++ b/t/check-non-portable-shell.pl > @@ -49,7 +49,7 @@ sub err { > /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)'; > /\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and > err q(quote "$val" in 'local var=$val'); > - /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and > + /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and Losing "^\s*" means we'll cause false positives, such as: # VAR=VAL shell-func echo VAR=VAL shell-func Regardless of that, the regex will continue to pose problems with: VAR=$OTHER_VALUE shell-func VAR=$(cmd) shell-func VAR=VAL\ UE shell-func VAR="\"val\" shell-func UE" non-shell-func Which, of course, should be cases that should be written in a more orthodox way. But we will start to detect errors like the ones mentioned in the message, which are more likely to happen. I think this change is a good step forward, and I'm happy with it as it is. Thanks > err '"FOO=bar shell_func" assignment extends beyond "shell_func"'; > $line = ''; > # this resets our $. for each file > -- > 2.45.2 >
On Mon, Jul 22, 2024 at 12:01 AM Eric Sunshine <ericsunshine@charter.net> wrote: > > From: Eric Sunshine <sunshine@sunshineco.com> > > Unlike "VAR=val cmd" one-shot environment variable assignments which > exist only for the invocation of 'cmd', those assigned by "VAR=val > shell-func" exist within the running shell and continue to do so until > the process exits. check-non-portable-shell.pl warns when it detects > such usage since, more often than not, the author who writes such an > invocation is unaware of the undesirable behavior. > > However, a limitation of the check is that it only detects such > invocations when variable assignment (i.e. `VAR=val`) is the first > thing on the line. Thus, it can easily be fooled by an invocation such > as: > > echo X | VAR=val shell-func > > Address this shortcoming by loosening the check so that the variable > assignment can be recognized even when not at the beginning of the line. > > Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> > --- > t/check-non-portable-shell.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl > index b2b28c2ced..44b23d6ddd 100755 > --- a/t/check-non-portable-shell.pl > +++ b/t/check-non-portable-shell.pl > @@ -49,7 +49,7 @@ sub err { > /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)'; > /\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and > err q(quote "$val" in 'local var=$val'); > - /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and > + /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and > err '"FOO=bar shell_func" assignment extends beyond "shell_func"'; Is there an example of a shell on Linux that has this behavior that I can observe, and/or reproduction steps? `bash --posix` does not do this, as far as I can tell. I tried: ``` bash --posix echo_hi() { echo hi; } FOO=BAR echo_hi echo $FOO <no output> ``` and the simpler: ``` bash --posix FOO=BAR echo hi echo $FOO ``` Both attempts were done interactively, not via a script. I'm asking mostly because of the recent "platform support document" patch series - this is a very surprising behavior (which is presumably why it was added to this test), and it's possible this was just a bug in a shell that isn't really used anymore. Having documentation on how to reproduce the issue lets us know when we can remove these kinds of restrictions on our codebase that we took in the name of compatibility, possibly over a decade ago. > $line = ''; > # this resets our $. for each file > -- > 2.45.2 > >
Kyle Lippincott <spectral@google.com> writes: > Is there an example of a shell on Linux that has this behavior that I > can observe, and/or reproduction steps? Every once in a while this comes up and we fix, e.g. https://lore.kernel.org/git/528CE716.8060307@ramsay1.demon.co.uk/ https://lore.kernel.org/git/c6efda03848abc00cf8bf8d84fc34ef0d652b64c.1264151435.git.mhagger@alum.mit.edu/ https://lore.kernel.org/git/Koa4iojOlOQ_YENPwWXKt7G8Aa1x6UaBnFFtliKdZmpcrrqOBhY7NQ@cipher.nrlssc.navy.mil/ https://lore.kernel.org/git/20180713055205.32351-2-sunshine@sunshineco.com/ https://lore.kernel.org/git/574E27A4.6040804@ramsayjones.plus.com/ which is from a query https://lore.kernel.org/git/?q=one-shot+export+shell+function but unfortunately we do not document which exact shell the observed breakage happened with. The closest article I found that is suitable as a discussion reignitor talks about what POSIX requires, which may be more relevant: https://lore.kernel.org/git/4B5027B8.2090507@viscovery.net/
On Mon, Jul 22, 2024 at 11:10 AM Junio C Hamano <gitster@pobox.com> wrote: > > Kyle Lippincott <spectral@google.com> writes: > > > Is there an example of a shell on Linux that has this behavior that I > > can observe, and/or reproduction steps? > > Every once in a while this comes up and we fix, e.g. > > https://lore.kernel.org/git/528CE716.8060307@ramsay1.demon.co.uk/ > https://lore.kernel.org/git/c6efda03848abc00cf8bf8d84fc34ef0d652b64c.1264151435.git.mhagger@alum.mit.edu/ > https://lore.kernel.org/git/Koa4iojOlOQ_YENPwWXKt7G8Aa1x6UaBnFFtliKdZmpcrrqOBhY7NQ@cipher.nrlssc.navy.mil/ > https://lore.kernel.org/git/20180713055205.32351-2-sunshine@sunshineco.com/ Thanks, this one leads to https://lore.kernel.org/git/20180713055205.32351-1-sunshine@sunshineco.com/, which references https://public-inbox.org/git/xmqqefg8w73c.fsf@gitster-ct.c.googlers.com/T/, which claims that `dash` has this behavior 6 years ago. The version of `dash` I have on my machine right now doesn't seem to have this issue, but I can believe some older version does. > https://lore.kernel.org/git/574E27A4.6040804@ramsayjones.plus.com/ > > which is from a query > > https://lore.kernel.org/git/?q=one-shot+export+shell+function > > but unfortunately we do not document which exact shell the observed > breakage happened with. > > The closest article I found that is suitable as a discussion > reignitor talks about what POSIX requires, which may be more > relevant: > > https://lore.kernel.org/git/4B5027B8.2090507@viscovery.net/ This claims that `ksh` "gets it right", and I can confirm that ksh does behave this way on my Linux machine. Having just looked at the POSIX standard (I don't think I'm allowed to copy from this document), the POSIX standard (POSIX.1-2024, at least) explicitly leaves it unspecified whether the variable assignments remain in effect after function execution. Thanks for indulging my curiosity; should we include a statement in the linter along the lines of `# POSIX.1-2024 explicitly does not specify if variable assignment persists after executing a shell function; some shells, such as ksh, have these variables remain.`?
Kyle Lippincott <spectral@google.com> writes: > Having just looked at the POSIX standard (I don't think I'm allowed to > copy from this document), the POSIX standard (POSIX.1-2024, at least) > explicitly leaves it unspecified whether the variable assignments > remain in effect after function execution. True. https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_09_01_02 also says that it is unspecified if the variable gets exported, and older version of dash that comes on Ubuntu 20.04 chooses *not* to export, which was the test breakage that triggered this whole discussion. The thread can be seen here: https://lore.kernel.org/git/xmqqbk2p9lwi.fsf_-_@gitster.g/ > Thanks for indulging my curiosity; should we include a statement in > the linter along the lines of `# POSIX.1-2024 explicitly does not > specify if variable assignment persists after executing a shell > function; some shells, such as ksh, have these variables remain.`? Giving a review (either positive or negative is fine, as long as it is constructive) on the update to CodingGuidelines https://lore.kernel.org/git/xmqqbk2p9lwi.fsf_-_@gitster.g/ may be a good place to start. Thanks.
On Mon, Jul 22, 2024 at 10:46 AM Rubén Justo <rjusto@gmail.com> wrote: > On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote: > > - /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and > > + /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and > > Losing "^\s*" means we'll cause false positives, such as: > > # VAR=VAL shell-func > echo VAR=VAL shell-func True, though, considering that "shell-func" in these examples must match the name of a function actually defined in one of the input files, one would expect (or at least hope) that this sort of false-positive will be exceedingly rare. Indeed, there are no such false-positives in the existing test scripts. Of course, we can always tighten the regex later if it proves to be problematic. > Regardless of that, the regex will continue to pose problems with: > > VAR=$OTHER_VALUE shell-func > VAR=$(cmd) shell-func > VAR=VAL\ UE shell-func > VAR="\"val\" shell-func UE" non-shell-func > > Which, of course, should be cases that should be written in a more > orthodox way. Yes, it can be difficult to be thorough when "linting" a programming language merely via regular-expressions, and this particular expression is already almost unreadable. The effort involved in trying to make it perfect may very well outweigh the potential gain in coverage. > But we will start to detect errors like the ones mentioned in the > message, which are more likely to happen. Indeed.
On Fri, Jul 26, 2024 at 02:45:59AM -0400, Eric Sunshine wrote: > On Mon, Jul 22, 2024 at 10:46 AM Rubén Justo <rjusto@gmail.com> wrote: > > On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote: > > > - /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and > > > + /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and > > > > Losing "^\s*" means we'll cause false positives, such as: > > > > # VAR=VAL shell-func > > echo VAR=VAL shell-func > > True, though, considering that "shell-func" in these examples must > match the name of a function actually defined in one of the input > files, one would expect (or at least hope) that this sort of > false-positive will be exceedingly rare. Indeed, there are no such > false-positives in the existing test scripts. Of course, we can always > tighten the regex later if it proves to be problematic. > > > Regardless of that, the regex will continue to pose problems with: > > > > VAR=$OTHER_VALUE shell-func > > VAR=$(cmd) shell-func > > VAR=VAL\ UE shell-func > > VAR="\"val\" shell-func UE" non-shell-func > > > > Which, of course, should be cases that should be written in a more > > orthodox way. > > Yes, it can be difficult to be thorough when "linting" a programming > language merely via regular-expressions, and this particular > expression is already almost unreadable. The effort involved in trying > to make it perfect may very well outweigh the potential gain in > coverage. I tried to be exhaustive in the analysis of the change and explicit in the conclusions so that it is clear, and documented in the list, that we acknowledge the magnitude of the change. I agree. I don't think it's worth refining the regex any further. It might even be counterproductive. It covers the cases it was already covering and the new ones that have occurred. A simple 'Acked-by: Rubén Justo <rjusto@gmail.com>' didn't seem sufficient to me :), but perhaps it would have been clearer. I have the same positive opinion of your new iteration: 20240722065915.80760-5-ericsunshine@charter.net > > > But we will start to detect errors like the ones mentioned in the > > message, which are more likely to happen. > > Indeed.
diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl index b2b28c2ced..44b23d6ddd 100755 --- a/t/check-non-portable-shell.pl +++ b/t/check-non-portable-shell.pl @@ -49,7 +49,7 @@ sub err { /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)'; /\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and err q(quote "$val" in 'local var=$val'); - /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and + /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and err '"FOO=bar shell_func" assignment extends beyond "shell_func"'; $line = ''; # this resets our $. for each file