Message ID | 5567990E.3090902@gigawatt.nl (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Herbert Xu |
Headers | show |
Harald van Dijk <harald@gigawatt.nl> wrote: > That isn't the problem, not exactly anyway. The problem is that getopts > is required to keep internal state separately from the OPTIND variable > (a single integer is insufficient to track the progress when multiple > options are combined in a single word), and that internal state is > stored along with the positional parameters. The positional parameters > are saved just before a function call, and restored when the function > returns. The internal state of getopts should not be saved the same way. > It should probably just be global to dash. I think the current behaviour is fine as far as POSIX is concerned. It says: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html : APPLICATION USAGE ... : Note that shell functions share OPTIND with the calling shell : even though the positional parameters are changed. If the calling : shell and any of its functions uses getopts to parse arguments, : the results are unspecified. Cheers,
On 29/05/2015 04:58, Herbert Xu wrote: > Harald van Dijk <harald@gigawatt.nl> wrote: >> That isn't the problem, not exactly anyway. The problem is that getopts >> is required to keep internal state separately from the OPTIND variable >> (a single integer is insufficient to track the progress when multiple >> options are combined in a single word), and that internal state is >> stored along with the positional parameters. The positional parameters >> are saved just before a function call, and restored when the function >> returns. The internal state of getopts should not be saved the same way. >> It should probably just be global to dash. > > I think the current behaviour is fine as far as POSIX is concerned. > It says: > > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html > > : APPLICATION USAGE > > ... > > : Note that shell functions share OPTIND with the calling shell > : even though the positional parameters are changed. If the calling > : shell and any of its functions uses getopts to parse arguments, > : the results are unspecified. The Application usage sections are informative and aren't worded as precisely as the other sections. If a script uses getopts at the global level, and it calls a shell function that too uses getopts, then it is very easy to be covered by > Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results. But the test script in this thread does invoke getopts with parameters that are the same in all invocations, and without modifying OPTIND. I don't see anything else in the normative sections that would make the result undefined or unspecified either. I do think the script is valid, and the results in dash should match those of other shells. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote: > > But the test script in this thread does invoke getopts with > parameters that are the same in all invocations, and without > modifying OPTIND. I don't see anything else in the normative > sections that would make the result undefined or unspecified either. > I do think the script is valid, and the results in dash should match > those of other shells. The bash behaviour makes it impossible to call shell functions that invoke getopts while in the middle of an getopts loop. IMHO the dash behaviour makes a lot more sense since a function always brings with it a new set of parameters. That plus the fact that this behaviour has been there since day one makes me reluctant to change it since the POSIX wording is not all that clear. Cheers,
On 01/06/2015 08:29, Herbert Xu wrote: > On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote: >> >> But the test script in this thread does invoke getopts with >> parameters that are the same in all invocations, and without >> modifying OPTIND. I don't see anything else in the normative >> sections that would make the result undefined or unspecified either. >> I do think the script is valid, and the results in dash should match >> those of other shells. > > The bash behaviour makes it impossible to call shell functions > that invoke getopts while in the middle of an getopts loop. > > IMHO the dash behaviour makes a lot more sense since a function > always brings with it a new set of parameters. That plus the > fact that this behaviour has been there since day one makes me > reluctant to change it since the POSIX wording is not all that > clear. True. Given that almost no shell supports that anyway, there can't be too many scripts that rely on it, but I did warn about the risk of breaking another type of existing scripts as well, I agree that's a real concern. One thing that doesn't really make sense, though: if the getopts internal state is local to a function, then OPTIND and OPTARG really should be too. Because they aren't, nested getopts loops already don't really work in a useful way in dash, because the inner loop overwrites the OPTIND and OPTARG variables. While OPTARG will typically be checked right at the start of the loop, before any inner loop executes, OPTIND is typically used at the end of the loop, in combination with the shift command. The current behaviour makes the OPTIND value in that case unreliable. So either way, I think something should change. But if you prefer to get clarification first about the intended meaning of the POSIX wording, that certainly seems reasonable to me. > Cheers, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 01, 2015 at 07:30:46PM +0200, Harald van Dijk wrote: > On 01/06/2015 08:29, Herbert Xu wrote: > > On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote: > >> But the test script in this thread does invoke getopts with > >> parameters that are the same in all invocations, and without > >> modifying OPTIND. I don't see anything else in the normative > >> sections that would make the result undefined or unspecified either. > >> I do think the script is valid, and the results in dash should match > >> those of other shells. > > The bash behaviour makes it impossible to call shell functions > > that invoke getopts while in the middle of an getopts loop. > > IMHO the dash behaviour makes a lot more sense since a function > > always brings with it a new set of parameters. That plus the > > fact that this behaviour has been there since day one makes me > > reluctant to change it since the POSIX wording is not all that > > clear. > True. Given that almost no shell supports that anyway, there can't be > too many scripts that rely on it, but I did warn about the risk of > breaking another type of existing scripts as well, I agree that's a real > concern. FreeBSD sh inherits similar code from ash and likewise has per-function getopts state. Various shell scripts in the FreeBSD base system use getopts in functions without setting OPTIND=1. > One thing that doesn't really make sense, though: if the getopts > internal state is local to a function, then OPTIND and OPTARG really > should be too. Because they aren't, nested getopts loops already don't > really work in a useful way in dash, because the inner loop overwrites > the OPTIND and OPTARG variables. While OPTARG will typically be checked > right at the start of the loop, before any inner loop executes, OPTIND > is typically used at the end of the loop, in combination with the shift > command. The current behaviour makes the OPTIND value in that case > unreliable. First, note that the OPTARG and OPTIND shell variables are not an input to getopts, except for an assignment OPTIND=1 (restoring an OPTIND local at function return does not reset getopts), and that getopts writes OPTIND no matter whether getopts's internal optind changed in this invocation. With that, the value of OPTIND generally used in scripts is not unreliable. OPTIND is generally only checked after getopts returned false (end of options), in the sequence while getopts ...; do ... done shift "$((OPTIND - 1))" > So either way, I think something should change. But if you prefer to get > clarification first about the intended meaning of the POSIX wording, > that certainly seems reasonable to me. I think the POSIX wording is clear enough, but it may not be desirable to change getopts to work that way.
On 02/06/2015 00:10, Jilles Tjoelker wrote: > On Mon, Jun 01, 2015 at 07:30:46PM +0200, Harald van Dijk wrote: >> On 01/06/2015 08:29, Herbert Xu wrote: >>> On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote: > >>>> But the test script in this thread does invoke getopts with >>>> parameters that are the same in all invocations, and without >>>> modifying OPTIND. I don't see anything else in the normative >>>> sections that would make the result undefined or unspecified either. >>>> I do think the script is valid, and the results in dash should match >>>> those of other shells. > >>> The bash behaviour makes it impossible to call shell functions >>> that invoke getopts while in the middle of an getopts loop. > >>> IMHO the dash behaviour makes a lot more sense since a function >>> always brings with it a new set of parameters. That plus the >>> fact that this behaviour has been there since day one makes me >>> reluctant to change it since the POSIX wording is not all that >>> clear. > >> True. Given that almost no shell supports that anyway, there can't be >> too many scripts that rely on it, but I did warn about the risk of >> breaking another type of existing scripts as well, I agree that's a real >> concern. > > FreeBSD sh inherits similar code from ash and likewise has per-function > getopts state. Various shell scripts in the FreeBSD base system use > getopts in functions without setting OPTIND=1. Yikes. That's an unfortunate effect of writing scripts that only get run on a single shell: things like that don't even show up as a possible problem. It's similar to how many bashisms sneak into supposedly portable shell scripts. >> One thing that doesn't really make sense, though: if the getopts >> internal state is local to a function, then OPTIND and OPTARG really >> should be too. Because they aren't, nested getopts loops already don't >> really work in a useful way in dash, because the inner loop overwrites >> the OPTIND and OPTARG variables. While OPTARG will typically be checked >> right at the start of the loop, before any inner loop executes, OPTIND >> is typically used at the end of the loop, in combination with the shift >> command. The current behaviour makes the OPTIND value in that case >> unreliable. > > First, note that the OPTARG and OPTIND shell variables are not an input > to getopts, except for an assignment OPTIND=1 (restoring an OPTIND local > at function return does not reset getopts), and that getopts writes > OPTIND no matter whether getopts's internal optind changed in this > invocation. > > With that, the value of OPTIND generally used in scripts is not > unreliable. OPTIND is generally only checked after getopts returned > false (end of options), in the sequence > while getopts ...; do > ... > done > shift "$((OPTIND - 1))" Ah, you're right, I missed that there will usually be another execution of getopts before OPTIND is used. Thanks for clearing that up. In that case, I agree, the situations in which the values of OPTIND and OPTARG are unreliable are only situations in which scripts usually don't bother checking their values. >> So either way, I think something should change. But if you prefer to get >> clarification first about the intended meaning of the POSIX wording, >> that certainly seems reasonable to me. > > I think the POSIX wording is clear enough, but it may not be desirable > to change getopts to work that way. It was Herbert Xu who felt the POSIX wording was unclear, and he is the dash maintainer, so his opinion on whether the wording is clear is the one that matters. If it is clear or clarified what POSIX requires, and that POSIX allows the current implementation, then I see no need either to change the dash behaviour. It could still be useful to make OPTIND and OPTARG local, but you've convinced at least me that it's only a minor problem. If it is clear or clarified what POSIX requires, and that POSIX disallows the current implementation, and the current implementation is deemed too desirable to drop, then it might make sense to support both alternatives, with an option at configure time to switch between them. As far as I know, dash does still aim to conform to POSIX, so even if a conscious decision is made to deviate from POSIX by default, I think an option to conform to it would be nice for those who care about it. I would be happy to create a patch, if this approach would be more agreeable. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Harald van Dijk schreef op 29-05-15 om 00:39: > A quick patch to make sure it is global, and isn't reset when it > shouldn't or doesn't need to be, is attached. You can experiment with > it, if you like. I've been using dash with this patch since you posted it, and it works like a charm (including my function that extends getopts' functionality). No issues encountered. Thanks. Further discussion in this thread shows that the patch may conflict with existing usage of 'getopts' for parsing the options within a function (a usage that would make the script quite shell-specific, by the way, because it would rely on Almquist-specific behaviour). The issue, as I understand it, is that 'getopts' keeps not just the OPTIND variable but also an additional invisible internal variable to maintain its state. This is necessary to keep track of combined short options.[*] There appear to be two possible use cases for calling 'getopts' within a function: 1. The option parsing loop is in the function, parsing the function's options. This requires a function-local internal state of 'getopts', otherwise calling a function using getopts from a main getopts loop couldn't possibly work, because there is no way to directly save or restore the unnamed internal state variable of getopts. 2. The option parsing loop is in the main shell environment, but instead of calling getopts directly, the option parsing loop calls a function, passing on the main positional parameters, and that function then calls 'getopts' and does additional things (in my case, re-parse GNU-style --long options in terms of a special short option '--' with argument; but of course it could be anything). This requires a global internal 'getopts' state. Use case 1 requires a global internal 'getopts' state and use case 2 requires a local one, so they are mutually incompatible. But I'm thinking that perhaps there is a way for the shell to distinguish between these two use cases so that they can be reconciled. The standard says that OPTIND is a global variable in any case, so use case 1 above could only work if, before starting the function's option parsing loop, OPTIND is either explicitly declared a function-local variable using the non-standard 'local' keyword or is reinitialized using an assignment. On the other hand, use case 2 could only work if OPTIND is completely left alone by the function, allowing a 'getopts' with a global state to do its thing without interference. So I would suggest the following might reconcile both use cases: By default, make the 'getopts' internal state global. However, whenever OPTIND is either assigned a value within a function or declared local within a function, automatically make the 'getopts' internal state local to the function. Comments? - M. [*] Just as a datapoint, I found that yash has a different strategy for this: it stores both values in OPTIND, separated by a semicolon -- e.g. an $OPTIND of 3:2 means getopts is at the second option in the third argument. -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/eval.c b/src/eval.c index 071fb1b..59e7506 100644 --- a/src/eval.c +++ b/src/eval.c @@ -953,8 +953,6 @@ evalfun(struct funcnode *func, int argc, char **argv, int flags) INTON; shellparam.nparam = argc - 1; shellparam.p = argv + 1; - shellparam.optind = 1; - shellparam.optoff = -1; pushlocalvars(); evaltree(func->n.ndefun.body, flags & EV_TESTED); poplocalvars(0); diff --git a/src/options.c b/src/options.c index 6f381e6..5b24eeb 100644 --- a/src/options.c +++ b/src/options.c @@ -163,8 +163,8 @@ setarg0: } shellparam.p = xargv; - shellparam.optind = 1; - shellparam.optoff = -1; + shoptind = 1; + shoptoff = -1; /* assert(shellparam.malloc == 0 && shellparam.nparam == 0); */ while (*xargv) { shellparam.nparam++; @@ -316,8 +316,6 @@ setparam(char **argv) shellparam.malloc = 1; shellparam.nparam = nparam; shellparam.p = newparam; - shellparam.optind = 1; - shellparam.optoff = -1; } @@ -362,8 +360,6 @@ shiftcmd(int argc, char **argv) } ap2 = shellparam.p; while ((*ap2++ = *ap1++) != NULL); - shellparam.optind = 1; - shellparam.optoff = -1; INTON; return 0; } @@ -394,8 +390,8 @@ void getoptsreset(value) const char *value; { - shellparam.optind = number(value) ?: 1; - shellparam.optoff = -1; + shoptind = number(value) ?: 1; + shoptoff = -1; } /* @@ -412,20 +408,10 @@ getoptscmd(int argc, char **argv) if (argc < 3) sh_error("Usage: getopts optstring var [arg]"); - else if (argc == 3) { + else if (argc == 3) optbase = shellparam.p; - if ((unsigned)shellparam.optind > shellparam.nparam + 1) { - shellparam.optind = 1; - shellparam.optoff = -1; - } - } - else { + else optbase = &argv[3]; - if ((unsigned)shellparam.optind > argc - 2) { - shellparam.optind = 1; - shellparam.optoff = -1; - } - } return getopts(argv[1], argv[2], optbase); } @@ -438,10 +424,10 @@ getopts(char *optstr, char *optvar, char **optfirst) int done = 0; char s[2]; char **optnext; - int ind = shellparam.optind; - int off = shellparam.optoff; + int ind = shoptind; + int off = shoptoff; - shellparam.optind = -1; + shoptind = -1; optnext = optfirst + ind - 1; if (ind <= 1 || off < 0 || strlen(optnext[-1]) < off) @@ -509,8 +495,8 @@ out: s[1] = '\0'; setvar(optvar, s, 0); - shellparam.optoff = p ? p - *(optnext - 1) : -1; - shellparam.optind = ind; + shoptoff = p ? p - *(optnext - 1) : -1; + shoptind = ind; return done; } diff --git a/src/options.h b/src/options.h index 975fe33..8295eb9 100644 --- a/src/options.h +++ b/src/options.h @@ -38,9 +38,9 @@ struct shparam { int nparam; /* # of positional parameters (without $0) */ unsigned char malloc; /* if parameter list dynamically allocated */ char **p; /* parameter list */ - int optind; /* next parameter to be processed by getopts */ - int optoff; /* used by getopts */ }; +int shoptind; /* next parameter to be processed by getopts */ +int shoptoff; /* used by getopts */