Message ID | 20180327085530.GA12680@gondor.apana.org.au (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
On 27/03/2018 10:55, Herbert Xu wrote: > Here is a better example: > > a="/*/\nullx" b="/*/\null"; printf "%s\n" $a $b > > dash currently prints > > /*/\nullx > /*/\null > > bash prints > > /*/\nullx > /dev/null > > You may argue the bash behaviour is inconsistent but it actually > makes sense. What happens is that quote removal only applies to > the original token as seen by the shell. It is never applied to > the result of parameter expansion. Huh? Here's what POSIX says: > The order of word expansion shall be as follows: > 1. ... parameter expansion (see Parameter Expansion) ... shall be performed, ... > ... > 3. Pathname expansion (see Pathname Expansion) shall be performed, unless set -f is in effect. > 4. Quote removal (see Quote Removal) shall always be performed last. If you say that quote removal takes place on the original token (meaning before parameter expansion), and if parameter expansion takes place before pathname expansion, then there's nothing left to allow \* to behave differently from *. > Now you may ask why on earth does the second line say "/dev/null" > instead of "/dev/\null". Well that's because it is not the quote > removal step that removed the backslash, but the pathname expansion. > > The fact that the /de\v does not become /dev even though it exists > is just the result of the optimisation to avoid unnecessarily > calling stat(2). I have checked POSIX and I don't see anything > that forbids this behaviour. POSIX never actually says this optimisation is allowed. The only thing that allows it is the fact that it produces the same results anyway. If that stops, then checking the file system for matches becomes required. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 11:16:29AM +0200, Harald van Dijk wrote: > > If you say that quote removal takes place on the original token (meaning > before parameter expansion), and if parameter expansion takes place before > pathname expansion, then there's nothing left to allow \* to behave > differently from *. Either you misunderstood me or you misread POSIX. Quote removal never applies to the backslashes which occur as a result of parameter expansion: 2.6.7 Quote Removal The quote characters ( <backslash>, single-quote, and double-quote) that were present in the original word shall be removed unless they have themselves been quoted. It's clear that only quote characters in the *original* word will be removed. > POSIX never actually says this optimisation is allowed. The only thing that > allows it is the fact that it produces the same results anyway. If that > stops, then checking the file system for matches becomes required. It doesn't disallow it either. Can you show me a shell that doesn't apply this optimisation? Cheers,
On 27/03/2018 11:44, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 11:16:29AM +0200, Harald van Dijk wrote: >> >> If you say that quote removal takes place on the original token (meaning >> before parameter expansion), and if parameter expansion takes place before >> pathname expansion, then there's nothing left to allow \* to behave >> differently from *. > > Either you misunderstood me or you misread POSIX. I misunderstood you. Given v='\' and the word \\$v, I took the result of parameter expansion as \\\ where the first two backslashes come from the original word, you take the result of parameter expansion as only \. Because of that, when you wrote quote removal is never applied to the results of parameter expansion, I didn't see what you meant. I think you're right in your terminology, the result of parameter expansion is just \, to get \\\ it would need to be phrased as something like "the result of applying parameter expansion". Thanks for the clarification. >> POSIX never actually says this optimisation is allowed. The only thing that >> allows it is the fact that it produces the same results anyway. If that >> stops, then checking the file system for matches becomes required. > > It doesn't disallow it either. If POSIX specifies a result, and a shell applies an optimisation that causes a different result to be produced, doesn't that inherently disallow that optimisation? There may be some confusion and/or disagreement over what exactly POSIX specifies and/or intends to specify, but I don't see how you can argue that POSIX specifies result A, but it's okay to apply an optimisation that produces result B. > Can you show me a shell that doesn't apply this optimisation? Can you show me any shell other than bash that lets this optimisation affect the results? Like I wrote in my initial message, all other shells I tested take a backslash that comes from a variable after parameter expansion as effectively quoted. Given your b="/*/\null", bash is the only shell I know that expands $b to /dev/null. All others do perform pathname expansion, but check to see if a file \null exists in any top-level directory. Given a="/de[v]/\null" and b="/dev/\null", all shells other than bash, even dash, agree that $a and $b expand to '/dev/\null' if that file exists, or the original string if that file doesn't exist. *That* is what allows the optimisation in the case of $b: expanding to either '/dev/\null' or '/dev/\null', depending on some condition, can be done without evaluating that condition. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote: > > If POSIX specifies a result, and a shell applies an optimisation that causes > a different result to be produced, doesn't that inherently disallow that > optimisation? There may be some confusion and/or disagreement over what > exactly POSIX specifies and/or intends to specify, but I don't see how you > can argue that POSIX specifies result A, but it's okay to apply an > optimisation that produces result B. Wait, you're talking about something completely different. The crux of the issue is whether POSIX allows the backslash to escape the character after it or does it have to be a literal backslash. In fact POSIX is perfectly clear on this: 2.13.1 Patterns Matching a Single Character The following patterns matching a single character shall match a single character: ordinary characters, special pattern characters, and pattern bracket expressions. The pattern bracket expression also shall match a single collating element. A <backslash> character shall escape the following character. The escaping <backslash> shall be discarded. If a pattern ends with an unescaped <backslash>, it is unspecified whether the pattern does not match anything or the pattern is treated as invalid. Nowhere does it say that backslashes that come from the result of parameter expansion is always literal. So it's clear that any shell that treats the backslash as a literal backslash is already broken. > Can you show me any shell other than bash that lets this optimisation affect > the results? The fact is that the other shells are not doing what they do because they're not doing this optimisation. They do what they do because they have broken POSIX because they always treat backslashes that arise from parameter expansion as literal backslashes. FWIW it wouldn't be hard to iron out the anomalous case of /d\ev. You just have treat the backslash as a metacharacter. Cheers,
On 27/03/2018 14:19, Herbert Xu wrote: > Nowhere does it say that backslashes that come from the result of > parameter expansion is always literal. > > So it's clear that any shell that treats the backslash as a literal > backslash is already broken. By the literal POSIX wording, yes, but the fact remains that that's what all shells aside from bash were already doing, so that may mean it's a POSIX defect. That's why I qualified with "POSIX specifies and/or intends to specify". But I'm tempted to agree with your and Martijn's point of view now that it's better to just implement it the way POSIX specifies, that it's more useful that way. > On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote: >> Can you show me any shell other than bash that lets this optimisation affect >> the results? > > The fact is that the other shells are not doing what they do because > they're not doing this optimisation. They do what they do because > they have broken POSIX because they always treat backslashes that > arise from parameter expansion as literal backslashes. Right, but they're doing this optimisation as well. > FWIW it wouldn't be hard to iron out the anomalous case of /d\ev. > You just have treat the backslash as a metacharacter. That's a good point and wouldn't have too much of an impact on performance of existing scripts. BTW, that means both expandmeta()'s metachars variable, and expmeta()'s if (metaflag == 0 || lstat64(expdir, &statb) >= 0) optimisation. You already got rid of the latter in your patch to prevent buffer overflows. In regular /d\ev that doesn't come from a variable, the backslash will have been replaced by CTLESC, but it's not necessary to treat CTLESC as a metacharacter: in that case, either there's a match and pathname expansion produces /dev, or there's no match and quote removal produces /dev, so the optimisation is still valid. It'd only affect backslashes that come from a variable, so it's very limited in scope. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 02:57:12PM +0200, Harald van Dijk wrote: > > That's a good point and wouldn't have too much of an impact on performance > of existing scripts. BTW, that means both expandmeta()'s metachars variable, > and expmeta()'s > > if (metaflag == 0 || lstat64(expdir, &statb) >= 0) > > optimisation. You already got rid of the latter in your patch to prevent > buffer overflows. No the purpose of this metaflag check is to bypass the lstat call. My patch simply made it bypass the call by returning earlier. It is not relevant to whether we include backslash as a metacharacter. > In regular /d\ev that doesn't come from a variable, the backslash will have > been replaced by CTLESC, but it's not necessary to treat CTLESC as a > metacharacter: in that case, either there's a match and pathname expansion > produces /dev, or there's no match and quote removal produces /dev, so the > optimisation is still valid. It'd only affect backslashes that come from a > variable, so it's very limited in scope. For the case where the backslash came from the parser, the CTLESC would have already been removed by preglob so there should be no difference. The only problem with the metacharacter approach is that glibc's glob(3) probably won't treat it as a metacharacter and therefore it will still behave in the same way as the current bash. Cheers,
On 27/03/2018 17:14, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 02:57:12PM +0200, Harald van Dijk wrote: >> >> That's a good point and wouldn't have too much of an impact on performance >> of existing scripts. BTW, that means both expandmeta()'s metachars variable, >> and expmeta()'s >> >> if (metaflag == 0 || lstat64(expdir, &statb) >= 0) >> >> optimisation. You already got rid of the latter in your patch to prevent >> buffer overflows. > > No the purpose of this metaflag check is to bypass the lstat call. > My patch simply made it bypass the call by returning earlier. Oh, right. > It is not relevant to whether we include backslash as a metacharacter. I was thinking about not making backslashes set metaflag in expmeta(): when the pathname component doesn't include *, ?, or [, but does include backslashes, then the if (metaflag == 0) block could handle that as long as it performs the lstat64() check unconditionally. There's no need to go through the opendir()/readdir()/closedir() path for that case. Since expmeta() is bypassed for words not containing any potentially-magic characters, the impact might be small enough. Regardless of whether metaflag is set, it would mean things like 'set "["' would start hitting the FS unnecessarily, if I understand correctly: the preglob()-expanded pattern is '\[', and expmeta() can no longer tell this apart from $v where v='\[' where a FS check would be required. Perhaps preglob() should just be avoided, and expmeta() taught to respect both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit, CTLESC wouldn't require it. This would also avoid some memory allocations. >> In regular /d\ev that doesn't come from a variable, the backslash will have >> been replaced by CTLESC, but it's not necessary to treat CTLESC as a >> metacharacter: in that case, either there's a match and pathname expansion >> produces /dev, or there's no match and quote removal produces /dev, so the >> optimisation is still valid. It'd only affect backslashes that come from a >> variable, so it's very limited in scope. > > For the case where the backslash came from the parser, the CTLESC > would have already been removed by preglob so there should be no > difference. if (!strpbrk(str->text, metachars)) goto nometa; in expandmeta() is done before preglob() is called. Here, it is still not necessary to include CTLESC. > The only problem with the metacharacter approach is that glibc's > glob(3) probably won't treat it as a metacharacter and therefore > it will still behave in the same way as the current bash. This doesn't matter much yet, but yeah, it will if you decide to enable --enable-glob by default in the future. As long as you don't include GLOB_NOESCAPE, then normally, it should be taken as escaping the next character, but it might still trigger GLOB_NOMAGIC to return early. If it does, I'd take that as a glibc bug, but that doesn't help dash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 05:54:53PM +0200, Harald van Dijk wrote: > > I was thinking about not making backslashes set metaflag in expmeta(): when > the pathname component doesn't include *, ?, or [, but does include > backslashes, then the if (metaflag == 0) block could handle that as long as > it performs the lstat64() check unconditionally. There's no need to go > through the opendir()/readdir()/closedir() path for that case. Since > expmeta() is bypassed for words not containing any potentially-magic > characters, the impact might be small enough. Honestly such backslashes should be rare enough that I wouldn't bother with such an optimisation. > Regardless of whether metaflag is set, it would mean things like 'set "["' > would start hitting the FS unnecessarily, if I understand correctly: the > preglob()-expanded pattern is '\[', and expmeta() can no longer tell this > apart from $v where v='\[' where a FS check would be required. No it shouldn't. We only mark [ is meta if there is a matching ]. > Perhaps preglob() should just be avoided, and expmeta() taught to respect > both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit, > CTLESC wouldn't require it. This would also avoid some memory allocations. We need preglob for glob(3). I want to minimise the amount of code difference between the glob path and the expmeta path. > if (!strpbrk(str->text, metachars)) > goto nometa; > > in expandmeta() is done before preglob() is called. Here, it is still not > necessary to include CTLESC. We could move it after preglob. > This doesn't matter much yet, but yeah, it will if you decide to enable > --enable-glob by default in the future. As long as you don't include > GLOB_NOESCAPE, then normally, it should be taken as escaping the next > character, but it might still trigger GLOB_NOMAGIC to return early. If it > does, I'd take that as a glibc bug, but that doesn't help dash. Well at least glibc glob(3) actually works now after 20 years of trying :) Cheers,
On 27/03/2018 18:04, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 05:54:53PM +0200, Harald van Dijk wrote: >> >> I was thinking about not making backslashes set metaflag in expmeta(): when >> the pathname component doesn't include *, ?, or [, but does include >> backslashes, then the if (metaflag == 0) block could handle that as long as >> it performs the lstat64() check unconditionally. There's no need to go >> through the opendir()/readdir()/closedir() path for that case. Since >> expmeta() is bypassed for words not containing any potentially-magic >> characters, the impact might be small enough. > > Honestly such backslashes should be rare enough that I wouldn't > bother with such an optimisation. Backslashes coming from parameters, sure, but backslashes introduced by preglob(), I'm not so sure. >> Regardless of whether metaflag is set, it would mean things like 'set "["' >> would start hitting the FS unnecessarily, if I understand correctly: the >> preglob()-expanded pattern is '\[', and expmeta() can no longer tell this >> apart from $v where v='\[' where a FS check would be required. > > No it shouldn't. We only mark [ is meta if there is a matching ]. The [ character would mark the whole string possibly-meta in expandmeta(), and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and expmeta() wouldn't take [ as a meta character, but would take \ as a meta character. >> Perhaps preglob() should just be avoided, and expmeta() taught to respect >> both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit, >> CTLESC wouldn't require it. This would also avoid some memory allocations. > > We need preglob for glob(3). I want to minimise the amount of code > difference between the glob path and the expmeta path. Then, how about moving some of expmeta()'s checks to return early outside of it, so that they can also be done in the glob() case? >> if (!strpbrk(str->text, metachars)) >> goto nometa; >> >> in expandmeta() is done before preglob() is called. Here, it is still not >> necessary to include CTLESC. > > We could move it after preglob. Assuming backslash is added to metachars, which was needed anyway: Regardless of whether the check is done before or after preglob(), it would never skip expmeta() when it shouldn't. If it's kept before preglob(), then it will correctly skip expmeta() in more cases than if it's moved after it, so I don't think there's any benefit in moving it. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 06:48:45PM +0200, Harald van Dijk wrote: > > Backslashes coming from parameters, sure, but backslashes introduced by > preglob(), I'm not so sure. Right. I guess we could change it so that CTLESC is preserved to distinguish between the backslashes from parameter expansion vs. backslashes from open input. > The [ character would mark the whole string possibly-meta in expandmeta(), > and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and > expmeta() wouldn't take [ as a meta character, but would take \ as a meta > character. Oh I though you were talking about test/[, I didn't see the set. But why would someone do set "["? > Then, how about moving some of expmeta()'s checks to return early outside of > it, so that they can also be done in the glob() case? We could. Cheers,
On 27/03/2018 19:02, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 06:48:45PM +0200, Harald van Dijk wrote: >> >> Backslashes coming from parameters, sure, but backslashes introduced by >> preglob(), I'm not so sure. > > Right. I guess we could change it so that CTLESC is preserved > to distinguish between the backslashes from parameter expansion > vs. backslashes from open input. Thinking about it some more, see below. >> The [ character would mark the whole string possibly-meta in expandmeta(), >> and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and >> expmeta() wouldn't take [ as a meta character, but would take \ as a meta >> character. > > Oh I though you were talking about test/[, I didn't see the set. > But why would someone do set "["? It was just the most basic command I could think of that shouldn't hit the FS, currently doesn't hit the FS, and would start hitting the FS if backslash gets taken as a metacharacter. Anything containing a quoted metacharacter would do. I suppose I could have used echo "[ ok ]" instead for something that's more likely to pop up in a real script. >> Then, how about moving some of expmeta()'s checks to return early outside of >> it, so that they can also be done in the glob() case? > > We could. expandmeta() is implemented twice, once for the glob() case, once for the expmeta() case. There is not much code shared between them except for preglob(). preglob(), through _rmescapes(), already has to go over the whole string and check each character. If that can be made to return an indication somehow of whether it has seen metacharacters, that could be used for both cases. Perhaps an additional flag that, when set, lets it return NULL (and freeing the allocated string) if no metacharacters were seen? That means expmeta() doesn't need to learn about CTLESC, means there's no need to go over the string a second time in the glob() case to really turn CTLESC into \, and means GLOB_NOMAGIC becomes unnecessary and can be removed as a workaround for glibc bugs. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 08:01:10PM +0200, Harald van Dijk wrote: > > >Right. I guess we could change it so that CTLESC is preserved > >to distinguish between the backslashes from parameter expansion > >vs. backslashes from open input. > > Thinking about it some more, see below. Actually let's not go down this route because this would mean that the glob(3) path will never have the same functionality since it cannot possibly see CTLESC. > It was just the most basic command I could think of that shouldn't hit the > FS, currently doesn't hit the FS, and would start hitting the FS if > backslash gets taken as a metacharacter. Anything containing a quoted > metacharacter would do. I suppose I could have used echo "[ ok ]" instead > for something that's more likely to pop up in a real script. My inclination is to just drop the /d\ev issue and use the bash behaviour until such a time that bash changes or POSIX changes. It's such an unlikely case as why would anyone knowingly put backslash into a variable and expecting pathname expansion to remove it without actually using real magic characters. Cheers,
On 27/03/2018 20:24, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 08:01:10PM +0200, Harald van Dijk wrote: >> >>> Right. I guess we could change it so that CTLESC is preserved >>> to distinguish between the backslashes from parameter expansion >>> vs. backslashes from open input. >> >> Thinking about it some more, see below. > > Actually let's not go down this route because this would mean > that the glob(3) path will never have the same functionality > since it cannot possibly see CTLESC. Yes, that's why I ended up proposing what I put below, which does allow them the same functionality. Neither glob() nor expmeta() would see CTLESC, and neither would need to, since neither would need to optimise for the no-metachars case. >> It was just the most basic command I could think of that shouldn't hit the >> FS, currently doesn't hit the FS, and would start hitting the FS if >> backslash gets taken as a metacharacter. Anything containing a quoted >> metacharacter would do. I suppose I could have used echo "[ ok ]" instead >> for something that's more likely to pop up in a real script. > > My inclination is to just drop the /d\ev issue and use the bash > behaviour until such a time that bash changes or POSIX changes. What would need to change in POSIX to convince you to change dash to match what you were already arguing is the POSIX-mandated behaviour? :) More serious response: I do think it's worth raising this as a bash bug too and seeing what they do. > It's such an unlikely case as why would anyone knowingly put > backslash into a variable and expecting pathname expansion to > remove it without actually using real magic characters. That's true. The least unlikely scenario I can think of is a directory name containing [ but not ]. A script may end up escaping the [ in that case just as a precaution, even though it's not strictly needed, and glob() may return early due to GLOB_NOMAGIC picking up on the fact that [ is not magic in that context. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 08:19:19PM +0800, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote: > > If POSIX specifies a result, and a shell applies an optimisation that causes > > a different result to be produced, doesn't that inherently disallow that > > optimisation? There may be some confusion and/or disagreement over what > > exactly POSIX specifies and/or intends to specify, but I don't see how you > > can argue that POSIX specifies result A, but it's okay to apply an > > optimisation that produces result B. > Wait, you're talking about something completely different. The > crux of the issue is whether POSIX allows the backslash to escape > the character after it or does it have to be a literal backslash. > In fact POSIX is perfectly clear on this: > 2.13.1 Patterns Matching a Single Character > The following patterns matching a single character shall match a > single character: ordinary characters, special pattern characters, > and pattern bracket expressions. The pattern bracket expression > also shall match a single collating element. A <backslash> > character shall escape the following character. The escaping > <backslash> shall be discarded. If a pattern ends with an > unescaped <backslash>, it is unspecified whether the pattern > does not match anything or the pattern is treated as invalid. > Nowhere does it say that backslashes that come from the result of > parameter expansion is always literal. > So it's clear that any shell that treats the backslash as a literal > backslash is already broken. I don't think it is clear at all. Note the final paragraph of 2.13.1: ] When pattern matching is used where shell quote removal is not ] performed (such as in the argument to the find -name primary when find ] is being called using one of the exec functions as defined in the ] System Interfaces volume of POSIX.1-2008, or in the pattern argument ] to the fnmatch() function), special characters can be escaped to ] remove their special meaning by preceding them with a <backslash> ] character. This escaping <backslash> is discarded. The sequence "\\" ] represents one literal <backslash>. All of the requirements and ] effects of quoting on ordinary, shell special, and special pattern ] characters shall apply to escaping in this context. This implies that special characters cannot be escaped using a backslash in a context where shell quote removal is performed. Instead, special characters can be escaped using shell quoting. As a result, the simplest form of parameter expansion has either all or none of the generated characters quoted (depending on whether the expansion is in double-quotes or not). There is also a sentence "The shell special characters always require quoting." which seems untrue in practice if the characters come from an expansion. Something like touch 'a&b' sh -c 'w=*\&*; printf "%s\n" $w' works for many shells as sh. However, this could be explained away as undefined behaviour. Furthermore, POSIX generally intends to specify the behaviour of the Bourne shell and ksh88. If the Bourne shell and ksh88 both agree, and do not match an interpretation of POSIX, then it is likely that the interpretation is wrong or the standard text has a bug. > > Can you show me any shell other than bash that lets this > > optimisation affect the results? > The fact is that the other shells are not doing what they do because > they're not doing this optimisation. They do what they do because > they have broken POSIX because they always treat backslashes that > arise from parameter expansion as literal backslashes. Let's use clear terminology: if it affects behaviour, it is by definition not an optimization. It is a special case for words not containing special pattern characters (for some value of "special pattern characters").
On 3/27/18 11:32 PM, Jilles Tjoelker wrote: > I don't think it is clear at all. Note the final paragraph of 2.13.1: > > ] When pattern matching is used where shell quote removal is not > ] performed [...] > > This implies that special characters cannot be escaped using a backslash > in a context where shell quote removal is performed. Taken literally, it just says something about what happens when shell quote removal is not performed. In the cases we're talking about, shell quote removal is performed, so this simply wouldn't apply. Perhaps that's taking it more literally than intended, I don't know. > Instead, special > characters can be escaped using shell quoting. As a result, the simplest > form of parameter expansion has either all or none of the generated > characters quoted (depending on whether the expansion is in > double-quotes or not). > > There is also a sentence "The shell special characters always require > quoting." which seems untrue in practice if the characters come from an > expansion. Something like > > touch 'a&b' > sh -c 'w=*\&*; printf "%s\n" $w' > > works for many shells as sh. However, this could be explained away as > undefined behaviour. This is what allows extensions to glob syntax, if those extensions use shell special characters. p="*(ab)"; case abab in $p) echo match ;; esac This prints "match" in ksh93. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3/27/18 10:55 AM, Herbert Xu wrote: > So going back to dash yes I think we should adopt the bash behaviour > for pathname expansion and keep the existing case semantics. > > This patch does exactly that. Note that this patch does not work > unless you have already applied > > https://patchwork.kernel.org/patch/10306507/ > > because otherwise the optimisation mentioned above does not get > detected correctly and we will end up doing quote removal twice. This introduces a buffer overread. When expmeta() sees a backslash, it assumes it can just skip the next character, assuming the next character is not a forward slash. By treating expanded backslashes as unquoted, it becomes possible for the next character to be the terminating '\0'. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 27, 2018 at 08:38:01PM +0200, Harald van Dijk wrote: > > >My inclination is to just drop the /d\ev issue and use the bash > >behaviour until such a time that bash changes or POSIX changes. > > What would need to change in POSIX to convince you to change dash to match > what you were already arguing is the POSIX-mandated behaviour? :) No the passage I quoted only mandates that backslashes be interpreted when doing the matching. It doesn't say anything about whether it should be removed after matching is done. The only thing it does say is that: If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged. IOW it's silent in the case where the pattern does match. But I do concede that it makes a lot more sense to remove the backslash in the matching case. > That's true. The least unlikely scenario I can think of is a directory name > containing [ but not ]. A script may end up escaping the [ in that case just > as a precaution, even though it's not strictly needed, and glob() may return > early due to GLOB_NOMAGIC picking up on the fact that [ is not magic in that > context. I agree. However, the other reason this is not worth fixing is because all those other shells that always treat the backslash as a literal won't remove it in this case anyway. So nor sensible script could possibly use such a feature even if we implemented it. $ ksh -c 'a="/usr/bin/\["; printf "%s\n" $a' /usr/bin/\[ $ If you really wanted to do this in a script, the portable way is $ ksh -c 'a="/usr/bin/[[]"; printf "%s\n" $a' /usr/bin/[ $ Cheers,
On Tue, Mar 27, 2018 at 11:32:36PM +0200, Jilles Tjoelker wrote: > > This implies that special characters cannot be escaped using a backslash > in a context where shell quote removal is performed. Instead, special My argument would be that in the context of parameter expansion quote removal is not done on the portion of the word that results from parameter expansion. Cheers,
On Wed, Mar 28, 2018 at 12:19:17AM +0200, Harald van Dijk wrote: > > This introduces a buffer overread. When expmeta() sees a backslash, it > assumes it can just skip the next character, assuming the next character is > not a forward slash. By treating expanded backslashes as unquoted, it > becomes possible for the next character to be the terminating '\0'. This code has always had to deal with naked backslashes. Can you show me the exact pattern that results in the overread? Thanks,
On 28/03/2018 08:23, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 12:19:17AM +0200, Harald van Dijk wrote: >> >> This introduces a buffer overread. When expmeta() sees a backslash, it >> assumes it can just skip the next character, assuming the next character is >> not a forward slash. By treating expanded backslashes as unquoted, it >> becomes possible for the next character to be the terminating '\0'. > > This code has always had to deal with naked backslashes. Can you > show me the exact pattern that results in the overread? No, it hasn't, because expmeta() is not used in case patterns, and case patterns are currently the only case where naked backslashes can appear. In contexts where pathname expansion is performed, a backslash coming from a variable will be escaped by another backslash in currently released dash versions. Test case: $v='*\' set -- $v Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28/03/2018 08:18, Herbert Xu wrote: > On Tue, Mar 27, 2018 at 08:38:01PM +0200, Harald van Dijk wrote: >> >>> My inclination is to just drop the /d\ev issue and use the bash >>> behaviour until such a time that bash changes or POSIX changes. >> >> What would need to change in POSIX to convince you to change dash to match >> what you were already arguing is the POSIX-mandated behaviour? :) > > No the passage I quoted only mandates that backslashes be > interpreted when doing the matching. It doesn't say anything > about whether it should be removed after matching is done. The > only thing it does say is that: > > If the pattern does not match any existing filenames or pathnames, > the pattern string shall be left unchanged. > > IOW it's silent in the case where the pattern does match. Of course it doesn't say whether characters in the pattern are preserved in the case of a match. That's because when there's a match, pathname expansion produces the file name, not the original pattern. In a directory containing foo.txt, you wouldn't argue that *.txt might expand to *foo.txt just because POSIX doesn't explicitly say the * is removed, would you? > However, the other reason this is not worth fixing is because all > those other shells that always treat the backslash as a literal > won't remove it in this case anyway. I seriously cannot understand the logic of pushing a break from traditional ash behaviour and from all shells apart from bash, giving POSIX as a reason for doing it, and then giving the behaviour of all those other shells as a reason for not doing it properly. If all those other shells are a reason against doing it, then why isn't that a reason for just restoring the dash 0.5.4 behaviour that all those other shells implement, always taking expanded backslash as effectively quoted? > So nor sensible script could > possibly use such a feature even if we implemented it. No portable script could use such a feature. A sensible script certainly could. There are a lot of scripts that aren't meant to be portable across shells. I know I've written scripts for my own use that use dash features that don't work the same in bash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 28, 2018 at 08:52:57AM +0200, Harald van Dijk wrote: > > I seriously cannot understand the logic of pushing a break from traditional > ash behaviour and from all shells apart from bash, giving POSIX as a reason > for doing it, and then giving the behaviour of all those other shells as a > reason for not doing it properly. It's logical for dash because this aligns the behaviour of pathname expansion with case pattern matching. Cheers,
On 28/03/2018 09:31, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 08:52:57AM +0200, Harald van Dijk wrote: >> >> I seriously cannot understand the logic of pushing a break from traditional >> ash behaviour and from all shells apart from bash, giving POSIX as a reason >> for doing it, and then giving the behaviour of all those other shells as a >> reason for not doing it properly. > > It's logical for dash because this aligns the behaviour of pathname > expansion with case pattern matching. Except it doesn't actually do that. It does in some cases, and not in others. You've made it clear that you don't care about the cases where it doesn't. That can be a valid decision, but it's one you should acknowledge. Also, it's just as logical to restore the case pattern matching to its traditional behaviour to align it with pathname expansion. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 28, 2018 at 09:38:04AM +0200, Harald van Dijk wrote: > > Also, it's just as logical to restore the case pattern matching to its > traditional behaviour to align it with pathname expansion. No I think that makes no sense and clearly contradicts POSIX. Cheers,
On 28/03/2018 10:16, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 09:38:04AM +0200, Harald van Dijk wrote: >> >> Also, it's just as logical to restore the case pattern matching to its >> traditional behaviour to align it with pathname expansion. > > No I think that makes no sense and clearly contradicts POSIX. That's not clear at all. As I already wrote earlier and Jilles Tjoelker chimed in with as well, if there isn't a single shell out there that actually implements what POSIX specifies (bash gets close, but isn't fully there either), not even the shells on which the POSIX specification was based, there's a fair chance it's a POSIX defect, regardless of whether it makes sense. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 28, 2018 at 08:44:28AM +0200, Harald van Dijk wrote: > > Test case: > > $v='*\' > set -- $v I don't see how this would cause an overrun, can you please explain it for me? Thanks,
On 28/03/2018 11:52, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 08:44:28AM +0200, Harald van Dijk wrote: >> >> Test case: >> >> $v='*\' >> set -- $v > > I don't see how this would cause an overrun, can you please explain > it for me? Line numbers are from 0.5.9.1. When expanded backslashes are no longer treated as quoted, this would call expmeta() with the pattern *\, that is with a single unquoted trailing backslash, so: expand.c:1333 if (*p == '\\') esc++; if (p[esc] == '/') { The first if statement will be hit and set esc to 1. p[esc] is then '\0', so the second if block doesn't get entered and the outer loop continues: expand.c:1315 for (p = name; esc = 0, *p; p += esc + 1) { p += esc + 1 will increase p by 2, letting it point just past the terminating '\0'. The loop condition of *p now accesses the byte just past the pattern. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/expand.c b/src/expand.c index 705fef7..07830eb 100644 --- a/src/expand.c +++ b/src/expand.c @@ -848,8 +848,7 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) { if (c) { if ((quotes & QUOTES_ESC) && ((syntax[c] == CCTL) || - (((quotes & EXP_FULL) || syntax != BASESYNTAX) && - syntax[c] == CBACK))) + (syntax != BASESYNTAX && syntax[c] == CBACK))) USTPUTC(CTLESC, q); } else if (!(quotes & QUOTES_KEEPNUL)) continue;