diff mbox

[1/2] mkbuiltins: Use a `while` loop rather than `nl`

Message ID 20160804055411.23558-1-somasissounds@gmail.com (mailing list archive)
State Rejected
Delegated to: Herbert Xu
Headers show

Commit Message

Kylie McClain Aug. 4, 2016, 5:54 a.m. UTC
From: Kylie McClain <somasis@exherbo.org>

nl, while specified in POSIX, is rather obscure and isn't provided by small
coreutils implementations such as `busybox`. This while loop works just as
well for our purposes.
---
 src/mkbuiltins | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Harald van Dijk Aug. 4, 2016, 5:17 p.m. UTC | #1
On 04/08/2016 07:54, Kylie McClain wrote:
> From: Kylie McClain <somasis@exherbo.org>
>
> nl, while specified in POSIX, is rather obscure and isn't provided by small
> coreutils implementations such as `busybox`. This while loop works just as
> well for our purposes.
...
> -sed 's/	-[a-z]*//' $temp2 | nl -ba -v0 |
> +sed 's/	-[a-z]*//' $temp2 | while read line;do \
> +	i=$(( ${i:--1} + 1 )); printf '%s %s\n' "${i}" "${line}";done |

$(( ... )) is mentioned in 
<https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Shell-Substitutions.html> 
as not universally supported, notably Solaris 10 /bin/sh does not have 
it. Given that dash was fairly recently changed to make it build on 
Solaris 9, it seems like a mistake to break that again.

Aside from that, i is such a common variable name that it seems risky to 
assume it is unset. I know I've set it myself in shell sessions that I 
ended up using for building dash. I never exported it, so it wouldn't 
break here, but it doesn't seem like a stretch that someone else does 
export it.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seb Aug. 6, 2016, 9:02 a.m. UTC | #2
On Thu, Aug 04, 2016 at 07:17:47PM +0200, Harald van Dijk wrote:

> Given that dash was fairly recently changed to make it build on
> Solaris 9, it seems like a mistake to break that again.

Hello,

This is an attempt to simplify the current implementation. This
one does not require any temporary file anymore and relies only
on sort and (a rather basic usage of) awk.

I've checked the the Opensolaris' awk manual, trying to not
introduce some unsupported syntax. I still wonder about the
escaped newlines and the (currently commented) close()
statements, but both can be easily addressed. So, if someone
thinks it's worth testing...

++
Seb.
#!/bin/sh

LC_ALL=C      # Force the collate order of the builtins.
export LC_ALL # some shells may not support the "export FOO=bar" form.

awk '
(NF && ($1 !~ /^#/)) {
    # command [options] alias1 [[options] alias2] ...
    for (i = 2; i <= NF; i++) {
        mask = 0
        cmd = $1
        if ($i ~ /^-/) {
            if ($i ~ /n/)
                cmd = "NULL"
            if ($i ~ /s/)
                mask += 1
            if ($i ~ /[su]/)
                mask += 2
            if ($i ~ /a/)
                mask += 4
            i++
        }
        print $i, cmd, mask, $1
    }
}' $1 | sort -k1,1 | awk '
BEGIN {
    BUILTINS_H = "./builtins.h"
    BUILTINS_C = "./builtins.c"
    warn = "/*\n * This file was generated by the mkbuiltins program.\n */\n"
    print  warn >BUILTINS_H
    print  warn "\n#include \"shell.h\"" \
                "\n#include \"builtins.h\"\n" >BUILTINS_C
}
(!($NF in DEFINE)) {
    up = $NF # /bin/awk has no toupper() on Solaris.
    gsub(/a/, "A", up); gsub(/j/, "J", up); gsub(/s/, "S", up)
    gsub(/b/, "B", up); gsub(/k/, "K", up); gsub(/t/, "T", up)
    gsub(/c/, "C", up); gsub(/l/, "L", up); gsub(/u/, "U", up)
    gsub(/d/, "D", up); gsub(/m/, "M", up); gsub(/v/, "V", up)
    gsub(/e/, "E", up); gsub(/n/, "N", up); gsub(/w/, "W", up)
    gsub(/f/, "F", up); gsub(/o/, "O", up); gsub(/x/, "X", up)
    gsub(/g/, "G", up); gsub(/p/, "P", up); gsub(/y/, "Y", up)
    gsub(/h/, "H", up); gsub(/q/, "Q", up); gsub(/z/, "Z", up)
    gsub(/i/, "I", up); gsub(/r/, "R", up)


    print "#define "up" (builtincmd + "(NR-1)")" >BUILTINS_H
    print "int "$NF"(int, char **);" >BUILTINS_C
    DEFINE[$NF]
}
{
    CMD[NR] = "\""$1"\", "$2", "$3
}
END {
    print "\n#define NUMBUILTINS "NR"\n" \
          "\n#define BUILTIN_SPECIAL 0x1" \
          "\n#define BUILTIN_REGULAR 0x2" \
          "\n#define BUILTIN_ASSIGN  0x4\n" \
          "\nstruct builtincmd {" \
          "\n    const char *name;" \
          "\n    int (*builtin)(int, char **);" \
          "\n    unsigned flags;" \
          "\n};" \
          "\n\nextern const struct builtincmd builtincmd[];" >BUILTINS_H
    # close(BUILTINS_H) # not supported on Solaris ? 

    print "\nconst struct builtincmd builtincmd[] = {" >BUILTINS_C
    for (i = 1; i <= NR; i++)
        print "\t{ "CMD[i]" }," >BUILTINS_C
    print "};" >BUILTINS_C
    # close(BUILTINS_C)
}'

# EoF
Harald van Dijk Aug. 6, 2016, 4:51 p.m. UTC | #3
On 04/08/2016 19:17, Harald van Dijk wrote:
> On 04/08/2016 07:54, Kylie McClain wrote:
>> From: Kylie McClain <somasis@exherbo.org>
>>
>> nl, while specified in POSIX, is rather obscure and isn't provided by
>> small
>> coreutils implementations such as `busybox`. This while loop works
>> just as
>> well for our purposes.
> ...
>> -sed 's/    -[a-z]*//' $temp2 | nl -ba -v0 |
>> +sed 's/    -[a-z]*//' $temp2 | while read line;do \
>> +    i=$(( ${i:--1} + 1 )); printf '%s %s\n' "${i}" "${line}";done |
>
> $(( ... )) is mentioned in
> <https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Shell-Substitutions.html>
> as not universally supported, notably Solaris 10 /bin/sh does not have
> it. Given that dash was fairly recently changed to make it build on
> Solaris 9, it seems like a mistake to break that again.

I just realised that that particular change to make it build on Solaris 
9 involved avoiding running mkbuiltins using sh, choosing to use $SHELL 
instead, as $SHELL should already have been picked by configure as 
something more modern. So the use of $(( ... )) should probably not be a 
problem after all...

> Aside from that, i is such a common variable name that it seems risky to
> assume it is unset. I know I've set it myself in shell sessions that I
> ended up using for building dash. I never exported it, so it wouldn't
> break here, but it doesn't seem like a stretch that someone else does
> export it.

...as long as i is simply initialised before the loop.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seb Aug. 7, 2016, 10:17 a.m. UTC | #4
On Sat, Aug 06, 2016 at 11:02:42AM +0200, Seb wrote:
> This is an attempt to simplify the current implementation [...]

BTW, I wonder if there is not a (virtual) bug with the current
mkbuiltins. Look at the TRUECMD macro in the generated builtins.h:

 #define TRUECMD (builtincmd + 1)

It doesn't point on "true" (mask=2, "special builtin") but on ":"
(mask=3, "standard builtin"), whatever the order of the aliases
you set in builtins.def.in. Maybe it's a mere problem with the
macro name ("TRUECMD" should rather be "COLONCMD"), but I don't
know if it is really an intended behaviour as the result is
finally a matter of sorting.

++
Seb.



--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Aug. 8, 2016, 3:33 p.m. UTC | #5
Harald van Dijk <harald@gigawatt.nl> wrote:
> On 04/08/2016 07:54, Kylie McClain wrote:
>> From: Kylie McClain <somasis@exherbo.org>
>>
>> nl, while specified in POSIX, is rather obscure and isn't provided by small
>> coreutils implementations such as `busybox`. This while loop works just as
>> well for our purposes.
> ...
>> -sed 's/      -[a-z]*//' $temp2 | nl -ba -v0 |
>> +sed 's/      -[a-z]*//' $temp2 | while read line;do \
>> +     i=$(( ${i:--1} + 1 )); printf '%s %s\n' "${i}" "${line}";done |
> 
> $(( ... )) is mentioned in 
> <https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Shell-Substitutions.html> 
> as not universally supported, notably Solaris 10 /bin/sh does not have 
> it. Given that dash was fairly recently changed to make it build on 
> Solaris 9, it seems like a mistake to break that again.
> 
> Aside from that, i is such a common variable name that it seems risky to 
> assume it is unset. I know I've set it myself in shell sessions that I 
> ended up using for building dash. I never exported it, so it wouldn't 
> break here, but it doesn't seem like a stretch that someone else does 
> export it.

nl has been around forever.  I'm not taking this patch.

Cheers,
diff mbox

Patch

diff --git a/src/mkbuiltins b/src/mkbuiltins
index b4d6f4e..a47bce8 100644
--- a/src/mkbuiltins
+++ b/src/mkbuiltins
@@ -101,7 +101,8 @@  cat <<\!
  */
 
 !
-sed 's/	-[a-z]*//' $temp2 | nl -ba -v0 |
+sed 's/	-[a-z]*//' $temp2 | while read line;do \
+	i=$(( ${i:--1} + 1 )); printf '%s %s\n' "${i}" "${line}";done |
 	LC_ALL= LC_COLLATE=C sort -u -k 3,3 |
 	tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
 	awk '{	printf "#define %s (builtincmd + %d)\n", $3, $1}'