diff mbox series

[v6,06/15] tools/nolibc: __sysret: support syscalls who return a pointer

Message ID 11fd815773f7d202d329ac2c64cd6980b32e4fcf.1688739492.git.falcon@tinylab.org (mailing list archive)
State Accepted
Commit 6591be4a73feb955a107c70c7e5b621a97a2eb4b
Headers show
Series tools/nolibc: add a new syscall helper | expand

Commit Message

Zhangjin Wu July 7, 2023, 2:56 p.m. UTC
No official reference states the errno range, here aligns with musl and
glibc and uses [-MAX_ERRNO, -1] instead of all negative ones.

- musl: src/internal/syscall_ret.c
- glibc: sysdeps/unix/sysv/linux/sysdep.h

The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc
defined in tools/include/nolibc/errno.h.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
---
 tools/include/nolibc/sys.h | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Willy Tarreau Aug. 6, 2023, 9:58 a.m. UTC | #1
Hi Zhangjin,

On Fri, Jul 07, 2023 at 10:56:59PM +0800, Zhangjin Wu wrote:
> No official reference states the errno range, here aligns with musl and
> glibc and uses [-MAX_ERRNO, -1] instead of all negative ones.
> 
> - musl: src/internal/syscall_ret.c
> - glibc: sysdeps/unix/sysv/linux/sysdep.h
> 
> The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc
> defined in tools/include/nolibc/errno.h.
> 
> Suggested-by: Willy Tarreau <w@1wt.eu>
> Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/
> Suggested-by: David Laight <David.Laight@ACULAB.COM>
> Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/
> Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
> ---
>  tools/include/nolibc/sys.h | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
> index 53bc3ad6593e..3479f54d7957 100644
> --- a/tools/include/nolibc/sys.h
> +++ b/tools/include/nolibc/sys.h
> @@ -28,13 +28,20 @@
>  #include "errno.h"
>  #include "types.h"
>  
> -/* Syscall return helper, set errno as -ret when ret < 0 */
> +
> +/* Syscall return helper for library routines, set errno as -ret when ret is in
> + * range of [-MAX_ERRNO, -1]
> + *
> + * Note, No official reference states the errno range here aligns with musl
> + * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
> + */
> +
>  static __inline__ __attribute__((unused, always_inline))
> -long __sysret(long ret)
> +long __sysret(unsigned long ret)
>  {
> -	if (ret < 0) {
> -		SET_ERRNO(-ret);
> -		ret = -1;
> +	if (ret >= (unsigned long)-MAX_ERRNO) {
> +		SET_ERRNO(-(long)ret);
> +		return -1;
>  	}
>  	return ret;
>  }

While running some regression tests, I found that my programs increased
in size by 3-4% overall, which was solely attributed to this helper that
significantly increased the size of many syscalls (particularly those
returning ints). Let's consider this simple function:

  void unlink_exit(const char *name)
  {
        int ret = unlink(name);
        exit(ret < 0 ? errno : 0);
  }

Before:
  $ nm --size unlink.o | grep unli
  0000000000000030 T unlink_exit

  $ objdump -d -j .text --disassemble=unlink_exit unlink.o
  000000000000003b <unlink_exit>:
    3b:   48 89 fe                mov    %rdi,%rsi
    3e:   b8 07 01 00 00          mov    $0x107,%eax
    43:   31 d2                   xor    %edx,%edx
    45:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
    4c:   0f 05                   syscall 
    4e:   31 ff                   xor    %edi,%edi
    50:   85 c0                   test   %eax,%eax
    52:   79 0a                   jns    5e <unlink_exit+0x23>
    54:   89 c7                   mov    %eax,%edi
    56:   f7 df                   neg    %edi
    58:   89 3d 00 00 00 00       mov    %edi,0x0(%rip)        # 5e <unlink_exit+0x23>
    5e:   b8 3c 00 00 00          mov    $0x3c,%eax
    63:   40 0f b6 ff             movzbl %dil,%edi
    67:   0f 05                   syscall 
    69:   eb fe                   jmp    69 <unlink_exit+0x2e>

After:
  $ nm --size unlink.o | grep unli
  0000000000000042 T unlink_exit

  $ objdump -d -j .text --disassemble=unlink_exit unlink.o
  0000000000000051 <unlink_exit>:
    51:   48 89 fe                mov    %rdi,%rsi
    54:   b8 07 01 00 00          mov    $0x107,%eax
    59:   31 d2                   xor    %edx,%edx
    5b:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
    62:   0f 05                   syscall 
    64:   48 63 d0                movslq %eax,%rdx
    67:   48 81 fa 00 f0 ff ff    cmp    $0xfffffffffffff000,%rdx
    6e:   76 0a                   jbe    7a <unlink_exit+0x29>
    70:   f7 da                   neg    %edx
    72:   89 15 00 00 00 00       mov    %edx,0x0(%rip)        # 78 <unlink_exit+0x27>
    78:   eb 06                   jmp    80 <unlink_exit+0x2f>
    7a:   31 ff                   xor    %edi,%edi
    7c:   85 c0                   test   %eax,%eax
    7e:   79 06                   jns    86 <unlink_exit+0x35>
    80:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 86 <unlink_exit+0x35>
    86:   b8 3c 00 00 00          mov    $0x3c,%eax
    8b:   40 0f b6 ff             movzbl %dil,%edi
    8f:   0f 05                   syscall 
    91:   eb fe                   jmp    91 <unlink_exit+0x40>

=> that's 18 bytes added to retrieve the result of a syscall.

There are several reasons involved:
  - the "unsigned long" argument to __sysret() forces a sign extension
    from all sys_* functions that used to return int (the movslq above).

  - the comparison with the error range now has to be performed on a
    long instead of an int

  - the return value from __sysret() is a long (note, a signed long)
    which then has to be turned back to an int before being returned
    by the caller to satisfy the caller's prototype.

I could recover a part of it by replacing the __sysret() function with
a macro that preserves the input type and avoids these useless
conversions:

  #define __sysret(arg) ({ \
		typeof(arg) __sysret_arg = (arg);				\
		if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \
			SET_ERRNO(-(int)__sysret_arg);				\
			__sysret_arg = -1L;					\
		}								\
		__sysret_arg;							\
	  })

But the remaining part is the comparison to -MAX_ERRNO inflicted on all
integer returns where we could previously keep a simple sign comparison:

    51:   48 89 fe                mov    %rdi,%rsi
    54:   b8 07 01 00 00          mov    $0x107,%eax
    59:   31 d2                   xor    %edx,%edx
    5b:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
    62:   0f 05                   syscall 
    64:   3d 00 f0 ff ff          cmp    $0xfffff000,%eax
    69:   76 0a                   jbe    75 <unlink_exit+0x24>
    6b:   f7 d8                   neg    %eax
    6d:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 73 <unlink_exit+0x22>
    73:   eb 06                   jmp    7b <unlink_exit+0x2a>
    75:   31 ff                   xor    %edi,%edi
    77:   85 c0                   test   %eax,%eax
    79:   79 06                   jns    81 <unlink_exit+0x30>
    7b:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 81 <unlink_exit+0x30>
    81:   b8 3c 00 00 00          mov    $0x3c,%eax
    86:   40 0f b6 ff             movzbl %dil,%edi
    8a:   0f 05                   syscall 
    8c:   eb fe                   jmp    8c <unlink_exit+0x3b>

And given that the vast majority of the syscalls return integers, I think
we should specialize these sysret functions so that we don't add needless
complexity for all those for which we know they're returning ints (it's
written in the caller's prototype anyway). I.e. we can have __sysret_int()
that is the low-overhead version and keep __sysret() the expensive one
doing the comparison (and possibly through the macro like above if needed
in order to avoid multiple casts).

I'm not going to change all that now, that's too late, but I'm a bit sad
to see my binaries inflate just because of this, so I hope we can get back
to this later after the current queue is flushed.

Regards,
Willy
Zhangjin Wu Aug. 6, 2023, 12:33 p.m. UTC | #2
> Hi Zhangjin,
> 
> On Fri, Jul 07, 2023 at 10:56:59PM +0800, Zhangjin Wu wrote:
> > No official reference states the errno range, here aligns with musl and
> > glibc and uses [-MAX_ERRNO, -1] instead of all negative ones.
> > 
> > - musl: src/internal/syscall_ret.c
> > - glibc: sysdeps/unix/sysv/linux/sysdep.h
> > 
> > The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc
> > defined in tools/include/nolibc/errno.h.
> > 
> > Suggested-by: Willy Tarreau <w@1wt.eu>
> > Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/
> > Suggested-by: David Laight <David.Laight@ACULAB.COM>
> > Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/
> > Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
> > ---
> >  tools/include/nolibc/sys.h | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
> > index 53bc3ad6593e..3479f54d7957 100644
> > --- a/tools/include/nolibc/sys.h
> > +++ b/tools/include/nolibc/sys.h
> > @@ -28,13 +28,20 @@
> >  #include "errno.h"
> >  #include "types.h"
> >  
> > -/* Syscall return helper, set errno as -ret when ret < 0 */
> > +
> > +/* Syscall return helper for library routines, set errno as -ret when ret is in
> > + * range of [-MAX_ERRNO, -1]
> > + *
> > + * Note, No official reference states the errno range here aligns with musl
> > + * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
> > + */
> > +
> >  static __inline__ __attribute__((unused, always_inline))
> > -long __sysret(long ret)
> > +long __sysret(unsigned long ret)
> >  {
> > -	if (ret < 0) {
> > -		SET_ERRNO(-ret);
> > -		ret = -1;
> > +	if (ret >= (unsigned long)-MAX_ERRNO) {
> > +		SET_ERRNO(-(long)ret);
> > +		return -1;
> >  	}
> >  	return ret;
> >  }
> 
> While running some regression tests, I found that my programs increased
> in size by 3-4% overall, which was solely attributed to this helper that
> significantly increased the size of many syscalls (particularly those
> returning ints). Let's consider this simple function:
>

I'm very sad to learn this, so sorry, but we do need a test suite for the
coverage on different toolchains, on O0/1/2/3/s, on generated size ...

>   void unlink_exit(const char *name)
>   {
>         int ret = unlink(name);
>         exit(ret < 0 ? errno : 0);
>   }
>
> Before:
>   $ nm --size unlink.o | grep unli
>   0000000000000030 T unlink_exit
> 
>   $ objdump -d -j .text --disassemble=unlink_exit unlink.o
>   000000000000003b <unlink_exit>:
>     3b:   48 89 fe                mov    %rdi,%rsi
>     3e:   b8 07 01 00 00          mov    $0x107,%eax
>     43:   31 d2                   xor    %edx,%edx
>     45:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
>     4c:   0f 05                   syscall 
>     4e:   31 ff                   xor    %edi,%edi
>     50:   85 c0                   test   %eax,%eax
>     52:   79 0a                   jns    5e <unlink_exit+0x23>
>     54:   89 c7                   mov    %eax,%edi
>     56:   f7 df                   neg    %edi
>     58:   89 3d 00 00 00 00       mov    %edi,0x0(%rip)        # 5e <unlink_exit+0x23>
>     5e:   b8 3c 00 00 00          mov    $0x3c,%eax
>     63:   40 0f b6 ff             movzbl %dil,%edi
>     67:   0f 05                   syscall 
>     69:   eb fe                   jmp    69 <unlink_exit+0x2e>
> 
> After:
>   $ nm --size unlink.o | grep unli
>   0000000000000042 T unlink_exit
> 
>   $ objdump -d -j .text --disassemble=unlink_exit unlink.o
>   0000000000000051 <unlink_exit>:
>     51:   48 89 fe                mov    %rdi,%rsi
>     54:   b8 07 01 00 00          mov    $0x107,%eax
>     59:   31 d2                   xor    %edx,%edx
>     5b:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
>     62:   0f 05                   syscall 
>     64:   48 63 d0                movslq %eax,%rdx
>     67:   48 81 fa 00 f0 ff ff    cmp    $0xfffffffffffff000,%rdx
>     6e:   76 0a                   jbe    7a <unlink_exit+0x29>
>     70:   f7 da                   neg    %edx
>     72:   89 15 00 00 00 00       mov    %edx,0x0(%rip)        # 78 <unlink_exit+0x27>
>     78:   eb 06                   jmp    80 <unlink_exit+0x2f>
>     7a:   31 ff                   xor    %edi,%edi
>     7c:   85 c0                   test   %eax,%eax
>     7e:   79 06                   jns    86 <unlink_exit+0x35>
>     80:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 86 <unlink_exit+0x35>
>     86:   b8 3c 00 00 00          mov    $0x3c,%eax
>     8b:   40 0f b6 ff             movzbl %dil,%edi
>     8f:   0f 05                   syscall 
>     91:   eb fe                   jmp    91 <unlink_exit+0x40>
> 
> => that's 18 bytes added to retrieve the result of a syscall.
> 
> There are several reasons involved:
>   - the "unsigned long" argument to __sysret() forces a sign extension
>     from all sys_* functions that used to return int (the movslq above).
> 
>   - the comparison with the error range now has to be performed on a
>     long instead of an int
> 
>   - the return value from __sysret() is a long (note, a signed long)
>     which then has to be turned back to an int before being returned
>     by the caller to satisfy the caller's prototype.
> 
> I could recover a part of it by replacing the __sysret() function with
> a macro that preserves the input type and avoids these useless
> conversions:
> 
>   #define __sysret(arg) ({ \
> 		typeof(arg) __sysret_arg = (arg);				\
> 		if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \
> 			SET_ERRNO(-(int)__sysret_arg);				\
> 			__sysret_arg = -1L;					\
> 		}								\
> 		__sysret_arg;							\
> 	  })
> 
> But the remaining part is the comparison to -MAX_ERRNO inflicted on all
> integer returns where we could previously keep a simple sign comparison:
> 
>     51:   48 89 fe                mov    %rdi,%rsi
>     54:   b8 07 01 00 00          mov    $0x107,%eax
>     59:   31 d2                   xor    %edx,%edx
>     5b:   48 c7 c7 9c ff ff ff    mov    $0xffffffffffffff9c,%rdi
>     62:   0f 05                   syscall 
>     64:   3d 00 f0 ff ff          cmp    $0xfffff000,%eax
>     69:   76 0a                   jbe    75 <unlink_exit+0x24>
>     6b:   f7 d8                   neg    %eax
>     6d:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 73 <unlink_exit+0x22>
>     73:   eb 06                   jmp    7b <unlink_exit+0x2a>
>     75:   31 ff                   xor    %edi,%edi
>     77:   85 c0                   test   %eax,%eax
>     79:   79 06                   jns    81 <unlink_exit+0x30>
>     7b:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 81 <unlink_exit+0x30>
>     81:   b8 3c 00 00 00          mov    $0x3c,%eax
>     86:   40 0f b6 ff             movzbl %dil,%edi
>     8a:   0f 05                   syscall 
>     8c:   eb fe                   jmp    8c <unlink_exit+0x3b>
>
> And given that the vast majority of the syscalls return integers, I think
> we should specialize these sysret functions so that we don't add needless
> complexity for all those for which we know they're returning ints (it's
> written in the caller's prototype anyway). I.e. we can have __sysret_int()
> that is the low-overhead version and keep __sysret() the expensive one
> doing the comparison (and possibly through the macro like above if needed
> in order to avoid multiple casts).
>

Based on your macro version, I tried to use the is_signed_type() from kernel,
it seems works.

A simple test shows, before:

    // ppc64
    $ size nolibc-test
       text	   data	    bss	    dec	    hex	filename
      27308	   1880	     80	  29268	   7254	nolibc-test

    // mips
    $ size nolibc-test
       text	   data	    bss	    dec	    hex	filename
      23276	     64	     64	  23404	   5b6c	nolibc-test

After:

    // ppc64
    $ size nolibc-test
       text	   data	    bss	    dec	    hex	filename
      27136	   1880	     80	  29096	   71a8	nolibc-test

    // mips
    $ size nolibc-test
       text	   data	    bss	    dec	    hex	filename
      23036	     64	     64	  23164	   5a7c	nolibc-test

> I'm not going to change all that now, that's too late, but I'm a bit sad
> to see my binaries inflate just because of this, so I hope we can get back
> to this later after the current queue is flushed.
>

Ok, will send a patch with is_signed_type() for more discuss soon.

Thanks,
Willy

> Regards,
> Willy
David Laight Aug. 7, 2023, 8:16 a.m. UTC | #3
From: Zhangjin Wu
> Sent: 06 August 2023 13:34
...
> Based on your macro version, I tried to use the is_signed_type() from kernel,
> it seems works.

You'll find that (void *)0 isn't 'constant enough' for
is_constexpr() - so is_constexpr((type)0) can be used
to detect pointer types.
Probably requires an 'is_pointer_type()' define.
This also rather means that is_signed_type() needs to be
wrapped in is_constexpr() - probably generating header
file inclusion hell.

I'm going to add a comment on v3 of the patch...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 53bc3ad6593e..3479f54d7957 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -28,13 +28,20 @@ 
 #include "errno.h"
 #include "types.h"
 
-/* Syscall return helper, set errno as -ret when ret < 0 */
+
+/* Syscall return helper for library routines, set errno as -ret when ret is in
+ * range of [-MAX_ERRNO, -1]
+ *
+ * Note, No official reference states the errno range here aligns with musl
+ * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
+ */
+
 static __inline__ __attribute__((unused, always_inline))
-long __sysret(long ret)
+long __sysret(unsigned long ret)
 {
-	if (ret < 0) {
-		SET_ERRNO(-ret);
-		ret = -1;
+	if (ret >= (unsigned long)-MAX_ERRNO) {
+		SET_ERRNO(-(long)ret);
+		return -1;
 	}
 	return ret;
 }