Message ID | b6ff2684f557f6ce00151905990643e651391614.1691437328.git.falcon@tinylab.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v5] tools/nolibc: fix up size inflate regression | expand |
Hi Zhangjin, On Tue, Aug 08, 2023 at 04:04:05AM +0800, Zhangjin Wu wrote: > As reported and suggested by Willy, the inline __sysret() helper > introduces three types of conversions and increases the size: > > (1) the "unsigned long" argument to __sysret() forces a sign extension > from all sys_* functions that used to return 'int' > > (2) the comparison with the error range now has to be performed on a > 'unsigned long' instead of an 'int' > > (3) the return value from __sysret() is a 'long' (note, a signed long) > which then has to be turned back to an 'int' before being returned by the > caller to satisfy the caller's prototype. > > To fix up this, firstly, let's use macro instead of inline function to > preserves the input type and avoids these useless conversions (1), (3). > > Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where > we could previously keep a simple sign comparison, let's use a new > __is_pointer() macro suggested by David Laight to limit the comparison > to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign > comparison for integer returns as before. The __builtin_choose_expr() > is suggested by David Laight to choose different comparisons based on > the types to share code. > > Thirdly, fix up the following warning by an explicit conversion and let > __sysret() be able to accept the (void *) type of argument: > > sysroot/powerpc/include/sys.h: In function 'sbrk': > sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] > 104 | return (void *)__sysret(-ENOMEM); > > Fourthly, to further workaround the argument type with 'const' flag, > must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested > by David Laight for old gcc versions. (...) > tools/include/nolibc/sys.h | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------- > 1 file changed, 59 insertions(+), 15 deletions(-) Quite frankly, even if it can be looked at as a piece of art, I don't like it. It's overkill for what we need and it brings in several tricky macros that we don't need and that require a link to their analysis so that nobody breaks them by accident. I mean, if one day we need them, okay we know we can find them, they're perfect for certain use cases. But all this just to avoid a ternary operation is far too much IMHO. That's without counting on the compiler tricks to use the ugly __auto_type when available, and the macro names which continue to possibly interact with user code. And if you remember, originally you proposed to factor the SET_ERRNO() stuff in every syscall in order to "simplify the code and improve maintainability". It's clear that we've long abandonned that goal here. If we had no other choice, I'd rather roll back to the clean, readable and trustable SET_ERRNO() in every syscall! So I just restarted from what I proposed the other day, using a ternary operator as I suggested in order to address the const case, and it gives me the following patch, which is way simpler and still a bit readable. It's made of two nested (?:) : - the first one to determine if we have to check for the sign or against -MAX_ERRNO to detect an error (depends on the arg's signedness) - the second one to return either the argument as-is or -1. The only two tricks are that (typeof(arg))-1 is compared to 1 instead of zero so that gcc doesn't complain that we're comparing against a null pointer, and similarly we compare arg+1 to 1 instead of arg to 0 for the negative case, and that's all. It gives me the expected code and output from gcc-4.7 to 12.3, and clang-13. I've checked against your version and it's always exactly the same (in fact to be more precise sometimes it's 1-2 bytes smaller but that's only due to the compiler taking liberties with the code ordering, it could as well have done it the other way around, though it did not this time): 26144 zhangjin-v5/nolibc-test--Os-arm64 | 26144 willy/nolibc-test--Os-arm64 23340 zhangjin-v5/nolibc-test--Os-armv5 | 23340 willy/nolibc-test--Os-armv5 23232 zhangjin-v5/nolibc-test--Os-armv7 | 23232 willy/nolibc-test--Os-armv7 17508 zhangjin-v5/nolibc-test--Os-armv7t | 17508 willy/nolibc-test--Os-armv7t 19674 zhangjin-v5/nolibc-test--Os-i386 | 19673 willy/nolibc-test--Os-i386 19821 zhangjin-v5/nolibc-test--Os-i586 | 19820 willy/nolibc-test--Os-i586 23084 zhangjin-v5/nolibc-test--Os-loongarch | 23084 willy/nolibc-test--Os-loongarch 23404 zhangjin-v5/nolibc-test--Os-mips | 23404 willy/nolibc-test--Os-mips 27108 zhangjin-v5/nolibc-test--Os-ppc32 | 27108 willy/nolibc-test--Os-ppc32 27652 zhangjin-v5/nolibc-test--Os-ppc64 | 27652 willy/nolibc-test--Os-ppc64 27652 zhangjin-v5/nolibc-test--Os-ppc64le | 27652 willy/nolibc-test--Os-ppc64le 19356 zhangjin-v5/nolibc-test--Os-riscv | 19356 willy/nolibc-test--Os-riscv 22152 zhangjin-v5/nolibc-test--Os-s390 | 22152 willy/nolibc-test--Os-s390 22300 zhangjin-v5/nolibc-test--Os-x86_64 | 22299 willy/nolibc-test--Os-x86_64 Unless there's any objection, I'll queue this one. And if __sysret() annoys us again in the future I might very well revert that simplification. Any question about the patch ? Thanks, Willy --- From 73dd63ed6666c6f212ba09247e68b6b5711ed6ec Mon Sep 17 00:00:00 2001 From: Willy Tarreau <w@1wt.eu> Date: Tue, 8 Aug 2023 20:12:50 +0200 Subject: tools/nolibc: avoid undesired casts in the __sysret() macro Having __sysret() as an inline function has the unfortunate effect of adding casts and large constants comparisons after the syscall returns that significantly inflate some light code that's otherwise syscall- heavy. Even nolibc-test grew by ~1%. Let's switch back to a macro for this, and apply the following rule: - if the argument (the local variable containing the syscall return value) is signed, any negative value is an error, so the check is performed compared to zero with the argument's type ; - otherwise if the argument is unsigned, only values >= -MAX_ERRNO indicate an error. That's what is used for mmap() for example. The result is calculated using a ternary operator so that we don't need to assign values to a temporary variable and that it does work fine with const if needed. The sbrk() and mmap() syscalls were also fixed to return the correct type (they used to have double casts to hide the pointer and restore it). Fixes: 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Willy Tarreau <w@1wt.eu> --- tools/include/nolibc/sys.h | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 833d6c5e86dc..6b5f39fbf998 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -28,22 +28,25 @@ #include "types.h" -/* Syscall return helper for library routines, set errno as -ret when ret is in - * range of [-MAX_ERRNO, -1] - * - * Note, No official reference states the errno range here aligns with musl - * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h) +/* Syscall return helper: takes the syscall value in argument and checks for an + * error in it. For unsigned returns, an error is within [-MAX_ERRNO, -1]. For + * signed returns, an error is any value < 0. When an error is encountered, + * -ret is set into errno and -1 is returned. Otherwise the returned value is + * passed as-is with its type preserved. */ -static __inline__ __attribute__((unused, always_inline)) -long __sysret(unsigned long ret) -{ - if (ret >= (unsigned long)-MAX_ERRNO) { - SET_ERRNO(-(long)ret); - return -1; - } - return ret; -} +#define __sysret(arg) \ +({ \ + __typeof__(arg) __sysret_arg = (arg); \ + ((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ? /* unsigned arg? */ \ + (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) : /* errors */ \ + (__sysret_arg + 1) < ((__typeof__(arg))1) /* signed: <0 = error */ \ + ) ? ({ \ + SET_ERRNO(-(intptr_t)__sysret_arg); \ + ((__typeof__(arg)) -1); /* return -1 upon error */ \ + }) : __sysret_arg; /* return original value & type on success */ \ +}) + /* Functions in this file only describe syscalls. They're declared static so * that the compiler usually decides to inline them while still being allowed @@ -94,7 +97,7 @@ void *sbrk(intptr_t inc) if (ret && sys_brk(ret + inc) == ret + inc) return ret + inc; - return (void *)__sysret(-ENOMEM); + return __sysret((void *)-ENOMEM); } @@ -682,7 +685,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd, static __attribute__((unused)) void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) { - return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset)); + return __sysret(sys_mmap(addr, length, prot, flags, fd, offset)); } static __attribute__((unused))
Hi, Willy > Hi Zhangjin, > > On Tue, Aug 08, 2023 at 04:04:05AM +0800, Zhangjin Wu wrote: > > As reported and suggested by Willy, the inline __sysret() helper > > introduces three types of conversions and increases the size: > > > > (1) the "unsigned long" argument to __sysret() forces a sign extension > > from all sys_* functions that used to return 'int' > > > > (2) the comparison with the error range now has to be performed on a > > 'unsigned long' instead of an 'int' > > > > (3) the return value from __sysret() is a 'long' (note, a signed long) > > which then has to be turned back to an 'int' before being returned by the > > caller to satisfy the caller's prototype. > > > > To fix up this, firstly, let's use macro instead of inline function to > > preserves the input type and avoids these useless conversions (1), (3). > > > > Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where > > we could previously keep a simple sign comparison, let's use a new > > __is_pointer() macro suggested by David Laight to limit the comparison > > to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign > > comparison for integer returns as before. The __builtin_choose_expr() > > is suggested by David Laight to choose different comparisons based on > > the types to share code. > > > > Thirdly, fix up the following warning by an explicit conversion and let > > __sysret() be able to accept the (void *) type of argument: > > > > sysroot/powerpc/include/sys.h: In function 'sbrk': > > sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] > > 104 | return (void *)__sysret(-ENOMEM); > > > > Fourthly, to further workaround the argument type with 'const' flag, > > must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested > > by David Laight for old gcc versions. > (...) > > tools/include/nolibc/sys.h | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------- > > 1 file changed, 59 insertions(+), 15 deletions(-) > > Quite frankly, even if it can be looked at as a piece of art, I don't > like it. It's overkill for what we need and it brings in several tricky > macros that we don't need and that require a link to their analysis so > that nobody breaks them by accident. I mean, if one day we need them, > okay we know we can find them, they're perfect for certain use cases. > But all this just to avoid a ternary operation is far too much IMHO. > That's without counting on the compiler tricks to use the ugly > __auto_type when available, and the macro names which continue to > possibly interact with user code. > Agree, I don't like __auto_type too, although I have tried to find whether there is a direct macro for it, but NO such one, and the __auto_type in some older versions don't accept 'const' flag, so, I'm also worried about if gcc will change it in the future ;-( Seems __sysret() is mainly used by us in sys.h, perhaps we can simply assume and guarantee nobody will use 'const' in such cases. > And if you remember, originally you proposed to factor the SET_ERRNO() > stuff in every syscall in order to "simplify the code and improve > maintainability". It's clear that we've long abandonned that goal here. > If we had no other choice, I'd rather roll back to the clean, readable > and trustable SET_ERRNO() in every syscall! > Agree, or we simply use the original version without pointer returns support (only sbrk and mmap currently) but convert it to the macro version. Or, as the idea mentioned by Thomas in a reply: if we can let the sys_ functions use 'long' returns, or even further, we convert all of the sys_ functions to macros and let them preserve input types from library routines and preserve return types from the my_syscall<N> macros. As we discussed in my our syscall.h proposal, if there is a common my_syscall(), every sys_ function can be simply defined to something like: #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is not defined (we init such __NR_xxx to something like __NR_NOSYS): // sysnr.h // If worried about the applications use this macro, perhaps we can // use a different prefix, for example, NOLIBC_NR_xxx #define NOLIBC_NR_NOSYS (-1L) #ifndef __NR_xxx #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS #else #define NOLIBC_NR_xxx __NR_xxx #endif // syscall.h // _my_syscall is similar to syscall() in unistd.h, but without the // __sysret normalization #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__) #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__) #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__) #define my_syscall(name, ...) \ ({ \ long _ret; \ if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS) \ _ret = -ENOSYS; \ else \ _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \ _ret; \ }) // sys_<NAME> list, based on unistd.h #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) With above conversions, we may be able to predefine all of the sys_<NAME> functions to preserve the input types from library rountines and return types from my_syscall<N> (by default, 'long'). This also follows the suggestion from Arnd: let sys_ not use the other low level syscalls, only use its own. This may also help us to remove all of the `#ifdef __NR_` wrappers, we can directly check the -ENOSYS in the library routines and try another sys_<NAME> if required, at last, call __sysret() to normalize the errors and return value. Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h will work like this, without any #ifdef's: /* * int dup2(int old, int new); */ static __attribute__((unused)) int dup2(int old, int new) { int ret = sys_dup3(old, new, 0); if (ret == -ENOSYS) ret = sys_dup2(old, new); return __sysret(ret); } /* * int dup3(int old, int new, int flags); */ static __attribute__((unused)) int dup3(int old, int new, int flags) { return __sysret(sys_dup3(old, new, flags)); } If the above description is not clear enough, I have changed something for this idea with more cleanups and have done some simple tests: Compare with v5 --> v6 (with gcc-13.2.0): i386: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 19509 -> 19250 x86_64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 22012 -> 21758 arm64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 25868 -> 25804 arm: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 23112 -> 22828 mips: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 22924 -> 22740 ppc: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 26628 -> 26376 ppc64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 27204 -> 26756 ppc64le: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 27828 -> 27364 riscv: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 21870 -> 21772 s390: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 22192 -> 21992 And now we get: $ wc -l tools/include/nolibc/sys.h 746 tools/include/nolibc/sys.h $ wc -l tools/include/nolibc/sysnr.h 410 tools/include/nolibc/sysnr.h $ wc -l tools/include/nolibc/syscall.h 110 tools/include/nolibc/syscall.h Before: $ wc -l tools/include/nolibc/sys.h 1222 tools/include/nolibc/sys.h Most of the library routines become one line code. The main size reduced may be the carefully tuned sys_ selection, for example, linkat v.s. link, the patchset still require some cleanups, will send v6 for more discussion if you agree. > So I just restarted from what I proposed the other day, using a ternary > operator as I suggested in order to address the const case, and it gives > me the following patch, which is way simpler and still a bit readable. > It's made of two nested (?:) : > - the first one to determine if we have to check for the sign or > against -MAX_ERRNO to detect an error (depends on the arg's > signedness) > - the second one to return either the argument as-is or -1. > > The only two tricks are that (typeof(arg))-1 is compared to 1 instead of > zero so that gcc doesn't complain that we're comparing against a null > pointer, and similarly we compare arg+1 to 1 instead of arg to 0 for the > negative case, and that's all. It gives me the expected code and output > from gcc-4.7 to 12.3, and clang-13. > > I've checked against your version and it's always exactly the same (in > fact to be more precise sometimes it's 1-2 bytes smaller but that's only > due to the compiler taking liberties with the code ordering, it could as > well have done it the other way around, though it did not this time): > > 26144 zhangjin-v5/nolibc-test--Os-arm64 | 26144 willy/nolibc-test--Os-arm64 > 23340 zhangjin-v5/nolibc-test--Os-armv5 | 23340 willy/nolibc-test--Os-armv5 [...] > > Unless there's any objection, I'll queue this one. And if __sysret() > annoys us again in the future I might very well revert that simplification. > > Any question about the patch ? > [...] > > -/* Syscall return helper for library routines, set errno as -ret when ret is in > - * range of [-MAX_ERRNO, -1] > - * > - * Note, No official reference states the errno range here aligns with musl > - * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h) > +/* Syscall return helper: takes the syscall value in argument and checks for an > + * error in it. For unsigned returns, an error is within [-MAX_ERRNO, -1]. For > + * signed returns, an error is any value < 0. When an error is encountered, > + * -ret is set into errno and -1 is returned. Otherwise the returned value is > + * passed as-is with its type preserved. > */ > > -static __inline__ __attribute__((unused, always_inline)) > -long __sysret(unsigned long ret) > -{ > - if (ret >= (unsigned long)-MAX_ERRNO) { > - SET_ERRNO(-(long)ret); > - return -1; > - } > - return ret; > -} > +#define __sysret(arg) \ > +({ \ > + __typeof__(arg) __sysret_arg = (arg); \ Here ignores the 'const' flag in input type? > + ((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ? /* unsigned arg? */ \ > + (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) : /* errors */ \ > + (__sysret_arg + 1) < ((__typeof__(arg))1) /* signed: <0 = error */ \ > + ) ? ({ \ > + SET_ERRNO(-(intptr_t)__sysret_arg); \ > + ((__typeof__(arg)) -1); /* return -1 upon error */ \ > + }) : __sysret_arg; /* return original value & type on success */ \ > +}) > + > To be honest, it is also a little complex when with one "?:" embedded in another, I even don't understand how the 'unsigned arg' branch works, sorry, is it dark magic like the __is_constexpr? ;-) Thanks, Zhangjin > /* Functions in this file only describe syscalls. They're declared static so > * that the compiler usually decides to inline them while still being allowed > @@ -94,7 +97,7 @@ void *sbrk(intptr_t inc) > if (ret && sys_brk(ret + inc) == ret + inc) > return ret + inc; > > - return (void *)__sysret(-ENOMEM); > + return __sysret((void *)-ENOMEM); > } > > > @@ -682,7 +685,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd, > static __attribute__((unused)) > void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) > { > - return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset)); > + return __sysret(sys_mmap(addr, length, prot, flags, fd, offset)); > } > > static __attribute__((unused)) > -- > 2.35.3
Hi Zhangjin, On Thu, Aug 10, 2023 at 06:17:43AM +0800, Zhangjin Wu wrote: > > Quite frankly, even if it can be looked at as a piece of art, I don't > > like it. It's overkill for what we need and it brings in several tricky > > macros that we don't need and that require a link to their analysis so > > that nobody breaks them by accident. I mean, if one day we need them, > > okay we know we can find them, they're perfect for certain use cases. > > But all this just to avoid a ternary operation is far too much IMHO. > > That's without counting on the compiler tricks to use the ugly > > __auto_type when available, and the macro names which continue to > > possibly interact with user code. > > > > Agree, I don't like __auto_type too, although I have tried to find whether > there is a direct macro for it, but NO such one, and the __auto_type in some > older versions don't accept 'const' flag, so, I'm also worried about if gcc > will change it in the future ;-( I mean, it's just that we do not need it at all. > Seems __sysret() is mainly used by us in sys.h, Sure, it was added not long ago by you to factor all the calls to SET_ERRNO(): 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper") > perhaps we can simply > assume and guarantee nobody will use 'const' in such cases. There is absolutely *no* problem with const since the value is use by a "return" statement. > > And if you remember, originally you proposed to factor the SET_ERRNO() > > stuff in every syscall in order to "simplify the code and improve > > maintainability". It's clear that we've long abandonned that goal here. > > If we had no other choice, I'd rather roll back to the clean, readable > > and trustable SET_ERRNO() in every syscall! > > > > Agree, or we simply use the original version without pointer returns support > (only sbrk and mmap currently) but convert it to the macro version. I indeed think that's the cleanest approach. There will hardly be more than 2 syscalls returning pointers or unsigned values and all this extra complexity added just to avoid *two* SET_ERRNO() calls is totally pointless. > Or, as the idea mentioned by Thomas in a reply: if we can let the sys_ > functions use 'long' returns, or even further, we convert all of the sys_ > functions to macros and let them preserve input types from library routines and > preserve return types from the my_syscall<N> macros. It would be annoying because the sys_* implement some fallbacks, themselves based on #ifdef and such stuff. Macros are really a pain when they're repeated. They're a pain to edit, to debug, to modify and you'll see that editors are even not good with them, you often end up modifying more than you want to try to keep trailing backslashes aligned. > As we discussed in my our syscall.h proposal, if there is a common > my_syscall(), every sys_ function can be simply defined to something > like: > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is > not defined (we init such __NR_xxx to something like __NR_NOSYS): > > // sysnr.h > > // If worried about the applications use this macro, perhaps we can > // use a different prefix, for example, NOLIBC_NR_xxx > > #define NOLIBC_NR_NOSYS (-1L) > > #ifndef __NR_xxx > #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS > #else > #define NOLIBC_NR_xxx __NR_xxx > #endif > > // syscall.h > > // _my_syscall is similar to syscall() in unistd.h, but without the > // __sysret normalization > > #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__) > #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__) > #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__) > > #define my_syscall(name, ...) \ > ({ \ > long _ret; \ > if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS) \ > _ret = -ENOSYS; \ > else \ > _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \ > _ret; \ > }) > > // sys_<NAME> list, based on unistd.h > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > With above conversions, we may be able to predefine all of the > sys_<NAME> functions to preserve the input types from library rountines > and return types from my_syscall<N> (by default, 'long'). This also > follows the suggestion from Arnd: let sys_ not use the other low level > syscalls, only use its own. Maybe, but I'm not sure there is much to gain here, compared to the flexibility to map one to another (e.g. see sys_chmod()). > This may also help us to remove all of the `#ifdef __NR_` wrappers, we > can directly check the -ENOSYS in the library routines and try another > sys_<NAME> if required, at last, call __sysret() to normalize the errors > and return value. > > Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h > will work like this, without any #ifdef's: > > /* > * int dup2(int old, int new); > */ > > static __attribute__((unused)) > int dup2(int old, int new) > { > int ret = sys_dup3(old, new, 0); > > if (ret == -ENOSYS) > ret = sys_dup2(old, new); > > return __sysret(ret); > } But this will add a useless test after all such syscalls, we'd rather not do that! > > -static __inline__ __attribute__((unused, always_inline)) > > -long __sysret(unsigned long ret) > > -{ > > - if (ret >= (unsigned long)-MAX_ERRNO) { > > - SET_ERRNO(-(long)ret); > > - return -1; > > - } > > - return ret; > > -} > > +#define __sysret(arg) \ > > +({ \ > > + __typeof__(arg) __sysret_arg = (arg); \ > > Here ignores the 'const' flag in input type? Yes, as explained above, there's no issue with const. The issue that was met in the version I suggested in the message was that there was an assignment to the variable of value -1 to be returned, which is not permitted when it's const, and I said that it was not necessary, it was just a convenience, but that using "?:" does the job as well without having to do any assignment. > > + ((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ? /* unsigned arg? */ \ > > + (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) : /* errors */ \ > > + (__sysret_arg + 1) < ((__typeof__(arg))1) /* signed: <0 = error */ \ > > + ) ? ({ \ > > + SET_ERRNO(-(intptr_t)__sysret_arg); \ > > + ((__typeof__(arg)) -1); /* return -1 upon error */ \ > > + }) : __sysret_arg; /* return original value & type on success */ \ > > +}) > > + > > > > To be honest, it is also a little complex when with one "?:" embedded in > another, I even don't understand how the 'unsigned arg' branch works, > sorry, is it dark magic like the __is_constexpr? ;-) The thing is that we don't need to do anything specific for consts, we just need to check whether an argument is signed or unsigned. The test for unsigned is that all unsigned integers are positive, so ((unsigned)-1 > 0) is always true. We just compare it to 1 instead of 0 to shut up the compiler which was seeing a comparison against NULL. The rest is just checking if arg < 0 if arg is signed, or arg >= -MAX_ERRNO if it's unsigned, and if so, assigns its negation to errno and returns -1 otherwise returns it as-is. So it's not dark magic, doesn't rely on compiler's behavior and does not require links to external books explaining why the macro works in modern compilers. Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be honest, because we're there just because of the temptation to remove lines that were not causing any difficulties :-/ I think we can do something in-between and deal only with signed returns, and explicitly place the test for MAX_ERRNO on the two unsigned ones (brk and mmap). It should look approximately like this: #define __sysret(arg) \ ({ \ __typeof__(arg) __sysret_arg = (arg); \ (__sysret_arg < 0) ? ({ /* error ? */ \ SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ ((__typeof__(arg)) -1); /* return -1 */ \ }) : __sysret_arg; /* return original value */ \ }) Willy
Hi Zhangjin, > On Thu, Aug 10, 2023 at 06:17:43AM +0800, Zhangjin Wu wrote: > [...] > > > > And if you remember, originally you proposed to factor the SET_ERRNO() > > > stuff in every syscall in order to "simplify the code and improve > > > maintainability". It's clear that we've long abandonned that goal here. > > > If we had no other choice, I'd rather roll back to the clean, readable > > > and trustable SET_ERRNO() in every syscall! > > > > > > > Agree, or we simply use the original version without pointer returns support > > (only sbrk and mmap currently) but convert it to the macro version. > > I indeed think that's the cleanest approach. There will hardly be more > than 2 syscalls returning pointers or unsigned values and all this extra > complexity added just to avoid *two* SET_ERRNO() calls is totally > pointless. > Agree. > > Or, as the idea mentioned by Thomas in a reply: if we can let the sys_ > > functions use 'long' returns, or even further, we convert all of the sys_ > > functions to macros and let them preserve input types from library routines and > > preserve return types from the my_syscall<N> macros. > > It would be annoying because the sys_* implement some fallbacks, themselves > based on #ifdef and such stuff. Yeah, this is an issue we must solve, let's talk about it in the 'if (ret == -ENOSYS)' part later. > Macros are really a pain when they're > repeated. They're a pain to edit, to debug, to modify and you'll see that > editors are even not good with them, you often end up modifying more than > you want to try to keep trailing backslashes aligned. > Agree very much. > > As we discussed in my our syscall.h proposal, if there is a common > > my_syscall(), every sys_ function can be simply defined to something > > like: > > > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > > > In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is > > not defined (we init such __NR_xxx to something like __NR_NOSYS): > > > > // sysnr.h > > > > // If worried about the applications use this macro, perhaps we can > > // use a different prefix, for example, NOLIBC_NR_xxx > > > > #define NOLIBC_NR_NOSYS (-1L) > > > > #ifndef __NR_xxx > > #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS > > #else > > #define NOLIBC_NR_xxx __NR_xxx > > #endif > > > > // syscall.h > > > > // _my_syscall is similar to syscall() in unistd.h, but without the > > // __sysret normalization > > > > #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__) > > #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__) > > #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__) > > > > #define my_syscall(name, ...) \ > > ({ \ > > long _ret; \ > > if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS) \ > > _ret = -ENOSYS; \ > > else \ > > _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \ > > _ret; \ > > }) > > > > // sys_<NAME> list, based on unistd.h > > > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__) > > > > With above conversions, we may be able to predefine all of the > > sys_<NAME> functions to preserve the input types from library rountines > > and return types from my_syscall<N> (by default, 'long'). This also > > follows the suggestion from Arnd: let sys_ not use the other low level > > syscalls, only use its own. > > Maybe, but I'm not sure there is much to gain here, compared to the > flexibility to map one to another (e.g. see sys_chmod()). > But we can do the mapping in library routines too (still need to solve the #ifdef switch, let's talk later), and the sys_* macros will be purely a user-space name for the kernel-side syscall (of course, this is really the ideal status). Sometimes, the __NR_mmap means old_map, the __NR_select means old_select, and the __NR_clone means different backwards, we still need to let architecture to select what they really want and use some macros like '__ARCH_WANT_SYS_OLD_SELECT' to select the right kernel mmap implementation. From the sys.h side, we can assume every sys_* are just the one (with the same name) provided by kernel side. From the syscall.h side (**this new syscall.h completely differs from the old one with reorg about the my_syscall<N>, it only adds a syscall() like macros: my_syscall() and __sysdef()**), except the one by one mapping: // sysnr.h #define NOLIBC__NR_NOSYS (-1L) #ifndef __NR_brk #define NOLIBC__NR_brk NOLIBC__NR_NOSYS #else #define NOLIBC__NR_brk __NR_brk #endif ... #ifndef __NR_write #define NOLIBC__NR_write NOLIBC__NR_NOSYS #else #define NOLIBC__NR_write __NR_write #endif // syscall.h #define _my_syscall(N, ...) my_syscall##N(__VA_ARGS__) #define _my_syscall_n(N, ...) _my_syscall(N, __VA_ARGS__) #define my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__) /* syscall() is used from application with normalized error and return value */ #define syscall(...) __sysret(my_syscall(__VA_ARGS__)) /* __sysdef() is used to define sys_* macros with original return value */ #define __sysdef(name, ...) \ ((NOLIBC__NR_##name == NOLIBC__NR_NOSYS) ? (long)-ENOSYS : my_syscall(NOLIBC__NR_##name, ##__VA_ARGS__)) /* sys_* macros */ #define sys_brk(...) __sysdef(brk, __VA_ARGS__) #define sys_chdir(...) __sysdef(chdir, __VA_ARGS__) #define sys_chmod(...) __sysdef(chmod, __VA_ARGS__) #define sys_chown(...) __sysdef(chown, __VA_ARGS__) #define sys_chroot(...) __sysdef(chroot, __VA_ARGS__) #define sys_clone(...) __sysdef(clone, __VA_ARGS__) #define sys_close(...) __sysdef(close, __VA_ARGS__) #define sys_dup(...) __sysdef(dup, __VA_ARGS__) #define sys_dup2(...) __sysdef(dup2, __VA_ARGS__) #define sys_dup3(...) __sysdef(dup3, __VA_ARGS__) #define sys_execve(...) __sysdef(execve, __VA_ARGS__) ... #define sys_unlinkat(...) __sysdef(unlinkat, __VA_ARGS__) #define sys_wait4(...) __sysdef(wait4, __VA_ARGS__) #define sys_write(...) __sysdef(write, __VA_ARGS__) We also add exceptions for the ones like old_select, old_mmap, and backwards of clones, here use old_map as an example: /* sys_* exceptions */ /* Some architectures' mmap() is implemented as old_mmap() in kernel side, * let's correct them */ #ifdef __ARCH_WANT_SYS_OLD_MMAP #undef sys_mmap #define sys_old_mmap(...) __sysdef(mmap, __VA_ARGS__) static __attribute__((unused)) long sys_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) { struct mmap_arg_struct args = { .addr = (unsigned long)addr, .len = (unsigned long)length, .prot = prot, .flags = flags, .fd = fd, .offset = (unsigned long)offset }; return sys_old_mmap(&args); } #endif With this exception, s390 no long need to provide its own mmap definition, it (seems i386 too, but it uses mmap2 currently) can simply define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we are using for old_select. The same method applies to the selection of the different backward version of the sys_clone() syscall (from kernel/fork.c): /* * Note: Different archs have a different API of the clone() syscall, let's * normalize sys_clone() for all of them and allow select a backward version by * architecture. */ #ifdef __NR_clone #undef sys_clone #define __sys_clone(...) __sysdef(clone, __VA_ARGS__) static __attribute__((unused)) int sys_clone(unsigned long clone_flags, unsigned long newsp, int __attribute__((unused)) stack_size, int parent_tidptr, int child_tidptr, unsigned long tls) { long ret; #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr); #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2) ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls); #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3) ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls); #else ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls); #endif return ret; } #endif /* __NR_clone */ s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need to provide its own sys_fork() version, in the __NR_clone branch of fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right version of sys_clone() for s390). The one for old_select/select is more simple: /* * For historic reasons, the select() syscall on arm, powerpc and i386 is * old_select in kernel side, they all provide _newselect() syscall as the new * version (arm and powerpc from v2.6.27, i386 from v2.6.28) like the select() * syscall on the other architectures. * * To align with them, here defines a new NOLIBC__NR_newselect and map it to * __NR__newselect or __NR_select accordingly. * * Note, since the oldest stable branch is v4.14, it is fair to no long support * the versions older than v2.6.27. */ #ifndef NOLIBC__NR_newselect #ifndef __NR__newselect #ifndef __NR_select #define NOLIBC__NR_newselect NOLIBC__NR_NOSYS #else /* __NR_select */ #define NOLIBC__NR_newselect __NR_select #endif #else /* __NR__newselect */ #define NOLIBC__NR_newselect __NR__newselect #endif /* __NR__newselect */ #endif /* ! NOLIBC__NR_newselect */ /* * sys_newselect is not used by any architecture currently, use it as the new * version of select, it is mapped to sys__newselect or sys_select by the above * definition of NOLIBC__NR_newselect */ #define sys_newselect(...) __sysdef(newselect, __VA_ARGS__) We only have these three exceptions currently, with this normalization, the library routines from sys.h can directly think sys_* macros are generic, if not, let syscall.h take care of the right exceptions. > > This may also help us to remove all of the `#ifdef __NR_` wrappers, we > > can directly check the -ENOSYS in the library routines and try another > > sys_<NAME> if required, at last, call __sysret() to normalize the errors > > and return value. > > > > Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h > > will work like this, without any #ifdef's: > > > > /* > > * int dup2(int old, int new); > > */ > > > > static __attribute__((unused)) > > int dup2(int old, int new) > > { > > int ret = sys_dup3(old, new, 0); > > > > if (ret == -ENOSYS) > > ret = sys_dup2(old, new); > > > > return __sysret(ret); > > } > > But this will add a useless test after all such syscalls, we'd rather > not do that! > Indeed, I found this issue too, when __NR_dup3 not defined, it returns -ENOSYS, than, no size issue, otherwise, the compiler will not be able to learn what the ret of sys_dup3() will be, so, it can not optimize the second call to sys_dup2(). So, the '#ifdef' logic must be used like we did in sys_* functions, but it is really not that meaningful (no big gain as you mentioned above) if we only move them from the sys_* functions to the library routines. At last, I found the ternary operation together with the initialization of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last, we get something like this: /* __systry2() is used to select one of two provided low level syscalls */ #define __systry2(a, sys_a, sys_b) \ ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b)) It can eliminate all of the '#ifdef' stuffs, using the chmod example you mentioned above, it becomes something like this: /* * int chmod(const char *path, mode_t mode); */ static __attribute__((unused)) int chmod(const char *path, mode_t mode) { return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0))); } Purely clean and clear. Even with the complex select, mmap and fork library routines, we get the same result (we have moved the __ARCH_WANT_SYS_* to syscall.h to normalize sys_* there): /* * int select(int nfds, fd_set *read_fds, fd_set *write_fds, * fd_set *except_fds, struct timeval *timeout); */ static __attribute__((unused)) int select(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds, struct timeval *timeout) { struct timespec t; if (timeout) { t.tv_sec = timeout->tv_sec; t.tv_nsec = timeout->tv_usec * 1000; } return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout), sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL))); } /* * void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset); * int munmap(void *addr, size_t length); */ /* Note that on Linux, MAP_FAILED is -1 so we can use the generic __sysret() * which returns -1 upon error and still satisfy user land that checks for * MAP_FAILED. */ static __attribute__((unused)) void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) { return (void *)__sysret(__systry2(mmap2, sys_mmap2(addr, length, prot, flags, fd, offset >> 12), sys_mmap(addr, length, prot, flags, fd, offset))); } /* * pid_t fork(void); */ static __attribute__((unused)) pid_t fork(void) { return __sysret(__systry2(fork, sys_fork(), sys_clone(SIGCHLD, 0, 0, 0, 0, 0))); } Currently, except the select() library routine has used 3 sys_* before, the other library routines have only used 2 sys_*, so, __systry2() is enough to select one of two. If we really want to use the third one, based on __systry2(), it is very easy to add __systry3() and even __systry4(): #define __systry3(a, b, sys_a, sys_b, sys_c) \ __systry2(a, (sys_a), __systry2(b, (sys_b), (sys_c))) #define __systry4(a, b, c, sys_a, sys_b, sys_c, sys_d) \ __systry3(a, b, (sys_a), (sys_b), __systry2(c, (sys_c), (sys_d))) Perhaps the coming time64 ones may need __systry3(), not tested yet. > > > -static __inline__ __attribute__((unused, always_inline)) > > > -long __sysret(unsigned long ret) > > > -{ > > > - if (ret >= (unsigned long)-MAX_ERRNO) { > > > - SET_ERRNO(-(long)ret); > > > - return -1; > > > - } > > > - return ret; > > > -} > > > +#define __sysret(arg) \ > > > +({ \ > > > + __typeof__(arg) __sysret_arg = (arg); \ > > > > Here ignores the 'const' flag in input type? > > Yes, as explained above, there's no issue with const. The issue > that was met in the version I suggested in the message was that > there was an assignment to the variable of value -1 to be returned, > which is not permitted when it's const, and I said that it was not > necessary, it was just a convenience, but that using "?:" does the > job as well without having to do any assignment. Sorry, I have mixed the 'assignment' in definition (no problem here) and the 'assignment' in late change (__sysret_arg = -1). The ternary operation here used is really a great idea ;-) > > > > + ((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ? /* unsigned arg? */ \ > > > + (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) : /* errors */ \ > > > + (__sysret_arg + 1) < ((__typeof__(arg))1) /* signed: <0 = error */ \ > > > + ) ? ({ \ > > > + SET_ERRNO(-(intptr_t)__sysret_arg); \ > > > + ((__typeof__(arg)) -1); /* return -1 upon error */ \ > > > + }) : __sysret_arg; /* return original value & type on success */ \ > > > +}) > > > + > > > > > > > To be honest, it is also a little complex when with one "?:" embedded in > > another, I even don't understand how the 'unsigned arg' branch works, > > sorry, is it dark magic like the __is_constexpr? ;-) > > The thing is that we don't need to do anything specific for consts, we > just need to check whether an argument is signed or unsigned. The test > for unsigned is that all unsigned integers are positive, so > ((unsigned)-1 > 0) is always true. We just compare it to 1 instead of > 0 to shut up the compiler which was seeing a comparison against NULL. > > The rest is just checking if arg < 0 if arg is signed, or > arg >= -MAX_ERRNO if it's unsigned, and if so, assigns its negation to > errno and returns -1 otherwise returns it as-is. So it's not dark magic, > doesn't rely on compiler's behavior and does not require links to external > books explaining why the macro works in modern compilers. Yeah, I have mixed the outside '?:' with the inside '?:', now get it, thanks a lot! ({ \ ( \ (((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ? /* unsigned arg? */ \ (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) : /* errors */ \ (__sysret_arg + 1) < ((__typeof__(arg))1) /* signed: <0 = error */ \ ) ? ({ \ SET_ERRNO(-(intptr_t)__sysret_arg); \ ((__typeof__(arg)) -1); /* return -1 upon error */ \ }) \ : __sysret_arg; /* return original value & type on success */ \ }) This looks better although you have prepared a pretty one below ;-) > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > honest, because we're there just because of the temptation to remove > lines that were not causing any difficulties :-/ > > I think we can do something in-between and deal only with signed returns, > and explicitly place the test for MAX_ERRNO on the two unsigned ones > (brk and mmap). It should look approximately like this: > > #define __sysret(arg) \ > ({ \ > __typeof__(arg) __sysret_arg = (arg); \ > (__sysret_arg < 0) ? ({ /* error ? */ \ > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > ((__typeof__(arg)) -1); /* return -1 */ \ > }) : __sysret_arg; /* return original value */ \ > }) > I like this one very much, a simple test shows, it saves one more byte. Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in above two lines of __sysret() too, I have changed them to whitespaces. For the brk and mmap part, it is ok to restore them as before, and it also works with the new patchset we proposed above, only need such tuning: #define sys_brk(...) __sysdef(brk, __VA_ARGS__) #define sys_mmap(...) __sysdef(mmap, __VA_ARGS__) #define sys_mmap2(...) __sysdef(mmap2, __VA_ARGS__) --> #define sys_brk(...) (void *)__sysdef(brk, __VA_ARGS__) #define sys_mmap(...) (void *)__sysdef(mmap, __VA_ARGS__) #define sys_mmap2(...) (void *)__sysdef(mmap2, __VA_ARGS__) But let them align with the others may be better, so, most of the sys_* macros can be simply mapped with a simple line (all of them are generated automatically), without the care of the return types changing. So, Willy, as a summary: - one solution is your new __sysret() + restore the original SET_ERRNO for mmap and brk [1]. - another solution is your new __sysret() + my patch [2] to let mmap and brk return 'long' as the other sys_* function does. Both of them are ok for me, but If we can apply the second one, the proposed patchset above may be cleaner. The patch [2] is really a prepare patch for the above proposed patchset, that is why I send it before that patchset. It is time to finish the size inflate regression issue, after you apply any of the above solutions, I will resell my proposed patchset above, at least a RFC patchset, please ignore this currently ;-) [1]: https://lore.kernel.org/lkml/20230813090037.GE8237@1wt.eu/#t [2]: https://lore.kernel.org/lkml/82b584cbda5cee8d5318986644a2a64ba749a098.1691788036.git.falcon@tinylab.org/ Best regards, Zhangjin > Willy
On Sun, Aug 13, 2023 at 09:26:20PM +0800, Zhangjin Wu wrote: > > > With above conversions, we may be able to predefine all of the > > > sys_<NAME> functions to preserve the input types from library rountines > > > and return types from my_syscall<N> (by default, 'long'). This also > > > follows the suggestion from Arnd: let sys_ not use the other low level > > > syscalls, only use its own. > > > > Maybe, but I'm not sure there is much to gain here, compared to the > > flexibility to map one to another (e.g. see sys_chmod()). > > > > But we can do the mapping in library routines too (still need to solve > the #ifdef switch, let's talk later), and the sys_* macros will be purely a > user-space name for the kernel-side syscall (of course, this is really > the ideal status). "we can", sure, but should we ? > Sometimes, the __NR_mmap means old_map, the __NR_select means > old_select, and the __NR_clone means different backwards, we still need > to let architecture to select what they really want and use some macros > like '__ARCH_WANT_SYS_OLD_SELECT' to select the right kernel mmap > implementation. Yes but that's already what we have. I mean, we haven't invented these macros, they're already used in the kernel itself to indicate what variant of a syscall an arch depends on. The __NR_ thing doesn't imply anything, at best its absence implies the syscall doesn't exist in this form. > >From the sys.h side, we can assume every sys_* are just the one (with > the same name) provided by kernel side. > > >From the syscall.h side (**this new syscall.h completely differs from > the old one with reorg about the my_syscall<N>, it only adds a syscall() > like macros: my_syscall() and __sysdef()**), except the one by one > mapping: > > // sysnr.h > #define NOLIBC__NR_NOSYS (-1L) > > #ifndef __NR_brk > #define NOLIBC__NR_brk NOLIBC__NR_NOSYS > #else > #define NOLIBC__NR_brk __NR_brk > #endif > > ... > > #ifndef __NR_write > #define NOLIBC__NR_write NOLIBC__NR_NOSYS > #else > #define NOLIBC__NR_write __NR_write > #endif I really don't like the idea of having to add code to redefine everything that already exists for the purpose of removing code later. There might be other solutions to this. If we had to go through this, at least I would like to be sure that we only use the original __NR_xxx so that nobody has to go through the pain of redefining them like this. Because what's written above is work for a computer, not a human. > ... > #define sys_unlinkat(...) __sysdef(unlinkat, __VA_ARGS__) > #define sys_wait4(...) __sysdef(wait4, __VA_ARGS__) > #define sys_write(...) __sysdef(write, __VA_ARGS__) > > We also add exceptions for the ones like old_select, old_mmap, and > backwards of clones, here use old_map as an example: > > /* sys_* exceptions */ > > /* Some architectures' mmap() is implemented as old_mmap() in kernel side, > * let's correct them > */ > #ifdef __ARCH_WANT_SYS_OLD_MMAP > #undef sys_mmap This one as well should be avoided. Undefining something already defined is very brittle as it prevents any further code reorganization, and is often confusing for those trying to follow the code. > #define sys_old_mmap(...) __sysdef(mmap, __VA_ARGS__) > static __attribute__((unused)) > long sys_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) > { > struct mmap_arg_struct args = { > .addr = (unsigned long)addr, > .len = (unsigned long)length, > .prot = prot, > .flags = flags, > .fd = fd, > .offset = (unsigned long)offset > }; > return sys_old_mmap(&args); > } > #endif So that's the perfect example of adding unneeded obfuscation. From what I understand it does nothing but: long sys_mmap(...) { struct ... args = { ... }; return my_syscall(&args); } Right ? If so better remove the unneeded #define. > With this exception, s390 no long need to provide its own mmap > definition, it (seems i386 too, but it uses mmap2 currently) can simply > define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we > are using for old_select. > > The same method applies to the selection of the different backward > version of the sys_clone() syscall (from kernel/fork.c): (...) > #ifdef __NR_clone > #undef sys_clone > #define __sys_clone(...) __sysdef(clone, __VA_ARGS__) > > static __attribute__((unused)) > int sys_clone(unsigned long clone_flags, unsigned long newsp, > int __attribute__((unused)) stack_size, > int parent_tidptr, int child_tidptr, unsigned long tls) > { > long ret; > #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS > ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr); > #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2) > ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls); > #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3) > ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls); > #else > ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls); > #endif > return ret; > } > #endif /* __NR_clone */ > > s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need > to provide its own sys_fork() version, in the __NR_clone branch of > fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right > version of sys_clone() for s390). Maybe but with much less #define indirections it would be significantly better. (...) > We only have these three exceptions currently, with this normalization, > the library routines from sys.h can directly think sys_* macros are > generic, if not, let syscall.h take care of the right exceptions. I see the point. But that doesn't remove the need to write the exported function itself. I'm not saying there's nothing to save here, I see your point, I'm just wondering if we really gain something in terms of ease of declaring new syscalls especially for first-time contributors and if we're not losing in maintenance. If at least it's easy enough to implement exceptions, maybe it could be worth further investigating. > > > static __attribute__((unused)) > > > int dup2(int old, int new) > > > { > > > int ret = sys_dup3(old, new, 0); > > > > > > if (ret == -ENOSYS) > > > ret = sys_dup2(old, new); > > > > > > return __sysret(ret); > > > } > > > > But this will add a useless test after all such syscalls, we'd rather > > not do that! > > > > Indeed, I found this issue too, when __NR_dup3 not defined, it returns > -ENOSYS, than, no size issue, otherwise, the compiler will not be able > to learn what the ret of sys_dup3() will be, so, it can not optimize the > second call to sys_dup2(). > > So, the '#ifdef' logic must be used like we did in sys_* functions, but > it is really not that meaningful (no big gain as you mentioned above) if > we only move them from the sys_* functions to the library routines. > > At last, I found the ternary operation together with the initialization > of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last, > we get something like this: > > /* __systry2() is used to select one of two provided low level syscalls */ > #define __systry2(a, sys_a, sys_b) \ > ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b)) But this supposes that all of them are manually defined as you did above. I'd rather implement an ugly is_numeric() macro based on argument resolution. I've done it once in another project, I don't remember precisely where it is but I vaguely remember that it used to check that the string resolution of the argument gave a letter (when it does not exist) or a digit (when it does). I can look into that later if needed. But please avoid extra macro definitions as much as possible, they're a real pain to handle in the code. There's no error when one is missing or has a typo, it's difficult to follow them and they don't appear in the debugger. > It can eliminate all of the '#ifdef' stuffs, using the chmod example you > mentioned above, it becomes something like this: > > /* > * int chmod(const char *path, mode_t mode); > */ > > static __attribute__((unused)) > int chmod(const char *path, mode_t mode) > { > return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0))); > } > > Purely clean and clear. That's a matter of taste and it may explain why we often disagree. For me it's horrid. If I'm the one implementing chmod for my platform and it does not work, what should I do when facing that, except want to cry ? Think that right now we have this: static __attribute__((unused)) int sys_chmod(const char *path, mode_t mode) { #ifdef __NR_fchmodat return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0); #elif defined(__NR_chmod) return my_syscall2(__NR_chmod, path, mode); #else return -ENOSYS; #endif } Sure it can be called not pretty, but I think it has the merit of being totally explicit, and whoever sees chmod() fail can quickly check based on the test in what situation they're supposed to be and what to check. One thing I'm worried about also regarding using my_syscall() without the number is that it's easy to miss an argument and have random values taken from registers (or the stack) passed as argument. For example above we can see that the flags part is 0 in fchmodat(). It's easy to miss themn and while the syscall4() will complain, syscall() will silently turn that into syscall3(). That's not necessarily a big deal, but we've seen during the development that it's easy to make mistakes and they're not trivial to spot. So again I'm really wondering about the benefits in such a case. This is well illustrated in your example below: > return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout), > sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL))); How many attempts to get it right ? Just skip one NULL and you don't see it. > static __attribute__((unused)) > pid_t fork(void) > { > return __sysret(__systry2(fork, sys_fork(), sys_clone(SIGCHLD, 0, 0, 0, 0, 0))); > } I'd rather really write more explicit code. > > > > -static __inline__ __attribute__((unused, always_inline)) > > > > -long __sysret(unsigned long ret) > > > > -{ > > > > - if (ret >= (unsigned long)-MAX_ERRNO) { > > > > - SET_ERRNO(-(long)ret); > > > > - return -1; > > > > - } > > > > - return ret; > > > > -} > > > > +#define __sysret(arg) \ > > > > +({ \ > > > > + __typeof__(arg) __sysret_arg = (arg); \ > > > > > > Here ignores the 'const' flag in input type? > > > > Yes, as explained above, there's no issue with const. The issue > > that was met in the version I suggested in the message was that > > there was an assignment to the variable of value -1 to be returned, > > which is not permitted when it's const, and I said that it was not > > necessary, it was just a convenience, but that using "?:" does the > > job as well without having to do any assignment. > > Sorry, I have mixed the 'assignment' in definition (no problem here) and > the 'assignment' in late change (__sysret_arg = -1). The ternary > operation here used is really a great idea ;-) But please do not call that an "idea", it's just one of many language constructs, and precisely the one designed exactly for this. Use the right tool for the job! > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > honest, because we're there just because of the temptation to remove > > lines that were not causing any difficulties :-/ > > > > I think we can do something in-between and deal only with signed returns, > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > (brk and mmap). It should look approximately like this: > > > > #define __sysret(arg) \ > > ({ \ > > __typeof__(arg) __sysret_arg = (arg); \ > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > ((__typeof__(arg)) -1); /* return -1 */ \ > > }) : __sysret_arg; /* return original value */ \ > > }) > > > > I like this one very much, a simple test shows, it saves one more byte. I'm going to do that and revert the 3 affected syscalls. > Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in > above two lines of __sysret() too, I have changed them to whitespaces. Just because I'm not good at writing illustrative code directly in e-mail, which was warned against via "approximately like this". I do not even expect it to compile since I'm not sure it has the right number of braces and parenthesis but indent gives the idea. > For the brk and mmap part, it is ok to restore them as before, and it > also works with the new patchset we proposed above, only need such > tuning: > > #define sys_brk(...) __sysdef(brk, __VA_ARGS__) > #define sys_mmap(...) __sysdef(mmap, __VA_ARGS__) > #define sys_mmap2(...) __sysdef(mmap2, __VA_ARGS__) > > --> > > #define sys_brk(...) (void *)__sysdef(brk, __VA_ARGS__) > #define sys_mmap(...) (void *)__sysdef(mmap, __VA_ARGS__) > #define sys_mmap2(...) (void *)__sysdef(mmap2, __VA_ARGS__) I don't care what it implies for now. We're talking about fixing some breakage, you'll have plenty of time later to see how to adapt your future proposals to what is committed. We DO NOT adapt fixes to what's expected to come later. > But let them align with the others may be better, so, most of the sys_* > macros can be simply mapped with a simple line (all of them are > generated automatically), without the care of the return types changing. > > So, Willy, as a summary: > > - one solution is your new __sysret() + restore the original SET_ERRNO > for mmap and brk [1]. > > - another solution is your new __sysret() + my patch [2] to let mmap and brk > return 'long' as the other sys_* function does. No, because it will completely break them when they'll need to access the second half of the memory, as I already explained somewhere else in one of these numerous discussions. > Both of them are ok for me, but If we can apply the second one, the > proposed patchset above may be cleaner. The patch [2] is really a > prepare patch for the above proposed patchset, that is why I send it > before that patchset. > > It is time to finish the size inflate regression issue, after you apply > any of the above solutions, I'll do, because this has lasted far too long and gone way too far for what was supposed to be a trivial fix. > I will resell my proposed patchset above, at > least a RFC patchset, please ignore this currently ;-) Please allow me to breathe a little bit. Really I mean, I'm already worn by having to constantly review breaking changes that either introduce bugs or break maintainability, and to have to justify myself for things that I thought would be obvious to anyone. Massive changes are extremely time consuming to review, and trying to figure subtle breakage in such low-level stuff is even harder. I simply can't assign more time to this, particularly for the expected gains which for me or often perceived as losses of maintainability instead :-/ Thanks, Willy
Hi, Willy > On Sun, Aug 13, 2023 at 09:26:20PM +0800, Zhangjin Wu wrote: > [...] > > With this exception, s390 no long need to provide its own mmap > > definition, it (seems i386 too, but it uses mmap2 currently) can simply > > define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we > > are using for old_select. > > > > The same method applies to the selection of the different backward > > version of the sys_clone() syscall (from kernel/fork.c): > (...) > > > #ifdef __NR_clone > > #undef sys_clone > > #define __sys_clone(...) __sysdef(clone, __VA_ARGS__) > > > > static __attribute__((unused)) > > int sys_clone(unsigned long clone_flags, unsigned long newsp, > > int __attribute__((unused)) stack_size, > > int parent_tidptr, int child_tidptr, unsigned long tls) > > { > > long ret; > > #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS > > ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr); > > #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2) > > ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls); > > #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3) > > ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls); > > #else > > ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls); > > #endif > > return ret; > > } > > #endif /* __NR_clone */ > > > > s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need > > to provide its own sys_fork() version, in the __NR_clone branch of > > fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right > > version of sys_clone() for s390). > > Maybe but with much less #define indirections it would be significantly > better. > > (...) > > We only have these three exceptions currently, with this normalization, > > the library routines from sys.h can directly think sys_* macros are > > generic, if not, let syscall.h take care of the right exceptions. > > I see the point. But that doesn't remove the need to write the exported > function itself. I'm not saying there's nothing to save here, I see your > point, I'm just wondering if we really gain something in terms of ease > of declaring new syscalls especially for first-time contributors and if > we're not losing in maintenance. If at least it's easy enough to implement > exceptions, maybe it could be worth further investigating. > I will delay the whole work about __sysdef(), but work on some more generic parts (like the exceptions above) at first. > > > > static __attribute__((unused)) > > > > int dup2(int old, int new) > > > > { > > > > int ret = sys_dup3(old, new, 0); > > > > > > > > if (ret == -ENOSYS) > > > > ret = sys_dup2(old, new); > > > > > > > > return __sysret(ret); > > > > } > > > > > > But this will add a useless test after all such syscalls, we'd rather > > > not do that! > > > > > > > Indeed, I found this issue too, when __NR_dup3 not defined, it returns > > -ENOSYS, than, no size issue, otherwise, the compiler will not be able > > to learn what the ret of sys_dup3() will be, so, it can not optimize the > > second call to sys_dup2(). > > > > So, the '#ifdef' logic must be used like we did in sys_* functions, but > > it is really not that meaningful (no big gain as you mentioned above) if > > we only move them from the sys_* functions to the library routines. > > > > At last, I found the ternary operation together with the initialization > > of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last, > > we get something like this: > > > > /* __systry2() is used to select one of two provided low level syscalls */ > > #define __systry2(a, sys_a, sys_b) \ > > ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b)) > > But this supposes that all of them are manually defined as you did above. > I'd rather implement an ugly is_numeric() macro based on argument > resolution. I've done it once in another project, I don't remember > precisely where it is but I vaguely remember that it used to check > that the string resolution of the argument gave a letter (when it > does not exist) or a digit (when it does). I can look into that later > if needed. But please avoid extra macro definitions as much as possible, > they're a real pain to handle in the code. There's no error when one is > missing or has a typo, it's difficult to follow them and they don't > appear in the debugger. > Yeah, your reply inspired me to look into the IS_ENABLED() from ../include/linux/kconfig.h macro again, there was a __is_defined() there, let's throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m before, but it does return 0 when the macro is not defined, it uses the same trick in syscall() to calculate the number of arguments, if the macro is not defined, then, 0 "argument". > > It can eliminate all of the '#ifdef' stuffs, using the chmod example you > > mentioned above, it becomes something like this: > > > > /* > > * int chmod(const char *path, mode_t mode); > > */ > > > > static __attribute__((unused)) > > int chmod(const char *path, mode_t mode) > > { > > return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0))); > > } > > > > Purely clean and clear. > > That's a matter of taste and it may explain why we often disagree. For me > it's horrid. If I'm the one implementing chmod for my platform and it does > not work, what should I do when facing that, except want to cry ? Think > that right now we have this: > > static __attribute__((unused)) > int sys_chmod(const char *path, mode_t mode) > { > #ifdef __NR_fchmodat > return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0); > #elif defined(__NR_chmod) > return my_syscall2(__NR_chmod, path, mode); > #else > return -ENOSYS; > #endif > } > > Sure it can be called not pretty, but I think it has the merit of being > totally explicit, and whoever sees chmod() fail can quickly check based > on the test in what situation they're supposed to be and what to check. > > One thing I'm worried about also regarding using my_syscall() without the > number is that it's easy to miss an argument and have random values taken > from registers (or the stack) passed as argument. For example above we can > see that the flags part is 0 in fchmodat(). It's easy to miss themn and > while the syscall4() will complain, syscall() will silently turn that > into syscall3(). That's not necessarily a big deal, but we've seen during > the development that it's easy to make mistakes and they're not trivial > to spot. So again I'm really wondering about the benefits in such a case. > > This is well illustrated in your example below: > > > return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout), > > sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL))); > > How many attempts to get it right ? Just skip one NULL and you don't > see it. Yeah, seems we have missed the last 0 in ppoll() before and the test may not report about it either. [...] > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > honest, because we're there just because of the temptation to remove > > > lines that were not causing any difficulties :-/ > > > > > > I think we can do something in-between and deal only with signed returns, > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > (brk and mmap). It should look approximately like this: > > > > > > #define __sysret(arg) \ > > > ({ \ > > > __typeof__(arg) __sysret_arg = (arg); \ > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > > }) : __sysret_arg; /* return original value */ \ > > > }) > > > > > > > I like this one very much, a simple test shows, it saves one more byte. > > I'm going to do that and revert the 3 affected syscalls. > Ok. > > Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in > > above two lines of __sysret() too, I have changed them to whitespaces. > [...] > > > But let them align with the others may be better, so, most of the sys_* > > macros can be simply mapped with a simple line (all of them are > > generated automatically), without the care of the return types changing. > > > > So, Willy, as a summary: > > > > - one solution is your new __sysret() + restore the original SET_ERRNO > > for mmap and brk [1]. > > > > - another solution is your new __sysret() + my patch [2] to let mmap and brk > > return 'long' as the other sys_* function does. > > No, because it will completely break them when they'll need to access the > second half of the memory, as I already explained somewhere else in one > of these numerous discussions. > Sorry, will recheck this part later, please ignore it. [...] > > > I will resell my proposed patchset above, at > > least a RFC patchset, please ignore this currently ;-) > > Please allow me to breathe a little bit. Really I mean, I'm already worn > by having to constantly review breaking changes that either introduce > bugs or break maintainability, and to have to justify myself for things > that I thought would be obvious to anyone. Massive changes are extremely > time consuming to review, and trying to figure subtle breakage in such > low-level stuff is even harder. I simply can't assign more time to this, > particularly for the expected gains which for me or often perceived as > losses of maintainability instead :-/ > Take a rest, I will delay the whole __sysdef() stuff and continue to work on the last tinyconfig patchset after v6.5, it is the last one before our early rv32 work ;-) Thanks, Zhangjin > Thanks, > Willy
From: Zhangjin Wu > Sent: 14 August 2023 11:42 ... > [...] > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > honest, because we're there just because of the temptation to remove > > > > lines that were not causing any difficulties :-/ > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > (brk and mmap). It should look approximately like this: > > > > > > > > #define __sysret(arg) \ > > > > ({ \ > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > ((__typeof__(arg)) -1); /* return -1 */ \ I'm pretty sure you don't need the explicit cast. (It would be needed for a pointer type.) Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg Thinking, maybe it should be: #define __sysret(syscall_fn_args) ({ __typeof__(syscall_fn_args) __rval = syscall_fn_args; __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; }) Since, IIRC, the usage is return __sysret(sycall_fn(args)); I'm not sure how public SET_ERRO() is. But it could include the negate have the value of -1 cast to its argument type? I think: error = -(int)(long)(arg + 0u); will avoid any sign extension - the (int) might not even be needed. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi David, On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote: > From: Zhangjin Wu > > Sent: 14 August 2023 11:42 > ... > > [...] > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > > honest, because we're there just because of the temptation to remove > > > > > lines that were not causing any difficulties :-/ > > > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > > (brk and mmap). It should look approximately like this: > > > > > > > > > > #define __sysret(arg) \ > > > > > ({ \ > > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > I'm pretty sure you don't need the explicit cast. > (It would be needed for a pointer type.) > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg > > Thinking, maybe it should be: > > #define __sysret(syscall_fn_args) > ({ > __typeof__(syscall_fn_args) __rval = syscall_fn_args; > __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; > }) Yeah almost, since arg is necessarily signed in this version, it's just that I manually edited the previous macro in the mail and limited the amount of changes to what was necessary. It's just that SET_ERRNO only is an instruction, not an expression: #define SET_ERRNO(v) do { errno = (v); } while (0) Thus the return value doesn't even pass through it. That's why it was so much simpler before. The rationale behind this was to bring the ability to completely drop errno for programs where you didn't care about it. It's particularly interesting when you don't need any other data either as the program gets strunk from a complete section. > Since, IIRC, the usage is return __sysret(sycall_fn(args)); > I'm not sure how public SET_ERRO() is. For now it is entirely, though it's not supposed to. Thomas and I have been discussing about renaming some internal-use macros and functions to avoid needlessly exposing them by accident to the application. These one definitely qualifies. > But it could include the negate have the value of -1 cast to its argument type? > I think: > error = -(int)(long)(arg + 0u); > will avoid any sign extension - the (int) might not even be needed. So with a signed (int/long) input and errno as int, I don't think we can have any case where there's any such extension anyway. In any case we're either copying the int as-is or truncating it. Regards, Willy
From: Willy Tarreau > Sent: 14 August 2023 13:10 > > Hi David, > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote: > > From: Zhangjin Wu > > > Sent: 14 August 2023 11:42 > > ... > > > [...] > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > > > honest, because we're there just because of the temptation to remove > > > > > > lines that were not causing any difficulties :-/ > > > > > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > > > (brk and mmap). It should look approximately like this: > > > > > > > > > > > > #define __sysret(arg) \ > > > > > > ({ \ > > > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > > > I'm pretty sure you don't need the explicit cast. > > (It would be needed for a pointer type.) > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg > > > > Thinking, maybe it should be: > > > > #define __sysret(syscall_fn_args) > > ({ > > __typeof__(syscall_fn_args) __rval = syscall_fn_args; > > __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; > > }) > > Yeah almost, since arg is necessarily signed in this version, it's > just that I manually edited the previous macro in the mail and limited > the amount of changes to what was necessary. It's just that SET_ERRNO > only is an instruction, not an expression: > > #define SET_ERRNO(v) do { errno = (v); } while (0) > > Thus the return value doesn't even pass through it. That's why it was > so much simpler before. The rationale behind this was to bring the > ability to completely drop errno for programs where you didn't care > about it. It's particularly interesting when you don't need any other > data either as the program gets strunk from a complete section. Actually something like: #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1) seems to work and allows the errno assignment be removed. Also works for pointer types (after a different compare). A quick check with godbolt doesn't show any sign extensions happening. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote: > From: Willy Tarreau > > Sent: 14 August 2023 13:10 > > > > Hi David, > > > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote: > > > From: Zhangjin Wu > > > > Sent: 14 August 2023 11:42 > > > ... > > > > [...] > > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > > > > honest, because we're there just because of the temptation to remove > > > > > > > lines that were not causing any difficulties :-/ > > > > > > > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > > > > (brk and mmap). It should look approximately like this: > > > > > > > > > > > > > > #define __sysret(arg) \ > > > > > > > ({ \ > > > > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > > > > > I'm pretty sure you don't need the explicit cast. > > > (It would be needed for a pointer type.) > > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg > > > > > > Thinking, maybe it should be: > > > > > > #define __sysret(syscall_fn_args) > > > ({ > > > __typeof__(syscall_fn_args) __rval = syscall_fn_args; > > > __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; > > > }) > > > > Yeah almost, since arg is necessarily signed in this version, it's > > just that I manually edited the previous macro in the mail and limited > > the amount of changes to what was necessary. It's just that SET_ERRNO > > only is an instruction, not an expression: > > > > #define SET_ERRNO(v) do { errno = (v); } while (0) > > > > Thus the return value doesn't even pass through it. That's why it was > > so much simpler before. The rationale behind this was to bring the > > ability to completely drop errno for programs where you didn't care > > about it. It's particularly interesting when you don't need any other > > data either as the program gets strunk from a complete section. > > Actually something like: > > #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1) > > seems to work and allows the errno assignment be removed. > Also works for pointer types (after a different compare). Yes, that's something we can do (with the parenthesis around __typeof__(v) though). > A quick check with godbolt doesn't show any sign extensions happening. I agree there's none here. Willy
> On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote: > > From: Willy Tarreau > > > Sent: 14 August 2023 13:10 > > > > > > Hi David, > > > > > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote: > > > > From: Zhangjin Wu > > > > > Sent: 14 August 2023 11:42 > > > > ... > > > > > [...] > > > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > > > > > honest, because we're there just because of the temptation to remove > > > > > > > > lines that were not causing any difficulties :-/ > > > > > > > > > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > > > > > (brk and mmap). It should look approximately like this: > > > > > > > > > > > > > > > > #define __sysret(arg) \ > > > > > > > > ({ \ > > > > > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > > > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > > > > > > > I'm pretty sure you don't need the explicit cast. > > > > (It would be needed for a pointer type.) > > > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg > > > > > > > > Thinking, maybe it should be: > > > > > > > > #define __sysret(syscall_fn_args) > > > > ({ > > > > __typeof__(syscall_fn_args) __rval = syscall_fn_args; > > > > __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; > > > > }) > > > > > > Yeah almost, since arg is necessarily signed in this version, it's > > > just that I manually edited the previous macro in the mail and limited > > > the amount of changes to what was necessary. It's just that SET_ERRNO > > > only is an instruction, not an expression: > > > > > > #define SET_ERRNO(v) do { errno = (v); } while (0) > > > > > > Thus the return value doesn't even pass through it. That's why it was > > > so much simpler before. The rationale behind this was to bring the > > > ability to completely drop errno for programs where you didn't care > > > about it. It's particularly interesting when you don't need any other > > > data either as the program gets strunk from a complete section. > > > > Actually something like: > > > > #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1) > > > > seems to work and allows the errno assignment be removed. > > Also works for pointer types (after a different compare). > > Yes, that's something we can do (with the parenthesis around > __typeof__(v) though). > Yes, we need the parenthesis, this works: #define SET_ERRNO(v) ( errno = -(long)(v), ((__typeof__(v))-1)) #define __is_unsigned_type(v) ((__typeof__(v))(-1) > (__typeof__(v))1) #define __is_syserr(v) (__is_unsigned_type(v) ? (long)v & ~(-4095UL) : (v + 1 < (__typeof__(v))1)) #define __sysret(arg) \ ({ \ __typeof__(arg) __sysret_arg = (arg); \ (!__is_syserr(__sysret_arg)) ? __sysret_arg : SET_ERRNO(__sysret_arg); \ }) It looks better now, even for 'void *'. '(long)x & ~(-4095UL)' saves another 2+ bytes in some architectures: i386: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 19256 x86_64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 21740 arm64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 25812 arm: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 22856 mips: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 22756 ppc: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 26380 ppc64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 26756 ppc64le: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 27364 riscv: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 21758 s390: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 21936 BR, Zhangjin > > A quick check with godbolt doesn't show any sign extensions happening. > > I agree there's none here. > > Willy
> > On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote: > > > From: Willy Tarreau > > > > Sent: 14 August 2023 13:10 > > > > > > > > Hi David, > > > > > > > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote: > > > > > From: Zhangjin Wu > > > > > > Sent: 14 August 2023 11:42 > > > > > ... > > > > > > [...] > > > > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be > > > > > > > > > honest, because we're there just because of the temptation to remove > > > > > > > > > lines that were not causing any difficulties :-/ > > > > > > > > > > > > > > > > > > I think we can do something in-between and deal only with signed returns, > > > > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones > > > > > > > > > (brk and mmap). It should look approximately like this: > > > > > > > > > > > > > > > > > > #define __sysret(arg) \ > > > > > > > > > ({ \ > > > > > > > > > __typeof__(arg) __sysret_arg = (arg); \ > > > > > > > > > (__sysret_arg < 0) ? ({ /* error ? */ \ > > > > > > > > > SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ > > > > > > > > > ((__typeof__(arg)) -1); /* return -1 */ \ > > > > > > > > > > I'm pretty sure you don't need the explicit cast. > > > > > (It would be needed for a pointer type.) > > > > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg > > > > > > > > > > Thinking, maybe it should be: > > > > > > > > > > #define __sysret(syscall_fn_args) > > > > > ({ > > > > > __typeof__(syscall_fn_args) __rval = syscall_fn_args; > > > > > __rval >= 0 ? __rval : SET_ERRNO(-__rval), -1; > > > > > }) > > > > > > > > Yeah almost, since arg is necessarily signed in this version, it's > > > > just that I manually edited the previous macro in the mail and limited > > > > the amount of changes to what was necessary. It's just that SET_ERRNO > > > > only is an instruction, not an expression: > > > > > > > > #define SET_ERRNO(v) do { errno = (v); } while (0) > > > > > > > > Thus the return value doesn't even pass through it. That's why it was > > > > so much simpler before. The rationale behind this was to bring the > > > > ability to completely drop errno for programs where you didn't care > > > > about it. It's particularly interesting when you don't need any other > > > > data either as the program gets strunk from a complete section. > > > > > > Actually something like: > > > > > > #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1) > > > > > > seems to work and allows the errno assignment be removed. > > > Also works for pointer types (after a different compare). > > > > Yes, that's something we can do (with the parenthesis around > > __typeof__(v) though). > > > > Yes, we need the parenthesis, this works: > > #define SET_ERRNO(v) ( errno = -(long)(v), ((__typeof__(v))-1)) > Willy, David, We still need "({})": #define SET_ERRNO(v) ({ errno = -(long)(v); (__typeof__(v))-1; }) to silence the warning: sysroot/x86_64/include/errno.h:13:42: warning: right-hand operand of comma expression has no effect [-Wunused-value] 13 | #define SET_ERRNO(v) ( errno = -(long)(v), (__typeof__(v))-1) | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ BR, Zhangjin > #define __is_unsigned_type(v) ((__typeof__(v))(-1) > (__typeof__(v))1) > #define __is_syserr(v) (__is_unsigned_type(v) ? (long)v & ~(-4095UL) : (v + 1 < (__typeof__(v))1)) > > #define __sysret(arg) \ > ({ \ > __typeof__(arg) __sysret_arg = (arg); \ > (!__is_syserr(__sysret_arg)) ? __sysret_arg : SET_ERRNO(__sysret_arg); \ > }) > > It looks better now, even for 'void *'. > > '(long)x & ~(-4095UL)' saves another 2+ bytes in some architectures: > > i386: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 19256 > x86_64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 21740 > arm64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 25812 > arm: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 22856 > mips: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 22756 > ppc: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 26380 > ppc64: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 26756 > ppc64le: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 27364 > riscv: 160 test(s): 158 passed, 2 skipped, 0 failed => status: warning 21758 > s390: 160 test(s): 157 passed, 3 skipped, 0 failed => status: warning 21936 > > BR, > Zhangjin
Hi, Willy > [...] > > > > > > /* __systry2() is used to select one of two provided low level syscalls */ > > > #define __systry2(a, sys_a, sys_b) \ > > > ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b)) > > > > But this supposes that all of them are manually defined as you did above. > > I'd rather implement an ugly is_numeric() macro based on argument > > resolution. I've done it once in another project, I don't remember > > precisely where it is but I vaguely remember that it used to check > > that the string resolution of the argument gave a letter (when it > > does not exist) or a digit (when it does). I can look into that later > > if needed. But please avoid extra macro definitions as much as possible, > > they're a real pain to handle in the code. There's no error when one is > > missing or has a typo, it's difficult to follow them and they don't > > appear in the debugger. > > > > Yeah, your reply inspired me to look into the IS_ENABLED() from > ../include/linux/kconfig.h macro again, there was a __is_defined() there, let's > throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m > before, but it does return 0 when the macro is not defined, it uses the same > trick in syscall() to calculate the number of arguments, if the macro is not > defined, then, 0 "argument". > The above trick is only for ""#define something 1" ;-) BR, Zhangjin
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 833d6c5e86dc..6bdd18716e84 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -27,23 +27,67 @@ #include "errno.h" #include "types.h" +/* + * This returns a constant expression while determining if an argument is + * a constant expression, most importantly without evaluating the argument. + * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de> + * (from include/linux/const.h) + */ +#define __is_constexpr(x) \ + (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8))) + +/* + * "(void *)0 isn't 'constant enough' for is_constexpr() - so + * is_constexpr((type)0) can be used to detect pointer types." + * (from David Laight <David.Laight@ACULAB.COM>) + */ +#define __is_pointer(x) (!__is_constexpr((__typeof__(x))0)) -/* Syscall return helper for library routines, set errno as -ret when ret is in - * range of [-MAX_ERRNO, -1] +/* + * To preserve the input type and workaround the 'error: assignment of + * read-only variable' when the input type has 'const' flag. + * + * For gcc >= 11.0 (__GXX_ABI_VERSION = 1016), use the new __auto_type keyword + * instead of __typeof__(). * - * Note, No official reference states the errno range here aligns with musl - * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h) + * For old gcc versions, "use typeof((x) + 0) to lose the 'const' flag. The + * only downside is that char/short become int." (from David Laight + * <David.Laight@ACULAB.COM>) */ -static __inline__ __attribute__((unused, always_inline)) -long __sysret(unsigned long ret) -{ - if (ret >= (unsigned long)-MAX_ERRNO) { - SET_ERRNO(-(long)ret); - return -1; - } - return ret; -} +#if __GXX_ABI_VERSION >= 1016 +#define __typeofdecl(arg) __auto_type +#else +#define __typeofdecl(arg) __typeof__((arg) + 0) +#endif + +/* Syscall return helper for library routines + * + * - for pointer returns, set errno as -ret when ret is in [-MAX_ERRNO, -1] + * - for integer returns, set errno as -ret when ret < 0 + * + * Note, + * + * - No official reference states the errno range, here aligns with musl + * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h). + * + * - To reduce binary size by removing useless type conversions and sign + * extensions, the helper is defined as a macro to preserve input type and + * provide two comparisons for both pointer and integer types during the + * compiling stage. + */ + +#define __sysret(arg) \ +({ \ + __typeofdecl(arg) __ret = (arg); \ + if (__builtin_choose_expr(__is_pointer(arg), (unsigned long)-(MAX_ERRNO + 1), (long)__ret) \ + < __builtin_choose_expr(__is_pointer(arg), (unsigned long)__ret, 0)) { \ + SET_ERRNO(-(long)__ret); \ + __ret = (__typeof__(arg))-1L; \ + } \ + __ret; \ +}) + /* Functions in this file only describe syscalls. They're declared static so * that the compiler usually decides to inline them while still being allowed @@ -94,7 +138,7 @@ void *sbrk(intptr_t inc) if (ret && sys_brk(ret + inc) == ret + inc) return ret + inc; - return (void *)__sysret(-ENOMEM); + return __sysret((void *)-ENOMEM); } @@ -682,7 +726,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd, static __attribute__((unused)) void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) { - return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset)); + return __sysret(sys_mmap(addr, length, prot, flags, fd, offset)); } static __attribute__((unused))
As reported and suggested by Willy, the inline __sysret() helper introduces three types of conversions and increases the size: (1) the "unsigned long" argument to __sysret() forces a sign extension from all sys_* functions that used to return 'int' (2) the comparison with the error range now has to be performed on a 'unsigned long' instead of an 'int' (3) the return value from __sysret() is a 'long' (note, a signed long) which then has to be turned back to an 'int' before being returned by the caller to satisfy the caller's prototype. To fix up this, firstly, let's use macro instead of inline function to preserves the input type and avoids these useless conversions (1), (3). Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where we could previously keep a simple sign comparison, let's use a new __is_pointer() macro suggested by David Laight to limit the comparison to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign comparison for integer returns as before. The __builtin_choose_expr() is suggested by David Laight to choose different comparisons based on the types to share code. Thirdly, fix up the following warning by an explicit conversion and let __sysret() be able to accept the (void *) type of argument: sysroot/powerpc/include/sys.h: In function 'sbrk': sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 104 | return (void *)__sysret(-ENOMEM); Fourthly, to further workaround the argument type with 'const' flag, must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested by David Laight for old gcc versions. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/ Suggested-by: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/lkml/f51e54bcf470451ea36f24640f000e61@AcuMS.aculab.com/ Link: https://lore.kernel.org/lkml/a1732bbffd1542d3b9dd34c92f45076c@AcuMS.aculab.com/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> --- Hi, Willy, Hi, David v5 applies suggestions from David Laight, it further drops the fixed 'long' conversion branch by using a __typeof__((arg) + 0) trick and also merges the pointer type and integer type comparisons with __bultin_choose_expr() and a new __is_pointer() macro, now, the code is cleaner than before versions. David, Thanks a lot! Like before, tests run for all nolibc supported boards. Changes from v4 --> v5: * Use __typeof__((arg) + 0) to lose the 'const' flag for old gcc versions. * Import the famous __is_constexpr() macro from kernel side and add a __is_pointer() macro based on it. (David, to avoid introduce extra discuss on the prove-in-use __is_constexpr macro, this patch uses the original version instead of your suggested version, more info here: https://lore.kernel.org/lkml/20220131204357.1133674-1-keescook@chromium.org/) * Use __builtin_choose_expr() to merge two comparisons to share the same errno setting code and the -1L assignment code. Changes from v3 --> v4: * fix up a new warning about 'ret < 0' when the input arg type is (void *) Changes from v2 --> v3: * define a __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT for gcc >= 11.0 (ABI_VERSION >= 1016) * split __sysret() to two versions by the macro instead of a mixed unified and unreadable version * use shorter __ret instead of __sysret_arg Changes from v1 --> v2: * fix up argument with 'const' in the type * support "void *" argument Best regards, Zhangjin --- v4: https://lore.kernel.org/lkml/a4084f7fac7a89f861b5582774bc7a98634d1e76.1691392805.git.falcon@tinylab.org/ v3: https://lore.kernel.org/lkml/8eaab5da2dcbba42e3f3efc2ae686a22c95f84f0.1691386601.git.falcon@tinylab.org/ v2: https://lore.kernel.org/lkml/95fe3e732f455fab653fe1427118d905e4d04257.1691339836.git.falcon@tinylab.org/ v1: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/ --- tools/include/nolibc/sys.h | 74 ++++++++++++++++++++++++++++++-------- 1 file changed, 59 insertions(+), 15 deletions(-)