Message ID | 20250301142409.2513835-2-visitorckw@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Headers | show |
Series | Introduce and use generic parity16/32/64 helper | expand |
Context | Check | Description |
---|---|---|
jmberg/tree_selection | success | Guessing tree name failed - patch did not apply |
On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > Add generic C implementations of __paritysi2(), __paritydi2(), and > __parityti2() as fallback functions in lib/parity.c. These functions > compute the parity of a given integer using a bitwise approach and are > marked with __weak, allowing architecture-specific implementations to > override them. > > This patch serves as preparation for using __builtin_parity() by > ensuring a fallback mechanism is available when the compiler does not > inline the __builtin_parity(). > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > --- > lib/Makefile | 2 +- > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 49 insertions(+), 1 deletion(-) > create mode 100644 lib/parity.c > > diff --git a/lib/Makefile b/lib/Makefile > index 7bab71e59019..45affad85ee4 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > percpu-refcount.o rhashtable.o base64.o \ > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > - generic-radix-tree.o bitmap-str.o > + generic-radix-tree.o bitmap-str.o parity.o > obj-y += string_helpers.o > obj-y += hexdump.o > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > diff --git a/lib/parity.c b/lib/parity.c > new file mode 100644 > index 000000000000..a83ff8d96778 > --- /dev/null > +++ b/lib/parity.c > @@ -0,0 +1,48 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * lib/parity.c > + * > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > + * > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > + */ > + > +#include <linux/export.h> > +#include <linux/kernel.h> > + > +/* > + * One explanation of this algorithm: > + * https://funloop.org/codex/problem/parity/README.html I already asked you not to spread this link. Is there any reason to ignore it? > + */ > +int __weak __paritysi2(u32 val); > +int __weak __paritysi2(u32 val) > +{ > + val ^= val >> 16; > + val ^= val >> 8; > + val ^= val >> 4; > + return (0x6996 >> (val & 0xf)) & 1; > +} > +EXPORT_SYMBOL(__paritysi2); > + > +int __weak __paritydi2(u64 val); > +int __weak __paritydi2(u64 val) > +{ > + val ^= val >> 32; > + val ^= val >> 16; > + val ^= val >> 8; > + val ^= val >> 4; > + return (0x6996 >> (val & 0xf)) & 1; > +} > +EXPORT_SYMBOL(__paritydi2); > + > +int __weak __parityti2(u64 val); > +int __weak __parityti2(u64 val) > +{ > + val ^= val >> 32; > + val ^= val >> 16; > + val ^= val >> 8; > + val ^= val >> 4; > + return (0x6996 >> (val & 0xf)) & 1; > +} > +EXPORT_SYMBOL(__parityti2); OK, it seems I wasn't clear enough on the previous round, so I'll try to be very straightforward now. To begin with, the difference between __parityti2 and __paritydi2 doesn't exist. I'm seriously. I put them side by side, and there's no difference at all. Next, this all is clearly an overengineering. You bake all those weak, const and (ironically, missing) high-efficient arch implementations. But you show no evidence that: - it improves on code generation, - the drivers care about parity()'s performance, and - show no perf tests at all. So you end up with +185/-155 LOCs. Those +30 lines add no new functionality. You copy-paste the same algorithm again and again in very core kernel files. This is a no-go for a nice consolidation series. In the previous round reviewers gave you quite a few nice suggestions. H. Peter Anvin suggested to switch the function to return a boolean, I suggested to make it a macro and even sent you a patch, Jiri and David also spent their time trying to help you, and became ignored. Nevertheless. NAK for the series. Whatever you end up, if it comes to v3, please make it simple, avoid code duplication and run checkpatch. Thanks, Yury
Hi Yury, On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > __parityti2() as fallback functions in lib/parity.c. These functions > > compute the parity of a given integer using a bitwise approach and are > > marked with __weak, allowing architecture-specific implementations to > > override them. > > > > This patch serves as preparation for using __builtin_parity() by > > ensuring a fallback mechanism is available when the compiler does not > > inline the __builtin_parity(). > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > --- > > lib/Makefile | 2 +- > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 49 insertions(+), 1 deletion(-) > > create mode 100644 lib/parity.c > > > > diff --git a/lib/Makefile b/lib/Makefile > > index 7bab71e59019..45affad85ee4 100644 > > --- a/lib/Makefile > > +++ b/lib/Makefile > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > percpu-refcount.o rhashtable.o base64.o \ > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > - generic-radix-tree.o bitmap-str.o > > + generic-radix-tree.o bitmap-str.o parity.o > > obj-y += string_helpers.o > > obj-y += hexdump.o > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > diff --git a/lib/parity.c b/lib/parity.c > > new file mode 100644 > > index 000000000000..a83ff8d96778 > > --- /dev/null > > +++ b/lib/parity.c > > @@ -0,0 +1,48 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * lib/parity.c > > + * > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > + * > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > + */ > > + > > +#include <linux/export.h> > > +#include <linux/kernel.h> > > + > > +/* > > + * One explanation of this algorithm: > > + * https://funloop.org/codex/problem/parity/README.html > > I already asked you not to spread this link. Is there any reason to > ignore it? > In v2, this algorithm was removed from bitops.h, so I moved the link here instead. I'm sorry if it seemed like I ignored your comment. > > + */ > > +int __weak __paritysi2(u32 val); > > +int __weak __paritysi2(u32 val) > > +{ > > + val ^= val >> 16; > > + val ^= val >> 8; > > + val ^= val >> 4; > > + return (0x6996 >> (val & 0xf)) & 1; > > +} > > +EXPORT_SYMBOL(__paritysi2); > > + > > +int __weak __paritydi2(u64 val); > > +int __weak __paritydi2(u64 val) > > +{ > > + val ^= val >> 32; > > + val ^= val >> 16; > > + val ^= val >> 8; > > + val ^= val >> 4; > > + return (0x6996 >> (val & 0xf)) & 1; > > +} > > +EXPORT_SYMBOL(__paritydi2); > > + > > +int __weak __parityti2(u64 val); > > +int __weak __parityti2(u64 val) > > +{ > > + val ^= val >> 32; > > + val ^= val >> 16; > > + val ^= val >> 8; > > + val ^= val >> 4; > > + return (0x6996 >> (val & 0xf)) & 1; > > +} > > +EXPORT_SYMBOL(__parityti2); > > OK, it seems I wasn't clear enough on the previous round, so I'll try > to be very straightforward now. > > To begin with, the difference between __parityti2 and __paritydi2 > doesn't exist. I'm seriously. I put them side by side, and there's > no difference at all. > > Next, this all is clearly an overengineering. You bake all those weak, > const and (ironically, missing) high-efficient arch implementations. > But you show no evidence that: > - it improves on code generation, > - the drivers care about parity()'s performance, and > - show no perf tests at all. > > So you end up with +185/-155 LOCs. > > Those +30 lines add no new functionality. You copy-paste the same > algorithm again and again in very core kernel files. This is a no-go > for a nice consolidation series. > > In the previous round reviewers gave you quite a few nice suggestions. > H. Peter Anvin suggested to switch the function to return a boolean, I > suggested to make it a macro and even sent you a patch, Jiri and David > also spent their time trying to help you, and became ignored. > > Nevertheless. NAK for the series. Whatever you end up, if it comes to > v3, please make it simple, avoid code duplication and run checkpatch. > In v1, I used the same approach as parity8() because I couldn't justify the performance impact in a specific driver or subsystem. However, multiple people commented on using __builtin_parity or an x86 assembly implementation. I'm not ignoring their feedback-I want to address these comments. Before submitting, I sent an email explaining my current approach: using David's suggested method along with __builtin_parity, but no one responded. So, I decided to submit v2 for discussion instead. To avoid mistakes in v3, I want to confirm the following changes before sending it: (a) Change the return type from int to bool. (b) Avoid __builtin_parity and use the same approach as parity8(). (c) Implement parity16/32/64() as single-line inline functions that call the next smaller variant after xor. (d) Add a parity() macro that selects the appropriate parityXX() based on type size. (e) Update users to use this parity() macro. However, (d) may require a patch affecting multiple subsystems at once since some places that already include bitops.h have functions named parity(), causing conflicts. Unless we decide not to add this macro in the end. As for checkpatch.pl warnings, they are mostly pre-existing coding style issues in this series. I've kept them as-is, but if preferred, I'm fine with fixing them. If anything is incorrect or if there are concerns, please let me know. Regards, Kuan-Wei diff --git a/include/linux/bitops.h b/include/linux/bitops.h index c1cb53cf2f0f..47b7eca8d3b7 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -260,6 +260,43 @@ static inline int parity8(u8 val) return (0x6996 >> (val & 0xf)) & 1; } +static inline bool parity16(u16 val) +{ + return parity8(val ^ (val >> 8)); +} + +static inline bool parity32(u32 val) +{ + return parity16(val ^ (val >> 16)); +} + +static inline bool parity64(u64 val) +{ + return parity32(val ^ (val >> 32)); +} + +#define parity(val) \ +({ \ + bool __ret; \ + switch (BITS_PER_TYPE(val)) { \ + case 64: \ + __ret = parity64(val); \ + break; \ + case 32: \ + __ret = parity32(val); \ + break; \ + case 16: \ + __ret = parity16(val); \ + break; \ + case 8: \ + __ret = parity8(val); \ + break; \ + default: \ + BUILD_BUG(); \ + } \ + __ret; \ +}) + /** * __ffs64 - find first set bit in a 64 bit word * @word: The 64 bit word
On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > Hi Yury, > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > compute the parity of a given integer using a bitwise approach and are > > > marked with __weak, allowing architecture-specific implementations to > > > override them. > > > > > > This patch serves as preparation for using __builtin_parity() by > > > ensuring a fallback mechanism is available when the compiler does not > > > inline the __builtin_parity(). > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > --- > > > lib/Makefile | 2 +- > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > create mode 100644 lib/parity.c > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > index 7bab71e59019..45affad85ee4 100644 > > > --- a/lib/Makefile > > > +++ b/lib/Makefile > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > percpu-refcount.o rhashtable.o base64.o \ > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > - generic-radix-tree.o bitmap-str.o > > > + generic-radix-tree.o bitmap-str.o parity.o > > > obj-y += string_helpers.o > > > obj-y += hexdump.o > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > diff --git a/lib/parity.c b/lib/parity.c > > > new file mode 100644 > > > index 000000000000..a83ff8d96778 > > > --- /dev/null > > > +++ b/lib/parity.c > > > @@ -0,0 +1,48 @@ > > > +// SPDX-License-Identifier: GPL-2.0-only > > > +/* > > > + * lib/parity.c > > > + * > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > + * > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > + */ > > > + > > > +#include <linux/export.h> > > > +#include <linux/kernel.h> > > > + > > > +/* > > > + * One explanation of this algorithm: > > > + * https://funloop.org/codex/problem/parity/README.html > > > > I already asked you not to spread this link. Is there any reason to > > ignore it? > > > In v2, this algorithm was removed from bitops.h, so I moved the link > here instead. I'm sorry if it seemed like I ignored your comment. Yes, it is. > In v1, I used the same approach as parity8() because I couldn't justify > the performance impact in a specific driver or subsystem. However, > multiple people commented on using __builtin_parity or an x86 assembly > implementation. I'm not ignoring their feedback-I want to address these Please ask those multiple people: are they ready to maintain all that zoo of macros, weak implementations, arch implementations and stubs for no clear benefit? Performance is always worth it, but again I see not even a hint that the drivers care about performance. You don't measure it, so don't care as well. Right? > comments. Before submitting, I sent an email explaining my current > approach: using David's suggested method along with __builtin_parity, > but no one responded. So, I decided to submit v2 for discussion > instead. For discussion use tag RFC. > > To avoid mistakes in v3, I want to confirm the following changes before > sending it: > > (a) Change the return type from int to bool. > (b) Avoid __builtin_parity and use the same approach as parity8(). > (c) Implement parity16/32/64() as single-line inline functions that > call the next smaller variant after xor. > (d) Add a parity() macro that selects the appropriate parityXX() based > on type size. > (e) Update users to use this parity() macro. > > However, (d) may require a patch affecting multiple subsystems at once > since some places that already include bitops.h have functions named > parity(), causing conflicts. Unless we decide not to add this macro in > the end. > > As for checkpatch.pl warnings, they are mostly pre-existing coding > style issues in this series. I've kept them as-is, but if preferred, > I'm fine with fixing them. Checkpatch only complains on new lines. Particularly this patch should trigger checkpatch warning because it adds a new file but doesn't touch MAINTAINERS. > If anything is incorrect or if there are concerns, please let me know. > > Regards, > Kuan-Wei > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > index c1cb53cf2f0f..47b7eca8d3b7 100644 > --- a/include/linux/bitops.h > +++ b/include/linux/bitops.h > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > return (0x6996 >> (val & 0xf)) & 1; > } > > +static inline bool parity16(u16 val) > +{ > + return parity8(val ^ (val >> 8)); > +} > + > +static inline bool parity32(u32 val) > +{ > + return parity16(val ^ (val >> 16)); > +} > + > +static inline bool parity64(u64 val) > +{ > + return parity32(val ^ (val >> 32)); > +} That was discussed between Jiri and me in v2. Fixed types functions are needed only in a few very specific cases. With the exception of I3C driver (which doesn't look good for both Jiri and me), all the drivers have the type of variable passed to the parityXX() matching the actual variable length. It means that fixed-type versions of the parity() are simply not needed. So if we don't need them, please don't introduce it. > +#define parity(val) \ > +({ \ > + bool __ret; \ > + switch (BITS_PER_TYPE(val)) { \ > + case 64: \ > + __ret = parity64(val); \ > + break; \ > + case 32: \ > + __ret = parity32(val); \ > + break; \ > + case 16: \ > + __ret = parity16(val); \ > + break; \ > + case 8: \ > + __ret = parity8(val); \ > + break; \ > + default: \ > + BUILD_BUG(); \ > + } \ > + __ret; \ > +}) > + > /** > * __ffs64 - find first set bit in a 64 bit word > * @word: The 64 bit word
Hi Yury, On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote: > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > > Hi Yury, > > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > > compute the parity of a given integer using a bitwise approach and are > > > > marked with __weak, allowing architecture-specific implementations to > > > > override them. > > > > > > > > This patch serves as preparation for using __builtin_parity() by > > > > ensuring a fallback mechanism is available when the compiler does not > > > > inline the __builtin_parity(). > > > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > --- > > > > lib/Makefile | 2 +- > > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > create mode 100644 lib/parity.c > > > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > index 7bab71e59019..45affad85ee4 100644 > > > > --- a/lib/Makefile > > > > +++ b/lib/Makefile > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > > percpu-refcount.o rhashtable.o base64.o \ > > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > > - generic-radix-tree.o bitmap-str.o > > > > + generic-radix-tree.o bitmap-str.o parity.o > > > > obj-y += string_helpers.o > > > > obj-y += hexdump.o > > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > > diff --git a/lib/parity.c b/lib/parity.c > > > > new file mode 100644 > > > > index 000000000000..a83ff8d96778 > > > > --- /dev/null > > > > +++ b/lib/parity.c > > > > @@ -0,0 +1,48 @@ > > > > +// SPDX-License-Identifier: GPL-2.0-only > > > > +/* > > > > + * lib/parity.c > > > > + * > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > > + * > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > > + */ > > > > + > > > > +#include <linux/export.h> > > > > +#include <linux/kernel.h> > > > > + > > > > +/* > > > > + * One explanation of this algorithm: > > > > + * https://funloop.org/codex/problem/parity/README.html > > > > > > I already asked you not to spread this link. Is there any reason to > > > ignore it? > > > > > In v2, this algorithm was removed from bitops.h, so I moved the link > > here instead. I'm sorry if it seemed like I ignored your comment. > > Yes, it is. > > > In v1, I used the same approach as parity8() because I couldn't justify > > the performance impact in a specific driver or subsystem. However, > > multiple people commented on using __builtin_parity or an x86 assembly > > implementation. I'm not ignoring their feedback-I want to address these > > Please ask those multiple people: are they ready to maintain all that > zoo of macros, weak implementations, arch implementations and stubs > for no clear benefit? Performance is always worth it, but again I see > not even a hint that the drivers care about performance. You don't > measure it, so don't care as well. Right? > > > comments. Before submitting, I sent an email explaining my current > > approach: using David's suggested method along with __builtin_parity, > > but no one responded. So, I decided to submit v2 for discussion > > instead. > > For discussion use tag RFC. > > > > > To avoid mistakes in v3, I want to confirm the following changes before > > sending it: > > > > (a) Change the return type from int to bool. > > (b) Avoid __builtin_parity and use the same approach as parity8(). > > (c) Implement parity16/32/64() as single-line inline functions that > > call the next smaller variant after xor. > > (d) Add a parity() macro that selects the appropriate parityXX() based > > on type size. > > (e) Update users to use this parity() macro. > > > > However, (d) may require a patch affecting multiple subsystems at once > > since some places that already include bitops.h have functions named > > parity(), causing conflicts. Unless we decide not to add this macro in > > the end. > > > > As for checkpatch.pl warnings, they are mostly pre-existing coding > > style issues in this series. I've kept them as-is, but if preferred, > > I'm fine with fixing them. > > Checkpatch only complains on new lines. Particularly this patch should > trigger checkpatch warning because it adds a new file but doesn't touch > MAINTAINERS. > For example, the following warning: ERROR: space required after that ',' (ctx:VxV) #84: FILE: drivers/input/joystick/sidewinder.c:368: + if (!parity64(GB(0,33))) ^ This issue already existed before this series, and I'm keeping its style unchanged for now. > > If anything is incorrect or if there are concerns, please let me know. > > > > Regards, > > Kuan-Wei > > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > > index c1cb53cf2f0f..47b7eca8d3b7 100644 > > --- a/include/linux/bitops.h > > +++ b/include/linux/bitops.h > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > > return (0x6996 >> (val & 0xf)) & 1; > > } > > > > +static inline bool parity16(u16 val) > > +{ > > + return parity8(val ^ (val >> 8)); > > +} > > + > > +static inline bool parity32(u32 val) > > +{ > > + return parity16(val ^ (val >> 16)); > > +} > > + > > +static inline bool parity64(u64 val) > > +{ > > + return parity32(val ^ (val >> 32)); > > +} > > That was discussed between Jiri and me in v2. Fixed types functions > are needed only in a few very specific cases. With the exception of > I3C driver (which doesn't look good for both Jiri and me), all the > drivers have the type of variable passed to the parityXX() matching > the actual variable length. It means that fixed-type versions of the > parity() are simply not needed. So if we don't need them, please don't > introduce it. > So, I should add the following parity() macro in v3, remove parity8(), and update all current parity8() users to use this macro, right? I changed u64 to __auto_type and applied David's suggestion to replace the >> 32 with >> 16 >> 16 to avoid compiler warnings. Regards, Kuan-Wei #define parity(val) \ ({ \ __auto_type __v = (val); \ bool __ret; \ switch (BITS_PER_TYPE(val)) { \ case 64: \ __v ^= __v >> 16 >> 16; \ fallthrough; \ case 32: \ __v ^= __v >> 16; \ fallthrough; \ case 16: \ __v ^= __v >> 8; \ fallthrough; \ case 8: \ __v ^= __v >> 4; \ __ret = (0x6996 >> (__v & 0xf)) & 1; \ break; \ default: \ BUILD_BUG(); \ } \ __ret; \ })
On Mon, 3 Mar 2025 01:29:19 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > Hi Yury, > > On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote: > > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > > > Hi Yury, > > > > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > > > compute the parity of a given integer using a bitwise approach and are > > > > > marked with __weak, allowing architecture-specific implementations to > > > > > override them. > > > > > > > > > > This patch serves as preparation for using __builtin_parity() by > > > > > ensuring a fallback mechanism is available when the compiler does not > > > > > inline the __builtin_parity(). > > > > > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > --- > > > > > lib/Makefile | 2 +- > > > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > create mode 100644 lib/parity.c > > > > > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > > index 7bab71e59019..45affad85ee4 100644 > > > > > --- a/lib/Makefile > > > > > +++ b/lib/Makefile > > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > > > percpu-refcount.o rhashtable.o base64.o \ > > > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > > > - generic-radix-tree.o bitmap-str.o > > > > > + generic-radix-tree.o bitmap-str.o parity.o > > > > > obj-y += string_helpers.o > > > > > obj-y += hexdump.o > > > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > > > diff --git a/lib/parity.c b/lib/parity.c > > > > > new file mode 100644 > > > > > index 000000000000..a83ff8d96778 > > > > > --- /dev/null > > > > > +++ b/lib/parity.c > > > > > @@ -0,0 +1,48 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0-only > > > > > +/* > > > > > + * lib/parity.c > > > > > + * > > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > > > + * > > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > > > + */ > > > > > + > > > > > +#include <linux/export.h> > > > > > +#include <linux/kernel.h> > > > > > + > > > > > +/* > > > > > + * One explanation of this algorithm: > > > > > + * https://funloop.org/codex/problem/parity/README.html > > > > > > > > I already asked you not to spread this link. Is there any reason to > > > > ignore it? > > > > > > > In v2, this algorithm was removed from bitops.h, so I moved the link > > > here instead. I'm sorry if it seemed like I ignored your comment. > > > > Yes, it is. > > > > > In v1, I used the same approach as parity8() because I couldn't justify > > > the performance impact in a specific driver or subsystem. However, > > > multiple people commented on using __builtin_parity or an x86 assembly > > > implementation. I'm not ignoring their feedback-I want to address these > > > > Please ask those multiple people: are they ready to maintain all that > > zoo of macros, weak implementations, arch implementations and stubs > > for no clear benefit? Performance is always worth it, but again I see > > not even a hint that the drivers care about performance. You don't > > measure it, so don't care as well. Right? > > > > > comments. Before submitting, I sent an email explaining my current > > > approach: using David's suggested method along with __builtin_parity, > > > but no one responded. So, I decided to submit v2 for discussion > > > instead. > > > > For discussion use tag RFC. > > > > > > > > To avoid mistakes in v3, I want to confirm the following changes before > > > sending it: > > > > > > (a) Change the return type from int to bool. > > > (b) Avoid __builtin_parity and use the same approach as parity8(). > > > (c) Implement parity16/32/64() as single-line inline functions that > > > call the next smaller variant after xor. > > > (d) Add a parity() macro that selects the appropriate parityXX() based > > > on type size. > > > (e) Update users to use this parity() macro. > > > > > > However, (d) may require a patch affecting multiple subsystems at once > > > since some places that already include bitops.h have functions named > > > parity(), causing conflicts. Unless we decide not to add this macro in > > > the end. > > > > > > As for checkpatch.pl warnings, they are mostly pre-existing coding > > > style issues in this series. I've kept them as-is, but if preferred, > > > I'm fine with fixing them. > > > > Checkpatch only complains on new lines. Particularly this patch should > > trigger checkpatch warning because it adds a new file but doesn't touch > > MAINTAINERS. > > > For example, the following warning: > > ERROR: space required after that ',' (ctx:VxV) > #84: FILE: drivers/input/joystick/sidewinder.c:368: > + if (!parity64(GB(0,33))) > ^ > > This issue already existed before this series, and I'm keeping its > style unchanged for now. > > > > If anything is incorrect or if there are concerns, please let me know. > > > > > > Regards, > > > Kuan-Wei > > > > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > > > index c1cb53cf2f0f..47b7eca8d3b7 100644 > > > --- a/include/linux/bitops.h > > > +++ b/include/linux/bitops.h > > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > > > return (0x6996 >> (val & 0xf)) & 1; > > > } > > > > > > +static inline bool parity16(u16 val) > > > +{ > > > + return parity8(val ^ (val >> 8)); > > > +} > > > + > > > +static inline bool parity32(u32 val) > > > +{ > > > + return parity16(val ^ (val >> 16)); > > > +} > > > + > > > +static inline bool parity64(u64 val) > > > +{ > > > + return parity32(val ^ (val >> 32)); > > > +} > > > > That was discussed between Jiri and me in v2. Fixed types functions > > are needed only in a few very specific cases. With the exception of > > I3C driver (which doesn't look good for both Jiri and me), all the > > drivers have the type of variable passed to the parityXX() matching > > the actual variable length. It means that fixed-type versions of the > > parity() are simply not needed. So if we don't need them, please don't > > introduce it. > > > So, I should add the following parity() macro in v3, remove parity8(), > and update all current parity8() users to use this macro, right? > > I changed u64 to __auto_type and applied David's suggestion to replace > the >> 32 with >> 16 >> 16 to avoid compiler warnings. > > Regards, > Kuan-Wei > > #define parity(val) \ > ({ \ > __auto_type __v = (val); \ > bool __ret; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __v ^= __v >> 16 >> 16; \ > fallthrough; \ > case 32: \ > __v ^= __v >> 16; \ > fallthrough; \ > case 16: \ > __v ^= __v >> 8; \ > fallthrough; \ > case 8: \ > __v ^= __v >> 4; \ > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > break; \ > default: \ > BUILD_BUG(); \ > } \ > __ret; \ > }) I'm seeing double-register shifts for 64bit values on 32bit systems. And gcc is doing 64bit double-register maths all the way down. That is fixed by changing the top of the define to #define parity(val) \ ({ \ unsigned int __v = (val); \ bool __ret; \ switch (BITS_PER_TYPE(val)) { \ case 64: \ __v ^= val >> 16 >> 16; \ fallthrough; \ But it's need changing to only expand 'val' once. Perhaps: auto_type _val = (val); u32 __ret = val; and (mostly) s/__v/__ret/g David
On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote: > On Mon, 3 Mar 2025 01:29:19 +0800 > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > > Hi Yury, > > > > On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote: > > > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > > > > Hi Yury, > > > > > > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > > > > compute the parity of a given integer using a bitwise approach and are > > > > > > marked with __weak, allowing architecture-specific implementations to > > > > > > override them. > > > > > > > > > > > > This patch serves as preparation for using __builtin_parity() by > > > > > > ensuring a fallback mechanism is available when the compiler does not > > > > > > inline the __builtin_parity(). > > > > > > > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > --- > > > > > > lib/Makefile | 2 +- > > > > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > > create mode 100644 lib/parity.c > > > > > > > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > > > index 7bab71e59019..45affad85ee4 100644 > > > > > > --- a/lib/Makefile > > > > > > +++ b/lib/Makefile > > > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > > > > percpu-refcount.o rhashtable.o base64.o \ > > > > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > > > > - generic-radix-tree.o bitmap-str.o > > > > > > + generic-radix-tree.o bitmap-str.o parity.o > > > > > > obj-y += string_helpers.o > > > > > > obj-y += hexdump.o > > > > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > > > > diff --git a/lib/parity.c b/lib/parity.c > > > > > > new file mode 100644 > > > > > > index 000000000000..a83ff8d96778 > > > > > > --- /dev/null > > > > > > +++ b/lib/parity.c > > > > > > @@ -0,0 +1,48 @@ > > > > > > +// SPDX-License-Identifier: GPL-2.0-only > > > > > > +/* > > > > > > + * lib/parity.c > > > > > > + * > > > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > + * > > > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > > > > + */ > > > > > > + > > > > > > +#include <linux/export.h> > > > > > > +#include <linux/kernel.h> > > > > > > + > > > > > > +/* > > > > > > + * One explanation of this algorithm: > > > > > > + * https://funloop.org/codex/problem/parity/README.html > > > > > > > > > > I already asked you not to spread this link. Is there any reason to > > > > > ignore it? > > > > > > > > > In v2, this algorithm was removed from bitops.h, so I moved the link > > > > here instead. I'm sorry if it seemed like I ignored your comment. > > > > > > Yes, it is. > > > > > > > In v1, I used the same approach as parity8() because I couldn't justify > > > > the performance impact in a specific driver or subsystem. However, > > > > multiple people commented on using __builtin_parity or an x86 assembly > > > > implementation. I'm not ignoring their feedback-I want to address these > > > > > > Please ask those multiple people: are they ready to maintain all that > > > zoo of macros, weak implementations, arch implementations and stubs > > > for no clear benefit? Performance is always worth it, but again I see > > > not even a hint that the drivers care about performance. You don't > > > measure it, so don't care as well. Right? > > > > > > > comments. Before submitting, I sent an email explaining my current > > > > approach: using David's suggested method along with __builtin_parity, > > > > but no one responded. So, I decided to submit v2 for discussion > > > > instead. > > > > > > For discussion use tag RFC. > > > > > > > > > > > To avoid mistakes in v3, I want to confirm the following changes before > > > > sending it: > > > > > > > > (a) Change the return type from int to bool. > > > > (b) Avoid __builtin_parity and use the same approach as parity8(). > > > > (c) Implement parity16/32/64() as single-line inline functions that > > > > call the next smaller variant after xor. > > > > (d) Add a parity() macro that selects the appropriate parityXX() based > > > > on type size. > > > > (e) Update users to use this parity() macro. > > > > > > > > However, (d) may require a patch affecting multiple subsystems at once > > > > since some places that already include bitops.h have functions named > > > > parity(), causing conflicts. Unless we decide not to add this macro in > > > > the end. > > > > > > > > As for checkpatch.pl warnings, they are mostly pre-existing coding > > > > style issues in this series. I've kept them as-is, but if preferred, > > > > I'm fine with fixing them. > > > > > > Checkpatch only complains on new lines. Particularly this patch should > > > trigger checkpatch warning because it adds a new file but doesn't touch > > > MAINTAINERS. > > > > > For example, the following warning: > > > > ERROR: space required after that ',' (ctx:VxV) > > #84: FILE: drivers/input/joystick/sidewinder.c:368: > > + if (!parity64(GB(0,33))) > > ^ > > > > This issue already existed before this series, and I'm keeping its > > style unchanged for now. > > > > > > If anything is incorrect or if there are concerns, please let me know. > > > > > > > > Regards, > > > > Kuan-Wei > > > > > > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > > > > index c1cb53cf2f0f..47b7eca8d3b7 100644 > > > > --- a/include/linux/bitops.h > > > > +++ b/include/linux/bitops.h > > > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > > > > return (0x6996 >> (val & 0xf)) & 1; > > > > } > > > > > > > > +static inline bool parity16(u16 val) > > > > +{ > > > > + return parity8(val ^ (val >> 8)); > > > > +} > > > > + > > > > +static inline bool parity32(u32 val) > > > > +{ > > > > + return parity16(val ^ (val >> 16)); > > > > +} > > > > + > > > > +static inline bool parity64(u64 val) > > > > +{ > > > > + return parity32(val ^ (val >> 32)); > > > > +} > > > > > > That was discussed between Jiri and me in v2. Fixed types functions > > > are needed only in a few very specific cases. With the exception of > > > I3C driver (which doesn't look good for both Jiri and me), all the > > > drivers have the type of variable passed to the parityXX() matching > > > the actual variable length. It means that fixed-type versions of the > > > parity() are simply not needed. So if we don't need them, please don't > > > introduce it. > > > > > So, I should add the following parity() macro in v3, remove parity8(), > > and update all current parity8() users to use this macro, right? > > > > I changed u64 to __auto_type and applied David's suggestion to replace > > the >> 32 with >> 16 >> 16 to avoid compiler warnings. > > > > Regards, > > Kuan-Wei > > > > #define parity(val) \ > > ({ \ > > __auto_type __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= __v >> 16 >> 16; \ > > fallthrough; \ > > case 32: \ > > __v ^= __v >> 16; \ > > fallthrough; \ > > case 16: \ > > __v ^= __v >> 8; \ > > fallthrough; \ > > case 8: \ > > __v ^= __v >> 4; \ > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > break; \ > > default: \ > > BUILD_BUG(); \ > > } \ > > __ret; \ > > }) > > I'm seeing double-register shifts for 64bit values on 32bit systems. > And gcc is doing 64bit double-register maths all the way down. > > That is fixed by changing the top of the define to > #define parity(val) \ > ({ \ > unsigned int __v = (val); \ > bool __ret; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __v ^= val >> 16 >> 16; \ > fallthrough; \ > > But it's need changing to only expand 'val' once. > Perhaps: > auto_type _val = (val); > u32 __ret = val; > and (mostly) s/__v/__ret/g > I'm happy to make this change, though I'm a bit confused about how much we care about the code generated by gcc. So this is the macro expected in v3: #define parity(val) \ ({ \ __auto_type __v = (val); \ u32 __ret = val; \ switch (BITS_PER_TYPE(val)) { \ case 64: \ __ret ^= __v >> 16 >> 16; \ fallthrough; \ case 32: \ __ret ^= __ret >> 16; \ fallthrough; \ case 16: \ __ret ^= __ret >> 8; \ fallthrough; \ case 8: \ __ret ^= __ret >> 4; \ __ret = (0x6996 >> (__ret & 0xf)) & 1; \ break; \ default: \ BUILD_BUG(); \ } \ __ret; \ })
On Mon, 3 Mar 2025 10:47:20 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote: > > On Mon, 3 Mar 2025 01:29:19 +0800 > > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > > > > Hi Yury, > > > ... > > > #define parity(val) \ > > > ({ \ > > > __auto_type __v = (val); \ > > > bool __ret; \ > > > switch (BITS_PER_TYPE(val)) { \ > > > case 64: \ > > > __v ^= __v >> 16 >> 16; \ > > > fallthrough; \ > > > case 32: \ > > > __v ^= __v >> 16; \ > > > fallthrough; \ > > > case 16: \ > > > __v ^= __v >> 8; \ > > > fallthrough; \ > > > case 8: \ > > > __v ^= __v >> 4; \ > > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > > break; \ > > > default: \ > > > BUILD_BUG(); \ > > > } \ > > > __ret; \ > > > }) > > > > I'm seeing double-register shifts for 64bit values on 32bit systems. > > And gcc is doing 64bit double-register maths all the way down. > > > > That is fixed by changing the top of the define to > > #define parity(val) \ > > ({ \ > > unsigned int __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= val >> 16 >> 16; \ > > fallthrough; \ > > > > But it's need changing to only expand 'val' once. > > Perhaps: > > auto_type _val = (val); > > u32 __ret = val; > > and (mostly) s/__v/__ret/g > > > I'm happy to make this change, though I'm a bit confused about how much > we care about the code generated by gcc. So this is the macro expected > in v3: There is 'good', 'bad' and 'ugly' - it was in the 'bad' to 'ugly' area. > > #define parity(val) \ > ({ \ > __auto_type __v = (val); \ > u32 __ret = val; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __ret ^= __v >> 16 >> 16; \ > fallthrough; \ > case 32: \ > __ret ^= __ret >> 16; \ > fallthrough; \ > case 16: \ > __ret ^= __ret >> 8; \ > fallthrough; \ > case 8: \ > __ret ^= __ret >> 4; \ > __ret = (0x6996 >> (__ret & 0xf)) & 1; \ > break; \ > default: \ > BUILD_BUG(); \ > } \ > __ret; \ > }) That looks like it will avoid double-register shifts on 32bit archs. arm64 can do slightly better (a couple of instructions) because of its barrel shifter. x86 can do a lot better because of the cpu 'parity' flag. But maybe it is never used anywhere that really matters. David
On Mon, Mar 03, 2025 at 01:29:19AM +0800, Kuan-Wei Chiu wrote: > Hi Yury, > > On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote: > > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > > > Hi Yury, > > > > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > > > compute the parity of a given integer using a bitwise approach and are > > > > > marked with __weak, allowing architecture-specific implementations to > > > > > override them. > > > > > > > > > > This patch serves as preparation for using __builtin_parity() by > > > > > ensuring a fallback mechanism is available when the compiler does not > > > > > inline the __builtin_parity(). > > > > > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > --- > > > > > lib/Makefile | 2 +- > > > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > create mode 100644 lib/parity.c > > > > > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > > index 7bab71e59019..45affad85ee4 100644 > > > > > --- a/lib/Makefile > > > > > +++ b/lib/Makefile > > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > > > percpu-refcount.o rhashtable.o base64.o \ > > > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > > > - generic-radix-tree.o bitmap-str.o > > > > > + generic-radix-tree.o bitmap-str.o parity.o > > > > > obj-y += string_helpers.o > > > > > obj-y += hexdump.o > > > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > > > diff --git a/lib/parity.c b/lib/parity.c > > > > > new file mode 100644 > > > > > index 000000000000..a83ff8d96778 > > > > > --- /dev/null > > > > > +++ b/lib/parity.c > > > > > @@ -0,0 +1,48 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0-only > > > > > +/* > > > > > + * lib/parity.c > > > > > + * > > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > > > + * > > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > > > + */ > > > > > + > > > > > +#include <linux/export.h> > > > > > +#include <linux/kernel.h> > > > > > + > > > > > +/* > > > > > + * One explanation of this algorithm: > > > > > + * https://funloop.org/codex/problem/parity/README.html > > > > > > > > I already asked you not to spread this link. Is there any reason to > > > > ignore it? > > > > > > > In v2, this algorithm was removed from bitops.h, so I moved the link > > > here instead. I'm sorry if it seemed like I ignored your comment. > > > > Yes, it is. > > > > > In v1, I used the same approach as parity8() because I couldn't justify > > > the performance impact in a specific driver or subsystem. However, > > > multiple people commented on using __builtin_parity or an x86 assembly > > > implementation. I'm not ignoring their feedback-I want to address these > > > > Please ask those multiple people: are they ready to maintain all that > > zoo of macros, weak implementations, arch implementations and stubs > > for no clear benefit? Performance is always worth it, but again I see > > not even a hint that the drivers care about performance. You don't > > measure it, so don't care as well. Right? > > > > > comments. Before submitting, I sent an email explaining my current > > > approach: using David's suggested method along with __builtin_parity, > > > but no one responded. So, I decided to submit v2 for discussion > > > instead. > > > > For discussion use tag RFC. > > > > > > > > To avoid mistakes in v3, I want to confirm the following changes before > > > sending it: > > > > > > (a) Change the return type from int to bool. > > > (b) Avoid __builtin_parity and use the same approach as parity8(). > > > (c) Implement parity16/32/64() as single-line inline functions that > > > call the next smaller variant after xor. > > > (d) Add a parity() macro that selects the appropriate parityXX() based > > > on type size. > > > (e) Update users to use this parity() macro. > > > > > > However, (d) may require a patch affecting multiple subsystems at once > > > since some places that already include bitops.h have functions named > > > parity(), causing conflicts. Unless we decide not to add this macro in > > > the end. > > > > > > As for checkpatch.pl warnings, they are mostly pre-existing coding > > > style issues in this series. I've kept them as-is, but if preferred, > > > I'm fine with fixing them. > > > > Checkpatch only complains on new lines. Particularly this patch should > > trigger checkpatch warning because it adds a new file but doesn't touch > > MAINTAINERS. > > > For example, the following warning: > > ERROR: space required after that ',' (ctx:VxV) > #84: FILE: drivers/input/joystick/sidewinder.c:368: > + if (!parity64(GB(0,33))) > ^ > > This issue already existed before this series, and I'm keeping its > style unchanged for now. > > > > If anything is incorrect or if there are concerns, please let me know. > > > > > > Regards, > > > Kuan-Wei > > > > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > > > index c1cb53cf2f0f..47b7eca8d3b7 100644 > > > --- a/include/linux/bitops.h > > > +++ b/include/linux/bitops.h > > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > > > return (0x6996 >> (val & 0xf)) & 1; > > > } > > > > > > +static inline bool parity16(u16 val) > > > +{ > > > + return parity8(val ^ (val >> 8)); > > > +} > > > + > > > +static inline bool parity32(u32 val) > > > +{ > > > + return parity16(val ^ (val >> 16)); > > > +} > > > + > > > +static inline bool parity64(u64 val) > > > +{ > > > + return parity32(val ^ (val >> 32)); > > > +} > > > > That was discussed between Jiri and me in v2. Fixed types functions > > are needed only in a few very specific cases. With the exception of > > I3C driver (which doesn't look good for both Jiri and me), all the > > drivers have the type of variable passed to the parityXX() matching > > the actual variable length. It means that fixed-type versions of the > > parity() are simply not needed. So if we don't need them, please don't > > introduce it. > > > So, I should add the following parity() macro in v3, remove parity8(), > and update all current parity8() users to use this macro, right? If you go with macro, please apply my patch and modify it in-place with this __auto_type thing and GCC hack. Feel free to add your co-developed-by, or tested, or whatever. > I changed u64 to __auto_type and applied David's suggestion to replace > the >> 32 with >> 16 >> 16 to avoid compiler warnings. > > Regards, > Kuan-Wei > > #define parity(val) \ > ({ \ > __auto_type __v = (val); \ > bool __ret; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __v ^= __v >> 16 >> 16; \ > fallthrough; \ This hack should be GCC-only, and well documented. For clang it should be __v ^= __v >> 32; \ > case 32: \ > __v ^= __v >> 16; \ > fallthrough; \ > case 16: \ > __v ^= __v >> 8; \ > fallthrough; \ > case 8: \ > __v ^= __v >> 4; \ > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > break; \ > default: \ > BUILD_BUG(); \ > } \ > __ret; \ > })
On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote: > > #define parity(val) \ > > ({ \ > > __auto_type __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= __v >> 16 >> 16; \ > > fallthrough; \ > > case 32: \ > > __v ^= __v >> 16; \ > > fallthrough; \ > > case 16: \ > > __v ^= __v >> 8; \ > > fallthrough; \ > > case 8: \ > > __v ^= __v >> 4; \ > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > break; \ > > default: \ > > BUILD_BUG(); \ > > } \ > > __ret; \ > > }) > > I'm seeing double-register shifts for 64bit values on 32bit systems. > And gcc is doing 64bit double-register maths all the way down. If you don't like GCC code generation why don't you discuss it in GCC maillist? > That is fixed by changing the top of the define to > #define parity(val) \ > ({ \ > unsigned int __v = (val); \ > bool __ret; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __v ^= val >> 16 >> 16; \ > fallthrough; \ > > But it's need changing to only expand 'val' once. > Perhaps: > auto_type _val = (val); > u32 __ret = val; > and (mostly) s/__v/__ret/g You suggest another complication to mitigate a problem that most likely doesn't exist. I looked through the series and found that parity64() is used for I2C, joystick and a netronome ethernet card. For I2C and joystick performance is definitely not a problem. For ethernet - maybe. But I feel like you didn't compile that driver for any 32-bit arch, and didn't test with a real hardware. So your concern is a pure speculation. NAK.
On Mon, Mar 03, 2025 at 10:47:20AM +0800, Kuan-Wei Chiu wrote: > > > #define parity(val) \ > > > ({ \ > > > __auto_type __v = (val); \ > > > bool __ret; \ > > > switch (BITS_PER_TYPE(val)) { \ > > > case 64: \ > > > __v ^= __v >> 16 >> 16; \ > > > fallthrough; \ > > > case 32: \ > > > __v ^= __v >> 16; \ > > > fallthrough; \ > > > case 16: \ > > > __v ^= __v >> 8; \ > > > fallthrough; \ > > > case 8: \ > > > __v ^= __v >> 4; \ > > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > > break; \ > > > default: \ > > > BUILD_BUG(); \ > > > } \ > > > __ret; \ > > > }) > > > > I'm seeing double-register shifts for 64bit values on 32bit systems. > > And gcc is doing 64bit double-register maths all the way down. > > > > That is fixed by changing the top of the define to > > #define parity(val) \ > > ({ \ > > unsigned int __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= val >> 16 >> 16; \ > > fallthrough; \ > > > > But it's need changing to only expand 'val' once. > > Perhaps: > > auto_type _val = (val); > > u32 __ret = val; > > and (mostly) s/__v/__ret/g > > > I'm happy to make this change, though I'm a bit confused about how much > we care about the code generated by gcc. So this is the macro expected > in v3: We do care about code generated by any compiler. But we don't spread hacks here and there just to make GCC happy. This is entirely broken strategy. Things should work the other way: compiler people should collect real-life examples and learn from them. I'm not happy even with this 'v >> 16 >> 16' hack, I just think that disabling Wshift-count-overflow is the worse option. Hacking the macro to optimize parity64() on 32-bit arch case doesn't worth it entirely. In your patchset, you have only 3 drivers using parity64(). For each of them, please in the commit message refer that calling generic parity() with 64-bit argument may lead to sub-optimal code generation with a certain compiler against 32-bit arches. If you'll get a feedback that it's a real problem for somebody, we'll think about mitigating it. > #define parity(val) \ > ({ \ > __auto_type __v = (val); \ > u32 __ret = val; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __ret ^= __v >> 16 >> 16; \ > fallthrough; \ > case 32: \ > __ret ^= __ret >> 16; \ > fallthrough; \ > case 16: \ > __ret ^= __ret >> 8; \ > fallthrough; \ > case 8: \ > __ret ^= __ret >> 4; \ > __ret = (0x6996 >> (__ret & 0xf)) & 1; \ > break; \ > default: \ > BUILD_BUG(); \ > } \ > __ret; \ > })
On Mon, Mar 03, 2025 at 10:43:28AM -0500, Yury Norov wrote: > On Mon, Mar 03, 2025 at 10:47:20AM +0800, Kuan-Wei Chiu wrote: > > > > #define parity(val) \ > > > > ({ \ > > > > __auto_type __v = (val); \ > > > > bool __ret; \ > > > > switch (BITS_PER_TYPE(val)) { \ > > > > case 64: \ > > > > __v ^= __v >> 16 >> 16; \ > > > > fallthrough; \ > > > > case 32: \ > > > > __v ^= __v >> 16; \ > > > > fallthrough; \ > > > > case 16: \ > > > > __v ^= __v >> 8; \ > > > > fallthrough; \ > > > > case 8: \ > > > > __v ^= __v >> 4; \ > > > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > > > break; \ > > > > default: \ > > > > BUILD_BUG(); \ > > > > } \ > > > > __ret; \ > > > > }) > > > > > > I'm seeing double-register shifts for 64bit values on 32bit systems. > > > And gcc is doing 64bit double-register maths all the way down. > > > > > > That is fixed by changing the top of the define to > > > #define parity(val) \ > > > ({ \ > > > unsigned int __v = (val); \ > > > bool __ret; \ > > > switch (BITS_PER_TYPE(val)) { \ > > > case 64: \ > > > __v ^= val >> 16 >> 16; \ > > > fallthrough; \ > > > > > > But it's need changing to only expand 'val' once. > > > Perhaps: > > > auto_type _val = (val); > > > u32 __ret = val; > > > and (mostly) s/__v/__ret/g > > > > > I'm happy to make this change, though I'm a bit confused about how much > > we care about the code generated by gcc. So this is the macro expected > > in v3: > > We do care about code generated by any compiler. But we don't spread > hacks here and there just to make GCC happy. This is entirely broken > strategy. Things should work the other way: compiler people should > collect real-life examples and learn from them. > > I'm not happy even with this 'v >> 16 >> 16' hack, I just think that > disabling Wshift-count-overflow is the worse option. Hacking the macro > to optimize parity64() on 32-bit arch case doesn't worth it entirely. > > In your patchset, you have only 3 drivers using parity64(). For each > of them, please in the commit message refer that calling generic > parity() with 64-bit argument may lead to sub-optimal code generation > with a certain compiler against 32-bit arches. If you'll get a > feedback that it's a real problem for somebody, we'll think about > mitigating it. > How about reconsidering using parity8/16/32/64() instead of adding a parity() macro? They allow compiler to generate correct code without any hacks, and each implementation is simple and just one line. Jiri also agreed in the previous thread that we need parity8() in cases like the i3c driver. I think this might be the easiest solution to satisfy most people? Regards, Kuan-Wei
On Mon, 3 Mar 2025 10:15:41 -0500 Yury Norov <yury.norov@gmail.com> wrote: > On Mon, Mar 03, 2025 at 01:29:19AM +0800, Kuan-Wei Chiu wrote: > > Hi Yury, > > > > On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote: > > > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote: > > > > Hi Yury, > > > > > > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote: > > > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote: > > > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and > > > > > > __parityti2() as fallback functions in lib/parity.c. These functions > > > > > > compute the parity of a given integer using a bitwise approach and are > > > > > > marked with __weak, allowing architecture-specific implementations to > > > > > > override them. > > > > > > > > > > > > This patch serves as preparation for using __builtin_parity() by > > > > > > ensuring a fallback mechanism is available when the compiler does not > > > > > > inline the __builtin_parity(). > > > > > > > > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > --- > > > > > > lib/Makefile | 2 +- > > > > > > lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > > create mode 100644 lib/parity.c > > > > > > > > > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > > > index 7bab71e59019..45affad85ee4 100644 > > > > > > --- a/lib/Makefile > > > > > > +++ b/lib/Makefile > > > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > > > > > bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ > > > > > > percpu-refcount.o rhashtable.o base64.o \ > > > > > > once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ > > > > > > - generic-radix-tree.o bitmap-str.o > > > > > > + generic-radix-tree.o bitmap-str.o parity.o > > > > > > obj-y += string_helpers.o > > > > > > obj-y += hexdump.o > > > > > > obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o > > > > > > diff --git a/lib/parity.c b/lib/parity.c > > > > > > new file mode 100644 > > > > > > index 000000000000..a83ff8d96778 > > > > > > --- /dev/null > > > > > > +++ b/lib/parity.c > > > > > > @@ -0,0 +1,48 @@ > > > > > > +// SPDX-License-Identifier: GPL-2.0-only > > > > > > +/* > > > > > > + * lib/parity.c > > > > > > + * > > > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> > > > > > > + * > > > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions. > > > > > > + */ > > > > > > + > > > > > > +#include <linux/export.h> > > > > > > +#include <linux/kernel.h> > > > > > > + > > > > > > +/* > > > > > > + * One explanation of this algorithm: > > > > > > + * https://funloop.org/codex/problem/parity/README.html > > > > > > > > > > I already asked you not to spread this link. Is there any reason to > > > > > ignore it? > > > > > > > > > In v2, this algorithm was removed from bitops.h, so I moved the link > > > > here instead. I'm sorry if it seemed like I ignored your comment. > > > > > > Yes, it is. > > > > > > > In v1, I used the same approach as parity8() because I couldn't justify > > > > the performance impact in a specific driver or subsystem. However, > > > > multiple people commented on using __builtin_parity or an x86 assembly > > > > implementation. I'm not ignoring their feedback-I want to address these > > > > > > Please ask those multiple people: are they ready to maintain all that > > > zoo of macros, weak implementations, arch implementations and stubs > > > for no clear benefit? Performance is always worth it, but again I see > > > not even a hint that the drivers care about performance. You don't > > > measure it, so don't care as well. Right? > > > > > > > comments. Before submitting, I sent an email explaining my current > > > > approach: using David's suggested method along with __builtin_parity, > > > > but no one responded. So, I decided to submit v2 for discussion > > > > instead. > > > > > > For discussion use tag RFC. > > > > > > > > > > > To avoid mistakes in v3, I want to confirm the following changes before > > > > sending it: > > > > > > > > (a) Change the return type from int to bool. > > > > (b) Avoid __builtin_parity and use the same approach as parity8(). > > > > (c) Implement parity16/32/64() as single-line inline functions that > > > > call the next smaller variant after xor. > > > > (d) Add a parity() macro that selects the appropriate parityXX() based > > > > on type size. > > > > (e) Update users to use this parity() macro. > > > > > > > > However, (d) may require a patch affecting multiple subsystems at once > > > > since some places that already include bitops.h have functions named > > > > parity(), causing conflicts. Unless we decide not to add this macro in > > > > the end. > > > > > > > > As for checkpatch.pl warnings, they are mostly pre-existing coding > > > > style issues in this series. I've kept them as-is, but if preferred, > > > > I'm fine with fixing them. > > > > > > Checkpatch only complains on new lines. Particularly this patch should > > > trigger checkpatch warning because it adds a new file but doesn't touch > > > MAINTAINERS. > > > > > For example, the following warning: > > > > ERROR: space required after that ',' (ctx:VxV) > > #84: FILE: drivers/input/joystick/sidewinder.c:368: > > + if (!parity64(GB(0,33))) > > ^ > > > > This issue already existed before this series, and I'm keeping its > > style unchanged for now. > > > > > > If anything is incorrect or if there are concerns, please let me know. > > > > > > > > Regards, > > > > Kuan-Wei > > > > > > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h > > > > index c1cb53cf2f0f..47b7eca8d3b7 100644 > > > > --- a/include/linux/bitops.h > > > > +++ b/include/linux/bitops.h > > > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val) > > > > return (0x6996 >> (val & 0xf)) & 1; > > > > } > > > > > > > > +static inline bool parity16(u16 val) > > > > +{ > > > > + return parity8(val ^ (val >> 8)); > > > > +} > > > > + > > > > +static inline bool parity32(u32 val) > > > > +{ > > > > + return parity16(val ^ (val >> 16)); > > > > +} > > > > + > > > > +static inline bool parity64(u64 val) > > > > +{ > > > > + return parity32(val ^ (val >> 32)); > > > > +} > > > > > > That was discussed between Jiri and me in v2. Fixed types functions > > > are needed only in a few very specific cases. With the exception of > > > I3C driver (which doesn't look good for both Jiri and me), all the > > > drivers have the type of variable passed to the parityXX() matching > > > the actual variable length. It means that fixed-type versions of the > > > parity() are simply not needed. So if we don't need them, please don't > > > introduce it. > > > > > So, I should add the following parity() macro in v3, remove parity8(), > > and update all current parity8() users to use this macro, right? > > If you go with macro, please apply my patch and modify it in-place > with this __auto_type thing and GCC hack. Feel free to add your > co-developed-by, or tested, or whatever. > > > I changed u64 to __auto_type and applied David's suggestion to replace > > the >> 32 with >> 16 >> 16 to avoid compiler warnings. > > > > Regards, > > Kuan-Wei > > > > #define parity(val) \ > > ({ \ > > __auto_type __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= __v >> 16 >> 16; \ > > fallthrough; \ > > This hack should be GCC-only, and well documented. > For clang it should be > __v ^= __v >> 32; \ There is no point doing a conditional - it just obscures things. > > > case 32: \ > > __v ^= __v >> 16; \ > > fallthrough; \ > > case 16: \ > > __v ^= __v >> 8; \ > > fallthrough; \ > > case 8: \ > > __v ^= __v >> 4; \ > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > break; \ > > default: \ > > BUILD_BUG(); \ > > } \ > > __ret; \ > > })
diff --git a/lib/Makefile b/lib/Makefile index 7bab71e59019..45affad85ee4 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ percpu-refcount.o rhashtable.o base64.o \ once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ - generic-radix-tree.o bitmap-str.o + generic-radix-tree.o bitmap-str.o parity.o obj-y += string_helpers.o obj-y += hexdump.o obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o diff --git a/lib/parity.c b/lib/parity.c new file mode 100644 index 000000000000..a83ff8d96778 --- /dev/null +++ b/lib/parity.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * lib/parity.c + * + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com> + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com> + * + * __parity[sdt]i2 can be overridden by linking arch-specific versions. + */ + +#include <linux/export.h> +#include <linux/kernel.h> + +/* + * One explanation of this algorithm: + * https://funloop.org/codex/problem/parity/README.html + */ +int __weak __paritysi2(u32 val); +int __weak __paritysi2(u32 val) +{ + val ^= val >> 16; + val ^= val >> 8; + val ^= val >> 4; + return (0x6996 >> (val & 0xf)) & 1; +} +EXPORT_SYMBOL(__paritysi2); + +int __weak __paritydi2(u64 val); +int __weak __paritydi2(u64 val) +{ + val ^= val >> 32; + val ^= val >> 16; + val ^= val >> 8; + val ^= val >> 4; + return (0x6996 >> (val & 0xf)) & 1; +} +EXPORT_SYMBOL(__paritydi2); + +int __weak __parityti2(u64 val); +int __weak __parityti2(u64 val) +{ + val ^= val >> 32; + val ^= val >> 16; + val ^= val >> 8; + val ^= val >> 4; + return (0x6996 >> (val & 0xf)) & 1; +} +EXPORT_SYMBOL(__parityti2);