diff mbox series

[v2,01/18] lib/parity: Add __builtin_parity() fallback implementations

Message ID 20250301142409.2513835-2-visitorckw@gmail.com (mailing list archive)
State Not Applicable
Headers show
Series Introduce and use generic parity16/32/64 helper | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Kuan-Wei Chiu March 1, 2025, 2:23 p.m. UTC
Add generic C implementations of __paritysi2(), __paritydi2(), and
__parityti2() as fallback functions in lib/parity.c. These functions
compute the parity of a given integer using a bitwise approach and are
marked with __weak, allowing architecture-specific implementations to
override them.

This patch serves as preparation for using __builtin_parity() by
ensuring a fallback mechanism is available when the compiler does not
inline the __builtin_parity().

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 lib/Makefile |  2 +-
 lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 lib/parity.c

Comments

Yury Norov March 2, 2025, 3:10 a.m. UTC | #1
On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:
> Add generic C implementations of __paritysi2(), __paritydi2(), and
> __parityti2() as fallback functions in lib/parity.c. These functions
> compute the parity of a given integer using a bitwise approach and are
> marked with __weak, allowing architecture-specific implementations to
> override them.
> 
> This patch serves as preparation for using __builtin_parity() by
> ensuring a fallback mechanism is available when the compiler does not
> inline the __builtin_parity().
> 
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  lib/Makefile |  2 +-
>  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+), 1 deletion(-)
>  create mode 100644 lib/parity.c
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index 7bab71e59019..45affad85ee4 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
>  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
>  	 percpu-refcount.o rhashtable.o base64.o \
>  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> -	 generic-radix-tree.o bitmap-str.o
> +	 generic-radix-tree.o bitmap-str.o parity.o
>  obj-y += string_helpers.o
>  obj-y += hexdump.o
>  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> diff --git a/lib/parity.c b/lib/parity.c
> new file mode 100644
> index 000000000000..a83ff8d96778
> --- /dev/null
> +++ b/lib/parity.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * lib/parity.c
> + *
> + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> + *
> + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/kernel.h>
> +
> +/*
> + * One explanation of this algorithm:
> + * https://funloop.org/codex/problem/parity/README.html

I already asked you not to spread this link. Is there any reason to
ignore it?

> + */
> +int __weak __paritysi2(u32 val);
> +int __weak __paritysi2(u32 val)
> +{
> +	val ^= val >> 16;
> +	val ^= val >> 8;
> +	val ^= val >> 4;
> +	return (0x6996 >> (val & 0xf)) & 1;
> +}
> +EXPORT_SYMBOL(__paritysi2);
> +
> +int __weak __paritydi2(u64 val);
> +int __weak __paritydi2(u64 val)
> +{
> +	val ^= val >> 32;
> +	val ^= val >> 16;
> +	val ^= val >> 8;
> +	val ^= val >> 4;
> +	return (0x6996 >> (val & 0xf)) & 1;
> +}
> +EXPORT_SYMBOL(__paritydi2);
> +
> +int __weak __parityti2(u64 val);
> +int __weak __parityti2(u64 val)
> +{
> +	val ^= val >> 32;
> +	val ^= val >> 16;
> +	val ^= val >> 8;
> +	val ^= val >> 4;
> +	return (0x6996 >> (val & 0xf)) & 1;
> +}
> +EXPORT_SYMBOL(__parityti2);

OK, it seems I wasn't clear enough on the previous round, so I'll try
to be very straightforward now.

To begin with, the difference between __parityti2 and __paritydi2 
doesn't exist. I'm seriously. I put them side by side, and there's
no difference at all.

Next, this all is clearly an overengineering. You bake all those weak,
const and (ironically, missing) high-efficient arch implementations.
But you show no evidence that:
 - it improves on code generation,
 - the drivers care about parity()'s performance, and
 - show no perf tests at all.

So you end up with +185/-155 LOCs.

Those +30 lines add no new functionality. You copy-paste the same
algorithm again and again in very core kernel files. This is a no-go
for a nice consolidation series.

In the previous round reviewers gave you quite a few nice suggestions.
H. Peter Anvin suggested to switch the function to return a boolean, I
suggested to make it a macro and even sent you a patch, Jiri and David
also spent their time trying to help you, and became ignored.

Nevertheless. NAK for the series. Whatever you end up, if it comes to
v3, please make it simple, avoid code duplication and run checkpatch.

Thanks,
Yury
Kuan-Wei Chiu March 2, 2025, 8:20 a.m. UTC | #2
Hi Yury,

On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:
> On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:
> > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > __parityti2() as fallback functions in lib/parity.c. These functions
> > compute the parity of a given integer using a bitwise approach and are
> > marked with __weak, allowing architecture-specific implementations to
> > override them.
> > 
> > This patch serves as preparation for using __builtin_parity() by
> > ensuring a fallback mechanism is available when the compiler does not
> > inline the __builtin_parity().
> > 
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >  lib/Makefile |  2 +-
> >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 49 insertions(+), 1 deletion(-)
> >  create mode 100644 lib/parity.c
> > 
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 7bab71e59019..45affad85ee4 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> >  	 percpu-refcount.o rhashtable.o base64.o \
> >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > -	 generic-radix-tree.o bitmap-str.o
> > +	 generic-radix-tree.o bitmap-str.o parity.o
> >  obj-y += string_helpers.o
> >  obj-y += hexdump.o
> >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > diff --git a/lib/parity.c b/lib/parity.c
> > new file mode 100644
> > index 000000000000..a83ff8d96778
> > --- /dev/null
> > +++ b/lib/parity.c
> > @@ -0,0 +1,48 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * lib/parity.c
> > + *
> > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > + *
> > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > + */
> > +
> > +#include <linux/export.h>
> > +#include <linux/kernel.h>
> > +
> > +/*
> > + * One explanation of this algorithm:
> > + * https://funloop.org/codex/problem/parity/README.html
> 
> I already asked you not to spread this link. Is there any reason to
> ignore it?
> 
In v2, this algorithm was removed from bitops.h, so I moved the link
here instead. I'm sorry if it seemed like I ignored your comment.

> > + */
> > +int __weak __paritysi2(u32 val);
> > +int __weak __paritysi2(u32 val)
> > +{
> > +	val ^= val >> 16;
> > +	val ^= val >> 8;
> > +	val ^= val >> 4;
> > +	return (0x6996 >> (val & 0xf)) & 1;
> > +}
> > +EXPORT_SYMBOL(__paritysi2);
> > +
> > +int __weak __paritydi2(u64 val);
> > +int __weak __paritydi2(u64 val)
> > +{
> > +	val ^= val >> 32;
> > +	val ^= val >> 16;
> > +	val ^= val >> 8;
> > +	val ^= val >> 4;
> > +	return (0x6996 >> (val & 0xf)) & 1;
> > +}
> > +EXPORT_SYMBOL(__paritydi2);
> > +
> > +int __weak __parityti2(u64 val);
> > +int __weak __parityti2(u64 val)
> > +{
> > +	val ^= val >> 32;
> > +	val ^= val >> 16;
> > +	val ^= val >> 8;
> > +	val ^= val >> 4;
> > +	return (0x6996 >> (val & 0xf)) & 1;
> > +}
> > +EXPORT_SYMBOL(__parityti2);
> 
> OK, it seems I wasn't clear enough on the previous round, so I'll try
> to be very straightforward now.
> 
> To begin with, the difference between __parityti2 and __paritydi2 
> doesn't exist. I'm seriously. I put them side by side, and there's
> no difference at all.
> 
> Next, this all is clearly an overengineering. You bake all those weak,
> const and (ironically, missing) high-efficient arch implementations.
> But you show no evidence that:
>  - it improves on code generation,
>  - the drivers care about parity()'s performance, and
>  - show no perf tests at all.
> 
> So you end up with +185/-155 LOCs.
> 
> Those +30 lines add no new functionality. You copy-paste the same
> algorithm again and again in very core kernel files. This is a no-go
> for a nice consolidation series.
> 
> In the previous round reviewers gave you quite a few nice suggestions.
> H. Peter Anvin suggested to switch the function to return a boolean, I
> suggested to make it a macro and even sent you a patch, Jiri and David
> also spent their time trying to help you, and became ignored.
> 
> Nevertheless. NAK for the series. Whatever you end up, if it comes to
> v3, please make it simple, avoid code duplication and run checkpatch.
> 
In v1, I used the same approach as parity8() because I couldn't justify
the performance impact in a specific driver or subsystem. However,
multiple people commented on using __builtin_parity or an x86 assembly
implementation. I'm not ignoring their feedback-I want to address these
comments. Before submitting, I sent an email explaining my current
approach: using David's suggested method along with __builtin_parity,
but no one responded. So, I decided to submit v2 for discussion
instead.

To avoid mistakes in v3, I want to confirm the following changes before
sending it:

(a) Change the return type from int to bool.
(b) Avoid __builtin_parity and use the same approach as parity8().
(c) Implement parity16/32/64() as single-line inline functions that
    call the next smaller variant after xor.
(d) Add a parity() macro that selects the appropriate parityXX() based
    on type size.
(e) Update users to use this parity() macro.

However, (d) may require a patch affecting multiple subsystems at once
since some places that already include bitops.h have functions named
parity(), causing conflicts. Unless we decide not to add this macro in
the end.

As for checkpatch.pl warnings, they are mostly pre-existing coding
style issues in this series. I've kept them as-is, but if preferred,
I'm fine with fixing them.

If anything is incorrect or if there are concerns, please let me know.

Regards,
Kuan-Wei

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index c1cb53cf2f0f..47b7eca8d3b7 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -260,6 +260,43 @@ static inline int parity8(u8 val)
 	return (0x6996 >> (val & 0xf)) & 1;
 }

+static inline bool parity16(u16 val)
+{
+	return parity8(val ^ (val >> 8));
+}
+
+static inline bool parity32(u32 val)
+{
+	return parity16(val ^ (val >> 16));
+}
+
+static inline bool parity64(u64 val)
+{
+	return parity32(val ^ (val >> 32));
+}
+
+#define parity(val)			\
+({					\
+	bool __ret;			\
+	switch (BITS_PER_TYPE(val)) {	\
+	case 64:			\
+		__ret = parity64(val);	\
+		break;			\
+	case 32:			\
+		__ret = parity32(val);	\
+		break;			\
+	case 16:			\
+		__ret = parity16(val);	\
+		break;			\
+	case 8:				\
+		__ret = parity8(val);	\
+		break;			\
+	default:			\
+		BUILD_BUG();		\
+	}				\
+	__ret;				\
+})
+
 /**
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word
Yury Norov March 2, 2025, 4:02 p.m. UTC | #3
On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
> 
> On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:
> > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:
> > > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > > __parityti2() as fallback functions in lib/parity.c. These functions
> > > compute the parity of a given integer using a bitwise approach and are
> > > marked with __weak, allowing architecture-specific implementations to
> > > override them.
> > > 
> > > This patch serves as preparation for using __builtin_parity() by
> > > ensuring a fallback mechanism is available when the compiler does not
> > > inline the __builtin_parity().
> > > 
> > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > ---
> > >  lib/Makefile |  2 +-
> > >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 49 insertions(+), 1 deletion(-)
> > >  create mode 100644 lib/parity.c
> > > 
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index 7bab71e59019..45affad85ee4 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> > >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> > >  	 percpu-refcount.o rhashtable.o base64.o \
> > >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > > -	 generic-radix-tree.o bitmap-str.o
> > > +	 generic-radix-tree.o bitmap-str.o parity.o
> > >  obj-y += string_helpers.o
> > >  obj-y += hexdump.o
> > >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > > diff --git a/lib/parity.c b/lib/parity.c
> > > new file mode 100644
> > > index 000000000000..a83ff8d96778
> > > --- /dev/null
> > > +++ b/lib/parity.c
> > > @@ -0,0 +1,48 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * lib/parity.c
> > > + *
> > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > > + *
> > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > > + */
> > > +
> > > +#include <linux/export.h>
> > > +#include <linux/kernel.h>
> > > +
> > > +/*
> > > + * One explanation of this algorithm:
> > > + * https://funloop.org/codex/problem/parity/README.html
> > 
> > I already asked you not to spread this link. Is there any reason to
> > ignore it?
> > 
> In v2, this algorithm was removed from bitops.h, so I moved the link
> here instead. I'm sorry if it seemed like I ignored your comment.

Yes, it is.
 
> In v1, I used the same approach as parity8() because I couldn't justify
> the performance impact in a specific driver or subsystem. However,
> multiple people commented on using __builtin_parity or an x86 assembly
> implementation. I'm not ignoring their feedback-I want to address these

Please ask those multiple people: are they ready to maintain all that
zoo of macros, weak implementations, arch implementations and stubs
for no clear benefit? Performance is always worth it, but again I see
not even a hint that the drivers care about performance. You don't
measure it, so don't care as well. Right?

> comments. Before submitting, I sent an email explaining my current
> approach: using David's suggested method along with __builtin_parity,
> but no one responded. So, I decided to submit v2 for discussion
> instead.

For discussion use tag RFC.

> 
> To avoid mistakes in v3, I want to confirm the following changes before
> sending it:
> 
> (a) Change the return type from int to bool.
> (b) Avoid __builtin_parity and use the same approach as parity8().
> (c) Implement parity16/32/64() as single-line inline functions that
>     call the next smaller variant after xor.
> (d) Add a parity() macro that selects the appropriate parityXX() based
>     on type size.
> (e) Update users to use this parity() macro.
> 
> However, (d) may require a patch affecting multiple subsystems at once
> since some places that already include bitops.h have functions named
> parity(), causing conflicts. Unless we decide not to add this macro in
> the end.
> 
> As for checkpatch.pl warnings, they are mostly pre-existing coding
> style issues in this series. I've kept them as-is, but if preferred,
> I'm fine with fixing them.

Checkpatch only complains on new lines. Particularly this patch should
trigger checkpatch warning because it adds a new file but doesn't touch
MAINTAINERS. 

> If anything is incorrect or if there are concerns, please let me know.
> 
> Regards,
> Kuan-Wei
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index c1cb53cf2f0f..47b7eca8d3b7 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -260,6 +260,43 @@ static inline int parity8(u8 val)
>  	return (0x6996 >> (val & 0xf)) & 1;
>  }
> 
> +static inline bool parity16(u16 val)
> +{
> +	return parity8(val ^ (val >> 8));
> +}
> +
> +static inline bool parity32(u32 val)
> +{
> +	return parity16(val ^ (val >> 16));
> +}
> +
> +static inline bool parity64(u64 val)
> +{
> +	return parity32(val ^ (val >> 32));
> +}

That was discussed between Jiri and me in v2. Fixed types functions
are needed only in a few very specific cases. With the exception of
I3C driver (which doesn't look good for both Jiri and me), all the
drivers have the type of variable passed to the parityXX() matching 
the actual variable length. It means that fixed-type versions of the
parity() are simply not needed. So if we don't need them, please don't
introduce it.

> +#define parity(val)			\
> +({					\
> +	bool __ret;			\
> +	switch (BITS_PER_TYPE(val)) {	\
> +	case 64:			\
> +		__ret = parity64(val);	\
> +		break;			\
> +	case 32:			\
> +		__ret = parity32(val);	\
> +		break;			\
> +	case 16:			\
> +		__ret = parity16(val);	\
> +		break;			\
> +	case 8:				\
> +		__ret = parity8(val);	\
> +		break;			\
> +	default:			\
> +		BUILD_BUG();		\
> +	}				\
> +	__ret;				\
> +})
> +
>  /**
>   * __ffs64 - find first set bit in a 64 bit word
>   * @word: The 64 bit word
Kuan-Wei Chiu March 2, 2025, 5:29 p.m. UTC | #4
Hi Yury,

On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote:
> On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote:
> > Hi Yury,
> > 
> > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:
> > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:
> > > > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > > > __parityti2() as fallback functions in lib/parity.c. These functions
> > > > compute the parity of a given integer using a bitwise approach and are
> > > > marked with __weak, allowing architecture-specific implementations to
> > > > override them.
> > > > 
> > > > This patch serves as preparation for using __builtin_parity() by
> > > > ensuring a fallback mechanism is available when the compiler does not
> > > > inline the __builtin_parity().
> > > > 
> > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > ---
> > > >  lib/Makefile |  2 +-
> > > >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 49 insertions(+), 1 deletion(-)
> > > >  create mode 100644 lib/parity.c
> > > > 
> > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > index 7bab71e59019..45affad85ee4 100644
> > > > --- a/lib/Makefile
> > > > +++ b/lib/Makefile
> > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> > > >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> > > >  	 percpu-refcount.o rhashtable.o base64.o \
> > > >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > > > -	 generic-radix-tree.o bitmap-str.o
> > > > +	 generic-radix-tree.o bitmap-str.o parity.o
> > > >  obj-y += string_helpers.o
> > > >  obj-y += hexdump.o
> > > >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > > > diff --git a/lib/parity.c b/lib/parity.c
> > > > new file mode 100644
> > > > index 000000000000..a83ff8d96778
> > > > --- /dev/null
> > > > +++ b/lib/parity.c
> > > > @@ -0,0 +1,48 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > +/*
> > > > + * lib/parity.c
> > > > + *
> > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > > > + *
> > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > > > + */
> > > > +
> > > > +#include <linux/export.h>
> > > > +#include <linux/kernel.h>
> > > > +
> > > > +/*
> > > > + * One explanation of this algorithm:
> > > > + * https://funloop.org/codex/problem/parity/README.html
> > > 
> > > I already asked you not to spread this link. Is there any reason to
> > > ignore it?
> > > 
> > In v2, this algorithm was removed from bitops.h, so I moved the link
> > here instead. I'm sorry if it seemed like I ignored your comment.
> 
> Yes, it is.
>  
> > In v1, I used the same approach as parity8() because I couldn't justify
> > the performance impact in a specific driver or subsystem. However,
> > multiple people commented on using __builtin_parity or an x86 assembly
> > implementation. I'm not ignoring their feedback-I want to address these
> 
> Please ask those multiple people: are they ready to maintain all that
> zoo of macros, weak implementations, arch implementations and stubs
> for no clear benefit? Performance is always worth it, but again I see
> not even a hint that the drivers care about performance. You don't
> measure it, so don't care as well. Right?
> 
> > comments. Before submitting, I sent an email explaining my current
> > approach: using David's suggested method along with __builtin_parity,
> > but no one responded. So, I decided to submit v2 for discussion
> > instead.
> 
> For discussion use tag RFC.
> 
> > 
> > To avoid mistakes in v3, I want to confirm the following changes before
> > sending it:
> > 
> > (a) Change the return type from int to bool.
> > (b) Avoid __builtin_parity and use the same approach as parity8().
> > (c) Implement parity16/32/64() as single-line inline functions that
> >     call the next smaller variant after xor.
> > (d) Add a parity() macro that selects the appropriate parityXX() based
> >     on type size.
> > (e) Update users to use this parity() macro.
> > 
> > However, (d) may require a patch affecting multiple subsystems at once
> > since some places that already include bitops.h have functions named
> > parity(), causing conflicts. Unless we decide not to add this macro in
> > the end.
> > 
> > As for checkpatch.pl warnings, they are mostly pre-existing coding
> > style issues in this series. I've kept them as-is, but if preferred,
> > I'm fine with fixing them.
> 
> Checkpatch only complains on new lines. Particularly this patch should
> trigger checkpatch warning because it adds a new file but doesn't touch
> MAINTAINERS. 
> 
For example, the following warning:

ERROR: space required after that ',' (ctx:VxV)
#84: FILE: drivers/input/joystick/sidewinder.c:368:
+                       if (!parity64(GB(0,33)))
                                          ^

This issue already existed before this series, and I'm keeping its
style unchanged for now.

> > If anything is incorrect or if there are concerns, please let me know.
> > 
> > Regards,
> > Kuan-Wei
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index c1cb53cf2f0f..47b7eca8d3b7 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -260,6 +260,43 @@ static inline int parity8(u8 val)
> >  	return (0x6996 >> (val & 0xf)) & 1;
> >  }
> > 
> > +static inline bool parity16(u16 val)
> > +{
> > +	return parity8(val ^ (val >> 8));
> > +}
> > +
> > +static inline bool parity32(u32 val)
> > +{
> > +	return parity16(val ^ (val >> 16));
> > +}
> > +
> > +static inline bool parity64(u64 val)
> > +{
> > +	return parity32(val ^ (val >> 32));
> > +}
> 
> That was discussed between Jiri and me in v2. Fixed types functions
> are needed only in a few very specific cases. With the exception of
> I3C driver (which doesn't look good for both Jiri and me), all the
> drivers have the type of variable passed to the parityXX() matching 
> the actual variable length. It means that fixed-type versions of the
> parity() are simply not needed. So if we don't need them, please don't
> introduce it.
>
So, I should add the following parity() macro in v3, remove parity8(),
and update all current parity8() users to use this macro, right?

I changed u64 to __auto_type and applied David's suggestion to replace
the >> 32 with >> 16 >> 16 to avoid compiler warnings.

Regards,
Kuan-Wei

#define parity(val)					\
({							\
	__auto_type __v = (val);			\
	bool __ret;					\
	switch (BITS_PER_TYPE(val)) {			\
	case 64:					\
		__v ^= __v >> 16 >> 16;			\
		fallthrough;				\
	case 32:					\
		__v ^= __v >> 16;			\
		fallthrough;				\
	case 16:					\
		__v ^= __v >> 8;			\
		fallthrough;				\
	case 8:						\
		__v ^= __v >> 4;			\
		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
		break;					\
	default:					\
		BUILD_BUG();				\
	}						\
	__ret;						\
})
David Laight March 2, 2025, 7:09 p.m. UTC | #5
On Mon, 3 Mar 2025 01:29:19 +0800
Kuan-Wei Chiu <visitorckw@gmail.com> wrote:

> Hi Yury,
> 
> On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote:
> > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote:  
> > > Hi Yury,
> > > 
> > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:  
> > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:  
> > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > > > > __parityti2() as fallback functions in lib/parity.c. These functions
> > > > > compute the parity of a given integer using a bitwise approach and are
> > > > > marked with __weak, allowing architecture-specific implementations to
> > > > > override them.
> > > > > 
> > > > > This patch serves as preparation for using __builtin_parity() by
> > > > > ensuring a fallback mechanism is available when the compiler does not
> > > > > inline the __builtin_parity().
> > > > > 
> > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > ---
> > > > >  lib/Makefile |  2 +-
> > > > >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 49 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 lib/parity.c
> > > > > 
> > > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > > index 7bab71e59019..45affad85ee4 100644
> > > > > --- a/lib/Makefile
> > > > > +++ b/lib/Makefile
> > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> > > > >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> > > > >  	 percpu-refcount.o rhashtable.o base64.o \
> > > > >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > > > > -	 generic-radix-tree.o bitmap-str.o
> > > > > +	 generic-radix-tree.o bitmap-str.o parity.o
> > > > >  obj-y += string_helpers.o
> > > > >  obj-y += hexdump.o
> > > > >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > > > > diff --git a/lib/parity.c b/lib/parity.c
> > > > > new file mode 100644
> > > > > index 000000000000..a83ff8d96778
> > > > > --- /dev/null
> > > > > +++ b/lib/parity.c
> > > > > @@ -0,0 +1,48 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > +/*
> > > > > + * lib/parity.c
> > > > > + *
> > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > + *
> > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > > > > + */
> > > > > +
> > > > > +#include <linux/export.h>
> > > > > +#include <linux/kernel.h>
> > > > > +
> > > > > +/*
> > > > > + * One explanation of this algorithm:
> > > > > + * https://funloop.org/codex/problem/parity/README.html  
> > > > 
> > > > I already asked you not to spread this link. Is there any reason to
> > > > ignore it?
> > > >   
> > > In v2, this algorithm was removed from bitops.h, so I moved the link
> > > here instead. I'm sorry if it seemed like I ignored your comment.  
> > 
> > Yes, it is.
> >    
> > > In v1, I used the same approach as parity8() because I couldn't justify
> > > the performance impact in a specific driver or subsystem. However,
> > > multiple people commented on using __builtin_parity or an x86 assembly
> > > implementation. I'm not ignoring their feedback-I want to address these  
> > 
> > Please ask those multiple people: are they ready to maintain all that
> > zoo of macros, weak implementations, arch implementations and stubs
> > for no clear benefit? Performance is always worth it, but again I see
> > not even a hint that the drivers care about performance. You don't
> > measure it, so don't care as well. Right?
> >   
> > > comments. Before submitting, I sent an email explaining my current
> > > approach: using David's suggested method along with __builtin_parity,
> > > but no one responded. So, I decided to submit v2 for discussion
> > > instead.  
> > 
> > For discussion use tag RFC.
> >   
> > > 
> > > To avoid mistakes in v3, I want to confirm the following changes before
> > > sending it:
> > > 
> > > (a) Change the return type from int to bool.
> > > (b) Avoid __builtin_parity and use the same approach as parity8().
> > > (c) Implement parity16/32/64() as single-line inline functions that
> > >     call the next smaller variant after xor.
> > > (d) Add a parity() macro that selects the appropriate parityXX() based
> > >     on type size.
> > > (e) Update users to use this parity() macro.
> > > 
> > > However, (d) may require a patch affecting multiple subsystems at once
> > > since some places that already include bitops.h have functions named
> > > parity(), causing conflicts. Unless we decide not to add this macro in
> > > the end.
> > > 
> > > As for checkpatch.pl warnings, they are mostly pre-existing coding
> > > style issues in this series. I've kept them as-is, but if preferred,
> > > I'm fine with fixing them.  
> > 
> > Checkpatch only complains on new lines. Particularly this patch should
> > trigger checkpatch warning because it adds a new file but doesn't touch
> > MAINTAINERS. 
> >   
> For example, the following warning:
> 
> ERROR: space required after that ',' (ctx:VxV)
> #84: FILE: drivers/input/joystick/sidewinder.c:368:
> +                       if (!parity64(GB(0,33)))
>                                           ^
> 
> This issue already existed before this series, and I'm keeping its
> style unchanged for now.
> 
> > > If anything is incorrect or if there are concerns, please let me know.
> > > 
> > > Regards,
> > > Kuan-Wei
> > > 
> > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > > index c1cb53cf2f0f..47b7eca8d3b7 100644
> > > --- a/include/linux/bitops.h
> > > +++ b/include/linux/bitops.h
> > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val)
> > >  	return (0x6996 >> (val & 0xf)) & 1;
> > >  }
> > > 
> > > +static inline bool parity16(u16 val)
> > > +{
> > > +	return parity8(val ^ (val >> 8));
> > > +}
> > > +
> > > +static inline bool parity32(u32 val)
> > > +{
> > > +	return parity16(val ^ (val >> 16));
> > > +}
> > > +
> > > +static inline bool parity64(u64 val)
> > > +{
> > > +	return parity32(val ^ (val >> 32));
> > > +}  
> > 
> > That was discussed between Jiri and me in v2. Fixed types functions
> > are needed only in a few very specific cases. With the exception of
> > I3C driver (which doesn't look good for both Jiri and me), all the
> > drivers have the type of variable passed to the parityXX() matching 
> > the actual variable length. It means that fixed-type versions of the
> > parity() are simply not needed. So if we don't need them, please don't
> > introduce it.
> >  
> So, I should add the following parity() macro in v3, remove parity8(),
> and update all current parity8() users to use this macro, right?
> 
> I changed u64 to __auto_type and applied David's suggestion to replace
> the >> 32 with >> 16 >> 16 to avoid compiler warnings.
> 
> Regards,
> Kuan-Wei
> 
> #define parity(val)					\
> ({							\
> 	__auto_type __v = (val);			\
> 	bool __ret;					\
> 	switch (BITS_PER_TYPE(val)) {			\
> 	case 64:					\
> 		__v ^= __v >> 16 >> 16;			\
> 		fallthrough;				\
> 	case 32:					\
> 		__v ^= __v >> 16;			\
> 		fallthrough;				\
> 	case 16:					\
> 		__v ^= __v >> 8;			\
> 		fallthrough;				\
> 	case 8:						\
> 		__v ^= __v >> 4;			\
> 		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> 		break;					\
> 	default:					\
> 		BUILD_BUG();				\
> 	}						\
> 	__ret;						\
> })

I'm seeing double-register shifts for 64bit values on 32bit systems.
And gcc is doing 64bit double-register maths all the way down.

That is fixed by changing the top of the define to
#define parity(val)					\
({							\
	unsigned int __v = (val);			\
	bool __ret;					\
	switch (BITS_PER_TYPE(val)) {			\
	case 64:					\
		__v ^= val >> 16 >> 16;			\
		fallthrough;				\

But it's need changing to only expand 'val' once.
Perhaps:
	auto_type _val = (val);
	u32 __ret = val;
and (mostly) s/__v/__ret/g

	David
Kuan-Wei Chiu March 3, 2025, 2:47 a.m. UTC | #6
On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote:
> On Mon, 3 Mar 2025 01:29:19 +0800
> Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> 
> > Hi Yury,
> > 
> > On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote:
> > > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote:  
> > > > Hi Yury,
> > > > 
> > > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:  
> > > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:  
> > > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > > > > > __parityti2() as fallback functions in lib/parity.c. These functions
> > > > > > compute the parity of a given integer using a bitwise approach and are
> > > > > > marked with __weak, allowing architecture-specific implementations to
> > > > > > override them.
> > > > > > 
> > > > > > This patch serves as preparation for using __builtin_parity() by
> > > > > > ensuring a fallback mechanism is available when the compiler does not
> > > > > > inline the __builtin_parity().
> > > > > > 
> > > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > > ---
> > > > > >  lib/Makefile |  2 +-
> > > > > >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 49 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 lib/parity.c
> > > > > > 
> > > > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > > > index 7bab71e59019..45affad85ee4 100644
> > > > > > --- a/lib/Makefile
> > > > > > +++ b/lib/Makefile
> > > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> > > > > >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> > > > > >  	 percpu-refcount.o rhashtable.o base64.o \
> > > > > >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > > > > > -	 generic-radix-tree.o bitmap-str.o
> > > > > > +	 generic-radix-tree.o bitmap-str.o parity.o
> > > > > >  obj-y += string_helpers.o
> > > > > >  obj-y += hexdump.o
> > > > > >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > > > > > diff --git a/lib/parity.c b/lib/parity.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..a83ff8d96778
> > > > > > --- /dev/null
> > > > > > +++ b/lib/parity.c
> > > > > > @@ -0,0 +1,48 @@
> > > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > > +/*
> > > > > > + * lib/parity.c
> > > > > > + *
> > > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > > + *
> > > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > > > > > + */
> > > > > > +
> > > > > > +#include <linux/export.h>
> > > > > > +#include <linux/kernel.h>
> > > > > > +
> > > > > > +/*
> > > > > > + * One explanation of this algorithm:
> > > > > > + * https://funloop.org/codex/problem/parity/README.html  
> > > > > 
> > > > > I already asked you not to spread this link. Is there any reason to
> > > > > ignore it?
> > > > >   
> > > > In v2, this algorithm was removed from bitops.h, so I moved the link
> > > > here instead. I'm sorry if it seemed like I ignored your comment.  
> > > 
> > > Yes, it is.
> > >    
> > > > In v1, I used the same approach as parity8() because I couldn't justify
> > > > the performance impact in a specific driver or subsystem. However,
> > > > multiple people commented on using __builtin_parity or an x86 assembly
> > > > implementation. I'm not ignoring their feedback-I want to address these  
> > > 
> > > Please ask those multiple people: are they ready to maintain all that
> > > zoo of macros, weak implementations, arch implementations and stubs
> > > for no clear benefit? Performance is always worth it, but again I see
> > > not even a hint that the drivers care about performance. You don't
> > > measure it, so don't care as well. Right?
> > >   
> > > > comments. Before submitting, I sent an email explaining my current
> > > > approach: using David's suggested method along with __builtin_parity,
> > > > but no one responded. So, I decided to submit v2 for discussion
> > > > instead.  
> > > 
> > > For discussion use tag RFC.
> > >   
> > > > 
> > > > To avoid mistakes in v3, I want to confirm the following changes before
> > > > sending it:
> > > > 
> > > > (a) Change the return type from int to bool.
> > > > (b) Avoid __builtin_parity and use the same approach as parity8().
> > > > (c) Implement parity16/32/64() as single-line inline functions that
> > > >     call the next smaller variant after xor.
> > > > (d) Add a parity() macro that selects the appropriate parityXX() based
> > > >     on type size.
> > > > (e) Update users to use this parity() macro.
> > > > 
> > > > However, (d) may require a patch affecting multiple subsystems at once
> > > > since some places that already include bitops.h have functions named
> > > > parity(), causing conflicts. Unless we decide not to add this macro in
> > > > the end.
> > > > 
> > > > As for checkpatch.pl warnings, they are mostly pre-existing coding
> > > > style issues in this series. I've kept them as-is, but if preferred,
> > > > I'm fine with fixing them.  
> > > 
> > > Checkpatch only complains on new lines. Particularly this patch should
> > > trigger checkpatch warning because it adds a new file but doesn't touch
> > > MAINTAINERS. 
> > >   
> > For example, the following warning:
> > 
> > ERROR: space required after that ',' (ctx:VxV)
> > #84: FILE: drivers/input/joystick/sidewinder.c:368:
> > +                       if (!parity64(GB(0,33)))
> >                                           ^
> > 
> > This issue already existed before this series, and I'm keeping its
> > style unchanged for now.
> > 
> > > > If anything is incorrect or if there are concerns, please let me know.
> > > > 
> > > > Regards,
> > > > Kuan-Wei
> > > > 
> > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > > > index c1cb53cf2f0f..47b7eca8d3b7 100644
> > > > --- a/include/linux/bitops.h
> > > > +++ b/include/linux/bitops.h
> > > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val)
> > > >  	return (0x6996 >> (val & 0xf)) & 1;
> > > >  }
> > > > 
> > > > +static inline bool parity16(u16 val)
> > > > +{
> > > > +	return parity8(val ^ (val >> 8));
> > > > +}
> > > > +
> > > > +static inline bool parity32(u32 val)
> > > > +{
> > > > +	return parity16(val ^ (val >> 16));
> > > > +}
> > > > +
> > > > +static inline bool parity64(u64 val)
> > > > +{
> > > > +	return parity32(val ^ (val >> 32));
> > > > +}  
> > > 
> > > That was discussed between Jiri and me in v2. Fixed types functions
> > > are needed only in a few very specific cases. With the exception of
> > > I3C driver (which doesn't look good for both Jiri and me), all the
> > > drivers have the type of variable passed to the parityXX() matching 
> > > the actual variable length. It means that fixed-type versions of the
> > > parity() are simply not needed. So if we don't need them, please don't
> > > introduce it.
> > >  
> > So, I should add the following parity() macro in v3, remove parity8(),
> > and update all current parity8() users to use this macro, right?
> > 
> > I changed u64 to __auto_type and applied David's suggestion to replace
> > the >> 32 with >> 16 >> 16 to avoid compiler warnings.
> > 
> > Regards,
> > Kuan-Wei
> > 
> > #define parity(val)					\
> > ({							\
> > 	__auto_type __v = (val);			\
> > 	bool __ret;					\
> > 	switch (BITS_PER_TYPE(val)) {			\
> > 	case 64:					\
> > 		__v ^= __v >> 16 >> 16;			\
> > 		fallthrough;				\
> > 	case 32:					\
> > 		__v ^= __v >> 16;			\
> > 		fallthrough;				\
> > 	case 16:					\
> > 		__v ^= __v >> 8;			\
> > 		fallthrough;				\
> > 	case 8:						\
> > 		__v ^= __v >> 4;			\
> > 		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > 		break;					\
> > 	default:					\
> > 		BUILD_BUG();				\
> > 	}						\
> > 	__ret;						\
> > })
> 
> I'm seeing double-register shifts for 64bit values on 32bit systems.
> And gcc is doing 64bit double-register maths all the way down.
> 
> That is fixed by changing the top of the define to
> #define parity(val)					\
> ({							\
> 	unsigned int __v = (val);			\
> 	bool __ret;					\
> 	switch (BITS_PER_TYPE(val)) {			\
> 	case 64:					\
> 		__v ^= val >> 16 >> 16;			\
> 		fallthrough;				\
> 
> But it's need changing to only expand 'val' once.
> Perhaps:
> 	auto_type _val = (val);
> 	u32 __ret = val;
> and (mostly) s/__v/__ret/g
>
I'm happy to make this change, though I'm a bit confused about how much
we care about the code generated by gcc. So this is the macro expected
in v3:

#define parity(val)					\
({							\
	__auto_type __v = (val);			\
	u32 __ret = val;				\
	switch (BITS_PER_TYPE(val)) {			\
	case 64:					\
                __ret ^= __v >> 16 >> 16;		\
		fallthrough;				\
	case 32:					\
		__ret ^= __ret >> 16;			\
		fallthrough;				\
	case 16:					\
		__ret ^= __ret >> 8;			\
		fallthrough;				\
	case 8:						\
		__ret ^= __ret >> 4;			\
		__ret = (0x6996 >> (__ret & 0xf)) & 1;	\
		break;					\
	default:					\
		BUILD_BUG();				\
	}						\
	__ret;						\
})
David Laight March 3, 2025, 12:41 p.m. UTC | #7
On Mon, 3 Mar 2025 10:47:20 +0800
Kuan-Wei Chiu <visitorckw@gmail.com> wrote:

> On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote:
> > On Mon, 3 Mar 2025 01:29:19 +0800
> > Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> >   
> > > Hi Yury,
> > > 
...
> > > #define parity(val)					\
> > > ({							\
> > > 	__auto_type __v = (val);			\
> > > 	bool __ret;					\
> > > 	switch (BITS_PER_TYPE(val)) {			\
> > > 	case 64:					\
> > > 		__v ^= __v >> 16 >> 16;			\
> > > 		fallthrough;				\
> > > 	case 32:					\
> > > 		__v ^= __v >> 16;			\
> > > 		fallthrough;				\
> > > 	case 16:					\
> > > 		__v ^= __v >> 8;			\
> > > 		fallthrough;				\
> > > 	case 8:						\
> > > 		__v ^= __v >> 4;			\
> > > 		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > > 		break;					\
> > > 	default:					\
> > > 		BUILD_BUG();				\
> > > 	}						\
> > > 	__ret;						\
> > > })  
> > 
> > I'm seeing double-register shifts for 64bit values on 32bit systems.
> > And gcc is doing 64bit double-register maths all the way down.
> > 
> > That is fixed by changing the top of the define to
> > #define parity(val)					\
> > ({							\
> > 	unsigned int __v = (val);			\
> > 	bool __ret;					\
> > 	switch (BITS_PER_TYPE(val)) {			\
> > 	case 64:					\
> > 		__v ^= val >> 16 >> 16;			\
> > 		fallthrough;				\
> > 
> > But it's need changing to only expand 'val' once.
> > Perhaps:
> > 	auto_type _val = (val);
> > 	u32 __ret = val;
> > and (mostly) s/__v/__ret/g
> >  
> I'm happy to make this change, though I'm a bit confused about how much
> we care about the code generated by gcc. So this is the macro expected
> in v3:

There is 'good', 'bad' and 'ugly' - it was in the 'bad' to 'ugly' area.

> 
> #define parity(val)					\
> ({							\
> 	__auto_type __v = (val);			\
> 	u32 __ret = val;				\
> 	switch (BITS_PER_TYPE(val)) {			\
> 	case 64:					\
>                 __ret ^= __v >> 16 >> 16;		\
> 		fallthrough;				\
> 	case 32:					\
> 		__ret ^= __ret >> 16;			\
> 		fallthrough;				\
> 	case 16:					\
> 		__ret ^= __ret >> 8;			\
> 		fallthrough;				\
> 	case 8:						\
> 		__ret ^= __ret >> 4;			\
> 		__ret = (0x6996 >> (__ret & 0xf)) & 1;	\
> 		break;					\
> 	default:					\
> 		BUILD_BUG();				\
> 	}						\
> 	__ret;						\
> })

That looks like it will avoid double-register shifts on 32bit archs.
arm64 can do slightly better (a couple of instructions) because of its
barrel shifter.
x86 can do a lot better because of the cpu 'parity' flag.
But maybe it is never used anywhere that really matters.

	David
Yury Norov March 3, 2025, 3:15 p.m. UTC | #8
On Mon, Mar 03, 2025 at 01:29:19AM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
> 
> On Sun, Mar 02, 2025 at 11:02:12AM -0500, Yury Norov wrote:
> > On Sun, Mar 02, 2025 at 04:20:02PM +0800, Kuan-Wei Chiu wrote:
> > > Hi Yury,
> > > 
> > > On Sat, Mar 01, 2025 at 10:10:20PM -0500, Yury Norov wrote:
> > > > On Sat, Mar 01, 2025 at 10:23:52PM +0800, Kuan-Wei Chiu wrote:
> > > > > Add generic C implementations of __paritysi2(), __paritydi2(), and
> > > > > __parityti2() as fallback functions in lib/parity.c. These functions
> > > > > compute the parity of a given integer using a bitwise approach and are
> > > > > marked with __weak, allowing architecture-specific implementations to
> > > > > override them.
> > > > > 
> > > > > This patch serves as preparation for using __builtin_parity() by
> > > > > ensuring a fallback mechanism is available when the compiler does not
> > > > > inline the __builtin_parity().
> > > > > 
> > > > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > ---
> > > > >  lib/Makefile |  2 +-
> > > > >  lib/parity.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 49 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 lib/parity.c
> > > > > 
> > > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > > index 7bab71e59019..45affad85ee4 100644
> > > > > --- a/lib/Makefile
> > > > > +++ b/lib/Makefile
> > > > > @@ -51,7 +51,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> > > > >  	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
> > > > >  	 percpu-refcount.o rhashtable.o base64.o \
> > > > >  	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
> > > > > -	 generic-radix-tree.o bitmap-str.o
> > > > > +	 generic-radix-tree.o bitmap-str.o parity.o
> > > > >  obj-y += string_helpers.o
> > > > >  obj-y += hexdump.o
> > > > >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > > > > diff --git a/lib/parity.c b/lib/parity.c
> > > > > new file mode 100644
> > > > > index 000000000000..a83ff8d96778
> > > > > --- /dev/null
> > > > > +++ b/lib/parity.c
> > > > > @@ -0,0 +1,48 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > +/*
> > > > > + * lib/parity.c
> > > > > + *
> > > > > + * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > > + * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
> > > > > + *
> > > > > + * __parity[sdt]i2 can be overridden by linking arch-specific versions.
> > > > > + */
> > > > > +
> > > > > +#include <linux/export.h>
> > > > > +#include <linux/kernel.h>
> > > > > +
> > > > > +/*
> > > > > + * One explanation of this algorithm:
> > > > > + * https://funloop.org/codex/problem/parity/README.html
> > > > 
> > > > I already asked you not to spread this link. Is there any reason to
> > > > ignore it?
> > > > 
> > > In v2, this algorithm was removed from bitops.h, so I moved the link
> > > here instead. I'm sorry if it seemed like I ignored your comment.
> > 
> > Yes, it is.
> >  
> > > In v1, I used the same approach as parity8() because I couldn't justify
> > > the performance impact in a specific driver or subsystem. However,
> > > multiple people commented on using __builtin_parity or an x86 assembly
> > > implementation. I'm not ignoring their feedback-I want to address these
> > 
> > Please ask those multiple people: are they ready to maintain all that
> > zoo of macros, weak implementations, arch implementations and stubs
> > for no clear benefit? Performance is always worth it, but again I see
> > not even a hint that the drivers care about performance. You don't
> > measure it, so don't care as well. Right?
> > 
> > > comments. Before submitting, I sent an email explaining my current
> > > approach: using David's suggested method along with __builtin_parity,
> > > but no one responded. So, I decided to submit v2 for discussion
> > > instead.
> > 
> > For discussion use tag RFC.
> > 
> > > 
> > > To avoid mistakes in v3, I want to confirm the following changes before
> > > sending it:
> > > 
> > > (a) Change the return type from int to bool.
> > > (b) Avoid __builtin_parity and use the same approach as parity8().
> > > (c) Implement parity16/32/64() as single-line inline functions that
> > >     call the next smaller variant after xor.
> > > (d) Add a parity() macro that selects the appropriate parityXX() based
> > >     on type size.
> > > (e) Update users to use this parity() macro.
> > > 
> > > However, (d) may require a patch affecting multiple subsystems at once
> > > since some places that already include bitops.h have functions named
> > > parity(), causing conflicts. Unless we decide not to add this macro in
> > > the end.
> > > 
> > > As for checkpatch.pl warnings, they are mostly pre-existing coding
> > > style issues in this series. I've kept them as-is, but if preferred,
> > > I'm fine with fixing them.
> > 
> > Checkpatch only complains on new lines. Particularly this patch should
> > trigger checkpatch warning because it adds a new file but doesn't touch
> > MAINTAINERS. 
> > 
> For example, the following warning:
> 
> ERROR: space required after that ',' (ctx:VxV)
> #84: FILE: drivers/input/joystick/sidewinder.c:368:
> +                       if (!parity64(GB(0,33)))
>                                           ^
> 
> This issue already existed before this series, and I'm keeping its
> style unchanged for now.
> 
> > > If anything is incorrect or if there are concerns, please let me know.
> > > 
> > > Regards,
> > > Kuan-Wei
> > > 
> > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > > index c1cb53cf2f0f..47b7eca8d3b7 100644
> > > --- a/include/linux/bitops.h
> > > +++ b/include/linux/bitops.h
> > > @@ -260,6 +260,43 @@ static inline int parity8(u8 val)
> > >  	return (0x6996 >> (val & 0xf)) & 1;
> > >  }
> > > 
> > > +static inline bool parity16(u16 val)
> > > +{
> > > +	return parity8(val ^ (val >> 8));
> > > +}
> > > +
> > > +static inline bool parity32(u32 val)
> > > +{
> > > +	return parity16(val ^ (val >> 16));
> > > +}
> > > +
> > > +static inline bool parity64(u64 val)
> > > +{
> > > +	return parity32(val ^ (val >> 32));
> > > +}
> > 
> > That was discussed between Jiri and me in v2. Fixed types functions
> > are needed only in a few very specific cases. With the exception of
> > I3C driver (which doesn't look good for both Jiri and me), all the
> > drivers have the type of variable passed to the parityXX() matching 
> > the actual variable length. It means that fixed-type versions of the
> > parity() are simply not needed. So if we don't need them, please don't
> > introduce it.
> >
> So, I should add the following parity() macro in v3, remove parity8(),
> and update all current parity8() users to use this macro, right?

If you go with macro, please apply my patch and modify it in-place
with this __auto_type thing and GCC hack. Feel free to add your
co-developed-by, or tested, or whatever.

> I changed u64 to __auto_type and applied David's suggestion to replace
> the >> 32 with >> 16 >> 16 to avoid compiler warnings.
> 
> Regards,
> Kuan-Wei
> 
> #define parity(val)					\
> ({							\
> 	__auto_type __v = (val);			\
> 	bool __ret;					\
> 	switch (BITS_PER_TYPE(val)) {			\
> 	case 64:					\
> 		__v ^= __v >> 16 >> 16;			\
> 		fallthrough;				\

This hack should be GCC-only, and well documented.
For clang it should be 
 		__v ^= __v >> 32;			\

> 	case 32:					\
> 		__v ^= __v >> 16;			\
> 		fallthrough;				\
> 	case 16:					\
> 		__v ^= __v >> 8;			\
> 		fallthrough;				\
> 	case 8:						\
> 		__v ^= __v >> 4;			\
> 		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> 		break;					\
> 	default:					\
> 		BUILD_BUG();				\
> 	}						\
> 	__ret;						\
> })
diff mbox series

Patch

diff --git a/lib/Makefile b/lib/Makefile
index 7bab71e59019..45affad85ee4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -51,7 +51,7 @@  obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o base64.o \
 	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
-	 generic-radix-tree.o bitmap-str.o
+	 generic-radix-tree.o bitmap-str.o parity.o
 obj-y += string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
diff --git a/lib/parity.c b/lib/parity.c
new file mode 100644
index 000000000000..a83ff8d96778
--- /dev/null
+++ b/lib/parity.c
@@ -0,0 +1,48 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * lib/parity.c
+ *
+ * Copyright (C) 2025 Kuan-Wei Chiu <visitorckw@gmail.com>
+ * Copyright (C) 2025 Yu-Chun Lin <eleanor15x@gmail.com>
+ *
+ * __parity[sdt]i2 can be overridden by linking arch-specific versions.
+ */
+
+#include <linux/export.h>
+#include <linux/kernel.h>
+
+/*
+ * One explanation of this algorithm:
+ * https://funloop.org/codex/problem/parity/README.html
+ */
+int __weak __paritysi2(u32 val);
+int __weak __paritysi2(u32 val)
+{
+	val ^= val >> 16;
+	val ^= val >> 8;
+	val ^= val >> 4;
+	return (0x6996 >> (val & 0xf)) & 1;
+}
+EXPORT_SYMBOL(__paritysi2);
+
+int __weak __paritydi2(u64 val);
+int __weak __paritydi2(u64 val)
+{
+	val ^= val >> 32;
+	val ^= val >> 16;
+	val ^= val >> 8;
+	val ^= val >> 4;
+	return (0x6996 >> (val & 0xf)) & 1;
+}
+EXPORT_SYMBOL(__paritydi2);
+
+int __weak __parityti2(u64 val);
+int __weak __parityti2(u64 val)
+{
+	val ^= val >> 32;
+	val ^= val >> 16;
+	val ^= val >> 8;
+	val ^= val >> 4;
+	return (0x6996 >> (val & 0xf)) & 1;
+}
+EXPORT_SYMBOL(__parityti2);