Message ID | 20250223164217.2139331-1-visitorckw@gmail.com (mailing list archive) |
---|---|
Series | Introduce and use generic parity32/64 helper |
On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
> Several parts of the kernel contain redundant implementations of parity
> calculations for 32-bit and 64-bit values. Introduces generic
> parity32() and parity64() helpers in bitops.h, providing a standardized
> and optimized implementation.
>
> Subsequent patches refactor various kernel components to replace
> open-coded parity calculations with the new helpers, reducing code
> duplication and improving maintainability.

Please note that GCC (and clang) provide the __builtin_parity{,l,ll}()
family of builtin functions. Recently, I have tried to use this builtin
in a couple of places [1], [2], but I had to retract the patches,
because __builtin functions aren't strictly required to be inlined and
can generate a library call [3].

As explained in [2], the compilers are able to emit optimized
target-dependent code (also automatically using the popcnt insn when
available), so ideally the generic parity64() and parity32() would be
implemented using __builtin_parity(), where the generic library would
provide fallback __paritydi2() and __paritysi2() functions, which are
otherwise provided by the compiler support library.

For x86, we would like to exercise the hardware parity calculation or
optimized code sequences involving HW parity calculation, as shown in
[1] and [2].

[1] https://lore.kernel.org/lkml/20250129205746.10963-1-ubizjak@gmail.com/

[2] https://lore.kernel.org/lkml/20250129154920.6773-2-ubizjak@gmail.com/

[3] https://lore.kernel.org/linux-mm/CAKbZUD0N7bkuw_Le3Pr9o1V2BjjcY_YiLm8a8DPceubTdZ00GQ@mail.gmail.com/

Thanks,
Uros.
Hi Kuan-Wei,

> Several parts of the kernel contain redundant implementations of parity
> calculations for 32-bit and 64-bit values. Introduces generic
> parity32() and parity64() helpers in bitops.h, providing a standardized
> and optimized implementation.

More so than __builtin_parity()?

I'm all for reducing the duplication, but the compiler may well have a
better parity approach than the xor-folding implementation here. Looks
like we can get this to two instructions on powerpc64, for example.

Cheers,

Jeremy
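For readers unfamiliar with the term, the xor-folding approach Jeremy refers to can be sketched roughly as below. This is a minimal illustration, not necessarily the code from the series: the names parity32_fold() and parity64_fold() are hypothetical, and the final nibble lookup via the constant 0x6996 is one common way to finish the fold.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name is an assumption, not from the series).
 * Returns 1 if an odd number of bits is set in v, 0 otherwise.
 */
static inline unsigned int parity32_fold(uint32_t v)
{
	/* Fold the upper half onto the lower half until 4 bits remain. */
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	/* 0x6996 acts as a 16-entry table: bit n is the parity of nibble n. */
	return (0x6996u >> (v & 0xf)) & 1;
}

/* 64-bit variant: fold the two 32-bit halves, then reuse the 32-bit path. */
static inline unsigned int parity64_fold(uint64_t v)
{
	return parity32_fold((uint32_t)(v ^ (v >> 32)));
}
```

Each xor halves the number of bits that still matter, so the 32-bit case needs three shift/xor steps plus the lookup. A compiler that knows the target may do better, which is the point being raised.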
On Sun, Feb 23, 2025 at 09:25:42PM +0100, Uros Bizjak wrote:
>
> Please note that GCC (and clang) provide __builtin_parity{,l,ll}() family of
> builtin functions. Recently, I have tried to use this builtin in a couple of
> places [1], [2], but I had to retract the patches, because __builtin
> functions aren't strictly required to be inlined and can generate a library
> call [3].
>
> As explained in [2], the compilers are able to emit optimized
> target-dependent code (also automatically using popcnt insn when available),
> so ideally the generic parity64() and parity32() would be implemented using
> __builtin_parity(), where the generic library would provide a fallback
> __paritydi2() and __paritysi2() functions, otherwise provided by the
> compiler support library.
>
> For x86, we would like to exercise the hardware parity calculation or
> optimized code sequences involving HW parity calculation, as shown in [1]
> and [2].
>
> [1] https://lore.kernel.org/lkml/20250129205746.10963-1-ubizjak@gmail.com/
>
> [2] https://lore.kernel.org/lkml/20250129154920.6773-2-ubizjak@gmail.com/
>
> [3] https://lore.kernel.org/linux-mm/CAKbZUD0N7bkuw_Le3Pr9o1V2BjjcY_YiLm8a8DPceubTdZ00GQ@mail.gmail.com/

Hi Uros,

Thanks for the information. We originally planned to implement hardware
optimizations after this patch series. However, for V2, we will
incorporate __builtin_parity(), while keeping our current implementation
as the fallback function.

Best regards,
Yu-Chun Lin
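The plan described above (use the builtin, keep the open-coded version as a fallback) might look something like the following userspace sketch. All function names here are hypothetical, and the `__GNUC__` check is an assumption chosen for brevity. The actual V2 may select the builtin differently, and inside the kernel the `__paritysi2()`/`__paritydi2()` library-call concern raised in the thread would still need to be addressed.

```c
#include <stdint.h>

/* Hypothetical open-coded fallback: xor-fold down to a single bit. */
static inline unsigned int parity32_fallback(uint32_t v)
{
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1;
}

static inline unsigned int parity64_fallback(uint64_t v)
{
	return parity32_fallback((uint32_t)(v ^ (v >> 32)));
}

/* Hypothetical wrappers: prefer the compiler builtin where it exists. */
static inline unsigned int parity32_sketch(uint32_t v)
{
#ifdef __GNUC__	/* defined by both GCC and clang */
	return (unsigned int)__builtin_parity(v);
#else
	return parity32_fallback(v);
#endif
}

static inline unsigned int parity64_sketch(uint64_t v)
{
#ifdef __GNUC__
	return (unsigned int)__builtin_parityll(v);
#else
	return parity64_fallback(v);
#endif
}
```

In a normal userspace build the compiler support library supplies any out-of-line helper the builtin needs, so this compiles and links as is. The kernel does not link that library, which is why the thread suggests providing the fallback functions in the generic library.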
On Mon, Feb 24, 2025 at 03:58:49PM +0800, Jeremy Kerr wrote:
> More so than __builtin_parity() ?
>
> I'm all for reducing the duplication, but the compiler may well have a
> better parity approach than the xor-folding implementation here. Looks
> like we can get this to two instructions on powerpc64, for example.

Hi Jeremy,

Thanks for your input. We will do that in V2.

Best regards,
Yu-Chun Lin