[v5,0/4] Introduce the for_each_set_clump macro

Message ID cover.1588460322.git.syednwaris@gmail.com (mailing list archive)

Syed Nayyar Waris May 2, 2020, 11:08 p.m. UTC
This patchset introduces a new generic version of for_each_set_clump.
The previous version, for_each_set_clump8, used a fixed-size 8-bit
clump, but the new generic version works with clumps of any size up to
and including BITS_PER_LONG. The patchset uses the new macro in several
GPIO drivers.

The earlier 8-bit for_each_set_clump8 provided a for-loop syntax that
walks a memory region one entire group of set bits at a time.

For example, suppose you would like to iterate over a 32-bit integer 8
bits at a time, skipping over 8-bit groups with no set bit, where
XXXXXXXX represents the current 8-bit group:

    Example:        10111110 00000000 11111111 00110011
    First loop:     10111110 00000000 11111111 XXXXXXXX
    Second loop:    10111110 00000000 XXXXXXXX 00110011
    Third loop:     XXXXXXXX 00000000 11111111 00110011

Each iteration of the loop returns the next 8-bit group that has at
least one set bit.
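
As an illustration of the 8-bit behaviour described above, here is a
minimal userspace sketch. It is not the kernel macro: collect_set_clumps8
is a hypothetical helper, and it works on a single 32-bit value rather
than on a bitmap.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's for_each_set_clump8): collect the
 * non-zero 8-bit groups of a 32-bit value, lowest group first.
 * offsets[i] receives the bit position and clumps[i] the group's value.
 * Returns the number of groups stored (at most 4).
 */
static size_t collect_set_clumps8(uint32_t word, unsigned int offsets[4],
				  uint8_t clumps[4])
{
	size_t n = 0;

	for (unsigned int offset = 0; offset < 32; offset += 8) {
		uint8_t clump = (word >> offset) & 0xff;

		if (!clump)
			continue;	/* skip groups with no set bit */
		offsets[n] = offset;
		clumps[n] = clump;
		n++;
	}
	return n;
}
```

For the example value above (0xbe00ff33), this stores three groups:
0x33 at offset 0, 0xff at offset 8 and 0xbe at offset 24. The all-zero
group at offset 16 is skipped.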

With the new for_each_set_clump, the clump size can differ from 8 bits.
Moreover, a clump can be split across a word boundary when the word size
is not a multiple of the clump size. The following examples show how the
new macro works for clump sizes of 24 bits and 6 bits.

Example 1:
clump size: 24 bits, Number of clumps (or ports): 10
The bitmap below holds the bits from which successive clumps are retrieved.

     /* bitmap memory region */
        0x00aa0000ff000000;  /* Most significant bits */
        0xaaaaaa0000ff0000;
        0x000000aa000000aa;
        0xbbbbabcdeffedcba;  /* Least significant bits */

Iterations of for_each_set_clump:
'offset' is the bit position and 'clump' is the 24-bit clump read from the
above bitmap.
First iteration:        offset: 0 clump: 0xfedcba
Second iteration:       offset: 24 clump: 0xabcdef
Third iteration:        offset: 48 clump: 0xaabbbb
Fourth iteration:       offset: 96 clump: 0xaa
Fifth iteration:        offset: 144 clump: 0xff
Sixth iteration:        offset: 168 clump: 0xaaaaaa
Seventh iteration:      offset: 216 clump: 0xff
The loop ends here because the remaining bits (0x00aa, 16 bits) are fewer
than the 24-bit clump size.

In the above example, the 24-bit clump retrieved in the third iteration
is split between bitmap[0] and bitmap[1]. The example also shows that
24-bit clumps consisting entirely of zeroes are skipped (preserving the
previous for_each_set_clump8 behaviour).
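
The word-boundary handling in Example 1 can be sketched in userspace C
as follows. This is only an illustration under stated assumptions, not
the code from the patches: it uses fixed 64-bit words instead of
unsigned long, and get_clump and next_set_clump are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in: read `nbits` bits (1..64) starting at bit `start`
 * from an array of 64-bit words, least significant word first. The caller
 * must ensure that start + nbits does not exceed the size of the bitmap.
 */
static uint64_t get_clump(const uint64_t *map, size_t start,
			  unsigned int nbits)
{
	size_t index = start / 64;
	unsigned int shift = start % 64;
	uint64_t mask = nbits == 64 ? UINT64_MAX : (UINT64_C(1) << nbits) - 1;
	uint64_t value = map[index] >> shift;

	/* The clump straddles a word boundary: take the rest from the next word. */
	if (shift + nbits > 64)
		value |= map[index + 1] << (64 - shift);

	return value & mask;
}

/*
 * Find the first non-zero clump at or after bit `offset`, looking only at
 * whole clumps that fit below `size` bits. Returns the clump's offset and
 * stores its value in *clump, or returns `size` when no set clump is left.
 */
static size_t next_set_clump(const uint64_t *map, size_t size, size_t offset,
			     unsigned int clump_size, uint64_t *clump)
{
	for (; offset + clump_size <= size; offset += clump_size) {
		*clump = get_clump(map, offset, clump_size);
		if (*clump)
			return offset;
	}
	return size;
}
```

Calling next_set_clump repeatedly on the Example 1 bitmap with size 240
(10 clumps of 24 bits), each time restarting at the returned offset plus
24, visits offsets 0, 24, 48, 96, 144, 168 and 216 and then returns 240.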

Example 2:
clump size = 6 bits, Number of clumps (or ports) = 3.

     /* bitmap memory region */
        0x00aa0000ff000000;  /* Most significant bits */
        0xaaaaaa0000ff0000;
        0x0f00000000000000;
        0x0000000000000ac0;  /* Least significant bits */

Iterations of for_each_set_clump:
'offset' is the bit position and 'clump' is the 6-bit clump read from the
above bitmap.
First iteration:        offset: 6 clump: 0x2b
The loop ends because 6 * 3 = 18 bits have been traversed in the bitmap.
Here 6 * 3 is clump size * number of clumps.

Changes in v5:
 - [Patch 4/4]: Minor change: Hardcode value for better code readability.

Changes in v4:
 - [Patch 2/4]: Use 'for' loop in test function of for_each_set_clump.
 - [Patch 3/4]: Minor change: Hardcode value for better code readability.
 - [Patch 4/4]: Minor change: Hardcode value for better code readability.

Changes in v3:
 - [Patch 3/4]: Change datatype of some variables from u64 to unsigned long
   in function thunderx_gpio_set_multiple.

Changes in v2:
 - [Patch 2/4]: Unify different tests for 'for_each_set_clump'. Pass test data as
   function parameters.
 - [Patch 2/4]: Remove unnecessary bitmap_zero calls.

Syed Nayyar Waris (4):
  bitops: Introduce the for_each_set_clump macro
  lib/test_bitmap.c: Add for_each_set_clump test cases
  gpio: thunderx: Utilize for_each_set_clump macro
  gpio: xilinx: Utilize for_each_set_clump macro

 drivers/gpio/gpio-thunderx.c      |  11 ++-
 drivers/gpio/gpio-xilinx.c        |  62 ++++++-------
 include/asm-generic/bitops/find.h |  19 ++++
 include/linux/bitmap.h            |  61 +++++++++++++
 include/linux/bitops.h            |  13 +++
 lib/find_bit.c                    |  14 +++
 lib/test_bitmap.c                 | 141 ++++++++++++++++++++++++++++++
 7 files changed, 287 insertions(+), 34 deletions(-)


base-commit: 25c04a75f14fdc074d7dd1d6d40b49eddd0e66e7

Comments

Andy Shevchenko May 4, 2020, 11:41 a.m. UTC | #1

Looking at the last patches, where we have examples, I still do not see a
benefit of variable clump sizes. Power-of-2 sizes would make sense (and
could be optimized accordingly for 64-bit and 32-bit).
William Breathitt Gray May 4, 2020, 2:36 p.m. UTC | #2
On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> Looking into the last patches where we have examples I still do not see a
> benefit of variadic clump sizes. power of 2 sizes would make sense (and be
> optimized accordingly (64-bit, 32-bit).
> 
> -- 
> With Best Regards,
> Andy Shevchenko

There is of course benefit in defining for_each_set_clump with clump
sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
boundary checks that we know will not occur), but at the very least the
variable size bitmap_set_value and bitmap_get_value provide significant
benefit for the readability of the gpio-xilinx code:

	bitmap_set_value(old, state[0], 0, width[0]);
	bitmap_set_value(old, state[1], width[0], width[1]);
	...
	state[0] = bitmap_get_value(new, 0, width[0]);
	state[1] = bitmap_get_value(new, width[0], width[1]);

These lines are simple and clear to read: we know immediately what they
do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
to use several bitwise operations for each line; the obfuscation of the
code would be an obvious hindrance here.
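
To illustrate the point about bitwise operations, the sketch below shows
what setting and reading a field of arbitrary width involves when written
out by hand. It is an assumption-laden illustration, not the
bitmap_set_value/bitmap_get_value from this series: it handles a single
64-bit word only, requires start < 64 and start + nbits <= 64, and
field_mask, set_field and get_field are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: mask with the low `nbits` bits set (0..64). */
static uint64_t field_mask(unsigned int nbits)
{
	return nbits >= 64 ? UINT64_MAX : (UINT64_C(1) << nbits) - 1;
}

/*
 * Hypothetical helper: return `word` with the `nbits`-wide field at bit
 * `start` replaced by `value`. Bits of `value` above `nbits` are dropped.
 */
static uint64_t set_field(uint64_t word, uint64_t value, unsigned int start,
			  unsigned int nbits)
{
	uint64_t mask = field_mask(nbits) << start;

	return (word & ~mask) | ((value << start) & mask);
}

/* Hypothetical helper: read the `nbits`-wide field at bit `start`. */
static uint64_t get_field(uint64_t word, unsigned int start,
			  unsigned int nbits)
{
	return (word >> start) & field_mask(nbits);
}
```

With two channel widths of, say, 5 and 15 bits, the second field starts at
bit 5, so every access needs the shift, the mask and the merge shown here.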

William Breathitt Gray
Andy Shevchenko May 5, 2020, 1:51 p.m. UTC | #3
On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray
<vilhelm.gray@gmail.com> wrote:
> On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote:

...

> > Looking into the last patches where we have examples I still do not see a
> > benefit of variadic clump sizes. power of 2 sizes would make sense (and be
> > optimized accordingly (64-bit, 32-bit).
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
>
> There is of course benefit in defining for_each_set_clump with clump
> sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
> boundary checks that we know will not occur), but at the very least the
> variable size bitmap_set_value and bitmap_get_value provide significant
> benefit for the readability of the gpio-xilinx code:
>
>         bitmap_set_value(old, state[0], 0, width[0]);
>         bitmap_set_value(old, state[1], width[0], width[1]);
>         ...
>         state[0] = bitmap_get_value(new, 0, width[0]);
>         state[1] = bitmap_get_value(new, width[0], width[1]);
>
> These lines are simple and clear to read: we know immediately what they
> do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
> to use several bitwise operations for each line; the obfuscation of the
> code would be an obvious hinderance here.

Do I understand correctly that width[0] and width[1] may not be powers
of two, and that this actually occurs in practice?
William Breathitt Gray May 5, 2020, 2:53 p.m. UTC | #4
On Tue, May 05, 2020 at 04:51:56PM +0300, Andy Shevchenko wrote:
> On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray
> <vilhelm.gray@gmail.com> wrote:
> > On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> > > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote:
> 
> ...
> 
> > > Looking into the last patches where we have examples I still do not see a
> > > benefit of variadic clump sizes. power of 2 sizes would make sense (and be
> > > optimized accordingly (64-bit, 32-bit).
> > >
> > > --
> > > With Best Regards,
> > > Andy Shevchenko
> >
> > There is of course benefit in defining for_each_set_clump with clump
> > sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
> > boundary checks that we know will not occur), but at the very least the
> > variable size bitmap_set_value and bitmap_get_value provide significant
> > benefit for the readability of the gpio-xilinx code:
> >
> >         bitmap_set_value(old, state[0], 0, width[0]);
> >         bitmap_set_value(old, state[1], width[0], width[1]);
> >         ...
> >         state[0] = bitmap_get_value(new, 0, width[0]);
> >         state[1] = bitmap_get_value(new, width[0], width[1]);
> >
> > These lines are simple and clear to read: we know immediately what they
> > do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
> > to use several bitwise operations for each line; the obfuscation of the
> > code would be an obvious hinderance here.
> 
> Do I understand correctly that width[0] and width[1] may not be power
> of two and it's actually the case?
> 
> -- 
> With Best Regards,
> Andy Shevchenko

I'm under the impression that width[0] and width[1] are arbitrarily
chosen by the user and could be any integer. I have never used this
hardware so I'm hoping one of the gpio-xilinx or GPIO subsystem
maintainers in this thread will respond with some guidance.

If the values of width[0] and width[1] are restricted to powers of 2,
then I agree that there is no need for generic bitmap_set_value and
bitmap_get_value functions and we can instead use more optimized power
of 2 versions.
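
A rough sketch of why the power-of-2 case is simpler, assuming 64-bit
words and clumps aligned to multiples of their own size: such a clump
never straddles a word boundary, so one shift and one mask are enough and
no boundary check is needed. The names width_is_pow2 and get_clump_pow2
are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `n` is a non-zero power of two. */
static int width_is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: read the `index`-th clump of `nbits` bits, where
 * `nbits` is a power of two no larger than 64. Because clumps are aligned
 * to their own size, each one lies entirely within a single 64-bit word.
 */
static uint64_t get_clump_pow2(const uint64_t *map, size_t index,
			       unsigned int nbits)
{
	size_t start = index * nbits;
	uint64_t mask = nbits == 64 ? UINT64_MAX : (UINT64_C(1) << nbits) - 1;

	return (map[start / 64] >> (start % 64)) & mask;
}
```

Widths such as 5 or 15 fail the width_is_pow2 check, and fields of those
widths can cross a word boundary, which is the case the generic functions
have to handle.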

William Breathitt Gray
Syed Nayyar Waris May 9, 2020, 4:36 p.m. UTC | #5
On Tue, May 5, 2020 at 8:24 PM William Breathitt Gray
<vilhelm.gray@gmail.com> wrote:
>
> On Tue, May 05, 2020 at 04:51:56PM +0300, Andy Shevchenko wrote:
> > On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray
> > <vilhelm.gray@gmail.com> wrote:
> > > On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> > > > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote:
> >
> > ...
> >
> > > > Looking into the last patches where we have examples I still do not see a
> > > > benefit of variadic clump sizes. power of 2 sizes would make sense (and be
> > > > optimized accordingly (64-bit, 32-bit).
> > > >
> > > > --
> > > > With Best Regards,
> > > > Andy Shevchenko
> > >
> > > There is of course benefit in defining for_each_set_clump with clump
> > > sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
> > > boundary checks that we know will not occur), but at the very least the
> > > variable size bitmap_set_value and bitmap_get_value provide significant
> > > benefit for the readability of the gpio-xilinx code:
> > >
> > >         bitmap_set_value(old, state[0], 0, width[0]);
> > >         bitmap_set_value(old, state[1], width[0], width[1]);
> > >         ...
> > >         state[0] = bitmap_get_value(new, 0, width[0]);
> > >         state[1] = bitmap_get_value(new, width[0], width[1]);
> > >
> > > These lines are simple and clear to read: we know immediately what they
> > > do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
> > > to use several bitwise operations for each line; the obfuscation of the
> > > code would be an obvious hinderance here.
> >
> > Do I understand correctly that width[0] and width[1] may not be power
> > of two and it's actually the case?
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
>
> I'm under the impression that width[0] and width[1] are arbitrarily
> chosen by the user and could be any integer. I have never used this
> hardware so I'm hoping one of the gpio-xilinx or GPIO subsystem
> maintainers in this thread will respond with some guidance.
>
> If the values of width[0] and width[1] are restricted to powers of 2,
> then I agree that there is no need for generic bitmap_set_value and
> bitmap_get_value functions and we can instead use more optimized power
> of 2 versions.
>
> William Breathitt Gray


Regarding the question of whether width[0] and width[1] can have any
value or are restricted to powers of 2:

Referring to the document (this Xilinx GPIO IP is mentioned in the
gpio-xilinx.c file):
https://www.xilinx.com/support/documentation/ip_documentation/axi_gpio/v2_0/pg144-axi-gpio.pdf

On page 8, we can see that the GPIO widths for the 2 channels can have
values that are not powers of 2, for example 5 or 15.

So, I think we should keep 'for_each_set_clump',
'bitmap_get_value' and 'bitmap_set_value' completely generic.

I am proceeding with my next patchset submission with the above
findings in mind. If you think otherwise or would like to
add something, let me know.

Regards
Syed Nayyar Waris
Andy Shevchenko May 10, 2020, 7:05 p.m. UTC | #6
On Sat, May 9, 2020 at 7:36 PM Syed Nayyar Waris <syednwaris@gmail.com> wrote:
> On Tue, May 5, 2020 at 8:24 PM William Breathitt Gray
> <vilhelm.gray@gmail.com> wrote:
> > On Tue, May 05, 2020 at 04:51:56PM +0300, Andy Shevchenko wrote:
> > > On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray
> > > <vilhelm.gray@gmail.com> wrote:
> > > > On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> > > > > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote:

...

> > > > > Looking into the last patches where we have examples I still do not see a
> > > > > benefit of variadic clump sizes. power of 2 sizes would make sense (and be
> > > > > optimized accordingly (64-bit, 32-bit).

> > > > There is of course benefit in defining for_each_set_clump with clump
> > > > sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
> > > > boundary checks that we know will not occur), but at the very least the
> > > > variable size bitmap_set_value and bitmap_get_value provide significant
> > > > benefit for the readability of the gpio-xilinx code:
> > > >
> > > >         bitmap_set_value(old, state[0], 0, width[0]);
> > > >         bitmap_set_value(old, state[1], width[0], width[1]);
> > > >         ...
> > > >         state[0] = bitmap_get_value(new, 0, width[0]);
> > > >         state[1] = bitmap_get_value(new, width[0], width[1]);
> > > >
> > > > These lines are simple and clear to read: we know immediately what they
> > > > do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
> > > > to use several bitwise operations for each line; the obfuscation of the
> > > > code would be an obvious hinderance here.
> > >
> > > Do I understand correctly that width[0] and width[1] may not be power
> > > of two and it's actually the case?

> > I'm under the impression that width[0] and width[1] are arbitrarily
> > chosen by the user and could be any integer. I have never used this
> > hardware so I'm hoping one of the gpio-xilinx or GPIO subsystem
> > maintainers in this thread will respond with some guidance.
> >
> > If the values of width[0] and width[1] are restricted to powers of 2,
> > then I agree that there is no need for generic bitmap_set_value and
> > bitmap_get_value functions and we can instead use more optimized power
> > of 2 versions.

> Regarding the question that whether width[0] and width[1] can have any
> value or they are restricted to power-of-2.
>
> Referring to the document (This xilinx GPIO IP was mentioned in the
> gpio-xilinx.c file):
> https://www.xilinx.com/support/documentation/ip_documentation/axi_gpio/v2_0/pg144-axi-gpio.pdf
>
> On page 8, we can see that the GPIO widths for the 2 channels can have
> values different from power-of-2.For example: 5, 15 etc.
>
> So, I think we should keep the 'for_each_set_clump',
> 'bitmap_get_value' and 'bitmap_set_value' as completely generic.
>
> I am proceeding further for my next patchset submission keeping above
> findings in mind. If you guys think something else or would like to
> add something, let me know.

Thank you for the investigation. So, if Xilinx is okay with the change, I
have no objections.