
[v10,02/14] xen: introduce generic non-atomic test_*bit()

Message ID 219df9d840a183fc55de02aff011c0972a68587c.1715952103.git.oleksii.kurochko@gmail.com (mailing list archive)
State Superseded
Series Enable build of full Xen for RISC-V

Commit Message

Oleksii K. May 17, 2024, 1:54 p.m. UTC
The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit

These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.

Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE

The following approach was chosen for generic*() and arch*() bit
operation functions:
If the bit operation function that is going to be generic starts
with the prefix "__", then the corresponding generic/arch function
will also contain the "__" prefix. For example:
 * test_bit() will be defined using arch_test_bit() and
   generic_test_bit().
 * __test_and_set_bit() will be defined using
   arch__test_and_set_bit() and generic__test_and_set_bit().
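
A minimal sketch of the resulting layering (the full definitions are in
the diff below):

#ifndef arch__test_and_set_bit
#define arch__test_and_set_bit generic__test_and_set_bit /* fallback */
#endif

static always_inline bool
__test_and_set_bit(int nr, volatile void *addr)
{
    return arch__test_and_set_bit(nr, addr);
}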

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
   The context ("* Find First Set bit.  Bits are labelled from 1." in xen/bitops.h)
   suggests there's a dependency on an uncommitted patch. This happens because the current patch
   series is based on Andrew's patch series ( https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.cooper3@citrix.com/T/#t ),
   but if everything is okay with the current patch, it can be merged without Andrew's series being merged.
---
Changes in V10:
 - update the commit message ( re-order paragraphs and explain the use of the "__" prefix in bit
   operation function names ).
 - add parentheses around the whole expression of the bitop_bad_size() macro.
 - move the bitop_bad_size() macro above asm/bitops.h, as it is not arch-specific anymore and there
   is no need to override it.
 - drop the check_bitop_size() macro and use "if ( bitop_bad_size(addr) ) __bitop_bad_size();" directly
   where it is needed.
 - in <xen/bitops.h> use 'int' for the first parameter of __test_and_*() and generic__test_and_*(), to be
   consistent with how the mentioned functions were declared in the original per-arch code.
 - add 'const' to the p variable in generic_test_bit().
 - move the definitions of BITOP_BITS_PER_WORD and bitop_uint_t to xen/bitops.h, as arch overrides
   of these definitions are no longer allowed.
---
Changes in V9:
  - move up xen/bitops.h in ppc/asm/page.h.
  - update the definition of arch_check_bitop_size
    and drop the corresponding macros from x86/asm/bitops.h.
  - drop parentheses in generic__test_and_set_bit() for the definition of
    the local variable p.
  - fix indentation inside #ifndef BITOP_TYPE...#endif
  - update the commit message.
---
 Changes in V8:
  - drop __pure for the function which uses volatile.
  - drop unnecessary parentheses in generic__test_and_change_bit() for the addr cast.
  - update the prototypes of generic_test_bit() and test_bit(): they now return bool
    instead of int.
  - update generic_test_bit() to use BITOP_MASK().
  - move the fls{l} changes to the patch which introduces generic fls{l}.
  - add a footer after Signed-off-by explaining the dependency on an uncommitted patch.
  - abstract bitop_size().
  - move BITOP_TYPE define to <xen/types.h>.
---
 Changes in V7:
  - move everything to xen/bitops.h to follow the same approach for all generic
    bit ops.
  - put together BITOP_BITS_PER_WORD and bitop_uint_t.
  - make BITOP_MASK more generic.
  - drop #ifdef ... #endif around BITOP_MASK, BITOP_WORD as they are generic
    enough.
  - drop "_" for generic__{test_and_set_bit,...}().
  - drop " != 0" for functions which return bool.
  - add volatile to the cast for generic__{...}().
  - update the commit message.
  - update arch related code to follow the proposed generic approach.
---
 Changes in V6:
  - Nothing changed ( only rebase )
---
 Changes in V5:
   - new patch
---
 xen/arch/arm/arm64/livepatch.c    |   1 -
 xen/arch/arm/include/asm/bitops.h |  67 ---------------
 xen/arch/ppc/include/asm/bitops.h |  52 ------------
 xen/arch/ppc/include/asm/page.h   |   2 +-
 xen/arch/ppc/mm-radix.c           |   2 +-
 xen/arch/x86/include/asm/bitops.h |  31 ++-----
 xen/include/xen/bitops.h          | 131 ++++++++++++++++++++++++++++++
 7 files changed, 142 insertions(+), 144 deletions(-)

Comments

Jan Beulich May 21, 2024, 11:10 a.m. UTC | #1
On 17.05.2024 15:54, Oleksii Kurochko wrote:
> The following generic functions were introduced:
> * test_bit
> * generic__test_and_set_bit
> * generic__test_and_clear_bit
> * generic__test_and_change_bit
> 
> These functions and macros can be useful for architectures
> that don't have corresponding arch-specific instructions.
> 
> Also, the patch introduces the following generics which are
> used by the functions mentioned above:
> * BITOP_BITS_PER_WORD
> * BITOP_MASK
> * BITOP_WORD
> * BITOP_TYPE
> 
> The following approach was chosen for generic*() and arch*() bit
> operation functions:
> If the bit operation function that is going to be generic starts
> with the prefix "__", then the corresponding generic/arch function
> will also contain the "__" prefix. For example:
>  * test_bit() will be defined using arch_test_bit() and
>    generic_test_bit().
>  * __test_and_set_bit() will be defined using
>    arch__test_and_set_bit() and generic__test_and_set_bit().
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one remaining nit (can be adjusted while committing, provided other
necessary acks arrive):

> --- a/xen/include/xen/bitops.h
> +++ b/xen/include/xen/bitops.h
> @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long x)
>   * scope
>   */
>  
> +#define BITOP_BITS_PER_WORD 32
> +typedef uint32_t bitop_uint_t;
> +
> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
> +
> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> +
> +extern void __bitop_bad_size(void);
> +
> +#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))
> +
>  /* --------------------- Please tidy above here --------------------- */
>  
>  #include <asm/bitops.h>
>  
> +/**
> + * generic__test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_set_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old | mask;
> +    return (old & mask);
> +}
> +
> +/**
> + * generic__test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_clear_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old & ~mask;
> +    return (old & mask);
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static always_inline bool
> +generic__test_and_change_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old ^ mask;
> +    return (old & mask);
> +}
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static always_inline bool generic_test_bit(int nr, const volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    const volatile bitop_uint_t *p =
> +                        (const volatile bitop_uint_t *)addr + BITOP_WORD(nr);

Indentation is odd here. Seeing that the line needs wrapping here, it would
imo want to be

    const volatile bitop_uint_t *p =
        (const volatile bitop_uint_t *)addr + BITOP_WORD(nr);

(i.e. one indentation level further from what the start of the decl has).

Jan
Julien Grall May 23, 2024, 1 p.m. UTC | #2
Hi Oleksii,

On 17/05/2024 14:54, Oleksii Kurochko wrote:
> diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
> index df2cebedde..4bc8ed9be5 100644
> --- a/xen/arch/arm/arm64/livepatch.c
> +++ b/xen/arch/arm/arm64/livepatch.c
> @@ -10,7 +10,6 @@
>   #include <xen/mm.h>
>   #include <xen/vmap.h>
>   
> -#include <asm/bitops.h>

It is a bit unclear how this change is related to the patch. Can you 
explain in the commit message?

>   #include <asm/byteorder.h>
>   #include <asm/insn.h>
>   #include <asm/livepatch.h>
> diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
> index 5104334e48..8e16335e76 100644
> --- a/xen/arch/arm/include/asm/bitops.h
> +++ b/xen/arch/arm/include/asm/bitops.h
> @@ -22,9 +22,6 @@
>   #define __set_bit(n,p)            set_bit(n,p)
>   #define __clear_bit(n,p)          clear_bit(n,p)
>   
> -#define BITOP_BITS_PER_WORD     32
> -#define BITOP_MASK(nr)          (1UL << ((nr) % BITOP_BITS_PER_WORD))
> -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
>   #define BITS_PER_BYTE           8

OOI, any reason BITS_PER_BYTE has not been moved as well? I don't expect 
the value to change across arch.

[...]

> diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
> index f14ad0d33a..6eeeff0117 100644
> --- a/xen/include/xen/bitops.h
> +++ b/xen/include/xen/bitops.h
> @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long x)
>    * scope
>    */
>   
> +#define BITOP_BITS_PER_WORD 32
> +typedef uint32_t bitop_uint_t;
> +
> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
> +
> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> +
> +extern void __bitop_bad_size(void);
> +
> +#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))
> +
>   /* --------------------- Please tidy above here --------------------- */
>   
>   #include <asm/bitops.h>
>   
> +/**
> + * generic__test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */

Sorry for only mentioning this on v10. I think this comment should be 
duplicated on (or moved to) the top of test_bit(), because that is what 
everyone will use. This will save developers from following the function 
calls and only noticing the x86 version, which says "This function is 
atomic and may not be reordered." and would be wrong for all the other 
arches.

> +static always_inline bool
> +generic__test_and_set_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old | mask;
> +    return (old & mask);
> +}
> +
> +/**
> + * generic__test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */

Same applies here and ...

> +static always_inline bool
> +generic__test_and_clear_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old & ~mask;
> +    return (old & mask);
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */

... here.

> +static always_inline bool
> +generic__test_and_change_bit(int nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old ^ mask;
> +    return (old & mask);
> +}
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static always_inline bool generic_test_bit(int nr, const volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    const volatile bitop_uint_t *p =
> +                        (const volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +
> +    return (*p & mask);
> +}
> +
> +static always_inline bool
> +__test_and_set_bit(int nr, volatile void *addr)
> +{
> +#ifndef arch__test_and_set_bit
> +#define arch__test_and_set_bit generic__test_and_set_bit
> +#endif
> +
> +    return arch__test_and_set_bit(nr, addr);
> +}

NIT: It is a bit too late to change this one. But I have to admit, I 
don't understand the purpose of the static inline when you could have 
simply called...

> +#define __test_and_set_bit(nr, addr) ({             \
> +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> +    __test_and_set_bit(nr, addr);                   \

... arch__test_and_set_bit() here.
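
A minimal sketch of the suggested shape (assuming the
arch__test_and_set_bit fallback #define is kept above the macro):

#ifndef arch__test_and_set_bit
#define arch__test_and_set_bit generic__test_and_set_bit
#endif

#define __test_and_set_bit(nr, addr) ({             \
    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
    arch__test_and_set_bit(nr, addr);               \
})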


The only two reasons I am not providing an ack are:
  * the explanation for the removal of asm/bitops.h in livepatch.c
  * the placement of the comments

They are not too important for me.

Cheers,
Oleksii K. May 23, 2024, 2:11 p.m. UTC | #3
On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> Hi Oleksii,
Hi Julien,

> 
> On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > diff --git a/xen/arch/arm/arm64/livepatch.c
> > b/xen/arch/arm/arm64/livepatch.c
> > index df2cebedde..4bc8ed9be5 100644
> > --- a/xen/arch/arm/arm64/livepatch.c
> > +++ b/xen/arch/arm/arm64/livepatch.c
> > @@ -10,7 +10,6 @@
> >   #include <xen/mm.h>
> >   #include <xen/vmap.h>
> >   
> > -#include <asm/bitops.h>
> 
> It is a bit unclear how this change is related to the patch. Can you 
> explain in the commit message?
It probably isn't needed anymore. I will double-check, and if this
change is not needed, I will just drop it in the next patch version.

> 
> >   #include <asm/byteorder.h>
> >   #include <asm/insn.h>
> >   #include <asm/livepatch.h>
> > diff --git a/xen/arch/arm/include/asm/bitops.h
> > b/xen/arch/arm/include/asm/bitops.h
> > index 5104334e48..8e16335e76 100644
> > --- a/xen/arch/arm/include/asm/bitops.h
> > +++ b/xen/arch/arm/include/asm/bitops.h
> > @@ -22,9 +22,6 @@
> >   #define __set_bit(n,p)            set_bit(n,p)
> >   #define __clear_bit(n,p)          clear_bit(n,p)
> >   
> > -#define BITOP_BITS_PER_WORD     32
> > -#define BITOP_MASK(nr)          (1UL << ((nr) %
> > BITOP_BITS_PER_WORD))
> > -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
> >   #define BITS_PER_BYTE           8
> 
> OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
> expect 
> the value to change across arch.
I can move it to the generic header too in the next patch version.

> 
> [...]
> 
> > diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
> > index f14ad0d33a..6eeeff0117 100644
> > --- a/xen/include/xen/bitops.h
> > +++ b/xen/include/xen/bitops.h
> > @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long
> > x)
> >    * scope
> >    */
> >   
> > +#define BITOP_BITS_PER_WORD 32
> > +typedef uint32_t bitop_uint_t;
> > +
> > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > BITOP_BITS_PER_WORD))
> > +
> > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > +
> > +extern void __bitop_bad_size(void);
> > +
> > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > sizeof(bitop_uint_t))
> > +
> >   /* --------------------- Please tidy above here -----------------
> > ---- */
> >   
> >   #include <asm/bitops.h>
> >   
> > +/**
> > + * generic__test_and_set_bit - Set a bit and return its old value
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> 
> Sorry for only mentioning this on v10. I think this comment should be
> duplicated (or moved to) on top of test_bit() because this is what 
> everyone will use. This will avoid the developper to follow the
> function 
> calls and only notice the x86 version which says "This function is 
> atomic and may not be reordered." and would be wrong for all the
> other arch.
It makes sense to add this comment on top of test_bit(), but I am
curious if it is needed to mention that for x86 arch_test_bit() "is
atomic and may not be reordered":

 * This operation is non-atomic and can be reordered. (Exception: for
 * x86, arch_test_bit() is atomic and may not be reordered.)
 * If two examples of this operation race, one can appear to succeed
 * but actually fail.  You must protect multiple accesses with a lock.
 */
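
A sketch of the placement being discussed (the macro-only dispatch shape
follows the proposal later in this thread):

/*
 * This operation is non-atomic and can be reordered.
 * If two examples of this operation race, one can appear to succeed
 * but actually fail.  You must protect multiple accesses with a lock.
 */
#define test_bit(nr, addr) ({                       \
    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
    arch_test_bit(nr, addr);                        \
})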

> 
> > +static always_inline bool
> > +generic__test_and_set_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old | mask;
> > +    return (old & mask);
> > +}
> > +
> > +/**
> > + * generic__test_and_clear_bit - Clear a bit and return its old
> > value
> > + * @nr: Bit to clear
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> 
> Same applies here and ...
> 
> > +static always_inline bool
> > +generic__test_and_clear_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old & ~mask;
> > +    return (old & mask);
> > +}
> > +
> > +/* WARNING: non atomic and it can be reordered! */
> 
> ... here.
> 
> > +static always_inline bool
> > +generic__test_and_change_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old ^ mask;
> > +    return (old & mask);
> > +}
> > +/**
> > + * generic_test_bit - Determine whether a bit is set
> > + * @nr: bit number to test
> > + * @addr: Address to start counting from
> > + */
> > +static always_inline bool generic_test_bit(int nr, const volatile
> > void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    const volatile bitop_uint_t *p =
> > +                        (const volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +
> > +    return (*p & mask);
> > +}
> > +
> > +static always_inline bool
> > +__test_and_set_bit(int nr, volatile void *addr)
> > +{
> > +#ifndef arch__test_and_set_bit
> > +#define arch__test_and_set_bit generic__test_and_set_bit
> > +#endif
> > +
> > +    return arch__test_and_set_bit(nr, addr);
> > +}
> 
> NIT: It is a bit too late to change this one. But I have to admit, I 
> don't understand the purpose of the static inline when you could have
> simply call...
> 
> > +#define __test_and_set_bit(nr, addr) ({             \
> > +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > +    __test_and_set_bit(nr, addr);                   \
> 
> ... __arch__test_and_set_bit here.
I just wanted to be in sync with other non-atomic test*_bit()
operations and Andrew's patch series ( xen/bitops: Reduce the mess,
starting with ffs() ).

> 
> 
> The only two reasons I am not providing an ack is the:
>   * Explanation for the removal of asm/bitops.h in livepatch.c
>   * The placement of the comments
Thanks for the review. I'll update the patch and send a new version of
the patch series.

~ Oleksii

> 
> There are not too important for me.
> 
> Cheers,
>
Julien Grall May 23, 2024, 2:33 p.m. UTC | #4
On 23/05/2024 15:11, Oleksii K. wrote:
> On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
>> Hi Oleksii,
> Hi Julien,
> 
>>
>> On 17/05/2024 14:54, Oleksii Kurochko wrote:
>>> diff --git a/xen/arch/arm/arm64/livepatch.c
>>> b/xen/arch/arm/arm64/livepatch.c
>>> index df2cebedde..4bc8ed9be5 100644
>>> --- a/xen/arch/arm/arm64/livepatch.c
>>> +++ b/xen/arch/arm/arm64/livepatch.c
>>> @@ -10,7 +10,6 @@
>>>    #include <xen/mm.h>
>>>    #include <xen/vmap.h>
>>>    
>>> -#include <asm/bitops.h>
>>
>> It is a bit unclear how this change is related to the patch. Can you
>> explain in the commit message?
> Probably it doesn't need anymore. I will double check and if this
> change is not needed, I will just drop it in the next patch version.
> 
>>
>>>    #include <asm/byteorder.h>
>>>    #include <asm/insn.h>
>>>    #include <asm/livepatch.h>
>>> diff --git a/xen/arch/arm/include/asm/bitops.h
>>> b/xen/arch/arm/include/asm/bitops.h
>>> index 5104334e48..8e16335e76 100644
>>> --- a/xen/arch/arm/include/asm/bitops.h
>>> +++ b/xen/arch/arm/include/asm/bitops.h
>>> @@ -22,9 +22,6 @@
>>>    #define __set_bit(n,p)            set_bit(n,p)
>>>    #define __clear_bit(n,p)          clear_bit(n,p)
>>>    
>>> -#define BITOP_BITS_PER_WORD     32
>>> -#define BITOP_MASK(nr)          (1UL << ((nr) %
>>> BITOP_BITS_PER_WORD))
>>> -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
>>>    #define BITS_PER_BYTE           8
>>
>> OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
>> expect
>> the value to change across arch.
> I can move it to generic one header too in the next patch version.
> 
>>
>> [...]
>>
>>> diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
>>> index f14ad0d33a..6eeeff0117 100644
>>> --- a/xen/include/xen/bitops.h
>>> +++ b/xen/include/xen/bitops.h
>>> @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long
>>> x)
>>>     * scope
>>>     */
>>>    
>>> +#define BITOP_BITS_PER_WORD 32
>>> +typedef uint32_t bitop_uint_t;
>>> +
>>> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
>>> BITOP_BITS_PER_WORD))
>>> +
>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>> +
>>> +extern void __bitop_bad_size(void);
>>> +
>>> +#define bitop_bad_size(addr) (sizeof(*(addr)) <
>>> sizeof(bitop_uint_t))
>>> +
>>>    /* --------------------- Please tidy above here -----------------
>>> ---- */
>>>    
>>>    #include <asm/bitops.h>
>>>    
>>> +/**
>>> + * generic__test_and_set_bit - Set a bit and return its old value
>>> + * @nr: Bit to set
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>
>> Sorry for only mentioning this on v10. I think this comment should be
>> duplicated (or moved to) on top of test_bit() because this is what
>> everyone will use. This will avoid the developper to follow the
>> function
>> calls and only notice the x86 version which says "This function is
>> atomic and may not be reordered." and would be wrong for all the
>> other arch.
> It makes sense to add this comment on top of test_bit(), but I am
> curious if it is needed to mention that for x86 arch_test_bit() "is
> atomic and may not be reordered":

I would say no, because any developer modifying common code can't 
rely on it.

> 
>   * This operation is non-atomic and can be reordered. ( Exception: for
> * x86 arch_test_bit() is atomic and may not be reordered )
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> 
>>
>>> +static always_inline bool
>>> +generic__test_and_set_bit(int nr, volatile void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old | mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/**
>>> + * generic__test_and_clear_bit - Clear a bit and return its old
>>> value
>>> + * @nr: Bit to clear
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>
>> Same applies here and ...
>>
>>> +static always_inline bool
>>> +generic__test_and_clear_bit(int nr, volatile void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old & ~mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/* WARNING: non atomic and it can be reordered! */
>>
>> ... here.
>>
>>> +static always_inline bool
>>> +generic__test_and_change_bit(int nr, volatile void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old ^ mask;
>>> +    return (old & mask);
>>> +}
>>> +/**
>>> + * generic_test_bit - Determine whether a bit is set
>>> + * @nr: bit number to test
>>> + * @addr: Address to start counting from
>>> + */
>>> +static always_inline bool generic_test_bit(int nr, const volatile
>>> void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    const volatile bitop_uint_t *p =
>>> +                        (const volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +
>>> +    return (*p & mask);
>>> +}
>>> +
>>> +static always_inline bool
>>> +__test_and_set_bit(int nr, volatile void *addr)
>>> +{
>>> +#ifndef arch__test_and_set_bit
>>> +#define arch__test_and_set_bit generic__test_and_set_bit
>>> +#endif
>>> +
>>> +    return arch__test_and_set_bit(nr, addr);
>>> +}
>>
>> NIT: It is a bit too late to change this one. But I have to admit, I
>> don't understand the purpose of the static inline when you could have
>> simply call...
>>
>>> +#define __test_and_set_bit(nr, addr) ({             \
>>> +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
>>> +    __test_and_set_bit(nr, addr);                   \
>>
>> ... __arch__test_and_set_bit here.
> I just wanted to be in sync with other non-atomic test*_bit()
> operations and Andrew's patch series ( xen/bitops: Reduce the mess,
> starting with ffs() ).

I looked at Andrew's series and I can't seem to find an example where he 
uses both the macro and a static inline. He seems to only use a static inline.
Do you have any pointer?

But by any chance are you referring to the pattern on x86? If so, I 
would really like to understand what the benefit is of introducing a 
one-liner static inline which is "shadowed" by a macro...

Cheers,
Oleksii K. May 23, 2024, 4:40 p.m. UTC | #5
On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> 
> 
> On 23/05/2024 15:11, Oleksii K. wrote:
> > On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> > > Hi Oleksii,
> > Hi Julien,
> > 
> > > 
> > > On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > > > diff --git a/xen/arch/arm/arm64/livepatch.c
> > > > b/xen/arch/arm/arm64/livepatch.c
> > > > index df2cebedde..4bc8ed9be5 100644
> > > > --- a/xen/arch/arm/arm64/livepatch.c
> > > > +++ b/xen/arch/arm/arm64/livepatch.c
> > > > @@ -10,7 +10,6 @@
> > > >    #include <xen/mm.h>
> > > >    #include <xen/vmap.h>
> > > >    
> > > > -#include <asm/bitops.h>
> > > 
> > > It is a bit unclear how this change is related to the patch. Can
> > > you
> > > explain in the commit message?
> > Probably it doesn't need anymore. I will double check and if this
> > change is not needed, I will just drop it in the next patch
> > version.
> > 
> > > 
> > > >    #include <asm/byteorder.h>
> > > >    #include <asm/insn.h>
> > > >    #include <asm/livepatch.h>
> > > > diff --git a/xen/arch/arm/include/asm/bitops.h
> > > > b/xen/arch/arm/include/asm/bitops.h
> > > > index 5104334e48..8e16335e76 100644
> > > > --- a/xen/arch/arm/include/asm/bitops.h
> > > > +++ b/xen/arch/arm/include/asm/bitops.h
> > > > @@ -22,9 +22,6 @@
> > > >    #define __set_bit(n,p)            set_bit(n,p)
> > > >    #define __clear_bit(n,p)          clear_bit(n,p)
> > > >    
> > > > -#define BITOP_BITS_PER_WORD     32
> > > > -#define BITOP_MASK(nr)          (1UL << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
> > > >    #define BITS_PER_BYTE           8
> > > 
> > > OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
> > > expect
> > > the value to change across arch.
> > I can move it to generic one header too in the next patch version.
> > 
> > > 
> > > [...]
> > > 
> > > > diff --git a/xen/include/xen/bitops.h
> > > > b/xen/include/xen/bitops.h
> > > > index f14ad0d33a..6eeeff0117 100644
> > > > --- a/xen/include/xen/bitops.h
> > > > +++ b/xen/include/xen/bitops.h
> > > > @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned
> > > > long
> > > > x)
> > > >     * scope
> > > >     */
> > > >    
> > > > +#define BITOP_BITS_PER_WORD 32
> > > > +typedef uint32_t bitop_uint_t;
> > > > +
> > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > +
> > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > +
> > > > +extern void __bitop_bad_size(void);
> > > > +
> > > > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > > > sizeof(bitop_uint_t))
> > > > +
> > > >    /* --------------------- Please tidy above here ------------
> > > > -----
> > > > ---- */
> > > >    
> > > >    #include <asm/bitops.h>
> > > >    
> > > > +/**
> > > > + * generic__test_and_set_bit - Set a bit and return its old
> > > > value
> > > > + * @nr: Bit to set
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > 
> > > Sorry for only mentioning this on v10. I think this comment
> > > should be
> > > duplicated (or moved to) on top of test_bit() because this is
> > > what
> > > everyone will use. This will avoid the developper to follow the
> > > function
> > > calls and only notice the x86 version which says "This function
> > > is
> > > atomic and may not be reordered." and would be wrong for all the
> > > other arch.
> > It makes sense to add this comment on top of test_bit(), but I am
> > curious if it is needed to mention that for x86 arch_test_bit() "is
> > atomic and may not be reordered":
> 
> I would say no because any developper modifying common code can't 
> relying it.
> 
> > 
> >   * This operation is non-atomic and can be reordered. ( Exception:
> > for
> > * x86 arch_test_bit() is atomic and may not be reordered )
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > 
> > > 
> > > > +static always_inline bool
> > > > +generic__test_and_set_bit(int nr, volatile void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old | mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/**
> > > > + * generic__test_and_clear_bit - Clear a bit and return its
> > > > old
> > > > value
> > > > + * @nr: Bit to clear
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > 
> > > Same applies here and ...
> > > 
> > > > +static always_inline bool
> > > > +generic__test_and_clear_bit(int nr, volatile void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old & ~mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/* WARNING: non atomic and it can be reordered! */
> > > 
> > > ... here.
> > > 
> > > > +static always_inline bool
> > > > +generic__test_and_change_bit(int nr, volatile void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old ^ mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +/**
> > > > + * generic_test_bit - Determine whether a bit is set
> > > > + * @nr: bit number to test
> > > > + * @addr: Address to start counting from
> > > > + */
> > > > +static always_inline bool generic_test_bit(int nr, const
> > > > volatile
> > > > void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    const volatile bitop_uint_t *p =
> > > > +                        (const volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +
> > > > +    return (*p & mask);
> > > > +}
> > > > +
> > > > +static always_inline bool
> > > > +__test_and_set_bit(int nr, volatile void *addr)
> > > > +{
> > > > +#ifndef arch__test_and_set_bit
> > > > +#define arch__test_and_set_bit generic__test_and_set_bit
> > > > +#endif
> > > > +
> > > > +    return arch__test_and_set_bit(nr, addr);
> > > > +}
> > > 
> > > NIT: It is a bit too late to change this one. But I have to
> > > admit, I
> > > don't understand the purpose of the static inline when you could
> > > have
> > > simply call...
> > > 
> > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > > > +    __test_and_set_bit(nr, addr);                   \
> > > 
> > > ... __arch__test_and_set_bit here.
> > I just wanted to be in sync with other non-atomic test*_bit()
> > operations and Andrew's patch series ( xen/bitops: Reduce the mess,
> > starting with ffs() ).
> 
> I looked at Andrew series and I can't seem to find an example where
> he 
> uses both the macro + static inline. He seems only use a static
> inline.
> Do you have any pointer?
> 
> But by any chance are you referring to the pattern on x86? If so, I 
> would really like to understand what's the benefits of introducing a
> a 
> one-liner static inline which is "shadowed" by a macro...
Yes, I was referring to the x86 pattern.

I tried to find the reason in the commit but couldn't:
https://lists.xenproject.org/archives/html/xen-changelog/2008-03/msg00097.html

For some reason, I also couldn't find the mailing list thread for this.

Perhaps Jan could help here; based on the commit message, he might have
a suggestion.

~ Oleksii
Jan Beulich May 24, 2024, 6:48 a.m. UTC | #6
On 23.05.2024 18:40, Oleksii K. wrote:
> On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
>> On 23/05/2024 15:11, Oleksii K. wrote:
>>> On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
>>>> On 17/05/2024 14:54, Oleksii Kurochko wrote:
>>>>> diff --git a/xen/arch/arm/arm64/livepatch.c
>>>>> b/xen/arch/arm/arm64/livepatch.c
>>>>> index df2cebedde..4bc8ed9be5 100644
>>>>> --- a/xen/arch/arm/arm64/livepatch.c
>>>>> +++ b/xen/arch/arm/arm64/livepatch.c
>>>>> @@ -10,7 +10,6 @@
>>>>>    #include <xen/mm.h>
>>>>>    #include <xen/vmap.h>
>>>>>    
>>>>> -#include <asm/bitops.h>
>>>>
>>>> It is a bit unclear how this change is related to the patch. Can
>>>> you
>>>> explain in the commit message?
>>> Probably it doesn't need anymore. I will double check and if this
>>> change is not needed, I will just drop it in the next patch
>>> version.
>>>
>>>>
>>>>>    #include <asm/byteorder.h>
>>>>>    #include <asm/insn.h>
>>>>>    #include <asm/livepatch.h>
>>>>> diff --git a/xen/arch/arm/include/asm/bitops.h
>>>>> b/xen/arch/arm/include/asm/bitops.h
>>>>> index 5104334e48..8e16335e76 100644
>>>>> --- a/xen/arch/arm/include/asm/bitops.h
>>>>> +++ b/xen/arch/arm/include/asm/bitops.h
>>>>> @@ -22,9 +22,6 @@
>>>>>    #define __set_bit(n,p)            set_bit(n,p)
>>>>>    #define __clear_bit(n,p)          clear_bit(n,p)
>>>>>    
>>>>> -#define BITOP_BITS_PER_WORD     32
>>>>> -#define BITOP_MASK(nr)          (1UL << ((nr) %
>>>>> BITOP_BITS_PER_WORD))
>>>>> -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
>>>>>    #define BITS_PER_BYTE           8
>>>>
>>>> OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
>>>> expect
>>>> the value to change across arch.
>>> I can move it to generic one header too in the next patch version.
>>>
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/xen/include/xen/bitops.h
>>>>> b/xen/include/xen/bitops.h
>>>>> index f14ad0d33a..6eeeff0117 100644
>>>>> --- a/xen/include/xen/bitops.h
>>>>> +++ b/xen/include/xen/bitops.h
>>>>> @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned
>>>>> long
>>>>> x)
>>>>>     * scope
>>>>>     */
>>>>>    
>>>>> +#define BITOP_BITS_PER_WORD 32
>>>>> +typedef uint32_t bitop_uint_t;
>>>>> +
>>>>> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
>>>>> BITOP_BITS_PER_WORD))
>>>>> +
>>>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>>>> +
>>>>> +extern void __bitop_bad_size(void);
>>>>> +
>>>>> +#define bitop_bad_size(addr) (sizeof(*(addr)) <
>>>>> sizeof(bitop_uint_t))
>>>>> +
>>>>>    /* --------------------- Please tidy above here ------------
>>>>> -----
>>>>> ---- */
>>>>>    
>>>>>    #include <asm/bitops.h>
>>>>>    
>>>>> +/**
>>>>> + * generic__test_and_set_bit - Set a bit and return its old
>>>>> value
>>>>> + * @nr: Bit to set
>>>>> + * @addr: Address to count from
>>>>> + *
>>>>> + * This operation is non-atomic and can be reordered.
>>>>> + * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>> + * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>> + */
>>>>
>>>> Sorry for only mentioning this on v10. I think this comment
>>>> should be
>>>> duplicated (or moved to) on top of test_bit() because this is
>>>> what
>>>> everyone will use. This will avoid the developper to follow the
>>>> function
>>>> calls and only notice the x86 version which says "This function
>>>> is
>>>> atomic and may not be reordered." and would be wrong for all the
>>>> other arch.
>>> It makes sense to add this comment on top of test_bit(), but I am
>>> curious if it is needed to mention that for x86 arch_test_bit() "is
>>> atomic and may not be reordered":
>>
>> I would say no because any developper modifying common code can't 
>> relying it.
>>
>>>
>>>   * This operation is non-atomic and can be reordered. ( Exception:
>>> for
>>> * x86 arch_test_bit() is atomic and may not be reordered )
>>>   * If two examples of this operation race, one can appear to
>>> succeed
>>>   * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>>   */
>>>
>>>>
>>>>> +static always_inline bool
>>>>> +generic__test_and_set_bit(int nr, volatile void *addr)
>>>>> +{
>>>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>>>> BITOP_WORD(nr);
>>>>> +    bitop_uint_t old = *p;
>>>>> +
>>>>> +    *p = old | mask;
>>>>> +    return (old & mask);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * generic__test_and_clear_bit - Clear a bit and return its
>>>>> old
>>>>> value
>>>>> + * @nr: Bit to clear
>>>>> + * @addr: Address to count from
>>>>> + *
>>>>> + * This operation is non-atomic and can be reordered.
>>>>> + * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>> + * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>> + */
>>>>
>>>> Same applies here and ...
>>>>
>>>>> +static always_inline bool
>>>>> +generic__test_and_clear_bit(int nr, volatile void *addr)
>>>>> +{
>>>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>>>> BITOP_WORD(nr);
>>>>> +    bitop_uint_t old = *p;
>>>>> +
>>>>> +    *p = old & ~mask;
>>>>> +    return (old & mask);
>>>>> +}
>>>>> +
>>>>> +/* WARNING: non atomic and it can be reordered! */
>>>>
>>>> ... here.
>>>>
>>>>> +static always_inline bool
>>>>> +generic__test_and_change_bit(int nr, volatile void *addr)
>>>>> +{
>>>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>>>> BITOP_WORD(nr);
>>>>> +    bitop_uint_t old = *p;
>>>>> +
>>>>> +    *p = old ^ mask;
>>>>> +    return (old & mask);
>>>>> +}
>>>>> +/**
>>>>> + * generic_test_bit - Determine whether a bit is set
>>>>> + * @nr: bit number to test
>>>>> + * @addr: Address to start counting from
>>>>> + */
>>>>> +static always_inline bool generic_test_bit(int nr, const
>>>>> volatile
>>>>> void *addr)
>>>>> +{
>>>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>>>> +    const volatile bitop_uint_t *p =
>>>>> +                        (const volatile bitop_uint_t *)addr +
>>>>> BITOP_WORD(nr);
>>>>> +
>>>>> +    return (*p & mask);
>>>>> +}
>>>>> +
>>>>> +static always_inline bool
>>>>> +__test_and_set_bit(int nr, volatile void *addr)
>>>>> +{
>>>>> +#ifndef arch__test_and_set_bit
>>>>> +#define arch__test_and_set_bit generic__test_and_set_bit
>>>>> +#endif
>>>>> +
>>>>> +    return arch__test_and_set_bit(nr, addr);
>>>>> +}
>>>>
>>>> NIT: It is a bit too late to change this one. But I have to
>>>> admit, I
>>>> don't understand the purpose of the static inline when you could
>>>> have
>>>> simply call...
>>>>
>>>>> +#define __test_and_set_bit(nr, addr) ({             \
>>>>> +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
>>>>> +    __test_and_set_bit(nr, addr);                   \
>>>>
>>>> ... __arch__test_and_set_bit here.
>>> I just wanted to be in sync with other non-atomic test*_bit()
>>> operations and Andrew's patch series ( xen/bitops: Reduce the mess,
>>> starting with ffs() ).
>>
>> I looked at Andrew series and I can't seem to find an example where
>> he 
>> uses both the macro + static inline. He seems only use a static
>> inline.
>> Do you have any pointer?
>>
>> But by any chance are you referring to the pattern on x86? If so, I 
>> would really like to understand what's the benefits of introducing a
>> a 
>> one-liner static inline which is "shadowed" by a macro...
> Yes, I was referring to the x86 pattern.
> 
> I tried to find the reason in the commit but couldn't:
> https://lists.xenproject.org/archives/html/xen-changelog/2008-03/msg00097.html
> 
> For some reason, I also couldn't find the mailing list thread for this.
> 
> Perhaps Jan could help here, based on the commit message he might have
> a suggestion.

There's a lot of apparently unrelated context left, so I'm trying to
guess at what the actual question is from the old change you're pointing
at: With
#define __test_and_set_bit(nr, addr) ({             \
   if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
   __test_and_set_bit(nr, addr);                   \

why do we have that wrapping macro? If that's indeed the question, then may
I suggest you consider what would happen to the sizeof() in bitop_bad_size()
if the invocation was moved to the inline function, taking pointer-to-void?

However, I can't relate that old change to the question I think Julien
raised (invoking __test_and_set_bit() vs arch__test_and_set_bit()), so
perhaps I'm missing something.

Jan
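
To make the sizeof() point concrete, a minimal sketch (reusing the
declarations from the patch above): in the macro, addr still has the
caller's pointer type, so sizeof(*(addr)) sees the real pointee size;
inside the inline function the parameter has already decayed to
volatile void *, so the pointee's size is gone (and sizeof(*addr) on a
pointer-to-void is not valid C):

    uint8_t flag;
    __test_and_set_bit(0, &flag);
    /* The macro sees sizeof(*(&flag)) == 1 < sizeof(bitop_uint_t), so
       the __bitop_bad_size() call survives and, never being defined,
       fails the build at link time. Had the check been moved into the
       inline function taking volatile void *, it could never fire. */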
Oleksii K. May 24, 2024, 7:25 a.m. UTC | #7
On Fri, 2024-05-24 at 08:48 +0200, Jan Beulich wrote:
> On 23.05.2024 18:40, Oleksii K. wrote:
> > On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> > > On 23/05/2024 15:11, Oleksii K. wrote:
> > > > On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> > > > > On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > > > > > diff --git a/xen/arch/arm/arm64/livepatch.c
> > > > > > b/xen/arch/arm/arm64/livepatch.c
> > > > > > index df2cebedde..4bc8ed9be5 100644
> > > > > > --- a/xen/arch/arm/arm64/livepatch.c
> > > > > > +++ b/xen/arch/arm/arm64/livepatch.c
> > > > > > @@ -10,7 +10,6 @@
> > > > > >    #include <xen/mm.h>
> > > > > >    #include <xen/vmap.h>
> > > > > >    
> > > > > > -#include <asm/bitops.h>
> > > > > 
> > > > > It is a bit unclear how this change is related to the patch.
> > > > > Can
> > > > > you
> > > > > explain in the commit message?
> > > > Probably it doesn't need anymore. I will double check and if
> > > > this
> > > > change is not needed, I will just drop it in the next patch
> > > > version.
> > > > 
> > > > > 
> > > > > >    #include <asm/byteorder.h>
> > > > > >    #include <asm/insn.h>
> > > > > >    #include <asm/livepatch.h>
> > > > > > diff --git a/xen/arch/arm/include/asm/bitops.h
> > > > > > b/xen/arch/arm/include/asm/bitops.h
> > > > > > index 5104334e48..8e16335e76 100644
> > > > > > --- a/xen/arch/arm/include/asm/bitops.h
> > > > > > +++ b/xen/arch/arm/include/asm/bitops.h
> > > > > > @@ -22,9 +22,6 @@
> > > > > >    #define __set_bit(n,p)            set_bit(n,p)
> > > > > >    #define __clear_bit(n,p)          clear_bit(n,p)
> > > > > >    
> > > > > > -#define BITOP_BITS_PER_WORD     32
> > > > > > -#define BITOP_MASK(nr)          (1UL << ((nr) %
> > > > > > BITOP_BITS_PER_WORD))
> > > > > > -#define BITOP_WORD(nr)          ((nr) /
> > > > > > BITOP_BITS_PER_WORD)
> > > > > >    #define BITS_PER_BYTE           8
> > > > > 
> > > > > OOI, any reason BITS_PER_BYTE has not been moved as well? I
> > > > > don't
> > > > > expect
> > > > > the value to change across arch.
> > > > I can move it to generic one header too in the next patch
> > > > version.
> > > > 
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/xen/include/xen/bitops.h
> > > > > > b/xen/include/xen/bitops.h
> > > > > > index f14ad0d33a..6eeeff0117 100644
> > > > > > --- a/xen/include/xen/bitops.h
> > > > > > +++ b/xen/include/xen/bitops.h
> > > > > > @@ -65,10 +65,141 @@ static inline int
> > > > > > generic_flsl(unsigned
> > > > > > long
> > > > > > x)
> > > > > >     * scope
> > > > > >     */
> > > > > >    
> > > > > > +#define BITOP_BITS_PER_WORD 32
> > > > > > +typedef uint32_t bitop_uint_t;
> > > > > > +
> > > > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > > > BITOP_BITS_PER_WORD))
> > > > > > +
> > > > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > > > +
> > > > > > +extern void __bitop_bad_size(void);
> > > > > > +
> > > > > > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > > > > > sizeof(bitop_uint_t))
> > > > > > +
> > > > > >    /* --------------------- Please tidy above here --------
> > > > > > ----
> > > > > > -----
> > > > > > ---- */
> > > > > >    
> > > > > >    #include <asm/bitops.h>
> > > > > >    
> > > > > > +/**
> > > > > > + * generic__test_and_set_bit - Set a bit and return its
> > > > > > old
> > > > > > value
> > > > > > + * @nr: Bit to set
> > > > > > + * @addr: Address to count from
> > > > > > + *
> > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > + * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > > + * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > > + */
> > > > > 
> > > > > Sorry for only mentioning this on v10. I think this comment
> > > > > should be
> > > > > duplicated (or moved to) on top of test_bit() because this is
> > > > > what
> > > > > everyone will use. This will avoid the developper to follow
> > > > > the
> > > > > function
> > > > > calls and only notice the x86 version which says "This
> > > > > function
> > > > > is
> > > > > atomic and may not be reordered." and would be wrong for all
> > > > > the
> > > > > other arch.
> > > > It makes sense to add this comment on top of test_bit(), but I
> > > > am
> > > > curious if it is needed to mention that for x86 arch_test_bit()
> > > > "is
> > > > atomic and may not be reordered":
> > > 
> > > I would say no because any developper modifying common code can't
> > > relying it.
> > > 
> > > > 
> > > >   * This operation is non-atomic and can be reordered. (
> > > > Exception:
> > > > for
> > > > * x86 arch_test_bit() is atomic and may not be reordered )
> > > >   * If two examples of this operation race, one can appear to
> > > > succeed
> > > >   * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > >   */
> > > > 
> > > > > 
> > > > > > +static always_inline bool
> > > > > > +generic__test_and_set_bit(int nr, volatile void *addr)
> > > > > > +{
> > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > *)addr +
> > > > > > BITOP_WORD(nr);
> > > > > > +    bitop_uint_t old = *p;
> > > > > > +
> > > > > > +    *p = old | mask;
> > > > > > +    return (old & mask);
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * generic__test_and_clear_bit - Clear a bit and return
> > > > > > its
> > > > > > old
> > > > > > value
> > > > > > + * @nr: Bit to clear
> > > > > > + * @addr: Address to count from
> > > > > > + *
> > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > + * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > > + * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > > + */
> > > > > 
> > > > > Same applies here and ...
> > > > > 
> > > > > > +static always_inline bool
> > > > > > +generic__test_and_clear_bit(int nr, volatile void *addr)
> > > > > > +{
> > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > *)addr +
> > > > > > BITOP_WORD(nr);
> > > > > > +    bitop_uint_t old = *p;
> > > > > > +
> > > > > > +    *p = old & ~mask;
> > > > > > +    return (old & mask);
> > > > > > +}
> > > > > > +
> > > > > > +/* WARNING: non atomic and it can be reordered! */
> > > > > 
> > > > > ... here.
> > > > > 
> > > > > > +static always_inline bool
> > > > > > +generic__test_and_change_bit(int nr, volatile void *addr)
> > > > > > +{
> > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > *)addr +
> > > > > > BITOP_WORD(nr);
> > > > > > +    bitop_uint_t old = *p;
> > > > > > +
> > > > > > +    *p = old ^ mask;
> > > > > > +    return (old & mask);
> > > > > > +}
> > > > > > +/**
> > > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > > + * @nr: bit number to test
> > > > > > + * @addr: Address to start counting from
> > > > > > + */
> > > > > > +static always_inline bool generic_test_bit(int nr, const
> > > > > > volatile
> > > > > > void *addr)
> > > > > > +{
> > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > +    const volatile bitop_uint_t *p =
> > > > > > +                        (const volatile bitop_uint_t
> > > > > > *)addr +
> > > > > > BITOP_WORD(nr);
> > > > > > +
> > > > > > +    return (*p & mask);
> > > > > > +}
> > > > > > +
> > > > > > +static always_inline bool
> > > > > > +__test_and_set_bit(int nr, volatile void *addr)
> > > > > > +{
> > > > > > +#ifndef arch__test_and_set_bit
> > > > > > +#define arch__test_and_set_bit generic__test_and_set_bit
> > > > > > +#endif
> > > > > > +
> > > > > > +    return arch__test_and_set_bit(nr, addr);
> > > > > > +}
> > > > > 
> > > > > NIT: It is a bit too late to change this one. But I have to
> > > > > admit, I
> > > > > don't understand the purpose of the static inline when you
> > > > > could
> > > > > have
> > > > > simply call...
> > > > > 
> > > > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > > > > > +    __test_and_set_bit(nr, addr);                   \
> > > > > 
> > > > > ... __arch__test_and_set_bit here.
> > > > I just wanted to be in sync with other non-atomic test*_bit()
> > > > operations and Andrew's patch series ( xen/bitops: Reduce the
> > > > mess,
> > > > starting with ffs() ).
> > > 
> > > I looked at Andrew series and I can't seem to find an example
> > > where
> > > he 
> > > uses both the macro + static inline. He seems only use a static
> > > inline.
> > > Do you have any pointer?
> > > 
> > > But by any chance are you referring to the pattern on x86? If so,
> > > I 
> > > would really like to understand what's the benefits of
> > > introducing a
> > > a 
> > > one-liner static inline which is "shadowed" by a macro...
> > Yes, I was referring to the x86 pattern.
> > 
> > I tried to find the reason in the commit but couldn't:
> > https://lists.xenproject.org/archives/html/xen-changelog/2008-03/msg00097.html
> > 
> > For some reason, I also couldn't find the mailing list thread for
> > this.
> > 
> > Perhaps Jan could help here, based on the commit message he might
> > have
> > a suggestion.
> 
> There's a lot of apparently unrelated context left, so I'm trying to
> guess
> on what the actual question is, from the old change you're pointing
> at: With
> 
> #define __test_and_set_bit(nr, addr) ({             \
>    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
>    __test_and_set_bit(nr, addr);                   \
> 
> why do we have that wrapping macro? If that's indeed the question,
> then may
> I suggest you consider what would happen to the sizeof() in
> bitop_bad_size()
> if the invocation was moved to the inline function, taking pointer-
> to-void?
> 
> However, I can't relate that old change to the question I think
> Julien
> raised (invoking __test_and_set_bit() vs arch__test_and_set_bit()),
> so
> perhaps I'm missing something.
If I understand the question correctly, it is why we need the static
inline function at all. Can't we just do the following:

#ifndef arch_test_bit
#define arch_test_bit generic_test_bit
#endif

#define test_bit(nr, addr) ({                           \
    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
    arch_test_bit(nr, addr);                            \
})

~ Oleksii
Jan Beulich May 24, 2024, 7:35 a.m. UTC | #8
On 24.05.2024 09:25, Oleksii K. wrote:
> [...]
> If I understand correctly, the question is why we need the static
> inline function at all. Can't we just do the following:
> 
> #ifndef arch_test_bit
> #define arch_test_bit generic_test_bit
> #endif
> 
> #define test_bit(nr, addr) ({                           \
>     if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>     arch_test_bit(nr, addr);                            \
> })

I think we can. But then how did that old commit come into play?

Jan
Oleksii K. May 24, 2024, 8:06 a.m. UTC | #9
On Fri, 2024-05-24 at 09:35 +0200, Jan Beulich wrote:
> On 24.05.2024 09:25, Oleksii K. wrote:
> > On Fri, 2024-05-24 at 08:48 +0200, Jan Beulich wrote:
> > > On 23.05.2024 18:40, Oleksii K. wrote:
> > > > On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> > > > > On 23/05/2024 15:11, Oleksii K. wrote:
> > > > > > On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> > > > > > > On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > > > > > > > diff --git a/xen/arch/arm/arm64/livepatch.c
> > > > > > > > b/xen/arch/arm/arm64/livepatch.c
> > > > > > > > index df2cebedde..4bc8ed9be5 100644
> > > > > > > > --- a/xen/arch/arm/arm64/livepatch.c
> > > > > > > > +++ b/xen/arch/arm/arm64/livepatch.c
> > > > > > > > @@ -10,7 +10,6 @@
> > > > > > > >    #include <xen/mm.h>
> > > > > > > >    #include <xen/vmap.h>
> > > > > > > >    
> > > > > > > > -#include <asm/bitops.h>
> > > > > > > 
> > > > > > > It is a bit unclear how this change is related to the
> > > > > > > patch.
> > > > > > > Can
> > > > > > > you
> > > > > > > explain in the commit message?
> > > > > > Probably it doesn't need anymore. I will double check and
> > > > > > if
> > > > > > this
> > > > > > change is not needed, I will just drop it in the next patch
> > > > > > version.
> > > > > > 
> > > > > > > 
> > > > > > > >    #include <asm/byteorder.h>
> > > > > > > >    #include <asm/insn.h>
> > > > > > > >    #include <asm/livepatch.h>
> > > > > > > > diff --git a/xen/arch/arm/include/asm/bitops.h
> > > > > > > > b/xen/arch/arm/include/asm/bitops.h
> > > > > > > > index 5104334e48..8e16335e76 100644
> > > > > > > > --- a/xen/arch/arm/include/asm/bitops.h
> > > > > > > > +++ b/xen/arch/arm/include/asm/bitops.h
> > > > > > > > @@ -22,9 +22,6 @@
> > > > > > > >    #define __set_bit(n,p)            set_bit(n,p)
> > > > > > > >    #define __clear_bit(n,p)          clear_bit(n,p)
> > > > > > > >    
> > > > > > > > -#define BITOP_BITS_PER_WORD     32
> > > > > > > > -#define BITOP_MASK(nr)          (1UL << ((nr) %
> > > > > > > > BITOP_BITS_PER_WORD))
> > > > > > > > -#define BITOP_WORD(nr)          ((nr) /
> > > > > > > > BITOP_BITS_PER_WORD)
> > > > > > > >    #define BITS_PER_BYTE           8
> > > > > > > 
> > > > > > > OOI, any reason BITS_PER_BYTE has not been moved as well?
> > > > > > > I
> > > > > > > don't
> > > > > > > expect
> > > > > > > the value to change across arch.
> > > > > > I can move it to generic one header too in the next patch
> > > > > > version.
> > > > > > 
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > diff --git a/xen/include/xen/bitops.h
> > > > > > > > b/xen/include/xen/bitops.h
> > > > > > > > index f14ad0d33a..6eeeff0117 100644
> > > > > > > > --- a/xen/include/xen/bitops.h
> > > > > > > > +++ b/xen/include/xen/bitops.h
> > > > > > > > @@ -65,10 +65,141 @@ static inline int
> > > > > > > > generic_flsl(unsigned
> > > > > > > > long
> > > > > > > > x)
> > > > > > > >     * scope
> > > > > > > >     */
> > > > > > > >    
> > > > > > > > +#define BITOP_BITS_PER_WORD 32
> > > > > > > > +typedef uint32_t bitop_uint_t;
> > > > > > > > +
> > > > > > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > > > > > BITOP_BITS_PER_WORD))
> > > > > > > > +
> > > > > > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > > > > > +
> > > > > > > > +extern void __bitop_bad_size(void);
> > > > > > > > +
> > > > > > > > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > > > > > > > sizeof(bitop_uint_t))
> > > > > > > > +
> > > > > > > >    /* --------------------- Please tidy above here ----
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > -----
> > > > > > > > ---- */
> > > > > > > >    
> > > > > > > >    #include <asm/bitops.h>
> > > > > > > >    
> > > > > > > > +/**
> > > > > > > > + * generic__test_and_set_bit - Set a bit and return
> > > > > > > > its
> > > > > > > > old
> > > > > > > > value
> > > > > > > > + * @nr: Bit to set
> > > > > > > > + * @addr: Address to count from
> > > > > > > > + *
> > > > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > > > + * If two examples of this operation race, one can
> > > > > > > > appear
> > > > > > > > to
> > > > > > > > succeed
> > > > > > > > + * but actually fail.  You must protect multiple
> > > > > > > > accesses
> > > > > > > > with
> > > > > > > > a
> > > > > > > > lock.
> > > > > > > > + */
> > > > > > > 
> > > > > > > Sorry for only mentioning this on v10. I think this
> > > > > > > comment
> > > > > > > should be
> > > > > > > duplicated (or moved to) on top of test_bit() because
> > > > > > > this is
> > > > > > > what
> > > > > > > everyone will use. This will avoid the developper to
> > > > > > > follow
> > > > > > > the
> > > > > > > function
> > > > > > > calls and only notice the x86 version which says "This
> > > > > > > function
> > > > > > > is
> > > > > > > atomic and may not be reordered." and would be wrong for
> > > > > > > all
> > > > > > > the
> > > > > > > other arch.
> > > > > > It makes sense to add this comment on top of test_bit(),
> > > > > > but I
> > > > > > am
> > > > > > curious if it is needed to mention that for x86
> > > > > > arch_test_bit()
> > > > > > "is
> > > > > > atomic and may not be reordered":
> > > > > 
> > > > > I would say no because any developper modifying common code
> > > > > can't
> > > > > relying it.
> > > > > 
> > > > > > 
> > > > > >   * This operation is non-atomic and can be reordered. (
> > > > > > Exception:
> > > > > > for
> > > > > > * x86 arch_test_bit() is atomic and may not be reordered )
> > > > > >   * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > >   * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > >   */
> > > > > > 
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_set_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > > > *)addr +
> > > > > > > > BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old | mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * generic__test_and_clear_bit - Clear a bit and
> > > > > > > > return
> > > > > > > > its
> > > > > > > > old
> > > > > > > > value
> > > > > > > > + * @nr: Bit to clear
> > > > > > > > + * @addr: Address to count from
> > > > > > > > + *
> > > > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > > > + * If two examples of this operation race, one can
> > > > > > > > appear
> > > > > > > > to
> > > > > > > > succeed
> > > > > > > > + * but actually fail.  You must protect multiple
> > > > > > > > accesses
> > > > > > > > with
> > > > > > > > a
> > > > > > > > lock.
> > > > > > > > + */
> > > > > > > 
> > > > > > > Same applies here and ...
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_clear_bit(int nr, volatile void
> > > > > > > > *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > > > *)addr +
> > > > > > > > BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old & ~mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* WARNING: non atomic and it can be reordered! */
> > > > > > > 
> > > > > > > ... here.
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_change_bit(int nr, volatile void
> > > > > > > > *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t
> > > > > > > > *)addr +
> > > > > > > > BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old ^ mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +/**
> > > > > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > > > > + * @nr: bit number to test
> > > > > > > > + * @addr: Address to start counting from
> > > > > > > > + */
> > > > > > > > +static always_inline bool generic_test_bit(int nr,
> > > > > > > > const
> > > > > > > > volatile
> > > > > > > > void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    const volatile bitop_uint_t *p =
> > > > > > > > +                        (const volatile bitop_uint_t
> > > > > > > > *)addr +
> > > > > > > > BITOP_WORD(nr);
> > > > > > > > +
> > > > > > > > +    return (*p & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static always_inline bool
> > > > > > > > +__test_and_set_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +#ifndef arch__test_and_set_bit
> > > > > > > > +#define arch__test_and_set_bit
> > > > > > > > generic__test_and_set_bit
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > > +    return arch__test_and_set_bit(nr, addr);
> > > > > > > > +}
> > > > > > > 
> > > > > > > NIT: It is a bit too late to change this one. But I have
> > > > > > > to
> > > > > > > admit, I
> > > > > > > don't understand the purpose of the static inline when
> > > > > > > you
> > > > > > > could
> > > > > > > have
> > > > > > > simply call...
> > > > > > > 
> > > > > > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > > > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > > > > > > > +    __test_and_set_bit(nr, addr);                   \
> > > > > > > 
> > > > > > > ... __arch__test_and_set_bit here.
> > > > > > I just wanted to be in sync with other non-atomic
> > > > > > test*_bit()
> > > > > > operations and Andrew's patch series ( xen/bitops: Reduce
> > > > > > the
> > > > > > mess,
> > > > > > starting with ffs() ).
> > > > > 
> > > > > I looked at Andrew series and I can't seem to find an example
> > > > > where
> > > > > he 
> > > > > uses both the macro + static inline. He seems only use a
> > > > > static
> > > > > inline.
> > > > > Do you have any pointer?
> > > > > 
> > > > > But by any chance are you referring to the pattern on x86? If
> > > > > so,
> > > > > I 
> > > > > would really like to understand what's the benefits of
> > > > > introducing a
> > > > > a 
> > > > > one-liner static inline which is "shadowed" by a macro...
> > > > Yes, I was referring to the x86 pattern.
> > > > 
> > > > I tried to find the reason in the commit but couldn't:
> > > > https://lists.xenproject.org/archives/html/xen-changelog/2008-03/msg00097.html
> > > > 
> > > > For some reason, I also couldn't find the mailing list thread
> > > > for
> > > > this.
> > > > 
> > > > Perhaps Jan could help here, based on the commit message he
> > > > might
> > > > have
> > > > a suggestion.
> > > 
> > > There's a lot of apparently unrelated context left, so I'm trying
> > > to
> > > guess
> > > on what the actual question is, from the old change you're
> > > pointing
> > > at: With
> > > 
> > > #define __test_and_set_bit(nr, addr) ({             \
> > >    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > >    __test_and_set_bit(nr, addr);                   \
> > > 
> > > why do we have that wrapping macro? If that's indeed the
> > > question,
> > > then may
> > > I suggest you consider what would happen to the sizeof() in
> > > bitop_bad_size()
> > > if the invocation was moved to the inline function, taking
> > > pointer-
> > > to-void?
> > > 
> > > However, I can't relate that old change to the question I think
> > > Julien
> > > raised (invoking __test_and_set_bit() vs
> > > arch__test_and_set_bit()),
> > > so
> > > perhaps I'm missing something.
> > If I understand correctly, the question is why we need the static
> > inline function at all. Can't we just do the following:
> > 
> > #ifndef arch_test_bit
> > #define arch_test_bit generic_test_bit
> > #endif
> > 
> > #define test_bit(nr, addr) ({                           \
> >     if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> >     arch_test_bit(nr, addr);                            \
> > })
> 
> I think we can. But then how did that old commit come into play?
The old commit introduced this approach, where a 'static inline'
function is called from within a macro, but I only now realized that it
was done solely because of the check "if ( bitop_bad_size(addr) )
__bitop_bad_size();".
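
As a side note, the wrapping pattern is not recursive: the
preprocessor never re-expands a macro's own name inside its
replacement list, so the inner call binds to the static inline of the
same name. A tiny standalone sketch with made-up names:

#include <stdio.h>

static inline int twice(int x)
{
    return 2 * x;
}

/*
 * Not recursion: while the "twice" macro is being expanded, the token
 * "twice" inside its replacement list is not expanded again, so it
 * names the static inline above.
 */
#define twice(x) ({                       \
    printf("size check goes here\n");     \
    twice(x);                             \
})

int main(void)
{
    return twice(21) == 42 ? 0 : 1;  /* exit code 0 */
}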

As far as I understand, Julien doesn't insist on dropping the 'static
inline function' and using arch_test_bit() directly in the test_bit()
macro, so I prefer to keep the option implemented in this patch.
However, if anyone has objections to this approach and thinks it would
be better to drop the 'static inline functions' and call arch_*()
directly in a macro, I am okay with reworking it today and sending a
new patch version where these changes are taken into account.

Anyway, I will send a new version of this patch today, as Julien asked
me to add comments on top of generic_test_bit() and
generic__test_and_change_bit(). Additionally, "#define BITS_PER_BYTE 8"
will be moved to xen/bitops.h, as it seems to always be the same for
every architecture.

~ Oleksii
Oleksii K. May 24, 2024, 8:58 a.m. UTC | #10
On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> > > >     #include <asm/bitops.h>
> > > >     
> > > > +/**
> > > > + * generic__test_and_set_bit - Set a bit and return its old value
> > > > + * @nr: Bit to set
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to succeed
> > > > + * but actually fail.  You must protect multiple accesses with a lock.
> > > > + */
> > > 
> > > Sorry for only mentioning this on v10. I think this comment should
> > > be duplicated (or moved to) on top of test_bit(), because this is
> > > what everyone will use. This will save the developer from following
> > > the function calls and only noticing the x86 version, which says
> > > "This function is atomic and may not be reordered." and would be
> > > wrong for all the other arches.
> > It makes sense to add this comment on top of test_bit(), but I am
> > curious whether it is necessary to mention that for x86
> > arch_test_bit() "is atomic and may not be reordered":
> 
> I would say no, because any developer modifying common code can't
> rely on it.
But won't there be confusion then: if the non-generic implementation of
test_bit() is chosen, test_bit() can be "atomic and cannot be
reordered" (as is the case on x86)?

~ Oleksii
Julien Grall May 24, 2024, 9:17 a.m. UTC | #11
Hi Oleksii,

On 24/05/2024 09:58, Oleksii K. wrote:
> On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
>>>>>      #include <asm/bitops.h>
>>>>>      
>>>>> +/**
>>>>> + * generic__test_and_set_bit - Set a bit and return its old
>>>>> value
>>>>> + * @nr: Bit to set
>>>>> + * @addr: Address to count from
>>>>> + *
>>>>> + * This operation is non-atomic and can be reordered.
>>>>> + * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>> + * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>> + */
>>>>
>>>> Sorry for only mentioning this on v10. I think this comment
>>>> should be
>>>> duplicated (or moved to) on top of test_bit() because this is
>>>> what
>>>> everyone will use. This will avoid the developper to follow the
>>>> function
>>>> calls and only notice the x86 version which says "This function
>>>> is
>>>> atomic and may not be reordered." and would be wrong for all the
>>>> other arch.
>>> It makes sense to add this comment on top of test_bit(), but I am
>>> curious if it is needed to mention that for x86 arch_test_bit() "is
>>> atomic and may not be reordered":
>>
>> I would say no because any developper modifying common code can't
>> relying it.
> But won't there be confusion then: if the non-generic implementation
> of test_bit() is chosen, test_bit() can be "atomic and cannot be
> reordered" (as is the case on x86)?

I don't understand what confusion could arise. A developer cannot rely
on the x86 behavior in common code. They have to write code that works
for every arch.

The first thing a developer will do is check test_bit(). The comment
on top will say they can't rely on ordering, and that is what most
developers need to rely on.

If someone wants to write more optimized code for x86, they are free to 
look at the implementation. But I don't think this should be detailed on 
top of the common wrapper (the x86 implementation would still be 
compatible with the comment).
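
To make that concrete, here is a rough sketch (the lock and the flag
word are made up for illustration) of how common code should use these
helpers on every arch, given the documented non-atomic contract:

#include <xen/bitops.h>
#include <xen/spinlock.h>
#include <xen/types.h>

/*
 * Concurrent updates need an external lock, even though the x86
 * implementation happens to be atomic.
 */
static DEFINE_SPINLOCK(flags_lock);
static unsigned long flags;

static bool claim_flag(int nr)
{
    bool was_set;

    spin_lock(&flags_lock);
    was_set = __test_and_set_bit(nr, &flags);
    spin_unlock(&flags_lock);

    return !was_set;  /* true if we were the one to set the bit */
}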

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index df2cebedde..4bc8ed9be5 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -10,7 +10,6 @@ 
 #include <xen/mm.h>
 #include <xen/vmap.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 #include <asm/insn.h>
 #include <asm/livepatch.h>
diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 5104334e48..8e16335e76 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -22,9 +22,6 @@ 
 #define __set_bit(n,p)            set_bit(n,p)
 #define __clear_bit(n,p)          clear_bit(n,p)
 
-#define BITOP_BITS_PER_WORD     32
-#define BITOP_MASK(nr)          (1UL << ((nr) % BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
 #define BITS_PER_BYTE           8
 
 #define ADDR (*(volatile int *) addr)
@@ -76,70 +73,6 @@  bool test_and_change_bit_timeout(int nr, volatile void *p,
 bool clear_mask16_timeout(uint16_t mask, volatile void *p,
                           unsigned int max_try);
 
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_set_bit(int nr, volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old | mask;
-        return (old & mask) != 0;
-}
-
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_clear_bit(int nr, volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old & ~mask;
-        return (old & mask) != 0;
-}
-
-/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-                                            volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old ^ mask;
-        return (old & mask) != 0;
-}
-
-/**
- * test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static inline int test_bit(int nr, const volatile void *addr)
-{
-        const volatile unsigned int *p = (const volatile unsigned int *)addr;
-        return 1UL & (p[BITOP_WORD(nr)] >> (nr & (BITOP_BITS_PER_WORD-1)));
-}
-
 /*
  * On ARMv5 and above those functions can be implemented around
  * the clz instruction for much better code efficiency.
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index 20f7865358..049aa62b89 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -15,9 +15,6 @@ 
 #define __set_bit(n, p)         set_bit(n, p)
 #define __clear_bit(n, p)       clear_bit(n, p)
 
-#define BITOP_BITS_PER_WORD     32
-#define BITOP_MASK(nr)          (1U << ((nr) % BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
 #define BITS_PER_BYTE           8
 
 /* PPC bit number conversion */
@@ -69,17 +66,6 @@  static inline void clear_bit(int nr, volatile void *addr)
     clear_bits(BITOP_MASK(nr), (volatile unsigned int *)addr + BITOP_WORD(nr));
 }
 
-/**
- * test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static inline int test_bit(int nr, const volatile void *addr)
-{
-    const volatile unsigned int *p = addr;
-    return 1 & (p[BITOP_WORD(nr)] >> (nr & (BITOP_BITS_PER_WORD - 1)));
-}
-
 static inline unsigned int test_and_clear_bits(
     unsigned int mask,
     volatile unsigned int *p)
@@ -133,44 +119,6 @@  static inline int test_and_set_bit(unsigned int nr, volatile void *addr)
         (volatile unsigned int *)addr + BITOP_WORD(nr)) != 0;
 }
 
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_set_bit(int nr, volatile void *addr)
-{
-    unsigned int mask = BITOP_MASK(nr);
-    volatile unsigned int *p = (volatile unsigned int *)addr + BITOP_WORD(nr);
-    unsigned int old = *p;
-
-    *p = old | mask;
-    return (old & mask) != 0;
-}
-
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_clear_bit(int nr, volatile void *addr)
-{
-    unsigned int mask = BITOP_MASK(nr);
-    volatile unsigned int *p = (volatile unsigned int *)addr + BITOP_WORD(nr);
-    unsigned int old = *p;
-
-    *p = old & ~mask;
-    return (old & mask) != 0;
-}
-
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_fls(x)
 
diff --git a/xen/arch/ppc/include/asm/page.h b/xen/arch/ppc/include/asm/page.h
index 890e285051..6d4cd2611c 100644
--- a/xen/arch/ppc/include/asm/page.h
+++ b/xen/arch/ppc/include/asm/page.h
@@ -2,9 +2,9 @@ 
 #ifndef _ASM_PPC_PAGE_H
 #define _ASM_PPC_PAGE_H
 
+#include <xen/bitops.h>
 #include <xen/types.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 
 #define PDE_VALID     PPC_BIT(0)
diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index ab5a10695c..9055730997 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -1,11 +1,11 @@ 
 /* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <xen/bitops.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/mm.h>
 #include <xen/types.h>
 #include <xen/lib.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 #include <asm/early_printk.h>
 #include <asm/page.h>
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index dd439b69a0..23f09fdb7a 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -19,9 +19,6 @@ 
 #define ADDR (*(volatile int *) addr)
 #define CONST_ADDR (*(const volatile int *) addr)
 
-extern void __bitop_bad_size(void);
-#define bitop_bad_size(addr) (sizeof(*(addr)) < 4)
-
 /**
  * set_bit - Atomically set a bit in memory
  * @nr: the bit to set
@@ -175,7 +172,7 @@  static inline int test_and_set_bit(int nr, volatile void *addr)
 })
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch__test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -183,7 +180,7 @@  static inline int test_and_set_bit(int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, void *addr)
+static inline int arch__test_and_set_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -194,10 +191,7 @@  static inline int __test_and_set_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_set_bit(nr, addr) ({                 \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_set_bit(nr, addr);                       \
-})
+#define arch__test_and_set_bit arch__test_and_set_bit
 
 /**
  * test_and_clear_bit - Clear a bit and return its old value
@@ -224,7 +218,7 @@  static inline int test_and_clear_bit(int nr, volatile void *addr)
 })
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch__test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -232,7 +226,7 @@  static inline int test_and_clear_bit(int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, void *addr)
+static inline int arch__test_and_clear_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -243,13 +237,10 @@  static inline int __test_and_clear_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_clear_bit(nr, addr) ({               \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_clear_bit(nr, addr);                     \
-})
+#define arch__test_and_clear_bit arch__test_and_clear_bit
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr, void *addr)
+static inline int arch__test_and_change_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -260,10 +251,7 @@  static inline int __test_and_change_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_change_bit(nr, addr) ({              \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_change_bit(nr, addr);                    \
-})
+#define arch__test_and_change_bit arch__test_and_change_bit
 
 /**
  * test_and_change_bit - Change a bit and return its new value
@@ -307,8 +295,7 @@  static inline int variable_test_bit(int nr, const volatile void *addr)
     return oldbit;
 }
 
-#define test_bit(nr, addr) ({                           \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
+#define arch_test_bit(nr, addr) ({                      \
     __builtin_constant_p(nr) ?                          \
         constant_test_bit(nr, addr) :                   \
         variable_test_bit(nr, addr);                    \
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index f14ad0d33a..6eeeff0117 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -65,10 +65,141 @@  static inline int generic_flsl(unsigned long x)
  * scope
  */
 
+#define BITOP_BITS_PER_WORD 32
+typedef uint32_t bitop_uint_t;
+
+#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
+
+#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
+
+extern void __bitop_bad_size(void);
+
+#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))
+
 /* --------------------- Please tidy above here --------------------- */
 
 #include <asm/bitops.h>
 
+/**
+ * generic__test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static always_inline bool
+generic__test_and_set_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old | mask;
+    return (old & mask);
+}
+
+/**
+ * generic__test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static always_inline bool
+generic__test_and_clear_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old & ~mask;
+    return (old & mask);
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static always_inline bool
+generic__test_and_change_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old ^ mask;
+    return (old & mask);
+}
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static always_inline bool generic_test_bit(int nr, const volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    const volatile bitop_uint_t *p =
+                        (const volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+
+    return (*p & mask);
+}
+
+static always_inline bool
+__test_and_set_bit(int nr, volatile void *addr)
+{
+#ifndef arch__test_and_set_bit
+#define arch__test_and_set_bit generic__test_and_set_bit
+#endif
+
+    return arch__test_and_set_bit(nr, addr);
+}
+#define __test_and_set_bit(nr, addr) ({             \
+    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
+    __test_and_set_bit(nr, addr);                   \
+})
+
+static always_inline bool
+__test_and_clear_bit(int nr, volatile void *addr)
+{
+#ifndef arch__test_and_clear_bit
+#define arch__test_and_clear_bit generic__test_and_clear_bit
+#endif
+
+    return arch__test_and_clear_bit(nr, addr);
+}
+#define __test_and_clear_bit(nr, addr) ({           \
+    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
+    __test_and_clear_bit(nr, addr);                 \
+})
+
+static always_inline bool
+__test_and_change_bit(int nr, volatile void *addr)
+{
+#ifndef arch__test_and_change_bit
+#define arch__test_and_change_bit generic__test_and_change_bit
+#endif
+
+    return arch__test_and_change_bit(nr, addr);
+}
+#define __test_and_change_bit(nr, addr) ({              \
+    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
+    __test_and_change_bit(nr, addr);                    \
+})
+
+static always_inline bool test_bit(int nr, const volatile void *addr)
+{
+#ifndef arch_test_bit
+#define arch_test_bit generic_test_bit
+#endif
+
+    return arch_test_bit(nr, addr);
+}
+#define test_bit(nr, addr) ({                           \
+    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
+    test_bit(nr, addr);                                 \
+})
+
 /*
  * Find First Set bit.  Bits are labelled from 1.
  */