[v9,02/15] xen: introduce generic non-atomic test_*bit()

Message ID	616e8be09f217a766b96c4f9060f6658636a4338.1714988096.git.oleksii.kurochko@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: Oleksii Kurochko <oleksii.kurochko@gmail.com> To: xen-devel@lists.xenproject.org Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>, =?utf-8?q?Roger_Pau_Monn?= =?utf-8?q?=C3=A9?= <roger.pau@citrix.com>, Ross Lagerwall <ross.lagerwall@citrix.com>, Stefano Stabellini <sstabellini@kernel.org>, Julien Grall <julien@xen.org>, Bertrand Marquis <bertrand.marquis@arm.com>, Michal Orzel <michal.orzel@amd.com>, Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>, Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, Jan Beulich <jbeulich@suse.com>, Shawn Anastasio <sanastasio@raptorengineering.com> Subject: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit() Date: Mon, 6 May 2024 12:15:11 +0200 Message-ID: <616e8be09f217a766b96c4f9060f6658636a4338.1714988096.git.oleksii.kurochko@gmail.com> In-Reply-To: <cover.1714988096.git.oleksii.kurochko@gmail.com> References: <cover.1714988096.git.oleksii.kurochko@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Enable build of full Xen for RISC-V \| expand [v9,00/15] Enable build of full Xen for RISC-V [v9,01/15] xen/riscv: disable unnecessary configs [v9,02/15] xen: introduce generic non-atomic test_bit() [v9,03/15] xen/bitops: implement fls{l}() in common logic [v9,04/15] xen/bitops: put __ffs() into linux compatible header [v9,05/15] xen/riscv: introduce bitops.h [v9,06/15] xen/riscv: introduce cmpxchg.h [v9,07/15] xen/riscv: introduce atomic.h [v9,08/15] xen/riscv: introduce monitor.h [v9,09/15] xen/riscv: add definition of __read_mostly [v9,10/15] xen/riscv: add required things to current.h [v9,11/15] xen/riscv: add minimal stuff to mm.h to build full Xen [v9,12/15] xen/riscv: introduce vm_event_() functions [v9,13/15] xen/riscv: add minimal amount of stubs to build full Xen [v9,14/15] xen/riscv: enable full Xen build [v9,15/15] xen/README: add compiler and binutils versions for RISC-V64

Oleksii Kurochko May 6, 2024, 10:15 a.m. UTC

The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit

Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE

These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.

Because of that x86 has the following check in the macros test_bit(),
__test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit():
    if ( bitop_bad_size(addr) ) __bitop_bad_size();
It was necessary to make bitop bad size check generic too, so
arch_check_bitop_size() was introduced.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
   The context ("* Find First Set bit.  Bits are labelled from 1." in xen/bitops.h )
   suggests there's a dependency on an uncommitted patch. It happens becuase the current patch
   series is based on Andrew's patch series ( https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.cooper3@citrix.com/T/#t ),
   but if everything is okay with the current one patch it can be merged without Andrew's patch series being merged.
---
Changes in V9:
  - move up xen/bitops.h in ppc/asm/page.h.
  - update defintion of arch_check_bitop_size.
    And drop correspondent macros from x86/asm/bitops.h
  - drop parentheses in generic__test_and_set_bit() for definition of
    local variable p.
  - fix indentation inside #ifndef BITOP_TYPE...#endif
  - update the commit message.
---
 Changes in V8:
  - drop __pure for function which uses volatile.
  - drop unnessary () in generic__test_and_change_bit() for addr casting.
  - update prototype of generic_test_bit() and test_bit(): now it returns bool
    instead of int.
  - update generic_test_bit() to use BITOP_MASK().
  - Deal with fls{l} changes: it should be in the patch with introduced generic fls{l}.
  - add a footer with explanation of dependency on an uncommitted patch after Signed-off.
  - abstract bitop_size().
  - move BITOP_TYPE define to <xen/types.h>.
---
 Changes in V7:
  - move everything to xen/bitops.h to follow the same approach for all generic
    bit ops.
  - put together BITOP_BITS_PER_WORD and bitops_uint_t.
  - make BITOP_MASK more generic.
  - drop #ifdef ... #endif around BITOP_MASK, BITOP_WORD as they are generic
    enough.
  - drop "_" for generic__{test_and_set_bit,...}().
  - drop " != 0" for functions which return bool.
  - add volatile during the cast for generic__{...}().
  - update the commit message.
  - update arch related code to follow the proposed generic approach.
---
 Changes in V6:
  - Nothing changed ( only rebase )
---
 Changes in V5:
   - new patch
---
 xen/arch/arm/arm64/livepatch.c    |   1 -
 xen/arch/arm/include/asm/bitops.h |  67 ---------------
 xen/arch/ppc/include/asm/bitops.h |  52 ------------
 xen/arch/ppc/include/asm/page.h   |   2 +-
 xen/arch/ppc/mm-radix.c           |   2 +-
 xen/arch/x86/include/asm/bitops.h |  31 ++-----
 xen/include/xen/bitops.h          | 134 ++++++++++++++++++++++++++++++
 xen/include/xen/types.h           |   6 ++
 8 files changed, 151 insertions(+), 144 deletions(-)

Jan Beulich May 15, 2024, 8:52 a.m. UTC | #1

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> The following generic functions were introduced:
> * test_bit
> * generic__test_and_set_bit
> * generic__test_and_clear_bit
> * generic__test_and_change_bit
> 
> Also, the patch introduces the following generics which are
> used by the functions mentioned above:
> * BITOP_BITS_PER_WORD
> * BITOP_MASK
> * BITOP_WORD
> * BITOP_TYPE
> 
> These functions and macros can be useful for architectures
> that don't have corresponding arch-specific instructions.

Logically this paragraph may better move ahead of the BITOP_* one.

> Because of that x86 has the following check in the macros test_bit(),
> __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit():
>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> It was necessary to make bitop bad size check generic too, so
> arch_check_bitop_size() was introduced.

Not anymore, with the most recent adjustments? There's nothing arch-
specific anymore in the checking.

> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, void *addr)
> +static inline int arch__test_and_set_bit(int nr, volatile void *addr)

I think I raised this point before: Why arch__ here, ...

> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, void *addr)
> +static inline int arch__test_and_clear_bit(int nr, volatile void *addr)

... here, ...

> @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int nr, void *addr)
>  
>      return oldbit;
>  }
> -#define __test_and_clear_bit(nr, addr) ({               \
> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> -    __test_and_clear_bit(nr, addr);                     \
> -})
> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr, void *addr)
> +static inline int arch__test_and_change_bit(int nr, volatile void *addr)

... and here, while ...

> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr, const volatile void *addr)
>      return oldbit;
>  }
>  
> -#define test_bit(nr, addr) ({                           \
> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> +#define arch_test_bit(nr, addr) ({                      \

... just arch_ here? I don't like the double underscore infixes very
much, but I'm okay with them as long as they're applied consistently.

> --- a/xen/include/xen/bitops.h
> +++ b/xen/include/xen/bitops.h
> @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long x)
>   * scope
>   */
>  
> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
> +
> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> +
> +extern void __bitop_bad_size(void);
> +
>  /* --------------------- Please tidy above here --------------------- */
>  
>  #include <asm/bitops.h>
>  
> +#ifndef arch_check_bitop_size
> +
> +#define bitop_bad_size(addr) sizeof(*(addr)) < sizeof(bitop_uint_t)

Nit: Missing parentheses around the whole expression.

> +#define arch_check_bitop_size(addr) \
> +    if ( bitop_bad_size(addr) ) __bitop_bad_size();

Apart from the arch_ prefix that now wants dropping, this macro (if we
want one in the first place) also wants writing in a way such that it
can be used as a normal expression, without double semicolons or other
anomalies (potentially) resulting at use sites. E.g.

#define check_bitop_size(addr) do { \
    if ( bitop_bad_size(addr) )     \
        __bitop_bad_size();         \
} while ( 0 )

> +#endif /* arch_check_bitop_size */
> +
> +/**
> + * generic__test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_set_bit(unsigned long nr, volatile void *addr)

The original per-arch functions all use "int" for their first parameter.
Here you use unsigned long, without any mention in the description of the
potential behavioral change. Which is even worse given that then x86 ends
up inconsistent with Arm and PPC in this regard, by ...

> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old | mask;
> +    return (old & mask);
> +}
> +
> +/**
> + * generic__test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old & ~mask;
> +    return (old & mask);
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static always_inline bool
> +generic__test_and_change_bit(unsigned long nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old ^ mask;
> +    return (old & mask);
> +}
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static always_inline bool generic_test_bit(int nr, const volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +
> +    return (*p & mask);
> +}
> +
> +static always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile void *addr)
> +{
> +#ifndef arch__test_and_set_bit
> +#define arch__test_and_set_bit generic__test_and_set_bit
> +#endif
> +
> +    return arch__test_and_set_bit(nr, addr);

... silently truncating and sign-converting nr here.

As to generic_test_bit() - please don't cast away const-ness there.

> +}
> +#define __test_and_set_bit(nr, addr) ({             \
> +    arch_check_bitop_size(addr);                    \
> +    __test_and_set_bit(nr, addr);                   \
> +})
> +
> +static always_inline bool
> +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)

Oddly enough here at least you use bitop_uint_t, but that's still ...

> +{
> +#ifndef arch__test_and_clear_bit
> +#define arch__test_and_clear_bit generic__test_and_clear_bit
> +#endif
> +
> +    return arch__test_and_clear_bit(nr, addr);

... meaning a signedness conversion on x86 then. And beware: You can't
simply change x86'es code to use bitop_uint_t. The underlying insns used
interpret the bit position as a signed number, i.e. permitting accesses
below the incoming pointer (whether it really makes sense to be that way
is a separate question). I'm afraid I have no good suggestion how to deal
with that: Any approach I can think of is either detrimental to the
generic implementation or would have unwanted effects on the x86 one.
Others, anyone?

> --- a/xen/include/xen/types.h
> +++ b/xen/include/xen/types.h
> @@ -64,6 +64,12 @@ typedef __u64 __be64;
>  
>  typedef unsigned int __attribute__((__mode__(__pointer__))) uintptr_t;
>  
> +#ifndef BITOP_TYPE
> +#define BITOP_BITS_PER_WORD 32
> +
> +typedef uint32_t bitop_uint_t;
> +#endif

I think you mentioned to me before why this needs to live here, not in
xen/bitops.h. Yet I don't recall the reason, and the description (hint,
hint) doesn't say anything either.

Jan

Oleksii Kurochko May 15, 2024, 3:29 p.m. UTC | #2

On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > The following generic functions were introduced:
> > * test_bit
> > * generic__test_and_set_bit
> > * generic__test_and_clear_bit
> > * generic__test_and_change_bit
> > 
> > Also, the patch introduces the following generics which are
> > used by the functions mentioned above:
> > * BITOP_BITS_PER_WORD
> > * BITOP_MASK
> > * BITOP_WORD
> > * BITOP_TYPE
> > 
> > These functions and macros can be useful for architectures
> > that don't have corresponding arch-specific instructions.
> 
> Logically this paragraph may better move ahead of the BITOP_* one.
> 
> > Because of that x86 has the following check in the macros
> > test_bit(),
> > __test_and_set_bit(), __test_and_clear_bit(),
> > __test_and_change_bit():
> >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > It was necessary to make bitop bad size check generic too, so
> > arch_check_bitop_size() was introduced.
> 
> Not anymore, with the most recent adjustments? There's nothing arch-
> specific anymore in the checking.
> 
> > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
> > volatile void *addr)
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > -static inline int __test_and_set_bit(int nr, void *addr)
> > +static inline int arch__test_and_set_bit(int nr, volatile void
> > *addr)
> 
> I think I raised this point before: Why arch__ here, ...
> 
> > @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr,
> > volatile void *addr)
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > -static inline int __test_and_clear_bit(int nr, void *addr)
> > +static inline int arch__test_and_clear_bit(int nr, volatile void
> > *addr)
> 
> ... here, ...
> 
> > @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int
> > nr, void *addr)
> >  
> >      return oldbit;
> >  }
> > -#define __test_and_clear_bit(nr, addr) ({               \
> > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > -    __test_and_clear_bit(nr, addr);                     \
> > -})
> > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> >  
> >  /* WARNING: non atomic and it can be reordered! */
> > -static inline int __test_and_change_bit(int nr, void *addr)
> > +static inline int arch__test_and_change_bit(int nr, volatile void
> > *addr)
> 
> ... and here, while ...
> 
> > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
> > const volatile void *addr)
> >      return oldbit;
> >  }
> >  
> > -#define test_bit(nr, addr) ({                           \
> > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > +#define arch_test_bit(nr, addr) ({                      \
> 
> ... just arch_ here? I don't like the double underscore infixes very
> much, but I'm okay with them as long as they're applied consistently.

Common code and x86 use __test_and_clear_bit(), and this patch provides
a generic version of __test_and_clear_bit(). To emphasize that
generic__test_and_clear_bit() is a common implementation of
__test_and_clear_bit(), the double underscore was retained. Also,
test_and_clear_bit() exists and if one day it will be needed to provide
a generic version of it, then it will be needed to have
generic__test_and_clear_bit() and generic_test_and_clear_bit()

A similar logic was chosen for test_bit.


> 
> > --- a/xen/include/xen/bitops.h
> > +++ b/xen/include/xen/bitops.h
> > @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long
> > x)
> >   * scope
> >   */
> >  
> > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > BITOP_BITS_PER_WORD))
> > +
> > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > +
> > +extern void __bitop_bad_size(void);
> > +
> >  /* --------------------- Please tidy above here ------------------
> > --- */
> >  
> >  #include <asm/bitops.h>
> >  
> > +#ifndef arch_check_bitop_size
> > +
> > +#define bitop_bad_size(addr) sizeof(*(addr)) <
> > sizeof(bitop_uint_t)
> 
> Nit: Missing parentheses around the whole expression.
> 
> > +#define arch_check_bitop_size(addr) \
> > +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
> 
> Apart from the arch_ prefix that now wants dropping, this macro (if
> we
> want one in the first place) 
What do you mean by 'want' here? I thought it is pretty necessary from
safety point of view to have this check.
Except arch_ prefix does it make sense to drop "#ifndef
arch_check_bitop_size" around this macros?

> also wants writing in a way such that it
> can be used as a normal expression, without double semicolons or
> other
> anomalies (potentially) resulting at use sites. E.g.
> 
> #define check_bitop_size(addr) do { \
>     if ( bitop_bad_size(addr) )     \
>         __bitop_bad_size();         \
> } while ( 0 )
> 
> > +#endif /* arch_check_bitop_size */
> > +
> > +/**
> > + * generic__test_and_set_bit - Set a bit and return its old value
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> > +static always_inline bool
> > +generic__test_and_set_bit(unsigned long nr, volatile void *addr)
> 
> The original per-arch functions all use "int" for their first
> parameter.
> Here you use unsigned long, without any mention in the description of
> the
> potential behavioral change. Which is even worse given that then x86
> ends
> up inconsistent with Arm and PPC in this regard, by ...
> 
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old | mask;
> > +    return (old & mask);
> > +}
> > +
> > +/**
> > + * generic__test_and_clear_bit - Clear a bit and return its old
> > value
> > + * @nr: Bit to clear
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> > +static always_inline bool
> > +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old & ~mask;
> > +    return (old & mask);
> > +}
> > +
> > +/* WARNING: non atomic and it can be reordered! */
> > +static always_inline bool
> > +generic__test_and_change_bit(unsigned long nr, volatile void
> > *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old ^ mask;
> > +    return (old & mask);
> > +}
> > +/**
> > + * generic_test_bit - Determine whether a bit is set
> > + * @nr: bit number to test
> > + * @addr: Address to start counting from
> > + */
> > +static always_inline bool generic_test_bit(int nr, const volatile
> > void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +
> > +    return (*p & mask);
> > +}
> > +
> > +static always_inline bool
> > +__test_and_set_bit(unsigned long nr, volatile void *addr)
> > +{
> > +#ifndef arch__test_and_set_bit
> > +#define arch__test_and_set_bit generic__test_and_set_bit
> > +#endif
> > +
> > +    return arch__test_and_set_bit(nr, addr);
> 
> ... silently truncating and sign-converting nr here.
Missed that fact. AFAIU there is no specific reason for bit number to
be 'int' for this function, so it makes sense to update x86's prototype
of arch__test_and_set_bit() to:
   static inline int arch__test_and_set_bit(unsigned long nr, volatile
   void *addr).
   
But probably I can't use 'unsigned long' here too, as it should a
compilation error around 'btsl' instruction.

So it can be or 'unsinged int' or 'int'.
> 
> As to generic_test_bit() - please don't cast away const-ness there.
> 
> > +}
> > +#define __test_and_set_bit(nr, addr) ({             \
> > +    arch_check_bitop_size(addr);                    \
> > +    __test_and_set_bit(nr, addr);                   \
> > +})
> > +
> > +static always_inline bool
> > +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> 
> Oddly enough here at least you use bitop_uint_t, but that's still ...
> 
> > +{
> > +#ifndef arch__test_and_clear_bit
> > +#define arch__test_and_clear_bit generic__test_and_clear_bit
> > +#endif
> > +
> > +    return arch__test_and_clear_bit(nr, addr);
> 
> ... meaning a signedness conversion on x86 then. And beware: You
> can't
> simply change x86'es code to use bitop_uint_t. The underlying insns
> used
> interpret the bit position as a signed number, i.e. permitting
> accesses
> below the incoming pointer (whether it really makes sense to be that
> way
> is a separate question). I'm afraid I have no good suggestion how to
> deal
> with that: Any approach I can think of is either detrimental to the
> generic implementation or would have unwanted effects on the x86 one.
> Others, anyone?
Is the signedness conversion here a big problem? I suppose no one will
pass a negative number to nr.

It seems to me that it would be better to use 'int' everywhere and not
use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
provides enough range for possible bit number.

> 
> > --- a/xen/include/xen/types.h
> > +++ b/xen/include/xen/types.h
> > @@ -64,6 +64,12 @@ typedef __u64 __be64;
> >  
> >  typedef unsigned int __attribute__((__mode__(__pointer__)))
> > uintptr_t;
> >  
> > +#ifndef BITOP_TYPE
> > +#define BITOP_BITS_PER_WORD 32
> > +
> > +typedef uint32_t bitop_uint_t;
> > +#endif
> 
> I think you mentioned to me before why this needs to live here, not
> in
> xen/bitops.h. Yet I don't recall the reason, and the description
> (hint,
> hint) doesn't say anything either.
If I remember correctly ( after this phrase I think I have to update
the description ) the reason was that if I put that to xen/bitops.h:

    ...
    #include <asm/bitops.h>
    
    #ifndef BITOP_TYPE
    #define BITOP_BITS_PER_WORD 32
    /* typedef uint32_t bitop_uint_t; */
    #endif
    ...

Then we will have an issue that we can't use the generic definition of 
BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is defined
after inclusion of <asm/bitops.h>.

But if to put it to <xen/types.h> then such problem won't occur.

If it still makes sense, then I'll update the description in the next
patch version.

~ Oleksii

> 
> Jan

Jan Beulich May 15, 2024, 3:41 p.m. UTC | #3

On 15.05.2024 17:29, Oleksii K. wrote:
> On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
>> On 06.05.2024 12:15, Oleksii Kurochko wrote:
>>> The following generic functions were introduced:
>>> * test_bit
>>> * generic__test_and_set_bit
>>> * generic__test_and_clear_bit
>>> * generic__test_and_change_bit
>>>
>>> Also, the patch introduces the following generics which are
>>> used by the functions mentioned above:
>>> * BITOP_BITS_PER_WORD
>>> * BITOP_MASK
>>> * BITOP_WORD
>>> * BITOP_TYPE
>>>
>>> These functions and macros can be useful for architectures
>>> that don't have corresponding arch-specific instructions.
>>
>> Logically this paragraph may better move ahead of the BITOP_* one.
>>
>>> Because of that x86 has the following check in the macros
>>> test_bit(),
>>> __test_and_set_bit(), __test_and_clear_bit(),
>>> __test_and_change_bit():
>>>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>> It was necessary to make bitop bad size check generic too, so
>>> arch_check_bitop_size() was introduced.
>>
>> Not anymore, with the most recent adjustments? There's nothing arch-
>> specific anymore in the checking.
>>
>>> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
>>> volatile void *addr)
>>>   * If two examples of this operation race, one can appear to
>>> succeed
>>>   * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>>   */
>>> -static inline int __test_and_set_bit(int nr, void *addr)
>>> +static inline int arch__test_and_set_bit(int nr, volatile void
>>> *addr)
>>
>> I think I raised this point before: Why arch__ here, ...
>>
>>> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr,
>>> volatile void *addr)
>>>   * If two examples of this operation race, one can appear to
>>> succeed
>>>   * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>>   */
>>> -static inline int __test_and_clear_bit(int nr, void *addr)
>>> +static inline int arch__test_and_clear_bit(int nr, volatile void
>>> *addr)
>>
>> ... here, ...
>>
>>> @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int
>>> nr, void *addr)
>>>  
>>>      return oldbit;
>>>  }
>>> -#define __test_and_clear_bit(nr, addr) ({               \
>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>> -    __test_and_clear_bit(nr, addr);                     \
>>> -})
>>> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>>>  
>>>  /* WARNING: non atomic and it can be reordered! */
>>> -static inline int __test_and_change_bit(int nr, void *addr)
>>> +static inline int arch__test_and_change_bit(int nr, volatile void
>>> *addr)
>>
>> ... and here, while ...
>>
>>> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
>>> const volatile void *addr)
>>>      return oldbit;
>>>  }
>>>  
>>> -#define test_bit(nr, addr) ({                           \
>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>> +#define arch_test_bit(nr, addr) ({                      \
>>
>> ... just arch_ here? I don't like the double underscore infixes very
>> much, but I'm okay with them as long as they're applied consistently.
> 
> Common code and x86 use __test_and_clear_bit(), and this patch provides
> a generic version of __test_and_clear_bit(). To emphasize that
> generic__test_and_clear_bit() is a common implementation of
> __test_and_clear_bit(), the double underscore was retained. Also,
> test_and_clear_bit() exists and if one day it will be needed to provide
> a generic version of it, then it will be needed to have
> generic__test_and_clear_bit() and generic_test_and_clear_bit()
> 
> A similar logic was chosen for test_bit.

Right, but in all of your reply arch_ doesn't appear at all. Yet the
question was: Why then not arch__test_bit(), to match the other arch
helpers?

>>> --- a/xen/include/xen/bitops.h
>>> +++ b/xen/include/xen/bitops.h
>>> @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long
>>> x)
>>>   * scope
>>>   */
>>>  
>>> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
>>> BITOP_BITS_PER_WORD))
>>> +
>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>> +
>>> +extern void __bitop_bad_size(void);
>>> +
>>>  /* --------------------- Please tidy above here ------------------
>>> --- */
>>>  
>>>  #include <asm/bitops.h>
>>>  
>>> +#ifndef arch_check_bitop_size
>>> +
>>> +#define bitop_bad_size(addr) sizeof(*(addr)) <
>>> sizeof(bitop_uint_t)
>>
>> Nit: Missing parentheses around the whole expression.
>>
>>> +#define arch_check_bitop_size(addr) \
>>> +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>
>> Apart from the arch_ prefix that now wants dropping, this macro (if
>> we
>> want one in the first place) 
> What do you mean by 'want' here? I thought it is pretty necessary from
> safety point of view to have this check.

I don't question the check. What I was questioning is the need for a
macro to wrap that, seeing how x86 did without. I'm not outright
objecting to such a macro, though.

> Except arch_ prefix does it make sense to drop "#ifndef
> arch_check_bitop_size" around this macros?

Of course, as with arch_ dropped from the name there's no intention to
allow arch overrides.

>>> +/**
>>> + * generic__test_and_set_bit - Set a bit and return its old value
>>> + * @nr: Bit to set
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>> +static always_inline bool
>>> +generic__test_and_set_bit(unsigned long nr, volatile void *addr)
>>
>> The original per-arch functions all use "int" for their first
>> parameter.
>> Here you use unsigned long, without any mention in the description of
>> the
>> potential behavioral change. Which is even worse given that then x86
>> ends
>> up inconsistent with Arm and PPC in this regard, by ...
>>
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old | mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/**
>>> + * generic__test_and_clear_bit - Clear a bit and return its old
>>> value
>>> + * @nr: Bit to clear
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>> +static always_inline bool
>>> +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old & ~mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/* WARNING: non atomic and it can be reordered! */
>>> +static always_inline bool
>>> +generic__test_and_change_bit(unsigned long nr, volatile void
>>> *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old ^ mask;
>>> +    return (old & mask);
>>> +}
>>> +/**
>>> + * generic_test_bit - Determine whether a bit is set
>>> + * @nr: bit number to test
>>> + * @addr: Address to start counting from
>>> + */
>>> +static always_inline bool generic_test_bit(int nr, const volatile
>>> void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +
>>> +    return (*p & mask);
>>> +}
>>> +
>>> +static always_inline bool
>>> +__test_and_set_bit(unsigned long nr, volatile void *addr)
>>> +{
>>> +#ifndef arch__test_and_set_bit
>>> +#define arch__test_and_set_bit generic__test_and_set_bit
>>> +#endif
>>> +
>>> +    return arch__test_and_set_bit(nr, addr);
>>
>> ... silently truncating and sign-converting nr here.
> Missed that fact. AFAIU there is no specific reason for bit number to
> be 'int' for this function, so it makes sense to update x86's prototype
> of arch__test_and_set_bit() to:
>    static inline int arch__test_and_set_bit(unsigned long nr, volatile
>    void *addr).
>    
> But probably I can't use 'unsigned long' here too, as it should a
> compilation error around 'btsl' instruction.
> 
> So it can be or 'unsinged int' or 'int'.
>>
>> As to generic_test_bit() - please don't cast away const-ness there.
>>
>>> +}
>>> +#define __test_and_set_bit(nr, addr) ({             \
>>> +    arch_check_bitop_size(addr);                    \
>>> +    __test_and_set_bit(nr, addr);                   \
>>> +})
>>> +
>>> +static always_inline bool
>>> +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
>>
>> Oddly enough here at least you use bitop_uint_t, but that's still ...
>>
>>> +{
>>> +#ifndef arch__test_and_clear_bit
>>> +#define arch__test_and_clear_bit generic__test_and_clear_bit
>>> +#endif
>>> +
>>> +    return arch__test_and_clear_bit(nr, addr);
>>
>> ... meaning a signedness conversion on x86 then. And beware: You
>> can't
>> simply change x86'es code to use bitop_uint_t. The underlying insns
>> used
>> interpret the bit position as a signed number, i.e. permitting
>> accesses
>> below the incoming pointer (whether it really makes sense to be that
>> way
>> is a separate question). I'm afraid I have no good suggestion how to
>> deal
>> with that: Any approach I can think of is either detrimental to the
>> generic implementation or would have unwanted effects on the x86 one.
>> Others, anyone?
> Is the signedness conversion here a big problem? I suppose no one will
> pass a negative number to nr.
> 
> It seems to me that it would be better to use 'int' everywhere and not
> use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
> provides enough range for possible bit number.

Indeed, and that's then hopefully less of a risk as to introducing hard
to spot issues. Provided Arm and PPC are okay with that type then as well.

>>> --- a/xen/include/xen/types.h
>>> +++ b/xen/include/xen/types.h
>>> @@ -64,6 +64,12 @@ typedef __u64 __be64;
>>>  
>>>  typedef unsigned int __attribute__((__mode__(__pointer__)))
>>> uintptr_t;
>>>  
>>> +#ifndef BITOP_TYPE
>>> +#define BITOP_BITS_PER_WORD 32
>>> +
>>> +typedef uint32_t bitop_uint_t;
>>> +#endif
>>
>> I think you mentioned to me before why this needs to live here, not
>> in
>> xen/bitops.h. Yet I don't recall the reason, and the description
>> (hint,
>> hint) doesn't say anything either.
> If I remember correctly ( after this phrase I think I have to update
> the description ) the reason was that if I put that to xen/bitops.h:
> 
>     ...
>     #include <asm/bitops.h>
>     
>     #ifndef BITOP_TYPE
>     #define BITOP_BITS_PER_WORD 32
>     /* typedef uint32_t bitop_uint_t; */
>     #endif
>     ...
> 
> Then we will have an issue that we can't use the generic definition of 
> BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is defined
> after inclusion of <asm/bitops.h>.
> 
> But if to put it to <xen/types.h> then such problem won't occur.
> 
> If it still makes sense, then I'll update the description in the next
> patch version.

I see. But we don't need to allow for arch overrides here anymore, so in
xen/bitops.h couldn't we as well have

#define BITOP_BITS_PER_WORD 32
typedef uint32_t bitop_uint_t;

... (if necessary)

#include <asm/bitops.h>

?

Jan

Oleksii Kurochko May 15, 2024, 5:03 p.m. UTC | #4

On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
> On 15.05.2024 17:29, Oleksii K. wrote:
> > On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> > > On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > > > The following generic functions were introduced:
> > > > * test_bit
> > > > * generic__test_and_set_bit
> > > > * generic__test_and_clear_bit
> > > > * generic__test_and_change_bit
> > > > 
> > > > Also, the patch introduces the following generics which are
> > > > used by the functions mentioned above:
> > > > * BITOP_BITS_PER_WORD
> > > > * BITOP_MASK
> > > > * BITOP_WORD
> > > > * BITOP_TYPE
> > > > 
> > > > These functions and macros can be useful for architectures
> > > > that don't have corresponding arch-specific instructions.
> > > 
> > > Logically this paragraph may better move ahead of the BITOP_*
> > > one.
> > > 
> > > > Because of that x86 has the following check in the macros
> > > > test_bit(),
> > > > __test_and_set_bit(), __test_and_clear_bit(),
> > > > __test_and_change_bit():
> > > >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > > It was necessary to make bitop bad size check generic too, so
> > > > arch_check_bitop_size() was introduced.
> > > 
> > > Not anymore, with the most recent adjustments? There's nothing
> > > arch-
> > > specific anymore in the checking.
> > > 
> > > > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
> > > > volatile void *addr)
> > > >   * If two examples of this operation race, one can appear to
> > > > succeed
> > > >   * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > >   */
> > > > -static inline int __test_and_set_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_set_bit(int nr, volatile void
> > > > *addr)
> > > 
> > > I think I raised this point before: Why arch__ here, ...
> > > 
> > > > @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int
> > > > nr,
> > > > volatile void *addr)
> > > >   * If two examples of this operation race, one can appear to
> > > > succeed
> > > >   * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > >   */
> > > > -static inline int __test_and_clear_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_clear_bit(int nr, volatile
> > > > void
> > > > *addr)
> > > 
> > > ... here, ...
> > > 
> > > > @@ -243,13 +237,10 @@ static inline int
> > > > __test_and_clear_bit(int
> > > > nr, void *addr)
> > > >  
> > > >      return oldbit;
> > > >  }
> > > > -#define __test_and_clear_bit(nr, addr) ({               \
> > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > -    __test_and_clear_bit(nr, addr);                     \
> > > > -})
> > > > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> > > >  
> > > >  /* WARNING: non atomic and it can be reordered! */
> > > > -static inline int __test_and_change_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_change_bit(int nr, volatile
> > > > void
> > > > *addr)
> > > 
> > > ... and here, while ...
> > > 
> > > > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
> > > > const volatile void *addr)
> > > >      return oldbit;
> > > >  }
> > > >  
> > > > -#define test_bit(nr, addr) ({                           \
> > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > +#define arch_test_bit(nr, addr) ({                      \
> > > 
> > > ... just arch_ here? I don't like the double underscore infixes
> > > very
> > > much, but I'm okay with them as long as they're applied
> > > consistently.
> > 
> > Common code and x86 use __test_and_clear_bit(), and this patch
> > provides
> > a generic version of __test_and_clear_bit(). To emphasize that
> > generic__test_and_clear_bit() is a common implementation of
> > __test_and_clear_bit(), the double underscore was retained. Also,
> > test_and_clear_bit() exists and if one day it will be needed to
> > provide
> > a generic version of it, then it will be needed to have
> > generic__test_and_clear_bit() and generic_test_and_clear_bit()
> > 
> > A similar logic was chosen for test_bit.
> 
> Right, but in all of your reply arch_ doesn't appear at all.
I am a little confused here. According to my logic, should it be
arch___test_and_set_bit() and generic___test_and_set_bit()?

If you are asking why there is no generic implementation for
test_and_clear_bit() without the double underscores, the reason is that
Arm, PPC, and x86 don't use generic code and rely on specific
instructions for this operation. Therefore, I didn't see much sense in
providing a generic version of test_and_clear_bit(), at least, for now.


>  Yet the
> question was: Why then not arch__test_bit(), to match the other arch
> helpers?
Because no one uses __test_bit(). Everywhere is used test_bit().

> 
> > > > --- a/xen/include/xen/bitops.h
> > > > +++ b/xen/include/xen/bitops.h
> > > > @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned
> > > > long
> > > > x)
> > > >   * scope
> > > >   */
> > > >  
> > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > +
> > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > +
> > > > +extern void __bitop_bad_size(void);
> > > > +
> > > >  /* --------------------- Please tidy above here --------------
> > > > ----
> > > > --- */
> > > >  
> > > >  #include <asm/bitops.h>
> > > >  
> > > > +#ifndef arch_check_bitop_size
> > > > +
> > > > +#define bitop_bad_size(addr) sizeof(*(addr)) <
> > > > sizeof(bitop_uint_t)
> > > 
> > > Nit: Missing parentheses around the whole expression.
> > > 
> > > > +#define arch_check_bitop_size(addr) \
> > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > 
> > > Apart from the arch_ prefix that now wants dropping, this macro
> > > (if
> > > we
> > > want one in the first place) 
> > What do you mean by 'want' here? I thought it is pretty necessary
> > from
> > safety point of view to have this check.
> 
> I don't question the check. What I was questioning is the need for a
> macro to wrap that, seeing how x86 did without. I'm not outright
> objecting to such a macro, though.
We can follow x86 approach as we there's no any more intention to allow
arch overrides.

> 
> > Except arch_ prefix does it make sense to drop "#ifndef
> > arch_check_bitop_size" around this macros?
> 
> Of course, as with arch_ dropped from the name there's no intention
> to
> allow arch overrides.
> 
> > > > +/**
> > > > + * generic__test_and_set_bit - Set a bit and return its old
> > > > value
> > > > + * @nr: Bit to set
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > > +static always_inline bool
> > > > +generic__test_and_set_bit(unsigned long nr, volatile void
> > > > *addr)
> > > 
> > > The original per-arch functions all use "int" for their first
> > > parameter.
> > > Here you use unsigned long, without any mention in the
> > > description of
> > > the
> > > potential behavioral change. Which is even worse given that then
> > > x86
> > > ends
> > > up inconsistent with Arm and PPC in this regard, by ...
> > > 
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old | mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/**
> > > > + * generic__test_and_clear_bit - Clear a bit and return its
> > > > old
> > > > value
> > > > + * @nr: Bit to clear
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > > +static always_inline bool
> > > > +generic__test_and_clear_bit(bitop_uint_t nr, volatile void
> > > > *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old & ~mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/* WARNING: non atomic and it can be reordered! */
> > > > +static always_inline bool
> > > > +generic__test_and_change_bit(unsigned long nr, volatile void
> > > > *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old ^ mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +/**
> > > > + * generic_test_bit - Determine whether a bit is set
> > > > + * @nr: bit number to test
> > > > + * @addr: Address to start counting from
> > > > + */
> > > > +static always_inline bool generic_test_bit(int nr, const
> > > > volatile
> > > > void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +
> > > > +    return (*p & mask);
> > > > +}
> > > > +
> > > > +static always_inline bool
> > > > +__test_and_set_bit(unsigned long nr, volatile void *addr)
> > > > +{
> > > > +#ifndef arch__test_and_set_bit
> > > > +#define arch__test_and_set_bit generic__test_and_set_bit
> > > > +#endif
> > > > +
> > > > +    return arch__test_and_set_bit(nr, addr);
> > > 
> > > ... silently truncating and sign-converting nr here.
> > Missed that fact. AFAIU there is no specific reason for bit number
> > to
> > be 'int' for this function, so it makes sense to update x86's
> > prototype
> > of arch__test_and_set_bit() to:
> >    static inline int arch__test_and_set_bit(unsigned long nr,
> > volatile
> >    void *addr).
> >    
> > But probably I can't use 'unsigned long' here too, as it should a
> > compilation error around 'btsl' instruction.
> > 
> > So it can be or 'unsinged int' or 'int'.
> > > 
> > > As to generic_test_bit() - please don't cast away const-ness
> > > there.
> > > 
> > > > +}
> > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > +    arch_check_bitop_size(addr);                    \
> > > > +    __test_and_set_bit(nr, addr);                   \
> > > > +})
> > > > +
> > > > +static always_inline bool
> > > > +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> > > 
> > > Oddly enough here at least you use bitop_uint_t, but that's still
> > > ...
> > > 
> > > > +{
> > > > +#ifndef arch__test_and_clear_bit
> > > > +#define arch__test_and_clear_bit generic__test_and_clear_bit
> > > > +#endif
> > > > +
> > > > +    return arch__test_and_clear_bit(nr, addr);
> > > 
> > > ... meaning a signedness conversion on x86 then. And beware: You
> > > can't
> > > simply change x86'es code to use bitop_uint_t. The underlying
> > > insns
> > > used
> > > interpret the bit position as a signed number, i.e. permitting
> > > accesses
> > > below the incoming pointer (whether it really makes sense to be
> > > that
> > > way
> > > is a separate question). I'm afraid I have no good suggestion how
> > > to
> > > deal
> > > with that: Any approach I can think of is either detrimental to
> > > the
> > > generic implementation or would have unwanted effects on the x86
> > > one.
> > > Others, anyone?
> > Is the signedness conversion here a big problem? I suppose no one
> > will
> > pass a negative number to nr.
> > 
> > It seems to me that it would be better to use 'int' everywhere and
> > not
> > use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
> > provides enough range for possible bit number.
> 
> Indeed, and that's then hopefully less of a risk as to introducing
> hard
> to spot issues. Provided Arm and PPC are okay with that type then as
> well.
Then I will update prototypes of generic functions correspondingly.

> 
> > > > --- a/xen/include/xen/types.h
> > > > +++ b/xen/include/xen/types.h
> > > > @@ -64,6 +64,12 @@ typedef __u64 __be64;
> > > >  
> > > >  typedef unsigned int __attribute__((__mode__(__pointer__)))
> > > > uintptr_t;
> > > >  
> > > > +#ifndef BITOP_TYPE
> > > > +#define BITOP_BITS_PER_WORD 32
> > > > +
> > > > +typedef uint32_t bitop_uint_t;
> > > > +#endif
> > > 
> > > I think you mentioned to me before why this needs to live here,
> > > not
> > > in
> > > xen/bitops.h. Yet I don't recall the reason, and the description
> > > (hint,
> > > hint) doesn't say anything either.
> > If I remember correctly ( after this phrase I think I have to
> > update
> > the description ) the reason was that if I put that to
> > xen/bitops.h:
> > 
> >     ...
> >     #include <asm/bitops.h>
> >     
> >     #ifndef BITOP_TYPE
> >     #define BITOP_BITS_PER_WORD 32
> >     /* typedef uint32_t bitop_uint_t; */
> >     #endif
> >     ...
> > 
> > Then we will have an issue that we can't use the generic definition
> > of 
> > BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is
> > defined
> > after inclusion of <asm/bitops.h>.
> > 
> > But if to put it to <xen/types.h> then such problem won't occur.
> > 
> > If it still makes sense, then I'll update the description in the
> > next
> > patch version.
> 
> I see. But we don't need to allow for arch overrides here anymore, so
> in
> xen/bitops.h couldn't we as well have
> 
> #define BITOP_BITS_PER_WORD 32
> typedef uint32_t bitop_uint_t;
> 
> ... (if necessary)
> 
> #include <asm/bitops.h>
> 
> ?
Good point. If there is not need to allow for arch overrides then we
can use suggested above approach. Thanks.

~ Oleksii

Jan Beulich May 16, 2024, 7:04 a.m. UTC | #5

On 15.05.2024 19:03, Oleksii K. wrote:
> On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
>> On 15.05.2024 17:29, Oleksii K. wrote:
>>> On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
>>>> On 06.05.2024 12:15, Oleksii Kurochko wrote:
>>>>> The following generic functions were introduced:
>>>>> * test_bit
>>>>> * generic__test_and_set_bit
>>>>> * generic__test_and_clear_bit
>>>>> * generic__test_and_change_bit
>>>>>
>>>>> Also, the patch introduces the following generics which are
>>>>> used by the functions mentioned above:
>>>>> * BITOP_BITS_PER_WORD
>>>>> * BITOP_MASK
>>>>> * BITOP_WORD
>>>>> * BITOP_TYPE
>>>>>
>>>>> These functions and macros can be useful for architectures
>>>>> that don't have corresponding arch-specific instructions.
>>>>
>>>> Logically this paragraph may better move ahead of the BITOP_*
>>>> one.
>>>>
>>>>> Because of that x86 has the following check in the macros
>>>>> test_bit(),
>>>>> __test_and_set_bit(), __test_and_clear_bit(),
>>>>> __test_and_change_bit():
>>>>>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>>>> It was necessary to make bitop bad size check generic too, so
>>>>> arch_check_bitop_size() was introduced.
>>>>
>>>> Not anymore, with the most recent adjustments? There's nothing
>>>> arch-
>>>> specific anymore in the checking.
>>>>
>>>>> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
>>>>> volatile void *addr)
>>>>>   * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>>   * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>>   */
>>>>> -static inline int __test_and_set_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_set_bit(int nr, volatile void
>>>>> *addr)
>>>>
>>>> I think I raised this point before: Why arch__ here, ...
>>>>
>>>>> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int
>>>>> nr,
>>>>> volatile void *addr)
>>>>>   * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>>   * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>>   */
>>>>> -static inline int __test_and_clear_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_clear_bit(int nr, volatile
>>>>> void
>>>>> *addr)
>>>>
>>>> ... here, ...
>>>>
>>>>> @@ -243,13 +237,10 @@ static inline int
>>>>> __test_and_clear_bit(int
>>>>> nr, void *addr)
>>>>>  
>>>>>      return oldbit;
>>>>>  }
>>>>> -#define __test_and_clear_bit(nr, addr) ({               \
>>>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>>>> -    __test_and_clear_bit(nr, addr);                     \
>>>>> -})
>>>>> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>>>>>  
>>>>>  /* WARNING: non atomic and it can be reordered! */
>>>>> -static inline int __test_and_change_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_change_bit(int nr, volatile
>>>>> void
>>>>> *addr)
>>>>
>>>> ... and here, while ...
>>>>
>>>>> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
>>>>> const volatile void *addr)
>>>>>      return oldbit;
>>>>>  }
>>>>>  
>>>>> -#define test_bit(nr, addr) ({                           \
>>>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>>>> +#define arch_test_bit(nr, addr) ({                      \
>>>>
>>>> ... just arch_ here? I don't like the double underscore infixes
>>>> very
>>>> much, but I'm okay with them as long as they're applied
>>>> consistently.
>>>
>>> Common code and x86 use __test_and_clear_bit(), and this patch
>>> provides
>>> a generic version of __test_and_clear_bit(). To emphasize that
>>> generic__test_and_clear_bit() is a common implementation of
>>> __test_and_clear_bit(), the double underscore was retained. Also,
>>> test_and_clear_bit() exists and if one day it will be needed to
>>> provide
>>> a generic version of it, then it will be needed to have
>>> generic__test_and_clear_bit() and generic_test_and_clear_bit()
>>>
>>> A similar logic was chosen for test_bit.
>>
>> Right, but in all of your reply arch_ doesn't appear at all.
> I am a little confused here. According to my logic, should it be
> arch___test_and_set_bit() and generic___test_and_set_bit()?

Why 3 underscores in a row? I'm clearly not following.

> If you are asking why there is no generic implementation for
> test_and_clear_bit() without the double underscores, the reason is that
> Arm, PPC, and x86 don't use generic code and rely on specific
> instructions for this operation. Therefore, I didn't see much sense in
> providing a generic version of test_and_clear_bit(), at least, for now.

No, there was no question in that direction. And hence ...

>>  Yet the
>> question was: Why then not arch__test_bit(), to match the other arch
>> helpers?
> Because no one uses __test_bit(). Everywhere is used test_bit().

... this seems unrelated (constrained by my earlier lack of following you).

(Later) Wait, maybe I've finally figured it: You use arch__test_and_*()
because those underlie __test_and_*(), but arch_test_bit() because there's
solely test_bit() (same for the generic_* naming). I guess I can accept
that then, despite the slight anomaly you point out, resulting in the
question towards 3 underscores in a row. To clarify, my thinking was more
towards there not possibly being generic forms of test_and_*() (i.e. no
possible set of arch_test_and_*() or generic_test_and_*()), thus using
double inner underscores in arch__test_*() and generic__test_*() to
signify that those are purely internal functions, which aren't supposed to
be called directly.

Jan

Oleksii Kurochko May 16, 2024, 10:34 a.m. UTC | #6

On Thu, 2024-05-16 at 09:04 +0200, Jan Beulich wrote:
> On 15.05.2024 19:03, Oleksii K. wrote:
> > On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
> > > On 15.05.2024 17:29, Oleksii K. wrote:
> > > > On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> > > > > On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > > > > > The following generic functions were introduced:
> > > > > > * test_bit
> > > > > > * generic__test_and_set_bit
> > > > > > * generic__test_and_clear_bit
> > > > > > * generic__test_and_change_bit
> > > > > > 
> > > > > > Also, the patch introduces the following generics which are
> > > > > > used by the functions mentioned above:
> > > > > > * BITOP_BITS_PER_WORD
> > > > > > * BITOP_MASK
> > > > > > * BITOP_WORD
> > > > > > * BITOP_TYPE
> > > > > > 
> > > > > > These functions and macros can be useful for architectures
> > > > > > that don't have corresponding arch-specific instructions.
> > > > > 
> > > > > Logically this paragraph may better move ahead of the BITOP_*
> > > > > one.
> > > > > 
> > > > > > Because of that x86 has the following check in the macros
> > > > > > test_bit(),
> > > > > > __test_and_set_bit(), __test_and_clear_bit(),
> > > > > > __test_and_change_bit():
> > > > > >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > > > > It was necessary to make bitop bad size check generic too,
> > > > > > so
> > > > > > arch_check_bitop_size() was introduced.
> > > > > 
> > > > > Not anymore, with the most recent adjustments? There's
> > > > > nothing
> > > > > arch-
> > > > > specific anymore in the checking.
> > > > > 
> > > > > > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int
> > > > > > nr,
> > > > > > volatile void *addr)
> > > > > >   * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > >   * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > >   */
> > > > > > -static inline int __test_and_set_bit(int nr, void *addr)
> > > > > > +static inline int arch__test_and_set_bit(int nr, volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > I think I raised this point before: Why arch__ here, ...
> > > > > 
> > > > > > @@ -232,7 +226,7 @@ static inline int
> > > > > > test_and_clear_bit(int
> > > > > > nr,
> > > > > > volatile void *addr)
> > > > > >   * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > >   * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > >   */
> > > > > > -static inline int __test_and_clear_bit(int nr, void *addr)
> > > > > > +static inline int arch__test_and_clear_bit(int nr,
> > > > > > volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > ... here, ...
> > > > > 
> > > > > > @@ -243,13 +237,10 @@ static inline int
> > > > > > __test_and_clear_bit(int
> > > > > > nr, void *addr)
> > > > > >  
> > > > > >      return oldbit;
> > > > > >  }
> > > > > > -#define __test_and_clear_bit(nr, addr) ({               \
> > > > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > > > -    __test_and_clear_bit(nr, addr);                     \
> > > > > > -})
> > > > > > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> > > > > >  
> > > > > >  /* WARNING: non atomic and it can be reordered! */
> > > > > > -static inline int __test_and_change_bit(int nr, void
> > > > > > *addr)
> > > > > > +static inline int arch__test_and_change_bit(int nr,
> > > > > > volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > ... and here, while ...
> > > > > 
> > > > > > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int
> > > > > > nr,
> > > > > > const volatile void *addr)
> > > > > >      return oldbit;
> > > > > >  }
> > > > > >  
> > > > > > -#define test_bit(nr, addr) ({                           \
> > > > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > > > +#define arch_test_bit(nr, addr) ({                      \
> > > > > 
> > > > > ... just arch_ here? I don't like the double underscore
> > > > > infixes
> > > > > very
> > > > > much, but I'm okay with them as long as they're applied
> > > > > consistently.
> > > > 
> > > > Common code and x86 use __test_and_clear_bit(), and this patch
> > > > provides
> > > > a generic version of __test_and_clear_bit(). To emphasize that
> > > > generic__test_and_clear_bit() is a common implementation of
> > > > __test_and_clear_bit(), the double underscore was retained.
> > > > Also,
> > > > test_and_clear_bit() exists and if one day it will be needed to
> > > > provide
> > > > a generic version of it, then it will be needed to have
> > > > generic__test_and_clear_bit() and generic_test_and_clear_bit()
> > > > 
> > > > A similar logic was chosen for test_bit.
> > > 
> > > Right, but in all of your reply arch_ doesn't appear at all.
> > I am a little confused here. According to my logic, should it be
> > arch___test_and_set_bit() and generic___test_and_set_bit()?
> 
> Why 3 underscores in a row? I'm clearly not following.
> 
> > If you are asking why there is no generic implementation for
> > test_and_clear_bit() without the double underscores, the reason is
> > that
> > Arm, PPC, and x86 don't use generic code and rely on specific
> > instructions for this operation. Therefore, I didn't see much sense
> > in
> > providing a generic version of test_and_clear_bit(), at least, for
> > now.
> 
> No, there was no question in that direction. And hence ...
> 
> > >  Yet the
> > > question was: Why then not arch__test_bit(), to match the other
> > > arch
> > > helpers?
> > Because no one uses __test_bit(). Everywhere is used test_bit().
> 
> ... this seems unrelated (constrained by my earlier lack of following
> you).
> 
> (Later) Wait, maybe I've finally figured it: You use
> arch__test_and_*()
> because those underlie __test_and_*(), but arch_test_bit() because
> there's
> solely test_bit() (same for the generic_* naming).
Yes, that what I meant.

>  I guess I can accept
> that then, despite the slight anomaly you point out, resulting in the
> question towards 3 underscores in a row. To clarify, my thinking was
> more
> towards there not possibly being generic forms of test_and_*() (i.e.
> no
> possible set of arch_test_and_*() or generic_test_and_*()), thus
> using
> double inner underscores in arch__test_*() and generic__test_*() to
> signify that those are purely internal functions, which aren't
> supposed to
> be called directly.
I understand your point regarding functions that start with "__".
For example, __test_and_clear_bit() is used not only internally (in
terms of architecture code) but also in common code, so it is not
strictly internal. I may have misunderstood what "internal function"
means in this context.

I thought that, at least for bit operations, "__bit_operation" means
that the bit operation is non-atomic and can be reordered, which
implies that it's not a good idea to use it in common code without
additional steps.

Anyway, I am not sure I understand which approach I should use in this
patch. You mentioned that possibly test_and_() can't have a generic
form, meaning it won't be a set of arch_test_and_() functions.

So, can I rename arch__test_() and generic__test_() to arch_test_() and
generic_test_(), respectively, and use the renamed functions in
_test_and*() in xen/bitops.h? Is my understanding correct?
If my understanding is correct, I am happy to apply mentioned changes
in the next patch version.

~ Oleksii


> 
> Jan

Jan Beulich May 16, 2024, 10:49 a.m. UTC | #7

On 16.05.2024 12:34, Oleksii K. wrote:
> On Thu, 2024-05-16 at 09:04 +0200, Jan Beulich wrote:
>> (Later) Wait, maybe I've finally figured it: You use
>> arch__test_and_*()
>> because those underlie __test_and_*(), but arch_test_bit() because
>> there's
>> solely test_bit() (same for the generic_* naming).
> Yes, that what I meant.
> 
>>  I guess I can accept
>> that then, despite the slight anomaly you point out, resulting in the
>> question towards 3 underscores in a row. To clarify, my thinking was
>> more
>> towards there not possibly being generic forms of test_and_*() (i.e.
>> no
>> possible set of arch_test_and_*() or generic_test_and_*()), thus
>> using
>> double inner underscores in arch__test_*() and generic__test_*() to
>> signify that those are purely internal functions, which aren't
>> supposed to
>> be called directly.
> I understand your point regarding functions that start with "__".
> For example, __test_and_clear_bit() is used not only internally (in
> terms of architecture code) but also in common code, so it is not
> strictly internal. I may have misunderstood what "internal function"
> means in this context.
> 
> I thought that, at least for bit operations, "__bit_operation" means
> that the bit operation is non-atomic and can be reordered, which
> implies that it's not a good idea to use it in common code without
> additional steps.

Correct, up to the comma; those may very well be used in common code,
provided non-atomic forms indeed suffice. But in my reply I didn't talk
about double-underscore-prefixes in names of involved functions. I
talked about inner double underscores.

> Anyway, I am not sure I understand which approach I should use in this
> patch. You mentioned that possibly test_and_() can't have a generic
> form, meaning it won't be a set of arch_test_and_() functions.
> 
> So, can I rename arch__test_() and generic__test_() to arch_test_() and
> generic_test_(), respectively, and use the renamed functions in
> _test_and*() in xen/bitops.h? Is my understanding correct?

You could. You could also stick to what you have now - as said, I can
accept that with the worked out explanation. Or you could switch to
using arch__test_bit() and generic__test_bit(), thus having the double
inner underscores identify "internal to the implementation" functions.
My preference would be in backwards order of what I have just enumerated
as possible options. I wonder whether really no-one else has any opinion
here ...

Jan

Oleksii Kurochko May 17, 2024, 8:42 a.m. UTC | #8

On Thu, 2024-05-16 at 12:49 +0200, Jan Beulich wrote:
> > Anyway, I am not sure I understand which approach I should use in
> > this
> > patch. You mentioned that possibly test_and_() can't have a generic
> > form, meaning it won't be a set of arch_test_and_() functions.
> > 
> > So, can I rename arch__test_() and generic__test_() to arch_test_()
> > and
> > generic_test_(), respectively, and use the renamed functions in
> > _test_and*() in xen/bitops.h? Is my understanding correct?
> 
> You could. You could also stick to what you have now - as said, I can
> accept that with the worked out explanation. Or you could switch to
> using arch__test_bit() and generic__test_bit(), thus having the
> double
> inner underscores identify "internal to the implementation"
> functions.
> My preference would be in backwards order of what I have just
> enumerated
> as possible options. I wonder whether really no-one else has any
> opinion
> here ...

I see that __test_bit() doesn't exist now and perhaps won't exist at
all, but in this patch we are providing the generic for test_bit(), not
__test_bit(). Thereby according to provided by me naming for test_bit()
should be defined using {generic, arch}_test_bit().

~ Oleksii

[v9,02/15] xen: introduce generic non-atomic test_*bit()

Commit Message

Comments

Patch