diff mbox series

[v7,10/19] xen/riscv: introduce atomic.h

Message ID 6a6c4ef8663d9da8c8a2aba4baf0286f31566edf.1712137031.git.oleksii.kurochko@gmail.com (mailing list archive)
State Superseded
Headers show
Series Enable build of full Xen for RISC-V | expand

Commit Message

Oleksii Kurochko April 3, 2024, 10:20 a.m. UTC
Initially the patch was introduced by Bobby, who took the header from the
Linux kernel.

The following changes were done on top of Bobby's changes:
 - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
   to use __*xchg_generic()
 - drop casts in write_atomic() as they are unnecessary
 - drop introduction of WRITE_ONCE() and READ_ONCE().
   Xen provides ACCESS_ONCE()
 - remove zero-length array access in read_atomic()
 - drop defines similar to pattern:
   #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - move non-RISC-V-specific functions to asm-generic/atomic-ops.h
 - drop atomic##prefix##_{cmp}xchg_{release, acquire, relaxed}() as they
   are not used in Xen.
 - update the definition of atomic##prefix##_{cmp}xchg according to
   the {cmp}xchg() implementation in Xen.

The current implementation is the same as in commit 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V can combine acquire and release semantics into the SC
instruction, which saves a fence instruction and gives better
performance. Here is the related description from the RISC-V ISA,
section 10.2 "Load-Reserved/Store-Conditional Instructions":

 - .aq:   The LR/SC sequence can be given acquire semantics by
          setting the aq bit on the LR instruction.
 - .rl:   The LR/SC sequence can be given release semantics by
          setting the rl bit on the SC instruction.
 - .aqrl: Setting the aq bit on the LR instruction, and setting
          both the aq and the rl bit on the SC instruction makes
          the LR/SC sequence sequentially consistent, meaning that
          it cannot be reordered with earlier or later memory
          operations from the same hart.

 Software should not set the rl bit on an LR instruction unless
 the aq bit is also set, nor should software set the aq bit on an
 SC instruction unless the rl bit is also set. LR.rl and SC.aq
 instructions are not guaranteed to provide any stronger ordering
 than those with both bits clear, but may result in lower
 performance.
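
For illustration only (not part of this patch), a compare-and-swap style
loop written against this pattern, a plain lr.w paired with sc.w.aqrl as
used by atomic_add_unless()/atomic_sub_if_positive() below, would look
roughly like:

    /* Sketch: the sc.w.aqrl provides the full barrier on success. */
    static inline int cmpxchg_w_aqrl(volatile int *p, int old, int new_val)
    {
        int prev, rc;

        asm volatile (
            "0: lr.w       %[p],  %[c]\n"
            "   bne        %[p],  %[o], 1f\n"
            "   sc.w.aqrl  %[rc], %[n], %[c]\n"
            "   bnez       %[rc], 0b\n"
            "1:\n"
            : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (*p)
            : [o] "r" (old), [n] "r" (new_val)
            : "memory" );

        return prev;
    }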

Also, this way of transforming ".rl + full barrier" to ".aqrl" was approved
by the author of the RVWMO spec [2].

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
[2] https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/

Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V7:
 - drop relaxed versions of atomic ops as they are not used.
 - update the commit message
 - code style fixes
 - refactor functions write_atomic(), add_sized() to be able to use #ifdef CONFIG_RISCV_32 ... #endif
   for {write,read}q().
 - update ATOMIC_OPS to receive a unary operator.
 - update the header on top of atomic-ops.h.
 - some minor movements of functions inside the atomic-ops.h header.
---
Changes in V6:
 - drop atomic##prefix##_{cmp}xchg_{release, acquire, relaxed} as they aren't used
   by Xen
 - code style fixes.
 - %s/__asm__ __volatile__/asm volatile
 - add explanatory comments.
 - move inclusion of "#include <asm-generic/atomic-ops.h>" further down in atomic.h
   header.
---
Changes in V5:
 - fence.h changes were moved to a separate patch as the patches related to io.h and cmpxchg.h,
   which are dependencies for this patch, also needed changes in fence.h
 - remove accessing of zero-length array
 - drop casts in write_atomic()
 - drop introduction of WRITE_ONCE() and READ_ONCE().
 - drop defines similar to pattern #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - Xen code style fixes
 - move non-RISC-V-specific functions to asm-generic/atomic-ops.h
---
Changes in V4:
 - do changes related to the updates of [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h
 - drop casts in read_atomic_size(), write_atomic(), add_sized()
 - tabs -> spaces
 - drop #ifdef CONFIG_SMP ... #endif in fence.h as it is simpler to handle NR_CPUS=1
   the same as NR_CPUS>1, accepting less-than-ideal performance.
---
Changes in V3:
  - update the commit message
  - add SPDX for fence.h
  - code style fixes
  - Remove /* TODO: ... */ for add_sized macros. It looks correct to me.
  - re-order the patch
  - merge to this patch fence.h
---
Changes in V2:
 - Change the author of the commit. I got this header from Bobby's old repo.
---
 xen/arch/riscv/include/asm/atomic.h  | 261 +++++++++++++++++++++++++++
 xen/include/asm-generic/atomic-ops.h |  97 ++++++++++
 2 files changed, 358 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/atomic.h
 create mode 100644 xen/include/asm-generic/atomic-ops.h

Comments

Jan Beulich April 8, 2024, 8:23 a.m. UTC | #1
On 03.04.2024 12:20, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/atomic.h
> @@ -0,0 +1,261 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Taken and modified from Linux.
> + *
> + * The following changes were done:
> + * - * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
> + *     to use__*xchg_generic()
> + * - drop casts in write_atomic() as they are unnecessary
> + * - drop introduction of WRITE_ONCE() and READ_ONCE().
> + *   Xen provides ACCESS_ONCE()
> + * - remove zero-length array access in read_atomic()
> + * - drop defines similar to pattern
> + *   #define atomic_add_return_relaxed   atomic_add_return_relaxed
> + * - move not RISC-V specific functions to asm-generic/atomics-ops.h
> + * 
> + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2017 SiFive
> + * Copyright (C) 2024 Vates SAS
> + */
> +
> +#ifndef _ASM_RISCV_ATOMIC_H
> +#define _ASM_RISCV_ATOMIC_H
> +
> +#include <xen/atomic.h>
> +
> +#include <asm/cmpxchg.h>
> +#include <asm/fence.h>
> +#include <asm/io.h>
> +#include <asm/system.h>
> +
> +void __bad_atomic_size(void);
> +
> +/*
> + * Legacy from Linux kernel. For some reason they wanted to have ordered
> + * read/write access. Thereby read* is used instead of read*_cpu()
> + */
> +static always_inline void read_atomic_size(const volatile void *p,
> +                                           void *res,
> +                                           unsigned int size)
> +{
> +    switch ( size )
> +    {
> +    case 1: *(uint8_t *)res = readb(p); break;
> +    case 2: *(uint16_t *)res = readw(p); break;
> +    case 4: *(uint32_t *)res = readl(p); break;
> +#ifndef CONFIG_RISCV_32
> +    case 8: *(uint64_t *)res = readq(p); break;
> +#endif
> +    default: __bad_atomic_size(); break;
> +    }
> +}
> +
> +#define read_atomic(p) ({                                   \
> +    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \
> +    read_atomic_size(p, x_.c, sizeof(*(p)));                \
> +    x_.val;                                                 \
> +})
> +
> +static always_inline void _write_atomic(volatile void *p,
> +                                       unsigned long x, unsigned int size)
> +{
> +    switch ( size )
> +    {
> +    case 1: writeb(x, p); break;
> +    case 2: writew(x, p); break;
> +    case 4: writel(x, p); break;
> +#ifndef CONFIG_RISCV_32
> +    case 8: writeq(x, p); break;
> +#endif
> +    default: __bad_atomic_size(); break;
> +    }
> +}
> +
> +#define write_atomic(p, x)                              \
> +({                                                      \
> +    typeof(*(p)) x_ = (x);                              \
> +    _write_atomic((p), x_, sizeof(*(p)));               \
> +    x_;                                                 \
> +})
> +
> +static always_inline void _add_sized(volatile void *p,
> +                                     unsigned long x, unsigned int size)
> +{
> +    switch ( size )
> +    {
> +    case 1: writeb(read_atomic((volatile uint8_t *)p) + x, p); break;
> +    case 2: writew(read_atomic((volatile uint16_t *)p) + x, p); break;
> +    case 4: writel(read_atomic((volatile uint32_t *)p) + x, p); break;
> +#ifndef CONFIG_RISCV_32
> +    case 8: writeq(read_atomic((volatile uint64_t *)p) + x, p); break;
> +#endif

Any particular reason for using read_atomic() but write{b,w,l,q}() here?

> +    default: __bad_atomic_size(); break;
> +    }
> +}
> +
> +#define add_sized(p, x)                                 \
> +({                                                      \
> +    typeof(*(p)) x_ = (x);                              \
> +    _add_sized((p), x_, sizeof(*(p)));                  \
> +})
> +
> +#define __atomic_acquire_fence() \
> +    asm volatile ( RISCV_ACQUIRE_BARRIER "" ::: "memory" )
> +
> +#define __atomic_release_fence() \
> +    asm volatile ( RISCV_RELEASE_BARRIER "" ::: "memory" )

There isn't any need for the "" in these two, is there?

> +/*
> + * First, the atomic ops that have no ordering constraints and therefor don't
> + * have the AQ or RL bits set.  These don't return anything, so there's only
> + * one version to worry about.
> + */
> +#define ATOMIC_OP(op, asm_op, unary_op, asm_type, c_type, prefix)  \
> +static inline                                               \
> +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> +{                                                           \
> +    asm volatile (                                          \
> +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> +        : "+A" (v->counter)                                 \
> +        : "r" (unary_op i)                                  \
> +        : "memory" );                                       \
> +}                                                           \
> +
> +/*
> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
> + * last argument for ATOMIC_OP isn't used.
> + */
> +#define ATOMIC_OPS(op, asm_op, unary_op)                    \
> +        ATOMIC_OP (op, asm_op, unary_op, w, int,   )
> +
> +ATOMIC_OPS(add, add, +)
> +ATOMIC_OPS(sub, add, -)
> +ATOMIC_OPS(and, and, +)
> +ATOMIC_OPS( or,  or, +)
> +ATOMIC_OPS(xor, xor, +)
> +
> +#undef ATOMIC_OP
> +#undef ATOMIC_OPS
> +
> +#include <asm-generic/atomic-ops.h>
> +
> +/*
> + * Atomic ops that have ordered variant.
> + * There's two flavors of these: the arithmatic ops have both fetch and return
> + * versions, while the logical ops only have fetch versions.
> + */
> +#define ATOMIC_FETCH_OP(op, asm_op, unary_op, asm_type, c_type, prefix) \
> +static inline                                                       \
> +c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
> +{                                                                   \
> +    register c_type ret;                                            \
> +    asm volatile (                                                  \
> +        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2, %0"          \
> +        : "+A" (v->counter), "=r" (ret)                             \
> +        : "r" (unary_op i)                                          \
> +        : "memory" );                                               \
> +    return ret;                                                     \
> +}
> +
> +#define ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, asm_type, c_type, prefix) \
> +static inline                                                           \
> +c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)  \
> +{                                                                       \
> +        return atomic##prefix##_fetch_##op(i, v) c_op (unary_op i);     \

Nit: Too deep indentation.

> +}
> +
> +/*
> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
> + * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
> + */
> +#define ATOMIC_OPS(op, asm_op, c_op, unary_op)                          \
> +        ATOMIC_FETCH_OP( op, asm_op,       unary_op, w, int,   )        \
> +        ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, w, int,   )
> +
> +ATOMIC_OPS(add, add, +, +)
> +ATOMIC_OPS(sub, add, +, -)
> +
> +#undef ATOMIC_OPS
> +
> +#define ATOMIC_OPS(op, asm_op, unary_op) \
> +        ATOMIC_FETCH_OP(op, asm_op, unary_op, w, int,   )
> +
> +ATOMIC_OPS(and, and, +)
> +ATOMIC_OPS( or,  or, +)
> +ATOMIC_OPS(xor, xor, +)

The + isn't really needed here as a macro argument; ATOMIC_OPS() itself could
pass it to ATOMIC_FETCH_OP(). I also wonder why ATOMIC_OPS() has both "op" and
"asm_op", when both are uniformly the same.

> +#undef ATOMIC_OPS
> +
> +#undef ATOMIC_FETCH_OP
> +#undef ATOMIC_OP_RETURN
> +
> +/* This is required to provide a full barrier on success. */
> +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> +{
> +    int prev, rc;
> +
> +    asm volatile (
> +        "0: lr.w     %[p],  %[c]\n"
> +        "   beq      %[p],  %[u], 1f\n"
> +        "   add      %[rc], %[p], %[a]\n"
> +        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
> +        "   bnez     %[rc], 0b\n"
> +        "1:\n"
> +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> +        : [a] "r" (a), [u] "r" (u)
> +        : "memory");
> +    return prev;
> +}
> +
> +static inline int atomic_sub_if_positive(atomic_t *v, int offset)
> +{
> +    int prev, rc;
> +
> +    asm volatile (
> +        "0: lr.w     %[p],  %[c]\n"
> +        "   sub      %[rc], %[p], %[o]\n"
> +        "   bltz     %[rc], 1f\n"
> +        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
> +        "   bnez     %[rc], 0b\n"
> +        "1:\n"
> +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> +        : [o] "r" (offset)
> +        : "memory" );
> +    return prev - offset;
> +}
> +
> +/*
> + * atomic_{cmp,}xchg is required to have exactly the same ordering semantics as
> + * {cmp,}xchg and the operations that return.
> + */
> +#define ATOMIC_OP(c_t, prefix, size)                            \
> +static inline                                                   \
> +c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
> +{                                                               \
> +    return __xchg(&v->counter, n, size);                        \
> +}                                                               \
> +static inline                                                   \
> +c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
> +{                                                               \
> +    return __cmpxchg(&v->counter, o, n, size);                  \
> +}
> +
> +#define ATOMIC_OPS() \
> +    ATOMIC_OP(int,   , 4)

Can't the two inline functions use sizeof(*v), sizeof(v->counter), sizeof(c_t),
or sizeof(n) instead of passing a literal 4 here?
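
For illustration, a sketch following that suggestion (assuming the
__xchg()/__cmpxchg() helpers keep taking the access size as their last
argument, as above) could be:

    #define ATOMIC_OP(c_t, prefix)                                    \
    static inline                                                     \
    c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)           \
    {                                                                 \
        return __xchg(&v->counter, n, sizeof(v->counter));            \
    }                                                                 \
    static inline                                                     \
    c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
    {                                                                 \
        return __cmpxchg(&v->counter, o, n, sizeof(v->counter));      \
    }

    #define ATOMIC_OPS() \
        ATOMIC_OP(int,   )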

Jan
Oleksii Kurochko April 9, 2024, 7:39 a.m. UTC | #2
On Mon, 2024-04-08 at 10:23 +0200, Jan Beulich wrote:
> On 03.04.2024 12:20, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/atomic.h
> > @@ -0,0 +1,261 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Taken and modified from Linux.
> > + *
> > + * The following changes were done:
> > + * - * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were
> > updated
> > + *     to use__*xchg_generic()
> > + * - drop casts in write_atomic() as they are unnecessary
> > + * - drop introduction of WRITE_ONCE() and READ_ONCE().
> > + *   Xen provides ACCESS_ONCE()
> > + * - remove zero-length array access in read_atomic()
> > + * - drop defines similar to pattern
> > + *   #define atomic_add_return_relaxed   atomic_add_return_relaxed
> > + * - move not RISC-V specific functions to asm-generic/atomics-
> > ops.h
> > + * 
> > + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
> > + * Copyright (C) 2012 Regents of the University of California
> > + * Copyright (C) 2017 SiFive
> > + * Copyright (C) 2024 Vates SAS
> > + */
> > +
> > +#ifndef _ASM_RISCV_ATOMIC_H
> > +#define _ASM_RISCV_ATOMIC_H
> > +
> > +#include <xen/atomic.h>
> > +
> > +#include <asm/cmpxchg.h>
> > +#include <asm/fence.h>
> > +#include <asm/io.h>
> > +#include <asm/system.h>
> > +
> > +void __bad_atomic_size(void);
> > +
> > +/*
> > + * Legacy from Linux kernel. For some reason they wanted to have
> > ordered
> > + * read/write access. Thereby read* is used instead of read*_cpu()
> > + */
> > +static always_inline void read_atomic_size(const volatile void *p,
> > +                                           void *res,
> > +                                           unsigned int size)
> > +{
> > +    switch ( size )
> > +    {
> > +    case 1: *(uint8_t *)res = readb(p); break;
> > +    case 2: *(uint16_t *)res = readw(p); break;
> > +    case 4: *(uint32_t *)res = readl(p); break;
> > +#ifndef CONFIG_RISCV_32
> > +    case 8: *(uint64_t *)res = readq(p); break;
> > +#endif
> > +    default: __bad_atomic_size(); break;
> > +    }
> > +}
> > +
> > +#define read_atomic(p) ({                                   \
> > +    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \
> > +    read_atomic_size(p, x_.c, sizeof(*(p)));                \
> > +    x_.val;                                                 \
> > +})
> > +
> > +static always_inline void _write_atomic(volatile void *p,
> > +                                       unsigned long x, unsigned
> > int size)
> > +{
> > +    switch ( size )
> > +    {
> > +    case 1: writeb(x, p); break;
> > +    case 2: writew(x, p); break;
> > +    case 4: writel(x, p); break;
> > +#ifndef CONFIG_RISCV_32
> > +    case 8: writeq(x, p); break;
> > +#endif
> > +    default: __bad_atomic_size(); break;
> > +    }
> > +}
> > +
> > +#define write_atomic(p, x)                              \
> > +({                                                      \
> > +    typeof(*(p)) x_ = (x);                              \
> > +    _write_atomic((p), x_, sizeof(*(p)));               \
> > +    x_;                                                 \
> > +})
> > +
> > +static always_inline void _add_sized(volatile void *p,
> > +                                     unsigned long x, unsigned int
> > size)
> > +{
> > +    switch ( size )
> > +    {
> > +    case 1: writeb(read_atomic((volatile uint8_t *)p) + x, p);
> > break;
> > +    case 2: writew(read_atomic((volatile uint16_t *)p) + x, p);
> > break;
> > +    case 4: writel(read_atomic((volatile uint32_t *)p) + x, p);
> > break;
> > +#ifndef CONFIG_RISCV_32
> > +    case 8: writeq(read_atomic((volatile uint64_t *)p) + x, p);
> > break;
> > +#endif
> 
> Any particular reason for using read_atomic() but write{b,w,l,q}()
> here?
There is no particular reason. I will use write_atomic() for consistency.

> 
> > +    default: __bad_atomic_size(); break;
> > +    }
> > +}
> > +
> > +#define add_sized(p, x)                                 \
> > +({                                                      \
> > +    typeof(*(p)) x_ = (x);                              \
> > +    _add_sized((p), x_, sizeof(*(p)));                  \
> > +})
> > +
> > +#define __atomic_acquire_fence() \
> > +    asm volatile ( RISCV_ACQUIRE_BARRIER "" ::: "memory" )
> > +
> > +#define __atomic_release_fence() \
> > +    asm volatile ( RISCV_RELEASE_BARRIER "" ::: "memory" )
> 
> There isn't any need for the "" in these two, is there?
Indeed, the "" isn't really needed in this case.
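
For reference, assuming RISCV_ACQUIRE_BARRIER / RISCV_RELEASE_BARRIER expand
to plain string literals (as in fence.h), the macros can simply be:

    #define __atomic_acquire_fence() \
        asm volatile ( RISCV_ACQUIRE_BARRIER ::: "memory" )

    #define __atomic_release_fence() \
        asm volatile ( RISCV_RELEASE_BARRIER ::: "memory" )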
> 
> > +/*
> > + * First, the atomic ops that have no ordering constraints and
> > therefor don't
> > + * have the AQ or RL bits set.  These don't return anything, so
> > there's only
> > + * one version to worry about.
> > + */
> > +#define ATOMIC_OP(op, asm_op, unary_op, asm_type, c_type, prefix) 
> > \
> > +static inline                                               \
> > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > +{                                                           \
> > +    asm volatile (                                          \
> > +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> > +        : "+A" (v->counter)                                 \
> > +        : "r" (unary_op i)                                  \
> > +        : "memory" );                                       \
> > +}                                                           \
> > +
> > +/*
> > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
> > reason why
> > + * last argument for ATOMIC_OP isn't used.
> > + */
> > +#define ATOMIC_OPS(op, asm_op, unary_op)                    \
> > +        ATOMIC_OP (op, asm_op, unary_op, w, int,   )
> > +
> > +ATOMIC_OPS(add, add, +)
> > +ATOMIC_OPS(sub, add, -)
> > +ATOMIC_OPS(and, and, +)
> > +ATOMIC_OPS( or,  or, +)
> > +ATOMIC_OPS(xor, xor, +)
> > +
> > +#undef ATOMIC_OP
> > +#undef ATOMIC_OPS
> > +
> > +#include <asm-generic/atomic-ops.h>
> > +
> > +/*
> > + * Atomic ops that have ordered variant.
> > + * There's two flavors of these: the arithmatic ops have both
> > fetch and return
> > + * versions, while the logical ops only have fetch versions.
> > + */
> > +#define ATOMIC_FETCH_OP(op, asm_op, unary_op, asm_type, c_type, prefix) \
> > +static inline                                                       \
> > +c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
> > +{                                                                   \
> > +    register c_type ret;                                            \
> > +    asm volatile (                                                  \
> > +        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2, %0"          \
> > +        : "+A" (v->counter), "=r" (ret)                             \
> > +        : "r" (unary_op i)                                          \
> > +        : "memory" );                                               \
> > +    return ret;                                                     \
> > +}
> > +
> > +#define ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, asm_type, c_type, prefix) \
> > +static inline                                                           \
> > +c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)  \
> > +{                                                                       \
> > +        return atomic##prefix##_fetch_##op(i, v) c_op (unary_op i);     \
> 
> Nit: Too deep indentation.
> 
> > +}
> > +
> > +/*
> > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
> > reason why
> > + * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
> > + */
> > +#define ATOMIC_OPS(op, asm_op, c_op, unary_op)                          \
> > +        ATOMIC_FETCH_OP( op, asm_op,       unary_op, w, int,   )        \
> > +        ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, w, int,   )
> > +
> > +ATOMIC_OPS(add, add, +, +)
> > +ATOMIC_OPS(sub, add, +, -)
> > +
> > +#undef ATOMIC_OPS
> > +
> > +#define ATOMIC_OPS(op, asm_op, unary_op) \
> > +        ATOMIC_FETCH_OP(op, asm_op, unary_op, w, int,   )
> > +
> > +ATOMIC_OPS(and, and, +)
> > +ATOMIC_OPS( or,  or, +)
> > +ATOMIC_OPS(xor, xor, +)
> 
> The + isn't really needed here as a macro argument; ATOMIC_OPS()
> itself could
> pass it to ATOMIC_FETCH_OP(). I also wonder why ATOMIC_OPS() has both
> "op" and
> "asm_op", when both are uniformly the same.
It is needed for the case when the sub operation is implemented using add
plus a negated operand:
+ATOMIC_OPS(sub, add, +, -)
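
Expanding ATOMIC_OPS(sub, add, +, -) by hand shows the role of each argument
(abbreviated sketch of what the macros above generate):

    static inline int atomic_fetch_sub(int i, atomic_t *v)
    {
        register int ret;
        asm volatile (
            "   amoadd.w.aqrl  %1, %2, %0"
            : "+A" (v->counter), "=r" (ret)
            : "r" (-i)            /* unary_op '-' turns the add into a sub */
            : "memory" );
        return ret;
    }

    static inline int atomic_sub_return(int i, atomic_t *v)
    {
        return atomic_fetch_sub(i, v) + (-i);  /* c_op '+' combined with unary_op '-' */
    }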


> 
> > +#undef ATOMIC_OPS
> > +
> > +#undef ATOMIC_FETCH_OP
> > +#undef ATOMIC_OP_RETURN
> > +
> > +/* This is required to provide a full barrier on success. */
> > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > +{
> > +    int prev, rc;
> > +
> > +    asm volatile (
> > +        "0: lr.w     %[p],  %[c]\n"
> > +        "   beq      %[p],  %[u], 1f\n"
> > +        "   add      %[rc], %[p], %[a]\n"
> > +        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
> > +        "   bnez     %[rc], 0b\n"
> > +        "1:\n"
> > +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> > +        : [a] "r" (a), [u] "r" (u)
> > +        : "memory");
> > +    return prev;
> > +}
> > +
> > +static inline int atomic_sub_if_positive(atomic_t *v, int offset)
> > +{
> > +    int prev, rc;
> > +
> > +    asm volatile (
> > +        "0: lr.w     %[p],  %[c]\n"
> > +        "   sub      %[rc], %[p], %[o]\n"
> > +        "   bltz     %[rc], 1f\n"
> > +        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
> > +        "   bnez     %[rc], 0b\n"
> > +        "1:\n"
> > +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> > +        : [o] "r" (offset)
> > +        : "memory" );
> > +    return prev - offset;
> > +}
> > +
> > +/*
> > + * atomic_{cmp,}xchg is required to have exactly the same ordering
> > semantics as
> > + * {cmp,}xchg and the operations that return.
> > + */
> > +#define ATOMIC_OP(c_t, prefix, size)                            \
> > +static inline                                                   \
> > +c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
> > +{                                                               \
> > +    return __xchg(&v->counter, n, size);                        \
> > +}                                                               \
> > +static inline                                                   \
> > +c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n)
> > \
> > +{                                                               \
> > +    return __cmpxchg(&v->counter, o, n, size);                  \
> > +}
> > +
> > +#define ATOMIC_OPS() \
> > +    ATOMIC_OP(int,   , 4)
> 
> Can't the two inline functions use sizeof(*v), sizeof(v->counter),
> sizeof(c_t),
> or sizeof(n) instead of passing a literal 4 here?
Agree, it would be better.

Thanks.

~ Oleksii
Oleksii Kurochko April 12, 2024, 10:39 a.m. UTC | #3
On Mon, 2024-04-08 at 10:23 +0200, Jan Beulich wrote:
> > +static always_inline void _add_sized(volatile void *p,
> > +                                     unsigned long x, unsigned int
> > size)
> > +{
> > +    switch ( size )
> > +    {
> > +    case 1: writeb(read_atomic((volatile uint8_t *)p) + x, p);
> > break;
> > +    case 2: writew(read_atomic((volatile uint16_t *)p) + x, p);
> > break;
> > +    case 4: writel(read_atomic((volatile uint32_t *)p) + x, p);
> > break;
> > +#ifndef CONFIG_RISCV_32
> > +    case 8: writeq(read_atomic((volatile uint64_t *)p) + x, p);
> > break;
> > +#endif
> 
> Any particular reason for using read_atomic() but write{b,w,l,q}()
> here?
It was done because write_atomic() wants to have a pointer as its first
argument, while read_atomic() returns a value.

As an option, read{b,w,l,q}() could be used instead of read_atomic() to
keep the code consistent with write{b,w,l,q}().

Another option is to leave it as is and add a comment.

~ Oleksii
Oleksii Kurochko April 12, 2024, 1:42 p.m. UTC | #4
On Fri, 2024-04-12 at 12:39 +0200, Oleksii wrote:
> On Mon, 2024-04-08 at 10:23 +0200, Jan Beulich wrote:
> > > +static always_inline void _add_sized(volatile void *p,
> > > +                                     unsigned long x, unsigned
> > > int
> > > size)
> > > +{
> > > +    switch ( size )
> > > +    {
> > > +    case 1: writeb(read_atomic((volatile uint8_t *)p) + x, p);
> > > break;
> > > +    case 2: writew(read_atomic((volatile uint16_t *)p) + x, p);
> > > break;
> > > +    case 4: writel(read_atomic((volatile uint32_t *)p) + x, p);
> > > break;
> > > +#ifndef CONFIG_RISCV_32
> > > +    case 8: writeq(read_atomic((volatile uint64_t *)p) + x, p);
> > > break;
> > > +#endif
> > 
> > Any particular reason for using read_atomic() but write{b,w,l,q}()
> > here?
> It was done because write_atomic() wants to have pointer as a first
> argument, but read_atomic() returns a value.
> 
> As an option it can be used read{b,w,l,q}() instead of read_atomic()
> to
> have the code consistent with write{b,w,l,q}.
> 
> Another option is to left as is and add the comment.
I decided to write it using write_atomic() in the following way:

    case 1:
    {
        volatile uint8_t *t = (volatile uint8_t *)p;
        write_atomic(t, read_atomic(t) + x);
        break;
    }
...

~ Oleksii
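
For reference, spelling that pattern out for all sizes (a sketch only, not
necessarily what the next revision will contain) gives roughly:

    static always_inline void _add_sized(volatile void *p,
                                         unsigned long x, unsigned int size)
    {
        switch ( size )
        {
        case 1:
        {
            volatile uint8_t *t = p;
            write_atomic(t, read_atomic(t) + x);
            break;
        }
        case 2:
        {
            volatile uint16_t *t = p;
            write_atomic(t, read_atomic(t) + x);
            break;
        }
        case 4:
        {
            volatile uint32_t *t = p;
            write_atomic(t, read_atomic(t) + x);
            break;
        }
    #ifndef CONFIG_RISCV_32
        case 8:
        {
            volatile uint64_t *t = p;
            write_atomic(t, read_atomic(t) + x);
            break;
        }
    #endif
        default: __bad_atomic_size(); break;
        }
    }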
Jan Beulich April 18, 2024, 6:03 a.m. UTC | #5
On 09.04.2024 09:39, Oleksii wrote:
> On Mon, 2024-04-08 at 10:23 +0200, Jan Beulich wrote:
>> On 03.04.2024 12:20, Oleksii Kurochko wrote:
>>> +/*
>>> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
>>> reason why
>>> + * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
>>> + */
>>> +#define ATOMIC_OPS(op, asm_op, c_op, unary_op)                          \
>>> +        ATOMIC_FETCH_OP( op, asm_op,       unary_op, w, int,   )        \
>>> +        ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, w, int,   )
>>> +
>>> +ATOMIC_OPS(add, add, +, +)
>>> +ATOMIC_OPS(sub, add, +, -)
>>> +
>>> +#undef ATOMIC_OPS
>>> +
>>> +#define ATOMIC_OPS(op, asm_op, unary_op) \
>>> +        ATOMIC_FETCH_OP(op, asm_op, unary_op, w, int,   )
>>> +
>>> +ATOMIC_OPS(and, and, +)
>>> +ATOMIC_OPS( or,  or, +)
>>> +ATOMIC_OPS(xor, xor, +)
>>
>> The + isn't really needed here as a macro argument; ATOMIC_OPS()
>> itself could
>> pass it to ATOMIC_FETCH_OP(). I also wonder why ATOMIC_OPS() has both
>> "op" and
>> "asm_op", when both are uniformly the same.
> It is needed for the case when sub operation is implemented using add
> plus negative number:
> +ATOMIC_OPS(sub, add, +, -)

Except there's no such case right here.

Jan
diff mbox series

Patch

diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h
new file mode 100644
index 0000000000..51574e7ce8
--- /dev/null
+++ b/xen/arch/riscv/include/asm/atomic.h
@@ -0,0 +1,261 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Taken and modified from Linux.
+ *
+ * The following changes were done:
+ * - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
+ *   to use __*xchg_generic()
+ * - drop casts in write_atomic() as they are unnecessary
+ * - drop introduction of WRITE_ONCE() and READ_ONCE().
+ *   Xen provides ACCESS_ONCE()
+ * - remove zero-length array access in read_atomic()
+ * - drop defines similar to pattern
+ *   #define atomic_add_return_relaxed   atomic_add_return_relaxed
+ * - move non-RISC-V-specific functions to asm-generic/atomic-ops.h
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2012 Regents of the University of California
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2024 Vates SAS
+ */
+
+#ifndef _ASM_RISCV_ATOMIC_H
+#define _ASM_RISCV_ATOMIC_H
+
+#include <xen/atomic.h>
+
+#include <asm/cmpxchg.h>
+#include <asm/fence.h>
+#include <asm/io.h>
+#include <asm/system.h>
+
+void __bad_atomic_size(void);
+
+/*
+ * Legacy from the Linux kernel. For some reason they wanted to have ordered
+ * read/write accesses, hence read*() is used instead of read*_cpu().
+ */
+static always_inline void read_atomic_size(const volatile void *p,
+                                           void *res,
+                                           unsigned int size)
+{
+    switch ( size )
+    {
+    case 1: *(uint8_t *)res = readb(p); break;
+    case 2: *(uint16_t *)res = readw(p); break;
+    case 4: *(uint32_t *)res = readl(p); break;
+#ifndef CONFIG_RISCV_32
+    case 8: *(uint64_t *)res = readq(p); break;
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define read_atomic(p) ({                                   \
+    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \
+    read_atomic_size(p, x_.c, sizeof(*(p)));                \
+    x_.val;                                                 \
+})
+
+static always_inline void _write_atomic(volatile void *p,
+                                       unsigned long x, unsigned int size)
+{
+    switch ( size )
+    {
+    case 1: writeb(x, p); break;
+    case 2: writew(x, p); break;
+    case 4: writel(x, p); break;
+#ifndef CONFIG_RISCV_32
+    case 8: writeq(x, p); break;
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define write_atomic(p, x)                              \
+({                                                      \
+    typeof(*(p)) x_ = (x);                              \
+    _write_atomic((p), x_, sizeof(*(p)));               \
+    x_;                                                 \
+})
+
+static always_inline void _add_sized(volatile void *p,
+                                     unsigned long x, unsigned int size)
+{
+    switch ( size )
+    {
+    case 1: writeb(read_atomic((volatile uint8_t *)p) + x, p); break;
+    case 2: writew(read_atomic((volatile uint16_t *)p) + x, p); break;
+    case 4: writel(read_atomic((volatile uint32_t *)p) + x, p); break;
+#ifndef CONFIG_RISCV_32
+    case 8: writeq(read_atomic((volatile uint64_t *)p) + x, p); break;
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define add_sized(p, x)                                 \
+({                                                      \
+    typeof(*(p)) x_ = (x);                              \
+    _add_sized((p), x_, sizeof(*(p)));                  \
+})
+
+#define __atomic_acquire_fence() \
+    asm volatile ( RISCV_ACQUIRE_BARRIER "" ::: "memory" )
+
+#define __atomic_release_fence() \
+    asm volatile ( RISCV_RELEASE_BARRIER "" ::: "memory" )
+
+/*
+ * First, the atomic ops that have no ordering constraints and therefore don't
+ * have the AQ or RL bits set.  These don't return anything, so there's only
+ * one version to worry about.
+ */
+#define ATOMIC_OP(op, asm_op, unary_op, asm_type, c_type, prefix)  \
+static inline                                               \
+void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
+{                                                           \
+    asm volatile (                                          \
+        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
+        : "+A" (v->counter)                                 \
+        : "r" (unary_op i)                                  \
+        : "memory" );                                       \
+}                                                           \
+
+/*
+ * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen; that is the reason why
+ * the last argument of ATOMIC_OP isn't used.
+ */
+#define ATOMIC_OPS(op, asm_op, unary_op)                    \
+        ATOMIC_OP (op, asm_op, unary_op, w, int,   )
+
+ATOMIC_OPS(add, add, +)
+ATOMIC_OPS(sub, add, -)
+ATOMIC_OPS(and, and, +)
+ATOMIC_OPS( or,  or, +)
+ATOMIC_OPS(xor, xor, +)
+
+#undef ATOMIC_OP
+#undef ATOMIC_OPS
+
+#include <asm-generic/atomic-ops.h>
+
+/*
+ * Atomic ops that have an ordered variant.
+ * There are two flavors of these: the arithmetic ops have both fetch and return
+ * versions, while the logical ops only have fetch versions.
+ */
+#define ATOMIC_FETCH_OP(op, asm_op, unary_op, asm_type, c_type, prefix) \
+static inline                                                       \
+c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
+{                                                                   \
+    register c_type ret;                                            \
+    asm volatile (                                                  \
+        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2, %0"          \
+        : "+A" (v->counter), "=r" (ret)                             \
+        : "r" (unary_op i)                                          \
+        : "memory" );                                               \
+    return ret;                                                     \
+}
+
+#define ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, asm_type, c_type, prefix) \
+static inline                                                           \
+c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)  \
+{                                                                       \
+        return atomic##prefix##_fetch_##op(i, v) c_op (unary_op i);     \
+}
+
+/*
+ * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen; that is the reason why
+ * the last argument of ATOMIC_FETCH_OP and ATOMIC_OP_RETURN isn't used.
+ */
+#define ATOMIC_OPS(op, asm_op, c_op, unary_op)                          \
+        ATOMIC_FETCH_OP( op, asm_op,       unary_op, w, int,   )        \
+        ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, w, int,   )
+
+ATOMIC_OPS(add, add, +, +)
+ATOMIC_OPS(sub, add, +, -)
+
+#undef ATOMIC_OPS
+
+#define ATOMIC_OPS(op, asm_op, unary_op) \
+        ATOMIC_FETCH_OP(op, asm_op, unary_op, w, int,   )
+
+ATOMIC_OPS(and, and, +)
+ATOMIC_OPS( or,  or, +)
+ATOMIC_OPS(xor, xor, +)
+
+#undef ATOMIC_OPS
+
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+
+/* This is required to provide a full barrier on success. */
+static inline int atomic_add_unless(atomic_t *v, int a, int u)
+{
+    int prev, rc;
+
+    asm volatile (
+        "0: lr.w     %[p],  %[c]\n"
+        "   beq      %[p],  %[u], 1f\n"
+        "   add      %[rc], %[p], %[a]\n"
+        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
+        "   bnez     %[rc], 0b\n"
+        "1:\n"
+        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
+        : [a] "r" (a), [u] "r" (u)
+        : "memory");
+    return prev;
+}
+
+static inline int atomic_sub_if_positive(atomic_t *v, int offset)
+{
+    int prev, rc;
+
+    asm volatile (
+        "0: lr.w     %[p],  %[c]\n"
+        "   sub      %[rc], %[p], %[o]\n"
+        "   bltz     %[rc], 1f\n"
+        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
+        "   bnez     %[rc], 0b\n"
+        "1:\n"
+        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
+        : [o] "r" (offset)
+        : "memory" );
+    return prev - offset;
+}
+
+/*
+ * atomic_{cmp,}xchg is required to have exactly the same ordering semantics as
+ * {cmp,}xchg and the operations that return a value.
+ */
+#define ATOMIC_OP(c_t, prefix, size)                            \
+static inline                                                   \
+c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
+{                                                               \
+    return __xchg(&v->counter, n, size);                        \
+}                                                               \
+static inline                                                   \
+c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
+{                                                               \
+    return __cmpxchg(&v->counter, o, n, size);                  \
+}
+
+#define ATOMIC_OPS() \
+    ATOMIC_OP(int,   , 4)
+
+ATOMIC_OPS()
+
+#undef ATOMIC_OPS
+#undef ATOMIC_OP
+
+#endif /* _ASM_RISCV_ATOMIC_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-generic/atomic-ops.h b/xen/include/asm-generic/atomic-ops.h
new file mode 100644
index 0000000000..98dd907942
--- /dev/null
+++ b/xen/include/asm-generic/atomic-ops.h
@@ -0,0 +1,97 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * The header provides default implementations for every xen/atomic.h-provided
+ * forward inline declaration that can be synthesized from other atomic
+ * functions or be created from scratch.
+ */
+#ifndef _ASM_GENERIC_ATOMIC_OPS_H_
+#define _ASM_GENERIC_ATOMIC_OPS_H_
+
+#include <xen/atomic.h>
+#include <xen/lib.h>
+
+#ifndef ATOMIC_READ
+static inline int atomic_read(const atomic_t *v)
+{
+    return ACCESS_ONCE(v->counter);
+}
+#endif
+
+#ifndef _ATOMIC_READ
+static inline int _atomic_read(atomic_t v)
+{
+    return v.counter;
+}
+#endif
+
+#ifndef ATOMIC_SET
+static inline void atomic_set(atomic_t *v, int i)
+{
+    ACCESS_ONCE(v->counter) = i;
+}
+#endif
+
+#ifndef _ATOMIC_SET
+static inline void _atomic_set(atomic_t *v, int i)
+{
+    v->counter = i;
+}
+#endif
+
+#ifndef ATOMIC_SUB_AND_TEST
+static inline int atomic_sub_and_test(int i, atomic_t *v)
+{
+    return atomic_sub_return(i, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_INC_AND_TEST
+static inline int atomic_inc_and_test(atomic_t *v)
+{
+    return atomic_add_return(1, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_INC
+static inline void atomic_inc(atomic_t *v)
+{
+    atomic_add(1, v);
+}
+#endif
+
+#ifndef ATOMIC_INC_RETURN
+static inline int atomic_inc_return(atomic_t *v)
+{
+    return atomic_add_return(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC
+static inline void atomic_dec(atomic_t *v)
+{
+    atomic_sub(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC_RETURN
+static inline int atomic_dec_return(atomic_t *v)
+{
+    return atomic_sub_return(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC_AND_TEST
+static inline int atomic_dec_and_test(atomic_t *v)
+{
+    return atomic_sub_return(1, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_ADD_NEGATIVE
+static inline int atomic_add_negative(int i, atomic_t *v)
+{
+    return atomic_add_return(i, v) < 0;
+}
+#endif
+
+#endif /* _ASM_GENERIC_ATOMIC_OPS_H_ */