[2/2] membarrier: riscv: Provide core serializing command

Message ID 20231127103235.28442-3-parri.andrea@gmail.com (mailing list archive)
State Superseded, archived
Headers show
Series membarrier: riscv: Provide core serializing command | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-2-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-2-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-2-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-2-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-2-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-2-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-2-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-2-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-2-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-2-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-2-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-2-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Andrea Parri Nov. 27, 2023, 10:32 a.m. UTC
RISC-V uses xRET instructions on return from interrupt and to go back
to user-space; the xRET instruction is not core serializing.

Use FENCE.I for providing core serialization as follows:

 - by calling sync_core_before_usermode() on return from interrupt (cf.
   ipi_sync_core()),

 - via switch_mm() and sync_core_before_usermode() (respectively, for
   uthread->uthread and kthread->uthread transitions) to go back to
   user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
---
 .../membarrier-sync-core/arch-support.txt     |  2 +-
 arch/riscv/Kconfig                            |  3 ++
 arch/riscv/include/asm/sync_core.h            | 29 +++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/sync_core.h
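
For context, a minimal user-space sketch of the command this series enables
(illustrative only, not part of the patch: the wrapper function, the error
handling and the code-patching step are placeholders; the constants come from
linux/membarrier.h):

	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int membarrier(int cmd, unsigned int flags, int cpu_id)
	{
		return syscall(__NR_membarrier, cmd, flags, cpu_id);
	}

	int main(void)
	{
		/* One-time registration of the process's intent to use SYNC_CORE. */
		if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0))
			return 1;

		/* ... rewrite instructions in an executable mapping here ... */

		/*
		 * Core-serialize every thread of the process before it next runs
		 * user code; with this series, that means a FENCE.I on RISC-V
		 * (issued via the IPI, or deferred to switch_mm()/return to
		 * user-space).
		 */
		if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0))
			return 1;

		return 0;
	}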

Comments

Mathieu Desnoyers Nov. 27, 2023, 1:28 p.m. UTC | #1
On 2023-11-27 05:32, Andrea Parri wrote:
> RISC-V uses xRET instructions on return from interrupt and to go back
> to user-space; the xRET instruction is not core serializing.
> 
> Use FENCE.I for providing core serialization as follows:
> 
>   - by calling sync_core_before_usermode() on return from interrupt (cf.
>     ipi_sync_core()),
> 
>   - via switch_mm() and sync_core_before_usermode() (respectively, for
>     uthread->uthread and kthread->uthread transitions) to go back to
>     user-space.
> 
> On RISC-V, the serialization in switch_mm() is activated by resetting
> the icache_stale_mask of the mm at prepare_sync_core_cmd().
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
> ---

[...]

> +
> +#ifdef CONFIG_SMP
> +/*
> + * Ensure the next switch_mm() on every CPU issues a core serializing
> + * instruction for the given @mm.
> + */
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +	cpumask_setall(&mm->context.icache_stale_mask);

I am concerned about the possibility that this change lacks two barriers in the
following scenario:

On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
membarrier sync-core is done on [CPU 1]:

- [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
   associated with these stores.

- [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
   within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
   cpu_rq(0)->curr->mm != mm, so it skips the IPI.

- This means membarrier relies on switch_mm() to issue the sync-core.

- [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
   may incorrectly skip the sync-core.

AFAIU, [C] can be reordered before [A] because there is no barrier between those
operations within membarrier. I suspect it can cause the switch_mm() code to skip
a needed sync-core.

AFAIU, [D] can be reordered before [B] because there is no documented barrier
between those operations within the scheduler, which can also cause switch_mm()
to skip a needed sync-core.

We possibly have a similar scenario for uthread->uthread when the scheduler
switches between mm -> !mm.

One way to fix this would be to add the following barriers:

- A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
- A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
   flush_icache_deferred(), with appropriate comments.

Am I missing something ?

Thanks,

Mathieu

> +}
> +#else
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _ASM_RISCV_SYNC_CORE_H */
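
The race described above has the classic store-buffering (SB) shape. Below is a
purely illustrative LKMM-style litmus sketch with the two smp_mb()s proposed
above in place; rq_curr and icache_stale are stand-ins for cpu_rq(0)->curr and
the CPU 0 bit of mm->context.icache_stale_mask, not the real kernel fields:

	C riscv-membarrier-sync-core-sb

	(*
	 * Illustrative sketch only: P0 plays the scheduler + switch_mm() on
	 * CPU 0, P1 plays membarrier() on CPU 1.
	 *)

	{
	}

	P0(int *rq_curr, int *icache_stale)
	{
		int r0;

		WRITE_ONCE(*rq_curr, 1);	/* [B] scheduler stores rq->curr */
		smp_mb();			/* proposed barrier before [D] */
		r0 = READ_ONCE(*icache_stale);	/* [D] cpumask_test_cpu() */
	}

	P1(int *rq_curr, int *icache_stale)
	{
		int r1;

		WRITE_ONCE(*icache_stale, 1);	/* [A] cpumask_setall() */
		smp_mb();			/* barrier after [A], before [C] */
		r1 = READ_ONCE(*rq_curr);	/* [C] membarrier reads rq->curr */
	}

	exists (0:r0=0 /\ 1:r1=0)

With both barriers, the "exists" outcome (membarrier skips the IPI while
switch_mm() also skips the FENCE.I) is forbidden; drop either smp_mb() and it
becomes reachable.
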
Andrea Parri Nov. 28, 2023, 3:13 p.m. UTC | #2
> I am concerned about the possibility that this change lacks two barriers in the
> following scenario:
> 
> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
> membarrier sync-core is done on [CPU 1]:
> 
> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>   associated with these stores.
> 
> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>   within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
>   cpu_rq(0)->curr->mm != mm, so it skips the IPI.
> 
> - This means membarrier relies on switch_mm() to issue the sync-core.
> 
> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
>   may incorrectly skip the sync-core.
> 
> AFAIU, [C] can be reordered before [A] because there is no barrier between those
> operations within membarrier. I suspect it can cause the switch_mm() code to skip
> a needed sync-core.
> 
> AFAIU, [D] can be reordered before [B] because there is no documented barrier
> between those operations within the scheduler, which can also cause switch_mm()
> to skip a needed sync-core.
> 
> We possibly have a similar scenario for uthread->uthread when the scheduler
> switches between mm -> !mm.
> 
> One way to fix this would be to add the following barriers:
> 
> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>   flush_icache_deferred(), with appropriate comments.
> 
> Am I missing something ?

Thanks for the detailed analysis.

AFAIU, the following barrier (in membarrier_private_expedited())

	/*
	 * Matches memory barriers around rq->curr modification in
	 * scheduler.
	 */
	smp_mb();	/* system call entry is not a mb. */

can serve the purpose of ordering [A] before [C] (to be documented in v2).

But I agree that [B] and [D] are unordered / lack suitable synchronization.
Worse, RISC-V currently has no full barrier after [B] and before returning to
user-space: I'm thinking (inspired by the PowerPC implementation),

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de61342..f63222513076d 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	if (unlikely(prev == next))
 		return;
 
+#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 *
+	 * Only need the full barrier when switching between processes:
+	 * barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(); barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (unlikely(atomic_read(&next->membarrier_state) &
+		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
+		smp_mb();
+#endif
+
 	/*
 	 * Mark the current MM context as inactive, and the next as
 	 * active.  This is at least used by the icache flushing
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e8..a1c749fddd095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		 *
 		 * Here are the schemes providing that barrier on the
 		 * various architectures:
-		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+		 *   RISC-V.  switch_mm() relies on membarrier_arch_switch_mm()
+		 *   on PowerPC.
 		 * - finish_lock_switch() for weakly-ordered
 		 *   architectures where spin_unlock is a full barrier,
 		 * - switch_to() for arm64 (weakly-ordered, spin_unlock

The silver lining is that similar changes (probably as a separate/preliminary
patch) also restore the desired order between [B] and [D] AFAIU; so with them,
2/2 would just need additions to document the above SYNC_CORE scenario.

Thoughts?

  Andrea
Mathieu Desnoyers Nov. 28, 2023, 6:39 p.m. UTC | #3
On 2023-11-28 10:13, Andrea Parri wrote:
>> I am concerned about the possibility that this change lacks two barriers in the
>> following scenario:
>>
>> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
>> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
>> membarrier sync-core is done on [CPU 1]:
>>
>> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>>    associated with these stores.
>>
>> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>>    within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
>>    cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>>
>> - This means membarrier relies on switch_mm() to issue the sync-core.
>>
>> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
>>    may incorrectly skip the sync-core.
>>
>> AFAIU, [C] can be reordered before [A] because there is no barrier between those
>> operations within membarrier. I suspect it can cause the switch_mm() code to skip
>> a needed sync-core.
>>
>> AFAIU, [D] can be reordered before [B] because there is no documented barrier
>> between those operations within the scheduler, which can also cause switch_mm()
>> to skip a needed sync-core.
>>
>> We possibly have a similar scenario for uthread->uthread when the scheduler
>> switches between mm -> !mm.
>>
>> One way to fix this would be to add the following barriers:
>>
>> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>>    prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
>> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>>    flush_icache_deferred(), with appropriate comments.
>>
>> Am I missing something ?
> 
> Thanks for the detailed analysis.
> 
> AFAIU, the following barrier (in membarrier_private_expedited())
> 
> 	/*
> 	 * Matches memory barriers around rq->curr modification in
> 	 * scheduler.
> 	 */
> 	smp_mb();	/* system call entry is not a mb. */
> 
> can serve the purpose of ordering [A] before [C] (to be documented in v2).

Agreed. Yes it should be documented.

> 
> But I agree that [B] and [D] are unordered / lack suitable synchronization.
> Worse, RISC-V currently has no full barrier after [B] and before returning to
> user-space: I'm thinking (inspired by the PowerPC implementation),

If RISC-V currently supports the membarrier private cmd and lacks the 
appropriate smp_mb() in switch_mm(), then it's a bug. This initial patch 
should be a "Fix" and fast-tracked as such.

Indeed, looking at how ASID is used to implement switch_mm, it appears 
to not require a full smp_mb() at all as long as there are no ASID 
rollovers.

> 
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 217fd4de61342..f63222513076d 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>   	if (unlikely(prev == next))
>   		return;
>   
> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> +	/*
> +	 * The membarrier system call requires a full memory barrier
> +	 * after storing to rq->curr, before going back to user-space.
> +	 *
> +	 * Only need the full barrier when switching between processes:
> +	 * barrier when switching from kernel to userspace is not
> +	 * required here, given that it is implied by mmdrop(); barrier
> +	 * when switching from userspace to kernel is not needed after
> +	 * store to rq->curr.
> +	 */
> +	if (unlikely(atomic_read(&next->membarrier_state) &
> +		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> +		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> +		smp_mb();
> +#endif

The approach looks good. Please implement it within a separate 
membarrier_arch_switch_mm() as done on powerpc.

> +
>   	/*
>   	 * Mark the current MM context as inactive, and the next as
>   	 * active.  This is at least used by the icache flushing
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a708d225c28e8..a1c749fddd095 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>   		 *
>   		 * Here are the schemes providing that barrier on the
>   		 * various architectures:
> -		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
> -		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
> +		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
> +		 *   RISC-V.  switch_mm() relies on membarrier_arch_switch_mm()
> +		 *   on PowerPC.
>   		 * - finish_lock_switch() for weakly-ordered
>   		 *   architectures where spin_unlock is a full barrier,
>   		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
> 
> The silver lining is that similar changes (probably as a separate/preliminary
> patch) also restore the desired order between [B] and [D] AFAIU; so with them,
> 2/2 would just need additions to document the above SYNC_CORE scenario.

Exactly.

> Thoughts?

I think we should be OK with the changes you suggest.

Thanks!

Mathieu
Andrea Parri Nov. 29, 2023, 6:29 p.m. UTC | #4
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index 217fd4de61342..f63222513076d 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >   	if (unlikely(prev == next))
> >   		return;
> > +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> > +	/*
> > +	 * The membarrier system call requires a full memory barrier
> > +	 * after storing to rq->curr, before going back to user-space.
> > +	 *
> > +	 * Only need the full barrier when switching between processes:
> > +	 * barrier when switching from kernel to userspace is not
> > +	 * required here, given that it is implied by mmdrop(); barrier
> > +	 * when switching from userspace to kernel is not needed after
> > +	 * store to rq->curr.
> > +	 */
> > +	if (unlikely(atomic_read(&next->membarrier_state) &
> > +		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> > +		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> > +		smp_mb();
> > +#endif
> 
> The approach looks good. Please implement it within a separate
> membarrier_arch_switch_mm() as done on powerpc.

Will do.  Thanks.
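
For reference, a rough sketch of how the switch_mm() hunk could be folded into
such a helper, modeled on PowerPC's membarrier_arch_switch_mm(); the header
name, guards and call site below are assumptions, not the final v2 code:

	/* arch/riscv/include/asm/membarrier.h -- sketch only */
	#ifndef _ASM_RISCV_MEMBARRIER_H
	#define _ASM_RISCV_MEMBARRIER_H

	static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
						     struct mm_struct *next,
						     struct task_struct *tsk)
	{
		/*
		 * Only need the full barrier when switching between processes:
		 * barrier when switching from kernel to userspace is not
		 * required here, given that it is implied by mmdrop(); barrier
		 * when switching from userspace to kernel is not needed after
		 * store to rq->curr.
		 */
		if (unlikely(atomic_read(&next->membarrier_state) &
			     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
			      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
			smp_mb();
	}

	#endif /* _ASM_RISCV_MEMBARRIER_H */

switch_mm() in arch/riscv/mm/context.c would then simply call
membarrier_arch_switch_mm(prev, next, task) where the open-coded block sits in
the diff above, keeping it under the same CONFIG_MEMBARRIER/CONFIG_SMP guard or
stubbing the helper out when membarrier is not configured.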

As regards the Fixes: tag, I guess it boils down to what we want or we
need to take for commit "riscv: Support membarrier private cmd".  :-)
FWIW, a quick git-log search confirmed that MEMBARRIER has been around
for quite some time in the RISC-V world (though I'm not familiar with
any of its mainstream uses): commit 1464d00b27b2 says (at least) since
93917ad50972 ("RISC-V: Add support for restartable sequence").  I am
currently inclined to pick the latter commit (and check it w/ Palmer),
but other suggestions are welcome.

  Andrea
Mathieu Desnoyers Nov. 29, 2023, 8 p.m. UTC | #5
On 2023-11-29 13:29, Andrea Parri wrote:
>>> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
>>> index 217fd4de61342..f63222513076d 100644
>>> --- a/arch/riscv/mm/context.c
>>> +++ b/arch/riscv/mm/context.c
>>> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>>    	if (unlikely(prev == next))
>>>    		return;
>>> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
>>> +	/*
>>> +	 * The membarrier system call requires a full memory barrier
>>> +	 * after storing to rq->curr, before going back to user-space.
>>> +	 *
>>> +	 * Only need the full barrier when switching between processes:
>>> +	 * barrier when switching from kernel to userspace is not
>>> +	 * required here, given that it is implied by mmdrop(); barrier
>>> +	 * when switching from userspace to kernel is not needed after
>>> +	 * store to rq->curr.
>>> +	 */
>>> +	if (unlikely(atomic_read(&next->membarrier_state) &
>>> +		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>>> +		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
>>> +		smp_mb();
>>> +#endif
>>
>> The approach looks good. Please implement it within a separate
>> membarrier_arch_switch_mm() as done on powerpc.
> 
> Will do.  Thanks.
> 
> As regards the Fixes: tag, I guess it boils down to what we want or we
> need to take for commit "riscv: Support membarrier private cmd".  :-)

I'm not seeing this commit in the Linux master branch, am I missing
something ?

> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> for quite some time in the RISC-V world (though I'm not familiar with
> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> 93917ad50972 ("RISC-V: Add support for restartable sequence").  I am
> currently inclined to pick the latter commit (and check it w/ Palmer),
> but other suggestions are welcome.

Supporting membarrier private expedited is not optional since Linux 4.14:

see kernel/sched/core.c:

                 membarrier_switch_mm(rq, prev->active_mm, next->mm);
                 /*
                  * sys_membarrier() requires an smp_mb() between setting
                  * rq->curr / membarrier_switch_mm() and returning to userspace.
                  *
                  * The below provides this either through switch_mm(), or in
                  * case 'prev->active_mm == next->mm' through
                  * finish_task_switch()'s mmdrop().
                  */
                 switch_mm_irqs_off(prev->active_mm, next->mm, next);

Failure to provide the required barrier is a bug in the architecture's
switch_mm implementation when CONFIG_MEMBARRIER=y.

We should probably introduce a new
Documentation/features/sched/membarrier/arch-support.txt
to clarify this requirement.

Userspace code such as liburcu [1] heavily relies on membarrier private
expedited (when available) to speed up RCU read-side critical sections.
Various DNS servers, including BIND 9, use liburcu.

Thanks,

Mathieu

[1] https://liburcu.org
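
As an illustration of that use case, a rough sketch of a reader/updater pair
built on liburcu's documented RCU_MEMBARRIER flavor (the function names follow
the liburcu API as I understand it; struct cfg and the loop bound are made up):

	#define _LGPL_SOURCE
	#define RCU_MEMBARRIER
	#include <urcu.h>
	#include <stdlib.h>

	struct cfg { int value; };
	static struct cfg *global_cfg;

	void *reader(void *arg)
	{
		rcu_register_thread();
		for (int i = 0; i < 1000000; i++) {
			struct cfg *c;

			rcu_read_lock();	/* cheap: no hardware memory barrier */
			c = rcu_dereference(global_cfg);
			if (c)
				(void)c->value;
			rcu_read_unlock();
		}
		rcu_unregister_thread();
		return NULL;
	}

	void update(int v)
	{
		struct cfg *newc = malloc(sizeof(*newc)), *old;

		newc->value = v;
		old = rcu_xchg_pointer(&global_cfg, newc);
		synchronize_rcu();	/* pays for ordering via membarrier() */
		free(old);
	}

The point being that rcu_read_lock()/rcu_read_unlock() stay barrier-free on the
reader fast path because synchronize_rcu() provides the ordering through
membarrier private expedited, when the kernel offers it.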
Andrea Parri Nov. 29, 2023, 9:25 p.m. UTC | #6
> > As regards the Fixes: tag, I guess it boils down to what we want or we
> > need to take for commit "riscv: Support membarrier private cmd".  :-)
> 
> I'm not seeing this commit in the Linux master branch, am I missing
> something ?

I don't think you're missing something: I was wondering "what/where is
this commit"?  Sorry for the confusion.


> > FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> > for quite some time in the RISC-V world (though I'm not familiar with
> > any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> > 93917ad50972 ("RISC-V: Add support for restartable sequence").  I am
> > currently inclined to pick the latter commit (and check it w/ Palmer),
> > but other suggestions are welcome.
> 
> Supporting membarrier private expedited is not optional since Linux 4.14:
> 
> see kernel/sched/core.c:
> 
>                 membarrier_switch_mm(rq, prev->active_mm, next->mm);
>                 /*
>                  * sys_membarrier() requires an smp_mb() between setting
>                  * rq->curr / membarrier_switch_mm() and returning to userspace.
>                  *
>                  * The below provides this either through switch_mm(), or in
>                  * case 'prev->active_mm == next->mm' through
>                  * finish_task_switch()'s mmdrop().
>                  */
>                 switch_mm_irqs_off(prev->active_mm, next->mm, next);
> 
> Failure to provide the required barrier is a bug in the architecture's
> switch_mm implementation when CONFIG_MEMBARRIER=y.
> 
> We should probably introduce a new
> Documentation/features/sched/membarrier/arch-support.txt
> to clarify this requirement.
> 
> Userspace code such as liburcu [1] heavily relies on membarrier private
> expedited (when available) to speed up RCU read-side critical sections.
> Various DNS servers, including BIND 9, use liburcu.

Thanks for the information.

So I should probably stick to 93917ad50972, which apparently selected
CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.

I'll look into adding the membarrier feature you mention (as a final/
follow-up patch), unless you or someone else want to take care of it.

  Andrea
Mathieu Desnoyers Nov. 29, 2023, 9:32 p.m. UTC | #7
On 2023-11-29 16:25, Andrea Parri wrote:
>>> As regards the Fixes: tag, I guess it boils down to what we want or we
>>> need to take for commit "riscv: Support membarrier private cmd".  :-)
>>
>> I'm not seeing this commit in the Linux master branch, am I missing
>> something ?
> 
> I don't think you're missing something: I was wondering "what/where is
> this commit"?  Sorry for the confusion.
> 
> 
>>> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
>>> for quite some time in the RISC-V world (though I'm not familiar with
>>> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
>>> 93917ad50972 ("RISC-V: Add support for restartable sequence").  I am
>>> currently inclined to pick the latter commit (and check it w/ Palmer),
>>> but other suggestions are welcome.
>>
>> Supporting membarrier private expedited is not optional since Linux 4.14:
>>
>> see kernel/sched/core.c:
>>
>>                  membarrier_switch_mm(rq, prev->active_mm, next->mm);
>>                  /*
>>                   * sys_membarrier() requires an smp_mb() between setting
>>                   * rq->curr / membarrier_switch_mm() and returning to userspace.
>>                   *
>>                   * The below provides this either through switch_mm(), or in
>>                   * case 'prev->active_mm == next->mm' through
>>                   * finish_task_switch()'s mmdrop().
>>                   */
>>                  switch_mm_irqs_off(prev->active_mm, next->mm, next);
>>
>> Failure to provide the required barrier is a bug in the architecture's
>> switch_mm implementation when CONFIG_MEMBARRIER=y.
>>
>> We should probably introduce a new
>> Documentation/features/sched/membarrier/arch-support.txt
>> to clarify this requirement.
>>
>> Userspace code such as liburcu [1] heavily relies on membarrier private
>> expedited (when available) to speed up RCU read-side critical sections.
>> Various DNS servers, including BIND 9, use liburcu.
> 
> Thanks for the information.
> 
> So I should probably stick to 93917ad50972, which apparently selected
> CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.

I think it goes further than that, because you can explicitly set
CONFIG_MEMBARRIER=y, see init/Kconfig:

config MEMBARRIER
         bool "Enable membarrier() system call" if EXPERT
         default y
         help
           Enable the membarrier() system call that allows issuing memory
           barriers across all running threads, which can be used to distribute
           the cost of user-space memory barriers asymmetrically by transforming
           pairs of memory barriers into pairs consisting of membarrier() and a
           compiler barrier.

           If unsure, say Y.

Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.

I suspect the initial port of riscv merged after v4.14 was already broken.

> I'll look into adding the membarrier feature you mention (as a final/
> follow-up patch), unless you or someone else want to take care of it.

I'll be happy to review it :)

Thanks,

Mathieu
Andrea Parri Nov. 29, 2023, 10:43 p.m. UTC | #8
> > So I should probably stick to 93917ad50972, which apparently selected
> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
> 
> I think it goes further than that, because you can explicitly set
> CONFIG_MEMBARRIER=y, see init/Kconfig:
> 
> config MEMBARRIER
>         bool "Enable membarrier() system call" if EXPERT
>         default y
>         help
>           Enable the membarrier() system call that allows issuing memory
>           barriers across all running threads, which can be used to distribute
>           the cost of user-space memory barriers asymmetrically by transforming
>           pairs of memory barriers into pairs consisting of membarrier() and a
>           compiler barrier.
> 
>           If unsure, say Y.
> 
> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
> 
> I suspect the initial port of riscv merged after v4.14 was already broken.

I see.  Oh well, guess I'll have to leave this up to the maintainers then
(I believe I've never managed to build riscv that far), Palmer?


> > I'll look into adding the membarrier feature you mention (as a final/
> > follow-up patch), unless you or someone else want to take care of it.
> 
> I'll be happy to review it :)

Sweet!  :-)

  Andrea
Palmer Dabbelt Dec. 6, 2023, 1:05 p.m. UTC | #9
On Wed, 29 Nov 2023 14:43:34 PST (-0800), parri.andrea@gmail.com wrote:
>> > So I should probably stick to 93917ad50972, which apparently selected
>> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
>>
>> I think it goes further than that, because you can explicitly set
>> CONFIG_MEMBARRIER=y, see init/Kconfig:
>>
>> config MEMBARRIER
>>         bool "Enable membarrier() system call" if EXPERT
>>         default y
>>         help
>>           Enable the membarrier() system call that allows issuing memory
>>           barriers across all running threads, which can be used to distribute
>>           the cost of user-space memory barriers asymmetrically by transforming
>>           pairs of memory barriers into pairs consisting of membarrier() and a
>>           compiler barrier.
>>
>>           If unsure, say Y.
>>
>> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
>>
>> I suspect the initial port of riscv merged after v4.14 was already broken.
>
> I see.  Oh well, guess I'll have to leave this up to the maintainers then
> (I believe I've never managed to build riscv that far), Palmer?

I see

$ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER

so IMO this is just one of those forever bugs.  So I'd lean towards

    Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")

(or anything in that original patch set).  It's not that big of a 
backport, so I think it's safe enough?

>> > I'll look into adding the membarrier feature you mention (as a final/
>> > follow-up patch), unless you or someone else want to take care of it.
>>
>> I'll be happy to review it :)
>
> Sweet!  :-)
>
>   Andrea
Andrea Parri Dec. 6, 2023, 2:11 p.m. UTC | #10
> I see
> 
> $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
> fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
> 
> so IMO this is just one of those forever bugs.  So I'd lean towards
> 
>    Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")

Works for me, will apply in v2.


> (or anything in that original patch set).  It's not that big of a backport,
> so I think it's safe enough?

Indeed, I think so.

The final version of this fix will likely depend on some machinery/code
introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
in switch_mm()"); but, yes, nothing we can't safely adjust I think.

Thanks,
  Andrea
Palmer Dabbelt Dec. 6, 2023, 2:15 p.m. UTC | #11
On Wed, 06 Dec 2023 06:11:07 PST (-0800), parri.andrea@gmail.com wrote:
>> I see
>>
>> $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
>> fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
>>
>> so IMO this is just one of those forever bugs.  So I'd lean towards
>>
>>    Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
>
> Works for me, will apply in v2.
>
>
>> (or anything in that original patch set).  It's not that big of a backport,
>> so I think it's safe enough?
>
> Indeed, I think so.
>
> The final version of this fix will likely depend on some machinery/code
> introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> in switch_mm()"); but, yes, nothing we can't safely adjust I think.

Ya, I guess we'll have to look to know for sure but hopefully it's 
manageable.

> Thanks,
>   Andrea
Andrea Parri Dec. 6, 2023, 5:42 p.m. UTC | #12
> > The final version of this fix will likely depend on some machinery/code
> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
> 
> Ya, I guess we'll have to look to know for sure but hopefully it's
> manageable.

Absolutely.  One approach would be to follow what PowerPC did: AFAIU, before
3ccfebedd8cf54 membarrier/powerpc used to hard code the required barrier
in finish_task_switch(), "masking" it as an smp_mb__after_unlock_lock(); riscv
could use a similar approach (though with a different/new mask function).
Alternatively, we could maybe keep the barrier in switch_mm().

But let me complete and send out v2 with the fix at stake...  this should give
us a more concrete basis to discuss about these matters.

  Andrea
Palmer Dabbelt Dec. 6, 2023, 5:56 p.m. UTC | #13
On Wed, 06 Dec 2023 09:42:44 PST (-0800), parri.andrea@gmail.com wrote:
>> > The final version of this fix will likely depend on some machinery/code
>> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
>> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
>>
>> Ya, I guess we'll have to look to know for sure but hopefully it's
>> manageable.
>
> Absolutely.  One approach would be to follow what PowerPC did: AFAIU, before
> 3ccfebedd8cf54 membarrier/powerpc used to hard code the required barrier
> in finish_task_switch(), "masking" it as an smp_mb__after_unlock_lock(); riscv
> could use a similar approach (though with a different/new mask function).

IIRC we patterned our MMIOWB/after-spinlock barriers after PPC, so 
that's probably a good place to start here as well.

> Alternatively, we could maybe keep the barrier in switch_mm().
>
> But let me complete and send out v2 with the fix at stake...  this should give
> us a more concrete basis to discuss about these matters.
>
>   Andrea

Patch

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index d96b778b87ed8..f6f0bd871a578 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -43,7 +43,7 @@ 
     |    openrisc: | TODO |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: |  ok  |
     |          sh: | TODO |
     |       sparc: | TODO |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a6..3e2734a3f2957 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,14 +27,17 @@  config RISCV
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MMIOWB
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PMEM_API
+	select ARCH_HAS_PREPARE_SYNC_CORE_CMD
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SET_DIRECT_MAP if MMU
 	select ARCH_HAS_SET_MEMORY if MMU
 	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
new file mode 100644
index 0000000000000..9153016da8f14
--- /dev/null
+++ b/arch/riscv/include/asm/sync_core.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SYNC_CORE_H
+#define _ASM_RISCV_SYNC_CORE_H
+
+/*
+ * RISC-V implements return to user-space through an xRET instruction,
+ * which is not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+	asm volatile ("fence.i" ::: "memory");
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Ensure the next switch_mm() on every CPU issues a core serializing
+ * instruction for the given @mm.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+	cpumask_setall(&mm->context.icache_stale_mask);
+}
+#else
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SMP */
+
+#endif /* _ASM_RISCV_SYNC_CORE_H */