| Message ID | 20231127103235.28442-3-parri.andrea@gmail.com (mailing list archive) |
|---|---|
| State | Superseded, archived |
| Series | membarrier: riscv: Provide core serializing command |
On 2023-11-27 05:32, Andrea Parri wrote:
> RISC-V uses xRET instructions on return from interrupt and to go back
> to user-space; the xRET instruction is not core serializing.
>
> Use FENCE.I for providing core serialization as follows:
>
> - by calling sync_core_before_usermode() on return from interrupt (cf.
>   ipi_sync_core()),
>
> - via switch_mm() and sync_core_before_usermode() (respectively, for
>   uthread->uthread and kthread->uthread transitions) to go back to
>   user-space.
>
> On RISC-V, the serialization in switch_mm() is activated by resetting
> the icache_stale_mask of the mm at prepare_sync_core_cmd().
>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
> ---

[...]

> +
> +#ifdef CONFIG_SMP
> +/*
> + * Ensure the next switch_mm() on every CPU issues a core serializing
> + * instruction for the given @mm.
> + */
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +    cpumask_setall(&mm->context.icache_stale_mask);

I am concerned about the possibility that this change lacks two barriers
in the following scenario:

On a transition from uthread -> uthread on [CPU 0], from a thread
belonging to another mm to a thread belonging to the mm [!mm -> mm] for
which a concurrent membarrier sync-core is done on [CPU 1]:

- [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no
  barriers associated with these stores.

- [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads
  rq->curr [C] within membarrier to decide if the IPI should be skipped.
  Let's say CPU 1 observes cpu_rq(0)->curr->mm != mm, so it skips the
  IPI.

- This means membarrier relies on switch_mm() to issue the sync-core.

- [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is
  zero, switch_mm() may incorrectly skip the sync-core.

AFAIU, [C] can be reordered before [A] because there is no barrier
between those operations within membarrier. I suspect it can cause the
switch_mm() code to skip a needed sync-core.

AFAIU, [D] can be reordered before [B] because there is no documented
barrier between those operations within the scheduler, which can also
cause switch_mm() to skip a needed sync-core.

We possibly have a similar scenario for uthread->uthread when the
scheduler switches between mm -> !mm.

One way to fix this would be to add the following barriers:

- A smp_mb() between [A] and [C], possibly just after cpumask_setall()
  in prepare_sync_core_cmd(), with comments detailing the ordering it
  guarantees,

- A smp_mb() between [B] and [D], possibly just before
  cpumask_test_cpu() in flush_icache_deferred(), with appropriate
  comments.

Am I missing something ?

Thanks,

Mathieu

> +}
> +#else
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _ASM_RISCV_SYNC_CORE_H */
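To make the proposal concrete, the sketch below shows the two smp_mb() placements suggested in the review, as they would land in prepare_sync_core_cmd() and flush_icache_deferred(). It is an illustrative reconstruction of the riscv code under discussion, not the posted patch; as the follow-up notes, the first barrier turns out to be already provided by an existing smp_mb() in membarrier_private_expedited().

#include <linux/cpumask.h>
#include <linux/mm_types.h>
#include <asm/barrier.h>
#include <asm/cacheflush.h>

static inline void prepare_sync_core_cmd(struct mm_struct *mm)
{
    cpumask_setall(&mm->context.icache_stale_mask); /* [A] */
    /*
     * Proposed barrier: order [A] before membarrier's subsequent
     * loads of rq->curr [C], so that a skipped IPI implies the
     * target CPU will observe the stale bit in switch_mm().
     */
    smp_mb();
}

/* Called from switch_mm(); illustrative reconstruction. */
static inline void flush_icache_deferred(struct mm_struct *mm, unsigned int cpu)
{
    /*
     * Proposed barrier: order the scheduler's store to rq->curr [B]
     * before the load of icache_stale_mask [D]; pairs with the
     * smp_mb() in membarrier_private_expedited().
     */
    smp_mb();
    if (cpumask_test_cpu(cpu, &mm->context.icache_stale_mask)) {
        cpumask_clear_cpu(cpu, &mm->context.icache_stale_mask);
        local_flush_icache_all();   /* FENCE.I on the local hart */
    }
}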
> I am concerned about the possibility that this change lacks two barriers in the
> following scenario:
>
> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
> membarrier sync-core is done on [CPU 1]:
>
> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>   associated with these stores.
>
> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>   within membarrier to decide if the IPI should be skipped. Let's say CPU 1
>   observes cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>
> - This means membarrier relies on switch_mm() to issue the sync-core.
>
> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero,
>   switch_mm() may incorrectly skip the sync-core.
>
> AFAIU, [C] can be reordered before [A] because there is no barrier between those
> operations within membarrier. I suspect it can cause the switch_mm() code to skip
> a needed sync-core.
>
> AFAIU, [D] can be reordered before [B] because there is no documented barrier
> between those operations within the scheduler, which can also cause switch_mm()
> to skip a needed sync-core.
>
> We possibly have a similar scenario for uthread->uthread when the scheduler
> switches between mm -> !mm.
>
> One way to fix this would be to add the following barriers:
>
> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
>
> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>   flush_icache_deferred(), with appropriate comments.
>
> Am I missing something ?

Thanks for the detailed analysis.

AFAIU, the following barrier (in membarrier_private_expedited())

    /*
     * Matches memory barriers around rq->curr modification in
     * scheduler.
     */
    smp_mb();   /* system call entry is not a mb. */

can serve the purpose of ordering [A] before [C] (to be documented in v2).

But I agree that [B] and [D] are unordered/missing suitable synchronization.
Worse, RISC-V currently has no full barrier after [B] and before returning to
user-space: I'm thinking (inspired by the PowerPC implementation),

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de61342..f63222513076d 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
     if (unlikely(prev == next))
         return;
 
+#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
+    /*
+     * The membarrier system call requires a full memory barrier
+     * after storing to rq->curr, before going back to user-space.
+     *
+     * Only need the full barrier when switching between processes:
+     * barrier when switching from kernel to userspace is not
+     * required here, given that it is implied by mmdrop(); barrier
+     * when switching from userspace to kernel is not needed after
+     * store to rq->curr.
+     */
+    if (unlikely(atomic_read(&next->membarrier_state) &
+             (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+              MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
+        smp_mb();
+#endif
+
     /*
      * Mark the current MM context as inactive, and the next as
      * active.  This is at least used by the icache flushing

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e8..a1c749fddd095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
      *
      * Here are the schemes providing that barrier on the
      * various architectures:
-     * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-     *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+     * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+     *   RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
+     *   on PowerPC.
      * - finish_lock_switch() for weakly-ordered
      *   architectures where spin_unlock is a full barrier,
      * - switch_to() for arm64 (weakly-ordered, spin_unlock

The silver lining is that similar changes (probably as a separate/preliminary
patch) also restore the desired order between [B] and [D] AFAIU; so with them,
2/2 would just need additions to document the above SYNC_CORE scenario.

Thoughts?

  Andrea
On 2023-11-28 10:13, Andrea Parri wrote:
>> I am concerned about the possibility that this change lacks two barriers in the
>> following scenario:
>>
>> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
>> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
>> membarrier sync-core is done on [CPU 1]:
>>
>> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>>   associated with these stores.
>>
>> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>>   within membarrier to decide if the IPI should be skipped. Let's say CPU 1
>>   observes cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>>
>> - This means membarrier relies on switch_mm() to issue the sync-core.
>>
>> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero,
>>   switch_mm() may incorrectly skip the sync-core.
>>
>> AFAIU, [C] can be reordered before [A] because there is no barrier between those
>> operations within membarrier. I suspect it can cause the switch_mm() code to skip
>> a needed sync-core.
>>
>> AFAIU, [D] can be reordered before [B] because there is no documented barrier
>> between those operations within the scheduler, which can also cause switch_mm()
>> to skip a needed sync-core.
>>
>> We possibly have a similar scenario for uthread->uthread when the scheduler
>> switches between mm -> !mm.
>>
>> One way to fix this would be to add the following barriers:
>>
>> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>>   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
>>
>> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>>   flush_icache_deferred(), with appropriate comments.
>>
>> Am I missing something ?
>
> Thanks for the detailed analysis.
>
> AFAIU, the following barrier (in membarrier_private_expedited())
>
>     /*
>      * Matches memory barriers around rq->curr modification in
>      * scheduler.
>      */
>     smp_mb();   /* system call entry is not a mb. */
>
> can serve the purpose of ordering [A] before [C] (to be documented in v2).

Agreed. Yes it should be documented.

> But I agree that [B] and [D] are unordered/missing suitable synchronization.
> Worse, RISC-V currently has no full barrier after [B] and before returning to
> user-space: I'm thinking (inspired by the PowerPC implementation),

If RISC-V currently supports the membarrier private cmd and lacks the
appropriate smp_mb() in switch_mm(), then it's a bug. This initial patch
should be a "Fix" and fast-tracked as such.

Indeed, looking at how ASID is used to implement switch_mm, it appears to
not require a full smp_mb() at all as long as there are no ASID rollovers.

> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 217fd4de61342..f63222513076d 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>      if (unlikely(prev == next))
>          return;
>
> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> +    /*
> +     * The membarrier system call requires a full memory barrier
> +     * after storing to rq->curr, before going back to user-space.
> +     *
> +     * Only need the full barrier when switching between processes:
> +     * barrier when switching from kernel to userspace is not
> +     * required here, given that it is implied by mmdrop(); barrier
> +     * when switching from userspace to kernel is not needed after
> +     * store to rq->curr.
> +     */
> +    if (unlikely(atomic_read(&next->membarrier_state) &
> +             (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> +              MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> +        smp_mb();
> +#endif

The approach looks good. Please implement it within a separate
membarrier_arch_switch_mm() as done on powerpc.

> +
>      /*
>       * Mark the current MM context as inactive, and the next as
>       * active.  This is at least used by the icache flushing
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a708d225c28e8..a1c749fddd095 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>       *
>       * Here are the schemes providing that barrier on the
>       * various architectures:
> -     * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
> -     *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
> +     * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
> +     *   RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
> +     *   on PowerPC.
>       * - finish_lock_switch() for weakly-ordered
>       *   architectures where spin_unlock is a full barrier,
>       * - switch_to() for arm64 (weakly-ordered, spin_unlock
>
> The silver lining is that similar changes (probably as a separate/preliminary
> patch) also restore the desired order between [B] and [D] AFAIU; so with them,
> 2/2 would just need additions to document the above SYNC_CORE scenario.

Exactly.

> Thoughts?

I think we should be OK with the changes you suggest.

Thanks!

Mathieu
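For reference, a minimal sketch of what the suggested helper could look like, patterned on the powerpc membarrier_arch_switch_mm(); the riscv header name and placement here are assumptions, not the final patch. switch_mm() would then invoke this helper in place of the open-coded #ifdef block above.

/* Hypothetical arch/riscv/include/asm/membarrier.h, modeled on powerpc. */
#ifndef _ASM_RISCV_MEMBARRIER_H
#define _ASM_RISCV_MEMBARRIER_H

#include <linux/atomic.h>
#include <linux/mm_types.h>
#include <linux/sched/mm.h>

static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                             struct mm_struct *next,
                                             struct task_struct *tsk)
{
    /*
     * Only need the full barrier when switching between processes;
     * the barrier when switching from kernel to userspace is implied
     * by mmdrop(), and no barrier is needed after storing to rq->curr
     * when switching from userspace to kernel.
     */
    if (likely(!(atomic_read(&next->membarrier_state) &
                 (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
                  MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
        return;

    /*
     * The membarrier system call requires a full memory barrier
     * after storing to rq->curr, before going back to user-space.
     */
    smp_mb();
}

#endif /* _ASM_RISCV_MEMBARRIER_H */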
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index 217fd4de61342..f63222513076d 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >      if (unlikely(prev == next))
> >          return;
> > +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> > +    /*
> > +     * The membarrier system call requires a full memory barrier
> > +     * after storing to rq->curr, before going back to user-space.
> > +     *
> > +     * Only need the full barrier when switching between processes:
> > +     * barrier when switching from kernel to userspace is not
> > +     * required here, given that it is implied by mmdrop(); barrier
> > +     * when switching from userspace to kernel is not needed after
> > +     * store to rq->curr.
> > +     */
> > +    if (unlikely(atomic_read(&next->membarrier_state) &
> > +             (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> > +              MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> > +        smp_mb();
> > +#endif
>
> The approach looks good. Please implement it within a separate
> membarrier_arch_switch_mm() as done on powerpc.

Will do. Thanks.

As regards the Fixes: tag, I guess it boils down to what we want or we
need to take for commit "riscv: Support membarrier private cmd".  :-)

FWIW, a quick git-log search confirmed that MEMBARRIER has been around
for quite some time in the RISC-V world (though I'm not familiar with
any of its mainstream uses): commit 1464d00b27b2 says (at least) since
93917ad50972 ("RISC-V: Add support for restartable sequence").  I am
currently inclined to pick the latter commit (and check it w/ Palmer),
but other suggestions are welcome.

  Andrea
On 2023-11-29 13:29, Andrea Parri wrote:
>>> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
>>> index 217fd4de61342..f63222513076d 100644
>>> --- a/arch/riscv/mm/context.c
>>> +++ b/arch/riscv/mm/context.c
>>> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>>      if (unlikely(prev == next))
>>>          return;
>>> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
>>> +    /*
>>> +     * The membarrier system call requires a full memory barrier
>>> +     * after storing to rq->curr, before going back to user-space.
>>> +     *
>>> +     * Only need the full barrier when switching between processes:
>>> +     * barrier when switching from kernel to userspace is not
>>> +     * required here, given that it is implied by mmdrop(); barrier
>>> +     * when switching from userspace to kernel is not needed after
>>> +     * store to rq->curr.
>>> +     */
>>> +    if (unlikely(atomic_read(&next->membarrier_state) &
>>> +             (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>>> +              MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
>>> +        smp_mb();
>>> +#endif
>>
>> The approach looks good. Please implement it within a separate
>> membarrier_arch_switch_mm() as done on powerpc.
>
> Will do. Thanks.
>
> As regards the Fixes: tag, I guess it boils down to what we want or we
> need to take for commit "riscv: Support membarrier private cmd". :-)

I'm not seeing this commit in the Linux master branch, am I missing
something ?

> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> for quite some time in the RISC-V world (though I'm not familiar with
> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
> currently inclined to pick the latter commit (and check it w/ Palmer),
> but other suggestions are welcome.

Supporting membarrier private expedited is not optional since Linux 4.14:

see kernel/sched/core.c:

        membarrier_switch_mm(rq, prev->active_mm, next->mm);
        /*
         * sys_membarrier() requires an smp_mb() between setting
         * rq->curr / membarrier_switch_mm() and returning to userspace.
         *
         * The below provides this either through switch_mm(), or in
         * case 'prev->active_mm == next->mm' through
         * finish_task_switch()'s mmdrop().
         */
        switch_mm_irqs_off(prev->active_mm, next->mm, next);

Failure to provide the required barrier is a bug in the architecture's
switch_mm implementation when CONFIG_MEMBARRIER=y.

We should probably introduce a new
Documentation/features/sched/membarrier/arch-support.txt
to clarify this requirement.

Userspace code such as liburcu [1] heavily relies on membarrier private
expedited (when available) to speed up RCU read-side critical sections.
Various DNS servers, including BIND 9, use liburcu.

Thanks,

Mathieu

[1] https://liburcu.org
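As background on the userspace side mentioned above, a minimal example of the asymmetric-barrier pattern liburcu relies on; the wrapper name is hypothetical and error handling is elided, but the commands and syscall are the documented membarrier(2) interface.

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
    /* A process must register before using the private command. */
    if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
        return 1;

    /*
     * Slow path of the asymmetric pairing: force a full memory
     * barrier on every running thread of this process, so the
     * read-side fast paths can use plain compiler barriers.
     */
    return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) ? 1 : 0;
}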
> > As regards the Fixes: tag, I guess it boils down to what we want or we
> > need to take for commit "riscv: Support membarrier private cmd". :-)
>
> I'm not seeing this commit in the Linux master branch, am I missing
> something ?

I don't think you're missing something: I was wondering "what/where is
this commit"? Sorry for the confusion.

> > FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> > for quite some time in the RISC-V world (though I'm not familiar with
> > any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> > 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
> > currently inclined to pick the latter commit (and check it w/ Palmer),
> > but other suggestions are welcome.
>
> Supporting membarrier private expedited is not optional since Linux 4.14:
>
> see kernel/sched/core.c:
>
>         membarrier_switch_mm(rq, prev->active_mm, next->mm);
>         /*
>          * sys_membarrier() requires an smp_mb() between setting
>          * rq->curr / membarrier_switch_mm() and returning to userspace.
>          *
>          * The below provides this either through switch_mm(), or in
>          * case 'prev->active_mm == next->mm' through
>          * finish_task_switch()'s mmdrop().
>          */
>         switch_mm_irqs_off(prev->active_mm, next->mm, next);
>
> Failure to provide the required barrier is a bug in the architecture's
> switch_mm implementation when CONFIG_MEMBARRIER=y.
>
> We should probably introduce a new
> Documentation/features/sched/membarrier/arch-support.txt
> to clarify this requirement.
>
> Userspace code such as liburcu [1] heavily relies on membarrier private
> expedited (when available) to speed up RCU read-side critical sections.
> Various DNS servers, including BIND 9, use liburcu.

Thanks for the information.

So I should probably stick to 93917ad50972, which apparently selected
CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.

I'll look into adding the membarrier feature you mention (as a final/
follow-up patch), unless you or someone else want to take care of it.

  Andrea
On 2023-11-29 16:25, Andrea Parri wrote:
>>> As regards the Fixes: tag, I guess it boils down to what we want or we
>>> need to take for commit "riscv: Support membarrier private cmd". :-)
>>
>> I'm not seeing this commit in the Linux master branch, am I missing
>> something ?
>
> I don't think you're missing something: I was wondering "what/where is
> this commit"? Sorry for the confusion.
>
>>> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
>>> for quite some time in the RISC-V world (though I'm not familiar with
>>> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
>>> 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
>>> currently inclined to pick the latter commit (and check it w/ Palmer),
>>> but other suggestions are welcome.
>>
>> Supporting membarrier private expedited is not optional since Linux 4.14:
>>
>> see kernel/sched/core.c:
>>
>>         membarrier_switch_mm(rq, prev->active_mm, next->mm);
>>         /*
>>          * sys_membarrier() requires an smp_mb() between setting
>>          * rq->curr / membarrier_switch_mm() and returning to userspace.
>>          *
>>          * The below provides this either through switch_mm(), or in
>>          * case 'prev->active_mm == next->mm' through
>>          * finish_task_switch()'s mmdrop().
>>          */
>>         switch_mm_irqs_off(prev->active_mm, next->mm, next);
>>
>> Failure to provide the required barrier is a bug in the architecture's
>> switch_mm implementation when CONFIG_MEMBARRIER=y.
>>
>> We should probably introduce a new
>> Documentation/features/sched/membarrier/arch-support.txt
>> to clarify this requirement.
>>
>> Userspace code such as liburcu [1] heavily relies on membarrier private
>> expedited (when available) to speed up RCU read-side critical sections.
>> Various DNS servers, including BIND 9, use liburcu.
>
> Thanks for the information.
>
> So I should probably stick to 93917ad50972, which apparently selected
> CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.

I think it goes further than that, because you can explicitly set
CONFIG_MEMBARRIER=y, see init/Kconfig:

config MEMBARRIER
    bool "Enable membarrier() system call" if EXPERT
    default y
    help
      Enable the membarrier() system call that allows issuing memory
      barriers across all running threads, which can be used to distribute
      the cost of user-space memory barriers asymmetrically by transforming
      pairs of memory barriers into pairs consisting of membarrier() and a
      compiler barrier.

      If unsure, say Y.

Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.

I suspect the initial port of riscv merged after v4.14 was already broken.

> I'll look into adding the membarrier feature you mention (as a final/
> follow-up patch), unless you or someone else want to take care of it.

I'll be happy to review it :)

Thanks,

Mathieu
> > So I should probably stick to 93917ad50972, which apparently selected
> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
>
> I think it goes further than that, because you can explicitly set
> CONFIG_MEMBARRIER=y, see init/Kconfig:
>
> config MEMBARRIER
>     bool "Enable membarrier() system call" if EXPERT
>     default y
>     help
>       Enable the membarrier() system call that allows issuing memory
>       barriers across all running threads, which can be used to distribute
>       the cost of user-space memory barriers asymmetrically by transforming
>       pairs of memory barriers into pairs consisting of membarrier() and a
>       compiler barrier.
>
>       If unsure, say Y.
>
> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
>
> I suspect the initial port of riscv merged after v4.14 was already broken.

I see.  Oh well, guess I'll have to leave this up to the maintainers then
(I believe I've never managed to build riscv that far), Palmer?

> > I'll look into adding the membarrier feature you mention (as a final/
> > follow-up patch), unless you or someone else want to take care of it.
>
> I'll be happy to review it :)

Sweet!  :-)

  Andrea
On Wed, 29 Nov 2023 14:43:34 PST (-0800), parri.andrea@gmail.com wrote:
>> > So I should probably stick to 93917ad50972, which apparently selected
>> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
>>
>> I think it goes further than that, because you can explicitly set
>> CONFIG_MEMBARRIER=y, see init/Kconfig:
>>
>> config MEMBARRIER
>>     bool "Enable membarrier() system call" if EXPERT
>>     default y
>>     help
>>       Enable the membarrier() system call that allows issuing memory
>>       barriers across all running threads, which can be used to distribute
>>       the cost of user-space memory barriers asymmetrically by transforming
>>       pairs of memory barriers into pairs consisting of membarrier() and a
>>       compiler barrier.
>>
>>       If unsure, say Y.
>>
>> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
>>
>> I suspect the initial port of riscv merged after v4.14 was already broken.
>
> I see. Oh well, guess I'll have to leave this up to the maintainers then
> (I believe I've never managed to build riscv that far), Palmer?

I see

    $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
    fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER

so IMO this is just one of those forever bugs.  So I'd lean towards

    Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")

(or anything in that original patch set).  It's not that big of a backport,
so I think it's safe enough?

>> > I'll look into adding the membarrier feature you mention (as a final/
>> > follow-up patch), unless you or someone else want to take care of it.
>>
>> I'll be happy to review it :)
>
> Sweet! :-)
>
> Andrea
> I see
>
>     $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
>     fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
>
> so IMO this is just one of those forever bugs.  So I'd lean towards
>
>     Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")

Works for me, will apply in v2.

> (or anything in that original patch set).  It's not that big of a backport,
> so I think it's safe enough?

Indeed, I think so.

The final version of this fix will likely depend on some machinery/code
introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
in switch_mm()"); but, yes, nothing we can't safely adjust I think.

Thanks,

  Andrea
On Wed, 06 Dec 2023 06:11:07 PST (-0800), parri.andrea@gmail.com wrote:
>> I see
>>
>>     $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
>>     fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
>>
>> so IMO this is just one of those forever bugs.  So I'd lean towards
>>
>>     Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
>
> Works for me, will apply in v2.
>
>> (or anything in that original patch set).  It's not that big of a backport,
>> so I think it's safe enough?
>
> Indeed, I think so.
>
> The final version of this fix will likely depend on some machinery/code
> introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> in switch_mm()"); but, yes, nothing we can't safely adjust I think.

Ya, I guess we'll have to look to know for sure but hopefully it's
manageable.

> Thanks,
>   Andrea
> > The final version of this fix will likely depend on some machinery/code
> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
>
> Ya, I guess we'll have to look to know for sure but hopefully it's
> manageable.

Absolutely.  One approach would be to follow what PowerPC did: AFAIU,
before 3ccfebedd8cf54, membarrier/powerpc used to hard code the required
barrier in finish_task_switch(), "masking" it as an
smp_mb__after_unlock_lock(); riscv could use a similar approach (though
with a different/new mask function).

Alternatively, we could maybe keep the barrier in switch_mm().

But let me complete and send out v2 with the fix at stake... this should
give us a more concrete basis to discuss these matters.

  Andrea
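A minimal sketch of the "mask function" idea mentioned above, using a hypothetical name by analogy with smp_mb__after_unlock_lock(); the thread leaves the final shape open, so this is illustrative only, not the posted series.

/*
 * Hypothetical generic fallback (e.g. in include/asm-generic/barrier.h):
 * a no-op on architectures whose context-switch path already implies
 * the full barrier that membarrier requires after the rq->curr store.
 */
#ifndef smp_mb__after_switch_mm
#define smp_mb__after_switch_mm()   do { } while (0)
#endif

/*
 * A weakly-ordered architecture such as riscv would then override it
 * in its asm/barrier.h:
 *
 *      #define smp_mb__after_switch_mm()   smp_mb()
 *
 * and the scheduler would invoke smp_mb__after_switch_mm() after
 * switch_mm() on the context-switch path.
 */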
On Wed, 06 Dec 2023 09:42:44 PST (-0800), parri.andrea@gmail.com wrote:
>> > The final version of this fix will likely depend on some machinery/code
>> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
>> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
>>
>> Ya, I guess we'll have to look to know for sure but hopefully it's
>> manageable.
>
> Absolutely.  One approach would be to follow what PowerPC did: AFAIU,
> before 3ccfebedd8cf54, membarrier/powerpc used to hard code the required
> barrier in finish_task_switch(), "masking" it as an
> smp_mb__after_unlock_lock(); riscv could use a similar approach (though
> with a different/new mask function).

IIRC we patterned our MMIOWB/after-spinlock barriers after PPC, so that's
probably a good place to start here as well.

> Alternatively, we could maybe keep the barrier in switch_mm().
>
> But let me complete and send out v2 with the fix at stake... this should
> give us a more concrete basis to discuss these matters.
>
> Andrea
diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index d96b778b87ed8..f6f0bd871a578 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -43,7 +43,7 @@
     |    openrisc: | TODO |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: |  ok  |
     |          sh: | TODO |
     |       sparc: | TODO |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a6..3e2734a3f2957 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,14 +27,17 @@ config RISCV
     select ARCH_HAS_GCOV_PROFILE_ALL
     select ARCH_HAS_GIGANTIC_PAGE
     select ARCH_HAS_KCOV
+    select ARCH_HAS_MEMBARRIER_SYNC_CORE
     select ARCH_HAS_MMIOWB
     select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
     select ARCH_HAS_PMEM_API
+    select ARCH_HAS_PREPARE_SYNC_CORE_CMD
     select ARCH_HAS_PTE_SPECIAL
     select ARCH_HAS_SET_DIRECT_MAP if MMU
     select ARCH_HAS_SET_MEMORY if MMU
     select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
     select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+    select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
     select ARCH_HAS_SYSCALL_WRAPPER
     select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
     select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
new file mode 100644
index 0000000000000..9153016da8f14
--- /dev/null
+++ b/arch/riscv/include/asm/sync_core.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SYNC_CORE_H
+#define _ASM_RISCV_SYNC_CORE_H
+
+/*
+ * RISC-V implements return to user-space through an xRET instruction,
+ * which is not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+    asm volatile ("fence.i" ::: "memory");
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Ensure the next switch_mm() on every CPU issues a core serializing
+ * instruction for the given @mm.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+    cpumask_setall(&mm->context.icache_stale_mask);
+}
+#else
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SMP */
+
+#endif /* _ASM_RISCV_SYNC_CORE_H */
RISC-V uses xRET instructions on return from interrupt and to go back to
user-space; the xRET instruction is not core serializing.

Use FENCE.I for providing core serialization as follows:

- by calling sync_core_before_usermode() on return from interrupt (cf.
  ipi_sync_core()),

- via switch_mm() and sync_core_before_usermode() (respectively, for
  uthread->uthread and kthread->uthread transitions) to go back to
  user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
---
 .../membarrier-sync-core/arch-support.txt |  2 +-
 arch/riscv/Kconfig                        |  3 ++
 arch/riscv/include/asm/sync_core.h        | 29 +++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/sync_core.h
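For context, a simplified sketch of how the generic membarrier code ends up invoking the two hooks added by this patch; this is loosely based on kernel/sched/membarrier.c and deliberately simplified, not verbatim kernel code.

#include <linux/cpumask.h>
#include <linux/sched/mm.h>
#include <asm/sync_core.h>

/* Runs on each targeted CPU via IPI (cf. ipi_sync_core()). */
static void ipi_sync_core(void *info)
{
    smp_mb();                       /* IPIs should be serializing, but be paranoid */
    sync_core_before_usermode();    /* FENCE.I on riscv */
}

/* Caller side of MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (simplified). */
static int membarrier_sync_core(struct mm_struct *mm)
{
    /* Mark the icache stale on every CPU for @mm (hook added above)... */
    prepare_sync_core_cmd(mm);

    /* ...and order the setall() before the rq->curr loads ([A]/[C]). */
    smp_mb();

    /*
     * Scan cpu_rq(cpu)->curr and IPI the CPUs running @mm; CPUs whose
     * IPI is skipped rely on switch_mm() observing the stale bit.
     */
    return 0;
}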