Message ID | 20230614163534.18668-1-palmer@rivosinc.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | RISC-V: Clobber V registers on syscalls | expand |
Context | Check | Description |
---|---|---|
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be for-next at HEAD d5e45e810e0e |
conchuod/fixes_present | success | Fixes tag not required for -next series |
conchuod/maintainers_pattern | success | MAINTAINERS pattern errors before the patch: 6 and now 6 |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/build_rv64_clang_allmodconfig | success | Errors and warnings before: 8 this patch: 8 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv64_gcc_allmodconfig | success | Errors and warnings before: 8 this patch: 8 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 3 this patch: 3 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 16 lines checked |
conchuod/build_rv64_nommu_k210_defconfig | success | Build OK |
conchuod/verify_fixes | success | No Fixes tag |
conchuod/build_rv64_nommu_virt_defconfig | success | Build OK |
Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit : > The V registers are clobbered by standard ABI functions, so userspace > probably doesn't have anything useful in them by the time we get to the > kernel. Indeed, for your typical system call, wrapped by two or more layers of function calls inside libc, userspace will treat the registers as clobbered anyhow. But AFAIU, other architectures don't gratuitiously clobber SIMD or vector registers, even those that are callee-clobbered by their respective function calling convention, or do they? FWIW, Arm is going the opposite direction with their higher privilege calls (newer versions of SMCCC define how to preserve SVE vectors). The kernel cannot simply clobber registers, as that would likely cause data leakage from kernel to user mode. So it is unclear what the benefits would be here. And I fear that there will be less conventional use cases whence it makes sense to preserve registers on system calls. For example an inline or compiler intrinsic implementation of C++20/C2X atomic-wait/atomic-notify, which would presumably invoke the futex() syscall on Linux, maybe??
On Thu, 15 Jun 2023 10:36:31 PDT (-0700), remi@remlab.net wrote: > Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit : >> The V registers are clobbered by standard ABI functions, so userspace >> probably doesn't have anything useful in them by the time we get to the >> kernel. > > Indeed, for your typical system call, wrapped by two or more layers of > function calls inside libc, userspace will treat the registers as clobbered > anyhow. > > But AFAIU, other architectures don't gratuitiously clobber SIMD or vector > registers, even those that are callee-clobbered by their respective function > calling convention, or do they? IIUC arm64 has some similar code, at least that's what the comment says (and I got the clobbering V state from Arm) /* * As per the ABI exit SME streaming mode and clear the SVE state not * shared with FPSIMD on syscall entry. */ static inline void fp_user_discard(void) if we don't clobber on syscalls then we'll likely need some way for userspace to inform the kernel that V state can be discarded. > FWIW, Arm is going the opposite direction with > their higher privilege calls (newer versions of SMCCC define how to preserve > SVE vectors). That has a slightly different cost structure, though: in the kernel V would usually be off, so there's already a strong indication when the save/restore is useful. > The kernel cannot simply clobber registers, as that would likely cause data > leakage from kernel to user mode. So it is unclear what the benefits would be What's the data leakage? Unless I'm missing something setting the sstatus.vs=off will result in userspace trapping in any V state access, so if we're leaking something we're probably also at risk of leaking it for new/cloned processes. That said, we do need to think about speculative side-channels: with the V crypto stuff there will be keys in V registers and other architectures have had exploitable issues related to lazy save/restore and speculation. Maybe it's best to just wait on that, though? We'd ideally want some canonical sequence in the ISA but the fastest way to do that is probably to just wait for an exploit to show up. > here. And I fear that there will be less conventional use cases whence it > makes sense to preserve registers on system calls. > > For example an inline or compiler intrinsic implementation of C++20/C2X > atomic-wait/atomic-notify, which would presumably invoke the futex() syscall > on Linux, maybe?? It'd have to be a pretty special case: at least in libstdc++ and glibc the futex calls are behind function calls, so the V registers are already clobbered by the time the kernel has been entered (at least for anything following the standard ABIs). > > -- > 雷米‧德尼-库尔蒙 > http://www.remlab.net/
Rémi Denis-Courmont <remi@remlab.net> writes: > Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit : >> The V registers are clobbered by standard ABI functions, so userspace >> probably doesn't have anything useful in them by the time we get to the >> kernel. > > Indeed, for your typical system call, wrapped by two or more layers of > function calls inside libc, userspace will treat the registers as clobbered > anyhow. > > But AFAIU, other architectures don't gratuitiously clobber SIMD or vector > registers, even those that are callee-clobbered by their respective function > calling convention, or do they? FWIW, Arm is going the opposite direction with > their higher privilege calls (newer versions of SMCCC define how to preserve > SVE vectors). Actually, it's from the V spec: riscv-v-spec-1.0-4.pdf: Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified. AFAIU Arm's SVE/SME has that as well. Björn
Le torstaina 15. kesäkuuta 2023, 23.33.44 EEST Palmer Dabbelt a écrit : > > The kernel cannot simply clobber registers, as that would likely cause > > data leakage from kernel to user mode. So it is unclear what the benefits > > would be > What's the data leakage? Typically "clobbering" the register means that you are writing something else in them. If you don't restore them (or expressly reset them to zero or some other fixed value), then you leak daata. Of course, if you don't actually use the register, then you don't leak anything in them. But then it's unclear what the benefit of marking them as clobbered is. (...) > It'd have to be a pretty special case: at least in libstdc++ and glibc > the futex calls are behind function calls, Traditionally, atomic variable methods are intrinsics, which result in either inline or outline C runtime calls (with some ad-hoc ABI that clobbers very little). They cannot be C functions, since they accept parameters of several different types. atomic_notify_one, atomic_notify_all, and atomic_wait or however their standardised names end up, will presumably be outlines of the later type, that just happen to wrap futex() on Linux. But anyway, if the spec says that registers are clobbered by system calls as Björn pointed out, then that's that.
Palmer Dabbelt <palmer@rivosinc.com> writes: > The V registers are clobbered by standard ABI functions, so userspace > probably doesn't have anything useful in them by the time we get to the > kernel. So let's just document that they're clobbered by syscalls and > proactively clobber them. > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > --- > IIRC we'd talked about doing this, but I didn't see anything in the > docs. I figure it's better to just proactively clobber the registers on > syscalls, as that way userspace can't end up accidentally depending on > them. > --- > Documentation/riscv/vector.rst | 5 +++++ > arch/riscv/kernel/traps.c | 2 ++ > 2 files changed, 7 insertions(+) > > diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst > index 48f189d79e41..a4dfa954215b 100644 > --- a/Documentation/riscv/vector.rst > +++ b/Documentation/riscv/vector.rst > @@ -130,3 +130,8 @@ processes in form of sysctl knob: > > Modifying the system default enablement status does not affect the enablement > status of any existing process of thread that do not make an execve() call. > + > +3. Vector Register State Across System Calls > +--------------------------------------------- > + > +Vector registers are clobbered by system calls. > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index 05ffdcd1424e..bb99a6379b37 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > regs->epc += 4; > regs->orig_a0 = regs->a0; > > + riscv_v_vstate_off(regs); > + Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to call? Something like: static void vstate_discard(struct pt_regs *regs) { if ((regs->status & SR_VS) == SR_VS_DIRTY) __riscv_v_vstate_clean(regs); } Complemented by a !V config variant. Björn
On Fri, 16 Jun 2023 13:12:14 PDT (-0700), bjorn@kernel.org wrote: > Palmer Dabbelt <palmer@rivosinc.com> writes: > >> The V registers are clobbered by standard ABI functions, so userspace >> probably doesn't have anything useful in them by the time we get to the >> kernel. So let's just document that they're clobbered by syscalls and >> proactively clobber them. >> >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> >> --- >> IIRC we'd talked about doing this, but I didn't see anything in the >> docs. I figure it's better to just proactively clobber the registers on >> syscalls, as that way userspace can't end up accidentally depending on >> them. >> --- >> Documentation/riscv/vector.rst | 5 +++++ >> arch/riscv/kernel/traps.c | 2 ++ >> 2 files changed, 7 insertions(+) >> >> diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst >> index 48f189d79e41..a4dfa954215b 100644 >> --- a/Documentation/riscv/vector.rst >> +++ b/Documentation/riscv/vector.rst >> @@ -130,3 +130,8 @@ processes in form of sysctl knob: >> >> Modifying the system default enablement status does not affect the enablement >> status of any existing process of thread that do not make an execve() call. >> + >> +3. Vector Register State Across System Calls >> +--------------------------------------------- >> + >> +Vector registers are clobbered by system calls. >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c >> index 05ffdcd1424e..bb99a6379b37 100644 >> --- a/arch/riscv/kernel/traps.c >> +++ b/arch/riscv/kernel/traps.c >> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> regs->epc += 4; >> regs->orig_a0 = regs->a0; >> >> + riscv_v_vstate_off(regs); >> + > > Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to > call? Something like: > > static void vstate_discard(struct pt_regs *regs) > { > if ((regs->status & SR_VS) == SR_VS_DIRTY) > __riscv_v_vstate_clean(regs); > } > > Complemented by a !V config variant. I think it's just a question of what we're trying to do here: clean avoids the kernel V state save, but unless the kernel decides to use V during the syscall the register contents will still be usable by userspace. Maybe that's fine and we can just rely on the ISA spec, though? I sent another patch to just document it in Linux, even if it's in the ISA spec it seems worth having in the kernel as well. That said, I think the right thing to do here might be to zero the V register state and set it to initial: that way we can prevent userspace from accidentally relying on the state save, but we can also avoid the trap that would come from turning it off. That lets us give the hardware a nice clean indication when the V state isn't in use, which will hopefully help us avoid the save/restore performance issues that other ports have hit. I think the issue with zeroing the registers in that it may be slow on some implementations, as it requires a bunch of V register writes and those could be multi-cycle. I'd lean towards doing the zeroing now, as it'll make sure userspace respects the uABI and we don't have any HW to measure the performance on. Maybe the zeroing will be enough to get HW to make that fast, if not we can always roll it back when HW starts showing up. There's also some questions as to whether or not HW is going to bother respecting the intermediate states, as IIRC it's pretty common for HW to ignore them for the F/D extensions (at least the old SiFive cores do). I think there's just not a whole lot we can do there, HW that inaccurately tracks the metadata will just end up with more save/restore time. > Björn > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
Palmer Dabbelt <palmer@rivosinc.com> writes: [...] >>> + riscv_v_vstate_off(regs); >>> + >> >> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >> call? Something like: >> >> static void vstate_discard(struct pt_regs *regs) >> { >> if ((regs->status & SR_VS) == SR_VS_DIRTY) >> __riscv_v_vstate_clean(regs); >> } >> >> Complemented by a !V config variant. > > I think it's just a question of what we're trying to do here: clean > avoids the kernel V state save, but unless the kernel decides to use V > during the syscall the register contents will still be usable by > userspace. Maybe that's fine and we can just rely on the ISA spec, > though? I sent another patch to just document it in Linux, even if it's > in the ISA spec it seems worth having in the kernel as well. > > That said, I think the right thing to do here might be to zero the V > register state and set it to initial: that way we can prevent userspace > from accidentally relying on the state save, but we can also avoid the > trap that would come from turning it off. That lets us give the > hardware a nice clean indication when the V state isn't in use, which > will hopefully help us avoid the save/restore performance issues that > other ports have hit. FWIW, I think that's a much better idea than turning V off. I also like that it'll preventing userland to rely on pre-ecall state. Björn
On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: > Palmer Dabbelt <palmer@rivosinc.com> writes: > > [...] > >>>> + riscv_v_vstate_off(regs); >>>> + >>> >>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >>> call? Something like: >>> >>> static void vstate_discard(struct pt_regs *regs) >>> { >>> if ((regs->status & SR_VS) == SR_VS_DIRTY) >>> __riscv_v_vstate_clean(regs); >>> } >>> >>> Complemented by a !V config variant. >> >> I think it's just a question of what we're trying to do here: clean >> avoids the kernel V state save, but unless the kernel decides to use V >> during the syscall the register contents will still be usable by >> userspace. Maybe that's fine and we can just rely on the ISA spec, >> though? I sent another patch to just document it in Linux, even if it's >> in the ISA spec it seems worth having in the kernel as well. >> >> That said, I think the right thing to do here might be to zero the V >> register state and set it to initial: that way we can prevent userspace >> from accidentally relying on the state save, but we can also avoid the >> trap that would come from turning it off. That lets us give the >> hardware a nice clean indication when the V state isn't in use, which >> will hopefully help us avoid the save/restore performance issues that >> other ports have hit. > > FWIW, I think that's a much better idea than turning V off. I also like > that it'll preventing userland to rely on pre-ecall state. OK, anyone else opposed? We're kind of in the weeds on performance, I think we'd need HW to know for sure if either is an issue. Seems best to just play it safe WRT the uABI for now, we can always deal with any performance issues if the exist. > Björn
Palmer Dabbelt <palmer@rivosinc.com> writes: > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: >> Palmer Dabbelt <palmer@rivosinc.com> writes: >> >> [...] >> >>>>> + riscv_v_vstate_off(regs); >>>>> + >>>> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >>>> call? Something like: >>>> >>>> static void vstate_discard(struct pt_regs *regs) >>>> { >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) >>>> __riscv_v_vstate_clean(regs); >>>> } >>>> >>>> Complemented by a !V config variant. >>> >>> I think it's just a question of what we're trying to do here: clean >>> avoids the kernel V state save, but unless the kernel decides to use V >>> during the syscall the register contents will still be usable by >>> userspace. Maybe that's fine and we can just rely on the ISA spec, >>> though? I sent another patch to just document it in Linux, even if it's >>> in the ISA spec it seems worth having in the kernel as well. >>> >>> That said, I think the right thing to do here might be to zero the V >>> register state and set it to initial: that way we can prevent userspace >>> from accidentally relying on the state save, but we can also avoid the >>> trap that would come from turning it off. That lets us give the >>> hardware a nice clean indication when the V state isn't in use, which >>> will hopefully help us avoid the save/restore performance issues that >>> other ports have hit. >> >> FWIW, I think that's a much better idea than turning V off. I also like >> that it'll preventing userland to rely on pre-ecall state. > > OK, anyone else opposed? > > We're kind of in the weeds on performance, I think we'd need HW to know > for sure if either is an issue. Seems best to just play it safe WRT the > uABI for now, we can always deal with any performance issues if the > exist. Here's the patch you mentioned at the PW synchup; I've kept the Subject and such if you wan't to apply it. LMK if you'd like a proper one. -- Subject: [PATCH] riscv: Discard vector state on syscalls MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The RISC-V vector specification states: Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified. The vector status is set to Initial, and the vector state is explicitly zeroed. That way we can prevent userspace from accidentally relying on the stated save. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> --- arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ arch/riscv/kernel/traps.c | 2 ++ 2 files changed, 26 insertions(+) diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk); bool riscv_v_vstate_ctrl_user_allowed(void); +static inline void riscv_v_vstate_discard(struct pt_regs *regs) +{ + unsigned long vl; + + if (!riscv_v_vstate_query(regs)) + return; + + riscv_v_vstate_on(regs); + + riscv_v_enable(); + asm volatile ( + ".option push\n\t" + ".option arch, +v\n\t" + "vsetvli %0, x0, e8, m8, ta, ma\n\t" + "vmv.v.i v0, 0\n\t" + "vmv.v.i v8, 0\n\t" + "vmv.v.i v16, 0\n\t" + "vmv.v.i v24, 0\n\t" + ".option pop\n\t" + : "=&r" (vl) : : "memory"); + riscv_v_disable(); +} + #else /* ! CONFIG_RISCV_ISA_V */ struct pt_regs; @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } #define __switch_to_vector(__prev, __next) do {} while (0) #define riscv_v_vstate_off(regs) do {} while (0) #define riscv_v_vstate_on(regs) do {} while (0) +#define riscv_v_vstate_discard(regs) do {} while (0) #endif /* CONFIG_RISCV_ISA_V */ diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index 05ffdcd1424e..00c68b57ff88 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) regs->epc += 4; regs->orig_a0 = regs->a0; + riscv_v_vstate_discard(regs); + syscall = syscall_enter_from_user_mode(regs, syscall); if (syscall < NR_syscalls) base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79
On Wed, Jun 21, 2023 at 04:26:14PM +0200, Björn Töpel wrote: > Palmer Dabbelt <palmer@rivosinc.com> writes: > > > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: > >> Palmer Dabbelt <palmer@rivosinc.com> writes: > >> > >> [...] > >> > >>>>> + riscv_v_vstate_off(regs); > >>>>> + > >>>> > >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to > >>>> call? Something like: > >>>> > >>>> static void vstate_discard(struct pt_regs *regs) > >>>> { > >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) > >>>> __riscv_v_vstate_clean(regs); > >>>> } > >>>> > >>>> Complemented by a !V config variant. > >>> > >>> I think it's just a question of what we're trying to do here: clean > >>> avoids the kernel V state save, but unless the kernel decides to use V > >>> during the syscall the register contents will still be usable by > >>> userspace. Maybe that's fine and we can just rely on the ISA spec, > >>> though? I sent another patch to just document it in Linux, even if it's > >>> in the ISA spec it seems worth having in the kernel as well. > >>> > >>> That said, I think the right thing to do here might be to zero the V > >>> register state and set it to initial: that way we can prevent userspace > >>> from accidentally relying on the state save, but we can also avoid the > >>> trap that would come from turning it off. That lets us give the > >>> hardware a nice clean indication when the V state isn't in use, which > >>> will hopefully help us avoid the save/restore performance issues that > >>> other ports have hit. > >> > >> FWIW, I think that's a much better idea than turning V off. I also like > >> that it'll preventing userland to rely on pre-ecall state. > > > > OK, anyone else opposed? > > > > We're kind of in the weeds on performance, I think we'd need HW to know > > for sure if either is an issue. Seems best to just play it safe WRT the > > uABI for now, we can always deal with any performance issues if the > > exist. > > Here's the patch you mentioned at the PW synchup; I've kept the Subject > and such if you wan't to apply it. LMK if you'd like a proper one. > > -- > > Subject: [PATCH] riscv: Discard vector state on syscalls > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > The RISC-V vector specification states: > Executing a system call causes all caller-saved vector registers > (v0-v31, vl, vtype) and vstart to become unspecified. > > The vector status is set to Initial, and the vector state is > explicitly zeroed. That way we can prevent userspace from accidentally > relying on the stated save. Is it worth clobbering with all 1s, rather than zero, for consistency with other vector behavior (i.e., tail/mask agnostic) and for the reasons given in the vector spec for not doing so with zero? > > Signed-off-by: Björn Töpel <bjorn@rivosinc.com> > --- > arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ > arch/riscv/kernel/traps.c | 2 ++ > 2 files changed, 26 insertions(+) > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h > index 04c0b07bf6cd..b3020d064f42 100644 > --- a/arch/riscv/include/asm/vector.h > +++ b/arch/riscv/include/asm/vector.h > @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev, > void riscv_v_vstate_ctrl_init(struct task_struct *tsk); > bool riscv_v_vstate_ctrl_user_allowed(void); > > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > +{ > + unsigned long vl; > + > + if (!riscv_v_vstate_query(regs)) > + return; > + > + riscv_v_vstate_on(regs); > + > + riscv_v_enable(); > + asm volatile ( > + ".option push\n\t" > + ".option arch, +v\n\t" > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > + "vmv.v.i v0, 0\n\t" > + "vmv.v.i v8, 0\n\t" > + "vmv.v.i v16, 0\n\t" > + "vmv.v.i v24, 0\n\t" > + ".option pop\n\t" > + : "=&r" (vl) : : "memory"); > + riscv_v_disable(); > +} > + > #else /* ! CONFIG_RISCV_ISA_V */ > > struct pt_regs; > @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } > #define __switch_to_vector(__prev, __next) do {} while (0) > #define riscv_v_vstate_off(regs) do {} while (0) > #define riscv_v_vstate_on(regs) do {} while (0) > +#define riscv_v_vstate_discard(regs) do {} while (0) > > #endif /* CONFIG_RISCV_ISA_V */ > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index 05ffdcd1424e..00c68b57ff88 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > regs->epc += 4; > regs->orig_a0 = regs->a0; > > + riscv_v_vstate_discard(regs); > + > syscall = syscall_enter_from_user_mode(regs, syscall); > > if (syscall < NR_syscalls) > > base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79 > -- > 2.39.2 > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Wed, Jun 21, 2023 at 10:26 PM Björn Töpel <bjorn@kernel.org> wrote: > > Palmer Dabbelt <palmer@rivosinc.com> writes: > > > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: > >> Palmer Dabbelt <palmer@rivosinc.com> writes: > >> > >> [...] > >> > >>>>> + riscv_v_vstate_off(regs); > >>>>> + > >>>> > >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to > >>>> call? Something like: > >>>> > >>>> static void vstate_discard(struct pt_regs *regs) > >>>> { > >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) > >>>> __riscv_v_vstate_clean(regs); > >>>> } > >>>> > >>>> Complemented by a !V config variant. > >>> > >>> I think it's just a question of what we're trying to do here: clean > >>> avoids the kernel V state save, but unless the kernel decides to use V > >>> during the syscall the register contents will still be usable by > >>> userspace. Maybe that's fine and we can just rely on the ISA spec, > >>> though? I sent another patch to just document it in Linux, even if it's > >>> in the ISA spec it seems worth having in the kernel as well. > >>> > >>> That said, I think the right thing to do here might be to zero the V > >>> register state and set it to initial: that way we can prevent userspace > >>> from accidentally relying on the state save, but we can also avoid the > >>> trap that would come from turning it off. That lets us give the > >>> hardware a nice clean indication when the V state isn't in use, which > >>> will hopefully help us avoid the save/restore performance issues that > >>> other ports have hit. > >> > >> FWIW, I think that's a much better idea than turning V off. I also like > >> that it'll preventing userland to rely on pre-ecall state. > > > > OK, anyone else opposed? > > > > We're kind of in the weeds on performance, I think we'd need HW to know > > for sure if either is an issue. Seems best to just play it safe WRT the > > uABI for now, we can always deal with any performance issues if the > > exist. > > Here's the patch you mentioned at the PW synchup; I've kept the Subject > and such if you wan't to apply it. LMK if you'd like a proper one. > > -- > > Subject: [PATCH] riscv: Discard vector state on syscalls > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > The RISC-V vector specification states: > Executing a system call causes all caller-saved vector registers > (v0-v31, vl, vtype) and vstart to become unspecified. > > The vector status is set to Initial, and the vector state is > explicitly zeroed. That way we can prevent userspace from accidentally > relying on the stated save. > > Signed-off-by: Björn Töpel <bjorn@rivosinc.com> > --- > arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ > arch/riscv/kernel/traps.c | 2 ++ > 2 files changed, 26 insertions(+) > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h > index 04c0b07bf6cd..b3020d064f42 100644 > --- a/arch/riscv/include/asm/vector.h > +++ b/arch/riscv/include/asm/vector.h > @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev, > void riscv_v_vstate_ctrl_init(struct task_struct *tsk); > bool riscv_v_vstate_ctrl_user_allowed(void); > > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > +{ > + unsigned long vl; > + > + if (!riscv_v_vstate_query(regs)) > + return; > + > + riscv_v_vstate_on(regs); Do we need this riscv_v_vstate_on()? If it is not on we'd return early in the previous "if" statement, right? > + > + riscv_v_enable(); > + asm volatile ( > + ".option push\n\t" > + ".option arch, +v\n\t" > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > + "vmv.v.i v0, 0\n\t" > + "vmv.v.i v8, 0\n\t" > + "vmv.v.i v16, 0\n\t" > + "vmv.v.i v24, 0\n\t" > + ".option pop\n\t" > + : "=&r" (vl) : : "memory"); > + riscv_v_disable(); Maybe consider cleaning the vstate (status.vs) here. As such we don't have to save V during context switch. Or, maybe we could set vstate as off during syscall and discard V-reg + restore status.VS when returning back to userspace? > +} > + > #else /* ! CONFIG_RISCV_ISA_V */ > > struct pt_regs; > @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } > #define __switch_to_vector(__prev, __next) do {} while (0) > #define riscv_v_vstate_off(regs) do {} while (0) > #define riscv_v_vstate_on(regs) do {} while (0) > +#define riscv_v_vstate_discard(regs) do {} while (0) > > #endif /* CONFIG_RISCV_ISA_V */ > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index 05ffdcd1424e..00c68b57ff88 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > regs->epc += 4; > regs->orig_a0 = regs->a0; > > + riscv_v_vstate_discard(regs); > + > syscall = syscall_enter_from_user_mode(regs, syscall); > > if (syscall < NR_syscalls) > > base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79 > -- > 2.39.2 Agree. It is better to clean V registers instead of turning off Vector. Regards, Andy
Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit : > Palmer Dabbelt <palmer@rivosinc.com> writes: > > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: > >> Palmer Dabbelt <palmer@rivosinc.com> writes: > >> > >> [...] > >> > >>>>> + riscv_v_vstate_off(regs); > >>>>> + > >>>> > >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to > >>>> call? Something like: > >>>> > >>>> static void vstate_discard(struct pt_regs *regs) > >>>> { > >>>> > >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) > >>>> > >>>> __riscv_v_vstate_clean(regs); > >>>> > >>>> } > >>>> > >>>> Complemented by a !V config variant. > >>> > >>> I think it's just a question of what we're trying to do here: clean > >>> avoids the kernel V state save, but unless the kernel decides to use V > >>> during the syscall the register contents will still be usable by > >>> userspace. Maybe that's fine and we can just rely on the ISA spec, > >>> though? I sent another patch to just document it in Linux, even if it's > >>> in the ISA spec it seems worth having in the kernel as well. > >>> > >>> That said, I think the right thing to do here might be to zero the V > >>> register state and set it to initial: that way we can prevent userspace > >>> from accidentally relying on the state save, but we can also avoid the > >>> trap that would come from turning it off. That lets us give the > >>> hardware a nice clean indication when the V state isn't in use, which > >>> will hopefully help us avoid the save/restore performance issues that > >>> other ports have hit. > >> > >> FWIW, I think that's a much better idea than turning V off. I also like > >> that it'll preventing userland to rely on pre-ecall state. > > > > OK, anyone else opposed? > > > > We're kind of in the weeds on performance, I think we'd need HW to know > > for sure if either is an issue. Seems best to just play it safe WRT the > > uABI for now, we can always deal with any performance issues if the > > exist. > > Here's the patch you mentioned at the PW synchup; I've kept the Subject > and such if you wan't to apply it. LMK if you'd like a proper one. > > -- > > Subject: [PATCH] riscv: Discard vector state on syscalls > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > The RISC-V vector specification states: > Executing a system call causes all caller-saved vector registers > (v0-v31, vl, vtype) and vstart to become unspecified. > > The vector status is set to Initial, and the vector state is > explicitly zeroed. That way we can prevent userspace from accidentally > relying on the stated save. > > Signed-off-by: Björn Töpel <bjorn@rivosinc.com> > --- > arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ > arch/riscv/kernel/traps.c | 2 ++ > 2 files changed, 26 insertions(+) > > diff --git a/arch/riscv/include/asm/vector.h > b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644 > --- a/arch/riscv/include/asm/vector.h > +++ b/arch/riscv/include/asm/vector.h > @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct > task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk); > bool riscv_v_vstate_ctrl_user_allowed(void); > > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > +{ > + unsigned long vl; > + > + if (!riscv_v_vstate_query(regs)) > + return; > + > + riscv_v_vstate_on(regs); > + > + riscv_v_enable(); > + asm volatile ( > + ".option push\n\t" > + ".option arch, +v\n\t" > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > + "vmv.v.i v0, 0\n\t" > + "vmv.v.i v8, 0\n\t" > + "vmv.v.i v16, 0\n\t" > + "vmv.v.i v24, 0\n\t" > + ".option pop\n\t" > + : "=&r" (vl) : : "memory"); > + riscv_v_disable(); Shouldn't this also set `vill` to 1 using `vsetvl`? In fact, a faster alternative may yet be to *only* set an invalid vector configuration. It's rather unlikely that user-space code would set a valid configuration and use vectors without loading them first. If it ever does, then it's so broken that the kernel probably doesn't need to care.
On Wed, 21 Jun 2023 09:47:37 PDT (-0700), remi@remlab.net wrote: > Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit : >> Palmer Dabbelt <palmer@rivosinc.com> writes: >> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: >> >> Palmer Dabbelt <palmer@rivosinc.com> writes: >> >> >> >> [...] >> >> >> >>>>> + riscv_v_vstate_off(regs); >> >>>>> + >> >>>> >> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >> >>>> call? Something like: >> >>>> >> >>>> static void vstate_discard(struct pt_regs *regs) >> >>>> { >> >>>> >> >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) >> >>>> >> >>>> __riscv_v_vstate_clean(regs); >> >>>> >> >>>> } >> >>>> >> >>>> Complemented by a !V config variant. >> >>> >> >>> I think it's just a question of what we're trying to do here: clean >> >>> avoids the kernel V state save, but unless the kernel decides to use V >> >>> during the syscall the register contents will still be usable by >> >>> userspace. Maybe that's fine and we can just rely on the ISA spec, >> >>> though? I sent another patch to just document it in Linux, even if it's >> >>> in the ISA spec it seems worth having in the kernel as well. >> >>> >> >>> That said, I think the right thing to do here might be to zero the V >> >>> register state and set it to initial: that way we can prevent userspace >> >>> from accidentally relying on the state save, but we can also avoid the >> >>> trap that would come from turning it off. That lets us give the >> >>> hardware a nice clean indication when the V state isn't in use, which >> >>> will hopefully help us avoid the save/restore performance issues that >> >>> other ports have hit. >> >> >> >> FWIW, I think that's a much better idea than turning V off. I also like >> >> that it'll preventing userland to rely on pre-ecall state. >> > >> > OK, anyone else opposed? >> > >> > We're kind of in the weeds on performance, I think we'd need HW to know >> > for sure if either is an issue. Seems best to just play it safe WRT the >> > uABI for now, we can always deal with any performance issues if the >> > exist. >> >> Here's the patch you mentioned at the PW synchup; I've kept the Subject >> and such if you wan't to apply it. LMK if you'd like a proper one. >> >> -- >> >> Subject: [PATCH] riscv: Discard vector state on syscalls >> MIME-Version: 1.0 >> Content-Type: text/plain; charset=UTF-8 >> Content-Transfer-Encoding: 8bit >> >> The RISC-V vector specification states: >> Executing a system call causes all caller-saved vector registers >> (v0-v31, vl, vtype) and vstart to become unspecified. >> >> The vector status is set to Initial, and the vector state is >> explicitly zeroed. That way we can prevent userspace from accidentally >> relying on the stated save. >> >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> >> --- >> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ >> arch/riscv/kernel/traps.c | 2 ++ >> 2 files changed, 26 insertions(+) >> >> diff --git a/arch/riscv/include/asm/vector.h >> b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644 >> --- a/arch/riscv/include/asm/vector.h >> +++ b/arch/riscv/include/asm/vector.h >> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct >> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk); >> bool riscv_v_vstate_ctrl_user_allowed(void); >> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >> +{ >> + unsigned long vl; >> + >> + if (!riscv_v_vstate_query(regs)) >> + return; >> + >> + riscv_v_vstate_on(regs); >> + >> + riscv_v_enable(); >> + asm volatile ( >> + ".option push\n\t" >> + ".option arch, +v\n\t" >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >> + "vmv.v.i v0, 0\n\t" >> + "vmv.v.i v8, 0\n\t" >> + "vmv.v.i v16, 0\n\t" >> + "vmv.v.i v24, 0\n\t" >> + ".option pop\n\t" >> + : "=&r" (vl) : : "memory"); >> + riscv_v_disable(); > > Shouldn't this also set `vill` to 1 using `vsetvl`? That seems reasonable to me. > In fact, a faster alternative may yet be to *only* set an invalid vector > configuration. It's rather unlikely that user-space code would set a valid > configuration and use vectors without loading them first. If it ever does, then > it's so broken that the kernel probably doesn't need to care. I think that's sufficient to force userspace to trap on a bad value? Most of the unsupported value writes in RISC-V are just WARL, but as far as I can tell the V spec requires vill handling. Specifically Implementations must consider all bits of the vtype value to determine if the configuration is supported. An unsupported value in any location within the vtype value must result in vill being set. which seems pretty concrete about this being required. That's from the current draft of the V spec, the wording in 1.0 isn't quite as clear: it sort of allows for the WARL-type behavior, but that's probably splitting hairs. That said, it provides a slightly different cost curve: we'd need to save/restore the V registers on non-syscall traps even when vill is set in userspace, as they've still got state in them (userspace could be in the middle of some probing routine, for example). Also from Darius' fork of the thread: IIUC there's nothing saying 0 is initial, or that initial even needs to work. So I think we're just splitting hairs here, as long as we clobber enough state that userspace doesn't accidentally depend on is fine with me. > -- > 雷米‧德尼-库尔蒙 > http://www.remlab.net/
On Wed, 21 Jun 2023 07:44:51 PDT (-0700), Darius Rad wrote: > On Wed, Jun 21, 2023 at 04:26:14PM +0200, Björn Töpel wrote: >> Palmer Dabbelt <palmer@rivosinc.com> writes: >> >> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: >> >> Palmer Dabbelt <palmer@rivosinc.com> writes: >> >> >> >> [...] >> >> >> >>>>> + riscv_v_vstate_off(regs); >> >>>>> + >> >>>> >> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >> >>>> call? Something like: >> >>>> >> >>>> static void vstate_discard(struct pt_regs *regs) >> >>>> { >> >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) >> >>>> __riscv_v_vstate_clean(regs); >> >>>> } >> >>>> >> >>>> Complemented by a !V config variant. >> >>> >> >>> I think it's just a question of what we're trying to do here: clean >> >>> avoids the kernel V state save, but unless the kernel decides to use V >> >>> during the syscall the register contents will still be usable by >> >>> userspace. Maybe that's fine and we can just rely on the ISA spec, >> >>> though? I sent another patch to just document it in Linux, even if it's >> >>> in the ISA spec it seems worth having in the kernel as well. >> >>> >> >>> That said, I think the right thing to do here might be to zero the V >> >>> register state and set it to initial: that way we can prevent userspace >> >>> from accidentally relying on the state save, but we can also avoid the >> >>> trap that would come from turning it off. That lets us give the >> >>> hardware a nice clean indication when the V state isn't in use, which >> >>> will hopefully help us avoid the save/restore performance issues that >> >>> other ports have hit. >> >> >> >> FWIW, I think that's a much better idea than turning V off. I also like >> >> that it'll preventing userland to rely on pre-ecall state. >> > >> > OK, anyone else opposed? >> > >> > We're kind of in the weeds on performance, I think we'd need HW to know >> > for sure if either is an issue. Seems best to just play it safe WRT the >> > uABI for now, we can always deal with any performance issues if the >> > exist. >> >> Here's the patch you mentioned at the PW synchup; I've kept the Subject >> and such if you wan't to apply it. LMK if you'd like a proper one. >> >> -- >> >> Subject: [PATCH] riscv: Discard vector state on syscalls >> MIME-Version: 1.0 >> Content-Type: text/plain; charset=UTF-8 >> Content-Transfer-Encoding: 8bit >> >> The RISC-V vector specification states: >> Executing a system call causes all caller-saved vector registers >> (v0-v31, vl, vtype) and vstart to become unspecified. >> >> The vector status is set to Initial, and the vector state is >> explicitly zeroed. That way we can prevent userspace from accidentally >> relying on the stated save. > > Is it worth clobbering with all 1s, rather than zero, for consistency with > other vector behavior (i.e., tail/mask agnostic) and for the reasons given > in the vector spec for not doing so with zero? Might be. I guess the assumption was that vs==initial means all 0's, but unless I'm missing something there's no rules for what initial means in the spec. > >> >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> >> --- >> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ >> arch/riscv/kernel/traps.c | 2 ++ >> 2 files changed, 26 insertions(+) >> >> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h >> index 04c0b07bf6cd..b3020d064f42 100644 >> --- a/arch/riscv/include/asm/vector.h >> +++ b/arch/riscv/include/asm/vector.h >> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev, >> void riscv_v_vstate_ctrl_init(struct task_struct *tsk); >> bool riscv_v_vstate_ctrl_user_allowed(void); >> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >> +{ >> + unsigned long vl; >> + >> + if (!riscv_v_vstate_query(regs)) >> + return; >> + >> + riscv_v_vstate_on(regs); >> + >> + riscv_v_enable(); >> + asm volatile ( >> + ".option push\n\t" >> + ".option arch, +v\n\t" >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >> + "vmv.v.i v0, 0\n\t" >> + "vmv.v.i v8, 0\n\t" >> + "vmv.v.i v16, 0\n\t" >> + "vmv.v.i v24, 0\n\t" >> + ".option pop\n\t" >> + : "=&r" (vl) : : "memory"); >> + riscv_v_disable(); >> +} >> + >> #else /* ! CONFIG_RISCV_ISA_V */ >> >> struct pt_regs; >> @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } >> #define __switch_to_vector(__prev, __next) do {} while (0) >> #define riscv_v_vstate_off(regs) do {} while (0) >> #define riscv_v_vstate_on(regs) do {} while (0) >> +#define riscv_v_vstate_discard(regs) do {} while (0) >> >> #endif /* CONFIG_RISCV_ISA_V */ >> >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c >> index 05ffdcd1424e..00c68b57ff88 100644 >> --- a/arch/riscv/kernel/traps.c >> +++ b/arch/riscv/kernel/traps.c >> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> regs->epc += 4; >> regs->orig_a0 = regs->a0; >> >> + riscv_v_vstate_discard(regs); >> + >> syscall = syscall_enter_from_user_mode(regs, syscall); >> >> if (syscall < NR_syscalls) >> >> base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79 >> -- >> 2.39.2 >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv
Andy Chiu <andy.chiu@sifive.com> writes: >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >> +{ >> + unsigned long vl; >> + >> + if (!riscv_v_vstate_query(regs)) >> + return; >> + >> + riscv_v_vstate_on(regs); > > Do we need this riscv_v_vstate_on()? If it is not on we'd return > early in the previous "if" statement, right? riscv_v_vstate_on() just set the state to Initial, right? Or do you mean that riscv_v_vstate_query() is too much, and we should only check if the state is dirty? >> + >> + riscv_v_enable(); >> + asm volatile ( >> + ".option push\n\t" >> + ".option arch, +v\n\t" >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >> + "vmv.v.i v0, 0\n\t" >> + "vmv.v.i v8, 0\n\t" >> + "vmv.v.i v16, 0\n\t" >> + "vmv.v.i v24, 0\n\t" >> + ".option pop\n\t" >> + : "=&r" (vl) : : "memory"); >> + riscv_v_disable(); > > Maybe consider cleaning the vstate (status.vs) here. As such we don't > have to save V during context switch. It's late, and I'm slower than usual. The regs are cleared, and the state is Initial. No save on context switch, but restore, right? > Or, maybe we could set vstate as off during syscall and discard V-reg > + restore status.VS when returning back to userspace? Hmm, interesting. We need to track the status.VS to restore somewhere... Björn
Palmer Dabbelt <palmer@rivosinc.com> writes: > On Wed, 21 Jun 2023 09:47:37 PDT (-0700), remi@remlab.net wrote: >> Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit : >>> Palmer Dabbelt <palmer@rivosinc.com> writes: >>> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote: >>> >> Palmer Dabbelt <palmer@rivosinc.com> writes: >>> >> >>> >> [...] >>> >> >>> >>>>> + riscv_v_vstate_off(regs); >>> >>>>> + >>> >>>> >>> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to >>> >>>> call? Something like: >>> >>>> >>> >>>> static void vstate_discard(struct pt_regs *regs) >>> >>>> { >>> >>>> >>> >>>> if ((regs->status & SR_VS) == SR_VS_DIRTY) >>> >>>> >>> >>>> __riscv_v_vstate_clean(regs); >>> >>>> >>> >>>> } >>> >>>> >>> >>>> Complemented by a !V config variant. >>> >>> >>> >>> I think it's just a question of what we're trying to do here: clean >>> >>> avoids the kernel V state save, but unless the kernel decides to use V >>> >>> during the syscall the register contents will still be usable by >>> >>> userspace. Maybe that's fine and we can just rely on the ISA spec, >>> >>> though? I sent another patch to just document it in Linux, even if it's >>> >>> in the ISA spec it seems worth having in the kernel as well. >>> >>> >>> >>> That said, I think the right thing to do here might be to zero the V >>> >>> register state and set it to initial: that way we can prevent userspace >>> >>> from accidentally relying on the state save, but we can also avoid the >>> >>> trap that would come from turning it off. That lets us give the >>> >>> hardware a nice clean indication when the V state isn't in use, which >>> >>> will hopefully help us avoid the save/restore performance issues that >>> >>> other ports have hit. >>> >> >>> >> FWIW, I think that's a much better idea than turning V off. I also like >>> >> that it'll preventing userland to rely on pre-ecall state. >>> > >>> > OK, anyone else opposed? >>> > >>> > We're kind of in the weeds on performance, I think we'd need HW to know >>> > for sure if either is an issue. Seems best to just play it safe WRT the >>> > uABI for now, we can always deal with any performance issues if the >>> > exist. >>> >>> Here's the patch you mentioned at the PW synchup; I've kept the Subject >>> and such if you wan't to apply it. LMK if you'd like a proper one. >>> >>> -- >>> >>> Subject: [PATCH] riscv: Discard vector state on syscalls >>> MIME-Version: 1.0 >>> Content-Type: text/plain; charset=UTF-8 >>> Content-Transfer-Encoding: 8bit >>> >>> The RISC-V vector specification states: >>> Executing a system call causes all caller-saved vector registers >>> (v0-v31, vl, vtype) and vstart to become unspecified. >>> >>> The vector status is set to Initial, and the vector state is >>> explicitly zeroed. That way we can prevent userspace from accidentally >>> relying on the stated save. >>> >>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> >>> --- >>> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++ >>> arch/riscv/kernel/traps.c | 2 ++ >>> 2 files changed, 26 insertions(+) >>> >>> diff --git a/arch/riscv/include/asm/vector.h >>> b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644 >>> --- a/arch/riscv/include/asm/vector.h >>> +++ b/arch/riscv/include/asm/vector.h >>> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct >>> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk); >>> bool riscv_v_vstate_ctrl_user_allowed(void); >>> >>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >>> +{ >>> + unsigned long vl; >>> + >>> + if (!riscv_v_vstate_query(regs)) >>> + return; >>> + >>> + riscv_v_vstate_on(regs); >>> + >>> + riscv_v_enable(); >>> + asm volatile ( >>> + ".option push\n\t" >>> + ".option arch, +v\n\t" >>> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >>> + "vmv.v.i v0, 0\n\t" >>> + "vmv.v.i v8, 0\n\t" >>> + "vmv.v.i v16, 0\n\t" >>> + "vmv.v.i v24, 0\n\t" >>> + ".option pop\n\t" >>> + : "=&r" (vl) : : "memory"); >>> + riscv_v_disable(); >> >> Shouldn't this also set `vill` to 1 using `vsetvl`? > > That seems reasonable to me. Something like this? --- diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index b3020d064f42..d5f7853936d5 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -165,7 +165,7 @@ bool riscv_v_vstate_ctrl_user_allowed(void); static inline void riscv_v_vstate_discard(struct pt_regs *regs) { - unsigned long vl; + unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1); if (!riscv_v_vstate_query(regs)) return; @@ -181,8 +181,9 @@ static inline void riscv_v_vstate_discard(struct pt_regs *regs) "vmv.v.i v8, 0\n\t" "vmv.v.i v16, 0\n\t" "vmv.v.i v24, 0\n\t" + "vsetvl %0, x0, %1\n\t" ".option pop\n\t" - : "=&r" (vl) : : "memory"); + : "=&r" (vl) : "r" (vtype_inval) : "memory"); riscv_v_disable(); } --- Björn
On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote: > > Andy Chiu <andy.chiu@sifive.com> writes: > > >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > >> +{ > >> + unsigned long vl; > >> + > >> + if (!riscv_v_vstate_query(regs)) > >> + return; > >> + > >> + riscv_v_vstate_on(regs); > > > > Do we need this riscv_v_vstate_on()? If it is not on we'd return > > early in the previous "if" statement, right? > > riscv_v_vstate_on() just set the state to Initial, right? Or do you mean > that riscv_v_vstate_query() is too much, and we should only check if the > state is dirty? > > >> + > >> + riscv_v_enable(); > >> + asm volatile ( > >> + ".option push\n\t" > >> + ".option arch, +v\n\t" > >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > >> + "vmv.v.i v0, 0\n\t" > >> + "vmv.v.i v8, 0\n\t" > >> + "vmv.v.i v16, 0\n\t" > >> + "vmv.v.i v24, 0\n\t" > >> + ".option pop\n\t" > >> + : "=&r" (vl) : : "memory"); > >> + riscv_v_disable(); > > > > Maybe consider cleaning the vstate (status.vs) here. As such we don't > > have to save V during context switch. > > It's late, and I'm slower than usual. The regs are cleared, and the > state is Initial. No save on context switch, but restore, right? Yes, it's my bad, you are right. I sometime messed around the "real" status.VS with the one in the userspace context :P > > > Or, maybe we could set vstate as off during syscall and discard V-reg > > + restore status.VS when returning back to userspace? > > Hmm, interesting. We need to track the status.VS to restore somewhere... Maybe something like this? diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..79de9ca83391 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs) regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL; } +static inline void riscv_v_vstate_dirty(struct pt_regs *regs) +{ + regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY; +} + static inline bool riscv_v_vstate_query(struct pt_regs *regs) { return (regs->status & SR_VS) != 0; @@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk); bool riscv_v_vstate_ctrl_user_allowed(void); +static inline void riscv_v_vstate_discard(struct pt_regs *regs) +{ + unsigned long vl; + + riscv_v_enable(); + asm volatile ( + ".option push\n\t" + ".option arch, +v\n\t" + "vsetvli %0, x0, e8, m8, ta, ma\n\t" + "vmv.v.i v0, 0\n\t" + "vmv.v.i v8, 0\n\t" + "vmv.v.i v16, 0\n\t" + "vmv.v.i v24, 0\n\t" + ".option pop\n\t" + : "=&r" (vl) : : "memory"); + riscv_v_disable(); +} + #else /* ! CONFIG_RISCV_ISA_V */ struct pt_regs; @@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } #define __switch_to_vector(__prev, __next) do {} while (0) #define riscv_v_vstate_off(regs) do {} while (0) #define riscv_v_vstate_on(regs) do {} while (0) +#define riscv_v_vstate_dirty(regs) do {} while (0) +#define riscv_v_vstate_discard(regs) do {} while (0) #endif /* CONFIG_RISCV_ISA_V */ diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index 24d309c6ab8d..e36b69c9b07f 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) { if (user_mode(regs)) { ulong syscall = regs->a7; + bool v_is_on; regs->epc += 4; regs->orig_a0 = regs->a0; + v_is_on = riscv_v_vstate_query(regs); + riscv_v_vstate_off(regs); + syscall = syscall_enter_from_user_mode(regs, syscall); if (syscall < NR_syscalls) @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) regs->a0 = -ENOSYS; syscall_exit_to_user_mode(regs); + if (v_is_on) { + riscv_v_vstate_discard(regs); + riscv_v_vstate_dirty(regs); + } } else { irqentry_state_t state = irqentry_nmi_enter(regs); > > > Björn Thanks, Andy
Andy Chiu <andy.chiu@sifive.com> writes: > On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote: >> >> Andy Chiu <andy.chiu@sifive.com> writes: >> >> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >> >> +{ >> >> + unsigned long vl; >> >> + >> >> + if (!riscv_v_vstate_query(regs)) >> >> + return; >> >> + >> >> + riscv_v_vstate_on(regs); >> > >> > Do we need this riscv_v_vstate_on()? If it is not on we'd return >> > early in the previous "if" statement, right? >> >> riscv_v_vstate_on() just set the state to Initial, right? Or do you mean >> that riscv_v_vstate_query() is too much, and we should only check if the >> state is dirty? >> >> >> + >> >> + riscv_v_enable(); >> >> + asm volatile ( >> >> + ".option push\n\t" >> >> + ".option arch, +v\n\t" >> >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >> >> + "vmv.v.i v0, 0\n\t" >> >> + "vmv.v.i v8, 0\n\t" >> >> + "vmv.v.i v16, 0\n\t" >> >> + "vmv.v.i v24, 0\n\t" >> >> + ".option pop\n\t" >> >> + : "=&r" (vl) : : "memory"); >> >> + riscv_v_disable(); >> > >> > Maybe consider cleaning the vstate (status.vs) here. As such we don't >> > have to save V during context switch. >> >> It's late, and I'm slower than usual. The regs are cleared, and the >> state is Initial. No save on context switch, but restore, right? > > Yes, it's my bad, you are right. I sometime messed around the "real" > status.VS with the one in the userspace context :P > >> >> > Or, maybe we could set vstate as off during syscall and discard V-reg >> > + restore status.VS when returning back to userspace? >> >> Hmm, interesting. We need to track the status.VS to restore somewhere... > > Maybe something like this? > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h > index 04c0b07bf6cd..79de9ca83391 100644 > --- a/arch/riscv/include/asm/vector.h > +++ b/arch/riscv/include/asm/vector.h > @@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs) > regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL; > } > > +static inline void riscv_v_vstate_dirty(struct pt_regs *regs) > +{ > + regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY; > +} > + > static inline bool riscv_v_vstate_query(struct pt_regs *regs) > { > return (regs->status & SR_VS) != 0; > @@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev, > void riscv_v_vstate_ctrl_init(struct task_struct *tsk); > bool riscv_v_vstate_ctrl_user_allowed(void); > > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > +{ > + unsigned long vl; > + > + riscv_v_enable(); > + asm volatile ( > + ".option push\n\t" > + ".option arch, +v\n\t" > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > + "vmv.v.i v0, 0\n\t" > + "vmv.v.i v8, 0\n\t" > + "vmv.v.i v16, 0\n\t" > + "vmv.v.i v24, 0\n\t" > + ".option pop\n\t" > + : "=&r" (vl) : : "memory"); > + riscv_v_disable(); > +} > + > #else /* ! CONFIG_RISCV_ISA_V */ > > struct pt_regs; > @@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } > #define __switch_to_vector(__prev, __next) do {} while (0) > #define riscv_v_vstate_off(regs) do {} while (0) > #define riscv_v_vstate_on(regs) do {} while (0) > +#define riscv_v_vstate_dirty(regs) do {} while (0) > +#define riscv_v_vstate_discard(regs) do {} while (0) > > #endif /* CONFIG_RISCV_ISA_V */ > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index 24d309c6ab8d..e36b69c9b07f 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > { > if (user_mode(regs)) { > ulong syscall = regs->a7; > + bool v_is_on; > > regs->epc += 4; > regs->orig_a0 = regs->a0; > > + v_is_on = riscv_v_vstate_query(regs); > + riscv_v_vstate_off(regs); > + > syscall = syscall_enter_from_user_mode(regs, syscall); > > if (syscall < NR_syscalls) > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > regs->a0 = -ENOSYS; > > syscall_exit_to_user_mode(regs); > + if (v_is_on) { > + riscv_v_vstate_discard(regs); > + riscv_v_vstate_dirty(regs); Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from my diff? This flow does avoid some context switch costs, but I wonder if this is some that can be added later, when we can more reliable measure the overhead. Premature optimization, and all that. ;-) Björn
On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote: > > Andy Chiu <andy.chiu@sifive.com> writes: > > > On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote: > >> > >> Andy Chiu <andy.chiu@sifive.com> writes: > >> > >> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > >> >> +{ > >> >> + unsigned long vl; > >> >> + > >> >> + if (!riscv_v_vstate_query(regs)) > >> >> + return; > >> >> + > >> >> + riscv_v_vstate_on(regs); > >> > > >> > Do we need this riscv_v_vstate_on()? If it is not on we'd return > >> > early in the previous "if" statement, right? > >> > >> riscv_v_vstate_on() just set the state to Initial, right? Or do you mean > >> that riscv_v_vstate_query() is too much, and we should only check if the > >> state is dirty? > >> > >> >> + > >> >> + riscv_v_enable(); > >> >> + asm volatile ( > >> >> + ".option push\n\t" > >> >> + ".option arch, +v\n\t" > >> >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > >> >> + "vmv.v.i v0, 0\n\t" > >> >> + "vmv.v.i v8, 0\n\t" > >> >> + "vmv.v.i v16, 0\n\t" > >> >> + "vmv.v.i v24, 0\n\t" > >> >> + ".option pop\n\t" > >> >> + : "=&r" (vl) : : "memory"); > >> >> + riscv_v_disable(); > >> > > >> > Maybe consider cleaning the vstate (status.vs) here. As such we don't > >> > have to save V during context switch. > >> > >> It's late, and I'm slower than usual. The regs are cleared, and the > >> state is Initial. No save on context switch, but restore, right? > > > > Yes, it's my bad, you are right. I sometime messed around the "real" > > status.VS with the one in the userspace context :P > > > >> > >> > Or, maybe we could set vstate as off during syscall and discard V-reg > >> > + restore status.VS when returning back to userspace? > >> > >> Hmm, interesting. We need to track the status.VS to restore somewhere... > > > > Maybe something like this? > > > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h > > index 04c0b07bf6cd..79de9ca83391 100644 > > --- a/arch/riscv/include/asm/vector.h > > +++ b/arch/riscv/include/asm/vector.h > > @@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs) > > regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL; > > } > > > > +static inline void riscv_v_vstate_dirty(struct pt_regs *regs) > > +{ > > + regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY; > > +} > > + > > static inline bool riscv_v_vstate_query(struct pt_regs *regs) > > { > > return (regs->status & SR_VS) != 0; > > @@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev, > > void riscv_v_vstate_ctrl_init(struct task_struct *tsk); > > bool riscv_v_vstate_ctrl_user_allowed(void); > > > > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > > +{ > > + unsigned long vl; > > + > > + riscv_v_enable(); > > + asm volatile ( > > + ".option push\n\t" > > + ".option arch, +v\n\t" > > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > > + "vmv.v.i v0, 0\n\t" > > + "vmv.v.i v8, 0\n\t" > > + "vmv.v.i v16, 0\n\t" > > + "vmv.v.i v24, 0\n\t" > > + ".option pop\n\t" > > + : "=&r" (vl) : : "memory"); > > + riscv_v_disable(); > > +} > > + > > #else /* ! CONFIG_RISCV_ISA_V */ > > > > struct pt_regs; > > @@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } > > #define __switch_to_vector(__prev, __next) do {} while (0) > > #define riscv_v_vstate_off(regs) do {} while (0) > > #define riscv_v_vstate_on(regs) do {} while (0) > > +#define riscv_v_vstate_dirty(regs) do {} while (0) > > +#define riscv_v_vstate_discard(regs) do {} while (0) > > > > #endif /* CONFIG_RISCV_ISA_V */ > > > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > > index 24d309c6ab8d..e36b69c9b07f 100644 > > --- a/arch/riscv/kernel/traps.c > > +++ b/arch/riscv/kernel/traps.c > > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > > { > > if (user_mode(regs)) { > > ulong syscall = regs->a7; > > + bool v_is_on; > > > > regs->epc += 4; > > regs->orig_a0 = regs->a0; > > > > + v_is_on = riscv_v_vstate_query(regs); > > + riscv_v_vstate_off(regs); > > + > > syscall = syscall_enter_from_user_mode(regs, syscall); > > > > if (syscall < NR_syscalls) > > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > > regs->a0 = -ENOSYS; > > > > syscall_exit_to_user_mode(regs); > > + if (v_is_on) { > > + riscv_v_vstate_discard(regs); > > + riscv_v_vstate_dirty(regs); > > Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from > my diff? Both work, I think. But here if we set it to "on" after discarding V-regs, then take a context switch before executing any V instructions in user space (does not change future vstate to dirty). Then we will leak V-regs previously set into its vstate.datap after switching back, because we only save V context if vstate is dirty. So, I think setting vstate to dirty is a safer option. In your diff case, V-regs may be restored back to the previously-saved state if the syscall caused a context switch. I have not had a chance to test it yet because we are having a vacation in Taiwan, and I have some other stuff to keep me busy :) Please correct me if my thinking was wrong and I forgot some important idea again... > > This flow does avoid some context switch costs, but I wonder if this is > some that can be added later, when we can more reliable measure the > overhead. Premature optimization, and all that. ;-) > > > Björn Thanks, Andy
On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote: > This flow does avoid some context switch costs, but I wonder if this is > some that can be added later, when we can more reliable measure the > overhead. Premature optimization, and all that. ;-) > Sure, do you suggest any kinds of measurement, experiment, or benchmarking that could give out a figure on how things are different? > > Björn Thanks, Andy
Andy Chiu <andy.chiu@sifive.com> writes: > On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote: >> This flow does avoid some context switch costs, but I wonder if this is >> some that can be added later, when we can more reliable measure the >> overhead. Premature optimization, and all that. ;-) >> > > Sure, do you suggest any kinds of measurement, experiment, or > benchmarking that could give out a figure on how things are different? My take was; If you have access to actual V 1.0 hardware, and just not Qemu, then we could do some actual real tests, measuring context switch costs etc! Björn
Andy Chiu <andy.chiu@sifive.com> writes: >> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c >> > index 24d309c6ab8d..e36b69c9b07f 100644 >> > --- a/arch/riscv/kernel/traps.c >> > +++ b/arch/riscv/kernel/traps.c >> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> > { >> > if (user_mode(regs)) { >> > ulong syscall = regs->a7; >> > + bool v_is_on; >> > >> > regs->epc += 4; >> > regs->orig_a0 = regs->a0; >> > >> > + v_is_on = riscv_v_vstate_query(regs); >> > + riscv_v_vstate_off(regs); >> > + >> > syscall = syscall_enter_from_user_mode(regs, syscall); >> > >> > if (syscall < NR_syscalls) >> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> > regs->a0 = -ENOSYS; >> > >> > syscall_exit_to_user_mode(regs); >> > + if (v_is_on) { >> > + riscv_v_vstate_discard(regs); >> > + riscv_v_vstate_dirty(regs); >> >> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from >> my diff? > > Both work, I think. But here if we set it to "on" after discarding > V-regs, then take a context switch before executing any V instructions > in user space (does not change future vstate to dirty). Then we will > leak V-regs previously set into its vstate.datap after switching back, > because we only save V context if vstate is dirty. So, I think setting > vstate to dirty is a safer option. Ah, yes, good point. An alternative variant is this: --- diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..32b6115a54a5 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task, } } +static inline void __riscv_v_vstate_discard(void) +{ + unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1); + + riscv_v_enable(); + asm volatile ( + ".option push\n\t" + ".option arch, +v\n\t" + "vsetvli %0, x0, e8, m8, ta, ma\n\t" + "vmv.v.i v0, 0\n\t" + "vmv.v.i v8, 0\n\t" + "vmv.v.i v16, 0\n\t" + "vmv.v.i v24, 0\n\t" + "vsetvl %0, x0, %1\n\t" + ".option pop\n\t" + : "=&r" (vl) : "r" (vtype_inval) : "memory"); + riscv_v_disable(); +} + +static inline void riscv_v_vstate_discard(struct pt_regs *regs) +{ + if (!riscv_v_vstate_query(regs)) + return; + + __riscv_v_vstate_discard(); + riscv_v_vstate_on(regs); +} + static inline void riscv_v_vstate_restore(struct task_struct *task, struct pt_regs *regs) { - if ((regs->status & SR_VS) != SR_VS_OFF) { + unsigned long status = regs->status & SR_VS; + + WARN_ON(status == SR_VS_DIRTY); + + if (status == SR_VS_CLEAN) { struct __riscv_v_ext_state *vstate = &task->thread.vstate; __riscv_v_vstate_restore(vstate, vstate->datap); __riscv_v_vstate_clean(regs); + return; } + + if (status == SR_VS_INITIAL) + __riscv_v_vstate_discard(); } static inline void __switch_to_vector(struct task_struct *prev, @@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } #define __switch_to_vector(__prev, __next) do {} while (0) #define riscv_v_vstate_off(regs) do {} while (0) #define riscv_v_vstate_on(regs) do {} while (0) +#define riscv_v_vstate_discard(regs) do {} while (0) #endif /* CONFIG_RISCV_ISA_V */ diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index 5158961ea977..5ff63a784a6d 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) regs->epc += 4; regs->orig_a0 = regs->a0; + riscv_v_vstate_discard(regs); + syscall = syscall_enter_from_user_mode(regs, syscall); if (syscall < NR_syscalls) --- Here, we simply discard the regs if the state is Initial. Thoughts? Björn
On Mon, Jun 26, 2023 at 11:36 PM Björn Töpel <bjorn@kernel.org> wrote: > > Andy Chiu <andy.chiu@sifive.com> writes: > > >> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > >> > index 24d309c6ab8d..e36b69c9b07f 100644 > >> > --- a/arch/riscv/kernel/traps.c > >> > +++ b/arch/riscv/kernel/traps.c > >> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > >> > { > >> > if (user_mode(regs)) { > >> > ulong syscall = regs->a7; > >> > + bool v_is_on; > >> > > >> > regs->epc += 4; > >> > regs->orig_a0 = regs->a0; > >> > > >> > + v_is_on = riscv_v_vstate_query(regs); > >> > + riscv_v_vstate_off(regs); > >> > + > >> > syscall = syscall_enter_from_user_mode(regs, syscall); > >> > > >> > if (syscall < NR_syscalls) > >> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > >> > regs->a0 = -ENOSYS; > >> > > >> > syscall_exit_to_user_mode(regs); > >> > + if (v_is_on) { > >> > + riscv_v_vstate_discard(regs); > >> > + riscv_v_vstate_dirty(regs); > >> > >> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from > >> my diff? > > > > Both work, I think. But here if we set it to "on" after discarding > > V-regs, then take a context switch before executing any V instructions > > in user space (does not change future vstate to dirty). Then we will > > leak V-regs previously set into its vstate.datap after switching back, > > because we only save V context if vstate is dirty. So, I think setting > > vstate to dirty is a safer option. > > Ah, yes, good point. An alternative variant is this: > > --- > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h > index 04c0b07bf6cd..32b6115a54a5 100644 > --- a/arch/riscv/include/asm/vector.h > +++ b/arch/riscv/include/asm/vector.h > @@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task, > } > } > > +static inline void __riscv_v_vstate_discard(void) > +{ > + unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1); > + > + riscv_v_enable(); > + asm volatile ( > + ".option push\n\t" > + ".option arch, +v\n\t" > + "vsetvli %0, x0, e8, m8, ta, ma\n\t" > + "vmv.v.i v0, 0\n\t" > + "vmv.v.i v8, 0\n\t" > + "vmv.v.i v16, 0\n\t" > + "vmv.v.i v24, 0\n\t" > + "vsetvl %0, x0, %1\n\t" > + ".option pop\n\t" > + : "=&r" (vl) : "r" (vtype_inval) : "memory"); > + riscv_v_disable(); > +} > + > +static inline void riscv_v_vstate_discard(struct pt_regs *regs) > +{ > + if (!riscv_v_vstate_query(regs)) > + return; > + > + __riscv_v_vstate_discard(); > + riscv_v_vstate_on(regs); > +} > + > static inline void riscv_v_vstate_restore(struct task_struct *task, > struct pt_regs *regs) > { > - if ((regs->status & SR_VS) != SR_VS_OFF) { > + unsigned long status = regs->status & SR_VS; > + > + WARN_ON(status == SR_VS_DIRTY); > + > + if (status == SR_VS_CLEAN) { > struct __riscv_v_ext_state *vstate = &task->thread.vstate; > > __riscv_v_vstate_restore(vstate, vstate->datap); > __riscv_v_vstate_clean(regs); > + return; > } > + > + if (status == SR_VS_INITIAL) > + __riscv_v_vstate_discard(); > } > > static inline void __switch_to_vector(struct task_struct *prev, > @@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } > #define __switch_to_vector(__prev, __next) do {} while (0) > #define riscv_v_vstate_off(regs) do {} while (0) > #define riscv_v_vstate_on(regs) do {} while (0) > +#define riscv_v_vstate_discard(regs) do {} while (0) > > #endif /* CONFIG_RISCV_ISA_V */ > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index 5158961ea977..5ff63a784a6d 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) > regs->epc += 4; > regs->orig_a0 = regs->a0; > > + riscv_v_vstate_discard(regs); > + > syscall = syscall_enter_from_user_mode(regs, syscall); > > if (syscall < NR_syscalls) > > --- > > > Here, we simply discard the regs if the state is Initial. Thoughts? > > > Björn Yes, it makes sense to me to handle the initial state in vstate_restore. Thanks, Andy
Andy Chiu <andy.chiu@sifive.com> writes: > On Mon, Jun 26, 2023 at 11:36 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Andy Chiu <andy.chiu@sifive.com> writes: >> >> >> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c >> >> > index 24d309c6ab8d..e36b69c9b07f 100644 >> >> > --- a/arch/riscv/kernel/traps.c >> >> > +++ b/arch/riscv/kernel/traps.c >> >> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> >> > { >> >> > if (user_mode(regs)) { >> >> > ulong syscall = regs->a7; >> >> > + bool v_is_on; >> >> > >> >> > regs->epc += 4; >> >> > regs->orig_a0 = regs->a0; >> >> > >> >> > + v_is_on = riscv_v_vstate_query(regs); >> >> > + riscv_v_vstate_off(regs); >> >> > + >> >> > syscall = syscall_enter_from_user_mode(regs, syscall); >> >> > >> >> > if (syscall < NR_syscalls) >> >> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> >> > regs->a0 = -ENOSYS; >> >> > >> >> > syscall_exit_to_user_mode(regs); >> >> > + if (v_is_on) { >> >> > + riscv_v_vstate_discard(regs); >> >> > + riscv_v_vstate_dirty(regs); >> >> >> >> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from >> >> my diff? >> > >> > Both work, I think. But here if we set it to "on" after discarding >> > V-regs, then take a context switch before executing any V instructions >> > in user space (does not change future vstate to dirty). Then we will >> > leak V-regs previously set into its vstate.datap after switching back, >> > because we only save V context if vstate is dirty. So, I think setting >> > vstate to dirty is a safer option. >> >> Ah, yes, good point. An alternative variant is this: >> >> --- >> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h >> index 04c0b07bf6cd..32b6115a54a5 100644 >> --- a/arch/riscv/include/asm/vector.h >> +++ b/arch/riscv/include/asm/vector.h >> @@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task, >> } >> } >> >> +static inline void __riscv_v_vstate_discard(void) >> +{ >> + unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1); >> + >> + riscv_v_enable(); >> + asm volatile ( >> + ".option push\n\t" >> + ".option arch, +v\n\t" >> + "vsetvli %0, x0, e8, m8, ta, ma\n\t" >> + "vmv.v.i v0, 0\n\t" >> + "vmv.v.i v8, 0\n\t" >> + "vmv.v.i v16, 0\n\t" >> + "vmv.v.i v24, 0\n\t" >> + "vsetvl %0, x0, %1\n\t" >> + ".option pop\n\t" >> + : "=&r" (vl) : "r" (vtype_inval) : "memory"); >> + riscv_v_disable(); >> +} >> + >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs) >> +{ >> + if (!riscv_v_vstate_query(regs)) >> + return; >> + >> + __riscv_v_vstate_discard(); >> + riscv_v_vstate_on(regs); >> +} >> + >> static inline void riscv_v_vstate_restore(struct task_struct *task, >> struct pt_regs *regs) >> { >> - if ((regs->status & SR_VS) != SR_VS_OFF) { >> + unsigned long status = regs->status & SR_VS; >> + >> + WARN_ON(status == SR_VS_DIRTY); >> + >> + if (status == SR_VS_CLEAN) { >> struct __riscv_v_ext_state *vstate = &task->thread.vstate; >> >> __riscv_v_vstate_restore(vstate, vstate->datap); >> __riscv_v_vstate_clean(regs); >> + return; >> } >> + >> + if (status == SR_VS_INITIAL) >> + __riscv_v_vstate_discard(); >> } >> >> static inline void __switch_to_vector(struct task_struct *prev, >> @@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; } >> #define __switch_to_vector(__prev, __next) do {} while (0) >> #define riscv_v_vstate_off(regs) do {} while (0) >> #define riscv_v_vstate_on(regs) do {} while (0) >> +#define riscv_v_vstate_discard(regs) do {} while (0) >> >> #endif /* CONFIG_RISCV_ISA_V */ >> >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c >> index 5158961ea977..5ff63a784a6d 100644 >> --- a/arch/riscv/kernel/traps.c >> +++ b/arch/riscv/kernel/traps.c >> @@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) >> regs->epc += 4; >> regs->orig_a0 = regs->a0; >> >> + riscv_v_vstate_discard(regs); >> + >> syscall = syscall_enter_from_user_mode(regs, syscall); >> >> if (syscall < NR_syscalls) >> >> --- >> >> >> Here, we simply discard the regs if the state is Initial. Thoughts? >> >> >> Björn > > Yes, it makes sense to me to handle the initial state in vstate_restore. Ok! I sent out a proper v2, but without the WARN_ON to match the behavior of the the original code. PTAL, and let me know what you think. Björn
diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst index 48f189d79e41..a4dfa954215b 100644 --- a/Documentation/riscv/vector.rst +++ b/Documentation/riscv/vector.rst @@ -130,3 +130,8 @@ processes in form of sysctl knob: Modifying the system default enablement status does not affect the enablement status of any existing process of thread that do not make an execve() call. + +3. Vector Register State Across System Calls +--------------------------------------------- + +Vector registers are clobbered by system calls. diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index 05ffdcd1424e..bb99a6379b37 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) regs->epc += 4; regs->orig_a0 = regs->a0; + riscv_v_vstate_off(regs); + syscall = syscall_enter_from_user_mode(regs, syscall); if (syscall < NR_syscalls)
The V registers are clobbered by standard ABI functions, so userspace probably doesn't have anything useful in them by the time we get to the kernel. So let's just document that they're clobbered by syscalls and proactively clobber them. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> --- IIRC we'd talked about doing this, but I didn't see anything in the docs. I figure it's better to just proactively clobber the registers on syscalls, as that way userspace can't end up accidentally depending on them. --- Documentation/riscv/vector.rst | 5 +++++ arch/riscv/kernel/traps.c | 2 ++ 2 files changed, 7 insertions(+)