Message ID | 20241017133909.3837547-4-kevin.brodsky@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improve arm64 pkeys handling in signal delivery | expand |
On Thu, Oct 17, 2024 at 02:39:07PM +0100, Kevin Brodsky wrote: > TL;DR: reset POR_EL0 to "allow all" before writing the signal frame, > preventing spurious uaccess failures. > > When POE is supported, the POR_EL0 register constrains memory > accesses based on the target page's POIndex (pkey). This raises the > question: what constraints should apply to a signal handler? The > current answer is that POR_EL0 is reset to POR_EL0_INIT when > invoking the handler, giving it full access to POIndex 0. This is in > line with x86's MPK support and remains unchanged. > > This is only part of the story, though. POR_EL0 constrains all > unprivileged memory accesses, meaning that uaccess routines such as > put_user() are also impacted. As a result POR_EL0 may prevent the > signal frame from being written to the signal stack (ultimately > causing a SIGSEGV). This is especially concerning when an alternate > signal stack is used, because userspace may want to prevent access > to it outside of signal handlers. There is currently no provision > for that: POR_EL0 is reset after writing to the stack, and > POR_EL0_INIT only enables access to POIndex 0. This all seems a bit convoluted. The issues seem to boil down to: the signal frame is read and written on behalf of the signal handler, and so should be done with consistent permissions to those the signal handler gets (instead of whatever random prevailing permissions were specified in POR_EL0 before signal delivery... and were possibly responsible for triggering the signal in the first place.) The need to save/restore POR_EL0 via staging storage seems to follow on naturally from that, since if POR_EL0 was independent of uaccess then this patch would be redundant... > > This patch ensures that POR_EL0 is reset to its most permissive Is this right? See my comment on save_reset_unpriv_access_state() below. > state before the signal stack is accessed. Once the signal frame has > been fully written, POR_EL0 is still set to POR_EL0_INIT - it is up > to the signal handler to enable access to additional pkeys if > needed. As to sigreturn(), it expects having access to the stack > like any other syscall; we only need to ensure that POR_EL0 is > restored from the signal frame after all uaccess calls. This > approach is in line with the recent x86/pkeys series [1]. > > Resetting POR_EL0 early introduces some complications, in that we > can no longer read the register directly in preserve_poe_context(). > This is addressed by introducing a struct (unpriv_access_state) > and helpers to manage any such register impacting uaccess. Things > look like this on signal delivery: > 1. Save original POR_EL0 into struct [save_reset_unpriv_access_state()] > 2. Set POR_EL0 to "allow all" [save_reset_unpriv_access_state()] > 3. Create signal frame > 4. Write saved POR_EL0 value to the signal frame [preserve_poe_context()] > 5. Finalise signal frame > 6. Set POR_EL0 to POR_EL0_INIT [set_handler_unpriv_access_state()] > > The return path (sys_rt_sigreturn) doesn't strictly require any change > since restore_poe_context() is already called last. However, to > avoid uaccess calls being accidentally added after that point, we > use the same approach as in the delivery path, i.e. separating > uaccess from writing to the register: > 1. Read saved POR_EL0 value from the signal frame [restore_poe_context()] > 2. Set POR_EL0 to the saved value [restore_unpriv_access_state()] > > [1] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/ > > Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> > --- > arch/arm64/kernel/signal.c | 89 ++++++++++++++++++++++++++++++++------ > 1 file changed, 75 insertions(+), 14 deletions(-) > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c > index f5fb48dabebe..3548146084b3 100644 > --- a/arch/arm64/kernel/signal.c > +++ b/arch/arm64/kernel/signal.c > @@ -66,9 +66,64 @@ struct rt_sigframe_user_layout { > unsigned long end_offset; > }; > > +/* > + * Holds any EL0-controlled state that influences unprivileged memory accesses. > + * This includes both accesses done in userspace and uaccess done in the kernel. > + * > + * This state needs to be carefully managed to ensure that it doesn't cause > + * uaccess to fail when setting up the signal frame, and the signal handler > + * itself also expects a well-defined state when entered. > + */ > +struct unpriv_access_state { > + u64 por_el0; > +}; > + > #define TERMINATOR_SIZE round_up(sizeof(struct _aarch64_ctx), 16) > #define EXTRA_CONTEXT_SIZE round_up(sizeof(struct extra_context), 16) > > +/* > + * Save the unpriv access state into ua_state and reset it to disable any > + * restrictions. > + */ > +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) Would _user_ be more consistent naming than _unpriv_ ? Same elsewhere. > +{ > + if (system_supports_poe()) { > + /* > + * Enable all permissions in all 8 keys > + * (inspired by REPEAT_BYTE()) > + */ > + u64 por_enable_all = (~0u / POE_MASK) * POE_RXW; Yikes! Seriously though, why are we granting permissions that the signal handler isn't itself going to have over its own stack? I think the logical thing to do is to think of the write/read of the signal frame as being done on behalf of the signal handler, so the permissions should be those we're going to give the signal handler: not less, and (so far as we can approximate) not more. > + > + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); > + write_sysreg_s(por_enable_all, SYS_POR_EL0); > + /* Ensure that any subsequent uaccess observes the updated value */ > + isb(); > + } > +} > + > +/* > + * Set the unpriv access state for invoking the signal handler. > + * > + * No uaccess should be done after that function is called. > + */ > +static void set_handler_unpriv_access_state(void) > +{ > + if (system_supports_poe()) > + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > + Spurious blank line? > +} [...] > @@ -1252,9 +1310,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > { > struct rt_sigframe_user_layout user; > struct rt_sigframe __user *frame; > + struct unpriv_access_state ua_state; > int err = 0; > > fpsimd_signal_preserve_current_state(); > + save_reset_unpriv_access_state(&ua_state); (Trivial nit: maybe put the blank line before this rather than after? This has nothing to do with "settling" the kernel's internal context switch state, and a lot to do with generaing the signal frame...) > > if (get_sigframe(&user, ksig, regs)) > return 1; [...] > @@ -1273,6 +1333,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > regs->regs[1] = (unsigned long)&frame->info; > regs->regs[2] = (unsigned long)&frame->uc; > } > + set_handler_unpriv_access_state(); This bit feels prematurely factored? We don't have separate functions for the other low-level preparation done here... It works either way though, and I don't have a strong view. Overall, this all looks reasonable. Cheers ---Dave
On 17/10/2024 17:53, Dave Martin wrote: > [...] >> +/* >> + * Save the unpriv access state into ua_state and reset it to disable any >> + * restrictions. >> + */ >> +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) > Would _user_ be more consistent naming than _unpriv_ ? I did ponder on the naming. I considered user_access/uaccess instead of unpriv_access, but my concern is that it might imply that only uaccess is concerned, while in reality loads/stores that userspace itself executes are impacted too. I thought using the "unpriv" terminology from the Arm ARM (used for stage 1 permissions) might avoid such misunderstanding. I'm interested to hear opinions on this, maybe accuracy sacrifices readability. > Same elsewhere. > >> +{ >> + if (system_supports_poe()) { >> + /* >> + * Enable all permissions in all 8 keys >> + * (inspired by REPEAT_BYTE()) >> + */ >> + u64 por_enable_all = (~0u / POE_MASK) * POE_RXW; > Yikes! > > Seriously though, why are we granting permissions that the signal > handler isn't itself going to have over its own stack? > > I think the logical thing to do is to think of the write/read of the > signal frame as being done on behalf of the signal handler, so the > permissions should be those we're going to give the signal handler: > not less, and (so far as we can approximate) not more. Will continue that discussion on the cover letter. > >> + >> + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); >> + write_sysreg_s(por_enable_all, SYS_POR_EL0); >> + /* Ensure that any subsequent uaccess observes the updated value */ >> + isb(); >> + } >> +} >> + >> +/* >> + * Set the unpriv access state for invoking the signal handler. >> + * >> + * No uaccess should be done after that function is called. >> + */ >> +static void set_handler_unpriv_access_state(void) >> +{ >> + if (system_supports_poe()) >> + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); >> + > Spurious blank line? Thanks! >> +} > [...] > >> @@ -1252,9 +1310,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, >> { >> struct rt_sigframe_user_layout user; >> struct rt_sigframe __user *frame; >> + struct unpriv_access_state ua_state; >> int err = 0; >> >> fpsimd_signal_preserve_current_state(); >> + save_reset_unpriv_access_state(&ua_state); > (Trivial nit: maybe put the blank line before this rather than after? > This has nothing to do with "settling" the kernel's internal context > switch state, and a lot to do with generaing the signal frame...) In fact considering the concern Catalin brought up with POR_EL0 being reset even when we fail to deliver the signal [1], I'm realising this call should be moved after get_sigframe(), since the latter doesn't use uaccess and can fail. [1] https://lore.kernel.org/linux-arm-kernel/Zw6D2waVyIwYE7wd@arm.com/ >> >> if (get_sigframe(&user, ksig, regs)) >> return 1; > [...] > >> @@ -1273,6 +1333,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, >> regs->regs[1] = (unsigned long)&frame->info; >> regs->regs[2] = (unsigned long)&frame->uc; >> } >> + set_handler_unpriv_access_state(); > This bit feels prematurely factored? We don't have separate functions > for the other low-level preparation done here... I preferred to have a consistent API for all manipulations of POR_EL0, the idea being that if more registers are added to struct unpriv_access_state, only the *unpriv_access* helpers need to be amended. > It works either way though, and I don't have a strong view. > > Overall, this all looks reasonable. Thanks for the review! Kevin
On Mon, Oct 21, 2024 at 12:06:07PM +0200, Kevin Brodsky wrote: > On 17/10/2024 17:53, Dave Martin wrote: > > [...] > >> +/* > >> + * Save the unpriv access state into ua_state and reset it to disable any > >> + * restrictions. > >> + */ > >> +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) > > Would _user_ be more consistent naming than _unpriv_ ? > > I did ponder on the naming. I considered user_access/uaccess instead of > unpriv_access, but my concern is that it might imply that only uaccess > is concerned, while in reality loads/stores that userspace itself > executes are impacted too. I thought using the "unpriv" terminology from > the Arm ARM (used for stage 1 permissions) might avoid such > misunderstanding. I'm interested to hear opinions on this, maybe > accuracy sacrifices readability. "user_access" seemed natural to me: it parses equally as "[user access]" (i.e., uaccess) and "[user] access" (i.e., access by, to, or on behalf of user(space)). Introducing an architectural term when there is already a generic OS and Linux kernel term that means the right thing seemed not to improve readability, but I guess it's a matter of opinion. Anyway, it doesn't really matter. > > > Same elsewhere. > > > >> +{ > >> + if (system_supports_poe()) { > >> + /* > >> + * Enable all permissions in all 8 keys > >> + * (inspired by REPEAT_BYTE()) > >> + */ > >> + u64 por_enable_all = (~0u / POE_MASK) * POE_RXW; > > Yikes! > > > > Seriously though, why are we granting permissions that the signal > > handler isn't itself going to have over its own stack? > > > > I think the logical thing to do is to think of the write/read of the > > signal frame as being done on behalf of the signal handler, so the > > permissions should be those we're going to give the signal handler: > > not less, and (so far as we can approximate) not more. > > Will continue that discussion on the cover letter. > > > > >> + > >> + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); > >> + write_sysreg_s(por_enable_all, SYS_POR_EL0); > >> + /* Ensure that any subsequent uaccess observes the updated value */ > >> + isb(); > >> + } > >> +} > >> + > >> +/* > >> + * Set the unpriv access state for invoking the signal handler. > >> + * > >> + * No uaccess should be done after that function is called. > >> + */ > >> +static void set_handler_unpriv_access_state(void) > >> +{ > >> + if (system_supports_poe()) > >> + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > >> + > > Spurious blank line? > > Thanks! > > >> +} > > [...] > > > >> @@ -1252,9 +1310,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > >> { > >> struct rt_sigframe_user_layout user; > >> struct rt_sigframe __user *frame; > >> + struct unpriv_access_state ua_state; > >> int err = 0; > >> > >> fpsimd_signal_preserve_current_state(); > >> + save_reset_unpriv_access_state(&ua_state); > > (Trivial nit: maybe put the blank line before this rather than after? > > This has nothing to do with "settling" the kernel's internal context > > switch state, and a lot to do with generaing the signal frame...) > > In fact considering the concern Catalin brought up with POR_EL0 being > reset even when we fail to deliver the signal [1], I'm realising this > call should be moved after get_sigframe(), since the latter doesn't use > uaccess and can fail. > > [1] https://lore.kernel.org/linux-arm-kernel/Zw6D2waVyIwYE7wd@arm.com/ > > >> > >> if (get_sigframe(&user, ksig, regs)) > >> return 1; > > [...] ^ Ah, good point. The save_reset_unpriv_access_state(&ua_state) call probably belong just before the first __put_user() then. > > > >> @@ -1273,6 +1333,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > >> regs->regs[1] = (unsigned long)&frame->info; > >> regs->regs[2] = (unsigned long)&frame->uc; > >> } > >> + set_handler_unpriv_access_state(); > > This bit feels prematurely factored? We don't have separate functions > > for the other low-level preparation done here... > > I preferred to have a consistent API for all manipulations of POR_EL0, > the idea being that if more registers are added to struct > unpriv_access_state, only the *unpriv_access* helpers need to be amended. Certainly if that struct grows more state, then the factoring will help in future. I wasn't clear on how we expect this all to evolve. Either way, this is basically a non-issue, and keeping the symmetry is probably a good idea. Cheers ---Dave
Hi, Just in case the reply I thought I'd sent to this evaporated (or I imagined it): On Mon, Oct 21, 2024 at 12:06:07PM +0200, Kevin Brodsky wrote: > On 17/10/2024 17:53, Dave Martin wrote: > > [...] > >> +/* > >> + * Save the unpriv access state into ua_state and reset it to disable any > >> + * restrictions. > >> + */ > >> +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) > > Would _user_ be more consistent naming than _unpriv_ ? > > I did ponder on the naming. I considered user_access/uaccess instead of > unpriv_access, but my concern is that it might imply that only uaccess > is concerned, while in reality loads/stores that userspace itself > executes are impacted too. I thought using the "unpriv" terminology from > the Arm ARM (used for stage 1 permissions) might avoid such > misunderstanding. I'm interested to hear opinions on this, maybe > accuracy sacrifices readability. > > > Same elsewhere. I think "user" covers these meanings, though including the word "access" makes it sound like this is specific to uaccess. Maybe something like: save_reset_user_permissions() restore_user_permissions() would make sense? (But again, it's not a big deal.) > > > >> +{ > >> + if (system_supports_poe()) { > >> + /* > >> + * Enable all permissions in all 8 keys > >> + * (inspired by REPEAT_BYTE()) > >> + */ > >> + u64 por_enable_all = (~0u / POE_MASK) * POE_RXW; > > Yikes! > > > > Seriously though, why are we granting permissions that the signal > > handler isn't itself going to have over its own stack? > > > > I think the logical thing to do is to think of the write/read of the > > signal frame as being done on behalf of the signal handler, so the > > permissions should be those we're going to give the signal handler: > > not less, and (so far as we can approximate) not more. > > Will continue that discussion on the cover letter. > > > > >> + > >> + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); > >> + write_sysreg_s(por_enable_all, SYS_POR_EL0); > >> + /* Ensure that any subsequent uaccess observes the updated value */ > >> + isb(); > >> + } > >> +} > >> + > >> +/* > >> + * Set the unpriv access state for invoking the signal handler. > >> + * > >> + * No uaccess should be done after that function is called. > >> + */ > >> +static void set_handler_unpriv_access_state(void) > >> +{ > >> + if (system_supports_poe()) > >> + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > >> + > > Spurious blank line? > > Thanks! > > >> +} > > [...] > > > >> @@ -1252,9 +1310,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > >> { > >> struct rt_sigframe_user_layout user; > >> struct rt_sigframe __user *frame; > >> + struct unpriv_access_state ua_state; > >> int err = 0; > >> > >> fpsimd_signal_preserve_current_state(); > >> + save_reset_unpriv_access_state(&ua_state); > > (Trivial nit: maybe put the blank line before this rather than after? > > This has nothing to do with "settling" the kernel's internal context > > switch state, and a lot to do with generaing the signal frame...) > > In fact considering the concern Catalin brought up with POR_EL0 being > reset even when we fail to deliver the signal [1], I'm realising this > call should be moved after get_sigframe(), since the latter doesn't use > uaccess and can fail. Good point... > [1] https://lore.kernel.org/linux-arm-kernel/Zw6D2waVyIwYE7wd@arm.com/ > > >> > >> if (get_sigframe(&user, ksig, regs)) > >> return 1; > > [...] I guess the call can be pushed to just before the first __put_user(), after here? > > > >> @@ -1273,6 +1333,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, > >> regs->regs[1] = (unsigned long)&frame->info; > >> regs->regs[2] = (unsigned long)&frame->uc; > >> } > >> + set_handler_unpriv_access_state(); > > This bit feels prematurely factored? We don't have separate functions > > for the other low-level preparation done here... > > I preferred to have a consistent API for all manipulations of POR_EL0, > the idea being that if more registers are added to struct > unpriv_access_state, only the *unpriv_access* helpers need to be amended. > > > It works either way though, and I don't have a strong view. > > > > Overall, this all looks reasonable. Keeping the symmetry seems generally a good idea, especially if we expect that struct to grow more state over time. I wasn't sure how we anticipiate this evolving. [...] Cheers ---Dave
On 21/10/2024 15:43, Dave Martin wrote: > On Mon, Oct 21, 2024 at 12:06:07PM +0200, Kevin Brodsky wrote: >> On 17/10/2024 17:53, Dave Martin wrote: >>> [...] >>>> +/* >>>> + * Save the unpriv access state into ua_state and reset it to disable any >>>> + * restrictions. >>>> + */ >>>> +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) >>> Would _user_ be more consistent naming than _unpriv_ ? >> I did ponder on the naming. I considered user_access/uaccess instead of >> unpriv_access, but my concern is that it might imply that only uaccess >> is concerned, while in reality loads/stores that userspace itself >> executes are impacted too. I thought using the "unpriv" terminology from >> the Arm ARM (used for stage 1 permissions) might avoid such >> misunderstanding. I'm interested to hear opinions on this, maybe >> accuracy sacrifices readability. > "user_access" seemed natural to me: it parses equally as "[user > access]" (i.e., uaccess) and "[user] access" (i.e., access by, to, or > on behalf of user(space)). > > Introducing an architectural term when there is already a generic OS > and Linux kernel term that means the right thing seemed not to improve > readability, but I guess it's a matter of opinion. Both good points. "user_access" seems to strike the right balance, plus it's slightly shorter. Will switch to that naming in v2. Kevin
Hi, On Tue, Oct 22, 2024 at 02:34:09PM +0200, Kevin Brodsky wrote: > On 21/10/2024 15:43, Dave Martin wrote: > > On Mon, Oct 21, 2024 at 12:06:07PM +0200, Kevin Brodsky wrote: > >> On 17/10/2024 17:53, Dave Martin wrote: > >>> [...] > >>>> +/* > >>>> + * Save the unpriv access state into ua_state and reset it to disable any > >>>> + * restrictions. > >>>> + */ > >>>> +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) > >>> Would _user_ be more consistent naming than _unpriv_ ? > >> I did ponder on the naming. I considered user_access/uaccess instead of > >> unpriv_access, but my concern is that it might imply that only uaccess > >> is concerned, while in reality loads/stores that userspace itself > >> executes are impacted too. I thought using the "unpriv" terminology from > >> the Arm ARM (used for stage 1 permissions) might avoid such > >> misunderstanding. I'm interested to hear opinions on this, maybe > >> accuracy sacrifices readability. > > "user_access" seemed natural to me: it parses equally as "[user > > access]" (i.e., uaccess) and "[user] access" (i.e., access by, to, or > > on behalf of user(space)). > > > > Introducing an architectural term when there is already a generic OS > > and Linux kernel term that means the right thing seemed not to improve > > readability, but I guess it's a matter of opinion. > > Both good points. "user_access" seems to strike the right balance, plus > it's slightly shorter. Will switch to that naming in v2. Suits me (wasn't sure I was going to win that one actually!) Cheers ---Dave
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index f5fb48dabebe..3548146084b3 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -66,9 +66,64 @@ struct rt_sigframe_user_layout { unsigned long end_offset; }; +/* + * Holds any EL0-controlled state that influences unprivileged memory accesses. + * This includes both accesses done in userspace and uaccess done in the kernel. + * + * This state needs to be carefully managed to ensure that it doesn't cause + * uaccess to fail when setting up the signal frame, and the signal handler + * itself also expects a well-defined state when entered. + */ +struct unpriv_access_state { + u64 por_el0; +}; + #define TERMINATOR_SIZE round_up(sizeof(struct _aarch64_ctx), 16) #define EXTRA_CONTEXT_SIZE round_up(sizeof(struct extra_context), 16) +/* + * Save the unpriv access state into ua_state and reset it to disable any + * restrictions. + */ +static void save_reset_unpriv_access_state(struct unpriv_access_state *ua_state) +{ + if (system_supports_poe()) { + /* + * Enable all permissions in all 8 keys + * (inspired by REPEAT_BYTE()) + */ + u64 por_enable_all = (~0u / POE_MASK) * POE_RXW; + + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); + write_sysreg_s(por_enable_all, SYS_POR_EL0); + /* Ensure that any subsequent uaccess observes the updated value */ + isb(); + } +} + +/* + * Set the unpriv access state for invoking the signal handler. + * + * No uaccess should be done after that function is called. + */ +static void set_handler_unpriv_access_state(void) +{ + if (system_supports_poe()) + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); + +} + +/* + * Restore the unpriv access state to the values saved in ua_state. + * + * No uaccess should be done after that function is called. + */ +static void restore_unpriv_access_state(const struct unpriv_access_state *ua_state) +{ + if (system_supports_poe()) + write_sysreg_s(ua_state->por_el0, SYS_POR_EL0); +} + static void init_user_layout(struct rt_sigframe_user_layout *user) { const size_t reserved_size = @@ -260,18 +315,20 @@ static int restore_fpmr_context(struct user_ctxs *user) return err; } -static int preserve_poe_context(struct poe_context __user *ctx) +static int preserve_poe_context(struct poe_context __user *ctx, + const struct unpriv_access_state *ua_state) { int err = 0; __put_user_error(POE_MAGIC, &ctx->head.magic, err); __put_user_error(sizeof(*ctx), &ctx->head.size, err); - __put_user_error(read_sysreg_s(SYS_POR_EL0), &ctx->por_el0, err); + __put_user_error(ua_state->por_el0, &ctx->por_el0, err); return err; } -static int restore_poe_context(struct user_ctxs *user) +static int restore_poe_context(struct user_ctxs *user, + struct unpriv_access_state *ua_state) { u64 por_el0; int err = 0; @@ -281,7 +338,7 @@ static int restore_poe_context(struct user_ctxs *user) __get_user_error(por_el0, &(user->poe->por_el0), err); if (!err) - write_sysreg_s(por_el0, SYS_POR_EL0); + ua_state->por_el0 = por_el0; return err; } @@ -849,7 +906,8 @@ static int parse_user_sigframe(struct user_ctxs *user, } static int restore_sigframe(struct pt_regs *regs, - struct rt_sigframe __user *sf) + struct rt_sigframe __user *sf, + struct unpriv_access_state *ua_state) { sigset_t set; int i, err; @@ -898,7 +956,7 @@ static int restore_sigframe(struct pt_regs *regs, err = restore_zt_context(&user); if (err == 0 && system_supports_poe() && user.poe) - err = restore_poe_context(&user); + err = restore_poe_context(&user, ua_state); return err; } @@ -907,6 +965,7 @@ SYSCALL_DEFINE0(rt_sigreturn) { struct pt_regs *regs = current_pt_regs(); struct rt_sigframe __user *frame; + struct unpriv_access_state ua_state; /* Always make any pending restarted system calls return -EINTR */ current->restart_block.fn = do_no_restart_syscall; @@ -923,12 +982,14 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!access_ok(frame, sizeof (*frame))) goto badframe; - if (restore_sigframe(regs, frame)) + if (restore_sigframe(regs, frame, &ua_state)) goto badframe; if (restore_altstack(&frame->uc.uc_stack)) goto badframe; + restore_unpriv_access_state(&ua_state); + return regs->regs[0]; badframe: @@ -1034,7 +1095,8 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user, } static int setup_sigframe(struct rt_sigframe_user_layout *user, - struct pt_regs *regs, sigset_t *set) + struct pt_regs *regs, sigset_t *set, + const struct unpriv_access_state *ua_state) { int i, err = 0; struct rt_sigframe __user *sf = user->sigframe; @@ -1096,10 +1158,9 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, struct poe_context __user *poe_ctx = apply_user_offset(user, user->poe_offset); - err |= preserve_poe_context(poe_ctx); + err |= preserve_poe_context(poe_ctx, ua_state); } - /* ZA state if present */ if (system_supports_sme() && err == 0 && user->za_offset) { struct za_context __user *za_ctx = @@ -1236,9 +1297,6 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, sme_smstop(); } - if (system_supports_poe()) - write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); - if (ka->sa.sa_flags & SA_RESTORER) sigtramp = ka->sa.sa_restorer; else @@ -1252,9 +1310,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, { struct rt_sigframe_user_layout user; struct rt_sigframe __user *frame; + struct unpriv_access_state ua_state; int err = 0; fpsimd_signal_preserve_current_state(); + save_reset_unpriv_access_state(&ua_state); if (get_sigframe(&user, ksig, regs)) return 1; @@ -1265,7 +1325,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, __put_user_error(NULL, &frame->uc.uc_link, err); err |= __save_altstack(&frame->uc.uc_stack, regs->sp); - err |= setup_sigframe(&user, regs, set); + err |= setup_sigframe(&user, regs, set, &ua_state); if (err == 0) { setup_return(regs, &ksig->ka, &user, usig); if (ksig->ka.sa.sa_flags & SA_SIGINFO) { @@ -1273,6 +1333,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, regs->regs[1] = (unsigned long)&frame->info; regs->regs[2] = (unsigned long)&frame->uc; } + set_handler_unpriv_access_state(); } return err;
TL;DR: reset POR_EL0 to "allow all" before writing the signal frame, preventing spurious uaccess failures. When POE is supported, the POR_EL0 register constrains memory accesses based on the target page's POIndex (pkey). This raises the question: what constraints should apply to a signal handler? The current answer is that POR_EL0 is reset to POR_EL0_INIT when invoking the handler, giving it full access to POIndex 0. This is in line with x86's MPK support and remains unchanged. This is only part of the story, though. POR_EL0 constrains all unprivileged memory accesses, meaning that uaccess routines such as put_user() are also impacted. As a result POR_EL0 may prevent the signal frame from being written to the signal stack (ultimately causing a SIGSEGV). This is especially concerning when an alternate signal stack is used, because userspace may want to prevent access to it outside of signal handlers. There is currently no provision for that: POR_EL0 is reset after writing to the stack, and POR_EL0_INIT only enables access to POIndex 0. This patch ensures that POR_EL0 is reset to its most permissive state before the signal stack is accessed. Once the signal frame has been fully written, POR_EL0 is still set to POR_EL0_INIT - it is up to the signal handler to enable access to additional pkeys if needed. As to sigreturn(), it expects having access to the stack like any other syscall; we only need to ensure that POR_EL0 is restored from the signal frame after all uaccess calls. This approach is in line with the recent x86/pkeys series [1]. Resetting POR_EL0 early introduces some complications, in that we can no longer read the register directly in preserve_poe_context(). This is addressed by introducing a struct (unpriv_access_state) and helpers to manage any such register impacting uaccess. Things look like this on signal delivery: 1. Save original POR_EL0 into struct [save_reset_unpriv_access_state()] 2. Set POR_EL0 to "allow all" [save_reset_unpriv_access_state()] 3. Create signal frame 4. Write saved POR_EL0 value to the signal frame [preserve_poe_context()] 5. Finalise signal frame 6. Set POR_EL0 to POR_EL0_INIT [set_handler_unpriv_access_state()] The return path (sys_rt_sigreturn) doesn't strictly require any change since restore_poe_context() is already called last. However, to avoid uaccess calls being accidentally added after that point, we use the same approach as in the delivery path, i.e. separating uaccess from writing to the register: 1. Read saved POR_EL0 value from the signal frame [restore_poe_context()] 2. Set POR_EL0 to the saved value [restore_unpriv_access_state()] [1] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/ Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> --- arch/arm64/kernel/signal.c | 89 ++++++++++++++++++++++++++++++++------ 1 file changed, 75 insertions(+), 14 deletions(-)