mbox series

[0/5] Improve arm64 pkeys handling in signal delivery

Message ID 20241017133909.3837547-1-kevin.brodsky@arm.com (mailing list archive)
Headers show
Series Improve arm64 pkeys handling in signal delivery | expand

Message

Kevin Brodsky Oct. 17, 2024, 1:39 p.m. UTC
This series is a follow-up to Joey's Permission Overlay Extension (POE)
series [1] that recently landed on mainline. The goal is to improve the
way we handle the register that governs which pkeys/POIndex are
accessible (POR_EL0) during signal delivery. As things stand, we may
unexpectedly fail to write the signal frame on the stack because POR_EL0
is not reset before the uaccess operations. See patch 3 for more details
and the main changes this series brings.

A similar series landed recently for x86/MPK [2]; the present series
aims at aligning arm64 with x86. Worth noting: once the signal frame is
written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
only. This means that a program that sets up an alternate signal stack
with a non-zero pkey will need some assembly trampoline to set POR_EL0
before invoking the real signal handler, as discussed here [3].

The x86 series also added kselftests to ensure that no spurious SIGSEGV
occurs during signal delivery regardless of which pkey is accessible at
the point where the signal is delivered. This series adapts those
kselftests to allow running them on arm64 (patch 4-5).

Finally patch 2 is a clean-up following feedback on Joey's series [4].

I have tested this series on arm64 and x86_64 (booting and running the
protection_keys and pkey_sighandler_tests mm kselftests).

- Kevin

[1] https://lore.kernel.org/linux-arm-kernel/20240822151113.1479789-1-joey.gouly@arm.com/
[2] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/
[3] https://lore.kernel.org/lkml/CABi2SkWxNkP2O7ipkP67WKz0-LV33e5brReevTTtba6oKUfHRw@mail.gmail.com/
[4] https://lore.kernel.org/linux-arm-kernel/20241015114116.GA19334@willie-the-truck/

Cc: akpm@linux-foundation.org
Cc: anshuman.khandual@arm.com
Cc: aruna.ramakrishna@oracle.com
Cc: broonie@kernel.org
Cc: catalin.marinas@arm.com
Cc: dave.hansen@linux.intel.com
Cc: dave.martin@arm.com
Cc: jeffxu@chromium.org
Cc: joey.gouly@arm.com
Cc: shuah@kernel.org
Cc: will@kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: x86@kernel.org

Kevin Brodsky (5):
  arm64: signal: Remove unused macro
  arm64: signal: Remove unnecessary check when saving POE state
  arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
  selftests/mm: Use generic pkey register manipulation
  selftests/mm: Enable pkey_sighandler_tests on arm64

 arch/arm64/kernel/signal.c                    |  92 +++++++++++++---
 tools/testing/selftests/mm/Makefile           |   8 +-
 tools/testing/selftests/mm/pkey-arm64.h       |   1 +
 tools/testing/selftests/mm/pkey-x86.h         |   2 +
 .../selftests/mm/pkey_sighandler_tests.c      | 101 +++++++++++++-----
 5 files changed, 159 insertions(+), 45 deletions(-)

Comments

Dave Martin Oct. 17, 2024, 3:48 p.m. UTC | #1
On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> series [1] that recently landed on mainline. The goal is to improve the
> way we handle the register that governs which pkeys/POIndex are
> accessible (POR_EL0) during signal delivery. As things stand, we may
> unexpectedly fail to write the signal frame on the stack because POR_EL0
> is not reset before the uaccess operations. See patch 3 for more details
> and the main changes this series brings.
> 
> A similar series landed recently for x86/MPK [2]; the present series
> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> only. This means that a program that sets up an alternate signal stack
> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> before invoking the real signal handler, as discussed here [3].

This feels a bit bogus (though it's anyway orthogonal to this series).

Really, we want some way for userspace to tell the kernel what
permissions to use for the alternate signal stack and signal handlers
using it, and then honour that request consistently (just as we try to
do for the main stack today).

ss_flags is mostly unused... I wonder whether we could add something in
there?  Or add a sigaltstack2()?

[...]

Cheers
---Dave
Kevin Brodsky Oct. 21, 2024, 10:06 a.m. UTC | #2
On 17/10/2024 17:48, Dave Martin wrote:
> On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
>> This series is a follow-up to Joey's Permission Overlay Extension (POE)
>> series [1] that recently landed on mainline. The goal is to improve the
>> way we handle the register that governs which pkeys/POIndex are
>> accessible (POR_EL0) during signal delivery. As things stand, we may
>> unexpectedly fail to write the signal frame on the stack because POR_EL0
>> is not reset before the uaccess operations. See patch 3 for more details
>> and the main changes this series brings.
>>
>> A similar series landed recently for x86/MPK [2]; the present series
>> aims at aligning arm64 with x86. Worth noting: once the signal frame is
>> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
>> only. This means that a program that sets up an alternate signal stack
>> with a non-zero pkey will need some assembly trampoline to set POR_EL0
>> before invoking the real signal handler, as discussed here [3].
> This feels a bit bogus (though it's anyway orthogonal to this series).

I'm not very fond of this either. However I believe this is the correct
first step: bring arm64 in line with x86. Removing all restrictions
before uaccess and then setting POR_EL0 to POR_EL0_INIT enables
userspace to use any pkey for the alternate signal stack without an ABI
change, albeit not in a very comfortable way (if the pkey is not 0).

> Really, we want some way for userspace to tell the kernel what
> permissions to use for the alternate signal stack and signal handlers
> using it, and then honour that request consistently (just as we try to
> do for the main stack today).
>
> ss_flags is mostly unused... I wonder whether we could add something in
> there?  Or add a sigaltstack2()?

Yes, this would be sensible as a second step (backwards-compatible
extension). Exactly how that API would look like is not trivial though:
is the pkey implicitly derived from the pointer provided to
sigaltstack()? Is there a need to specify another pkey for code, or do
we just assume that the signal handler is only using code with pkey 0?
(Not a concern on x86 as MPK doesn't restrict execution.) Would be very
interested to hear opinions on this.

Kevin
Dave Martin Oct. 21, 2024, 1:31 p.m. UTC | #3
On Mon, Oct 21, 2024 at 12:06:25PM +0200, Kevin Brodsky wrote:
> On 17/10/2024 17:48, Dave Martin wrote:
> > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> >> series [1] that recently landed on mainline. The goal is to improve the
> >> way we handle the register that governs which pkeys/POIndex are
> >> accessible (POR_EL0) during signal delivery. As things stand, we may
> >> unexpectedly fail to write the signal frame on the stack because POR_EL0
> >> is not reset before the uaccess operations. See patch 3 for more details
> >> and the main changes this series brings.
> >>
> >> A similar series landed recently for x86/MPK [2]; the present series
> >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> >> only. This means that a program that sets up an alternate signal stack
> >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> >> before invoking the real signal handler, as discussed here [3].
> > This feels a bit bogus (though it's anyway orthogonal to this series).
> 
> I'm not very fond of this either. However I believe this is the correct
> first step: bring arm64 in line with x86. Removing all restrictions
> before uaccess and then setting POR_EL0 to POR_EL0_INIT enables
> userspace to use any pkey for the alternate signal stack without an ABI
> change, albeit not in a very comfortable way (if the pkey is not 0).

I see: we try not to prevent userspace from using whatever pkey it
likes for the alternate signal stack, but we are only permissive for
the bare minimum operations that userspace can't possibly control for
itself (i.e., writing the signal frame).

This whole thing feels a bit of a botch, though.

Do we know of anyone actually using a sigaltstack with a pkey other
than 0?  Why the urgency?  Code relying on an asm shim on x86 is
already nonportable, unless I've misunderstood something, so simply
turning on arm64 pkeys support in the kernel and libc shouldn't break
anything today?  (At least, nothing that wasn't asking to be broken.)

> 
> > Really, we want some way for userspace to tell the kernel what
> > permissions to use for the alternate signal stack and signal handler
> > using it, and then honour that request consistently (just as we try to
> > do for the main stack today).
> >
> > ss_flags is mostly unused... I wonder whether we could add something in
> > there?  Or add a sigaltstack2()?
> 
> Yes, this would be sensible as a second step (backwards-compatible
> extension). Exactly how that API would look like is not trivial though:
> is the pkey implicitly derived from the pointer provided to
> sigaltstack()? Is there a need to specify another pkey for code, or do
> we just assume that the signal handler is only using code with pkey 0?
> (Not a concern on x86 as MPK doesn't restrict execution.) Would be very
> interested to hear opinions on this.
> 
> Kevin

I would vote for specifying the pkey (or, if feasible, PKRU or
modifications to it) in some bits of ss_flags, or in an additional
flags argument to sigaltstack2().

Memory with a non-zero pkey cannot be used 100% portably, period, and
having non-RW(X) permissions on pkey 0 at any time is also not
portable, period.  So I'm not sure that having libc magically guess
what userspace's pkeys policy is supposed to be based on racily digging
metadata out of /proc/self/maps or a cache of it etc. would be such a
good idea.

There are other ways to approach (or not approach) this though --
I would be interested to hear what other people think too...

Cheers
---Dave
Catalin Marinas Oct. 21, 2024, 3:30 p.m. UTC | #4
On Mon, Oct 21, 2024 at 02:31:08PM +0100, Dave P Martin wrote:
> > > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> > >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> > >> series [1] that recently landed on mainline. The goal is to improve the
> > >> way we handle the register that governs which pkeys/POIndex are
> > >> accessible (POR_EL0) during signal delivery. As things stand, we may
> > >> unexpectedly fail to write the signal frame on the stack because POR_EL0
> > >> is not reset before the uaccess operations. See patch 3 for more details
> > >> and the main changes this series brings.
> > >>
> > >> A similar series landed recently for x86/MPK [2]; the present series
> > >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> > >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> > >> only. This means that a program that sets up an alternate signal stack
> > >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> > >> before invoking the real signal handler, as discussed here [3].
[...]
> Memory with a non-zero pkey cannot be used 100% portably, period, and
> having non-RW(X) permissions on pkey 0 at any time is also not
> portable, period.  So I'm not sure that having libc magically guess
> what userspace's pkeys policy is supposed to be based on racily digging
> metadata out of /proc/self/maps or a cache of it etc. would be such a
> good idea.

I agree that changing RWX overlay permission for pkey 0 to anything else
is a really bad idea. We can't prevent it but we shouldn't actively try
to work around it in the kernel either. With the current signal ABI, I
don't think we should support anything other than pkey 0 for the stack.
Since the user shouldn't change the pkey 0 RWX overlay permission
anyway, I don't think we should reset POR_EL0 _prior_ to writing the
signal frame. The best we can do is document it somewhere.

So on patch 3 I'd only ensure that we have POR_EL0_INIT when invoking
the signal handler and not when performing the uaccess. If the uaccess
fails, we'd get a fatal SIGSEGV. The user may have got it already if it
made the stack read-only.

Currently the primary use of pkeys is for W^X and signal stacks
shouldn't fall into this category. If we ever have a strong case for
non-zero pkeys on the signal stack, we'll need to look into some new
ABI. I'm not sure about SS_* flags though, I think the signal POR_EL0
should be associated with the sigaction rather than the stack (the
latter would just be mapped by the user with the right pkey, the kernel
doesn't need to know which, only what POR_EL0 is needed by the handler).

Until such case turns up, I'd not put any effort into ABI improvements.
I can think of some light compartmentalisation where we have a pkey
that's "privileged" and all threads have a POR_EL0 that prevents access
to that pkey. The signal handler would have more permissive rights to
that privileged pkey. I'd not proactively add support for this though.
Will Deacon Oct. 21, 2024, 5:19 p.m. UTC | #5
On Mon, Oct 21, 2024 at 04:30:04PM +0100, Catalin Marinas wrote:
> On Mon, Oct 21, 2024 at 02:31:08PM +0100, Dave P Martin wrote:
> > > > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> > > >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> > > >> series [1] that recently landed on mainline. The goal is to improve the
> > > >> way we handle the register that governs which pkeys/POIndex are
> > > >> accessible (POR_EL0) during signal delivery. As things stand, we may
> > > >> unexpectedly fail to write the signal frame on the stack because POR_EL0
> > > >> is not reset before the uaccess operations. See patch 3 for more details
> > > >> and the main changes this series brings.
> > > >>
> > > >> A similar series landed recently for x86/MPK [2]; the present series
> > > >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> > > >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> > > >> only. This means that a program that sets up an alternate signal stack
> > > >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> > > >> before invoking the real signal handler, as discussed here [3].
> [...]
> > Memory with a non-zero pkey cannot be used 100% portably, period, and
> > having non-RW(X) permissions on pkey 0 at any time is also not
> > portable, period.  So I'm not sure that having libc magically guess
> > what userspace's pkeys policy is supposed to be based on racily digging
> > metadata out of /proc/self/maps or a cache of it etc. would be such a
> > good idea.
> 
> I agree that changing RWX overlay permission for pkey 0 to anything else
> is a really bad idea. We can't prevent it but we shouldn't actively try
> to work around it in the kernel either. With the current signal ABI, I
> don't think we should support anything other than pkey 0 for the stack.
> Since the user shouldn't change the pkey 0 RWX overlay permission
> anyway, I don't think we should reset POR_EL0 _prior_ to writing the
> signal frame. The best we can do is document it somewhere.
> 
> So on patch 3 I'd only ensure that we have POR_EL0_INIT when invoking
> the signal handler and not when performing the uaccess. If the uaccess
> fails, we'd get a fatal SIGSEGV. The user may have got it already if it
> made the stack read-only.

Hmm, but based on what Kevin's saying, this would mean actively choosing
a different ABI than what x86 is trying to get to.

> Currently the primary use of pkeys is for W^X and signal stacks
> shouldn't fall into this category. If we ever have a strong case for
> non-zero pkeys on the signal stack, we'll need to look into some new
> ABI. I'm not sure about SS_* flags though, I think the signal POR_EL0
> should be associated with the sigaction rather than the stack (the
> latter would just be mapped by the user with the right pkey, the kernel
> doesn't need to know which, only what POR_EL0 is needed by the handler).
> 
> Until such case turns up, I'd not put any effort into ABI improvements.

Kevin -- do you know what prompted x86 to want the pkey to be reset early
in signal delivery? Perhaps such a use-case already exists.

> I can think of some light compartmentalisation where we have a pkey
> that's "privileged" and all threads have a POR_EL0 that prevents access
> to that pkey. The signal handler would have more permissive rights to
> that privileged pkey. I'd not proactively add support for this though.

I'd not proactively diverge from other architectures, either :p

Will
Pierre Langlois Oct. 22, 2024, 9:31 a.m. UTC | #6
Dave Martin <Dave.Martin@arm.com> writes:

> On Mon, Oct 21, 2024 at 12:06:25PM +0200, Kevin Brodsky wrote:
>> On 17/10/2024 17:48, Dave Martin wrote:
>> > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
>> >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
>> >> series [1] that recently landed on mainline. The goal is to improve the
>> >> way we handle the register that governs which pkeys/POIndex are
>> >> accessible (POR_EL0) during signal delivery. As things stand, we may
>> >> unexpectedly fail to write the signal frame on the stack because POR_EL0
>> >> is not reset before the uaccess operations. See patch 3 for more details
>> >> and the main changes this series brings.
>> >>
>> >> A similar series landed recently for x86/MPK [2]; the present series
>> >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
>> >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
>> >> only. This means that a program that sets up an alternate signal stack
>> >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
>> >> before invoking the real signal handler, as discussed here [3].
>> > This feels a bit bogus (though it's anyway orthogonal to this series).
>> 
>> I'm not very fond of this either. However I believe this is the correct
>> first step: bring arm64 in line with x86. Removing all restrictions
>> before uaccess and then setting POR_EL0 to POR_EL0_INIT enables
>> userspace to use any pkey for the alternate signal stack without an ABI
>> change, albeit not in a very comfortable way (if the pkey is not 0).
>
> I see: we try not to prevent userspace from using whatever pkey it
> likes for the alternate signal stack, but we are only permissive for
> the bare minimum operations that userspace can't possibly control for
> itself (i.e., writing the signal frame).
>
> This whole thing feels a bit of a botch, though.
>
> Do we know of anyone actually using a sigaltstack with a pkey other
> than 0?  Why the urgency?  Code relying on an asm shim on x86 is
> already nonportable, unless I've misunderstood something, so simply
> turning on arm64 pkeys support in the kernel and libc shouldn't break
> anything today?  (At least, nothing that wasn't asking to be broken.)

As far as I know, Chrome plans on using a sigaltstack with a non-zero
pkey as part of the V8 CFI and W^X work [0][1][2]. IIUC that was is part
of the motivation for the x86 change. I don't know if it's urgent
though.

I added Stephen on CC who might be able to comment on the current state
of things in Chrome. I don't think the code that uses a pkey on a
sigaltstack is upstream yet.

[0]: https://v8.dev/blog/control-flow-integrity#signal-frame-corruption
[1]: https://lore.kernel.org/lkml/CAEAAPHa3g0QwU=DZ2zVCqTCSh-+n2TtVKrQ07LvpwDjQ-F09gA@mail.gmail.com/
[2]: https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo

>
>> 
>> > Really, we want some way for userspace to tell the kernel what
>> > permissions to use for the alternate signal stack and signal handler
>> > using it, and then honour that request consistently (just as we try to
>> > do for the main stack today).
>> >
>> > ss_flags is mostly unused... I wonder whether we could add something in
>> > there?  Or add a sigaltstack2()?
>> 
>> Yes, this would be sensible as a second step (backwards-compatible
>> extension). Exactly how that API would look like is not trivial though:
>> is the pkey implicitly derived from the pointer provided to
>> sigaltstack()? Is there a need to specify another pkey for code, or do
>> we just assume that the signal handler is only using code with pkey 0?
>> (Not a concern on x86 as MPK doesn't restrict execution.) Would be very
>> interested to hear opinions on this.

I hadn't considered setting a non-zero pkey for code, but it sounds
appealing.

The general goal, IIUC, is for signal handlers to run in an isolated
"context" using pkeys, in order to mitigate against an attacker trying
to corrupt the CPU state on the stack from another thread. Then use this
as a way to bypass any CFI mitigation, by setting an arbitrary PC and
registers.

So sigaltstack+pkey helps with isolating the stack. Then it's up to the
programmer to carefully write the signal handler code so it only uses
pkey-tagged data that other threads cannot corrupt in order to trick the
signal handler into writing to its own stack.

In this context, using a non-default pkey for code might be useful, in
order to differentiate between "valid" signal handlers and other
functions. It could help fend against an attacker being able to use
sigaction as a whole-function gadget to install any function as a signal
hander. Basically mitigating going from a limited CFI bypass to an
arbitrary CFI bypass.

That being said, regarding the kernel API, it might be possible to do
the above with this patch. We'd be using the proposed "assembly
prologues" that sets POR_EL0 as the first thing then continues to the
real signal handler. But if we can avoid those and directly ask the
kernel what POR_EL0 should be set to, it'd be simpler (and maybe safer).

>> 
>> Kevin
>
> I would vote for specifying the pkey (or, if feasible, PKRU or
> modifications to it) in some bits of ss_flags, or in an additional
> flags argument to sigaltstack2().
>
> Memory with a non-zero pkey cannot be used 100% portably, period, and
> having non-RW(X) permissions on pkey 0 at any time is also not
> portable, period.  So I'm not sure that having libc magically guess
> what userspace's pkeys policy is supposed to be based on racily digging
> metadata out of /proc/self/maps or a cache of it etc. would be such a
> good idea.
>
> There are other ways to approach (or not approach) this though --
> I would be interested to hear what other people think too...

Thinking about this, I'm not sure about tying this API to sigaltstack,
as this is about configuring the POR_EL0 register, which may control
more than the stack.

I actually have a concrete example of this in V8. There's a
SetDefaultPermissionsForSignalHandler [3] function that needs to be
called first thing on signal handlers to configure access to an
allocated non-zero key.  This is independent from having a pkey-tagged
sigaltstack or not, but I suppose later on it will need to be replaced
with assembly when the stack is no-longer accessible.

[3]: https://source.chromium.org/chromium/chromium/src/+/main:v8/include/v8-platform.h;l=665;drc=0abf23ec2a1bb475b1555790fdc72ef630a43c2a;bpv=1;bpt=1

Doing this via sigaction as Catalin suggested makes sense to me, but I'm
unsure how we express how POR_EL0 needs to be set solely using SA_*
flags. Are we able to add a new architecture-specific payload to
sigaction, or would that resort in a new syscall like sigaction2?

As an alternative, I was wondering if this would warrant a new syscall
like sigaltstack, but for CPU state.

Thanks,
Pierre
Stephen Röttger Oct. 22, 2024, 10:40 a.m. UTC | #7
On Tue, Oct 22, 2024 at 11:31 AM Pierre Langlois
<pierre.langlois@arm.com> wrote:
>
> Dave Martin <Dave.Martin@arm.com> writes:
>
> > On Mon, Oct 21, 2024 at 12:06:25PM +0200, Kevin Brodsky wrote:
> >> On 17/10/2024 17:48, Dave Martin wrote:
> >> > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> >> >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> >> >> series [1] that recently landed on mainline. The goal is to improve the
> >> >> way we handle the register that governs which pkeys/POIndex are
> >> >> accessible (POR_EL0) during signal delivery. As things stand, we may
> >> >> unexpectedly fail to write the signal frame on the stack because POR_EL0
> >> >> is not reset before the uaccess operations. See patch 3 for more details
> >> >> and the main changes this series brings.
> >> >>
> >> >> A similar series landed recently for x86/MPK [2]; the present series
> >> >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> >> >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> >> >> only. This means that a program that sets up an alternate signal stack
> >> >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> >> >> before invoking the real signal handler, as discussed here [3].
> >> > This feels a bit bogus (though it's anyway orthogonal to this series).
> >>
> >> I'm not very fond of this either. However I believe this is the correct
> >> first step: bring arm64 in line with x86. Removing all restrictions
> >> before uaccess and then setting POR_EL0 to POR_EL0_INIT enables
> >> userspace to use any pkey for the alternate signal stack without an ABI
> >> change, albeit not in a very comfortable way (if the pkey is not 0).
> >
> > I see: we try not to prevent userspace from using whatever pkey it
> > likes for the alternate signal stack, but we are only permissive for
> > the bare minimum operations that userspace can't possibly control for
> > itself (i.e., writing the signal frame).
> >
> > This whole thing feels a bit of a botch, though.
> >
> > Do we know of anyone actually using a sigaltstack with a pkey other
> > than 0?  Why the urgency?  Code relying on an asm shim on x86 is
> > already nonportable, unless I've misunderstood something, so simply
> > turning on arm64 pkeys support in the kernel and libc shouldn't break
> > anything today?  (At least, nothing that wasn't asking to be broken.)
>
> As far as I know, Chrome plans on using a sigaltstack with a non-zero
> pkey as part of the V8 CFI and W^X work [0][1][2]. IIUC that was is part
> of the motivation for the x86 change. I don't know if it's urgent
> though.
>
> I added Stephen on CC who might be able to comment on the current state
> of things in Chrome. I don't think the code that uses a pkey on a
> sigaltstack is upstream yet.

We don't have any code for this in Chrome, since I believe it's not
supported by the kernel yet.

> [0]: https://v8.dev/blog/control-flow-integrity#signal-frame-corruption
> [1]: https://lore.kernel.org/lkml/CAEAAPHa3g0QwU=DZ2zVCqTCSh-+n2TtVKrQ07LvpwDjQ-F09gA@mail.gmail.com/
> [2]: https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo
>
> >
> >>
> >> > Really, we want some way for userspace to tell the kernel what
> >> > permissions to use for the alternate signal stack and signal handler
> >> > using it, and then honour that request consistently (just as we try to
> >> > do for the main stack today).
> >> >
> >> > ss_flags is mostly unused... I wonder whether we could add something in
> >> > there?  Or add a sigaltstack2()?
> >>
> >> Yes, this would be sensible as a second step (backwards-compatible
> >> extension). Exactly how that API would look like is not trivial though:
> >> is the pkey implicitly derived from the pointer provided to
> >> sigaltstack()? Is there a need to specify another pkey for code, or do
> >> we just assume that the signal handler is only using code with pkey 0?
> >> (Not a concern on x86 as MPK doesn't restrict execution.) Would be very
> >> interested to hear opinions on this.
>
> I hadn't considered setting a non-zero pkey for code, but it sounds
> appealing.
>
> The general goal, IIUC, is for signal handlers to run in an isolated
> "context" using pkeys, in order to mitigate against an attacker trying
> to corrupt the CPU state on the stack from another thread. Then use this
> as a way to bypass any CFI mitigation, by setting an arbitrary PC and
> registers.

Right. We're mainly looking for a solution to protect the signal frame
against memory corruption.
I'm aware of two proposals on how to achieve this:
1) is using a pkey-protected sigaltstack, which requires a patchset
like [0] to allow xsave to write to the stack
2) is to store part of the sigframe on the shadow stack as Rick
Edgecombe proposed in [1]

[0] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/#t
[1] https://lore.kernel.org/lkml/2fb80876e286b4db8f9ef36bcce04bbf02af0de2.camel@intel.com/

> So sigaltstack+pkey helps with isolating the stack. Then it's up to the
> programmer to carefully write the signal handler code so it only uses
> pkey-tagged data that other threads cannot corrupt in order to trick the
> signal handler into writing to its own stack.
>
> In this context, using a non-default pkey for code might be useful, in
> order to differentiate between "valid" signal handlers and other
> functions. It could help fend against an attacker being able to use
> sigaction as a whole-function gadget to install any function as a signal
> hander. Basically mitigating going from a limited CFI bypass to an
> arbitrary CFI bypass.
>
> That being said, regarding the kernel API, it might be possible to do
> the above with this patch. We'd be using the proposed "assembly
> prologues" that sets POR_EL0 as the first thing then continues to the
> real signal handler. But if we can avoid those and directly ask the
> kernel what POR_EL0 should be set to, it'd be simpler (and maybe safer).
>
> >>
> >> Kevin
> >
> > I would vote for specifying the pkey (or, if feasible, PKRU or
> > modifications to it) in some bits of ss_flags, or in an additional
> > flags argument to sigaltstack2().
> >
> > Memory with a non-zero pkey cannot be used 100% portably, period, and
> > having non-RW(X) permissions on pkey 0 at any time is also not
> > portable, period.  So I'm not sure that having libc magically guess
> > what userspace's pkeys policy is supposed to be based on racily digging
> > metadata out of /proc/self/maps or a cache of it etc. would be such a
> > good idea.
> >
> > There are other ways to approach (or not approach) this though --
> > I would be interested to hear what other people think too...
>
> Thinking about this, I'm not sure about tying this API to sigaltstack,
> as this is about configuring the POR_EL0 register, which may control
> more than the stack.
>
> I actually have a concrete example of this in V8. There's a
> SetDefaultPermissionsForSignalHandler [3] function that needs to be
> called first thing on signal handlers to configure access to an
> allocated non-zero key.  This is independent from having a pkey-tagged
> sigaltstack or not, but I suppose later on it will need to be replaced
> with assembly when the stack is no-longer accessible.
>
> [3]: https://source.chromium.org/chromium/chromium/src/+/main:v8/include/v8-platform.h;l=665;drc=0abf23ec2a1bb475b1555790fdc72ef630a43c2a;bpv=1;bpt=1
>
> Doing this via sigaction as Catalin suggested makes sense to me, but I'm
> unsure how we express how POR_EL0 needs to be set solely using SA_*
> flags. Are we able to add a new architecture-specific payload to
> sigaction, or would that resort in a new syscall like sigaction2?
>
> As an alternative, I was wondering if this would warrant a new syscall
> like sigaltstack, but for CPU state.
>
> Thanks,
> Pierre
Catalin Marinas Oct. 22, 2024, 10:45 a.m. UTC | #8
On Mon, Oct 21, 2024 at 06:19:38PM +0100, Will Deacon wrote:
> On Mon, Oct 21, 2024 at 04:30:04PM +0100, Catalin Marinas wrote:
> > On Mon, Oct 21, 2024 at 02:31:08PM +0100, Dave P Martin wrote:
> > > > > On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
> > > > >> This series is a follow-up to Joey's Permission Overlay Extension (POE)
> > > > >> series [1] that recently landed on mainline. The goal is to improve the
> > > > >> way we handle the register that governs which pkeys/POIndex are
> > > > >> accessible (POR_EL0) during signal delivery. As things stand, we may
> > > > >> unexpectedly fail to write the signal frame on the stack because POR_EL0
> > > > >> is not reset before the uaccess operations. See patch 3 for more details
> > > > >> and the main changes this series brings.
> > > > >>
> > > > >> A similar series landed recently for x86/MPK [2]; the present series
> > > > >> aims at aligning arm64 with x86. Worth noting: once the signal frame is
> > > > >> written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
> > > > >> only. This means that a program that sets up an alternate signal stack
> > > > >> with a non-zero pkey will need some assembly trampoline to set POR_EL0
> > > > >> before invoking the real signal handler, as discussed here [3].
> > [...]
> > > Memory with a non-zero pkey cannot be used 100% portably, period, and
> > > having non-RW(X) permissions on pkey 0 at any time is also not
> > > portable, period.  So I'm not sure that having libc magically guess
> > > what userspace's pkeys policy is supposed to be based on racily digging
> > > metadata out of /proc/self/maps or a cache of it etc. would be such a
> > > good idea.
> > 
> > I agree that changing RWX overlay permission for pkey 0 to anything else
> > is a really bad idea. We can't prevent it but we shouldn't actively try
> > to work around it in the kernel either. With the current signal ABI, I
> > don't think we should support anything other than pkey 0 for the stack.
> > Since the user shouldn't change the pkey 0 RWX overlay permission
> > anyway, I don't think we should reset POR_EL0 _prior_ to writing the
> > signal frame. The best we can do is document it somewhere.
> > 
> > So on patch 3 I'd only ensure that we have POR_EL0_INIT when invoking
> > the signal handler and not when performing the uaccess. If the uaccess
> > fails, we'd get a fatal SIGSEGV. The user may have got it already if it
> > made the stack read-only.
> 
> Hmm, but based on what Kevin's saying, this would mean actively choosing
> a different ABI than what x86 is trying to get to.

I was more thinking of not relaxing the ABI further at this point in the
rc cycle rather than completely diverging (x86 did relax the ABI
subsequently to handle non-zero pkey sigaltstack).

> > Currently the primary use of pkeys is for W^X and signal stacks
> > shouldn't fall into this category. If we ever have a strong case for
> > non-zero pkeys on the signal stack, we'll need to look into some new
> > ABI. I'm not sure about SS_* flags though, I think the signal POR_EL0
> > should be associated with the sigaction rather than the stack (the
> > latter would just be mapped by the user with the right pkey, the kernel
> > doesn't need to know which, only what POR_EL0 is needed by the handler).
> > 
> > Until such case turns up, I'd not put any effort into ABI improvements.
> 
> Kevin -- do you know what prompted x86 to want the pkey to be reset early
> in signal delivery? Perhaps such a use-case already exists.

Given the email from Pierre with Chrome potentially using a sigaltstack
with a non-zero pkey, Kevin's patches (and the x86 changes) make more
sense. The question is whether we do this as a fix now or we leave the
relaxation for a subsequent kernel release. I guess we could squeeze it
now if we agree on the implementation.