arm64: Trap WFI executed in userspace

Message ID	20180807093326.5090-1-marc.zyngier@arm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Marc Zyngier <marc.zyngier@arm.com> To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH] arm64: Trap WFI executed in userspace Date: Tue, 7 Aug 2018 10:33:26 +0100 Message-Id: <20180807093326.5090-1-marc.zyngier@arm.com> Precedence: list Cc: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will.deacon@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
Series	arm64: Trap WFI executed in userspace \| expand arm64: Trap WFI executed in userspace

Marc Zyngier Aug. 7, 2018, 9:33 a.m. UTC

It recently came to light that userspace can execute WFI, and that
the arm64 kernel doesn trap this event. This sounds rather benign,
but the kernel should decide when it wants to wait for an interrupt,
and not userspace.

Let's trap WFI and treat it as a way to yield the CPU to another
process.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/esr.h    |  4 ++++
 arch/arm64/include/asm/sysreg.h |  4 ++--
 arch/arm64/kernel/entry.S       |  1 +
 arch/arm64/kernel/traps.c       | 12 ++++++++++++
 4 files changed, 19 insertions(+), 2 deletions(-)

Dave Martin Aug. 7, 2018, 10:05 a.m. UTC | #1

On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> It recently came to light that userspace can execute WFI, and that
> the arm64 kernel doesn trap this event. This sounds rather benign,
> but the kernel should decide when it wants to wait for an interrupt,
> and not userspace.
> 
> Let's trap WFI and treat it as a way to yield the CPU to another
> process.

This doesn't amount to a justification.

If the power controller is unexpectedly left in a bad state so that
WFI will do something nasty to a cpu that may enter userspace, then we
probably have bigger problems.

So, maybe it really is pretty harmless to let userspace execute this.

I can't think of a legitimate reason for userspace to execute WFI
however.  Userspace doesn't have interrupts under Linux, so it makes
no sense to wait for one.

Have we seen anybody using WFI in userspace?  It may be cleaner to
map this to SIGILL rather than be permissive and regret it later.

[...]

Cheers
---Dave

Marc Zyngier Aug. 7, 2018, 10:24 a.m. UTC | #2

On 07/08/18 11:05, Dave Martin wrote:
> On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
>> It recently came to light that userspace can execute WFI, and that
>> the arm64 kernel doesn trap this event. This sounds rather benign,
>> but the kernel should decide when it wants to wait for an interrupt,
>> and not userspace.
>>
>> Let's trap WFI and treat it as a way to yield the CPU to another
>> process.
> 
> This doesn't amount to a justification.
> 
> If the power controller is unexpectedly left in a bad state so that
> WFI will do something nasty to a cpu that may enter userspace, then we
> probably have bigger problems.
> 
> So, maybe it really is pretty harmless to let userspace execute this.

Or not. It is also a very good way for userspace to find out when an
interrupt gets delivered and start doing all kind of probing on the
kernel. The least the userspace knows about that, the better I feel.

> I can't think of a legitimate reason for userspace to execute WFI
> however.  Userspace doesn't have interrupts under Linux, so it makes
> no sense to wait for one.
> 
> Have we seen anybody using WFI in userspace?  It may be cleaner to
> map this to SIGILL rather than be permissive and regret it later.

I couldn't find any user, and I'm happy to just send userspace to hell
in that case. But it could also been said that since it was never
prevented, it is a de-facto ABI.

	M.

Dave Martin Aug. 7, 2018, 10:30 a.m. UTC | #3

On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> On 07/08/18 11:05, Dave Martin wrote:
> > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> >> It recently came to light that userspace can execute WFI, and that
> >> the arm64 kernel doesn trap this event. This sounds rather benign,
> >> but the kernel should decide when it wants to wait for an interrupt,
> >> and not userspace.
> >>
> >> Let's trap WFI and treat it as a way to yield the CPU to another
> >> process.
> > 
> > This doesn't amount to a justification.
> > 
> > If the power controller is unexpectedly left in a bad state so that
> > WFI will do something nasty to a cpu that may enter userspace, then we
> > probably have bigger problems.
> > 
> > So, maybe it really is pretty harmless to let userspace execute this.
> 
> Or not. It is also a very good way for userspace to find out when an
> interrupt gets delivered and start doing all kind of probing on the
> kernel. The least the userspace knows about that, the better I feel.

Possibly.  I suspect there are other ways to guess pretty accurately
when an interrupt occurs, but WFI allows greater precision.

> > I can't think of a legitimate reason for userspace to execute WFI
> > however.  Userspace doesn't have interrupts under Linux, so it makes
> > no sense to wait for one.
> > 
> > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > map this to SIGILL rather than be permissive and regret it later.
> 
> I couldn't find any user, and I'm happy to just send userspace to hell
> in that case. But it could also been said that since it was never
> prevented, it is a de-facto ABI.

Agreed.  I wonder whether it's sufficient to have this mapping to SIGILL
in -next for a while and see whether anybody complains.

Cheers
---Dave

Robin Murphy Aug. 7, 2018, 12:12 p.m. UTC | #4

On 07/08/18 11:30, Dave Martin wrote:
> On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
>> On 07/08/18 11:05, Dave Martin wrote:
>>> On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
>>>> It recently came to light that userspace can execute WFI, and that
>>>> the arm64 kernel doesn trap this event. This sounds rather benign,
>>>> but the kernel should decide when it wants to wait for an interrupt,
>>>> and not userspace.
>>>>
>>>> Let's trap WFI and treat it as a way to yield the CPU to another
>>>> process.
>>>
>>> This doesn't amount to a justification.
>>>
>>> If the power controller is unexpectedly left in a bad state so that
>>> WFI will do something nasty to a cpu that may enter userspace, then we
>>> probably have bigger problems.
>>>
>>> So, maybe it really is pretty harmless to let userspace execute this.
>>
>> Or not. It is also a very good way for userspace to find out when an
>> interrupt gets delivered and start doing all kind of probing on the
>> kernel. The least the userspace knows about that, the better I feel.
> 
> Possibly.  I suspect there are other ways to guess pretty accurately
> when an interrupt occurs, but WFI allows greater precision.

...unless you're running in a VM and it traps to KVM anyway ;)

>>> I can't think of a legitimate reason for userspace to execute WFI
>>> however.  Userspace doesn't have interrupts under Linux, so it makes
>>> no sense to wait for one.
>>>
>>> Have we seen anybody using WFI in userspace?  It may be cleaner to
>>> map this to SIGILL rather than be permissive and regret it later.
>>
>> I couldn't find any user, and I'm happy to just send userspace to hell
>> in that case. But it could also been said that since it was never
>> prevented, it is a de-facto ABI.
> 
> Agreed.  I wonder whether it's sufficient to have this mapping to SIGILL
> in -next for a while and see whether anybody complains.

I think we'd have to avoid that for compat, though, since v7 code would 
have the expectation that WFI can't be trapped by the kernel at all. 
Personally I'm in favour of this patch as-is, since the architectural 
intent of the instruction is essentially "I've got nothing better to do 
right now than wait for something to happen", so treating it as a poor 
man's sched_vield() stays close to that while mitigating the potential 
nefarious and/or minor DoS implications of letting it architecturally 
execute.

Robin.

Dave Martin Aug. 7, 2018, 1:02 p.m. UTC | #5

On Tue, Aug 07, 2018 at 01:12:03PM +0100, Robin Murphy wrote:
> On 07/08/18 11:30, Dave Martin wrote:
> >On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> >>On 07/08/18 11:05, Dave Martin wrote:
> >>>On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> >>>>It recently came to light that userspace can execute WFI, and that
> >>>>the arm64 kernel doesn trap this event. This sounds rather benign,
> >>>>but the kernel should decide when it wants to wait for an interrupt,
> >>>>and not userspace.
> >>>>
> >>>>Let's trap WFI and treat it as a way to yield the CPU to another
> >>>>process.
> >>>
> >>>This doesn't amount to a justification.
> >>>
> >>>If the power controller is unexpectedly left in a bad state so that
> >>>WFI will do something nasty to a cpu that may enter userspace, then we
> >>>probably have bigger problems.
> >>>
> >>>So, maybe it really is pretty harmless to let userspace execute this.
> >>
> >>Or not. It is also a very good way for userspace to find out when an
> >>interrupt gets delivered and start doing all kind of probing on the
> >>kernel. The least the userspace knows about that, the better I feel.
> >
> >Possibly.  I suspect there are other ways to guess pretty accurately
> >when an interrupt occurs, but WFI allows greater precision.
> 
> ...unless you're running in a VM and it traps to KVM anyway ;)
> 
> >>>I can't think of a legitimate reason for userspace to execute WFI
> >>>however.  Userspace doesn't have interrupts under Linux, so it makes
> >>>no sense to wait for one.
> >>>
> >>>Have we seen anybody using WFI in userspace?  It may be cleaner to
> >>>map this to SIGILL rather than be permissive and regret it later.
> >>
> >>I couldn't find any user, and I'm happy to just send userspace to hell
> >>in that case. But it could also been said that since it was never
> >>prevented, it is a de-facto ABI.
> >
> >Agreed.  I wonder whether it's sufficient to have this mapping to SIGILL
> >in -next for a while and see whether anybody complains.
> 
> I think we'd have to avoid that for compat, though, since v7 code would have
> the expectation that WFI can't be trapped by the kernel at all. Personally
> I'm in favour of this patch as-is, since the architectural intent of the
> instruction is essentially "I've got nothing better to do right now than
> wait for something to happen", so treating it as a poor man's sched_vield()

Congratulations for knowing what the architects intended ;)

That said, sched_yield() does seem a reasonable approximation if
we don't go for SIGILL.


Note, the patch currently maps WFI to schedule(), which is not quite
the same, though I don't understand the details.

I suspect that sched_yield() is more appropriate, since it tells the
scheduler that the task doesn't expect to run in the very near future,
whereas schedule() only tells the scheduler not to run the task right
now.

> stays close to that while mitigating the potential nefarious and/or minor
> DoS implications of letting it architecturally execute.

Agreed.


For cleanliness I still prefer SIGILL, since WFI just provides code
with a grubby, nonportable means of yielding that won't work the
same on all kernel versions anyway, whereas there is a perfectly good
standard API for yielding already.

This argument works for compat too (if a little more controversial,
due to the historial lack of a trap).

But sched_yield() is minimally invasive and highly unlikely to break
anything (even where deserved).


I wonder whether we can get the Debian folks to act as guinea-pigs
for the SIGILL approach.  Their package builders are likely to be good
at finding compat regressions...  May not be worth it for such a small
change though.

Cheers
---Dave

Catalin Marinas Aug. 8, 2018, 12:34 p.m. UTC | #6

On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> On 07/08/18 11:05, Dave Martin wrote:
> > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> >> It recently came to light that userspace can execute WFI, and that
> >> the arm64 kernel doesn trap this event. This sounds rather benign,

Nitpick: "doesn't".

> >> but the kernel should decide when it wants to wait for an interrupt,
> >> and not userspace.
> >>
> >> Let's trap WFI and treat it as a way to yield the CPU to another
> >> process.
[...]
> > I can't think of a legitimate reason for userspace to execute WFI
> > however.  Userspace doesn't have interrupts under Linux, so it makes
> > no sense to wait for one.
> > 
> > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > map this to SIGILL rather than be permissive and regret it later.
> 
> I couldn't find any user, and I'm happy to just send userspace to hell
> in that case. But it could also been said that since it was never
> prevented, it is a de-facto ABI.

I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
as it is. In case Will plans to merge it:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Dave Martin Aug. 9, 2018, 12:34 p.m. UTC | #7

On Wed, Aug 08, 2018 at 01:34:09PM +0100, Catalin Marinas wrote:
> On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> > On 07/08/18 11:05, Dave Martin wrote:
> > > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> > >> It recently came to light that userspace can execute WFI, and that
> > >> the arm64 kernel doesn trap this event. This sounds rather benign,
> 
> Nitpick: "doesn't".
> 
> > >> but the kernel should decide when it wants to wait for an interrupt,
> > >> and not userspace.
> > >>
> > >> Let's trap WFI and treat it as a way to yield the CPU to another
> > >> process.
> [...]
> > > I can't think of a legitimate reason for userspace to execute WFI
> > > however.  Userspace doesn't have interrupts under Linux, so it makes
> > > no sense to wait for one.
> > > 
> > > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > > map this to SIGILL rather than be permissive and regret it later.
> > 
> > I couldn't find any user, and I'm happy to just send userspace to hell
> > in that case. But it could also been said that since it was never
> > prevented, it is a de-facto ABI.
> 
> I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
> as it is. In case Will plans to merge it:

For practical purposes I agree, because we can't control the binary
blobs out there: I just wanted to bang the drum because we are creating
semantics here and there is not an obvious correct answer to what they
should be.

I'd still like to see rationale for why this should map to schedule()
(which userspace currently has no direct way to trigger) as opposed to
sched_yield() or something like that.

[...]

Cheers
---Dave

Will Deacon Aug. 9, 2018, 12:38 p.m. UTC | #8

On Thu, Aug 09, 2018 at 01:34:57PM +0100, Dave Martin wrote:
> On Wed, Aug 08, 2018 at 01:34:09PM +0100, Catalin Marinas wrote:
> > On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> > > On 07/08/18 11:05, Dave Martin wrote:
> > > > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> > > >> It recently came to light that userspace can execute WFI, and that
> > > >> the arm64 kernel doesn trap this event. This sounds rather benign,
> > 
> > Nitpick: "doesn't".
> > 
> > > >> but the kernel should decide when it wants to wait for an interrupt,
> > > >> and not userspace.
> > > >>
> > > >> Let's trap WFI and treat it as a way to yield the CPU to another
> > > >> process.
> > [...]
> > > > I can't think of a legitimate reason for userspace to execute WFI
> > > > however.  Userspace doesn't have interrupts under Linux, so it makes
> > > > no sense to wait for one.
> > > > 
> > > > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > > > map this to SIGILL rather than be permissive and regret it later.
> > > 
> > > I couldn't find any user, and I'm happy to just send userspace to hell
> > > in that case. But it could also been said that since it was never
> > > prevented, it is a de-facto ABI.
> > 
> > I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
> > as it is. In case Will plans to merge it:
> 
> For practical purposes I agree, because we can't control the binary
> blobs out there: I just wanted to bang the drum because we are creating
> semantics here and there is not an obvious correct answer to what they
> should be.
> 
> I'd still like to see rationale for why this should map to schedule()
> (which userspace currently has no direct way to trigger) as opposed to
> sched_yield() or something like that.

A better idea might just be to do pc +=4 and return. If there's work
pending, we'll hit it on the return path (just like any other ret_to_user
call).

I initially thought about sched_yield(), but it's not clear whether that
creates a problem if, e.g. seccomp has been used to restrict that syscall.

Will

Dave Martin Aug. 9, 2018, 12:47 p.m. UTC | #9

On Thu, Aug 09, 2018 at 01:38:12PM +0100, Will Deacon wrote:
> On Thu, Aug 09, 2018 at 01:34:57PM +0100, Dave Martin wrote:
> > On Wed, Aug 08, 2018 at 01:34:09PM +0100, Catalin Marinas wrote:
> > > On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> > > > On 07/08/18 11:05, Dave Martin wrote:
> > > > > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> > > > >> It recently came to light that userspace can execute WFI, and that
> > > > >> the arm64 kernel doesn trap this event. This sounds rather benign,
> > > 
> > > Nitpick: "doesn't".
> > > 
> > > > >> but the kernel should decide when it wants to wait for an interrupt,
> > > > >> and not userspace.
> > > > >>
> > > > >> Let's trap WFI and treat it as a way to yield the CPU to another
> > > > >> process.
> > > [...]
> > > > > I can't think of a legitimate reason for userspace to execute WFI
> > > > > however.  Userspace doesn't have interrupts under Linux, so it makes
> > > > > no sense to wait for one.
> > > > > 
> > > > > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > > > > map this to SIGILL rather than be permissive and regret it later.
> > > > 
> > > > I couldn't find any user, and I'm happy to just send userspace to hell
> > > > in that case. But it could also been said that since it was never
> > > > prevented, it is a de-facto ABI.
> > > 
> > > I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
> > > as it is. In case Will plans to merge it:
> > 
> > For practical purposes I agree, because we can't control the binary
> > blobs out there: I just wanted to bang the drum because we are creating
> > semantics here and there is not an obvious correct answer to what they
> > should be.
> > 
> > I'd still like to see rationale for why this should map to schedule()
> > (which userspace currently has no direct way to trigger) as opposed to
> > sched_yield() or something like that.
> 
> A better idea might just be to do pc +=4 and return. If there's work
> pending, we'll hit it on the return path (just like any other ret_to_user
> call).
> 
> I initially thought about sched_yield(), but it's not clear whether that
> creates a problem if, e.g. seccomp has been used to restrict that syscall.

Indeed.  I can't see why that might be restricted, but there's presumably
nothing to stop people doing that today.

Other than putting the task to sleep for 1ms or something, I don't know
what to suggest ;)

Perhaps we can patch a NOP into .text, like Marc's BX trick :P

Cheers
---Dave

Marc Zyngier Aug. 9, 2018, 1:25 p.m. UTC | #10

On Thu, 09 Aug 2018 13:47:01 +0100,
Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Thu, Aug 09, 2018 at 01:38:12PM +0100, Will Deacon wrote:
> > On Thu, Aug 09, 2018 at 01:34:57PM +0100, Dave Martin wrote:
> > > On Wed, Aug 08, 2018 at 01:34:09PM +0100, Catalin Marinas wrote:
> > > > On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> > > > > On 07/08/18 11:05, Dave Martin wrote:
> > > > > > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> > > > > >> It recently came to light that userspace can execute WFI, and that
> > > > > >> the arm64 kernel doesn trap this event. This sounds rather benign,
> > > > 
> > > > Nitpick: "doesn't".
> > > > 
> > > > > >> but the kernel should decide when it wants to wait for an interrupt,
> > > > > >> and not userspace.
> > > > > >>
> > > > > >> Let's trap WFI and treat it as a way to yield the CPU to another
> > > > > >> process.
> > > > [...]
> > > > > > I can't think of a legitimate reason for userspace to execute WFI
> > > > > > however.  Userspace doesn't have interrupts under Linux, so it makes
> > > > > > no sense to wait for one.
> > > > > > 
> > > > > > Have we seen anybody using WFI in userspace?  It may be cleaner to
> > > > > > map this to SIGILL rather than be permissive and regret it later.
> > > > > 
> > > > > I couldn't find any user, and I'm happy to just send userspace to hell
> > > > > in that case. But it could also been said that since it was never
> > > > > prevented, it is a de-facto ABI.
> > > > 
> > > > I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
> > > > as it is. In case Will plans to merge it:
> > > 
> > > For practical purposes I agree, because we can't control the binary
> > > blobs out there: I just wanted to bang the drum because we are creating
> > > semantics here and there is not an obvious correct answer to what they
> > > should be.
> > > 
> > > I'd still like to see rationale for why this should map to schedule()
> > > (which userspace currently has no direct way to trigger) as opposed to
> > > sched_yield() or something like that.
> > 
> > A better idea might just be to do pc +=4 and return. If there's work
> > pending, we'll hit it on the return path (just like any other ret_to_user
> > call).
> > 
> > I initially thought about sched_yield(), but it's not clear whether that
> > creates a problem if, e.g. seccomp has been used to restrict that syscall.
> 
> Indeed.  I can't see why that might be restricted, but there's presumably
> nothing to stop people doing that today.
> 
> Other than putting the task to sleep for 1ms or something, I don't know
> what to suggest ;)
> 
> Perhaps we can patch a NOP into .text, like Marc's BX trick :P

Do not give me any additional bad idea... ;-)

I'll respin the patch without the schedule() call.

Thanks,

	M.

Pavel Machek Sept. 20, 2018, 10:04 p.m. UTC | #11

On Tue 2018-08-07 10:33:26, Marc Zyngier wrote:
> It recently came to light that userspace can execute WFI, and that
> the arm64 kernel doesn trap this event. This sounds rather benign,
> but the kernel should decide when it wants to wait for an interrupt,
> and not userspace.
> 
> Let's trap WFI and treat it as a way to yield the CPU to another
> process.

I don't think that's reasonable. Userspace should not be doing that, and we do not want
to encourage it. SIGILL would be reasonable, I'd  say.

(Very old mail, was forgotton in the queue)

Marc Zyngier Sept. 20, 2018, 10:33 p.m. UTC | #12

On Thu, 20 Sep 2018 23:04:30 +0100,
Pavel Machek <pavel@ucw.cz> wrote:
> 
> On Tue 2018-08-07 10:33:26, Marc Zyngier wrote:
> > It recently came to light that userspace can execute WFI, and that
> > the arm64 kernel doesn trap this event. This sounds rather benign,
> > but the kernel should decide when it wants to wait for an interrupt,
> > and not userspace.
> > 
> > Let's trap WFI and treat it as a way to yield the CPU to another
> > process.
> 
> I don't think that's reasonable. Userspace should not be doing that,
> and we do not want to encourage it. SIGILL would be reasonable, I'd
> say.

This ship has already sailed. Sending SIGILL changes the ABI in an
incompatible way. Immediately returning to userspace without doing
anything else (see v2 [1]) is the best we can do without changing the
behaviour of userspace.

Thanks,

	M.

[1] https://patchwork.kernel.org/patch/10562517/

arm64: Trap WFI executed in userspace

Commit Message

Comments

Patch