[2/2] KVM: arm/arm64: Allow user injection of external data aborts

Message ID 20190909121337.27287-3-christoffer.dall@arm.com (mailing list archive)
State New, archived
Series Improve handling of stage 2 aborts without instruction decode

Commit Message

Christoffer Dall Sept. 9, 2019, 12:13 p.m. UTC
In some scenarios, such as a buggy guest or an incorrect configuration of
the VMM and firmware description data, userspace will detect a memory
access to a portion of the IPA space that is not mapped to any MMIO region.

In this case, the appropriate action is to inject an external abort
into the guest.  The kernel already has functionality to inject an
external abort, but we need to wire up a mechanism that lets user space
tell the kernel to do this.

It turns out that we already have the set event functionality, which we
can reuse for this.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
 arch/arm/include/uapi/asm/kvm.h   |  3 ++-
 arch/arm/kvm/guest.c              |  3 +++
 arch/arm64/include/uapi/asm/kvm.h |  3 ++-
 arch/arm64/kvm/guest.c            |  3 +++
 arch/arm64/kvm/inject_fault.c     |  4 ++--
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/arm/arm.c                |  1 +
 8 files changed, 28 insertions(+), 5 deletions(-)

Comments

Peter Maydell Sept. 9, 2019, 12:32 p.m. UTC | #1
On Mon, 9 Sep 2019 at 13:13, Christoffer Dall <christoffer.dall@arm.com> wrote:
>
> In some scenarios, such as buggy guest or incorrect configuration of the
> VMM and firmware description data, userspace will detect a memory access
> to a portion of the IPA, which is not mapped to any MMIO region.
>
> For this purpose, the appropriate action is to inject an external abort
> to the guest.  The kernel already has functionality to inject an
> external abort, but we need to wire up a signal from user space that
> lets user space tell the kernel to do this.
>
> It turns out, we already have the set event functionality which we can
> perfectly reuse for this.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
>  arch/arm/include/uapi/asm/kvm.h   |  3 ++-
>  arch/arm/kvm/guest.c              |  3 +++
>  arch/arm64/include/uapi/asm/kvm.h |  3 ++-
>  arch/arm64/kvm/guest.c            |  3 +++
>  arch/arm64/kvm/inject_fault.c     |  4 ++--
>  include/uapi/linux/kvm.h          |  1 +
>  virt/kvm/arm/arm.c                |  1 +
>  8 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 02501333f746..edd6cdc470ca 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -955,6 +955,8 @@ The following bits are defined in the flags field:
>
>  ARM/ARM64:
>
> +User space may need to inject several types of events to the guest.
> +
>  If the guest accesses a device that is being emulated by the host kernel in
>  such a way that a real device would generate a physical SError, KVM may make
>  a virtual SError pending for that VCPU. This system error interrupt remains
> @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return
>  -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
>  will return -EINVAL.
>
> +If the guest performed an access to I/O memory which could not be handled by
> +user space, for example because of missing instruction syndrome decode
> +information or because there is no device mapped at the accessed IPA, then
> +user space can ask the kernel to inject an external abort using the address
> +from the exiting fault on the VCPU. It is a programming error to set
> +ext_dabt_pending at the same time as any of the serror fields, or to set
> +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
> +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> +KVM_CAP_ARM_INJECT_EXT_DABT;
> +
>  struct kvm_vcpu_events {
>         struct {
>                 __u8 serror_pending;
>                 __u8 serror_has_esr;
> +               __u8 ext_dabt_pending;
>                 /* Align it to 8 bytes */
> -               __u8 pad[6];
> +               __u8 pad[5];
>                 __u64 serror_esr;
>         } exception;
>         __u32 reserved[12];

This API seems to be missing support for userspace to specify
whether the ESR_ELx for the guest should have the EA bit set
(and more generally other syndrome/fault status bits). I think
if we have an API for "KVM_EXIT_MMIO but the access failed"
then it should either (a) be architecture agnostic, since
pretty much any architecture might have a concept of "access
gave some bus-error-type failure" and it would be nice if userspace
didn't have to special case them all in arch-specific code,
or (b) have the same flexibility for specifying exactly what
kind of fault as the architecture does. This sort of seems to
fall between two stools. (My ideal for KVM_EXIT_MMIO faults
would be a generic API which included space for optional
arch-specific info, which for Arm would pretty much just be
the EA bit.)

As and when we support nested virtualization, any suggestions
on how this API would extend to support userspace saying
"deliver fault to guest EL1" vs "deliver fault to guest EL2" ?

thanks
-- PMM
Christoffer Dall Sept. 9, 2019, 3:16 p.m. UTC | #2
On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote:
> On Mon, 9 Sep 2019 at 13:13, Christoffer Dall <christoffer.dall@arm.com> wrote:
> >
> > In some scenarios, such as buggy guest or incorrect configuration of the
> > VMM and firmware description data, userspace will detect a memory access
> > to a portion of the IPA, which is not mapped to any MMIO region.
> >
> > For this purpose, the appropriate action is to inject an external abort
> > to the guest.  The kernel already has functionality to inject an
> > external abort, but we need to wire up a signal from user space that
> > lets user space tell the kernel to do this.
> >
> > It turns out, we already have the set event functionality which we can
> > perfectly reuse for this.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > ---
> >  Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
> >  arch/arm/include/uapi/asm/kvm.h   |  3 ++-
> >  arch/arm/kvm/guest.c              |  3 +++
> >  arch/arm64/include/uapi/asm/kvm.h |  3 ++-
> >  arch/arm64/kvm/guest.c            |  3 +++
> >  arch/arm64/kvm/inject_fault.c     |  4 ++--
> >  include/uapi/linux/kvm.h          |  1 +
> >  virt/kvm/arm/arm.c                |  1 +
> >  8 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> > index 02501333f746..edd6cdc470ca 100644
> > --- a/Documentation/virt/kvm/api.txt
> > +++ b/Documentation/virt/kvm/api.txt
> > @@ -955,6 +955,8 @@ The following bits are defined in the flags field:
> >
> >  ARM/ARM64:
> >
> > +User space may need to inject several types of events to the guest.
> > +
> >  If the guest accesses a device that is being emulated by the host kernel in
> >  such a way that a real device would generate a physical SError, KVM may make
> >  a virtual SError pending for that VCPU. This system error interrupt remains
> > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return
> >  -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
> >  will return -EINVAL.
> >
> > +If the guest performed an access to I/O memory which could not be handled by
> > +user space, for example because of missing instruction syndrome decode
> > +information or because there is no device mapped at the accessed IPA, then
> > +user space can ask the kernel to inject an external abort using the address
> > +from the exiting fault on the VCPU. It is a programming error to set
> > +ext_dabt_pending at the same time as any of the serror fields, or to set
> > +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
> > +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> > +KVM_CAP_ARM_INJECT_EXT_DABT;
> > +
> >  struct kvm_vcpu_events {
> >         struct {
> >                 __u8 serror_pending;
> >                 __u8 serror_has_esr;
> > +               __u8 ext_dabt_pending;
> >                 /* Align it to 8 bytes */
> > -               __u8 pad[6];
> > +               __u8 pad[5];
> >                 __u64 serror_esr;
> >         } exception;
> >         __u32 reserved[12];
> 
> This API seems to be missing support for userspace to specify
> whether the ESR_ELx for the guest should have the EA bit set
> (and more generally other syndrome/fault status bits). I think
> if we have an API for "KVM_EXIT_MMIO but the access failed"
> then it should either (a) be architecture agnostic, since
> pretty much any architecture might have a concept of "access
> gave some bus-error-type failure" and it would be nice if userspace
> didn't have to special case them all in arch-specific code,
> or (b) have the same flexibility for specifying exactly what
> kind of fault as the architecture does. This sort of seems to
> fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> would be a generic API which included space for optional
> arch-specific info, which for Arm would pretty much just be
> the EA bit.)

I'm not sure I understand exactly what would be improved by making this
either more architecture specific or more architecture generic.  The
EA bit will always be set, that's why the field is called
'ext_dabt_pending'.

I thought as per the previous discussion, that we were specifically
trying to avoid userspace emulating the exception in detail, so I
designed this to provide the minimal effort API for userspace.

Since we already have an architecture specific ioctl, kvm_vcpu_events, I
don't think we're painting ourselves into a corner by using that.  Is a
natural consequence of what you're saying not that we should try to make
that whole call architecture generic?

Unless we already have specific examples of how other architectures
would want to use something like this, and given the impact of this
patch, I'm not sure it's worth trying to speculate about that.

> 
> As and when we support nested virtualization, any suggestions
> on how this API would extend to support userspace saying
> "deliver fault to guest EL1" vs "deliver fault to guest EL2" ?
> 

If we took one of the supported exits from a VM with nested virt
support, it means that you either had a fault from the guest hypervisor,
or a fault from a nested guest where the guest hypervisor has set up a
virtual stage 2 mapping to a hole in the VM's IPA space.  In the former
case, the exception would be delivered back to guest hypervisor, and in
the latter case the target depends on the guest hypervisor's
configuration of the virtual HCR_EL2(.TEA), which the kernel should
respect when handling the KVM_SET_VCPU_EVENTS ioctl.
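
The routing rule described above can be written down as a small decision
function. This is only a sketch of the reasoning in this mail, not kernel
code: the enum, the function name and the `from_guest_hyp` parameter are
made up for illustration, and the TEA bit position (bit 37 of HCR_EL2) is
taken from the architecture documentation and should be checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.TEA: route synchronous external aborts to EL2 (assumed bit 37). */
#define VHCR_EL2_TEA (UINT64_C(1) << 37)

static_assert(VHCR_EL2_TEA == UINT64_C(0x2000000000), "TEA is bit 37");

/* Hypothetical names; where the injected external abort would be taken. */
enum abort_target {
	TARGET_GUEST_EL1,	/* the nested guest's own kernel */
	TARGET_GUEST_EL2,	/* the guest hypervisor */
};

/*
 * from_guest_hyp: the faulting access came from the guest hypervisor itself.
 * vhcr_el2:       the guest hypervisor's virtual HCR_EL2 value.
 */
static enum abort_target ext_dabt_target(bool from_guest_hyp, uint64_t vhcr_el2)
{
	/* A fault taken from the guest hypervisor goes back to it. */
	if (from_guest_hyp)
		return TARGET_GUEST_EL2;

	/* A fault from a nested guest follows the virtual HCR_EL2.TEA. */
	return (vhcr_el2 & VHCR_EL2_TEA) ? TARGET_GUEST_EL2 : TARGET_GUEST_EL1;
}
```

The point of the sketch is that user space supplies no target at all: both
inputs are state the kernel already holds when it handles the ioctl.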


Thanks,

    Christoffer
Peter Maydell Sept. 9, 2019, 3:56 p.m. UTC | #3
On Mon, 9 Sep 2019 at 16:16, Christoffer Dall <christoffer.dall@arm.com> wrote:
>
> On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote:
> > This API seems to be missing support for userspace to specify
> > whether the ESR_ELx for the guest should have the EA bit set
> > (and more generally other syndrome/fault status bits). I think
> > if we have an API for "KVM_EXIT_MMIO but the access failed"
> > then it should either (a) be architecture agnostic, since
> > pretty much any architecture might have a concept of "access
> > gave some bus-error-type failure" and it would be nice if userspace
> > didn't have to special case them all in arch-specific code,
> > or (b) have the same flexibility for specifying exactly what
> > kind of fault as the architecture does. This sort of seems to
> > fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> > would be a generic API which included space for optional
> > arch-specific info, which for Arm would pretty much just be
> > the EA bit.)
>
> I'm not sure I understand exactly what would be improved by making this
> either more architecture specific or more architecture generic.  The
> EA bit will always be set, that's why the field is called
> 'ext_dabt_pending'.

ESR_EL1.EA doesn't mean "this is an external abort". It means
"given that this is an external abort as indicated by ESR_EL1.DFSC,
specify the external abort type". Traditionally this is 0 for
an AXI bus Decode error ("interconnect says there's nothing there")
and 1 for a Slave error ("there's something there but it told us
to go away"), though architecturally it's specified as impdef
because not everybody uses AXI. In QEMU we track the difference
between these two things and for TCG will raise external aborts
with the correct EA bit value.
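
To make the distinction concrete, here is a minimal sketch of how a data
abort syndrome is put together, showing that DFSC is what marks the abort
as external while EA is a separate qualifier. The macro and function names
are local to this example (they are not the kernel's or QEMU's), the field
positions are quoted from the ESR_ELx description as I understand it, and
the sketch assumes a 32-bit instruction (IL set) with no other ISS fields
populated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx field layout; names are illustrative, not from any header. */
#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	UINT64_C(0x3f)
#define ESR_EC_DABT_LOW	UINT64_C(0x24)		/* data abort, lower EL */
#define ESR_EC_DABT_CUR	UINT64_C(0x25)		/* data abort, current EL */
#define ESR_IL		(UINT64_C(1) << 25)	/* 32-bit instruction */
#define ESR_ISS_EA	(UINT64_C(1) << 9)	/* external abort type (impdef) */
#define ESR_DFSC_MASK	UINT64_C(0x3f)
#define ESR_DFSC_EXTABT	UINT64_C(0x10)		/* synchronous external abort */

static_assert((ESR_ISS_EA & ESR_DFSC_MASK) == 0, "EA is not part of DFSC");

/* Build a syndrome for a synchronous external data abort. */
static uint64_t make_ext_dabt_esr(bool from_lower_el, bool ea)
{
	uint64_t ec = from_lower_el ? ESR_EC_DABT_LOW : ESR_EC_DABT_CUR;
	uint64_t esr = (ec << ESR_EC_SHIFT) | ESR_IL | ESR_DFSC_EXTABT;

	/* EA only refines the abort type; DFSC already says "external". */
	if (ea)
		esr |= ESR_ISS_EA;
	return esr;
}

/* True for any data abort whose DFSC is "synchronous external abort". */
static bool esr_is_ext_dabt(uint64_t esr)
{
	uint64_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

	return (ec == ESR_EC_DABT_LOW || ec == ESR_EC_DABT_CUR) &&
	       (esr & ESR_DFSC_MASK) == ESR_DFSC_EXTABT;
}
```

Both `make_ext_dabt_esr(true, false)` and `make_ext_dabt_esr(true, true)`
satisfy `esr_is_ext_dabt()`; they differ only in bit 9.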

> I thought as per the previous discussion, that we were specifically
> trying to avoid userspace emulating the exception in detail, so I
> designed this to provide the minimal effort API for userspace.
>
> Since we already have an architecture specific ioctl, kvm_vcpu_events, I
> don't think we're painting ourselves into a corner by using that.  Is a
> natural consequence of what you're saying not that we should try to make
> that whole call architecture generic?
>
> Unless we already have specific examples of how other architectures
> would want to use something like this, and given the impact of this
> patch, I'm not sure it's worth trying to speculate about that.

In QEMU, use of a generic API would look something like
this in kvm-all.c:

        case KVM_EXIT_MMIO:
            DPRINTF("handle_mmio\n");
            /* Called outside BQL */
            MemTxResult res;

            res = address_space_rw(&address_space_memory,
                                   run->mmio.phys_addr, attrs,
                                   run->mmio.data,
                                   run->mmio.len,
                                   run->mmio.is_write);
            if (res != MEMTX_OK) {
                /* tell the kernel the access failed, eg
                 * by updating the kvm_run struct to say so
                 */
            } else {
                /* access passed, we have updated the kvm_run
                 * struct's mmio subfield, proceed as usual
                 */
            }
            ret = 0;
            break;

[this is exactly the current QEMU code except that today
we throw away the 'res' that tells us if the transaction
succeeded because we have no way to report it to KVM and
effectively always RAZ/WI the access.]

This is nice because you don't need anything here that has to do
"bail out to architecture specific handling of anything",
you just say "nope, the access failed", and let the kernel handle
that however the CPU would handle it. It just immediately works
for all architectures on the userspace side (assuming the kernel
defaults to not actually trying to report an abort to the guest
if nobody's implemented that on the kernel side, which is exactly
what happens today where there's no way to report the error for
any architecture).
The downside is that you lose the ability to be more specific about
architecture-specific fine distinctions like decode errors vs slave
errors, though.

Or you could have an arm-specific API that does care about
fine details like the EA bit (and maybe also other ESR_ELx
fields); that has the downside that userspace needs to
make the handling of error returns from "handle this MMIO
access" architecture specific, but you get architecture-specific
benefits as a result. (Preferably the architecture-specific
APIs should at least be basically the same, eg same ioctl
or same bit of the kvm_run struct being updated with some parts
being arch-specific data, rather than 3 different mechanisms.)

Having an API that is architecture specific but doesn't actually
let you define any of the architecture-specific aspects of
what the abort might imply seems like the worst of both worlds.
If all we can say is "this aborted" then we might as well have
the API be generic.

thanks
-- PMM
Christoffer Dall Sept. 9, 2019, 5:36 p.m. UTC | #4
On Mon, Sep 09, 2019 at 04:56:23PM +0100, Peter Maydell wrote:
> On Mon, 9 Sep 2019 at 16:16, Christoffer Dall <christoffer.dall@arm.com> wrote:
> >
> > On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote:
> > > This API seems to be missing support for userspace to specify
> > > whether the ESR_ELx for the guest should have the EA bit set
> > > (and more generally other syndrome/fault status bits). I think
> > > if we have an API for "KVM_EXIT_MMIO but the access failed"
> > > then it should either (a) be architecture agnostic, since
> > > pretty much any architecture might have a concept of "access
> > > gave some bus-error-type failure" and it would be nice if userspace
> > > didn't have to special case them all in arch-specific code,
> > > or (b) have the same flexibility for specifying exactly what
> > > kind of fault as the architecture does. This sort of seems to
> > > fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> > > would be a generic API which included space for optional
> > > arch-specific info, which for Arm would pretty much just be
> > > the EA bit.)
> >
> > I'm not sure I understand exactly what would be improved by making this
> > > either more architecture specific or more architecture generic.  The
> > EA bit will always be set, that's why the field is called
> > 'ext_dabt_pending'.
> 
> ESR_EL1.EA doesn't mean "this is an external abort". It means
> "given that this is an external abort as indicated by ESR_EL1.DFSC,
> specify the external abort type". Traditionally this is 0 for
> an AXI bus Decode error ("interconnect says there's nothing there")
> and 1 for a Slave error ("there's something there but it told us
> to go away"), though architecturally it's specified as impdef
> because not everybody uses AXI. In QEMU we track the difference
> between these two things and for TCG will raise external aborts
> with the correct EA bit value.
> 

Ah, I missed that.  I don't think we want to allow userspace to supply
any implementation defined values for the VM, though.

> > I thought as per the previous discussion, that we were specifically
> > trying to avoid userspace emulating the exception in detail, so I
> > designed this to provide the minimal effort API for userspace.
> >
> > Since we already have an architecture specific ioctl, kvm_vcpu_events, I
> > don't think we're painting ourselves into a corner by using that.  Is a
> > natural consequence of what you're saying not that we should try to make
> > that whole call architecture generic?
> >
> > Unless we already have specific examples of how other architectures
> > would want to use something like this, and given the impact of this
> > patch, I'm not sure it's worth trying to speculate about that.
> 
> In QEMU, use of a generic API would look something like
> this in kvm-all.c:
> 
>         case KVM_EXIT_MMIO:
>             DPRINTF("handle_mmio\n");
>             /* Called outside BQL */
>             MemTxResult res;
> 
>             res = address_space_rw(&address_space_memory,
>                                    run->mmio.phys_addr, attrs,
>                                    run->mmio.data,
>                                    run->mmio.len,
>                                    run->mmio.is_write);
>             if (res != MEMTX_OK) {
>                 /* tell the kernel the access failed, eg
>                  * by updating the kvm_run struct to say so
>                  */
>             } else {
>                 /* access passed, we have updated the kvm_run
>                  * struct's mmio subfield, proceed as usual
>                  */
>             }
>             ret = 0;
>             break;
> 
> [this is exactly the current QEMU code except that today
> we throw away the 'res' that tells us if the transaction
> succeeded because we have no way to report it to KVM and
> effectively always RAZ/WI the access.]
> 
> This is nice because you don't need anything here that has to do
> "bail out to architecture specific handling of anything",
> you just say "nope, the access failed", and let the kernel handle
> that however the CPU would handle it. It just immediately works
> for all architectures on the userspace side (assuming the kernel
> defaults to not actually trying to report an abort to the guest
> if nobody's implemented that on the kernel side, which is exactly
> what happens today where there's no way to report the error for
> any architecture).
> The downside is that you lose the ability to be more specific about
> architecture-specific fine distinctions like decode errors vs slave
> errors, though.

I understand that it's convenient to avoid having to write an
architecture hook, but I simply don't know if it makes sense to do this
on other architectures, and while it can be more code to have to write
the architecture hooks in QEMU, it's hardly a strong argument against
using an existing architecture-specific mechanism to inject an event to
a guest.

Note that I looked at using an appropriate field in the kvm_run
structure, but nothing elegant came to mind.

Do you have a concrete example of how you would like to change the
kvm_run structure?

> 
> Or you could have an arm-specific API that does care about
> fine details like the EA bit (and maybe also other ESR_ELx
> fields); that has the downside that userspace needs to
> make the handling of error returns from "handle this MMIO
> access" architecture specific, but you get architecture-specific
> benefits as a result. (Preferably the architecture-specific
> APIs should at least be basically the same, eg same ioctl
> or same bit of the kvm_run struct being updated with some parts
> being arch-specific data, rather than 3 different mechanisms.)

Are there other bits of the ESR than the EA that you think we should be
able to specify?

Can we decide if we need to allow userspace to provide additional
information or not, and then decide on the mechanism, instead of
conflating the two questions?

I think we should either expose the minimal mechanism to user space, or
just leave it to user space to emulate the whole thing.


Thanks,

    Christoffer
Marc Zyngier Sept. 26, 2019, 2:09 p.m. UTC | #5
On 09/09/2019 13:13, Christoffer Dall wrote:
> In some scenarios, such as buggy guest or incorrect configuration of the
> VMM and firmware description data, userspace will detect a memory access
> to a portion of the IPA, which is not mapped to any MMIO region.
> 
> For this purpose, the appropriate action is to inject an external abort
> to the guest.  The kernel already has functionality to inject an
> external abort, but we need to wire up a signal from user space that
> lets user space tell the kernel to do this.
> 
> It turns out, we already have the set event functionality which we can
> perfectly reuse for this.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
>  arch/arm/include/uapi/asm/kvm.h   |  3 ++-
>  arch/arm/kvm/guest.c              |  3 +++
>  arch/arm64/include/uapi/asm/kvm.h |  3 ++-
>  arch/arm64/kvm/guest.c            |  3 +++
>  arch/arm64/kvm/inject_fault.c     |  4 ++--
>  include/uapi/linux/kvm.h          |  1 +
>  virt/kvm/arm/arm.c                |  1 +
>  8 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 02501333f746..edd6cdc470ca 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -955,6 +955,8 @@ The following bits are defined in the flags field:
>  
>  ARM/ARM64:
>  
> +User space may need to inject several types of events to the guest.
> +
>  If the guest accesses a device that is being emulated by the host kernel in
>  such a way that a real device would generate a physical SError, KVM may make
>  a virtual SError pending for that VCPU. This system error interrupt remains
> @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return
>  -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
>  will return -EINVAL.
>  
> +If the guest performed an access to I/O memory which could not be handled by
> +user space, for example because of missing instruction syndrome decode
> +information or because there is no device mapped at the accessed IPA, then
> +user space can ask the kernel to inject an external abort using the address
> +from the exiting fault on the VCPU. It is a programming error to set
> +ext_dabt_pending at the same time as any of the serror fields, or to set
> +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or

... on *re-entry from* an exit?

> +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> +KVM_CAP_ARM_INJECT_EXT_DABT;

s/;/./

Can we add some wording to the effect that this is only a helper for the
most common case? Most of the ARM exceptions can otherwise be
constructed/injected using the SET_ONE_REG API.

> +
>  struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index a4217c1a5d01..d2449a5bf8d5 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -131,8 +131,9 @@ struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index 684cf64b4033..4154c5589501 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  {
>  	bool serror_pending = events->exception.serror_pending;
>  	bool has_esr = events->exception.serror_has_esr;
> +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
>  
>  	if (serror_pending && has_esr)
>  		return -EINVAL;
>  	else if (serror_pending)
>  		kvm_inject_vabt(vcpu);
> +	else if (has_ext_dabt_pending)
> +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>  
>  	return 0;
>  }
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 9a507716ae2f..7729efdb1c0c 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -164,8 +164,9 @@ struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index dfd626447482..10e6e2144dca 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -720,6 +720,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  {
>  	bool serror_pending = events->exception.serror_pending;
>  	bool has_esr = events->exception.serror_has_esr;
> +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
>  
>  	if (serror_pending && has_esr) {
>  		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> @@ -731,6 +732,8 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  			return -EINVAL;
>  	} else if (serror_pending) {
>  		kvm_inject_vabt(vcpu);
> +	} else if (has_ext_dabt_pending) {
> +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>  	}
>  
>  	return 0;
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index a9d25a305af5..ccdb6a051ab2 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -109,7 +109,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>  
>  /**
>   * kvm_inject_dabt - inject a data abort into the guest
> - * @vcpu: The VCPU to receive the undefined exception
> + * @vcpu: The VCPU to receive the data abort
>   * @addr: The address to report in the DFAR
>   *
>   * It is assumed that this code is called from the VCPU thread and that the
> @@ -125,7 +125,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
>  
>  /**
>   * kvm_inject_pabt - inject a prefetch abort into the guest
> - * @vcpu: The VCPU to receive the undefined exception
> + * @vcpu: The VCPU to receive the prefetch abort
>   * @addr: The address to report in the DFAR
>   *
>   * It is assumed that this code is called from the VCPU thread and that the
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dd79235b6435..a80ee820e700 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1003,6 +1003,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
>  #define KVM_CAP_PMU_EVENT_FILTER 173
>  #define KVM_CAP_ARM_NISV_TO_USER 174
> +#define KVM_CAP_ARM_INJECT_EXT_DABT 175
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 7153504bb106..56a97dd9b292 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_IMMEDIATE_EXIT:
>  	case KVM_CAP_VCPU_EVENTS:
>  	case KVM_CAP_ARM_NISV_TO_USER:
> +	case KVM_CAP_ARM_INJECT_EXT_DABT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_ARM_SET_DEVICE_ADDR:
> 

Otherwise looks good to me. If you respin the series, and unless anyone
shouts, I'll queue it. No hurry though, I'm going to take it slow(er) over the
following two weeks.

Thanks,

	M.
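
Two properties of the patch as posted can be checked with a small
standalone model. The struct below is a local mirror of the proposed
kvm_vcpu_events layout (deliberately renamed, so it is not the uapi
header), and set_events_model() copies only the branch order of the 32-bit
arm hunk; the enum and function names are invented for this sketch. It
shows that shrinking pad[] from 6 to 5 keeps serror_esr at offset 8 and
the overall size unchanged, and that with the else-if chain a request
setting both serror_pending and ext_dabt_pending ends up injecting only
the SError rather than being rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout proposed in the patch; not the uapi struct. */
struct vcpu_events_sketch {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t ext_dabt_pending;	/* new field, takes one pad byte */
		uint8_t pad[5];			/* was pad[6] */
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* The ABI must not move: same offset for serror_esr, same total size. */
static_assert(offsetof(struct vcpu_events_sketch, exception.serror_esr) == 8,
	      "serror_esr stays 8-byte aligned at offset 8");
static_assert(sizeof(struct vcpu_events_sketch) == 64,
	      "struct size is unchanged by the new field");

/* Hypothetical stand-ins for the kvm_inject_vabt()/kvm_inject_dabt() calls. */
enum injected { INJECT_NONE, INJECT_VABT, INJECT_DABT };

/* Same branch order as the arch/arm/kvm/guest.c hunk in this patch. */
static int set_events_model(const struct vcpu_events_sketch *ev,
			    enum injected *out)
{
	*out = INJECT_NONE;

	if (ev->exception.serror_pending && ev->exception.serror_has_esr)
		return -EINVAL;
	else if (ev->exception.serror_pending)
		*out = INJECT_VABT;
	else if (ev->exception.ext_dabt_pending)
		*out = INJECT_DABT;

	return 0;
}
```

If the "programming error" wording in api.txt is meant to be enforced,
this is the place where an explicit -EINVAL for the combined case would
have to be added; as posted, the combination is accepted silently.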
Christoffer Dall Oct. 8, 2019, 8:34 a.m. UTC | #6
On Thu, Sep 26, 2019 at 03:09:11PM +0100, Marc Zyngier wrote:
> On 09/09/2019 13:13, Christoffer Dall wrote:
> > In some scenarios, such as buggy guest or incorrect configuration of the
> > VMM and firmware description data, userspace will detect a memory access
> > to a portion of the IPA, which is not mapped to any MMIO region.
> > 
> > For this purpose, the appropriate action is to inject an external abort
> > to the guest.  The kernel already has functionality to inject an
> > external abort, but we need to wire up a signal from user space that
> > lets user space tell the kernel to do this.
> > 
> > It turns out, we already have the set event functionality which we can
> > perfectly reuse for this.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > ---
> >  Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
> >  arch/arm/include/uapi/asm/kvm.h   |  3 ++-
> >  arch/arm/kvm/guest.c              |  3 +++
> >  arch/arm64/include/uapi/asm/kvm.h |  3 ++-
> >  arch/arm64/kvm/guest.c            |  3 +++
> >  arch/arm64/kvm/inject_fault.c     |  4 ++--
> >  include/uapi/linux/kvm.h          |  1 +
> >  virt/kvm/arm/arm.c                |  1 +
> >  8 files changed, 28 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> > index 02501333f746..edd6cdc470ca 100644
> > --- a/Documentation/virt/kvm/api.txt
> > +++ b/Documentation/virt/kvm/api.txt
> > @@ -955,6 +955,8 @@ The following bits are defined in the flags field:
> >  
> >  ARM/ARM64:
> >  
> > +User space may need to inject several types of events to the guest.
> > +
> >  If the guest accesses a device that is being emulated by the host kernel in
> >  such a way that a real device would generate a physical SError, KVM may make
> >  a virtual SError pending for that VCPU. This system error interrupt remains
> > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return
> >  -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
> >  will return -EINVAL.
> >  
> > +If the guest performed an access to I/O memory which could not be handled by
> > +user space, for example because of missing instruction syndrome decode
> > +information or because there is no device mapped at the accessed IPA, then
> > +user space can ask the kernel to inject an external abort using the address
> > +from the exiting fault on the VCPU. It is a programming error to set
> > +ext_dabt_pending at the same time as any of the serror fields, or to set
> > +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
> 
> ... on *re-entry from* an exit?
> 
> > +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> > +KVM_CAP_ARM_INJECT_EXT_DABT;
> 
> s/;/./
> 
> Can we add some wording to the effect that this is only a helper for the
> most common case? Most of the ARM exceptions can otherwise be
> constructed/injected using the SET_ONE_REG API.
> 
> > +
> >  struct kvm_vcpu_events {
> >  	struct {
> >  		__u8 serror_pending;
> >  		__u8 serror_has_esr;
> > +		__u8 ext_dabt_pending;
> >  		/* Align it to 8 bytes */
> > -		__u8 pad[6];
> > +		__u8 pad[5];
> >  		__u64 serror_esr;
> >  	} exception;
> >  	__u32 reserved[12];
> > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> > index a4217c1a5d01..d2449a5bf8d5 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -131,8 +131,9 @@ struct kvm_vcpu_events {
> >  	struct {
> >  		__u8 serror_pending;
> >  		__u8 serror_has_esr;
> > +		__u8 ext_dabt_pending;
> >  		/* Align it to 8 bytes */
> > -		__u8 pad[6];
> > +		__u8 pad[5];
> >  		__u64 serror_esr;
> >  	} exception;
> >  	__u32 reserved[12];
> > diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> > index 684cf64b4033..4154c5589501 100644
> > --- a/arch/arm/kvm/guest.c
> > +++ b/arch/arm/kvm/guest.c
> > @@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  {
> >  	bool serror_pending = events->exception.serror_pending;
> >  	bool has_esr = events->exception.serror_has_esr;
> > +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
> >  
> >  	if (serror_pending && has_esr)
> >  		return -EINVAL;
> >  	else if (serror_pending)
> >  		kvm_inject_vabt(vcpu);
> > +	else if (has_ext_dabt_pending)
> > +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> >  
> >  	return 0;
> >  }
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 9a507716ae2f..7729efdb1c0c 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -164,8 +164,9 @@ struct kvm_vcpu_events {
> >  	struct {
> >  		__u8 serror_pending;
> >  		__u8 serror_has_esr;
> > +		__u8 ext_dabt_pending;
> >  		/* Align it to 8 bytes */
> > -		__u8 pad[6];
> > +		__u8 pad[5];
> >  		__u64 serror_esr;
> >  	} exception;
> >  	__u32 reserved[12];
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index dfd626447482..10e6e2144dca 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -720,6 +720,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  {
> >  	bool serror_pending = events->exception.serror_pending;
> >  	bool has_esr = events->exception.serror_has_esr;
> > +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
> >  
> >  	if (serror_pending && has_esr) {
> >  		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> > @@ -731,6 +732,8 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  			return -EINVAL;
> >  	} else if (serror_pending) {
> >  		kvm_inject_vabt(vcpu);
> > +	} else if (has_ext_dabt_pending) {
> > +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> >  	}
> >  
> >  	return 0;
> > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index a9d25a305af5..ccdb6a051ab2 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -109,7 +109,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
> >  
> >  /**
> >   * kvm_inject_dabt - inject a data abort into the guest
> > - * @vcpu: The VCPU to receive the undefined exception
> > + * @vcpu: The VCPU to receive the data abort
> >   * @addr: The address to report in the DFAR
> >   *
> >   * It is assumed that this code is called from the VCPU thread and that the
> > @@ -125,7 +125,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
> >  
> >  /**
> >   * kvm_inject_pabt - inject a prefetch abort into the guest
> > - * @vcpu: The VCPU to receive the undefined exception
> > + * @vcpu: The VCPU to receive the prefetch abort
> >   * @addr: The address to report in the DFAR
> >   *
> >   * It is assumed that this code is called from the VCPU thread and that the
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index dd79235b6435..a80ee820e700 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1003,6 +1003,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> >  #define KVM_CAP_PMU_EVENT_FILTER 173
> >  #define KVM_CAP_ARM_NISV_TO_USER 174
> > +#define KVM_CAP_ARM_INJECT_EXT_DABT 175
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 7153504bb106..56a97dd9b292 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_IMMEDIATE_EXIT:
> >  	case KVM_CAP_VCPU_EVENTS:
> >  	case KVM_CAP_ARM_NISV_TO_USER:
> > +	case KVM_CAP_ARM_INJECT_EXT_DABT:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > 
> 
> Otherwise looks good to me. If you respin the series, and unless anyone
> shouts, I'll queue it. No hurry though, I'm going to take it slow(er) over
> the following two weeks.
> 

Thanks, I've tried to come up with wording for the above; you can have a
look in v2.

    Christoffer

Patch

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 02501333f746..edd6cdc470ca 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -955,6 +955,8 @@  The following bits are defined in the flags field:
 
 ARM/ARM64:
 
+User space may need to inject several types of events to the guest.
+
 If the guest accesses a device that is being emulated by the host kernel in
 such a way that a real device would generate a physical SError, KVM may make
 a virtual SError pending for that VCPU. This system error interrupt remains
@@ -989,12 +991,23 @@  Specifying exception.has_esr on a system that does not support it will return
 -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
 will return -EINVAL.
 
+If the guest performed an access to I/O memory which could not be handled by
+user space, for example because of missing instruction syndrome decode
+information or because there is no device mapped at the accessed IPA, then
+user space can ask the kernel to inject an external abort using the address
+from the exiting fault on the VCPU. It is a programming error to set
+ext_dabt_pending at the same time as any of the serror fields, or to set
+ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
+KVM_EXIT_ARM_NISV. This feature is only available if the system supports
+KVM_CAP_ARM_INJECT_EXT_DABT.
+
 struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index a4217c1a5d01..d2449a5bf8d5 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -131,8 +131,9 @@  struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 684cf64b4033..4154c5589501 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -263,11 +263,14 @@  int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 {
 	bool serror_pending = events->exception.serror_pending;
 	bool has_esr = events->exception.serror_has_esr;
+	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
 
 	if (serror_pending && has_esr)
 		return -EINVAL;
 	else if (serror_pending)
 		kvm_inject_vabt(vcpu);
+	else if (has_ext_dabt_pending)
+		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 
 	return 0;
 }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9a507716ae2f..7729efdb1c0c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -164,8 +164,9 @@  struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index dfd626447482..10e6e2144dca 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -720,6 +720,7 @@  int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 {
 	bool serror_pending = events->exception.serror_pending;
 	bool has_esr = events->exception.serror_has_esr;
+	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
 
 	if (serror_pending && has_esr) {
 		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
@@ -731,6 +732,8 @@  int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 	} else if (serror_pending) {
 		kvm_inject_vabt(vcpu);
+	} else if (has_ext_dabt_pending) {
+		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 	}
 
 	return 0;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index a9d25a305af5..ccdb6a051ab2 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -109,7 +109,7 @@  static void inject_undef64(struct kvm_vcpu *vcpu)
 
 /**
  * kvm_inject_dabt - inject a data abort into the guest
- * @vcpu: The VCPU to receive the undefined exception
+ * @vcpu: The VCPU to receive the data abort
  * @addr: The address to report in the DFAR
  *
  * It is assumed that this code is called from the VCPU thread and that the
@@ -125,7 +125,7 @@  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 
 /**
  * kvm_inject_pabt - inject a prefetch abort into the guest
- * @vcpu: The VCPU to receive the undefined exception
+ * @vcpu: The VCPU to receive the prefetch abort
  * @addr: The address to report in the DFAR
  *
  * It is assumed that this code is called from the VCPU thread and that the
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dd79235b6435..a80ee820e700 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1003,6 +1003,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
 #define KVM_CAP_ARM_NISV_TO_USER 174
+#define KVM_CAP_ARM_INJECT_EXT_DABT 175
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 7153504bb106..56a97dd9b292 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -217,6 +217,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_VCPU_EVENTS:
 	case KVM_CAP_ARM_NISV_TO_USER:
+	case KVM_CAP_ARM_INJECT_EXT_DABT:
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
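
A side note on the struct kvm_vcpu_events change in the patch above: the new
ext_dabt_pending byte is carved out of the existing padding (pad[6] becomes
pad[5]), so the overall size and the offset of serror_esr should stay the
same. The following is a minimal sketch that checks this at compile time
using local mirror structs. The names vcpu_events_old, vcpu_events_new and
prepare_ext_dabt() are illustrative assumptions and are not part of the patch
or of the kernel headers.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>
#include <string.h>	/* memset */

/* Local mirror of the layout before the patch (illustration only). */
struct vcpu_events_old {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		/* Align it to 8 bytes */
		uint8_t pad[6];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* Local mirror of the layout after the patch (illustration only). */
struct vcpu_events_new {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t ext_dabt_pending;
		/* Align it to 8 bytes */
		uint8_t pad[5];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* The new field must not change the overall size of the structure. */
static_assert(sizeof(struct vcpu_events_old) ==
	      sizeof(struct vcpu_events_new),
	      "struct size changed");

/* serror_esr must stay at the same offset in both layouts. */
static_assert(offsetof(struct vcpu_events_old, exception.serror_esr) ==
	      offsetof(struct vcpu_events_new, exception.serror_esr),
	      "serror_esr moved");

/*
 * Hypothetical helper: build a request that asks only for an external
 * data abort. All serror fields stay zero, since the documentation in
 * the patch calls setting both at once a programming error.
 */
static void prepare_ext_dabt(struct vcpu_events_new *ev)
{
	memset(ev, 0, sizeof(*ev));
	ev->exception.ext_dabt_pending = 1;
}
```

A VMM would presumably check KVM_CAP_ARM_INJECT_EXT_DABT first and then pass
the real struct kvm_vcpu_events, filled in the same way, to the
KVM_SET_VCPU_EVENTS vcpu ioctl before re-entering the guest. That step is left
out here because it needs a live VCPU file descriptor.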