Message ID | 20190909121337.27287-3-christoffer.dall@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Improve handling of stage 2 aborts without instruction decode |
On Mon, 9 Sep 2019 at 13:13, Christoffer Dall <christoffer.dall@arm.com> wrote: > > In some scenarios, such as buggy guest or incorrect configuration of the > VMM and firmware description data, userspace will detect a memory access > to a portion of the IPA, which is not mapped to any MMIO region. > > For this purpose, the appropriate action is to inject an external abort > to the guest. The kernel already has functionality to inject an > external abort, but we need to wire up a signal from user space that > lets user space tell the kernel to do this. > > It turns out, we already have the set event functionality which we can > perfectly reuse for this. > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > --- > Documentation/virt/kvm/api.txt | 15 ++++++++++++++- > arch/arm/include/uapi/asm/kvm.h | 3 ++- > arch/arm/kvm/guest.c | 3 +++ > arch/arm64/include/uapi/asm/kvm.h | 3 ++- > arch/arm64/kvm/guest.c | 3 +++ > arch/arm64/kvm/inject_fault.c | 4 ++-- > include/uapi/linux/kvm.h | 1 + > virt/kvm/arm/arm.c | 1 + > 8 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > index 02501333f746..edd6cdc470ca 100644 > --- a/Documentation/virt/kvm/api.txt > +++ b/Documentation/virt/kvm/api.txt > @@ -955,6 +955,8 @@ The following bits are defined in the flags field: > > ARM/ARM64: > > +User space may need to inject several types of events to the guest. > + > If the guest accesses a device that is being emulated by the host kernel in > such a way that a real device would generate a physical SError, KVM may make > a virtual SError pending for that VCPU. This system error interrupt remains > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return > -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr > will return -EINVAL. 
>
> +If the guest performed an access to I/O memory which could not be handled by
> +user space, for example because of missing instruction syndrome decode
> +information or because there is no device mapped at the accessed IPA, then
> +user space can ask the kernel to inject an external abort using the address
> +from the exiting fault on the VCPU. It is a programming error to set
> +ext_dabt_pending at the same time as any of the serror fields, or to set
> +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
> +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> +KVM_CAP_ARM_INJECT_EXT_DABT;
> +
>  struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];

This API seems to be missing support for userspace to specify
whether the ESR_ELx for the guest should have the EA bit set
(and more generally other syndrome/fault status bits). I think
if we have an API for "KVM_EXIT_MMIO but the access failed"
then it should either (a) be architecture agnostic, since
pretty much any architecture might have a concept of "access
gave some bus-error-type failure" and it would be nice if userspace
didn't have to special case them all in arch-specific code,
or (b) have the same flexibility for specifying exactly what
kind of fault as the architecture does. This sort of seems to
fall between two stools. (My ideal for KVM_EXIT_MMIO faults
would be a generic API which included space for optional
arch-specific info, which for Arm would pretty much just be
the EA bit.)

As and when we support nested virtualization, any suggestions
on how this API would extend to support userspace saying
"deliver fault to guest EL1" vs "deliver fault to guest EL2" ?

thanks
-- PMM
On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote: > On Mon, 9 Sep 2019 at 13:13, Christoffer Dall <christoffer.dall@arm.com> wrote: > > > > In some scenarios, such as buggy guest or incorrect configuration of the > > VMM and firmware description data, userspace will detect a memory access > > to a portion of the IPA, which is not mapped to any MMIO region. > > > > For this purpose, the appropriate action is to inject an external abort > > to the guest. The kernel already has functionality to inject an > > external abort, but we need to wire up a signal from user space that > > lets user space tell the kernel to do this. > > > > It turns out, we already have the set event functionality which we can > > perfectly reuse for this. > > > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > > --- > > Documentation/virt/kvm/api.txt | 15 ++++++++++++++- > > arch/arm/include/uapi/asm/kvm.h | 3 ++- > > arch/arm/kvm/guest.c | 3 +++ > > arch/arm64/include/uapi/asm/kvm.h | 3 ++- > > arch/arm64/kvm/guest.c | 3 +++ > > arch/arm64/kvm/inject_fault.c | 4 ++-- > > include/uapi/linux/kvm.h | 1 + > > virt/kvm/arm/arm.c | 1 + > > 8 files changed, 28 insertions(+), 5 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > > index 02501333f746..edd6cdc470ca 100644 > > --- a/Documentation/virt/kvm/api.txt > > +++ b/Documentation/virt/kvm/api.txt > > @@ -955,6 +955,8 @@ The following bits are defined in the flags field: > > > > ARM/ARM64: > > > > +User space may need to inject several types of events to the guest. > > + > > If the guest accesses a device that is being emulated by the host kernel in > > such a way that a real device would generate a physical SError, KVM may make > > a virtual SError pending for that VCPU. This system error interrupt remains > > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return > > -EINVAL. 
Setting anything other than the lower 24bits of exception.serror_esr
> > will return -EINVAL.
> >
> > +If the guest performed an access to I/O memory which could not be handled by
> > +user space, for example because of missing instruction syndrome decode
> > +information or because there is no device mapped at the accessed IPA, then
> > +user space can ask the kernel to inject an external abort using the address
> > +from the exiting fault on the VCPU. It is a programming error to set
> > +ext_dabt_pending at the same time as any of the serror fields, or to set
> > +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
> > +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> > +KVM_CAP_ARM_INJECT_EXT_DABT;
> > +
> >  struct kvm_vcpu_events {
> >  	struct {
> >  		__u8 serror_pending;
> >  		__u8 serror_has_esr;
> > +		__u8 ext_dabt_pending;
> >  		/* Align it to 8 bytes */
> > -		__u8 pad[6];
> > +		__u8 pad[5];
> >  		__u64 serror_esr;
> >  	} exception;
> >  	__u32 reserved[12];
>
> This API seems to be missing support for userspace to specify
> whether the ESR_ELx for the guest should have the EA bit set
> (and more generally other syndrome/fault status bits). I think
> if we have an API for "KVM_EXIT_MMIO but the access failed"
> then it should either (a) be architecture agnostic, since
> pretty much any architecture might have a concept of "access
> gave some bus-error-type failure" and it would be nice if userspace
> didn't have to special case them all in arch-specific code,
> or (b) have the same flexibility for specifying exactly what
> kind of fault as the architecture does. This sort of seems to
> fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> would be a generic API which included space for optional
> arch-specific info, which for Arm would pretty much just be
> the EA bit.)

I'm not sure I understand exactly what would be improved by making this
either more architecture specific or more architecture generic. The
EA bit will always be set, that's why the field is called
'ext_dabt_pending'.

I thought as per the previous discussion, that we were specifically
trying to avoid userspace emulating the exception in detail, so I
designed this to provide the minimal effort API for userspace.

Since we already have an architecture specific ioctl, kvm_vcpu_events, I
don't think we're painting ourselves into a corner by using that. Is a
natural consequence of what you're saying not that we should try to make
that whole call architecture generic?

Unless we already have specific examples of how other architectures
would want to use something like this, and given the impact of this
patch, I'm not sure it's worth trying to speculate about that.

>
> As and when we support nested virtualization, any suggestions
> on how this API would extend to support userspace saying
> "deliver fault to guest EL1" vs "deliver fault to guest EL2" ?
>

If we took one of the supported exits from a VM with nested virt
support, it means that you either had a fault from the guest hypervisor,
or a fault from a nested guest where the guest hypervisor has set up a
virtual stage 2 mapping to a hole in the VM's IPA space. In the former
case, the exception would be delivered back to the guest hypervisor, and
in the latter case the target depends on the guest hypervisor's
configuration of the virtual HCR_EL2(.TEA), which the kernel should
respect when handling the KVM_SET_VCPU_EVENTS ioctl.

Thanks,

Christoffer
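[Editorial sketch, not part of the thread.] The nested-virt routing described in the reply above can be written down as a small decision function. This is only an illustration under stated assumptions: the helper name ext_dabt_target() and the enum are hypothetical and do not exist in the kernel, and the one architectural detail relied on is that HCR_EL2.TEA is bit 37 of the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.TEA (bit 37): route synchronous external aborts to EL2. */
#define HCR_EL2_TEA (UINT64_C(1) << 37)

/* Hypothetical names, used only for this illustration. */
enum abort_target {
	TARGET_GUEST_EL1 = 1,
	TARGET_GUEST_EL2 = 2,
};

/*
 * Decide where an injected external data abort would be delivered.
 *
 * fault_from_guest_hyp: the exiting access was made by the guest
 *                       hypervisor itself (virtual EL2).
 * virtual_hcr_el2:      the guest hypervisor's view of HCR_EL2.
 */
static enum abort_target ext_dabt_target(bool fault_from_guest_hyp,
					 uint64_t virtual_hcr_el2)
{
	/* A fault taken by the guest hypervisor goes back to it. */
	if (fault_from_guest_hyp)
		return TARGET_GUEST_EL2;

	/* A fault from the nested guest follows the virtual TEA setting. */
	if (virtual_hcr_el2 & HCR_EL2_TEA)
		return TARGET_GUEST_EL2;

	return TARGET_GUEST_EL1;
}
```

Under this reading userspace never names the target exception level; it only sets ext_dabt_pending and the kernel derives the target from state it already tracks.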
On Mon, 9 Sep 2019 at 16:16, Christoffer Dall <christoffer.dall@arm.com> wrote:
>
> On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote:
> > This API seems to be missing support for userspace to specify
> > whether the ESR_ELx for the guest should have the EA bit set
> > (and more generally other syndrome/fault status bits). I think
> > if we have an API for "KVM_EXIT_MMIO but the access failed"
> > then it should either (a) be architecture agnostic, since
> > pretty much any architecture might have a concept of "access
> > gave some bus-error-type failure" and it would be nice if userspace
> > didn't have to special case them all in arch-specific code,
> > or (b) have the same flexibility for specifying exactly what
> > kind of fault as the architecture does. This sort of seems to
> > fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> > would be a generic API which included space for optional
> > arch-specific info, which for Arm would pretty much just be
> > the EA bit.)
>
> I'm not sure I understand exactly what would be improved by making this
> either more architecture specific or more architecture generic. The
> EA bit will always be set, that's why the field is called
> 'ext_dabt_pending'.

ESR_EL1.EA doesn't mean "this is an external abort". It means
"given that this is an external abort as indicated by ESR_EL1.DFSC,
specify the external abort type". Traditionally this is 0 for
an AXI bus Decode error ("interconnect says there's nothing there")
and 1 for a Slave error ("there's something there but it told us
to go away"), though architecturally it's specified as impdef
because not everybody uses AXI. In QEMU we track the difference
between these two things and for TCG will raise external aborts
with the correct EA bit value.

> I thought as per the previous discussion, that we were specifically
> trying to avoid userspace emulating the exception in detail, so I
> designed this to provide the minimal effort API for userspace.
>
> Since we already have an architecture specific ioctl, kvm_vcpu_events, I
> don't think we're painting ourselves into a corner by using that. Is a
> natural consequence of what you're saying not that we should try to make
> that whole call architecture generic?
>
> Unless we already have specific examples of how other architectures
> would want to use something like this, and given the impact of this
> patch, I'm not sure it's worth trying to speculate about that.

In QEMU, use of a generic API would look something like
this in kvm-all.c:

    case KVM_EXIT_MMIO:
        DPRINTF("handle_mmio\n");
        /* Called outside BQL */
        MemTxResult res;

        res = address_space_rw(&address_space_memory,
                               run->mmio.phys_addr, attrs,
                               run->mmio.data,
                               run->mmio.len,
                               run->mmio.is_write);
        if (res != MEMTX_OK) {
            /* tell the kernel the access failed, eg
             * by updating the kvm_run struct to say so
             */
        } else {
            /* access passed, we have updated the kvm_run
             * struct's mmio subfield, proceed as usual
             */
        }
        ret = 0;
        break;

[this is exactly the current QEMU code except that today
we throw away the 'res' that tells us if the transaction
succeeded because we have no way to report it to KVM and
effectively always RAZ/WI the access.]

This is nice because you don't need anything here that has to do
"bail out to architecture specific handling of anything",
you just say "nope, the access failed", and let the kernel handle
that however the CPU would handle it. It just immediately works
for all architectures on the userspace side (assuming the kernel
defaults to not actually trying to report an abort to the guest
if nobody's implemented that on the kernel side, which is exactly
what happens today where there's no way to report the error for
any architecture).
The downside is that you lose the ability to be more specific about
architecture-specific fine distinctions like decode errors vs slave
errors, though.

Or you could have an arm-specific API that does care about
fine details like the EA bit (and maybe also other ESR_ELx
fields); that has the downside that userspace needs to
make the handling of error returns from "handle this MMIO
access" architecture specific, but you get architecture-specific
benefits as a result. (Preferably the architecture-specific
APIs should at least be basically the same, eg same ioctl
or same bit of the kvm_run struct being updated with some parts
being arch-specific data, rather than 3 different mechanisms.)

Having an API that is architecture specific but doesn't actually
let you define any of the architecture-specific aspects of
what the abort might imply seems like the worst of both worlds.
If all we can say is "this aborted" then we might as well have
the API be generic.

thanks
-- PMM
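[Editorial sketch, not part of the thread.] To make the DFSC versus EA distinction above concrete, the following self-contained C fragment builds and decodes a data-abort syndrome value. The macro and function names are made up for this illustration. The bit positions are assumptions taken from the Armv8-A ESR_ELx data-abort encoding (EC in bits [31:26], IL in bit 25, EA in bit 9, DFSC in bits [5:0], with DFSC 0b010000 meaning a synchronous external abort not on a translation table walk), and the example arbitrarily uses the "data abort from a lower exception level" class.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ESR_ELx field layout for a data abort (illustrative names). */
#define ESR_EC_SHIFT	26
#define ESR_EC_DABT_LOW	UINT32_C(0x24)		/* data abort, lower EL */
#define ESR_IL		(UINT32_C(1) << 25)	/* 32-bit instruction */
#define ESR_EA		(UINT32_C(1) << 9)	/* external abort type */
#define ESR_DFSC_MASK	UINT32_C(0x3f)
#define ESR_DFSC_EXTABT	UINT32_C(0x10)		/* sync external abort */

/*
 * Build a syndrome for a synchronous external data abort.  DFSC says
 * "this is an external abort"; EA only qualifies which kind it was.
 */
static uint32_t make_ext_dabt_esr(bool ea)
{
	uint32_t esr = (ESR_EC_DABT_LOW << ESR_EC_SHIFT) | ESR_IL |
		       ESR_DFSC_EXTABT;

	if (ea)
		esr |= ESR_EA;
	return esr;
}

/* True if DFSC reports a synchronous external abort. */
static bool esr_is_ext_dabt(uint32_t esr)
{
	return (esr & ESR_DFSC_MASK) == ESR_DFSC_EXTABT;
}

/* Value of the EA qualifier; only meaningful for external aborts. */
static bool esr_ea(uint32_t esr)
{
	return (esr & ESR_EA) != 0;
}
```

With these definitions both make_ext_dabt_esr(false) and make_ext_dabt_esr(true) satisfy esr_is_ext_dabt(); they differ only in bit 9, which is the point being made about a field named ext_dabt_pending not determining EA.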
On Mon, Sep 09, 2019 at 04:56:23PM +0100, Peter Maydell wrote:
> On Mon, 9 Sep 2019 at 16:16, Christoffer Dall <christoffer.dall@arm.com> wrote:
> >
> > On Mon, Sep 09, 2019 at 01:32:46PM +0100, Peter Maydell wrote:
> > > This API seems to be missing support for userspace to specify
> > > whether the ESR_ELx for the guest should have the EA bit set
> > > (and more generally other syndrome/fault status bits). I think
> > > if we have an API for "KVM_EXIT_MMIO but the access failed"
> > > then it should either (a) be architecture agnostic, since
> > > pretty much any architecture might have a concept of "access
> > > gave some bus-error-type failure" and it would be nice if userspace
> > > didn't have to special case them all in arch-specific code,
> > > or (b) have the same flexibility for specifying exactly what
> > > kind of fault as the architecture does. This sort of seems to
> > > fall between two stools. (My ideal for KVM_EXIT_MMIO faults
> > > would be a generic API which included space for optional
> > > arch-specific info, which for Arm would pretty much just be
> > > the EA bit.)
> >
> > I'm not sure I understand exactly what would be improved by making this
> > either more architecture specific or more architecture generic. The
> > EA bit will always be set, that's why the field is called
> > 'ext_dabt_pending'.
>
> ESR_EL1.EA doesn't mean "this is an external abort". It means
> "given that this is an external abort as indicated by ESR_EL1.DFSC,
> specify the external abort type". Traditionally this is 0 for
> an AXI bus Decode error ("interconnect says there's nothing there")
> and 1 for a Slave error ("there's something there but it told us
> to go away"), though architecturally it's specified as impdef
> because not everybody uses AXI. In QEMU we track the difference
> between these two things and for TCG will raise external aborts
> with the correct EA bit value.
>

Ah, I missed that. I don't think we want to allow userspace to supply
any implementation defined values for the VM, though.

> > I thought as per the previous discussion, that we were specifically
> > trying to avoid userspace emulating the exception in detail, so I
> > designed this to provide the minimal effort API for userspace.
> >
> > Since we already have an architecture specific ioctl, kvm_vcpu_events, I
> > don't think we're painting ourselves into a corner by using that. Is a
> > natural consequence of what you're saying not that we should try to make
> > that whole call architecture generic?
> >
> > Unless we already have specific examples of how other architectures
> > would want to use something like this, and given the impact of this
> > patch, I'm not sure it's worth trying to speculate about that.
>
> In QEMU, use of a generic API would look something like
> this in kvm-all.c:
>
>     case KVM_EXIT_MMIO:
>         DPRINTF("handle_mmio\n");
>         /* Called outside BQL */
>         MemTxResult res;
>
>         res = address_space_rw(&address_space_memory,
>                                run->mmio.phys_addr, attrs,
>                                run->mmio.data,
>                                run->mmio.len,
>                                run->mmio.is_write);
>         if (res != MEMTX_OK) {
>             /* tell the kernel the access failed, eg
>              * by updating the kvm_run struct to say so
>              */
>         } else {
>             /* access passed, we have updated the kvm_run
>              * struct's mmio subfield, proceed as usual
>              */
>         }
>         ret = 0;
>         break;
>
> [this is exactly the current QEMU code except that today
> we throw away the 'res' that tells us if the transaction
> succeeded because we have no way to report it to KVM and
> effectively always RAZ/WI the access.]
>
> This is nice because you don't need anything here that has to do
> "bail out to architecture specific handling of anything",
> you just say "nope, the access failed", and let the kernel handle
> that however the CPU would handle it. It just immediately works
> for all architectures on the userspace side (assuming the kernel
> defaults to not actually trying to report an abort to the guest
> if nobody's implemented that on the kernel side, which is exactly
> what happens today where there's no way to report the error for
> any architecture).
> The downside is that you lose the ability to be more specific about
> architecture-specific fine distinctions like decode errors vs slave
> errors, though.

I understand that it's convenient to avoid having to write an
architecture hook, but I simply don't know if it makes sense to do this
on other architectures, and while it can be more code to have to write
the architecture hooks in QEMU, it's hardly a strong argument against
using an existing architecture-specific mechanism to inject an event to
a guest.

Note that I looked at using an appropriate field in the kvm_run
structure, but nothing elegant came to mind. Do you have a concrete
example of how you would like to change the kvm_run structure?

>
> Or you could have an arm-specific API that does care about
> fine details like the EA bit (and maybe also other ESR_ELx
> fields); that has the downside that userspace needs to
> make the handling of error returns from "handle this MMIO
> access" architecture specific, but you get architecture-specific
> benefits as a result. (Preferably the architecture-specific
> APIs should at least be basically the same, eg same ioctl
> or same bit of the kvm_run struct being updated with some parts
> being arch-specific data, rather than 3 different mechanisms.)

Are there other bits of the ESR than the EA that you think we should be
able to specify?

Can we decide if we need to allow userspace to provide additional
information or not, and then decide on the mechanism, instead of
conflating the two questions?

I think we should either expose the minimal mechanism to user space, or
just leave it to user space to emulate the whole thing.

Thanks,

Christoffer
On 09/09/2019 13:13, Christoffer Dall wrote: > In some scenarios, such as buggy guest or incorrect configuration of the > VMM and firmware description data, userspace will detect a memory access > to a portion of the IPA, which is not mapped to any MMIO region. > > For this purpose, the appropriate action is to inject an external abort > to the guest. The kernel already has functionality to inject an > external abort, but we need to wire up a signal from user space that > lets user space tell the kernel to do this. > > It turns out, we already have the set event functionality which we can > perfectly reuse for this. > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > --- > Documentation/virt/kvm/api.txt | 15 ++++++++++++++- > arch/arm/include/uapi/asm/kvm.h | 3 ++- > arch/arm/kvm/guest.c | 3 +++ > arch/arm64/include/uapi/asm/kvm.h | 3 ++- > arch/arm64/kvm/guest.c | 3 +++ > arch/arm64/kvm/inject_fault.c | 4 ++-- > include/uapi/linux/kvm.h | 1 + > virt/kvm/arm/arm.c | 1 + > 8 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > index 02501333f746..edd6cdc470ca 100644 > --- a/Documentation/virt/kvm/api.txt > +++ b/Documentation/virt/kvm/api.txt > @@ -955,6 +955,8 @@ The following bits are defined in the flags field: > > ARM/ARM64: > > +User space may need to inject several types of events to the guest. > + > If the guest accesses a device that is being emulated by the host kernel in > such a way that a real device would generate a physical SError, KVM may make > a virtual SError pending for that VCPU. This system error interrupt remains > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return > -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr > will return -EINVAL. 
>
> +If the guest performed an access to I/O memory which could not be handled by
> +user space, for example because of missing instruction syndrome decode
> +information or because there is no device mapped at the accessed IPA, then
> +user space can ask the kernel to inject an external abort using the address
> +from the exiting fault on the VCPU. It is a programming error to set
> +ext_dabt_pending at the same time as any of the serror fields, or to set
> +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or

... on *re-entry from* an exit?

> +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> +KVM_CAP_ARM_INJECT_EXT_DABT;

s/;/./

Can we add some wording to the fact that this is only a helper for the
most common case? Most of the ARM exceptions can otherwise be
constructed/injected using the SET_ONE_REG API.

> +
>  struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index a4217c1a5d01..d2449a5bf8d5 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -131,8 +131,9 @@ struct kvm_vcpu_events {
>  	struct {
>  		__u8 serror_pending;
>  		__u8 serror_has_esr;
> +		__u8 ext_dabt_pending;
>  		/* Align it to 8 bytes */
> -		__u8 pad[6];
> +		__u8 pad[5];
>  		__u64 serror_esr;
>  	} exception;
>  	__u32 reserved[12];
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index 684cf64b4033..4154c5589501 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  {
>  	bool serror_pending = events->exception.serror_pending;
>  	bool has_esr = events->exception.serror_has_esr;
> +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
>  
>  	if (serror_pending && has_esr)
>
return -EINVAL; > else if (serror_pending) > kvm_inject_vabt(vcpu); > + else if (has_ext_dabt_pending) > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > > return 0; > } > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h > index 9a507716ae2f..7729efdb1c0c 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -164,8 +164,9 @@ struct kvm_vcpu_events { > struct { > __u8 serror_pending; > __u8 serror_has_esr; > + __u8 ext_dabt_pending; > /* Align it to 8 bytes */ > - __u8 pad[6]; > + __u8 pad[5]; > __u64 serror_esr; > } exception; > __u32 reserved[12]; > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > index dfd626447482..10e6e2144dca 100644 > --- a/arch/arm64/kvm/guest.c > +++ b/arch/arm64/kvm/guest.c > @@ -720,6 +720,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, > { > bool serror_pending = events->exception.serror_pending; > bool has_esr = events->exception.serror_has_esr; > + bool has_ext_dabt_pending = events->exception.ext_dabt_pending; > > if (serror_pending && has_esr) { > if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) > @@ -731,6 +732,8 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, > return -EINVAL; > } else if (serror_pending) { > kvm_inject_vabt(vcpu); > + } else if (has_ext_dabt_pending) { > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > } > > return 0; > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > index a9d25a305af5..ccdb6a051ab2 100644 > --- a/arch/arm64/kvm/inject_fault.c > +++ b/arch/arm64/kvm/inject_fault.c > @@ -109,7 +109,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu) > > /** > * kvm_inject_dabt - inject a data abort into the guest > - * @vcpu: The VCPU to receive the undefined exception > + * @vcpu: The VCPU to receive the data abort > * @addr: The address to report in the DFAR > * > * It is assumed that this code is called from the VCPU thread and that the > @@ -125,7 +125,7 @@ void 
kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr) > > /** > * kvm_inject_pabt - inject a prefetch abort into the guest > - * @vcpu: The VCPU to receive the undefined exception > + * @vcpu: The VCPU to receive the prefetch abort > * @addr: The address to report in the DFAR > * > * It is assumed that this code is called from the VCPU thread and that the > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index dd79235b6435..a80ee820e700 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1003,6 +1003,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > #define KVM_CAP_PMU_EVENT_FILTER 173 > #define KVM_CAP_ARM_NISV_TO_USER 174 > +#define KVM_CAP_ARM_INJECT_EXT_DABT 175 > > #ifdef KVM_CAP_IRQ_ROUTING > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 7153504bb106..56a97dd9b292 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_IMMEDIATE_EXIT: > case KVM_CAP_VCPU_EVENTS: > case KVM_CAP_ARM_NISV_TO_USER: > + case KVM_CAP_ARM_INJECT_EXT_DABT: > r = 1; > break; > case KVM_CAP_ARM_SET_DEVICE_ADDR: > Otherwise looks good to me. If you respin the series, and unless anyone shouts, I'll queue it. No hurry though, I'm going to take slow(er) the following two weeks. Thanks, M.
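[Editorial sketch, not part of the thread.] The padding change in the struct hunk (pad[6] becoming pad[5] when ext_dabt_pending is added) keeps serror_esr at the same offset and leaves the overall size unchanged. A minimal way to check that is to mirror the proposed layout locally. The struct below is a stand-in written for this note, not the real uapi header, and request_ext_dabt() is a hypothetical helper showing how a VMM might fill the structure before handing it to KVM_SET_VCPU_EVENTS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout proposed in the patch (illustration only). */
struct vcpu_events_sketch {
	struct {
		uint8_t  serror_pending;
		uint8_t  serror_has_esr;
		uint8_t  ext_dabt_pending;	/* new field */
		uint8_t  pad[5];		/* was pad[6] */
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* The new byte is carved out of the padding, so nothing else moves. */
static_assert(offsetof(struct vcpu_events_sketch, exception.serror_esr) == 8,
	      "serror_esr must stay 8-byte aligned at offset 8");
static_assert(sizeof(struct vcpu_events_sketch) == 64,
	      "overall size must not change");

/*
 * Hypothetical helper: request an external data abort and nothing else.
 * Zeroing first keeps the serror fields, padding and reserved words clear,
 * which matters because mixing serror fields with ext_dabt_pending is
 * documented as a programming error.
 */
static void request_ext_dabt(struct vcpu_events_sketch *ev)
{
	memset(ev, 0, sizeof(*ev));
	ev->exception.ext_dabt_pending = 1;
}
```

A real VMM would use struct kvm_vcpu_events from the kernel headers and would first check KVM_CAP_ARM_INJECT_EXT_DABT, as the documentation hunk requires.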
On Thu, Sep 26, 2019 at 03:09:11PM +0100, Marc Zyngier wrote: > On 09/09/2019 13:13, Christoffer Dall wrote: > > In some scenarios, such as buggy guest or incorrect configuration of the > > VMM and firmware description data, userspace will detect a memory access > > to a portion of the IPA, which is not mapped to any MMIO region. > > > > For this purpose, the appropriate action is to inject an external abort > > to the guest. The kernel already has functionality to inject an > > external abort, but we need to wire up a signal from user space that > > lets user space tell the kernel to do this. > > > > It turns out, we already have the set event functionality which we can > > perfectly reuse for this. > > > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > > --- > > Documentation/virt/kvm/api.txt | 15 ++++++++++++++- > > arch/arm/include/uapi/asm/kvm.h | 3 ++- > > arch/arm/kvm/guest.c | 3 +++ > > arch/arm64/include/uapi/asm/kvm.h | 3 ++- > > arch/arm64/kvm/guest.c | 3 +++ > > arch/arm64/kvm/inject_fault.c | 4 ++-- > > include/uapi/linux/kvm.h | 1 + > > virt/kvm/arm/arm.c | 1 + > > 8 files changed, 28 insertions(+), 5 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > > index 02501333f746..edd6cdc470ca 100644 > > --- a/Documentation/virt/kvm/api.txt > > +++ b/Documentation/virt/kvm/api.txt > > @@ -955,6 +955,8 @@ The following bits are defined in the flags field: > > > > ARM/ARM64: > > > > +User space may need to inject several types of events to the guest. > > + > > If the guest accesses a device that is being emulated by the host kernel in > > such a way that a real device would generate a physical SError, KVM may make > > a virtual SError pending for that VCPU. This system error interrupt remains > > @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return > > -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr > > will return -EINVAL. 
> > > > +If the guest performed an access to I/O memory which could not be handled by > > +user space, for example because of missing instruction syndrome decode > > +information or because there is no device mapped at the accessed IPA, then > > +user space can ask the kernel to inject an external abort using the address > > +from the exiting fault on the VCPU. It is a programming error to set > > +ext_dabt_pending at the same time as any of the serror fields, or to set > > +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or > > ... on *re-entry from* an exit? > > > +KVM_EXIT_ARM_NISV. This feature is only available if the system supports > > +KVM_CAP_ARM_INJECT_EXT_DABT; > > s/;/./ > > Can we add some wording to the fact that this is only a helper for the > most common case? Most of the ARM exceptions can otherwise be > constructed/injected using the SET_ONE_REG API. > > > + > > struct kvm_vcpu_events { > > struct { > > __u8 serror_pending; > > __u8 serror_has_esr; > > + __u8 ext_dabt_pending; > > /* Align it to 8 bytes */ > > - __u8 pad[6]; > > + __u8 pad[5]; > > __u64 serror_esr; > > } exception; > > __u32 reserved[12]; > > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h > > index a4217c1a5d01..d2449a5bf8d5 100644 > > --- a/arch/arm/include/uapi/asm/kvm.h > > +++ b/arch/arm/include/uapi/asm/kvm.h > > @@ -131,8 +131,9 @@ struct kvm_vcpu_events { > > struct { > > __u8 serror_pending; > > __u8 serror_has_esr; > > + __u8 ext_dabt_pending; > > /* Align it to 8 bytes */ > > - __u8 pad[6]; > > + __u8 pad[5]; > > __u64 serror_esr; > > } exception; > > __u32 reserved[12]; > > diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c > > index 684cf64b4033..4154c5589501 100644 > > --- a/arch/arm/kvm/guest.c > > +++ b/arch/arm/kvm/guest.c > > @@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, > > { > > bool serror_pending = events->exception.serror_pending; > > bool has_esr = 
events->exception.serror_has_esr;
> > +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
> >
> >  	if (serror_pending && has_esr)
> >  		return -EINVAL;
> >  	else if (serror_pending)
> >  		kvm_inject_vabt(vcpu);
> > +	else if (has_ext_dabt_pending)
> > +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> >
> >  	return 0;
> >  }
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 9a507716ae2f..7729efdb1c0c 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -164,8 +164,9 @@ struct kvm_vcpu_events {
> >  	struct {
> >  		__u8 serror_pending;
> >  		__u8 serror_has_esr;
> > +		__u8 ext_dabt_pending;
> >  		/* Align it to 8 bytes */
> > -		__u8 pad[6];
> > +		__u8 pad[5];
> >  		__u64 serror_esr;
> >  	} exception;
> >  	__u32 reserved[12];
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index dfd626447482..10e6e2144dca 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -720,6 +720,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  {
> >  	bool serror_pending = events->exception.serror_pending;
> >  	bool has_esr = events->exception.serror_has_esr;
> > +	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
> >
> >  	if (serror_pending && has_esr) {
> >  		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> > @@ -731,6 +732,8 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  			return -EINVAL;
> >  	} else if (serror_pending) {
> >  		kvm_inject_vabt(vcpu);
> > +	} else if (has_ext_dabt_pending) {
> > +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> >  	}
> >
> >  	return 0;
> > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index a9d25a305af5..ccdb6a051ab2 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -109,7 +109,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
> >
> >  /**
> >   * kvm_inject_dabt - inject a data abort into the guest
> > - * @vcpu: The VCPU to receive the undefined exception
> > + * @vcpu: The VCPU to receive the data abort
> >   * @addr: The address to report in the DFAR
> >   *
> >   * It is assumed that this code is called from the VCPU thread and that the
> > @@ -125,7 +125,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
> >
> >  /**
> >   * kvm_inject_pabt - inject a prefetch abort into the guest
> > - * @vcpu: The VCPU to receive the undefined exception
> > + * @vcpu: The VCPU to receive the prefetch abort
> >   * @addr: The address to report in the DFAR
> >   *
> >   * It is assumed that this code is called from the VCPU thread and that the
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index dd79235b6435..a80ee820e700 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1003,6 +1003,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> >  #define KVM_CAP_PMU_EVENT_FILTER 173
> >  #define KVM_CAP_ARM_NISV_TO_USER 174
> > +#define KVM_CAP_ARM_INJECT_EXT_DABT 175
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 7153504bb106..56a97dd9b292 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_IMMEDIATE_EXIT:
> >  	case KVM_CAP_VCPU_EVENTS:
> >  	case KVM_CAP_ARM_NISV_TO_USER:
> > +	case KVM_CAP_ARM_INJECT_EXT_DABT:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_ARM_SET_DEVICE_ADDR:
> >
>
> Otherwise looks good to me. If you respin the series, and unless anyone
> shouts, I'll queue it. No hurry though, I'm going to take slow(er) the
> following two weeks.
>

Thanks, I've tried to come up with a wording for the above, you can have a
look in v2.

Christoffer
diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 02501333f746..edd6cdc470ca 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -955,6 +955,8 @@ The following bits are defined in the flags field:
 
 ARM/ARM64:
 
+User space may need to inject several types of events to the guest.
+
 If the guest accesses a device that is being emulated by the host kernel in
 such a way that a real device would generate a physical SError, KVM may make
 a virtual SError pending for that VCPU. This system error interrupt remains
@@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not support it will return
 -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
 will return -EINVAL.
 
+If the guest performed an access to I/O memory which could not be handled by
+user space, for example because of missing instruction syndrome decode
+information or because there is no device mapped at the accessed IPA, then
+user space can ask the kernel to inject an external abort using the address
+from the exiting fault on the VCPU. It is a programming error to set
+ext_dabt_pending at the same time as any of the serror fields, or to set
+ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or
+KVM_EXIT_ARM_NISV. This feature is only available if the system supports
+KVM_CAP_ARM_INJECT_EXT_DABT;
+
 struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index a4217c1a5d01..d2449a5bf8d5 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -131,8 +131,9 @@ struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 684cf64b4033..4154c5589501 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 {
 	bool serror_pending = events->exception.serror_pending;
 	bool has_esr = events->exception.serror_has_esr;
+	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
 
 	if (serror_pending && has_esr)
 		return -EINVAL;
 	else if (serror_pending)
 		kvm_inject_vabt(vcpu);
+	else if (has_ext_dabt_pending)
+		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 
 	return 0;
 }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9a507716ae2f..7729efdb1c0c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -164,8 +164,9 @@ struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index dfd626447482..10e6e2144dca 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -720,6 +720,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 {
 	bool serror_pending = events->exception.serror_pending;
 	bool has_esr = events->exception.serror_has_esr;
+	bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
 
 	if (serror_pending && has_esr) {
 		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
@@ -731,6 +732,8 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 	} else if (serror_pending) {
 		kvm_inject_vabt(vcpu);
+	} else if (has_ext_dabt_pending) {
+		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 	}
 
 	return 0;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index a9d25a305af5..ccdb6a051ab2 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -109,7 +109,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 
 /**
  * kvm_inject_dabt - inject a data abort into the guest
- * @vcpu: The VCPU to receive the undefined exception
+ * @vcpu: The VCPU to receive the data abort
  * @addr: The address to report in the DFAR
  *
  * It is assumed that this code is called from the VCPU thread and that the
@@ -125,7 +125,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 
 /**
  * kvm_inject_pabt - inject a prefetch abort into the guest
- * @vcpu: The VCPU to receive the undefined exception
+ * @vcpu: The VCPU to receive the prefetch abort
  * @addr: The address to report in the DFAR
  *
  * It is assumed that this code is called from the VCPU thread and that the
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dd79235b6435..a80ee820e700 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1003,6 +1003,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
 #define KVM_CAP_ARM_NISV_TO_USER 174
+#define KVM_CAP_ARM_INJECT_EXT_DABT 175
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 7153504bb106..56a97dd9b292 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_VCPU_EVENTS:
 	case KVM_CAP_ARM_NISV_TO_USER:
+	case KVM_CAP_ARM_INJECT_EXT_DABT:
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
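
The struct change in the diff takes one byte out of the padding for the new
flag, so the offset of serror_esr and the overall size should stay where they
were. A minimal sketch that checks this on local mirror structs follows; the
names old_events, new_events and layout_is_compatible are illustrative only
and are not part of the UAPI headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the layout before the patch (not the UAPI type). */
struct old_events {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[6];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* Illustrative mirror of the layout after the patch. */
struct new_events {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t ext_dabt_pending;
		uint8_t pad[5];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/*
 * Returns 1 if the new flag occupies the first byte of the old padding
 * and neither serror_esr nor the total size has moved, 0 otherwise.
 */
static int layout_is_compatible(void)
{
	return sizeof(struct old_events) == sizeof(struct new_events) &&
	       offsetof(struct old_events, exception.serror_esr) ==
	       offsetof(struct new_events, exception.serror_esr) &&
	       offsetof(struct new_events, exception.ext_dabt_pending) ==
	       offsetof(struct old_events, exception.pad);
}
```

Old user space that zeroes the padding would then simply leave the new flag
clear, which is presumably why the field was carved out of pad[] rather than
appended.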
In some scenarios, such as a buggy guest or incorrect configuration of the
VMM and firmware description data, userspace will detect a memory access
to a portion of the IPA space that is not mapped to any MMIO region.

In that case, the appropriate action is to inject an external abort into
the guest. The kernel already has functionality to inject an external
abort, but we need to wire up a signal from user space that lets user
space tell the kernel to do this.

It turns out we already have the set event functionality, which we can
reuse for this.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 Documentation/virt/kvm/api.txt    | 15 ++++++++++++++-
 arch/arm/include/uapi/asm/kvm.h   |  3 ++-
 arch/arm/kvm/guest.c              |  3 +++
 arch/arm64/include/uapi/asm/kvm.h |  3 ++-
 arch/arm64/kvm/guest.c            |  3 +++
 arch/arm64/kvm/inject_fault.c     |  4 ++--
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/arm/arm.c                |  1 +
 8 files changed, 28 insertions(+), 5 deletions(-)
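
As a rough illustration of the user-space side described in the commit
message, the sketch below prepares an events structure that asks for an
external data abort. It is only a sketch under assumptions: the struct is a
local mirror of the arm/arm64 layout from the patch, and enum exit_kind and
prepare_ext_dabt() are hypothetical names standing in for whatever a VMM
derives from its run structure.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the arm/arm64 layout from the patch. A real VMM would
 * use struct kvm_vcpu_events from <linux/kvm.h> instead.
 */
struct vcpu_events_sketch {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t ext_dabt_pending;
		uint8_t pad[5];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* Hypothetical classification of the last exit seen by the VMM. */
enum exit_kind { EXIT_MMIO, EXIT_ARM_NISV, EXIT_OTHER };

/*
 * Fill @ev with a request for an external data abort.
 * Returns 0 on success, or -1 if the last exit is not one of the two
 * kinds for which the documentation allows ext_dabt_pending.
 */
static int prepare_ext_dabt(struct vcpu_events_sketch *ev, enum exit_kind kind)
{
	if (kind != EXIT_MMIO && kind != EXIT_ARM_NISV)
		return -1;

	/* Zero everything so no serror field is set next to the new flag. */
	memset(ev, 0, sizeof(*ev));
	ev->exception.ext_dabt_pending = 1;
	return 0;
}
```

A real VMM would first confirm KVM_CAP_ARM_INJECT_EXT_DABT with
KVM_CHECK_EXTENSION and then hand the filled structure to the
KVM_SET_VCPU_EVENTS ioctl on the VCPU file descriptor.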