Message ID | 20190909121337.27287-2-christoffer.dall@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improve handling of stage 2 aborts without instruction decode | expand |
On 09/09/2019 13:13, Christoffer Dall wrote: > For a long time, if a guest accessed memory outside of a memslot using > any of the load/store instructions in the architecture which doesn't > supply decoding information in the ESR_EL2 (the ISV bit is not set), the > kernel would print the following message and terminate the VM as a > result of returning -ENOSYS to userspace: > > load/store instruction decoding not implemented > > The reason behind this message is that KVM assumes that all accesses > outside a memslot is an MMIO access which should be handled by > userspace, and we originally expected to eventually implement some sort > of decoding of load/store instructions where the ISV bit was not set. > > However, it turns out that many of the instructions which don't provide > decoding information on abort are not safe to use for MMIO accesses, and > the remaining few that would potentially make sense to use on MMIO > accesses, such as those with register writeback, are not used in > practice. It also turns out that fetching an instruction from guest > memory can be a pretty horrible affair, involving stopping all CPUs on > SMP systems, handling multiple corner cases of address translation in > software, and more. It doesn't appear likely that we'll ever implement > this in the kernel. > > What is much more common is that a user has misconfigured his/her guest > and is actually not accessing an MMIO region, but just hitting some > random hole in the IPA space. In this scenario, the error message above > is almost misleading and has led to a great deal of confusion over the > years. > > It is, nevertheless, ABI to userspace, and we therefore need to > introduce a new capability that userspace explicitly enables to change > behavior. > > This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) > which does exactly that, and introduces a new exit reason to report the > event to userspace. User space can then emulate an exception to the > guest, restart the guest, suspend the guest, or take any other > appropriate action as per the policy of the running system. > > Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > --- > Documentation/virt/kvm/api.txt | 29 ++++++++++++++++++++++++++++ > arch/arm/include/asm/kvm_arm.h | 2 ++ > arch/arm/include/asm/kvm_emulate.h | 5 +++++ > arch/arm/include/asm/kvm_host.h | 8 ++++++++ > arch/arm64/include/asm/kvm_emulate.h | 5 +++++ > arch/arm64/include/asm/kvm_host.h | 8 ++++++++ > include/uapi/linux/kvm.h | 7 +++++++ > virt/kvm/arm/arm.c | 21 ++++++++++++++++++++ > virt/kvm/arm/mmio.c | 11 +++++++++-- > 9 files changed, 94 insertions(+), 2 deletions(-) > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > index 2d067767b617..02501333f746 100644 > --- a/Documentation/virt/kvm/api.txt > +++ b/Documentation/virt/kvm/api.txt > @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC > event/message pages and to enable/disable SynIC messages/events processing > in userspace. > > + /* KVM_EXIT_ARM_NISV */ > + struct { > + __u64 esr_iss; > + __u64 fault_ipa; > + } arm_nisv; > + > +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, > +KVM will typically return to userspace and ask it to do MMIO emulation on its > +behalf. However, for certain classes of instructions, no instruction decode > +(direction, length of memory access) is provided, and fetching and decoding > +the instruction from the VM is overly complicated to live in the kernel. > + > +Historically, when this situation occurred, KVM would print a warning and kill > +the VM. KVM assumed that if the guest accessed non-memslot memory, it was > +trying to do I/O, which just couldn't be emulated, and the warning message was > +phrased accordingly. However, what happened more often was that a guest bug > +caused access outside the guest memory areas which should lead to a more > +mearningful warning message and an external abort in the guest, if the access meaningful? > +did not fall within an I/O window. > + > +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable > +this capability at VM creation. Once this is done, these types of errors will > +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from > +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA > +in the fault_ipa field. Userspace can either fix up the access if it's > +actually an I/O access by decoding the instruction from guest memory (if it's > +very brave) and continue executing the guest, or it can decide to suspend, > +dump, or restart the guest. > + > /* Fix the size of the union. */ > char padding[256]; > }; > diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h > index 0125aa059d5b..ce61b3b0058d 100644 > --- a/arch/arm/include/asm/kvm_arm.h > +++ b/arch/arm/include/asm/kvm_arm.h > @@ -162,6 +162,8 @@ > #define HSR_ISV (_AC(1, UL) << HSR_ISV_SHIFT) > #define HSR_SRT_SHIFT (16) > #define HSR_SRT_MASK (0xf << HSR_SRT_SHIFT) > +#define HSR_CM (1 << 8) > +#define HSR_WNR (1 << 6) I think we already have that one (bizarrely placed after HSR_SSE). > #define HSR_FSC (0x3f) > #define HSR_FSC_TYPE (0x3c) > #define HSR_SSE (1 << 21) > diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h > index 40002416efec..e8ef349c04b4 100644 > --- a/arch/arm/include/asm/kvm_emulate.h > +++ b/arch/arm/include/asm/kvm_emulate.h > @@ -167,6 +167,11 @@ static inline bool kvm_vcpu_dabt_isvalid(struct kvm_vcpu *vcpu) > return kvm_vcpu_get_hsr(vcpu) & HSR_ISV; > } > > +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) > +{ > + return kvm_vcpu_get_hsr(vcpu) & (HSR_CM | HSR_WNR | HSR_FSC); > +} > + > static inline bool kvm_vcpu_dabt_iswrite(struct kvm_vcpu *vcpu) > { > return kvm_vcpu_get_hsr(vcpu) & HSR_WNR; > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 8a37c8e89777..19a92c49039c 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -76,6 +76,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > index d69c1efc63e7..a3c967988e1d 100644 > --- a/arch/arm64/include/asm/kvm_emulate.h > +++ b/arch/arm64/include/asm/kvm_emulate.h > @@ -258,6 +258,11 @@ static inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu) > return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_ISV); > } > > +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) > +{ > + return kvm_vcpu_get_hsr(vcpu) & (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC); > +} > + > static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu) > { > return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE); > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index f656169db8c3..019bc560edc1 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -83,6 +83,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 5e3f12d5359e..dd79235b6435 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > #define KVM_EXIT_S390_STSI 25 > #define KVM_EXIT_IOAPIC_EOI 26 > #define KVM_EXIT_HYPERV 27 > +#define KVM_EXIT_ARM_NISV 28 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -392,6 +393,11 @@ struct kvm_run { > } eoi; > /* KVM_EXIT_HYPERV */ > struct kvm_hyperv_exit hyperv; > + /* KVM_EXIT_ARM_NISV */ > + struct { > + __u64 esr_iss; > + __u64 fault_ipa; > + } arm_nisv; > /* Fix the size of the union. */ > char padding[256]; > }; > @@ -996,6 +1002,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > #define KVM_CAP_PMU_EVENT_FILTER 173 > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > #ifdef KVM_CAP_IRQ_ROUTING > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 35a069815baf..7153504bb106 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > return 0; > } > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > + struct kvm_enable_cap *cap) > +{ > + int r; > + > + if (cap->flags) > + return -EINVAL; > + > + switch (cap->cap) { > + case KVM_CAP_ARM_NISV_TO_USER: > + r = 0; > + kvm->arch.return_nisv_io_abort_to_user = true; > + break; > + default: > + r = -EINVAL; > + break; > + } > + > + return r; > +} > > /** > * kvm_arch_init_vm - initializes a VM data structure > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_MP_STATE: > case KVM_CAP_IMMEDIATE_EXIT: > case KVM_CAP_VCPU_EVENTS: > + case KVM_CAP_ARM_NISV_TO_USER: > r = 1; > break; > case KVM_CAP_ARM_SET_DEVICE_ADDR: > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index 6af5c91337f2..7b92e2744fa7 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { > + run->exit_reason = KVM_EXIT_ARM_NISV; > + run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu); > + run->arm_nisv.fault_ipa = fault_ipa; > + return 0; > + } else { > + kvm_info("Encountered data abort outside memslots with no valid syndrome info\n"); Nit: you could lose the else clause, making this line a bit shorter. More importantly, I'd like this to become a rate-limited output, just to keep things sane. How about kvm_pr_unimpl()? > + return -ENOSYS; > + } > } > > rt = vcpu->arch.mmio_decode.rt; > Other than this collection of nits, this looks pretty good, and I'd really like to take this patch in 5.5. Thanks, M.
Hi Christoffer, On 09/09/2019 13:13, Christoffer Dall wrote: > For a long time, if a guest accessed memory outside of a memslot using > any of the load/store instructions in the architecture which doesn't > supply decoding information in the ESR_EL2 (the ISV bit is not set), the > kernel would print the following message and terminate the VM as a > result of returning -ENOSYS to userspace: > > load/store instruction decoding not implemented > > The reason behind this message is that KVM assumes that all accesses > outside a memslot is an MMIO access which should be handled by > userspace, and we originally expected to eventually implement some sort > of decoding of load/store instructions where the ISV bit was not set. > However, it turns out that many of the instructions which don't provide > decoding information on abort are not safe to use for MMIO accesses, and > the remaining few that would potentially make sense to use on MMIO > accesses, such as those with register writeback, are not used in > practice. It also turns out that fetching an instruction from guest > memory can be a pretty horrible affair, involving stopping all CPUs on > SMP systems, handling multiple corner cases of address translation in > software, and more. It doesn't appear likely that we'll ever implement > this in the kernel. > What is much more common is that a user has misconfigured his/her guest > and is actually not accessing an MMIO region, but just hitting some > random hole in the IPA space. In this scenario, the error message above > is almost misleading and has led to a great deal of confusion over the > years. > > It is, nevertheless, ABI to userspace, and we therefore need to > introduce a new capability that userspace explicitly enables to change > behavior. > > This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) > which does exactly that, and introduces a new exit reason to report the > event to userspace. User space can then emulate an exception to the > guest, restart the guest, suspend the guest, or take any other > appropriate action as per the policy of the running system. > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > index 2d067767b617..02501333f746 100644 > --- a/Documentation/virt/kvm/api.txt > +++ b/Documentation/virt/kvm/api.txt > @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC > event/message pages and to enable/disable SynIC messages/events processing > in userspace. > > + /* KVM_EXIT_ARM_NISV */ > + struct { > + __u64 esr_iss; > + __u64 fault_ipa; > + } arm_nisv; > + > +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, > +KVM will typically return to userspace and ask it to do MMIO emulation on its > +behalf. However, for certain classes of instructions, no instruction decode > +(direction, length of memory access) is provided, and fetching and decoding > +the instruction from the VM is overly complicated to live in the kernel. > + > +Historically, when this situation occurred, KVM would print a warning and kill > +the VM. KVM assumed that if the guest accessed non-memslot memory, it was > +trying to do I/O, which just couldn't be emulated, and the warning message was > +phrased accordingly. However, what happened more often was that a guest bug > +caused access outside the guest memory areas which should lead to a more > +mearningful warning message and an external abort in the guest, if the access > +did not fall within an I/O window. > + > +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable > +this capability at VM creation. Once this is done, these types of errors will > +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from > +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA > +in the fault_ipa field. Userspace can either fix up the access if it's > +actually an I/O access by decoding the instruction from guest memory (if it's > +very brave) and continue executing the guest, or it can decide to suspend, > +dump, or restart the guest. Should we document which parts of instruction-emulation the VMM has to do? For KVM_EXIT_MMIO, kvm looks after updating registers and advancing the PC and SS state machine. I can't see a kvm_skip_instr() in here, so the VMM has to do all of that stuff, including any register post-increment, which is the reason we need the instruction in the first place. Thanks, James
On Thu, Sep 26, 2019 at 02:47:55PM +0100, Marc Zyngier wrote: > On 09/09/2019 13:13, Christoffer Dall wrote: > > For a long time, if a guest accessed memory outside of a memslot using > > any of the load/store instructions in the architecture which doesn't > > supply decoding information in the ESR_EL2 (the ISV bit is not set), the > > kernel would print the following message and terminate the VM as a > > result of returning -ENOSYS to userspace: > > > > load/store instruction decoding not implemented > > > > The reason behind this message is that KVM assumes that all accesses > > outside a memslot is an MMIO access which should be handled by > > userspace, and we originally expected to eventually implement some sort > > of decoding of load/store instructions where the ISV bit was not set. > > > > However, it turns out that many of the instructions which don't provide > > decoding information on abort are not safe to use for MMIO accesses, and > > the remaining few that would potentially make sense to use on MMIO > > accesses, such as those with register writeback, are not used in > > practice. It also turns out that fetching an instruction from guest > > memory can be a pretty horrible affair, involving stopping all CPUs on > > SMP systems, handling multiple corner cases of address translation in > > software, and more. It doesn't appear likely that we'll ever implement > > this in the kernel. > > > > What is much more common is that a user has misconfigured his/her guest > > and is actually not accessing an MMIO region, but just hitting some > > random hole in the IPA space. In this scenario, the error message above > > is almost misleading and has led to a great deal of confusion over the > > years. > > > > It is, nevertheless, ABI to userspace, and we therefore need to > > introduce a new capability that userspace explicitly enables to change > > behavior. > > > > This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) > > which does exactly that, and introduces a new exit reason to report the > > event to userspace. User space can then emulate an exception to the > > guest, restart the guest, suspend the guest, or take any other > > appropriate action as per the policy of the running system. > > > > Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> > > --- > > Documentation/virt/kvm/api.txt | 29 ++++++++++++++++++++++++++++ > > arch/arm/include/asm/kvm_arm.h | 2 ++ > > arch/arm/include/asm/kvm_emulate.h | 5 +++++ > > arch/arm/include/asm/kvm_host.h | 8 ++++++++ > > arch/arm64/include/asm/kvm_emulate.h | 5 +++++ > > arch/arm64/include/asm/kvm_host.h | 8 ++++++++ > > include/uapi/linux/kvm.h | 7 +++++++ > > virt/kvm/arm/arm.c | 21 ++++++++++++++++++++ > > virt/kvm/arm/mmio.c | 11 +++++++++-- > > 9 files changed, 94 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > > index 2d067767b617..02501333f746 100644 > > --- a/Documentation/virt/kvm/api.txt > > +++ b/Documentation/virt/kvm/api.txt > > @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC > > event/message pages and to enable/disable SynIC messages/events processing > > in userspace. > > > > + /* KVM_EXIT_ARM_NISV */ > > + struct { > > + __u64 esr_iss; > > + __u64 fault_ipa; > > + } arm_nisv; > > + > > +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, > > +KVM will typically return to userspace and ask it to do MMIO emulation on its > > +behalf. However, for certain classes of instructions, no instruction decode > > +(direction, length of memory access) is provided, and fetching and decoding > > +the instruction from the VM is overly complicated to live in the kernel. > > + > > +Historically, when this situation occurred, KVM would print a warning and kill > > +the VM. KVM assumed that if the guest accessed non-memslot memory, it was > > +trying to do I/O, which just couldn't be emulated, and the warning message was > > +phrased accordingly. However, what happened more often was that a guest bug > > +caused access outside the guest memory areas which should lead to a more > > +mearningful warning message and an external abort in the guest, if the access > > meaningful? > > > +did not fall within an I/O window. > > + > > +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable > > +this capability at VM creation. Once this is done, these types of errors will > > +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from > > +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA > > +in the fault_ipa field. Userspace can either fix up the access if it's > > +actually an I/O access by decoding the instruction from guest memory (if it's > > +very brave) and continue executing the guest, or it can decide to suspend, > > +dump, or restart the guest. > > + > > /* Fix the size of the union. */ > > char padding[256]; > > }; > > diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h > > index 0125aa059d5b..ce61b3b0058d 100644 > > --- a/arch/arm/include/asm/kvm_arm.h > > +++ b/arch/arm/include/asm/kvm_arm.h > > @@ -162,6 +162,8 @@ > > #define HSR_ISV (_AC(1, UL) << HSR_ISV_SHIFT) > > #define HSR_SRT_SHIFT (16) > > #define HSR_SRT_MASK (0xf << HSR_SRT_SHIFT) > > +#define HSR_CM (1 << 8) > > +#define HSR_WNR (1 << 6) > > I think we already have that one (bizarrely placed after HSR_SSE). > Hey, at least the bit definitions were the same. > > #define HSR_FSC (0x3f) > > #define HSR_FSC_TYPE (0x3c) > > #define HSR_SSE (1 << 21) > > diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h > > index 40002416efec..e8ef349c04b4 100644 > > --- a/arch/arm/include/asm/kvm_emulate.h > > +++ b/arch/arm/include/asm/kvm_emulate.h > > @@ -167,6 +167,11 @@ static inline bool kvm_vcpu_dabt_isvalid(struct kvm_vcpu *vcpu) > > return kvm_vcpu_get_hsr(vcpu) & HSR_ISV; > > } > > > > +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) > > +{ > > + return kvm_vcpu_get_hsr(vcpu) & (HSR_CM | HSR_WNR | HSR_FSC); > > +} > > + > > static inline bool kvm_vcpu_dabt_iswrite(struct kvm_vcpu *vcpu) > > { > > return kvm_vcpu_get_hsr(vcpu) & HSR_WNR; > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > > index 8a37c8e89777..19a92c49039c 100644 > > --- a/arch/arm/include/asm/kvm_host.h > > +++ b/arch/arm/include/asm/kvm_host.h > > @@ -76,6 +76,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > > index d69c1efc63e7..a3c967988e1d 100644 > > --- a/arch/arm64/include/asm/kvm_emulate.h > > +++ b/arch/arm64/include/asm/kvm_emulate.h > > @@ -258,6 +258,11 @@ static inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu) > > return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_ISV); > > } > > > > +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) > > +{ > > + return kvm_vcpu_get_hsr(vcpu) & (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC); > > +} > > + > > static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu) > > { > > return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE); > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > index f656169db8c3..019bc560edc1 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -83,6 +83,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index 5e3f12d5359e..dd79235b6435 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > > #define KVM_EXIT_S390_STSI 25 > > #define KVM_EXIT_IOAPIC_EOI 26 > > #define KVM_EXIT_HYPERV 27 > > +#define KVM_EXIT_ARM_NISV 28 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -392,6 +393,11 @@ struct kvm_run { > > } eoi; > > /* KVM_EXIT_HYPERV */ > > struct kvm_hyperv_exit hyperv; > > + /* KVM_EXIT_ARM_NISV */ > > + struct { > > + __u64 esr_iss; > > + __u64 fault_ipa; > > + } arm_nisv; > > /* Fix the size of the union. */ > > char padding[256]; > > }; > > @@ -996,6 +1002,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > > #define KVM_CAP_PMU_EVENT_FILTER 173 > > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > > index 35a069815baf..7153504bb106 100644 > > --- a/virt/kvm/arm/arm.c > > +++ b/virt/kvm/arm/arm.c > > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > > return 0; > > } > > > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > > + struct kvm_enable_cap *cap) > > +{ > > + int r; > > + > > + if (cap->flags) > > + return -EINVAL; > > + > > + switch (cap->cap) { > > + case KVM_CAP_ARM_NISV_TO_USER: > > + r = 0; > > + kvm->arch.return_nisv_io_abort_to_user = true; > > + break; > > + default: > > + r = -EINVAL; > > + break; > > + } > > + > > + return r; > > +} > > > > /** > > * kvm_arch_init_vm - initializes a VM data structure > > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > > case KVM_CAP_MP_STATE: > > case KVM_CAP_IMMEDIATE_EXIT: > > case KVM_CAP_VCPU_EVENTS: > > + case KVM_CAP_ARM_NISV_TO_USER: > > r = 1; > > break; > > case KVM_CAP_ARM_SET_DEVICE_ADDR: > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > > index 6af5c91337f2..7b92e2744fa7 100644 > > --- a/virt/kvm/arm/mmio.c > > +++ b/virt/kvm/arm/mmio.c > > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > > if (ret) > > return ret; > > } else { > > - kvm_err("load/store instruction decoding not implemented\n"); > > - return -ENOSYS; > > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { > > + run->exit_reason = KVM_EXIT_ARM_NISV; > > + run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu); > > + run->arm_nisv.fault_ipa = fault_ipa; > > + return 0; > > + } else { > > + kvm_info("Encountered data abort outside memslots with no valid syndrome info\n"); > > Nit: you could lose the else clause, making this line a bit shorter. > > More importantly, I'd like this to become a rate-limited output, just to > keep things sane. How about kvm_pr_unimpl()? > Yes, I also got rid of 'Encountered' which doesn't seem to add much. > > + return -ENOSYS; > > + } > > } > > > > rt = vcpu->arch.mmio_decode.rt; > > > > Other than this collection of nits, this looks pretty good, and I'd > really like to take this patch in 5.5. > Thanks, Christoffer
On Tue, Oct 01, 2019 at 06:21:43PM +0100, James Morse wrote: > Hi Christoffer, > > On 09/09/2019 13:13, Christoffer Dall wrote: > > For a long time, if a guest accessed memory outside of a memslot using > > any of the load/store instructions in the architecture which doesn't > > supply decoding information in the ESR_EL2 (the ISV bit is not set), the > > kernel would print the following message and terminate the VM as a > > result of returning -ENOSYS to userspace: > > > > load/store instruction decoding not implemented > > > > The reason behind this message is that KVM assumes that all accesses > > outside a memslot is an MMIO access which should be handled by > > userspace, and we originally expected to eventually implement some sort > > of decoding of load/store instructions where the ISV bit was not set. > > > However, it turns out that many of the instructions which don't provide > > decoding information on abort are not safe to use for MMIO accesses, and > > the remaining few that would potentially make sense to use on MMIO > > accesses, such as those with register writeback, are not used in > > practice. It also turns out that fetching an instruction from guest > > memory can be a pretty horrible affair, involving stopping all CPUs on > > SMP systems, handling multiple corner cases of address translation in > > software, and more. It doesn't appear likely that we'll ever implement > > this in the kernel. > > > What is much more common is that a user has misconfigured his/her guest > > and is actually not accessing an MMIO region, but just hitting some > > random hole in the IPA space. In this scenario, the error message above > > is almost misleading and has led to a great deal of confusion over the > > years. > > > > It is, nevertheless, ABI to userspace, and we therefore need to > > introduce a new capability that userspace explicitly enables to change > > behavior. > > > > This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) > > which does exactly that, and introduces a new exit reason to report the > > event to userspace. User space can then emulate an exception to the > > guest, restart the guest, suspend the guest, or take any other > > appropriate action as per the policy of the running system. > > > > > diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt > > index 2d067767b617..02501333f746 100644 > > --- a/Documentation/virt/kvm/api.txt > > +++ b/Documentation/virt/kvm/api.txt > > @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC > > event/message pages and to enable/disable SynIC messages/events processing > > in userspace. > > > > + /* KVM_EXIT_ARM_NISV */ > > + struct { > > + __u64 esr_iss; > > + __u64 fault_ipa; > > + } arm_nisv; > > + > > +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, > > +KVM will typically return to userspace and ask it to do MMIO emulation on its > > +behalf. However, for certain classes of instructions, no instruction decode > > +(direction, length of memory access) is provided, and fetching and decoding > > +the instruction from the VM is overly complicated to live in the kernel. > > + > > +Historically, when this situation occurred, KVM would print a warning and kill > > +the VM. KVM assumed that if the guest accessed non-memslot memory, it was > > +trying to do I/O, which just couldn't be emulated, and the warning message was > > +phrased accordingly. However, what happened more often was that a guest bug > > +caused access outside the guest memory areas which should lead to a more > > +mearningful warning message and an external abort in the guest, if the access > > +did not fall within an I/O window. > > + > > +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable > > +this capability at VM creation. Once this is done, these types of errors will > > +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from > > +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA > > +in the fault_ipa field. Userspace can either fix up the access if it's > > +actually an I/O access by decoding the instruction from guest memory (if it's > > +very brave) and continue executing the guest, or it can decide to suspend, > > +dump, or restart the guest. > > Should we document which parts of instruction-emulation the VMM has to do? > > For KVM_EXIT_MMIO, kvm looks after updating registers and advancing the PC and SS state > machine. > > I can't see a kvm_skip_instr() in here, so the VMM has to do all of that stuff, including > any register post-increment, which is the reason we need the instruction in the first place. I'll clarify that in this case, KVM will not do anything, including not advanding the PC. As for what user space would actually have to do, I'd prefer not trying to put that in English here, because chances are that I'll be wrong. Thank, Christoffer
diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 2d067767b617..02501333f746 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC event/message pages and to enable/disable SynIC messages/events processing in userspace. + /* KVM_EXIT_ARM_NISV */ + struct { + __u64 esr_iss; + __u64 fault_ipa; + } arm_nisv; + +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, +KVM will typically return to userspace and ask it to do MMIO emulation on its +behalf. However, for certain classes of instructions, no instruction decode +(direction, length of memory access) is provided, and fetching and decoding +the instruction from the VM is overly complicated to live in the kernel. + +Historically, when this situation occurred, KVM would print a warning and kill +the VM. KVM assumed that if the guest accessed non-memslot memory, it was +trying to do I/O, which just couldn't be emulated, and the warning message was +phrased accordingly. However, what happened more often was that a guest bug +caused access outside the guest memory areas which should lead to a more +mearningful warning message and an external abort in the guest, if the access +did not fall within an I/O window. + +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable +this capability at VM creation. Once this is done, these types of errors will +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA +in the fault_ipa field. Userspace can either fix up the access if it's +actually an I/O access by decoding the instruction from guest memory (if it's +very brave) and continue executing the guest, or it can decide to suspend, +dump, or restart the guest. + /* Fix the size of the union. */ char padding[256]; }; diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 0125aa059d5b..ce61b3b0058d 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -162,6 +162,8 @@ #define HSR_ISV (_AC(1, UL) << HSR_ISV_SHIFT) #define HSR_SRT_SHIFT (16) #define HSR_SRT_MASK (0xf << HSR_SRT_SHIFT) +#define HSR_CM (1 << 8) +#define HSR_WNR (1 << 6) #define HSR_FSC (0x3f) #define HSR_FSC_TYPE (0x3c) #define HSR_SSE (1 << 21) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 40002416efec..e8ef349c04b4 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -167,6 +167,11 @@ static inline bool kvm_vcpu_dabt_isvalid(struct kvm_vcpu *vcpu) return kvm_vcpu_get_hsr(vcpu) & HSR_ISV; } +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) +{ + return kvm_vcpu_get_hsr(vcpu) & (HSR_CM | HSR_WNR | HSR_FSC); +} + static inline bool kvm_vcpu_dabt_iswrite(struct kvm_vcpu *vcpu) { return kvm_vcpu_get_hsr(vcpu) & HSR_WNR; diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 8a37c8e89777..19a92c49039c 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -76,6 +76,14 @@ struct kvm_arch { /* Mandated version of PSCI */ u32 psci_version; + + /* + * If we encounter a data abort without valid instruction syndrome + * information, report this to user space. User space can (and + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is + * supported. + */ + bool return_nisv_io_abort_to_user; }; #define KVM_NR_MEM_OBJS 40 diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index d69c1efc63e7..a3c967988e1d 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -258,6 +258,11 @@ static inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu) return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_ISV); } +static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu) +{ + return kvm_vcpu_get_hsr(vcpu) & (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC); +} + static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu) { return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE); diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f656169db8c3..019bc560edc1 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -83,6 +83,14 @@ struct kvm_arch { /* Mandated version of PSCI */ u32 psci_version; + + /* + * If we encounter a data abort without valid instruction syndrome + * information, report this to user space. User space can (and + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is + * supported. + */ + bool return_nisv_io_abort_to_user; }; #define KVM_NR_MEM_OBJS 40 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5e3f12d5359e..dd79235b6435 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { #define KVM_EXIT_S390_STSI 25 #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 +#define KVM_EXIT_ARM_NISV 28 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -392,6 +393,11 @@ struct kvm_run { } eoi; /* KVM_EXIT_HYPERV */ struct kvm_hyperv_exit hyperv; + /* KVM_EXIT_ARM_NISV */ + struct { + __u64 esr_iss; + __u64 fault_ipa; + } arm_nisv; /* Fix the size of the union. */ char padding[256]; }; @@ -996,6 +1002,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 #define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_ARM_NISV_TO_USER 174 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 35a069815baf..7153504bb106 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) return 0; } +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, + struct kvm_enable_cap *cap) +{ + int r; + + if (cap->flags) + return -EINVAL; + + switch (cap->cap) { + case KVM_CAP_ARM_NISV_TO_USER: + r = 0; + kvm->arch.return_nisv_io_abort_to_user = true; + break; + default: + r = -EINVAL; + break; + } + + return r; +} /** * kvm_arch_init_vm - initializes a VM data structure @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_MP_STATE: case KVM_CAP_IMMEDIATE_EXIT: case KVM_CAP_VCPU_EVENTS: + case KVM_CAP_ARM_NISV_TO_USER: r = 1; break; case KVM_CAP_ARM_SET_DEVICE_ADDR: diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c index 6af5c91337f2..7b92e2744fa7 100644 --- a/virt/kvm/arm/mmio.c +++ b/virt/kvm/arm/mmio.c @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, if (ret) return ret; } else { - kvm_err("load/store instruction decoding not implemented\n"); - return -ENOSYS; + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { + run->exit_reason = KVM_EXIT_ARM_NISV; + run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu); + run->arm_nisv.fault_ipa = fault_ipa; + return 0; + } else { + kvm_info("Encountered data abort outside memslots with no valid syndrome info\n"); + return -ENOSYS; + } } rt = vcpu->arch.mmio_decode.rt;
For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which doesn't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace: load/store instruction decoding not implemented The reason behind this message is that KVM assumes that all accesses outside a memslot is an MMIO access which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set. However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel. What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years. It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system. Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> --- Documentation/virt/kvm/api.txt | 29 ++++++++++++++++++++++++++++ arch/arm/include/asm/kvm_arm.h | 2 ++ arch/arm/include/asm/kvm_emulate.h | 5 +++++ arch/arm/include/asm/kvm_host.h | 8 ++++++++ arch/arm64/include/asm/kvm_emulate.h | 5 +++++ arch/arm64/include/asm/kvm_host.h | 8 ++++++++ include/uapi/linux/kvm.h | 7 +++++++ virt/kvm/arm/arm.c | 21 ++++++++++++++++++++ virt/kvm/arm/mmio.c | 11 +++++++++-- 9 files changed, 94 insertions(+), 2 deletions(-)