Message ID | 20240508132502.184428-1-julian.stecklina@cyberus-technology.de (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: add KVM_RUN_X86_GUEST_MODE kvm_run flag |
Hey Sean,

does this patch go in the right direction?

Julian

On Wed, 2024-05-08 at 15:25 +0200, Julian Stecklina wrote:
> From: Thomas Prescher <thomas.prescher@cyberus-technology.de>
> 
> When a vCPU is interrupted by a signal while running a nested guest,
> KVM will exit to userspace with L2 state. However, userspace has no
> way to know whether it sees L1 or L2 state (besides calling
> KVM_GET_STATS_FD, which does not have a stable ABI).
> 
> This causes multiple problems:
> 
> The simplest one is L2 state corruption when userspace marks the sregs
> as dirty. See this mailing list thread [1] for a complete discussion.
> 
> Another problem is that if userspace decides to continue by emulating
> instructions, it will unknowingly emulate with L2 state as if L1
> doesn't exist, which can be considered a weird guest escape.
> 
> This patch introduces a new flag KVM_RUN_X86_GUEST_MODE in the kvm_run
> data structure, which is set when the vCPU exited while running a
> nested guest. Userspace can then handle this situation.
> 
> To see whether this functionality is available, this patch also
> introduces a new capability KVM_CAP_X86_GUEST_MODE.
> 
> [1]
> https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@cyberus-technology.de/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55
> 
> Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
> Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
> ---
>  Documentation/virt/kvm/api.rst  | 17 +++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/x86.c              |  3 +++
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b5a33ee71ee..7748c3eb98e0 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6419,6 +6419,9 @@ affect the device's behavior. Current defined flags::
>    #define KVM_RUN_X86_SMM (1 << 0)
>    /* x86, set if bus lock detected in VM */
>    #define KVM_RUN_BUS_LOCK (1 << 1)
> +  /* x86, set if the VCPU exited from a nested (L2) guest */
> +  #define KVM_RUN_X86_GUEST_MODE (1 << 2)
> +
>    /* arm64, set for KVM_EXIT_DEBUG */
>    #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0)
>  
> @@ -8063,6 +8066,20 @@ error/annotated fault.
>  
>  See KVM_EXIT_MEMORY_FAULT for more information.
>  
> +7.34 KVM_CAP_X86_GUEST_MODE
> +------------------------------
> +
> +:Architectures: x86
> +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> +
> +The presence of this capability indicates that KVM_RUN will update the
> +KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
> +vCPU was executing nested guest code when it exited.
> +
> +KVM exits with the register state of either the L1 or L2 guest
> +depending on which executed at the time of an exit. Userspace must
> +take care to differentiate between these cases.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index ef11aa4cab42..ff4ed82a2d06 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_ioapic_state {
>  
>  #define KVM_RUN_X86_SMM (1 << 0)
>  #define KVM_RUN_X86_BUS_LOCK (1 << 1)
> +#define KVM_RUN_X86_GUEST_MODE (1 << 2)
>  
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 91478b769af0..64f2cba9345e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4714,6 +4714,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
>  	case KVM_CAP_IRQFD_RESAMPLE:
>  	case KVM_CAP_MEMORY_FAULT_INFO:
> +	case KVM_CAP_X86_GUEST_MODE:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> @@ -10200,6 +10201,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
>  
>  	if (is_smm(vcpu))
>  		kvm_run->flags |= KVM_RUN_X86_SMM;
> +	if (is_guest_mode(vcpu))
> +		kvm_run->flags |= KVM_RUN_X86_GUEST_MODE;
>  }
>  
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..ccb12f6a656d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_MEMORY_ATTRIBUTES 233
>  #define KVM_CAP_GUEST_MEMFD 234
>  #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_GUEST_MODE 236
>  
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
On Wed, May 15, 2024, Julian Stecklina wrote:
> Hey Sean,
> 
> does this patch go in the right direction?

At a glance, yes. We're in a "quiet period" until 6.10-rc1, so it'll be a few
weeks before I take a closer look at this (or really anything that's destined
for 6.11 or later).
On Wed, 08 May 2024 15:25:01 +0200, Julian Stecklina wrote:
> When a vCPU is interrupted by a signal while running a nested guest,
> KVM will exit to userspace with L2 state. However, userspace has no
> way to know whether it sees L1 or L2 state (besides calling
> KVM_GET_STATS_FD, which does not have a stable ABI).
> 
> This causes multiple problems:
> 
> [...]

Applied to kvm-x86 misc. Note, the capability got number 237, as 236 was
claimed by KVM_CAP_X86_APIC_BUS_CYCLES_NS. The number might also change again,
e.g. if a different arch adds a capability and x86 loses the race.

Thanks!

[1/1] KVM: x86: add KVM_RUN_X86_GUEST_MODE kvm_run flag
      https://github.com/kvm-x86/linux/commit/85542adb65ec

--
https://github.com/kvm-x86/linux/tree/next
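Since the capability number was still in flux when the patch was applied, userspace would presumably want to rely on the symbolic name from the installed headers rather than a literal value. A minimal sketch of such a probe follows; `kvm_has_x86_guest_mode()` is a hypothetical helper name, and `kvm_fd` is assumed to be a file descriptor obtained by opening /dev/kvm (or a VM fd). It uses only the existing KVM_CHECK_EXTENSION ioctl.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns true if the running kernel reports
 * KVM_CAP_X86_GUEST_MODE. kvm_fd is assumed to be an open /dev/kvm
 * (or VM) file descriptor.
 */
static bool kvm_has_x86_guest_mode(int kvm_fd)
{
#ifdef KVM_CAP_X86_GUEST_MODE
	/* Use the symbolic name; the numeric value is whatever the headers say. */
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MODE) > 0;
#else
	/* Headers predate the capability: treat it as unavailable. */
	(void)kvm_fd;
	return false;
#endif
}
```

A failed ioctl (for example on an invalid descriptor) returns -1, which this sketch also treats as "not available".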
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..7748c3eb98e0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6419,6 +6419,9 @@ affect the device's behavior. Current defined flags::
   #define KVM_RUN_X86_SMM (1 << 0)
   /* x86, set if bus lock detected in VM */
   #define KVM_RUN_BUS_LOCK (1 << 1)
+  /* x86, set if the VCPU exited from a nested (L2) guest */
+  #define KVM_RUN_X86_GUEST_MODE (1 << 2)
+
   /* arm64, set for KVM_EXIT_DEBUG */
   #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0)
 
@@ -8063,6 +8066,20 @@ error/annotated fault.
 
 See KVM_EXIT_MEMORY_FAULT for more information.
 
+7.34 KVM_CAP_X86_GUEST_MODE
+------------------------------
+
+:Architectures: x86
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that KVM_RUN will update the
+KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
+vCPU was executing nested guest code when it exited.
+
+KVM exits with the register state of either the L1 or L2 guest
+depending on which executed at the time of an exit. Userspace must
+take care to differentiate between these cases.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index ef11aa4cab42..ff4ed82a2d06 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_ioapic_state {
 
 #define KVM_RUN_X86_SMM (1 << 0)
 #define KVM_RUN_X86_BUS_LOCK (1 << 1)
+#define KVM_RUN_X86_GUEST_MODE (1 << 2)
 
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af0..64f2cba9345e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4714,6 +4714,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_MEMORY_FAULT_INFO:
+	case KVM_CAP_X86_GUEST_MODE:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
@@ -10200,6 +10201,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
 
 	if (is_smm(vcpu))
 		kvm_run->flags |= KVM_RUN_X86_SMM;
+	if (is_guest_mode(vcpu))
+		kvm_run->flags |= KVM_RUN_X86_GUEST_MODE;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..ccb12f6a656d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_X86_GUEST_MODE 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
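
As a rough illustration of how userspace might consume the new flag after KVM_RUN returns, the sketch below classifies the visible register state as L1 or L2. It is deliberately self-contained: `struct run_snapshot`, `enum state_owner`, `vcpu_state_owner()` and `may_touch_l1_state()` are hypothetical names, and the flag values are copied from the diff rather than taken from <asm/kvm.h>. Real code would read `flags` from the mmap'ed `struct kvm_run` and probe KVM_CAP_X86_GUEST_MODE first. Treating "capability absent" as "unknown" is a conservative assumption of this sketch, not something the patch mandates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the patch; guarded in case <asm/kvm.h> is also included. */
#ifndef KVM_RUN_X86_SMM
#define KVM_RUN_X86_SMM        (1 << 0)
#endif
#ifndef KVM_RUN_X86_GUEST_MODE
#define KVM_RUN_X86_GUEST_MODE (1 << 2)
#endif

/* Hypothetical stand-in for the pieces of state this sketch needs. */
struct run_snapshot {
	uint16_t flags;       /* copy of kvm_run.flags after KVM_RUN returned */
	bool cap_guest_mode;  /* whether KVM_CAP_X86_GUEST_MODE was reported */
};

enum state_owner { STATE_UNKNOWN, STATE_L1, STATE_L2 };

/* Decide whose register state userspace is looking at. */
static enum state_owner vcpu_state_owner(const struct run_snapshot *run)
{
	/* Without the capability the flag bit carries no information. */
	if (!run->cap_guest_mode)
		return STATE_UNKNOWN;
	return (run->flags & KVM_RUN_X86_GUEST_MODE) ? STATE_L2 : STATE_L1;
}

/*
 * Only modify sregs or emulate instructions on behalf of L1 when the
 * state is known to belong to L1.
 */
static bool may_touch_l1_state(const struct run_snapshot *run)
{
	return vcpu_state_owner(run) == STATE_L1;
}
```

With this shape, a VMM could skip marking sregs dirty or emulating instructions whenever `may_touch_l1_state()` returns false, which addresses the two problems named in the commit message.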