
[2/2] KVM: x86/mmu: Bail out kvm_tdp_map_page() when VM dead

Message ID 20250217085731.19733-1-yan.y.zhao@intel.com (mailing list archive)
State New
Series two KVM MMU fixes for TDX

Commit Message

Yan Zhao Feb. 17, 2025, 8:57 a.m. UTC
Bail out of the loop in kvm_tdp_map_page() when the VM is dead. Otherwise,
kvm_tdp_map_page() may loop in the kernel indefinitely when there's only one
vCPU in the VM (or when the other vCPUs are not executing ioctls), even
after a fatal error has occurred.

kvm_tdp_map_page() is called by the ioctl KVM_PRE_FAULT_MEMORY or the TDX
ioctl KVM_TDX_INIT_MEM_REGION. It loops in the kernel whenever RET_PF_RETRY
is returned. In the TDP MMU, kvm_tdp_mmu_map() always returns RET_PF_RETRY,
regardless of the specific error code from tdp_mmu_set_spte_atomic(),
tdp_mmu_link_sp(), or tdp_mmu_split_huge_page(). While this is acceptable
in the general case, where the only possible error code from these functions
is -EBUSY, TDX introduces an additional error code, -EIO, for SEAMCALL
errors.

Since -EIO is likewise fatal, check for a dead VM in kvm_tdp_map_page() to
avoid retrying needlessly until a signal becomes pending.

The error -EIO is uncommon and has not been observed in real workloads.
Currently, it is only hypothetically triggered by bypassing the real
SEAMCALL and faking an error in the SEAMCALL wrapper.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Sean Christopherson Feb. 18, 2025, 4:03 p.m. UTC | #1
On Mon, Feb 17, 2025, Yan Zhao wrote:
> [...]
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 08ed5092c15a..3a8d735939b5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4700,6 +4700,10 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
>  	do {
>  		if (signal_pending(current))
>  			return -EINTR;
> +
> +		if (vcpu->kvm->vm_dead)

This needs to be READ_ONCE().  Along those lines, I think I'd prefer

		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
			return -EIO;

or

		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 
			return -EIO;

so that if more terminal requests come along, we can bundle everything into a
single check via a selective version of kvm_request_pending().
Yan Zhao Feb. 19, 2025, 2:17 a.m. UTC | #2
On Tue, Feb 18, 2025 at 08:03:57AM -0800, Sean Christopherson wrote:
> On Mon, Feb 17, 2025, Yan Zhao wrote:
> > [...]
> > +		if (vcpu->kvm->vm_dead)
> 
> This needs to be READ_ONCE().  Along those lines, I think I'd prefer
Indeed.

> 
> 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
> 			return -EIO;
> 
> or
> 
> 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 
> 			return -EIO;
Hmm, what's the difference between the two cases?
Paste error?

> so that if more terminal requests come along, we can bundle everything into a
> single check via a selective version of kvm_request_pending().
Makes sense!
I'll update it to
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 
 			return -EIO;
in v2.
Sean Christopherson Feb. 19, 2025, 2:18 p.m. UTC | #3
On Wed, Feb 19, 2025, Yan Zhao wrote:
> On Tue, Feb 18, 2025 at 08:03:57AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 17, 2025, Yan Zhao wrote:
> > > [...]
> > > +		if (vcpu->kvm->vm_dead)
> > 
> > This needs to be READ_ONCE().  Along those lines, I think I'd prefer
> Indeed.
> 
> > 
> > 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
> > 			return -EIO;
> > 
> > or
> > 
> > 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 
> > 			return -EIO;
> Hmm, what's the difference between the two cases?
> Paste error?

Hrm, yes.  I already forgot what I was thinking, but I believe the second one was
supposed to be:

		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
			return -EIO;

The "check" version should be fine though, i.e. clearing the request is ok,
because kvm_vcpu_ioctl() will see vcpu->kvm->vm_dead before handling KVM_RUN or
any other ioctl.
Yan Zhao Feb. 20, 2025, 1:50 a.m. UTC | #4
On Wed, Feb 19, 2025 at 06:18:41AM -0800, Sean Christopherson wrote:
> On Wed, Feb 19, 2025, Yan Zhao wrote:
> > On Tue, Feb 18, 2025 at 08:03:57AM -0800, Sean Christopherson wrote:
> > > On Mon, Feb 17, 2025, Yan Zhao wrote:
> > > > [...]
> > > > +		if (vcpu->kvm->vm_dead)
> > > 
> > > This needs to be READ_ONCE().  Along those lines, I think I'd prefer
> > Indeed.
> > 
> > > 
> > > 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
> > > 			return -EIO;
> > > 
> > > or
> > > 
> > > 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 
> > > 			return -EIO;
> > Hmm, what's the difference between the two cases?
> > Paste error?
> 
> Hrm, yes.  I already forgot what I was thinking, but I believe the second one was
> supposed to be:
> 
> 		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
> 			return -EIO;
> 
> The "check" version should be fine though, i.e. clearing the request is ok,
> because kvm_vcpu_ioctl() will see vcpu->kvm->vm_dead before handling KVM_RUN or
> any other ioctl.
Got it!

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 08ed5092c15a..3a8d735939b5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4700,6 +4700,10 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
 	do {
 		if (signal_pending(current))
 			return -EINTR;
+
+		if (vcpu->kvm->vm_dead)
+			return -EIO;
+
 		cond_resched();
 		r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
 	} while (r == RET_PF_RETRY);