diff mbox series

KVM: x86: Reset RSP before exiting to userspace when emulating POPA

Message ID 20240724044529.3837492-1-tao1.su@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Reset RSP before exiting to userspace when emulating POPA | expand

Commit Message

Tao Su July 24, 2024, 4:45 a.m. UTC
When emulating POPA and exiting to userspace for MMIO, reset modified RSP
as emulation context may not be reset. POPA may generate more multiple
reads, i.e. multiple POPs from the stack, but if stack points to MMIO,
KVM needs to emulate multiple MMIO reads.

When one MMIO done, POPA may be re-emulated with EMULTYPE_NO_DECODE set,
i.e. ctxt will not be reset, but RSP is modified by previous emulation of
current POPA instruction, which eventually leads to emulation error.

The commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if
emulator detects exception") provides a detailed analysis of how KVM
emulates multiple MMIO reads, and its correctness can be verified in the
POPA instruction with this patch.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
---
For testing, we can add POPA to the emulator case in kvm-unit-test.
---
 arch/x86/kvm/emulate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


base-commit: 786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48

Comments

Sean Christopherson July 24, 2024, 3:33 p.m. UTC | #1
On Wed, Jul 24, 2024, Tao Su wrote:
> When emulating POPA and exiting to userspace for MMIO, reset modified RSP
> as emulation context may not be reset. POPA may generate more multiple
> reads, i.e. multiple POPs from the stack, but if stack points to MMIO,
> KVM needs to emulate multiple MMIO reads.
> 
> When one MMIO done, POPA may be re-emulated with EMULTYPE_NO_DECODE set,
> i.e. ctxt will not be reset, but RSP is modified by previous emulation of
> current POPA instruction, which eventually leads to emulation error.
> 
> The commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if
> emulator detects exception") provides a detailed analysis of how KVM
> emulates multiple MMIO reads, and its correctness can be verified in the
> POPA instruction with this patch.

I don't see how this can work.  If POPA is reading from MMIO, it will need to
do 8 distinct emulated MMIO accesses.  Unwinding to the original RSP will allow
the first MMIO (store to EDI) to succeed, but then the second MMIO (store to ESI)
will exit back to userspace.  And the second restart will load EDI with the
result of the MMIO, not ESI.  It will also re-trigger the second MMIO indefinitely.

To make this work, KVM would need to allow precisely resuming execution where
it left off.  We can't use MMIO fragments, because unlike MMIO accesses that
split pages, each memory load is an individual access.

I don't see any reason to try to make this work.  It's a ridiculously convoluted
scenario that, AFAIK, has no real world application.

> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> ---
> For testing, we can add POPA to the emulator case in kvm-unit-test.
> ---
>  arch/x86/kvm/emulate.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index e72aed25d721..3746fef6ca60 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1999,6 +1999,7 @@ static int em_pushf(struct x86_emulate_ctxt *ctxt)
>  
>  static int em_popa(struct x86_emulate_ctxt *ctxt)
>  {
> +	unsigned long old_esp = reg_read(ctxt, VCPU_REGS_RSP);
>  	int rc = X86EMUL_CONTINUE;
>  	int reg = VCPU_REGS_RDI;
>  	u32 val = 0;
> @@ -2010,8 +2011,11 @@ static int em_popa(struct x86_emulate_ctxt *ctxt)
>  		}
>  
>  		rc = emulate_pop(ctxt, &val, ctxt->op_bytes);
> -		if (rc != X86EMUL_CONTINUE)
> +		if (rc != X86EMUL_CONTINUE) {
> +			assign_register(reg_rmw(ctxt, VCPU_REGS_RSP),
> +					old_esp, ctxt->op_bytes);
>  			break;
> +		}
>  		assign_register(reg_rmw(ctxt, reg), val, ctxt->op_bytes);
>  		--reg;
>  	}
> 
> base-commit: 786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48
> -- 
> 2.34.1
>
Tao Su July 25, 2024, 2:32 a.m. UTC | #2
On Wed, Jul 24, 2024 at 08:33:48AM -0700, Sean Christopherson wrote:
> On Wed, Jul 24, 2024, Tao Su wrote:
> > When emulating POPA and exiting to userspace for MMIO, reset modified RSP
> > as emulation context may not be reset. POPA may generate more multiple
> > reads, i.e. multiple POPs from the stack, but if stack points to MMIO,
> > KVM needs to emulate multiple MMIO reads.
> > 
> > When one MMIO done, POPA may be re-emulated with EMULTYPE_NO_DECODE set,
> > i.e. ctxt will not be reset, but RSP is modified by previous emulation of
> > current POPA instruction, which eventually leads to emulation error.
> > 
> > The commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if
> > emulator detects exception") provides a detailed analysis of how KVM
> > emulates multiple MMIO reads, and its correctness can be verified in the
> > POPA instruction with this patch.
> 
> I don't see how this can work.  If POPA is reading from MMIO, it will need to
> do 8 distinct emulated MMIO accesses.  Unwinding to the original RSP will allow
> the first MMIO (store to EDI) to succeed, but then the second MMIO (store to ESI)
> will exit back to userspace.  And the second restart will load EDI with the
> result of the MMIO, not ESI.  It will also re-trigger the second MMIO indefinitely.
> 

This can work like commit 0dc902267cb3 description. Since there is a buffer
(ctxt->mem_read), KVM can safely restart the instruction from the beginning.

After the first MMIO (store to EDI) done, vcpu->mmio_read_completed is set and
POPA is re-emulated with EMULTYPE_NO_DECODE. When re-emulating, KVM will try
first MMIO (store to EDI) _again_, but KVM will not exit to userspace in
emulator_read_write() because vcpu->mmio_read_completed is set and then adjust
mc->end in read_emulated().

For the second MMIO (store to ESI), it will exit to userspace and re-emulate.
When re-emulating, KVM can directly read EDI from the buffer (mc->data) and try
second MMIO (store to ESI) _again_, but KVM will not exit to userspace in
emulator_read_write() because vcpu->mmio_read_completed is set and then adjust
mc->end in read_emulated().

For the third MMIO (store to EBP) ...

> To make this work, KVM would need to allow precisely resuming execution where
> it left off.  We can't use MMIO fragments, because unlike MMIO accesses that
> split pages, each memory load is an individual access.
> 

Since all MMIO reads are saved to the buffer (mc->data) and the index (mc->end)
is adjusted by every re-emulation, KVM already supports multiple MMIO reads.

> I don't see any reason to try to make this work.  It's a ridiculously convoluted
> scenario that, AFAIK, has no real world application.
> 

I agree POPA+MMIO is rare but supporting it to be same as hardware is cheap. We
can also use POPA to validate the multiple MMIO reads which has a complex flow (
actually, that's why I tried POPA in kvm-unit-test and then find this issue).

> > Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> > ---
> > For testing, we can add POPA to the emulator case in kvm-unit-test.
> > ---
> >  arch/x86/kvm/emulate.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index e72aed25d721..3746fef6ca60 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -1999,6 +1999,7 @@ static int em_pushf(struct x86_emulate_ctxt *ctxt)
> >  
> >  static int em_popa(struct x86_emulate_ctxt *ctxt)
> >  {
> > +	unsigned long old_esp = reg_read(ctxt, VCPU_REGS_RSP);
> >  	int rc = X86EMUL_CONTINUE;
> >  	int reg = VCPU_REGS_RDI;
> >  	u32 val = 0;
> > @@ -2010,8 +2011,11 @@ static int em_popa(struct x86_emulate_ctxt *ctxt)
> >  		}
> >  
> >  		rc = emulate_pop(ctxt, &val, ctxt->op_bytes);
> > -		if (rc != X86EMUL_CONTINUE)
> > +		if (rc != X86EMUL_CONTINUE) {
> > +			assign_register(reg_rmw(ctxt, VCPU_REGS_RSP),
> > +					old_esp, ctxt->op_bytes);
> >  			break;
> > +		}
> >  		assign_register(reg_rmw(ctxt, reg), val, ctxt->op_bytes);
> >  		--reg;
> >  	}
> > 
> > base-commit: 786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48
> > -- 
> > 2.34.1
> >
diff mbox series

Patch

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e72aed25d721..3746fef6ca60 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1999,6 +1999,7 @@  static int em_pushf(struct x86_emulate_ctxt *ctxt)
 
 static int em_popa(struct x86_emulate_ctxt *ctxt)
 {
+	unsigned long old_esp = reg_read(ctxt, VCPU_REGS_RSP);
 	int rc = X86EMUL_CONTINUE;
 	int reg = VCPU_REGS_RDI;
 	u32 val = 0;
@@ -2010,8 +2011,11 @@  static int em_popa(struct x86_emulate_ctxt *ctxt)
 		}
 
 		rc = emulate_pop(ctxt, &val, ctxt->op_bytes);
-		if (rc != X86EMUL_CONTINUE)
+		if (rc != X86EMUL_CONTINUE) {
+			assign_register(reg_rmw(ctxt, VCPU_REGS_RSP),
+					old_esp, ctxt->op_bytes);
 			break;
+		}
 		assign_register(reg_rmw(ctxt, reg), val, ctxt->op_bytes);
 		--reg;
 	}