[0/2] KVM: VMX: Clean up IRQ/NMI handling

Message ID 20200914195634.12881-1-sean.j.christopherson@intel.com

Message

Sean Christopherson Sept. 14, 2020, 7:56 p.m. UTC
Minor (if there is such a thing for this code) cleanup of KVM's handling
of IRQ and NMI exits: move the invocation of the IRQ handler to a
standalone assembly routine, then consolidate the NMI handling to use the
same indirect-call approach instead of INTn.

The IRQ cleanup was suggested by Josh Poimboeuf in the context of a false
positive objtool warning[*].  I believe Josh intended to use UNWIND hints
instead of trickery to avoid objtool complaints.  I opted for trickery in
the form of a redundant, but explicit, restoration of RSP after the hidden
IRET.  AFAICT, there are no existing UNWIND hints that would let objtool
know that the stack is magically being restored, and adding a new hint to
save a single MOV <reg>, <reg> instruction seemed like overkill.

The NMI consolidation was loosely suggested by Andi Kleen.  Andi's actual
suggestion was to export and directly call the NMI handler, but that's a
more involved change (unless I'm misunderstanding the wants of the NMI
handler), whereas piggybacking the IRQ code is simple and seems like a
worthwhile intermediate step.

[*] https://lkml.kernel.org/r/20200908205947.arryy75c5cvldps7@treble
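
A rough sketch of the intended end state (names here are illustrative, not
taken verbatim from the patches): both exit paths funnel through a single
assembly helper via an indirect call,

	/* Sketch only; "entry" is gate_offset(desc) for IRQs, or the
	 * address of the kernel's NMI handler for NMIs. */
	kvm_before_interrupt(vcpu);
	vmx_do_interrupt_nmi_irqoff(entry);	/* CALL_NOSPEC, no INTn */
	kvm_after_interrupt(vcpu);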

Sean Christopherson (2):
  KVM: VMX: Move IRQ invocation to assembly subroutine
  KVM: VMX: Invoke NMI handler via indirect call instead of INTn

 arch/x86/kvm/vmx/vmenter.S | 28 +++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 61 +++++++++++---------------------------
 2 files changed, 45 insertions(+), 44 deletions(-)

Comments

Uros Bizjak Sept. 14, 2020, 8:37 p.m. UTC | #1
On Mon, Sep 14, 2020 at 9:56 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Move the asm blob that invokes the appropriate IRQ handler after VM-Exit
> into a proper subroutine.  Slightly rework the blob so that it plays
> nice with objtool without any additional hints (existing hints aren't
> able to handle returning with a seemingly modified stack size).
>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmenter.S | 28 ++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c     | 33 +++------------------------------
>  2 files changed, 31 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 799db084a336..baec1e0fefc5 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -4,6 +4,7 @@
>  #include <asm/bitsperlong.h>
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/nospec-branch.h>
> +#include <asm/segment.h>
>
>  #define WORD_SIZE (BITS_PER_LONG / 8)
>
> @@ -294,3 +295,30 @@ SYM_FUNC_START(vmread_error_trampoline)
>
>         ret
>  SYM_FUNC_END(vmread_error_trampoline)
> +
> +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> +       /*
> +        * Unconditionally create a stack frame.  RSP needs to be aligned for
> +        * x86-64, getting the correct RSP on the stack (for x86-64) would take
> +        * two instructions anyways, and it helps make objtool happy (see below).
> +        */
> +       push %_ASM_BP
> +       mov %rsp, %_ASM_BP

_ASM_SP instead of %rsp to avoid assembly failure for 32bit targets.
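
With that fix, the frame setup would read (sketch; _ASM_SP is defined in
arch/x86/include/asm/asm.h and expands to %esp or %rsp as appropriate):

	push %_ASM_BP
	mov  %_ASM_SP, %_ASM_BP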

> +
> +#ifdef CONFIG_X86_64
> +       push $__KERNEL_DS
> +       push %_ASM_BP
> +#endif
> +       pushf
> +       push $__KERNEL_CS
> +       CALL_NOSPEC _ASM_ARG1
> +
> +       /*
> +        * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> +        * the correct value.  objtool doesn't know the target will IRET and so
> +        * thinks the stack is getting walloped (without the explicit restore).
> +        */
> +       mov %_ASM_BP, %rsp
> +       pop %_ASM_BP
> +       ret
> +SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 46ba2e03a892..391f079d9136 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6409,6 +6409,8 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>         memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
>  }
>
> +void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
> +
>  static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  {
>         u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
> @@ -6430,10 +6432,6 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>  static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  {
>         unsigned int vector;
> -       unsigned long entry;
> -#ifdef CONFIG_X86_64
> -       unsigned long tmp;
> -#endif
>         gate_desc *desc;
>         u32 intr_info = vmx_get_intr_info(vcpu);
>
> @@ -6443,36 +6441,11 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>
>         vector = intr_info & INTR_INFO_VECTOR_MASK;
>         desc = (gate_desc *)host_idt_base + vector;
> -       entry = gate_offset(desc);

I'd leave the above line...

>
>         kvm_before_interrupt(vcpu);
> -
> -       asm volatile(
> -#ifdef CONFIG_X86_64
> -               "mov %%rsp, %[sp]\n\t"
> -               "and $-16, %%rsp\n\t"
> -               "push %[ss]\n\t"
> -               "push %[sp]\n\t"
> -#endif
> -               "pushf\n\t"
> -               "push %[cs]\n\t"
> -               CALL_NOSPEC
> -               :
> -#ifdef CONFIG_X86_64
> -               [sp]"=&r"(tmp),
> -#endif
> -               ASM_CALL_CONSTRAINT
> -               :
> -               [thunk_target]"r"(entry),
> -#ifdef CONFIG_X86_64
> -               [ss]"i"(__KERNEL_DS),
> -#endif
> -               [cs]"i"(__KERNEL_CS)
> -       );
> -
> +       vmx_do_interrupt_nmi_irqoff(gate_offset(desc));

... to make the above line read as:

vmx_do_interrupt_nmi_irqoff(entry);

This way, it looks more descriptive to me.

Uros.

>         kvm_after_interrupt(vcpu);
>  }
> -STACK_FRAME_NON_STANDARD(handle_external_interrupt_irqoff);
>
>  static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
>  {
> --
> 2.28.0
>
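
Taken together, the suggested call site would read roughly like this
(sketch of the proposed end state, not quoted from a patch):

	entry = gate_offset(desc);

	kvm_before_interrupt(vcpu);
	vmx_do_interrupt_nmi_irqoff(entry);
	kvm_after_interrupt(vcpu);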
Josh Poimboeuf Sept. 14, 2020, 8:40 p.m. UTC | #2
On Mon, Sep 14, 2020 at 12:56:33PM -0700, Sean Christopherson wrote:
> Move the asm blob that invokes the appropriate IRQ handler after VM-Exit
> into a proper subroutine.  Slightly rework the blob so that it plays
> nice with objtool without any additional hints (existing hints aren't
> able to handle returning with a seemingly modified stack size).
> 
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmenter.S | 28 ++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c     | 33 +++------------------------------
>  2 files changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 799db084a336..baec1e0fefc5 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -4,6 +4,7 @@
>  #include <asm/bitsperlong.h>
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/nospec-branch.h>
> +#include <asm/segment.h>
>  
>  #define WORD_SIZE (BITS_PER_LONG / 8)
>  
> @@ -294,3 +295,30 @@ SYM_FUNC_START(vmread_error_trampoline)
>  
>  	ret
>  SYM_FUNC_END(vmread_error_trampoline)
> +
> +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> +	/*
> +	 * Unconditionally create a stack frame.  RSP needs to be aligned for
> +	 * x86-64, getting the correct RSP on the stack (for x86-64) would take
> +	 * two instructions anyways, and it helps make objtool happy (see below).
> +	 */
> +	push %_ASM_BP
> +	mov %rsp, %_ASM_BP

RSP needs to be aligned to what?  How would this align the stack, other
than by accident?

> +
> +#ifdef CONFIG_X86_64
> +	push $__KERNEL_DS
> +	push %_ASM_BP
> +#endif
> +	pushf
> +	push $__KERNEL_CS
> +	CALL_NOSPEC _ASM_ARG1
> +
> +	/*
> +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> +	 * the correct value.  objtool doesn't know the target will IRET and so
> +	 * thinks the stack is getting walloped (without the explicit restore).
> +	 */
> +	mov %_ASM_BP, %rsp
> +	pop %_ASM_BP
> +	ret

BTW, there *is* actually an unwind hint for this situation:
UNWIND_HINT_RET_OFFSET.

So you might be able to do something like the following (depending on
what your alignment requirements actually are):

SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
#ifdef CONFIG_X86_64
	push $__KERNEL_DS
	push %_ASM_BP
#endif
	pushf
	push $__KERNEL_CS
	CALL_NOSPEC _ASM_ARG1

	/* The call popped the pushes */
	UNWIND_HINT_RET_OFFSET sp_offset=32

	ret
SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
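
(For reference, sp_offset=32 matches the four 8-byte pushes that the
handler's IRET consumes on x86-64: $__KERNEL_DS + saved RSP + RFLAGS from
pushf + $__KERNEL_CS = 4 * 8 = 32 bytes.)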
Sean Christopherson Sept. 14, 2020, 9:08 p.m. UTC | #3
On Mon, Sep 14, 2020 at 10:37:25PM +0200, Uros Bizjak wrote:
> On Mon, Sep 14, 2020 at 9:56 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > [...]
> > +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> > +       /*
> > +        * Unconditionally create a stack frame.  RSP needs to be aligned for
> > +        * x86-64, getting the correct RSP on the stack (for x86-64) would take
> > +        * two instructions anyways, and it helps make objtool happy (see below).
> > +        */
> > +       push %_ASM_BP
> > +       mov %rsp, %_ASM_BP
> 
> _ASM_SP instead of %rsp to avoid assembly failure for 32bit targets.

*sigh*  Thanks!  I'll build i386 this time...
Uros Bizjak Sept. 14, 2020, 9:21 p.m. UTC | #4
On Mon, Sep 14, 2020 at 11:07 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Sep 14, 2020 at 03:40:24PM -0500, Josh Poimboeuf wrote:
> > On Mon, Sep 14, 2020 at 12:56:33PM -0700, Sean Christopherson wrote:
> > > [...]
> > > +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> > > +   /*
> > > +    * Unconditionally create a stack frame.  RSP needs to be aligned for
> > > +    * x86-64, getting the correct RSP on the stack (for x86-64) would take
> > > +    * two instructions anyways, and it helps make objtool happy (see below).
> > > +    */
> > > +   push %_ASM_BP
> > > +   mov %rsp, %_ASM_BP
> >
> > RSP needs to be aligned to what?  How would this align the stack, other
> > than by accident?
>
> Ah, yeah, that's lacking info.
>
> 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> When not changing stack, the CPU aligns RSP before pushing the frame.
>
> The above shenanigans work because the x86-64 ABI also requires RSP to be
> 16-byte aligned prior to CALL.  RSP is thus 8-byte aligned due to CALL
> pushing the return IP, and so creating the stack frame by pushing RBP makes
> it 16-byte aligned again.

IIRC, the kernel violates x86_64 ABI and aligns RSP to 8 bytes prior
to CALL. Please note -mpreferred-stack-boundary=3 in the compile
flags.

Uros.
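
For context, the flag Uros refers to lives in arch/x86/Makefile; for 64-bit
builds of this era it reads roughly as follows (paraphrased, exact comment
wording may differ):

	# Align the stack to the register width instead of the ABI-mandated
	# 16 bytes, saving stack space and alignment instructions.
	KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)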
Uros Bizjak Sept. 14, 2020, 9:31 p.m. UTC | #5
On Mon, Sep 14, 2020 at 11:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 11:07 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Sep 14, 2020 at 03:40:24PM -0500, Josh Poimboeuf wrote:
> > > On Mon, Sep 14, 2020 at 12:56:33PM -0700, Sean Christopherson wrote:
> > > > [...]
> > > > +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> > > > +   /*
> > > > +    * Unconditionally create a stack frame.  RSP needs to be aligned for
> > > > +    * x86-64, getting the correct RSP on the stack (for x86-64) would take
> > > > +    * two instructions anyways, and it helps make objtool happy (see below).
> > > > +    */
> > > > +   push %_ASM_BP
> > > > +   mov %rsp, %_ASM_BP
> > >
> > > RSP needs to be aligned to what?  How would this align the stack, other
> > > than by accident?
> >
> > Ah, yeah, that's lacking info.
> >
> > 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> > When not changing stack, the CPU aligns RSP before pushing the frame.
> >
> > The above shenanigans work because the x86-64 ABI also requires RSP to be
> > 16-byte aligned prior to CALL.  RSP is thus 8-byte aligned due to CALL
> > pushing the return IP, and so creating the stack frame by pushing RBP makes
> > it 16-byte aligned again.
>
> IIRC, the kernel violates x86_64 ABI and aligns RSP to 8 bytes prior
> to CALL. Please note -mpreferred-stack-boundary=3 in the compile
> flags.

+       push %_ASM_BP
+       mov %_ASM_SP, %_ASM_BP
+
+#ifdef CONFIG_X86_64
+       and $-16, %rsp
+       push $__KERNEL_DS
+       push %rbp
+#endif
+       pushf
+       push $__KERNEL_CS
+       CALL_NOSPEC _ASM_ARG1
...
+       mov %_ASM_BP, %_ASM_SP
+       pop %_ASM_BP
+       ret

should work.

Uros.
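
(The "and $-16, %rsp" works because -16 in two's complement is all ones
except the low four bits, so it rounds RSP down to a 16-byte boundary,
e.g. 0x...ff38 & -16 = 0x...ff30; the unaligned value saved in RBP is what
gets restored afterwards.)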
Josh Poimboeuf Sept. 14, 2020, 9:38 p.m. UTC | #6
On Mon, Sep 14, 2020 at 02:07:19PM -0700, Sean Christopherson wrote:
> > RSP needs to be aligned to what?  How would this align the stack, other
> > than by accident?
> 
> Ah, yeah, that's lacking info.
> 
> 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> When not changing stack, the CPU aligns RSP before pushing the frame.
> 
> The above shenanigans work because the x86-64 ABI also requires RSP to be
> 16-byte aligned prior to CALL.  RSP is thus 8-byte aligned due to CALL
> pushing the return IP, and so creating the stack frame by pushing RBP makes
> it 16-byte aligned again.

As Uros mentioned, the kernel doesn't do this.

> > > +
> > > +#ifdef CONFIG_X86_64
> > > +	push $__KERNEL_DS
> > > +	push %_ASM_BP
> > > +#endif
> > > +	pushf
> > > +	push $__KERNEL_CS
> > > +	CALL_NOSPEC _ASM_ARG1
> > > +
> > > +	/*
> > > +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> > > +	 * the correct value.  objtool doesn't know the target will IRET and so
> > > +	 * thinks the stack is getting walloped (without the explicit restore).
> > > +	 */
> > > +	mov %_ASM_BP, %rsp
> > > +	pop %_ASM_BP
> > > +	ret
> > 
> > BTW, there *is* actually an unwind hint for this situation:
> > UNWIND_HINT_RET_OFFSET.
> 
> I played with that one, but for the life of me couldn't figure out how to
> satisfy both the "stack size" and "cfa.offset" checks.  In the code below,
> cfa.offset will be 8, stack_size will be 40 and initial_func_cfi.cfa.offset
> will be 8.  But rereading this, I assume I missed something that would allow
> manipulating cfa.offset?  Or maybe I botched my debugging?
> 
> static bool has_modified_stack_frame(struct instruction *insn, struct insn_state *state)
> {
> 	...
> 
>         if (cfi->cfa.offset != initial_func_cfi.cfa.offset + ret_offset)
>                 return true;
> 
>         if (cfi->stack_size != initial_func_cfi.cfa.offset + ret_offset)
>                 return true;
> 
> 	...
> }

It only works without the frame pointer, in which case stack size and
cfa.offset will be the same (see below code).  With the frame pointer,
it probably wouldn't work.

But if you're going to be aligning the stack in the next patch version,
your frame pointer approach works better anyway, because the stack size
will be variable depending on the stack alignment of the callee.  So
forget I said anything :-)

> > So you might be able to do something like the following (depending on
> > what your alignment requirements actually are):
> > 
> > SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> > #ifdef CONFIG_X86_64
> > 	push $__KERNEL_DS
> > 	push %_ASM_BP
> > #endif
> > 	pushf
> > 	push $__KERNEL_CS
> > 	CALL_NOSPEC _ASM_ARG1
> > 
> > 	/* The call popped the pushes */
> > 	UNWIND_HINT_RET_OFFSET sp_offset=32
> > 
> > 	ret
> > SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
Sean Christopherson Sept. 14, 2020, 9:54 p.m. UTC | #7
On Mon, Sep 14, 2020 at 04:38:13PM -0500, Josh Poimboeuf wrote:
> On Mon, Sep 14, 2020 at 02:07:19PM -0700, Sean Christopherson wrote:
> > > RSP needs to be aligned to what?  How would this align the stack, other
> > > than by accident?
> > 
> > Ah, yeah, that's lacking info.
> > 
> > 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> > When not changing stack, the CPU aligns RSP before pushing the frame.
> > 
> > The above shenanigans work because the x86-64 ABI also requires RSP to be
> > 16-byte aligned prior to CALL.  RSP is thus 8-byte aligned due to CALL
> > pushing the return IP, and so creating the stack frame by pushing RBP makes
> > it 16-byte aliagned again.
> 
> As Uros mentioned, the kernel doesn't do this.

Argh, apparently I just got lucky with my compiles then.  I added explicit
checks on RSP being properly aligned and thought that confirmed the kernel
played nice.  Bummer.
Sean Christopherson Sept. 14, 2020, 9:55 p.m. UTC | #8
On Mon, Sep 14, 2020 at 11:31:26PM +0200, Uros Bizjak wrote:
> On Mon, Sep 14, 2020 at 11:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Sep 14, 2020 at 11:07 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Mon, Sep 14, 2020 at 03:40:24PM -0500, Josh Poimboeuf wrote:
> > > > On Mon, Sep 14, 2020 at 12:56:33PM -0700, Sean Christopherson wrote:
> > > > > [...]
> > > > > +SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
> > > > > +   /*
> > > > > +    * Unconditionally create a stack frame.  RSP needs to be aligned for
> > > > > +    * x86-64, getting the correct RSP on the stack (for x86-64) would take
> > > > > +    * two instructions anyways, and it helps make objtool happy (see below).
> > > > > +    */
> > > > > +   push %_ASM_BP
> > > > > +   mov %rsp, %_ASM_BP
> > > >
> > > > RSP needs to be aligned to what?  How would this align the stack, other
> > > > than by accident?
> > >
> > > Ah, yeah, that's lacking info.
> > >
> > > 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> > > When not changing stack, the CPU aligns RSP before pushing the frame.
> > >
> > > The above shenanigans work because the x86-64 ABI also requires RSP to be
> > > 16-byte aligned prior to CALL.  RSP is thus 8-byte aligned due to CALL
> > > pushing the return IP, and so creating the stack frame by pushing RBP makes
> > > it 16-byte aligned again.
> >
> > IIRC, the kernel violates x86_64 ABI and aligns RSP to 8 bytes prior
> > to CALL. Please note -mpreferred-stack-boundary=3 in the compile
> > flags.
> 
> +       push %_ASM_BP
> +       mov %_ASM_SP, %_ASM_BP
> +
> +#ifdef CONFIG_X86_64
> +       and $-16, %rsp
> +       push $__KERNEL_DS
> +       push %rbp
> +#endif
> +       pushf
> +       push $__KERNEL_CS
> +       CALL_NOSPEC _ASM_ARG1
> ...
> +       mov %_ASM_BP, %_ASM_SP
> +       pop %_ASM_BP
> +       ret
> 
> should work.

Yar, I thought I was being super clever to avoid the AND :-/
Andi Kleen Sept. 15, 2020, 2:42 a.m. UTC | #9
On Mon, Sep 14, 2020 at 02:07:19PM -0700, Sean Christopherson wrote:
> 16-byte aligned to correctly mimic CPU behavior when vectoring an IRQ/NMI.
> When not changing stack, the CPU aligns RSP before pushing the frame.

16 byte alignment is not needed for the internal kernel ABI because it doesn't use
SSE.

-Andi
