Message ID | 20181220222108.26181-1-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup | expand |
On 20/12/18 23:21, Sean Christopherson wrote: > ____kvm_handle_fault_on_reboot() provides a generic exception fixup > handler that is used to cleanly handle faults on VMX/SVM instructions > during reboot (or at least try to). If there isn't a reboot in > progress, ____kvm_handle_fault_on_reboot() treats any exception as > fatal to KVM and invokes kvm_spurious_fault(), which in turn generates > a BUG() to get a stack trace and die. > > When it was originally added by commit 4ecac3fd6dc2 ("KVM: Handle > virtualization instruction #UD faults during reboot"), the "call" to > kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value > is the RIP of the faulting instructing. > > The PUSH+JMP trickery is necessary because the exception fixup handler > code lies outside of its associated function, e.g. right after the > function. An actual CALL from the .fixup code would show a slightly > bogus stack trace, e.g. an extra "random" function would be inserted > into the trace, as the return RIP on the stack would point to no known > function (and the unwinder will likely try to guess who owns the RIP). > > Unfortunately, the JMP was replaced with a CALL when the macro was > reworked to not spin indefinitely during reboot (commit b7c4145ba2eb > "KVM: Don't spin on virt instruction faults during reboot"). This > causes the aforementioned behavior where a bogus function is inserted > into the stack trace, e.g. my builds like to blame free_kvm_area(). > > Revert the CALL back to a JMP. The changelog for commit b7c4145ba2eb > ("KVM: Don't spin on virt instruction faults during reboot") contains > nothing that indicates the switch to CALL was deliberate. This is > backed up by the fact that the PUSH <insn RIP> was left intact. > > Note that an alternative to the PUSH+JMP magic would be to JMP back > to the "real" code and CALL from there, but that would require adding > a JMP in the non-faulting path to avoid calling kvm_spurious_fault() > and would add no value, i.e. the stack trace would be the same. > > Using CALL: > > ------------[ cut here ]------------ > kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! > invalid opcode: 0000 [#1] SMP > CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] > Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 > RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 > R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 > FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 > Call Trace: > free_kvm_area+0x1044/0x43ea [kvm_intel] > ? vmx_vcpu_run+0x156/0x630 [kvm_intel] > ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] > ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] > ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] > ? __set_task_blocked+0x38/0x90 > ? __set_current_blocked+0x50/0x60 > ? __fpu__restore_sig+0x97/0x490 > ? do_vfs_ioctl+0xa1/0x620 > ? __x64_sys_futex+0x89/0x180 > ? ksys_ioctl+0x66/0x70 > ? __x64_sys_ioctl+0x16/0x20 > ? do_syscall_64+0x4f/0x100 > ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc > ---[ end trace 9775b14b123b1713 ]--- > > Using JMP: > > ------------[ cut here ]------------ > kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! > invalid opcode: 0000 [#1] SMP > CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] > Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 > RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 > R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 > FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 > Call Trace: > vmx_vcpu_run+0x156/0x630 [kvm_intel] > ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] > ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] > ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] > ? __set_task_blocked+0x38/0x90 > ? __set_current_blocked+0x50/0x60 > ? __fpu__restore_sig+0x97/0x490 > ? do_vfs_ioctl+0xa1/0x620 > ? __x64_sys_futex+0x89/0x180 > ? ksys_ioctl+0x66/0x70 > ? __x64_sys_ioctl+0x16/0x20 > ? do_syscall_64+0x4f/0x100 > ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc > ---[ end trace f9daedb85ab3ddba ]--- > > Fixes: b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index bca77c25a19a..311763be2e8a 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1493,7 +1493,7 @@ asmlinkage void kvm_spurious_fault(void); > "cmpb $0, kvm_rebooting \n\t" \ > "jne 668b \n\t" \ > __ASM_SIZE(push) " $666b \n\t" \ > - "call kvm_spurious_fault \n\t" \ > + "jmp kvm_spurious_fault \n\t" \ > ".popsection \n\t" \ > _ASM_EXTABLE(666b, 667b) > > Queued, thanks. Paolo
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index bca77c25a19a..311763be2e8a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1493,7 +1493,7 @@ asmlinkage void kvm_spurious_fault(void); "cmpb $0, kvm_rebooting \n\t" \ "jne 668b \n\t" \ __ASM_SIZE(push) " $666b \n\t" \ - "call kvm_spurious_fault \n\t" \ + "jmp kvm_spurious_fault \n\t" \ ".popsection \n\t" \ _ASM_EXTABLE(666b, 667b)
____kvm_handle_fault_on_reboot() provides a generic exception fixup handler that is used to cleanly handle faults on VMX/SVM instructions during reboot (or at least try to). If there isn't a reboot in progress, ____kvm_handle_fault_on_reboot() treats any exception as fatal to KVM and invokes kvm_spurious_fault(), which in turn generates a BUG() to get a stack trace and die. When it was originally added by commit 4ecac3fd6dc2 ("KVM: Handle virtualization instruction #UD faults during reboot"), the "call" to kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value is the RIP of the faulting instructing. The PUSH+JMP trickery is necessary because the exception fixup handler code lies outside of its associated function, e.g. right after the function. An actual CALL from the .fixup code would show a slightly bogus stack trace, e.g. an extra "random" function would be inserted into the trace, as the return RIP on the stack would point to no known function (and the unwinder will likely try to guess who owns the RIP). Unfortunately, the JMP was replaced with a CALL when the macro was reworked to not spin indefinitely during reboot (commit b7c4145ba2eb "KVM: Don't spin on virt instruction faults during reboot"). This causes the aforementioned behavior where a bogus function is inserted into the stack trace, e.g. my builds like to blame free_kvm_area(). Revert the CALL back to a JMP. The changelog for commit b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") contains nothing that indicates the switch to CALL was deliberate. This is backed up by the fact that the PUSH <insn RIP> was left intact. Note that an alternative to the PUSH+JMP magic would be to JMP back to the "real" code and CALL from there, but that would require adding a JMP in the non-faulting path to avoid calling kvm_spurious_fault() and would add no value, i.e. the stack trace would be the same. Using CALL: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 Call Trace: free_kvm_area+0x1044/0x43ea [kvm_intel] ? vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace 9775b14b123b1713 ]--- Using JMP: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 Call Trace: vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace f9daedb85ab3ddba ]--- Fixes: b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)