From patchwork Thu Apr 21 07:29:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 12821204 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8226C433EF for ; Thu, 21 Apr 2022 07:25:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1385794AbiDUH1r (ORCPT ); Thu, 21 Apr 2022 03:27:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385757AbiDUH1m (ORCPT ); Thu, 21 Apr 2022 03:27:42 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF5DC10BB; Thu, 21 Apr 2022 00:24:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650525892; x=1682061892; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=OC1bR1aHL/6hCtPOV+9KHIXOV6Xbb4YnTDCVqyxUZe0=; b=HzWoAJ3vqVTrBnCdTyH7BqXrOa+UBnTjAWQWyM6rfh4Xd1FlArM2JotV mNypdf+MVQvlAlAuBl6LP5NQiAyGdHtSj5T05NAip/l2MxTSYe4yB+CQk 6oQmTJjFdO7lzrWDbCAMhBKD+72hbq01PwleVyublwymWu5Bh77SeZ16N 91dK9eHYBi51vEooPix9ONhLB149oRdjXNgbCfkP38yV+JwDLvDUweWy3 wF0LxHab4ghLK0xlEzwOZHnj5FpY+Yv+FKxQPlRCAc8BW4I6Nz/lcSsRi f3h/ya9wjd+ELhLoMKqpijdJO6qSbt7GAYySfqvyGC/XOFTaqUBykLzg5 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10323"; a="324709287" X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="324709287" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:52 -0700 X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="727860208" Received: from chenyi-pc.sh.intel.com ([10.239.159.73]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:49 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Chenyi Qiang Subject: [PATCH v6 1/3] KVM: X86: Save&restore the triple fault request Date: Thu, 21 Apr 2022 15:29:56 +0800 Message-Id: <20220421072958.16375-2-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220421072958.16375-1-chenyi.qiang@intel.com> References: <20220421072958.16375-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org For the triple fault sythesized by KVM, e.g. the RSM path or nested_vmx_abort(), if KVM exits to userspace before the request is serviced, userspace could migrate the VM and lose the triple fault. Add the support to save and restore the triple fault event from userspace. Introduce a new event KVM_VCPUEVENT_VALID_TRIPLE_FAULT in get/set_vcpu_events to track the triple fault request. Note that in the set_vcpu_events path, userspace is able to set/clear the triple fault request through triple_fault_pending field. Signed-off-by: Chenyi Qiang --- Documentation/virt/kvm/api.rst | 7 +++++++ arch/x86/include/uapi/asm/kvm.h | 4 +++- arch/x86/kvm/x86.c | 15 +++++++++++++-- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 72183ae628f7..e09ce3cb49c5 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1150,6 +1150,9 @@ The following bits are defined in the flags field: fields contain a valid state. This bit will be set whenever KVM_CAP_EXCEPTION_PAYLOAD is enabled. +- KVM_VCPUEVENT_VALID_TRIPLE_FAULT may be set to signal that the + triple_fault_pending field contains a valid state. + ARM64: ^^^^^^ @@ -1245,6 +1248,10 @@ can be set in the flags field to signal that the exception_has_payload, exception_payload, and exception.pending fields contain a valid state and shall be written into the VCPU. +KVM_VCPUEVENT_VALID_TRIPLE_FAULT can be set in flags field to signal that +the triple_fault_pending field contains a valid state and shall be written +into the VCPU. + ARM64: ^^^^^^ diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 21614807a2cb..fd083f6337af 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -325,6 +325,7 @@ struct kvm_reinject_control { #define KVM_VCPUEVENT_VALID_SHADOW 0x00000004 #define KVM_VCPUEVENT_VALID_SMM 0x00000008 #define KVM_VCPUEVENT_VALID_PAYLOAD 0x00000010 +#define KVM_VCPUEVENT_VALID_TRIPLE_FAULT 0x00000020 /* Interrupt shadow states */ #define KVM_X86_SHADOW_INT_MOV_SS 0x01 @@ -359,7 +360,8 @@ struct kvm_vcpu_events { __u8 smm_inside_nmi; __u8 latched_init; } smi; - __u8 reserved[27]; + __u8 triple_fault_pending; + __u8 reserved[26]; __u8 exception_has_payload; __u64 exception_payload; }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ab336f7c82e4..c8b9b0bc42aa 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4911,9 +4911,12 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, !!(vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK); events->smi.latched_init = kvm_lapic_latched_init(vcpu); + events->triple_fault_pending = kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu); + events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SHADOW - | KVM_VCPUEVENT_VALID_SMM); + | KVM_VCPUEVENT_VALID_SMM + | KVM_VCPUEVENT_VALID_TRIPLE_FAULT); if (vcpu->kvm->arch.exception_payload_enabled) events->flags |= KVM_VCPUEVENT_VALID_PAYLOAD; @@ -4929,7 +4932,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, | KVM_VCPUEVENT_VALID_SIPI_VECTOR | KVM_VCPUEVENT_VALID_SHADOW | KVM_VCPUEVENT_VALID_SMM - | KVM_VCPUEVENT_VALID_PAYLOAD)) + | KVM_VCPUEVENT_VALID_PAYLOAD + | KVM_VCPUEVENT_VALID_TRIPLE_FAULT)) return -EINVAL; if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) { @@ -5002,6 +5006,13 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, } } + if (events->flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) { + if (events->triple_fault_pending) + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + else + kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu); + } + kvm_make_request(KVM_REQ_EVENT, vcpu); return 0; From patchwork Thu Apr 21 07:29:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 12821205 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67DA0C433F5 for ; Thu, 21 Apr 2022 07:25:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1385749AbiDUH1z (ORCPT ); Thu, 21 Apr 2022 03:27:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385775AbiDUH1p (ORCPT ); Thu, 21 Apr 2022 03:27:45 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAE4AFF9; Thu, 21 Apr 2022 00:24:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650525895; x=1682061895; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=tK/Q9eBMuCYib5IXMrOq3jJ+HGgt3T5qhzfnCTx8v+Y=; b=Yzp7kM+YA/h+KuGi7YqGdlMW1fuxPBFCuQnZUkQMb1vax4ei51JzsCkQ lCCPeLySudGW1g4Mpwssl/hM8cbNLZBKCBPkAyzOGatI9xrkySsMSLQKs u9fW4a1MNskaUa+gEMLEV1hAcWCGQIk68SDYnaVOSc3Sp4ArwHgFsV9uY vKcP7bgBW88TrnEMb3s3iEm7dVoL9BmyUDH05W5RhS1tEwDXltNYjxvFr m+OTspfdaryJev2zacRNSdc0Ki5Xfvj2U5VB8w3M5M/zs5KTfVgAYWX5K nYgrRextT5cxSd3qgnMdnX0ICs/hG0Dw1s25logB6lNCw0tdcGxmpNvHN Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10323"; a="324709296" X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="324709296" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:55 -0700 X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="727860213" Received: from chenyi-pc.sh.intel.com ([10.239.159.73]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:52 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Chenyi Qiang Subject: [PATCH v6 2/3] KVM: selftests: Add a test to get/set triple fault event Date: Thu, 21 Apr 2022 15:29:57 +0800 Message-Id: <20220421072958.16375-3-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220421072958.16375-1-chenyi.qiang@intel.com> References: <20220421072958.16375-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a selftest to get/set triple fault event by: - exiting to userspace via I/O from L2; - using KVM_GET_VCPU_EVENTS to verify the TRIPEL_FAULT field is valid in flags. - using KVM_SET_VCPU_EVENTS to inject a triple fault event so that L1 can see the appropriate exit. Suggested-by: Sean Christopherson Signed-off-by: Chenyi Qiang --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../kvm/x86_64/triple_fault_event_test.c | 96 +++++++++++++++++++ 3 files changed, 98 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/triple_fault_event_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 56140068b763..989fd4672f61 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -55,6 +55,7 @@ /x86_64/xen_vmcall_test /x86_64/xss_msr_test /x86_64/vmx_pmu_msrs_test +/x86_64/triple_fault_event_test /access_tracking_perf_test /demand_paging_test /dirty_log_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index af582d168621..ce147b50b9be 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -88,6 +88,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests TEST_GEN_PROGS_x86_64 += x86_64/amx_test +TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test TEST_GEN_PROGS_x86_64 += access_tracking_perf_test TEST_GEN_PROGS_x86_64 += demand_paging_test TEST_GEN_PROGS_x86_64 += dirty_log_test diff --git a/tools/testing/selftests/kvm/x86_64/triple_fault_event_test.c b/tools/testing/selftests/kvm/x86_64/triple_fault_event_test.c new file mode 100644 index 000000000000..20bf8f9a61d7 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/triple_fault_event_test.c @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "vmx.h" + +#include +#include + +#include "kselftest.h" + +#ifndef __x86_64__ +# error This test is 64-bit only +#endif + +#define VCPU_ID 0 +#define ARBITRARY_IO_PORT 0x2000 + +/* The virtual machine object. */ +static struct kvm_vm *vm; + +static void l2_guest_code(void) +{ + /* + * Generate an exit to L0 userspace, i.e. main(), via I/O to an + * arbitrary port. + */ + asm volatile("inb %%dx, %%al" + : : [port] "d" (ARBITRARY_IO_PORT) : "rax"); +} + +void l1_guest_code(struct vmx_pages *vmx) +{ +#define L2_GUEST_STACK_SIZE 64 + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; + + GUEST_ASSERT(vmx->vmcs_gpa); + GUEST_ASSERT(prepare_for_vmx_operation(vmx)); + GUEST_ASSERT(load_vmcs(vmx)); + + prepare_vmcs(vmx, l2_guest_code, + &l2_guest_stack[L2_GUEST_STACK_SIZE]); + + GUEST_ASSERT(!vmlaunch()); + /* L2 should triple fault after a triple fault event injected. */ + GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_TRIPLE_FAULT); + GUEST_DONE(); +} + +int main(void) +{ + struct kvm_run *run; + struct kvm_vcpu_events events; + vm_vaddr_t vmx_pages_gva; + struct ucall uc; + + if (!nested_vmx_supported()) { + print_skip("Nested VMX not supported"); + exit(KSFT_SKIP); + } + + vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code); + + vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); + + vcpu_run(vm, VCPU_ID); + + run = vcpu_state(vm, VCPU_ID); + + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Expected KVM_EXIT_IO, got: %u (%s)\n", + run->exit_reason, exit_reason_str(run->exit_reason)); + + TEST_ASSERT(run->io.port == ARBITRARY_IO_PORT, + "Expected IN from port %d from L2, got port %d", + ARBITRARY_IO_PORT, run->io.port); + + vcpu_events_get(vm, VCPU_ID, &events); + TEST_ASSERT(events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT, + "Triple fault event invalid"); + events.triple_fault_pending = true; + vcpu_events_set(vm, VCPU_ID, &events); + + vcpu_run(vm, VCPU_ID); + + switch (get_ucall(vm, VCPU_ID, &uc)) { + case UCALL_DONE: + break; + case UCALL_ABORT: + TEST_FAIL("%s", (const char *)uc.args[0]); + default: + TEST_FAIL("Unexpected ucall: %lu", uc.cmd); + } + +} From patchwork Thu Apr 21 07:29:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 12821206 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 146B1C433F5 for ; Thu, 21 Apr 2022 07:25:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1385760AbiDUH2G (ORCPT ); Thu, 21 Apr 2022 03:28:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385810AbiDUH1s (ORCPT ); Thu, 21 Apr 2022 03:27:48 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A98EB4E; Thu, 21 Apr 2022 00:24:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650525898; x=1682061898; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=7PBvYTiJT6dm3nnyDvDcbNm0dn7iUjIBH3j0kt5kuJc=; b=EGzlTKOjHRQoFdx+2nxEqn/8F0VY2ikX543dpSP+Acx3j+Y8Qzre+svE T8VLMXdy2ngHKlVw1w1N3i8vwYRZSYot7Z2c4KKn0fdUxSDT8GRjKDe8Q TwwmFQJmN3W/BcWlkKXkITqCHnX9C4uS/2Ohx+LVikP8TyLMVgHj3IyXc DCd0sKzyeI50rgydv2kVw2O3nyQNiS3NsSE6DjLUzIBJ4zNLEa9lhQ7wE IyLEXB/nWVAmN6JZV93U2LCZwZm4Icu+eVIjWejdCQdzsKRioXcFj1E79 ScuKYplEjc6KDWKPseYMVLpszOoVa6RzwBm+Zwhe3VnOJx5ByfAN4ZgwT w==; X-IronPort-AV: E=McAfee;i="6400,9594,10323"; a="324709298" X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="324709298" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:58 -0700 X-IronPort-AV: E=Sophos;i="5.90,278,1643702400"; d="scan'208";a="727860218" Received: from chenyi-pc.sh.intel.com ([10.239.159.73]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2022 00:24:55 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Chenyi Qiang Subject: [PATCH v6 3/3] KVM: VMX: Enable Notify VM exit Date: Thu, 21 Apr 2022 15:29:58 +0800 Message-Id: <20220421072958.16375-4-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220421072958.16375-1-chenyi.qiang@intel.com> References: <20220421072958.16375-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Tao Xu There are cases that malicious virtual machines can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. VMM can enable notify VM exit that a VM exit generated if no event window occurs in VM non-root mode for a specified amount of time (notify window). Feature enabling: - The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust the expected notify window. - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space can query and enable this feature in per-VM scope. The argument is a 64bit value: bits 63:32 are used for notify window, and bits 31:0 are for flags. Current supported flags: - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify window provided. - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen. - It's safe to even set notify window to zero since an internal hardware threshold is added to vmcs.notify_window. VM exit handling: - Introduce a vcpu state notify_window_exits to records the count of notify VM exits and expose it through the debugfs. - Notify VM exit can happen incident to delivery of a vector event. Allow it in KVM. - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID bit is set. Nested handling - Nested notify VM exits are not supported yet. Keep the same notify window control in vmcs02 as vmcs01, so that L1 can't escape the restriction of notify VM exits through launching L2 VM. - When L2 VM is context invalid and user space should synthesize a shutdown event to a vcpu. KVM makes KVM_REQ_TRIPLE_FAULT request accordingly and it would synthesize a nested triple fault exit to L1 hypervisor to kill L2. Notify VM exit is defined in latest Intel Architecture Instruction Set Extensions Programming Reference, chapter 9.2. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Tao Xu Co-developed-by: Chenyi Qiang Signed-off-by: Chenyi Qiang Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/api.rst | 48 ++++++++++++++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 9 ++++++ arch/x86/include/asm/vmx.h | 7 +++++ arch/x86/include/asm/vmxfeatures.h | 1 + arch/x86/include/uapi/asm/vmx.h | 4 ++- arch/x86/kvm/vmx/capabilities.h | 6 ++++ arch/x86/kvm/vmx/nested.c | 8 +++++ arch/x86/kvm/vmx/vmx.c | 48 ++++++++++++++++++++++++++++-- arch/x86/kvm/x86.c | 18 ++++++++++- arch/x86/kvm/x86.h | 5 ++++ include/uapi/linux/kvm.h | 10 +++++++ 11 files changed, 159 insertions(+), 5 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index e09ce3cb49c5..821ca979234d 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6321,6 +6321,26 @@ array field represents return values. The userspace should update the return values of SBI call before resuming the VCPU. For more details on RISC-V SBI spec refer, https://github.com/riscv/riscv-sbi-doc. +:: + + /* KVM_EXIT_NOTIFY */ + struct { + #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) + __u32 flags; + } notify; + +Used on x86 systems. When the VM capability KVM_CAP_X86_NOTIFY_VMEXIT is +enabled, a VM exit generated if no event window occurs in VM non-root mode +for a specified amount of time. Once KVM_X86_NOTIFY_VMEXIT_USER is set when +enabling the cap, it would exit to userspace with the exit reason +KVM_EXIT_NOTIFY for further handling. The "flags" field contains more +detailed info. + +The valid value for 'data' is: + + - KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid + in VMCS. It would run into unknown result if resume the target VM. + :: /* Fix the size of the union. */ @@ -7266,6 +7286,34 @@ The valid bits in cap.args[0] are: generate a #UD within the guest. =================================== ============================================ +7.31 KVM_CAP_X86_NOTIFY_VMEXIT +------------------------------ + +:Architectures: x86 +:Target: VM +:Parameters: args[0] is the value of notify window as well as some flags +:Returns: 0 on success, -EINVAL if hardware doesn't support notify VM exit. + +Bits 63:32 of args[0] are used for notify window. +Bits 31:0 of args[0] are for some flags. Valid bits are:: + + #define KVM_X86_NOTIFY_VMEXIT_ENABLED (1 << 0) + #define KVM_X86_NOTIFY_VMEXIT_USER (1 << 1) + +This capability allows userspace to configure the notify VM exit on/off +in per-VM scope during VM creation. Notify VM exit is disabled by default. +When userspace sets KVM_X86_NOTIFY_VMEXIT_ENABLED bit in args[0], VMM would +enable this feature with the notify window provided, which will generate +a VM exit if no event window occurs in VM non-root mode for a specified of +time (notify window). + +If KVM_X86_NOTIFY_VMEXIT_USER is set in args[0], upon notify VM exits happen, +KVM would exit to userspace for handling. + +This capability is aimed to mitigate the threat that malicious VMs can +cause CPU stuck (due to event windows don't open up) and make the CPU +unavailable to host or other VMs. + 8. Other capabilities. ====================== diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 2c20f715f009..28a42658eec5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -65,6 +65,9 @@ #define KVM_BUS_LOCK_DETECTION_VALID_MODE (KVM_BUS_LOCK_DETECTION_OFF | \ KVM_BUS_LOCK_DETECTION_EXIT) +#define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \ + KVM_X86_NOTIFY_VMEXIT_USER) + /* x86-specific vcpu->requests bit members */ #define KVM_REQ_MIGRATE_TIMER KVM_ARCH_REQ(0) #define KVM_REQ_REPORT_TPR_ACCESS KVM_ARCH_REQ(1) @@ -1157,6 +1160,9 @@ struct kvm_arch { bool bus_lock_detection_enabled; bool enable_pmu; + + u32 notify_window; + u32 notify_vmexit_flags; /* * If exit_on_emulation_error is set, and the in-kernel instruction * emulator fails to emulate an instruction, allow userspace @@ -1293,6 +1299,7 @@ struct kvm_vcpu_stat { u64 directed_yield_attempted; u64 directed_yield_successful; u64 guest_mode; + u64 notify_window_exits; }; struct x86_instruction_info; @@ -1504,6 +1511,8 @@ struct kvm_x86_ops { * Returns vCPU specific APICv inhibit reasons */ unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu); + + bool has_notify_vmexit; }; struct kvm_x86_nested_ops { diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 6c343c6a1855..2aa266007873 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -74,6 +74,7 @@ #define SECONDARY_EXEC_TSC_SCALING VMCS_CONTROL_BIT(TSC_SCALING) #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE VMCS_CONTROL_BIT(USR_WAIT_PAUSE) #define SECONDARY_EXEC_BUS_LOCK_DETECTION VMCS_CONTROL_BIT(BUS_LOCK_DETECTION) +#define SECONDARY_EXEC_NOTIFY_VM_EXITING VMCS_CONTROL_BIT(NOTIFY_VM_EXITING) #define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING) #define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING) @@ -269,6 +270,7 @@ enum vmcs_field { SECONDARY_VM_EXEC_CONTROL = 0x0000401e, PLE_GAP = 0x00004020, PLE_WINDOW = 0x00004022, + NOTIFY_WINDOW = 0x00004024, VM_INSTRUCTION_ERROR = 0x00004400, VM_EXIT_REASON = 0x00004402, VM_EXIT_INTR_INFO = 0x00004404, @@ -553,6 +555,11 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) +/* + * Exit Qualifications for NOTIFY VM EXIT + */ +#define NOTIFY_VM_CONTEXT_INVALID BIT(0) + /* * VM-instruction error numbers */ diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h index d9a74681a77d..15f0f2ab4f95 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -84,5 +84,6 @@ #define VMX_FEATURE_USR_WAIT_PAUSE ( 2*32+ 26) /* Enable TPAUSE, UMONITOR, UMWAIT in guest */ #define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */ #define VMX_FEATURE_BUS_LOCK_DETECTION ( 2*32+ 30) /* "" VM-Exit when bus lock caused */ +#define VMX_FEATURE_NOTIFY_VM_EXITING ( 2*32+ 31) /* VM-Exit when no event windows after notify window */ #endif /* _ASM_X86_VMXFEATURES_H */ diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index 946d761adbd3..a5faf6d88f1b 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -91,6 +91,7 @@ #define EXIT_REASON_UMWAIT 67 #define EXIT_REASON_TPAUSE 68 #define EXIT_REASON_BUS_LOCK 74 +#define EXIT_REASON_NOTIFY 75 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -153,7 +154,8 @@ { EXIT_REASON_XRSTORS, "XRSTORS" }, \ { EXIT_REASON_UMWAIT, "UMWAIT" }, \ { EXIT_REASON_TPAUSE, "TPAUSE" }, \ - { EXIT_REASON_BUS_LOCK, "BUS_LOCK" } + { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \ + { EXIT_REASON_NOTIFY, "NOTIFY" } #define VMX_EXIT_REASON_FLAGS \ { VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" } diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 3f430e218375..0102a6e8a194 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -417,4 +417,10 @@ static inline u64 vmx_supported_debugctl(void) return debugctl; } +static inline bool cpu_has_notify_vmexit(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_NOTIFY_VM_EXITING; +} + #endif /* __KVM_X86_VMX_CAPS_H */ diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index a6688663da4d..5f74aae96031 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2133,6 +2133,8 @@ static u64 nested_vmx_calc_efer(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx) { + struct kvm *kvm = vmx->vcpu.kvm; + /* * If vmcs02 hasn't been initialized, set the constant vmcs02 state * according to L0's settings (vmcs12 is irrelevant here). Host @@ -2175,6 +2177,9 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx) if (cpu_has_vmx_encls_vmexit()) vmcs_write64(ENCLS_EXITING_BITMAP, INVALID_GPA); + if (kvm_notify_vmexit_enabled(kvm)) + vmcs_write32(NOTIFY_WINDOW, kvm->arch.notify_window); + /* * Set the MSR load/store lists to match L0's settings. Only the * addresses are constant (for vmcs02), the counts can change based @@ -6107,6 +6112,9 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu, SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE); case EXIT_REASON_ENCLS: return nested_vmx_exit_handled_encls(vcpu, vmcs12); + case EXIT_REASON_NOTIFY: + /* Notify VM exit is not exposed to L1 */ + return false; default: return true; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cf8581978bce..4ecfca00ff9e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2472,7 +2472,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX | SECONDARY_EXEC_ENABLE_VMFUNC | - SECONDARY_EXEC_BUS_LOCK_DETECTION; + SECONDARY_EXEC_BUS_LOCK_DETECTION | + SECONDARY_EXEC_NOTIFY_VM_EXITING; if (cpu_has_sgx()) opt2 |= SECONDARY_EXEC_ENCLS_EXITING; if (adjust_vmx_controls(min2, opt2, @@ -4357,6 +4358,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) if (!vcpu->kvm->arch.bus_lock_detection_enabled) exec_control &= ~SECONDARY_EXEC_BUS_LOCK_DETECTION; + if (!kvm_notify_vmexit_enabled(vcpu->kvm)) + exec_control &= ~SECONDARY_EXEC_NOTIFY_VM_EXITING; + return exec_control; } @@ -4364,6 +4368,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) static void init_vmcs(struct vcpu_vmx *vmx) { + struct kvm *kvm = vmx->vcpu.kvm; + if (nested) nested_vmx_set_vmcs_shadowing_bitmap(); @@ -4392,12 +4398,15 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); } - if (!kvm_pause_in_guest(vmx->vcpu.kvm)) { + if (!kvm_pause_in_guest(kvm)) { vmcs_write32(PLE_GAP, ple_gap); vmx->ple_window = ple_window; vmx->ple_window_dirty = true; } + if (kvm_notify_vmexit_enabled(kvm)) + vmcs_write32(NOTIFY_WINDOW, kvm->arch.notify_window); + vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0); vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0); vmcs_write32(CR3_TARGET_COUNT, 0); /* 22.2.1 */ @@ -5684,6 +5693,32 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu) return 1; } +static int handle_notify(struct kvm_vcpu *vcpu) +{ + unsigned long exit_qual = vmx_get_exit_qual(vcpu); + bool context_invalid = exit_qual & NOTIFY_VM_CONTEXT_INVALID; + + ++vcpu->stat.notify_window_exits; + + /* + * Notify VM exit happened while executing iret from NMI, + * "blocked by NMI" bit has to be set before next VM entry. + */ + if (enable_vnmi && (exit_qual & INTR_INFO_UNBLOCK_NMI)) + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, + GUEST_INTR_STATE_NMI); + + if (vcpu->kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_USER || + context_invalid) { + vcpu->run->exit_reason = KVM_EXIT_NOTIFY; + vcpu->run->notify.flags = context_invalid ? + KVM_NOTIFY_CONTEXT_INVALID : 0; + return 0; + } + + return 1; +} + /* * The exit handlers return 1 if the exit was handled fully and guest execution * may resume. Otherwise they set the kvm_run parameter to indicate what needs @@ -5741,6 +5776,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer, [EXIT_REASON_ENCLS] = handle_encls, [EXIT_REASON_BUS_LOCK] = handle_bus_lock_vmexit, + [EXIT_REASON_NOTIFY] = handle_notify, }; static const int kvm_vmx_max_exit_handlers = @@ -6105,7 +6141,8 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) exit_reason.basic != EXIT_REASON_EPT_VIOLATION && exit_reason.basic != EXIT_REASON_PML_FULL && exit_reason.basic != EXIT_REASON_APIC_ACCESS && - exit_reason.basic != EXIT_REASON_TASK_SWITCH)) { + exit_reason.basic != EXIT_REASON_TASK_SWITCH && + exit_reason.basic != EXIT_REASON_NOTIFY)) { int ndata = 3; vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; @@ -7841,6 +7878,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + + .has_notify_vmexit = false, }; static unsigned int vmx_handle_intel_pt_intr(void) @@ -7979,6 +8018,9 @@ static __init int hardware_setup(void) kvm_tsc_scaling_ratio_frac_bits = 48; kvm_has_bus_lock_exit = cpu_has_vmx_bus_lock_detection(); + if (cpu_has_notify_vmexit()) + vmx_x86_ops.has_notify_vmexit = true; + set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ if (enable_ept) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c8b9b0bc42aa..ad8a05349614 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -291,7 +291,8 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = { STATS_DESC_COUNTER(VCPU, nested_run), STATS_DESC_COUNTER(VCPU, directed_yield_attempted), STATS_DESC_COUNTER(VCPU, directed_yield_successful), - STATS_DESC_ICOUNTER(VCPU, guest_mode) + STATS_DESC_ICOUNTER(VCPU, guest_mode), + STATS_DESC_COUNTER(VCPU, notify_window_exits), }; const struct kvm_stats_header kvm_vcpu_stats_header = { @@ -4389,6 +4390,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_DISABLE_QUIRKS2: r = KVM_X86_VALID_QUIRKS; break; + case KVM_CAP_X86_NOTIFY_VMEXIT: + r = kvm_x86_ops.has_notify_vmexit; + break; default: break; } @@ -6090,6 +6094,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, } mutex_unlock(&kvm->lock); break; + case KVM_CAP_X86_NOTIFY_VMEXIT: + r = -EINVAL; + if ((u32)cap->args[0] & ~KVM_X86_NOTIFY_VMEXIT_VALID_BITS) + break; + if (!kvm_x86_ops.has_notify_vmexit) + break; + if (!(u32)cap->args[0] & KVM_X86_NOTIFY_VMEXIT_ENABLED) + break; + kvm->arch.notify_window = cap->args[0] >> 32; + kvm->arch.notify_vmexit_flags = (u32)cap->args[0]; + r = 0; + break; default: r = -EINVAL; break; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 588792f00334..6e1e2159cbd4 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -344,6 +344,11 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm) return kvm->arch.cstate_in_guest; } +static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm) +{ + return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED; +} + enum kvm_intr_type { /* Values are arbitrary, but must be non-zero. */ KVM_HANDLING_IRQ = 1, diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index dd1d8167e71f..86f1ca9c1514 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -270,6 +270,7 @@ struct kvm_xen_exit { #define KVM_EXIT_X86_BUS_LOCK 33 #define KVM_EXIT_XEN 34 #define KVM_EXIT_RISCV_SBI 35 +#define KVM_EXIT_NOTIFY 36 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -490,6 +491,11 @@ struct kvm_run { unsigned long args[6]; unsigned long ret[2]; } riscv_sbi; + /* KVM_EXIT_NOTIFY */ + struct { +#define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) + __u32 flags; + } notify; /* Fix the size of the union. */ char padding[256]; }; @@ -1148,6 +1154,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PMU_CAPABILITY 212 #define KVM_CAP_DISABLE_QUIRKS2 213 #define KVM_CAP_VM_TSC_CONTROL 214 +#define KVM_CAP_X86_NOTIFY_VMEXIT 215 #ifdef KVM_CAP_IRQ_ROUTING @@ -2109,4 +2116,7 @@ struct kvm_stats_desc { /* Available with KVM_CAP_XSAVE2 */ #define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave) +#define KVM_X86_NOTIFY_VMEXIT_ENABLED (1ULL << 0) +#define KVM_X86_NOTIFY_VMEXIT_USER (1ULL << 1) + #endif /* __LINUX_KVM_H */