From patchwork Thu Sep 15 09:28:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 12977207
From: Chenyi Qiang
To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson, Eduardo Habkost,
 Peter Xu, Xiaoyao Li
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: [PATCH v6 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to
 support pending triple fault
Date: Thu, 15 Sep 2022 17:28:38 +0800
Message-Id: <20220915092839.5518-2-chenyi.qiang@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220915092839.5518-1-chenyi.qiang@intel.com>
References: <20220915092839.5518-1-chenyi.qiang@intel.com>

For direct triple faults, i.e. those detected by hardware and morphed
into a VM-Exit by KVM, KVM will never lose them. But for triple faults
synthesized by KVM, e.g. on the RSM path, if KVM exits to userspace
before the request is serviced, userspace could migrate the VM and lose
the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the triple_fault.pending field of struct kvm_vcpu_events contains valid
state when the KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.

Acked-by: Peter Xu
Signed-off-by: Chenyi Qiang
---
 target/i386/cpu.c     |  1 +
 target/i386/cpu.h     |  1 +
 target/i386/kvm/kvm.c | 20 ++++++++++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a59..6e107466b3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,6 +6017,7 @@ static void x86_cpu_reset(DeviceState *dev)
     env->exception_has_payload = false;
     env->exception_payload = 0;
     env->nmi_injected = false;
+    env->triple_fault_pending = false;
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     apic_designate_bsp(cpu->apic_state, s->cpu_index == 0);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b9..b97d182e28 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1739,6 +1739,7 @@ typedef struct CPUArchState {
     uint8_t has_error_code;
     uint8_t exception_has_payload;
     uint64_t exception_payload;
+    bool triple_fault_pending;
     uint32_t ins_len;
     uint32_t sipi_vector;
     bool tsc_valid;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a1fd1f5379..3838827134 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -132,6 +132,7 @@ static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
 static int has_exception_payload;
+static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -2483,6 +2484,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_triple_fault_event = kvm_check_extension(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT);
+    if (has_triple_fault_event) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT, 0, true);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable triple fault event cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
@@ -4299,6 +4310,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
         }
     }
 
+    if (has_triple_fault_event) {
+        events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+        events.triple_fault.pending = env->triple_fault_pending;
+    }
+
     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 }
 
@@ -4368,6 +4384,10 @@ static int kvm_get_vcpu_events(X86CPU *cpu)
         }
     }
 
+    if (events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) {
+        env->triple_fault_pending = events.triple_fault.pending;
+    }
+
     env->sipi_vector = events.sipi_vector;
 
     return 0;

From patchwork Thu Sep 15 09:28:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 12977214
From: Chenyi Qiang
To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson, Eduardo Habkost,
 Peter Xu, Xiaoyao Li
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: [PATCH v6 2/2] i386: Add notify VM exit support
Date: Thu, 15 Sep 2022 17:28:39 +0800
Message-Id: <20220915092839.5518-3-chenyi.qiang@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220915092839.5518-1-chenyi.qiang@intel.com>
References: <20220915092839.5518-1-chenyi.qiang@intel.com>

There are cases where a malicious virtual machine can cause the CPU to
get stuck because event windows never open up, e.g. an infinite loop in
microcode on nested #AC (CVE-2015-5307). No event window means no event
(NMI, SMI or IRQ) can be delivered, which leaves the CPU unavailable to
the host and to other VMs. Notify VM exit is introduced to mitigate this
kind of attack: it generates a VM exit if no event window occurs in VMX
non-root mode for a specified amount of time (the notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
  Bit 63:32 - notify window specified in the qemu command line
  Bit 31:0  - flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
              enable the feature)

There are some concerns about the feature, e.g. a notify VM exit may
happen with VM_CONTEXT_INVALID set in the exit qualification, which
means the VM context is corrupted (no cases are anticipated that would
set this bit). To avoid false positives that would kill a well-behaved
guest, the feature is disabled by default. Users can enable it with a
new machine property:
    qemu -machine notify_vmexit=on,notify_window=0 ...

Note that notify_window takes effect only when notify_vmexit is on, and
its value must be non-negative. Even zero is safe, because the hardware
adds an internal threshold to the window to ensure there are no false
positives.

A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
it happens with VM_CONTEXT_INVALID set, the hypervisor exits to user
space to report the fatal case. User space can then inject a SHUTDOWN
event into the target vcpu. This is implemented by injecting a
synthesized triple fault event.
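
For illustration only (not part of this patch), the argument layout
described above can be sketched in standalone C. The SKETCH_* names and
the bit values assigned to them are assumptions made for this example;
real code should use KVM_X86_NOTIFY_VMEXIT_ENABLED and
KVM_X86_NOTIFY_VMEXIT_USER from <linux/kvm.h>.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the KVM flag bits. The values are an
 * assumption for this sketch, not taken from the patch.
 */
#define SKETCH_NOTIFY_VMEXIT_ENABLED (UINT32_C(1) << 0)
#define SKETCH_NOTIFY_VMEXIT_USER    (UINT32_C(1) << 1)

/* Pack the notify window into bits 63:32 and the flags into bits 31:0. */
uint64_t sketch_pack_notify_arg(uint32_t notify_window, uint32_t flags)
{
    return ((uint64_t)notify_window << 32) | flags;
}

/* Recover the notify window (upper 32 bits) from a packed argument. */
uint32_t sketch_notify_window(uint64_t arg)
{
    return (uint32_t)(arg >> 32);
}

/* Recover the flags (lower 32 bits) from a packed argument. */
uint32_t sketch_notify_flags(uint64_t arg)
{
    return (uint32_t)(arg & UINT32_C(0xffffffff));
}
```

The cast to uint64_t before the shift matters: shifting a 32-bit value
left by 32 would otherwise be undefined in C.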

Signed-off-by: Chenyi Qiang
Acked-by: Peter Xu
---
 hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/x86.h |  5 +++++
 qemu-options.hx       | 10 +++++++++-
 target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..1eccbd3deb 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return x86ms->notify_vmexit;
+}
+
+static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    x86ms->notify_vmexit = value;
+}
+
+static void x86_machine_get_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    uint32_t notify_window = x86ms->notify_window;
+
+    visit_type_uint32(v, name, &notify_window, errp);
+}
+
+static void x86_machine_set_notify_window(Object *obj, Visitor *v,
+                               const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    visit_type_uint32(v, name, &x86ms->notify_window, errp);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
     x86ms->above_4g_mem_start = 4 * GiB;
+    x86ms->notify_vmexit = false;
+    x86ms->notify_window = 0;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "sgx-epc",
         "SGX EPC device");
+
+    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
+                              x86_machine_get_notify_window,
+                              x86_machine_set_notify_window, NULL, NULL);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+            "Set the notify window required by notify VM exit");
+
+    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
+                                   x86_machine_get_notify_vmexit,
+                                   x86_machine_set_notify_vmexit);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
+            "Enable notify VM exit");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f8..5707329fa7 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -85,6 +85,9 @@ struct X86MachineState {
      * which means no limitation on the guest's bus locks.
      */
     uint64_t bus_lock_ratelimit;
+
+    bool notify_vmexit;
+    uint32_t notify_window;
 };
 
 #define X86_MACHINE_SMM "smm"
@@ -94,6 +97,8 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT "bus-lock-ratelimit"
+#define X86_MACHINE_NOTIFY_VMEXIT "notify-vmexit"
+#define X86_MACHINE_NOTIFY_WINDOW "notify-window"
 
 #define TYPE_X86_MACHINE MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/qemu-options.hx b/qemu-options.hx
index 31c04f7eea..3cdeeac8f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
     "                hmat=on|off controls ACPI HMAT support (default=off)\n"
     "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
-    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
+    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
+    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
     QEMU_ARCH_ALL)
 SRST
 ``-machine [type=]name[,prop=value[,...]]``
@@ -157,6 +158,13 @@ SRST
         ::
 
             -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
+
+    ``notify_vmexit=on|off,notify_window=n``
+        Enables or disables notify VM exit support on x86 hosts and specifies
+        the corresponding notify window that triggers the VM exit if enabled.
+        This feature can mitigate the CPU stuck issue caused by event windows
+        that do not open up for a specified amount of time (notify window).
+        The default is off.
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3838827134..ae7fb2c495 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
             ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
                                 x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
         }
+
+        if (x86ms->notify_vmexit &&
+            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
+                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
+                                           KVM_X86_NOTIFY_VMEXIT_USER;
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
+                                    notify_window_flags);
+            if (ret < 0) {
+                error_report("kvm: Failed to enable notify vmexit cap: %s",
+                             strerror(-ret));
+                return ret;
+            }
+        }
     }
 
     return 0;
@@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
     X86CPU *cpu = X86_CPU(cs);
     uint64_t code;
     int ret;
+    struct kvm_vcpu_events events = {};
 
     switch (run->exit_reason) {
     case KVM_EXIT_HLT:
@@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_NOTIFY:
+        ret = 0;
+        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
+            warn_report("KVM: invalid context due to notify vmexit");
+            if (has_triple_fault_event) {
+                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+                events.triple_fault.pending = true;
+                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+            } else {
+                ret = -1;
+            }
+        }
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
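
For illustration only (not part of the series), the decision taken by
the KVM_EXIT_NOTIFY case can be modelled as a small pure function. All
SKETCH_* and sketch_* names are hypothetical, and the bit value given to
the context-invalid flag is an assumption for this example; the real
handler tests KVM_NOTIFY_CONTEXT_INVALID and issues KVM_SET_VCPU_EVENTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KVM_NOTIFY_CONTEXT_INVALID (value assumed). */
#define SKETCH_NOTIFY_CONTEXT_INVALID (UINT32_C(1) << 0)

/* What the exit handler ends up doing for a notify VM exit. */
enum sketch_action {
    SKETCH_RESUME_GUEST,        /* context still valid: ret = 0 */
    SKETCH_INJECT_TRIPLE_FAULT, /* context invalid, triple fault event usable */
    SKETCH_FAIL,                /* context invalid, no way to inject: ret = -1 */
};

/* Mirror the branch structure of the KVM_EXIT_NOTIFY case in the diff. */
enum sketch_action sketch_notify_exit_action(uint32_t notify_flags,
                                             bool has_triple_fault_event)
{
    if (!(notify_flags & SKETCH_NOTIFY_CONTEXT_INVALID)) {
        return SKETCH_RESUME_GUEST;
    }
    return has_triple_fault_event ? SKETCH_INJECT_TRIPLE_FAULT : SKETCH_FAIL;
}
```

Keeping the decision separate from the ioctl makes the three outcomes
easy to check without a running VM.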