From patchwork Mon Dec 13 10:46:30 2021
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Vitaly Kuznetsov, Borislav Petkov, linux-kernel@vger.kernel.org, Paolo Bonzini, Dave Hansen, "H. Peter Anvin", Sean Christopherson, Wanpeng Li, Ingo Molnar, Maxim Levitsky
Subject: [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
Date: Mon, 13 Dec 2021 12:46:30 +0200
Message-Id: <20211213104634.199141-2-mlevitsk@redhat.com>
In-Reply-To: <20211213104634.199141-1-mlevitsk@redhat.com>
References: <20211213104634.199141-1-mlevitsk@redhat.com>

Fix a corner case in which the L1 hypervisor intercepts interrupts (INTERCEPT_INTR) and either doesn't use virtual interrupt masking (V_INTR_MASKING) or enters a nested guest with EFLAGS.IF disabled prior to the entry. In this case, even though L1 intercepts the interrupts, KVM still needs to open an interrupt window and wait before it can deliver the INTR vmexit. Currently, KVM instead enters an endless loop of 'req_immediate_exit'.

Note that on VMX this case is impossible: there is only the 'vmexit on external interrupts' execution control, which is either set, in which case both the host's and the guest's EFLAGS.IF are ignored, or clear, in which case no vmexit is delivered.
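To make the corner case concrete, here is a minimal sketch (not part of the patch) of an L1 configuration that triggers the endless 'req_immediate_exit' loop; vmcb12 is the hypothetical VMCB that L1 hands to VMRUN, and the helper/mask names follow KVM's existing SVM headers:

	/* Sketch only: an L1 setup that hits this corner case. */
	static void l1_setup_that_triggers_the_bug(struct vmcb *vmcb12)
	{
		/* L1 wants a vmexit on physical interrupts... */
		vmcb_set_intercept(&vmcb12->control, INTERCEPT_INTR);

		/* ...but doesn't virtualize interrupt masking, so L2's own
		 * EFLAGS.IF decides when a pending INTR can be taken. */
		vmcb12->control.int_ctl &= ~V_INTR_MASKING_MASK;
	}

	/*
	 * L1 then enters L2 with EFLAGS.IF = 0.  KVM must open an interrupt
	 * window and wait for L2 to set EFLAGS.IF before it can deliver the
	 * INTR vmexit to L1, instead of requesting an immediate exit forever.
	 */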
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/svm.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e57e6857e0630..c9668a3b51011 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3372,17 +3372,21 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	bool blocked;
+
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
+	blocked = svm_interrupt_blocked(vcpu);
+
 	/*
 	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
 	 * e.g. if the IRQ arrived asynchronously after checking nested events.
 	 */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
-		return -EBUSY;
-
-	return !svm_interrupt_blocked(vcpu);
+		return !blocked ? -EBUSY : 0;
+	else
+		return !blocked;
 }
 
 static void svm_enable_irq_window(struct kvm_vcpu *vcpu)

From patchwork Mon Dec 13 10:46:31 2021
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Vitaly Kuznetsov, Borislav Petkov, linux-kernel@vger.kernel.org, Paolo Bonzini, Dave Hansen, "H. Peter Anvin", Sean Christopherson, Wanpeng Li, Ingo Molnar, Maxim Levitsky
Subject: [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled
Date: Mon, 13 Dec 2021 12:46:31 +0200
Message-Id: <20211213104634.199141-3-mlevitsk@redhat.com>
In-Reply-To: <20211213104634.199141-1-mlevitsk@redhat.com>
References: <20211213104634.199141-1-mlevitsk@redhat.com>

Apparently, on some systems AVIC is disabled in CPUID but is still usable. Allow the user to override the CPUID check if they are willing to take the risk.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c9668a3b51011..468cc385c35f0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -206,6 +206,9 @@ module_param(tsc_scaling, int, 0444);
 static bool avic;
 module_param(avic, bool, 0444);
 
+static bool force_avic;
+module_param_unsafe(force_avic, bool, 0444);
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -4656,10 +4659,14 @@ static __init int svm_hardware_setup(void)
 		nrips = false;
 	}
 
-	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
 
 	if (enable_apicv) {
-		pr_info("AVIC enabled\n");
+		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
+			pr_warn("AVIC is not supported in CPUID but force enabled\n");
+			pr_warn("Your system might crash and burn\n");
+		} else
+			pr_info("AVIC enabled\n");
 
 		amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 	} else {
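For context, module_param_unsafe() (used above for force_avic) registers the parameter like module_param() but taints the kernel when the option is actually set. Roughly, it boils down to the following check in kernel/params.c (a simplified sketch from memory, not code from this patch):

	/* Simplified sketch of the kernel's "unsafe parameter" handling: */
	if (kp->flags & KERNEL_PARAM_FL_UNSAFE) {
		pr_notice("Setting dangerous option %s - tainting kernel\n",
			  kp->name);
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
	}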
From patchwork Mon Dec 13 10:46:32 2021
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Vitaly Kuznetsov, Borislav Petkov, linux-kernel@vger.kernel.org, Paolo Bonzini, Dave Hansen, "H. Peter Anvin", Sean Christopherson, Wanpeng Li, Ingo Molnar, Maxim Levitsky
Subject: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
Date: Mon, 13 Dec 2021 12:46:32 +0200
Message-Id: <20211213104634.199141-4-mlevitsk@redhat.com>
In-Reply-To: <20211213104634.199141-1-mlevitsk@redhat.com>
References: <20211213104634.199141-1-mlevitsk@redhat.com>

If svm_deliver_avic_intr is called just after the target vCPU's AVIC was inhibited, it might read a stale value of vcpu->arch.apicv_active, which can lead to the target vCPU not noticing the interrupt.

To fix this, use load-acquire/store-release so that, if the target vCPU is IN_GUEST_MODE, we are guaranteed to see a previous disabling of the AVIC. If AVIC has been disabled in the meanwhile, proceed with the KVM_REQ_EVENT-based delivery.

All this complicated logic is in fact exactly how we can handle an incomplete IPI vmexit; the only difference lies in who sets IRR, KVM or the processor. The incomplete IPI vmexit also has the same races as svm_deliver_avic_intr, so reuse avic_kick_target_vcpu for it as well.
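The acquire/release argument can be summarized in a short sketch (simplified from the diff below, not a complete implementation):

	/* Writer side (vcpu_enter_guest), simplified: apicv_active is
	 * updated before the vCPU publishes that it is entering the guest. */
	vcpu->arch.apicv_active = false;			/* inhibit AVIC */
	smp_store_release(&vcpu->mode, IN_GUEST_MODE);		/* publish      */

	/* Reader side (avic_kick_target_vcpu), simplified: */
	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
	apicv_active  = READ_ONCE(vcpu->arch.apicv_active);

	/*
	 * If the reader observes IN_GUEST_MODE, the load-acquire pairs with
	 * the store-release above, so it also observes any write to
	 * apicv_active made before publication; a doorbell is therefore
	 * never signalled to a vCPU whose AVIC was just inhibited.
	 */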
Reported-by: Maxim Levitsky
Co-developed-with: Paolo Bonzini
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
 arch/x86/kvm/x86.c      |  4 +-
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 90364d02f22aa..34f62da2fbadd 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
+{
+	bool in_guest_mode;
+
+	/*
+	 * vcpu->arch.apicv_active is read after vcpu->mode.  Pairs
+	 * with smp_store_release in vcpu_enter_guest.
+	 */
+	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
+
+	if (READ_ONCE(vcpu->arch.apicv_active)) {
+		if (in_guest_mode) {
+			/*
+			 * Signal the doorbell to tell hardware to inject the
+			 * IRQ if the vCPU is in the guest.  If the vCPU is not
+			 * in the guest, hardware will automatically process
+			 * AVIC interrupts at VMRUN.
+			 *
+			 * Note, the vCPU could get migrated to a different pCPU
+			 * at any point, which could result in signalling the
+			 * wrong/previous pCPU.  But if that happens the vCPU is
+			 * guaranteed to do a VMRUN (after being migrated) and
+			 * thus will process pending interrupts, i.e. a doorbell
+			 * is not needed (and the spurious one is harmless).
+			 */
+			int cpu = READ_ONCE(vcpu->cpu);
+
+			if (cpu != get_cpu())
+				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
+			put_cpu();
+		} else {
+			/*
+			 * Wake the vCPU if it was blocking.  KVM will then
+			 * detect the pending IRQ when checking if the vCPU has
+			 * a wake event.
+			 */
+			kvm_vcpu_wake_up(vcpu);
+		}
+	} else {
+		/* Compare this case with __apic_accept_irq. */
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 				   u32 icrl, u32 icrh)
 {
@@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
 					GET_APIC_DEST_FIELD(icrh),
-					icrl & APIC_DEST_MASK))
-			kvm_vcpu_wake_up(vcpu);
+					icrl & APIC_DEST_MASK)) {
+			vcpu->arch.apic->irr_pending = true;
+			avic_kick_target_vcpu(vcpu);
+		}
 	}
 }
 
@@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-	if (!vcpu->arch.apicv_active)
-		return -1;
-
+	/*
+	 * We have to handle the case of AVIC being disabled in the middle of
+	 * this function anyway, and there is hardly any overhead if AVIC is
+	 * disabled.  So, we do not bother returning -1 and handle the kick
+	 * ourselves for disabled APICv.
+	 */
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
 
 	/*
@@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	 * the doorbell if the vCPU is already running in the guest.
 	 */
 	smp_mb__after_atomic();
-
-	/*
-	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
-	 * is in the guest.  If the vCPU is not in the guest, hardware will
-	 * automatically process AVIC interrupts at VMRUN.
-	 */
-	if (vcpu->mode == IN_GUEST_MODE) {
-		int cpu = READ_ONCE(vcpu->cpu);
-
-		/*
-		 * Note, the vCPU could get migrated to a different pCPU at any
-		 * point, which could result in signalling the wrong/previous
-		 * pCPU.  But if that happens the vCPU is guaranteed to do a
-		 * VMRUN (after being migrated) and thus will process pending
-		 * interrupts, i.e. a doorbell is not needed (and the spurious
-		 * one is harmless).
-		 */
-		if (cpu != get_cpu())
-			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
-		put_cpu();
-	} else {
-		/*
-		 * Wake the vCPU if it was blocking.  KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
-		 */
-		kvm_vcpu_wake_up(vcpu);
-	}
-
+	avic_kick_target_vcpu(vcpu);
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 85127b3e3690b..81a74d86ee5eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * result in virtual interrupt delivery.
 	 */
 	local_irq_disable();
-	vcpu->mode = IN_GUEST_MODE;
+
+	/* Store vcpu->apicv_active before vcpu->mode. */
+	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
From patchwork Mon Dec 13 10:46:33 2021
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Vitaly Kuznetsov, Borislav Petkov, linux-kernel@vger.kernel.org, Paolo Bonzini, Dave Hansen, "H. Peter Anvin", Sean Christopherson, Wanpeng Li, Ingo Molnar, Maxim Levitsky
Subject: [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
Date: Mon, 13 Dec 2021 12:46:33 +0200
Message-Id: <20211213104634.199141-5-mlevitsk@redhat.com>
In-Reply-To: <20211213104634.199141-1-mlevitsk@redhat.com>
References: <20211213104634.199141-1-mlevitsk@redhat.com>

kvm_apic_update_apicv is called while AVIC is still active, thus the CPU can still set IRR bits after it runs, and those bits won't set irr_pending back to true. Also, the logic in avic_kick_target_vcpu doesn't expect a race with this function. To keep things simple, just leave irr_pending set to true and let the next interrupt injection into the guest clear it.
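Leaving irr_pending set is benign because it is only a hint for the IRR scan; a simplified sketch of the relevant fast path in KVM's lapic.c (for illustration only, not part of this patch):

	/* Simplified from lapic.c's apic_find_highest_irr: */
	static int apic_find_highest_irr(struct kvm_lapic *apic)
	{
		if (!apic->irr_pending)
			return -1;	/* fast path: nothing pending */

		/*
		 * Slow path: scan the IRR.  A spuriously-set irr_pending
		 * merely costs one scan that finds no vector; correctness is
		 * unaffected, and the hint is refreshed once the next
		 * interrupt is processed.
		 */
		return apic_search_irr(apic);
	}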
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/lapic.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index baca9fa37a91c..6e1fbbf4c508b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2312,7 +2312,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		apic->irr_pending = true;
 		apic->isr_count = 1;
 	} else {
-		apic->irr_pending = (apic_search_irr(apic) != -1);
+		/*
+		 * Don't touch irr_pending, let it be cleared when
+		 * we process the interrupt
+		 */
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
 }

From patchwork Mon Dec 13 10:46:34 2021
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Vitaly Kuznetsov, Borislav Petkov, linux-kernel@vger.kernel.org, Paolo Bonzini, Dave Hansen, "H. Peter Anvin", Sean Christopherson, Wanpeng Li, Ingo Molnar, Maxim Levitsky
Peter Anvin" , Sean Christopherson , Wanpeng Li , Ingo Molnar , Maxim Levitsky Subject: [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running Date: Mon, 13 Dec 2021 12:46:34 +0200 Message-Id: <20211213104634.199141-6-mlevitsk@redhat.com> In-Reply-To: <20211213104634.199141-1-mlevitsk@redhat.com> References: <20211213104634.199141-1-mlevitsk@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Inhibit the AVIC of the vCPU that is running nested for the duration of the nested run, so that all interrupts arriving from both its vCPU siblings and from the KVM are delivered using normal IPIs and cause that vCPU to vmexit. Note that unlike normal AVIC inhibition, there is no need to update the AVIC mmio memslot, because the nested guest uses its own set of paging tables. That also means that AVIC doesn't need to be inhibited VM wide. Note that in theory when a nested guest doesn't intercept physical interrupts, we could continue using AVIC to deliver them to it but don't bother doing this. Plus when nested AVIC is implemented, the nested guest will likely use it, which will not allow this optimization to be used anyway. (can't use real AVIC to support both L1 and L2 at the same time) Signed-off-by: Maxim Levitsky --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 7 ++++++- arch/x86/kvm/svm/avic.c | 6 +++++- arch/x86/kvm/svm/nested.c | 11 ++++++----- arch/x86/kvm/svm/svm.c | 30 +++++++++++++++++++----------- arch/x86/kvm/svm/svm.h | 1 + arch/x86/kvm/x86.c | 13 ++++++++++++- 7 files changed, 50 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index c2b007171abd2..d9d7459ef9e8f 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -119,6 +119,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush) KVM_X86_OP_NULL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP_NULL(complete_emulated_msr) +KVM_X86_OP_NULL(apicv_check_inhibit); #undef KVM_X86_OP #undef KVM_X86_OP_NULL diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e863d569c89a4..78b3793cc08c5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1036,7 +1036,6 @@ struct kvm_x86_msr_filter { #define APICV_INHIBIT_REASON_DISABLE 0 #define APICV_INHIBIT_REASON_HYPERV 1 -#define APICV_INHIBIT_REASON_NESTED 2 #define APICV_INHIBIT_REASON_IRQWIN 3 #define APICV_INHIBIT_REASON_PIT_REINJ 4 #define APICV_INHIBIT_REASON_X2APIC 5 @@ -1486,6 +1485,12 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + + /* + * Returns false if for some reason APICv (e.g guest mode) + * must be inhibited on this vCPU + */ + bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu); }; struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 34f62da2fbadd..5a8304938f51e 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec) return 0; } +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu) +{ + return is_guest_mode(vcpu); +} + bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu) { return false; @@ -950,7 +955,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit) ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) | BIT(APICV_INHIBIT_REASON_ABSENT) | 
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c2b007171abd2..d9d7459ef9e8f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -119,6 +119,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP_NULL(apicv_check_inhibit)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e863d569c89a4..78b3793cc08c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,7 +1036,6 @@ struct kvm_x86_msr_filter {
 
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
-#define APICV_INHIBIT_REASON_NESTED     2
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC     5
@@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	/*
+	 * Returns true if for some reason APICv (e.g. guest mode)
+	 * must be inhibited on this vCPU
+	 */
+	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 34f62da2fbadd..5a8304938f51e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	return 0;
 }
 
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu);
+}
+
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	return false;
@@ -950,7 +955,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC) |
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cf206855ebf09..bf17c2d7cf321 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,11 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	/*
-	 * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
-	 * avic_physical_id.
-	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
-
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
@@ -659,6 +654,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 
 	svm_set_gif(svm, true);
 
+	if (kvm_vcpu_apicv_active(vcpu))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
@@ -923,6 +921,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
 		kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
 
+	if (kvm_apicv_activated(vcpu->kvm))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 468cc385c35f0..e4228580286e8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1383,7 +1383,8 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 	/*
 	 * The following fields are ignored when AVIC is enabled
 	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+	if (!is_guest_mode(&svm->vcpu))
+		WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	svm_set_intercept(svm, INTERCEPT_VINTR);
 
@@ -2853,10 +2854,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	svm_clear_vintr(to_svm(vcpu));
 
 	/*
-	 * For AVIC, the only reason to end up here is ExtINTs.
+	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
+	 *
+	 * If running nested, still uninhibit the AVIC in case the irq window
+	 * was requested when it was not running nested.
+	 * All vCPUs which run nested will have their AVIC still
+	 * inhibited due to the AVIC inhibition override for that.
 	 */
+
 	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
@@ -3410,8 +3417,16 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * unless we have pending ExtINT since it cannot be injected
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
+		 *
+		 * If running nested, this vCPU will use separate page tables
+		 * which don't have L1's AVIC mapped, and the AVIC is
+		 * already inhibited thus there is no need for global
+		 * AVIC inhibition.
 		 */
-		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
+		if (!is_guest_mode(vcpu))
+			kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
 		svm_set_vintr(svm);
 	}
 }
@@ -3881,14 +3896,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_request_apicv_update(vcpu->kvm, false,
 						 APICV_INHIBIT_REASON_X2APIC);
-
-		/*
-		 * Currently, AVIC does not work with nested virtualization.
-		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-		 */
-		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			kvm_request_apicv_update(vcpu->kvm, false,
-						 APICV_INHIBIT_REASON_NESTED);
 	}
 	init_vmcb_after_set_cpuid(vcpu);
 }
@@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+	.apicv_check_inhibit = avic_is_vcpu_inhibited,
 };
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index daa8ca84afccd..545684ea37353 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		       uint32_t guest_irq, bool set);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81a74d86ee5eb..125599855af47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
 			goto out;
+
+		/* Nested VM exit might need to update APICv status */
+		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+			kvm_vcpu_update_apicv(vcpu);
 	}
 
 	/* try to inject new event if pending */
@@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
+
+	if (kvm_x86_ops.apicv_check_inhibit)
+		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
+
 	if (vcpu->arch.apicv_active == activate)
 		goto out;
 
@@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * per-VM state, and responding vCPUs must wait for the update
 	 * to complete before servicing KVM_REQ_APICV_UPDATE.
	 */
-	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+	if (!is_guest_mode(vcpu))
+		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+	else
+		WARN_ON(kvm_vcpu_apicv_active(vcpu));
 
 	exit_fastpath = static_call(kvm_x86_run)(vcpu);
 	if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))