From patchwork Mon Oct 2 11:57:20 2023
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 13405884
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Will Deacon, linux-kernel@vger.kernel.org, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Ingo Molnar, "H. Peter Anvin", Thomas Gleixner, Joerg Roedel,
 Suravee Suthikulpanit, Sean Christopherson, Maxim Levitsky, Robin Murphy,
 iommu@lists.linux.dev, Paolo Bonzini
Subject: [PATCH v3 1/4] KVM: Add per vCPU flag specifying that a vCPU is loaded
Date: Mon, 2 Oct 2023 14:57:20 +0300
Message-Id: <20231002115723.175344-2-mlevitsk@redhat.com>
In-Reply-To: <20231002115723.175344-1-mlevitsk@redhat.com>
References: <20231002115723.175344-1-mlevitsk@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

Add a vcpu->loaded boolean flag that indicates whether a vCPU is currently
loaded. Such a flag can be useful in vendor code (e.g. AVIC) to make
decisions based on it.
Signed-off-by: Maxim Levitsky
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fb6c6109fdcad69..331432d86e44d51 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -379,6 +379,7 @@ struct kvm_vcpu {
 #endif
 	bool preempted;
 	bool ready;
+	bool loaded;
 	struct kvm_vcpu_arch arch;
 	struct kvm_vcpu_stat stat;
 	char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 486800a7024b373..615f2a02b7cb97f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -214,6 +214,10 @@ void vcpu_load(struct kvm_vcpu *vcpu)
 	__this_cpu_write(kvm_running_vcpu, vcpu);
 	preempt_notifier_register(&vcpu->preempt_notifier);
 	kvm_arch_vcpu_load(vcpu, cpu);
+
+	/* Ensure that vcpu->cpu is visible before vcpu->loaded is set to true */
+	smp_wmb();
+	WRITE_ONCE(vcpu->loaded, true);
 	put_cpu();
 }
 EXPORT_SYMBOL_GPL(vcpu_load);
@@ -221,6 +225,12 @@ EXPORT_SYMBOL_GPL(vcpu_load);
 void vcpu_put(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
+	WRITE_ONCE(vcpu->loaded, false);
+	/*
+	 * Ensure that vcpu->loaded is set and visible,
+	 * before KVM actually unloads the vCPU.
+	 */
+	smp_wmb();
 	kvm_arch_vcpu_put(vcpu);
 	preempt_notifier_unregister(&vcpu->preempt_notifier);
 	__this_cpu_write(kvm_running_vcpu, NULL);

From patchwork Mon Oct 2 11:57:21 2023
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 13405882
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Will Deacon, linux-kernel@vger.kernel.org, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Ingo Molnar, "H. Peter Anvin", Thomas Gleixner, Joerg Roedel,
 Suravee Suthikulpanit, Sean Christopherson, Maxim Levitsky, Robin Murphy,
 iommu@lists.linux.dev, Paolo Bonzini
Subject: [PATCH v3 2/4] x86: KVM: AVIC: stop using 'is_running' bit in avic_vcpu_put()
Date: Mon, 2 Oct 2023 14:57:21 +0300
Message-Id: <20231002115723.175344-3-mlevitsk@redhat.com>
In-Reply-To: <20231002115723.175344-1-mlevitsk@redhat.com>
References: <20231002115723.175344-1-mlevitsk@redhat.com>

An optimization was added to avic_vcpu_load() in commit 782f64558de7 ("KVM:
SVM: Skip AVIC and IRTE updates when loading blocking vCPU") to avoid
re-enabling AVIC if the vCPU is about to block. Such a situation arises when
a vCPU thread is preempted between the call to kvm_arch_vcpu_blocking() and
the matching call to kvm_arch_vcpu_unblocking(), which in the AVIC case
disable and re-enable AVIC on this vCPU.

The same optimization was done in avic_vcpu_put(), however that code was
based on the physical ID table entry's 'is_running' bit, building on the
assumption that if avic_vcpu_load() didn't set it, then KVM doesn't need to
disable AVIC (since it wasn't really enabled).
However, once AVIC's IPI virtualization is made optional, this bit might
always be false regardless of whether a vCPU is running. To fix this, check
the same kvm_vcpu_is_blocking() condition instead of this bit.

As a bonus, re-add the warning for the case where the 'is_running' bit is
found set during avic_vcpu_put() while the vCPU is not blocking, a condition
which indicates a bug.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2092db892d7d052..4c75ca15999fcd4 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1075,16 +1075,10 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	lockdep_assert_preemption_disabled();
 
 	/*
-	 * Note, reading the Physical ID entry outside of ir_list_lock is safe
-	 * as only the pCPU that has loaded (or is loading) the vCPU is allowed
-	 * to modify the entry, and preemption is disabled.  I.e. the vCPU
-	 * can't be scheduled out and thus avic_vcpu_{put,load}() can't run
-	 * recursively.
+	 * If the vcpu is blocking, there is no need to do anything.
+	 * See the comment in avic_vcpu_load().
 	 */
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-
-	/* Nothing to do if IsRunning == '0' due to vCPU blocking. */
-	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
+	if (kvm_vcpu_is_blocking(vcpu))
 		return;
 
 	/*
@@ -1099,6 +1093,9 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 
 	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
 
+	entry = READ_ONCE(*(svm->avic_physical_id_cache));
+	WARN_ON_ONCE(!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK));
+
 	entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);

From patchwork Mon Oct 2 11:57:22 2023
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 13405883
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Will Deacon, linux-kernel@vger.kernel.org, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Ingo Molnar, "H. Peter Anvin", Thomas Gleixner, Joerg Roedel,
 Suravee Suthikulpanit, Sean Christopherson, Maxim Levitsky, Robin Murphy,
 iommu@lists.linux.dev, Paolo Bonzini
Subject: [PATCH v3 3/4] x86: KVM: don't read physical ID table entry in avic_pi_update_irte()
Date: Mon, 2 Oct 2023 14:57:22 +0300
Message-Id: <20231002115723.175344-4-mlevitsk@redhat.com>
In-Reply-To: <20231002115723.175344-1-mlevitsk@redhat.com>
References: <20231002115723.175344-1-mlevitsk@redhat.com>

Change AVIC's code to use vcpu->loaded and vcpu->cpu instead of reading back
the cpu and the 'is_running' bit from AVIC's physical ID table entry.

Once AVIC's IPI virtualization is made optional, the 'is_running' bit might
always be false regardless of whether a vCPU is running.
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4c75ca15999fcd4..bdab28005ad3405 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,7 +791,6 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 	int ret = 0;
 	unsigned long flags;
 	struct amd_svm_iommu_ir *ir;
-	u64 entry;
 
 	/**
 	 * In some cases, the existing irte is updated and re-set,
@@ -832,10 +831,11 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 	 * will update the pCPU info when the vCPU awkened and/or scheduled in.
 	 * See also avic_vcpu_load().
 	 */
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
-		amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
-				    true, pi->ir_data);
+	if (READ_ONCE(svm->vcpu.loaded)) {
+		/* Ensure that the vcpu->loaded is read before the vcpu->cpu */
+		smp_rmb();
+		amd_iommu_update_ga(READ_ONCE(svm->vcpu.cpu), true, pi->ir_data);
+	}
 
 	list_add(&ir->node, &svm->ir_list);
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);

From patchwork Mon Oct 2 11:57:23 2023
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 13405886
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Will Deacon, linux-kernel@vger.kernel.org, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Ingo Molnar, "H. Peter Anvin", Thomas Gleixner, Joerg Roedel,
 Suravee Suthikulpanit, Sean Christopherson, Maxim Levitsky, Robin Murphy,
 iommu@lists.linux.dev, Paolo Bonzini
Subject: [PATCH v3 4/4] x86: KVM: SVM: allow optionally to disable AVIC's IPI virtualization
Date: Mon, 2 Oct 2023 14:57:23 +0300
Message-Id: <20231002115723.175344-5-mlevitsk@redhat.com>
In-Reply-To: <20231002115723.175344-1-mlevitsk@redhat.com>
References: <20231002115723.175344-1-mlevitsk@redhat.com>

On Zen2 (and likely on Zen1 as well), AVIC doesn't reliably detect a change
in the 'is_running' bit during ICR write emulation and might skip a VM exit
if that bit was recently cleared. The absence of the VM exit leads to KVM
not waking up / not triggering a nested VM exit on the target(s) of the IPI,
which can, in some cases, lead to unbounded delays in guest execution.

As I recently discovered, a reasonable workaround exists: make KVM never set
the 'is_running' bit, which in essence disables the IPI virtualization
portion of AVIC, making it equivalent to APICv without IPI virtualization.
This workaround ensures that (*) all ICR writes always cause a VM exit and
are therefore correctly emulated, at the expense of never enjoying
VM-exit-less ICR write emulation.

To let the user control the workaround, a new kvm_amd module parameter is
added: 'enable_ipiv', using the same name as VMX's IPI virtualization
parameter. However, unlike on VMX, this parameter is tri-state: 0, 1, -1.
-1 is the default value, which instructs KVM to choose the default based on
the CPU model.

(*) More precisely, all ICR writes except when the 'Self' shorthand is used:
in that case AVIC skips reading the physical ID table and just sets bits in
the IRR of the local APIC. Thankfully, the errata cannot trigger in this
case, so an extra workaround is not needed.
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 51 +++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index bdab28005ad3405..b3ec693083cc883 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -62,6 +62,9 @@ static_assert(__AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_ID_MASK) == -1u);
 static bool force_avic;
 module_param_unsafe(force_avic, bool, 0444);
 
+static int enable_ipiv = -1;
+module_param(enable_ipiv, int, 0444);
+
 /* Note:
  * This hash table is used to map VM_ID to a struct kvm_svm,
  * when handling AMD IOMMU GALOG notification to schedule in
@@ -1024,7 +1027,6 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 
 void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	u64 entry;
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long flags;
@@ -1053,14 +1055,22 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 */
 	spin_lock_irqsave(&svm->ir_list_lock, flags);
 
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+	/*
+	 * Do not update the actual physical id table entry, if the IPI
+	 * virtualization portion of AVIC is not enabled.
+	 * In this case all ICR writes except Self IPIs will be intercepted.
+	 */
+	if (enable_ipiv) {
+		u64 entry = READ_ONCE(*svm->avic_physical_id_cache);
 
-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
-	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-	entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+		WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+		entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+		entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+		entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+		WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+	}
 
-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
 
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
@@ -1068,7 +1078,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	u64 entry;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long flags;
 
@@ -1093,11 +1102,17 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 
 	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
 
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	WARN_ON_ONCE(!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK));
+	/*
+	 * Do not update the actual physical id table entry if the IPI
+	 * virtualization is disabled. See explanation in avic_vcpu_load().
+	 */
+	if (enable_ipiv) {
+		u64 entry = READ_ONCE(*svm->avic_physical_id_cache);
 
-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+		WARN_ON_ONCE(!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK));
+		entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+		WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+	}
 
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 
@@ -1211,5 +1226,17 @@ bool avic_hardware_setup(void)
 
 	amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 
+	if (enable_ipiv == -1) {
+		enable_ipiv = 1;
+		/* Assume that Zen1 and Zen2 have errata #1235 */
+		if (boot_cpu_data.x86 == 0x17) {
+			pr_info("AVIC's IPI virtualization disabled due to errata #1235\n");
+			enable_ipiv = 0;
+		}
+	}
+
+	if (enable_ipiv)
+		pr_info("AVIC's IPI virtualization enabled\n");
+
 	return true;
 }