Message ID | 20211209115440.394441-6-mlevitsk@redhat.com (mailing list archive)
---|---
State | New, archived
Series | RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting
On 12/9/21 12:54, Maxim Levitsky wrote:
> Also reorder call to kvm_apic_update_apicv to be after
> .refresh_apicv_exec_ctrl, although that doesn't guarantee
> that it will see up to date IRR bits.

Can you spell out why to do that?

Paolo
On Thu, 2021-12-09 at 15:12 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > Also reorder call to kvm_apic_update_apicv to be after
> > .refresh_apicv_exec_ctrl, although that doesn't guarantee
> > that it will see up to date IRR bits.
>
> Can you spell out why to do that?

Here is what I saw happening in kvm_vcpu_update_apicv when we are about
to disable AVIC:

1. We call kvm_apic_update_apicv, which sets irr_pending == false
   because there is nothing in IRR yet.
2. We call kvm_x86_refresh_apicv_exec_ctrl, which disables AVIC.

If an IPI arrives between 1 and 2, the IRR bits are set, and there is
legitimately no VMexit happening, so irr_pending has no chance of being
set to true. This is why I reordered those calls and added a memory
barrier between them (but I didn't post it in the series).

However, I then found out that even with the incomplete IPI handler
setting irr_pending, I could still observe irr_pending = true here but
no bits in IRR, so kvm_apic_update_apicv would reset it. I expected the
VM exit to be a write barrier, but it seems that it isn't.

I ended up fixing the incomplete IPI handler to always:

- set irr_pending
- raise KVM_REQ_EVENT
- kick the vCPU

Kicking a sleeping vCPU just wakes it up, and otherwise a vCPU kick only
sends an IPI when the target vCPU is in guest mode anyway. That, I
think, ensures for good that the interrupt will be processed by this
vCPU regardless of the order of these calls and any barrier between
them.

The only thing I kept is making kvm_apic_update_apicv never clear
irr_pending, to make sure it doesn't reset it if it sees the writes out
of order. Later, the KVM_REQ_EVENT processing should see the writes in
order because kvm_make_request includes a write barrier, and the kick
ensures that the vCPU will process that request.

So in summary this reorder is not needed anymore, but it seems more
logical to scan IRR after we disable AVIC. Or, on second thought, I
think we should drop the IRR scan from here altogether, now that the
callers do vCPU kicks.

Best regards,
	Maxim Levitsky

>
> Paolo
>
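To make the approach Maxim describes above concrete, here is a minimal
sketch of an incomplete-IPI handler tail that always marks the interrupt
pending, raises KVM_REQ_EVENT and kicks the target. The helper name is
hypothetical and this is not the actual patch code, only an illustration
of the three steps listed in the message.

/*
 * Hypothetical sketch (not the actual patch): on an AVIC incomplete IPI
 * vmexit, unconditionally flag the target vCPU and let the KVM_REQ_EVENT
 * processing rescan vIRR later.
 */
static void avic_kick_target_vcpu_sketch(struct kvm_vcpu *vcpu)
{
	/* Conservative: a later IRR rescan re-evaluates irr_pending. */
	vcpu->arch.apic->irr_pending = true;

	/* kvm_make_request() includes the write barrier mentioned above. */
	kvm_make_request(KVM_REQ_EVENT, vcpu);

	/* Wakes a sleeping vCPU, or sends an IPI if it is in guest mode. */
	kvm_vcpu_kick(vcpu);
}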
On 12/9/21 12:54, Maxim Levitsky wrote:
> It is possible that during the AVIC incomplete IPI vmexit,
> its handler will set irr_pending to true,
> but the target vCPU will still see the IRR bit not set,
> due to the apparent lack of memory ordering between CPU's vIRR write
> that is supposed to happen prior to the AVIC incomplete IPI
> vmexit and the write of the irr_pending in that handler.

Are you sure about this?  Store-to-store ordering should be
guaranteed---if not by the architecture---by existing memory barriers
between vmrun returning and avic_incomplete_ipi_interception().  For
example, srcu_read_lock implies an smp_mb().

Even more damning: no matter what internal black magic the processor
could be using to write to IRR, the processor needs to order the writes
against reads of IsRunning on processors without the erratum.  That
would be equivalent to flushing the store buffer, and it would imply
that the write of vIRR is ordered before the write to irr_pending.

Paolo
On Fri, 2021-12-10 at 13:07 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > It is possible that during the AVIC incomplete IPI vmexit,
> > its handler will set irr_pending to true,
> > but the target vCPU will still see the IRR bit not set,
> > due to the apparent lack of memory ordering between CPU's vIRR write
> > that is supposed to happen prior to the AVIC incomplete IPI
> > vmexit and the write of the irr_pending in that handler.
>
> Are you sure about this?  Store-to-store ordering should be
> guaranteed---if not by the architecture---by existing memory barriers
> between vmrun returning and avic_incomplete_ipi_interception().  For
> example, srcu_read_lock implies an smp_mb().
>
> Even more damning: no matter what internal black magic the processor
> could be using to write to IRR, the processor needs to order the writes
> against reads of IsRunning on processors without the erratum.  That
> would be equivalent to flushing the store buffer, and it would imply
> that the write of vIRR is ordered before the write to irr_pending.
>
> Paolo

Yes, I am almost 100% sure now that this patch is wrong. The code was
just seeing irr_pending == true because it is set to true while
APICv/AVIC is in use, and was not yet seeing the vIRR bits because they
hadn't arrived yet. Thus this patch isn't needed.

Thanks again for the help!
I am testing your version of the fixes to the AVIC inhibition races,
and then I'll send a new version of these patches.

Best regards,
	Maxim Levitsky
On Fri, 2021-12-10 at 14:20 +0200, Maxim Levitsky wrote:
> On Fri, 2021-12-10 at 13:07 +0100, Paolo Bonzini wrote:
> > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > It is possible that during the AVIC incomplete IPI vmexit,
> > > its handler will set irr_pending to true,
> > > but the target vCPU will still see the IRR bit not set,
> > > due to the apparent lack of memory ordering between CPU's vIRR write
> > > that is supposed to happen prior to the AVIC incomplete IPI
> > > vmexit and the write of the irr_pending in that handler.
> >
> > Are you sure about this?  Store-to-store ordering should be
> > guaranteed---if not by the architecture---by existing memory barriers
> > between vmrun returning and avic_incomplete_ipi_interception().  For
> > example, srcu_read_lock implies an smp_mb().
> >
> > Even more damning: no matter what internal black magic the processor
> > could be using to write to IRR, the processor needs to order the writes
> > against reads of IsRunning on processors without the erratum.  That
> > would be equivalent to flushing the store buffer, and it would imply
> > that the write of vIRR is ordered before the write to irr_pending.
> >
> > Paolo
>
> Yes, I am almost 100% sure now that this patch is wrong. The code was
> just seeing irr_pending == true because it is set to true while
> APICv/AVIC is in use, and was not yet seeing the vIRR bits because they
> hadn't arrived yet. Thus this patch isn't needed.
>
> Thanks again for the help!
> I am testing your version of the fixes to the AVIC inhibition races,
> and then I'll send a new version of these patches.
>
> Best regards,
>	Maxim Levitsky

And yet that patch is needed for a different reason:

If the sender has AVIC enabled, it can turn on vIRR bits at any moment
without setting irr_pending = true - there are no VMexits happening on
the sender side.

If we scan vIRR here and see no bits, and *then* disable AVIC, there is
a window where they could legitimately be turned on without any CPU
errata, and we will not have irr_pending == true, and thus the
following KVM_REQ_EVENT will make no difference.

Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
will work too, and if the AVIC errata is present, slightly reduces the
chances of it happening.

Best regards,
	Maxim Levitsky
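For illustration, here is a simplified, hypothetical rendering of the
window described above, paraphrasing the pre-patch ordering in
kvm_vcpu_update_apicv; the function name and structure are not the
actual kernel source.

/*
 * Hypothetical sketch of the racy pre-patch ordering when disabling AVIC
 * (paraphrased from kvm_vcpu_update_apicv(), not the verbatim code).
 */
static void disable_avic_racy_sketch(struct kvm_vcpu *vcpu)
{
	/* 1. Rescan vIRR: nothing pending yet, so irr_pending becomes false. */
	kvm_apic_update_apicv(vcpu);

	/*
	 * Window: a sender whose AVIC is still enabled can set vIRR bits
	 * right here without any VM exit on either side, so nothing sets
	 * irr_pending back to true before AVIC is turned off below.
	 */

	/* 2. Only now is AVIC actually disabled for this vCPU. */
	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
}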
On 12/10/21 13:47, Maxim Levitsky wrote:
> If we scan vIRR here and see no bits, and *then* disable AVIC, there is
> a window where they could legitimately be turned on without any CPU
> errata, and we will not have irr_pending == true, and thus the
> following KVM_REQ_EVENT will make no difference.

Right.

> Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
> will work too, and if the AVIC errata is present, slightly reduces the
> chances of it happening.

Yeah, I think that's preferable.  irr_pending == true is a conservative
setting that works; irr_pending will be evaluated again on the first
call to apic_clear_irr and that's enough.

With that justification, you don't need to reorder the call to
kvm_apic_update_apicv to be after kvm_x86_refresh_apicv_exec_ctrl.

Paolo
On Fri, 2021-12-10 at 14:03 +0100, Paolo Bonzini wrote:
> On 12/10/21 13:47, Maxim Levitsky wrote:
> > If we scan vIRR here and see no bits, and *then* disable AVIC, there is
> > a window where they could legitimately be turned on without any CPU
> > errata, and we will not have irr_pending == true, and thus the
> > following KVM_REQ_EVENT will make no difference.
>
> Right.
>
> > Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
> > will work too, and if the AVIC errata is present, slightly reduces the
> > chances of it happening.
>
> Yeah, I think that's preferable.  irr_pending == true is a conservative
> setting that works; irr_pending will be evaluated again on the first
> call to apic_clear_irr and that's enough.
>
> With that justification, you don't need to reorder the call to
> kvm_apic_update_apicv to be after kvm_x86_refresh_apicv_exec_ctrl.

Yes, exactly! But there is no need to scan IRR here, since irr_pending
is already true at that point anyway - it is always true while AVIC is
enabled.

Best regards,
	Maxim Levitsky

> Paolo
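Paolo's point that a stale irr_pending == true is only conservative can
be sketched roughly as follows; this is a paraphrase of the non-APICv
path of apic_clear_irr as discussed above, not the verbatim source.

/*
 * Rough paraphrase (not the verbatim kernel code) of the non-APICv path
 * of apic_clear_irr(): a stale irr_pending == true is harmless because
 * it is recomputed from the real IRR contents the next time a vector is
 * consumed.
 */
static void apic_clear_irr_sketch(int vec, struct kvm_lapic *apic)
{
	apic->irr_pending = false;
	kvm_lapic_clear_vector(vec, apic->regs + APIC_IRR);
	/* Stays false only if no other vector is still pending in IRR. */
	if (apic_search_irr(apic) != -1)
		apic->irr_pending = true;
}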
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c5028e6b0f96..ecd6111b9a0d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2314,7 +2314,8 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		apic->irr_pending = true;
 		apic->isr_count = 1;
 	} else {
-		apic->irr_pending = (apic_search_irr(apic) != -1);
+		if (apic_search_irr(apic) != -1)
+			apic->irr_pending = true;
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26cb3a4cd0e9..ca037ac2ea08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9542,8 +9542,8 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 		goto out;
 
 	vcpu->arch.apicv_active = activate;
-	kvm_apic_update_apicv(vcpu);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
+	kvm_apic_update_apicv(vcpu);
 
 	/*
 	 * When APICv gets disabled, we may still have injected interrupts
It is possible that during the AVIC incomplete IPI vmexit, its handler
will set irr_pending to true, but the target vCPU will still see the
IRR bit not set, due to the apparent lack of memory ordering between
the CPU's vIRR write that is supposed to happen prior to the AVIC
incomplete IPI vmexit and the write of irr_pending in that handler.

The AVIC incomplete IPI handler sets this boolean, then issues a write
barrier and then raises KVM_REQ_EVENT, thus when we later process the
KVM_REQ_EVENT we will notice the vIRR bits set.

Also reorder the call to kvm_apic_update_apicv to be after
.refresh_apicv_exec_ctrl, although that doesn't guarantee that it will
see up to date IRR bits.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 3 ++-
 arch/x86/kvm/x86.c   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)