Message ID | 20211009021236.4122790-25-seanjc@google.com |
---|---|
State | Not Applicable |
Series | KVM: Halt-polling and x86 APICv overhaul |
On 09/10/21 04:12, Sean Christopherson wrote:
> Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> pre-block path, as NDST is guaranteed to be up-to-date. The comment
> about the vCPU being preempted during the update is simply wrong, as the
> update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> until after the update completes).

Right, it didn't as of commit bf9f6ac8d74969690df1485b33b7c238ca9f2269 (when
VT-d posted interrupts were introduced).

The interrupt disable/enable pair was added in the same commit that
motivated the introduction of the sanity checks:

commit 8b306e2f3c41939ea528e6174c88cfbfff893ce1
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Tue Jun 6 12:57:05 2017 +0200

    KVM: VMX: avoid double list add with VT-d posted interrupts

    In some cases, for example involving hot-unplug of assigned
    devices, pi_post_block can forget to remove the vCPU from the
    blocked_vcpu_list. When this happens, the next call to
    pi_pre_block corrupts the list.

    Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
    and WARN instead of adding the element twice in the list. Second,
    always do the list removal in pi_post_block if vcpu->pre_pcpu is
    set (not -1).

    The new code keeps interrupts disabled for the whole duration of
    pi_pre_block/pi_post_block. This is not strictly necessary, but
    easier to follow. For the same reason, PI.ON is checked only
    after the cmpxchg, and to handle it we just call the post-block
    code. This removes duplication of the list removal code.

At the time, I didn't notice the now useless NDST update.

Paolo

> The vCPU can get preempted _before_ the update starts, but not during.
> And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> for updating NDST when the vCPU is scheduled back in. In that case, the
> check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> that would require the notification vector to have been set to the wakeup
> vector _before_ blocking.
>
> Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> which presumably used pre_pcpu only for some phantom preemption logic.
On Mon, Oct 25, 2021, Paolo Bonzini wrote:
> On 09/10/21 04:12, Sean Christopherson wrote:
> > Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> > pre-block path, as NDST is guaranteed to be up-to-date. The comment
> > about the vCPU being preempted during the update is simply wrong, as the
> > update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> > until after the update completes).
>
> Right, it didn't as of commit bf9f6ac8d74969690df1485b33b7c238ca9f2269 (when
> VT-d posted interrupts were introduced).
>
> The interrupt disable/enable pair was added in the same commit that
> motivated the introduction of the sanity checks:

Ya, I found that commit when digging around for a different commit in the series
and forgot to come back to this changelog. I'll incorporate this info into the
next version.

> commit 8b306e2f3c41939ea528e6174c88cfbfff893ce1
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Tue Jun 6 12:57:05 2017 +0200
>
>     KVM: VMX: avoid double list add with VT-d posted interrupts
>
>     In some cases, for example involving hot-unplug of assigned
>     devices, pi_post_block can forget to remove the vCPU from the
>     blocked_vcpu_list. When this happens, the next call to
>     pi_pre_block corrupts the list.
>
>     Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
>     and WARN instead of adding the element twice in the list. Second,
>     always do the list removal in pi_post_block if vcpu->pre_pcpu is
>     set (not -1).
>
>     The new code keeps interrupts disabled for the whole duration of
>     pi_pre_block/pi_post_block. This is not strictly necessary, but
>     easier to follow. For the same reason, PI.ON is checked only
>     after the cmpxchg, and to handle it we just call the post-block
>     code. This removes duplication of the list removal code.
>
> At the time, I didn't notice the now useless NDST update.
>
> Paolo
>
> > The vCPU can get preempted _before_ the update starts, but not during.
> > And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> > for updating NDST when the vCPU is scheduled back in. In that case, the
> > check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> > that would require the notification vector to have been set to the wakeup
> > vector _before_ blocking.
> >
> > Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> > which presumably used pre_pcpu only for some phantom preemption logic.
On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> pre-block path, as NDST is guaranteed to be up-to-date. The comment
> about the vCPU being preempted during the update is simply wrong, as the
> update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> until after the update completes).
>
> The vCPU can get preempted _before_ the update starts, but not during.
> And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> for updating NDST when the vCPU is scheduled back in. In that case, the
> check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> that would require the notification vector to have been set to the wakeup
> vector _before_ blocking.
>
> Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> which presumably used pre_pcpu only for some phantom preemption logic.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/posted_intr.c | 23 +++--------------------
>  1 file changed, 3 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 1688f8dc535a..239e0e72a0dd 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -130,7 +130,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>   * - Store the vCPU to the wakeup list, so when interrupts happen
>   *   we can find the right vCPU to wake up.
>   * - Change the Posted-interrupt descriptor as below:
> - *      'NDST' <-- vcpu->pre_pcpu
>   *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
>   * - If 'ON' is set during this process, which means at least one
>   *   interrupt is posted for this vCPU, we cannot block it, in
> @@ -139,7 +138,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>   */
>  int pi_pre_block(struct kvm_vcpu *vcpu)
>  {
> -	unsigned int dest;
>  	struct pi_desc old, new;
>  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>
> @@ -153,10 +151,10 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
>  	local_irq_disable();
>
>  	vcpu->pre_pcpu = vcpu->cpu;
> -	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
>  	list_add_tail(&vcpu->blocked_vcpu_list,
> -		      &per_cpu(blocked_vcpu_on_cpu, vcpu->pre_pcpu));
> -	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> +		      &per_cpu(blocked_vcpu_on_cpu, vcpu->cpu));
> +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
>
>  	WARN(pi_desc->sn == 1,
>  	     "Posted Interrupt Suppress Notification set before blocking");
> @@ -164,21 +162,6 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
>  	do {
>  		old.control = new.control = pi_desc->control;
>
> -		/*
> -		 * Since vCPU can be preempted during this process,
> -		 * vcpu->cpu could be different with pre_pcpu, we
> -		 * need to set pre_pcpu as the destination of wakeup
> -		 * notification event, then we can find the right vCPU
> -		 * to wakeup in wakeup handler if interrupts happen
> -		 * when the vCPU is in blocked state.
> -		 */
> -		dest = cpu_physical_id(vcpu->pre_pcpu);
> -
> -		if (x2apic_mode)
> -			new.ndst = dest;
> -		else
> -			new.ndst = (dest << 8) & 0xFF00;
> -
>  		/* set 'NV' to 'wakeup vector' */
>  		new.nv = POSTED_INTR_WAKEUP_VECTOR;
>  	} while (cmpxchg64(&pi_desc->control, old.control,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
Maxim Levitsky
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 1688f8dc535a..239e0e72a0dd 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -130,7 +130,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
  * - Store the vCPU to the wakeup list, so when interrupts happen
  *   we can find the right vCPU to wake up.
  * - Change the Posted-interrupt descriptor as below:
- *      'NDST' <-- vcpu->pre_pcpu
  *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
  * - If 'ON' is set during this process, which means at least one
  *   interrupt is posted for this vCPU, we cannot block it, in
@@ -139,7 +138,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
  */
 int pi_pre_block(struct kvm_vcpu *vcpu)
 {
-	unsigned int dest;
 	struct pi_desc old, new;
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
@@ -153,10 +151,10 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
 	local_irq_disable();
 
 	vcpu->pre_pcpu = vcpu->cpu;
-	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
 	list_add_tail(&vcpu->blocked_vcpu_list,
-		      &per_cpu(blocked_vcpu_on_cpu, vcpu->pre_pcpu));
-	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
+		      &per_cpu(blocked_vcpu_on_cpu, vcpu->cpu));
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
 
 	WARN(pi_desc->sn == 1,
 	     "Posted Interrupt Suppress Notification set before blocking");
@@ -164,21 +162,6 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
 	do {
 		old.control = new.control = pi_desc->control;
 
-		/*
-		 * Since vCPU can be preempted during this process,
-		 * vcpu->cpu could be different with pre_pcpu, we
-		 * need to set pre_pcpu as the destination of wakeup
-		 * notification event, then we can find the right vCPU
-		 * to wakeup in wakeup handler if interrupts happen
-		 * when the vCPU is in blocked state.
-		 */
-		dest = cpu_physical_id(vcpu->pre_pcpu);
-
-		if (x2apic_mode)
-			new.ndst = dest;
-		else
-			new.ndst = (dest << 8) & 0xFF00;
-
 		/* set 'NV' to 'wakeup vector' */
 		new.nv = POSTED_INTR_WAKEUP_VECTOR;
 	} while (cmpxchg64(&pi_desc->control, old.control,
Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
pre-block path, as NDST is guaranteed to be up-to-date. The comment
about the vCPU being preempted during the update is simply wrong, as the
update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
until after the update completes).

The vCPU can get preempted _before_ the update starts, but not during.
And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
for updating NDST when the vCPU is scheduled back in. In that case, the
check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
that would require the notification vector to have been set to the wakeup
vector _before_ blocking.

Opportunistically switch to using vcpu->cpu for the list/lock lookups,
which presumably used pre_pcpu only for some phantom preemption logic.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/posted_intr.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)