Message ID | 20191118101600.94645-1-roger.pau@citrix.com (mailing list archive)
---|---
State | Superseded
Series | [for-4.13] x86/vmx: always sync PIR to IRR before vmentry
On 18.11.2019 11:16, Roger Pau Monne wrote:
> When using posted interrupts on Intel hardware it's possible that the
> vCPU resumes execution with a stale local APIC IRR register because
> depending on the interrupts to be injected vlapic_has_pending_irq
> might not be called, and thus PIR won't be synced into IRR.
>
> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> regardless of what interrupts are pending.

For this part, did you consider pulling ahead to the beginning of
hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()? I ask
because ...

> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
>      enum hvm_intblk intblk;
>      int pt_vector;
>
> +    if ( cpu_has_vmx_posted_intr_processing )
> +        /*
> +         * Always force PIR to be synced to IRR before vmentry, this is also
> +         * done by vlapic_has_pending_irq but it's possible other pending
> +         * interrupts prevent the execution of that function.
> +         */
> +        vmx_sync_pir_to_irr(v);

... this addition looks more like papering over some issue than
actually taking care of it.

Then again I wonder whether the PIR->IRR sync is actually legitimate
to perform when v != current. If it's not, then there might be a wider
set of problems (see e.g. hvm_local_events_need_delivery()). But of
course the adjustment to hvm_vcpu_has_pending_irq() could also be to
make the call early only when v == current.

A last question is on the consequences of overly aggressive sync-ing -
that'll harm performance, but shouldn't affect correctness if I'm not
mistaken.

Jan
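[ For reference, a rough sketch of the reordering suggested above. This
is hypothetical and only assumes the general upstream shape of
hvm_vcpu_has_pending_irq() (NMI/MCE/callback/PIC checks ahead of the
vlapic query); exact field names may differ by version. The injection
priority is kept intact - only the vlapic query, and with it the
PIR-to-IRR sync, moves to the top: ]

struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
{
    struct hvm_domain *plat = &v->domain->arch.hvm;
    /*
     * Hoisted: query the vlapic unconditionally so that
     * vlapic_has_pending_irq() always runs (and hence always syncs
     * PIR into IRR), even when a higher priority event below ends up
     * being delivered first.  A later call would return the same
     * vector, so discarding the result here is harmless.
     */
    int vector = vlapic_has_pending_irq(v);

    if ( unlikely(v->arch.nmi_pending) )
        return hvm_intack_nmi;

    if ( unlikely(v->arch.mce_pending) )
        return hvm_intack_mce;

    if ( (plat->irq->callback_via_type == HVMIRQ_callback_vector) &&
         vcpu_info(v, evtchn_upcall_pending) )
        return hvm_intack_vector(plat->irq->callback_via.vector);

    if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
        return hvm_intack_pic(0);

    if ( vector != -1 )
        return hvm_intack_lapic(vector);

    return hvm_intack_none;
}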
On 18.11.2019 11:16, Roger Pau Monne wrote:
> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>       * context.
>       */
> -    if ( running && (in_irq() || (v != current)) )
> -    {
> +    if ( vcpu_runnable(v) && v != current )

I'm afraid you need to be more careful with the running vs runnable
distinction here. The comment above here becomes stale with the change
(also wrt the removal of in_irq(), which I'm at least uneasy about),
and the new commentary below also largely says/assumes "running", not
"runnable".

In general I think "runnable" is the more appropriate state to check
for, as "running" is even more likely to change behind our backs. But
of course there are caveats: When observing "running", v->processor is
sufficiently certain to hold the pCPU the vCPU is running on or has
been running on last. For "runnable" that's less helpful, because by
the time you look at v->processor it may already have changed to
whichever the vCPU is (about to be) running on.

>          /*
> -         * Note: Only two cases will reach here:
> -         * 1. The target vCPU is running on other pCPU.
> -         * 2. The target vCPU is the current vCPU.
> +         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
> +         * the IPI arrives, the target vCPU may be running in non-root mode,
> +         * running in root mode, runnable or blocked. If the target vCPU is
> +         * running in non-root mode, the hardware will sync PIR to vIRR for
> +         * 'posted_intr_vector' is a special vector handled directly by the
> +         * hardware.
>           *
> -         * Note2: Don't worry the v->processor may change. The vCPU being
> -         * moved to another processor is guaranteed to sync PIR to vIRR,
> -         * due to the involved scheduling cycle.
> +         * If the target vCPU is running in root-mode, the interrupt handler
> +         * starts to run. Considering an IPI may arrive in the window between
> +         * the call to vmx_intr_assist() and interrupts getting disabled, the
> +         * interrupt handler should raise a softirq to ensure events will be
> +         * delivered in time.
>           */
> -        unsigned int cpu = v->processor;
> +        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
>
> -        /*
> -         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
> -         * target vCPU maybe is running in non-root mode, running in root
> -         * mode, runnable or blocked. If the target vCPU is running in
> -         * non-root mode, the hardware will sync PIR to vIRR for
> -         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
> -         * running in root-mode, the interrupt handler starts to run.
> -         * Considering an IPI may arrive in the window between the call to
> -         * vmx_intr_assist() and interrupts getting disabled, the interrupt
> -         * handler should raise a softirq to ensure events will be delivered
> -         * in time. If the target vCPU is runnable, it will sync PIR to
> -         * vIRR next time it is chose to run. In this case, a IPI and a
> -         * softirq is sent to a wrong vCPU which will not have any adverse
> -         * effect. If the target vCPU is blocked, since vcpu_block() checks
> -         * whether there is an event to be delivered through
> -         * local_events_need_delivery() just after blocking, the vCPU must
> -         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
> -         * sent to a wrong vCPU.
> -         */
> -        if ( cpu != smp_processor_id() )
> -            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> -        /*
> -         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
> -         * As any softirq will do, as an optimization we only raise one if
> -         * none is pending already.
> -         */
> -        else if ( !softirq_pending(cpu) )
> -            raise_softirq(VCPU_KICK_SOFTIRQ);
> -    }
> +    /*
> +     * If the vCPU is not runnable or if it's the one currently running in this
> +     * pCPU there's nothing to do, the vmentry code will already sync the PIR
> +     * to IRR when resuming execution.
> +     */
>  }

Just for my own understanding - the "already" here relates to the code
addition you make to vmx_intr_assist()?

And then - is this true even for an interrupt hitting between
vmx_intr_assist() returning and the subsequent CLI in
vmx_asm_vmexit_handler()?

Jan
On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > When using posted interrupts on Intel hardware it's possible that the
> > vCPU resumes execution with a stale local APIC IRR register because
> > depending on the interrupts to be injected vlapic_has_pending_irq
> > might not be called, and thus PIR won't be synced into IRR.
> >
> > Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > regardless of what interrupts are pending.
>
> For this part, did you consider pulling ahead to the beginning
> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?

I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
I could indeed move vlapic_has_pending_irq to the top, but then either
the result is discarded if for example an NMI is pending injection (in
which case there's no need to go through all the logic in
vlapic_has_pending_irq), or we invert the priority of event injection.

I have to admit I have doubts about the code in
hvm_vcpu_has_pending_irq. I'm not sure, for example, what the
motivation is for giving priority to HVMIRQ_callback_vector over other
vectors from the lapic.

> I ask because ...
>
> > --- a/xen/arch/x86/hvm/vmx/intr.c
> > +++ b/xen/arch/x86/hvm/vmx/intr.c
> > @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
> >      enum hvm_intblk intblk;
> >      int pt_vector;
> >
> > +    if ( cpu_has_vmx_posted_intr_processing )
> > +        /*
> > +         * Always force PIR to be synced to IRR before vmentry, this is also
> > +         * done by vlapic_has_pending_irq but it's possible other pending
> > +         * interrupts prevent the execution of that function.
> > +         */
> > +        vmx_sync_pir_to_irr(v);
>
> ... this addition looks more like papering over some issue than
> actually taking care of it.

Xen needs to make sure PIR is synced to IRR before entering non-root
mode. I could place the call somewhere else, or alternatively Xen could
disable interrupts, send a self-IPI with the posted vector and enter
non-root mode. That should IMO force a resync of PIR to IRR when
resuming vCPU execution, but is overly complicated.

> Then again I wonder whether the PIR->IRR sync is actually
> legitimate to perform when v != current.

IMO this is fine as long as the vCPU is not running, as in that case
the hardware is not in control of IRR.

> If it's not, then there
> might be a wider set of problems (see e.g.
> hvm_local_events_need_delivery()). But of course the adjustment
> to hvm_vcpu_has_pending_irq() could also be to make the call
> early only when v == current.

I don't think we should be that restrictive, v == current ||
!vcpu_runnable(v) ought to be safe. I also forgot to send my pre-patch
to introduce an assert to that effect:

https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html

> A last question is on the consequences of overly aggressive
> sync-ing - that'll harm performance, but shouldn't affect
> correctness if I'm not mistaken.

That's correct, as long as the vcpu is the current one or it's not
running.

Roger.
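[ The pre-patch Roger links to adds an assertion along the lines of the
condition stated above. A sketch of the idea (not the linked patch's
exact text; the elided merge loop stands for the existing body): ]

void vmx_sync_pir_to_irr(struct vcpu *v)
{
    /*
     * Merging PIR into IRR is only race free if the hardware cannot
     * concurrently own the vCPU's IRR, i.e. if the vCPU is either the
     * current one or guaranteed not to be running.
     */
    ASSERT(v == current || !vcpu_runnable(v));

    /* ... existing PIR -> IRR merge loop ... */
}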
On 18.11.2019 14:46, Roger Pau Monné wrote:
> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>> When using posted interrupts on Intel hardware it's possible that the
>>> vCPU resumes execution with a stale local APIC IRR register because
>>> depending on the interrupts to be injected vlapic_has_pending_irq
>>> might not be called, and thus PIR won't be synced into IRR.
>>>
>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
>>> regardless of what interrupts are pending.
>>
>> For this part, did you consider pulling ahead to the beginning
>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
>
> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> I could indeed move vlapic_has_pending_irq to the top, but then either
> the result is discarded if for example an NMI is pending injection (in
> which case there's no need to go through all the logic in
> vlapic_has_pending_irq), or we invert the priority of event injection.

Changing the order of events injected is not an option afaict. The
pointless processing done is a valid concern, yet the suggestion was
specifically to have (part of) this processing occur early. The
discarding of the result, in turn, is not a problem afaict, as a
subsequent call will return the same result (unless a higher priority
interrupt has surfaced in the meantime).

> I have to admit I have doubts about the code in
> hvm_vcpu_has_pending_irq. I'm not sure, for example, what the
> motivation is for giving priority to HVMIRQ_callback_vector over other
> vectors from the lapic.

I vaguely recall there being a reason, but I guess it would take some
git archaeology to find out.

>> I ask because ...
>>
>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
>>>      enum hvm_intblk intblk;
>>>      int pt_vector;
>>>
>>> +    if ( cpu_has_vmx_posted_intr_processing )
>>> +        /*
>>> +         * Always force PIR to be synced to IRR before vmentry, this is also
>>> +         * done by vlapic_has_pending_irq but it's possible other pending
>>> +         * interrupts prevent the execution of that function.
>>> +         */
>>> +        vmx_sync_pir_to_irr(v);
>>
>> ... this addition looks more like papering over some issue than
>> actually taking care of it.
>
> Xen needs to make sure PIR is synced to IRR before entering non-root
> mode. I could place the call somewhere else, or alternatively Xen could
> disable interrupts, send a self-IPI with the posted vector and enter
> non-root mode. That should IMO force a resync of PIR to IRR when
> resuming vCPU execution, but is overly complicated.

Indeed, further complicating things can't be the goal. But finding the
most suitable place to make the call might still be worthwhile.

>> Then again I wonder whether the PIR->IRR sync is actually
>> legitimate to perform when v != current.
>
> IMO this is fine as long as the vCPU is not running, as in that case
> the hardware is not in control of IRR.

Here and ...

>> If it's not, then there
>> might be a wider set of problems (see e.g.
>> hvm_local_events_need_delivery()). But of course the adjustment
>> to hvm_vcpu_has_pending_irq() could also be to make the call
>> early only when v == current.
>
> I don't think we should be that restrictive, v == current ||
> !vcpu_runnable(v) ought to be safe. I also forgot to send my pre-patch
> to introduce an assert to that effect:
>
> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
>
>> A last question is on the consequences of overly aggressive
>> sync-ing - that'll harm performance, but shouldn't affect
>> correctness if I'm not mistaken.
>
> That's correct, as long as the vcpu is the current one or it's not
> running.

... here I continue to be worried about races: Any check for a vCPU to
be non-running (or non-runnable) is stale the moment you inspect the
result of the check. Unless, of course, you suppress scheduling
(actions potentially making a vCPU runnable) during that time window.

Jan
On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> >       * context.
> >       */
> > -    if ( running && (in_irq() || (v != current)) )
> > -    {
> > +    if ( vcpu_runnable(v) && v != current )
>
> I'm afraid you need to be more careful with the running vs runnable
> distinction here. The comment above here becomes stale with the
> change (also wrt the removal of in_irq(), which I'm at least uneasy
> about), and the new commentary below also largely says/assumes
> "running", not "runnable".

I missed fixing that comment; will take care of it in the next
version. Note also that the comment is quite pointless, it only states
what the code below is supposed to do, but doesn't give any reasoning
as to why in_irq is relevant here.

TBH I'm not sure of the point of the in_irq check, I don't think it's
relevant for the code here. It only matters whether the vCPU is
running, and whether it's running on this pCPU. Whether
__vmx_deliver_posted_interrupt is called from an irq context is
irrelevant AFAICT.

> In general I think "runnable" is the more appropriate state to
> check for, as "running" is even more likely to change behind our
> backs. But of course there are caveats: When observing "running",
> v->processor is sufficiently certain to hold the pCPU the vCPU is
> running on or has been running on last. For "runnable" that's
> less helpful, because by the time you look at v->processor it may
> already have changed to whichever the vCPU is (about to be)
> running on.

I think this is fine. In fact it would also be fine from a safety PoV
to always send a posted interrupt IPI to the pCPU in v->processor: if
the vCPU is running PIR will be synced into IRR, otherwise a dummy
softirq will be recorded and PIR will be synced into IRR before
entering non-root mode.

v->is_running might be better in order to prevent sending pointless
IPIs around, since we would only like to send the IPI when the vCPU is
executing on a pCPU. Will fix in v2.

> >          /*
> > -         * Note: Only two cases will reach here:
> > -         * 1. The target vCPU is running on other pCPU.
> > -         * 2. The target vCPU is the current vCPU.
> > +         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
> > +         * the IPI arrives, the target vCPU may be running in non-root mode,
> > +         * running in root mode, runnable or blocked. If the target vCPU is
> > +         * running in non-root mode, the hardware will sync PIR to vIRR for
> > +         * 'posted_intr_vector' is a special vector handled directly by the
> > +         * hardware.
> >           *
> > -         * Note2: Don't worry the v->processor may change. The vCPU being
> > -         * moved to another processor is guaranteed to sync PIR to vIRR,
> > -         * due to the involved scheduling cycle.
> > +         * If the target vCPU is running in root-mode, the interrupt handler
> > +         * starts to run. Considering an IPI may arrive in the window between
> > +         * the call to vmx_intr_assist() and interrupts getting disabled, the
> > +         * interrupt handler should raise a softirq to ensure events will be
> > +         * delivered in time.
> >           */
> > -        unsigned int cpu = v->processor;
> > +        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> >
> > -        /*
> > -         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
> > -         * target vCPU maybe is running in non-root mode, running in root
> > -         * mode, runnable or blocked. If the target vCPU is running in
> > -         * non-root mode, the hardware will sync PIR to vIRR for
> > -         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
> > -         * running in root-mode, the interrupt handler starts to run.
> > -         * Considering an IPI may arrive in the window between the call to
> > -         * vmx_intr_assist() and interrupts getting disabled, the interrupt
> > -         * handler should raise a softirq to ensure events will be delivered
> > -         * in time. If the target vCPU is runnable, it will sync PIR to
> > -         * vIRR next time it is chose to run. In this case, a IPI and a
> > -         * softirq is sent to a wrong vCPU which will not have any adverse
> > -         * effect. If the target vCPU is blocked, since vcpu_block() checks
> > -         * whether there is an event to be delivered through
> > -         * local_events_need_delivery() just after blocking, the vCPU must
> > -         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
> > -         * sent to a wrong vCPU.
> > -         */
> > -        if ( cpu != smp_processor_id() )
> > -            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> > -        /*
> > -         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
> > -         * As any softirq will do, as an optimization we only raise one if
> > -         * none is pending already.
> > -         */
> > -        else if ( !softirq_pending(cpu) )
> > -            raise_softirq(VCPU_KICK_SOFTIRQ);
> > -    }
> > +    /*
> > +     * If the vCPU is not runnable or if it's the one currently running in this
> > +     * pCPU there's nothing to do, the vmentry code will already sync the PIR
> > +     * to IRR when resuming execution.
> > +     */
> >  }
>
> Just for my own understanding - the "already" here relates to the code
> addition you make to vmx_intr_assist()?

Yes.

> And then - is this true even for an interrupt hitting between
> vmx_intr_assist() returning and the subsequent CLI in
> vmx_asm_vmexit_handler()?

Should be, as that IPI will raise a softirq that will force a jump into
vmx_process_softirqs, which in turn will jump to the start of
vmx_do_vmentry, thus calling vmx_intr_assist again.

Thanks, Roger.
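[ The retry mechanism Roger describes can be pictured as follows. This
is a simplified, illustrative C rendering only; the real path is
assembly in vmx_do_vmentry/vmx_asm_vmexit_handler, and the function
name below is hypothetical: ]

/*
 * Outline of the vmentry loop: a softirq raised by the posted
 * interrupt IPI (or by vcpu_kick()) after vmx_intr_assist() has run
 * is noticed with interrupts disabled, and forces vmx_intr_assist()
 * to be retried before the guest is actually entered.
 */
static void vmentry_outline(void)
{
    for ( ;; )
    {
        vmx_intr_assist();        /* inject events; sync PIR into IRR */

        local_irq_disable();

        if ( !softirq_pending(smp_processor_id()) )
            break;                /* clear to proceed to VMRESUME/VMLAUNCH */

        local_irq_enable();
        do_softirq();             /* vmx_process_softirqs path */
        /* ... and retry vmx_intr_assist() from the top. */
    }

    /* VMRESUME/VMLAUNCH happens here, with interrupts still disabled. */
}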
On 18.11.2019 15:03, Roger Pau Monné wrote:
> On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>>>       * context.
>>>       */
>>> -    if ( running && (in_irq() || (v != current)) )
>>> -    {
>>> +    if ( vcpu_runnable(v) && v != current )
>>
>> I'm afraid you need to be more careful with the running vs runnable
>> distinction here. The comment above here becomes stale with the
>> change (also wrt the removal of in_irq(), which I'm at least uneasy
>> about), and the new commentary below also largely says/assumes
>> "running", not "runnable".
>
> I missed fixing that comment; will take care of it in the next
> version. Note also that the comment is quite pointless, it only states
> what the code below is supposed to do, but doesn't give any reasoning
> as to why in_irq is relevant here.

Its main "value" is to refer to vcpu_kick(), which has ...

> TBH I'm not sure of the point of the in_irq check, I don't think it's
> relevant for the code here.

... a similar in_irq() check. Sadly that one, while having a bigger
comment, also doesn't explain what it's needed for. It looks like I
should recall the reason, but I'm sorry - I don't right now.

Jan
On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> On 18.11.2019 14:46, Roger Pau Monné wrote:
> > On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>> When using posted interrupts on Intel hardware it's possible that the
> >>> vCPU resumes execution with a stale local APIC IRR register because
> >>> depending on the interrupts to be injected vlapic_has_pending_irq
> >>> might not be called, and thus PIR won't be synced into IRR.
> >>>
> >>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> >>> regardless of what interrupts are pending.
> >>
> >> For this part, did you consider pulling ahead to the beginning
> >> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> >
> > I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> > I could indeed move vlapic_has_pending_irq to the top, but then either
> > the result is discarded if for example an NMI is pending injection (in
> > which case there's no need to go through all the logic in
> > vlapic_has_pending_irq), or we invert the priority of event injection.
>
> Changing the order of events injected is not an option afaict. The
> pointless processing done is a valid concern, yet the suggestion was
> specifically to have (part of) this processing occur early. The
> discarding of the result, in turn, is not a problem afaict, as a
> subsequent call will return the same result (unless a higher priority
> interrupt has surfaced in the meantime).

Yes, that's fine. So you would prefer to move the call to
vlapic_has_pending_irq before any exit path in
hvm_vcpu_has_pending_irq?

> >> Then again I wonder whether the PIR->IRR sync is actually
> >> legitimate to perform when v != current.
> >
> > IMO this is fine as long as the vCPU is not running, as in that case
> > the hardware is not in control of IRR.
>
> Here and ...
>
> >> If it's not, then there
> >> might be a wider set of problems (see e.g.
> >> hvm_local_events_need_delivery()). But of course the adjustment
> >> to hvm_vcpu_has_pending_irq() could also be to make the call
> >> early only when v == current.
> >
> > I don't think we should be that restrictive, v == current ||
> > !vcpu_runnable(v) ought to be safe. I also forgot to send my pre-patch
> > to introduce an assert to that effect:
> >
> > https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
> >
> >> A last question is on the consequences of overly aggressive
> >> sync-ing - that'll harm performance, but shouldn't affect
> >> correctness if I'm not mistaken.
> >
> > That's correct, as long as the vcpu is the current one or it's not
> > running.
>
> ... here I continue to be worried about races: Any check for a vCPU to
> be non-running (or non-runnable) is stale the moment you inspect the
> result of the check. Unless, of course, you suppress scheduling
> (actions potentially making a vCPU runnable) during that time window.

Hm, it's indeed true that syncing PIR into IRR for a vCPU not running
on the current pCPU is troublesome. Maybe syncing PIR into IRR should
only be done when v == current?

The only alternative I can think of is something like:

if ( v != current )
    vcpu_pause(v);
sync_pir_irr(v);
if ( v != current )
    vcpu_unpause(v);

Is there a need to check the IRR of vCPUs that are not running, and
does it matter if the IRR is not fully updated in that case?

Thanks, Roger.
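[ Spelled out as a helper, the pause-based alternative above might look
like this. A sketch only: the helper name is hypothetical, and
vmx_sync_pir_to_irr stands in for the sync_pir_irr placeholder used in
the mail: ]

/*
 * Hypothetical helper: vcpu_pause() forces the vCPU off its pCPU and
 * through a full scheduling cycle, after which the hardware no longer
 * owns the vCPU's IRR, closing the race with a concurrent vmentry.
 */
static void sync_pir_to_irr_safe(struct vcpu *v)
{
    bool remote = v != current;

    if ( remote )
        vcpu_pause(v);

    vmx_sync_pir_to_irr(v);

    if ( remote )
        vcpu_unpause(v);
}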
On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> On 18.11.2019 15:03, Roger Pau Monné wrote:
> > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> >>>       * context.
> >>>       */
> >>> -    if ( running && (in_irq() || (v != current)) )
> >>> -    {
> >>> +    if ( vcpu_runnable(v) && v != current )
> >>
> >> I'm afraid you need to be more careful with the running vs runnable
> >> distinction here. The comment above here becomes stale with the
> >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> >> about), and the new commentary below also largely says/assumes
> >> "running", not "runnable".
> >
> > I missed fixing that comment; will take care of it in the next
> > version. Note also that the comment is quite pointless, it only states
> > what the code below is supposed to do, but doesn't give any reasoning
> > as to why in_irq is relevant here.
>
> Its main "value" is to refer to vcpu_kick(), which has ...
>
> > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > relevant for the code here.
>
> ... a similar in_irq() check. Sadly that one, while having a bigger
> comment, also doesn't explain what it's needed for. It looks like I
> should recall the reason, but I'm sorry - I don't right now.

Reading the message of the commit that introduced the in_irq check in
vcpu_kick:

"The drawback is that {vmx,svm}_intr_assist() now races new event
notifications delivered by IRQ or IPI. We close down this race by
having vcpu_kick() send a dummy softirq -- this gets picked up in
IRQ-safe context and will cause retry of *_intr_assist(). We avoid
delivering the softirq where possible by avoiding it when we are
running in the non-IRQ context of the VCPU to be kicked."

AFAICT in the vcpu_kick case this is done because the softirq should
only be raised when in IRQ context in order to trigger the code in
vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
if vcpu_kick is issued from an irq handler executed after
vmx_intr_assist and before the disabling of interrupts in
vmx_do_vmentry).

I think we need something along the lines of:

if ( v->is_running && v != current )
    send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
    raise_softirq(VCPU_KICK_SOFTIRQ);

So that vmx_intr_assist is also retried if a vector is signaled in PIR
on the vCPU currently running, between the call to vmx_intr_assist and
the disabling of interrupts in vmx_do_vmentry.

Thanks, Roger.
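[ Put into the function body, the proposal above might read as follows.
This is an illustrative sketch only, not the posted patch: ]

static void __vmx_deliver_posted_interrupt(struct vcpu *v)
{
    vcpu_unblock(v);

    if ( v->is_running && v != current )
        /*
         * Running on another pCPU: the IPI either makes the hardware
         * sync PIR into vIRR (non-root mode) or runs the interrupt
         * handler (root mode), which raises the retry softirq.
         */
        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
    else if ( v == current && in_irq() &&
              !softirq_pending(smp_processor_id()) )
        /*
         * Interrupted the vCPU we are about to resume: raise a softirq
         * so vmx_intr_assist() is retried before the vmentry.
         */
        raise_softirq(VCPU_KICK_SOFTIRQ);
    /*
     * Otherwise the vCPU is not running and PIR will be synced into
     * IRR by vmx_intr_assist() before it is resumed.
     */
}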
On 18.11.2019 15:20, Roger Pau Monné wrote:
> On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
>> On 18.11.2019 14:46, Roger Pau Monné wrote:
>>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
>>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>>>> When using posted interrupts on Intel hardware it's possible that the
>>>>> vCPU resumes execution with a stale local APIC IRR register because
>>>>> depending on the interrupts to be injected vlapic_has_pending_irq
>>>>> might not be called, and thus PIR won't be synced into IRR.
>>>>>
>>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
>>>>> regardless of what interrupts are pending.
>>>>
>>>> For this part, did you consider pulling ahead to the beginning
>>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
>>>
>>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
>>> I could indeed move vlapic_has_pending_irq to the top, but then either
>>> the result is discarded if for example an NMI is pending injection (in
>>> which case there's no need to go through all the logic in
>>> vlapic_has_pending_irq), or we invert the priority of event injection.
>>
>> Changing the order of events injected is not an option afaict. The
>> pointless processing done is a valid concern, yet the suggestion was
>> specifically to have (part of) this processing occur early. The
>> discarding of the result, in turn, is not a problem afaict, as a
>> subsequent call will return the same result (unless a higher priority
>> interrupt has surfaced in the meantime).
>
> Yes, that's fine. So you would prefer to move the call to
> vlapic_has_pending_irq before any exit path in
> hvm_vcpu_has_pending_irq?

"Prefer" isn't really the way I would put it. I'd like this to be
considered as an alternative because, as said, I think the current
placement looks more like a plaster than a cure. I'm also open to
other suggestions. But first of all I'd like to see what the VMX
maintainers think.

>>>> Then again I wonder whether the PIR->IRR sync is actually
>>>> legitimate to perform when v != current.
>>>
>>> IMO this is fine as long as the vCPU is not running, as in that case
>>> the hardware is not in control of IRR.
>>
>> Here and ...
>>
>>>> If it's not, then there
>>>> might be a wider set of problems (see e.g.
>>>> hvm_local_events_need_delivery()). But of course the adjustment
>>>> to hvm_vcpu_has_pending_irq() could also be to make the call
>>>> early only when v == current.
>>>
>>> I don't think we should be that restrictive, v == current ||
>>> !vcpu_runnable(v) ought to be safe. I also forgot to send my pre-patch
>>> to introduce an assert to that effect:
>>>
>>> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
>>>
>>>> A last question is on the consequences of overly aggressive
>>>> sync-ing - that'll harm performance, but shouldn't affect
>>>> correctness if I'm not mistaken.
>>>
>>> That's correct, as long as the vcpu is the current one or it's not
>>> running.
>>
>> ... here I continue to be worried about races: Any check for a vCPU to
>> be non-running (or non-runnable) is stale the moment you inspect the
>> result of the check. Unless, of course, you suppress scheduling
>> (actions potentially making a vCPU runnable) during that time window.
>
> Hm, it's indeed true that syncing PIR into IRR for a vCPU not running
> on the current pCPU is troublesome. Maybe syncing PIR into IRR should
> only be done when v == current?
>
> The only alternative I can think of is something like:
>
> if ( v != current )
>     vcpu_pause(v);
> sync_pir_irr(v);
> if ( v != current )
>     vcpu_unpause(v);
>
> Is there a need to check the IRR of vCPUs that are not running, and
> does it matter if the IRR is not fully updated in that case?

That's one of the problems with function parameters described as
"struct vcpu *v" - you don't know whether there's any expectation that
v == current at all times. And tracking down all uses of certain
functions can be rather tedious. IOW without quite a bit of code
auditing I'm afraid I can't tell whether there's possibly some corner
case where this might be needed.

Jan
On 18.11.2019 15:55, Roger Pau Monné wrote:
> On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
>> On 18.11.2019 15:03, Roger Pau Monné wrote:
>>> On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
>>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>>>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>>>>>       * context.
>>>>>       */
>>>>> -    if ( running && (in_irq() || (v != current)) )
>>>>> -    {
>>>>> +    if ( vcpu_runnable(v) && v != current )
>>>>
>>>> I'm afraid you need to be more careful with the running vs runnable
>>>> distinction here. The comment above here becomes stale with the
>>>> change (also wrt the removal of in_irq(), which I'm at least uneasy
>>>> about), and the new commentary below also largely says/assumes
>>>> "running", not "runnable".
>>>
>>> I missed fixing that comment; will take care of it in the next
>>> version. Note also that the comment is quite pointless, it only states
>>> what the code below is supposed to do, but doesn't give any reasoning
>>> as to why in_irq is relevant here.
>>
>> Its main "value" is to refer to vcpu_kick(), which has ...
>>
>>> TBH I'm not sure of the point of the in_irq check, I don't think it's
>>> relevant for the code here.
>>
>> ... a similar in_irq() check. Sadly that one, while having a bigger
>> comment, also doesn't explain what it's needed for. It looks like I
>> should recall the reason, but I'm sorry - I don't right now.
>
> Reading the message of the commit that introduced the in_irq check in
> vcpu_kick:
>
> "The drawback is that {vmx,svm}_intr_assist() now races new event
> notifications delivered by IRQ or IPI. We close down this race by
> having vcpu_kick() send a dummy softirq -- this gets picked up in
> IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> delivering the softirq where possible by avoiding it when we are
> running in the non-IRQ context of the VCPU to be kicked."
>
> AFAICT in the vcpu_kick case this is done because the softirq should
> only be raised when in IRQ context in order to trigger the code in
> vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> if vcpu_kick is issued from an irq handler executed after
> vmx_intr_assist and before the disabling of interrupts in
> vmx_do_vmentry).
>
> I think we need something along the lines of:
>
> if ( v->is_running && v != current )
>     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
>     raise_softirq(VCPU_KICK_SOFTIRQ);
>
> So that vmx_intr_assist is also retried if a vector is signaled in PIR
> on the vCPU currently running, between the call to vmx_intr_assist and
> the disabling of interrupts in vmx_do_vmentry.

Looks plausible.

Jan
On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> On 18.11.2019 15:20, Roger Pau Monné wrote:
> > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 14:46, Roger Pau Monné wrote:
> >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>>>> When using posted interrupts on Intel hardware it's possible that the
> >>>>> vCPU resumes execution with a stale local APIC IRR register because
> >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> >>>>> might not be called, and thus PIR won't be synced into IRR.
> >>>>>
> >>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> >>>>> regardless of what interrupts are pending.
> >>>>
> >>>> For this part, did you consider pulling ahead to the beginning
> >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> >>>
> >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> >>> the result is discarded if for example an NMI is pending injection (in
> >>> which case there's no need to go through all the logic in
> >>> vlapic_has_pending_irq), or we invert the priority of event injection.
> >>
> >> Changing the order of events injected is not an option afaict. The
> >> pointless processing done is a valid concern, yet the suggestion was
> >> specifically to have (part of) this processing occur early. The
> >> discarding of the result, in turn, is not a problem afaict, as a
> >> subsequent call will return the same result (unless a higher priority
> >> interrupt has surfaced in the meantime).
> >
> > Yes, that's fine. So you would prefer to move the call to
> > vlapic_has_pending_irq before any exit path in
> > hvm_vcpu_has_pending_irq?
>
> "Prefer" isn't really the way I would put it. I'd like this to be
> considered as an alternative because, as said, I think the current
> placement looks more like a plaster than a cure. I'm also open to
> other suggestions. But first of all I'd like to see what the VMX
> maintainers think.

Kevin/Jun, can we please get your opinion on the above item?

Thanks, Roger.
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: Thursday, November 21, 2019 5:26 PM
>
> On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:20, Roger Pau Monné wrote:
> > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 14:46, Roger Pau Monné wrote:
> > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> > >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>>>> When using posted interrupts on Intel hardware it's possible that the
> > >>>>> vCPU resumes execution with a stale local APIC IRR register because
> > >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> > >>>>> might not be called, and thus PIR won't be synced into IRR.
> > >>>>>
> > >>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > >>>>> regardless of what interrupts are pending.
> > >>>>
> > >>>> For this part, did you consider pulling ahead to the beginning
> > >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > >>>
> > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> > >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> > >>> the result is discarded if for example an NMI is pending injection (in
> > >>> which case there's no need to go through all the logic in
> > >>> vlapic_has_pending_irq), or we invert the priority of event injection.
> > >>
> > >> Changing the order of events injected is not an option afaict. The
> > >> pointless processing done is a valid concern, yet the suggestion was
> > >> specifically to have (part of) this processing occur early. The
> > >> discarding of the result, in turn, is not a problem afaict, as a
> > >> subsequent call will return the same result (unless a higher priority
> > >> interrupt has surfaced in the meantime).
> > >
> > > Yes, that's fine. So you would prefer to move the call to
> > > vlapic_has_pending_irq before any exit path in
> > > hvm_vcpu_has_pending_irq?
> >
> > "Prefer" isn't really the way I would put it. I'd like this to be
> > considered as an alternative because, as said, I think the current
> > placement looks more like a plaster than a cure. I'm also open to
> > other suggestions. But first of all I'd like to see what the VMX
> > maintainers think.
>
> Kevin/Jun, can we please get your opinion on the above item?

Give me some time to catch up...

Thanks
Kevin
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Thursday, November 21, 2019 5:26 PM
>
> On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:20, Roger Pau Monné wrote:
> > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 14:46, Roger Pau Monné wrote:
> > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> > >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>>>> When using posted interrupts on Intel hardware it's possible that the
> > >>>>> vCPU resumes execution with a stale local APIC IRR register because
> > >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> > >>>>> might not be called, and thus PIR won't be synced into IRR.
> > >>>>>
> > >>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > >>>>> regardless of what interrupts are pending.
> > >>>>
> > >>>> For this part, did you consider pulling ahead to the beginning
> > >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > >>>
> > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> > >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> > >>> the result is discarded if for example an NMI is pending injection (in
> > >>> which case there's no need to go through all the logic in
> > >>> vlapic_has_pending_irq), or we invert the priority of event injection.
> > >>
> > >> Changing the order of events injected is not an option afaict. The
> > >> pointless processing done is a valid concern, yet the suggestion was
> > >> specifically to have (part of) this processing occur early. The
> > >> discarding of the result, in turn, is not a problem afaict, as a
> > >> subsequent call will return the same result (unless a higher priority
> > >> interrupt has surfaced in the meantime).
> > >
> > > Yes, that's fine. So you would prefer to move the call to
> > > vlapic_has_pending_irq before any exit path in
> > > hvm_vcpu_has_pending_irq?
> >
> > "Prefer" isn't really the way I would put it. I'd like this to be
> > considered as an alternative because, as said, I think the current
> > placement looks more like a plaster than a cure. I'm also open to
> > other suggestions. But first of all I'd like to see what the VMX
> > maintainers think.
>
> Kevin/Jun, can we please get your opinion on the above item?

Putting the sync within hvm_vcpu_has_pending_irq sounds better,
implying that all intermediate state must be synced back to
architectural state any time software wants to check for a virtual
interrupt.

Thanks
Kevin
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Monday, November 18, 2019 10:56 PM
>
> On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:03, Roger Pau Monné wrote:
> > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> > >>>       * context.
> > >>>       */
> > >>> -    if ( running && (in_irq() || (v != current)) )
> > >>> -    {
> > >>> +    if ( vcpu_runnable(v) && v != current )
> > >>
> > >> I'm afraid you need to be more careful with the running vs runnable
> > >> distinction here. The comment above here becomes stale with the
> > >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> > >> about), and the new commentary below also largely says/assumes
> > >> "running", not "runnable".
> > >
> > > I missed fixing that comment; will take care of it in the next
> > > version. Note also that the comment is quite pointless, it only states
> > > what the code below is supposed to do, but doesn't give any reasoning
> > > as to why in_irq is relevant here.
> >
> > Its main "value" is to refer to vcpu_kick(), which has ...
> >
> > > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > > relevant for the code here.
> >
> > ... a similar in_irq() check. Sadly that one, while having a bigger
> > comment, also doesn't explain what it's needed for. It looks like I
> > should recall the reason, but I'm sorry - I don't right now.
>
> Reading the message of the commit that introduced the in_irq check in
> vcpu_kick:
>
> "The drawback is that {vmx,svm}_intr_assist() now races new event
> notifications delivered by IRQ or IPI. We close down this race by
> having vcpu_kick() send a dummy softirq -- this gets picked up in
> IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> delivering the softirq where possible by avoiding it when we are
> running in the non-IRQ context of the VCPU to be kicked."
>
> AFAICT in the vcpu_kick case this is done because the softirq should
> only be raised when in IRQ context in order to trigger the code in
> vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> if vcpu_kick is issued from an irq handler executed after
> vmx_intr_assist and before the disabling of interrupts in
> vmx_do_vmentry).
>
> I think we need something along the lines of:
>
> if ( v->is_running && v != current )
>     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
>     raise_softirq(VCPU_KICK_SOFTIRQ);

Then what's the difference from original logic?

> So that vmx_intr_assist is also retried if a vector is signaled in PIR
> on the vCPU currently running, between the call to vmx_intr_assist and
> the disabling of interrupts in vmx_do_vmentry.
>
On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Monday, November 18, 2019 10:56 PM
> >
> > On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> > > On 18.11.2019 15:03, Roger Pau Monné wrote:
> > > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> > > >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > > >>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> > > >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> > > >>>       * context.
> > > >>>       */
> > > >>> -    if ( running && (in_irq() || (v != current)) )
> > > >>> -    {
> > > >>> +    if ( vcpu_runnable(v) && v != current )
> > > >>
> > > >> I'm afraid you need to be more careful with the running vs runnable
> > > >> distinction here. The comment above here becomes stale with the
> > > >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> > > >> about), and the new commentary below also largely says/assumes
> > > >> "running", not "runnable".
> > > >
> > > > I missed fixing that comment; will take care of it in the next
> > > > version. Note also that the comment is quite pointless, it only states
> > > > what the code below is supposed to do, but doesn't give any reasoning
> > > > as to why in_irq is relevant here.
> > >
> > > Its main "value" is to refer to vcpu_kick(), which has ...
> > >
> > > > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > > > relevant for the code here.
> > >
> > > ... a similar in_irq() check. Sadly that one, while having a bigger
> > > comment, also doesn't explain what it's needed for. It looks like I
> > > should recall the reason, but I'm sorry - I don't right now.
> >
> > Reading the message of the commit that introduced the in_irq check in
> > vcpu_kick:
> >
> > "The drawback is that {vmx,svm}_intr_assist() now races new event
> > notifications delivered by IRQ or IPI. We close down this race by
> > having vcpu_kick() send a dummy softirq -- this gets picked up in
> > IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> > delivering the softirq where possible by avoiding it when we are
> > running in the non-IRQ context of the VCPU to be kicked."
> >
> > AFAICT in the vcpu_kick case this is done because the softirq should
> > only be raised when in IRQ context in order to trigger the code in
> > vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> > if vcpu_kick is issued from an irq handler executed after
> > vmx_intr_assist and before the disabling of interrupts in
> > vmx_do_vmentry).
> >
> > I think we need something along the lines of:
> >
> > if ( v->is_running && v != current )
> >     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> > else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
> >     raise_softirq(VCPU_KICK_SOFTIRQ);
>
> Then what's the difference from original logic?

The original logic is:

if ( running && (in_irq() || (v != current)) )
{
    unsigned int cpu = v->processor;

    if ( cpu != smp_processor_id() )
        send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
    else if ( !softirq_pending(cpu) )
        raise_softirq(VCPU_KICK_SOFTIRQ);
}

Which I find much harder to understand. For example I'm not sure what
the benefit is of doing the cpu != smp_processor_id() check instead of
simply doing v != current (like in the outer if condition). My
suggestion removes one level of nesting and IMO makes the condition
clearer, but maybe that's just my taste.

Also the original comments don't mention at all why a softirq should
be raised if v == current && in_irq, and it took me some time to
figure out why that's required. My proposed change clarifies why this
is needed, and also makes it more obvious that the softirq will only
be raised when in irq context.

Anyway, I'm not going to insist and will drop the change if it's not
deemed useful.

Thanks, Roger.
On 27.11.2019 12:03, Roger Pau Monné wrote:
> On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
>> Then what's the difference from original logic?
>
> The original logic is:
>
> if ( running && (in_irq() || (v != current)) )
> {
>     unsigned int cpu = v->processor;
>
>     if ( cpu != smp_processor_id() )
>         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>     else if ( !softirq_pending(cpu) )
>         raise_softirq(VCPU_KICK_SOFTIRQ);
> }
>
> Which I find much harder to understand. For example I'm not sure what
> the benefit is of doing the cpu != smp_processor_id() check instead of
> simply doing v != current (like in the outer if condition).

There are two aspects to consider: One is that v->processor may equal
smp_processor_id() also for v != current. The other is that without
this check in the if() it would need adding to the else-if(). I'm not
sure to what degree which of the two matters functionality-wise.

Jan
On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
> On 27.11.2019 12:03, Roger Pau Monné wrote:
> > On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> >> Then what's the difference from original logic?
> >
> > The original logic is:
> >
> > if ( running && (in_irq() || (v != current)) )
> > {
> >     unsigned int cpu = v->processor;
> >
> >     if ( cpu != smp_processor_id() )
> >         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >     else if ( !softirq_pending(cpu) )
> >         raise_softirq(VCPU_KICK_SOFTIRQ);
> > }
> >
> > Which I find much harder to understand. For example I'm not sure what
> > the benefit is of doing the cpu != smp_processor_id() check instead of
> > simply doing v != current (like in the outer if condition).
>
> There are two aspects to consider: One is that v->processor may equal
> smp_processor_id() also for v != current. The other is that without
> this check in the if() it would need adding to the else-if(). I'm not
> sure to what degree which of the two matters functionality-wise.

Since the vCPU is running, v->processor can only equal
smp_processor_id() if v == current, and hence I think both checks
achieve exactly the same end result; it's just that IMO doing the
outer one with v != current and the inner one with
cpu != smp_processor_id() is confusing.

Maybe I'm missing something else that actually requires doing the
inner check with v->processor and smp_processor_id().

Roger.
On 27.11.2019 12:29, Roger Pau Monné wrote:
> On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
>> On 27.11.2019 12:03, Roger Pau Monné wrote:
>>> On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
>>>> Then what's the difference from original logic?
>>>
>>> The original logic is:
>>>
>>> if ( running && (in_irq() || (v != current)) )
>>> {
>>>     unsigned int cpu = v->processor;
>>>
>>>     if ( cpu != smp_processor_id() )
>>>         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>     else if ( !softirq_pending(cpu) )
>>>         raise_softirq(VCPU_KICK_SOFTIRQ);
>>> }
>>>
>>> Which I find much harder to understand. For example I'm not sure what
>>> the benefit is of doing the cpu != smp_processor_id() check instead of
>>> simply doing v != current (like in the outer if condition).
>>
>> There are two aspects to consider: One is that v->processor may equal
>> smp_processor_id() also for v != current. The other is that without
>> this check in the if() it would need adding to the else-if(). I'm not
>> sure to what degree which of the two matters functionality-wise.
>
> Since the vCPU is running, v->processor can only equal
> smp_processor_id() if v == current,

What tells you that it is running? It had been running at the time the
flag was latched (before vcpu_unblock()), but may have got de-scheduled
in the meantime.

Jan
On Wed, Nov 27, 2019 at 12:34:09PM +0100, Jan Beulich wrote:
> On 27.11.2019 12:29, Roger Pau Monné wrote:
> > On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
> > > On 27.11.2019 12:03, Roger Pau Monné wrote:
> > > > On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> > > >> Then what's the difference from original logic?
> > > >
> > > > The original logic is:
> > > >
> > > > if ( running && (in_irq() || (v != current)) )
> > > > {
> > > >     unsigned int cpu = v->processor;
> > > >
> > > >     if ( cpu != smp_processor_id() )
> > > >         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> > > >     else if ( !softirq_pending(cpu) )
> > > >         raise_softirq(VCPU_KICK_SOFTIRQ);
> > > > }
> > > >
> > > > Which I find much harder to understand. For example I'm not sure what
> > > > the benefit is of doing the cpu != smp_processor_id() check instead of
> > > > simply doing v != current (like in the outer if condition).
> > >
> > > There are two aspects to consider: One is that v->processor may equal
> > > smp_processor_id() also for v != current. The other is that without
> > > this check in the if() it would need adding to the else-if(). I'm not
> > > sure to what degree which of the two matters functionality-wise.
> >
> > Since the vCPU is running, v->processor can only equal
> > smp_processor_id() if v == current,
>
> What tells you that it is running? It had been running at the time the
> flag was latched (before vcpu_unblock()), but may have got de-scheduled
> in the meantime.

Right, but if it's not running then it doesn't really matter whether we
send an IPI or raise a softirq: the PIR to IRR sync will happen anyway
before the vCPU is resumed.

Thanks, Roger.
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 0d097cf1f2..ce70f4bc75 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -232,6 +232,14 @@ void vmx_intr_assist(void)
     enum hvm_intblk intblk;
     int pt_vector;
 
+    if ( cpu_has_vmx_posted_intr_processing )
+        /*
+         * Always force PIR to be synced to IRR before vmentry, this is also
+         * done by vlapic_has_pending_irq but it's possible other pending
+         * interrupts prevent the execution of that function.
+         */
+        vmx_sync_pir_to_irr(v);
+
     /* Block event injection when single step with MTF. */
     if ( unlikely(v->arch.hvm.single_step) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 33e68eaddf..82a1b972c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1945,8 +1945,6 @@ static void vmx_process_isr(int isr, struct vcpu *v)
 
 static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
-    bool_t running = v->is_running;
-
     vcpu_unblock(v);
     /*
      * Just like vcpu_kick(), nothing is needed for the following two cases:
@@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
      * 2. The target vCPU is the current vCPU and we're in non-interrupt
      * context.
      */
-    if ( running && (in_irq() || (v != current)) )
-    {
+    if ( vcpu_runnable(v) && v != current )
         /*
-         * Note: Only two cases will reach here:
-         * 1. The target vCPU is running on other pCPU.
-         * 2. The target vCPU is the current vCPU.
+         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
+         * the IPI arrives, the target vCPU may be running in non-root mode,
+         * running in root mode, runnable or blocked. If the target vCPU is
+         * running in non-root mode, the hardware will sync PIR to vIRR for
+         * 'posted_intr_vector' is a special vector handled directly by the
+         * hardware.
          *
-         * Note2: Don't worry the v->processor may change. The vCPU being
-         * moved to another processor is guaranteed to sync PIR to vIRR,
-         * due to the involved scheduling cycle.
+         * If the target vCPU is running in root-mode, the interrupt handler
+         * starts to run. Considering an IPI may arrive in the window between
+         * the call to vmx_intr_assist() and interrupts getting disabled, the
+         * interrupt handler should raise a softirq to ensure events will be
+         * delivered in time.
          */
-        unsigned int cpu = v->processor;
+        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
 
-        /*
-         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
-         * target vCPU maybe is running in non-root mode, running in root
-         * mode, runnable or blocked. If the target vCPU is running in
-         * non-root mode, the hardware will sync PIR to vIRR for
-         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
-         * running in root-mode, the interrupt handler starts to run.
-         * Considering an IPI may arrive in the window between the call to
-         * vmx_intr_assist() and interrupts getting disabled, the interrupt
-         * handler should raise a softirq to ensure events will be delivered
-         * in time. If the target vCPU is runnable, it will sync PIR to
-         * vIRR next time it is chose to run. In this case, a IPI and a
-         * softirq is sent to a wrong vCPU which will not have any adverse
-         * effect. If the target vCPU is blocked, since vcpu_block() checks
-         * whether there is an event to be delivered through
-         * local_events_need_delivery() just after blocking, the vCPU must
-         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
-         * sent to a wrong vCPU.
-         */
-        if ( cpu != smp_processor_id() )
-            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
-        /*
-         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
-         * As any softirq will do, as an optimization we only raise one if
-         * none is pending already.
-         */
-        else if ( !softirq_pending(cpu) )
-            raise_softirq(VCPU_KICK_SOFTIRQ);
-    }
+    /*
+     * If the vCPU is not runnable or if it's the one currently running in this
+     * pCPU there's nothing to do, the vmentry code will already sync the PIR
+     * to IRR when resuming execution.
+     */
 }
 
 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
@@ -2048,7 +2026,7 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
         vcpu_kick(v);
 }
 
-static void vmx_sync_pir_to_irr(struct vcpu *v)
+void vmx_sync_pir_to_irr(struct vcpu *v)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
     unsigned int group, i;
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index ebaa74449b..c43fab7c4f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@ void vmx_update_debug_state(struct vcpu *v);
 void vmx_update_exception_bitmap(struct vcpu *v);
 void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
+void vmx_sync_pir_to_irr(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1