Message ID | 20241003142036.43287-1-roger.pau@citrix.com
---|---
State | New
Series | x86/dpci: do not leak pending interrupts on CPU offline
On 03/10/2024 3:20 pm, Roger Pau Monne wrote:
> The current dpci logic relies on a softirq being executed as a side effect of
> the cpu_notifier_call_chain() call in the code path that offlines the target
> CPU. However the call to cpu_notifier_call_chain() won't trigger any softirq
> processing, and even if it did, such processing should be done after all
> interrupts have been migrated off the current CPU, otherwise new pending dpci
> interrupts could still appear.
>
> Current ASSERT in

"Currently the ASSERT() in"

> the cpu callback notifier is fairly easy to trigger by doing
> CPU offline from a PVH dom0.
>
> Solve this by instead moving out any dpci interrupts pending processing once
> the CPU is dead. This might introduce more latency than attempting to drain
> before the CPU is put offline, but it's less complex, and CPU online/offline is
> not a common action. Any extra introduced latency should be tolerable.
>
> Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Yeah, I'm not concerned with minor extra latency in the offline path. In
production it's used 0% of the time to many many significant figures.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index d3627e4af71b..f5faff7a499a 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -1105,23 +1105,27 @@ static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    unsigned long flags;
 
     switch ( action )
     {
     case CPU_UP_PREPARE:
         INIT_LIST_HEAD(&per_cpu(dpci_list, cpu));
         break;
+
     case CPU_UP_CANCELED:
-    case CPU_DEAD:
-        /*
-         * On CPU_DYING this callback is called (on the CPU that is dying)
-         * with an possible HVM_DPIC_SOFTIRQ pending - at which point we can
-         * clear out any outstanding domains (by the virtue of the idle loop
-         * calling the softirq later). In CPU_DEAD case the CPU is deaf and
-         * there are no pending softirqs for us to handle so we can chill.
-         */
         ASSERT(list_empty(&per_cpu(dpci_list, cpu)));
         break;
+
+    case CPU_DEAD:
+        if ( list_empty(&per_cpu(dpci_list, cpu)) )
+            break;
+        /* Take whatever dpci interrupts are pending on the dead CPU. */
+        local_irq_save(flags);
+        list_splice_init(&per_cpu(dpci_list, cpu), &this_cpu(dpci_list));
+        local_irq_restore(flags);
+        raise_softirq(HVM_DPCI_SOFTIRQ);
+        break;
     }
 
     return NOTIFY_DONE;
The current dpci logic relies on a softirq being executed as a side effect of
the cpu_notifier_call_chain() call in the code path that offlines the target
CPU. However the call to cpu_notifier_call_chain() won't trigger any softirq
processing, and even if it did, such processing should be done after all
interrupts have been migrated off the current CPU, otherwise new pending dpci
interrupts could still appear.

Current ASSERT in the cpu callback notifier is fairly easy to trigger by doing
CPU offline from a PVH dom0.

Solve this by instead moving out any dpci interrupts pending processing once
the CPU is dead. This might introduce more latency than attempting to drain
before the CPU is put offline, but it's less complex, and CPU online/offline is
not a common action. Any extra introduced latency should be tolerable.

Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/passthrough/x86/hvm.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)