Message ID | 20241003142036.43287-1-roger.pau@citrix.com
---|---
State | New
Series | x86/dpci: do not leak pending interrupts on CPU offline
On 03/10/2024 3:20 pm, Roger Pau Monne wrote:
> The current dpci logic relies on a softirq being executed as a side effect of
> the cpu_notifier_call_chain() call in the code path that offlines the target
> CPU. However the call to cpu_notifier_call_chain() won't trigger any softirq
> processing, and even if it did, such processing should be done after all
> interrupts have been migrated off the current CPU, otherwise new pending dpci
> interrupts could still appear.
>
> Current ASSERT in

"Currently the ASSERT() in"

> the cpu callback notifier is fairly easy to trigger by doing
> CPU offline from a PVH dom0.
>
> Solve this by instead moving out any dpci interrupts pending processing once
> the CPU is dead. This might introduce more latency than attempting to drain
> before the CPU is put offline, but it's less complex, and CPU online/offline is
> not a common action. Any extra introduced latency should be tolerable.
>
> Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Yeah, I'm not concerned with minor extra latency in the offline path. In
production it's used 0% of the time to many many significant figures.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index d3627e4af71b..f5faff7a499a 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -1105,23 +1105,27 @@ static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    unsigned long flags;
 
     switch ( action )
     {
     case CPU_UP_PREPARE:
         INIT_LIST_HEAD(&per_cpu(dpci_list, cpu));
         break;
+
     case CPU_UP_CANCELED:
-    case CPU_DEAD:
-        /*
-         * On CPU_DYING this callback is called (on the CPU that is dying)
-         * with an possible HVM_DPIC_SOFTIRQ pending - at which point we can
-         * clear out any outstanding domains (by the virtue of the idle loop
-         * calling the softirq later). In CPU_DEAD case the CPU is deaf and
-         * there are no pending softirqs for us to handle so we can chill.
-         */
         ASSERT(list_empty(&per_cpu(dpci_list, cpu)));
         break;
+
+    case CPU_DEAD:
+        if ( list_empty(&per_cpu(dpci_list, cpu)) )
+            break;
+        /* Take whatever dpci interrupts are pending on the dead CPU. */
+        local_irq_save(flags);
+        list_splice_init(&per_cpu(dpci_list, cpu), &this_cpu(dpci_list));
+        local_irq_restore(flags);
+        raise_softirq(HVM_DPCI_SOFTIRQ);
+        break;
     }
 
     return NOTIFY_DONE;
The current dpci logic relies on a softirq being executed as a side effect of
the cpu_notifier_call_chain() call in the code path that offlines the target
CPU. However the call to cpu_notifier_call_chain() won't trigger any softirq
processing, and even if it did, such processing should be done after all
interrupts have been migrated off the current CPU, otherwise new pending dpci
interrupts could still appear.

Current ASSERT in the cpu callback notifier is fairly easy to trigger by doing
CPU offline from a PVH dom0.

Solve this by instead moving out any dpci interrupts pending processing once
the CPU is dead. This might introduce more latency than attempting to drain
before the CPU is put offline, but it's less complex, and CPU online/offline is
not a common action. Any extra introduced latency should be tolerable.

Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/passthrough/x86/hvm.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)