xen: Fix XenStore initialisation for XS_LOCAL

Message ID 4c9af052a6e0f6485d1de43f2c38b1461996db99.camel@infradead.org
State Accepted
Commit 5f46400f7a6a4fad635d5a79e2aa5a04a30ffea1
Series xen: Fix XenStore initialisation for XS_LOCAL

Commit Message

David Woodhouse Jan. 26, 2021, 5:01 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
I reworked the triggering of xenbus_probe().

I tried to simplify things by taking out the workqueue based startup
triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
handler.

I missed the fact that in the XS_LOCAL case (Dom0 starting its own
xenstored or xenstore-stubdom, which happens after the kernel is booted
completely), that IRQ-based trigger is still actually needed.

So... put it back, except more cleanly. By just spawning a xenbus_probe
thread which waits on xb_waitq and runs the probe the first time it
gets woken, just as the workqueue-based hack did.

This is actually a nicer approach for *all* the back ends with different
interrupt methods, and we can switch them all over to that without the
complex conditions for when to trigger it. But not in -rc6. This is
the minimal fix for the regression, although it's a step in the right
direction instead of doing a partial revert and actually putting the
workqueue back. It's also simpler than the workqueue.

Fixes: 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/xen/xenbus/xenbus_probe.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
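
The approach described in the commit message (a thread that sleeps on
xb_waitq and runs the probe the first time it is woken) can be
illustrated with a rough user-space analogy. This is only a sketch under
stated assumptions, not the kernel code: it uses POSIX threads, and every
name in it (fake_waitq, fake_probe, fake_wake_waiting, probe_once_demo)
is hypothetical. Unlike the patch, which waits for any wake-up of
xb_waitq with no condition, a pthread condition variable needs a
predicate flag to cope with spurious or early wake-ups.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these exist in the kernel. */
static pthread_mutex_t fake_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fake_waitq = PTHREAD_COND_INITIALIZER; /* plays the role of xb_waitq */
static bool woken;       /* predicate: has the first wake-up happened? */
static int probe_count;  /* how many times the probe ran */

/* Stand-in for xenbus_probe(): only counts invocations. */
static void fake_probe(void)
{
	probe_count++;
}

/* Sleeps until the first wake-up, probes exactly once, then exits. */
static void *probe_thread(void *unused)
{
	(void)unused;

	pthread_mutex_lock(&fake_lock);
	while (!woken)
		pthread_cond_wait(&fake_waitq, &fake_lock);
	pthread_mutex_unlock(&fake_lock);

	fake_probe();
	return NULL;
}

/* Stand-in for the interrupt handler: wakes every waiter on the queue. */
static void fake_wake_waiting(void)
{
	pthread_mutex_lock(&fake_lock);
	woken = true;
	pthread_cond_broadcast(&fake_waitq);
	pthread_mutex_unlock(&fake_lock);
}

/* Starts the thread, delivers several wake-ups, returns the probe count. */
int probe_once_demo(void)
{
	pthread_t t;
	int err;

	err = pthread_create(&t, NULL, probe_thread, NULL);
	assert(err == 0);

	/* Later wake-ups are harmless: the thread has already moved on. */
	fake_wake_waiting();
	fake_wake_waiting();
	fake_wake_waiting();

	err = pthread_join(t, NULL);
	assert(err == 0);
	return probe_count;
}
```

Because the thread exits after its single probe, no "already probed"
state has to be checked in the wake-up path itself, which is the
simplification the commit message argues for.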

Comments

Boris Ostrovsky Jan. 26, 2021, 9:36 p.m. UTC | #1
On 1/26/21 12:01 PM, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
> I reworked the triggering of xenbus_probe().
> 
> I tried to simplify things by taking out the workqueue based startup
> triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
> handler.
> 
> I missed the fact that in the XS_LOCAL case (Dom0 starting its own
> xenstored or xenstore-stubdom, which happens after the kernel is booted
> completely), that IRQ-based trigger is still actually needed.
> 
> So... put it back, except more cleanly. By just spawning a xenbus_probe
> thread which waits on xb_waitq and runs the probe the first time it
> gets woken, just as the workqueue-based hack did.
> 
> This is actually a nicer approach for *all* the back ends with different
> interrupt methods, and we can switch them all over to that without the
> complex conditions for when to trigger it. But not in -rc6. This is
> the minimal fix for the regression, although it's a step in the right
> direction instead of doing a partial revert and actually putting the
> workqueue back. It's also simpler than the workqueue.


Wouldn't the minimal fix be to restore wake_waiting() to its previous 

        if (unlikely(xenstored_ready == 0)) {
                xenstored_ready = 1;
                schedule_work(&probe_work);
        }

(And to avoid changing xenbus_probe()'s signature just create a wrapper)

-boris
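
For comparison, the one-shot guard suggested above (set a ready flag on
the first interrupt and schedule the probe only then) can be sketched in
user space as follows. This is an illustration under assumptions, not
the previous kernel code: the names store_ready, fake_schedule_probe,
fake_irq_handler and one_shot_demo are hypothetical, and a C11
atomic_flag is swapped in for the plain integer flag shown in the
snippet.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static atomic_flag store_ready = ATOMIC_FLAG_INIT; /* role of xenstored_ready */
static int scheduled;                              /* probes queued so far */

/* Stand-in for schedule_work(&probe_work): only counts requests. */
static void fake_schedule_probe(void)
{
	scheduled++;
}

/* Stand-in for the interrupt handler: queues the probe on the first call only. */
static void fake_irq_handler(void)
{
	/* test_and_set returns the previous value, so only the first caller sees false. */
	if (!atomic_flag_test_and_set(&store_ready))
		fake_schedule_probe();
}

/* Delivers several "interrupts" and returns how many probes were queued. */
int one_shot_demo(void)
{
	for (int i = 0; i < 5; i++)
		fake_irq_handler();

	assert(scheduled <= 1);
	return scheduled;
}
```

The trade-off against the thread-based variant is that every interrupt
pays for the flag check, whereas a dedicated waiter keeps the handler
free of probe-related state.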

Jürgen Groß Jan. 27, 2021, 6:57 a.m. UTC | #2
On 26.01.21 22:36, Boris Ostrovsky wrote:
> 
> 
> On 1/26/21 12:01 PM, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
>> I reworked the triggering of xenbus_probe().
>>
>> I tried to simplify things by taking out the workqueue based startup
>> triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
>> handler.
>>
>> I missed the fact that in the XS_LOCAL case (Dom0 starting its own
>> xenstored or xenstore-stubdom, which happens after the kernel is booted
>> completely), that IRQ-based trigger is still actually needed.
>>
>> So... put it back, except more cleanly. By just spawning a xenbus_probe
>> thread which waits on xb_waitq and runs the probe the first time it
>> gets woken, just as the workqueue-based hack did.
>>
>> This is actually a nicer approach for *all* the back ends with different
>> interrupt methods, and we can switch them all over to that without the
>> complex conditions for when to trigger it. But not in -rc6. This is
>> the minimal fix for the regression, although it's a step in the right
>> direction instead of doing a partial revert and actually putting the
>> workqueue back. It's also simpler than the workqueue.
> 
> 
> Wouldn't the minimal fix be to restore wake_waiting() to its previous
> 
>          if (unlikely(xenstored_ready == 0)) {
>                  xenstored_ready = 1;
>                  schedule_work(&probe_work);
>          }
> 
> (And to avoid changing xenbus_probe()'s signature just create a wrapper)

David and I had a longer chat on IRC regarding this fix.

The long term idea is to have just his current thread based variant
for all cases calling xenbus_probe() (so no call of xenbus_probe() at
any other place).

We agreed that this approach would be too risky now, but we wanted to
go in the right direction with the current fix. This is the outcome.


Juergen

Jürgen Groß Jan. 27, 2021, 8:09 a.m. UTC | #3
On 26.01.21 18:01, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
> I reworked the triggering of xenbus_probe().
> 
> I tried to simplify things by taking out the workqueue based startup
> triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
> handler.
> 
> I missed the fact that in the XS_LOCAL case (Dom0 starting its own
> xenstored or xenstore-stubdom, which happens after the kernel is booted
> completely), that IRQ-based trigger is still actually needed.
> 
> So... put it back, except more cleanly. By just spawning a xenbus_probe
> thread which waits on xb_waitq and runs the probe the first time it
> gets woken, just as the workqueue-based hack did.
> 
> This is actually a nicer approach for *all* the back ends with different
> interrupt methods, and we can switch them all over to that without the
> complex conditions for when to trigger it. But not in -rc6. This is
> the minimal fix for the regression, although it's a step in the right
> direction instead of doing a partial revert and actually putting the
> workqueue back. It's also simpler than the workqueue.
> 
> Fixes: 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Committed to: xen/tip.git for-linus-5.11


Juergen

Patch

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index c8f0282bb649..18ffd0551b54 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -714,6 +714,23 @@  static bool xs_hvm_defer_init_for_callback(void)
 #endif
 }
 
+static int xenbus_probe_thread(void *unused)
+{
+	DEFINE_WAIT(w);
+
+	/*
+	 * We actually just want to wait for *any* trigger of xb_waitq,
+	 * and run xenbus_probe() the moment it occurs.
+	 */
+	prepare_to_wait(&xb_waitq, &w, TASK_INTERRUPTIBLE);
+	schedule();
+	finish_wait(&xb_waitq, &w);
+
+	DPRINTK("probing");
+	xenbus_probe();
+	return 0;
+}
+
 static int __init xenbus_probe_initcall(void)
 {
 	/*
@@ -725,6 +742,20 @@  static int __init xenbus_probe_initcall(void)
 	     !xs_hvm_defer_init_for_callback()))
 		xenbus_probe();
 
+	/*
+	 * For XS_LOCAL, spawn a thread which will wait for xenstored
+	 * or a xenstore-stubdom to be started, then probe. It will be
+	 * triggered when communication starts happening, by waiting
+	 * on xb_waitq.
+	 */
+	if (xen_store_domain_type == XS_LOCAL) {
+		struct task_struct *probe_task;
+
+		probe_task = kthread_run(xenbus_probe_thread, NULL,
+					 "xenbus_probe");
+		if (IS_ERR(probe_task))
+			return PTR_ERR(probe_task);
+	}
 	return 0;
 }
 device_initcall(xenbus_probe_initcall);