Message ID | 20220829151536.8578-1-jandryuk@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | xen-pcifront: Handle missed Connected state |
On Aug 29, 2022, at 11:16 AM, Jason Andryuk <jandryuk@gmail.com> wrote:
>
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added. It happens
> intermittently but with some regularity. libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
>
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma

Is there a state machine doc/flowchart for LibXL and Xen PCI device
passthrough to Linux? This would be a valuable addition to Xen's
developer docs, even as a whiteboard photo in this thread.

Rich
On Wed, Aug 31, 2022 at 10:35 PM Rich Persaud <persaur@gmail.com> wrote:
>
> On Aug 29, 2022, at 11:16 AM, Jason Andryuk <jandryuk@gmail.com> wrote:
> >
> > An HVM guest with linux stubdom and 2 PCI devices failed to start as
> > libxl timed out waiting for the PCI devices to be added. It happens
> > intermittently but with some regularity. libxl wrote the two xenstore
> > entries for the devices, but then timed out waiting for backend state 4
> > (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> > to an HVM with stubdomain is PV passthrough to the stubdomain and then
> > HVM passthrough with the QEMU inside the stubdomain.)
> >
> > The stubdom kernel never printed "pcifront pci-0: Installing PCI
> > frontend", so it seems to have missed state 4 which would have
> > called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Is there a state machine doc/flowchart for LibXL and Xen PCI device
> passthrough to Linux? This would be a valuable addition to Xen's
> developer docs, even as a whiteboard photo in this thread.

I am not aware of one.

-Jason
The conventional style for subject (from "git log --oneline") is:

  xen/pcifront: Handle ...

On Mon, Aug 29, 2022 at 11:15:36AM -0400, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as

"stubdom" might be handy shorthand in the Xen world, but I think
it would be nice to consistently spell out "stubdomain" since you use
both forms randomly in this commit log and newbies like me have to
wonder whether they're the same or different.

> libxl timed out waiting for the PCI devices to be added. It happens
> intermittently but with some regularity. libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
>
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma

Add "()" after function names for clarity.

> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma. Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend. After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet. When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.

Bjorn
On Fri, Sep 2, 2022 at 12:59 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> The conventional style for subject (from "git log --oneline") is:
>
>   xen/pcifront: Handle ...
>
> On Mon, Aug 29, 2022 at 11:15:36AM -0400, Jason Andryuk wrote:
> > An HVM guest with linux stubdom and 2 PCI devices failed to start as
>
> "stubdom" might be handy shorthand in the Xen world, but I think
> it would be nice to consistently spell out "stubdomain" since you use
> both forms randomly in this commit log and newbies like me have to
> wonder whether they're the same or different.
>
> > libxl timed out waiting for the PCI devices to be added. It happens
> > intermittently but with some regularity. libxl wrote the two xenstore
> > entries for the devices, but then timed out waiting for backend state 4
> > (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> > to an HVM with stubdomain is PV passthrough to the stubdomain and then
> > HVM passthrough with the QEMU inside the stubdomain.)
> >
> > The stubdom kernel never printed "pcifront pci-0: Installing PCI
> > frontend", so it seems to have missed state 4 which would have
> > called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Add "()" after function names for clarity.
>
> > Have pcifront_detach_devices special-case state Initialised and call
> > pcifront_connect_and_init_dma. Don't use pcifront_try_connect because
> > that sets the xenbus state which may throw off the backend. After
> > connecting, skip the remainder of detach_devices since none have been
> > initialized yet. When the backend switches to Reconfigured,
> > pcifront_attach_devices will pick them up again.

Thanks for taking a look, Bjorn. That all sounds good. I'll wait a
little longer to see if there is any more feedback before sending a
v2.

Regards,
Jason
On 29.08.22 17:15, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added. It happens
> intermittently but with some regularity. libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
>
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma. Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend. After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet. When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.
>
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

Reviewed-by: Juergen Gross <jgross@suse.com>

The modifications of the commit message requested by Bjorn can be done
while committing.

Juergen
On 29.08.22 17:15, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added. It happens
> intermittently but with some regularity. libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
>
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma. Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend. After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet. When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.
>
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

Pushed to xen/tip.git for-linus-6.1

Juergen
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -1012,13 +1012,26 @@ static int pcifront_detach_devices(struc
 {
 	int err = 0;
 	int i, num_devs;
+	enum xenbus_state state;
 	unsigned int domain, bus, slot, func;
 	struct pci_dev *pci_dev;
 	char str[64];
 
-	if (xenbus_read_driver_state(pdev->xdev->nodename) !=
-	    XenbusStateConnected)
+	state = xenbus_read_driver_state(pdev->xdev->nodename);
+	if (state == XenbusStateInitialised) {
+		dev_dbg(&pdev->xdev->dev, "Handle skipped connect.\n");
+		/* We missed Connected and need to initialize. */
+		err = pcifront_connect_and_init_dma(pdev);
+		if (err && err != -EEXIST) {
+			xenbus_dev_fatal(pdev->xdev, err,
+					 "Error setting up PCI Frontend");
+			goto out;
+		}
+
+		goto out_switch_state;
+	} else if (state != XenbusStateConnected) {
 		goto out;
+	}
 
 	err = xenbus_scanf(XBT_NIL, pdev->xdev->otherend, "num_devs", "%d",
 			   &num_devs);
@@ -1079,6 +1092,7 @@ static int pcifront_detach_devices(struc
 			domain, bus, slot, func);
 	}
 
+ out_switch_state:
 	err = xenbus_switch_state(pdev->xdev, XenbusStateReconfiguring);
 
  out:
An HVM guest with linux stubdom and 2 PCI devices failed to start as
libxl timed out waiting for the PCI devices to be added. It happens
intermittently but with some regularity. libxl wrote the two xenstore
entries for the devices, but then timed out waiting for backend state 4
(Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough
to an HVM with stubdomain is PV passthrough to the stubdomain and then
HVM passthrough with the QEMU inside the stubdomain.)

The stubdom kernel never printed "pcifront pci-0: Installing PCI
frontend", so it seems to have missed state 4 which would have
called pcifront_try_connect -> pcifront_connect_and_init_dma

Have pcifront_detach_devices special-case state Initialised and call
pcifront_connect_and_init_dma. Don't use pcifront_try_connect because
that sets the xenbus state which may throw off the backend. After
connecting, skip the remainder of detach_devices since none have been
initialized yet. When the backend switches to Reconfigured,
pcifront_attach_devices will pick them up again.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 drivers/pci/xen-pcifront.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
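
For readers unfamiliar with the xenbus state numbers quoted in the commit message, the decision the patch adds to pcifront_detach_devices() can be modelled in userspace roughly as below. This is only a sketch, not driver code: the enum values mirror enum xenbus_state from Xen's public io/xenbus.h header, while enum detach_action, detach_action_for() and switches_to_reconfiguring() are hypothetical names made up for illustration. Error paths (a failed connect, or a failure while removing devices) are left out; in the patch they jump to "out" without switching state.

```c
/* Values mirror enum xenbus_state in Xen's public io/xenbus.h. */
enum xenbus_state {
	XenbusStateUnknown       = 0,
	XenbusStateInitialising  = 1,
	XenbusStateInitWait      = 2,
	XenbusStateInitialised   = 3,
	XenbusStateConnected     = 4,
	XenbusStateClosing       = 5,
	XenbusStateClosed        = 6,
	XenbusStateReconfiguring = 7,
	XenbusStateReconfigured  = 8,
};

/* Hypothetical labels for illustration only; not part of the driver. */
enum detach_action {
	DETACH_IGNORE,         /* any other frontend state: do nothing */
	DETACH_LATE_CONNECT,   /* Connected was missed: connect, skip removal */
	DETACH_REMOVE_DEVICES, /* normal path: remove devices first */
};

/* Map the frontend's own xenbus state to the path the patched code takes. */
enum detach_action detach_action_for(enum xenbus_state frontend_state)
{
	switch (frontend_state) {
	case XenbusStateInitialised:
		return DETACH_LATE_CONNECT;
	case XenbusStateConnected:
		return DETACH_REMOVE_DEVICES;
	default:
		return DETACH_IGNORE;
	}
}

/*
 * Both handled paths end at the out_switch_state label, where the
 * frontend switches itself to Reconfiguring (assuming no error occurred).
 */
int switches_to_reconfiguring(enum detach_action action)
{
	return action != DETACH_IGNORE;
}
```

With this model, a frontend still in Initialised (3) takes the late-connect path and then reports Reconfiguring (7), which is what lets the backend carry on instead of leaving libxl waiting, as described in the commit message.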