Message ID | 20231213003614.1648343-2-imammedo@redhat.com |
---|---|
State | RFC, archived |
Series | PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job |
On 13.12.23 at 01:36, Igor Mammedov wrote:
> When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> [...]
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>

Missing an F here ;)

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Thank you! Works for me:

Tested-by: Fiona Ebner <f.ebner@proxmox.com>
On Wed, 13 Dec 2023 10:47:27 +0100
Fiona Ebner <f.ebner@proxmox.com> wrote:

> On 13.12.23 at 01:36, Igor Mammedov wrote:
> > When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> > [...]
> > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Reported-by: iona Ebner <f.ebner@proxmox.com>
>
> Missing an F here ;)

Sorry for the copy-paste mistake; I'll fix it up in the next submission.

>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Thank you! Works for me:
>
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> [...]
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 601129772b2d..6b11609927d6 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>  					trim_stale_devices(dev);
>
>  			/* configure all functions */
> -			enable_slot(slot, true);
> +			if (slot->flags != SLOT_ENABLED) {
> +				enable_slot(slot, true);
> +			}

Shouldn't this be following the acpiphp_enable_slot() pattern, that is

	if (!(slot->flags & SLOT_ENABLED))
		enable_slot(slot, true);

Also the braces are redundant.

>  		} else {
>  			disable_slot(slot);
>  		}
> --
On Wed, Dec 13, 2023 at 2:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> > [...]
> >  			/* configure all functions */
> > -			enable_slot(slot, true);
> > +			if (slot->flags != SLOT_ENABLED) {
> > +				enable_slot(slot, true);
> > +			}
>
> Shouldn't this be following the acpiphp_enable_slot() pattern, that is
>
> 	if (!(slot->flags & SLOT_ENABLED))
> 		enable_slot(slot, true);
>
> Also the braces are redundant.

I'll fix it up on respin if Bjorn is fine with the approach in general.
The patches need a respin anyway to fix the botched-up whitespace.

>
> >  		} else {
> >  			disable_slot(slot);
> >  		}
> > --
When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
adding a device to a bus and enabling it will kick off an async host scan

  scsi_scan_host+0x21/0x1f0
  virtscsi_probe+0x2dd/0x350
  ..
  driver_probe_device+0x19/0x80
  ...
  driver_probe_device+0x19/0x80
  pci_bus_add_device+0x53/0x80
  pci_bus_add_devices+0x2b/0x70
  ...

which will schedule a job for the async scan. That, however, breaks
if there is more than one SCSI host behind a bridge, since
acpiphp_check_bridge() will walk over all slots and try to
enable each of them regardless of whether they were already
enabled.
As a result, the bridge might be reconfigured several times,
triggering the following sequence:

  [cpu 0] acpiphp_check_bridge()
  [cpu 0] enable_slot(a)
  [cpu 0] configure bridge
  [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
  [cpu 0] enable_slot(b)
  ...
  [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
  ...
  [cpu 0] configure bridge <- temporarily disables bridge

which causes do_scsi_scan_host() to fail.
The same race affects SHPC (but it manages to avoid hitting the race due to
the 1 sec delay when enabling a slot).
To cover the case of single device hotplug (one device at a time), do not
attempt to enable slots that have already been enabled.

Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Reported-by: iona Ebner <f.ebner@proxmox.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 601129772b2d..6b11609927d6 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
 					trim_stale_devices(dev);
 
 			/* configure all functions */
-			enable_slot(slot, true);
+			if (slot->flags != SLOT_ENABLED) {
+				enable_slot(slot, true);
+			}
 		} else {
 			disable_slot(slot);
 		}