Message ID | 20180905203546.21921-17-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI, error handling and hot plug | expand |
On 09/05/2018 04:35 PM, Keith Busch wrote: > Error handling may trigger spurious link events, which may trigger > hotplug handling to re-enumerate the topology. This patch temporarily > disables notification during such error handling and checks the topology > for changes after reset. > > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/hotplug/pciehp.h | 1 + > drivers/pci/hotplug/pciehp_core.c | 39 +++++++++++++++++++++++++++++++++++++++ > drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++++ > 3 files changed, 49 insertions(+) > > diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h > index 3a53a24f22f0..f43bcf291c4e 100644 > --- a/drivers/pci/hotplug/pciehp.h > +++ b/drivers/pci/hotplug/pciehp.h > @@ -194,6 +194,7 @@ void pciehp_get_attention_status(struct slot *slot, u8 *status); > void pciehp_set_attention_status(struct slot *slot, u8 status); > void pciehp_get_latch_status(struct slot *slot, u8 *status); > void pciehp_get_adapter_status(struct slot *slot, u8 *status); > +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed); > int pciehp_query_power_fault(struct slot *slot); > void pciehp_green_led_on(struct slot *slot); > void pciehp_green_led_off(struct slot *slot); > diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c > index ec48c9433ae5..7181fd991068 100644 > --- a/drivers/pci/hotplug/pciehp_core.c > +++ b/drivers/pci/hotplug/pciehp_core.c > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev) > pciehp_release_ctrl(ctrl); > } > > +static void pciehp_error_detected(struct pcie_device *dev) > +{ > + struct controller *ctrl = get_service_data(dev); > + > + /* > + * Shutdown notification to ignore hotplug events during error > + * handling. We'll recheck the status when error handling completes. > + * > + * It is possible link event related to this error handling may have > + * triggered a the hotplug interrupt ahead of this notification, but we > + * can't do anything about that race. > + */ > + pcie_shutdown_notification(ctrl); > +} > + > +static void pciehp_slot_reset(struct pcie_device *dev) > +{ > + struct controller *ctrl = get_service_data(dev); > + u8 changed; > + > + /* > + * Cache presence detect change, but ignore other hotplug events that > + * may occur during error handling. Ports that implement only in-band > + * presence detection may inadvertently believe the device has changed, > + * so those devices will have to be re-enumerated. > + */ > + pciehp_get_adapter_changed(ctrl->slot, &changed); > + pcie_clear_hotplug_events(ctrl); > + pcie_init_notification(ctrl); > + > + if (changed) { > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); error: ‘parent’ undeclared. Shouldn't it be __pci_walk_bus(dev->port->bus, pci_dev_set_disconnected, NULL); Thanks, Thomas > + } else if (atomic_read(&ctrl->pending_events) && !pciehp_poll_mode) > + irq_wake_thread(ctrl->pcie->irq, ctrl); > +} > + > #ifdef CONFIG_PM > static int pciehp_suspend(struct pcie_device *dev) > { > @@ -340,6 +377,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = { > > .probe = pciehp_probe, > .remove = pciehp_remove, > + .error_detected = pciehp_error_detected, > + .slot_reset = pciehp_slot_reset, > > #ifdef CONFIG_PM > .suspend = pciehp_suspend, > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index 52a18a7ec2a2..5ec2bc871a9c 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -382,6 +382,15 @@ void pciehp_get_adapter_status(struct slot *slot, u8 *status) > *status = !!(slot_status & PCI_EXP_SLTSTA_PDS); > } > > +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed) > +{ > + struct pci_dev *pdev = ctrl_dev(slot->ctrl); > + u16 slot_status; > + > + pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); > + *changed = !!(slot_status & PCI_EXP_SLTSTA_PDC); > +} > + > int pciehp_query_power_fault(struct slot *slot) > { > struct pci_dev *pdev = ctrl_dev(slot->ctrl); >
On Thu, Sep 06, 2018 at 02:23:19PM -0400, Thomas Tai wrote: > On 09/05/2018 04:35 PM, Keith Busch wrote: > > + if (changed) { > > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > > error: ‘parent’ undeclared. > > Shouldn't it be __pci_walk_bus(dev->port->bus, pci_dev_set_disconnected, > NULL); Indeed. I've pointed git send-email to the wrong directory, but looks like this is the only discrepency.
On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote: > Error handling may trigger spurious link events, which may trigger > hotplug handling to re-enumerate the topology. This patch temporarily > disables notification during such error handling and checks the topology > for changes after reset. [...] > --- a/drivers/pci/hotplug/pciehp_core.c > +++ b/drivers/pci/hotplug/pciehp_core.c > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev) > pciehp_release_ctrl(ctrl); > } > > +static void pciehp_error_detected(struct pcie_device *dev) > +{ > + struct controller *ctrl = get_service_data(dev); > + > + /* > + * Shutdown notification to ignore hotplug events during error > + * handling. We'll recheck the status when error handling completes. > + * > + * It is possible link event related to this error handling may have > + * triggered a the hotplug interrupt ahead of this notification, but we > + * can't do anything about that race. > + */ > + pcie_shutdown_notification(ctrl); Unfortunately this patch does not take into account my comment on patch [13/16] in v1 of this series that pcie_shutdown_notification() can't be used here because the sysfs user interface to enable/disable the slot is still present but no longer functions once the IRQ is released: https://patchwork.ozlabs.org/patch/964715/ My suggestion was: "I think what you want to do instead is "down_write(&ctrl->reset_lock)", see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset"). And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does." > +static void pciehp_slot_reset(struct pcie_device *dev) > +{ > + struct controller *ctrl = get_service_data(dev); > + u8 changed; > + > + /* > + * Cache presence detect change, but ignore other hotplug events that > + * may occur during error handling. Ports that implement only in-band > + * presence detection may inadvertently believe the device has changed, > + * so those devices will have to be re-enumerated. > + */ > + pciehp_get_adapter_changed(ctrl->slot, &changed); > + pcie_clear_hotplug_events(ctrl); > + pcie_init_notification(ctrl); > + > + if (changed) { > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); The __pci_walk_bus() seems superfluous because the devices are also marked disconnected when bringing down the slot as a result of the synthesized PCI_EXP_SLTSTA_PDC event. > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -382,6 +382,15 @@ void pciehp_get_adapter_status(struct slot *slot, u8 *status) > +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed) > +{ > + struct pci_dev *pdev = ctrl_dev(slot->ctrl); > + u16 slot_status; > + > + pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); > + *changed = !!(slot_status & PCI_EXP_SLTSTA_PDC); > +} When adding new functions like this I usually let them return a bool instead of passing a call-by-reference parameter. The only reason pciehp has these call-by-reference pciehp_get_*() functions is because some of them assign a u8 value that is then directly passed to user space via sysfs. Thanks, Lukas
On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote: > On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote: > > Error handling may trigger spurious link events, which may trigger > > hotplug handling to re-enumerate the topology. This patch temporarily > > disables notification during such error handling and checks the topology > > for changes after reset. > [...] > > --- a/drivers/pci/hotplug/pciehp_core.c > > +++ b/drivers/pci/hotplug/pciehp_core.c > > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev) > > pciehp_release_ctrl(ctrl); > > } > > > > +static void pciehp_error_detected(struct pcie_device *dev) > > +{ > > + struct controller *ctrl = get_service_data(dev); > > + > > + /* > > + * Shutdown notification to ignore hotplug events during error > > + * handling. We'll recheck the status when error handling completes. > > + * > > + * It is possible link event related to this error handling may have > > + * triggered a the hotplug interrupt ahead of this notification, but we > > + * can't do anything about that race. > > + */ > > + pcie_shutdown_notification(ctrl); > > Unfortunately this patch does not take into account my comment on > patch [13/16] in v1 of this series that pcie_shutdown_notification() > can't be used here because the sysfs user interface to enable/disable > the slot is still present but no longer functions once the IRQ is > released: > https://patchwork.ozlabs.org/patch/964715/ > > My suggestion was: > > "I think what you want to do instead is "down_write(&ctrl->reset_lock)", > see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset"). > And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does." The sysfs entries still function. Their actions are only temporarily stalled during error handling. Once the slot reset is called, the ctrl->pending_events is queried to take requested actions. > > +static void pciehp_slot_reset(struct pcie_device *dev) > > +{ > > + struct controller *ctrl = get_service_data(dev); > > + u8 changed; > > + > > + /* > > + * Cache presence detect change, but ignore other hotplug events that > > + * may occur during error handling. Ports that implement only in-band > > + * presence detection may inadvertently believe the device has changed, > > + * so those devices will have to be re-enumerated. > > + */ > > + pciehp_get_adapter_changed(ctrl->slot, &changed); > > + pcie_clear_hotplug_events(ctrl); > > + pcie_init_notification(ctrl); > > + > > + if (changed) { > > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > > The __pci_walk_bus() seems superfluous because the devices are also > marked disconnected when bringing down the slot as a result of the > synthesized PCI_EXP_SLTSTA_PDC event. Right, but we can't do that inline with the slot reset because of the circular locks that creates with the pci_bus_sem. We still want to fence off drivers for downstream devices from mucking with config space in a topology that is reported to be different than the one we're recoverying from. You can get undefined results that way.
On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote: > > On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote: > > > +static void pciehp_error_detected(struct pcie_device *dev) > > > +{ > > > + struct controller *ctrl = get_service_data(dev); > > > + > > > + /* > > > + * Shutdown notification to ignore hotplug events during error > > > + * handling. We'll recheck the status when error handling completes. > > > + * > > > + * It is possible link event related to this error handling may have > > > + * triggered a the hotplug interrupt ahead of this notification, but we > > > + * can't do anything about that race. > > > + */ > > > + pcie_shutdown_notification(ctrl); > > > > Unfortunately this patch does not take into account my comment on > > patch [13/16] in v1 of this series that pcie_shutdown_notification() > > can't be used here because the sysfs user interface to enable/disable > > the slot is still present but no longer functions once the IRQ is > > released: > > https://patchwork.ozlabs.org/patch/964715/ > > > > My suggestion was: > > > > "I think what you want to do instead is "down_write(&ctrl->reset_lock)", > > see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset"). > > And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does." > > The sysfs entries still function. Their actions are only temporarily > stalled during error handling. Once the slot reset is called, the > ctrl->pending_events is queried to take requested actions. Okay I see. Still, releasing the IRQ and requesting it again seems fairly heavy-wheight. Why not just acquire ctrl->reset_lock? > > > +static void pciehp_slot_reset(struct pcie_device *dev) > > > +{ > > > + struct controller *ctrl = get_service_data(dev); > > > + u8 changed; > > > + > > > + /* > > > + * Cache presence detect change, but ignore other hotplug events that > > > + * may occur during error handling. Ports that implement only in-band > > > + * presence detection may inadvertently believe the device has changed, > > > + * so those devices will have to be re-enumerated. > > > + */ > > > + pciehp_get_adapter_changed(ctrl->slot, &changed); > > > + pcie_clear_hotplug_events(ctrl); > > > + pcie_init_notification(ctrl); > > > + > > > + if (changed) { > > > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > > > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > > > > The __pci_walk_bus() seems superfluous because the devices are also > > marked disconnected when bringing down the slot as a result of the > > synthesized PCI_EXP_SLTSTA_PDC event. > > Right, but we can't do that inline with the slot reset because of the > circular locks that creates with the pci_bus_sem. We still want to > fence off drivers for downstream devices from mucking with config space > in a topology that is reported to be different than the one we're > recoverying from. You can get undefined results that way. I don't quite follow. I meant that __pci_walk_bus() is unnecessary because the pciehp_request() call indirectly triggers it. So the devices are marked disconnected twice. Thanks, Lukas
On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote: > > > The __pci_walk_bus() seems superfluous because the devices are also > > > marked disconnected when bringing down the slot as a result of the > > > synthesized PCI_EXP_SLTSTA_PDC event. > > > > Right, but we can't do that inline with the slot reset because of the > > circular locks that creates with the pci_bus_sem. We still want to > > fence off drivers for downstream devices from mucking with config space > > in a topology that is reported to be different than the one we're > > recoverying from. You can get undefined results that way. > > I don't quite follow. I meant that __pci_walk_bus() is unnecessary > because the pciehp_request() call indirectly triggers it. So the > devices are marked disconnected twice. Right, but pciehhp_request handling happens too late to fence off drivers bound to devices downstream this hotplug port. The disconnect setting needs to happen before pciehp's slot_reset returns.
On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > The sysfs entries still function. Their actions are only temporarily > > stalled during error handling. Once the slot reset is called, the > > ctrl->pending_events is queried to take requested actions. > > Okay I see. Still, releasing the IRQ and requesting it again seems fairly > heavy-wheight. Why not just acquire ctrl->reset_lock? That was looking like a nice way to handle it, but it introduces circular locking between ctrl->reset_lock and pci_bus_sem: CPU A CPU B --------------------------------- ------------------------ pci_walk_bus pciehp_ist down_read(&pci_bus_sem) down_read(&ctrl->reset_lock); pcie_portdrv_error_detected pciehp_handle_presence_or_link_change pciehp_error_detected pciehp_unconfigure_device down_write(&ctrl->reset_lock) pci_stop_and_remove_bus_devicea down_write(&pci_bus_sem);
On Mon, Sep 10, 2018 at 10:45:28AM -0600, Keith Busch wrote: > On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > > The sysfs entries still function. Their actions are only temporarily > > > stalled during error handling. Once the slot reset is called, the > > > ctrl->pending_events is queried to take requested actions. > > > > Okay I see. Still, releasing the IRQ and requesting it again seems fairly > > heavy-wheight. Why not just acquire ctrl->reset_lock? > > That was looking like a nice way to handle it, but it introduces > circular locking between ctrl->reset_lock and pci_bus_sem: > > CPU A CPU B > --------------------------------- ------------------------ > pci_walk_bus pciehp_ist > down_read(&pci_bus_sem) down_read(&ctrl->reset_lock); > pcie_portdrv_error_detected pciehp_handle_presence_or_link_change > pciehp_error_detected pciehp_unconfigure_device > down_write(&ctrl->reset_lock) pci_stop_and_remove_bus_devicea > down_write(&pci_bus_sem); Why is pciehp bringing down the slot? Is that in reaction to a Link Down caused by the error? Can this be solved with Sinan's approach to check in pciehp whether PCI_EXP_DEVSTA_FED is set and if so, waiting for it to be handled? FWIW you can use synchronize_irq() in pciehp_error_detected() if you need to wait for the IRQ thread to stop before taking the reset_lock. Thanks, Lukas
On Mon, Sep 10, 2018 at 07:08:48PM +0200, Lukas Wunner wrote: > On Mon, Sep 10, 2018 at 10:45:28AM -0600, Keith Busch wrote: > > On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > > > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > > > The sysfs entries still function. Their actions are only temporarily > > > > stalled during error handling. Once the slot reset is called, the > > > > ctrl->pending_events is queried to take requested actions. > > > > > > Okay I see. Still, releasing the IRQ and requesting it again seems fairly > > > heavy-wheight. Why not just acquire ctrl->reset_lock? > > > > That was looking like a nice way to handle it, but it introduces > > circular locking between ctrl->reset_lock and pci_bus_sem: > > > > CPU A CPU B > > --------------------------------- ------------------------ > > pci_walk_bus pciehp_ist > > down_read(&pci_bus_sem) down_read(&ctrl->reset_lock); > > pcie_portdrv_error_detected pciehp_handle_presence_or_link_change > > pciehp_error_detected pciehp_unconfigure_device > > down_write(&ctrl->reset_lock) pci_stop_and_remove_bus_devicea > > down_write(&pci_bus_sem); > > Why is pciehp bringing down the slot? Is that in reaction to a > Link Down caused by the error? This could just be something that happens if a hotplug and error event occur about the same time. > Can this be solved with Sinan's > approach to check in pciehp whether PCI_EXP_DEVSTA_FED is set > and if so, waiting for it to be handled? That only helps if the downstream port detected a fatal error, but error handling happens for any device reported error. > FWIW you can use synchronize_irq() in pciehp_error_detected() > if you need to wait for the IRQ thread to stop before taking > the reset_lock. That would just be a different dead lock: pciehp_error_detected is holding a read lock on pci_bus_sem, and irq thread may request a write lock, so both threads are dead locked on each other.
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 3a53a24f22f0..f43bcf291c4e 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -194,6 +194,7 @@ void pciehp_get_attention_status(struct slot *slot, u8 *status); void pciehp_set_attention_status(struct slot *slot, u8 status); void pciehp_get_latch_status(struct slot *slot, u8 *status); void pciehp_get_adapter_status(struct slot *slot, u8 *status); +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed); int pciehp_query_power_fault(struct slot *slot); void pciehp_green_led_on(struct slot *slot); void pciehp_green_led_off(struct slot *slot); diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index ec48c9433ae5..7181fd991068 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev) pciehp_release_ctrl(ctrl); } +static void pciehp_error_detected(struct pcie_device *dev) +{ + struct controller *ctrl = get_service_data(dev); + + /* + * Shutdown notification to ignore hotplug events during error + * handling. We'll recheck the status when error handling completes. + * + * It is possible link event related to this error handling may have + * triggered a the hotplug interrupt ahead of this notification, but we + * can't do anything about that race. + */ + pcie_shutdown_notification(ctrl); +} + +static void pciehp_slot_reset(struct pcie_device *dev) +{ + struct controller *ctrl = get_service_data(dev); + u8 changed; + + /* + * Cache presence detect change, but ignore other hotplug events that + * may occur during error handling. Ports that implement only in-band + * presence detection may inadvertently believe the device has changed, + * so those devices will have to be re-enumerated. + */ + pciehp_get_adapter_changed(ctrl->slot, &changed); + pcie_clear_hotplug_events(ctrl); + pcie_init_notification(ctrl); + + if (changed) { + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL); + } else if (atomic_read(&ctrl->pending_events) && !pciehp_poll_mode) + irq_wake_thread(ctrl->pcie->irq, ctrl); +} + #ifdef CONFIG_PM static int pciehp_suspend(struct pcie_device *dev) { @@ -340,6 +377,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = { .probe = pciehp_probe, .remove = pciehp_remove, + .error_detected = pciehp_error_detected, + .slot_reset = pciehp_slot_reset, #ifdef CONFIG_PM .suspend = pciehp_suspend, diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 52a18a7ec2a2..5ec2bc871a9c 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -382,6 +382,15 @@ void pciehp_get_adapter_status(struct slot *slot, u8 *status) *status = !!(slot_status & PCI_EXP_SLTSTA_PDS); } +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed) +{ + struct pci_dev *pdev = ctrl_dev(slot->ctrl); + u16 slot_status; + + pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); + *changed = !!(slot_status & PCI_EXP_SLTSTA_PDC); +} + int pciehp_query_power_fault(struct slot *slot) { struct pci_dev *pdev = ctrl_dev(slot->ctrl);
Error handling may trigger spurious link events, which may trigger hotplug handling to re-enumerate the topology. This patch temporarily disables notification during such error handling and checks the topology for changes after reset. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/hotplug/pciehp.h | 1 + drivers/pci/hotplug/pciehp_core.c | 39 +++++++++++++++++++++++++++++++++++++++ drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++++ 3 files changed, 49 insertions(+)