Message ID | 20180905203546.21921-16-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Series | PCI, error handling and hot plug |
On Wed, Sep 05, 2018 at 02:35:41PM -0600, Keith Busch wrote:
> A device add in a power controller controlled slot will power on and
> clear power fault slot events, but this was happening before the interrupt
> handler attempted to set the sticky status and attention indicators. The
> wrong status will be set if a hot-add and power fault are handled in
> one interrupt. This patch fixes that by checking for power faults before
> checking for new devices.

Can you clarify the part about "the interrupt handler attempting to set the
sticky status and attention indicators"? My first impression is that
you're talking about bits in the Slot Status register, but that's
obviously wrong because those bits are set by hardware (not the interrupt
handler) and they're RW1C so software clears them by writing 1 to them.

Lukas suggests that this patch should be in v4.19. Do you agree, and if
so, can you help me justify it by describing the user-visible effect of
this? I'm not sure what "setting the wrong status" means to a user, e.g.,
does this result in a non-functional device, an incorrect status LED on the
slot, something else? Does it fix a regression or something we merged for
v4.19?
On Thu, Sep 06, 2018 at 02:36:57PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 05, 2018 at 02:35:41PM -0600, Keith Busch wrote:
> > A device add in a power controller controlled slot will power on and
> > clear power fault slot events, but this was happening before the interrupt
> > handler attempted to set the sticky status and attention indicators. The
> > wrong status will be set if a hot-add and power fault are handled in
> > one interrupt. This patch fixes that by checking for power faults before
> > checking for new devices.
>
> Can you clarify the part about "the interrupt handler attempting to set the
> sticky status and attention indicators"? My first impression is that
> you're talking about bits in the Slot Status register, but that's
> obviously wrong because those bits are set by hardware (not the interrupt
> handler) and they're RW1C so software clears them by writing 1 to them.

The sticky status being the pciehp driver's "power_fault_detected" field.
We set it on the first observation of a slot's PFD and do not clear it
until we have a successful board_added event.

> Lukas suggests that this patch should be in v4.19. Do you agree, and if
> so, can you help me justify it by describing the user-visible effect of
> this? I'm not sure what "setting the wrong status" means to a user, e.g.,
> does this result in a non-functional device, an incorrect status LED on the
> slot, something else? Does it fix a regression or something we merged for
> v4.19?

From a user point of view, it is possible the attention LED light could be
on after a successful hot add. The only reason this was successful before
was how everything was chained through work queues, the work order being:

  INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ

The ENABLE_REQ cleared the power fault at the end, but now everything is
handled inline with the interrupt thread (which was a great change, IMO),
such that the work ENABLE_REQ was doing happens before power fault handling
now.
The commit that changed that order:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=0e94916e6091f48391b65110e71c87c583021640
On Thu, Sep 06, 2018 at 01:50:47PM -0600, Keith Busch wrote:
> On Thu, Sep 06, 2018 at 02:36:57PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 05, 2018 at 02:35:41PM -0600, Keith Busch wrote:
> > > A device add in a power controller controlled slot will power on and
> > > clear power fault slot events, but this was happening before the interrupt
> > > handler attempted to set the sticky status and attention indicators. The
> > > wrong status will be set if a hot-add and power fault are handled in
> > > one interrupt. This patch fixes that by checking for power faults before
> > > checking for new devices.
> >
> > Can you clarify the part about "the interrupt handler attempting to set the
> > sticky status and attention indicators"? My first impression is that
> > you're talking about bits in the Slot Status register, but that's
> > obviously wrong because those bits are set by hardware (not the interrupt
> > handler) and they're RW1C so software clears them by writing 1 to them.
>
> The sticky status being the pciehp driver's "power_fault_detected"
> field. We set it on the first observation of a slot's PFD and do not
> clear it until we have a successful board_added event.
>
> > Lukas suggests that this patch should be in v4.19. Do you agree, and if
> > so, can you help me justify it by describing the user-visible effect of
> > this? I'm not sure what "setting the wrong status" means to a user, e.g.,
> > does this result in a non-functional device, an incorrect status LED on the
> > slot, something else? Does it fix a regression or something we merged for
> > v4.19?
>
> From a user point of view, it is possible the attention LED light could be
> on after a successful hot add.

Great, thanks! Also, it looks like the power LED will be off even though
the power is actually on.
  pciehp_ist
    if (events & (PDC | DLLSC))
      pciehp_handle_presence_or_link_change
        case OFF_STATE:
          pciehp_enable_slot
            __pciehp_enable_slot
              board_added
                pciehp_power_on_slot
                  ctrl->power_fault_detected = 0
                  pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
    if (PFD && !ctrl->power_fault_detected)
      ctrl->power_fault_detected = 1
      pciehp_set_attention_status(slot, 1)   # attention LED on
      pciehp_green_led_off(slot)             # power LED off

Tangent: how annoying that the spec refers to "Power Indicator" and
"Attention Indicator", but (a) we call them the "green_led" and
"attention_status", and (b) both can be on/off/blinking, but the interfaces
are totally different.

> The only reason this was successful before was how everything was chained
> through work queues, the work order being:
>
>   INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ
>
> The ENABLE_REQ cleared the power fault at the end, but now everything
> is handled inline with the interrupt thread (which was a great change,
> IMO), such that the work ENABLE_REQ was doing happens before power
> fault handling now.
On Fri, Sep 07, 2018 at 11:53:52AM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 06, 2018 at 01:50:47PM -0600, Keith Busch wrote:
> > On Thu, Sep 06, 2018 at 02:36:57PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Sep 05, 2018 at 02:35:41PM -0600, Keith Busch wrote:
> > > > A device add in a power controller controlled slot will power on and
> > > > clear power fault slot events, but this was happening before the interrupt
> > > > handler attempted to set the sticky status and attention indicators. The
> > > > wrong status will be set if a hot-add and power fault are handled in
> > > > one interrupt. This patch fixes that by checking for power faults before
> > > > checking for new devices.
> > >
> > > Can you clarify the part about "the interrupt handler attempting to set the
> > > sticky status and attention indicators"? My first impression is that
> > > you're talking about bits in the Slot Status register, but that's
> > > obviously wrong because those bits are set by hardware (not the interrupt
> > > handler) and they're RW1C so software clears them by writing 1 to them.
> >
> > The sticky status being the pciehp driver's "power_fault_detected"
> > field. We set it on the first observation of a slot's PFD and do not
> > clear it until we have a successful board_added event.
> >
> > > Lukas suggests that this patch should be in v4.19. Do you agree, and if
> > > so, can you help me justify it by describing the user-visible effect of
> > > this? I'm not sure what "setting the wrong status" means to a user, e.g.,
> > > does this result in a non-functional device, an incorrect status LED on the
> > > slot, something else? Does it fix a regression or something we merged for
> > > v4.19?
> >
> > From a user point of view, it is possible the attention LED light could be
> > on after a successful hot add.
>
> Great, thanks! Also, it looks like the power LED will be off even though
> the power is actually on.
>
>   pciehp_ist
>     if (events & (PDC | DLLSC))
>       pciehp_handle_presence_or_link_change
>         case OFF_STATE:
>           pciehp_enable_slot
>             __pciehp_enable_slot
>               board_added
>                 pciehp_power_on_slot
>                   ctrl->power_fault_detected = 0
>                   pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
>     if (PFD && !ctrl->power_fault_detected)
>       ctrl->power_fault_detected = 1
>       pciehp_set_attention_status(slot, 1)   # attention LED on
>       pciehp_green_led_off(slot)             # power LED off
>
> Tangent: how annoying that the spec refers to "Power Indicator" and
> "Attention Indicator", but (a) we call them the "green_led" and
> "attention_status", and (b) both can be on/off/blinking, but the interfaces
> are totally different.

I applied this to for-linus with the following changelog. Let me know if
I didn't understand this correctly. I changed the comment in
pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion with
the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).

commit 342227b42fe849eb2edac38342702aff12a5491d
Author: Keith Busch <keith.busch@intel.com>
Date:   Wed Sep 5 14:35:41 2018 -0600

    PCI: pciehp: Fix hot-add vs powerfault detection order

    If both hot-add and power fault were observed in a single interrupt, we
    handled the hot-add first, then the power fault, in this path:

      pciehp_ist
        if (events & (PDC | DLLSC))
          pciehp_handle_presence_or_link_change
            case OFF_STATE:
              pciehp_enable_slot
                __pciehp_enable_slot
                  board_added
                    pciehp_power_on_slot
                      ctrl->power_fault_detected = 0
                      pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
                    pciehp_green_led_on(p_slot)             # power LED on
                    pciehp_set_attention_status(p_slot, 0)  # attention LED off
        if ((events & PFD) && !ctrl->power_fault_detected)
          ctrl->power_fault_detected = 1
          pciehp_set_attention_status(1)                    # attention LED on
          pciehp_green_led_off(slot)                        # power LED off

    This left the attention indicator on (even though the hot-add succeeded)
    and the power indicator off (even though the slot power was on).

    Fix this by checking for power faults before checking for new devices.

    Fixes: 0e94916e6091 ("PCI: pciehp: Handle events synchronously")
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Lukas Wunner <lukas@wunner.de>

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 7136e3430925..a938abdb41ce 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -496,7 +496,7 @@ int pciehp_power_on_slot(struct slot *slot)
 	u16 slot_status;
 	int retval;
 
-	/* Clear sticky power-fault bit from previous power failures */
+	/* Clear power-fault bit from previous power failures */
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
 	if (slot_status & PCI_EXP_SLTSTA_PFD)
 		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
@@ -646,6 +646,14 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_handle_button_press(slot);
 	}
 
+	/* Check Power Fault Detected */
+	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
+		ctrl->power_fault_detected = 1;
+		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
+		pciehp_set_attention_status(slot, 1);
+		pciehp_green_led_off(slot);
+	}
+
 	/*
 	 * Disable requests have higher priority than Presence Detect Changed
 	 * or Data Link Layer State Changed events.
@@ -657,14 +665,6 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_handle_presence_or_link_change(slot, events);
 	up_read(&ctrl->reset_lock);
 
-	/* Check Power Fault Detected */
-	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
-		ctrl->power_fault_detected = 1;
-		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
-		pciehp_set_attention_status(slot, 1);
-		pciehp_green_led_off(slot);
-	}
-
 	pci_config_pm_runtime_put(pdev);
 	wake_up(&ctrl->requester);
 	return IRQ_HANDLED;
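
For anyone who wants to see the ordering problem from the changelog without reading pciehp, here is a minimal user-space sketch. It is not kernel code: `struct slot_model`, the `model_*()` functions and the `EV_*` constants are made up for illustration (the `EV_*` values mirror `PCI_EXP_SLTSTA_PFD` and `PCI_EXP_SLTSTA_PDC`), and it assumes the hot-add itself succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative event bits; values mirror PCI_EXP_SLTSTA_PFD / _PDC. */
#define EV_PFD 0x0002u
#define EV_PDC 0x0008u

/* Hypothetical stand-in for the driver state and the two slot indicators. */
struct slot_model {
	bool power_fault_detected;	/* driver-internal flag */
	bool attention_on;		/* Attention Indicator */
	bool power_led_on;		/* Power Indicator */
};

/* Models the "Check Power Fault Detected" block in pciehp_ist(). */
static void model_power_fault(struct slot_model *s, unsigned int events)
{
	assert(s != NULL);
	if ((events & EV_PFD) && !s->power_fault_detected) {
		s->power_fault_detected = true;
		s->attention_on = true;
		s->power_led_on = false;
	}
}

/* Models a successful hot-add: flag cleared, power LED on, attention off. */
static void model_hot_add(struct slot_model *s, unsigned int events)
{
	assert(s != NULL);
	if (events & EV_PDC) {
		s->power_fault_detected = false;
		s->power_led_on = true;
		s->attention_on = false;
	}
}

/* Runs both handlers for one interrupt, in the old or the new order. */
static struct slot_model model_ist(unsigned int events, bool fault_first)
{
	struct slot_model s = { false, false, false };

	if (fault_first) {		/* order after this patch */
		model_power_fault(&s, events);
		model_hot_add(&s, events);
	} else {			/* order before this patch */
		model_hot_add(&s, events);
		model_power_fault(&s, events);
	}
	return s;
}
```

Calling `model_ist(EV_PFD | EV_PDC, false)` ends with the attention indicator on and the power indicator off, which is the wrong state described above; passing `true` for `fault_first` ends with the power indicator on, the attention indicator off and the flag clear.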
On Fri, Sep 07, 2018 at 03:03:32PM -0500, Bjorn Helgaas wrote:
> I applied this to for-linus with the following changelog. Let me know
> if I didn't understand this correctly. I changed the comment in
> pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion
> with the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).

Perfect! Thanks for queueing this up. I'll drop this one from the rest of
the series, which will need at least a v3 to fix a dumb mistake pointed out
in review, and I'll get the order to make better sense (or maybe split into
independent patch sets).

> commit 342227b42fe849eb2edac38342702aff12a5491d
> Author: Keith Busch <keith.busch@intel.com>
> Date:   Wed Sep 5 14:35:41 2018 -0600
>
>     PCI: pciehp: Fix hot-add vs powerfault detection order
>
>     If both hot-add and power fault were observed in a single interrupt, we
>     handled the hot-add first, then the power fault, in this path:
>
>       pciehp_ist
>         if (events & (PDC | DLLSC))
>           pciehp_handle_presence_or_link_change
>             case OFF_STATE:
>               pciehp_enable_slot
>                 __pciehp_enable_slot
>                   board_added
>                     pciehp_power_on_slot
>                       ctrl->power_fault_detected = 0
>                       pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
>                     pciehp_green_led_on(p_slot)             # power LED on
>                     pciehp_set_attention_status(p_slot, 0)  # attention LED off
>         if ((events & PFD) && !ctrl->power_fault_detected)
>           ctrl->power_fault_detected = 1
>           pciehp_set_attention_status(1)                    # attention LED on
>           pciehp_green_led_off(slot)                        # power LED off
>
>     This left the attention indicator on (even though the hot-add succeeded)
>     and the power indicator off (even though the slot power was on).
>
>     Fix this by checking for power faults before checking for new devices.
On Fri, Sep 07, 2018 at 03:03:32PM -0500, Bjorn Helgaas wrote:
> I applied this to for-linus with the following changelog. Let me know
> if I didn't understand this correctly. I changed the comment in
> pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion
> with the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).

The edited changelog and patch look perfectly fine to me, thanks a lot
and sorry for missing this when doing the rework. (I missed it because
Thunderbolt doesn't have a power controller, hence can't signal a power
fault.)

The "sticky" refers to the property that if a power fault occurs and the
Power Fault Detected bit is cleared to acknowledge receipt of the event,
and if the power fault persists, the bit is immediately set again and
another interrupt is signaled. In that sense, the bit is "sticky" and
that's what the code comment was referring to. It's basically
level-triggered as long as the power fault persists.

pciehp does not clear the bit on receipt of a PFD event, but only sets a
flag in its internal struct. This avoids an interrupt storm. Both the bit
and the internal flag are cleared when attempting to bring the slot up
again, either through an unplug-replug operation by the user or an enable
request via sysfs or an Attention Button press. In either case user
intervention is required. If the power fault is still not gone, bringup of
the slot is aborted.

The problem here was not only that the LED is turned off despite the slot
being brought up, but that the internal flag ctrl->power_fault_detected
was incorrectly set to 1 even though it had just been set to 0 when
successfully bringing up the slot.

There are some oddities with the power fault handling code, such as a
"TBD" code comment in pcie_enable_notification() where it's unclear if
there's really anything left "to be done". I collected this and other
oddities in this e-mail:

https://www.spinics.net/lists/linux-pci/msg75743.html

Thanks,

Lukas
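
The level-triggered behavior and the latch described above can be sketched in user space roughly as follows. This is only a toy model under stated assumptions: `struct hw_model`, `struct drv_model` and every function name here are hypothetical, the "hardware" re-asserts PFD immediately while the fault persists (as described in the message above), and the power-on step is reduced to clearing the flag and the bit and then checking whether the fault came back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLTSTA_PFD 0x0002	/* mirrors PCI_EXP_SLTSTA_PFD */

/* Hypothetical hardware model: a persisting fault keeps re-asserting PFD. */
struct hw_model {
	bool fault_present;
	uint16_t sltsta;
	unsigned int irq_count;
};

/* Hypothetical driver model: only the latch flag. */
struct drv_model {
	bool power_fault_detected;
};

/* Hardware side: set PFD and signal an interrupt if a fault is pending. */
static void hw_evaluate(struct hw_model *hw)
{
	assert(hw != NULL);
	if (hw->fault_present && !(hw->sltsta & SLTSTA_PFD)) {
		hw->sltsta |= SLTSTA_PFD;
		hw->irq_count++;
	}
}

/* RW1C write: 1 bits clear, then the hardware re-evaluates the fault. */
static void hw_write_sltsta(struct hw_model *hw, uint16_t val)
{
	hw->sltsta &= (uint16_t)~val;
	hw_evaluate(hw);
}

/* Naive handler: acknowledge PFD every time. Bounded so the demo ends. */
static unsigned int naive_handler(struct hw_model *hw, unsigned int max_rounds)
{
	unsigned int handled = 0;

	while ((hw->sltsta & SLTSTA_PFD) && handled < max_rounds) {
		hw_write_sltsta(hw, SLTSTA_PFD);
		handled++;
	}
	return handled;
}

/* Latching handler: record the fault once and leave the bit set. */
static bool latching_handler(const struct hw_model *hw, struct drv_model *drv)
{
	if ((hw->sltsta & SLTSTA_PFD) && !drv->power_fault_detected) {
		drv->power_fault_detected = true;
		return true;	/* new fault reported */
	}
	return false;		/* nothing new, no storm */
}

/* Simplified bringup: clear flag and bit, fail if the fault is still there. */
static bool model_power_on(struct hw_model *hw, struct drv_model *drv)
{
	drv->power_fault_detected = false;
	hw_write_sltsta(hw, SLTSTA_PFD);
	return !(hw->sltsta & SLTSTA_PFD);
}
```

With a persisting fault, `naive_handler()` takes one fresh interrupt per acknowledgment, which is the storm, while `latching_handler()` reports the fault once and then stays quiet until `model_power_on()` is attempted again.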
On Fri, Sep 07, 2018 at 02:18:19PM -0600, Keith Busch wrote:
> On Fri, Sep 07, 2018 at 03:03:32PM -0500, Bjorn Helgaas wrote:
> > I applied this to for-linus with the following changelog. Let me know
> > if I didn't understand this correctly. I changed the comment in
> > pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion
> > with the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).
>
> Perfect! Thanks for queueing this up. I'll drop this one from the rest
> of the series, which will need at least a v3 to fix a dumb mistake
> pointed out in review, and I'll get the order to make better sense (or
> maybe split into independent patch sets).

Are you still planning a v3? I really want to get this in for v4.20 and I
think there's probably some integration to be done with Lukas' series
(which I haven't applied yet either).

I rebased my branches to v4.19-rc4 to avoid a merge conflict Lukas pointed
out.
On Tue, Sep 18, 2018 at 02:46:50PM -0700, Bjorn Helgaas wrote:
> On Fri, Sep 07, 2018 at 02:18:19PM -0600, Keith Busch wrote:
> > On Fri, Sep 07, 2018 at 03:03:32PM -0500, Bjorn Helgaas wrote:
> > > I applied this to for-linus with the following changelog. Let me know
> > > if I didn't understand this correctly. I changed the comment in
> > > pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion
> > > with the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).
> >
> > Perfect! Thanks for queueing this up. I'll drop this one from the rest
> > of the series, which will need at least a v3 to fix a dumb mistake
> > pointed out in review, and I'll get the order to make better sense (or
> > maybe split into independent patch sets).
>
> Are you still planning a v3? I really want to get this in for v4.20
> and I think there's probably some integration to be done with Lukas'
> series (which I haven't applied yet either).
>
> I rebased my branches to v4.19-rc4 to avoid a merge conflict Lukas
> pointed out.

I'll send something out today, and I think I'll split it into multiple
independent sets.

I had to trim down what this is trying to accomplish due to existing
deadlocking bugs I've found in testing: there are several circular
dependencies on tasks holding the single pci_rescan_remove_lock. I don't
think I'll be able to fix that in time for 4.20, but I'll send the parts
that I believe are an improvement that don't break anything else.
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 9eb28a06cac6..52a18a7ec2a2 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -630,6 +630,14 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_handle_button_press(slot);
 	}
 
+	/* Check Power Fault Detected */
+	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
+		ctrl->power_fault_detected = 1;
+		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
+		pciehp_set_attention_status(slot, 1);
+		pciehp_green_led_off(slot);
+	}
+
 	/*
 	 * Disable requests have higher priority than Presence Detect Changed
 	 * or Data Link Layer State Changed events.
@@ -641,14 +649,6 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_handle_presence_or_link_change(slot, events);
 	up_read(&ctrl->reset_lock);
 
-	/* Check Power Fault Detected */
-	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
-		ctrl->power_fault_detected = 1;
-		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
-		pciehp_set_attention_status(slot, 1);
-		pciehp_green_led_off(slot);
-	}
-
 	pci_config_pm_runtime_put(pdev);
 	wake_up(&ctrl->requester);
 	return IRQ_HANDLED;