Message ID | 1476214747-10428-1-git-send-email-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Hi Keith, On Tue, Oct 11, 2016 at 03:39:07PM -0400, Keith Busch wrote: > The pciehp control only clears the saved power fault detected if power > controller control is present, but there is no requirement that the > capability exist in order to observe power faults. This means a hot-added > device in a slot that had experienced a power fault at some point would > never be able to add a new device since the power fault detected would > be on permanently. > > This patch sets the sticky field only if there is a chance it can > be cleared later. I agree we should handle power_fault_detected consistently with respect to POWER_CTRL(). We currently clear power_fault_detected in pciehp_power_on_slot(), but we only call that if we have a power controller. But we set power_fault_detected in pciehp_isr() always, resulting in this "sticky" situation. I don't think it's 100% clear from the spec whether power faults can be detected without a power controller. All the "power fault" references are in the context of a power controller, e.g., r3.0 sec 6.7.1.8, 7.8.10, 7.8.11. But regardless, we're certainly doing the wrong thing right now. If we do have a power controller, it's fairly clear what we should do: - Power off the slot (even though the hardware is supposed to remove power automatically, sec 6.7.1.8 suggests that software should turn off the power), - Wait at least one second, per 6.7.1.8 (I think we do the power off and wait in the board_added() path, but not in the INT_POWER_FAULT path where we handle asynchronous events), - It looks like we currently turn on the attention indicator and turn off the power indicator (the INT_POWER_FAULT case in interrupt_event_handler()), - Wait for another slot event. If we *don't* have a power controller, we currently set power_fault_detected and log a message, but nothing else. > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/hotplug/pciehp_hpc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index b57fc6d..b083e1c 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -631,7 +631,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) > > /* Check Power Fault Detected */ > if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) { > - ctrl->power_fault_detected = 1; > + if (POWER_CTRL(ctrl)) > + ctrl->power_fault_detected = 1; > ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot)); > pciehp_queue_interrupt_event(slot, INT_POWER_FAULT); I think this is correct as far as it goes, but I think we should do a little more: - When there's no power controller, we can't do anything to resolve a power fault, so I think the ctrl_err() should be rate-limited, - We can't turn off the power, but we could turn on the attention indicator, e.g., by making the INT_POWER_FAULT case look like this: case INT_POWER_FAULT: set_slot_off(ctrl, p_slot); break; -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 22, 2016 at 09:44:25AM -0500, Bjorn Helgaas wrote: > On Tue, Oct 11, 2016 at 03:39:07PM -0400, Keith Busch wrote: > > The pciehp control only clears the saved power fault detected if power > > controller control is present, but there is no requirement that the > > capability exist in order to observe power faults. This means a hot-added > > device in a slot that had experienced a power fault at some point would > > never be able to add a new device since the power fault detected would > > be on permanently. > > > > This patch sets the sticky field only if there is a chance it can > > be cleared later. > > I agree we should handle power_fault_detected consistently with > respect to POWER_CTRL(). We currently clear power_fault_detected in > pciehp_power_on_slot(), but we only call that if we have a power > controller. But we set power_fault_detected in pciehp_isr() always, > resulting in this "sticky" situation. > > I don't think it's 100% clear from the spec whether power faults can > be detected without a power controller. All the "power fault" > references are in the context of a power controller, e.g., r3.0 sec > 6.7.1.8, 7.8.10, 7.8.11. > > But regardless, we're certainly doing the wrong thing right now. If > we do have a power controller, it's fairly clear what we should do: > > - Power off the slot (even though the hardware is supposed to remove > power automatically, sec 6.7.1.8 suggests that software should > turn off the power), > > - Wait at least one second, per 6.7.1.8 (I think we do the power off > and wait in the board_added() path, but not in the INT_POWER_FAULT > path where we handle asynchronous events), > > - It looks like we currently turn on the attention indicator and > turn off the power indicator (the INT_POWER_FAULT case in > interrupt_event_handler()), > > - Wait for another slot event. > > If we *don't* have a power controller, we currently set > power_fault_detected and log a message, but nothing else. As always, thank you for the detailed analysis. While there a power controller should be present to observe power faults, a software controllable power controller (the "power controller control" capability) is not necessary as I understand it. That said, I've learned new information from the hardware side that may not require this patch, or something different entirely. I'll send an update if/when I find out. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index b57fc6d..b083e1c 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -631,7 +631,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) /* Check Power Fault Detected */ if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) { - ctrl->power_fault_detected = 1; + if (POWER_CTRL(ctrl)) + ctrl->power_fault_detected = 1; ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot)); pciehp_queue_interrupt_event(slot, INT_POWER_FAULT); }
The pciehp control only clears the saved power fault detected if power controller control is present, but there is no requirement that the capability exist in order to observe power faults. This means a hot-added device in a slot that had experienced a power fault at some point would never be able to add a new device since the power fault detected would be on permanently. This patch sets the sticky field only if there is a chance it can be cleared later. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/hotplug/pciehp_hpc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)