Message ID | 20171025112940.77890-1-mika.westerberg@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Wed, Oct 25, 2017 at 02:29:40PM +0300, Mika Westerberg wrote: > During surprise hot-unplug the device is not accessible anymore and > register reads return 0xffffffff. When that happens pciehp_unconfigure_device() > may inadvertently think the device below the bridge may be a display > device of somesort as reading PCI_BRIDGE_CONTROL register also returns > 0xff. This results failure of the remove operation: > > pciehp 0000:00:1c.0:pcie004: Slot(0): Link Down > pciehp 0000:00:1c.0:pcie004: Slot(0): Card present > pciehp 0000:00:1c.0:pcie004: Cannot remove display device 0000:01:00.0 > > Because of this the hierarchy is left untouched preventing further > hotplug operations. > > Now, it is not clear why the check is there in the first place and why > we would like to prevent removing a bridge if it has PCI_BRIDGE_CTL_VGA > set. In case of surprise hot-unplug, it would not even be possible to > prevent the removal. It may be due to the fact that pciehp_pci.c pretty > much copies similar implementation from shpchp_pci.c and this check was > just left there in the code without further thinking if it is actually > needed at all. > > Given this and the issue described above, I think it makes sense to drop > the whole PCI_BRIDGE_CONTROL check from pciehp_unconfigure_device() > which is what this patch does. > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> > --- > The previous version of the patch can be found here: > > https://patchwork.kernel.org/patch/10024145/ > > Changes from the previous version: > > * Drop the whole PCI_BRIDGE_CONTROL check > * Update patch subject to reflect that > > drivers/pci/hotplug/pciehp_pci.c | 12 ------------ > 1 file changed, 12 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c > index 2a1ca020cf5a..acc360d1a1fb 100644 > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -79,7 +79,6 @@ int pciehp_configure_device(struct slot *p_slot) > int pciehp_unconfigure_device(struct slot *p_slot) > { > int rc = 0; > - u8 bctl = 0; > u8 presence = 0; > struct pci_dev *dev, *temp; > struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate; > @@ -101,17 +100,6 @@ int pciehp_unconfigure_device(struct slot *p_slot) > list_for_each_entry_safe_reverse(dev, temp, &parent->devices, > bus_list) { > pci_dev_get(dev); > - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) { > - pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl); > - if (bctl & PCI_BRIDGE_CTL_VGA) { > - ctrl_err(ctrl, > - "Cannot remove display device %s\n", > - pci_name(dev)); > - pci_dev_put(dev); > - rc = -EINVAL; > - break; > - } > - } > if (!presence) { > pci_dev_set_disconnected(dev, NULL); > if (pci_has_subordinate(dev)) As you mention, shpchp_unconfigure_device() contains the same code; is there any reason not to remove it from there as well? I know you probably can't test shpchp, and neither can I, but it looks like it has the same issue, and I don't want to avoid fixing it there just for want of testing. I wish we knew more about why that PCI_BRIDGE_CTL_VGA check was there in the first place. I think it was added by Dely Sy in 2004 [1]. LinkedIn thinks he's still at Intel; any chance you could ping him and see if he has any insight? My guess is that it was originally in a path that didn't have to worry about surprise unplug. But I still don't know why it would be a problem. Bjorn [1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=c16b4b14d980
On Mon, Nov 06, 2017 at 05:14:09PM -0600, Bjorn Helgaas wrote: > As you mention, shpchp_unconfigure_device() contains the same code; is > there any reason not to remove it from there as well? I don't see why we cannot remove that as well. But I have to admit that I don't know much about conventional PCI hotplug. > I know you probably can't test shpchp, and neither can I, but it looks > like it has the same issue, and I don't want to avoid fixing it there > just for want of testing. Yeah, I can't test it but no problem removing that code from shpchp as well. I think it should be a separate patch, though. > I wish we knew more about why that PCI_BRIDGE_CTL_VGA check was there > in the first place. I think it was added by Dely Sy in 2004 [1]. > LinkedIn thinks he's still at Intel; any chance you could ping him and > see if he has any insight? I checked from our internal "phone book" and could not find him there, unfortunately. > My guess is that it was originally in a > path that didn't have to worry about surprise unplug. But I still > don't know why it would be a problem. One reason might be that if you do unplug through sysfs and you are running some graphics console (X) or so, you will suddenly lose access to the system. However, even in that case you can just ssh to the box and re-enable the card...
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c index 2a1ca020cf5a..acc360d1a1fb 100644 --- a/drivers/pci/hotplug/pciehp_pci.c +++ b/drivers/pci/hotplug/pciehp_pci.c @@ -79,7 +79,6 @@ int pciehp_configure_device(struct slot *p_slot) int pciehp_unconfigure_device(struct slot *p_slot) { int rc = 0; - u8 bctl = 0; u8 presence = 0; struct pci_dev *dev, *temp; struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate; @@ -101,17 +100,6 @@ int pciehp_unconfigure_device(struct slot *p_slot) list_for_each_entry_safe_reverse(dev, temp, &parent->devices, bus_list) { pci_dev_get(dev); - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) { - pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl); - if (bctl & PCI_BRIDGE_CTL_VGA) { - ctrl_err(ctrl, - "Cannot remove display device %s\n", - pci_name(dev)); - pci_dev_put(dev); - rc = -EINVAL; - break; - } - } if (!presence) { pci_dev_set_disconnected(dev, NULL); if (pci_has_subordinate(dev))
During surprise hot-unplug the device is not accessible anymore and register reads return 0xffffffff. When that happens pciehp_unconfigure_device() may inadvertently think the device below the bridge may be a display device of somesort as reading PCI_BRIDGE_CONTROL register also returns 0xff. This results failure of the remove operation: pciehp 0000:00:1c.0:pcie004: Slot(0): Link Down pciehp 0000:00:1c.0:pcie004: Slot(0): Card present pciehp 0000:00:1c.0:pcie004: Cannot remove display device 0000:01:00.0 Because of this the hierarchy is left untouched preventing further hotplug operations. Now, it is not clear why the check is there in the first place and why we would like to prevent removing a bridge if it has PCI_BRIDGE_CTL_VGA set. In case of surprise hot-unplug, it would not even be possible to prevent the removal. It may be due to the fact that pciehp_pci.c pretty much copies similar implementation from shpchp_pci.c and this check was just left there in the code without further thinking if it is actually needed at all. Given this and the issue described above, I think it makes sense to drop the whole PCI_BRIDGE_CONTROL check from pciehp_unconfigure_device() which is what this patch does. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> --- The previous version of the patch can be found here: https://patchwork.kernel.org/patch/10024145/ Changes from the previous version: * Drop the whole PCI_BRIDGE_CONTROL check * Update patch subject to reflect that drivers/pci/hotplug/pciehp_pci.c | 12 ------------ 1 file changed, 12 deletions(-)