Message ID | 1372993054-25730-2-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, 2013-07-05 at 10:57 +0800, Gavin Shan wrote: > When stopping and removing one specific PCI device, the platform > might need take some actions. One example is that EEH already had > eeh cache and eeh device attached to the PCI device, and we need > release eeh cache and device during the time. The patch introduces > hook pcibios_stop_dev() for the purpose. Bjorn, any objection ? Ack ? Nack ? :-) I'd like to put that in my tree, it's part of a series that fixes a number of bugs with our hotplug and EEH code which I'd like to send to Linus soonish. Cheers, Ben. > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > --- > drivers/pci/probe.c | 4 ++++ > drivers/pci/remove.c | 2 ++ > include/linux/pci.h | 1 + > 3 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 70f10fa..7167dc4 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus) > { > } > > +void __weak pcibios_stop_dev(struct pci_dev *dev) > +{ > +} > + > struct pci_bus *pci_create_root_bus(struct device *parent, int bus, > struct pci_ops *ops, void *sysdata, struct list_head *resources) > { > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 8fc54b7..e329efc 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev) > { > pci_pme_active(dev, false); > > + pcibios_stop_dev(dev); > + > if (dev->is_added) { > pci_proc_detach_device(dev); > pci_remove_sysfs_dev_files(dev); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 3a24e4f..40df783 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -696,6 +696,7 @@ int no_pci_devices(void); > void pcibios_resource_survey_bus(struct pci_bus *bus); > void pcibios_add_bus(struct pci_bus *bus); > void pcibios_remove_bus(struct pci_bus *bus); > +void pcibios_stop_dev(struct pci_dev *dev); > void pcibios_fixup_bus(struct pci_bus *); > int __must_check pcibios_enable_device(struct pci_dev *, int mask); > /* Architecture specific versions may override this (weak) */ -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 4, 2013 at 8:57 PM, Gavin Shan <shangw@linux.vnet.ibm.com> wrote: > When stopping and removing one specific PCI device, the platform > might need take some actions. One example is that EEH already had > eeh cache and eeh device attached to the PCI device, and we need > release eeh cache and device during the time. The patch introduces > hook pcibios_stop_dev() for the purpose. > > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > --- > drivers/pci/probe.c | 4 ++++ > drivers/pci/remove.c | 2 ++ > include/linux/pci.h | 1 + > 3 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 70f10fa..7167dc4 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus) > { > } > > +void __weak pcibios_stop_dev(struct pci_dev *dev) > +{ > +} > + > struct pci_bus *pci_create_root_bus(struct device *parent, int bus, > struct pci_ops *ops, void *sysdata, struct list_head *resources) > { > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 8fc54b7..e329efc 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev) > { > pci_pme_active(dev, false); > > + pcibios_stop_dev(dev); We already have pcibios_release_device() (just merged for v3.11). Would that work for you? I haven't seen your whole series, so I can't tell whether you need something different. > if (dev->is_added) { > pci_proc_detach_device(dev); > pci_remove_sysfs_dev_files(dev); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 3a24e4f..40df783 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -696,6 +696,7 @@ int no_pci_devices(void); > void pcibios_resource_survey_bus(struct pci_bus *bus); > void pcibios_add_bus(struct pci_bus *bus); > void pcibios_remove_bus(struct pci_bus *bus); > +void pcibios_stop_dev(struct pci_dev *dev); > void pcibios_fixup_bus(struct pci_bus *); > int __must_check pcibios_enable_device(struct pci_dev *, int mask); > /* Architecture specific versions may override this (weak) */ > -- > 1.7.5.4 > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote: > We already have pcibios_release_device() (just merged for v3.11). > Would that work for you? I haven't seen your whole series, so I can't > tell whether you need something different. Maybe... Gavin's original hook applies when the device is removed from the tree, your new hook when the refcount expires and the structure is about to be freed... I *suspect* that's fine but I'll need to have a closer look (or wait for Gavin to be back from vacation :-) BTW. One thing we do in EEH is that if we get an error and the driver for the device doesn't implement recovery, we simulate a hotplug. IE. We unplug the device (currently removing it), reset it (or the bus it's on, whatever applies, which lifts the fence established by the EEH hardware), and re-plug it. The current method really removes the device from the PCI subsystem, and "plugs" it back. IE. We rescan the slot, and might even end up re-assigning resources, which I'm not too fan about. It's also unclear what quirks gets called in that re-plug case, etc... I'm wondering whether it might be better to keep the pci_dev around, just mark it blocked for user access, unbind the driver, reset, restore the BARs to their original values, unblock and re-bind. What's your take on this ? Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 5, 2013 at 4:36 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote: >> We already have pcibios_release_device() (just merged for v3.11). >> Would that work for you? I haven't seen your whole series, so I can't >> tell whether you need something different. > > Maybe... Gavin's original hook applies when the device is removed > from the tree, your new hook when the refcount expires and the > structure is about to be freed... Yep, pcibios_release_device() is definitely at a different point. I just want to avoid exposing too many points in the device lifetime to the arch, because it makes it so much harder to refactor things. > I *suspect* that's fine but I'll need to have a closer look (or wait > for Gavin to be back from vacation :-) > > BTW. One thing we do in EEH is that if we get an error and the driver > for the device doesn't implement recovery, we simulate a hotplug. > > IE. We unplug the device (currently removing it), reset it (or the > bus it's on, whatever applies, which lifts the fence established by > the EEH hardware), and re-plug it. > > The current method really removes the device from the PCI subsystem, > and "plugs" it back. IE. We rescan the slot, and might even end up > re-assigning resources, which I'm not too fan about. It's also unclear > what quirks gets called in that re-plug case, etc... > > I'm wondering whether it might be better to keep the pci_dev around, > just mark it blocked for user access, unbind the driver, reset, restore > the BARs to their original values, unblock and re-bind. Personally, I like the full hotplug approach, even though it will probably reassign resources, because it uses the standard paths that should be better-tested, and it's clear what should happen. Reassigning resources should be OK because we already have to do that for the non-error handling hot-add case. I know we have paths where we save config space, do a reset, and restore config space. I don't like those because some quirks aren't applied and I don't believe restoring config space is sufficient to make the device safe to use. There could be internal state that we aren't restoring. It's also possible that a firmware update means the device is different than it was before the reset, e.g., BARs could be different types or sizes. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-07-05 at 16:49 -0600, Bjorn Helgaas wrote: > I know we have paths where we save config space, do a reset, and > restore config space. I don't like those because some quirks aren't > applied and I don't believe restoring config space is sufficient to > make the device safe to use. There could be internal state that we > aren't restoring. It's also possible that a firmware update means the > device is different than it was before the reset, e.g., BARs could be > different types or sizes. True. Ok, I'll stick to the actual hotplug path then, you make good points. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 70f10fa..7167dc4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus) { } +void __weak pcibios_stop_dev(struct pci_dev *dev) +{ +} + struct pci_bus *pci_create_root_bus(struct device *parent, int bus, struct pci_ops *ops, void *sysdata, struct list_head *resources) { diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 8fc54b7..e329efc 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev) { pci_pme_active(dev, false); + pcibios_stop_dev(dev); + if (dev->is_added) { pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); diff --git a/include/linux/pci.h b/include/linux/pci.h index 3a24e4f..40df783 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -696,6 +696,7 @@ int no_pci_devices(void); void pcibios_resource_survey_bus(struct pci_bus *bus); void pcibios_add_bus(struct pci_bus *bus); void pcibios_remove_bus(struct pci_bus *bus); +void pcibios_stop_dev(struct pci_dev *dev); void pcibios_fixup_bus(struct pci_bus *); int __must_check pcibios_enable_device(struct pci_dev *, int mask); /* Architecture specific versions may override this (weak) */
When stopping and removing one specific PCI device, the platform might need take some actions. One example is that EEH already had eeh cache and eeh device attached to the PCI device, and we need release eeh cache and device during the time. The patch introduces hook pcibios_stop_dev() for the purpose. Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> --- drivers/pci/probe.c | 4 ++++ drivers/pci/remove.c | 2 ++ include/linux/pci.h | 1 + 3 files changed, 7 insertions(+), 0 deletions(-)