Message ID | 20240409160256.94184-4-dave.jiang@intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: Add Secondary Bus Reset (SBR) support for CXL | expand |
Dave Jiang wrote: > CXL spec r3.1 8.1.5.2 > By default Secondary Bus Reset (SBR) is masked for CXL ports. Introduce a > new PCI reset method "cxl_bus" to force SBR on CXL ports by setting > the unmask SBR bit in the CXL DVSEC port control register before performing > the bus reset and restore the original value of the bit post reset. The > new reset method allows the user to intentionally perform SBR on a CXL > device without needing to set the "Unmask SBR" bit via a user tool. > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > v4: > - Use pci_dev_lock guard() on bridge. (Dan) > --- > drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++++++++++ > include/linux/pci.h | 2 +- > 2 files changed, 40 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 570b00fe10f7..3b4f7b369287 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4982,6 +4982,44 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe) > return pci_parent_bus_reset(dev, probe); > } > > +static int cxl_reset_bus_function(struct pci_dev *dev, bool probe) > +{ > + struct pci_dev *bridge; > + u16 dvsec, reg, val; > + int rc; > + > + bridge = pci_upstream_bridge(dev); > + if (!bridge) > + return -ENOTTY; > + > + dvsec = cxl_port_dvsec(bridge); > + if (!dvsec) > + return -ENOTTY; > + > + if (probe) > + return 0; > + > + guard(pci_dev)(bridge); My concern here is this sets up a pci_reset_function() locking order of: pci_dev_lock(@dev) pci_dev_lock(pci_upstream_bridge(@dev)) Compare that to pci_reset_bus() that does: pci_dev_lock(pci_upstream_bridge(@dev)) pci_dev_lock(@dev) This also highlights that the pci_dev_lock() performed by pci_reset_function() has long been insufficient for the pci_reset_bus_function() method case that could race userspace when pci_reset_secondary_bus() is manipulating the bridge control register. So, if the goal of the lock is to prevent userspace from clobbering the kernel's read-modify-write cycles of @dev's parent bridge, then the lock needs to be held over both the cxl_reset_bus_function() and the pci_reset_bus_function() cases, and it needs to be taken in upstream-bridge => endpoint order.
On Fri, Apr 26, 2024 at 12:46:45PM -0700, Dan Williams wrote: > This also highlights that the pci_dev_lock() performed by > pci_reset_function() has long been insufficient for the > pci_reset_bus_function() method case that could race userspace when > pci_reset_secondary_bus() is manipulating the bridge control register. > > So, if the goal of the lock is to prevent userspace from clobbering the > kernel's read-modify-write cycles of @dev's parent bridge, then the lock > needs to be held over both the cxl_reset_bus_function() and the > pci_reset_bus_function() cases, and it needs to be taken in > upstream-bridge => endpoint order. No, the device lock is taken to prevent the driver from unbinding. It has nothing to do with protecting RMW of parent bridge registers. Here's Christoph Hellwig's explanation why he introduced acquisition of the device lock in the PCI reset code paths: https://lore.kernel.org/all/20200325104018.GA30853@lst.de/ TL;DR: The PCI core calls the driver's ->reset_prepare and ->reset_done callbacks and the driver needs to be held in place for that. Thanks, Lukas
Lukas Wunner wrote: > On Fri, Apr 26, 2024 at 12:46:45PM -0700, Dan Williams wrote: > > This also highlights that the pci_dev_lock() performed by > > pci_reset_function() has long been insufficient for the > > pci_reset_bus_function() method case that could race userspace when > > pci_reset_secondary_bus() is manipulating the bridge control register. > > > > So, if the goal of the lock is to prevent userspace from clobbering the > > kernel's read-modify-write cycles of @dev's parent bridge, then the lock > > needs to be held over both the cxl_reset_bus_function() and the > > pci_reset_bus_function() cases, and it needs to be taken in > > upstream-bridge => endpoint order. > > No, the device lock is taken to prevent the driver from unbinding. > It has nothing to do with protecting RMW of parent bridge registers. > > Here's Christoph Hellwig's explanation why he introduced acquisition > of the device lock in the PCI reset code paths: > > https://lore.kernel.org/all/20200325104018.GA30853@lst.de/ > > TL;DR: The PCI core calls the driver's ->reset_prepare and ->reset_done > callbacks and the driver needs to be held in place for that. Yes, that's what device_lock() is for, *but* that's not what pci_cfg_access_lock() is for. So when I say that pci_dev_lock() is to protect read-modify-write configuration cycle access of an endpoint to be reset, I am talking about pci_cfg_access_lock() not device_lock(). For better or worse pci_dev_lock() does both, and maybe device_lock() should be open-coded separately. So there is a discrepancy when it comes to protecting manipulation of secondary bus reset control registers of a bridge. The pci_cfg_access_lock() protection is applied to the bridge in the pci_reset_bus() case, but not the pci_reset_bus_function() case.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 570b00fe10f7..3b4f7b369287 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4982,6 +4982,44 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe) return pci_parent_bus_reset(dev, probe); } +static int cxl_reset_bus_function(struct pci_dev *dev, bool probe) +{ + struct pci_dev *bridge; + u16 dvsec, reg, val; + int rc; + + bridge = pci_upstream_bridge(dev); + if (!bridge) + return -ENOTTY; + + dvsec = cxl_port_dvsec(bridge); + if (!dvsec) + return -ENOTTY; + + if (probe) + return 0; + + guard(pci_dev)(bridge); + rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, ®); + if (rc) + return -ENOTTY; + + if (!(reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)) { + val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR; + pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, + val); + } else { + val = reg; + } + + rc = pci_reset_bus_function(dev, probe); + + if (reg != val) + pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, reg); + + return rc; +} + void pci_dev_lock(struct pci_dev *dev) { /* block PM suspend, driver probe, etc. */ @@ -5066,6 +5104,7 @@ static const struct pci_reset_fn_method pci_reset_fn_methods[] = { { pci_af_flr, .name = "af_flr" }, { pci_pm_reset, .name = "pm" }, { pci_reset_bus_function, .name = "bus" }, + { cxl_reset_bus_function, .name = "cxl_bus" }, }; static ssize_t reset_method_show(struct device *dev, diff --git a/include/linux/pci.h b/include/linux/pci.h index 16493426a04f..235f37715a43 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -51,7 +51,7 @@ PCI_STATUS_PARITY) /* Number of reset methods used in pci_reset_fn_methods array in pci.c */ -#define PCI_NUM_RESET_METHODS 7 +#define PCI_NUM_RESET_METHODS 8 #define PCI_RESET_PROBE true #define PCI_RESET_DO_RESET false