[v5,2/6] PCI: allow for callback to prepare nascent subdev

Message ID 20211022140714.28767-3-jim2101024@gmail.com (mailing list archive)
State Superseded
Delegated to: Lorenzo Pieralisi
Series PCI: brcmstb: have host-bridge turn on sub-device power

Commit Message

Jim Quinlan Oct. 22, 2021, 2:06 p.m. UTC
We would like the Broadcom STB PCIe root complex driver to be able to turn
off/on regulators[1] that provide power to endpoint[2] devices.  Typically,
the drivers of these endpoint devices are stock Linux drivers that are not
aware that these regulator(s) exist and must be turned on for the driver to
be probed.  The simple solution of course is to turn these regulators on at
boot and keep them on.  However, this solution does not satisfy at least
three of our usage modes:

1. For example, one customer uses multiple PCIe controllers, but wants the
ability to, by script, turn any or all of them and their subdevices off
to save power, e.g. when in battery mode.

2. Another example is when a watchdog script discovers that an endpoint
device is in an unresponsive state and would like to unbind, power toggle,
and re-bind just the PCIe endpoint and controller.

3. Of course we also want power turned off during suspend mode.  However,
some endpoint devices may be able to "wake" during suspend and we need to
recognise this case and veto the nominal act of turning off its regulator.
Such is the case with Wake-on-LAN and Wake-on-WLAN support, where the
PCIe endpoint device needs to be kept powered on in order to receive
network packets and wake up the system.

In all of these cases it is advantageous for the PCIe controller to govern
turning the regulators needed by the endpoint device off and on.  The first
two cases can be done by simply unbinding and binding the PCIe controller,
if the controller has control of these regulators.

This commit solves the "chicken-and-egg" problem by providing a callback to
the RC driver when a downstream device is "discovered" by inspecting its
DT, and allowing said driver to allocate the device object prematurely in
order to get the regulator(s) and turn them on before the device's ID is
read.

[1] These regulators typically govern the actual power supply to the
    endpoint chip.  Sometimes they may be the official PCIe socket
    power -- such as 3.3v or aux-3.3v.  Sometimes they are truly
    the regulator(s) that supply power to the EP chip.

[2] The 99% configuration of our boards is a single endpoint device
    attached to the PCIe controller.  I use the term endpoint but it could
    possibly mean a switch as well.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
---
 drivers/base/core.c    |  5 +++++
 drivers/pci/probe.c    | 47 ++++++++++++++++++++++++++++++++----------
 include/linux/device.h |  3 +++
 include/linux/pci.h    |  3 +++
 4 files changed, 47 insertions(+), 11 deletions(-)
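
The two-phase callback described in the commit message can be hard to
picture from the prose alone.  What follows is a minimal userspace sketch
of that flow, not kernel code: every mock_ name is hypothetical, the power
table stands in for real regulators, and 0x14e4 is only an illustrative
vendor ID.  It assumes, as the patch does, that a non-zero return from the
query call means "the bridge cares about this device" and that a zero
return from the second call means the prep work succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_MAX_DEVFN 8

/* Mock stand-ins; not the kernel's struct pci_dev or pci_host_bridge. */
struct mock_dev {
	int devfn;
	bool powered;
	unsigned int vendor;
};

struct mock_bridge {
	/* Same two-phase shape as pci_subdev_prepare() in this patch. */
	int (*subdev_prepare)(bool query, int devfn, struct mock_dev *dev);
};

/* Stands in for regulator state; everything starts powered off. */
static bool mock_power[MOCK_MAX_DEVFN];

/* Hypothetical callback: only devfn 0 is described in the "DT". */
static int mock_prepare(bool query, int devfn, struct mock_dev *dev)
{
	if (query)
		return devfn == 0;	/* non-zero: bridge cares */

	mock_power[devfn] = true;	/* pretend to enable a regulator */
	dev->powered = true;
	return 0;			/* zero: prep succeeded */
}

/* A config read only succeeds once the device has power. */
static bool mock_read_vendor_id(int devfn, unsigned int *id)
{
	assert(devfn >= 0 && devfn < MOCK_MAX_DEVFN);
	if (!mock_power[devfn])
		return false;
	*id = 0x14e4;			/* illustrative value */
	return true;
}

static struct mock_dev *mock_scan_device(struct mock_bridge *hb, int devfn)
{
	struct mock_dev *dev;
	unsigned int id;

	if (hb->subdev_prepare && hb->subdev_prepare(true, devfn, NULL)) {
		/* Allocate early so the callback has an object to work on. */
		dev = calloc(1, sizeof(*dev));
		if (!dev)
			return NULL;
		dev->devfn = devfn;
		if (hb->subdev_prepare(false, devfn, dev) ||
		    !mock_read_vendor_id(devfn, &id)) {
			free(dev);
			return NULL;
		}
	} else {
		/* Original ordering: read the ID first, then allocate. */
		if (!mock_read_vendor_id(devfn, &id))
			return NULL;
		dev = calloc(1, sizeof(*dev));
		if (!dev)
			return NULL;
		dev->devfn = devfn;
	}
	dev->vendor = id & 0xffff;
	return dev;
}
```

In this sketch a bridge without the callback never finds the unpowered
device, while a bridge with it powers the device first and the ID read
then succeeds.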

Comments

Greg KH Oct. 22, 2021, 2:34 p.m. UTC | #1
On Fri, Oct 22, 2021 at 10:06:55AM -0400, Jim Quinlan wrote:
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 249da496581a..62d9ac123ae5 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2864,6 +2864,10 @@ static void klist_children_put(struct klist_node *n)
>   */
>  void device_initialize(struct device *dev)
>  {
> +	/* Return if this has been called already. */
> +	if (dev->device_initialized)
> +		return;
> +

Ick, no!  Who would ever be calling this function more than once?  That
"should" be impossible.

This function should only be called by a bus when it first creates the
structure and before anything is done with it.  As the bus itself
controls the creation, it already "knows" when the structure was
initialized so it should not have to be called again.

Please fix the bus logic that requires this, it is broken.

Consider this a NACK for this patch, sorry.

greg k-h
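
One possible reading of the review comment above, as a minimal userspace
sketch rather than kernel code: the bus owns a single creation path that
initializes the object exactly once, and every later stage receives an
already-initialized object, so the core initializer needs no "already
called" guard.  All mock_ names are hypothetical, and init_calls exists
only to make the invariant visible in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock stand-ins; not the kernel's struct device or struct pci_dev. */
struct mock_device {
	int refcount;
	int init_calls;		/* instrumentation for this sketch only */
};

struct mock_pci_dev {
	struct mock_device dev;
	int devfn;
	int setup_done;
};

/* Core initializer: no guard, it trusts its caller to call it once. */
static void mock_device_initialize(struct mock_device *dev)
{
	dev->refcount = 1;
	dev->init_calls++;
}

/* Bus-owned creation path: the only caller of the initializer. */
static struct mock_pci_dev *mock_bus_create_dev(int devfn)
{
	struct mock_pci_dev *pdev = calloc(1, sizeof(*pdev));

	if (!pdev)
		return NULL;
	pdev->devfn = devfn;
	mock_device_initialize(&pdev->dev);
	return pdev;
}

/* Later stage: works on an initialized object and never re-initializes. */
static void mock_bus_setup_dev(struct mock_pci_dev *pdev)
{
	assert(pdev->dev.init_calls == 1);
	pdev->setup_done = 1;
}
```

Because creation and initialization live in one function, the question
"has this been initialized yet?" is answered by which function holds the
pointer, not by a flag stored in the object.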
Jim Quinlan Oct. 22, 2021, 3:01 p.m. UTC | #2
On Fri, Oct 22, 2021 at 10:34 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Oct 22, 2021 at 10:06:55AM -0400, Jim Quinlan wrote:
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 249da496581a..62d9ac123ae5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -2864,6 +2864,10 @@ static void klist_children_put(struct klist_node *n)
> >   */
> >  void device_initialize(struct device *dev)
> >  {
> > +     /* Return if this has been called already. */
> > +     if (dev->device_initialized)
> > +             return;
> > +
>
> Ick, no!  Who would ever be calling this function more than once?  That
> "should" be impossible.
>
> This function should only be called by a bus when it first creates the
> structure and before anything is done with it.  As the bus itself
> controls the creation, it already "knows" when the structure was
> initialized so it should not have to be called again.
>
> Please fix the bus logic that requires this, it is broken.
Got it, thanks for the prompt reply.

JimQ
>
> Consider this a NACK for this patch, sorry.
>
> greg k-h

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 249da496581a..62d9ac123ae5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2864,6 +2864,10 @@  static void klist_children_put(struct klist_node *n)
  */
 void device_initialize(struct device *dev)
 {
+	/* Return if this has been called already. */
+	if (dev->device_initialized)
+		return;
+
 	dev->kobj.kset = devices_kset;
 	kobject_init(&dev->kobj, &device_ktype);
 	INIT_LIST_HEAD(&dev->dma_pools);
@@ -2892,6 +2896,7 @@  void device_initialize(struct device *dev)
 #ifdef CONFIG_SWIOTLB
 	dev->dma_io_tlb_mem = &io_tlb_default_mem;
 #endif
+	dev->device_initialized = true;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index d9fc02a71baa..12947e972b7b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2372,27 +2372,52 @@  EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
  */
 static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 {
-	struct pci_dev *dev;
+	struct pci_host_bridge *hb = pci_find_host_bridge(bus);
+	struct pci_dev *dev = NULL;
 	u32 l;
 
-	if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
-		return NULL;
+	/*
+	 * If the host bridge has a pci_subdev_prepare() function, first
+	 * call it with true as the first argument to see if it "cares"
+	 * about this device.  A non-zero return value indicates it cares,
+	 * so in that case partially allocate some of the device and call
+	 * pci_subdev_prepare() again, with false as the first argument.
+	 * This tells it to allow the host bridge driver to pre-allocate
+	 * some resources such as voltage regulators.
+	 */
+	if (hb->pci_subdev_prepare
+	    && hb->pci_subdev_prepare(true, bus, devfn, NULL, NULL)) {
+		dev = pci_alloc_dev(bus);
+		if (!dev)
+			return NULL;
 
-	dev = pci_alloc_dev(bus);
-	if (!dev)
-		return NULL;
+		dev->devfn = devfn;
+		device_initialize(&dev->dev);
 
+		/* Call again, this time for actual prep work */
+		if (hb->pci_subdev_prepare(false, bus, devfn, hb, dev)
+		    || !pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
+			goto err_out;
+	} else {
+		if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
+			return NULL;
+		dev = pci_alloc_dev(bus);
+		if (!dev)
+			return NULL;
+	}
 	dev->devfn = devfn;
 	dev->vendor = l & 0xffff;
 	dev->device = (l >> 16) & 0xffff;
 
-	if (pci_setup_device(dev)) {
-		pci_bus_put(dev->bus);
-		kfree(dev);
-		return NULL;
-	}
+	if (pci_setup_device(dev))
+		goto err_out;
 
 	return dev;
+
+err_out:
+	pci_bus_put(dev->bus);
+	kfree(dev);
+	return NULL;
 }
 
 void pcie_report_downtraining(struct pci_dev *dev)
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..cf175684a270 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -461,6 +461,8 @@  struct dev_links_info {
  *		and optionall (if the coherent mask is large enough) also
  *		for dma allocations.  This flag is managed by the dma ops
  *		instance from ->dma_supported.
+ * @device_initialized: true if device_initialize(dev) has already been
+ *		invoked, false otherwise.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -575,6 +577,7 @@  struct device {
 #ifdef CONFIG_DMA_OPS_BYPASS
 	bool			dma_ops_bypass : 1;
 #endif
+	bool			device_initialized : 1;
 };
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cd8aa6fce204..7a72b3af1e33 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -552,6 +552,9 @@  struct pci_host_bridge {
 	u8 (*swizzle_irq)(struct pci_dev *, u8 *); /* Platform IRQ swizzler */
 	int (*map_irq)(const struct pci_dev *, u8, u8);
 	void (*release_fn)(struct pci_host_bridge *);
+	int (*pci_subdev_prepare)(bool query, struct pci_bus *bus, int devfn,
+				  struct pci_host_bridge *hb,
+				  struct pci_dev *pdev);
 	void		*release_data;
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */