diff mbox series

[01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method

Message ID 20250401153225.96379-2-anshuman.gupta@intel.com (mailing list archive)
State New
Headers show
Series VRAM Self Refresh | expand

Commit Message

Anshuman Gupta April 1, 2025, 3:32 p.m. UTC
Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
section 4.6.10 Rev 3.3.

Note that this implementation assumes only a single device below the
Downstream Port will request for Aux Power Limit under a given
Root Port because it does not track and aggregate requests
from all child devices below the Downstream Port as required
by Section 4.6.10 Rev 3.3.

One possible mitigation would be only allowing only first PCIe
Non-Bridge Endpoint Function 0 driver to call_DSM method 10.

Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h |  6 ++++
 2 files changed, 84 insertions(+)

Comments

Bjorn Helgaas April 1, 2025, 6:25 p.m. UTC | #1
On Tue, Apr 01, 2025 at 09:02:14PM +0530, Anshuman Gupta wrote:
> Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
> section 4.6.10 Rev 3.3.

Thanks for splitting this into two patches.  But I think now this only
implements function 10 (0x0a), so this sentence needs to be updated.

I would write this consistently as "0x0a" or "0Ah" to match the spec
description.  I don't think the spec ever uses "10".

> Note that this implementation assumes only a single device below the
> Downstream Port will request for Aux Power Limit under a given
> Root Port because it does not track and aggregate requests
> from all child devices below the Downstream Port as required
> by Section 4.6.10 Rev 3.3.
> 
> One possible mitigation would be only allowing only first PCIe
> Non-Bridge Endpoint Function 0 driver to call_DSM method 10.
> 
> Signed-off-by: Varun Gupta <varun.gupta@intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>  drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-acpi.h |  6 ++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index af370628e583..ebd49e43457e 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>  	ACPI_FREE(obj);
>  }
>  
> +/**
> + * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
> + * @dev: PCI device instance
> + * @requested_power: Requested auxiliary power in milliwatts
> + *
> + * This function sends a request to the host BIOS via ACPI _DSM Function 10
> + * to grant the required D3Cold Auxiliary power for the specified PCI device.
> + * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
> + * power and handles the response accordingly.
> + *
> + * This function shall be only called by 1st non-bridge Endpoint driver
> + * on Function 0. For a Multi-Function Device, the driver for Function 0 is
> + * required to report an aggregate power requirement covering all
> + * functions contained within the device.
> + *
> + * Return: Returns 0 on success and errno on failure.

Write all this in imperative mood, e.g.,

  Request auxiliary power while device is in D3cold ...

  Return 0 on success ...

> + */
> +int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
> +{
> +	union acpi_object in_obj = {
> +		.integer.type = ACPI_TYPE_INTEGER,
> +		.integer.value = requested_power,
> +	};
> +
> +	union acpi_object *out_obj;
> +	acpi_handle handle;
> +	int result, ret = -EINVAL;
> +
> +	if (!dev || !ACPI_HANDLE(&dev->dev))
> +		return -EINVAL;
> +
> +	handle = ACPI_HANDLE(&dev->dev);

This needs an acpi_check_dsm() call to find out whether the platform
supports DSM_PCI_D3COLD_AUX_POWER_LIMIT.

We have several _DSM calls that *should* do this, but unfortunately
they don't do it yet, so they're not good examples to copy.

> +	out_obj = acpi_evaluate_dsm_typed(handle,
> +					  &pci_acpi_dsm_guid,
> +					  4,
> +					  DSM_PCI_D3COLD_AUX_POWER_LIMIT,
> +					  &in_obj,
> +					  ACPI_TYPE_INTEGER);
> +	if (!out_obj)
> +		return -EINVAL;
> +
> +	result = out_obj->integer.value;
> +
> +	switch (result) {
> +	case 0x0:
> +		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
> +			requested_power);

Use pci_dbg(dev), pci_info(dev), etc.

> +		break;
> +	case 0x1:
> +		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
> +			 requested_power);
> +		ret = 0;
> +		break;
> +	case 0x2:
> +		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
> +		ret = -EBUSY;
> +		break;
> +	default:
> +		if (result >= 0x11 && result <= 0x1F) {
> +			int retry_interval = result & 0xF;
> +
> +			dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
> +				 "Interval: %d seconds.\n", retry_interval);
> +			msleep(retry_interval * 1000);

It doesn't seem right to me to do this sleep internally because it
means this interface might take up to 15 seconds to return, and the
caller has no idea what is happening during that time.

This seems like it should be done in the driver, which can decide
*whether* it wants to sleep, and if it does sleep, it may give a nice
indication to the user.

Of course, that would mean returning some kind of retry interval
information to the caller.

> +			ret = -EAGAIN;
> +		} else {
> +			dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
> +				"unsupported response: 0x%x.\n", result);

Drop periods at end of messages.

> +		}
> +		break;
> +	}
> +
> +	ACPI_FREE(out_obj);
> +	return ret;
> +}
> +EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
Rafael J. Wysocki April 2, 2025, 10:59 a.m. UTC | #2
On Tue, Apr 1, 2025 at 8:25 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Apr 01, 2025 at 09:02:14PM +0530, Anshuman Gupta wrote:
> > Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
> > section 4.6.10 Rev 3.3.
>
> Thanks for splitting this into two patches.  But I think now this only
> implements function 10 (0x0a), so this sentence needs to be updated.
>
> I would write this consistently as "0x0a" or "0Ah" to match the spec
> description.  I don't think the spec ever uses "10".
>
> > Note that this implementation assumes only a single device below the
> > Downstream Port will request for Aux Power Limit under a given
> > Root Port because it does not track and aggregate requests
> > from all child devices below the Downstream Port as required
> > by Section 4.6.10 Rev 3.3.

Why is it regarded as compliant, then?

Request aggregation is a known problem that has been addressed for a
couple of times (at least) in the kernel.  Why is it too hard to
address it here?

> > One possible mitigation would be only allowing only first PCIe
> > Non-Bridge Endpoint Function 0 driver to call_DSM method 10.

What about topologies where there is a switch with multiple downstream
ports below the Root Port?

> >
> > Signed-off-by: Varun Gupta <varun.gupta@intel.com>
> > Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci-acpi.h |  6 ++++
> >  2 files changed, 84 insertions(+)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > index af370628e583..ebd49e43457e 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
> >       ACPI_FREE(obj);
> >  }
> >
> > +/**
> > + * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
> > + * @dev: PCI device instance
> > + * @requested_power: Requested auxiliary power in milliwatts

How's the caller supposed to find out what value to use here?

> > + *
> > + * This function sends a request to the host BIOS via ACPI _DSM Function 10
> > + * to grant the required D3Cold Auxiliary power for the specified PCI device.
> > + * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
> > + * power and handles the response accordingly.
> > + *
> > + * This function shall be only called by 1st non-bridge Endpoint driver
> > + * on Function 0. For a Multi-Function Device, the driver for Function 0 is
> > + * required to report an aggregate power requirement covering all
> > + * functions contained within the device.
> > + *
> > + * Return: Returns 0 on success and errno on failure.
>
> Write all this in imperative mood, e.g.,
>
>   Request auxiliary power while device is in D3cold ...
>
>   Return 0 on success ...
>
> > + */
> > +int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
> > +{
> > +     union acpi_object in_obj = {
> > +             .integer.type = ACPI_TYPE_INTEGER,
> > +             .integer.value = requested_power,
> > +     };
> > +
> > +     union acpi_object *out_obj;
> > +     acpi_handle handle;
> > +     int result, ret = -EINVAL;
> > +
> > +     if (!dev || !ACPI_HANDLE(&dev->dev))
> > +             return -EINVAL;
> > +
> > +     handle = ACPI_HANDLE(&dev->dev);

Well, ACPI_HANDLE() is not a simple macro, so I'd rather avoid using
it twice in a row for the same device.

What about

if (!dev)
        return -EINVAL;

handle = ACPI_HANDLE(&dev->dev);
if (!handle)
        return -EINVAL;

>
> This needs an acpi_check_dsm() call to find out whether the platform
> supports DSM_PCI_D3COLD_AUX_POWER_LIMIT.

Absolutely.

> We have several _DSM calls that *should* do this, but unfortunately
> they don't do it yet, so they're not good examples to copy.

Right.

> > +     out_obj = acpi_evaluate_dsm_typed(handle,
> > +                                       &pci_acpi_dsm_guid,
> > +                                       4,
> > +                                       DSM_PCI_D3COLD_AUX_POWER_LIMIT,
> > +                                       &in_obj,
> > +                                       ACPI_TYPE_INTEGER);
> > +     if (!out_obj)
> > +             return -EINVAL;
> > +
> > +     result = out_obj->integer.value;
> > +
> > +     switch (result) {
> > +     case 0x0:
> > +             dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
> > +                     requested_power);
>
> Use pci_dbg(dev), pci_info(dev), etc.
>
> > +             break;
> > +     case 0x1:
> > +             dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
> > +                      requested_power);
> > +             ret = 0;
> > +             break;
> > +     case 0x2:
> > +             dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
> > +             ret = -EBUSY;
> > +             break;
> > +     default:
> > +             if (result >= 0x11 && result <= 0x1F) {
> > +                     int retry_interval = result & 0xF;
> > +
> > +                     dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
> > +                              "Interval: %d seconds.\n", retry_interval);
> > +                     msleep(retry_interval * 1000);
>
> It doesn't seem right to me to do this sleep internally because it
> means this interface might take up to 15 seconds to return, and the
> caller has no idea what is happening during that time.

I entirely agree here.

If this is going to happen during system suspend, sleeping for seconds
is a total no-go.

> This seems like it should be done in the driver, which can decide
> *whether* it wants to sleep, and if it does sleep, it may give a nice
> indication to the user.

So during a system suspend in progress, this is rather hard to achieve.

I'd say that if the request is not granted right away, it is a failure
and we don't use D3cold.

> Of course, that would mean returning some kind of retry interval
> information to the caller.
>
> > +                     ret = -EAGAIN;
> > +             } else {
> > +                     dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
> > +                             "unsupported response: 0x%x.\n", result);
>
> Drop periods at end of messages.
>
> > +             }
> > +             break;
> > +     }
> > +
> > +     ACPI_FREE(out_obj);
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
diff mbox series

Patch

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index af370628e583..ebd49e43457e 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1421,6 +1421,84 @@  static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 	ACPI_FREE(obj);
 }
 
+/**
+ * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
+ * @dev: PCI device instance
+ * @requested_power: Requested auxiliary power in milliwatts
+ *
+ * This function sends a request to the host BIOS via ACPI _DSM Function 10
+ * to grant the required D3Cold Auxiliary power for the specified PCI device.
+ * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
+ * power and handles the response accordingly.
+ *
+ * This function shall be only called by 1st non-bridge Endpoint driver
+ * on Function 0. For a Multi-Function Device, the driver for Function 0 is
+ * required to report an aggregate power requirement covering all
+ * functions contained within the device.
+ *
+ * Return: Returns 0 on success and errno on failure.
+ */
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
+{
+	union acpi_object in_obj = {
+		.integer.type = ACPI_TYPE_INTEGER,
+		.integer.value = requested_power,
+	};
+
+	union acpi_object *out_obj;
+	acpi_handle handle;
+	int result, ret = -EINVAL;
+
+	if (!dev || !ACPI_HANDLE(&dev->dev))
+		return -EINVAL;
+
+	handle = ACPI_HANDLE(&dev->dev);
+
+	out_obj = acpi_evaluate_dsm_typed(handle,
+					  &pci_acpi_dsm_guid,
+					  4,
+					  DSM_PCI_D3COLD_AUX_POWER_LIMIT,
+					  &in_obj,
+					  ACPI_TYPE_INTEGER);
+	if (!out_obj)
+		return -EINVAL;
+
+	result = out_obj->integer.value;
+
+	switch (result) {
+	case 0x0:
+		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
+			requested_power);
+		break;
+	case 0x1:
+		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
+			 requested_power);
+		ret = 0;
+		break;
+	case 0x2:
+		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
+		ret = -EBUSY;
+		break;
+	default:
+		if (result >= 0x11 && result <= 0x1F) {
+			int retry_interval = result & 0xF;
+
+			dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
+				 "Interval: %d seconds.\n", retry_interval);
+			msleep(retry_interval * 1000);
+			ret = -EAGAIN;
+		} else {
+			dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
+				"unsupported response: 0x%x.\n", result);
+		}
+		break;
+	}
+
+	ACPI_FREE(out_obj);
+	return ret;
+}
+EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
+
 static void pci_acpi_set_external_facing(struct pci_dev *dev)
 {
 	u8 val;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 078225b514d4..dbc4b0ed433c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -121,6 +121,7 @@  extern const guid_t pci_acpi_dsm_guid;
 #define DSM_PCI_DEVICE_NAME			0x07
 #define DSM_PCI_POWER_ON_RESET_DELAY		0x08
 #define DSM_PCI_DEVICE_READINESS_DURATIONS	0x09
+#define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
 
 #ifdef CONFIG_PCIE_EDR
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
@@ -132,10 +133,15 @@  static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
 
 int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
 void pci_acpi_clear_companion_lookup_hook(void);
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
+{
+	return -EOPNOTSUPP;
+}
 #endif	/* CONFIG_ACPI */
 
 #endif	/* _PCI_ACPI_H_ */