
[v2,3/3] PCI: qcom: properly implement RC shutdown/power up

Message ID 20240210-topic-8280_pcie-v2-3-1cef4b606883@linaro.org
State Superseded
Series Qualcomm PCIe RC shutdown & reinit

Commit Message

Konrad Dybcio Feb. 10, 2024, 5:10 p.m. UTC
Currently, we've only been minimizing the power draw while keeping the
RC up at all times. This is suboptimal, as it draws a whole lot of power
and prevents the SoC from power collapsing.

Implement full shutdown and re-initialization to allow for powering off
the controller.

This is mainly indended for SC8280XP with a broken power rail setup,
which requires a full RC shutdown/reinit in order to reach SoC-wide
power collapse, but sleeping is generally better than not sleeping and
less destructive suspend can be implemented later for platforms that
support it.

Co-developed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/pci/controller/dwc/Kconfig     |   1 +
 drivers/pci/controller/dwc/pcie-qcom.c | 159 ++++++++++++++++++++++++++-------
 2 files changed, 126 insertions(+), 34 deletions(-)

Comments

Bjorn Helgaas Feb. 12, 2024, 9:32 p.m. UTC | #1
"Properly" is a noise word that suggests "we're doing it right this
time" but doesn't hint at what actually makes this better.

On Sat, Feb 10, 2024 at 06:10:07PM +0100, Konrad Dybcio wrote:
> Currently, we've only been minimizing the power draw while keeping the
> RC up at all times. This is suboptimal, as it draws a whole lot of power
> and prevents the SoC from power collapsing.

Is "power collapse" a technical term specific to this device, or is
there some more common term that could be used?  I assume the fact
that the RC remains powered precludes some lower power state of the
entire SoC?

> Implement full shutdown and re-initialization to allow for powering off
> the controller.
> 
> This is mainly indended for SC8280XP with a broken power rail setup,
> which requires a full RC shutdown/reinit in order to reach SoC-wide
> power collapse, but sleeping is generally better than not sleeping and
> less destructive suspend can be implemented later for platforms that
> support it.

s/indended/intended/

>  config PCIE_QCOM
>  	bool "Qualcomm PCIe controller (host mode)"
>  	depends on OF && (ARCH_QCOM || COMPILE_TEST)
> +	depends on QCOM_COMMAND_DB || QCOM_COMMAND_DB=n

Just out of curiosity since I'm not a Kconfig expert, what does
"depends on X || X=n" mean?  

I guess it's different from
"depends on (QCOM_COMMAND_DB || !QCOM_COMMAND_DB)", which I also see
used for QCOM_RPMH?

Does this reduce compile testing?  I see COMPILE_TEST mentioned in a
few other QCOM_COMMAND_DB dependencies.

> +	ret_l23 = readl_poll_timeout(pcie->parf + PARF_PM_STTS, val,
> +				     val & PM_ENTER_L23, 10000, 100000);

Are these timeout values rooted in some PCIe or Qcom spec?  Would be
nice to have a spec citation or other reason for choosing these
values.

> +	reset_control_assert(res->rst);
> +	usleep_range(2000, 2500);

Ditto, some kind of citation would be nice.

Bjorn
Konrad Dybcio Feb. 14, 2024, 9:33 p.m. UTC | #2
On 12.02.2024 22:32, Bjorn Helgaas wrote:
> "Properly" is a noise word that suggests "we're doing it right this
> time" but doesn't hint at what actually makes this better.
> 
> On Sat, Feb 10, 2024 at 06:10:07PM +0100, Konrad Dybcio wrote:
>> Currently, we've only been minimizing the power draw while keeping the
>> RC up at all times. This is suboptimal, as it draws a whole lot of power
>> and prevents the SoC from power collapsing.
> 
> Is "power collapse" a technical term specific to this device, or is
> there some more common term that could be used?  I assume the fact
> that the RC remains powered precludes some lower power state of the
> entire SoC?

That's spot on, "power collapse" commonly refers to shutting down as many
parts of the SoC as possible, in order to achieve milliwatt-order power draw.


> 
>> Implement full shutdown and re-initialization to allow for powering off
>> the controller.
>>
>> This is mainly indended for SC8280XP with a broken power rail setup,
>> which requires a full RC shutdown/reinit in order to reach SoC-wide
>> power collapse, but sleeping is generally better than not sleeping and
>> less destructive suspend can be implemented later for platforms that
>> support it.
> 
> s/indended/intended/
> 
>>  config PCIE_QCOM
>>  	bool "Qualcomm PCIe controller (host mode)"
>>  	depends on OF && (ARCH_QCOM || COMPILE_TEST)
>> +	depends on QCOM_COMMAND_DB || QCOM_COMMAND_DB=n
> 
> Just out of curiosity since I'm not a Kconfig expert, what does
> "depends on X || X=n" mean?  

"not a module"

> 
> I guess it's different from
> "depends on (QCOM_COMMAND_DB || !QCOM_COMMAND_DB)", which I also see
> used for QCOM_RPMH?

Yep
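
For readers less familiar with Kconfig, here is a minimal sketch in C of how
the "depends on X || X=n" expression asked about above evaluates under the
tristate rules described in kconfig-language.rst (n=0, m=1, y=2; "||" is
max; "A=B" yields y when both values match, n otherwise). The enum and the
function names are made up for illustration, and how a bool symbol such as
PCIE_QCOM treats an "m" result is not modeled here.

```c
#include <assert.h>

/* Kconfig tristate values, ordered as in kconfig-language.rst. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "A || B" evaluates to max(A, B). */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "A=B" evaluates to y when both values are equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/* Hypothetical helper: value of "depends on X || X=n" for a given X. */
static enum tristate dep_x_or_x_eq_n(enum tristate x)
{
	assert(x <= TRI_Y);
	return tri_or(x, tri_eq(x, TRI_N));
}
```

Under these rules the expression is y when X is n or y, and m when X is m,
so a tristate dependent could be at most a module while X itself is a module.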

> 
> Does this reduce compile testing?  I see COMPILE_TEST mentioned in a
> few other QCOM_COMMAND_DB dependencies.

I can add "&& COMPILE_TEST", yeah

> 
>> +	ret_l23 = readl_poll_timeout(pcie->parf + PARF_PM_STTS, val,
>> +				     val & PM_ENTER_L23, 10000, 100000);
> 
> Are these timeout values rooted in some PCIe or Qcom spec?  Would be
> nice to have a spec citation or other reason for choosing these
> values.
> 
>> +	reset_control_assert(res->rst);
>> +	usleep_range(2000, 2500);
> 
> Ditto, some kind of citation would be nice.

Both are magic values coming from the Qualcomm BSP, which we suppose
we can safely assume (and that's a two-level assumption at this
point, I know..) are going to work fine, as they do on millions
of shipped devices.

Maybe Mani or Bjorn A can find something interesting in the documentation.
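
To make the two numbers concrete: readl_poll_timeout(addr, val, cond,
sleep_us, timeout_us) re-reads the register until cond holds or timeout_us
has elapsed, and returns 0 or -ETIMEDOUT. Below is a simplified userspace
model of that loop, not the kernel implementation. The fake device, the
simulated clock and all names except PM_ENTER_L23 are assumptions made for
illustration, and the model always sleeps the full sleep_us, whereas the
kernel helper may sleep for less.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PM_ENTER_L23	(1u << 5)	/* same bit position as in the patch */

/* Hypothetical fake device: the status bit shows up at ready_at_us. */
struct fake_dev {
	uint64_t now_us;	/* simulated clock, advanced by the poll loop */
	uint64_t ready_at_us;	/* time at which PM_ENTER_L23 becomes set */
	unsigned int reads;	/* number of register reads performed */
};

static uint32_t fake_read_stts(struct fake_dev *d)
{
	d->reads++;
	return d->now_us >= d->ready_at_us ? PM_ENTER_L23 : 0;
}

/*
 * Simplified poll loop: read, test the condition, give up (after one
 * last read) once the deadline has passed, otherwise sleep and retry.
 */
static int poll_l23(struct fake_dev *d, uint64_t sleep_us, uint64_t timeout_us)
{
	uint64_t deadline = d->now_us + timeout_us;
	uint32_t val;

	assert(sleep_us > 0);

	for (;;) {
		val = fake_read_stts(d);
		if (val & PM_ENTER_L23)
			return 0;
		if (d->now_us > deadline) {
			val = fake_read_stts(d);
			return (val & PM_ENTER_L23) ? 0 : -ETIMEDOUT;
		}
		d->now_us += sleep_us;	/* stands in for usleep_range() */
	}
}
```

Calling poll_l23(&dev, 10000, 100000) mirrors the values in the patch: the
status register is sampled roughly every 10 ms, and the link gets about
100 ms to report L2/L3 entry before the call fails with -ETIMEDOUT.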

Konrad
Johan Hovold Feb. 15, 2024, 7:13 a.m. UTC | #3
On Wed, Feb 14, 2024 at 10:33:19PM +0100, Konrad Dybcio wrote:
> On 12.02.2024 22:32, Bjorn Helgaas wrote:
> > "Properly" is a noise word that suggests "we're doing it right this
> > time" but doesn't hint at what actually makes this better.
> > 
> > On Sat, Feb 10, 2024 at 06:10:07PM +0100, Konrad Dybcio wrote:
> >> Currently, we've only been minimizing the power draw while keeping the
> >> RC up at all times. This is suboptimal, as it draws a whole lot of power
> >> and prevents the SoC from power collapsing.
> > 
> > Is "power collapse" a technical term specific to this device, or is
> > there some more common term that could be used?  I assume the fact
> > that the RC remains powered precludes some lower power state of the
> > entire SoC?
> 
> That's spot on, "power collapse" commonly refers to shutting down as many
> parts of the SoC as possible, in order to achieve milliwatt-order power draw.

I'm pretty sure "power collapse" is a Qualcomm-ism, so better to use
common terminology as Bjorn suggested, and maybe put the Qualcomm
wording in parentheses or similar.

Johan
Konrad Dybcio Feb. 15, 2024, 10:22 a.m. UTC | #4
On 15.02.2024 08:13, Johan Hovold wrote:
> On Wed, Feb 14, 2024 at 10:33:19PM +0100, Konrad Dybcio wrote:
>> On 12.02.2024 22:32, Bjorn Helgaas wrote:
>>> "Properly" is a noise word that suggests "we're doing it right this
>>> time" but doesn't hint at what actually makes this better.
>>>
>>> On Sat, Feb 10, 2024 at 06:10:07PM +0100, Konrad Dybcio wrote:
>>>> Currently, we've only been minimizing the power draw while keeping the
>>>> RC up at all times. This is suboptimal, as it draws a whole lot of power
>>>> and prevents the SoC from power collapsing.
>>>
>>> Is "power collapse" a technical term specific to this device, or is
>>> there some more common term that could be used?  I assume the fact
>>> that the RC remains powered precludes some lower power state of the
>>> entire SoC?
>>
>> That's spot on, "power collapse" commonly refers to shutting down as many
>> parts of the SoC as possible, in order to achieve milliwatt-order power draw.
> 
> I'm pretty sure "power collapse" is a Qualcomm:ism so better to use
> common terminology as Bjorn suggested, and maybe put the Qualcomm
> wording in parenthesis or similar.

Ok, I keep hearing it so much that I had previously assumed it's the
standard way of describing it.. I'll reword this.

Konrad
Krishna Chaitanya Chundru Feb. 20, 2024, 4:12 a.m. UTC | #5
On 2/10/2024 10:40 PM, Konrad Dybcio wrote:
> Currently, we've only been minimizing the power draw while keeping the
> RC up at all times. This is suboptimal, as it draws a whole lot of power
> and prevents the SoC from power collapsing.
> 
> Implement full shutdown and re-initialization to allow for powering off
> the controller.
> 
> This is mainly indended for SC8280XP with a broken power rail setup,
> which requires a full RC shutdown/reinit in order to reach SoC-wide
> power collapse, but sleeping is generally better than not sleeping and
> less destructive suspend can be implemented later for platforms that
> support it.
> 
> Co-developed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
>   drivers/pci/controller/dwc/Kconfig     |   1 +
>   drivers/pci/controller/dwc/pcie-qcom.c | 159 ++++++++++++++++++++++++++-------
>   2 files changed, 126 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> index 8afacc90c63b..4ce266951cb6 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -268,6 +268,7 @@ config PCIE_DW_PLAT_EP
>   config PCIE_QCOM
>   	bool "Qualcomm PCIe controller (host mode)"
>   	depends on OF && (ARCH_QCOM || COMPILE_TEST)
> +	depends on QCOM_COMMAND_DB || QCOM_COMMAND_DB=n
>   	depends on PCI_MSI
>   	select PCIE_DW_HOST
>   	select CRC8
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index 6a469ed213ce..c807833ee4a7 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -30,13 +30,18 @@
>   #include <linux/reset.h>
>   #include <linux/slab.h>
>   #include <linux/types.h>
> +#include <soc/qcom/cmd-db.h>
>   
>   #include "../../pci.h"
>   #include "pcie-designware.h"
>   
> +#include <dt-bindings/interconnect/qcom,icc.h>
> +#include <dt-bindings/interconnect/qcom,rpm-icc.h>
> +
>   /* PARF registers */
>   #define PARF_SYS_CTRL				0x00
>   #define PARF_PM_CTRL				0x20
> +#define PARF_PM_STTS				0x24
>   #define PARF_PCS_DEEMPH				0x34
>   #define PARF_PCS_SWING				0x38
>   #define PARF_PHY_CTRL				0x40
> @@ -80,7 +85,10 @@
>   #define L1_CLK_RMV_DIS				BIT(1)
>   
>   /* PARF_PM_CTRL register fields */
> -#define REQ_NOT_ENTR_L1				BIT(5)
> +#define REQ_NOT_ENTR_L1				BIT(5) /* "Prevent L0->L1" */
> +
> +/* PARF_PM_STTS register fields */
> +#define PM_ENTER_L23				BIT(5)
>   
>   /* PARF_PCS_DEEMPH register fields */
>   #define PCS_DEEMPH_TX_DEEMPH_GEN1(x)		FIELD_PREP(GENMASK(21, 16), x)
> @@ -122,6 +130,7 @@
>   
>   /* ELBI_SYS_CTRL register fields */
>   #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
> +#define ELBI_SYS_CTRL_PME_TURNOFF_MSG		BIT(4)
>   
>   /* AXI_MSTR_RESP_COMP_CTRL0 register fields */
>   #define CFG_REMOTE_RD_REQ_BRIDGE_SIZE_2K	0x4
> @@ -243,6 +252,7 @@ struct qcom_pcie {
>   	const struct qcom_pcie_cfg *cfg;
>   	struct dentry *debugfs;
>   	bool suspended;
> +	bool soc_is_rpmh;
>   };
>   
>   #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
> @@ -272,6 +282,24 @@ static int qcom_pcie_start_link(struct dw_pcie *pci)
>   	return 0;
>   }
>   
> +static int qcom_pcie_stop_link(struct dw_pcie *pci)
> +{
> +	struct qcom_pcie *pcie = to_qcom_pcie(pci);
> +	u32 ret_l23, val;
> +
> +	writel(ELBI_SYS_CTRL_PME_TURNOFF_MSG, pcie->elbi + ELBI_SYS_CTRL);
> +	readl(pcie->elbi + ELBI_SYS_CTRL);
> +
> +	ret_l23 = readl_poll_timeout(pcie->parf + PARF_PM_STTS, val,
> +				     val & PM_ENTER_L23, 10000, 100000);
> +	if (ret_l23) {
> +		dev_err(pci->dev, "Failed to enter L2/L3\n");
> +		return -ETIMEDOUT;
> +	}
> +
> +	return 0;
> +}
> +
>   static void qcom_pcie_clear_hpc(struct dw_pcie *pci)
>   {
>   	u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
> @@ -987,9 +1015,19 @@ static void qcom_pcie_host_post_init_2_7_0(struct qcom_pcie *pcie)
>   static void qcom_pcie_deinit_2_7_0(struct qcom_pcie *pcie)
>   {
>   	struct qcom_pcie_resources_2_7_0 *res = &pcie->res.v2_7_0;
> +	u32 val;
> +
> +	/* Disable PCIe clocks and resets */
> +	val = readl(pcie->parf + PARF_PHY_CTRL);
> +	val |= PHY_TEST_PWR_DOWN;
> +	writel(val, pcie->parf + PARF_PHY_CTRL);
> +	readl(pcie->parf + PARF_PHY_CTRL);
>   
>   	clk_bulk_disable_unprepare(res->num_clks, res->clks);
>   
> +	reset_control_assert(res->rst);
> +	usleep_range(2000, 2500);
> +
>   	regulator_bulk_disable(ARRAY_SIZE(res->supplies), res->supplies);
>   }
>   
> @@ -1545,6 +1583,9 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>   		goto err_phy_exit;
>   	}
>   
> +	/* If the soc features RPMh, cmd_db must have been prepared by now */
> +	pcie->soc_is_rpmh = !cmd_db_ready();
> +
>   	qcom_pcie_icc_update(pcie);
>   
>   	if (pcie->mhi)
> @@ -1561,58 +1602,108 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>   	return ret;
>   }
>   
> -static int qcom_pcie_suspend_noirq(struct device *dev)
> +static int qcom_pcie_resume_noirq(struct device *dev)
>   {
>   	struct qcom_pcie *pcie = dev_get_drvdata(dev);
>   	int ret;
>   
> -	/*
> -	 * Set minimum bandwidth required to keep data path functional during
> -	 * suspend.
> -	 */
> -	ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
> -	if (ret) {
> -		dev_err(dev, "Failed to set interconnect bandwidth: %d\n", ret);
> -		return ret;
> +	if (pcie->soc_is_rpmh) {
> +		/*
> +		 * Undo the tag change from qcom_pcie_suspend_noirq first in
> +		 * case RPMh spontaneously decides to power collapse the
> +		 * platform based on other inputs.
> +		 */
> +		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_ALWAYS);
> +
> +		/* Flush the tag change */
> +		ret = icc_enable(pcie->icc_mem);
> +		if (ret) {
> +			dev_err(pcie->pci->dev, "failed to icc_enable: %d\n", ret);
> +			return ret;
> +		}
>   	}
>   
> -	/*
> -	 * Turn OFF the resources only for controllers without active PCIe
> -	 * devices. For controllers with active devices, the resources are kept
> -	 * ON and the link is expected to be in L0/L1 (sub)states.
> -	 *
> -	 * Turning OFF the resources for controllers with active PCIe devices
> -	 * will trigger access violation during the end of the suspend cycle,
> -	 * as kernel tries to access the PCIe devices config space for masking
> -	 * MSIs.
> -	 *
> -	 * Also, it is not desirable to put the link into L2/L3 state as that
> -	 * implies VDD supply will be removed and the devices may go into
> -	 * powerdown state. This will affect the lifetime of the storage devices
> -	 * like NVMe.
> -	 */
> -	if (!dw_pcie_link_up(pcie->pci)) {
> -		qcom_pcie_host_deinit(&pcie->pci->pp);
> -		pcie->suspended = true;
> -	}
> +	/* Only check this now to make sure the icc tag has been set. */
> +	if (!pcie->suspended)
> +		return 0;
> +
> +	ret = qcom_pcie_host_init(&pcie->pci->pp);
> +	if (ret)
> +		goto revert_icc_tag;
> +
> +	dw_pcie_setup_rc(&pcie->pci->pp);
> +
> +	ret = qcom_pcie_start_link(pcie->pci);
> +	if (ret)
> +		goto deinit_host;
> +
> +	/* Ignore the retval, the devices may come up later. */
> +	dw_pcie_wait_for_link(pcie->pci);
> +
> +	qcom_pcie_icc_update(pcie);
> +
> +	pcie->suspended = false;
>   
>   	return 0;
> +
> +deinit_host:
> +	qcom_pcie_host_deinit(&pcie->pci->pp);
> +revert_icc_tag:
> +	if (pcie->soc_is_rpmh) {
> +		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_WAKE);
> +
> +		/* Ignore the retval, failing here would be tragic anyway.. */
> +		icc_enable(pcie->icc_mem);
> +	}
> +
> +	return ret;
>   }
>   
> -static int qcom_pcie_resume_noirq(struct device *dev)
> +static int qcom_pcie_suspend_noirq(struct device *dev)
>   {
>   	struct qcom_pcie *pcie = dev_get_drvdata(dev);
>   	int ret;
>   
> -	if (pcie->suspended) {
> -		ret = qcom_pcie_host_init(&pcie->pci->pp);
> +	if (pcie->suspended)
> +		return 0;
> +
> +	if (dw_pcie_link_up(pcie->pci)) {
> +		ret = qcom_pcie_stop_link(pcie->pci);
>   		if (ret)
>   			return ret;
> +	}
>   
> -		pcie->suspended = false;
> +	qcom_pcie_host_deinit(&pcie->pci->pp);
> +
> +	if (pcie->soc_is_rpmh) {
> +		/*
> +		 * The PCIe RC may be covertly accessed by the secure firmware
> +		 * on sleep exit. Use the WAKE bucket to let RPMh pull the plug
> +		 * on PCIe in sleep, but guarantee it comes back up for resume.
> +		 */
> +		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_WAKE);
> +
> +		/* Flush the tag change */
> +		ret = icc_enable(pcie->icc_mem);
> +		if (ret) {
> +			dev_err(pcie->pci->dev, "failed to icc_enable %d\n", ret);
> +
> +			/* Revert everything and pray icc calls succeed */
> +			return qcom_pcie_resume_noirq(dev);
> +		}
> +	} else {
> +		/*
> +		 * Set minimum bandwidth required to keep data path functional
> +		 * during suspend.
> +		 */
Calling qcom_pcie_host_deinit(&pcie->pci->pp) above turns off all
the resources, so setting the bandwidth to 1 kBps does not make sense here.

- Krishna Chaitanya.
> +		ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
> +		if (ret) {
> +			dev_err(dev, "Failed to set interconnect bandwidth: %d\n", ret);
> +			return ret;
> +		}
>   	}
>   
> -	qcom_pcie_icc_update(pcie);
> +	pcie->suspended = true;
>   
>   	return 0;
>   }
>
Konrad Dybcio March 27, 2024, 7:37 p.m. UTC | #6
On 20.02.2024 5:12 AM, Krishna Chaitanya Chundru wrote:
> 
> 
> On 2/10/2024 10:40 PM, Konrad Dybcio wrote:
>> Currently, we've only been minimizing the power draw while keeping the
>> RC up at all times. This is suboptimal, as it draws a whole lot of power
>> and prevents the SoC from power collapsing.
>>
>> Implement full shutdown and re-initialization to allow for powering off
>> the controller.
>>
>> This is mainly indended for SC8280XP with a broken power rail setup,
>> which requires a full RC shutdown/reinit in order to reach SoC-wide
>> power collapse, but sleeping is generally better than not sleeping and
>> less destructive suspend can be implemented later for platforms that
>> support it.
>>
>> Co-developed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>> ---

[...]


>> +    if (pcie->soc_is_rpmh) {
>> +        /*
>> +         * The PCIe RC may be covertly accessed by the secure firmware
>> +         * on sleep exit. Use the WAKE bucket to let RPMh pull the plug
>> +         * on PCIe in sleep, but guarantee it comes back up for resume.
>> +         */
>> +        icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_WAKE);
>> +
>> +        /* Flush the tag change */
>> +        ret = icc_enable(pcie->icc_mem);
>> +        if (ret) {
>> +            dev_err(pcie->pci->dev, "failed to icc_enable %d\n", ret);
>> +
>> +            /* Revert everything and pray icc calls succeed */
>> +            return qcom_pcie_resume_noirq(dev);
>> +        }
>> +    } else {
>> +        /*
>> +         * Set minimum bandwidth required to keep data path functional
>> +         * during suspend.
>> +         */
> calling qcom_pcie_host_deinit(&pcie->pci->pp) above will turn off all the resources, setting BW to 1Kbps will not make sense here.

This is preserving the current behavior, it may be revised later.

See ad9b9b6e36c9 ("PCI: qcom: Add support for system suspend and resume")
that introduced it, in a perhaps overly 8280-centric fashion.

Konrad
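
To make the ordering under discussion easier to follow, here is a rough
userspace sketch of the suspend path as posted in this patch. It only records
which steps run and in what order. Error handling is left out, and the struct,
the enum and the function names are invented for illustration, not taken from
the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps taken by the posted qcom_pcie_suspend_noirq(), in source order. */
enum step { STOP_LINK, HOST_DEINIT, ICC_TAG_WAKE, ICC_MIN_BW };

/* Hypothetical stand-in for the relevant struct qcom_pcie state. */
struct model {
	bool link_up;
	bool soc_is_rpmh;
	bool suspended;
	enum step log[4];	/* steps performed so far */
	size_t n;		/* number of valid entries in log[] */
};

static void record(struct model *m, enum step s)
{
	assert(m->n < sizeof(m->log) / sizeof(m->log[0]));
	m->log[m->n++] = s;
}

static int model_suspend_noirq(struct model *m)
{
	if (m->suspended)
		return 0;

	if (m->link_up) {
		record(m, STOP_LINK);	/* PME_Turn_Off, wait for L2/L3 */
		m->link_up = false;
	}

	record(m, HOST_DEINIT);		/* clocks, reset, regulators off */

	if (m->soc_is_rpmh)
		record(m, ICC_TAG_WAKE);	/* icc_set_tag() + icc_enable() */
	else
		record(m, ICC_MIN_BW);		/* icc_set_bw(..., 1 kBps) */

	m->suspended = true;
	return 0;
}
```

On a non-RPMh platform the recorded order ends with HOST_DEINIT followed by
ICC_MIN_BW, which is the sequence the comment above questions: the minimum
bandwidth vote is placed only after the controller resources are already off.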

Patch

diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index 8afacc90c63b..4ce266951cb6 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -268,6 +268,7 @@  config PCIE_DW_PLAT_EP
 config PCIE_QCOM
 	bool "Qualcomm PCIe controller (host mode)"
 	depends on OF && (ARCH_QCOM || COMPILE_TEST)
+	depends on QCOM_COMMAND_DB || QCOM_COMMAND_DB=n
 	depends on PCI_MSI
 	select PCIE_DW_HOST
 	select CRC8
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 6a469ed213ce..c807833ee4a7 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -30,13 +30,18 @@ 
 #include <linux/reset.h>
 #include <linux/slab.h>
 #include <linux/types.h>
+#include <soc/qcom/cmd-db.h>
 
 #include "../../pci.h"
 #include "pcie-designware.h"
 
+#include <dt-bindings/interconnect/qcom,icc.h>
+#include <dt-bindings/interconnect/qcom,rpm-icc.h>
+
 /* PARF registers */
 #define PARF_SYS_CTRL				0x00
 #define PARF_PM_CTRL				0x20
+#define PARF_PM_STTS				0x24
 #define PARF_PCS_DEEMPH				0x34
 #define PARF_PCS_SWING				0x38
 #define PARF_PHY_CTRL				0x40
@@ -80,7 +85,10 @@ 
 #define L1_CLK_RMV_DIS				BIT(1)
 
 /* PARF_PM_CTRL register fields */
-#define REQ_NOT_ENTR_L1				BIT(5)
+#define REQ_NOT_ENTR_L1				BIT(5) /* "Prevent L0->L1" */
+
+/* PARF_PM_STTS register fields */
+#define PM_ENTER_L23				BIT(5)
 
 /* PARF_PCS_DEEMPH register fields */
 #define PCS_DEEMPH_TX_DEEMPH_GEN1(x)		FIELD_PREP(GENMASK(21, 16), x)
@@ -122,6 +130,7 @@ 
 
 /* ELBI_SYS_CTRL register fields */
 #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
+#define ELBI_SYS_CTRL_PME_TURNOFF_MSG		BIT(4)
 
 /* AXI_MSTR_RESP_COMP_CTRL0 register fields */
 #define CFG_REMOTE_RD_REQ_BRIDGE_SIZE_2K	0x4
@@ -243,6 +252,7 @@  struct qcom_pcie {
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
 	bool suspended;
+	bool soc_is_rpmh;
 };
 
 #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
@@ -272,6 +282,24 @@  static int qcom_pcie_start_link(struct dw_pcie *pci)
 	return 0;
 }
 
+static int qcom_pcie_stop_link(struct dw_pcie *pci)
+{
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	u32 ret_l23, val;
+
+	writel(ELBI_SYS_CTRL_PME_TURNOFF_MSG, pcie->elbi + ELBI_SYS_CTRL);
+	readl(pcie->elbi + ELBI_SYS_CTRL);
+
+	ret_l23 = readl_poll_timeout(pcie->parf + PARF_PM_STTS, val,
+				     val & PM_ENTER_L23, 10000, 100000);
+	if (ret_l23) {
+		dev_err(pci->dev, "Failed to enter L2/L3\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
 static void qcom_pcie_clear_hpc(struct dw_pcie *pci)
 {
 	u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
@@ -987,9 +1015,19 @@  static void qcom_pcie_host_post_init_2_7_0(struct qcom_pcie *pcie)
 static void qcom_pcie_deinit_2_7_0(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_resources_2_7_0 *res = &pcie->res.v2_7_0;
+	u32 val;
+
+	/* Disable PCIe clocks and resets */
+	val = readl(pcie->parf + PARF_PHY_CTRL);
+	val |= PHY_TEST_PWR_DOWN;
+	writel(val, pcie->parf + PARF_PHY_CTRL);
+	readl(pcie->parf + PARF_PHY_CTRL);
 
 	clk_bulk_disable_unprepare(res->num_clks, res->clks);
 
+	reset_control_assert(res->rst);
+	usleep_range(2000, 2500);
+
 	regulator_bulk_disable(ARRAY_SIZE(res->supplies), res->supplies);
 }
 
@@ -1545,6 +1583,9 @@  static int qcom_pcie_probe(struct platform_device *pdev)
 		goto err_phy_exit;
 	}
 
+	/* If the soc features RPMh, cmd_db must have been prepared by now */
+	pcie->soc_is_rpmh = !cmd_db_ready();
+
 	qcom_pcie_icc_update(pcie);
 
 	if (pcie->mhi)
@@ -1561,58 +1602,108 @@  static int qcom_pcie_probe(struct platform_device *pdev)
 	return ret;
 }
 
-static int qcom_pcie_suspend_noirq(struct device *dev)
+static int qcom_pcie_resume_noirq(struct device *dev)
 {
 	struct qcom_pcie *pcie = dev_get_drvdata(dev);
 	int ret;
 
-	/*
-	 * Set minimum bandwidth required to keep data path functional during
-	 * suspend.
-	 */
-	ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
-	if (ret) {
-		dev_err(dev, "Failed to set interconnect bandwidth: %d\n", ret);
-		return ret;
+	if (pcie->soc_is_rpmh) {
+		/*
+		 * Undo the tag change from qcom_pcie_suspend_noirq first in
+		 * case RPMh spontaneously decides to power collapse the
+		 * platform based on other inputs.
+		 */
+		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_ALWAYS);
+
+		/* Flush the tag change */
+		ret = icc_enable(pcie->icc_mem);
+		if (ret) {
+			dev_err(pcie->pci->dev, "failed to icc_enable: %d\n", ret);
+			return ret;
+		}
 	}
 
-	/*
-	 * Turn OFF the resources only for controllers without active PCIe
-	 * devices. For controllers with active devices, the resources are kept
-	 * ON and the link is expected to be in L0/L1 (sub)states.
-	 *
-	 * Turning OFF the resources for controllers with active PCIe devices
-	 * will trigger access violation during the end of the suspend cycle,
-	 * as kernel tries to access the PCIe devices config space for masking
-	 * MSIs.
-	 *
-	 * Also, it is not desirable to put the link into L2/L3 state as that
-	 * implies VDD supply will be removed and the devices may go into
-	 * powerdown state. This will affect the lifetime of the storage devices
-	 * like NVMe.
-	 */
-	if (!dw_pcie_link_up(pcie->pci)) {
-		qcom_pcie_host_deinit(&pcie->pci->pp);
-		pcie->suspended = true;
-	}
+	/* Only check this now to make sure the icc tag has been set. */
+	if (!pcie->suspended)
+		return 0;
+
+	ret = qcom_pcie_host_init(&pcie->pci->pp);
+	if (ret)
+		goto revert_icc_tag;
+
+	dw_pcie_setup_rc(&pcie->pci->pp);
+
+	ret = qcom_pcie_start_link(pcie->pci);
+	if (ret)
+		goto deinit_host;
+
+	/* Ignore the retval, the devices may come up later. */
+	dw_pcie_wait_for_link(pcie->pci);
+
+	qcom_pcie_icc_update(pcie);
+
+	pcie->suspended = false;
 
 	return 0;
+
+deinit_host:
+	qcom_pcie_host_deinit(&pcie->pci->pp);
+revert_icc_tag:
+	if (pcie->soc_is_rpmh) {
+		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_WAKE);
+
+		/* Ignore the retval, failing here would be tragic anyway.. */
+		icc_enable(pcie->icc_mem);
+	}
+
+	return ret;
 }
 
-static int qcom_pcie_resume_noirq(struct device *dev)
+static int qcom_pcie_suspend_noirq(struct device *dev)
 {
 	struct qcom_pcie *pcie = dev_get_drvdata(dev);
 	int ret;
 
-	if (pcie->suspended) {
-		ret = qcom_pcie_host_init(&pcie->pci->pp);
+	if (pcie->suspended)
+		return 0;
+
+	if (dw_pcie_link_up(pcie->pci)) {
+		ret = qcom_pcie_stop_link(pcie->pci);
 		if (ret)
 			return ret;
+	}
 
-		pcie->suspended = false;
+	qcom_pcie_host_deinit(&pcie->pci->pp);
+
+	if (pcie->soc_is_rpmh) {
+		/*
+		 * The PCIe RC may be covertly accessed by the secure firmware
+		 * on sleep exit. Use the WAKE bucket to let RPMh pull the plug
+		 * on PCIe in sleep, but guarantee it comes back up for resume.
+		 */
+		icc_set_tag(pcie->icc_mem, QCOM_ICC_TAG_WAKE);
+
+		/* Flush the tag change */
+		ret = icc_enable(pcie->icc_mem);
+		if (ret) {
+			dev_err(pcie->pci->dev, "failed to icc_enable %d\n", ret);
+
+			/* Revert everything and pray icc calls succeed */
+			return qcom_pcie_resume_noirq(dev);
+		}
+	} else {
+		/*
+		 * Set minimum bandwidth required to keep data path functional
+		 * during suspend.
+		 */
+		ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
+		if (ret) {
+			dev_err(dev, "Failed to set interconnect bandwidth: %d\n", ret);
+			return ret;
+		}
 	}
 
-	qcom_pcie_icc_update(pcie);
+	pcie->suspended = true;
 
 	return 0;
 }