
[v1] PCI: dwc: Clean up some unnecessary codes in dw_pcie_suspend_noirq()

Message ID 20241107084455.3623576-1-hongxing.zhu@nxp.com (mailing list archive)
State Superseded
Delegated to: Krzysztof Wilczyński
Series [v1] PCI: dwc: Clean up some unnecessary codes in dw_pcie_suspend_noirq()

Commit Message

Hongxing Zhu Nov. 7, 2024, 8:44 a.m. UTC
Before sending PME_TURN_OFF, don't test the LTSSM state. It's safe to
send the PME_TURN_OFF message regardless of whether the link is up or
down, so there is no need to test the LTSSM state before sending it.

Also remove the L2 poll after the PME_TURN_OFF message is sent out,
because the re-initialization is done in dw_pcie_resume_noirq().

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
---
 .../pci/controller/dwc/pcie-designware-host.c | 20 ++++---------------
 1 file changed, 4 insertions(+), 16 deletions(-)

Comments

Krishna chaitanya chundru Nov. 7, 2024, 10:09 a.m. UTC | #1
On 11/7/2024 2:14 PM, Richard Zhu wrote:
> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's safe
> to send PME_TURN_OFF message regardless of whether the link is up or
> down. So, there would be no need to test the LTSSM stat before sending
> PME_TURN_OFF message.
> 
> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> Because the re-initialization would be done in dw_pcie_resume_noirq().
>
We should not remove the poll here; it is required for the endpoint
to enter L2 gracefully. Some endpoints have cleanups that need to be
done before entering L2 or L3. In response to the PME_Turn_Off message,
the endpoint needs to send an L23 ack, which indicates that it is ready
for L2; without that, the D3cold sequence will not be graceful.

-Krishna Chaitanya.

> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> ---
>   .../pci/controller/dwc/pcie-designware-host.c | 20 ++++---------------
>   1 file changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index f86347452026..64c49adf81d2 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -917,7 +917,6 @@ static int dw_pcie_pme_turn_off(struct dw_pcie *pci)
>   int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>   {
>   	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
> -	u32 val;
>   	int ret = 0;
>   
>   	/*
> @@ -927,23 +926,12 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>   	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>   		return 0;
>   
> -	/* Only send out PME_TURN_OFF when PCIE link is up */
> -	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> -		if (pci->pp.ops->pme_turn_off)
> -			pci->pp.ops->pme_turn_off(&pci->pp);
> -		else
> -			ret = dw_pcie_pme_turn_off(pci);
> -
> +	if (pci->pp.ops->pme_turn_off) {
> +		pci->pp.ops->pme_turn_off(&pci->pp);
> +	} else {
> +		ret = dw_pcie_pme_turn_off(pci);
>   		if (ret)
>   			return ret;
> -
> -		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
> -					PCIE_PME_TO_L2_TIMEOUT_US/10,
> -					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> -		if (ret) {
> -			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> -			return ret;
> -		}
>   	}
>   
>   	dw_pcie_stop_link(pci);
Manivannan Sadhasivam Nov. 7, 2024, 11:13 a.m. UTC | #2
On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's safe
> to send PME_TURN_OFF message regardless of whether the link is up or
> down. So, there would be no need to test the LTSSM stat before sending
> PME_TURN_OFF message.
> 

What is the incentive to send PME_Turn_Off when link is not up?

> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> Because the re-initialization would be done in dw_pcie_resume_noirq().
> 

As Krishna explained, the host needs to wait until the endpoint acks the
message (just to give it some time to do cleanups). Only then can the host
initiate D3cold. It matters when the device supports L2.

- Mani

Frank Li Nov. 7, 2024, 4:08 p.m. UTC | #3
On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's safe
> > to send PME_TURN_OFF message regardless of whether the link is up or
> > down. So, there would be no need to test the LTSSM stat before sending
> > PME_TURN_OFF message.
> >
>
> What is the incentive to send PME_Turn_Off when link is not up?

see Bjorn's comments in https://lore.kernel.org/imx/20241106222933.GA1543549@bhelgaas/

"But I don't think you responded to the race question.  What happens
here?

  if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
    --> link goes down here <--
    pci->pp.ops->pme_turn_off(&pci->pp);

You decide the LTSSM is active and the link is up.  Then the link goes
down.  Then you send PME_Turn_off.  Now what?

If it's safe to try to send PME_Turn_off regardless of whether the
link is up or down, there would be no need to test the LTSSM state."

I think this can happen if the EP device is hot-removed or reset after the if check.
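The race Bjorn describes is a check-then-act (TOCTOU) pattern. The following user-space model is only a sketch: struct sim_port, sim_link_up(), sim_send_pme_turn_off() and suspend_checked() are hypothetical stand-ins, not DesignWare driver code. It shows that when the link drops right after the check (for example on hot removal or an EP reset), the PME_Turn_Off attempt still reaches a dead link, so the check narrows the window rather than closing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Root Port; not the real dw_pcie structures. */
struct sim_port {
	bool link_up;
	bool drop_after_check;      /* simulate hot removal or EP reset */
	int turn_off_on_dead_link;  /* PME_Turn_Off attempts on a down link */
};

/* Sample the link state; optionally let the link drop right afterwards. */
static bool sim_link_up(struct sim_port *p)
{
	bool up = p->link_up;

	if (p->drop_after_check)
		p->link_up = false;
	return up;
}

/* Record a PME_Turn_Off attempt that hits a link that is already down. */
static void sim_send_pme_turn_off(struct sim_port *p)
{
	if (!p->link_up)
		p->turn_off_on_dead_link++;
}

/* Check-then-act, like the LTSSM test that v1 removes. */
static void suspend_checked(struct sim_port *p)
{
	if (sim_link_up(p))
		sim_send_pme_turn_off(p);
}
```

With drop_after_check set, suspend_checked() passes the check and still sends on a down link, which is why the Root Port has to tolerate that case whether or not the check is present.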

Frank
Manivannan Sadhasivam Nov. 7, 2024, 4:30 p.m. UTC | #4
On Thu, Nov 07, 2024 at 11:08:37AM -0500, Frank Li wrote:
> On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's safe
> > > to send PME_TURN_OFF message regardless of whether the link is up or
> > > down. So, there would be no need to test the LTSSM stat before sending
> > > PME_TURN_OFF message.
> > >
> >
> > What is the incentive to send PME_Turn_Off when link is not up?
> 
> see Bjorn's comments in https://lore.kernel.org/imx/20241106222933.GA1543549@bhelgaas/
> 

Thanks for the pointer. Let me reply there itself.

- Mani

> "But I don't think you responded to the race question.  What happens
> here?
> 
>   if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
>     --> link goes down here <--
>     pci->pp.ops->pme_turn_off(&pci->pp);
> 
> You decide the LTSSM is active and the link is up.  Then the link goes
> down.  Then you send PME_Turn_off.  Now what?
> 
> If it's safe to try to send PME_Turn_off regardless of whether the
> link is up or down, there would be no need to test the LTSSM state."
> 
> I think it may happen if EP device HOT remove/reset after if check.
> 
> Frank
Bjorn Helgaas Nov. 8, 2024, 12:24 a.m. UTC | #5
On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > safe to send PME_TURN_OFF message regardless of whether the link
> > is up or down. So, there would be no need to test the LTSSM stat
> > before sending PME_TURN_OFF message.
> 
> What is the incentive to send PME_Turn_Off when link is not up?

There's no need to send PME_Turn_Off when link is not up.

But a link-up check is inherently racy because the link may go down
between the check and the PME_Turn_Off.  Since it's impossible for
software to guarantee the link is up, the Root Port should be able to
tolerate attempts to send PME_Turn_Off when the link is down.

So IMO there's no need to check whether the link is up, and checking
gives the misleading impression that "we know the link is up and
therefore sending PME_Turn_Off is safe."

> > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > out.  Because the re-initialization would be done in
> > dw_pcie_resume_noirq().
> 
> As Krishna explained, host needs to wait until the endpoint acks the
> message (just to give it some time to do cleanups). Then only the
> host can initiate D3Cold. It matters when the device supports L2.

The important thing here is to be clear about the *reason* to poll for
L2 and the *event* that must wait for L2.

I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
5.2, fig 5-1).

L2 and L3 are states where main power to the downstream component is
off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
no link in those states.

The PME_Turn_Off handshake is part of the process to put the
downstream component in D3cold.  I think the reason for this handshake
is to allow an orderly shutdown of that component before main power is
removed.

When the downstream component receives PME_Turn_Off, it will stop
scheduling new TLPs, but it may already have TLPs scheduled but not
yet sent.  If power were removed immediately, they would be lost.  My
understanding is that the link will not enter L2/L3 Ready until the
components on both ends have completed whatever needs to be done with
those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
PCIe book; I haven't found clear spec citations for all of it.)

I think waiting for L2/L3 Ready is to keep us from turning off main
power when the components are still trying to dispose of those TLPs.

So I think every controller that turns off main power needs to wait
for L2/L3 Ready.

There's also a requirement that software wait at least 100 ns after
L2/L3 Ready before turning off refclock and main power (sec
5.3.3.2.1).
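The ordering described above can be written down as a check on a recorded power-down sequence. This is a hedged sketch: struct d3cold_timeline and d3cold_sequence_ok() are hypothetical names, the timestamps are assumed to be nanoseconds on a single clock, and the 100 ns figure is the sec 5.3.3.2.1 value quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum time from L2/L3 Ready to refclk and main power removal (ns). */
#define L2L3_READY_TO_POWER_OFF_NS 100

/* Hypothetical timestamps (ns, same clock) recorded during suspend. */
struct d3cold_timeline {
	int64_t pme_turn_off_sent;
	int64_t l2l3_ready;      /* negative if L2/L3 Ready was never observed */
	int64_t refclk_off;
	int64_t main_power_off;
};

/* True only if refclk and main power went off at least 100 ns after L2/L3 Ready. */
static bool d3cold_sequence_ok(const struct d3cold_timeline *t)
{
	/* Power removal without a completed PME_Turn_Off handshake is never OK. */
	if (t->l2l3_ready < 0 || t->l2l3_ready < t->pme_turn_off_sent)
		return false;
	return t->refclk_off - t->l2l3_ready >= L2L3_READY_TO_POWER_OFF_NS &&
	       t->main_power_off - t->l2l3_ready >= L2L3_READY_TO_POWER_OFF_NS;
}
```

A timeline such as {0, 5000, 5100, 5200} passes, while removing refclk only 50 ns after L2/L3 Ready, or never observing L2/L3 Ready at all, fails.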

Bjorn
Krishna chaitanya chundru Nov. 10, 2024, 12:10 a.m. UTC | #6
On 11/8/2024 5:54 AM, Bjorn Helgaas wrote:
> On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
>> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
>>> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
>>> safe to send PME_TURN_OFF message regardless of whether the link
>>> is up or down. So, there would be no need to test the LTSSM stat
>>> before sending PME_TURN_OFF message.
>>
>> What is the incentive to send PME_Turn_Off when link is not up?
> 
> There's no need to send PME_Turn_Off when link is not up.
> 
> But a link-up check is inherently racy because the link may go down
> between the check and the PME_Turn_Off.  Since it's impossible for
> software to guarantee the link is up, the Root Port should be able to
> tolerate attempts to send PME_Turn_Off when the link is down.
> 
> So IMO there's no need to check whether the link is up, and checking
> gives the misleading impression that "we know the link is up and
> therefore sending PME_Turn_Off is safe."
> 
Hi Bjorn,

I agree that the link-up check is racy, but once the link has come up and
then gone down for some reason, the LTSSM state will not move back to
Detect.Quiet or Detect.Active; it will go to Pre.Detect.Quiet (i.e., a value
of 0x5). So we can assume that if the link was up, the LTSSM state will be
greater than Detect.Active even if the link has since gone down.
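The comparison discussed here can be tried in isolation. In this sketch the enum values are assumptions based on the DesignWare LTSSM encoding (only 0x5 for Pre.Detect.Quiet is stated in the message above), and ltssm_past_detect() is a hypothetical helper mirroring the removed dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT test. Under the behavior described above, a link that trained and then dropped still passes, so the test separates "still in Detect" from "has left Detect" rather than "link up now" from "link down now".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed DesignWare LTSSM encodings; only 0x05 is taken from the thread. */
enum ltssm {
	LTSSM_DETECT_QUIET     = 0x00,
	LTSSM_DETECT_ACT       = 0x01,
	LTSSM_PRE_DETECT_QUIET = 0x05,
	LTSSM_L0               = 0x11,
	LTSSM_L2_IDLE          = 0x15,
};

/* Hypothetical helper: the same comparison as the check removed by v1. */
static bool ltssm_past_detect(uint32_t ltssm)
{
	return ltssm > LTSSM_DETECT_ACT;
}
```

Both LTSSM_L0 (link up) and LTSSM_PRE_DETECT_QUIET (link dropped after training) return true, while the two Detect states return false.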

- Krishna Chaitanya.
Hongxing Zhu Nov. 11, 2024, 3:29 a.m. UTC | #7
> -----Original Message-----
> From: Krishna Chaitanya Chundru <quic_krichai@quicinc.com>
> Sent: 2024年11月10日 8:10
> To: Bjorn Helgaas <helgaas@kernel.org>; Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org>
> Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> kernel@pengutronix.de; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> dw_pcie_suspend_noirq()
> 
> 
> 
> On 11/8/2024 5:54 AM, Bjorn Helgaas wrote:
> > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam
> wrote:
> >> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> >>> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> >>> safe to send PME_TURN_OFF message regardless of whether the link is
> >>> up or down. So, there would be no need to test the LTSSM stat before
> >>> sending PME_TURN_OFF message.
> >>
> >> What is the incentive to send PME_Turn_Off when link is not up?
> >
> > There's no need to send PME_Turn_Off when link is not up.
> >
> > But a link-up check is inherently racy because the link may go down
> > between the check and the PME_Turn_Off.  Since it's impossible for
> > software to guarantee the link is up, the Root Port should be able to
> > tolerate attempts to send PME_Turn_Off when the link is down.
> >
> > So IMO there's no need to check whether the link is up, and checking
> > gives the misleading impression that "we know the link is up and
> > therefore sending PME_Turn_Off is safe."
> >
> Hi Bjorn,
> 
> I agree that link-up check is racy but once link is up and link has gone down
> due to some reason the ltssm state will not move detect quiet or detect act, it
> will go to pre detect quiet (i.e value 0f 0x5).
> we can assume if the link is up LTSSM state will greater than detect act even if
> the link was down.
> 
> - Krishna Chaitanya.
> >>> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> >>> Because the re-initialization would be done in
> >>> dw_pcie_resume_noirq().
> >>
> >> As Krishna explained, host needs to wait until the endpoint acks the
> >> message (just to give it some time to do cleanups). Then only the
> >> host can initiate D3Cold. It matters when the device supports L2.
> >
> > The important thing here is to be clear about the *reason* to poll for
> > L2 and the *event* that must wait for L2.
> >
> > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > 5.2, fig 5-1).
> >
> > L2 and L3 are states where main power to the downstream component is
> > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > no link in those states.
> >
> > The PME_Turn_Off handshake is part of the process to put the
> > downstream component in D3cold.  I think the reason for this handshake
> > is to allow an orderly shutdown of that component before main power is
> > removed.
> >
> > When the downstream component receives PME_Turn_Off, it will stop
> > scheduling new TLPs, but it may already have TLPs scheduled but not
> > yet sent.  If power were removed immediately, they would be lost.  My
> > understanding is that the link will not enter L2/L3 Ready until the
> > components on both ends have completed whatever needs to be done with
> > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > PCIe book; I haven't found clear spec citations for all of it.)
> >
> > I think waiting for L2/L3 Ready is to keep us from turning off main
> > power when the components are still trying to dispose of those TLPs.
> >
> > So I think every controller that turns off main power needs to wait
> > for L2/L3 Ready.
> >
> > There's also a requirement that software wait at least 100 ns after
> > L2/L3 Ready before turning off refclock and main power (sec
> > 5.3.3.2.1).
Thanks for the comments.
So it's better to keep the L2 poll, since PCIe r6.0, sec 5.3.3.2.1 also
 recommends a 1 ms to 10 ms timeout when checking for L2/L3 Ready.
If there are no further comments, v2 of this patch will only remove the
 LTSSM state check before issuing the PME_TURN_OFF message.
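The v2 plan (no LTSSM pre-check, but keep the bounded L2 poll) can be modeled in user space. Everything here is hypothetical: struct sim_pcie stands in for struct dw_pcie plus a simulated endpoint, the loop imitates read_poll_timeout() with a simulated clock instead of sleeping, and the 10 ms bound is an assumption taken from the upper end of the range mentioned above. The LTSSM values are assumed DesignWare encodings.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed DesignWare LTSSM encodings (0x5 is the value quoted in the thread). */
#define LTSSM_PRE_DETECT_QUIET 0x05u
#define LTSSM_L0               0x11u
#define LTSSM_L2_IDLE          0x15u

#define PME_TO_L2_TIMEOUT_US   10000u  /* assumed 10 ms bound */

/* Hypothetical stand-in for struct dw_pcie plus a simulated endpoint. */
struct sim_pcie {
	bool link_up;
	uint32_t ep_ready_at_us;  /* simulated time at which the EP reaches L2 */
	uint32_t now_us;          /* simulated clock */
	bool turn_off_sent;
	bool link_stopped;
};

static uint32_t sim_get_ltssm(const struct sim_pcie *p)
{
	if (!p->link_up)
		return LTSSM_PRE_DETECT_QUIET;
	if (p->turn_off_sent && p->now_us >= p->ep_ready_at_us)
		return LTSSM_L2_IDLE;
	return LTSSM_L0;
}

/* v2 shape: send PME_Turn_Off unconditionally, then poll for L2 with a bound. */
static int sim_suspend_v2(struct sim_pcie *p, uint32_t *last_ltssm)
{
	const uint32_t step_us = PME_TO_L2_TIMEOUT_US / 10;
	uint32_t waited_us = 0;

	p->turn_off_sent = true;      /* no LTSSM check beforehand */

	for (;;) {
		*last_ltssm = sim_get_ltssm(p);
		if (*last_ltssm == LTSSM_L2_IDLE)
			break;
		if (waited_us >= PME_TO_L2_TIMEOUT_US)
			return -ETIMEDOUT;  /* the real code logs dev_err() here */
		p->now_us += step_us;   /* stands in for sleeping between reads */
		waited_us += step_us;
	}

	p->link_stopped = true;       /* stands in for dw_pcie_stop_link() */
	return 0;
}
```

With the link up and the endpoint ready at 3000 us, the poll succeeds after a few reads. With the link already down, the same call runs to its bound and returns -ETIMEDOUT without stopping the link, which is the user-visible timeout that an unconditional PME_Turn_Off can produce.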

Thanks again for this discussion, it's very helpful.
Best Regards
Richard Zhu

Manivannan Sadhasivam Nov. 11, 2024, 5:33 a.m. UTC | #8
On Mon, Nov 11, 2024 at 03:29:18AM +0000, Hongxing Zhu wrote:
> > -----Original Message-----
> > From: Krishna Chaitanya Chundru <quic_krichai@quicinc.com>
> > Sent: 2024年11月10日 8:10
> > To: Bjorn Helgaas <helgaas@kernel.org>; Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org>
> > Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> > bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> > robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> > kernel@pengutronix.de; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> > dw_pcie_suspend_noirq()
> > 
> > 
> > 
> > On 11/8/2024 5:54 AM, Bjorn Helgaas wrote:
> > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam
> > wrote:
> > >> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > >>> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > >>> safe to send PME_TURN_OFF message regardless of whether the link is
> > >>> up or down. So, there would be no need to test the LTSSM stat before
> > >>> sending PME_TURN_OFF message.
> > >>
> > >> What is the incentive to send PME_Turn_Off when link is not up?
> > >
> > > There's no need to send PME_Turn_Off when link is not up.
> > >
> > > But a link-up check is inherently racy because the link may go down
> > > between the check and the PME_Turn_Off.  Since it's impossible for
> > > software to guarantee the link is up, the Root Port should be able to
> > > tolerate attempts to send PME_Turn_Off when the link is down.
> > >
> > > So IMO there's no need to check whether the link is up, and checking
> > > gives the misleading impression that "we know the link is up and
> > > therefore sending PME_Turn_Off is safe."
> > >
> > Hi Bjorn,
> > 
> > I agree that link-up check is racy but once link is up and link has gone down
> > due to some reason the ltssm state will not move detect quiet or detect act, it
> > will go to pre detect quiet (i.e value 0f 0x5).
> > we can assume if the link is up LTSSM state will greater than detect act even if
> > the link was down.
> > 
> > - Krishna Chaitanya.
> > >>> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> > >>> Because the re-initialization would be done in
> > >>> dw_pcie_resume_noirq().
> > >>
> > >> As Krishna explained, host needs to wait until the endpoint acks the
> > >> message (just to give it some time to do cleanups). Then only the
> > >> host can initiate D3Cold. It matters when the device supports L2.
> > >
> > > The important thing here is to be clear about the *reason* to poll for
> > > L2 and the *event* that must wait for L2.
> > >
> > > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > > 5.2, fig 5-1).
> > >
> > > L2 and L3 are states where main power to the downstream component is
> > > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > > no link in those states.
> > >
> > > The PME_Turn_Off handshake is part of the process to put the
> > > downstream component in D3cold.  I think the reason for this handshake
> > > is to allow an orderly shutdown of that component before main power is
> > > removed.
> > >
> > > When the downstream component receives PME_Turn_Off, it will stop
> > > scheduling new TLPs, but it may already have TLPs scheduled but not
> > > yet sent.  If power were removed immediately, they would be lost.  My
> > > understanding is that the link will not enter L2/L3 Ready until the
> > > components on both ends have completed whatever needs to be done with
> > > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > > PCIe book; I haven't found clear spec citations for all of it.)
> > >
> > > I think waiting for L2/L3 Ready is to keep us from turning off main
> > > power when the components are still trying to dispose of those TLPs.
> > >
> > > So I think every controller that turns off main power needs to wait
> > > for L2/L3 Ready.
> > >
> > > There's also a requirement that software wait at least 100 ns after
> > > L2/L3 Ready before turning off refclock and main power (sec
> > > 5.3.3.2.1).
> Thanks for the comments.
> So, the L2 poll is better kept, since PCIe r6.0, sec 5.3.3.2.1 also recommends
>  1ms to 10ms timeout to check L2 ready or not.
> The v2 of this patch would only remove the LTSSM stat check when issue
>  the PME_TURN_OFF message if there are no further comments.
> 

If you send the PME_Turn_Off message unconditionally, then you'd end up
polling for L23 Ready even when the link is down, which may result in a
timeout, and users will see the error message. This is my concern.

- Mani
Manivannan Sadhasivam Nov. 11, 2024, 6:09 a.m. UTC | #9
On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > safe to send PME_TURN_OFF message regardless of whether the link
> > > is up or down. So, there would be no need to test the LTSSM stat
> > > before sending PME_TURN_OFF message.
> > 
> > What is the incentive to send PME_Turn_Off when link is not up?
> 
> There's no need to send PME_Turn_Off when link is not up.
> 
> But a link-up check is inherently racy because the link may go down
> between the check and the PME_Turn_Off.  Since it's impossible for
> software to guarantee the link is up, the Root Port should be able to
> tolerate attempts to send PME_Turn_Off when the link is down.
> 
> So IMO there's no need to check whether the link is up, and checking
> gives the misleading impression that "we know the link is up and
> therefore sending PME_Turn_Off is safe."
> 

I agree that the check is racy (not sure if there is a better way to avoid
that), but if you send PME_Turn_Off unconditionally, a down link will result
in an L23 Ready timeout and users will see the error message.

> > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > out.  Because the re-initialization would be done in
> > > dw_pcie_resume_noirq().
> > 
> > As Krishna explained, host needs to wait until the endpoint acks the
> > message (just to give it some time to do cleanups). Then only the
> > host can initiate D3Cold. It matters when the device supports L2.
> 
> The important thing here is to be clear about the *reason* to poll for
> L2 and the *event* that must wait for L2.
> 
> I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> 5.2, fig 5-1).
> 
> L2 and L3 are states where main power to the downstream component is
> off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> no link in those states.
> 
> The PME_Turn_Off handshake is part of the process to put the
> downstream component in D3cold.  I think the reason for this handshake
> is to allow an orderly shutdown of that component before main power is
> removed.
> 
> When the downstream component receives PME_Turn_Off, it will stop
> scheduling new TLPs, but it may already have TLPs scheduled but not
> yet sent.  If power were removed immediately, they would be lost.  My
> understanding is that the link will not enter L2/L3 Ready until the
> components on both ends have completed whatever needs to be done with
> those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> PCIe book; I haven't found clear spec citations for all of it.)
> 
> I think waiting for L2/L3 Ready is to keep us from turning off main
> power when the components are still trying to dispose of those TLPs.
> 

Not just disposing of TLPs as per the spec; most endpoints also need to reset
their state machines (if there is a way for the endpoint software to delay
sending L23 Ready).

> So I think every controller that turns off main power needs to wait
> for L2/L3 Ready.
> 
> There's also a requirement that software wait at least 100 ns after
> L2/L3 Ready before turning off refclock and main power (sec
> 5.3.3.2.1).
> 

Right. Usually, the delay after PERST# assertion would ensure this, but in
the Layerscape driver (a user of dw_pcie_suspend_noirq()) I don't see
power/refclk removal.

Richard Zhu/Frank, thoughts?

- Mani
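A minimal sketch of the ordering described above. It runs against a simulated Root Port, not real hardware. The sequence is: send PME_Turn_Off, poll for L2/L3 Ready with a bounded number of polls (standing in for a timeout), then wait at least 100 ns before removing refclk and main power. The struct, enum and function names are hypothetical and are not DesignWare or kernel APIs. In a real driver the delay would typically be ndelay()/udelay().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps of the hypothetical suspend sequence, in the order they must occur. */
enum suspend_step {
	STEP_PME_TURN_OFF,	/* Root Port sends PME_Turn_Off */
	STEP_L23_READY,		/* link observed in L2/L3 Ready */
	STEP_WAIT_100NS,	/* >= 100 ns before removing refclk/power */
	STEP_POWER_OFF,		/* refclk and main power removed */
};

/* Simulated port; not a real controller structure. */
struct sim_port {
	int polls_until_l23;	/* poll index at which L2/L3 Ready appears */
	int max_polls;		/* stands in for the poll timeout */
	enum suspend_step log[4];
	size_t nlog;
	bool powered;
};

static void record(struct sim_port *p, enum suspend_step s)
{
	assert(p->nlog < sizeof(p->log) / sizeof(p->log[0]));
	p->log[p->nlog++] = s;
}

/* Returns 0 on success; -1 if L2/L3 Ready never appeared (power left on). */
static int sim_suspend(struct sim_port *p)
{
	record(p, STEP_PME_TURN_OFF);
	for (int i = 0; i < p->max_polls; i++) {
		if (i < p->polls_until_l23)
			continue;		/* endpoint still flushing TLPs */
		record(p, STEP_L23_READY);
		record(p, STEP_WAIT_100NS);	/* e.g. ndelay(100) in a driver */
		record(p, STEP_POWER_OFF);
		p->powered = false;
		return 0;
	}
	return -1;
}
```

On a timeout this sketch leaves power on and reports failure. That is only one possible policy, and alternatives are debated further down the thread.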
Frank Li Nov. 11, 2024, 4:18 p.m. UTC | #10
On Mon, Nov 11, 2024 at 11:03:22AM +0530, Manivannan Sadhasivam wrote:
> On Mon, Nov 11, 2024 at 03:29:18AM +0000, Hongxing Zhu wrote:
> > > -----Original Message-----
> > > From: Krishna Chaitanya Chundru <quic_krichai@quicinc.com>
> > > Sent: November 10, 2024 8:10
> > > To: Bjorn Helgaas <helgaas@kernel.org>; Manivannan Sadhasivam
> > > <manivannan.sadhasivam@linaro.org>
> > > Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> > > bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> > > robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> > > kernel@pengutronix.de; linux-pci@vger.kernel.org;
> > > linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> > > dw_pcie_suspend_noirq()
> > >
> > >
> > >
> > > On 11/8/2024 5:54 AM, Bjorn Helgaas wrote:
> > > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > >> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > >>> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > >>> safe to send PME_TURN_OFF message regardless of whether the link is
> > > >>> up or down. So, there would be no need to test the LTSSM stat before
> > > >>> sending PME_TURN_OFF message.
> > > >>
> > > >> What is the incentive to send PME_Turn_Off when link is not up?
> > > >
> > > > There's no need to send PME_Turn_Off when link is not up.
> > > >
> > > > But a link-up check is inherently racy because the link may go down
> > > > between the check and the PME_Turn_Off.  Since it's impossible for
> > > > software to guarantee the link is up, the Root Port should be able to
> > > > tolerate attempts to send PME_Turn_Off when the link is down.
> > > >
> > > > So IMO there's no need to check whether the link is up, and checking
> > > > gives the misleading impression that "we know the link is up and
> > > > therefore sending PME_Turn_Off is safe."
> > > >
> > > Hi Bjorn,
> > >
> > > I agree that the link-up check is racy, but once the link has come up and
> > > then gone down for some reason, the LTSSM state will not move to Detect
> > > Quiet or Detect Active; it will go to Pre-Detect Quiet (i.e., value 0x5).
> > > So we can assume that if the link was up, the LTSSM state will be greater
> > > than Detect Active even if the link has since gone down.
> > >
> > > - Krishna Chaitanya.
> > > >>> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> > > >>> Because the re-initialization would be done in
> > > >>> dw_pcie_resume_noirq().
> > > >>
> > > >> As Krishna explained, host needs to wait until the endpoint acks the
> > > >> message (just to give it some time to do cleanups). Then only the
> > > >> host can initiate D3Cold. It matters when the device supports L2.
> > > >
> > > > The important thing here is to be clear about the *reason* to poll for
> > > > L2 and the *event* that must wait for L2.
> > > >
> > > > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > > > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > > > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > > > 5.2, fig 5-1).
> > > >
> > > > L2 and L3 are states where main power to the downstream component is
> > > > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > > > no link in those states.
> > > >
> > > > The PME_Turn_Off handshake is part of the process to put the
> > > > downstream component in D3cold.  I think the reason for this handshake
> > > > is to allow an orderly shutdown of that component before main power is
> > > > removed.
> > > >
> > > > When the downstream component receives PME_Turn_Off, it will stop
> > > > scheduling new TLPs, but it may already have TLPs scheduled but not
> > > > yet sent.  If power were removed immediately, they would be lost.  My
> > > > understanding is that the link will not enter L2/L3 Ready until the
> > > > components on both ends have completed whatever needs to be done with
> > > > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > > > PCIe book; I haven't found clear spec citations for all of it.)
> > > >
> > > > I think waiting for L2/L3 Ready is to keep us from turning off main
> > > > power when the components are still trying to dispose of those TLPs.
> > > >
> > > > So I think every controller that turns off main power needs to wait
> > > > for L2/L3 Ready.
> > > >
> > > > There's also a requirement that software wait at least 100 ns after
> > > > L2/L3 Ready before turning off refclock and main power (sec
> > > > 5.3.3.2.1).
> > Thanks for the comments.
> > So it is better to keep the L2 poll, since PCIe r6.0, sec 5.3.3.2.1 also
> > recommends a 1 ms to 10 ms timeout when checking for L2/L3 Ready.
> > If there are no further comments, v2 of this patch will only remove the
> > LTSSM state check before issuing the PME_TURN_OFF message.
> >
>
> If you unconditionally send PME_Turn_Off message, then you'd end up polling for
> L23 Ready, which may result in a timeout and users will see the error message.
> This is my concern.

Yes, maybe we can check whether the link has entered L2 or gone down, so
that no such message is printed in the link-down case.
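A minimal sketch of this idea. It replaces dw_pcie_get_ltssm() and dw_pcie_link_up() with a scripted sequence of LTSSM and link-up readings. The poll ends as soon as the LTSSM reports L2 idle or the link reports down. The timeout message is printed only if neither happens. The 0x11 (L0) and 0x15 (L2 idle) encodings are assumptions, and all names here are hypothetical. The link-down test is still a point-in-time read, so it narrows but does not remove the race discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define LTSSM_L0	0x11	/* assumed encoding */
#define LTSSM_L2_IDLE	0x15	/* assumed encoding of DW_PCIE_LTSSM_L2_IDLE */

/* Scripted hardware: one LTSSM value and one link-up flag per read. */
struct sim_link {
	const unsigned int *ltssm;
	const bool *up;
	int nreads;	/* once exhausted, the last entry repeats */
	int idx;
};

static unsigned int sim_read(struct sim_link *l, bool *up)
{
	int i = l->idx;

	if (l->idx < l->nreads - 1)
		l->idx++;
	*up = l->up[i];
	return l->ltssm[i];
}

enum l2_wait_result { L2_REACHED, L2_LINK_DOWN, L2_TIMEOUT };

static enum l2_wait_result wait_l2_or_link_down(struct sim_link *l, int max_polls)
{
	unsigned int val = 0;
	bool up = true;

	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		val = sim_read(l, &up);
		if (val == LTSSM_L2_IDLE)
			return L2_REACHED;
		if (!up)
			return L2_LINK_DOWN;	/* nothing to wait for: stay quiet */
	}
	fprintf(stderr, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
	return L2_TIMEOUT;
}
```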

Frank

>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
Frank Li Nov. 11, 2024, 5:42 p.m. UTC | #11
On Mon, Nov 11, 2024 at 11:39:02AM +0530, Manivannan Sadhasivam wrote:
> On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > > safe to send PME_TURN_OFF message regardless of whether the link
> > > > is up or down. So, there would be no need to test the LTSSM stat
> > > > before sending PME_TURN_OFF message.
> > >
> > > What is the incentive to send PME_Turn_Off when link is not up?
> >
> > There's no need to send PME_Turn_Off when link is not up.
> >
> > But a link-up check is inherently racy because the link may go down
> > between the check and the PME_Turn_Off.  Since it's impossible for
> > software to guarantee the link is up, the Root Port should be able to
> > tolerate attempts to send PME_Turn_Off when the link is down.
> >
> > So IMO there's no need to check whether the link is up, and checking
> > gives the misleading impression that "we know the link is up and
> > therefore sending PME_Turn_Off is safe."
> >
>
> I agree that the check is racy (not sure if there is a better way to avoid
> that), but if you send the PME_Turn_Off unconditionally, then it will result in
> L23 Ready timeout and users will see the error message.
>
> > > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > > out.  Because the re-initialization would be done in
> > > > dw_pcie_resume_noirq().
> > >
> > > As Krishna explained, host needs to wait until the endpoint acks the
> > > message (just to give it some time to do cleanups). Then only the
> > > host can initiate D3Cold. It matters when the device supports L2.
> >
> > The important thing here is to be clear about the *reason* to poll for
> > L2 and the *event* that must wait for L2.
> >
> > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > 5.2, fig 5-1).
> >
> > L2 and L3 are states where main power to the downstream component is
> > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > no link in those states.
> >
> > The PME_Turn_Off handshake is part of the process to put the
> > downstream component in D3cold.  I think the reason for this handshake
> > is to allow an orderly shutdown of that component before main power is
> > removed.
> >
> > When the downstream component receives PME_Turn_Off, it will stop
> > scheduling new TLPs, but it may already have TLPs scheduled but not
> > yet sent.  If power were removed immediately, they would be lost.  My
> > understanding is that the link will not enter L2/L3 Ready until the
> > components on both ends have completed whatever needs to be done with
> > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > PCIe book; I haven't found clear spec citations for all of it.)
> >
> > I think waiting for L2/L3 Ready is to keep us from turning off main
> > power when the components are still trying to dispose of those TLPs.
> >
>
> Not just disposing TLPs as per the spec, most endpoints also need to reset their
> state machine as well (if there is a way for the endpoint sw to delay sending
> L23 Ready).
>
> > So I think every controller that turns off main power needs to wait
> > for L2/L3 Ready.
> >
> > There's also a requirement that software wait at least 100 ns after
> > L2/L3 Ready before turning off refclock and main power (sec
> > 5.3.3.2.1).
> >
>
> Right. Usually, the delay after PERST# assert would make sure this, but in
> layerscape driver (user of dw_pcie_suspend_noirq) I don't see power/refclk
> removal.
>
> Richard Zhu/Frank, thoughts?

Generally, power and refclk are removed when the system enters a sleep
state. There is a signal, "suspend_request_b", which is connected to the
PMIC. After the CPU triggers this signal, the PMIC turns off some
(pre-fused) power rails.

The refclk (which comes from the SoC) and the OSC are shut down when
"suspend_request_b" is sent out.

Frank


>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
Manivannan Sadhasivam Nov. 12, 2024, 8:02 a.m. UTC | #12
On Mon, Nov 11, 2024 at 12:42:50PM -0500, Frank Li wrote:
> On Mon, Nov 11, 2024 at 11:39:02AM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > > > safe to send PME_TURN_OFF message regardless of whether the link
> > > > > is up or down. So, there would be no need to test the LTSSM stat
> > > > > before sending PME_TURN_OFF message.
> > > >
> > > > What is the incentive to send PME_Turn_Off when link is not up?
> > >
> > > There's no need to send PME_Turn_Off when link is not up.
> > >
> > > But a link-up check is inherently racy because the link may go down
> > > between the check and the PME_Turn_Off.  Since it's impossible for
> > > software to guarantee the link is up, the Root Port should be able to
> > > tolerate attempts to send PME_Turn_Off when the link is down.
> > >
> > > So IMO there's no need to check whether the link is up, and checking
> > > gives the misleading impression that "we know the link is up and
> > > therefore sending PME_Turn_Off is safe."
> > >
> >
> > I agree that the check is racy (not sure if there is a better way to avoid
> > that), but if you send the PME_Turn_Off unconditionally, then it will result in
> > L23 Ready timeout and users will see the error message.
> >
> > > > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > > > out.  Because the re-initialization would be done in
> > > > > dw_pcie_resume_noirq().
> > > >
> > > > As Krishna explained, host needs to wait until the endpoint acks the
> > > > message (just to give it some time to do cleanups). Then only the
> > > > host can initiate D3Cold. It matters when the device supports L2.
> > >
> > > The important thing here is to be clear about the *reason* to poll for
> > > L2 and the *event* that must wait for L2.
> > >
> > > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > > 5.2, fig 5-1).
> > >
> > > L2 and L3 are states where main power to the downstream component is
> > > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > > no link in those states.
> > >
> > > The PME_Turn_Off handshake is part of the process to put the
> > > downstream component in D3cold.  I think the reason for this handshake
> > > is to allow an orderly shutdown of that component before main power is
> > > removed.
> > >
> > > When the downstream component receives PME_Turn_Off, it will stop
> > > scheduling new TLPs, but it may already have TLPs scheduled but not
> > > yet sent.  If power were removed immediately, they would be lost.  My
> > > understanding is that the link will not enter L2/L3 Ready until the
> > > components on both ends have completed whatever needs to be done with
> > > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > > PCIe book; I haven't found clear spec citations for all of it.)
> > >
> > > I think waiting for L2/L3 Ready is to keep us from turning off main
> > > power when the components are still trying to dispose of those TLPs.
> > >
> >
> > Not just disposing TLPs as per the spec, most endpoints also need to reset their
> > state machine as well (if there is a way for the endpoint sw to delay sending
> > L23 Ready).
> >
> > > So I think every controller that turns off main power needs to wait
> > > for L2/L3 Ready.
> > >
> > > There's also a requirement that software wait at least 100 ns after
> > > L2/L3 Ready before turning off refclock and main power (sec
> > > 5.3.3.2.1).
> > >
> >
> > Right. Usually, the delay after PERST# assert would make sure this, but in
> > layerscape driver (user of dw_pcie_suspend_noirq) I don't see power/refclk
> > removal.
> >
> > Richard Zhu/Frank, thoughts?
> 
> Generally, power and refclk are removed when the system enters a sleep
> state. There is a signal, "suspend_request_b", which is connected to the
> PMIC. After the CPU triggers this signal, the PMIC turns off some
> (pre-fused) power rails.
> 
> The refclk (which comes from the SoC) and the OSC are shut down when
> "suspend_request_b" is sent out.
> 

Thanks for clarifying! Then it would be better to add the 100 ns delay after
receiving the L23 Ready message from the endpoint.

- Mani
Hongxing Zhu Nov. 12, 2024, 9 a.m. UTC | #13
> -----Original Message-----
> From: Frank Li <frank.li@nxp.com>
> Sent: November 12, 2024 0:18
> To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; Krishna Chaitanya Chundru
> <quic_krichai@quicinc.com>; Bjorn Helgaas <helgaas@kernel.org>;
> jingoohan1@gmail.com; bhelgaas@google.com; lpieralisi@kernel.org;
> kw@linux.com; robh@kernel.org; imx@lists.linux.dev;
> kernel@pengutronix.de; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> dw_pcie_suspend_noirq()
> 
> On Mon, Nov 11, 2024 at 11:03:22AM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Nov 11, 2024 at 03:29:18AM +0000, Hongxing Zhu wrote:
> > > > -----Original Message-----
> > > > From: Krishna Chaitanya Chundru <quic_krichai@quicinc.com>
> > > > Sent: November 10, 2024 8:10
> > > > To: Bjorn Helgaas <helgaas@kernel.org>; Manivannan Sadhasivam
> > > > <manivannan.sadhasivam@linaro.org>
> > > > Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> > > > bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> > > > robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> > > > kernel@pengutronix.de; linux-pci@vger.kernel.org;
> > > > linux-kernel@vger.kernel.org
> > > > Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> > > > dw_pcie_suspend_noirq()
> > > >
> > > >
> > > >
> > > > On 11/8/2024 5:54 AM, Bjorn Helgaas wrote:
> > > > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > > >> On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > >>> Before sending PME_TURN_OFF, don't test the LTSSM stat. Since
> > > > >>> it's safe to send PME_TURN_OFF message regardless of whether
> > > > >>> the link is up or down. So, there would be no need to test the
> > > > >>> LTSSM stat before sending PME_TURN_OFF message.
> > > > >>
> > > > >> What is the incentive to send PME_Turn_Off when link is not up?
> > > > >
> > > > > There's no need to send PME_Turn_Off when link is not up.
> > > > >
> > > > > But a link-up check is inherently racy because the link may go
> > > > > down between the check and the PME_Turn_Off.  Since it's
> > > > > impossible for software to guarantee the link is up, the Root
> > > > > Port should be able to tolerate attempts to send PME_Turn_Off when
> > > > > the link is down.
> > > > >
> > > > > So IMO there's no need to check whether the link is up, and
> > > > > checking gives the misleading impression that "we know the link
> > > > > is up and therefore sending PME_Turn_Off is safe."
> > > > >
> > > > Hi Bjorn,
> > > >
> > > > I agree that the link-up check is racy, but once the link has come
> > > > up and then gone down for some reason, the LTSSM state will not
> > > > move to Detect Quiet or Detect Active; it will go to Pre-Detect
> > > > Quiet (i.e., value 0x5). So we can assume that if the link was up,
> > > > the LTSSM state will be greater than Detect Active even if the link
> > > > has since gone down.
> > > >
> > > > - Krishna Chaitanya.
> > > > >>> Remove the L2 poll too, after the PME_TURN_OFF message is sent out.
> > > > >>> Because the re-initialization would be done in
> > > > >>> dw_pcie_resume_noirq().
> > > > >>
> > > > >> As Krishna explained, host needs to wait until the endpoint
> > > > >> acks the message (just to give it some time to do cleanups).
> > > > >> Then only the host can initiate D3Cold. It matters when the device
> > > > >> supports L2.
> > > > >
> > > > > The important thing here is to be clear about the *reason* to
> > > > > poll for L2 and the *event* that must wait for L2.
> > > > >
> > > > > I don't have any DesignWare specs, but when
> > > > > dw_pcie_suspend_noirq() waits for DW_PCIE_LTSSM_L2_IDLE, I think
> > > > > what we're doing is waiting for the link to be in the L2/L3
> > > > > Ready pseudo-state (PCIe r6.0, sec 5.2, fig 5-1).
> > > > >
> > > > > L2 and L3 are states where main power to the downstream
> > > > > component is off, i.e., the component is in D3cold (r6.0, sec
> > > > > 5.3.2), so there is no link in those states.
> > > > >
> > > > > The PME_Turn_Off handshake is part of the process to put the
> > > > > downstream component in D3cold.  I think the reason for this
> > > > > handshake is to allow an orderly shutdown of that component
> > > > > before main power is removed.
> > > > >
> > > > > When the downstream component receives PME_Turn_Off, it will
> > > > > stop scheduling new TLPs, but it may already have TLPs scheduled
> > > > > but not yet sent.  If power were removed immediately, they would
> > > > > be lost.  My understanding is that the link will not enter L2/L3
> > > > > Ready until the components on both ends have completed whatever
> > > > > needs to be done with those TLPs.  (This is based on the L2/L3
> > > > > discussion in the Mindshare PCIe book; I haven't found clear
> > > > > spec citations for all of it.)
> > > > >
> > > > > I think waiting for L2/L3 Ready is to keep us from turning off
> > > > > main power when the components are still trying to dispose of those
> > > > > TLPs.
> > > > >
> > > > > So I think every controller that turns off main power needs to
> > > > > wait for L2/L3 Ready.
> > > > >
> > > > > There's also a requirement that software wait at least 100 ns
> > > > > after L2/L3 Ready before turning off refclock and main power (sec
> > > > > 5.3.3.2.1).
> > > Thanks for the comments.
> > > So it is better to keep the L2 poll, since PCIe r6.0, sec 5.3.3.2.1
> > > also recommends a 1 ms to 10 ms timeout when checking for L2/L3 Ready.
> > > If there are no further comments, v2 of this patch will only remove the
> > > LTSSM state check before issuing the PME_TURN_OFF message.
> > >
> >
> > If you unconditionally send PME_Turn_Off message, then you'd end up
> > polling for L23 Ready, which may result in a timeout and users will
> > see the error message.
> > This is my concern.
> 
> Yes, maybe we can check whether the link has entered L2 or gone down, so
> that no such message is printed in the link-down case.
>
While waiting for L2/L3 Ready, the link should still be up, right?
Before dumping the error message, how about checking whether the link is up, like this:
                ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
                                        PCIE_PME_TO_L2_TIMEOUT_US/10,
                                        PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
-               if (ret) {
+               if (ret && dw_pcie_link_up(pci)) {
                        dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
                        return ret;
                }

Best Regards
Richard Zhu

> Frank
> 
> >
> > - Mani
> >
> > --
> > மணிவண்ணன் சதாசிவம்
Hongxing Zhu Nov. 12, 2024, 9:15 a.m. UTC | #14
> -----Original Message-----
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Sent: November 12, 2024 16:02
> To: Frank Li <frank.li@nxp.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>; Hongxing Zhu
> <hongxing.zhu@nxp.com>; jingoohan1@gmail.com; bhelgaas@google.com;
> lpieralisi@kernel.org; kw@linux.com; robh@kernel.org; imx@lists.linux.dev;
> kernel@pengutronix.de; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> dw_pcie_suspend_noirq()
> 
> On Mon, Nov 11, 2024 at 12:42:50PM -0500, Frank Li wrote:
> > On Mon, Nov 11, 2024 at 11:39:02AM +0530, Manivannan Sadhasivam wrote:
> > > On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since
> > > > > > it's safe to send PME_TURN_OFF message regardless of whether
> > > > > > the link is up or down. So, there would be no need to test the
> > > > > > LTSSM stat before sending PME_TURN_OFF message.
> > > > >
> > > > > What is the incentive to send PME_Turn_Off when link is not up?
> > > >
> > > > There's no need to send PME_Turn_Off when link is not up.
> > > >
> > > > But a link-up check is inherently racy because the link may go
> > > > down between the check and the PME_Turn_Off.  Since it's
> > > > impossible for software to guarantee the link is up, the Root Port
> > > > should be able to tolerate attempts to send PME_Turn_Off when the link
> > > > is down.
> > > >
> > > > So IMO there's no need to check whether the link is up, and
> > > > checking gives the misleading impression that "we know the link is
> > > > up and therefore sending PME_Turn_Off is safe."
> > > >
> > >
> > > I agree that the check is racy (not sure if there is a better way to
> > > avoid that), but if you send the PME_Turn_Off unconditionally, then
> > > it will result in L23 Ready timeout and users will see the error message.
> > >
> > > > > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > > > > out.  Because the re-initialization would be done in
> > > > > > dw_pcie_resume_noirq().
> > > > >
> > > > > As Krishna explained, host needs to wait until the endpoint acks
> > > > > the message (just to give it some time to do cleanups). Then
> > > > > only the host can initiate D3Cold. It matters when the device supports
> > > > > L2.
> > > >
> > > > The important thing here is to be clear about the *reason* to poll
> > > > for L2 and the *event* that must wait for L2.
> > > >
> > > > I don't have any DesignWare specs, but when
> > > > dw_pcie_suspend_noirq() waits for DW_PCIE_LTSSM_L2_IDLE, I think
> > > > what we're doing is waiting for the link to be in the L2/L3 Ready
> > > > pseudo-state (PCIe r6.0, sec 5.2, fig 5-1).
> > > >
> > > > L2 and L3 are states where main power to the downstream component
> > > > is off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so
> > > > there is no link in those states.
> > > >
> > > > The PME_Turn_Off handshake is part of the process to put the
> > > > downstream component in D3cold.  I think the reason for this
> > > > handshake is to allow an orderly shutdown of that component before
> > > > main power is removed.
> > > >
> > > > When the downstream component receives PME_Turn_Off, it will stop
> > > > scheduling new TLPs, but it may already have TLPs scheduled but
> > > > not yet sent.  If power were removed immediately, they would be
> > > > lost.  My understanding is that the link will not enter L2/L3
> > > > Ready until the components on both ends have completed whatever
> > > > needs to be done with those TLPs.  (This is based on the L2/L3
> > > > discussion in the Mindshare PCIe book; I haven't found clear spec
> > > > citations for all of it.)
> > > >
> > > > I think waiting for L2/L3 Ready is to keep us from turning off
> > > > main power when the components are still trying to dispose of those
> > > > TLPs.
> > > >
> > >
> > > Not just disposing TLPs as per the spec, most endpoints also need to
> > > reset their state machine as well (if there is a way for the
> > > endpoint sw to delay sending L23 Ready).
> > >
> > > > So I think every controller that turns off main power needs to
> > > > wait for L2/L3 Ready.
> > > >
> > > > There's also a requirement that software wait at least 100 ns
> > > > after L2/L3 Ready before turning off refclock and main power (sec
> > > > 5.3.3.2.1).
> > > >
> > >
> > > Right. Usually, the delay after PERST# assert would make sure this,
> > > but in layerscape driver (user of dw_pcie_suspend_noirq) I don't see
> > > power/refclk removal.
> > >
> > > Richard Zhu/Frank, thoughts?
> >
> > Generally, power and refclk are removed when the system enters a sleep
> > state. There is a signal, "suspend_request_b", which is connected to the
> > PMIC. After the CPU triggers this signal, the PMIC turns off some
> > (pre-fused) power rails.
> >
> > The refclk (which comes from the SoC) and the OSC are shut down when
> > "suspend_request_b" is sent out.
> >
> 
> Thanks for clarifying! Then it would be better to add the 100 ns delay after
> receiving the L23 Ready message from the endpoint.
Okay.
How about the following changes?
- Before dumping the error message, make sure the link is up.
- Add a 1 us delay after L2/L3 Ready is received.

--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -940,9 +940,16 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
                ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
                                        PCIE_PME_TO_L2_TIMEOUT_US/10,
                                        PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
-               if (ret) {
+               if (ret && dw_pcie_link_up(pci)) {
                        dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
                        return ret;
+               } else {
+                       /*
+                        * Refer to r6.0, sec 5.3.3.2.1, software should wait at
+                        * least 100ns after L2/L3 Ready before turning off
+                        * refclock and main power.
+                        */
+                       udelay(1);

Best Regards
Richard Zhu
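A small sketch of the control flow in the proposal above, written as a pure function so its cases can be checked. The else branch, and therefore the delay, is taken in two cases: when L2/L3 Ready was reached, and when the poll timed out with the link already down. The enum and function names are hypothetical. The 1000 ns figure is simply udelay(1) expressed in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>

#define PROPOSED_DELAY_NS	1000	/* udelay(1) */
#define SPEC_MIN_DELAY_NS	100	/* minimum wait quoted from r6.0 sec 5.3.3.2.1 */

/* 1 us comfortably covers the 100 ns requirement. */
_Static_assert(PROPOSED_DELAY_NS >= SPEC_MIN_DELAY_NS, "delay shorter than 100 ns");

enum post_poll_action {
	ACT_FAIL_SUSPEND,		/* dev_err() and return ret */
	ACT_DELAY_AND_CONTINUE,		/* udelay(1), then carry on suspending */
};

/*
 * ret:     result of the L2 poll (0 means L2 idle was observed)
 * link_up: result of the link-up check made after the poll
 */
static enum post_poll_action after_l2_poll(int ret, bool link_up)
{
	if (ret && link_up)
		return ACT_FAIL_SUSPEND;
	return ACT_DELAY_AND_CONTINUE;
}
```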
> 
> - Mani
> 
> --
> மணிவண்ணன் சதாசிவம்
Hongxing Zhu Nov. 12, 2024, 9:25 a.m. UTC | #15
> -----Original Message-----
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Sent: November 11, 2024 14:09
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> kernel@pengutronix.de; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> dw_pcie_suspend_noirq()
> 
> On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > > safe to send PME_TURN_OFF message regardless of whether the link
> > > > is up or down. So, there would be no need to test the LTSSM stat
> > > > before sending PME_TURN_OFF message.
> > >
> > > What is the incentive to send PME_Turn_Off when link is not up?
> >
> > There's no need to send PME_Turn_Off when link is not up.
> >
> > But a link-up check is inherently racy because the link may go down
> > between the check and the PME_Turn_Off.  Since it's impossible for
> > software to guarantee the link is up, the Root Port should be able to
> > tolerate attempts to send PME_Turn_Off when the link is down.
> >
> > So IMO there's no need to check whether the link is up, and checking
> > gives the misleading impression that "we know the link is up and
> > therefore sending PME_Turn_Off is safe."
> >
> 
> I agree that the check is racy (not sure if there is a better way to avoid that),
> but if you send the PME_Turn_Off unconditionally, then it will result in
> L23 Ready timeout and users will see the error message.
> 
I understand Manivannan's concerns.
Checking whether the link is up before dumping the error message is
itself another racy check.
How about replacing dev_err() with dev_info() and not returning an error?
Whether the timeout is caused by no EP being connected or by something
else, just inform the user of the actual state.
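A minimal sketch of this alternative. It assumes the poll result and the last LTSSM value are passed in. A timeout is downgraded to an informational message and the function reports success either way, so suspend proceeds. The function name is hypothetical, and printf() stands in for dev_info().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: handle the L2 poll result without failing suspend. */
static int report_l2_wait(int ret, unsigned int ltssm)
{
	if (ret)
		printf("L2 entry not observed (LTSSM: 0x%x); continuing suspend\n", ltssm);
	return 0;	/* the timeout is never propagated to the caller */
}
```

The trade-off is that a real failure to reach L2/L3 Ready, with the link still up, is then neither reported as an error nor propagated to the caller.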

Best Regards
Richard Zhu

> > > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > > out.  Because the re-initialization would be done in
> > > > dw_pcie_resume_noirq().
> > >
> > > As Krishna explained, host needs to wait until the endpoint acks the
> > > message (just to give it some time to do cleanups). Then only the
> > > host can initiate D3Cold. It matters when the device supports L2.
> >
> > The important thing here is to be clear about the *reason* to poll for
> > L2 and the *event* that must wait for L2.
> >
> > I don't have any DesignWare specs, but when dw_pcie_suspend_noirq()
> > waits for DW_PCIE_LTSSM_L2_IDLE, I think what we're doing is waiting
> > for the link to be in the L2/L3 Ready pseudo-state (PCIe r6.0, sec
> > 5.2, fig 5-1).
> >
> > L2 and L3 are states where main power to the downstream component is
> > off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so there is
> > no link in those states.
> >
> > The PME_Turn_Off handshake is part of the process to put the
> > downstream component in D3cold.  I think the reason for this handshake
> > is to allow an orderly shutdown of that component before main power is
> > removed.
> >
> > When the downstream component receives PME_Turn_Off, it will stop
> > scheduling new TLPs, but it may already have TLPs scheduled but not
> > yet sent.  If power were removed immediately, they would be lost.  My
> > understanding is that the link will not enter L2/L3 Ready until the
> > components on both ends have completed whatever needs to be done with
> > those TLPs.  (This is based on the L2/L3 discussion in the Mindshare
> > PCIe book; I haven't found clear spec citations for all of it.)
> >
> > I think waiting for L2/L3 Ready is to keep us from turning off main
> > power when the components are still trying to dispose of those TLPs.
> >
> 
> Not just disposing TLPs as per the spec, most endpoints also need to reset
> their state machine as well (if there is a way for the endpoint sw to delay
> sending L23 Ready).
> 
> > So I think every controller that turns off main power needs to wait
> > for L2/L3 Ready.
> >
> > There's also a requirement that software wait at least 100 ns after
> > L2/L3 Ready before turning off refclock and main power (sec
> > 5.3.3.2.1).
> >
> 
> Right. Usually, the delay after PERST# assert would make sure this, but in
> layerscape driver (user of dw_pcie_suspend_noirq) I don't see power/refclk
> removal.
> 
> Richard Zhu/Frank, thoughts?
> 
> - Mani
> 
> --
> மணிவண்ணன் சதாசிவம்
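A minimal userspace sketch of the wait Bjorn describes. It assumes a simulated endpoint that needs some time after PME_Turn_Off to flush pending TLPs before the LTSSM reports L2_IDLE, which is taken here as the L2/L3 Ready indication. The `sim_*` names and the timing values are hypothetical. The LTSSM encodings come from the DesignWare state list quoted later in this thread, and the interval/timeout pair only mirrors the shape of the read_poll_timeout() call in dw_pcie_suspend_noirq().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* LTSSM encodings from the DesignWare state list quoted in this thread. */
#define LTSSM_DETECT_QUIET 0x00
#define LTSSM_L0           0x11
#define LTSSM_L2_IDLE      0x15

/*
 * Hypothetical simulated link: after PME_Turn_Off the endpoint needs
 * ep_cleanup_us to flush pending TLPs, then acks and the LTSSM reports
 * L2_IDLE. With no endpoint the link never leaves DETECT_QUIET.
 */
struct sim_link {
	uint32_t now_us;        /* simulated time since PME_Turn_Off */
	uint32_t ep_cleanup_us; /* endpoint's flush time */
	bool ep_present;
};

static uint8_t sim_get_ltssm(const struct sim_link *l)
{
	if (!l->ep_present)
		return LTSSM_DETECT_QUIET;
	return l->now_us >= l->ep_cleanup_us ? LTSSM_L2_IDLE : LTSSM_L0;
}

/*
 * Poll every sleep_us until L2_IDLE or until timeout_us has elapsed.
 * Returns 0 on success or -ETIMEDOUT; *val holds the last LTSSM value read.
 */
static int sim_wait_l2(struct sim_link *l, uint32_t sleep_us,
		       uint32_t timeout_us, uint8_t *val)
{
	assert(sleep_us > 0);
	l->now_us = 0; /* PME_Turn_Off has just been sent */
	for (;;) {
		*val = sim_get_ltssm(l);
		if (*val == LTSSM_L2_IDLE)
			return 0;
		if (l->now_us >= timeout_us)
			return -ETIMEDOUT;
		l->now_us += sleep_us; /* stands in for the sleep between reads */
	}
}
```

With ep_cleanup_us = 3000, a 1000 us interval and a 10000 us timeout, the poll succeeds at t = 3000 us. With no endpoint it times out with the LTSSM still in DETECT_QUIET, which is the case that would print a spurious error. Dropping the poll entirely, as v1 does, corresponds to cutting power at t = 0.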
Frank Li Nov. 12, 2024, 4:30 p.m. UTC | #16
On Tue, Nov 12, 2024 at 09:15:25AM +0000, Hongxing Zhu wrote:
> > -----Original Message-----
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Sent: November 12, 2024 16:02
> > To: Frank Li <frank.li@nxp.com>
> > Cc: Bjorn Helgaas <helgaas@kernel.org>; Hongxing Zhu
> > <hongxing.zhu@nxp.com>; jingoohan1@gmail.com; bhelgaas@google.com;
> > lpieralisi@kernel.org; kw@linux.com; robh@kernel.org; imx@lists.linux.dev;
> > kernel@pengutronix.de; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> > dw_pcie_suspend_noirq()
> >
> > On Mon, Nov 11, 2024 at 12:42:50PM -0500, Frank Li wrote:
> > > On Mon, Nov 11, 2024 at 11:39:02AM +0530, Manivannan Sadhasivam wrote:
> > > > On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > > > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > > > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since
> > > > > > > it's safe to send PME_TURN_OFF message regardless of whether
> > > > > > > the link is up or down. So, there would be no need to test the
> > > > > > > LTSSM stat before sending PME_TURN_OFF message.
> > > > > >
> > > > > > What is the incentive to send PME_Turn_Off when link is not up?
> > > > >
> > > > > There's no need to send PME_Turn_Off when link is not up.
> > > > >
> > > > > But a link-up check is inherently racy because the link may go
> > > > > down between the check and the PME_Turn_Off.  Since it's
> > > > > impossible for software to guarantee the link is up, the Root Port
> > > > > should be able to tolerate attempts to send PME_Turn_Off when the
> > > > > link is down.
> > > > >
> > > > > So IMO there's no need to check whether the link is up, and
> > > > > checking gives the misleading impression that "we know the link is
> > > > > up and therefore sending PME_Turn_Off is safe."
> > > > >
> > > >
> > > > I agree that the check is racy (not sure if there is a better way to
> > > > avoid that), but if you send the PME_Turn_Off unconditionally, then
> > > > it will result in an L23 Ready timeout and users will see the error
> > > > message.
> > > >
> > > > > > > Remove the L2 poll too, after the PME_TURN_OFF message is sent
> > > > > > > out.  Because the re-initialization would be done in
> > > > > > > dw_pcie_resume_noirq().
> > > > > >
> > > > > > As Krishna explained, host needs to wait until the endpoint acks
> > > > > > the message (just to give it some time to do cleanups). Then
> > > > > > only the host can initiate D3Cold. It matters when the device
> > > > > > supports L2.
> > > > >
> > > > > The important thing here is to be clear about the *reason* to poll
> > > > > for L2 and the *event* that must wait for L2.
> > > > >
> > > > > I don't have any DesignWare specs, but when
> > > > > dw_pcie_suspend_noirq() waits for DW_PCIE_LTSSM_L2_IDLE, I think
> > > > > what we're doing is waiting for the link to be in the L2/L3 Ready
> > > > > pseudo-state (PCIe r6.0, sec 5.2, fig 5-1).
> > > > >
> > > > > L2 and L3 are states where main power to the downstream component
> > > > > is off, i.e., the component is in D3cold (r6.0, sec 5.3.2), so
> > > > > there is no link in those states.
> > > > >
> > > > > The PME_Turn_Off handshake is part of the process to put the
> > > > > downstream component in D3cold.  I think the reason for this
> > > > > handshake is to allow an orderly shutdown of that component before
> > > > > main power is removed.
> > > > >
> > > > > When the downstream component receives PME_Turn_Off, it will stop
> > > > > scheduling new TLPs, but it may already have TLPs scheduled but
> > > > > not yet sent.  If power were removed immediately, they would be
> > > > > lost.  My understanding is that the link will not enter L2/L3
> > > > > Ready until the components on both ends have completed whatever
> > > > > needs to be done with those TLPs.  (This is based on the L2/L3
> > > > > discussion in the Mindshare PCIe book; I haven't found clear spec
> > > > > citations for all of it.)
> > > > >
> > > > > I think waiting for L2/L3 Ready is to keep us from turning off
> > > > > main power when the components are still trying to dispose of
> > > > > those TLPs.
> > > > >
> > > >
> > > > Beyond disposing of TLPs as the spec describes, most endpoints also
> > > > need to reset their state machines (if the endpoint software has a way
> > > > to delay sending L23 Ready).
> > > >
> > > > > So I think every controller that turns off main power needs to
> > > > > wait for L2/L3 Ready.
> > > > >
> > > > > There's also a requirement that software wait at least 100 ns
> > > > > after L2/L3 Ready before turning off refclock and main power (sec
> > > > > 5.3.3.2.1).
> > > > >
> > > >
> > > > Right. Usually, the delay after PERST# assert would make sure this,
> > > > but in layerscape driver (user of dw_pcie_suspend_noirq) I don't see
> > > > power/refclk removal.
> > > >
> > > > Richard Zhu/Frank, thoughts?
> > >
> > > Generally, power and refclk are removed when the system enters a sleep
> > > state. There is a signal, "suspend_request_b", which is connected to
> > > the PMIC. After the CPU triggers this signal, the PMIC turns off some
> > > (pre-fused) power rails.
> > >
> > > Refclk (which comes from the SoC) and the OSC are shut down when
> > > "suspend_request_b" is sent out.
> > >
> >
> > Thanks for clarifying! Then it would be better to add the 100ns delay after
> > receiving the L23 Ready message from the endpoint.
> Okay.
> How about the following changes?
> - Before dumping the error message, make sure the link is up.
> - Add a 1 us delay after L2/L3 Ready is received.
>
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -940,9 +940,16 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>                 ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
>                                         PCIE_PME_TO_L2_TIMEOUT_US/10,
>                                         PCIE_PME_TO_L2_TIMEOUT_US, false, pci);

I mean changing the poll condition to

val == DW_PCIE_LTSSM_L2_IDLE || val == XXX

where XXX is one of the values below that indicates the link is down, maybe
S_DISABLED or S_DETECT_QUIET:

00_0000b - S_DETECT_QUIET
00_0001b - S_DETECT_ACT
00_0010b - S_POLL_ACTIVE
00_0011b - S_POLL_COMPLIANCE
00_0100b - S_POLL_CONFIG
00_0101b - S_PRE_DETECT_QUIET
00_0110b - S_DETECT_WAIT
00_0111b - S_CFG_LINKWD_START
00_1000b - S_CFG_LINKWD_ACEPT
00_1001b - S_CFG_LANENUM_WAI
00_1010b - S_CFG_LANENUM_ACEPT
00_1011b - S_CFG_COMPLETE
00_1100b - S_CFG_IDLE
00_1101b - S_RCVRY_LOCK
00_1110b - S_RCVRY_SPEED
00_1111b - S_RCVRY_RCVRCFG
01_0000b - S_RCVRY_IDLE
01_0001b - S_L0
01_0010b - S_L0S
01_0011b - S_L123_SEND_EIDLE
01_0100b - S_L1_IDLE
01_0101b - S_L2_IDLE
01_0110b - S_L2_WAKE
01_0111b - S_DISABLED_ENTRY
01_1000b - S_DISABLED_IDLE
01_1001b - S_DISABLED
01_1010b - S_LPBK_ENTRY
01_1011b - S_LPBK_ACTIVE
01_1100b - S_LPBK_EXIT
01_1101b - S_LPBK_EXIT_TIMEOUT
01_1110b - S_HOT_RESET_ENTRY
01_1111b - S_HOT_RESET
10_0000b - S_RCVRY_EQ0
10_0001b - S_RCVRY_EQ1
10_0010b - S_RCVRY_EQ2
10_0011b - S_RCVRY_EQ3
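Frank's suggestion can be sketched as a stop condition for the poll. Which encodings count as "link down" is an assumption here (the detect states plus S_DISABLED). The helper names are hypothetical, and the values are copied from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the DesignWare LTSSM encodings listed above. */
enum ltssm_state {
	S_DETECT_QUIET     = 0x00,
	S_DETECT_ACT       = 0x01,
	S_PRE_DETECT_QUIET = 0x05,
	S_DETECT_WAIT      = 0x06,
	S_L0               = 0x11,
	S_L2_IDLE          = 0x15,
	S_DISABLED         = 0x19,
};

/* Hypothetical helper: states assumed to mean "no active link partner". */
static bool ltssm_link_down(uint8_t val)
{
	switch (val) {
	case S_DETECT_QUIET:
	case S_DETECT_ACT:
	case S_PRE_DETECT_QUIET:
	case S_DETECT_WAIT:
	case S_DISABLED:
		return true;
	default:
		return false;
	}
}

/*
 * Proposed stop condition: finish the poll when the link reaches L2/L3
 * Ready (L2_IDLE) or is already down, so an empty slot exits early
 * instead of running into the timeout.
 */
static bool pme_poll_done(uint8_t val)
{
	return val == S_L2_IDLE || ltssm_link_down(val);
}
```

The design choice is to make the poll itself tolerate a missing endpoint, rather than adding a separate link-up check before or after it.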

> -               if (ret) {
> +               if (ret && dw_pcie_link_up(pci)) {
>                         dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
>                         return ret;
> +               } else {
> +                       /*
> +                        * Refer to r6.0, sec 5.3.3.2.1, software should wait at
> +                        * least 100ns after L2/L3 Ready before turning off
> +                        * refclock and main power.
> +                        */
> +                       udelay(1);
>
> Best Regards
> Richard Zhu
> >
> > - Mani
> >
> > --
> > மணிவண்ணன் சதாசிவம்
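The next block is a userspace stand-in for Richard's proposed udelay(1). It assumes CLOCK_MONOTONIC and nanosleep() are available (POSIX). It only checks that at least the 100 ns required by r6.0 sec 5.3.3.2.1 elapses between L2/L3 Ready and power removal. In the kernel this would be udelay() or ndelay(). The function and macro names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Minimum from PCIe r6.0 sec 5.3.3.2.1, as cited in this thread. */
#define L23_READY_TO_POWER_OFF_NS 100u

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical stand-in for the proposed udelay(1): wait 1 us after
 * L2/L3 Ready before refclk/main power may be removed. Returns the
 * measured wait in nanoseconds.
 */
static uint64_t settle_after_l23_ready(void)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = 1000 };
	struct timespec rem;
	uint64_t start = now_ns();

	/* nanosleep() can be interrupted by a signal; resume with the rest. */
	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;
	return now_ns() - start;
}
```

A 1 us delay comfortably covers the 100 ns minimum while keeping the code simple.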
Manivannan Sadhasivam Nov. 12, 2024, 6:04 p.m. UTC | #17
On Tue, Nov 12, 2024 at 09:25:57AM +0000, Hongxing Zhu wrote:
> > -----Original Message-----
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Sent: November 11, 2024 14:09
> > To: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Hongxing Zhu <hongxing.zhu@nxp.com>; jingoohan1@gmail.com;
> > bhelgaas@google.com; lpieralisi@kernel.org; kw@linux.com;
> > robh@kernel.org; Frank Li <frank.li@nxp.com>; imx@lists.linux.dev;
> > kernel@pengutronix.de; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1] PCI: dwc: Clean up some unnecessary codes in
> > dw_pcie_suspend_noirq()
> > 
> > On Thu, Nov 07, 2024 at 06:24:25PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Nov 07, 2024 at 11:13:34AM +0000, Manivannan Sadhasivam wrote:
> > > > On Thu, Nov 07, 2024 at 04:44:55PM +0800, Richard Zhu wrote:
> > > > > Before sending PME_TURN_OFF, don't test the LTSSM stat. Since it's
> > > > > safe to send PME_TURN_OFF message regardless of whether the link
> > > > > is up or down. So, there would be no need to test the LTSSM stat
> > > > > before sending PME_TURN_OFF message.
> > > >
> > > > What is the incentive to send PME_Turn_Off when link is not up?
> > >
> > > There's no need to send PME_Turn_Off when link is not up.
> > >
> > > But a link-up check is inherently racy because the link may go down
> > > between the check and the PME_Turn_Off.  Since it's impossible for
> > > software to guarantee the link is up, the Root Port should be able to
> > > tolerate attempts to send PME_Turn_Off when the link is down.
> > >
> > > So IMO there's no need to check whether the link is up, and checking
> > > gives the misleading impression that "we know the link is up and
> > > therefore sending PME_Turn_Off is safe."
> > >
> > 
> > I agree that the check is racy (not sure if there is a better way to avoid that),
> > but if you send the PME_Turn_Off unconditionally, then it will result in
> > L23 Ready timeout and users will see the error message.
> > 
> I understand Manivannan's concerns.
> Checking whether the link is up before dumping the error message is
> another racy check.

Right.

> How about replacing dev_err() with dev_info() and not returning an error?
> Whether the timeout is caused by no EP being connected or by something
> else, just inform the user of the actual state.
> 

But users don't want the timeout message if no EP is connected; that's my point.

- Mani
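One hedged way to address both concerns assumes the poll hands back the last LTSSM value it read. The timeout is then classified from that value rather than by calling dw_pcie_link_up() a second time. The enum and function are hypothetical. The `<= DETECT_ACT` test reuses the link-up heuristic from the code the v1 patch removes. This keeps the decision consistent with what the poll observed, but it does not eliminate the race Bjorn points out.

```c
#include <assert.h>
#include <stdint.h>

/* LTSSM encodings from the DesignWare list quoted earlier in this thread. */
#define LTSSM_DETECT_QUIET 0x00
#define LTSSM_DETECT_ACT   0x01
#define LTSSM_L2_IDLE      0x15

enum l2_outcome {
	L2_READY,   /* endpoint acked, link reached L2/L3 Ready */
	L2_NO_LINK, /* nothing to wait for: no endpoint or link down */
	L2_TIMEOUT, /* link was up but never reached L2: worth an error */
};

/*
 * Hypothetical classifier for the L2 poll result. poll_ret is 0 on
 * success; last is the final LTSSM value the poll read. The link may
 * still change state after that read, so this is a best-effort view.
 */
static enum l2_outcome classify_l2_poll(int poll_ret, uint8_t last)
{
	if (poll_ret == 0)
		return L2_READY;
	if (last <= LTSSM_DETECT_ACT) /* same heuristic as the removed check */
		return L2_NO_LINK;
	return L2_TIMEOUT;
}
```

Only L2_TIMEOUT would warrant dev_err(). L2_NO_LINK would stay silent, so an empty slot produces no message.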


Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index f86347452026..64c49adf81d2 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -917,7 +917,6 @@  static int dw_pcie_pme_turn_off(struct dw_pcie *pci)
 int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 {
 	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
-	u32 val;
 	int ret = 0;
 
 	/*
@@ -927,23 +926,12 @@  int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
 		return 0;
 
-	/* Only send out PME_TURN_OFF when PCIE link is up */
-	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
-		if (pci->pp.ops->pme_turn_off)
-			pci->pp.ops->pme_turn_off(&pci->pp);
-		else
-			ret = dw_pcie_pme_turn_off(pci);
-
+	if (pci->pp.ops->pme_turn_off) {
+		pci->pp.ops->pme_turn_off(&pci->pp);
+	} else {
+		ret = dw_pcie_pme_turn_off(pci);
 		if (ret)
 			return ret;
-
-		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
-					PCIE_PME_TO_L2_TIMEOUT_US/10,
-					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
-		if (ret) {
-			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
-			return ret;
-		}
 	}
 
 	dw_pcie_stop_link(pci);
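For comparison with the hunk above, the next block sketches the full ordering discussed in the thread. Each step is reduced to a stub that records its name. The step names are illustrative and are not DWC driver functions. The early return for ASPM L1 mirrors the context lines of the patch, and "wait_l23_ready" marks the poll that v1 removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical trace of the steps performed, in order. */
static const char *trace[MAX_STEPS];
static int ntrace;

static void step(const char *name)
{
	assert(ntrace < MAX_STEPS);
	trace[ntrace++] = name;
}

/*
 * Illustrative suspend ordering. ep_acks models whether the endpoint
 * answers PME_Turn_Off (so the link reaches L2/L3 Ready) before the
 * poll times out.
 */
static int suspend_sketch(bool aspm_l1_enabled, bool ep_acks)
{
	ntrace = 0;
	if (aspm_l1_enabled)
		return 0;           /* leave the link alone, as in the patch context */
	step("pme_turn_off");   /* broadcast PME_Turn_Off */
	step("wait_l23_ready"); /* the L2 poll that v1 removes */
	if (!ep_acks)
		return -1;          /* timeout: stop here and return an error */
	step("delay_100ns");    /* r6.0 sec 5.3.3.2.1 minimum */
	step("stop_link");      /* dw_pcie_stop_link() in the real code */
	step("power_off");      /* platform later removes refclk and main power */
	return 0;
}
```

Removing "wait_l23_ready" lets "power_off" follow PME_Turn_Off with no guarantee that the endpoint has finished its cleanup.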