diff mbox series

[V1] PCI: dwc: Use dev_info for PCIe link down event logging

Message ID 20220913101237.4337-1-vidyas@nvidia.com (mailing list archive)
State Accepted
Delegated to: Lorenzo Pieralisi
Headers show
Series [V1] PCI: dwc: Use dev_info for PCIe link down event logging | expand

Commit Message

Vidya Sagar Sept. 13, 2022, 10:12 a.m. UTC
Some of the platforms (like Tegra194 and Tegra234) have open slots and
not having an endpoint connected to the slot is not an error.
So, changing the macro from dev_err to dev_info to log the event.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
 drivers/pci/controller/dwc/pcie-designware.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Manivannan Sadhasivam Sept. 13, 2022, 4:51 p.m. UTC | #1
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
> 

But the link up not happening is an actual error and -ETIMEDOUT is being
returned. So I don't think the log severity should be changed.

Thanks,
Mani

> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>  	}
>  
>  	if (retries >= LINK_WAIT_MAX_RETRIES) {
> -		dev_err(pci->dev, "Phy link never came up\n");
> +		dev_info(pci->dev, "Phy link never came up\n");
>  		return -ETIMEDOUT;
>  	}
>  
> -- 
> 2.17.1
>
Jon Hunter Sept. 13, 2022, 5 p.m. UTC | #2
On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>> not having an endpoint connected to the slot is not an error.
>> So, changing the macro from dev_err to dev_info to log the event.
>>
> 
> But the link up not happening is an actual error and -ETIMEDOUT is being
> returned. So I don't think the log severity should be changed.

Yes it is an error in the sense it is a timeout, but reporting an error 
because nothing is attached to a PCI slot seems a bit noisy. Please note 
that a similar change was made by the following commit and it also seems 
appropriate here ...

commit 4b16a8227907118e011fb396022da671a52b2272
Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Date:   Tue Jun 18 23:32:06 2019 +0530

     PCI: tegra: Change link retry log level to debug


BTW, we check for error messages in the dmesg output and this is a new 
error seen as of Linux v6.0 and so this was flagged in a test. We can 
ignore the error, but in this case it seem more appropriate to make this 
a info or debug level print.

Jon
Bjorn Helgaas Sept. 13, 2022, 8:07 p.m. UTC | #3
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > not having an endpoint connected to the slot is not an error.
> > > So, changing the macro from dev_err to dev_info to log the event.
> > 
> > But the link up not happening is an actual error and -ETIMEDOUT is being
> > returned. So I don't think the log severity should be changed.
> 
> Yes it is an error in the sense it is a timeout, but reporting an error
> because nothing is attached to a PCI slot seems a bit noisy. Please note
> that a similar change was made by the following commit and it also seems
> appropriate here ...
> 
> commit 4b16a8227907118e011fb396022da671a52b2272
> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> Date:   Tue Jun 18 23:32:06 2019 +0530
> 
>     PCI: tegra: Change link retry log level to debug
> 
> 
> BTW, we check for error messages in the dmesg output and this is a new error
> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> error, but in this case it seem more appropriate to make this a info or
> debug level print.

Can you tell whether there's a device present, e.g., via Slot Status
Presence Detect?  If there's nothing in the slot, I don't know why we
would print anything at all.  If a card is present but there's no
link, that's probably worthy of dev_info() or even dev_err().

I guess if you can tell the slot is empty, there's no point in even
trying to start the link, so you could avoid both the message and the
timeout by not even calling dw_pcie_wait_for_link().

Bjorn
Manivannan Sadhasivam Sept. 14, 2022, 6:12 a.m. UTC | #4
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> 
> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > not having an endpoint connected to the slot is not an error.
> > > So, changing the macro from dev_err to dev_info to log the event.
> > > 
> > 
> > But the link up not happening is an actual error and -ETIMEDOUT is being
> > returned. So I don't think the log severity should be changed.
> 
> Yes it is an error in the sense it is a timeout, but reporting an error
> because nothing is attached to a PCI slot seems a bit noisy. Please note
> that a similar change was made by the following commit and it also seems
> appropriate here ...
> 

I got the concern. But the problem here is, we cannot distinguish if the
error is due to slot empty or an actual issues with link/phy. Also it
should be noted that if the slot is empty, the dwc core anyway waits for
the link to be up. IMO this is a boot time overhead that could be
avoided if the specific instance could be disabled in platform data like
devicetree.

> commit 4b16a8227907118e011fb396022da671a52b2272
> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> Date:   Tue Jun 18 23:32:06 2019 +0530
> 
>     PCI: tegra: Change link retry log level to debug
> 
> 
> BTW, we check for error messages in the dmesg output and this is a new error
> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> error, but in this case it seem more appropriate to make this a info or
> debug level print.
> 

On some of the Qcom based platforms, we avoid this situation by
disabling the specific PCIe node in devicetree for which we know that
there would be no devices connected.

This not only avoids the error message in log but also removes the time
spent waiting for link to be up.

Thanks,
Mani

> Jon
> 
> -- 
> nvpublic
Manivannan Sadhasivam Sept. 14, 2022, 6:24 a.m. UTC | #5
On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > not having an endpoint connected to the slot is not an error.
> > > > So, changing the macro from dev_err to dev_info to log the event.
> > > 
> > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > returned. So I don't think the log severity should be changed.
> > 
> > Yes it is an error in the sense it is a timeout, but reporting an error
> > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > that a similar change was made by the following commit and it also seems
> > appropriate here ...
> > 
> > commit 4b16a8227907118e011fb396022da671a52b2272
> > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > Date:   Tue Jun 18 23:32:06 2019 +0530
> > 
> >     PCI: tegra: Change link retry log level to debug
> > 
> > 
> > BTW, we check for error messages in the dmesg output and this is a new error
> > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > error, but in this case it seem more appropriate to make this a info or
> > debug level print.
> 
> Can you tell whether there's a device present, e.g., via Slot Status
> Presence Detect?  If there's nothing in the slot, I don't know why we
> would print anything at all.  If a card is present but there's no
> link, that's probably worthy of dev_info() or even dev_err().
> 

I don't think all form factors allow for the PRSNT pin to be wired up,
so we cannot know if the device is actually present in the slot or not all
the time. Maybe we should do if the form factor supports it?

> I guess if you can tell the slot is empty, there's no point in even
> trying to start the link, so you could avoid both the message and the
> timeout by not even calling dw_pcie_wait_for_link().

Right. There is an overhead of waiting for ~1ms during boot.

We workaround this issue by disabling the PCIe instances in devicetree
for which there would be no devices connected.

Thanks,
Mani

> 
> Bjorn
Vidya Sagar Sept. 14, 2022, 11:02 a.m. UTC | #6
Agree with Mani.
Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and 
even if the slot form factor supports PRSNT, it is not mandatory to have 
a GPIO routed to the PRSNT pin of the slot.
Also, since these are development platforms, we wouldn't want to keep 
changing a controller's status in the DT, instead want to know the 
device's presence/absence (by way of link up) looking at the log.
In my honest opinion, I don't think the absence of a device in the slot 
should be treated as an error.

Thanks,
Vidya Sagar

On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>> not having an endpoint connected to the slot is not an error.
>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>
>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>> returned. So I don't think the log severity should be changed.
>>>
>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>> that a similar change was made by the following commit and it also seems
>>> appropriate here ...
>>>
>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
>>> Date:   Tue Jun 18 23:32:06 2019 +0530
>>>
>>>      PCI: tegra: Change link retry log level to debug
>>>
>>>
>>> BTW, we check for error messages in the dmesg output and this is a new error
>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>> error, but in this case it seem more appropriate to make this a info or
>>> debug level print.
>>
>> Can you tell whether there's a device present, e.g., via Slot Status
>> Presence Detect?  If there's nothing in the slot, I don't know why we
>> would print anything at all.  If a card is present but there's no
>> link, that's probably worthy of dev_info() or even dev_err().
>>
> 
> I don't think all form factors allow for the PRSNT pin to be wired up,
> so we cannot know if the device is actually present in the slot or not all
> the time. Maybe we should do if the form factor supports it?
> 
>> I guess if you can tell the slot is empty, there's no point in even
>> trying to start the link, so you could avoid both the message and the
>> timeout by not even calling dw_pcie_wait_for_link().
> 
> Right. There is an overhead of waiting for ~1ms during boot.
> 
> We workaround this issue by disabling the PCIe instances in devicetree
> for which there would be no devices connected.
> 
> Thanks,
> Mani
> 
>>
>> Bjorn
Manivannan Sadhasivam Sept. 14, 2022, 11:18 a.m. UTC | #7
On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
> Agree with Mani.
> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
> routed to the PRSNT pin of the slot.

Right.

> Also, since these are development platforms, we wouldn't want to keep
> changing a controller's status in the DT, instead want to know the device's
> presence/absence (by way of link up) looking at the log.
> In my honest opinion, I don't think the absence of a device in the slot
> should be treated as an error.
> 

As I mentioned earlier, timeout can happen due to an issue with PHY layer
also. In those cases, "dev_err()" is relevant.

And I also agree that absence of the device should not be treated as an
error. But my question is, if you know that the slot is going to be
empty always, why cannot you just disable it in dts?

Even if you make the log "dev_info()" there is the wait time for link up
and I'm sure that you wouldn't want it either.

Thanks,
Mani

> Thanks,
> Vidya Sagar
> 
> On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > 
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > > 
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > > 
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > > > 
> > > >      PCI: tegra: Change link retry log level to debug
> > > > 
> > > > 
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > > 
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect?  If there's nothing in the slot, I don't know why we
> > > would print anything at all.  If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > > 
> > 
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> > 
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> > 
> > Right. There is an overhead of waiting for ~1ms during boot.
> > 
> > We workaround this issue by disabling the PCIe instances in devicetree
> > for which there would be no devices connected.
> > 
> > Thanks,
> > Mani
> > 
> > > 
> > > Bjorn
Jon Hunter Sept. 14, 2022, 11:25 a.m. UTC | #8
On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
> On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
>> Agree with Mani.
>> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
>> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
>> routed to the PRSNT pin of the slot.
> 
> Right.
> 
>> Also, since these are development platforms, we wouldn't want to keep
>> changing a controller's status in the DT, instead want to know the device's
>> presence/absence (by way of link up) looking at the log.
>> In my honest opinion, I don't think the absence of a device in the slot
>> should be treated as an error.
>>
> 
> As I mentioned earlier, timeout can happen due to an issue with PHY layer
> also. In those cases, "dev_err()" is relevant.
> 
> And I also agree that absence of the device should not be treated as an
> error. But my question is, if you know that the slot is going to be
> empty always, why cannot you just disable it in dts?

I really don't think that makes sense from a usability perspective. You 
want to allow users to connect PCI cards and for them to work without 
having to update the DTB. I have plenty of open PCI slots on my PC and I 
would be a bit upset if someone told me I need to update the PC firmware 
each time I wanted to use a new slot.

Jon
Manivannan Sadhasivam Sept. 14, 2022, 11:43 a.m. UTC | #9
On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote:
> 
> On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
> > On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
> > > Agree with Mani.
> > > Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
> > > if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
> > > routed to the PRSNT pin of the slot.
> > 
> > Right.
> > 
> > > Also, since these are development platforms, we wouldn't want to keep
> > > changing a controller's status in the DT, instead want to know the device's
> > > presence/absence (by way of link up) looking at the log.
> > > In my honest opinion, I don't think the absence of a device in the slot
> > > should be treated as an error.
> > > 
> > 
> > As I mentioned earlier, timeout can happen due to an issue with PHY layer
> > also. In those cases, "dev_err()" is relevant.
> > 
> > And I also agree that absence of the device should not be treated as an
> > error. But my question is, if you know that the slot is going to be
> > empty always, why cannot you just disable it in dts?
> 
> I really don't think that makes sense from a usability perspective. You want
> to allow users to connect PCI cards and for them to work without having to
> update the DTB. I have plenty of open PCI slots on my PC and I would be a
> bit upset if someone told me I need to update the PC firmware each time I
> wanted to use a new slot.
> 

My question was, "do you think the slot is going to be empty always".
This can happen with slots exposed through a connector (not a PCIe one) and
users would plug in add-on cards for accessing the slots. In those
cases, the add-on specific devicetree can enable the PCIe instance and
use it.

But from your reply, I can infer that the slot is exposed on a standard
PCIe connector and users would connect a PCIe device any time.

Anyway, I don't strongly object the change and leave it to the
maintainers to decide.

Thanks,
Mani

> Jon
> 
> -- 
> nvpublic
Jon Hunter Sept. 14, 2022, 11:52 a.m. UTC | #10
On 14/09/2022 12:43, Manivannan Sadhasivam wrote:
> On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote:
>>
>> On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
>>> On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
>>>> Agree with Mani.
>>>> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
>>>> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
>>>> routed to the PRSNT pin of the slot.
>>>
>>> Right.
>>>
>>>> Also, since these are development platforms, we wouldn't want to keep
>>>> changing a controller's status in the DT, instead want to know the device's
>>>> presence/absence (by way of link up) looking at the log.
>>>> In my honest opinion, I don't think the absence of a device in the slot
>>>> should be treated as an error.
>>>>
>>>
>>> As I mentioned earlier, timeout can happen due to an issue with PHY layer
>>> also. In those cases, "dev_err()" is relevant.
>>>
>>> And I also agree that absence of the device should not be treated as an
>>> error. But my question is, if you know that the slot is going to be
>>> empty always, why cannot you just disable it in dts?
>>
>> I really don't think that makes sense from a usability perspective. You want
>> to allow users to connect PCI cards and for them to work without having to
>> update the DTB. I have plenty of open PCI slots on my PC and I would be a
>> bit upset if someone told me I need to update the PC firmware each time I
>> wanted to use a new slot.
>>
> 
> My question was, "do you think the slot is going to be empty always".
> This can happen with slots exposed through a connector (not a PCIe one) and
> users would plug in add-on cards for accessing the slots. In those
> cases, the add-on specific devicetree can enable the PCIe instance and
> use it.
> 
> But from your reply, I can infer that the slot is exposed on a standard
> PCIe connector and users would connect a PCIe device any time.

Correct it is exposed via a standard connector. Yes for PCIe slots that 
are not connected to an on-board device or connector, in that case it 
would make sense to disable, but this is not the case here.

Jon
Krzysztof Wilczyński Sept. 14, 2022, 12:44 p.m. UTC | #11
Hello,

[...]
> Anyway, I don't strongly object the change and leave it to the
> maintainers to decide.

Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
since it appears that this information is of more use to the developer (who
most likely has the suitable log level set anyway), and given that there is
no way to reliably detect a presence in a slot on some platforms, this
might otherwise, add to the other messages that normal users don't pay
attention to usually - if this is not to be treated as an error.

	Krzysztof
Manivannan Sadhasivam Sept. 14, 2022, 1:45 p.m. UTC | #12
On Wed, Sep 14, 2022 at 09:44:44PM +0900, Krzysztof Wilczyński wrote:
> Hello,
> 
> [...]
> > Anyway, I don't strongly object the change and leave it to the
> > maintainers to decide.
> 
> Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
> since it appears that this information is of more use to the developer (who
> most likely has the suitable log level set anyway), and given that there is
> no way to reliably detect a presence in a slot on some platforms, this
> might otherwise, add to the other messages that normal users don't pay
> attention to usually - if this is not to be treated as an error.
> 

No, this is clearly not a debug message. As I quoted above, the link up
can fail due to an issue with PHY also. In that case, user has to see
the log to debug/report the issue.

So, either dev_info() or dev_err().

Thanks,
Mani

> 	Krzysztof
Krzysztof Wilczyński Sept. 14, 2022, 2:52 p.m. UTC | #13
Hello,

> > Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
> > since it appears that this information is of more use to the developer (who
> > most likely has the suitable log level set anyway), and given that there is
> > no way to reliably detect a presence in a slot on some platforms, this
> > might otherwise, add to the other messages that normal users don't pay
> > attention to usually - if this is not to be treated as an error.
> > 
> 
> No, this is clearly not a debug message. As I quoted above, the link up
> can fail due to an issue with PHY also. In that case, user has to see
> the log to debug/report the issue.

Apologies!  I missed that.  Thank you!
 
> So, either dev_info() or dev_err().

So, there is nothing to do here, then.  This stays as dev_err() as per the
change from:

  14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up")

That said, perhaps adding a comment explaining why this is an error
might help future reference, as the linked commit didn't justify the
change.  There also will be redundant messages printed, as per:

  https://elixir.bootlin.com/linux/v6.0-rc5/source/drivers/pci/controller/dwc/pcie-fu740.c#L207

Which the linked commit didn't take into account, I suppose.

	Krzysztof
Jon Hunter Sept. 14, 2022, 3:11 p.m. UTC | #14
On 14/09/2022 15:52, Krzysztof Wilczyński wrote:
> Hello,
> 
>>> Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
>>> since it appears that this information is of more use to the developer (who
>>> most likely has the suitable log level set anyway), and given that there is
>>> no way to reliably detect a presence in a slot on some platforms, this
>>> might otherwise, add to the other messages that normal users don't pay
>>> attention to usually - if this is not to be treated as an error.
>>>
>>
>> No, this is clearly not a debug message. As I quoted above, the link up
>> can fail due to an issue with PHY also. In that case, user has to see
>> the log to debug/report the issue.
> 
> Apologies!  I missed that.  Thank you!
>   
>> So, either dev_info() or dev_err().
> 
> So, there is nothing to do here, then.  This stays as dev_err() as per the
> change from:
> 
>    14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up")


I am not sure I agree. There is a similar change here ...

commit 4b16a8227907118e011fb396022da671a52b2272
Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Date:   Tue Jun 18 23:32:06 2019 +0530

     PCI: tegra: Change link retry log level to debug

If we have a way to determine if a card/device is connected then dev_err 
is appropriate, but if not then dev_dbg/info are appropriate IMO.

Jon
Manivannan Sadhasivam Sept. 15, 2022, 10:44 a.m. UTC | #15
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>

Since the PHY issue that could happen during boot up is rare, it looks
okay to me treating the log as INFO.

Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Thanks,
Mani

> ---
>  drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>  	}
>  
>  	if (retries >= LINK_WAIT_MAX_RETRIES) {
> -		dev_err(pci->dev, "Phy link never came up\n");
> +		dev_info(pci->dev, "Phy link never came up\n");
>  		return -ETIMEDOUT;
>  	}
>  
> -- 
> 2.17.1
>
Rob Herring (Arm) Sept. 15, 2022, 2:16 p.m. UTC | #16
On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > not having an endpoint connected to the slot is not an error.
> > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > >
> > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > returned. So I don't think the log severity should be changed.
> > >
> > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > that a similar change was made by the following commit and it also seems
> > > appropriate here ...
> > >
> > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > >
> > >     PCI: tegra: Change link retry log level to debug
> > >
> > >
> > > BTW, we check for error messages in the dmesg output and this is a new error
> > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > error, but in this case it seem more appropriate to make this a info or
> > > debug level print.
> >
> > Can you tell whether there's a device present, e.g., via Slot Status
> > Presence Detect?  If there's nothing in the slot, I don't know why we
> > would print anything at all.  If a card is present but there's no
> > link, that's probably worthy of dev_info() or even dev_err().
> >
>
> I don't think all form factors allow for the PRSNT pin to be wired up,
> so we cannot know if the device is actually present in the slot or not all
> the time. Maybe we should do if the form factor supports it?
>
> > I guess if you can tell the slot is empty, there's no point in even
> > trying to start the link, so you could avoid both the message and the
> > timeout by not even calling dw_pcie_wait_for_link().
>
> Right. There is an overhead of waiting for ~1ms during boot.

Async probe should mitigate that, right? Saravana is working toward
making that the default instead of opt in, but you could opt in now.

Rob
Manivannan Sadhasivam Sept. 15, 2022, 2:52 p.m. UTC | #17
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > >
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > >
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > > >
> > > >     PCI: tegra: Change link retry log level to debug
> > > >
> > > >
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > >
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect?  If there's nothing in the slot, I don't know why we
> > > would print anything at all.  If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > >
> >
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> >
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> >
> > Right. There is an overhead of waiting for ~1ms during boot.
> 
> Async probe should mitigate that, right? Saravana is working toward
> making that the default instead of opt in, but you could opt in now.
> 

No. The delay is due to the DWC core waiting for link up that depends on
the PCIe device to be present on the slot. The driver probe order
doesn't apply here.

Thanks,
Mani

> Rob
Vidya Sagar Sept. 26, 2022, 10:29 a.m. UTC | #18
Hi,
Just checking if we are good with this patch or does it need any further 
modifications?

Thanks,
Vidya Sagar

On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>> <manivannan.sadhasivam@linaro.org> wrote:
>>>
>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>
>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>>>> returned. So I don't think the log severity should be changed.
>>>>>
>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>>>> that a similar change was made by the following commit and it also seems
>>>>> appropriate here ...
>>>>>
>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
>>>>> Date:   Tue Jun 18 23:32:06 2019 +0530
>>>>>
>>>>>      PCI: tegra: Change link retry log level to debug
>>>>>
>>>>>
>>>>> BTW, we check for error messages in the dmesg output and this is a new error
>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>>>> error, but in this case it seem more appropriate to make this a info or
>>>>> debug level print.
>>>>
>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>> Presence Detect?  If there's nothing in the slot, I don't know why we
>>>> would print anything at all.  If a card is present but there's no
>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>
>>>
>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>> so we cannot know if the device is actually present in the slot or not all
>>> the time. Maybe we should do if the form factor supports it?
>>>
>>>> I guess if you can tell the slot is empty, there's no point in even
>>>> trying to start the link, so you could avoid both the message and the
>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>
>>> Right. There is an overhead of waiting for ~1ms during boot.
>>
>> Async probe should mitigate that, right? Saravana is working toward
>> making that the default instead of opt in, but you could opt in now.
>>
> 
> No. The delay is due to the DWC core waiting for link up that depends on
> the PCIe device to be present on the slot. The driver probe order
> doesn't apply here.
> 
> Thanks,
> Mani
> 
>> Rob
Vidya Sagar Oct. 3, 2022, 11:25 a.m. UTC | #19
Hi Bjorn / Lorenzo,
Just checking if there are any more comments for this patch? If not, are 
we good to take it?

Thanks,
Vidya Sagar

On 9/26/2022 3:59 PM, Vidya Sagar wrote:
> Hi,
> Just checking if we are good with this patch or does it need any further 
> modifications?
> 
> Thanks,
> Vidya Sagar
> 
> On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>>> <manivannan.sadhasivam@linaro.org> wrote:
>>>>
>>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open 
>>>>>>>> slots and
>>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>>
>>>>>>> But the link up not happening is an actual error and -ETIMEDOUT 
>>>>>>> is being
>>>>>>> returned. So I don't think the log severity should be changed.
>>>>>>
>>>>>> Yes it is an error in the sense it is a timeout, but reporting an 
>>>>>> error
>>>>>> because nothing is attached to a PCI slot seems a bit noisy. 
>>>>>> Please note
>>>>>> that a similar change was made by the following commit and it also 
>>>>>> seems
>>>>>> appropriate here ...
>>>>>>
>>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
>>>>>> Date:   Tue Jun 18 23:32:06 2019 +0530
>>>>>>
>>>>>>      PCI: tegra: Change link retry log level to debug
>>>>>>
>>>>>>
>>>>>> BTW, we check for error messages in the dmesg output and this is a 
>>>>>> new error
>>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can 
>>>>>> ignore the
>>>>>> error, but in this case it seem more appropriate to make this a 
>>>>>> info or
>>>>>> debug level print.
>>>>>
>>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>>> Presence Detect?  If there's nothing in the slot, I don't know why we
>>>>> would print anything at all.  If a card is present but there's no
>>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>>
>>>>
>>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>>> so we cannot know if the device is actually present in the slot or 
>>>> not all
>>>> the time. Maybe we should do if the form factor supports it?
>>>>
>>>>> I guess if you can tell the slot is empty, there's no point in even
>>>>> trying to start the link, so you could avoid both the message and the
>>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>>
>>>> Right. There is an overhead of waiting for ~1ms during boot.
>>>
>>> Async probe should mitigate that, right? Saravana is working toward
>>> making that the default instead of opt in, but you could opt in now.
>>>
>>
>> No. The delay is due to the DWC core waiting for link up that depends on
>> the PCIe device to be present on the slot. The driver probe order
>> doesn't apply here.
>>
>> Thanks,
>> Mani
>>
>>> Rob
Rob Herring (Arm) Oct. 4, 2022, 12:53 p.m. UTC | #20
On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org> wrote:
> > >
> > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > >
> > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > > returned. So I don't think the log severity should be changed.
> > > > >
> > > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > > that a similar change was made by the following commit and it also seems
> > > > > appropriate here ...
> > > > >
> > > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > > > >
> > > > >     PCI: tegra: Change link retry log level to debug
> > > > >
> > > > >
> > > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > > error, but in this case it seem more appropriate to make this a info or
> > > > > debug level print.
> > > >
> > > > Can you tell whether there's a device present, e.g., via Slot Status
> > > > Presence Detect?  If there's nothing in the slot, I don't know why we
> > > > would print anything at all.  If a card is present but there's no
> > > > link, that's probably worthy of dev_info() or even dev_err().
> > > >
> > >
> > > I don't think all form factors allow for the PRSNT pin to be wired up,
> > > so we cannot know if the device is actually present in the slot or not all
> > > the time. Maybe we should do if the form factor supports it?
> > >
> > > > I guess if you can tell the slot is empty, there's no point in even
> > > > trying to start the link, so you could avoid both the message and the
> > > > timeout by not even calling dw_pcie_wait_for_link().
> > >
> > > Right. There is an overhead of waiting for ~1ms during boot.
> >
> > Async probe should mitigate that, right? Saravana is working toward
> > making that the default instead of opt in, but you could opt in now.
> >
>
> No. The delay is due to the DWC core waiting for link up that depends on
> the PCIe device to be present on the slot.

Yes, I understand that already.

> The driver probe order
> doesn't apply here.

I'm not talking about probe order, but rather async probe enabling
parallel probing. If waiting for the link happens asynchronously, then
other probes can happen in parallel and you won't see the delay (until
you run out of cores or all the other probes are faster).

Rob
Vidya Sagar Oct. 10, 2022, 6:02 a.m. UTC | #21
On 10/4/2022 6:23 PM, Rob Herring wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
>>
>> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>>> <manivannan.sadhasivam@linaro.org> wrote:
>>>>
>>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>>
>>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>>>>> returned. So I don't think the log severity should be changed.
>>>>>>
>>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>>>>> that a similar change was made by the following commit and it also seems
>>>>>> appropriate here ...
>>>>>>
>>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
>>>>>> Date:   Tue Jun 18 23:32:06 2019 +0530
>>>>>>
>>>>>>      PCI: tegra: Change link retry log level to debug
>>>>>>
>>>>>>
>>>>>> BTW, we check for error messages in the dmesg output and this is a new error
>>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>>>>> error, but in this case it seem more appropriate to make this a info or
>>>>>> debug level print.
>>>>>
>>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>>> Presence Detect?  If there's nothing in the slot, I don't know why we
>>>>> would print anything at all.  If a card is present but there's no
>>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>>
>>>>
>>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>>> so we cannot know if the device is actually present in the slot or not all
>>>> the time. Maybe we should do if the form factor supports it?
>>>>
>>>>> I guess if you can tell the slot is empty, there's no point in even
>>>>> trying to start the link, so you could avoid both the message and the
>>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>>
>>>> Right. There is an overhead of waiting for ~1ms during boot.
>>>
>>> Async probe should mitigate that, right? Saravana is working toward
>>> making that the default instead of opt in, but you could opt in now.
>>>
>>
>> No. The delay is due to the DWC core waiting for link up that depends on
>> the PCIe device to be present on the slot.
> 
> Yes, I understand that already.
> 
>> The driver probe order
>> doesn't apply here.
> 
> I'm not talking about probe order, but rather async probe enabling
> parallel probing. If waiting for the link happens asynchronously, then
> other probes can happen in parallel and you won't see the delay (until
> you run out of cores or all the other probes are faster).

Are you suggesting to break the existing probe of DWC based PCIe 
platform drivers into two i.e. sync part that handles the sequence up 
until link up and the async part that starts after link is up (or after 
LIKUP_TIMEOUT if link doesn't come up).

- Vidya Sagar

> 
> Rob
>
Jon Hunter Oct. 18, 2022, 6:21 a.m. UTC | #22
Hi Bjorn,

On 13/09/2022 11:12, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>   drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>   	}
>   
>   	if (retries >= LINK_WAIT_MAX_RETRIES) {
> -		dev_err(pci->dev, "Phy link never came up\n");
> +		dev_info(pci->dev, "Phy link never came up\n");
>   		return -ETIMEDOUT;
>   	}
>   


Are you OK to take this change?

Acked-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks
Jon
Bjorn Helgaas Oct. 18, 2022, 4:43 p.m. UTC | #23
On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> Hi Bjorn,
> 
> On 13/09/2022 11:12, Vidya Sagar wrote:
> > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > not having an endpoint connected to the slot is not an error.
> > So, changing the macro from dev_err to dev_info to log the event.
> > 
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > ---
> >   drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index 650a7f22f9d0..25154555aa7a 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> >   	}
> >   	if (retries >= LINK_WAIT_MAX_RETRIES) {
> > -		dev_err(pci->dev, "Phy link never came up\n");
> > +		dev_info(pci->dev, "Phy link never came up\n");
> >   		return -ETIMEDOUT;
> >   	}
> 
> 
> Are you OK to take this change?

When this came up, Lorenzo was in the middle of a big move and I was
covering for him while he was unavailable.  But he's back, and I'm
sure he will resolve this soon.

Personally I'm OK either way.

Bjorn
Jon Hunter Oct. 26, 2022, 11:39 a.m. UTC | #24
Hi Lorenzo,

On 18/10/2022 17:43, Bjorn Helgaas wrote:
> On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
>> Hi Bjorn,
>>
>> On 13/09/2022 11:12, Vidya Sagar wrote:
>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>> not having an endpoint connected to the slot is not an error.
>>> So, changing the macro from dev_err to dev_info to log the event.
>>>
>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>> ---
>>>    drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
>>> index 650a7f22f9d0..25154555aa7a 100644
>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>>>    	}
>>>    	if (retries >= LINK_WAIT_MAX_RETRIES) {
>>> -		dev_err(pci->dev, "Phy link never came up\n");
>>> +		dev_info(pci->dev, "Phy link never came up\n");
>>>    		return -ETIMEDOUT;
>>>    	}
>>
>>
>> Are you OK to take this change?
> 
> When this came up, Lorenzo was in the middle of a big move and I was
> covering for him while he was unavailable.  But he's back, and I'm
> sure he will resolve this soon.
> 
> Personally I'm OK either way.
> 
> Bjorn


Can we come to a conclusion on this?

We have tests that fail when errors/warning messages are reported. We 
can choose to ignore these errors, but given that this is not an error 
in this case, we were thinking it is better to make it informational.

Thanks
Jon
Lorenzo Pieralisi Oct. 26, 2022, 12:33 p.m. UTC | #25
On Wed, Oct 26, 2022 at 12:39:13PM +0100, Jon Hunter wrote:
> Hi Lorenzo,
> 
> On 18/10/2022 17:43, Bjorn Helgaas wrote:
> > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> > > Hi Bjorn,
> > > 
> > > On 13/09/2022 11:12, Vidya Sagar wrote:
> > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > not having an endpoint connected to the slot is not an error.
> > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > 
> > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > ---
> > > >    drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > > index 650a7f22f9d0..25154555aa7a 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> > > >    	}
> > > >    	if (retries >= LINK_WAIT_MAX_RETRIES) {
> > > > -		dev_err(pci->dev, "Phy link never came up\n");
> > > > +		dev_info(pci->dev, "Phy link never came up\n");
> > > >    		return -ETIMEDOUT;
> > > >    	}
> > > 
> > > 
> > > Are you OK to take this change?
> > 
> > When this came up, Lorenzo was in the middle of a big move and I was
> > covering for him while he was unavailable.  But he's back, and I'm
> > sure he will resolve this soon.
> > 
> > Personally I'm OK either way.
> > 
> > Bjorn
> 
> 
> Can we come to a conclusion on this?
> 
> We have tests that fail when errors/warning messages are reported. We can
> choose to ignore these errors, but given that this is not an error in this
> case, we were thinking it is better to make it informational.

I understood.

We are at v6.1-rc2, this patch is not a fix, we will handle it for the
v6.2 merge window.

Thanks,
Lorenzo
Rob Herring (Arm) Oct. 26, 2022, 6:06 p.m. UTC | #26
On Mon, Oct 10, 2022 at 1:02 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>
>
>
> On 10/4/2022 6:23 PM, Rob Herring wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org> wrote:
> >>
> >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> >>> <manivannan.sadhasivam@linaro.org> wrote:
> >>>>
> >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> >>>>>>>> not having an endpoint connected to the slot is not an error.
> >>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
> >>>>>>>
> >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
> >>>>>>> returned. So I don't think the log severity should be changed.
> >>>>>>
> >>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
> >>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
> >>>>>> that a similar change was made by the following commit and it also seems
> >>>>>> appropriate here ...
> >>>>>>
> >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
> >>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> >>>>>> Date:   Tue Jun 18 23:32:06 2019 +0530
> >>>>>>
> >>>>>>      PCI: tegra: Change link retry log level to debug
> >>>>>>
> >>>>>>
> >>>>>> BTW, we check for error messages in the dmesg output and this is a new error
> >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> >>>>>> error, but in this case it seem more appropriate to make this a info or
> >>>>>> debug level print.
> >>>>>
> >>>>> Can you tell whether there's a device present, e.g., via Slot Status
> >>>>> Presence Detect?  If there's nothing in the slot, I don't know why we
> >>>>> would print anything at all.  If a card is present but there's no
> >>>>> link, that's probably worthy of dev_info() or even dev_err().
> >>>>>
> >>>>
> >>>> I don't think all form factors allow for the PRSNT pin to be wired up,
> >>>> so we cannot know if the device is actually present in the slot or not all
> >>>> the time. Maybe we should do if the form factor supports it?
> >>>>
> >>>>> I guess if you can tell the slot is empty, there's no point in even
> >>>>> trying to start the link, so you could avoid both the message and the
> >>>>> timeout by not even calling dw_pcie_wait_for_link().
> >>>>
> >>>> Right. There is an overhead of waiting for ~1ms during boot.
> >>>
> >>> Async probe should mitigate that, right? Saravana is working toward
> >>> making that the default instead of opt in, but you could opt in now.
> >>>
> >>
> >> No. The delay is due to the DWC core waiting for link up that depends on
> >> the PCIe device to be present on the slot.
> >
> > Yes, I understand that already.
> >
> >> The driver probe order
> >> doesn't apply here.
> >
> > I'm not talking about probe order, but rather async probe enabling
> > parallel probing. If waiting for the link happens asynchronously, then
> > other probes can happen in parallel and you won't see the delay (until
> > you run out of cores or all the other probes are faster).
>
> Are you suggesting to break the existing probe of DWC based PCIe
> platform drivers into two i.e. sync part that handles the sequence up
> until link up and the async part that starts after link is up (or after
> LIKUP_TIMEOUT if link doesn't come up).

No, just make the driver opt-in to async probe. It's 1 flag to set for
the driver. Then the delay in your probe is not blocking other probes
and the whole probe for the driver happens in parallel. Then the delay
is only an issue if it is longer than all the other things
initializing during boot or if you are on a single core system.
Neither is likely true.

Rob
Lorenzo Pieralisi Oct. 27, 2022, 9:39 a.m. UTC | #27
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > >
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > >
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > > >
> > > >     PCI: tegra: Change link retry log level to debug
> > > >
> > > >
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > >
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect?  If there's nothing in the slot, I don't know why we
> > > would print anything at all.  If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > >
> >
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> >
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> >
> > Right. There is an overhead of waiting for ~1ms during boot.
> 
> Async probe should mitigate that, right? Saravana is working toward
> making that the default instead of opt in, but you could opt in now.

I read this as "trying to bring the link up is mandatory because
we can't detect an empty slot, therefore ~1ms delay during boot
is unavoidable" and on that Rob's suggestion applies.

Just to understand where this thread is going. The suggestion
above does not change the aim of this patch, that seems reasonable
to me and it makes sense even if/when what Rob is suggesting is
implemented.

Is that correct ?

Thanks,
Lorenzo
Ben Dooks Oct. 27, 2022, 9:55 a.m. UTC | #28
On 26/10/2022 12:39, Jon Hunter wrote:
> Hi Lorenzo,
> 
> On 18/10/2022 17:43, Bjorn Helgaas wrote:
>> On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
>>> Hi Bjorn,
>>>
>>> On 13/09/2022 11:12, Vidya Sagar wrote:
>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>> not having an endpoint connected to the slot is not an error.
>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>    drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c 
>>>> b/drivers/pci/controller/dwc/pcie-designware.c
>>>> index 650a7f22f9d0..25154555aa7a 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>>>>        }
>>>>        if (retries >= LINK_WAIT_MAX_RETRIES) {
>>>> -        dev_err(pci->dev, "Phy link never came up\n");
>>>> +        dev_info(pci->dev, "Phy link never came up\n");
>>>>            return -ETIMEDOUT;
>>>>        }
>>>
>>>
>>> Are you OK to take this change?
>>
>> When this came up, Lorenzo was in the middle of a big move and I was
>> covering for him while he was unavailable.  But he's back, and I'm
>> sure he will resolve this soon.
>>
>> Personally I'm OK either way.
>>
>> Bjorn
> 
> 
> Can we come to a conclusion on this?
> 
> We have tests that fail when errors/warning messages are reported. We 
> can choose to ignore these errors, but given that this is not an error 
> in this case, we were thinking it is better to make it informational.

Is there any hardware presence detect available to just avoid even
trying to bring a link up on an disconnected port?
Manivannan Sadhasivam Oct. 27, 2022, 11:03 a.m. UTC | #29
On Thu, Oct 27, 2022 at 11:39:02AM +0200, Lorenzo Pieralisi wrote:
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org> wrote:
> > >
> > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > >
> > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > > returned. So I don't think the log severity should be changed.
> > > > >
> > > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > > that a similar change was made by the following commit and it also seems
> > > > > appropriate here ...
> > > > >
> > > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> > > > > Date:   Tue Jun 18 23:32:06 2019 +0530
> > > > >
> > > > >     PCI: tegra: Change link retry log level to debug
> > > > >
> > > > >
> > > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > > error, but in this case it seem more appropriate to make this a info or
> > > > > debug level print.
> > > >
> > > > Can you tell whether there's a device present, e.g., via Slot Status
> > > > Presence Detect?  If there's nothing in the slot, I don't know why we
> > > > would print anything at all.  If a card is present but there's no
> > > > link, that's probably worthy of dev_info() or even dev_err().
> > > >
> > >
> > > I don't think all form factors allow for the PRSNT pin to be wired up,
> > > so we cannot know if the device is actually present in the slot or not all
> > > the time. Maybe we should do if the form factor supports it?
> > >
> > > > I guess if you can tell the slot is empty, there's no point in even
> > > > trying to start the link, so you could avoid both the message and the
> > > > timeout by not even calling dw_pcie_wait_for_link().
> > >
> > > Right. There is an overhead of waiting for ~1ms during boot.
> > 
> > Async probe should mitigate that, right? Saravana is working toward
> > making that the default instead of opt in, but you could opt in now.
> 
> I read this as "trying to bring the link up is mandatory because
> we can't detect an empty slot, therefore ~1ms delay during boot
> is unavoidable" and on that Rob's suggestion applies.
> 

Right. Rob's suggestion came as areply to my worry on waiting for link up
during boot.

> Just to understand where this thread is going. The suggestion
> above does not change the aim of this patch, that seems reasonable
> to me and it makes sense even if/when what Rob is suggesting is
> implemented.
> 
> Is that correct ?
> 

Yes, Rob's suggestion makes sense to me. And it has to be implemented as a
separate patch but this patch itself is required to demote the error log to
debug.

Thanks,
Mani

> Thanks,
> Lorenzo
Manivannan Sadhasivam Oct. 27, 2022, 11:05 a.m. UTC | #30
On Thu, Oct 27, 2022 at 10:55:34AM +0100, Ben Dooks wrote:
> On 26/10/2022 12:39, Jon Hunter wrote:
> > Hi Lorenzo,
> > 
> > On 18/10/2022 17:43, Bjorn Helgaas wrote:
> > > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> > > > Hi Bjorn,
> > > > 
> > > > On 13/09/2022 11:12, Vidya Sagar wrote:
> > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > not having an endpoint connected to the slot is not an error.
> > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > ---
> > > > >    drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > index 650a7f22f9d0..25154555aa7a 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> > > > >        }
> > > > >        if (retries >= LINK_WAIT_MAX_RETRIES) {
> > > > > -        dev_err(pci->dev, "Phy link never came up\n");
> > > > > +        dev_info(pci->dev, "Phy link never came up\n");
> > > > >            return -ETIMEDOUT;
> > > > >        }
> > > > 
> > > > 
> > > > Are you OK to take this change?
> > > 
> > > When this came up, Lorenzo was in the middle of a big move and I was
> > > covering for him while he was unavailable.  But he's back, and I'm
> > > sure he will resolve this soon.
> > > 
> > > Personally I'm OK either way.
> > > 
> > > Bjorn
> > 
> > 
> > Can we come to a conclusion on this?
> > 
> > We have tests that fail when errors/warning messages are reported. We
> > can choose to ignore these errors, but given that this is not an error
> > in this case, we were thinking it is better to make it informational.
> 
> Is there any hardware presence detect available to just avoid even
> trying to bring a link up on an disconnected port?
> 

PRSNT pin is not available on all form factors sadly.

Thanks,
Mani

> 
> -- 
> Ben Dooks				http://www.codethink.co.uk/
> Senior Engineer				Codethink - Providing Genius
> 
> https://www.codethink.co.uk/privacy.html
>
Lorenzo Pieralisi Nov. 10, 2022, 3:40 p.m. UTC | #31
On Tue, 13 Sep 2022 15:42:37 +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
> 
> 

Applied to pci/dwc, thanks!

[1/1] PCI: dwc: Use dev_info for PCIe link down event logging
      https://git.kernel.org/lpieralisi/pci/c/8405d8f0956d

Thanks,
Lorenzo
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index 650a7f22f9d0..25154555aa7a 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -456,7 +456,7 @@  int dw_pcie_wait_for_link(struct dw_pcie *pci)
 	}
 
 	if (retries >= LINK_WAIT_MAX_RETRIES) {
-		dev_err(pci->dev, "Phy link never came up\n");
+		dev_info(pci->dev, "Phy link never came up\n");
 		return -ETIMEDOUT;
 	}