Message ID | 20220913101237.4337-1-vidyas@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Lorenzo Pieralisi |
Headers | show |
Series | [V1] PCI: dwc: Use dev_info for PCIe link down event logging | expand |
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > Some of the platforms (like Tegra194 and Tegra234) have open slots and > not having an endpoint connected to the slot is not an error. > So, changing the macro from dev_err to dev_info to log the event. > But the link up not happening is an actual error and -ETIMEDOUT is being returned. So I don't think the log severity should be changed. Thanks, Mani > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > --- > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > index 650a7f22f9d0..25154555aa7a 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > } > > if (retries >= LINK_WAIT_MAX_RETRIES) { > - dev_err(pci->dev, "Phy link never came up\n"); > + dev_info(pci->dev, "Phy link never came up\n"); > return -ETIMEDOUT; > } > > -- > 2.17.1 >
On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: >> Some of the platforms (like Tegra194 and Tegra234) have open slots and >> not having an endpoint connected to the slot is not an error. >> So, changing the macro from dev_err to dev_info to log the event. >> > > But the link up not happening is an actual error and -ETIMEDOUT is being > returned. So I don't think the log severity should be changed. Yes it is an error in the sense it is a timeout, but reporting an error because nothing is attached to a PCI slot seems a bit noisy. Please note that a similar change was made by the following commit and it also seems appropriate here ... commit 4b16a8227907118e011fb396022da671a52b2272 Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> Date: Tue Jun 18 23:32:06 2019 +0530 PCI: tegra: Change link retry log level to debug BTW, we check for error messages in the dmesg output and this is a new error seen as of Linux v6.0 and so this was flagged in a test. We can ignore the error, but in this case it seem more appropriate to make this a info or debug level print. Jon
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > not having an endpoint connected to the slot is not an error. > > > So, changing the macro from dev_err to dev_info to log the event. > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > returned. So I don't think the log severity should be changed. > > Yes it is an error in the sense it is a timeout, but reporting an error > because nothing is attached to a PCI slot seems a bit noisy. Please note > that a similar change was made by the following commit and it also seems > appropriate here ... > > commit 4b16a8227907118e011fb396022da671a52b2272 > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > Date: Tue Jun 18 23:32:06 2019 +0530 > > PCI: tegra: Change link retry log level to debug > > > BTW, we check for error messages in the dmesg output and this is a new error > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > error, but in this case it seem more appropriate to make this a info or > debug level print. Can you tell whether there's a device present, e.g., via Slot Status Presence Detect? If there's nothing in the slot, I don't know why we would print anything at all. If a card is present but there's no link, that's probably worthy of dev_info() or even dev_err(). I guess if you can tell the slot is empty, there's no point in even trying to start the link, so you could avoid both the message and the timeout by not even calling dw_pcie_wait_for_link(). Bjorn
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > not having an endpoint connected to the slot is not an error. > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > returned. So I don't think the log severity should be changed. > > Yes it is an error in the sense it is a timeout, but reporting an error > because nothing is attached to a PCI slot seems a bit noisy. Please note > that a similar change was made by the following commit and it also seems > appropriate here ... > I got the concern. But the problem here is, we cannot distinguish if the error is due to slot empty or an actual issues with link/phy. Also it should be noted that if the slot is empty, the dwc core anyway waits for the link to be up. IMO this is a boot time overhead that could be avoided if the specific instance could be disabled in platform data like devicetree. > commit 4b16a8227907118e011fb396022da671a52b2272 > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > Date: Tue Jun 18 23:32:06 2019 +0530 > > PCI: tegra: Change link retry log level to debug > > > BTW, we check for error messages in the dmesg output and this is a new error > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > error, but in this case it seem more appropriate to make this a info or > debug level print. > On some of the Qcom based platforms, we avoid this situation by disabling the specific PCIe node in devicetree for which we know that there would be no devices connected. This not only avoids the error message in log but also removes the time spent waiting for link to be up. Thanks, Mani > Jon > > -- > nvpublic
On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > not having an endpoint connected to the slot is not an error. > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > returned. So I don't think the log severity should be changed. > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > that a similar change was made by the following commit and it also seems > > appropriate here ... > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > PCI: tegra: Change link retry log level to debug > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > error, but in this case it seem more appropriate to make this a info or > > debug level print. > > Can you tell whether there's a device present, e.g., via Slot Status > Presence Detect? If there's nothing in the slot, I don't know why we > would print anything at all. If a card is present but there's no > link, that's probably worthy of dev_info() or even dev_err(). > I don't think all form factors allow for the PRSNT pin to be wired up, so we cannot know if the device is actually present in the slot or not all the time. Maybe we should do if the form factor supports it? > I guess if you can tell the slot is empty, there's no point in even > trying to start the link, so you could avoid both the message and the > timeout by not even calling dw_pcie_wait_for_link(). Right. There is an overhead of waiting for ~1ms during boot. We workaround this issue by disabling the PCIe instances in devicetree for which there would be no devices connected. Thanks, Mani > > Bjorn
Agree with Mani. Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even if the slot form factor supports PRSNT, it is not mandatory to have a GPIO routed to the PRSNT pin of the slot. Also, since these are development platforms, we wouldn't want to keep changing a controller's status in the DT, instead want to know the device's presence/absence (by way of link up) looking at the log. In my honest opinion, I don't think the absence of a device in the slot should be treated as an error. Thanks, Vidya Sagar On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote: > External email: Use caution opening links or attachments > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: >> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: >>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote: >>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: >>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and >>>>> not having an endpoint connected to the slot is not an error. >>>>> So, changing the macro from dev_err to dev_info to log the event. >>>> >>>> But the link up not happening is an actual error and -ETIMEDOUT is being >>>> returned. So I don't think the log severity should be changed. >>> >>> Yes it is an error in the sense it is a timeout, but reporting an error >>> because nothing is attached to a PCI slot seems a bit noisy. Please note >>> that a similar change was made by the following commit and it also seems >>> appropriate here ... >>> >>> commit 4b16a8227907118e011fb396022da671a52b2272 >>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> >>> Date: Tue Jun 18 23:32:06 2019 +0530 >>> >>> PCI: tegra: Change link retry log level to debug >>> >>> >>> BTW, we check for error messages in the dmesg output and this is a new error >>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the >>> error, but in this case it seem more appropriate to make this a info or >>> debug level print. >> >> Can you tell whether there's a device present, e.g., via Slot Status >> Presence Detect? If there's nothing in the slot, I don't know why we >> would print anything at all. If a card is present but there's no >> link, that's probably worthy of dev_info() or even dev_err(). >> > > I don't think all form factors allow for the PRSNT pin to be wired up, > so we cannot know if the device is actually present in the slot or not all > the time. Maybe we should do if the form factor supports it? > >> I guess if you can tell the slot is empty, there's no point in even >> trying to start the link, so you could avoid both the message and the >> timeout by not even calling dw_pcie_wait_for_link(). > > Right. There is an overhead of waiting for ~1ms during boot. > > We workaround this issue by disabling the PCIe instances in devicetree > for which there would be no devices connected. > > Thanks, > Mani > >> >> Bjorn
On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote: > Agree with Mani. > Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even > if the slot form factor supports PRSNT, it is not mandatory to have a GPIO > routed to the PRSNT pin of the slot. Right. > Also, since these are development platforms, we wouldn't want to keep > changing a controller's status in the DT, instead want to know the device's > presence/absence (by way of link up) looking at the log. > In my honest opinion, I don't think the absence of a device in the slot > should be treated as an error. > As I mentioned earlier, timeout can happen due to an issue with PHY layer also. In those cases, "dev_err()" is relevant. And I also agree that absence of the device should not be treated as an error. But my question is, if you know that the slot is going to be empty always, why cannot you just disable it in dts? Even if you make the log "dev_info()" there is the wait time for link up and I'm sure that you wouldn't want it either. Thanks, Mani > Thanks, > Vidya Sagar > > On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote: > > External email: Use caution opening links or attachments > > > > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > > not having an endpoint connected to the slot is not an error. > > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > > returned. So I don't think the log severity should be changed. > > > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > > that a similar change was made by the following commit and it also seems > > > > appropriate here ... > > > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > > error, but in this case it seem more appropriate to make this a info or > > > > debug level print. > > > > > > Can you tell whether there's a device present, e.g., via Slot Status > > > Presence Detect? If there's nothing in the slot, I don't know why we > > > would print anything at all. If a card is present but there's no > > > link, that's probably worthy of dev_info() or even dev_err(). > > > > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > > so we cannot know if the device is actually present in the slot or not all > > the time. Maybe we should do if the form factor supports it? > > > > > I guess if you can tell the slot is empty, there's no point in even > > > trying to start the link, so you could avoid both the message and the > > > timeout by not even calling dw_pcie_wait_for_link(). > > > > Right. There is an overhead of waiting for ~1ms during boot. > > > > We workaround this issue by disabling the PCIe instances in devicetree > > for which there would be no devices connected. > > > > Thanks, > > Mani > > > > > > > > Bjorn
On 14/09/2022 12:18, Manivannan Sadhasivam wrote: > On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote: >> Agree with Mani. >> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even >> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO >> routed to the PRSNT pin of the slot. > > Right. > >> Also, since these are development platforms, we wouldn't want to keep >> changing a controller's status in the DT, instead want to know the device's >> presence/absence (by way of link up) looking at the log. >> In my honest opinion, I don't think the absence of a device in the slot >> should be treated as an error. >> > > As I mentioned earlier, timeout can happen due to an issue with PHY layer > also. In those cases, "dev_err()" is relevant. > > And I also agree that absence of the device should not be treated as an > error. But my question is, if you know that the slot is going to be > empty always, why cannot you just disable it in dts? I really don't think that makes sense from a usability perspective. You want to allow users to connect PCI cards and for them to work without having to update the DTB. I have plenty of open PCI slots on my PC and I would be a bit upset if someone told me I need to update the PC firmware each time I wanted to use a new slot. Jon
On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote: > > On 14/09/2022 12:18, Manivannan Sadhasivam wrote: > > On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote: > > > Agree with Mani. > > > Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even > > > if the slot form factor supports PRSNT, it is not mandatory to have a GPIO > > > routed to the PRSNT pin of the slot. > > > > Right. > > > > > Also, since these are development platforms, we wouldn't want to keep > > > changing a controller's status in the DT, instead want to know the device's > > > presence/absence (by way of link up) looking at the log. > > > In my honest opinion, I don't think the absence of a device in the slot > > > should be treated as an error. > > > > > > > As I mentioned earlier, timeout can happen due to an issue with PHY layer > > also. In those cases, "dev_err()" is relevant. > > > > And I also agree that absence of the device should not be treated as an > > error. But my question is, if you know that the slot is going to be > > empty always, why cannot you just disable it in dts? > > I really don't think that makes sense from a usability perspective. You want > to allow users to connect PCI cards and for them to work without having to > update the DTB. I have plenty of open PCI slots on my PC and I would be a > bit upset if someone told me I need to update the PC firmware each time I > wanted to use a new slot. > My question was, "do you think the slot is going to be empty always". This can happen with slots exposed through a connector (not a PCIe one) and users would plug in add-on cards for accessing the slots. In those cases, the add-on specific devicetree can enable the PCIe instance and use it. But from your reply, I can infer that the slot is exposed on a standard PCIe connector and users would connect a PCIe device any time. Anyway, I don't strongly object the change and leave it to the maintainers to decide. Thanks, Mani > Jon > > -- > nvpublic
On 14/09/2022 12:43, Manivannan Sadhasivam wrote: > On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote: >> >> On 14/09/2022 12:18, Manivannan Sadhasivam wrote: >>> On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote: >>>> Agree with Mani. >>>> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even >>>> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO >>>> routed to the PRSNT pin of the slot. >>> >>> Right. >>> >>>> Also, since these are development platforms, we wouldn't want to keep >>>> changing a controller's status in the DT, instead want to know the device's >>>> presence/absence (by way of link up) looking at the log. >>>> In my honest opinion, I don't think the absence of a device in the slot >>>> should be treated as an error. >>>> >>> >>> As I mentioned earlier, timeout can happen due to an issue with PHY layer >>> also. In those cases, "dev_err()" is relevant. >>> >>> And I also agree that absence of the device should not be treated as an >>> error. But my question is, if you know that the slot is going to be >>> empty always, why cannot you just disable it in dts? >> >> I really don't think that makes sense from a usability perspective. You want >> to allow users to connect PCI cards and for them to work without having to >> update the DTB. I have plenty of open PCI slots on my PC and I would be a >> bit upset if someone told me I need to update the PC firmware each time I >> wanted to use a new slot. >> > > My question was, "do you think the slot is going to be empty always". > This can happen with slots exposed through a connector (not a PCIe one) and > users would plug in add-on cards for accessing the slots. In those > cases, the add-on specific devicetree can enable the PCIe instance and > use it. > > But from your reply, I can infer that the slot is exposed on a standard > PCIe connector and users would connect a PCIe device any time. Correct it is exposed via a standard connector. Yes for PCIe slots that are not connected to an on-board device or connector, in that case it would make sense to disable, but this is not the case here. Jon
Hello, [...] > Anyway, I don't strongly object the change and leave it to the > maintainers to decide. Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially since it appears that this information is of more use to the developer (who most likely has the suitable log level set anyway), and given that there is no way to reliably detect a presence in a slot on some platforms, this might otherwise, add to the other messages that normal users don't pay attention to usually - if this is not to be treated as an error. Krzysztof
On Wed, Sep 14, 2022 at 09:44:44PM +0900, Krzysztof Wilczyński wrote: > Hello, > > [...] > > Anyway, I don't strongly object the change and leave it to the > > maintainers to decide. > > Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially > since it appears that this information is of more use to the developer (who > most likely has the suitable log level set anyway), and given that there is > no way to reliably detect a presence in a slot on some platforms, this > might otherwise, add to the other messages that normal users don't pay > attention to usually - if this is not to be treated as an error. > No, this is clearly not a debug message. As I quoted above, the link up can fail due to an issue with PHY also. In that case, user has to see the log to debug/report the issue. So, either dev_info() or dev_err(). Thanks, Mani > Krzysztof
Hello, > > Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially > > since it appears that this information is of more use to the developer (who > > most likely has the suitable log level set anyway), and given that there is > > no way to reliably detect a presence in a slot on some platforms, this > > might otherwise, add to the other messages that normal users don't pay > > attention to usually - if this is not to be treated as an error. > > > > No, this is clearly not a debug message. As I quoted above, the link up > can fail due to an issue with PHY also. In that case, user has to see > the log to debug/report the issue. Apologies! I missed that. Thank you! > So, either dev_info() or dev_err(). So, there is nothing to do here, then. This stays as dev_err() as per the change from: 14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up") That said, perhaps adding a comment explaining why this is an error might help future reference, as the linked commit didn't justify the change. There also will be redundant messages printed, as per: https://elixir.bootlin.com/linux/v6.0-rc5/source/drivers/pci/controller/dwc/pcie-fu740.c#L207 Which the linked commit didn't take into account, I suppose. Krzysztof
On 14/09/2022 15:52, Krzysztof Wilczyński wrote: > Hello, > >>> Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially >>> since it appears that this information is of more use to the developer (who >>> most likely has the suitable log level set anyway), and given that there is >>> no way to reliably detect a presence in a slot on some platforms, this >>> might otherwise, add to the other messages that normal users don't pay >>> attention to usually - if this is not to be treated as an error. >>> >> >> No, this is clearly not a debug message. As I quoted above, the link up >> can fail due to an issue with PHY also. In that case, user has to see >> the log to debug/report the issue. > > Apologies! I missed that. Thank you! > >> So, either dev_info() or dev_err(). > > So, there is nothing to do here, then. This stays as dev_err() as per the > change from: > > 14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up") I am not sure I agree. There is a similar change here ... commit 4b16a8227907118e011fb396022da671a52b2272 Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> Date: Tue Jun 18 23:32:06 2019 +0530 PCI: tegra: Change link retry log level to debug If we have a way to determine if a card/device is connected then dev_err is appropriate, but if not then dev_dbg/info are appropriate IMO. Jon
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > Some of the platforms (like Tegra194 and Tegra234) have open slots and > not having an endpoint connected to the slot is not an error. > So, changing the macro from dev_err to dev_info to log the event. > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Since the PHY issue that could happen during boot up is rare, it looks okay to me treating the log as INFO. Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Thanks, Mani > --- > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > index 650a7f22f9d0..25154555aa7a 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > } > > if (retries >= LINK_WAIT_MAX_RETRIES) { > - dev_err(pci->dev, "Phy link never came up\n"); > + dev_info(pci->dev, "Phy link never came up\n"); > return -ETIMEDOUT; > } > > -- > 2.17.1 >
On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote: > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > not having an endpoint connected to the slot is not an error. > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > returned. So I don't think the log severity should be changed. > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > that a similar change was made by the following commit and it also seems > > > appropriate here ... > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > error, but in this case it seem more appropriate to make this a info or > > > debug level print. > > > > Can you tell whether there's a device present, e.g., via Slot Status > > Presence Detect? If there's nothing in the slot, I don't know why we > > would print anything at all. If a card is present but there's no > > link, that's probably worthy of dev_info() or even dev_err(). > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > so we cannot know if the device is actually present in the slot or not all > the time. Maybe we should do if the form factor supports it? > > > I guess if you can tell the slot is empty, there's no point in even > > trying to start the link, so you could avoid both the message and the > > timeout by not even calling dw_pcie_wait_for_link(). > > Right. There is an overhead of waiting for ~1ms during boot. Async probe should mitigate that, right? Saravana is working toward making that the default instead of opt in, but you could opt in now. Rob
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam > <manivannan.sadhasivam@linaro.org> wrote: > > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > > not having an endpoint connected to the slot is not an error. > > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > > returned. So I don't think the log severity should be changed. > > > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > > that a similar change was made by the following commit and it also seems > > > > appropriate here ... > > > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > > error, but in this case it seem more appropriate to make this a info or > > > > debug level print. > > > > > > Can you tell whether there's a device present, e.g., via Slot Status > > > Presence Detect? If there's nothing in the slot, I don't know why we > > > would print anything at all. If a card is present but there's no > > > link, that's probably worthy of dev_info() or even dev_err(). > > > > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > > so we cannot know if the device is actually present in the slot or not all > > the time. Maybe we should do if the form factor supports it? > > > > > I guess if you can tell the slot is empty, there's no point in even > > > trying to start the link, so you could avoid both the message and the > > > timeout by not even calling dw_pcie_wait_for_link(). > > > > Right. There is an overhead of waiting for ~1ms during boot. > > Async probe should mitigate that, right? Saravana is working toward > making that the default instead of opt in, but you could opt in now. > No. The delay is due to the DWC core waiting for link up that depends on the PCIe device to be present on the slot. The driver probe order doesn't apply here. Thanks, Mani > Rob
Hi, Just checking if we are good with this patch or does it need any further modifications? Thanks, Vidya Sagar On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote: > External email: Use caution opening links or attachments > > > On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: >> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam >> <manivannan.sadhasivam@linaro.org> wrote: >>> >>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: >>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: >>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote: >>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: >>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and >>>>>>> not having an endpoint connected to the slot is not an error. >>>>>>> So, changing the macro from dev_err to dev_info to log the event. >>>>>> >>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being >>>>>> returned. So I don't think the log severity should be changed. >>>>> >>>>> Yes it is an error in the sense it is a timeout, but reporting an error >>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note >>>>> that a similar change was made by the following commit and it also seems >>>>> appropriate here ... >>>>> >>>>> commit 4b16a8227907118e011fb396022da671a52b2272 >>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> >>>>> Date: Tue Jun 18 23:32:06 2019 +0530 >>>>> >>>>> PCI: tegra: Change link retry log level to debug >>>>> >>>>> >>>>> BTW, we check for error messages in the dmesg output and this is a new error >>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the >>>>> error, but in this case it seem more appropriate to make this a info or >>>>> debug level print. >>>> >>>> Can you tell whether there's a device present, e.g., via Slot Status >>>> Presence Detect? If there's nothing in the slot, I don't know why we >>>> would print anything at all. If a card is present but there's no >>>> link, that's probably worthy of dev_info() or even dev_err(). >>>> >>> >>> I don't think all form factors allow for the PRSNT pin to be wired up, >>> so we cannot know if the device is actually present in the slot or not all >>> the time. Maybe we should do if the form factor supports it? >>> >>>> I guess if you can tell the slot is empty, there's no point in even >>>> trying to start the link, so you could avoid both the message and the >>>> timeout by not even calling dw_pcie_wait_for_link(). >>> >>> Right. There is an overhead of waiting for ~1ms during boot. >> >> Async probe should mitigate that, right? Saravana is working toward >> making that the default instead of opt in, but you could opt in now. >> > > No. The delay is due to the DWC core waiting for link up that depends on > the PCIe device to be present on the slot. The driver probe order > doesn't apply here. > > Thanks, > Mani > >> Rob
Hi Bjorn / Lorenzo, Just checking if there are any more comments for this patch? If not, are we good to take it? Thanks, Vidya Sagar On 9/26/2022 3:59 PM, Vidya Sagar wrote: > Hi, > Just checking if we are good with this patch or does it need any further > modifications? > > Thanks, > Vidya Sagar > > On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote: >> External email: Use caution opening links or attachments >> >> >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam >>> <manivannan.sadhasivam@linaro.org> wrote: >>>> >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote: >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open >>>>>>>> slots and >>>>>>>> not having an endpoint connected to the slot is not an error. >>>>>>>> So, changing the macro from dev_err to dev_info to log the event. >>>>>>> >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT >>>>>>> is being >>>>>>> returned. So I don't think the log severity should be changed. >>>>>> >>>>>> Yes it is an error in the sense it is a timeout, but reporting an >>>>>> error >>>>>> because nothing is attached to a PCI slot seems a bit noisy. >>>>>> Please note >>>>>> that a similar change was made by the following commit and it also >>>>>> seems >>>>>> appropriate here ... >>>>>> >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272 >>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> >>>>>> Date: Tue Jun 18 23:32:06 2019 +0530 >>>>>> >>>>>> PCI: tegra: Change link retry log level to debug >>>>>> >>>>>> >>>>>> BTW, we check for error messages in the dmesg output and this is a >>>>>> new error >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can >>>>>> ignore the >>>>>> error, but in this case it seem more appropriate to make this a >>>>>> info or >>>>>> debug level print. >>>>> >>>>> Can you tell whether there's a device present, e.g., via Slot Status >>>>> Presence Detect? If there's nothing in the slot, I don't know why we >>>>> would print anything at all. If a card is present but there's no >>>>> link, that's probably worthy of dev_info() or even dev_err(). >>>>> >>>> >>>> I don't think all form factors allow for the PRSNT pin to be wired up, >>>> so we cannot know if the device is actually present in the slot or >>>> not all >>>> the time. Maybe we should do if the form factor supports it? >>>> >>>>> I guess if you can tell the slot is empty, there's no point in even >>>>> trying to start the link, so you could avoid both the message and the >>>>> timeout by not even calling dw_pcie_wait_for_link(). >>>> >>>> Right. There is an overhead of waiting for ~1ms during boot. >>> >>> Async probe should mitigate that, right? Saravana is working toward >>> making that the default instead of opt in, but you could opt in now. >>> >> >> No. The delay is due to the DWC core waiting for link up that depends on >> the PCIe device to be present on the slot. The driver probe order >> doesn't apply here. >> >> Thanks, >> Mani >> >>> Rob
On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote: > > On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: > > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam > > <manivannan.sadhasivam@linaro.org> wrote: > > > > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > > > not having an endpoint connected to the slot is not an error. > > > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > > > returned. So I don't think the log severity should be changed. > > > > > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > > > that a similar change was made by the following commit and it also seems > > > > > appropriate here ... > > > > > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > > > error, but in this case it seem more appropriate to make this a info or > > > > > debug level print. > > > > > > > > Can you tell whether there's a device present, e.g., via Slot Status > > > > Presence Detect? If there's nothing in the slot, I don't know why we > > > > would print anything at all. If a card is present but there's no > > > > link, that's probably worthy of dev_info() or even dev_err(). > > > > > > > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > > > so we cannot know if the device is actually present in the slot or not all > > > the time. Maybe we should do if the form factor supports it? > > > > > > > I guess if you can tell the slot is empty, there's no point in even > > > > trying to start the link, so you could avoid both the message and the > > > > timeout by not even calling dw_pcie_wait_for_link(). > > > > > > Right. There is an overhead of waiting for ~1ms during boot. > > > > Async probe should mitigate that, right? Saravana is working toward > > making that the default instead of opt in, but you could opt in now. > > > > No. The delay is due to the DWC core waiting for link up that depends on > the PCIe device to be present on the slot. Yes, I understand that already. > The driver probe order > doesn't apply here. I'm not talking about probe order, but rather async probe enabling parallel probing. If waiting for the link happens asynchronously, then other probes can happen in parallel and you won't see the delay (until you run out of cores or all the other probes are faster). Rob
On 10/4/2022 6:23 PM, Rob Herring wrote: > External email: Use caution opening links or attachments > > > On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam > <manivannan.sadhasivam@linaro.org> wrote: >> >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam >>> <manivannan.sadhasivam@linaro.org> wrote: >>>> >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote: >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and >>>>>>>> not having an endpoint connected to the slot is not an error. >>>>>>>> So, changing the macro from dev_err to dev_info to log the event. >>>>>>> >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being >>>>>>> returned. So I don't think the log severity should be changed. >>>>>> >>>>>> Yes it is an error in the sense it is a timeout, but reporting an error >>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note >>>>>> that a similar change was made by the following commit and it also seems >>>>>> appropriate here ... >>>>>> >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272 >>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> >>>>>> Date: Tue Jun 18 23:32:06 2019 +0530 >>>>>> >>>>>> PCI: tegra: Change link retry log level to debug >>>>>> >>>>>> >>>>>> BTW, we check for error messages in the dmesg output and this is a new error >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the >>>>>> error, but in this case it seem more appropriate to make this a info or >>>>>> debug level print. >>>>> >>>>> Can you tell whether there's a device present, e.g., via Slot Status >>>>> Presence Detect? If there's nothing in the slot, I don't know why we >>>>> would print anything at all. If a card is present but there's no >>>>> link, that's probably worthy of dev_info() or even dev_err(). >>>>> >>>> >>>> I don't think all form factors allow for the PRSNT pin to be wired up, >>>> so we cannot know if the device is actually present in the slot or not all >>>> the time. Maybe we should do if the form factor supports it? >>>> >>>>> I guess if you can tell the slot is empty, there's no point in even >>>>> trying to start the link, so you could avoid both the message and the >>>>> timeout by not even calling dw_pcie_wait_for_link(). >>>> >>>> Right. There is an overhead of waiting for ~1ms during boot. >>> >>> Async probe should mitigate that, right? Saravana is working toward >>> making that the default instead of opt in, but you could opt in now. >>> >> >> No. The delay is due to the DWC core waiting for link up that depends on >> the PCIe device to be present on the slot. > > Yes, I understand that already. > >> The driver probe order >> doesn't apply here. > > I'm not talking about probe order, but rather async probe enabling > parallel probing. If waiting for the link happens asynchronously, then > other probes can happen in parallel and you won't see the delay (until > you run out of cores or all the other probes are faster). Are you suggesting to break the existing probe of DWC based PCIe platform drivers into two i.e. sync part that handles the sequence up until link up and the async part that starts after link is up (or after LIKUP_TIMEOUT if link doesn't come up). - Vidya Sagar > > Rob >
Hi Bjorn, On 13/09/2022 11:12, Vidya Sagar wrote: > Some of the platforms (like Tegra194 and Tegra234) have open slots and > not having an endpoint connected to the slot is not an error. > So, changing the macro from dev_err to dev_info to log the event. > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > --- > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > index 650a7f22f9d0..25154555aa7a 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > } > > if (retries >= LINK_WAIT_MAX_RETRIES) { > - dev_err(pci->dev, "Phy link never came up\n"); > + dev_info(pci->dev, "Phy link never came up\n"); > return -ETIMEDOUT; > } > Are you OK to take this change? Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Thanks Jon
On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote: > Hi Bjorn, > > On 13/09/2022 11:12, Vidya Sagar wrote: > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > not having an endpoint connected to the slot is not an error. > > So, changing the macro from dev_err to dev_info to log the event. > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > > --- > > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > index 650a7f22f9d0..25154555aa7a 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > } > > if (retries >= LINK_WAIT_MAX_RETRIES) { > > - dev_err(pci->dev, "Phy link never came up\n"); > > + dev_info(pci->dev, "Phy link never came up\n"); > > return -ETIMEDOUT; > > } > > > Are you OK to take this change? When this came up, Lorenzo was in the middle of a big move and I was covering for him while he was unavailable. But he's back, and I'm sure he will resolve this soon. Personally I'm OK either way. Bjorn
Hi Lorenzo, On 18/10/2022 17:43, Bjorn Helgaas wrote: > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote: >> Hi Bjorn, >> >> On 13/09/2022 11:12, Vidya Sagar wrote: >>> Some of the platforms (like Tegra194 and Tegra234) have open slots and >>> not having an endpoint connected to the slot is not an error. >>> So, changing the macro from dev_err to dev_info to log the event. >>> >>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> >>> --- >>> drivers/pci/controller/dwc/pcie-designware.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c >>> index 650a7f22f9d0..25154555aa7a 100644 >>> --- a/drivers/pci/controller/dwc/pcie-designware.c >>> +++ b/drivers/pci/controller/dwc/pcie-designware.c >>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) >>> } >>> if (retries >= LINK_WAIT_MAX_RETRIES) { >>> - dev_err(pci->dev, "Phy link never came up\n"); >>> + dev_info(pci->dev, "Phy link never came up\n"); >>> return -ETIMEDOUT; >>> } >> >> >> Are you OK to take this change? > > When this came up, Lorenzo was in the middle of a big move and I was > covering for him while he was unavailable. But he's back, and I'm > sure he will resolve this soon. > > Personally I'm OK either way. > > Bjorn Can we come to a conclusion on this? We have tests that fail when errors/warning messages are reported. We can choose to ignore these errors, but given that this is not an error in this case, we were thinking it is better to make it informational. Thanks Jon
On Wed, Oct 26, 2022 at 12:39:13PM +0100, Jon Hunter wrote: > Hi Lorenzo, > > On 18/10/2022 17:43, Bjorn Helgaas wrote: > > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote: > > > Hi Bjorn, > > > > > > On 13/09/2022 11:12, Vidya Sagar wrote: > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > not having an endpoint connected to the slot is not an error. > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > > > > --- > > > > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > > > index 650a7f22f9d0..25154555aa7a 100644 > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > > } > > > > if (retries >= LINK_WAIT_MAX_RETRIES) { > > > > - dev_err(pci->dev, "Phy link never came up\n"); > > > > + dev_info(pci->dev, "Phy link never came up\n"); > > > > return -ETIMEDOUT; > > > > } > > > > > > > > > Are you OK to take this change? > > > > When this came up, Lorenzo was in the middle of a big move and I was > > covering for him while he was unavailable. But he's back, and I'm > > sure he will resolve this soon. > > > > Personally I'm OK either way. > > > > Bjorn > > > Can we come to a conclusion on this? > > We have tests that fail when errors/warning messages are reported. We can > choose to ignore these errors, but given that this is not an error in this > case, we were thinking it is better to make it informational. I understood. We are at v6.1-rc2, this patch is not a fix, we will handle it for the v6.2 merge window. Thanks, Lorenzo
On Mon, Oct 10, 2022 at 1:02 AM Vidya Sagar <vidyas@nvidia.com> wrote: > > > > On 10/4/2022 6:23 PM, Rob Herring wrote: > > External email: Use caution opening links or attachments > > > > > > On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam > > <manivannan.sadhasivam@linaro.org> wrote: > >> > >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: > >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam > >>> <manivannan.sadhasivam@linaro.org> wrote: > >>>> > >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and > >>>>>>>> not having an endpoint connected to the slot is not an error. > >>>>>>>> So, changing the macro from dev_err to dev_info to log the event. > >>>>>>> > >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being > >>>>>>> returned. So I don't think the log severity should be changed. > >>>>>> > >>>>>> Yes it is an error in the sense it is a timeout, but reporting an error > >>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note > >>>>>> that a similar change was made by the following commit and it also seems > >>>>>> appropriate here ... > >>>>>> > >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272 > >>>>>> Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > >>>>>> Date: Tue Jun 18 23:32:06 2019 +0530 > >>>>>> > >>>>>> PCI: tegra: Change link retry log level to debug > >>>>>> > >>>>>> > >>>>>> BTW, we check for error messages in the dmesg output and this is a new error > >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > >>>>>> error, but in this case it seem more appropriate to make this a info or > >>>>>> debug level print. > >>>>> > >>>>> Can you tell whether there's a device present, e.g., via Slot Status > >>>>> Presence Detect? If there's nothing in the slot, I don't know why we > >>>>> would print anything at all. If a card is present but there's no > >>>>> link, that's probably worthy of dev_info() or even dev_err(). > >>>>> > >>>> > >>>> I don't think all form factors allow for the PRSNT pin to be wired up, > >>>> so we cannot know if the device is actually present in the slot or not all > >>>> the time. Maybe we should do if the form factor supports it? > >>>> > >>>>> I guess if you can tell the slot is empty, there's no point in even > >>>>> trying to start the link, so you could avoid both the message and the > >>>>> timeout by not even calling dw_pcie_wait_for_link(). > >>>> > >>>> Right. There is an overhead of waiting for ~1ms during boot. > >>> > >>> Async probe should mitigate that, right? Saravana is working toward > >>> making that the default instead of opt in, but you could opt in now. > >>> > >> > >> No. The delay is due to the DWC core waiting for link up that depends on > >> the PCIe device to be present on the slot. > > > > Yes, I understand that already. > > > >> The driver probe order > >> doesn't apply here. > > > > I'm not talking about probe order, but rather async probe enabling > > parallel probing. If waiting for the link happens asynchronously, then > > other probes can happen in parallel and you won't see the delay (until > > you run out of cores or all the other probes are faster). > > Are you suggesting to break the existing probe of DWC based PCIe > platform drivers into two i.e. sync part that handles the sequence up > until link up and the async part that starts after link is up (or after > LIKUP_TIMEOUT if link doesn't come up). No, just make the driver opt-in to async probe. It's 1 flag to set for the driver. Then the delay in your probe is not blocking other probes and the whole probe for the driver happens in parallel. Then the delay is only an issue if it is longer than all the other things initializing during boot or if you are on a single core system. Neither is likely true. Rob
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam > <manivannan.sadhasivam@linaro.org> wrote: > > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > > not having an endpoint connected to the slot is not an error. > > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > > returned. So I don't think the log severity should be changed. > > > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > > that a similar change was made by the following commit and it also seems > > > > appropriate here ... > > > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > > error, but in this case it seem more appropriate to make this a info or > > > > debug level print. > > > > > > Can you tell whether there's a device present, e.g., via Slot Status > > > Presence Detect? If there's nothing in the slot, I don't know why we > > > would print anything at all. If a card is present but there's no > > > link, that's probably worthy of dev_info() or even dev_err(). > > > > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > > so we cannot know if the device is actually present in the slot or not all > > the time. Maybe we should do if the form factor supports it? > > > > > I guess if you can tell the slot is empty, there's no point in even > > > trying to start the link, so you could avoid both the message and the > > > timeout by not even calling dw_pcie_wait_for_link(). > > > > Right. There is an overhead of waiting for ~1ms during boot. > > Async probe should mitigate that, right? Saravana is working toward > making that the default instead of opt in, but you could opt in now. I read this as "trying to bring the link up is mandatory because we can't detect an empty slot, therefore ~1ms delay during boot is unavoidable" and on that Rob's suggestion applies. Just to understand where this thread is going. The suggestion above does not change the aim of this patch, that seems reasonable to me and it makes sense even if/when what Rob is suggesting is implemented. Is that correct ? Thanks, Lorenzo
On 26/10/2022 12:39, Jon Hunter wrote: > Hi Lorenzo, > > On 18/10/2022 17:43, Bjorn Helgaas wrote: >> On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote: >>> Hi Bjorn, >>> >>> On 13/09/2022 11:12, Vidya Sagar wrote: >>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and >>>> not having an endpoint connected to the slot is not an error. >>>> So, changing the macro from dev_err to dev_info to log the event. >>>> >>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> >>>> --- >>>> drivers/pci/controller/dwc/pcie-designware.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c >>>> b/drivers/pci/controller/dwc/pcie-designware.c >>>> index 650a7f22f9d0..25154555aa7a 100644 >>>> --- a/drivers/pci/controller/dwc/pcie-designware.c >>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c >>>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) >>>> } >>>> if (retries >= LINK_WAIT_MAX_RETRIES) { >>>> - dev_err(pci->dev, "Phy link never came up\n"); >>>> + dev_info(pci->dev, "Phy link never came up\n"); >>>> return -ETIMEDOUT; >>>> } >>> >>> >>> Are you OK to take this change? >> >> When this came up, Lorenzo was in the middle of a big move and I was >> covering for him while he was unavailable. But he's back, and I'm >> sure he will resolve this soon. >> >> Personally I'm OK either way. >> >> Bjorn > > > Can we come to a conclusion on this? > > We have tests that fail when errors/warning messages are reported. We > can choose to ignore these errors, but given that this is not an error > in this case, we were thinking it is better to make it informational. Is there any hardware presence detect available to just avoid even trying to bring a link up on an disconnected port?
On Thu, Oct 27, 2022 at 11:39:02AM +0200, Lorenzo Pieralisi wrote: > On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote: > > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam > > <manivannan.sadhasivam@linaro.org> wrote: > > > > > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote: > > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote: > > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote: > > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote: > > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > > > not having an endpoint connected to the slot is not an error. > > > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being > > > > > > returned. So I don't think the log severity should be changed. > > > > > > > > > > Yes it is an error in the sense it is a timeout, but reporting an error > > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note > > > > > that a similar change was made by the following commit and it also seems > > > > > appropriate here ... > > > > > > > > > > commit 4b16a8227907118e011fb396022da671a52b2272 > > > > > Author: Manikanta Maddireddy <mmaddireddy@nvidia.com> > > > > > Date: Tue Jun 18 23:32:06 2019 +0530 > > > > > > > > > > PCI: tegra: Change link retry log level to debug > > > > > > > > > > > > > > > BTW, we check for error messages in the dmesg output and this is a new error > > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the > > > > > error, but in this case it seem more appropriate to make this a info or > > > > > debug level print. > > > > > > > > Can you tell whether there's a device present, e.g., via Slot Status > > > > Presence Detect? If there's nothing in the slot, I don't know why we > > > > would print anything at all. If a card is present but there's no > > > > link, that's probably worthy of dev_info() or even dev_err(). > > > > > > > > > > I don't think all form factors allow for the PRSNT pin to be wired up, > > > so we cannot know if the device is actually present in the slot or not all > > > the time. Maybe we should do if the form factor supports it? > > > > > > > I guess if you can tell the slot is empty, there's no point in even > > > > trying to start the link, so you could avoid both the message and the > > > > timeout by not even calling dw_pcie_wait_for_link(). > > > > > > Right. There is an overhead of waiting for ~1ms during boot. > > > > Async probe should mitigate that, right? Saravana is working toward > > making that the default instead of opt in, but you could opt in now. > > I read this as "trying to bring the link up is mandatory because > we can't detect an empty slot, therefore ~1ms delay during boot > is unavoidable" and on that Rob's suggestion applies. > Right. Rob's suggestion came as areply to my worry on waiting for link up during boot. > Just to understand where this thread is going. The suggestion > above does not change the aim of this patch, that seems reasonable > to me and it makes sense even if/when what Rob is suggesting is > implemented. > > Is that correct ? > Yes, Rob's suggestion makes sense to me. And it has to be implemented as a separate patch but this patch itself is required to demote the error log to debug. Thanks, Mani > Thanks, > Lorenzo
On Thu, Oct 27, 2022 at 10:55:34AM +0100, Ben Dooks wrote: > On 26/10/2022 12:39, Jon Hunter wrote: > > Hi Lorenzo, > > > > On 18/10/2022 17:43, Bjorn Helgaas wrote: > > > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote: > > > > Hi Bjorn, > > > > > > > > On 13/09/2022 11:12, Vidya Sagar wrote: > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and > > > > > not having an endpoint connected to the slot is not an error. > > > > > So, changing the macro from dev_err to dev_info to log the event. > > > > > > > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > > > > > --- > > > > > drivers/pci/controller/dwc/pcie-designware.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c > > > > > b/drivers/pci/controller/dwc/pcie-designware.c > > > > > index 650a7f22f9d0..25154555aa7a 100644 > > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > > > } > > > > > if (retries >= LINK_WAIT_MAX_RETRIES) { > > > > > - dev_err(pci->dev, "Phy link never came up\n"); > > > > > + dev_info(pci->dev, "Phy link never came up\n"); > > > > > return -ETIMEDOUT; > > > > > } > > > > > > > > > > > > Are you OK to take this change? > > > > > > When this came up, Lorenzo was in the middle of a big move and I was > > > covering for him while he was unavailable. But he's back, and I'm > > > sure he will resolve this soon. > > > > > > Personally I'm OK either way. > > > > > > Bjorn > > > > > > Can we come to a conclusion on this? > > > > We have tests that fail when errors/warning messages are reported. We > > can choose to ignore these errors, but given that this is not an error > > in this case, we were thinking it is better to make it informational. > > Is there any hardware presence detect available to just avoid even > trying to bring a link up on an disconnected port? > PRSNT pin is not available on all form factors sadly. Thanks, Mani > > -- > Ben Dooks http://www.codethink.co.uk/ > Senior Engineer Codethink - Providing Genius > > https://www.codethink.co.uk/privacy.html >
On Tue, 13 Sep 2022 15:42:37 +0530, Vidya Sagar wrote: > Some of the platforms (like Tegra194 and Tegra234) have open slots and > not having an endpoint connected to the slot is not an error. > So, changing the macro from dev_err to dev_info to log the event. > > Applied to pci/dwc, thanks! [1/1] PCI: dwc: Use dev_info for PCIe link down event logging https://git.kernel.org/lpieralisi/pci/c/8405d8f0956d Thanks, Lorenzo
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c index 650a7f22f9d0..25154555aa7a 100644 --- a/drivers/pci/controller/dwc/pcie-designware.c +++ b/drivers/pci/controller/dwc/pcie-designware.c @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) } if (retries >= LINK_WAIT_MAX_RETRIES) { - dev_err(pci->dev, "Phy link never came up\n"); + dev_info(pci->dev, "Phy link never came up\n"); return -ETIMEDOUT; }
Some of the platforms (like Tegra194 and Tegra234) have open slots and not having an endpoint connected to the slot is not an error. So, changing the macro from dev_err to dev_info to log the event. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> --- drivers/pci/controller/dwc/pcie-designware.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)