diff mbox

[v2] PCI: Check for PCIe downtraining conditions

Message ID 20180601150129.10486-1-mr.nuke.me@gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Alex G. June 1, 2018, 3:01 p.m. UTC
PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor straigh
forward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/probe.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Sinan Kaya June 1, 2018, 3:03 p.m. UTC | #1
On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
> +	/* Multi-function PCIe share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0)
> +		return;

How about virtual functions?
Alex G. June 1, 2018, 3:06 p.m. UTC | #2
On 06/01/2018 10:03 AM, Sinan Kaya wrote:
> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if (PCI_FUNC(dev->devfn) != 0)
>> +		return;
> 
> How about virtual functions?

I have almost no clue about those. Is your concern that we might end up
printing more than our fair share of link statuses?

Alex
Sinan Kaya June 1, 2018, 3:10 p.m. UTC | #3
On 6/1/2018 11:06 AM, Alex G. wrote:
> On 06/01/2018 10:03 AM, Sinan Kaya wrote:
>> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
>>> +	/* Multi-function PCIe share the same link/status. */
>>> +	if (PCI_FUNC(dev->devfn) != 0)
>>> +		return;
>>
>> How about virtual functions?
> 
> I have almost no clue about those. Is your concern that we might end up
> printing more than our fair share of link statuses?

Yes, struct pci_dev also has a flag called is_virtfn that you should bail out
here too. 

> 
> Alex
> 
>
Andy Shevchenko June 1, 2018, 3:12 p.m. UTC | #4
On Fri, Jun 1, 2018 at 6:01 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor straigh
> forward.
>
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{

> +

This is redundant, but...

> +       if (!pci_is_pcie(dev))
> +               return;
> +
> +       /* Look from the device up to avoid downstream ports with no devices. */
> +       if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +               return;

...wouldn't be better

int type = pci_pcie_type(dev);

?

But also possible, looking at existing code,

static inline bool pci_is_pcie_type(dev, type)
{
  return pci_is_pcie(dev) ? pci_pcie_type(dev) == type : false;
}
Alex G. June 1, 2018, 3:29 p.m. UTC | #5
On 06/01/2018 10:12 AM, Andy Shevchenko wrote:
> On Fri, Jun 1, 2018 at 6:01 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
> 
>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>> +{
> 
>> +
> 
> This is redundant, but...

Hmm. I thought it's not safe to call pci_pcie_type() on non-pcie devices.

I see the pci_is_pcie() check followed by pci_pcie_type() is not
uncommon. I didn't think it would be an issue, as long as it's
consistent with the rest of the code.

>> +       if (!pci_is_pcie(dev))
>> +               return;
>> +
>> +       /* Look from the device up to avoid downstream ports with no devices. */
>> +       if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +               return;
> 
> ...wouldn't be better
> 
> int type = pci_pcie_type(dev);
> 
> ?

An extra local variable when the compiler knows how to optimize it out?
To me, it doesn't seem like it would improve readability, but it would
make the code longer.

> But also possible, looking at existing code,
> 
> static inline bool pci_is_pcie_type(dev, type)
> {
>   return pci_is_pcie(dev) ? pci_pcie_type(dev) == type : false;
> }

	return pci_is_pcie(dev) && (pci_pcie_type(dev) == type);

seems cleaner. Although this sort of cleanup is beyond the scope of this
change.

Thanks,
Alex
Alex G. June 1, 2018, 3:50 p.m. UTC | #6
On 06/01/2018 10:10 AM, Sinan Kaya wrote:
> On 6/1/2018 11:06 AM, Alex G. wrote:
>> On 06/01/2018 10:03 AM, Sinan Kaya wrote:
>>> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
>>>> +	/* Multi-function PCIe share the same link/status. */
>>>> +	if (PCI_FUNC(dev->devfn) != 0)
>>>> +		return;
>>>
>>> How about virtual functions?
>>
>> I have almost no clue about those. Is your concern that we might end up
>> printing more than our fair share of link statuses?
> 
> Yes, struct pci_dev also has a flag called is_virtfn that you should bail out
> here too. 

Will be fixed in v3.

Thanks,
Alex

>>
>> Alex
>>
>>
> 
>
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac91b6fd0bcd..e8e158046cab 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2146,6 +2146,25 @@  static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if (PCI_FUNC(dev->devfn) != 0)
+		return;
+
+	pcie_print_link_status(dev);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2181,6 +2200,9 @@  static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }