diff mbox

[v8,1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

Message ID 1501767889-7772-2-git-send-email-dingtianhong@huawei.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Ding Tianhong Aug. 3, 2017, 1:44 p.m. UTC
From: Casey Leedom <leedom@chelsio.com>

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targetted towards these affected root complexes. Current list
of affected parts include Intel E5-26xx root complex which suffers from 
flow control credits that result in performance issues. On these affected
parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
data-corruption.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

Comments

Casey Leedom Aug. 4, 2017, 9:06 p.m. UTC | #1
| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Thursday, August 3, 2017 6:44 AM
|
| diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
| index 6967c6b..1e1cdbe 100644
| --- a/drivers/pci/quirks.c
| +++ b/drivers/pci/quirks.c
| @@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
|                                quirk_tw686x_class);
|
|  /*
| + * Some devices have problems with Transaction Layer Packets with the Relaxed
| + * Ordering Attribute set.  Such devices should mark themselves and other
| + * Device Drivers should check before sending TLPs with RO set.
| + */
| +static void quirk_relaxedordering_disable(struct pci_dev *dev)
| +{
| +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
| +}
| +
| +/*
| + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
| + * cause performance problems with Upstream Transaction Layer Packets with
| + * Relaxed Ordering set.
| + */
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
| +                             quirk_relaxedordering_disable);
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
| +                             quirk_relaxedordering_disable);
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
| +                             quirk_relaxedordering_disable);
| + ...

It looks like this is missing the set of Root Complex IDs that were noted in
the document to which Patrick Cramer sent us a reference:

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

In section 3.9.1 we have:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

The PCI Device IDs you have there are the first ones that I guessed at
having the performance problem with Relaxed Ordering.  We now apparently
have a complete list from Intel.

I don't want to phrase this as a "NAK" because you've gone around the
mulberry bush a bunch of times already.  So maybe just go with what you've
got in version 8 of your patch and then do a follow on patch to complete the
table?

Casey
Ding Tianhong Aug. 5, 2017, 6:28 a.m. UTC | #2
On 2017/8/5 5:06, Casey Leedom wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Thursday, August 3, 2017 6:44 AM
> |
> | diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> | index 6967c6b..1e1cdbe 100644
> | --- a/drivers/pci/quirks.c
> | +++ b/drivers/pci/quirks.c
> | @@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> |                                quirk_tw686x_class);
> |
> |  /*
> | + * Some devices have problems with Transaction Layer Packets with the Relaxed
> | + * Ordering Attribute set.  Such devices should mark themselves and other
> | + * Device Drivers should check before sending TLPs with RO set.
> | + */
> | +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> | +{
> | +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> | +}
> | +
> | +/*
> | + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> | + * cause performance problems with Upstream Transaction Layer Packets with
> | + * Relaxed Ordering set.
> | + */
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> | +                             quirk_relaxedordering_disable);
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> | +                             quirk_relaxedordering_disable);
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> | +                             quirk_relaxedordering_disable);
> | + ...
> 
> It looks like this is missing the set of Root Complex IDs that were noted in
> the document to which Patrick Cramer sent us a reference:
> 
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> 
> In section 3.9.1 we have:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture
> 
> The PCI Device IDs you have there are the first ones that I guessed at
> having the performance problem with Relaxed Ordering.  We now apparently
> have a complete list from Intel.
> 
> I don't want to phrase this as a "NAK" because you've gone around the
> mulberry bush a bunch of times already.  So maybe just go with what you've
> got in version 8 of your patch and then do a follow on patch to complete the
> table?
> 
Casey:

Thanks for the good catch, I found that the Ashok has notice this 3 month before, I am so sorry to
miss it, it was really a long discussion for this problem, but don't worry, It is not a big work to fix it,
I will send the v9 version. :)

Ding

> Casey
> .
>
diff mbox

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..1e1cdbe 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,44 @@  static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@  enum pci_dev_flags {
 	 * the direct_complete optimization.
 	 */
 	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {