Message ID | MWHPR12MB160087E9E0B17F7A17609714C8AA0@MWHPR12MB1600.namprd12.prod.outlook.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
By the way, it ~seems~ like the patch set confuses the idea of the PCIe Capability Device Control[Relaxed Ordering Enable] with the device's ability to handle incoming TLPs with the Relaxed Ordering Attribute set. These are completely different things. The PCIe Capability Device Control[Relaxed Ordering Enable] solely governs the ability of the device to _send_ TLPs with the Relaxed Ordering Attribute set. It has nothing whatsoever to do with the handling of incoming TLPs with the Relaxed Ordering Attribute set. In fact, there is any standard way to disable receipt processing of such TLPs. That's kind of the whole point of the majority of the patch. If there was a separate "Ignore Incoming Relaxed Ordering Attributes" then we'd mostly just have a quirk which would set that for problematic devices. Worse yet, if I'm reading the patch right, it _is_ turning off the PCIe Capability Device Control[Relaxed Ordering Enable] for the devices that we've identified with problematic receive handling. Not only does this not solve the problem that we've been talking about, it actually almost certainly introduces a huge Graphics Performance Bug. The Relaxed Ordering Attribute was originally developed for Graphics Device support in order to download textures, etc. to a graphics devices as fast as possible and only ensure ordering at points where needed. For instance, as far as I know, the Intel Root Complexes that we've been talking about have no issues whatsoever in generating downstream TLPs with the Relaxed Ordering Attribute set. By turning off the PCIe Capability Device Control[Relaxed Ordering Enable] on these Root Complexes, this patch prevents the generation of downstream TLPs with the Relaxed Ordering Attribute set targeted at devices like graphics adapters. Casey
On Fri, Jul 7, 2017 at 5:30 PM, Casey Leedom <leedom@chelsio.com> wrote: > By the way, it ~seems~ like the patch set confuses the idea of the PCIe Capability Device Control[Relaxed Ordering Enable] with the device's ability to handle incoming TLPs with the Relaxed Ordering Attribute set. These are completely different things. The PCIe Capability Device Control[Relaxed Ordering Enable] solely governs the ability of the device to _send_ TLPs with the Relaxed Ordering Attribute set. It has nothing whatsoever to do with the handling of incoming TLPs with the Relaxed Ordering Attribute set. In fact, there is any standard way to disable receipt processing of such TLPs. That's kind of the whole point of the majority of the patch. If there was a separate "Ignore Incoming Relaxed Ordering Attributes" then we'd mostly just have a quirk which would set that for problematic devices. > > Worse yet, if I'm reading the patch right, it _is_ turning off the PCIe Capability Device Control[Relaxed Ordering Enable] for the devices that we've identified with problematic receive handling. Not only does this not solve the problem that we've been talking about, it actually almost certainly introduces a huge Graphics Performance Bug. The Relaxed Ordering Attribute was originally developed for Graphics Device support in order to download textures, etc. to a graphics devices as fast as possible and only ensure ordering at points where needed. For instance, as far as I know, the Intel Root Complexes that we've been talking about have no issues whatsoever in generating downstream TLPs with the Relaxed Ordering Attribute set. By turning off the PCIe Capability Device Control[Relaxed Ordering Enable] on these Root Complexes, this patch prevents the generation of downstream TLPs with the Relaxed Ordering Attribute set targeted at devices like graphics adapters. > > Casey The patches should be disabling the relaxed ordering on upstream facing ports that would be sending requests to the root complex. So if the root complex cannot handle receiving a message with relaxed ordering enabled we should be clearing the relaxed ordering bit in the PCIe configuration on those device that could send TLPs to that root complex. By doing that the devices hosted on that root complex cannot generate requests that contain relaxed ordering TLPs. I'll have to do some double checking of the patches, but I thought we were only clearing the relaxed ordering bits on the devices that would be sending requests to the root complex, not the downstream ports of the root complex itself. As such we should be able to generate relaxed ordering TLPs from the root complex, but the devices hosted on the bus shouldn't be able to generate relaxed ordering TLPs. The bit in the configuration space controls the ability to generate content, not the ability to receive it so clearing the bit on the device should be the correct behavior for this. As far as your suggestions for the cxgb4 patch it has the same problem it originally had. We want to disable the relaxed ordering bit on the device since the device should not be generating relaxed ordering requests. This is why I was trying to get your input on this as I know you have a peer-to-peer configuration that you wanted to support. What we need to do is identify what platforms cannot support relaxed ordering to the root complex and prevent that from happening which is why we clear the relaxed ordering bit on the device. In that setup we need to have a good way to identify when this has occurred and instead setup a side channel type configuration where we then re-enable relaxed ordering, but only allow for sending directly to the peer and not the root complex. - Alex
Okay, thanks for the note Alexander. I'll have to look more closely at the patch on Monday and try it out on one of the targeted systems to verify the semantics you describe. However, that said, there is no way to tell a priori where a device will send TLPs. To simply assume that all TLPs will be directed towards the Root Complex is a big assumption. Only the device and the code controlling it know where the TLPs will be directed. That's why there are changes required in the cxgb4 driver. For instance, the code in drivers/net/ethernet/chelsio./cxgb4/sge.c: t4_sge_alloc_rxq() knows that it's allocating Free List Buffers in Host Memory and that the RX Queues that it's allocating in the Hardware will eventually send Ingress Data to those Free List Buffers. (And similarly for the Free List Buffer Pointer Queue with respect to DMA Reads from the host.) In that routine we explicitly configure the Hardware to use/not-use the Relaxed Ordering Attribute via the FW_IQ_CMD_FL0FETCHRO and FW_IQ_CMD_FL0DATARO flags. Basically we're conditionally setting them based on the desirability of sending Relaxed Ordering TLPs to the Root Complex. (And we would perform the same kind of check for an nVME application ... which brings us to ...) And what would be the code using these patch APIs to set up a Peer-to-Peer nVME-style application? In that case we'd need the Chelsio adapter's PCIe Capability Device Control[Relaxed Ordering Enable] set for the nVME application ... and we would avoid programming the Chelsio Hardware to use Relaxed Ordering for TLPs directed at the Root Complex. Thus we would be in a position where some TLPs being emitted by the device to Peer devices would have Relaxed Ordering set and some directed at the Root Complex would not. And the only way for that to work is if the source device's Device Control[Relaxed Ordering Enable] is set ... Finally, setting aside my disagreements with the patch, we still have the code in the cxgb4 driver which explicitly turns on its own Device Control[Relaxed Ordering Enable] in cxgb4_main.c: enable_pcie_relaxed_ordering(). So the patch is something of a loop if all we're doing is testing our own Relaxed Ordering Enable state ... Casey
On Fri, Jul 7, 2017 at 7:04 PM, Casey Leedom <leedom@chelsio.com> wrote: > Okay, thanks for the note Alexander. I'll have to look more closely at > the patch on Monday and try it out on one of the targeted systems to verify > the semantics you describe. > > However, that said, there is no way to tell a priori where a device will > send TLPs. To simply assume that all TLPs will be directed towards the Root > Complex is a big assumption. Only the device and the code controlling it > know where the TLPs will be directed. That's why there are changes required > in the cxgb4 driver. For instance, the code in > drivers/net/ethernet/chelsio./cxgb4/sge.c: t4_sge_alloc_rxq() knows that > it's allocating Free List Buffers in Host Memory and that the RX Queues that > it's allocating in the Hardware will eventually send Ingress Data to those > Free List Buffers. (And similarly for the Free List Buffer Pointer Queue > with respect to DMA Reads from the host.) In that routine we explicitly > configure the Hardware to use/not-use the Relaxed Ordering Attribute via the > FW_IQ_CMD_FL0FETCHRO and FW_IQ_CMD_FL0DATARO flags. Basically we're > conditionally setting them based on the desirability of sending Relaxed > Ordering TLPs to the Root Complex. (And we would perform the same kind of > check for an nVME application ... which brings us to ...) The general idea with this is to keep this simple. In the vast majority of cases the assumption is that a device will be sending data back and forth from system memory. As such the mostly likely thing that any given device will be interacting with is the root complex. By making it so that we disable relaxed ordering if the root complex says we can't support it seems like the simplest and most direct solution to avoid the issue of us sending any requests with relaxed ordering enabled TLPs to the root complex. With that said what we are getting into are subtleties that end up impacting devices that perform peer-to-peer operations and I don't suspect that peer-to-peer is really all that common. Ideally my thought on this is that if there is something in the setup that cannot support relaxed ordering we should be disabling relaxed ordering and then re-enabling it in the drivers that have special peer-to-peer routes that they are aware of. This should help to simplify things for cases such as a function being direct assigned as the configuration space should be passed through to the guest with the relaxed ordering attribute disabled, and that status will be visible to the guest. If we just use the quirk we lose that and it becomes problematic if a function is direct assigned on a system that doesn't support relaxed ordering as the guest has no visibility into the host's quirks. > And what would be the code using these patch APIs to set up a Peer-to-Peer > nVME-style application? In that case we'd need the Chelsio adapter's PCIe > Capability Device Control[Relaxed Ordering Enable] set for the nVME > application ... and we would avoid programming the Chelsio Hardware to use > Relaxed Ordering for TLPs directed at the Root Complex. Thus we would be in > a position where some TLPs being emitted by the device to Peer devices would > have Relaxed Ordering set and some directed at the Root Complex would not. > And the only way for that to work is if the source device's Device > Control[Relaxed Ordering Enable] is set ... Right. I admit this is pushing extra complexity into the driver, but what else would you have us do? The idea here is more of a lock-out tag-out type setup. We go in and disable relaxed ordering on the devices if it isn't safe to send a relaxed ordering TLP to the root complex. We then leave it up to the driver to go through and re-enable it if the driver knows enough about how it works and what kind of transactions it might issue. I'm not saying we have to leave the relaxed ordering bit disabled in the control register. I'm saying we disable it at first, and then leave it up to the device driver to re-enable it if it needs the functionality for something that is other than root-complex based and it knows it can avoid sending the frames to the root complex. Ideally such a driver would also clean up after itself if removed so that it leaves the device in the same state it found it in. Maybe we don't even really need patch 3/3 in this series. If disabling the relaxed ordering bit in the configuration space already disabled relaxed ordering for the entire device then this patch doesn't even really do anything then right? The device takes care of it already for us so we could probably just drop this patch as it currently stands. If that is the case maybe we need to refocus patch 3/3 on re-enabling relaxed ordering for that nVME specific case. That might be beyond the scope of what Ding can handle though, and I am not familiar with the Chelsio hardware either. So that might be something that is best left to you as a follow-up patch. > Finally, setting aside my disagreements with the patch, we still have the > code in the cxgb4 driver which explicitly turns on its own Device > Control[Relaxed Ordering Enable] in cxgb4_main.c: > enable_pcie_relaxed_ordering(). So the patch is something of a loop if all > we're doing is testing our own Relaxed Ordering Enable state ... I assume you are talking about what I am suggesting and not what is in the actual patch. Basically what it comes down to is sort of a loop. You would need to add some code on your probe and remove routines to check the status of the bits and configure them the way you want, and on remove you would need to clean things up so that the next time a driver loads it doesn't get confused. However you should only need this in the case where the root complex cannot support relaxed ordering. In all other cases you should be just running with relaxed ordering enabled for everything.
Hi Casey: On 2017/7/8 10:04, Casey Leedom wrote: > Okay, thanks for the note Alexander. I'll have to look more closely at > the patch on Monday and try it out on one of the targeted systems to verify > the semantics you describe. > All the modification is only clearing the device's Device Control{Relaxed Ordering Enable]bit when distinguish that the platform should not support RO and did nothing to the RC configuration, so I don't think it will break anything compare to the first version from yours. > However, that said, there is no way to tell a priori where a device will > send TLPs. To simply assume that all TLPs will be directed towards the Root > Complex is a big assumption. Only the device and the code controlling it > know where the TLPs will be directed. That's why there are changes required > in the cxgb4 driver. For instance, the code in > drivers/net/ethernet/chelsio./cxgb4/sge.c: t4_sge_alloc_rxq() knows that > it's allocating Free List Buffers in Host Memory and that the RX Queues that > it's allocating in the Hardware will eventually send Ingress Data to those > Free List Buffers. (And similarly for the Free List Buffer Pointer Queue > with respect to DMA Reads from the host.) In that routine we explicitly > configure the Hardware to use/not-use the Relaxed Ordering Attribute via the > FW_IQ_CMD_FL0FETCHRO and FW_IQ_CMD_FL0DATARO flags. Basically we're > conditionally setting them based on the desirability of sending Relaxed > Ordering TLPs to the Root Complex. (And we would perform the same kind of > check for an nVME application ... which brings us to ...) > > And what would be the code using these patch APIs to set up a Peer-to-Peer > nVME-style application? In that case we'd need the Chelsio adapter's PCIe > Capability Device Control[Relaxed Ordering Enable] set for the nVME > application ... and we would avoid programming the Chelsio Hardware to use > Relaxed Ordering for TLPs directed at the Root Complex. Thus we would be in > a position where some TLPs being emitted by the device to Peer devices would > have Relaxed Ordering set and some directed at the Root Complex would not. > And the only way for that to work is if the source device's Device > Control[Relaxed Ordering Enable] is set ... > > Finally, setting aside my disagreements with the patch, we still have the > code in the cxgb4 driver which explicitly turns on its own Device > Control[Relaxed Ordering Enable] in cxgb4_main.c: > enable_pcie_relaxed_ordering(). So the patch is something of a loop if all > we're doing is testing our own Relaxed Ordering Enable state ... > > Casey > > . >
Hey Alexander, Okay, I understand your point regarding the "most likely scenario" being TLPs directed upstream to the Root Complex. But I'd still like to make sure that we have an agreed upon API/methodology for doing Peer-to-Peer with Relaxed Ordering and no Relaxed Ordering to the Root Complex. I don't see how the proposed APIs can be used in that fashion. Right now the proposed change for cxgb4 is for it to test its own PCIe Capability Device Control[Relaxed Ordering Enable] in order to use that information to program the Chelsio Hardware to emit/not emit upstream TLPs with the Relaxed Ordering Attribute set. But if we're going to have the mixed mode situation I describe, the PCIe Capability Device Control[Relaxed Ordering Enable] will have to be set which means that we'll be programming the Chelsio Hardware to send upstream TLPs with Relaxed Ordering Enable to the Root Complex which is what we were trying to avoid in the first place ... [[ And, as I noted on Friday evening, the currect cxgb4 Driver hardwires the Relaxed Ordering Enable on early dureing device probe, so that would minimally need to be addressed even if we decide that we don't ever want to support mixed mode Relaxed Ordering. ]] We need some method of telling the Chelsio Driver that it should/shouldn't use Relaxed Ordering with TLPs directed at the Root Complex. And the same is true for a Peer PCIe Device. It may be that we should approach this from the completely opposite direction and instead of having quirks which identify problematic devices, have quirks which identify devices which would benefit from the use of Relaxed Ordering (if the sending device supports that). That is, assume the using Relaxed Ordering shouldn't be done unless the target device says "I love Relaxed Ordering TLPs" ... In such a world, an NVMe or a Graphics device might declare love of Relaxed Ordering and the same for a SPARC Root Complex (I think that was your example). By the way, the sole example of Data Corruption with Relaxed Ordering is the AMD A1100 ARM SoC and AMD appears to have given up on that almost as soon as it was released. So what we're left with currently is a performance problem on modern Intel CPUs ... (And hopefully we'll get a Technical Publication on that issue fairly soon.) Casey
On Mon, Jul 10, 2017 at 5:01 PM, Casey Leedom <leedom@chelsio.com> wrote: > > Hey Alexander, > > Okay, I understand your point regarding the "most likely scenario" being > TLPs directed upstream to the Root Complex. But I'd still like to make sure > that we have an agreed upon API/methodology for doing Peer-to-Peer with > Relaxed Ordering and no Relaxed Ordering to the Root Complex. I don't see > how the proposed APIs can be used in that fashion. > > Right now the proposed change for cxgb4 is for it to test its own PCIe > Capability Device Control[Relaxed Ordering Enable] in order to use that > information to program the Chelsio Hardware to emit/not emit upstream TLPs > with the Relaxed Ordering Attribute set. But if we're going to have the > mixed mode situation I describe, the PCIe Capability Device Control[Relaxed > Ordering Enable] will have to be set which means that we'll be programming > the Chelsio Hardware to send upstream TLPs with Relaxed Ordering Enable to > the Root Complex which is what we were trying to avoid in the first place ... > > [[ And, as I noted on Friday evening, the currect cxgb4 Driver hardwires > the Relaxed Ordering Enable on early dureing device probe, so that > would minimally need to be addressed even if we decide that we don't > ever want to support mixed mode Relaxed Ordering. ]] So when you say "hardwires the Relaxed Ordering Enable" do you mean it cannot be changed by software, or are you saying the firmware is just setting the bit? I just want to make sure the change even works. Really what I was trying to get at is the Relaxed Ordering Enable bit doesn't have to stay as 0 if it is cleared due to the root complex not supporting it. If you have a device that is smart enough to be able to distinguish between different destinations and can change the TLPs depending on where they are routed then the driver should feel free to re-enable the relaxed ordering bit itself and do whatever it wants. The idea though is that in most cases it will probably be enabled by default since most firmwares default to that if I am not mistaken. If we disable it, then it is up to the drivers to figure out if they need to come up with a workaround. > We need some method of telling the Chelsio Driver that it should/shouldn't > use Relaxed Ordering with TLPs directed at the Root Complex. And the same > is true for a Peer PCIe Device. Well my thought for now is to take this in baby steps. Looking over the code I might suggest that Ding just drops patch 3 and just focuses on the first two patches for now. Patch 3 doesn't do anything from the sounds of it since the Relaxed Ordering Enable being cleared will cause the TLPs to already not set the relaxed ordering bit, do I have that right? The main thing Ding cared about is making the ixgbe driver behave more like the cxgb4 driver in that he wanted to have us enable relaxed ordering by default which right now we were only doing for the SPARC platforms. As such we may want to look at changing patch 3 to instead strip the code in ixgbe so that it is enabled to use Relaxed Ordering by default and then let the configuration space determine if we use it or not. > It may be that we should approach this from the completely opposite > direction and instead of having quirks which identify problematic devices, > have quirks which identify devices which would benefit from the use of > Relaxed Ordering (if the sending device supports that). That is, assume the > using Relaxed Ordering shouldn't be done unless the target device says "I > love Relaxed Ordering TLPs" ... In such a world, an NVMe or a Graphics > device might declare love of Relaxed Ordering and the same for a SPARC Root > Complex (I think that was your example). Really I think we are probably better off enabling it by default and leaving it up to the configuration space of the end points to sort it out. The advantage is that it will let us catch issues with platforms that need these kind of quirks sooner since up until now we have been avoiding enabling relaxed ordering for most devices on x86 and as we get faster and faster buses we are going to need to start fully supporting this sooner or later anyway. > By the way, the sole example of Data Corruption with Relaxed Ordering is > the AMD A1100 ARM SoC and AMD appears to have given up on that almost as > soon as it was released. So what we're left with currently is a performance > problem on modern Intel CPUs ... (And hopefully we'll get a Technical > Publication on that issue fairly soon.) > > Casey That's also assuming there aren't any other platforms with issues lurking out there. If something like this had been in place before some of the modern Intel CPUs were developed perhaps they would have caught the relaxed ordering issue soon enough to have resolved it in the silicon. Odds are this sets things up for how we will deal with future issues more than current ones. I'm more a fan of just letting the drivers configure themselves for relaxed ordering to the root complex by default, and then if something comes up where relaxed ordering can't be supported on the platform then we do workarounds for things like the peer-to-peer performance you mentioned before for the NVMe solution. - Alex
On 2017/7/11 8:01, Casey Leedom wrote: > > Hey Alexander, > > Okay, I understand your point regarding the "most likely scenario" being > TLPs directed upstream to the Root Complex. But I'd still like to make sure > that we have an agreed upon API/methodology for doing Peer-to-Peer with > Relaxed Ordering and no Relaxed Ordering to the Root Complex. I don't see > how the proposed APIs can be used in that fashion. > > Right now the proposed change for cxgb4 is for it to test its own PCIe > Capability Device Control[Relaxed Ordering Enable] in order to use that > information to program the Chelsio Hardware to emit/not emit upstream TLPs > with the Relaxed Ordering Attribute set. But if we're going to have the > mixed mode situation I describe, the PCIe Capability Device Control[Relaxed > Ordering Enable] will have to be set which means that we'll be programming > the Chelsio Hardware to send upstream TLPs with Relaxed Ordering Enable to > the Root Complex which is what we were trying to avoid in the first place ... > > [[ And, as I noted on Friday evening, the currect cxgb4 Driver hardwires > the Relaxed Ordering Enable on early dureing device probe, so that > would minimally need to be addressed even if we decide that we don't > ever want to support mixed mode Relaxed Ordering. ]] > > We need some method of telling the Chelsio Driver that it should/shouldn't > use Relaxed Ordering with TLPs directed at the Root Complex. And the same > is true for a Peer PCIe Device. > > It may be that we should approach this from the completely opposite > direction and instead of having quirks which identify problematic devices, > have quirks which identify devices which would benefit from the use of > Relaxed Ordering (if the sending device supports that). That is, assume the > using Relaxed Ordering shouldn't be done unless the target device says "I > love Relaxed Ordering TLPs" ... In such a world, an NVMe or a Graphics > device might declare love of Relaxed Ordering and the same for a SPARC Root > Complex (I think that was your example). > > By the way, the sole example of Data Corruption with Relaxed Ordering is > the AMD A1100 ARM SoC and AMD appears to have given up on that almost as > soon as it was released. So what we're left with currently is a performance > problem on modern Intel CPUs ... (And hopefully we'll get a Technical > Publication on that issue fairly soon.) > > Casey > Hi Casey: After the long discuss, I think If the PCIe Capability Device Control[Relaxed Ordering Enable] to be cleared when the platform's RC has some problematic for RO didn't break anything in your driver, I think you could choose to check the (!pci_dev_should_disable_relaxed_ordering(root)) in the code to to enable ROOT_NO_RELAXED_ORDERING for your adapter, and enable the PCIe Capability Device Control [Relaxed Ordering Enable] bit when you need it, I think we don't have much gap here. And we could leave the pear-to-pear situation to be fixed later. Thanks Ding > . >
Sorry again for the delay. This time at least partially caused by a Chelsio-internal Customer Support request to simply disable Relaxed Ordering entirely due to the performance issues with our 100Gb/s product and relatively recent Intel Root Complexes. Our Customer Support people are tired of asking customers to try turning off Relaxed Ordering. (sigh) So, first off, I've mentioned a couple of times that the current cxgb4 driver hardwired the PCIe Capability Device Control[Relaxed Ordering Enable] on. Here's the code which does it: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4657: static void enable_pcie_relaxed_ordering(struct pci_dev *dev) { pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN); } This is called from the PCI Probe() routine init_one() later in that file. I just wanted to make sure people knew about this. Obviously given our current very difficult thread, this would either need to be diked out or changed or a completely different mechanism put in place. Second, just to make sure everyone's on the same page, the above simply allows the device to send TLPs with the Relaxed Ordering Attribute. It doesn't cause TLPs to suddenly all be sent with RO set. The use of Relaxed Ordering is selective. For instance, in our hardware we can configure the RX Path to use RO on Ingress Packet Data delivery to Free List Buffers, but not use RO for delivery of messages noting newly delivered Ingress Packet Data. Doing this allows the destination PCIe target to [potentially] optimize the DMA Writes to it based on local conditions (memory controller channel availability, etc.), but ensure that the message noting newly delivered Ingress Packet Data isn't processed till all of the preceding TLPs with RO set containing Ingress Packet Data have been processed. (This by the way is the essence of the AMD A1100 ARM SoC bug: its Root Complex isn't obeying that PCIe ordering rule.) Third, as noted above, I'm getting a lot of pressure to get this addressed sooner than later, so I think that we should go with something fairly simple along the lines that you guys are proposing and I'll stop whining about the problem of needing to handle Peer-to-Peer with Relaxed Ordering while not using it for deliveries to the Root Complex. We can just wait for that kettle of fish to explode on us and deal with the mess then. (Hhmmm, the mixed metaphor landed in an entirely different place than I originally intended ... :-)) If we try to stick as closely to Ding's latest patch set as possible, then we can probably just add the diff to remove the enable_pcie_relaxed_ordering() code in cxgb4_main.c. Casey
On 2017/7/13 8:52, Casey Leedom wrote: > Sorry again for the delay. This time at least partially caused by a Chelsio-internal Customer Support request to simply disable Relaxed Ordering entirely due to the performance issues with our 100Gb/s product and relatively recent Intel Root Complexes. Our Customer Support people are tired of asking customers to try turning off Relaxed Ordering. (sigh) > > So, first off, I've mentioned a couple of times that the current cxgb4 driver hardwired the PCIe Capability Device Control[Relaxed Ordering Enable] on. Here's the code which does it: > > drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4657: > > static void enable_pcie_relaxed_ordering(struct pci_dev *dev) > { > pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN); > } we should remove it. > > This is called from the PCI Probe() routine init_one() later in that file. I just wanted to make sure people knew about this. Obviously given our current very difficult thread, this would either need to be diked out or changed or a completely different mechanism put in place. > > Second, just to make sure everyone's on the same page, the above simply allows the device to send TLPs with the Relaxed Ordering Attribute. It doesn't cause TLPs to suddenly all be sent with RO set. The use of Relaxed Ordering is selective. For instance, in our hardware we can configure the RX Path to use RO on Ingress Packet Data delivery to Free List Buffers, but not use RO for delivery of messages noting newly delivered Ingress Packet Data. Doing this allows the destination PCIe target to [potentially] optimize the DMA Writes to it based on local conditions (memory controller channel availability, etc.), but ensure that the message noting newly delivered Ingress Packet Data isn't processed till all of the preceding TLPs with RO set containing Ingress Packet Data have been processed. (This by the way is the essence of the AMD A1100 ARM SoC bug: its Root Complex isn't obeying that PCIe ordering rule.) > > Third, as noted above, I'm getting a lot of pressure to get this addressed sooner than later, so I think that we should go with something fairly simple along the lines that you guys are proposing and I'll stop whining about the problem of needing to handle Peer-to-Peer with Relaxed Ordering while not using it for deliveries to the Root Complex. We can just wait for that kettle of fish to explode on us and deal with the mess then. (Hhmmm, the mixed metaphor landed in an entirely different place than I originally intended ... :-)) > Ok, we could fix them when we trigger this, I think it is not a big problem. > If we try to stick as closely to Ding's latest patch set as possible, then we can probably just add the diff to remove the enable_pcie_relaxed_ordering() code in cxgb4_main.c. > If no other more suggestion, I will send a new version and remove the enable_pcie_relaxed_ordering(), thanks. :) Ding > Casey > . >
| From: Ding Tianhong <dingtianhong@huawei.com> | Sent: Wednesday, July 12, 2017 6:18 PM | | If no other more suggestion, I will send a new version and remove the | enable_pcie_relaxed_ordering(), thanks. :) Sounds good to me. (And sorry for forgetting to justify that last message. I hate working with web-based email agents.) Casey
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 38a5c67..546538d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -4620,6 +4620,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) struct port_info *pi; bool highdma = false; struct adapter *adapter = NULL; + struct pci_dev *root; struct net_device *netdev; void __iomem *regs; u32 whoami, pl_rev; @@ -4726,6 +4727,24 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) adapter->msg_enable = DFLT_MSG_ENABLE; memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map)); + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver + * Ingress Packet Data to Free List Buffers in order to allow for + * chipset performance optimizations between the Root Complex and + * Memory Controllers. (Messages to the associated Ingress Queue + * notifying new Packet Placement in the Free Lists Buffers will be + * send without the Relaxed Ordering Attribute thus guaranteeing that + * all preceding PCIe Transaction Layer Packets will be processed + * first.) But some Root Complexes have various issues with Upstream + * Transaction Layer Packets with the Relaxed Ordering Attribute set. + * The PCIe devices which under the Root Complexes will be cleared the + * Relaxed Ordering bit in the configuration space, So we check our + * PCIe configuration space to see if it's flagged with advice against + * using Relaxed Ordering. + */ + root = pci_find_pcie_root_port(pdev); + if (!pcie_relaxed_ordering_supported(root)) + adapter->flags |= ROOT_NO_RELAXED_ORDERING; + spin_lock_init(&adapter->stats_lock); spin_lock_init(&adapter->tid_release_lock); spin_lock_init(&adapter->win0_lock);