Message ID | 1314971145-28038-1-git-send-email-mason@myri.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, 2011-09-02 at 08:45 -0500, Jon Mason wrote: > Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has > massive negative ramifications on some devices. Without knowing which > devices have this issue, do not modify from the default value when > walking the PCI-E bus. BTW. I recently ran into an MRRS related problem (on a system that doesn't have any of our recent patches yet). This does have an impact however if we use the algorithm I suggested for setting MPS, which is to allow parents to have a larger MPS than children in order to avoid clamping an entire hierarchy when it contains one child device that has a small MPS. When doing that, it is critical that the MRRS be set to be lower or equal to the MPS of the device. Because the parent bridge (and thus the host bridge) will potentially have an MPS larger than the requesting function, the bridge -will- honor a large read request with a packet potential as large as it's own MPS, and thus potentially larger that what the function can cope with. This isn't a problem if the MPS are clamped to the smallest common denominator of the entire hierarchy. So it depends really what algorithm has been chosen for the configuration of the MPS. Cheers, Ben. > Tested-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> > Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> > Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de> > References: https://bugzilla.kernel.org/show_bug.cgi?id=42162 > Signed-off-by: Jon Mason <mason@myri.com> > --- > drivers/pci/probe.c | 36 ------------------------------------ > 1 files changed, 0 insertions(+), 36 deletions(-) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 8473727..d896c5e 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1394,41 +1394,6 @@ static void pcie_write_mps(struct pci_dev *dev, int mps) > dev_err(&dev->dev, "Failed attempting to set the MPS\n"); > } > > -static void pcie_write_mrrs(struct pci_dev *dev, int mps) > -{ > - int rc, mrrs; > - > - if (pcie_bus_config == PCIE_BUS_PERFORMANCE) { > - int dev_mpss = 128 << dev->pcie_mpss; > - > - /* For Max performance, the MRRS must be set to the largest > - * supported value. However, it cannot be configured larger > - * than the MPS the device or the bus can support. This assumes > - * that the largest MRRS available on the device cannot be > - * smaller than the device MPSS. > - */ > - mrrs = mps < dev_mpss ? mps : dev_mpss; > - } else > - /* In the "safe" case, configure the MRRS for fairness on the > - * bus by making all devices have the same size > - */ > - mrrs = mps; > - > - > - /* MRRS is a R/W register. Invalid values can be written, but a > - * subsiquent read will verify if the value is acceptable or not. > - * If the MRRS value provided is not acceptable (e.g., too large), > - * shrink the value until it is acceptable to the HW. > - */ > - while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) { > - rc = pcie_set_readrq(dev, mrrs); > - if (rc) > - dev_err(&dev->dev, "Failed attempting to set the MRRS\n"); > - > - mrrs /= 2; > - } > -} > - > static int pcie_bus_configure_set(struct pci_dev *dev, void *data) > { > int mps = 128 << *(u8 *)data; > @@ -1440,7 +1405,6 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) > pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev)); > > pcie_write_mps(dev, mps); > - pcie_write_mrrs(dev, mps); > > dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", > pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev)); -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 5, 2011 at 10:48 AM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Fri, 2011-09-02 at 08:45 -0500, Jon Mason wrote: >> Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has >> massive negative ramifications on some devices. Without knowing which >> devices have this issue, do not modify from the default value when >> walking the PCI-E bus. > > BTW. I recently ran into an MRRS related problem (on a system that > doesn't have any of our recent patches yet). > > This does have an impact however if we use the algorithm I suggested for > setting MPS, which is to allow parents to have a larger MPS than > children in order to avoid clamping an entire hierarchy when it contains > one child device that has a small MPS. > > When doing that, it is critical that the MRRS be set to be lower or > equal to the MPS of the device. Because the parent bridge (and thus the > host bridge) will potentially have an MPS larger than the requesting > function, the bridge -will- honor a large read request with a packet > potential as large as it's own MPS, and thus potentially larger that > what the function can cope with. > > This isn't a problem if the MPS are clamped to the smallest common > denominator of the entire hierarchy. > > So it depends really what algorithm has been chosen for the > configuration of the MPS. Yes, that is why I added the MRRS setting code originally. Since the MRRS setting is almost universally larger than the MPS and some devices are freaking when their MRRS is changed, we must either use the "safe" option by default and not touch the MRRS or allow the "performance" option to have this hole. I suppose a third option is to create a list of faulty devices that cannot handle MRRS modification, but that seems rather ugly. I'm open to any option, but there seems to be a large number of issues with the MRRS modification code and we need to get a fix into 3.1. Thanks, Jon > > Cheers, > Ben. > >> Tested-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> >> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> >> Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de> >> References: https://bugzilla.kernel.org/show_bug.cgi?id=42162 >> Signed-off-by: Jon Mason <mason@myri.com> >> --- >> drivers/pci/probe.c | 36 ------------------------------------ >> 1 files changed, 0 insertions(+), 36 deletions(-) >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index 8473727..d896c5e 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -1394,41 +1394,6 @@ static void pcie_write_mps(struct pci_dev *dev, int mps) >> dev_err(&dev->dev, "Failed attempting to set the MPS\n"); >> } >> >> -static void pcie_write_mrrs(struct pci_dev *dev, int mps) >> -{ >> - int rc, mrrs; >> - >> - if (pcie_bus_config == PCIE_BUS_PERFORMANCE) { >> - int dev_mpss = 128 << dev->pcie_mpss; >> - >> - /* For Max performance, the MRRS must be set to the largest >> - * supported value. However, it cannot be configured larger >> - * than the MPS the device or the bus can support. This assumes >> - * that the largest MRRS available on the device cannot be >> - * smaller than the device MPSS. >> - */ >> - mrrs = mps < dev_mpss ? mps : dev_mpss; >> - } else >> - /* In the "safe" case, configure the MRRS for fairness on the >> - * bus by making all devices have the same size >> - */ >> - mrrs = mps; >> - >> - >> - /* MRRS is a R/W register. Invalid values can be written, but a >> - * subsiquent read will verify if the value is acceptable or not. >> - * If the MRRS value provided is not acceptable (e.g., too large), >> - * shrink the value until it is acceptable to the HW. >> - */ >> - while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) { >> - rc = pcie_set_readrq(dev, mrrs); >> - if (rc) >> - dev_err(&dev->dev, "Failed attempting to set the MRRS\n"); >> - >> - mrrs /= 2; >> - } >> -} >> - >> static int pcie_bus_configure_set(struct pci_dev *dev, void *data) >> { >> int mps = 128 << *(u8 *)data; >> @@ -1440,7 +1405,6 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) >> pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev)); >> >> pcie_write_mps(dev, mps); >> - pcie_write_mrrs(dev, mps); >> >> dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", >> pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev)); > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-09-06 at 11:11 -0700, Jon Mason wrote: > Yes, that is why I added the MRRS setting code originally. Since the > MRRS setting is almost universally larger than the MPS and some > devices are freaking when their MRRS is changed, we must either use > the "safe" option by default and not touch the MRRS or allow the > "performance" option to have this hole. I suppose a third option is > to create a list of faulty devices that cannot handle MRRS > modification, but that seems rather ugly. Do we know for sure that card can't handle it at all or only freaks out with "0" ? IE Does the card work if MRS == MPS of the card (ie using my algorithm and assuming the host bridge supports arbitrary MPS). > I'm open to any option, but there seems to be a large number of issues > with the MRRS modification code and we need to get a fix into 3.1. Ok, I only head of one, I didn't know there were more. Are these typically hitting with the "performance" option ? IE. It make sense to leave MRRS untouched in the "safe" case. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 6, 2011 at 11:52 AM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Tue, 2011-09-06 at 11:11 -0700, Jon Mason wrote: >> Yes, that is why I added the MRRS setting code originally. Since the >> MRRS setting is almost universally larger than the MPS and some >> devices are freaking when their MRRS is changed, we must either use >> the "safe" option by default and not touch the MRRS or allow the >> "performance" option to have this hole. I suppose a third option is >> to create a list of faulty devices that cannot handle MRRS >> modification, but that seems rather ugly. > > Do we know for sure that card can't handle it at all or only freaks out > with "0" ? It's hard to say since the MPS on most systems is "0" (e.g., 128B). It appears that one of the Radeon chips doesn't like the MRRS set to 0/6/7 (according to a patch attached to the same bug). So, there are a number of invalid settings that will have a negative impact on that. If we don't want to allow the hole in the patch I pushed to Jesse, I think we should go with the "safe" option not setting the MRRS and make it the default. > IE Does the card work if MRS == MPS of the card (ie using my algorithm > and assuming the host bridge supports arbitrary MPS). Currently, the code will set the MRRS to the MPS of the parent (if the "performance" option is used). It will then read back the value to verify that it has been changed. If not, then it will try setting it to the next value lower, etc. until it hits 0. The "safe" option tries to set the MRRS to a uniform size to make everything equitable, which seemed logical but not 100% necessary. I am more than happy to have this option not touch the default MRRS at all. >> I'm open to any option, but there seems to be a large number of issues >> with the MRRS modification code and we need to get a fix into 3.1. > > Ok, I only head of one, I didn't know there were more. I've gotten 5 e-mails so far. I'm betting there are more out there. > Are these typically hitting with the "performance" option ? IE. It make > sense to leave MRRS untouched in the "safe" case. The patch I sent out still used the "performance" option without modifying the default MRRS of the device. All that have tested it said that it resolves their issues. thanks, Jon > > Cheers, > Ben. > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: > > Are these typically hitting with the "performance" option ? IE. It > make > > sense to leave MRRS untouched in the "safe" case. > > The patch I sent out still used the "performance" option without > modifying the default MRRS of the device. All that have tested it > said that it resolves their issues. But that will cause other issues as I described, if the MRRS end up larger than the MPS. IE. The MRRS of a device must be set to be lower or equal to the MPS of that device (not of the parent btw) if we allow the parent(s) to have a larger MPS. I -did- hit a very real problem with adapters where that wasn't true. As for those users, are they all the same radeon ? Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: >> > Are these typically hitting with the "performance" option ? IE. It >> make >> > sense to leave MRRS untouched in the "safe" case. >> >> The patch I sent out still used the "performance" option without >> modifying the default MRRS of the device. All that have tested it >> said that it resolves their issues. > > But that will cause other issues as I described, if the MRRS end up > larger than the MPS. IE. The MRRS of a device must be set to be lower or > equal to the MPS of that device (not of the parent btw) if we allow the > parent(s) to have a larger MPS. I don't think so. Just looking into my lspci I found more than one occurence of things like this: MaxPayload 128 bytes, MaxReadReq 512 bytes Which is perfectly fine. The requester (i.e. the device) may issue read requests up to 512 bytes at once, which is a thing about transfer credits and the like (IIRC). The completer may split the completions into packages of any valid size between the minimum size (IIRC 64 bytes) and the maximum _payload_ size the requester can handle. In this case you will likely get either 8*64 or 4*128 byte completions for a read request of 512 byte, but any combination in between would be valid, too. One needs to keep in mind that a read request of 512 byte is itself only a 12 or 16 byte packet, so it isn't affected by the payload size at all. > I -did- hit a very real problem with adapters where that wasn't true. From my understanding this shows that these adapters were broken, nothing else. An adapter must be prepared to get a bunch of smaller packets anyway, as it has no control of how the completer sends out the data. So maybe your adapters just got screwed by awaiting a 512 byte reply and getting 4*128? You could connect them to some sort of bastard completer if you have one (I don't) that completes every request in packets of 64 byte and see if they will not even explode with 128 byte MRRS. Eike -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-09-07 at 10:13 +0200, Rolf Eike Beer wrote: > > On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: > >> > Are these typically hitting with the "performance" option ? IE. It > >> make > >> > sense to leave MRRS untouched in the "safe" case. > >> > >> The patch I sent out still used the "performance" option without > >> modifying the default MRRS of the device. All that have tested it > >> said that it resolves their issues. > > > > But that will cause other issues as I described, if the MRRS end up > > larger than the MPS. IE. The MRRS of a device must be set to be lower or > > equal to the MPS of that device (not of the parent btw) if we allow the > > parent(s) to have a larger MPS. > > I don't think so. Just looking into my lspci I found more than one > occurence of things like this: > > MaxPayload 128 bytes, MaxReadReq 512 bytes > > Which is perfectly fine. The requester (i.e. the device) may issue read > requests up to 512 bytes at once, which is a thing about transfer credits > and the like (IIRC). The completer may split the completions into packages > of any valid size between the minimum size (IIRC 64 bytes) and the maximum > _payload_ size the requester can handle. In this case you will likely get > either 8*64 or 4*128 byte completions for a read request of 512 byte, but > any combination in between would be valid, too. No. If the MPS of the host bridge is larger than 128, then the host bridge will possibly return read responses using payloads up to the MRRS. > One needs to keep in mind that a read request of 512 byte is itself only a > 12 or 16 byte packet, so it isn't affected by the payload size at all. The problem is of course not the request packet itself but the response. > > I -did- hit a very real problem with adapters where that wasn't true. > > >From my understanding this shows that these adapters were broken, nothing > else. An adapter must be prepared to get a bunch of smaller packets > anyway, as it has no control of how the completer sends out the data. So > maybe your adapters just got screwed by awaiting a 512 byte reply and > getting 4*128? You could connect them to some sort of bastard completer if > you have one (I don't) that completes every request in packets of 64 byte > and see if they will not even explode with 128 byte MRRS. Unfortunately, I didn't manage to get a good TLP capture of the problem packets in AER. But basically what happens is: - Host bridge has a large MPS (For example 4096) - Device has a smaller MPS (for example 128) - Device has a large MRRS (for example 512) What I observed is when receiving network packets larger than roughly 128 bytes (I didn't get precise packet size threshold, I wasn't doing the tests myself), the device appears to get read -responses- larger than it's MPS (up to it's MRRS, ie, the size it specified in the read request), and shoots an UE upstream. This happens with e1000e's so I doubt it's a broken PCIe implementation in the device, and it makes sense all things considered. The Host bridge having an MPS larger than 128, it is allowed to send a read response using a large TLP, which will be rejected by the device. The "safe" approach of course is to clamp all MPS to the minimum, but that leads to way too many situations where everybody gets down to 128 bytes because -one- device in the system has 128 bytes, and that means that anything that has a hotplug slot must clamp everybody as well. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-09-07 at 06:30 -0300, Benjamin Herrenschmidt wrote: > Unfortunately, I didn't manage to get a good TLP capture of the problem > packets in AER. But basically what happens is: > > - Host bridge has a large MPS (For example 4096) > - Device has a smaller MPS (for example 128) > - Device has a large MRRS (for example 512) Just double checked on the actual machine. The device has a MPS of 256 and the bridge can go up to 4096. We were letting it up and observed the problem with an MRRS of 512 (apparently the power-on default of that adapter). So either we clamp the bridge to 256 and penalize everybody, or we clamp the e1000's MRRS to 256 and things work. Cheers, Ben. > What I observed is when receiving network packets larger than roughly > 128 bytes (I didn't get precise packet size threshold, I wasn't doing > the tests myself), the device appears to get read -responses- larger > than it's MPS (up to it's MRRS, ie, the size it specified in the read > request), and shoots an UE upstream. > > This happens with e1000e's so I doubt it's a broken PCIe implementation > in the device, and it makes sense all things considered. > > The Host bridge having an MPS larger than 128, it is allowed to send a > read response using a large TLP, which will be rejected by the device. > > The "safe" approach of course is to clamp all MPS to the minimum, but > that leads to way too many situations where everybody gets down to 128 > bytes because -one- device in the system has 128 bytes, and that means > that anything that has a hotplug slot must clamp everybody as well. > > Cheers, > Ben. > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, 2011-09-07 at 06:30 -0300, Benjamin Herrenschmidt wrote: > >> Unfortunately, I didn't manage to get a good TLP capture of the problem >> packets in AER. But basically what happens is: >> >> - Host bridge has a large MPS (For example 4096) Out of curiosity: what is this for a board? The only thing we ever found for a reasonable price that has more than 128 byte here is the Intel X58/Tylersburg (and an older NVidia one). This is not really my business anymore but I guess I could make some people happy if I tell them what to look for. >> - Device has a smaller MPS (for example 128) >> - Device has a large MRRS (for example 512) > > Just double checked on the actual machine. The device has a MPS of 256 > and the bridge can go up to 4096. We were letting it up and observed the > problem with an MRRS of 512 (apparently the power-on default of that > adapter). > > So either we clamp the bridge to 256 and penalize everybody, or we clamp > the e1000's MRRS to 256 and things work. We need to change the MPS of the bridge anyway as it could send e.g. a write request of 4k otherwise. Which is completely orthogonal to the MRRS and would cause the same breakage. And as far as I understand what the patches do is exactly this change for exactly this reason: avoid too large packets hitting the device. The MRRS is only for things that were originally requested by the target device, but it is by far not the only way such packets may happen. Maybe it is the most likely way, but nothing more. But it would still be an interesting question to get a list of devices broken when the MRRS is changed. And to kick the vendors hard to fix that mess. Eike -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-09-07 at 12:37 +0200, Rolf Eike Beer wrote: > > On Wed, 2011-09-07 at 06:30 -0300, Benjamin Herrenschmidt wrote: > > > >> Unfortunately, I didn't manage to get a good TLP capture of the problem > >> packets in AER. But basically what happens is: > >> > >> - Host bridge has a large MPS (For example 4096) > > Out of curiosity: what is this for a board? The only thing we ever found > for a reasonable price that has more than 128 byte here is the Intel > X58/Tylersburg (and an older NVidia one). This is not really my business > anymore but I guess I could make some people happy if I tell them what to > look for. IBM POWER7 server :-) > >> - Device has a smaller MPS (for example 128) > >> - Device has a large MRRS (for example 512) > > > > Just double checked on the actual machine. The device has a MPS of 256 > > and the bridge can go up to 4096. We were letting it up and observed the > > problem with an MRRS of 512 (apparently the power-on default of that > > adapter). > > > > So either we clamp the bridge to 256 and penalize everybody, or we clamp > > the e1000's MRRS to 256 and things work. > > We need to change the MPS of the bridge anyway as it could send e.g. a > write request of 4k otherwise. No. It couldn't in our case (and in most implementations) simply because write requests from the bridge are triggered by MMIO operations that simply cannot gather that much. Granted, it requires platform specific knowledge to be able to make that assumption (which is why I originally wanted that "fast" mode to be selected specifically by the architecture). In our case, we -know- our bridges will never generate a request longer than 128 bytes. In fact, it can probably only happen if you have something like a fancy DMA controller on the CPU side of the bridge (tho Intels tend to have that nowadays). > Which is completely orthogonal to the MRRS > and would cause the same breakage. And as far as I understand what the > patches do is exactly this change for exactly this reason: avoid too large > packets hitting the device. The MRRS is only for things that were > originally requested by the target device, but it is by far not the only > way such packets may happen. Maybe it is the most likely way, but nothing > more. > > But it would still be an interesting question to get a list of devices > broken when the MRRS is changed. And to kick the vendors hard to fix that > mess. Well, I would definitely recommend that the default unless changed by the architecture code is the "safe" mode, in which case the bridge is clamped (and effectively all devices in the hierarchy below the host bridge are clamped to the lowest common MPS denominator). On Power server, I'm happy to switch to the "fast" approach by default which is to leave the upstream bridges at a higher MPS since that will gain us a significant performance boost with some adapters, and have the "safe" mode remain a kernel command line option just in case. Without that, anything like a hotplug enclosure for example would have to have everything permanently clamped to 128 which sucks. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sep 6, 2011 1:47 PM, "Benjamin Herrenschmidt" <benh@kernel.crashing.org> wrote: > On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: >> > Are these typically hitting with the "performance" option ? IE. It >> make >> > sense to leave MRRS untouched in the "safe" case. >> >> The patch I sent out still used the "performance" option without >> modifying the default MRRS of the device. All that have tested it >> said that it resolves their issues. > > But that will cause other issues as I described, if the MRRS end up > larger than the MPS. IE. The MRRS of a device must be set to be lower or > equal to the MPS of that device (not of the parent btw) if we allow the > parent(s) to have a larger MPS. Yes, that is the hole. > I -did- hit a very real problem with adapters where that wasn't true. > > As for those users, are they all the same radeon ? hpsa users too. Also, some Dell servers are experiencing issues, but I am not sure what hardware is causing issue on those systems. Based on this thread, I believe the best way to move forward is to make the "safe" option the default and have it not set the MRRS. The "performance" option will set the MRRS. I will add a printk to notify the user of potential issues. Hopefully users will try this option and inform us of any issues encountered. Thoughts? Thanks, Jon > Cheers, > Ben. > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-pci-owner@vger.kernel.org [mailto:linux-pci- > owner@vger.kernel.org] On Behalf Of Jon Mason > Sent: Wednesday, September 07, 2011 2:49 PM > To: Benjamin Herrenschmidt > Cc: Jesse Barnes; linux-pci@vger.kernel.org > Subject: Re: [PATCH] PCI: Remove MRRS modification from MPS setting > code > > On Sep 6, 2011 1:47 PM, "Benjamin Herrenschmidt" > <benh@kernel.crashing.org> wrote: > > On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: > >> > Are these typically hitting with the "performance" option ? IE. It > >> make > >> > sense to leave MRRS untouched in the "safe" case. > >> > >> The patch I sent out still used the "performance" option without > >> modifying the default MRRS of the device. All that have tested it > >> said that it resolves their issues. > > > > But that will cause other issues as I described, if the MRRS end up > > larger than the MPS. IE. The MRRS of a device must be set to be lower > or > > equal to the MPS of that device (not of the parent btw) if we allow > the > > parent(s) to have a larger MPS. > > Yes, that is the hole. > > > I -did- hit a very real problem with adapters where that wasn't true. > > > > As for those users, are they all the same radeon ? > > hpsa users too. Also, some Dell servers are experiencing issues, but > I am not sure what hardware is causing issue on those systems. Based > on this thread, I believe the best way to move forward is to make the > "safe" option the default and have it not set the MRRS. The > "performance" option will set the MRRS. I will add a printk to notify > the user of potential issues. Hopefully users will try this option > and inform us of any issues encountered. A parameter option may help the cause for folks who would rather like the performance option. > > Thoughts? > > Thanks, > Jon > > > Cheers, > > Ben. > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 7, 2011 at 1:36 PM, <Shyam_Iyer@dell.com> wrote: > > >> -----Original Message----- >> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci- >> owner@vger.kernel.org] On Behalf Of Jon Mason >> Sent: Wednesday, September 07, 2011 2:49 PM >> To: Benjamin Herrenschmidt >> Cc: Jesse Barnes; linux-pci@vger.kernel.org >> Subject: Re: [PATCH] PCI: Remove MRRS modification from MPS setting >> code >> >> On Sep 6, 2011 1:47 PM, "Benjamin Herrenschmidt" >> <benh@kernel.crashing.org> wrote: >> > On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote: >> >> > Are these typically hitting with the "performance" option ? IE. It >> >> make >> >> > sense to leave MRRS untouched in the "safe" case. >> >> >> >> The patch I sent out still used the "performance" option without >> >> modifying the default MRRS of the device. All that have tested it >> >> said that it resolves their issues. >> > >> > But that will cause other issues as I described, if the MRRS end up >> > larger than the MPS. IE. The MRRS of a device must be set to be lower >> or >> > equal to the MPS of that device (not of the parent btw) if we allow >> the >> > parent(s) to have a larger MPS. >> >> Yes, that is the hole. >> >> > I -did- hit a very real problem with adapters where that wasn't true. >> > >> > As for those users, are they all the same radeon ? >> >> hpsa users too. Also, some Dell servers are experiencing issues, but >> I am not sure what hardware is causing issue on those systems. Based >> on this thread, I believe the best way to move forward is to make the >> "safe" option the default and have it not set the MRRS. The >> "performance" option will set the MRRS. I will add a printk to notify >> the user of potential issues. Hopefully users will try this option >> and inform us of any issues encountered. > > A parameter option may help the cause for folks who would rather like the performance option. A boot parm already exists, pci=pcie_bus_perf. > >> >> Thoughts? >> >> Thanks, >> Jon >> >> > Cheers, >> > Ben. >> > >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-09-07 at 11:48 -0700, Jon Mason wrote: > > I -did- hit a very real problem with adapters where that wasn't > true. > > > > As for those users, are they all the same radeon ? > > hpsa users too. Also, some Dell servers are experiencing issues, but > I am not sure what hardware is causing issue on those systems. Based > on this thread, I believe the best way to move forward is to make the > "safe" option the default and have it not set the MRRS. The > "performance" option will set the MRRS. I will add a printk to notify > the user of potential issues. Hopefully users will try this option > and inform us of any issues encountered. I'm fine with that tho I'd like to have a way to change the default in my arch code. People tend to use a limited set of adapters on Power which are "supported" by IBM and we more/less know we can do the "performance" option with them :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 8473727..d896c5e 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1394,41 +1394,6 @@ static void pcie_write_mps(struct pci_dev *dev, int mps) dev_err(&dev->dev, "Failed attempting to set the MPS\n"); } -static void pcie_write_mrrs(struct pci_dev *dev, int mps) -{ - int rc, mrrs; - - if (pcie_bus_config == PCIE_BUS_PERFORMANCE) { - int dev_mpss = 128 << dev->pcie_mpss; - - /* For Max performance, the MRRS must be set to the largest - * supported value. However, it cannot be configured larger - * than the MPS the device or the bus can support. This assumes - * that the largest MRRS available on the device cannot be - * smaller than the device MPSS. - */ - mrrs = mps < dev_mpss ? mps : dev_mpss; - } else - /* In the "safe" case, configure the MRRS for fairness on the - * bus by making all devices have the same size - */ - mrrs = mps; - - - /* MRRS is a R/W register. Invalid values can be written, but a - * subsiquent read will verify if the value is acceptable or not. - * If the MRRS value provided is not acceptable (e.g., too large), - * shrink the value until it is acceptable to the HW. - */ - while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) { - rc = pcie_set_readrq(dev, mrrs); - if (rc) - dev_err(&dev->dev, "Failed attempting to set the MRRS\n"); - - mrrs /= 2; - } -} - static int pcie_bus_configure_set(struct pci_dev *dev, void *data) { int mps = 128 << *(u8 *)data; @@ -1440,7 +1405,6 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev)); pcie_write_mps(dev, mps); - pcie_write_mrrs(dev, mps); dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));