Message ID | 1377141888-7000-7-git-send-email-wangyijing@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
[+cc Joe]

On Thu, Aug 22, 2013 at 11:24:48AM +0800, Yijing Wang wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This patch try to update the hot added device
> mps equal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> Cc: stable@vger.kernel.org # 3.4+
> ---
>  drivers/pci/probe.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 47 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4afd158..06e88c5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1602,6 +1602,43 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>
> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self)
> +		return 0;
> +
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(dev->bus->self);
> +
> +	if (mps >= p_mps)
> +		return 0;
> +
> +	/* we only update the device mps, unless its parent device is root port,
> +	 * and it is the only slot directly connected to root port.
> +	 */
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss >= p_mps) {
> +		pcie_write_mps(dev, p_mps);
> +	} else if (pci_pcie_type(parent) == PCI_EXP_TYPE_ROOT_PORT) {
> +		pcie_write_mps(parent, mpss);
> +		pcie_write_mps(dev, mpss);
> +	} else
> +		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
> +			"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
> +			mps, 128 << dev->pcie_mpss, p_mps);
> +	return 0;
> +}
> +
> +static void pcie_bus_update_setting(struct pci_bus *bus)
> +{
> +	if (bus->self->is_hotplug_bridge)
> +		pci_walk_bus(bus, pcie_bus_update_set, NULL);
> +}
> +
>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>   * parents then children fashion. If this changes, then this code will not
>   * work as designed.
> @@ -1616,8 +1653,17 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
>  	if (!pci_is_pcie(bus->self))
>  		return;
>
> -	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
> +	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		/* Sometimes we should update device mps here,
> +		 * eg. after hot add, device mps value will be
> +		 * set to default(128B), but the upstream port
> +		 * mps value may be larger than 128B, if we do
> +		 * not update the device mps, it maybe can not
> +		 * work normally.
> +		 */
> +		pcie_bus_update_setting(bus);

I think the strategy of updating the device MPS when possible makes
sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
That mode is documented as "Disable PCIe MPS tuning and use the
BIOS-configured MPS defaults." This patch changes that to something
like "Disable PCIe MPS tuning, except for hot-added devices" and there
is no longer a way to tell Linux to never touch MPS.

Eventually, I think the default mode should change to PCIE_BUS_SAFE,
where Linux changes MPS settings at boot-time and at hotplug-time to
make sure every device works. (This mode assumes no peer-to-peer
DMA.) I know this was tried in the past, and we tripped over all
sorts of issues, but it's not clear how many were problems with the
Linux code and how many were unsolvable BIOS or platform issues.

Then we'd have these choices:

  PCIE_BUS_TUNE_OFF     Never touch MPS
  PCIE_BUS_PEER2PEER    Set all MPS to 128, so peer-to-peer DMA works
  PCIE_BUS_SAFE         Configure each device with largest safe MPS
                        (assumes no peer-to-peer DMA)
  PCIE_BUS_PERFORMANCE  Use MRRS in addition to MPS
                        (assumes no peer-to-peer DMA)

The hot-add issue [1] could be regarded as a BIOS bug -- the BIOS
programmed a hotplug bridge with MPS=256. A hot-added device powers
up with MPS=128, so it's only safe for BIOS to set MPS=256 if the OS
is smart enough to change the bridge MPS, the device MPS, or both, at
hot-add time. That doesn't seem like a good assumption for a BIOS to
make.

I think we should always *warn* about potential MPS issues, even in
PCIE_BUS_TUNE_OFF mode. That would help diagnose the hot-add issue as
well as issues like the ones Joe Jin reported [2] and [3].

I think what we should do is *always* call pcie_bus_configure_set(),
no matter what mode we're in, but make pcie_bus_configure_set() smart
enough to do different things (print warnings, adjust settings, do the
stuff you added in pcie_bus_update_set(), etc.) depending on what mode
we're in.

Bjorn

>  		return;
> +	}
>
>  	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
>  	 * to be aware to the MPS of the destination. To work around this,
> --
> 1.7.1
>
>

[1] https://bugzilla.kernel.org/show_bug.cgi?id=60671
[2] http://lkml.kernel.org/r/4FFA9B96.6040901@oracle.com
[3] http://lkml.kernel.org/r/509B5038.8090304@oracle.com
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
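The "always call it, but behave per mode" idea above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not kernel code: `struct dev_model`, `enum bus_mode`, and `configure_dev()` are hypothetical names, each device is looked at together with just its immediate parent, and the SAFE and PERFORMANCE cases are collapsed because MRRS handling is not modeled.

```c
#include <stdio.h>

/* Hypothetical userspace model of the four modes listed above; not kernel code. */
enum bus_mode { MODE_TUNE_OFF, MODE_PEER2PEER, MODE_SAFE, MODE_PERFORMANCE };

struct dev_model {
	int mps;	/* MPS currently programmed in the device, in bytes */
	int mpss;	/* largest MPS the device supports, in bytes */
	int parent_mps;	/* MPS programmed in the upstream bridge, in bytes */
};

/*
 * Called in every mode.  Returns 1 if a warning was printed because the
 * device is left with an MPS below its parent's, 0 otherwise.
 */
static int configure_dev(struct dev_model *d, enum bus_mode mode)
{
	switch (mode) {
	case MODE_PEER2PEER:
		/* Set all MPS to 128 so peer-to-peer DMA works. */
		d->mps = 128;
		d->parent_mps = 128;
		return 0;
	case MODE_SAFE:
	case MODE_PERFORMANCE:
		/* Match the parent when the device can; MRRS is not modeled. */
		if (d->mps < d->parent_mps && d->mpss >= d->parent_mps)
			d->mps = d->parent_mps;
		break;
	case MODE_TUNE_OFF:
		/* Never touch MPS; fall through to the warning check only. */
		break;
	}

	if (d->mps < d->parent_mps) {
		fprintf(stderr, "MPS %d (MPSS %d) below upstream MPS %d\n",
			d->mps, d->mpss, d->parent_mps);
		return 1;
	}
	return 0;
}
```

In this model TUNE_OFF keeps its documented meaning (nothing is written) while still reporting the mismatch, which is the diagnostic behavior asked for in the review.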
> I think the strategy of updating the device MPS when possible makes
> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
> That mode is documented as "Disable PCIe MPS tuning and use the
> BIOS-configured MPS defaults." This patch changes that to something
> like "Disable PCIe MPS tuning, except for hot-added devices" and there
> is no longer a way to tell Linux to never touch MPS.

Hi Bjorn,
   Thanks for your review and comments!

As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
BIOS-configured MPS defaults." But a hotplug action changes the BIOS default
MPS setting (the slot is powered off and all registers are reset). So if we
only touch the MPS of the newly inserted device, I think that may be reasonable.

>
> Eventually, I think the default mode should change to PCIE_BUS_SAFE,
> where Linux changes MPS settings at boot-time and at hotplug-time to
> make sure every device works. (This mode assumes no peer-to-peer
> DMA.) I know this was tried in the past, and we tripped over all
> sorts of issues, but it's not clear how many were problems with the
> Linux code and how many were unsolvable BIOS or platform issues.

Agree.

>
> Then we'd have these choices:
>
>   PCIE_BUS_TUNE_OFF     Never touch MPS
>   PCIE_BUS_PEER2PEER    Set all MPS to 128, so peer-to-peer DMA works
>   PCIE_BUS_SAFE         Configure each device with largest safe MPS
>                         (assumes no peer-to-peer DMA)
>   PCIE_BUS_PERFORMANCE  Use MRRS in addition to MPS
>                         (assumes no peer-to-peer DMA)
>
> The hot-add issue [1] could be regarded as a BIOS bug -- the BIOS
> programmed a hotplug bridge with MPS=256. A hot-added device powers
> up with MPS=128, so it's only safe for BIOS to set MPS=256 if the OS
> is smart enough to change the bridge MPS, the device MPS, or both, at
> hot-add time. That doesn't seem like a good assumption for a BIOS to
> make.
>
> I think we should always *warn* about potential MPS issues, even in
> PCIE_BUS_TUNE_OFF mode. That would help diagnose the hot-add issue as
> well as issues like the ones Joe Jin reported [2] and [3].

OK, I will add a new patch to print a warning where necessary, as in the
cases Joe Jin reported. Note that the hotplug issue [1] and the issues Joe
reported [2] and [3] have only been encountered in PCIE_BUS_TUNE_OFF mode.

>
> I think what we should do is *always* call pcie_bus_configure_set(),
> no matter what mode we're in, but make pcie_bus_configure_set() smart
> enough to do different things (print warnings, adjust settings, do the
> stuff you added in pcie_bus_update_set(), etc.) depending on what mode
> we're in.

OK, I will try to rework this patch.

Thanks!
Yijing.

>> +	}
>>
>>  	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
>>  	 * to be aware to the MPS of the destination. To work around this,
>> --
>> 1.7.1
>>
>>
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=60671
> [2] http://lkml.kernel.org/r/4FFA9B96.6040901@oracle.com
> [3] http://lkml.kernel.org/r/509B5038.8090304@oracle.com
On Sun, Aug 25, 2013 at 9:42 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> I think the strategy of updating the device MPS when possible makes
>> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
>> That mode is documented as "Disable PCIe MPS tuning and use the
>> BIOS-configured MPS defaults." This patch changes that to something
>> like "Disable PCIe MPS tuning, except for hot-added devices" and there
>> is no longer a way to tell Linux to never touch MPS.
>
> Hi Bjorn,
>    Thanks for your review and comments!
>
> As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
> BIOS-configured MPS defaults.", But hotplug action make the BIOS default mps setting
> changed(power off, all registers reset). So If we only touch the newly inserted device mps,
> I think maybe it's reasonable.

I agree, it might be reasonable. But I think it's too hard to
document that behavior. I think it's better to have behavior that is
easy to understand and explain, even if it is slightly suboptimal.

The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
want to touch any MPS settings in that mode, I don't see a way to
safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
problem with hot-added devices not working because MPS is incorrect).
In the long term, I hope we can fix it by making the default
PCIE_BUS_SAFE, but that doesn't help right now.

That leaves us with only the workaround of booting the Huawei rh5885
box with "pci=pcie_bus_safe".

I'm willing to accept that because I think we can argue that this is
really a BIOS defect. The BIOS *can* program MPS to values that will
be safe for hotplug even if the OS does nothing, i.e., it can set
MPS=128 in all paths that lead to a hotpluggable slot. I think that's
probably what this BIOS *should* do, since it has no way of knowing
whether the OS will support hotplug or whether the OS will reprogram
any MPS values.

Bjorn
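The condition described above, MPS=128 on every port leading to a hotpluggable slot, can be expressed as a simple walk up the hierarchy. This is a hypothetical userspace sketch: `struct port_model` and `path_is_hotplug_safe()` are illustrative names, not kernel or firmware interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one port on the path from the root port to a slot. */
struct port_model {
	const struct port_model *parent;	/* NULL at the root port */
	int mps;				/* programmed MPS in bytes */
};

/*
 * True if every port from the hotplug slot up to the root is programmed
 * with the 128-byte power-on default, so that a hot-added device works
 * even if the OS never touches MPS.
 */
static bool path_is_hotplug_safe(const struct port_model *slot)
{
	for (const struct port_model *p = slot; p != NULL; p = p->parent)
		if (p->mps != 128)
			return false;
	return true;
}
```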
On Mon, Aug 26, 2013 at 2:33 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
> want to touch any MPS settings in that mode, I don't see a way to
> safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
> problem with hot-added devices not working because MPS is incorrect).
> In the long term, I hope we can fix it by making the default
> PCIE_BUS_SAFE, but that doesn't help right now.
>
> That leaves us with only the workaround of booting the Huawei rh5885
> box with "pci=pcie_bus_safe".
>
> I'm willing to accept that because I think we can argue that this is
> really a BIOS defect. The BIOS *can* program MPS to values that will
> be safe for hotplug even if the OS does nothing, i.e., it can set
> MPS=128 in all paths that lead to a hotpluggable slot. I think that's
> probably what this BIOS *should* do, since it has no way of knowing
> whether the OS will support hotplug or whether the OS will reprogram
> any MPS values.

The BIOS should have several settings for MPS:
1. 128
2. auto or performance.

When it is set to Auto, Linux has a problem with hot-add.

The default used to be 128, which was OK. Starting with Sandy Bridge and
Ivy Bridge, the chipset can support more than 128, so BIOS vendors want to
set it to Auto.

The BIOS guys claim that other OSes are OK with Auto and that only Linux
has a problem.

So maybe it's time for us to change the default to pcie_bus_perf iff
1. we detect that a PCIe bridge with hotplug support is present, and
2. mpss for those bridges is not set to 128. --- keep this optional?

At the same time, issue a warning that we changed to perf; if users have
problems, they can override it from the command line.

Yinghai
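The default-selection rule proposed above could look roughly like this. It is a sketch only: `struct bridge_model` and `pick_default_mode()` are hypothetical names, and it assumes that "not set 128" refers to the MPS value the BIOS programmed into the hotplug bridge.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model; only the two modes relevant to the proposal. */
enum bus_mode { MODE_TUNE_OFF, MODE_PERFORMANCE };

struct bridge_model {
	bool is_hotplug_bridge;	/* bridge has a hotplug-capable slot */
	int mps;		/* MPS programmed by the BIOS, in bytes */
};

/*
 * Keep the current default (tuning off) unless some hotplug bridge was
 * programmed with an MPS other than 128; in that case switch to the
 * performance mode and tell the user how to override the choice.
 */
static enum bus_mode pick_default_mode(const struct bridge_model *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (b[i].is_hotplug_bridge && b[i].mps != 128) {
			fprintf(stderr, "hotplug bridge with MPS %d found, "
				"defaulting to performance mode; "
				"override on the command line if needed\n",
				b[i].mps);
			return MODE_PERFORMANCE;
		}
	}
	return MODE_TUNE_OFF;
}
```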
On 2013/8/27 5:33, Bjorn Helgaas wrote:
> On Sun, Aug 25, 2013 at 9:42 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> I think the strategy of updating the device MPS when possible makes
>>> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
>>> That mode is documented as "Disable PCIe MPS tuning and use the
>>> BIOS-configured MPS defaults." This patch changes that to something
>>> like "Disable PCIe MPS tuning, except for hot-added devices" and there
>>> is no longer a way to tell Linux to never touch MPS.
>>
>> Hi Bjorn,
>>    Thanks for your review and comments!
>>
>> As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
>> BIOS-configured MPS defaults.", But hotplug action make the BIOS default mps setting
>> changed(power off, all registers reset). So If we only touch the newly inserted device mps,
>> I think maybe it's reasonable.
>
> I agree, it might be reasonable. But I think it's too hard to
> document that behavior. I think it's better to have behavior that is
> easy to understand and explain, even if it is slightly suboptimal.
>
> The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
> want to touch any MPS settings in that mode, I don't see a way to
> safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
> problem with hot-added devices not working because MPS is incorrect).
> In the long term, I hope we can fix it by making the default
> PCIE_BUS_SAFE, but that doesn't help right now.

I also think we should consider changing the default mode to pcie_bus_safe.
Jon mentioned that a number of issues were discovered on some x86 chipsets,
but gave no further details. If we use PCIE_BUS_TUNE_OFF all the time, we
will never have a chance to fix these issues.

>
> That leaves us with only the workaround of booting the Huawei rh5885
> box with "pci=pcie_bus_safe".
>
> I'm willing to accept that because I think we can argue that this is
> really a BIOS defect. The BIOS *can* program MPS to values that will
> be safe for hotplug even if the OS does nothing, i.e., it can set
> MPS=128 in all paths that lead to a hotpluggable slot. I think that's
> probably what this BIOS *should* do, since it has no way of knowing
> whether the OS will support hotplug or whether the OS will reprogram
> any MPS values.
>

Yes, for now we have the BIOS program all MPS values to 128 to avoid this
problem.

Thanks!
Yijing.
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4afd158..06e88c5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1602,6 +1602,43 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static int pcie_bus_update_set(struct pci_dev *dev, void *data)
+{
+	int mps, p_mps, mpss;
+	struct pci_dev *parent;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self)
+		return 0;
+
+	parent = dev->bus->self;
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(dev->bus->self);
+
+	if (mps >= p_mps)
+		return 0;
+
+	/* we only update the device mps, unless its parent device is root port,
+	 * and it is the only slot directly connected to root port.
+	 */
+	mpss = 128 << dev->pcie_mpss;
+	if (mpss >= p_mps) {
+		pcie_write_mps(dev, p_mps);
+	} else if (pci_pcie_type(parent) == PCI_EXP_TYPE_ROOT_PORT) {
+		pcie_write_mps(parent, mpss);
+		pcie_write_mps(dev, mpss);
+	} else
+		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
+			"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
+			mps, 128 << dev->pcie_mpss, p_mps);
+	return 0;
+}
+
+static void pcie_bus_update_setting(struct pci_bus *bus)
+{
+	if (bus->self->is_hotplug_bridge)
+		pci_walk_bus(bus, pcie_bus_update_set, NULL);
+}
+
 /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
  * parents then children fashion. If this changes, then this code will not
  * work as designed.
@@ -1616,8 +1653,17 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
 	if (!pci_is_pcie(bus->self))
 		return;
 
-	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
+	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+		/* Sometimes we should update device mps here,
+		 * eg. after hot add, device mps value will be
+		 * set to default(128B), but the upstream port
+		 * mps value may be larger than 128B, if we do
+		 * not update the device mps, it maybe can not
+		 * work normally.
+		 */
+		pcie_bus_update_setting(bus);
 		return;
+	}
 
 	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
 	 * to be aware to the MPS of the destination. To work around this,
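The branch structure of pcie_bus_update_set() can be pulled out into a pure function for experimenting outside the kernel. This is a sketch of the decision only: `enum mps_action` and `mps_hotadd_action()` are hypothetical names, and no config-space writes are performed.

```c
/* Hypothetical names; mirrors the branches of pcie_bus_update_set() above. */
enum mps_action {
	MPS_KEEP,	/* device MPS already >= parent MPS, nothing to do */
	MPS_RAISE_DEV,	/* write the parent's MPS to the device */
	MPS_LOWER_BOTH,	/* parent is a root port: write device MPSS to both */
	MPS_WARN	/* cannot be fixed here, print a warning */
};

/*
 * mps and p_mps are in bytes; mpss_enc is the encoded MPSS field
 * (dev->pcie_mpss in the patch), so the supported size is 128 << mpss_enc.
 */
static enum mps_action mps_hotadd_action(int mps, int mpss_enc, int p_mps,
					 int parent_is_root_port)
{
	int mpss = 128 << mpss_enc;

	if (mps >= p_mps)
		return MPS_KEEP;
	if (mpss >= p_mps)
		return MPS_RAISE_DEV;
	if (parent_is_root_port)
		return MPS_LOWER_BOTH;
	return MPS_WARN;
}
```

For example, a device that powers up at 128 bytes and supports 256 (encoding 1) under a 256-byte bridge yields MPS_RAISE_DEV, while a device limited to 128 bytes behind a 256-byte switch port yields MPS_WARN.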
Currently we don't update a device's MPS value when doing PCI device
hot-add, so a hot-added device keeps the default MPS (128B). But the
upstream port's MPS, which was set by firmware during system bootup,
may be larger than 128B. In that case the newly added device may not
work normally. This patch tries to set the hot-added device's MPS
equal to its parent's MPS. If the device's MPSS is smaller than the
parent's MPS, the patch lowers both to the device's MPSS when the
parent is a root port, and prints a warning otherwise.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: stable@vger.kernel.org # 3.4+
---
 drivers/pci/probe.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 47 insertions(+), 1 deletions(-)
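For reference, the 128B default corresponds to encoding 0 of the Max_Payload_Size field: both MPS (Device Control, bits 7:5) and MPSS (Device Capabilities, bits 2:0) are stored as a 3-bit code n meaning 128 << n bytes. The helpers below are a userspace sketch with hypothetical names and local mask macros; they do not touch hardware and treat the reserved encodings 6 and 7 as out of scope.

```c
#include <stdint.h>

#define DEVCAP_PAYLOAD_MASK	0x0007u	/* Device Capabilities bits 2:0: MPSS */
#define DEVCTL_PAYLOAD_MASK	0x00e0u	/* Device Control bits 7:5: MPS */
#define DEVCTL_PAYLOAD_SHIFT	5

/* Largest payload size the device supports, in bytes. */
static int mpss_bytes(uint32_t devcap)
{
	return 128 << (devcap & DEVCAP_PAYLOAD_MASK);
}

/* Payload size currently programmed, in bytes. */
static int mps_bytes(uint16_t devctl)
{
	return 128 << ((devctl & DEVCTL_PAYLOAD_MASK) >> DEVCTL_PAYLOAD_SHIFT);
}

/*
 * Returns the Device Control value with the MPS field replaced, or -1 if
 * bytes is not a power of two between 128 and 4096.
 */
static int devctl_with_mps(uint16_t devctl, int bytes)
{
	int code = 0;

	if (bytes < 128 || bytes > 4096 || (bytes & (bytes - 1)))
		return -1;
	while ((128 << code) < bytes)
		code++;
	return (int)((devctl & ~DEVCTL_PAYLOAD_MASK) |
		     ((unsigned int)code << DEVCTL_PAYLOAD_SHIFT));
}
```

A freshly powered-on device reads back encoding 0 in Device Control, so mps_bytes() reports 128 until something rewrites the field.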