Message ID | 155597243666.19387.1205950870601742062.stgit@gimli.home |
---|---|
State | Accepted, archived |
Commit | 15d2aba7c602cd9005b20ff011b670547b3882c4 |
Series | PCI/LINK: Account for BW notification in vector calculation |
On 4/22/19 5:43 PM, Alex Williamson wrote:
> [ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
>
> What is the value of this nagging?

Good! The bandwidth notification service is working as intended. If this bothers you, you can unbind the device from the bandwidth notification driver:

echo 0000:07:00.0:pcie010 | sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>  	 * 7.8.2, 7.10.10, 7.31.2.
>  	 */
>
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>  		nvec = *pme + 1;

Good catch!
On Mon, 22 Apr 2019 19:05:57 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 5:43 PM, Alex Williamson wrote:
> > [ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> >
> > What is the value of this nagging?
>
> Good! The bandwidth notification service is working as intended. If this
> bothers you, you can unbind the device from the bandwidth notification
> driver:
>
> echo 0000:07:00.0:pcie010 |
>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

That's a bad solution for users, this is meaningless tracking of a device whose driver is actively managing the link bandwidth for power purposes. There is nothing wrong happening here that needs to fill logs. I thought maybe if I enabled notification of autonomous bandwidth changes that it might categorize these as something we could ignore, but it doesn't. How can we identify only cases where this is an erroneous/noteworthy situation? Thanks,

Alex

> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index 7d04f9d087a6..1b330129089f 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
> >  	 * 7.8.2, 7.10.10, 7.31.2.
> >  	 */
> >
> > -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> > +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> > +		    PCIE_PORT_SERVICE_BWNOTIF)) {
> >  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
> >  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
> >  		nvec = *pme + 1;
>
> Good catch!
On 4/22/19 7:33 PM, Alex Williamson wrote:
> On Mon, 22 Apr 2019 19:05:57 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
>> echo 0000:07:00.0:pcie010 |
>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
>
> That's a bad solution for users, this is meaningless tracking of a
> device whose driver is actively managing the link bandwidth for power
> purposes.

0.5W savings on a 100+W GPU? I agree it's meaningless.

> There is nothing wrong happening here that needs to fill
> logs. I thought maybe if I enabled notification of autonomous
> bandwidth changes that it might categorize these as something we could
> ignore, but it doesn't.
> How can we identify only cases where this is
> an erroneous/noteworthy situation? Thanks,

You don't. Ethernet doesn't. USB doesn't. This logging behavior is consistent with every other subsystem that deals with multi-speed links.

I realize some people are very resistant to change (and use very ancient kernels). I do not, however, agree that this is a sufficient argument to dis-unify behavior.

Alex
On Tue, 23 Apr 2019 09:33:53 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > On Mon, 22 Apr 2019 19:05:57 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:
> >> echo 0000:07:00.0:pcie010 |
> >>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
> >
> > That's a bad solution for users, this is meaningless tracking of a
> > device whose driver is actively managing the link bandwidth for power
> > purposes.
>
> 0.5W savings on a 100+W GPU? I agree it's meaningless.

Evidence? Regardless, I don't have control of the driver that's making these changes, but the claim seems unfounded and irrelevant.

> > There is nothing wrong happening here that needs to fill
> > logs. I thought maybe if I enabled notification of autonomous
> > bandwidth changes that it might categorize these as something we could
> > ignore, but it doesn't.
> > How can we identify only cases where this is
> > an erroneous/noteworthy situation? Thanks,
>
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> consistent with every other subsystem that deals with multi-speed links.
> I realize some people are very resistant to change (and use very ancient
> kernels). I do not, however, agree that this is a sufficient argument to
> dis-unify behavior.

Sorry, I don't see how any of this is relevant either. Clearly I'm using a recent kernel or I wouldn't be seeing this new bandwidth notification driver. I'm assigning a device to a VM whose driver is power managing the device via link speed changes. The result is that we now see irrelevant spam in the host dmesg for every inconsequential link downgrade directed by the device.

I can see why we might want to be notified of degraded links due to signal issues, but what I'm reporting is that there are also entirely normal and benign reasons that a link might be reduced, we can't seem to tell the difference between a fault and this normal dynamic scaling, and the assumption of a fault is spamming dmesg. So, I don't think what we have here is well cooked. Do drivers have a mechanism to opt-out of this error reporting? Can drivers register an anticipated link change to avoid the spam? What instructions can we *reasonably* give to users as to when these messages mean something, when they don't, and how they can be turned off? Thanks,

Alex
On Tue, Apr 23, 2019 at 09:34:08AM -0600, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500 Alex G <mr.nuke.me@gmail.com> wrote:
> > 0.5W savings on a 100+W GPU? I agree it's meaningless.
>
> Evidence? Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

On laptops, 0.5 W can result in noticeably longer battery life.

> I can see why we might want to
> be notified of degraded links due to signal issues, but what I'm
> reporting is that there are also entirely normal and benign reasons
> that a link might be reduced, we can't seem to tell the difference
> between a fault and this normal dynamic scaling, and the assumption of
> a fault is spamming dmesg. So, I don't think what we have here is well
> cooked. Do drivers have a mechanism to opt-out of this error
> reporting?

Is dmesg spammed even if no driver is bound to a GPU? If so, that would suggest a solution that's not dependent on drivers. E.g., the bw_notification port service could avoid reports for devices matching PCI_BASE_CLASS_DISPLAY. (It could also avoid binding to ports whose children include such a device, but the child may be hot-pluggable and thus appear only after the port is bound.) Then we'd still get a notification on boot about degraded link speed, but not continuous messages.

Thanks,

Lukas
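[A minimal sketch of the class-based check Lukas describes; this is a hypothetical helper, not the actual bw_notification code. The function name and where it would be called from are assumptions, and locking of the child list (normally pci_bus_sem) is omitted for brevity.]

#include <linux/pci.h>

/*
 * Hypothetical helper: return true if any child of this port is a
 * display-class device (e.g. a GPU), so the bandwidth notification
 * service could skip the "degraded link" report for it.
 */
static bool port_child_is_display(struct pci_dev *port)
{
	struct pci_dev *child;

	if (!port->subordinate)
		return false;

	list_for_each_entry(child, &port->subordinate->devices, bus_list)
		if ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY)
			return true;

	return false;
}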
On 4/23/19 10:34 AM, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
>
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> On Mon, 22 Apr 2019 19:05:57 -0500
>>> Alex G <mr.nuke.me@gmail.com> wrote:
>>>> echo 0000:07:00.0:pcie010 |
>>>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
>>>
>>> That's a bad solution for users, this is meaningless tracking of a
>>> device whose driver is actively managing the link bandwidth for power
>>> purposes.
>>
>> 0.5W savings on a 100+W GPU? I agree it's meaningless.
>
> Evidence? Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

The number of 5mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU supports 5Gb/s, so likely using an older, more power hungry process. I suspect it's still within the same order of magnitude.

> I'm assigning a device to a VM [snip]
> I can see why we might want to be notified of degraded links due to signal issues,
> but what I'm reporting is that there are also entirely normal reasons
> [snip] we can't seem to tell the difference

Unfortunately, there is no way in PCI-Express to distinguish between an expected link bandwidth change and one due to error.

If you're using virt-manager to configure the VM, then virt-manager could have a checkbox to disable link bandwidth management messages. I'd rather we avoid kernel-side heuristics (like Lukas suggested). If you're confident that your link will operate as intended, and don't want messages about it, that's your call as a user -- we shouldn't decide this in the kernel.

Alex

[1] https://www.synopsys.com/designware-ip/technical-bulletin/reduce-power-consumption.html
On Tue, 23 Apr 2019 11:03:04 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 10:34 AM, Alex Williamson wrote:
> > On Tue, 23 Apr 2019 09:33:53 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:
> >
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:
> >>> On Mon, 22 Apr 2019 19:05:57 -0500
> >>> Alex G <mr.nuke.me@gmail.com> wrote:
> >>>> echo 0000:07:00.0:pcie010 |
> >>>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
> >>>
> >>> That's a bad solution for users, this is meaningless tracking of a
> >>> device whose driver is actively managing the link bandwidth for power
> >>> purposes.
> >>
> >> 0.5W savings on a 100+W GPU? I agree it's meaningless.
> >
> > Evidence? Regardless, I don't have control of the driver that's making
> > these changes, but the claim seems unfounded and irrelevant.
>
> The number of 5mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU
> supports 5Gb/s, so likely using an older, more power hungry process. I
> suspect it's still within the same order of magnitude.

This doesn't necessarily imply the overall power savings to the endpoint as a whole though, and it's still irrelevant to the discussion here. The driver is doing something reasonable that's generating host dmesg spam.

> > I'm assigning a device to a VM [snip]
> > I can see why we might want to be notified of degraded links due to signal issues,
> > but what I'm reporting is that there are also entirely normal reasons
> > [snip] we can't seem to tell the difference
>
> Unfortunately, there is no way in PCI-Express to distinguish between an
> expected link bandwidth change and one due to error.

Then assuming every link speed change is an error seems like the wrong approach. Should we instead have a callback that drivers can optionally register to receive link change notifications? If a driver doesn't register such a callback then a generic message can be posted, but if they do, the driver can decide whether this is an error.

> If you're using virt-manager to configure the VM, then virt-manager
> could have a checkbox to disable link bandwidth management messages. I'd

What makes us think that this is the only case where such link speed changes will occur? Hand waving that a userspace management utility should go unbind drivers that over-zealously report errors is a poor solution.

> rather we avoid kernel-side heuristics (like Lukas suggested). If you're
> confident that your link will operate as intended, and don't want
> messages about it, that's your call as a user -- we shouldn't decide
> this in the kernel.

Nor should pci-core decide what link speed changes are intended or errors. Minimally we should be enabling drivers to receive this feedback. Thanks,

Alex
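[To make the opt-in callback idea concrete, an entirely hypothetical sketch follows. Neither the link_bw_change member nor pcie_report_bw_change() exists in the kernel; they are illustrations of the kind of hook being asked for, not a real API.]

/*
 * Assume struct pci_driver grew one optional member:
 *
 *	bool (*link_bw_change)(struct pci_dev *dev, u16 link_status);
 *
 * returning true when the bound driver considers the change expected
 * (e.g. its own power management).  The bandwidth notification service
 * could then consult it before logging anything:
 */
static void pcie_report_bw_change(struct pci_dev *dev, u16 link_status)
{
	if (dev->driver && dev->driver->link_bw_change &&
	    dev->driver->link_bw_change(dev, link_status))
		return;		/* driver claimed the event, stay quiet */

	/* otherwise fall back to a generic message */
	pci_info(dev, "PCIe link bandwidth changed (LNKSTA %#06x)\n",
		 link_status);
}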
On 4/23/19 11:22 AM, Alex Williamson wrote:
> Nor should pci-core decide what link speed changes are intended or
> errors. Minimally we should be enabling drivers to receive this
> feedback. Thanks,

Not errors. pci core reports that a link speed change event has occurred. Period.

Alex
On Tue, 23 Apr 2019 11:27:39 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 11:22 AM, Alex Williamson wrote:
> > Nor should pci-core decide what link speed changes are intended or
> > errors. Minimally we should be enabling drivers to receive this
> > feedback. Thanks,
>
> Not errors. pci core reports that a link speed change event has occurred.
> Period.

And it shows up in dmesg, and what do users (and developers) think when things are repeatedly reported in dmesg? Whether this is "information" or "error", it's spamming dmesg, irrelevant, and confusing. Thanks,

Alex
On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > There is nothing wrong happening here that needs to fill logs. I
> > thought maybe if I enabled notification of autonomous bandwidth
> > changes that it might categorize these as something we could
> > ignore, but it doesn't. How can we identify only cases where this
> > is an erroneous/noteworthy situation? Thanks,
>
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> consistent with every other subsystem that deals with multi-speed links.

Can you point me to the logging in these other subsystems so I can learn more about how they deal with this?

I agree that emitting log messages for normal and expected events will lead to user confusion and we need to do something.

e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") was merged in v5.1-rc1, so we still have (a little) time to figure this out before v5.1.

Bjorn
On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> There is nothing wrong happening here that needs to fill logs. I
>>> thought maybe if I enabled notification of autonomous bandwidth
>>> changes that it might categorize these as something we could
>>> ignore, but it doesn't. How can we identify only cases where this
>>> is an erroneous/noteworthy situation? Thanks,
>>
>> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
>> consistent with every other subsystem that deals with multi-speed links.
>
> Can you point me to the logging in these other subsystems so I can
> learn more about how they deal with this?

I don't have any in-depth articles about the logging in these systems, but I can extract some logs from my machines.

Ethernet:

[Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

I used to have one of these "green" ethernet switches that went down to 100 Mbps automatically. You can imagine how "clogged" the logs were with link up messages. Thank goodness that switch was killed in a thunderstorm.

USB will log every device insertion and removal, very verbosely (see Appendix A).

> I agree that emitting log messages for normal and expected events will
> lead to user confusion and we need to do something.
>
> e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> notification") was merged in v5.1-rc1, so we still have (a little)
> time to figure this out before v5.1.

I always viewed the system log as a system log, instead of a database of system errors. I may have extremist views, but going back to Alex's example, I prefer to see that the power saving mechanism is doing something to save power on my laptop (I'll just ignore it on a desktop).

If you think it's worth increasing code complexity because people don't want things logged into the system log, then I'm certain we can work out some sane solution. It's the same problem we see with GCC, where people want warning messages here, but don't want the same messages there.

Alex

P.S. The pedantic in me points out that one of the examples I gave is a terrible example. ASPM "allows hardware-autonomous, dynamic Link power reduction beyond what is achievable by software-only control" [1].

[1] PCI-Express 3.0 -- 5.4.1. Active State Power Management (ASPM)

Appendix A:

[1618067.987084] usb 1-3.5: new high-speed USB device number 79 using xhci_hcd
[1618068.179914] usb 1-3.5: New USB device found, idVendor=0bda, idProduct=4014, bcdDevice= 0.05
[1618068.179924] usb 1-3.5: New USB device strings: Mfr=3, Product=1, SerialNumber=2
[1618068.179930] usb 1-3.5: Product: USB Audio
[1618068.179936] usb 1-3.5: Manufacturer: Generic
[1618068.179941] usb 1-3.5: SerialNumber: 200901010001
[1618068.280100] usb 1-3.6: new low-speed USB device number 80 using xhci_hcd
[1618068.342541] Bluetooth: hci0: Waiting for firmware download to complete
[1618068.342795] Bluetooth: hci0: Firmware loaded in 1509081 usecs
[1618068.342887] Bluetooth: hci0: Waiting for device to boot
[1618068.354919] Bluetooth: hci0: Device booted in 11797 usecs
[1618068.356006] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-12-16.ddc
[1618068.358958] Bluetooth: hci0: Applying Intel DDC parameters completed
[1618068.378624] usb 1-3.6: New USB device found, idVendor=04d9, idProduct=1400, bcdDevice= 1.43
[1618068.378626] usb 1-3.6: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1618068.390686] input: HID 04d9:1400 as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.0/0003:04D9:1400.0139/input/input921
[1618068.444282] hid-generic 0003:04D9:1400.0139: input,hidraw1: USB HID v1.10 Keyboard [HID 04d9:1400] on usb-0000:00:14.0-3.6/input0
[1618068.456373] input: HID 04d9:1400 Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input922
[1618068.457929] input: HID 04d9:1400 Consumer Control as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input923
[1618068.509294] input: HID 04d9:1400 System Control as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input924
[1618068.509518] hid-generic 0003:04D9:1400.013A: input,hidraw2: USB HID v1.10 Mouse [HID 04d9:1400] on usb-0000:00:14.0-3.6/input1
[1618068.588078] usb 1-3.7: new full-speed USB device number 81 using xhci_hcd
[1618068.679132] usb 1-3.7: New USB device found, idVendor=046d, idProduct=c52b, bcdDevice=12.03
[1618068.679137] usb 1-3.7: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[1618068.679139] usb 1-3.7: Product: USB Receiver
[1618068.679142] usb 1-3.7: Manufacturer: Logitech
[1618068.692430] logitech-djreceiver 0003:046D:C52B.013D: hiddev96,hidraw3: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-3.7/input2
[1618068.817334] input: Logitech Performance MX as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.7/1-3.7:1.2/0003:046D:C52B.013D/0003:046D:101A.013E/input/input925
[1618068.820357] logitech-hidpp-device 0003:046D:101A.013E: input,hidraw4: USB HID v1.11 Mouse [Logitech Performance MX] on usb-0000:00:14.0-3.7:1
On 4/22/19 5:43 PM, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service. This can resolve
> spurious interrupt faults due to this service on some systems.
>
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---

+1

Tested on some Dell servers. Everything works as expected. I don't have a system with a device that only supports bandwidth notification.

Alex
On Tue, 23 Apr 2019 12:53:07 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> > On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:
> >>> There is nothing wrong happening here that needs to fill logs. I
> >>> thought maybe if I enabled notification of autonomous bandwidth
> >>> changes that it might categorize these as something we could
> >>> ignore, but it doesn't. How can we identify only cases where this
> >>> is an erroneous/noteworthy situation? Thanks,
> >>
> >> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> >> consistent with every other subsystem that deals with multi-speed links.
> >
> > Can you point me to the logging in these other subsystems so I can
> > learn more about how they deal with this?
>
> I don't have any in-depth articles about the logging in these systems,
> but I can extract some logs from my machines.
>
> Ethernet:
>
> [Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> I used to have one of these "green" ethernet switches that went down to
> 100 Mbps automatically. You can imagine how "clogged" the logs were with
> link up messages. Thank goodness that switch was killed in a thunderstorm.
>
> USB will log every device insertion and removal, very verbosely (see
> Appendix A).

I have a hard time putting USB insertion and removal into the same class; the equivalent is PCI hotplug, which is logged separately. Do we ever log beyond USB device discovery if a device is running at a lower speed than is possible? The most directly related is the green ethernet switch, which you admit was a nuisance due to exactly this sort of logging. It was probably confusing to see this logging, perhaps you wondered if the cable was bad or the switch was defective.

> > I agree that emitting log messages for normal and expected events will
> > lead to user confusion and we need to do something.
> >
> > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > notification") was merged in v5.1-rc1, so we still have (a little)
> > time to figure this out before v5.1.
>
> I always viewed the system log as a system log, instead of a database of
> system errors. I may have extremist views, but going back to Alex's
> example, I prefer to see that the power saving mechanism is doing
> something to save power on my laptop (I'll just ignore it on a desktop).

There's a disconnect from above where similar behavior on ethernet "clogged" the log files, but here we just want to ignore it. Excessive logging can also be considered a denial of service vector when the device generating that excessive logging is attached to a userspace driver.

> If you think it's worth increasing code complexity because people don't
> want things logged into the system log, then I'm certain we can work out
> some sane solution. It's the same problem we see with GCC, where people
> want warning messages here, but don't want the same messages there.

v5.1 is approaching quickly, can we downgrade these to pci_dbg() while we work on maybe some sort of driver participation in this logging? Thanks,

Alex
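[For reference, the interim downgrade asked for above would be roughly a one-word change, assuming the report is emitted with pci_info() in the bandwidth-notification reporting path; this is an illustrative hunk only, with the format arguments elided, not the exact upstream line:]

-	pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by ...\n", ...);
+	pci_dbg(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by ...\n", ...);

[With pci_dbg(), anyone who still wants the messages could re-enable them at runtime via dynamic debug, e.g. echo 'file pci.c +p' > /sys/kernel/debug/dynamic_debug/control -- the file name here is an assumption about where the print lives.]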
On Mon, Apr 22, 2019 at 04:43:30PM -0600, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service. This can resolve
> spurious interrupt faults due to this service on some systems.
>
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Applied for (hopefully) v5.1, thanks!

>  drivers/pci/pcie/portdrv_core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>  	 * 7.8.2, 7.10.10, 7.31.2.
>  	 */
>
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>  		nvec = *pme + 1;
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7d04f9d087a6..1b330129089f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 	 * 7.8.2, 7.10.10, 7.31.2.
 	 */

-	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
+	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
+		    PCIE_PORT_SERVICE_BWNOTIF)) {
 		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
 		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
 		nvec = *pme + 1;
On systems that don't support any PCIe services other than bandwidth notification, pcie_message_numbers() can return zero vectors, causing the vector reallocation in pcie_port_enable_irq_vec() to retry with zero, which fails, resulting in fallback to INTx (which might be broken) for the bandwidth notification service. This can resolve spurious interrupt faults due to this service on some systems.

Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
However, the system is still susceptible to random spew in dmesg depending on how the root port handles downstream device managed link speed changes. For example, GPUs like to scale their link speed for power management when idle. A GPU assigned to a VM through vfio-pci can generate link bandwidth notification every time the link is scaled down, ex:

[ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)

What is the value of this nagging?

 drivers/pci/pcie/portdrv_core.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)