Message ID | 155597243666.19387.1205950870601742062.stgit@gimli.home |
---|---|
State | Accepted, archived |
Commit | 15d2aba7c602cd9005b20ff011b670547b3882c4 |
Series | PCI/LINK: Account for BW notification in vector calculation |
On 4/22/19 5:43 PM, Alex Williamson wrote:
> [ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
>
> What is the value of this nagging?

Good! The bandwidth notification service is working as intended. If this bothers you, you can unbind the device from the bandwidth notification driver:

echo 0000:07:00.0:pcie010 | sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>  	 * 7.8.2, 7.10.10, 7.31.2.
>  	 */
>
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>  		nvec = *pme + 1;

Good catch!
On Mon, 22 Apr 2019 19:05:57 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 5:43 PM, Alex Williamson wrote:
> > [ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> > [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
> >
> > What is the value of this nagging?
>
> Good! The bandwidth notification service is working as intended. If this
> bothers you, you can unbind the device from the bandwidth notification
> driver:
>
> echo 0000:07:00.0:pcie010 |
>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

That's a bad solution for users, this is meaningless tracking of a device whose driver is actively managing the link bandwidth for power purposes. There is nothing wrong happening here that needs to fill logs. I thought maybe if I enabled notification of autonomous bandwidth changes that it might categorize these as something we could ignore, but it doesn't. How can we identify only cases where this is an erroneous/noteworthy situation? Thanks,

Alex

> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index 7d04f9d087a6..1b330129089f 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
> >  	 * 7.8.2, 7.10.10, 7.31.2.
> >  	 */
> >
> > -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> > +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> > +		    PCIE_PORT_SERVICE_BWNOTIF)) {
> >  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
> >  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
> >  		nvec = *pme + 1;
>
> Good catch!
On 4/22/19 7:33 PM, Alex Williamson wrote:
> On Mon, 22 Apr 2019 19:05:57 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
>> echo 0000:07:00.0:pcie010 |
>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
>
> That's a bad solution for users, this is meaningless tracking of a
> device whose driver is actively managing the link bandwidth for power
> purposes.

0.5W savings on a 100+W GPU? I agree it's meaningless.

> There is nothing wrong happening here that needs to fill
> logs. I thought maybe if I enabled notification of autonomous
> bandwidth changes that it might categorize these as something we could
> ignore, but it doesn't.
> How can we identify only cases where this is
> an erroneous/noteworthy situation? Thanks,

You don't. Ethernet doesn't. USB doesn't. This logging behavior is consistent with every other subsystem that deals with multi-speed links.

I realize some people are very resistant to change (and use very ancient kernels). I do not, however, agree that this is a sufficient argument to dis-unify behavior.

Alex
On Tue, 23 Apr 2019 09:33:53 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > On Mon, 22 Apr 2019 19:05:57 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:
> >> echo 0000:07:00.0:pcie010 |
> >>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
> >
> > That's a bad solution for users, this is meaningless tracking of a
> > device whose driver is actively managing the link bandwidth for power
> > purposes.
>
> 0.5W savings on a 100+W GPU? I agree it's meaningless.

Evidence? Regardless, I don't have control of the driver that's making these changes, but the claim seems unfounded and irrelevant.

> > There is nothing wrong happening here that needs to fill
> > logs. I thought maybe if I enabled notification of autonomous
> > bandwidth changes that it might categorize these as something we could
> > ignore, but it doesn't.
> > How can we identify only cases where this is
> > an erroneous/noteworthy situation? Thanks,
>
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> consistent with every other subsystem that deals with multi-speed links.
> I realize some people are very resistant to change (and use very ancient
> kernels). I do not, however, agree that this is a sufficient argument to
> dis-unify behavior.

Sorry, I don't see how any of this is relevant either. Clearly I'm using a recent kernel or I wouldn't be seeing this new bandwidth notification driver. I'm assigning a device to a VM whose driver is power managing the device via link speed changes. The result is that we now see irrelevant spam in the host dmesg for every inconsequential link downgrade directed by the device.

I can see why we might want to be notified of degraded links due to signal issues, but what I'm reporting is that there are also entirely normal and benign reasons that a link might be reduced, we can't seem to tell the difference between a fault and this normal dynamic scaling, and the assumption of a fault is spamming dmesg. So, I don't think what we have here is well cooked. Do drivers have a mechanism to opt-out of this error reporting? Can drivers register an anticipated link change to avoid the spam? What instructions can we *reasonably* give to users as to when these messages mean something, when they don't, and how they can be turned off? Thanks,

Alex
On Tue, Apr 23, 2019 at 09:34:08AM -0600, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500 Alex G <mr.nuke.me@gmail.com> wrote:
> > 0.5W savings on a 100+W GPU? I agree it's meaningless.
>
> Evidence? Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

On laptops, 0.5 W can result in noticeably longer battery life.

> I can see why we might want to
> be notified of degraded links due to signal issues, but what I'm
> reporting is that there are also entirely normal and benign reasons
> that a link might be reduced, we can't seem to tell the difference
> between a fault and this normal dynamic scaling, and the assumption of
> a fault is spamming dmesg. So, I don't think what we have here is well
> cooked. Do drivers have a mechanism to opt-out of this error
> reporting?

Is dmesg spammed even if no driver is bound to a GPU? If so, that would suggest a solution that's not dependent on drivers. E.g., the bw_notification port service could avoid reports for devices matching PCI_BASE_CLASS_DISPLAY. (It could also avoid binding to ports whose children include such a device, but the child may be hot-pluggable and thus appear only after the port is bound.) Then we'd still get a notification on boot about degraded link speed, but not continuous messages.

Thanks,

Lukas
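[A minimal sketch of the class-based check Lukas describes; this is a hypothetical helper, not the actual bw_notification code. The function name and where it would be called from are assumptions, and locking of the child list (normally pci_bus_sem) is omitted for brevity.]

#include <linux/pci.h>

/*
 * Hypothetical helper: return true if any child of this port is a
 * display-class device (e.g. a GPU), so the bandwidth notification
 * service could skip the "degraded link" report for it.
 */
static bool port_child_is_display(struct pci_dev *port)
{
	struct pci_dev *child;

	if (!port->subordinate)
		return false;

	list_for_each_entry(child, &port->subordinate->devices, bus_list)
		if ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY)
			return true;

	return false;
}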
On 4/23/19 10:34 AM, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
>
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> On Mon, 22 Apr 2019 19:05:57 -0500
>>> Alex G <mr.nuke.me@gmail.com> wrote:
>>>> echo 0000:07:00.0:pcie010 |
>>>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
>>>
>>> That's a bad solution for users, this is meaningless tracking of a
>>> device whose driver is actively managing the link bandwidth for power
>>> purposes.
>>
>> 0.5W savings on a 100+W GPU? I agree it's meaningless.
>
> Evidence? Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

The number of 5mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU supports 5Gb/s, so likely using an older, more power hungry process. I suspect it's still within the same order of magnitude.

> I'm assigning a device to a VM [snip]
> I can see why we might want to be notified of degraded links due to signal issues,
> but what I'm reporting is that there are also entirely normal reasons
> [snip] we can't seem to tell the difference

Unfortunately, there is no way in PCI-Express to distinguish between an expected link bandwidth change and one due to error.

If you're using virt-manager to configure the VM, then virt-manager could have a checkbox to disable link bandwidth management messages. I'd rather we avoid kernel-side heuristics (like Lukas suggested). If you're confident that your link will operate as intended, and don't want messages about it, that's your call as a user -- we shouldn't decide this in the kernel.

Alex

[1] https://www.synopsys.com/designware-ip/technical-bulletin/reduce-power-consumption.html
On Tue, 23 Apr 2019 11:03:04 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 10:34 AM, Alex Williamson wrote:
> > On Tue, 23 Apr 2019 09:33:53 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:
> >
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:
> >>> On Mon, 22 Apr 2019 19:05:57 -0500
> >>> Alex G <mr.nuke.me@gmail.com> wrote:
> >>>> echo 0000:07:00.0:pcie010 |
> >>>>   sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
> >>>
> >>> That's a bad solution for users, this is meaningless tracking of a
> >>> device whose driver is actively managing the link bandwidth for power
> >>> purposes.
> >>
> >> 0.5W savings on a 100+W GPU? I agree it's meaningless.
> >
> > Evidence? Regardless, I don't have control of the driver that's making
> > these changes, but the claim seems unfounded and irrelevant.
>
> The number of 5mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU
> supports 5Gb/s, so likely using an older, more power hungry process. I
> suspect it's still within the same order of magnitude.

This doesn't necessarily imply the overall power savings to the endpoint as a whole though, and it's still irrelevant to the discussion here. The driver is doing something reasonable that's generating host dmesg spam.

> > I'm assigning a device to a VM [snip]
> > I can see why we might want to be notified of degraded links due to signal issues,
> > but what I'm reporting is that there are also entirely normal reasons
> > [snip] we can't seem to tell the difference
>
> Unfortunately, there is no way in PCI-Express to distinguish between an
> expected link bandwidth change and one due to error.

Then assuming every link speed change is an error seems like the wrong approach. Should we instead have a callback that drivers can optionally register to receive link change notifications? If a driver doesn't register such a callback then a generic message can be posted, but if they do, the driver can decide whether this is an error.

> If you're using virt-manager to configure the VM, then virt-manager
> could have a checkbox to disable link bandwidth management messages. I'd

What makes us think that this is the only case where such link speed changes will occur? Hand waving that a userspace management utility should go unbind drivers that over-zealously report errors is a poor solution.

> rather we avoid kernel-side heuristics (like Lukas suggested). If you're
> confident that your link will operate as intended, and don't want
> messages about it, that's your call as a user -- we shouldn't decide
> this in the kernel.

Nor should pci-core decide what link speed changes are intended or errors. Minimally we should be enabling drivers to receive this feedback. Thanks,

Alex
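[To make the opt-in callback idea concrete, an entirely hypothetical sketch follows. Neither the link_bw_change member nor pcie_report_bw_change() exists in the kernel; they are illustrations of the kind of hook being asked for, not a real API.]

/*
 * Assume struct pci_driver grew one optional member:
 *
 *	bool (*link_bw_change)(struct pci_dev *dev, u16 link_status);
 *
 * returning true when the bound driver considers the change expected
 * (e.g. its own power management).  The bandwidth notification service
 * could then consult it before logging anything:
 */
static void pcie_report_bw_change(struct pci_dev *dev, u16 link_status)
{
	if (dev->driver && dev->driver->link_bw_change &&
	    dev->driver->link_bw_change(dev, link_status))
		return;		/* driver claimed the event, stay quiet */

	/* otherwise fall back to a generic message */
	pci_info(dev, "PCIe link bandwidth changed (LNKSTA %#06x)\n",
		 link_status);
}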
On 4/23/19 11:22 AM, Alex Williamson wrote:
> Nor should pci-core decide what link speed changes are intended or
> errors. Minimally we should be enabling drivers to receive this
> feedback. Thanks,

Not errors. pci core reports that a link speed change event has occurred. Period.

Alex
On Tue, 23 Apr 2019 11:27:39 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 11:22 AM, Alex Williamson wrote:
> > Nor should pci-core decide what link speed changes are intended or
> > errors. Minimally we should be enabling drivers to receive this
> > feedback. Thanks,
>
> Not errors. pci core reports that a link speed change event has occurred.
> Period.

And it shows up in dmesg, and what do users (and developers) think when things are repeatedly reported in dmesg? Whether this is "information" or "error", it's spamming dmesg, irrelevant, and confusing. Thanks,

Alex
On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > There is nothing wrong happening here that needs to fill logs. I
> > thought maybe if I enabled notification of autonomous bandwidth
> > changes that it might categorize these as something we could
> > ignore, but it doesn't. How can we identify only cases where this
> > is an erroneous/noteworthy situation? Thanks,
>
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> consistent with every other subsystem that deals with multi-speed links.

Can you point me to the logging in these other subsystems so I can learn more about how they deal with this?

I agree that emitting log messages for normal and expected events will lead to user confusion and we need to do something.

e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") was merged in v5.1-rc1, so we still have (a little) time to figure this out before v5.1.

Bjorn
On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> There is nothing wrong happening here that needs to fill logs. I
>>> thought maybe if I enabled notification of autonomous bandwidth
>>> changes that it might categorize these as something we could
>>> ignore, but it doesn't. How can we identify only cases where this
>>> is an erroneous/noteworthy situation? Thanks,
>>
>> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
>> consistent with every other subsystem that deals with multi-speed links.
>
> Can you point me to the logging in these other subsystems so I can
> learn more about how they deal with this?

I don't have any in-depth articles about the logging in these systems, but I can extract some logs from my machines.

Ethernet:

[Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

I used to have one of these "green" ethernet switches that went down to 100 Mbps automatically. You can imagine how "clogged" the logs were with link up messages. Thank goodness that switch was killed in a thunderstorm.

USB will log every device insertion and removal, very verbosely (see Appendix A).

> I agree that emitting log messages for normal and expected events will
> lead to user confusion and we need to do something.
>
> e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> notification") was merged in v5.1-rc1, so we still have (a little)
> time to figure this out before v5.1.

I always viewed the system log as a system log, instead of a database of system errors. I may have extremist views, but going back to Alex's example, I prefer to see that the power saving mechanism is doing something to save power on my laptop (I'll just ignore it on a desktop).

If you think it's worth increasing code complexity because people don't want things logged into the system log, then I'm certain we can work out some sane solution. It's the same problem we see with GCC, where people want warning messages here, but don't want the same messages there.

Alex

P.S. The pedantic in me points out that one of the examples I gave is a terrible example. ASPM "allows hardware-autonomous, dynamic Link power reduction beyond what is achievable by software-only control" [1].

[1] PCI-Express 3.0 -- 5.4.1. Active State Power Management (ASPM)

Appendix A:

[1618067.987084] usb 1-3.5: new high-speed USB device number 79 using xhci_hcd
[1618068.179914] usb 1-3.5: New USB device found, idVendor=0bda, idProduct=4014, bcdDevice= 0.05
[1618068.179924] usb 1-3.5: New USB device strings: Mfr=3, Product=1, SerialNumber=2
[1618068.179930] usb 1-3.5: Product: USB Audio
[1618068.179936] usb 1-3.5: Manufacturer: Generic
[1618068.179941] usb 1-3.5: SerialNumber: 200901010001
[1618068.280100] usb 1-3.6: new low-speed USB device number 80 using xhci_hcd
[1618068.342541] Bluetooth: hci0: Waiting for firmware download to complete
[1618068.342795] Bluetooth: hci0: Firmware loaded in 1509081 usecs
[1618068.342887] Bluetooth: hci0: Waiting for device to boot
[1618068.354919] Bluetooth: hci0: Device booted in 11797 usecs
[1618068.356006] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-12-16.ddc
[1618068.358958] Bluetooth: hci0: Applying Intel DDC parameters completed
[1618068.378624] usb 1-3.6: New USB device found, idVendor=04d9, idProduct=1400, bcdDevice= 1.43
[1618068.378626] usb 1-3.6: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1618068.390686] input: HID 04d9:1400 as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.0/0003:04D9:1400.0139/input/input921
[1618068.444282] hid-generic 0003:04D9:1400.0139: input,hidraw1: USB HID v1.10 Keyboard [HID 04d9:1400] on usb-0000:00:14.0-3.6/input0
[1618068.456373] input: HID 04d9:1400 Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input922
[1618068.457929] input: HID 04d9:1400 Consumer Control as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input923
[1618068.509294] input: HID 04d9:1400 System Control as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input924
[1618068.509518] hid-generic 0003:04D9:1400.013A: input,hidraw2: USB HID v1.10 Mouse [HID 04d9:1400] on usb-0000:00:14.0-3.6/input1
[1618068.588078] usb 1-3.7: new full-speed USB device number 81 using xhci_hcd
[1618068.679132] usb 1-3.7: New USB device found, idVendor=046d, idProduct=c52b, bcdDevice=12.03
[1618068.679137] usb 1-3.7: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[1618068.679139] usb 1-3.7: Product: USB Receiver
[1618068.679142] usb 1-3.7: Manufacturer: Logitech
[1618068.692430] logitech-djreceiver 0003:046D:C52B.013D: hiddev96,hidraw3: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-3.7/input2
[1618068.817334] input: Logitech Performance MX as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.7/1-3.7:1.2/0003:046D:C52B.013D/0003:046D:101A.013E/input/input925
[1618068.820357] logitech-hidpp-device 0003:046D:101A.013E: input,hidraw4: USB HID v1.11 Mouse [Logitech Performance MX] on usb-0000:00:14.0-3.7:1
On 4/22/19 5:43 PM, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service. This can resolve
> spurious interrupt faults due to this service on some systems.
>
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---

+1

Tested on some Dell servers. Everything works as expected. I don't have a system with a device that only supports bandwidth notification.

Alex
On Tue, 23 Apr 2019 12:53:07 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> > On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:
> >>> There is nothing wrong happening here that needs to fill logs. I
> >>> thought maybe if I enabled notification of autonomous bandwidth
> >>> changes that it might categorize these as something we could
> >>> ignore, but it doesn't. How can we identify only cases where this
> >>> is an erroneous/noteworthy situation? Thanks,
> >>
> >> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> >> consistent with every other subsystem that deals with multi-speed links.
> >
> > Can you point me to the logging in these other subsystems so I can
> > learn more about how they deal with this?
>
> I don't have any in-depth articles about the logging in these systems,
> but I can extract some logs from my machines.
>
> Ethernet:
>
> [Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> I used to have one of these "green" ethernet switches that went down to
> 100 Mbps automatically. You can imagine how "clogged" the logs were with
> link up messages. Thank goodness that switch was killed in a thunderstorm.
>
> USB will log every device insertion and removal, very verbosely (see
> Appendix A).

I have a hard time putting USB insertion and removal into the same class; the equivalent is PCI hotplug, which is logged separately. Do we ever log beyond USB device discovery if a device is running at a lower speed than is possible? The most directly related is the green ethernet switch, which you admit was a nuisance due to exactly this sort of logging. It was probably confusing to see this logging, perhaps you wondered if the cable was bad or the switch was defective.

> > I agree that emitting log messages for normal and expected events will
> > lead to user confusion and we need to do something.
> >
> > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > notification") was merged in v5.1-rc1, so we still have (a little)
> > time to figure this out before v5.1.
>
> I always viewed the system log as a system log, instead of a database of
> system errors. I may have extremist views, but going back to Alex's
> example, I prefer to see that the power saving mechanism is doing
> something to save power on my laptop (I'll just ignore it on a desktop).

There's a disconnect from above where similar behavior on ethernet "clogged" the log files, but here we just want to ignore it. Excessive logging can also be considered a denial of service vector when the device generating that excessive logging is attached to a userspace driver.

> If you think it's worth increasing code complexity because people don't
> want things logged into the system log, then I'm certain we can work out
> some sane solution. It's the same problem we see with GCC, where people
> want warning messages here, but don't want the same messages there.

v5.1 is approaching quickly, can we downgrade these to pci_dbg() while we work on maybe some sort of driver participation in this logging? Thanks,

Alex
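[For reference, the interim downgrade asked for above would be roughly a one-word change, assuming the report is emitted with pci_info() in the bandwidth-notification reporting path; this is an illustrative hunk only, with the format arguments elided, not the exact upstream line:]

-	pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by ...\n", ...);
+	pci_dbg(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by ...\n", ...);

[With pci_dbg(), anyone who still wants the messages could re-enable them at runtime via dynamic debug, e.g. echo 'file pci.c +p' > /sys/kernel/debug/dynamic_debug/control -- the file name here is an assumption about where the print lives.]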
On Mon, Apr 22, 2019 at 04:43:30PM -0600, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service. This can resolve
> spurious interrupt faults due to this service on some systems.
>
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Applied for (hopefully) v5.1, thanks!

>  drivers/pci/pcie/portdrv_core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>  	 * 7.8.2, 7.10.10, 7.31.2.
>  	 */
>
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>  		nvec = *pme + 1;
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7d04f9d087a6..1b330129089f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 	 * 7.8.2, 7.10.10, 7.31.2.
 	 */

-	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
+	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
+		    PCIE_PORT_SERVICE_BWNOTIF)) {
 		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
 		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
 		nvec = *pme + 1;
On systems that don't support any PCIe services other than bandwidth notification, pcie_message_numbers() can return zero vectors, causing the vector reallocation in pcie_port_enable_irq_vec() to retry with zero, which fails, resulting in fallback to INTx (which might be broken) for the bandwidth notification service. This can resolve spurious interrupt faults due to this service on some systems.

Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
However, the system is still susceptible to random spew in dmesg depending on how the root port handles downstream device managed link speed changes. For example, GPUs like to scale their link speed for power management when idle. A GPU assigned to a VM through vfio-pci can generate link bandwidth notification every time the link is scaled down, ex:

[ 329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
[ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)

What is the value of this nagging?

 drivers/pci/pcie/portdrv_core.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)