
[for-linus] PCI: Honor Max Link Speed when determining supported speeds

Message ID e3386d62a766be6d0ef7138a001dabfe563cdff8.1733991971.git.lukas@wunner.de (mailing list archive)
State Superseded
Commit 978463f811ec592c7562b31fa89015718454dd61
Delegated to: Krzysztof Wilczyński
Series [for-linus] PCI: Honor Max Link Speed when determining supported speeds

Commit Message

Lukas Wunner Dec. 12, 2024, 8:56 a.m. UTC
The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds.  The Max Link Speed field in
the Link Capabilities Register indicates the *maximum* of those speeds.

Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
Ilpo recalls seeing this inconsistency on more devices.

pcie_get_supported_speeds() neglects to honor the Max Link Speed field
and will thus incorrectly deem higher speeds as supported.  Fix it.

Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
Reported-by: Niklas Schnelle <niks@kernel.org>
Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
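
For illustration, a minimal user-space sketch of the added masking (not part
of the patch; the register values are assumed from the JHL7540 report above,
and a simplified GENMASK() stand-in is used instead of the kernel macro):

	#include <stdio.h>

	#define PCI_EXP_LNKCAP_SLS	0x0000000f  /* Max Link Speed */
	#define PCI_EXP_LNKCAP2_SLS	0x000000fe  /* Supported Link Speeds Vector */
	#define GENMASK(h, l)	((~0u << (l)) & (~0u >> (31 - (h))))

	int main(void)
	{
		unsigned int lnkcap  = 0x1;  /* Max Link Speed: 2.5 GT/s */
		unsigned int lnkcap2 = 0xe;  /* bits 1-3: 2.5, 5 and 8 GT/s */
		unsigned int speeds  = lnkcap2 & PCI_EXP_LNKCAP2_SLS;

		/* the fix: drop speeds above Max Link Speed */
		speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
		printf("supported_speeds = %#x\n", speeds); /* 0x2: 2.5 GT/s only */
		return 0;
	}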

Comments

Ilpo Järvinen Dec. 12, 2024, 2:33 p.m. UTC | #1
On Thu, 12 Dec 2024, Lukas Wunner wrote:

> The Supported Link Speeds Vector in the Link Capabilities 2 Register
> indicates the *supported* link speeds.  The Max Link Speed field in
> the Link Capabilities Register indicates the *maximum* of those speeds.
> 
> Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> Ilpo recalls seeing this inconsistency on more devices.
> 
> pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> and will thus incorrectly deem higher speeds as supported.  Fix it.
> 
> Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> Reported-by: Niklas Schnelle <niks@kernel.org>
> Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pci.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 35dc9f2..b730560 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
>  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
>  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
>  
> +	/* Ignore speeds higher than Max Link Speed */
> +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);

Hi Lukas,

Why do you start GENMASK() from the 0th position? That's the reserved bit.
(It doesn't exactly cause a misbehavior to & the never-set 0th bit, but
it is slightly confusing.)

I suggest getting that either from PCI_EXP_LNKCAP2_SLS_2_5GB or from
PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()), and not using a literal at all,
to make it explicit where it originates from.
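
For reference, a sketch of what that variant might look like (hypothetical,
assuming the __ffs() form; not something posted in this thread):

	/* Ignore speeds higher than Max Link Speed */
	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS,
			  __ffs(PCI_EXP_LNKCAP2_SLS));

__ffs(PCI_EXP_LNKCAP2_SLS) evaluates to 1, i.e. the lowest bit of the
Supported Link Speeds Vector; the h < l hazard for a Max Link Speed of 0
is discussed further down the thread.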
Bjorn Helgaas Dec. 12, 2024, 4:11 p.m. UTC | #2
On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> The Supported Link Speeds Vector in the Link Capabilities 2 Register
> indicates the *supported* link speeds.  The Max Link Speed field in
> the Link Capabilities Register indicates the *maximum* of those speeds.
> 
> Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> Ilpo recalls seeing this inconsistency on more devices.
> 
> pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> and will thus incorrectly deem higher speeds as supported.  Fix it.
> 
> Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> Reported-by: Niklas Schnelle <niks@kernel.org>
> Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Looks like you want this in v6.13?  Can we make the commit log more
explicit as to why we need it there?  Is this change enough to resolve
the boot hang Niklas reported?

> ---
>  drivers/pci/pci.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 35dc9f2..b730560 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
>  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
>  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
>  
> +	/* Ignore speeds higher than Max Link Speed */
> +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> +
>  	/* PCIe r3.0-compliant */
>  	if (speeds)
>  		return speeds;
>  
> -	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> -
>  	/* Synthesize from the Max Link Speed field */
>  	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
>  		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
> -- 
> 2.43.0
>
Niklas Schnelle Dec. 12, 2024, 4:58 p.m. UTC | #3
On Thu, 2024-12-12 at 10:11 -0600, Bjorn Helgaas wrote:
> On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds.  The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> > 
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> > 
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported.  Fix it.
> > 
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> Looks like you want this in v6.13?  Can we make commit log more
> explicit as to why we need it there?  Is this change enough to resolve
> the boot hang Niklas reported?

As for whether it fixes my hang, I will test this later today when I come
home. But even if it is not enough on its own, I believe this will
be needed as a prerequisite for the fix of my hang issue, in that
without this patch dev->supported_speeds incorrectly shows multiple
speeds as supported, making it impossible to suppress probing of bwctrl
for devices that only support a fixed speed.
Niklas Schnelle Dec. 12, 2024, 7:40 p.m. UTC | #4
On Thu, 2024-12-12 at 17:58 +0100, Niklas Schnelle wrote:
> On Thu, 2024-12-12 at 10:11 -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > indicates the *supported* link speeds.  The Max Link Speed field in
> > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > > 
> > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > Ilpo recalls seeing this inconsistency on more devices.
> > > 
> > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > and will thus incorrectly deem higher speeds as supported.  Fix it.
> > > 
> > > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > > Reported-by: Niklas Schnelle <niks@kernel.org>
> > > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > 
> > Looks like you want this in v6.13?  Can we make commit log more
> > explicit as to why we need it there?  Is this change enough to resolve
> > the boot hang Niklas reported?
> 
> As for if it fixes my hang I will test this later today when I come
> home. But even if it is not enough on its own, I believe that this will
> be needed as a prerequisite for the fix of my hang issue in that
> without this patch the dev->supported_speeds incorrectly shows multiple
> speeds as supported making it impossible to suppress probing of bwctrl
> for devices that only support a fixed speed.

Ok, gave this a test and, as somewhat suspected, this patch alone doesn't
fix my boot hang, nor do I get more output (I also tried Lukas' suggestion
with early_printk).

Then I put my patch using the hweight8(dev->supported_speeds) > 1
condition suggested by Lukas on top, and with both, things work again. I
would now propose that once we've cleared up Ilpo's comment I send a
series with both patches for you to easily pick up together. If you prefer
I can of course also send my patch standalone.

Thanks,
Niklas
Niklas Schnelle Dec. 12, 2024, 8:10 p.m. UTC | #5
On Thu, 2024-12-12 at 16:33 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
> 
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds.  The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> > 
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> > 
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported.  Fix it.
> > 
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> >  drivers/pci/pci.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 35dc9f2..b730560 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >  
> > +	/* Ignore speeds higher than Max Link Speed */
> > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> 
> Hi Lukas,
> 
> Why do you start GENMASK() from 0th position? That's the reserved bit.
> (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> it is slightly confusing).
> 
> I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or 
> PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> to make it explicit where it originates from.
> 

Hi Ilpo,

I agree this is quite confusing even in the PCIe spec. According to
PCIe r6.2 the value of the Max Link Speed field references a bit in the
Supported Link Speeds Vector, with a value of 0b0001 in Max Link Speed
referring to bit 0 in Supported Link Speeds, a value of 0b0010 to bit
1 and so on. So the value is actually shifted left by 1 versus the
typical (i.e. non-IBM ;-)) bit counting.

Then looking at the Supported Link Speeds description, it refers to bit
0 as 2.5 GT/s, bit 1 as 5 GT/s up to bit 6 (RsvdP), while in the figure
the RsvdP is the rightmost bit. So unless I have completely confused
myself playing around with this genmask calculator[0], we actually want
GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 1) because, just like the Supported
Link Speeds field, dev->supported_speeds also reserves the LSB. And
I feel this actually matches the spec wording pretty well. What do you
think?

Thanks,
Niklas

[0] https://mazdermind.de/genmask/
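
To make the offset concrete (assuming the r6.2 encoding described above and
the register bit numbering of PCI_EXP_LNKCAP2_SLS), the mapping works out to:

	Max Link Speed value   LNKCAP2 register bit   speed
	0001b                  1                      2.5 GT/s
	0010b                  2                      5 GT/s
	0011b                  3                      8 GT/s
	0100b                  4                      16 GT/s

i.e. the Max Link Speed value and the LNKCAP2 register bit number coincide,
which is why the patch can use the field value directly as the high bit of
the GENMASK().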
Lukas Wunner Dec. 12, 2024, 10:13 p.m. UTC | #6
On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >  
> > +	/* Ignore speeds higher than Max Link Speed */
> > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> 
> Why do you start GENMASK() from 0th position? That's the reserved bit.
> (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> it is slightly confusing).

GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
doing a GENMASK(0, 1) here, fulfilling the l > h condition.

Granted, the BUILD_BUG_ON() only triggers if l and h can be
evaluated at compile time, which isn't the case here.

Still, I felt uneasy risking any potential future breakage.
Including the reserved bit 0 is harmless because it's forced to
zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
expression.


> I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or 
> PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> to make it explicit where it originates from.

Pardon me for being dense but I don't understand what you mean.

Thanks,

Lukas
Lukas Wunner Dec. 13, 2024, 9:16 a.m. UTC | #7
On Thu, Dec 12, 2024 at 10:11:03AM -0600, Bjorn Helgaas wrote:
> On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds.  The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> > 
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> > 
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported.  Fix it.
> > 
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> Looks like you want this in v6.13?  Can we make commit log more
> explicit as to why we need it there?  Is this change enough to resolve
> the boot hang Niklas reported?

This is more like a "we're not conforming to the spec" kind of fix,
but also happens to be a prerequisite to fixing Niklas' boot hang.

Another user-visible issue addressed here is that we're exposing an
incorrect value in the sysfs "max_link_speed" attribute on devices
such as the JHL7540 Thunderbolt controller mentioned in the commit
message.

Thanks,

Lukas

> > ---
> >  drivers/pci/pci.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 35dc9f2..b730560 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >  
> > +	/* Ignore speeds higher than Max Link Speed */
> > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > +
> >  	/* PCIe r3.0-compliant */
> >  	if (speeds)
> >  		return speeds;
> >  
> > -	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > -
> >  	/* Synthesize from the Max Link Speed field */
> >  	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> >  		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
> > -- 
> > 2.43.0
Lukas Wunner Dec. 13, 2024, 9:43 a.m. UTC | #8
On Thu, Dec 12, 2024 at 08:40:07PM +0100, Niklas Schnelle wrote:
> > > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > > indicates the *supported* link speeds.  The Max Link Speed field in
> > > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > > > 
> > > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > > Ilpo recalls seeing this inconsistency on more devices.
> > > > 
> > > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > > and will thus incorrectly deem higher speeds as supported.  Fix it.
> 
> Ok, gave this a test and as somewhat suspected this patch alone doesn't
> fix my boot hang nor do I get more output (also tried Lukas suggestion
> with early_printk).

Hm, that's kind of a bummer because while we know how to work around
your boot hang (by disabling bwctrl altogether), we don't really know
the root cause.

The bwctrl IRQ handler runs in hardirq context, so if it ends up in an
infinite loop for some reason or keeps waiting for a spinlock, that
might indeed cause a boot hang.  Not that I see where in the code that
might occur.  Nevertheless you can try adding "threadirqs" to the
kernel command line to force all IRQ handlers into threads.
Alternatively, enable CONFIG_PREEMPT_RT to also turn spinlocks into
sleeping locks.  Maybe this turns the boot hang into a hung task splat
and thus helps identify the root cause.

Thanks,

Lukas
Ilpo Järvinen Dec. 13, 2024, 10:12 a.m. UTC | #9
On Thu, 12 Dec 2024, Lukas Wunner wrote:

> On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > >  
> > > +	/* Ignore speeds higher than Max Link Speed */
> > > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > 
> > Why do you start GENMASK() from 0th position? That's the reserved bit.
> > (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> > it is slightly confusing).
> 
> GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
> does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
> doing a GENMASK(0, 1) here, fulfilling the l > h condition.
> 
> Granted, the BUILD_BUG_ON() only triggers if l and h can be
> evaluated at compile time, which isn't the case here.
> 
> Still, I felt uneasy risking any potential future breakage.
> Including the reserved bit 0 is harmless because it's forced to
> zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
> expression.

Fair, but those are quite a few thoughts you didn't record in the commit
message. If you intentionally do confusing tricks like that, please be
mindful of the poor guy who has to figure this out years from now. :-)
(One of the most annoying things when digging into commits far in the past
is those nagging "Why?" questions nobody is there to answer.)

I know it doesn't misbehave because bit 0 is cleared by the earlier
statement. (I also thought of the GENMASK() inversion.)

Another option would be to explicitly ensure PCI_EXP_LNKCAP_SLS is at
least PCI_EXP_LNKCAP2_SLS_2_5GB, which would also ensure the pre-3.0 path
always returns at least one speed (which the current code doesn't
guarantee).
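
A minimal sketch of what that could look like (hypothetical and untested,
not posted in this thread; max_t() and the register constants assumed as in
current mainline):

	/* Clamp a (broken) Max Link Speed of 0 up to 2.5 GT/s */
	u8 max = max_t(u8, lnkcap & PCI_EXP_LNKCAP_SLS,
		       PCI_EXP_LNKCAP_SLS_2_5GB);

	speeds &= GENMASK(max, 0);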

In broader terms, there are other kinds of broken devices this
code doesn't handle. If PCI_EXP_LNKCAP2_SLS has no bits set but the
device has >5 GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.

> > I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or 
> > PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> > to make it explicit where it originates from.
> 
> Pardon me for being dense but I don't understand what you mean.

I meant that instead of GENMASK(..., 0) you'd use
GENMASK(..., __ffs(PCI_EXP_LNKCAP2_SLS)). But it doesn't matter
if you go with this bit-0-included GENMASK() approach.
Niklas Schnelle Dec. 13, 2024, 5:21 p.m. UTC | #10
On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
> 
> > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > >  
> > > > +	/* Ignore speeds higher than Max Link Speed */
> > > > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > > 
> > > Why do you start GENMASK() from 0th position? That's the reserved bit.
> > > (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> > > it is slightly confusing).
> > 
> > GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
> > does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
> > doing a GENMASK(0, 1) here, fulfilling the l > h condition.
> > 
> > Granted, the BUILD_BUG_ON() only triggers if l and h can be
> > evaluated at compile time, which isn't the case here.
> > 
> > Still, I felt uneasy risking any potential future breakage.
> > Including the reserved bit 0 is harmless because it's forced to
> > zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
> > expression.
> 
> Fair but that's quite many thoughts you didn't record into the commit 
> message. If you intentionally do confusing tricks like that, please be 
> mindful about the poor guy who has to figure this out years from now. :-)
> (One of the most annoying thing in digging into commits far in the past 
> are those nagging "Why?" questions nobody is there to answer.)
> 
> I know it doesn't misbehave because of bit 0 is cleared by the earlier 
> statement. (I also thought of the GENMASK() inversion.)
> 
> Another option would be to explicitly ensure PCI_EXP_LNKCAP_SLS is at 
> least PCI_EXP_LNKCAP2_SLS_2_5GB which would also ensure pre-3.0 part
> returns always at least one speed (which the current code doesn't 
> guarantee).
> 
> As in more broader terms there are other kinds of broken devices this 
> code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the 
> device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
> 
> > > I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or 
> > > PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> > > to make it explicit where it originates from.
> > 
> > Pardon me for being dense but I don't understand what you mean.
> 
> I meant that instead of GENMASK(..., 0) use
> GENMASK(..., __ffs(PCI_EXP_LNKCAP2_SLS)). But it doesn't matter
> if go with this bit 0 included into GENMASK() approach.
> 


So that's essentially GENMASK(…, 1). The reason I'm not sure it
really is more self-documenting is that anyone trying to grok this is
going to look in the PCIe spec, which already shows bit 0 as
reserved (if it still is reserved then) and the bit numbering
explaining the offset, so the __ffs() is just going to add confusion.
Instead I prefer your other idea of GENMASK(…,
PCI_EXP_LNKCAP2_SLS_2_5GB) because, together with that constant's
comment, it clearly communicates "starting at LNKCAP2 SLS Vector bit 0".

For the issue of GENMASK() not being well defined when the Max Link Speed
field reads 0 (h < l), I think the code should handle lnkcap2 == 0
explicitly, which is basically your point above about other broken devices.

Thanks,
Niklas
Niklas Schnelle Dec. 13, 2024, 5:22 p.m. UTC | #11
On Fri, 2024-12-13 at 10:43 +0100, Lukas Wunner wrote:
> On Thu, Dec 12, 2024 at 08:40:07PM +0100, Niklas Schnelle wrote:
> > > > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > > > indicates the *supported* link speeds.  The Max Link Speed field in
> > > > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > > > > 
> > > > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > > > Ilpo recalls seeing this inconsistency on more devices.
> > > > > 
> > > > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > > > and will thus incorrectly deem higher speeds as supported.  Fix it.
> > 
> > Ok, gave this a test and as somewhat suspected this patch alone doesn't
> > fix my boot hang nor do I get more output (also tried Lukas suggestion
> > with early_printk).
> 
> Hm, that's kind of a bummer because while we know how to work around
> your boot hang (by disabling bwctrl altogether), we don't really know
> the root cause.
> 
> The bwctrl IRQ handler runs in hardirq context, so if it ends up in an
> infinite loop for some reason or keeps waiting for a spinlock, that
> might indeed cause a boot hang.  Not that I'm seeing in the code where
> that might occur.  Nevertheless you can try adding "threadirqs" to the
> kernel command line to force all IRQ handlers into threads.
> Alternatively, enable CONFIG_PREEMPT_RT to also turn spinlocks into
> sleeping locks.  Maybe this turns the boot hang into a hung task splat
> and thus helps identify the root cause.
> 
> Thanks,
> 
> Lukas

Tried with "threadirqs" but no change. So then I additionally tried
just exiting pcie_bwnotif_irq() without doing anything if the bus
number matches my Thunderbolt controller. Still the same result. So I
don't think it is anything the IRQ handler does in software.

Thanks,
Niklas
Niklas Schnelle Dec. 13, 2024, 5:41 p.m. UTC | #12
On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
> 
> > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > >  
> > > > +	/* Ignore speeds higher than Max Link Speed */
> > > > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > > 
> 

---8<---

> As in more broader terms there are other kinds of broken devices this 
> code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the 
> device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.

On second look I don't think it will. If lnkcap2 & PCI_EXP_LNKCAP2_SLS
is 0, it will proceed to the synthesize part and rely on
PCI_EXP_LNKCAP_SLS alone. The potentially broken part I see is when
lnkcap2 has bits set but lnkcap doesn't, which is also when the
GENMASK(…, 1) would become weird. Not sure what the right handling for
that is, though.
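
For illustration, one way to at least make that combination visible
(hypothetical sketch, not posted in this thread; it would sit between the
new LNKCAP read and the masking):

	/* LNKCAP2 advertises speeds but Max Link Speed is (bogusly) 0 */
	if (speeds && !(lnkcap & PCI_EXP_LNKCAP_SLS))
		pci_info(dev, "inconsistent LNKCAP/LNKCAP2 link speed capabilities\n");

Whether such a device should then be treated as 2.5 GT/s only or as having
no usable supported-speeds information is exactly the open question here.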
Niklas Schnelle Dec. 13, 2024, 6:49 p.m. UTC | #13
On Fri, 2024-12-13 at 18:41 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > 
> > > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > > >  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > > >  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > > >  
> > > > > +	/* Ignore speeds higher than Max Link Speed */
> > > > > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > > +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > > > 
> > 
> 
> ---8<---
> 
> > As in more broader terms there are other kinds of broken devices this 
> > code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the 
> > device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
> 
> On second look I don't think it will. If lnkcap2 & PCI_EXP_LNKCAP2_SLS
> is 0 it will proceed to the synthesize part and rely on
> PCI_EXP_LNKCAP_SLS alone. The potentially broken part I see is when
> lnkcap2 has bits set but lnkcap doesn't which is also when the
> GENMASK(…, 1) would become weird. Not sure what the right handling for
> that is though.

Never mind, I missed the >5 GT/s bit. Though I think that returning 0
speeds is probably the safest bet for a broken device, no?

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 35dc9f2..b730560 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6240,12 +6240,14 @@  u8 pcie_get_supported_speeds(struct pci_dev *dev)
 	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
 	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
 
+	/* Ignore speeds higher than Max Link Speed */
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
+
 	/* PCIe r3.0-compliant */
 	if (speeds)
 		return speeds;
 
-	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
-
 	/* Synthesize from the Max Link Speed field */
 	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
 		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;