diff mbox series

[v2] PCI: Fix Intel i210 by avoiding overlapping of BARs

Message ID 20201230185317.30915-1-michael@walle.cc (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series [v2] PCI: Fix Intel i210 by avoiding overlapping of BARs | expand

Commit Message

Michael Walle Dec. 30, 2020, 6:53 p.m. UTC
The Intel i210 doesn't work if the Expansion ROM BAR overlaps with
another BAR. Networking won't work at all and once a packet is sent the
netdev watchdog will bite:

[   89.059374] ------------[ cut here ]------------
[   89.064019] NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
[   89.070681] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a8/0x3b0
[   89.078989] Modules linked in:
[   89.082053] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.11.0-rc1-00020-gc16f033804b #289
[   89.091574] Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
[   89.099870] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   89.105900] pc : dev_watchdog+0x3a8/0x3b0
[   89.109923] lr : dev_watchdog+0x3a8/0x3b0
[   89.113945] sp : ffff80001000bd50
[   89.117268] x29: ffff80001000bd50 x28: 0000000000000008
[   89.122602] x27: 0000000000000004 x26: 0000000000000140
[   89.127935] x25: ffff002001c6c000 x24: ffff002001c2b940
[   89.133267] x23: ffff8000118c7000 x22: ffff002001c6c39c
[   89.138600] x21: ffff002001c6bfb8 x20: ffff002001c6c3b8
[   89.143932] x19: 0000000000000000 x18: 0000000000000010
[   89.149264] x17: 0000000000000000 x16: 0000000000000000
[   89.154596] x15: ffffffffffffffff x14: 0720072007200720
[   89.159928] x13: 0720072007740775 x12: ffff80001195b980
[   89.165260] x11: 0000000000000003 x10: ffff800011943940
[   89.170592] x9 : ffff800010100d44 x8 : 0000000000017fe8
[   89.175924] x7 : c0000000ffffefff x6 : 0000000000000001
[   89.181255] x5 : 0000000000000000 x4 : 0000000000000000
[   89.186587] x3 : 00000000ffffffff x2 : ffff8000118eb908
[   89.191919] x1 : 84d8200845006900 x0 : 0000000000000000
[   89.197251] Call trace:
[   89.199701]  dev_watchdog+0x3a8/0x3b0
[   89.203374]  call_timer_fn+0x38/0x208
[   89.207049]  run_timer_softirq+0x290/0x540
[   89.211158]  __do_softirq+0x138/0x404
[   89.214831]  irq_exit+0xe8/0xf8
[   89.217981]  __handle_domain_irq+0x70/0xc8
[   89.222091]  gic_handle_irq+0xc8/0x2b0
[   89.225850]  el1_irq+0xb8/0x180
[   89.228999]  arch_cpu_idle+0x18/0x40
[   89.232587]  default_idle_call+0x70/0x214
[   89.236610]  do_idle+0x21c/0x290
[   89.239848]  cpu_startup_entry+0x2c/0x70
[   89.243783]  secondary_start_kernel+0x1a0/0x1f0
[   89.248332] ---[ end trace 1687af62576397bc ]---
[   89.253350] igb 0002:01:00.0 enP2p1s0: Reset adapter

Before this fixup the Expansion ROM BAR will overlap with BAR3:
  # lspci -ns 2:1:0 -xx
  0002:01:00.0 0200: 8086:1533 (rev 03)
  00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
  10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
  20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
  30: 00 00 20 40 40 00 00 00 00 00 00 00 22 01 00 00

Add a quirk which will update the Expansion ROM BAR for Intel i210s even
if the ROM is disabled. After the quirk is applied:
  # lspci -ns 2:1:0 -xx
  0002:01:00.0 0200: 8086:1533 (rev 03)
  00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
  10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
  20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
  30: 00 00 10 40 40 00 00 00 00 00 00 00 22 01 00 00

This behavior was seen with U-Boot v2021.01-rc3 on an ARM64 board (kontron
sl28). Earlier versions likely behave in the same way, but the board which
this was tested on, is only supported since the 2021.01 version.

Signed-off-by: Michael Walle <michael@walle.cc>
---

Paul, thanks for the fast first review!

changes since v1:
 - more precise subject
 - added info about bootloader
 - fixed typos
 - added lspci output to the commit message after the quirks is applied
 - added pci_info() to inform about the quirk
 - renamed pci_fixup_rewrite_rom_bar() to pci_fixup_write_rom_bar().
   Technically, linux didn't write the ROM BAR yet.

 drivers/pci/quirks.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Comments

Bjorn Helgaas Jan. 8, 2021, 9:20 p.m. UTC | #1
On Wed, Dec 30, 2020 at 07:53:17PM +0100, Michael Walle wrote:
> The Intel i210 doesn't work if the Expansion ROM BAR overlaps with
> another BAR. Networking won't work at all and once a packet is sent the
> netdev watchdog will bite:

Hi Michael,

Sorry for the issue on your system and thanks for investigating it and
coming up with a patch to fix it!

1) Is this a regression?  It sounds like you don't know for sure
because earlier kernels don't support your platform.

2) Can you open a bugzilla at https://bugzilla.kernel.org and attach
the complete dmesg and "sudo lspci -vv" output?  I want to see whether
Linux is assigning something incorrectly or this is a consequence of
some firmware initialization.

3) If the Intel i210 is defective in how it handles an Expansion ROM
that overlaps another BAR, a quirk might be the right fix.  But my
guess is the device is working correctly per spec and there's
something wrong in how firmware/Linux is assigning things.  That would
mean we need a more generic fix that's not a quirk and not tied to the
Intel i210.

> [   89.059374] ------------[ cut here ]------------
> [   89.064019] NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
> [   89.070681] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a8/0x3b0
> [   89.078989] Modules linked in:
> [   89.082053] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.11.0-rc1-00020-gc16f033804b #289
> [   89.091574] Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
> [   89.099870] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [   89.105900] pc : dev_watchdog+0x3a8/0x3b0
> [   89.109923] lr : dev_watchdog+0x3a8/0x3b0
> [   89.113945] sp : ffff80001000bd50
> [   89.117268] x29: ffff80001000bd50 x28: 0000000000000008
> [   89.122602] x27: 0000000000000004 x26: 0000000000000140
> [   89.127935] x25: ffff002001c6c000 x24: ffff002001c2b940
> [   89.133267] x23: ffff8000118c7000 x22: ffff002001c6c39c
> [   89.138600] x21: ffff002001c6bfb8 x20: ffff002001c6c3b8
> [   89.143932] x19: 0000000000000000 x18: 0000000000000010
> [   89.149264] x17: 0000000000000000 x16: 0000000000000000
> [   89.154596] x15: ffffffffffffffff x14: 0720072007200720
> [   89.159928] x13: 0720072007740775 x12: ffff80001195b980
> [   89.165260] x11: 0000000000000003 x10: ffff800011943940
> [   89.170592] x9 : ffff800010100d44 x8 : 0000000000017fe8
> [   89.175924] x7 : c0000000ffffefff x6 : 0000000000000001
> [   89.181255] x5 : 0000000000000000 x4 : 0000000000000000
> [   89.186587] x3 : 00000000ffffffff x2 : ffff8000118eb908
> [   89.191919] x1 : 84d8200845006900 x0 : 0000000000000000
> [   89.197251] Call trace:
> [   89.199701]  dev_watchdog+0x3a8/0x3b0
> [   89.203374]  call_timer_fn+0x38/0x208
> [   89.207049]  run_timer_softirq+0x290/0x540
> [   89.211158]  __do_softirq+0x138/0x404
> [   89.214831]  irq_exit+0xe8/0xf8
> [   89.217981]  __handle_domain_irq+0x70/0xc8
> [   89.222091]  gic_handle_irq+0xc8/0x2b0
> [   89.225850]  el1_irq+0xb8/0x180
> [   89.228999]  arch_cpu_idle+0x18/0x40
> [   89.232587]  default_idle_call+0x70/0x214
> [   89.236610]  do_idle+0x21c/0x290
> [   89.239848]  cpu_startup_entry+0x2c/0x70
> [   89.243783]  secondary_start_kernel+0x1a0/0x1f0
> [   89.248332] ---[ end trace 1687af62576397bc ]---
> [   89.253350] igb 0002:01:00.0 enP2p1s0: Reset adapter

This entire splat is overkill.  The useful part is what somebody who
trips over this might google for.  Strip out the "cut here", the
timestamps, the register dump, and the last 6-8 lines of the call
trace.

> Before this fixup the Expansion ROM BAR will overlap with BAR3:
>   # lspci -ns 2:1:0 -xx
>   0002:01:00.0 0200: 8086:1533 (rev 03)
>   00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
>   10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
>   20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
>   30: 00 00 20 40 40 00 00 00 00 00 00 00 22 01 00 00
> 
> Add a quirk which will update the Expansion ROM BAR for Intel i210s even
> if the ROM is disabled. After the quirk is applied:
>   # lspci -ns 2:1:0 -xx
>   0002:01:00.0 0200: 8086:1533 (rev 03)
>   00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
>   10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
>   20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
>   30: 00 00 10 40 40 00 00 00 00 00 00 00 22 01 00 00
> 
> This behavior was seen with U-Boot v2021.01-rc3 on an ARM64 board (kontron
> sl28). Earlier versions likely behave in the same way, but the board which
> this was tested on, is only supported since the 2021.01 version.
> 
> Signed-off-by: Michael Walle <michael@walle.cc>
> ---
> 
> Paul, thanks for the fast first review!
> 
> changes since v1:
>  - more precise subject
>  - added info about bootloader
>  - fixed typos
>  - added lspci output to the commit message after the quirks is applied
>  - added pci_info() to inform about the quirk
>  - renamed pci_fixup_rewrite_rom_bar() to pci_fixup_write_rom_bar().
>    Technically, linux didn't write the ROM BAR yet.
> 
>  drivers/pci/quirks.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..a1a904ed5a10 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5612,3 +5612,39 @@ static void apex_pci_fixup_class(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_CLASS_HEADER(0x1ac1, 0x089a,
>  			       PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
> +
> +/*
> + * Some devices don't work if the Expansion ROM has the same base address as
> + * one of the other BARs although it is disabled.
> + * This might happen if the bootloader/BIOS enumerates the BARs in a different
> + * way than linux. If the Expansion ROM is disabled, linux deliberately skips
> + * writing the ROM BAR if the BAR is not enabled because of some broken
> + * devices, see pci_std_update_resource(). Thus, the ROM BAR of the device will
> + * still contain the value assigned by the booloader, which might be the same
> + * value as one of the other BARs then.
> + *
> + * As a workaround, update the Expansion ROM BAR even if the Expansion ROM is
> + * disabled.
> + */
> +static void pci_fixup_write_rom_bar(struct pci_dev *dev)
> +{
> +	struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
> +	struct pci_bus_region region;
> +	u32 rom_addr;
> +
> +	pci_read_config_dword(dev, dev->rom_base_reg, &rom_addr);
> +
> +	if (rom_addr & PCI_ROM_ADDRESS_ENABLE)
> +		return;
> +
> +	pci_info(dev, "Writing Expansion ROM BAR to avoid overlapping\n");
> +
> +	pcibios_resource_to_bus(dev->bus, &region, res);
> +	rom_addr &= ~PCI_ROM_ADDRESS_MASK;
> +	rom_addr |= region.start;
> +	pci_write_config_dword(dev, dev->rom_base_reg, rom_addr);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1533, pci_fixup_write_rom_bar);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1536, pci_fixup_write_rom_bar);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1537, pci_fixup_write_rom_bar);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1538, pci_fixup_write_rom_bar);
> -- 
> 2.20.1
>
Michael Walle Jan. 9, 2021, 6:31 p.m. UTC | #2
Hi Bjorn,

Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
> On Wed, Dec 30, 2020 at 07:53:17PM +0100, Michael Walle wrote:
>> The Intel i210 doesn't work if the Expansion ROM BAR overlaps with
>> another BAR. Networking won't work at all and once a packet is sent 
>> the
>> netdev watchdog will bite:
> 
> 
> 1) Is this a regression?  It sounds like you don't know for sure
> because earlier kernels don't support your platform.

Whats the background of the question? The board is offially supported
since 5.8. I doubt that the code responsible to not touch the ExpROM
BAR in pci_std_update_resource() were recently changed/added; the
comment refers to a mail from 2005. So no I don't think it is a
regression per se.

It is just that some combination of hardware and firmware will program
the BARs in away so that this bug is triggered. And chances of this
happing are very unlikely.

Do we agree that it should be irrelevant how the firmware programs and
enables the BARs in this case? I.e. you could "fix" u-boot to match the
way linux will assign addresses to the BARs. But that would just work
around the real issue here. IMO.

> 2) Can you open a bugzilla at https://bugzilla.kernel.org and attach
> the complete dmesg and "sudo lspci -vv" output?  I want to see whether
> Linux is assigning something incorrectly or this is a consequence of
> some firmware initialization.

Sure, but you wouldn't even see the error with "lspci -vv" because
lspci will just show the mapping linux assigned to it. But not whats
written to the actual BAR for the PCI card. I'll also include a
"lspci -xx". I've enabled CONFIG_PCI_DEBUG, too.

https://bugzilla.kernel.org/show_bug.cgi?id=211105

> 3) If the Intel i210 is defective in how it handles an Expansion ROM
> that overlaps another BAR, a quirk might be the right fix. But my
> guess is the device is working correctly per spec and there's
> something wrong in how firmware/Linux is assigning things.  That would
> mean we need a more generic fix that's not a quirk and not tied to the
> Intel i210.

Agreed, but as you already stated (and I've also found that in the PCI
spec) the Expansion ROM address decoder can be shared by the other BARs
and it shouldn't matter as long as the ExpROM BAR is disabled, which is
the case here.

I've included the Intel ML, maybe the Intel guys can comment on that.

>> [   89.059374] ------------[ cut here ]------------
>> [   89.064019] NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed 
>> out
>> [   89.070681] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:443 
>> dev_watchdog+0x3a8/0x3b0
>> [   89.078989] Modules linked in:
>> [   89.082053] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       
>>   5.11.0-rc1-00020-gc16f033804b #289
>> [   89.091574] Hardware name: Kontron SMARC-sAL28 (Single PHY) on 
>> SMARC Eval 2.0 carrier (DT)
>> [   89.099870] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>> [   89.105900] pc : dev_watchdog+0x3a8/0x3b0
>> [   89.109923] lr : dev_watchdog+0x3a8/0x3b0
>> [   89.113945] sp : ffff80001000bd50
>> [   89.117268] x29: ffff80001000bd50 x28: 0000000000000008
>> [   89.122602] x27: 0000000000000004 x26: 0000000000000140
>> [   89.127935] x25: ffff002001c6c000 x24: ffff002001c2b940
>> [   89.133267] x23: ffff8000118c7000 x22: ffff002001c6c39c
>> [   89.138600] x21: ffff002001c6bfb8 x20: ffff002001c6c3b8
>> [   89.143932] x19: 0000000000000000 x18: 0000000000000010
>> [   89.149264] x17: 0000000000000000 x16: 0000000000000000
>> [   89.154596] x15: ffffffffffffffff x14: 0720072007200720
>> [   89.159928] x13: 0720072007740775 x12: ffff80001195b980
>> [   89.165260] x11: 0000000000000003 x10: ffff800011943940
>> [   89.170592] x9 : ffff800010100d44 x8 : 0000000000017fe8
>> [   89.175924] x7 : c0000000ffffefff x6 : 0000000000000001
>> [   89.181255] x5 : 0000000000000000 x4 : 0000000000000000
>> [   89.186587] x3 : 00000000ffffffff x2 : ffff8000118eb908
>> [   89.191919] x1 : 84d8200845006900 x0 : 0000000000000000
>> [   89.197251] Call trace:
>> [   89.199701]  dev_watchdog+0x3a8/0x3b0
>> [   89.203374]  call_timer_fn+0x38/0x208
>> [   89.207049]  run_timer_softirq+0x290/0x540
>> [   89.211158]  __do_softirq+0x138/0x404
>> [   89.214831]  irq_exit+0xe8/0xf8
>> [   89.217981]  __handle_domain_irq+0x70/0xc8
>> [   89.222091]  gic_handle_irq+0xc8/0x2b0
>> [   89.225850]  el1_irq+0xb8/0x180
>> [   89.228999]  arch_cpu_idle+0x18/0x40
>> [   89.232587]  default_idle_call+0x70/0x214
>> [   89.236610]  do_idle+0x21c/0x290
>> [   89.239848]  cpu_startup_entry+0x2c/0x70
>> [   89.243783]  secondary_start_kernel+0x1a0/0x1f0
>> [   89.248332] ---[ end trace 1687af62576397bc ]---
>> [   89.253350] igb 0002:01:00.0 enP2p1s0: Reset adapter
> 
> This entire splat is overkill.  The useful part is what somebody who
> trips over this might google for.  Strip out the "cut here", the
> timestamps, the register dump, and the last 6-8 lines of the call
> trace.

This seem to be different from subsys to subsys, but whatever ;)

-michael
Bjorn Helgaas Jan. 12, 2021, 10:58 p.m. UTC | #3
On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
> Hi Bjorn,
> 
> Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
> > On Wed, Dec 30, 2020 at 07:53:17PM +0100, Michael Walle wrote:
> > > The Intel i210 doesn't work if the Expansion ROM BAR overlaps with
> > > another BAR. Networking won't work at all and once a packet is sent
> > > the
> > > netdev watchdog will bite:
> > 
> > 1) Is this a regression?  It sounds like you don't know for sure
> > because earlier kernels don't support your platform.
> 
> Whats the background of the question? The board is offially supported
> since 5.8. I doubt that the code responsible to not touch the ExpROM
> BAR in pci_std_update_resource() were recently changed/added; the
> comment refers to a mail from 2005. So no I don't think it is a
> regression per se.

Just asking because it affects the urgency.  If we added a regression
during the v5.11 merge window, we'd try hard to fix it before
v5.11-final.  But it sounds like the problem has been there a long
time, so a fix could wait until v5.12.

> It is just that some combination of hardware and firmware will program
> the BARs in away so that this bug is triggered. And chances of this
> happing are very unlikely.
> 
> Do we agree that it should be irrelevant how the firmware programs and
> enables the BARs in this case? I.e. you could "fix" u-boot to match the
> way linux will assign addresses to the BARs. But that would just work
> around the real issue here. IMO.

I agree, Linux should work correctly regardless of how firmware
programmed the BARs.

> > 2) Can you open a bugzilla at https://bugzilla.kernel.org and attach
> > the complete dmesg and "sudo lspci -vv" output?  I want to see whether
> > Linux is assigning something incorrectly or this is a consequence of
> > some firmware initialization.
> 
> Sure, but you wouldn't even see the error with "lspci -vv" because
> lspci will just show the mapping linux assigned to it. But not whats
> written to the actual BAR for the PCI card. I'll also include a
> "lspci -xx". I've enabled CONFIG_PCI_DEBUG, too.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=211105
> 
> > 3) If the Intel i210 is defective in how it handles an Expansion ROM
> > that overlaps another BAR, a quirk might be the right fix. But my
> > guess is the device is working correctly per spec and there's
> > something wrong in how firmware/Linux is assigning things.  That would
> > mean we need a more generic fix that's not a quirk and not tied to the
> > Intel i210.
> 
> Agreed, but as you already stated (and I've also found that in the PCI
> spec) the Expansion ROM address decoder can be shared by the other BARs
> and it shouldn't matter as long as the ExpROM BAR is disabled, which is
> the case here.

My point is just that if this could theoretically affect devices other
than the i210, the fix should not be an i210-specific quirk.

I'll assume this is a general problem and wait for a generic PCI core
solution unless it's i210-specific.

Bjorn
Michael Walle Jan. 12, 2021, 11:32 p.m. UTC | #4
Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
> On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>> Hi Bjorn,
>> 
>> Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
>> > On Wed, Dec 30, 2020 at 07:53:17PM +0100, Michael Walle wrote:
>> > > The Intel i210 doesn't work if the Expansion ROM BAR overlaps with
>> > > another BAR. Networking won't work at all and once a packet is sent
>> > > the
>> > > netdev watchdog will bite:
>> >
>> > 1) Is this a regression?  It sounds like you don't know for sure
>> > because earlier kernels don't support your platform.
>> 
>> Whats the background of the question? The board is offially supported
>> since 5.8. I doubt that the code responsible to not touch the ExpROM
>> BAR in pci_std_update_resource() were recently changed/added; the
>> comment refers to a mail from 2005. So no I don't think it is a
>> regression per se.
> 
> Just asking because it affects the urgency.  If we added a regression
> during the v5.11 merge window, we'd try hard to fix it before
> v5.11-final.  But it sounds like the problem has been there a long
> time, so a fix could wait until v5.12.

Yeah sure.

[..]

>> > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>> > that overlaps another BAR, a quirk might be the right fix. But my
>> > guess is the device is working correctly per spec and there's
>> > something wrong in how firmware/Linux is assigning things.  That would
>> > mean we need a more generic fix that's not a quirk and not tied to the
>> > Intel i210.
>> 
>> Agreed, but as you already stated (and I've also found that in the PCI
>> spec) the Expansion ROM address decoder can be shared by the other 
>> BARs
>> and it shouldn't matter as long as the ExpROM BAR is disabled, which 
>> is
>> the case here.
> 
> My point is just that if this could theoretically affect devices other
> than the i210, the fix should not be an i210-specific quirk.
> I'll assume this is a general problem and wait for a generic PCI core
> solution unless it's i210-specific.

I guess the culprit here is that linux skips the programming of the BAR
because of some broken Matrox card. That should have been a quirk 
instead,
right? But I don't know if we want to change that, do we? How many other
cards depend on that?

And still, how do we find out that the i210 is behaving correctly? In
my opinion it is clearly not. You can change the ExpROM BAR value
during runtime and it will start working (while keeping it disabled).
Am I missing something here?

-michael
Bjorn Helgaas Jan. 15, 2021, 11:57 p.m. UTC | #5
On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
> Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
> > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
> > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:

> > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
> > > > that overlaps another BAR, a quirk might be the right fix. But my
> > > > guess is the device is working correctly per spec and there's
> > > > something wrong in how firmware/Linux is assigning things.  That would
> > > > mean we need a more generic fix that's not a quirk and not tied to the
> > > > Intel i210.
> > > 
> > > Agreed, but as you already stated (and I've also found that in
> > > the PCI spec) the Expansion ROM address decoder can be shared by
> > > the other BARs and it shouldn't matter as long as the ExpROM BAR
> > > is disabled, which is the case here.
> > 
> > My point is just that if this could theoretically affect devices
> > other than the i210, the fix should not be an i210-specific quirk.
> > I'll assume this is a general problem and wait for a generic PCI
> > core solution unless it's i210-specific.
> 
> I guess the culprit here is that linux skips the programming of the
> BAR because of some broken Matrox card. That should have been a
> quirk instead, right? But I don't know if we want to change that, do
> we? How many other cards depend on that?

Oh, right.  There's definitely some complicated history there that
makes me a little scared to change things.  But it's also unfortunate
if we have to pile quirks on top of quirks.

> And still, how do we find out that the i210 is behaving correctly?
> In my opinion it is clearly not. You can change the ExpROM BAR value
> during runtime and it will start working (while keeping it
> disabled).  Am I missing something here?

I agree; if the ROM BAR is disabled, I don't think it should matter at
all what it contains, so this does look like an i210 defect.

Would you mind trying the patch below?  It should update the ROM BAR
value even when it is disabled.  With the current pci_enable_rom()
code that doesn't rely on the value read from the BAR, I *think* this
should be safe even on the Matrox and similar devices.

Bjorn


commit 0ca2233eb71f ("PCI: Update ROM BAR even if disabled")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Fri Jan 15 17:17:44 2021 -0600

    PCI: Update ROM BAR even if disabled
    
    Test patch for i210 issue reported by Michael Walle:
    https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc

diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
index 8fc9a4e911e3..fc638034628c 100644
--- a/drivers/pci/rom.c
+++ b/drivers/pci/rom.c
@@ -35,9 +35,8 @@ int pci_enable_rom(struct pci_dev *pdev)
 		return 0;
 
 	/*
-	 * Ideally pci_update_resource() would update the ROM BAR address,
-	 * and we would only set the enable bit here.  But apparently some
-	 * devices have buggy ROM BARs that read as zero when disabled.
+	 * Some ROM BARs read as zero when disabled, so we can't simply
+	 * read the BAR, set the enable bit, and write it back.
 	 */
 	pcibios_resource_to_bus(pdev->bus, &region, res);
 	pci_read_config_dword(pdev, pdev->rom_base_reg, &rom_addr);
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7f1acb3918d0..f69b7d179617 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -69,18 +69,10 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno)
 	if (resno < PCI_ROM_RESOURCE) {
 		reg = PCI_BASE_ADDRESS_0 + 4 * resno;
 	} else if (resno == PCI_ROM_RESOURCE) {
-
-		/*
-		 * Apparently some Matrox devices have ROM BARs that read
-		 * as zero when disabled, so don't update ROM BARs unless
-		 * they're enabled.  See
-		 * https://lore.kernel.org/r/43147B3D.1030309@vc.cvut.cz/
-		 */
-		if (!(res->flags & IORESOURCE_ROM_ENABLE))
-			return;
+		if (res->flags & IORESOURCE_ROM_ENABLE)
+			new |= PCI_ROM_ADDRESS_ENABLE;
 
 		reg = dev->rom_base_reg;
-		new |= PCI_ROM_ADDRESS_ENABLE;
 	} else
 		return;
 
@@ -99,7 +91,8 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno)
 	pci_write_config_dword(dev, reg, new);
 	pci_read_config_dword(dev, reg, &check);
 
-	if ((new ^ check) & mask) {
+	/* Some ROM BARs read as zero when disabled */
+	if (resno != PCI_ROM_RESOURCE && (new ^ check) & mask) {
 		pci_err(dev, "BAR %d: error updating (%#08x != %#08x)\n",
 			resno, new, check);
 	}
Michael Walle Jan. 17, 2021, 7:27 p.m. UTC | #6
Hi Bjorn,

Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
> On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
>> Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
>> > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>> > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
> 
>> > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>> > > > that overlaps another BAR, a quirk might be the right fix. But my
>> > > > guess is the device is working correctly per spec and there's
>> > > > something wrong in how firmware/Linux is assigning things.  That would
>> > > > mean we need a more generic fix that's not a quirk and not tied to the
>> > > > Intel i210.
>> > >
>> > > Agreed, but as you already stated (and I've also found that in
>> > > the PCI spec) the Expansion ROM address decoder can be shared by
>> > > the other BARs and it shouldn't matter as long as the ExpROM BAR
>> > > is disabled, which is the case here.
>> >
>> > My point is just that if this could theoretically affect devices
>> > other than the i210, the fix should not be an i210-specific quirk.
>> > I'll assume this is a general problem and wait for a generic PCI
>> > core solution unless it's i210-specific.
>> 
>> I guess the culprit here is that linux skips the programming of the
>> BAR because of some broken Matrox card. That should have been a
>> quirk instead, right? But I don't know if we want to change that, do
>> we? How many other cards depend on that?
> 
> Oh, right.  There's definitely some complicated history there that
> makes me a little scared to change things.  But it's also unfortunate
> if we have to pile quirks on top of quirks.
> 
>> And still, how do we find out that the i210 is behaving correctly?
>> In my opinion it is clearly not. You can change the ExpROM BAR value
>> during runtime and it will start working (while keeping it
>> disabled).  Am I missing something here?
> 
> I agree; if the ROM BAR is disabled, I don't think it should matter at
> all what it contains, so this does look like an i210 defect.
> 
> Would you mind trying the patch below?  It should update the ROM BAR
> value even when it is disabled.  With the current pci_enable_rom()
> code that doesn't rely on the value read from the BAR, I *think* this
> should be safe even on the Matrox and similar devices.

Your patch will fix my issue:

Tested-by: Michael Walle <michael@walle.cc>

> 
> commit 0ca2233eb71f ("PCI: Update ROM BAR even if disabled")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Fri Jan 15 17:17:44 2021 -0600
> 
>     PCI: Update ROM BAR even if disabled
> 
>     Test patch for i210 issue reported by Michael Walle:
>     https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc
> 
> diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
> index 8fc9a4e911e3..fc638034628c 100644
> --- a/drivers/pci/rom.c
> +++ b/drivers/pci/rom.c
> @@ -35,9 +35,8 @@ int pci_enable_rom(struct pci_dev *pdev)
>  		return 0;
> 
>  	/*
> -	 * Ideally pci_update_resource() would update the ROM BAR address,
> -	 * and we would only set the enable bit here.  But apparently some
> -	 * devices have buggy ROM BARs that read as zero when disabled.
> +	 * Some ROM BARs read as zero when disabled, so we can't simply
> +	 * read the BAR, set the enable bit, and write it back.
>  	 */
>  	pcibios_resource_to_bus(pdev->bus, &region, res);
>  	pci_read_config_dword(pdev, pdev->rom_base_reg, &rom_addr);
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index 7f1acb3918d0..f69b7d179617 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -69,18 +69,10 @@ static void pci_std_update_resource(struct pci_dev
> *dev, int resno)
>  	if (resno < PCI_ROM_RESOURCE) {
>  		reg = PCI_BASE_ADDRESS_0 + 4 * resno;
>  	} else if (resno == PCI_ROM_RESOURCE) {
> -
> -		/*
> -		 * Apparently some Matrox devices have ROM BARs that read
> -		 * as zero when disabled, so don't update ROM BARs unless
> -		 * they're enabled.  See
> -		 * https://lore.kernel.org/r/43147B3D.1030309@vc.cvut.cz/
> -		 */
> -		if (!(res->flags & IORESOURCE_ROM_ENABLE))
> -			return;
> +		if (res->flags & IORESOURCE_ROM_ENABLE)
> +			new |= PCI_ROM_ADDRESS_ENABLE;
> 
>  		reg = dev->rom_base_reg;
> -		new |= PCI_ROM_ADDRESS_ENABLE;
>  	} else
>  		return;
> 
> @@ -99,7 +91,8 @@ static void pci_std_update_resource(struct pci_dev
> *dev, int resno)
>  	pci_write_config_dword(dev, reg, new);
>  	pci_read_config_dword(dev, reg, &check);
> 
> -	if ((new ^ check) & mask) {
> +	/* Some ROM BARs read as zero when disabled */
> +	if (resno != PCI_ROM_RESOURCE && (new ^ check) & mask) {
>  		pci_err(dev, "BAR %d: error updating (%#08x != %#08x)\n",
>  			resno, new, check);
>  	}
Michael Walle Feb. 1, 2021, 7:49 p.m. UTC | #7
Hi Bjorn,

Am 2021-01-17 20:27, schrieb Michael Walle:
> Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
>> On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
>>> Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
>>> > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>>> > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
>> 
>>> > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>>> > > > that overlaps another BAR, a quirk might be the right fix. But my
>>> > > > guess is the device is working correctly per spec and there's
>>> > > > something wrong in how firmware/Linux is assigning things.  That would
>>> > > > mean we need a more generic fix that's not a quirk and not tied to the
>>> > > > Intel i210.
>>> > >
>>> > > Agreed, but as you already stated (and I've also found that in
>>> > > the PCI spec) the Expansion ROM address decoder can be shared by
>>> > > the other BARs and it shouldn't matter as long as the ExpROM BAR
>>> > > is disabled, which is the case here.
>>> >
>>> > My point is just that if this could theoretically affect devices
>>> > other than the i210, the fix should not be an i210-specific quirk.
>>> > I'll assume this is a general problem and wait for a generic PCI
>>> > core solution unless it's i210-specific.
>>> 
>>> I guess the culprit here is that linux skips the programming of the
>>> BAR because of some broken Matrox card. That should have been a
>>> quirk instead, right? But I don't know if we want to change that, do
>>> we? How many other cards depend on that?
>> 
>> Oh, right.  There's definitely some complicated history there that
>> makes me a little scared to change things.  But it's also unfortunate
>> if we have to pile quirks on top of quirks.
>> 
>>> And still, how do we find out that the i210 is behaving correctly?
>>> In my opinion it is clearly not. You can change the ExpROM BAR value
>>> during runtime and it will start working (while keeping it
>>> disabled).  Am I missing something here?
>> 
>> I agree; if the ROM BAR is disabled, I don't think it should matter at
>> all what it contains, so this does look like an i210 defect.
>> 
>> Would you mind trying the patch below?  It should update the ROM BAR
>> value even when it is disabled.  With the current pci_enable_rom()
>> code that doesn't rely on the value read from the BAR, I *think* this
>> should be safe even on the Matrox and similar devices.
> 
> Your patch will fix my issue:
> 
> Tested-by: Michael Walle <michael@walle.cc>

any news on this?

-michael
Bjorn Helgaas Feb. 1, 2021, 10:20 p.m. UTC | #8
On Mon, Feb 01, 2021 at 08:49:16PM +0100, Michael Walle wrote:
> Am 2021-01-17 20:27, schrieb Michael Walle:
> > Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
> > > On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
> > > > Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
> > > > > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
> > > > > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
> > > 
> > > > > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
> > > > > > > that overlaps another BAR, a quirk might be the right fix. But my
> > > > > > > guess is the device is working correctly per spec and there's
> > > > > > > something wrong in how firmware/Linux is assigning things.  That would
> > > > > > > mean we need a more generic fix that's not a quirk and not tied to the
> > > > > > > Intel i210.
> > > > > >
> > > > > > Agreed, but as you already stated (and I've also found that in
> > > > > > the PCI spec) the Expansion ROM address decoder can be shared by
> > > > > > the other BARs and it shouldn't matter as long as the ExpROM BAR
> > > > > > is disabled, which is the case here.
> > > > >
> > > > > My point is just that if this could theoretically affect devices
> > > > > other than the i210, the fix should not be an i210-specific quirk.
> > > > > I'll assume this is a general problem and wait for a generic PCI
> > > > > core solution unless it's i210-specific.
> > > > 
> > > > I guess the culprit here is that linux skips the programming of the
> > > > BAR because of some broken Matrox card. That should have been a
> > > > quirk instead, right? But I don't know if we want to change that, do
> > > > we? How many other cards depend on that?
> > > 
> > > Oh, right.  There's definitely some complicated history there that
> > > makes me a little scared to change things.  But it's also unfortunate
> > > if we have to pile quirks on top of quirks.
> > > 
> > > > And still, how do we find out that the i210 is behaving correctly?
> > > > In my opinion it is clearly not. You can change the ExpROM BAR value
> > > > during runtime and it will start working (while keeping it
> > > > disabled).  Am I missing something here?
> > > 
> > > I agree; if the ROM BAR is disabled, I don't think it should matter at
> > > all what it contains, so this does look like an i210 defect.
> > > 
> > > Would you mind trying the patch below?  It should update the ROM BAR
> > > value even when it is disabled.  With the current pci_enable_rom()
> > > code that doesn't rely on the value read from the BAR, I *think* this
> > > should be safe even on the Matrox and similar devices.
> > 
> > Your patch will fix my issue:
> > 
> > Tested-by: Michael Walle <michael@walle.cc>
> 
> any news on this?

Thanks for the reminder.  I was thinking this morning that I need to
get back to this.  I'm trying to convince myself that doing this
wouldn't break the problem fixed by 755528c860b0 ("Ignore disabled ROM
resources at setup").  So far I haven't quite succeeded.

Bjorn
Michael Walle March 15, 2021, 9:51 p.m. UTC | #9
Am 2021-02-01 23:20, schrieb Bjorn Helgaas:
> On Mon, Feb 01, 2021 at 08:49:16PM +0100, Michael Walle wrote:
>> Am 2021-01-17 20:27, schrieb Michael Walle:
>> > Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
>> > > On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
>> > > > Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
>> > > > > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>> > > > > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
>> > >
>> > > > > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>> > > > > > > that overlaps another BAR, a quirk might be the right fix. But my
>> > > > > > > guess is the device is working correctly per spec and there's
>> > > > > > > something wrong in how firmware/Linux is assigning things.  That would
>> > > > > > > mean we need a more generic fix that's not a quirk and not tied to the
>> > > > > > > Intel i210.
>> > > > > >
>> > > > > > Agreed, but as you already stated (and I've also found that in
>> > > > > > the PCI spec) the Expansion ROM address decoder can be shared by
>> > > > > > the other BARs and it shouldn't matter as long as the ExpROM BAR
>> > > > > > is disabled, which is the case here.
>> > > > >
>> > > > > My point is just that if this could theoretically affect devices
>> > > > > other than the i210, the fix should not be an i210-specific quirk.
>> > > > > I'll assume this is a general problem and wait for a generic PCI
>> > > > > core solution unless it's i210-specific.
>> > > >
>> > > > I guess the culprit here is that linux skips the programming of the
>> > > > BAR because of some broken Matrox card. That should have been a
>> > > > quirk instead, right? But I don't know if we want to change that, do
>> > > > we? How many other cards depend on that?
>> > >
>> > > Oh, right.  There's definitely some complicated history there that
>> > > makes me a little scared to change things.  But it's also unfortunate
>> > > if we have to pile quirks on top of quirks.
>> > >
>> > > > And still, how do we find out that the i210 is behaving correctly?
>> > > > In my opinion it is clearly not. You can change the ExpROM BAR value
>> > > > during runtime and it will start working (while keeping it
>> > > > disabled).  Am I missing something here?
>> > >
>> > > I agree; if the ROM BAR is disabled, I don't think it should matter at
>> > > all what it contains, so this does look like an i210 defect.
>> > >
>> > > Would you mind trying the patch below?  It should update the ROM BAR
>> > > value even when it is disabled.  With the current pci_enable_rom()
>> > > code that doesn't rely on the value read from the BAR, I *think* this
>> > > should be safe even on the Matrox and similar devices.
>> >
>> > Your patch will fix my issue:
>> >
>> > Tested-by: Michael Walle <michael@walle.cc>
>> 
>> any news on this?
> 
> Thanks for the reminder.  I was thinking this morning that I need to
> get back to this.  I'm trying to convince myself that doing this
> wouldn't break the problem fixed by 755528c860b0 ("Ignore disabled ROM
> resources at setup").  So far I haven't quite succeeded.

ping #2 ;)

-michael
Michael Walle Aug. 20, 2021, 3:12 p.m. UTC | #10
Am 2021-03-15 22:51, schrieb Michael Walle:
> Am 2021-02-01 23:20, schrieb Bjorn Helgaas:
>> On Mon, Feb 01, 2021 at 08:49:16PM +0100, Michael Walle wrote:
>>> Am 2021-01-17 20:27, schrieb Michael Walle:
>>> > Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
>>> > > On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
>>> > > > Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
>>> > > > > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>>> > > > > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
>>> > >
>>> > > > > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>>> > > > > > > that overlaps another BAR, a quirk might be the right fix. But my
>>> > > > > > > guess is the device is working correctly per spec and there's
>>> > > > > > > something wrong in how firmware/Linux is assigning things.  That would
>>> > > > > > > mean we need a more generic fix that's not a quirk and not tied to the
>>> > > > > > > Intel i210.
>>> > > > > >
>>> > > > > > Agreed, but as you already stated (and I've also found that in
>>> > > > > > the PCI spec) the Expansion ROM address decoder can be shared by
>>> > > > > > the other BARs and it shouldn't matter as long as the ExpROM BAR
>>> > > > > > is disabled, which is the case here.
>>> > > > >
>>> > > > > My point is just that if this could theoretically affect devices
>>> > > > > other than the i210, the fix should not be an i210-specific quirk.
>>> > > > > I'll assume this is a general problem and wait for a generic PCI
>>> > > > > core solution unless it's i210-specific.
>>> > > >
>>> > > > I guess the culprit here is that linux skips the programming of the
>>> > > > BAR because of some broken Matrox card. That should have been a
>>> > > > quirk instead, right? But I don't know if we want to change that, do
>>> > > > we? How many other cards depend on that?
>>> > >
>>> > > Oh, right.  There's definitely some complicated history there that
>>> > > makes me a little scared to change things.  But it's also unfortunate
>>> > > if we have to pile quirks on top of quirks.
>>> > >
>>> > > > And still, how do we find out that the i210 is behaving correctly?
>>> > > > In my opinion it is clearly not. You can change the ExpROM BAR value
>>> > > > during runtime and it will start working (while keeping it
>>> > > > disabled).  Am I missing something here?
>>> > >
>>> > > I agree; if the ROM BAR is disabled, I don't think it should matter at
>>> > > all what it contains, so this does look like an i210 defect.
>>> > >
>>> > > Would you mind trying the patch below?  It should update the ROM BAR
>>> > > value even when it is disabled.  With the current pci_enable_rom()
>>> > > code that doesn't rely on the value read from the BAR, I *think* this
>>> > > should be safe even on the Matrox and similar devices.
>>> >
>>> > Your patch will fix my issue:
>>> >
>>> > Tested-by: Michael Walle <michael@walle.cc>
>>> 
>>> any news on this?
>> 
>> Thanks for the reminder.  I was thinking this morning that I need to
>> get back to this.  I'm trying to convince myself that doing this
>> wouldn't break the problem fixed by 755528c860b0 ("Ignore disabled ROM
>> resources at setup").  So far I haven't quite succeeded.
> 
> ping #2 ;)

ping #3, soon we can celebrate our first one year anniversary :p

-michael
Michael Walle Dec. 20, 2021, 5:43 p.m. UTC | #11
Am 2021-08-20 17:12, schrieb Michael Walle:
> Am 2021-03-15 22:51, schrieb Michael Walle:
>> Am 2021-02-01 23:20, schrieb Bjorn Helgaas:
>>> On Mon, Feb 01, 2021 at 08:49:16PM +0100, Michael Walle wrote:
>>>> Am 2021-01-17 20:27, schrieb Michael Walle:
>>>> > Am 2021-01-16 00:57, schrieb Bjorn Helgaas:
>>>> > > On Wed, Jan 13, 2021 at 12:32:32AM +0100, Michael Walle wrote:
>>>> > > > Am 2021-01-12 23:58, schrieb Bjorn Helgaas:
>>>> > > > > On Sat, Jan 09, 2021 at 07:31:46PM +0100, Michael Walle wrote:
>>>> > > > > > Am 2021-01-08 22:20, schrieb Bjorn Helgaas:
>>>> > >
>>>> > > > > > > 3) If the Intel i210 is defective in how it handles an Expansion ROM
>>>> > > > > > > that overlaps another BAR, a quirk might be the right fix. But my
>>>> > > > > > > guess is the device is working correctly per spec and there's
>>>> > > > > > > something wrong in how firmware/Linux is assigning things.  That would
>>>> > > > > > > mean we need a more generic fix that's not a quirk and not tied to the
>>>> > > > > > > Intel i210.
>>>> > > > > >
>>>> > > > > > Agreed, but as you already stated (and I've also found that in
>>>> > > > > > the PCI spec) the Expansion ROM address decoder can be shared by
>>>> > > > > > the other BARs and it shouldn't matter as long as the ExpROM BAR
>>>> > > > > > is disabled, which is the case here.
>>>> > > > >
>>>> > > > > My point is just that if this could theoretically affect devices
>>>> > > > > other than the i210, the fix should not be an i210-specific quirk.
>>>> > > > > I'll assume this is a general problem and wait for a generic PCI
>>>> > > > > core solution unless it's i210-specific.
>>>> > > >
>>>> > > > I guess the culprit here is that linux skips the programming of the
>>>> > > > BAR because of some broken Matrox card. That should have been a
>>>> > > > quirk instead, right? But I don't know if we want to change that, do
>>>> > > > we? How many other cards depend on that?
>>>> > >
>>>> > > Oh, right.  There's definitely some complicated history there that
>>>> > > makes me a little scared to change things.  But it's also unfortunate
>>>> > > if we have to pile quirks on top of quirks.
>>>> > >
>>>> > > > And still, how do we find out that the i210 is behaving correctly?
>>>> > > > In my opinion it is clearly not. You can change the ExpROM BAR value
>>>> > > > during runtime and it will start working (while keeping it
>>>> > > > disabled).  Am I missing something here?
>>>> > >
>>>> > > I agree; if the ROM BAR is disabled, I don't think it should matter at
>>>> > > all what it contains, so this does look like an i210 defect.
>>>> > >
>>>> > > Would you mind trying the patch below?  It should update the ROM BAR
>>>> > > value even when it is disabled.  With the current pci_enable_rom()
>>>> > > code that doesn't rely on the value read from the BAR, I *think* this
>>>> > > should be safe even on the Matrox and similar devices.
>>>> >
>>>> > Your patch will fix my issue:
>>>> >
>>>> > Tested-by: Michael Walle <michael@walle.cc>
>>>> 
>>>> any news on this?
>>> 
>>> Thanks for the reminder.  I was thinking this morning that I need to
>>> get back to this.  I'm trying to convince myself that doing this
>>> wouldn't break the problem fixed by 755528c860b0 ("Ignore disabled 
>>> ROM
>>> resources at setup").  So far I haven't quite succeeded.
>> 
>> ping #2 ;)
> 
> ping #3, soon we can celebrate our first one year anniversary :p

ping #4

In a few days this is a year old. Please have a look at it and either 
add
my quirk patch or apply your patch. This is still breaking i210 on
my board.

TBH, this is really frustrating.

-michael
Bjorn Helgaas Dec. 21, 2021, 5:48 p.m. UTC | #12
[+to Jesse, Tony for Intel advice;
beginning of thread: https://lore.kernel.org/all/20201230185317.30915-1-michael@walle.cc/]

On Mon, Dec 20, 2021 at 06:43:03PM +0100, Michael Walle wrote:
> ...
> ping #4
> 
> In a few days this is a year old. Please have a look at it and
> either add my quirk patch or apply your patch. This is still
> breaking i210 on my board.
> 
> TBH, this is really frustrating.

You are right to be frustrated.  I'm very sorry that I have dropped
the ball on this.  Thanks for reminding me *again*.

I think we agree that this looks like an I210 defect.  I210 should
ignore the ROM BAR contents unless PCI_ROM_ADDRESS_ENABLE is set.  It
would be great if an Intel person could confirm/deny this and supply
an erratum reference and verify the affected device IDs.

It seems that when the BARs are programmed like this:

  BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
  BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
  ROM:   0x40200000 (disabled) [size=1M]

networking doesn't work at all and the transmit queue times out.

Linux assigns non-overlapping address space to the ROM BAR, but
pci_std_update_resource() currently doesn't update the BAR itself
unless it is enabled.

My proposal [1] worked around the defect by always updating the BAR,
but there's no clue that this covers up the I210 issue, so it remains
as sort of a land mine.  A future change could re-expose the problem,
so I don't think this was a good approach.

Your original patch [2] makes it clear that it's an issue with I210,
but there's an implicit connection between the normal BAR update path
(which skips the actual BAR write) and the quirk that does the BAR
write:

  <enumeration resource assignment>
    ...
      pci_assign_resource
        pci_update_resource
          pci_std_update_resource
            if (ROM && ROM-disabled)
              return
            pci_write_config_dword      # ROM BAR update (skipped)

  pci_fixup_write_rom_bar               # final fixup
    pci_write_config_dword              # ROM BAR update

In the boot-time resource assignment path, this works fine, but if
pci_assign_resource() is called from pci_map_rom(), the fixup will not
happen, so we could still have problem.

If we tweaked pci_std_update_resource() to take account of this
defect, I think we could cover that path, too.

Can you try the patch below?

[1] https://lore.kernel.org/all/20210115235721.GA1862880@bjorn-Precision-5520/
[2] https://lore.kernel.org/all/20201230185317.30915-1-michael@walle.cc/


commit 021481cfa576 ("PCI: Work around Intel I210 ROM BAR overlap defect")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Dec 21 10:45:07 2021 -0600

    PCI: Work around Intel I210 ROM BAR overlap defect
    
    Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its
    Expansion ROM unless both the Memory Space Enable and the Expansion ROM
    Enable bit are set.  But apparently some Intel I210 NICs don't work
    correctly if the ROM BAR overlaps another BAR, even if the Expansion ROM is
    disabled.
    
    Michael reported that on a Kontron SMARC-sAL28 ARM64 system with U-Boot
    v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't work at
    all:
    
      BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
      BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
      ROM:   0x40200000 (disabled) [size=1M]
    
      NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
      Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
      igb 0002:01:00.0 enP2p1s0: Reset adapter
    
    Previously, pci_std_update_resource() wrote the assigned ROM address to the
    BAR only when the ROM was enabled.  This meant that the I210 ROM BAR could
    be left with an address assigned by firmware, which might overlap with
    other BARs.
    
    Quirk these I210 devices so we always write the assigned address to the ROM
    BAR, whether or not the ROM is enabled.
    
    Link: https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105
    Reported-by: Michael Walle <michael@walle.cc>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 003950c738d2..c14ddbd9146d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5857,3 +5857,13 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
 	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
+
+static void intel_i210_rom_defect(struct pci_dev *dev)
+{
+	pci_info(dev, "working around ROM BAR overlap defect\n");
+	dev->rom_bar_overlap = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1533, intel_i210_rom_defect);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1536, intel_i210_rom_defect);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1537, intel_i210_rom_defect);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1538, intel_i210_rom_defect);
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7f1acb3918d0..439ac5f5907a 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -75,12 +75,16 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno)
 		 * as zero when disabled, so don't update ROM BARs unless
 		 * they're enabled.  See
 		 * https://lore.kernel.org/r/43147B3D.1030309@vc.cvut.cz/
+		 * But we must update ROM BAR for buggy devices where even a
+		 * disabled ROM can conflict with other BARs.
 		 */
-		if (!(res->flags & IORESOURCE_ROM_ENABLE))
+		if (!(res->flags & IORESOURCE_ROM_ENABLE) &&
+		    !dev->rom_bar_overlap)
 			return;
 
 		reg = dev->rom_base_reg;
-		new |= PCI_ROM_ADDRESS_ENABLE;
+		if (res->flags & IORESOURCE_ROM_ENABLE)
+			new |= PCI_ROM_ADDRESS_ENABLE;
 	} else
 		return;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 18a75c8e615c..51c4a063f489 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -455,6 +455,7 @@ struct pci_dev {
 	unsigned int	link_active_reporting:1;/* Device capable of reporting link active */
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	unsigned int	no_command_memory:1;	/* No PCI_COMMAND_MEMORY */
+	unsigned int	rom_bar_overlap:1;	/* ROM BAR disable broken */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
Michael Walle Dec. 23, 2021, 9:27 a.m. UTC | #13
Hi Bjorn,

Am 2021-12-21 18:48, schrieb Bjorn Helgaas:
> [+to Jesse, Tony for Intel advice;
> beginning of thread:
> https://lore.kernel.org/all/20201230185317.30915-1-michael@walle.cc/]
> 
> On Mon, Dec 20, 2021 at 06:43:03PM +0100, Michael Walle wrote:
>> ...
>> ping #4
>> 
>> In a few days this is a year old. Please have a look at it and
>> either add my quirk patch or apply your patch. This is still
>> breaking i210 on my board.
>> 
>> TBH, this is really frustrating.
> 
> You are right to be frustrated.  I'm very sorry that I have dropped
> the ball on this.  Thanks for reminding me *again*.
> 
> I think we agree that this looks like an I210 defect.  I210 should
> ignore the ROM BAR contents unless PCI_ROM_ADDRESS_ENABLE is set.  It
> would be great if an Intel person could confirm/deny this and supply
> an erratum reference and verify the affected device IDs.
> 
> It seems that when the BARs are programmed like this:
> 
>   BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
>   BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
>   ROM:   0x40200000 (disabled) [size=1M]
> 
> networking doesn't work at all and the transmit queue times out.
> 
> Linux assigns non-overlapping address space to the ROM BAR, but
> pci_std_update_resource() currently doesn't update the BAR itself
> unless it is enabled.
> 
> My proposal [1] worked around the defect by always updating the BAR,
> but there's no clue that this covers up the I210 issue, so it remains
> as sort of a land mine.  A future change could re-expose the problem,
> so I don't think this was a good approach.
> 
> Your original patch [2] makes it clear that it's an issue with I210,
> but there's an implicit connection between the normal BAR update path
> (which skips the actual BAR write) and the quirk that does the BAR
> write:
> 
>   <enumeration resource assignment>
>     ...
>       pci_assign_resource
>         pci_update_resource
>           pci_std_update_resource
>             if (ROM && ROM-disabled)
>               return
>             pci_write_config_dword      # ROM BAR update (skipped)
> 
>   pci_fixup_write_rom_bar               # final fixup
>     pci_write_config_dword              # ROM BAR update
> 
> In the boot-time resource assignment path, this works fine, but if
> pci_assign_resource() is called from pci_map_rom(), the fixup will not
> happen, so we could still have problem.

I'm not sure I follow. pci_map_rom() should work fine. That was also
the workaround IIRC, that is, enable the rom via sysfs. pci_map_rom()
will call pci_enable_rom(), which should be the same as the fixup.
If memory serves correctly, that is where I shamelessly copied the
code from ;)

btw. the problem is not in pci_assign_resource() which assigns the
correct offsets, but that they are not written to the PCI card.
Eg. see lspci:

# lspci -s 2:1:0 -v
0002:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
Connection (rev 03)
	Subsystem: Hewlett-Packard Company Ethernet I210-T1 GbE NIC
	Flags: bus master, fast devsel, latency 0, IRQ 34, IOMMU group 8
	Memory at 8840000000 (32-bit, non-prefetchable) [size=1M]
	Memory at 8840200000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at 8840100000 [disabled] [size=1M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-de-ad-ff-ff-be-ef-04
	Capabilities: [1a0] Transaction Processing Hints
	Kernel driver in use: igb
lspci: Unable to load libkmod resources: error -2

# lspci -s 2:1:0 -xx
0002:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
Connection (rev 03)
00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
30: 00 00 20 40 40 00 00 00 00 00 00 00 22 01 00 00

Note the difference between "Expansion ROM at 8840100000" (assigned
by pci_assign_resource() I guess) and the actual value at offset
0x30: 0x40200000. The latter will be updated either by pci_enable_rom()
or my pci fixup quirk.

> If we tweaked pci_std_update_resource() to take account of this
> defect, I think we could cover that path, too.
> 
> Can you try the patch below?

I tried, but it doesn't work because the fixup function is called
after pci_std_update_resource(), thus dev->rom_bar_overlap is still
0.

Thanks,
-michael
Bjorn Helgaas Dec. 23, 2021, 4:37 p.m. UTC | #14
On Thu, Dec 23, 2021 at 10:27:15AM +0100, Michael Walle wrote:
> Am 2021-12-21 18:48, schrieb Bjorn Helgaas:
> > [+to Jesse, Tony for Intel advice;
> > beginning of thread:
> > https://lore.kernel.org/all/20201230185317.30915-1-michael@walle.cc/]
> > 
> > On Mon, Dec 20, 2021 at 06:43:03PM +0100, Michael Walle wrote:
> > > ...
> > > ping #4
> > > 
> > > In a few days this is a year old. Please have a look at it and
> > > either add my quirk patch or apply your patch. This is still
> > > breaking i210 on my board.
> > > 
> > > TBH, this is really frustrating.
> > 
> > You are right to be frustrated.  I'm very sorry that I have dropped
> > the ball on this.  Thanks for reminding me *again*.
> > 
> > I think we agree that this looks like an I210 defect.  I210 should
> > ignore the ROM BAR contents unless PCI_ROM_ADDRESS_ENABLE is set.  It
> > would be great if an Intel person could confirm/deny this and supply
> > an erratum reference and verify the affected device IDs.
> > 
> > It seems that when the BARs are programmed like this:
> > 
> >   BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
> >   BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
> >   ROM:   0x40200000 (disabled) [size=1M]
> > 
> > networking doesn't work at all and the transmit queue times out.
> > 
> > Linux assigns non-overlapping address space to the ROM BAR, but
> > pci_std_update_resource() currently doesn't update the BAR itself
> > unless it is enabled.
> > 
> > My proposal [1] worked around the defect by always updating the BAR,
> > but there's no clue that this covers up the I210 issue, so it remains
> > as sort of a land mine.  A future change could re-expose the problem,
> > so I don't think this was a good approach.
> > 
> > Your original patch [2] makes it clear that it's an issue with I210,
> > but there's an implicit connection between the normal BAR update path
> > (which skips the actual BAR write) and the quirk that does the BAR
> > write:
> > 
> >   <enumeration resource assignment>
> >     ...
> >       pci_assign_resource
> >         pci_update_resource
> >           pci_std_update_resource
> >             if (ROM && ROM-disabled)
> >               return
> >             pci_write_config_dword      # ROM BAR update (skipped)
> > 
> >   pci_fixup_write_rom_bar               # final fixup
> >     pci_write_config_dword              # ROM BAR update
> > 
> > In the boot-time resource assignment path, this works fine, but if
> > pci_assign_resource() is called from pci_map_rom(), the fixup will not
> > happen, so we could still have problem.
> 
> I'm not sure I follow. pci_map_rom() should work fine. That was also
> the workaround IIRC, that is, enable the rom via sysfs. pci_map_rom()
> will call pci_enable_rom(), which should be the same as the fixup.
> If memory serves correctly, that is where I shamelessly copied the
> code from ;)

Ah, you're right, I missed the fact that pci_enable_rom() would do the
write.

> btw. the problem is not in pci_assign_resource() which assigns the
> correct offsets, but that they are not written to the PCI card.
> Eg. see lspci:
> 
> # lspci -s 2:1:0 -v
> 0002:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> Connection (rev 03)
> 	Subsystem: Hewlett-Packard Company Ethernet I210-T1 GbE NIC
> 	Flags: bus master, fast devsel, latency 0, IRQ 34, IOMMU group 8
> 	Memory at 8840000000 (32-bit, non-prefetchable) [size=1M]
> 	Memory at 8840200000 (32-bit, non-prefetchable) [size=16K]
> 	Expansion ROM at 8840100000 [disabled] [size=1M]
> 	Capabilities: [40] Power Management version 3
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> 	Capabilities: [a0] Express Endpoint, MSI 00
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [140] Device Serial Number 00-de-ad-ff-ff-be-ef-04
> 	Capabilities: [1a0] Transaction Processing Hints
> 	Kernel driver in use: igb
> lspci: Unable to load libkmod resources: error -2
> 
> # lspci -s 2:1:0 -xx
> 0002:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> Connection (rev 03)
> 00: 86 80 33 15 06 04 10 00 03 00 00 02 08 00 00 00
> 10: 00 00 00 40 00 00 00 00 00 00 00 00 00 00 20 40
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 03 00
> 30: 00 00 20 40 40 00 00 00 00 00 00 00 22 01 00 00
> 
> Note the difference between "Expansion ROM at 8840100000" (assigned
> by pci_assign_resource() I guess) and the actual value at offset
> 0x30: 0x40200000. The latter will be updated either by pci_enable_rom()
> or my pci fixup quirk.

Yes, I understand that.  I think it's better to do the update in
pci_std_update_resource() instead of relying on a future fixup or
pci_enable_rom().  There's no explicit connection between the
assignment and the BAR write, so it's fragile.

> > Can you try the patch below?
> 
> I tried, but it doesn't work because the fixup function is called
> after pci_std_update_resource(), thus dev->rom_bar_overlap is still
> 0.

I intended to change the quirk from FINAL to EARLY, but obviously
forgot.  Here's the updated version:

commit bb5639b73a2d ("PCI: Work around Intel I210 ROM BAR overlap defect")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Dec 21 10:45:07 2021 -0600

    PCI: Work around Intel I210 ROM BAR overlap defect
    
    Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its
    Expansion ROM unless both the Memory Space Enable and the Expansion ROM
    Enable bit are set.  But apparently some Intel I210 NICs don't work
    correctly if the ROM BAR overlaps another BAR, even if the Expansion ROM is
    disabled.
    
    Michael reported that on a Kontron SMARC-sAL28 ARM64 system with U-Boot
    v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't work at
    all:
    
      BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
      BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
      ROM:   0x40200000 (disabled) [size=1M]
    
      NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
      Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
      igb 0002:01:00.0 enP2p1s0: Reset adapter
    
    Previously, pci_std_update_resource() wrote the assigned ROM address to the
    BAR only when the ROM was enabled.  This meant that the I210 ROM BAR could
    be left with an address assigned by firmware, which might overlap with
    other BARs.
    
    Quirk these I210 devices so pci_std_update_resource() always writes the
    assigned address to the ROM BAR, whether or not the ROM is enabled.
    
    Link: https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105
    Reported-by: Michael Walle <michael@walle.cc>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 003950c738d2..46ff04091fa3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5857,3 +5857,13 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
 	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
+
+static void rom_bar_overlap_defect(struct pci_dev *dev)
+{
+	pci_info(dev, "working around ROM BAR overlap defect\n");
+	dev->rom_bar_overlap = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1533, rom_bar_overlap_defect);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7f1acb3918d0..439ac5f5907a 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -75,12 +75,16 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno)
 		 * as zero when disabled, so don't update ROM BARs unless
 		 * they're enabled.  See
 		 * https://lore.kernel.org/r/43147B3D.1030309@vc.cvut.cz/
+		 * But we must update ROM BAR for buggy devices where even a
+		 * disabled ROM can conflict with other BARs.
 		 */
-		if (!(res->flags & IORESOURCE_ROM_ENABLE))
+		if (!(res->flags & IORESOURCE_ROM_ENABLE) &&
+		    !dev->rom_bar_overlap)
 			return;
 
 		reg = dev->rom_base_reg;
-		new |= PCI_ROM_ADDRESS_ENABLE;
+		if (res->flags & IORESOURCE_ROM_ENABLE)
+			new |= PCI_ROM_ADDRESS_ENABLE;
 	} else
 		return;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 18a75c8e615c..51c4a063f489 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -455,6 +455,7 @@ struct pci_dev {
 	unsigned int	link_active_reporting:1;/* Device capable of reporting link active */
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	unsigned int	no_command_memory:1;	/* No PCI_COMMAND_MEMORY */
+	unsigned int	rom_bar_overlap:1;	/* ROM BAR disable broken */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
Michael Walle Dec. 23, 2021, 6:12 p.m. UTC | #15
Am 2021-12-23 17:37, schrieb Bjorn Helgaas:

> I intended to change the quirk from FINAL to EARLY, but obviously
> forgot.  Here's the updated version:
> 
> commit bb5639b73a2d ("PCI: Work around Intel I210 ROM BAR overlap 
> defect")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Tue Dec 21 10:45:07 2021 -0600
> 
>     PCI: Work around Intel I210 ROM BAR overlap defect
> 
>     Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its
>     Expansion ROM unless both the Memory Space Enable and the Expansion 
> ROM
>     Enable bit are set.  But apparently some Intel I210 NICs don't work
>     correctly if the ROM BAR overlaps another BAR, even if the 
> Expansion ROM is
>     disabled.
> 
>     Michael reported that on a Kontron SMARC-sAL28 ARM64 system with 
> U-Boot
>     v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't 
> work at
>     all:
> 
>       BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
>       BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
>       ROM:   0x40200000 (disabled) [size=1M]
> 
>       NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
>       Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval
> 2.0 carrier (DT)
>       igb 0002:01:00.0 enP2p1s0: Reset adapter
> 
>     Previously, pci_std_update_resource() wrote the assigned ROM 
> address to the
>     BAR only when the ROM was enabled.  This meant that the I210 ROM 
> BAR could
>     be left with an address assigned by firmware, which might overlap 
> with
>     other BARs.
> 
>     Quirk these I210 devices so pci_std_update_resource() always writes 
> the
>     assigned address to the ROM BAR, whether or not the ROM is enabled.
> 
>     Link: 
> https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105
>     Reported-by: Michael Walle <michael@walle.cc>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Tested-by: Michael Walle <michael@walle.cc>

Thanks,
-michael
Bjorn Helgaas Jan. 12, 2022, 2:50 p.m. UTC | #16
On Thu, Dec 23, 2021 at 07:12:02PM +0100, Michael Walle wrote:
> Am 2021-12-23 17:37, schrieb Bjorn Helgaas:
> 
> > I intended to change the quirk from FINAL to EARLY, but obviously
> > forgot.  Here's the updated version:
> > 
> > commit bb5639b73a2d ("PCI: Work around Intel I210 ROM BAR overlap
> > defect")
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Tue Dec 21 10:45:07 2021 -0600
> > 
> >     PCI: Work around Intel I210 ROM BAR overlap defect
> > 
> >     Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its
> >     Expansion ROM unless both the Memory Space Enable and the Expansion
> > ROM
> >     Enable bit are set.  But apparently some Intel I210 NICs don't work
> >     correctly if the ROM BAR overlaps another BAR, even if the Expansion
> > ROM is
> >     disabled.
> > 
> >     Michael reported that on a Kontron SMARC-sAL28 ARM64 system with
> > U-Boot
> >     v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't
> > work at
> >     all:
> > 
> >       BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
> >       BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
> >       ROM:   0x40200000 (disabled) [size=1M]
> > 
> >       NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
> >       Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval
> > 2.0 carrier (DT)
> >       igb 0002:01:00.0 enP2p1s0: Reset adapter
> > 
> >     Previously, pci_std_update_resource() wrote the assigned ROM address
> > to the
> >     BAR only when the ROM was enabled.  This meant that the I210 ROM BAR
> > could
> >     be left with an address assigned by firmware, which might overlap
> > with
> >     other BARs.
> > 
> >     Quirk these I210 devices so pci_std_update_resource() always writes
> > the
> >     assigned address to the ROM BAR, whether or not the ROM is enabled.
> > 
> >     Link:
> > https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc
> >     Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105
> >     Reported-by: Michael Walle <michael@walle.cc>
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Tested-by: Michael Walle <michael@walle.cc>

Applied to pci/resource for v5.17, thanks!
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..a1a904ed5a10 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5612,3 +5612,39 @@  static void apex_pci_fixup_class(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_CLASS_HEADER(0x1ac1, 0x089a,
 			       PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
+
+/*
+ * Some devices don't work if the Expansion ROM has the same base address as
+ * one of the other BARs although it is disabled.
+ * This might happen if the bootloader/BIOS enumerates the BARs in a different
+ * way than linux. If the Expansion ROM is disabled, linux deliberately skips
+ * writing the ROM BAR if the BAR is not enabled because of some broken
+ * devices, see pci_std_update_resource(). Thus, the ROM BAR of the device will
+ * still contain the value assigned by the booloader, which might be the same
+ * value as one of the other BARs then.
+ *
+ * As a workaround, update the Expansion ROM BAR even if the Expansion ROM is
+ * disabled.
+ */
+static void pci_fixup_write_rom_bar(struct pci_dev *dev)
+{
+	struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
+	struct pci_bus_region region;
+	u32 rom_addr;
+
+	pci_read_config_dword(dev, dev->rom_base_reg, &rom_addr);
+
+	if (rom_addr & PCI_ROM_ADDRESS_ENABLE)
+		return;
+
+	pci_info(dev, "Writing Expansion ROM BAR to avoid overlapping\n");
+
+	pcibios_resource_to_bus(dev->bus, &region, res);
+	rom_addr &= ~PCI_ROM_ADDRESS_MASK;
+	rom_addr |= region.start;
+	pci_write_config_dword(dev, dev->rom_base_reg, rom_addr);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1533, pci_fixup_write_rom_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1536, pci_fixup_write_rom_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1537, pci_fixup_write_rom_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1538, pci_fixup_write_rom_bar);