diff mbox

Intel I350 mini-PCIe card (igb) on Mirabox (mvebu / Armada 370)

Message ID alpine.DEB.2.10.1404072239050.32332@vroombuntu (mailing list archive)
State New, archived
Headers show

Commit Message

Neil Greatorex April 7, 2014, 9:58 p.m. UTC
Jason, Thomas, Willy,

On Mon, 7 Apr 2014, Jason Gunthorpe wrote:

>>> HOWEVER, looking now very closely:
>>>
>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
>>>    Memory behind bridge: e0000000-e02fffff
>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
>>>    Memory behind bridge: e0300000-e03fffff
>>>
>>> This is certainly wrong, MBUS requires special alignment and sizing.
>>> 0x300000 is not a size which is a power of two, and the next window
>>> starts right after.
>
>> Interesting. Does the PCI code provide a way to specify that the sizes
>> much be a power of 2? I don't fully understand the implications but
>> would it be possible to assign just one MBUS window for the whole of
>> the PCIe memory instead?
>
> I know this has come up before, but I don't recall where things
> settled out.. mvebu_pcie_align_resource is the function to look at,
> the size at that point needs to be rounded up to a power of two and
> communicated back to the caller.
>
> Alternately, I belive Thomas once discussed using multiple mbus
> windows for these non-aligned requests.
>
>>> Just to confirm, what does something like the below say for you guys?
>>
>> See https://gist.github.com/ngreatorex/10025253 for the dmesg output.
>> I have also included the contents of
>> /sys/kernel/debug/mvebu-mbus/devices both before and after the
>> modprobe / oops. As you can see I get a total of 3 WARNINGs - one at
>> boot for the xHCI controller, and two when inserting igb.ko. Note that
>> this time I did this with both ports enabled.
>
> Yep, that is certainly the root problem - most likely for
> everyone. This also explains why Will saw success when he reverted
> that unrelated patch. That change altered the allocation pattern of
> the BARs and it just so happened to make things fall out properly.
>

I have finally managed to get the card working on both ports! Of course, 
to do so I have added some nice kludges to the code that now need to be 
implemented properly, but it is verification of what the problem is and 
how to fix it!

I have included the patch at the end of this e-mail. It probably won't 
apply cleanly for you as I have other dev_dbg calls in pci-mvebu.c.

What I did was to alter mvebu_pcie_align_resource to make the bridge 
memory resource aligned to 4M. This had the effect that the 2nd bridge to 
the xHCI controller was bumped to address 0xe0400000 instead of 
0xe0300000. I then also made it so that when we request the MBUS window to 
be set up we ensure that the size is a power of 2. This has the effect of 
creating the windows and addresses how we want them:

Relevant part of lspci -vvv:

00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) 
(prog-if 00 [Normal decode])
         Memory behind bridge: e0000000-e02fffff

00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) 
(prog-if 00 [Normal decode])
         Memory behind bridge: e0400000-e04fffff

cat /sys/kernel/debug/mvebu-mbus/devices:

[00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 
0000000000010000)
[01] disabled
[02] disabled
[03] disabled
[04] disabled
[05] disabled
[06] disabled
[07] disabled
[08] 00000000fff00000 - 0000000100000000 : 0001:00e0
[09] 00000000e0400000 - 00000000e0500000 : 0008:00e8
[10] 00000000e0000000 - 00000000e0400000 : 0004:00e8
[11] disabled
[12] disabled
[13] disabled
[14] disabled
[15] disabled
[16] disabled
[17] disabled
[18] disabled
[19] disabled

Now, over to the experts to implement this properly :-)

Thanks to Jason, Thomas and Willy for your help with tracking this down.

Cheers,
Neil


From e68870135cd54609d0421746ccdc1cb58ebbd4dd Mon Sep 17 00:00:00 2001
From: Neil Greatorex <neil@fatboyfat.co.uk>
Date: Mon, 7 Apr 2014 22:33:22 +0100
Subject: [PATCH] pci: host: mvebu - Test fix for Mirabox PCI issues

Ensure that bridge MBUS window is aligned at 4M to bump up 2nd bridge to address
e0400000 instead of e0300000. Also ensure that when we request the MBUS window,
it's size is a power of 2.
---
  drivers/pci/host/pci-mvebu.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Willy Tarreau April 8, 2014, 6:28 a.m. UTC | #1
Hi Neil,

On Mon, Apr 07, 2014 at 10:58:36PM +0100, Neil Greatorex wrote:
> I have finally managed to get the card working on both ports! Of course, 
> to do so I have added some nice kludges to the code that now need to be 
> implemented properly, but it is verification of what the problem is and 
> how to fix it!
> 
> I have included the patch at the end of this e-mail. It probably won't 
> apply cleanly for you as I have other dev_dbg calls in pci-mvebu.c.
> 
> What I did was to alter mvebu_pcie_align_resource to make the bridge 
> memory resource aligned to 4M. This had the effect that the 2nd bridge to 
> the xHCI controller was bumped to address 0xe0400000 instead of 
> 0xe0300000. I then also made it so that when we request the MBUS window to 
> be set up we ensure that the size is a power of 2. This has the effect of 
> creating the windows and addresses how we want them:
> 
> Relevant part of lspci -vvv:
> 
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) 
> (prog-if 00 [Normal decode])
>         Memory behind bridge: e0000000-e02fffff
> 
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) 
> (prog-if 00 [Normal decode])
>         Memory behind bridge: e0400000-e04fffff
> 
> cat /sys/kernel/debug/mvebu-mbus/devices:
> 
> [00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 
> 0000000000010000)
> [01] disabled
> [02] disabled
> [03] disabled
> [04] disabled
> [05] disabled
> [06] disabled
> [07] disabled
> [08] 00000000fff00000 - 0000000100000000 : 0001:00e0
> [09] 00000000e0400000 - 00000000e0500000 : 0008:00e8
> [10] 00000000e0000000 - 00000000e0400000 : 0004:00e8
> [11] disabled
> [12] disabled
> [13] disabled
> [14] disabled
> [15] disabled
> [16] disabled
> [17] disabled
> [18] disabled
> [19] disabled
> 
> Now, over to the experts to implement this properly :-)
> 
> Thanks to Jason, Thomas and Willy for your help with tracking this down.

Well, on the XPGP board, it made some progress, but now I'm getting
another crash related to IRQs again when both ports are enabled (note
that I do have your other MSI fix). However, enabling only the second
port works now, so I guess it's just an IRQ assignment issue which is
killing it.

Here's what the bus looks like with your patch :

root@xpgp:~# lspci -tvnn
-[0000:00]-+-01.0-[01]--
           +-09.0-[02]--+-00.0  Intel Corporation Device [8086:1521]
           |            \-00.1  Intel Corporation Device [8086:1521]
           \-0a.0-[03]--

root@xpgp:~# lspci -vvv | egrep -i '(^0|memory)'
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode])
        Memory behind bridge: fff00000-000fffff
        Prefetchable memory behind bridge: 00000000-000fffff
00:09.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode])
        Memory behind bridge: e0000000-e02fffff
        Prefetchable memory behind bridge: 00000000-000fffff
00:0a.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode])
        Memory behind bridge: fff00000-000fffff
        Prefetchable memory behind bridge: 00000000-000fffff
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Region 0: Memory at e0000000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Region 3: Memory at e0200000 (32-bit, non-prefetchable) [disabled] [size=16K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Region 0: Memory at e0100000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Region 3: Memory at e0204000 (32-bit, non-prefetchable) [disabled] [size=16K]

I don't know if it's normal to see bridges 00:01.0 and 00:0a.0 overlap
their areas or not. Maybe it's just because they're not configured.
The second bridge seems to correctly cover the IGB's regions though.
Also noteworthy, I get the exact same output when leaving SZ_1M instead
of SZ_4M in your patch. Thus I think that the real part of the fix is this
one :

	if (!is_power_of_2(port->memwin_size))
		port->memwin_size = 1 << fls(port->memwin_size);

BTW, this could be simplified this way (which also happens to be more
readable) which I could verify also works :

	port->memwin_size = roundup_pow_of_two(port->memwin_size);
	
Concerning the panic with the two ports enabled, I suspect that it's again
an issue related to the way IRQs are registered and rolled back in case of
error.

Before the patch :
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0400018
Internal error: : 1008 [#1] SMP THUMB2
Modules linked in: igb(+) i2c_algo_bit
CPU: 1 PID: 1250 Comm: modprobe Not tainted 3.14.0-mvebu #6
task: c74b0e40 ti: c751c000 task.ti: c751c000
PC is at igb_get_invariants_82575+0x75/0x894 [igb]
LR is at igb_probe+0x22a/0xb80 [igb]
...

After the patch :
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1266 at kernel/irq/irqdomain.c:277 irq_domain_associate+0xb9/0x110()
error: hwirq 0xffffffe4 is too large for armada_370_xp_msi_irq
Modules linked in: igb(+) i2c_algo_bit
CPU: 0 PID: 1266 Comm: modprobe Not tainted 3.14.0-mvebu #4
[<c0011c39>] (unwind_backtrace) from [<c000f20b>] (show_stack+0xb/0xc)
[<c000f20b>] (show_stack) from [<c02b5cbb>] (dump_stack+0x4f/0x64)
[<c02b5cbb>] (dump_stack) from [<c001a145>] (warn_slowpath_common+0x49/0x68)
[<c001a145>] (warn_slowpath_common) from [<c001a1bd>] (warn_slowpath_fmt+0x1d/0x28)
[<c001a1bd>] (warn_slowpath_fmt) from [<c0044e81>] (irq_domain_associate+0xb9/0x110)
[<c0044e81>] (irq_domain_associate) from [<c0044f1d>] (irq_create_mapping+0x45/0xa0)
[<c0044f1d>] (irq_create_mapping) from [<c016fd2d>] (armada_370_xp_setup_msi_irq+0x35/0x80)
[<c016fd2d>] (armada_370_xp_setup_msi_irq) from [<c0185243>] (arch_setup_msi_irq+0x17/0x2c)
[<c0185243>] (arch_setup_msi_irq) from [<c018530d>] (arch_setup_msi_irqs+0x39/0x4c)
[<c018530d>] (arch_setup_msi_irqs) from [<c01858bd>] (pci_enable_msix+0x195/0x2b0)
[<c01858bd>] (pci_enable_msix) from [<bf80617b>] (igb_msix_other+0x8de/0xb44 [igb])
[<bf80617b>] (igb_msix_other [igb]) from [<bf806dff>] (igb_probe+0x37a/0xb80 [igb])
[<bf806dff>] (igb_probe [igb]) from [<c017d185>] (pci_device_probe+0x45/0x6c)
...
Unable to handle kernel NULL pointer dereference at virtual address 00000024
pgd = ed9a0000
[00000024] *pgd=074b3831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP THUMB2
Modules linked in: igb(+) i2c_algo_bit
CPU: 0 PID: 1266 Comm: modprobe Tainted: G        W    3.14.0-mvebu #4
task: ed97aec0 ti: c75be000 task.ti: c75be000
PC is at igb_set_mac+0x5d/0x164 [igb]
LR is at igb_set_mac+0xaa/0x164 [igb]
pc : [<bf80489e>]    lr : [<bf8048eb>]    psr: 200f0033
sp : c75bfce8  ip : 00000000  fp : ec938898
r10: bf816950  r9 : 00000001  r8 : ec938440
r7 : edadc868  r6 : 00000008  r5 : ec938440  r4 : 00000006
r3 : 00000000  r2 : 80000000  r1 : ec93845c  r0 : ec938440
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
Control: 50c53c7d  Table: 2d9a006a  DAC: 00000015
Process modprobe (pid: 1266, stack limit = 0xc75be240)
...
[<bf80489e>] (igb_set_mac [igb]) from [<bf8048eb>] (igb_set_mac+0xaa/0x164 [igb])
[<bf8048eb>] (igb_set_mac [igb]) from [<bf806183>] (igb_msix_other+0x8e6/0xb44 [igb])
[<bf806183>] (igb_msix_other [igb]) from [<bf806dff>] (igb_probe+0x37a/0xb80 [igb])
[<bf806dff>] (igb_probe [igb]) from [<c017d185>] (pci_device_probe+0x45/0x6c)

So we had :

     igb_probe()
         igb_msix_other()
             pci_enable_msix()  => Warning
             igb_set_mac()      => Panic

Cheers,
Willy
Willy Tarreau April 8, 2014, 6:40 a.m. UTC | #2
Oh and BTW, it also fixed another issue I had loading myri10ge on this
board :

Before the patch :

root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 00000000-000fffff
00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 00000000-000fffff
00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Memory behind bridge: e0000000-e17fffff
        Prefetchable memory behind bridge: 00000000-000fffff
03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
        Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]

root@xpgp:~# modprobe myri10ge
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x4 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: dummy rdma enable failed
myri10ge 0000:03:00.0: command 44 timed out, result = -1
myri10ge 0000:03:00.0: command 12 timed out, result = -1
myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE
myri10ge 0000:03:00.0: failed to load firmware
myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225

After the patch, the lspci output is *exactly* the same but it works
much better :

root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 00000000-000fffff
00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 00000000-000fffff
00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
        Memory behind bridge: e0000000-e17fffff
        Prefetchable memory behind bridge: 00000000-000fffff
03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
        Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]

root@xpgp:~# modprobe myri10ge
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x1 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled
root@xpgp:~#

Thus I guess that the timeout I was seeing above during the modprobe was
caused by the incorrect mbus window.

So +1 on this part of the patch here :

        port->memwin_size = roundup_pow_of_two(port->memwin_size);

Cheers,
Willy
Matthew Minter April 8, 2014, 10:53 a.m. UTC | #3
Just to confirm, this patch is stopping my kernel panic also, I am
still suffering from another minor issue however this does seem to be
the crux of my problem also. I can however happily say that I regard
my hotplug issue closed.

Many thanks guys

On 8 April 2014 07:40, Willy Tarreau <w@1wt.eu> wrote:
> Oh and BTW, it also fixed another issue I had loading myri10ge on this
> board :
>
> Before the patch :
>
> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Prefetchable memory behind bridge: 00000000-000fffff
> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Prefetchable memory behind bridge: 00000000-000fffff
> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Memory behind bridge: e0000000-e17fffff
>         Prefetchable memory behind bridge: 00000000-000fffff
> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
>         Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
>         Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
>
> root@xpgp:~# modprobe myri10ge
> myri10ge: Version 1.5.3-1.534
> PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
> myri10ge 0000:03:00.0: PCIE x4 Link
> myri10ge 0000:03:00.0: Direct firmware load failed with error -2
> myri10ge 0000:03:00.0: Falling back to user helper
> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
> myri10ge 0000:03:00.0: hotplug firmware loading failed
> myri10ge 0000:03:00.0: Successfully adopted running firmware
> myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
> myri10ge 0000:03:00.0: performance consider loading optimized firmware
> myri10ge 0000:03:00.0: via hotplug
> myri10ge 0000:03:00.0: dummy rdma enable failed
> myri10ge 0000:03:00.0: command 44 timed out, result = -1
> myri10ge 0000:03:00.0: command 12 timed out, result = -1
> myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE
> myri10ge 0000:03:00.0: failed to load firmware
> myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225
>
> After the patch, the lspci output is *exactly* the same but it works
> much better :
>
> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Prefetchable memory behind bridge: 00000000-000fffff
> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Prefetchable memory behind bridge: 00000000-000fffff
> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>         Memory behind bridge: e0000000-e17fffff
>         Prefetchable memory behind bridge: 00000000-000fffff
> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
>         Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
>         Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
>
> root@xpgp:~# modprobe myri10ge
> myri10ge: Version 1.5.3-1.534
> PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
> myri10ge 0000:03:00.0: PCIE x1 Link
> myri10ge 0000:03:00.0: Direct firmware load failed with error -2
> myri10ge 0000:03:00.0: Falling back to user helper
> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
> myri10ge 0000:03:00.0: hotplug firmware loading failed
> myri10ge 0000:03:00.0: Successfully adopted running firmware
> myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
> myri10ge 0000:03:00.0: performance consider loading optimized firmware
> myri10ge 0000:03:00.0: via hotplug
> myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled
> root@xpgp:~#
>
> Thus I guess that the timeout I was seeing above during the modprobe was
> caused by the incorrect mbus window.
>
> So +1 on this part of the patch here :
>
>         port->memwin_size = roundup_pow_of_two(port->memwin_size);
>
> Cheers,
> Willy
>
Matthew Minter April 8, 2014, 12:31 p.m. UTC | #4
To add to my previous point, I would consider releasing these patches
as soon as possible as it seems that it can easily cause panics on a
range of boards. Failing that perhaps it is at least worth releasing a
hotfix which will cause PCI probing to fail should the window not be a
power of two? Thus it will fail early with a warning instead of
potentially causing a kernel crash.

Best regards,
Matt

On 8 April 2014 11:53, Matthew Minter <matthew_minter@xyratex.com> wrote:
> Just to confirm, this patch is stopping my kernel panic also, I am
> still suffering from another minor issue however this does seem to be
> the crux of my problem also. I can however happily say that I regard
> my hotplug issue closed.
>
> Many thanks guys
>
> On 8 April 2014 07:40, Willy Tarreau <w@1wt.eu> wrote:
>> Oh and BTW, it also fixed another issue I had loading myri10ge on this
>> board :
>>
>> Before the patch :
>>
>> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
>> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Memory behind bridge: e0000000-e17fffff
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
>>         Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
>>         Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
>>
>> root@xpgp:~# modprobe myri10ge
>> myri10ge: Version 1.5.3-1.534
>> PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
>> myri10ge 0000:03:00.0: PCIE x4 Link
>> myri10ge 0000:03:00.0: Direct firmware load failed with error -2
>> myri10ge 0000:03:00.0: Falling back to user helper
>> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
>> myri10ge 0000:03:00.0: hotplug firmware loading failed
>> myri10ge 0000:03:00.0: Successfully adopted running firmware
>> myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
>> myri10ge 0000:03:00.0: performance consider loading optimized firmware
>> myri10ge 0000:03:00.0: via hotplug
>> myri10ge 0000:03:00.0: dummy rdma enable failed
>> myri10ge 0000:03:00.0: command 44 timed out, result = -1
>> myri10ge 0000:03:00.0: command 12 timed out, result = -1
>> myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE
>> myri10ge 0000:03:00.0: failed to load firmware
>> myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225
>>
>> After the patch, the lspci output is *exactly* the same but it works
>> much better :
>>
>> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory'
>> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode])
>>         Memory behind bridge: e0000000-e17fffff
>>         Prefetchable memory behind bridge: 00000000-000fffff
>> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008]
>>         Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M]
>>         Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
>>
>> root@xpgp:~# modprobe myri10ge
>> myri10ge: Version 1.5.3-1.534
>> PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
>> myri10ge 0000:03:00.0: PCIE x1 Link
>> myri10ge 0000:03:00.0: Direct firmware load failed with error -2
>> myri10ge 0000:03:00.0: Falling back to user helper
>> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
>> myri10ge 0000:03:00.0: hotplug firmware loading failed
>> myri10ge 0000:03:00.0: Successfully adopted running firmware
>> myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
>> myri10ge 0000:03:00.0: performance consider loading optimized firmware
>> myri10ge 0000:03:00.0: via hotplug
>> myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled
>> root@xpgp:~#
>>
>> Thus I guess that the timeout I was seeing above during the modprobe was
>> caused by the incorrect mbus window.
>>
>> So +1 on this part of the patch here :
>>
>>         port->memwin_size = roundup_pow_of_two(port->memwin_size);
>>
>> Cheers,
>> Willy
>>
Thomas Petazzoni April 8, 2014, 2:43 p.m. UTC | #5
Willy, Neil, Matthew,

On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote:

> On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote:
> > To add to my previous point, I would consider releasing these patches
> > as soon as possible as it seems that it can easily cause panics on a
> > range of boards. Failing that perhaps it is at least worth releasing a
> > hotfix which will cause PCI probing to fail should the window not be a
> > power of two? Thus it will fail early with a warning instead of
> > potentially causing a kernel crash.
> 
> Have you tested the whole patch or just the one enforcing the power of 2 ?
> It would be interesting to know if the one with SZ_4M is needed for you or
> not (I guess not).
> 
> I suspect that just this single-liner will work as well, as it does for me.
> 
> Thanks,
> Willy
> 
> 
> diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
> index 0e79665..552ab73 100644
> --- a/drivers/pci/host/pci-mvebu.c
> +++ b/drivers/pci/host/pci-mvebu.c
> @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
>  	port->memwin_size  =
>  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
>  		port->memwin_base;
> +	port->memwin_size = roundup_pow_of_two(port->memwin_size);
>  
>  	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
>  				    port->memwin_base, port->memwin_size);

Thanks to all of you for the investigation. So if I summarize your
findings, we have two patches needed to fix the problems of everybody:

 1) The bug fix for the MSI teardown function in irq-armada-370-xp.c.
    This one is easy, I'll test it right now, and give my formal
    Acked-by soon.

 2) The problem of non-power of 2 sized windows. This one is more
    complicated, as we cannot simply round up the size of the windows
    inside the pci-mvebu.c driver: the Linux PCI core is not aware of
    this rounding, and might therefore allocate a BAR for the next
    device at an address that overlaps the previous window we have
    enlarged to match the power of two size requirement. This is
    something that was already discussed with the PCI maintainers, but
    the discussion needs to be revived I guess.

Am I correct, or are other patches needed, or are remaining problems
not solved by these two fixes?

Thomas
Matthew Minter April 8, 2014, 2:52 p.m. UTC | #6
That is indeed correct. Those two patches correct the behaviour fully
on my board. I am by no means an expert on this, but could problem 2
be fixed using resource_alignment?  I am a little fuzzy about how
calls to it work but I know the sysfs entry can force the alignment/
size of PCI resources so presumably the kernel already has this
capability? My board is unfortunately unable to diagnose any possible
issues regarding possibly overlapping device windows as it physically
only allows 1 PCIe device at once.

Again, many thanks with your help, that has indeed fixed it for me.

On 8 April 2014 15:43, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Willy, Neil, Matthew,
>
> On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote:
>
>> On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote:
>> > To add to my previous point, I would consider releasing these patches
>> > as soon as possible as it seems that it can easily cause panics on a
>> > range of boards. Failing that perhaps it is at least worth releasing a
>> > hotfix which will cause PCI probing to fail should the window not be a
>> > power of two? Thus it will fail early with a warning instead of
>> > potentially causing a kernel crash.
>>
>> Have you tested the whole patch or just the one enforcing the power of 2 ?
>> It would be interesting to know if the one with SZ_4M is needed for you or
>> not (I guess not).
>>
>> I suspect that just this single-liner will work as well, as it does for me.
>>
>> Thanks,
>> Willy
>>
>>
>> diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
>> index 0e79665..552ab73 100644
>> --- a/drivers/pci/host/pci-mvebu.c
>> +++ b/drivers/pci/host/pci-mvebu.c
>> @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
>>       port->memwin_size  =
>>               (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
>>               port->memwin_base;
>> +     port->memwin_size = roundup_pow_of_two(port->memwin_size);
>>
>>       mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
>>                                   port->memwin_base, port->memwin_size);
>
> Thanks to all of you for the investigation. So if I summarize your
> findings, we have two patches needed to fix the problems of everybody:
>
>  1) The bug fix for the MSI teardown function in irq-armada-370-xp.c.
>     This one is easy, I'll test it right now, and give my formal
>     Acked-by soon.
>
>  2) The problem of non-power of 2 sized windows. This one is more
>     complicated, as we cannot simply round up the size of the windows
>     inside the pci-mvebu.c driver: the Linux PCI core is not aware of
>     this rounding, and might therefore allocate a BAR for the next
>     device at an address that overlaps the previous window we have
>     enlarged to match the power of two size requirement. This is
>     something that was already discussed with the PCI maintainers, but
>     the discussion needs to be revived I guess.
>
> Am I correct, or are other patches needed, or are remaining problems
> not solved by these two fixes?
>
> Thomas
> --
> Thomas Petazzoni, CTO, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com
Willy Tarreau April 8, 2014, 2:53 p.m. UTC | #7
Hi Thomas,

On Tue, Apr 08, 2014 at 04:43:17PM +0200, Thomas Petazzoni wrote:
> Willy, Neil, Matthew,
> 
> On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote:
> 
> > On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote:
> > > To add to my previous point, I would consider releasing these patches
> > > as soon as possible as it seems that it can easily cause panics on a
> > > range of boards. Failing that perhaps it is at least worth releasing a
> > > hotfix which will cause PCI probing to fail should the window not be a
> > > power of two? Thus it will fail early with a warning instead of
> > > potentially causing a kernel crash.
> > 
> > Have you tested the whole patch or just the one enforcing the power of 2 ?
> > It would be interesting to know if the one with SZ_4M is needed for you or
> > not (I guess not).
> > 
> > I suspect that just this single-liner will work as well, as it does for me.
> > 
> > Thanks,
> > Willy
> > 
> > 
> > diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
> > index 0e79665..552ab73 100644
> > --- a/drivers/pci/host/pci-mvebu.c
> > +++ b/drivers/pci/host/pci-mvebu.c
> > @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
> >  	port->memwin_size  =
> >  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
> >  		port->memwin_base;
> > +	port->memwin_size = roundup_pow_of_two(port->memwin_size);
> >  
> >  	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
> >  				    port->memwin_base, port->memwin_size);
> 
> Thanks to all of you for the investigation. So if I summarize your
> findings, we have two patches needed to fix the problems of everybody:
> 
>  1) The bug fix for the MSI teardown function in irq-armada-370-xp.c.
>     This one is easy, I'll test it right now, and give my formal
>     Acked-by soon.
> 
>  2) The problem of non-power of 2 sized windows. This one is more
>     complicated, as we cannot simply round up the size of the windows
>     inside the pci-mvebu.c driver: the Linux PCI core is not aware of
>     this rounding, and might therefore allocate a BAR for the next
>     device at an address that overlaps the previous window we have
>     enlarged to match the power of two size requirement. This is
>     something that was already discussed with the PCI maintainers, but
>     the discussion needs to be revived I guess.
> 
> Am I correct, or are other patches needed, or are remaining problems
> not solved by these two fixes?

There's still the older issue I got with trying to load a 2-port based
igb card on the xp-gp, the code says that hwirq 0xffffffe4 is invalid
and cannot be enabled and after that it panics. I recall having debugged
this one a few months ago, and finding something like ffffffe4 == -ENOSPC
which was returned by one of the inner functions. Let's put that aside
for a moment though, but I would appreciate it if you find the time to
try your igb NIC on your XPGP in 3.14 to see if you get the same result.

Thanks,
Willy
Thomas Petazzoni April 8, 2014, 3:25 p.m. UTC | #8
Dear Willy Tarreau,

On Tue, 8 Apr 2014 16:53:44 +0200, Willy Tarreau wrote:

> > Am I correct, or are other patches needed, or are remaining problems
> > not solved by these two fixes?
> 
> There's still the older issue I got with trying to load a 2-port based
> igb card on the xp-gp, the code says that hwirq 0xffffffe4 is invalid
> and cannot be enabled and after that it panics. I recall having debugged
> this one a few months ago, and finding something like ffffffe4 == -ENOSPC
> which was returned by one of the inner functions. Let's put that aside
> for a moment though, but I would appreciate it if you find the time to
> try your igb NIC on your XPGP in 3.14 to see if you get the same result.

See my second e-mail: this is due to improperly casting a signed value
into a unsigned value.

I currently have the igb NIC in my Armada XP DB, and I've reported a
panic :-)

Thomas
Willy Tarreau April 8, 2014, 5:56 p.m. UTC | #9
On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote:
> To add to my previous point, I would consider releasing these patches
> as soon as possible as it seems that it can easily cause panics on a
> range of boards.

On the other hand, it's not the random user who stuffs PCIe cards on
such boards PCIe slots, so in general there's a long testing period.
We identified the igb issue in mid-december without getting completely
down on it, so we remained 3 months without any other single report
about this. Thus I think that the only persons really interested in
having these patches are in this thread, so the emergency is not that
high :-)

Cheers,
Willy
diff mbox

Patch

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index afd0dce..5e0144c 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -364,6 +364,9 @@  static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
  		port->memwin_base;

+	if (!is_power_of_2(port->memwin_size))
+		port->memwin_size = 1 << fls(port->memwin_size);
+
  	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
  				    port->memwin_base, port->memwin_size);
  }
@@ -728,7 +731,7 @@  static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
  	if (res->flags & IORESOURCE_IO)
  		return round_up(start, max_t(resource_size_t, SZ_64K, size));
  	else if (res->flags & IORESOURCE_MEM)
-		return round_up(start, max_t(resource_size_t, SZ_1M, size));
+		return round_up(start, max_t(resource_size_t, SZ_4M, size));
  	else
  		return start;
  }