Message ID | alpine.DEB.2.10.1404072239050.32332@vroombuntu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Neil, On Mon, Apr 07, 2014 at 10:58:36PM +0100, Neil Greatorex wrote: > I have finally managed to get the card working on both ports! Of course, > to do so I have added some nice kludges to the code that now need to be > implemented properly, but it is verification of what the problem is and > how to fix it! > > I have included the patch at the end of this e-mail. It probably won't > apply cleanly for you as I have other dev_dbg calls in pci-mvebu.c. > > What I did was to alter mvebu_pcie_align_resource to make the bridge > memory resource aligned to 4M. This had the effect that the 2nd bridge to > the xHCI controller was bumped to address 0xe0400000 instead of > 0xe0300000. I then also made it so that when we request the MBUS window to > be set up we ensure that the size is a power of 2. This has the effect of > creating the windows and addresses how we want them: > > Relevant part of lspci -vvv: > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) > (prog-if 00 [Normal decode]) > Memory behind bridge: e0000000-e02fffff > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) > (prog-if 00 [Normal decode]) > Memory behind bridge: e0400000-e04fffff > > cat /sys/kernel/debug/mvebu-mbus/devices: > > [00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap > 0000000000010000) > [01] disabled > [02] disabled > [03] disabled > [04] disabled > [05] disabled > [06] disabled > [07] disabled > [08] 00000000fff00000 - 0000000100000000 : 0001:00e0 > [09] 00000000e0400000 - 00000000e0500000 : 0008:00e8 > [10] 00000000e0000000 - 00000000e0400000 : 0004:00e8 > [11] disabled > [12] disabled > [13] disabled > [14] disabled > [15] disabled > [16] disabled > [17] disabled > [18] disabled > [19] disabled > > Now, over to the experts to implement this properly :-) > > Thanks to Jason, Thomas and Willy for your help with tracking this down. Well, on the XPGP board, it made some progress, but now I'm getting another crash related to IRQs again when both ports are enabled (note that I do have your other MSI fix). However, enabling only the second port works now, so I guess it's just an IRQ assignment issue which is killing it. Here's what the bus looks like with your patch : root@xpgp:~# lspci -tvnn -[0000:00]-+-01.0-[01]-- +-09.0-[02]--+-00.0 Intel Corporation Device [8086:1521] | \-00.1 Intel Corporation Device [8086:1521] \-0a.0-[03]-- root@xpgp:~# lspci -vvv | egrep -i '(^0|memory)' 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode]) Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000-000fffff 00:09.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode]) Memory behind bridge: e0000000-e02fffff Prefetchable memory behind bridge: 00000000-000fffff 00:0a.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (rev 02) (prog-if 00 [Normal decode]) Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000-000fffff 02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01) Region 0: Memory at e0000000 (32-bit, non-prefetchable) [disabled] [size=512K] Region 3: Memory at e0200000 (32-bit, non-prefetchable) [disabled] [size=16K] 02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01) Region 0: Memory at e0100000 (32-bit, non-prefetchable) [disabled] [size=512K] Region 3: Memory at e0204000 (32-bit, non-prefetchable) [disabled] [size=16K] I don't know if it's normal to see bridges 00:01.0 and 00:0a.0 overlap their areas or not. Maybe it's just because they're not configured. The second bridge seems to correctly cover the IGB's regions though. Also noteworthy, I get the exact same output when leaving SZ_1M instead of SZ_4M in your patch. Thus I think that the real part of the fix is this one : if (!is_power_of_2(port->memwin_size)) port->memwin_size = 1 << fls(port->memwin_size); BTW, this could be simplified this way (which also happens to be more readable) which I could verify also works : port->memwin_size = roundup_pow_of_two(port->memwin_size); Concerning the panic with the two ports enabled, I suspect that it's again an issue related to the way IRQs are registered and rolled back in case of error. Before the patch : PCI: enabling device 0000:02:00.1 (0140 -> 0142) Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0400018 Internal error: : 1008 [#1] SMP THUMB2 Modules linked in: igb(+) i2c_algo_bit CPU: 1 PID: 1250 Comm: modprobe Not tainted 3.14.0-mvebu #6 task: c74b0e40 ti: c751c000 task.ti: c751c000 PC is at igb_get_invariants_82575+0x75/0x894 [igb] LR is at igb_probe+0x22a/0xb80 [igb] ... After the patch : PCI: enabling device 0000:02:00.1 (0140 -> 0142) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1266 at kernel/irq/irqdomain.c:277 irq_domain_associate+0xb9/0x110() error: hwirq 0xffffffe4 is too large for armada_370_xp_msi_irq Modules linked in: igb(+) i2c_algo_bit CPU: 0 PID: 1266 Comm: modprobe Not tainted 3.14.0-mvebu #4 [<c0011c39>] (unwind_backtrace) from [<c000f20b>] (show_stack+0xb/0xc) [<c000f20b>] (show_stack) from [<c02b5cbb>] (dump_stack+0x4f/0x64) [<c02b5cbb>] (dump_stack) from [<c001a145>] (warn_slowpath_common+0x49/0x68) [<c001a145>] (warn_slowpath_common) from [<c001a1bd>] (warn_slowpath_fmt+0x1d/0x28) [<c001a1bd>] (warn_slowpath_fmt) from [<c0044e81>] (irq_domain_associate+0xb9/0x110) [<c0044e81>] (irq_domain_associate) from [<c0044f1d>] (irq_create_mapping+0x45/0xa0) [<c0044f1d>] (irq_create_mapping) from [<c016fd2d>] (armada_370_xp_setup_msi_irq+0x35/0x80) [<c016fd2d>] (armada_370_xp_setup_msi_irq) from [<c0185243>] (arch_setup_msi_irq+0x17/0x2c) [<c0185243>] (arch_setup_msi_irq) from [<c018530d>] (arch_setup_msi_irqs+0x39/0x4c) [<c018530d>] (arch_setup_msi_irqs) from [<c01858bd>] (pci_enable_msix+0x195/0x2b0) [<c01858bd>] (pci_enable_msix) from [<bf80617b>] (igb_msix_other+0x8de/0xb44 [igb]) [<bf80617b>] (igb_msix_other [igb]) from [<bf806dff>] (igb_probe+0x37a/0xb80 [igb]) [<bf806dff>] (igb_probe [igb]) from [<c017d185>] (pci_device_probe+0x45/0x6c) ... Unable to handle kernel NULL pointer dereference at virtual address 00000024 pgd = ed9a0000 [00000024] *pgd=074b3831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] SMP THUMB2 Modules linked in: igb(+) i2c_algo_bit CPU: 0 PID: 1266 Comm: modprobe Tainted: G W 3.14.0-mvebu #4 task: ed97aec0 ti: c75be000 task.ti: c75be000 PC is at igb_set_mac+0x5d/0x164 [igb] LR is at igb_set_mac+0xaa/0x164 [igb] pc : [<bf80489e>] lr : [<bf8048eb>] psr: 200f0033 sp : c75bfce8 ip : 00000000 fp : ec938898 r10: bf816950 r9 : 00000001 r8 : ec938440 r7 : edadc868 r6 : 00000008 r5 : ec938440 r4 : 00000006 r3 : 00000000 r2 : 80000000 r1 : ec93845c r0 : ec938440 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 50c53c7d Table: 2d9a006a DAC: 00000015 Process modprobe (pid: 1266, stack limit = 0xc75be240) ... [<bf80489e>] (igb_set_mac [igb]) from [<bf8048eb>] (igb_set_mac+0xaa/0x164 [igb]) [<bf8048eb>] (igb_set_mac [igb]) from [<bf806183>] (igb_msix_other+0x8e6/0xb44 [igb]) [<bf806183>] (igb_msix_other [igb]) from [<bf806dff>] (igb_probe+0x37a/0xb80 [igb]) [<bf806dff>] (igb_probe [igb]) from [<c017d185>] (pci_device_probe+0x45/0x6c) So we had : igb_probe() igb_msix_other() pci_enable_msix() => Warning igb_set_mac() => Panic Cheers, Willy
Oh and BTW, it also fixed another issue I had loading myri10ge on this board : Before the patch : root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Prefetchable memory behind bridge: 00000000-000fffff 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Prefetchable memory behind bridge: 00000000-000fffff 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Memory behind bridge: e0000000-e17fffff Prefetchable memory behind bridge: 00000000-000fffff 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] root@xpgp:~# modprobe myri10ge myri10ge: Version 1.5.3-1.534 PCI: enabling device 0000:00:0a.0 (0140 -> 0143) myri10ge 0000:03:00.0: PCIE x4 Link myri10ge 0000:03:00.0: Direct firmware load failed with error -2 myri10ge 0000:03:00.0: Falling back to user helper myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug myri10ge 0000:03:00.0: hotplug firmware loading failed myri10ge 0000:03:00.0: Successfully adopted running firmware myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal myri10ge 0000:03:00.0: performance consider loading optimized firmware myri10ge 0000:03:00.0: via hotplug myri10ge 0000:03:00.0: dummy rdma enable failed myri10ge 0000:03:00.0: command 44 timed out, result = -1 myri10ge 0000:03:00.0: command 12 timed out, result = -1 myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE myri10ge 0000:03:00.0: failed to load firmware myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225 After the patch, the lspci output is *exactly* the same but it works much better : root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Prefetchable memory behind bridge: 00000000-000fffff 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Prefetchable memory behind bridge: 00000000-000fffff 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) Memory behind bridge: e0000000-e17fffff Prefetchable memory behind bridge: 00000000-000fffff 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] root@xpgp:~# modprobe myri10ge myri10ge: Version 1.5.3-1.534 PCI: enabling device 0000:00:0a.0 (0140 -> 0143) myri10ge 0000:03:00.0: PCIE x1 Link myri10ge 0000:03:00.0: Direct firmware load failed with error -2 myri10ge 0000:03:00.0: Falling back to user helper myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug myri10ge 0000:03:00.0: hotplug firmware loading failed myri10ge 0000:03:00.0: Successfully adopted running firmware myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal myri10ge 0000:03:00.0: performance consider loading optimized firmware myri10ge 0000:03:00.0: via hotplug myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled root@xpgp:~# Thus I guess that the timeout I was seeing above during the modprobe was caused by the incorrect mbus window. So +1 on this part of the patch here : port->memwin_size = roundup_pow_of_two(port->memwin_size); Cheers, Willy
Just to confirm, this patch is stopping my kernel panic also, I am still suffering from another minor issue however this does seem to be the crux of my problem also. I can however happily say that I regard my hotplug issue closed. Many thanks guys On 8 April 2014 07:40, Willy Tarreau <w@1wt.eu> wrote: > Oh and BTW, it also fixed another issue I had loading myri10ge on this > board : > > Before the patch : > > root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' > 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Prefetchable memory behind bridge: 00000000-000fffff > 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Prefetchable memory behind bridge: 00000000-000fffff > 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Memory behind bridge: e0000000-e17fffff > Prefetchable memory behind bridge: 00000000-000fffff > 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] > Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] > Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] > > root@xpgp:~# modprobe myri10ge > myri10ge: Version 1.5.3-1.534 > PCI: enabling device 0000:00:0a.0 (0140 -> 0143) > myri10ge 0000:03:00.0: PCIE x4 Link > myri10ge 0000:03:00.0: Direct firmware load failed with error -2 > myri10ge 0000:03:00.0: Falling back to user helper > myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug > myri10ge 0000:03:00.0: hotplug firmware loading failed > myri10ge 0000:03:00.0: Successfully adopted running firmware > myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal > myri10ge 0000:03:00.0: performance consider loading optimized firmware > myri10ge 0000:03:00.0: via hotplug > myri10ge 0000:03:00.0: dummy rdma enable failed > myri10ge 0000:03:00.0: command 44 timed out, result = -1 > myri10ge 0000:03:00.0: command 12 timed out, result = -1 > myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE > myri10ge 0000:03:00.0: failed to load firmware > myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225 > > After the patch, the lspci output is *exactly* the same but it works > much better : > > root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' > 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Prefetchable memory behind bridge: 00000000-000fffff > 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Prefetchable memory behind bridge: 00000000-000fffff > 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) > Memory behind bridge: e0000000-e17fffff > Prefetchable memory behind bridge: 00000000-000fffff > 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] > Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] > Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] > > root@xpgp:~# modprobe myri10ge > myri10ge: Version 1.5.3-1.534 > PCI: enabling device 0000:00:0a.0 (0140 -> 0143) > myri10ge 0000:03:00.0: PCIE x1 Link > myri10ge 0000:03:00.0: Direct firmware load failed with error -2 > myri10ge 0000:03:00.0: Falling back to user helper > myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug > myri10ge 0000:03:00.0: hotplug firmware loading failed > myri10ge 0000:03:00.0: Successfully adopted running firmware > myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal > myri10ge 0000:03:00.0: performance consider loading optimized firmware > myri10ge 0000:03:00.0: via hotplug > myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled > root@xpgp:~# > > Thus I guess that the timeout I was seeing above during the modprobe was > caused by the incorrect mbus window. > > So +1 on this part of the patch here : > > port->memwin_size = roundup_pow_of_two(port->memwin_size); > > Cheers, > Willy >
To add to my previous point, I would consider releasing these patches as soon as possible as it seems that it can easily cause panics on a range of boards. Failing that perhaps it is at least worth releasing a hotfix which will cause PCI probing to fail should the window not be a power of two? Thus it will fail early with a warning instead of potentially causing a kernel crash. Best regards, Matt On 8 April 2014 11:53, Matthew Minter <matthew_minter@xyratex.com> wrote: > Just to confirm, this patch is stopping my kernel panic also, I am > still suffering from another minor issue however this does seem to be > the crux of my problem also. I can however happily say that I regard > my hotplug issue closed. > > Many thanks guys > > On 8 April 2014 07:40, Willy Tarreau <w@1wt.eu> wrote: >> Oh and BTW, it also fixed another issue I had loading myri10ge on this >> board : >> >> Before the patch : >> >> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Prefetchable memory behind bridge: 00000000-000fffff >> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Prefetchable memory behind bridge: 00000000-000fffff >> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Memory behind bridge: e0000000-e17fffff >> Prefetchable memory behind bridge: 00000000-000fffff >> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] >> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] >> Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] >> >> root@xpgp:~# modprobe myri10ge >> myri10ge: Version 1.5.3-1.534 >> PCI: enabling device 0000:00:0a.0 (0140 -> 0143) >> myri10ge 0000:03:00.0: PCIE x4 Link >> myri10ge 0000:03:00.0: Direct firmware load failed with error -2 >> myri10ge 0000:03:00.0: Falling back to user helper >> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug >> myri10ge 0000:03:00.0: hotplug firmware loading failed >> myri10ge 0000:03:00.0: Successfully adopted running firmware >> myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal >> myri10ge 0000:03:00.0: performance consider loading optimized firmware >> myri10ge 0000:03:00.0: via hotplug >> myri10ge 0000:03:00.0: dummy rdma enable failed >> myri10ge 0000:03:00.0: command 44 timed out, result = -1 >> myri10ge 0000:03:00.0: command 12 timed out, result = -1 >> myri10ge 0000:03:00.0: failed MXGEFW_CMD_GET_RX_RING_SIZE >> myri10ge 0000:03:00.0: failed to load firmware >> myri10ge 0000:03:00.0: myri10ge_probe() failed: MAC=00:60:dd:47:63:e1, SN=320225 >> >> After the patch, the lspci output is *exactly* the same but it works >> much better : >> >> root@xpgp:~# lspci -vvnn |egrep -i '^0|memory' >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Prefetchable memory behind bridge: 00000000-000fffff >> 00:09.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Prefetchable memory behind bridge: 00000000-000fffff >> 00:0a.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (rev 02) (prog-if 00 [Normal decode]) >> Memory behind bridge: e0000000-e17fffff >> Prefetchable memory behind bridge: 00000000-000fffff >> 03:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] >> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=16M] >> Region 2: Memory at e1000000 (64-bit, non-prefetchable) [size=1M] >> >> root@xpgp:~# modprobe myri10ge >> myri10ge: Version 1.5.3-1.534 >> PCI: enabling device 0000:00:0a.0 (0140 -> 0143) >> myri10ge 0000:03:00.0: PCIE x1 Link >> myri10ge 0000:03:00.0: Direct firmware load failed with error -2 >> myri10ge 0000:03:00.0: Falling back to user helper >> myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug >> myri10ge 0000:03:00.0: hotplug firmware loading failed >> myri10ge 0000:03:00.0: Successfully adopted running firmware >> myri10ge 0000:03:00.0: Using firmware currently running on NIC. For optimal >> myri10ge 0000:03:00.0: performance consider loading optimized firmware >> myri10ge 0000:03:00.0: via hotplug >> myri10ge 0000:03:00.0: MSI IRQ 112, tx bndry 2048, fw adopted, WC Disabled >> root@xpgp:~# >> >> Thus I guess that the timeout I was seeing above during the modprobe was >> caused by the incorrect mbus window. >> >> So +1 on this part of the patch here : >> >> port->memwin_size = roundup_pow_of_two(port->memwin_size); >> >> Cheers, >> Willy >>
Willy, Neil, Matthew, On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote: > On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote: > > To add to my previous point, I would consider releasing these patches > > as soon as possible as it seems that it can easily cause panics on a > > range of boards. Failing that perhaps it is at least worth releasing a > > hotfix which will cause PCI probing to fail should the window not be a > > power of two? Thus it will fail early with a warning instead of > > potentially causing a kernel crash. > > Have you tested the whole patch or just the one enforcing the power of 2 ? > It would be interesting to know if the one with SZ_4M is needed for you or > not (I guess not). > > I suspect that just this single-liner will work as well, as it does for me. > > Thanks, > Willy > > > diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c > index 0e79665..552ab73 100644 > --- a/drivers/pci/host/pci-mvebu.c > +++ b/drivers/pci/host/pci-mvebu.c > @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) > port->memwin_size = > (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - > port->memwin_base; > + port->memwin_size = roundup_pow_of_two(port->memwin_size); > > mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr, > port->memwin_base, port->memwin_size); Thanks to all of you for the investigation. So if I summarize your findings, we have two patches needed to fix the problems of everybody: 1) The bug fix for the MSI teardown function in irq-armada-370-xp.c. This one is easy, I'll test it right now, and give my formal Acked-by soon. 2) The problem of non-power of 2 sized windows. This one is more complicated, as we cannot simply round up the size of the windows inside the pci-mvebu.c driver: the Linux PCI core is not aware of this rounding, and might therefore allocate a BAR for the next device at an address that overlaps the previous window we have enlarged to match the power of two size requirement. This is something that was already discussed with the PCI maintainers, but the discussion needs to be revived I guess. Am I correct, or are other patches needed, or are remaining problems not solved by these two fixes? Thomas
That is indeed correct. Those two patches correct the behaviour fully on my board. I am by no means an expert on this, but could problem 2 be fixed using resource_alignment? I am a little fuzzy about how calls to it work but I know the sysfs entry can force the alignment/ size of PCI resources so presumably the kernel already has this capability? My board is unfortunately unable to diagnose any possible issues regarding possibly overlapping device windows as it physically only allows 1 PCIe device at once. Again, many thanks with your help, that has indeed fixed it for me. On 8 April 2014 15:43, Thomas Petazzoni <thomas.petazzoni@free-electrons.com> wrote: > Willy, Neil, Matthew, > > On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote: > >> On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote: >> > To add to my previous point, I would consider releasing these patches >> > as soon as possible as it seems that it can easily cause panics on a >> > range of boards. Failing that perhaps it is at least worth releasing a >> > hotfix which will cause PCI probing to fail should the window not be a >> > power of two? Thus it will fail early with a warning instead of >> > potentially causing a kernel crash. >> >> Have you tested the whole patch or just the one enforcing the power of 2 ? >> It would be interesting to know if the one with SZ_4M is needed for you or >> not (I guess not). >> >> I suspect that just this single-liner will work as well, as it does for me. >> >> Thanks, >> Willy >> >> >> diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c >> index 0e79665..552ab73 100644 >> --- a/drivers/pci/host/pci-mvebu.c >> +++ b/drivers/pci/host/pci-mvebu.c >> @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) >> port->memwin_size = >> (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - >> port->memwin_base; >> + port->memwin_size = roundup_pow_of_two(port->memwin_size); >> >> mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr, >> port->memwin_base, port->memwin_size); > > Thanks to all of you for the investigation. So if I summarize your > findings, we have two patches needed to fix the problems of everybody: > > 1) The bug fix for the MSI teardown function in irq-armada-370-xp.c. > This one is easy, I'll test it right now, and give my formal > Acked-by soon. > > 2) The problem of non-power of 2 sized windows. This one is more > complicated, as we cannot simply round up the size of the windows > inside the pci-mvebu.c driver: the Linux PCI core is not aware of > this rounding, and might therefore allocate a BAR for the next > device at an address that overlaps the previous window we have > enlarged to match the power of two size requirement. This is > something that was already discussed with the PCI maintainers, but > the discussion needs to be revived I guess. > > Am I correct, or are other patches needed, or are remaining problems > not solved by these two fixes? > > Thomas > -- > Thomas Petazzoni, CTO, Free Electrons > Embedded Linux, Kernel and Android engineering > http://free-electrons.com
Hi Thomas, On Tue, Apr 08, 2014 at 04:43:17PM +0200, Thomas Petazzoni wrote: > Willy, Neil, Matthew, > > On Tue, 8 Apr 2014 14:36:32 +0200, Willy Tarreau wrote: > > > On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote: > > > To add to my previous point, I would consider releasing these patches > > > as soon as possible as it seems that it can easily cause panics on a > > > range of boards. Failing that perhaps it is at least worth releasing a > > > hotfix which will cause PCI probing to fail should the window not be a > > > power of two? Thus it will fail early with a warning instead of > > > potentially causing a kernel crash. > > > > Have you tested the whole patch or just the one enforcing the power of 2 ? > > It would be interesting to know if the one with SZ_4M is needed for you or > > not (I guess not). > > > > I suspect that just this single-liner will work as well, as it does for me. > > > > Thanks, > > Willy > > > > > > diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c > > index 0e79665..552ab73 100644 > > --- a/drivers/pci/host/pci-mvebu.c > > +++ b/drivers/pci/host/pci-mvebu.c > > @@ -363,6 +363,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) > > port->memwin_size = > > (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - > > port->memwin_base; > > + port->memwin_size = roundup_pow_of_two(port->memwin_size); > > > > mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr, > > port->memwin_base, port->memwin_size); > > Thanks to all of you for the investigation. So if I summarize your > findings, we have two patches needed to fix the problems of everybody: > > 1) The bug fix for the MSI teardown function in irq-armada-370-xp.c. > This one is easy, I'll test it right now, and give my formal > Acked-by soon. > > 2) The problem of non-power of 2 sized windows. This one is more > complicated, as we cannot simply round up the size of the windows > inside the pci-mvebu.c driver: the Linux PCI core is not aware of > this rounding, and might therefore allocate a BAR for the next > device at an address that overlaps the previous window we have > enlarged to match the power of two size requirement. This is > something that was already discussed with the PCI maintainers, but > the discussion needs to be revived I guess. > > Am I correct, or are other patches needed, or are remaining problems > not solved by these two fixes? There's still the older issue I got with trying to load a 2-port based igb card on the xp-gp, the code says that hwirq 0xffffffe4 is invalid and cannot be enabled and after that it panics. I recall having debugged this one a few months ago, and finding something like ffffffe4 == -ENOSPC which was returned by one of the inner functions. Let's put that aside for a moment though, but I would appreciate it if you find the time to try your igb NIC on your XPGP in 3.14 to see if you get the same result. Thanks, Willy
Dear Willy Tarreau, On Tue, 8 Apr 2014 16:53:44 +0200, Willy Tarreau wrote: > > Am I correct, or are other patches needed, or are remaining problems > > not solved by these two fixes? > > There's still the older issue I got with trying to load a 2-port based > igb card on the xp-gp, the code says that hwirq 0xffffffe4 is invalid > and cannot be enabled and after that it panics. I recall having debugged > this one a few months ago, and finding something like ffffffe4 == -ENOSPC > which was returned by one of the inner functions. Let's put that aside > for a moment though, but I would appreciate it if you find the time to > try your igb NIC on your XPGP in 3.14 to see if you get the same result. See my second e-mail: this is due to improperly casting a signed value into a unsigned value. I currently have the igb NIC in my Armada XP DB, and I've reported a panic :-) Thomas
On Tue, Apr 08, 2014 at 01:31:48PM +0100, Matthew Minter wrote: > To add to my previous point, I would consider releasing these patches > as soon as possible as it seems that it can easily cause panics on a > range of boards. On the other hand, it's not the random user who stuffs PCIe cards on such boards PCIe slots, so in general there's a long testing period. We identified the igb issue in mid-december without getting completely down on it, so we remained 3 months without any other single report about this. Thus I think that the only persons really interested in having these patches are in this thread, so the emergency is not that high :-) Cheers, Willy
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c index afd0dce..5e0144c 100644 --- a/drivers/pci/host/pci-mvebu.c +++ b/drivers/pci/host/pci-mvebu.c @@ -364,6 +364,9 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - port->memwin_base; + if (!is_power_of_2(port->memwin_size)) + port->memwin_size = 1 << fls(port->memwin_size); + mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr, port->memwin_base, port->memwin_size); } @@ -728,7 +731,7 @@ static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev, if (res->flags & IORESOURCE_IO) return round_up(start, max_t(resource_size_t, SZ_64K, size)); else if (res->flags & IORESOURCE_MEM) - return round_up(start, max_t(resource_size_t, SZ_1M, size)); + return round_up(start, max_t(resource_size_t, SZ_4M, size)); else return start; }