diff mbox

Intel I350 mini-PCIe card (igb) on Mirabox (mvebu / Armada 370)

Message ID alpine.DEB.2.10.1404041408280.8617@vroombuntu (mailing list archive)
State New, archived
Headers show

Commit Message

Neil Greatorex April 4, 2014, 1:19 p.m. UTC
Thomas,

After managing to get the card detected on my mirabox, I have been looking 
into getting the thing actually working. With the information from Willy 
and my own investigations I knew there was some issue in the setting up of 
the MSIs for the card. Through adding a copious amount of printk()s I was 
able to determine that the initialisation for the igb driver allocates, 
frees and re-allocates 3 MSIs for the card. I noticed that in doing this 
there was a problem whereby any call to free an MSI was trying to free 
MSI#0. I was able to track this down to the fact that the mapping for the 
IRQ was being disposed before the MSI was actually freed. The below patch 
fixes this problem.

With this patch, I can get one port on the card working. With both ports 
enabled, I still get an OOPS, so some further work is needed.

I would appreciate it if you could test this patch and let me know if you 
find any problems.

Cheers,
Neil

From 50aa11018059704229dd43ca1016defdda04f90c Mon Sep 17 00:00:00 2001
From: Neil Greatorex <neil@fatboyfat.co.uk>
Date: Fri, 4 Apr 2014 13:47:09 +0100
Subject: [PATCH] irqchip: armada-370-xp: Fix releasing of MSIs

This patch moves the call to irq_dispose_mapping() to after the call to
armada_370_xp_free_msi(). Without this patch, the armada_370_xp_free_msi
function would always free MSI#0, no matter what was passed to it.

Signed-off-by: <neil@fatboyfat.co.uk>
---
  drivers/irqchip/irq-armada-370-xp.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Willy Tarreau April 5, 2014, 5:32 p.m. UTC | #1
Hi Neil,

On Fri, Apr 04, 2014 at 02:19:44PM +0100, Neil Greatorex wrote:
> Thomas,
> 
> After managing to get the card detected on my mirabox, I have been looking 
> into getting the thing actually working. With the information from Willy 
> and my own investigations I knew there was some issue in the setting up of 
> the MSIs for the card. Through adding a copious amount of printk()s I was 
> able to determine that the initialisation for the igb driver allocates, 
> frees and re-allocates 3 MSIs for the card.

That sequence matches the memories I have about it indeed.

> I noticed that in doing this 
> there was a problem whereby any call to free an MSI was trying to free 
> MSI#0. I was able to track this down to the fact that the mapping for the 
> IRQ was being disposed before the MSI was actually freed. The below patch 
> fixes this problem.
> 
> With this patch, I can get one port on the card working. With both ports 
> enabled, I still get an OOPS, so some further work is needed.
> 
> I would appreciate it if you could test this patch and let me know if you 
> find any problems.

I don't know how to disable one port on my card (except by fiddling directly
with its eeprom but I don't want to do this anymore, there's a high risk of
bricking it). So I got the same panic as before with the two ports enabled.
However I tried to revert the prefetchable memory patch that makes the NIC
work for me and I noticed that your fix indeed improves things. After the
oops, I no longer get :

---[ end trace f6c1769398756b02 ]---
trying to free unused MSI#0
trying to free unused MSI#0
trying to free unused MSI#0
trying to free unused MSI#0
trying to free unused MSI#0
trying to free unused MSI#0

as I used to see with the revert alone. So yes I think it's a nice step forward!

Note that this was 3.13.9 on the XP-GP board (quad-core XP).

Feel free to add my Tested-by if you like.

Cheers,
Willy
Thomas Petazzoni April 5, 2014, 5:34 p.m. UTC | #2
Dear Neil Greatorex,

On Fri, 4 Apr 2014 14:19:44 +0100 (BST), Neil Greatorex wrote:

> From 50aa11018059704229dd43ca1016defdda04f90c Mon Sep 17 00:00:00 2001
> From: Neil Greatorex <neil@fatboyfat.co.uk>
> Date: Fri, 4 Apr 2014 13:47:09 +0100
> Subject: [PATCH] irqchip: armada-370-xp: Fix releasing of MSIs
> 
> This patch moves the call to irq_dispose_mapping() to after the call to
> armada_370_xp_free_msi(). Without this patch, the armada_370_xp_free_msi
> function would always free MSI#0, no matter what was passed to it.
> 
> Signed-off-by: <neil@fatboyfat.co.uk>
> ---
>   drivers/irqchip/irq-armada-370-xp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> index 5409564..f5e129e 100644
> --- a/drivers/irqchip/irq-armada-370-xp.c
> +++ b/drivers/irqchip/irq-armada-370-xp.c
> @@ -157,8 +157,8 @@ static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
>   					   unsigned int irq)
>   {
>   	struct irq_data *d = irq_get_irq_data(irq);
> -	irq_dispose_mapping(irq);
>   	armada_370_xp_free_msi(d->hwirq);
> +	irq_dispose_mapping(irq);
>   }
> 
>   static struct irq_chip armada_370_xp_msi_irq_chip = {

I want to give it some test, but as it is, I'd prefer to have the
irq_dispose_mapping() done before, and use a local variable to store
d->hwirq to that armada_370_xp_free_msi() can be called after
irq_dispose_mapping(). This is to ensure the sequence is symmetrical
with the MSI setup sequence.

Also, can you detail how you tested with just one port?

Best regards,

Thomas
Neil Greatorex April 5, 2014, 7 p.m. UTC | #3
Thomas, 


On Saturday, 5 April 2014 at 6:34pm, Thomas Petazzoni wrote:

> Dear Neil Greatorex,
> 
> On Fri, 4 Apr 2014 14:19:44 +0100 (BST), Neil Greatorex wrote:
> 
> > From 50aa11018059704229dd43ca1016defdda04f90c Mon Sep 17 00:00:00 2001
> > From: Neil Greatorex <neil@fatboyfat.co.uk (mailto:neil@fatboyfat.co.uk)>
> > Date: Fri, 4 Apr 2014 13:47:09 +0100
> > Subject: [PATCH] irqchip: armada-370-xp: Fix releasing of MSIs
> > 
> > This patch moves the call to irq_dispose_mapping() to after the call to
> > armada_370_xp_free_msi(). Without this patch, the armada_370_xp_free_msi
> > function would always free MSI#0, no matter what was passed to it.
> > 
> > Signed-off-by: <neil@fatboyfat.co.uk (mailto:neil@fatboyfat.co.uk)>
> > ---
> > drivers/irqchip/irq-armada-370-xp.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
> > index 5409564..f5e129e 100644
> > --- a/drivers/irqchip/irq-armada-370-xp.c
> > +++ b/drivers/irqchip/irq-armada-370-xp.c
> > @@ -157,8 +157,8 @@ static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
> > unsigned int irq)
> > {
> > struct irq_data *d = irq_get_irq_data(irq);
> > - irq_dispose_mapping(irq);
> > armada_370_xp_free_msi(d->hwirq);
> > + irq_dispose_mapping(irq);
> > }
> > 
> > static struct irq_chip armada_370_xp_msi_irq_chip = {
> 
> I want to give it some test, but as it is, I'd prefer to have the
> irq_dispose_mapping() done before, and use a local variable to store
> d->hwirq to that armada_370_xp_free_msi() can be called after
> irq_dispose_mapping(). This is to ensure the sequence is symmetrical
> with the MSI setup sequence.

I will redo the patch with a local variable tomorrow and resend it. 
 
Cheers, 
Neil
Willy Tarreau April 6, 2014, 6:58 p.m. UTC | #4
Hi Neil,

On Fri, Apr 04, 2014 at 02:19:44PM +0100, Neil Greatorex wrote:
> With this patch, I can get one port on the card working. With both ports 
> enabled, I still get an OOPS, so some further work is needed.

Concerning this point, here's an update on my side. I found where the crash
happens, but I don't exactly understand why, I suspect that an area is not
correctly mapped :

root@xpgp:~# insmod /tmp/igb.ko 
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
request_region(pdev=edb21000, 00000049)
hw_addr = pci_iomap(pdev=edb21000, 0, 0) = f0300000
mem_start=e0000000 mem_end=e007ffff
hw_addr = f0300000
hw_addr=f0300000
igb 0000:02:00.0: added PHC on eth4
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
request_region(pdev=ed99d800, 00000049)
hw_addr = pci_iomap(pdev=ed99d800, 0, 0) = f0400000
mem_start=e0100000 mem_end=e017ffff
hw_addr = f0400000
hw_addr=f0400000
Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0400018
Internal error: : 1008 [#1] SMP THUMB2

in e1000_82575.o:igb_get_invariants_82575() :
    123c:       f884 33b4       strb.w  r3, [r4, #948]
    1240:       f884 33b9       strb.w  r3, [r4, #953]
    1244:       f8c4 6300       str.w   r6, [r4, #768]
    1248:       6863            ldr     r3, [r4, #4]
=>  124a:       f8d3 8018       ldr.w   r8, [r3, #24]

in e1000_82575.c:igb_get_invariants_82575() :
        hw->phy.media_type = e1000_media_type_copper;
        dev_spec->sgmii_active = false;
        dev_spec->module_plugged = false;

here=>  ctrl_ext = rd32(E1000_CTRL_EXT);

according to e1000_hw.h:

  #define E1000_CTRL_EXT 0x00018      /* Extended Device Control - RW */

according to e1000_regs.h:

  #define rd32(reg) (readl(hw->hw_addr + reg))

===> ctrl_ext = readl((hw->hw_addr = 0xf0400000) + (reg = 0x18))

As you can see, the sequence is exactly the same for both ports. The
first one has no problem performing the readl(), but the second one
cannot. Both of them got the memory address returned by a call to
pci_iomap(dev, 0, 0). I could verify that the pci_resource_len() for
both is 524288 (0x80000).

The last "hwaddr=f0400000" is printed just before calling rd32() and
shows that it was still OK there. Since the resource flags are 0x40200,
we only have IORESOURCE_MEM so pci_iomap() calls ioremap_nocache().

Thus I suspect something is not behaving correctly in the code which
configures the emulated bridge and/or the memory areas, resulting in
the second port not being correctly mapped, thus causing the crash.

But that's the deepest I can go unfortunately, I got lost after that.

Regards,
Willy
Thomas Petazzoni April 6, 2014, 7:11 p.m. UTC | #5
Dear Willy Tarreau,

On Sun, 6 Apr 2014 20:58:33 +0200, Willy Tarreau wrote:

> On Fri, Apr 04, 2014 at 02:19:44PM +0100, Neil Greatorex wrote:
> > With this patch, I can get one port on the card working. With both ports 
> > enabled, I still get an OOPS, so some further work is needed.
> 
> Concerning this point, here's an update on my side. I found where the crash
> happens, but I don't exactly understand why, I suspect that an area is not
> correctly mapped :
> 
> root@xpgp:~# insmod /tmp/igb.ko 
> igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
> igb: Copyright (c) 2007-2013 Intel Corporation.
> PCI: enabling device 0000:00:09.0 (0140 -> 0143)
> PCI: enabling device 0000:02:00.0 (0140 -> 0142)
> request_region(pdev=edb21000, 00000049)
> hw_addr = pci_iomap(pdev=edb21000, 0, 0) = f0300000
> mem_start=e0000000 mem_end=e007ffff
> hw_addr = f0300000
> hw_addr=f0300000
> igb 0000:02:00.0: added PHC on eth4
> igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
> igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
> igb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
> PCI: enabling device 0000:02:00.1 (0140 -> 0142)
> request_region(pdev=ed99d800, 00000049)
> hw_addr = pci_iomap(pdev=ed99d800, 0, 0) = f0400000
> mem_start=e0100000 mem_end=e017ffff
> hw_addr = f0400000
> hw_addr=f0400000
> Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0400018
> Internal error: : 1008 [#1] SMP THUMB2
> 
> in e1000_82575.o:igb_get_invariants_82575() :
>     123c:       f884 33b4       strb.w  r3, [r4, #948]
>     1240:       f884 33b9       strb.w  r3, [r4, #953]
>     1244:       f8c4 6300       str.w   r6, [r4, #768]
>     1248:       6863            ldr     r3, [r4, #4]
> =>  124a:       f8d3 8018       ldr.w   r8, [r3, #24]
> 
> in e1000_82575.c:igb_get_invariants_82575() :
>         hw->phy.media_type = e1000_media_type_copper;
>         dev_spec->sgmii_active = false;
>         dev_spec->module_plugged = false;
> 
> here=>  ctrl_ext = rd32(E1000_CTRL_EXT);
> 
> according to e1000_hw.h:
> 
>   #define E1000_CTRL_EXT 0x00018      /* Extended Device Control - RW */
> 
> according to e1000_regs.h:
> 
>   #define rd32(reg) (readl(hw->hw_addr + reg))
> 
> ===> ctrl_ext = readl((hw->hw_addr = 0xf0400000) + (reg = 0x18))
> 
> As you can see, the sequence is exactly the same for both ports. The
> first one has no problem performing the readl(), but the second one
> cannot. Both of them got the memory address returned by a call to
> pci_iomap(dev, 0, 0). I could verify that the pci_resource_len() for
> both is 524288 (0x80000).
> 
> The last "hwaddr=f0400000" is printed just before calling rd32() and
> shows that it was still OK there. Since the resource flags are 0x40200,
> we only have IORESOURCE_MEM so pci_iomap() calls ioremap_nocache().
> 
> Thus I suspect something is not behaving correctly in the code which
> configures the emulated bridge and/or the memory areas, resulting in
> the second port not being correctly mapped, thus causing the crash.
> 
> But that's the deepest I can go unfortunately, I got lost after that.

Thanks a lot again for all this investigation. I'm hoping to be able to
look at this PCIe issue this week.

Thomas
Neil Greatorex April 6, 2014, 9:57 p.m. UTC | #6
Willy,

On Sun, 6 Apr 2014, Willy Tarreau wrote:

> As you can see, the sequence is exactly the same for both ports. The
> first one has no problem performing the readl(), but the second one
> cannot. Both of them got the memory address returned by a call to
> pci_iomap(dev, 0, 0). I could verify that the pci_resource_len() for
> both is 524288 (0x80000).
>
> The last "hwaddr=f0400000" is printed just before calling rd32() and
> shows that it was still OK there. Since the resource flags are 0x40200,
> we only have IORESOURCE_MEM so pci_iomap() calls ioremap_nocache().
>

Good work - I've been doing similar things myself! I can confirm that I 
see exactly the same thing with similar printks:

First port:
[ 1809.452878] igb 0000:01:00.0: enabling bus mastering
[ 1809.453098] igb 0000:01:00.0 (unregistered net_device): hw_addr is 
f1000000, start=e0000000, len=80000, flags=40200
[ 1809.453109] igb 0000:01:00.0 (unregistered net_device): About to read 
from offset 18
[ 1809.453120] igb 0000:01:00.0 (unregistered net_device): Read from 18 
returned 1400c0

Second port:
[ 1809.459445] igb 0000:01:00.1: enabling bus mastering
[ 1809.459563] igb 0000:01:00.1 (unregistered net_device): hw_addr is
f1100000, start=e0100000, len=80000, flags=40200
[ 1809.459573] igb 0000:01:00.1 (unregistered net_device): About to read
from offset 18
[ 1809.459581] Unhandled fault: external abort on non-linefetch (0x1008) 
at 0xf1100018

In the output above, the start= part shows the physical address and 
hw_addr shows the mapped address.

The physical addresses match those given in the lspci -vvv output (see 
https://gist.github.com/ngreatorex/9772195). I don't know enough about 
PCIe, the SoC *or* the Intel card to know if these addresses look correct 
or even sane! I did wonder if there was some issue due to the fact that 
the resources for 01:00.0 and 01:00.1 overlap, but I would guess(!?) that 
it's common in hardware that presents multiple devices.

It is perhaps noteworthy that this is the first access to the hardware for 
the 2nd port - i.e. there are no successful accesses, other than to enable 
the hardware, which AFAICT is simply accessing registers on the PCIe 
controller.

> Thus I suspect something is not behaving correctly in the code which
> configures the emulated bridge and/or the memory areas, resulting in
> the second port not being correctly mapped, thus causing the crash.
>
> But that's the deepest I can go unfortunately, I got lost after that.
>

Same here. I've tried playing around with a few things but not discovered 
anything even close to useful. Hopefully Thomas will be able to debug 
further when he gets the time.

> Regards,
> Willy
>
>

Cheers,
Neil
Willy Tarreau April 6, 2014, 10:04 p.m. UTC | #7
On Sun, Apr 06, 2014 at 10:57:40PM +0100, Neil Greatorex wrote:
> Good work - I've been doing similar things myself! I can confirm that I 
> see exactly the same thing with similar printks:

Great! A reproduceable bug is always half-resolved :-)

> The physical addresses match those given in the lspci -vvv output

same here.

> (see 
> https://gist.github.com/ngreatorex/9772195). I don't know enough about 
> PCIe, the SoC *or* the Intel card to know if these addresses look correct 
> or even sane! I did wonder if there was some issue due to the fact that 
> the resources for 01:00.0 and 01:00.1 overlap, but I would guess(!?) that 
> it's common in hardware that presents multiple devices.

My understanding and old memories tell me that's OK.

> It is perhaps noteworthy that this is the first access to the hardware for 
> the 2nd port - i.e. there are no successful accesses, other than to enable 
> the hardware, which AFAICT is simply accessing registers on the PCIe 
> controller.

Indeed, I forgot to mention that but you're right, enabling only the
second function leads to an immediate panic as well.

> I've tried playing around with a few things but not discovered 
> anything even close to useful. Hopefully Thomas will be able to debug 
> further when he gets the time.

Yeah, let's hope we have not opened a can of worms!

Willy
Thomas Petazzoni April 6, 2014, 10:16 p.m. UTC | #8
Dear Neil Greatorex,

On Sun, 6 Apr 2014 22:57:40 +0100 (BST), Neil Greatorex wrote:

> Second port:
> [ 1809.459445] igb 0000:01:00.1: enabling bus mastering
> [ 1809.459563] igb 0000:01:00.1 (unregistered net_device): hw_addr is
> f1100000, start=e0100000, len=80000, flags=40200
> [ 1809.459573] igb 0000:01:00.1 (unregistered net_device): About to read
> from offset 18
> [ 1809.459581] Unhandled fault: external abort on non-linefetch (0x1008) 
> at 0xf1100018

This address is a virtual address, so there's not much we can do with
it, as it is.

> The physical addresses match those given in the lspci -vvv output (see 
> https://gist.github.com/ngreatorex/9772195). I don't know enough about 
> PCIe, the SoC *or* the Intel card to know if these addresses look correct 
> or even sane! I did wonder if there was some issue due to the fact that 
> the resources for 01:00.0 and 01:00.1 overlap, but I would guess(!?) that 
> it's common in hardware that presents multiple devices.
> 
> It is perhaps noteworthy that this is the first access to the hardware for 
> the 2nd port - i.e. there are no successful accesses, other than to enable 
> the hardware, which AFAICT is simply accessing registers on the PCIe 
> controller.

It really looks like the MBus window has not been sized correctly, or
there is a missing MBus window. Probably a deficiency in the PCI bridge
emulation layer in pci-mvebu.

Tomorrow, I have a bit of work on AHCI on Armada 38x, but I'm hoping to
get to this after that.

Thanks again for all the investigation. All your research is clearly
going to make the debugging a *lot* easier.

Best regards,

Thomas
Neil Greatorex April 7, 2014, 12:50 a.m. UTC | #9
Thomas,

On Sun, Apr 6, 2014 at 11:16 PM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Neil Greatorex,
>
> On Sun, 6 Apr 2014 22:57:40 +0100 (BST), Neil Greatorex wrote:
>
>> Second port:
>> [ 1809.459445] igb 0000:01:00.1: enabling bus mastering
>> [ 1809.459563] igb 0000:01:00.1 (unregistered net_device): hw_addr is
>> f1100000, start=e0100000, len=80000, flags=40200
>> [ 1809.459573] igb 0000:01:00.1 (unregistered net_device): About to read
>> from offset 18
>> [ 1809.459581] Unhandled fault: external abort on non-linefetch (0x1008)
>> at 0xf1100018
>
> This address is a virtual address, so there's not much we can do with
> it, as it is.
>

The address in the fault message is, but the physical address can be
worked out from the 2 lines above. "start=e0100000" and "offset 18".
Adding those together we get the physical address 0xe100018.

>> The physical addresses match those given in the lspci -vvv output (see
>> https://gist.github.com/ngreatorex/9772195). I don't know enough about
>> PCIe, the SoC *or* the Intel card to know if these addresses look correct
>> or even sane! I did wonder if there was some issue due to the fact that
>> the resources for 01:00.0 and 01:00.1 overlap, but I would guess(!?) that
>> it's common in hardware that presents multiple devices.
>>
>> It is perhaps noteworthy that this is the first access to the hardware for
>> the 2nd port - i.e. there are no successful accesses, other than to enable
>> the hardware, which AFAICT is simply accessing registers on the PCIe
>> controller.
>
> It really looks like the MBus window has not been sized correctly, or
> there is a missing MBus window. Probably a deficiency in the PCI bridge
> emulation layer in pci-mvebu.
>

OK well that gives me another direction to point my debugging
attempts. If I discover anything interesting I'll let you know.

> Tomorrow, I have a bit of work on AHCI on Armada 38x, but I'm hoping to
> get to this after that.

Excellent news.

>
> Thanks again for all the investigation. All your research is clearly
> going to make the debugging a *lot* easier.

No problem. I'm always interested in trying to learn more about the
hardware I have and how it all works. Of course it would be much
easier if Marvell would publicly release the datasheet but we can't
have everything!

Cheers,
Neil
diff mbox

Patch

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 5409564..f5e129e 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -157,8 +157,8 @@  static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
  					   unsigned int irq)
  {
  	struct irq_data *d = irq_get_irq_data(irq);
-	irq_dispose_mapping(irq);
  	armada_370_xp_free_msi(d->hwirq);
+	irq_dispose_mapping(irq);
  }

  static struct irq_chip armada_370_xp_msi_irq_chip = {