ath10k + INTEL_IDLE aka. cstates == firmware crash

Message ID CA+BoTQkQq5RG4dDBf0whxBfe5yNvjSb-57AYfoDWeHk-CuZvAw@mail.gmail.com (mailing list archive)
State Not Applicable

Commit Message

Michal Kazior March 2, 2015, 12:20 p.m. UTC
On 23 February 2015 at 15:41, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Hi Michal,
>
> I already did this approach. This works fine and is the current
> workaround to get the product out, but I would like to know what the
> basic problem is.
> The power consumption increases by ~1.25W on idle devices if you disable
> cstates. This is not a real problem but a low mem corruption is one.
> So I assume a bug in the ath10k-driver/firmware.

Hi Fabian,

Can you try the following diff with _INTEL_IDLE=y, please?



Michał

Comments

Fabian Wittenberg March 19, 2015, 9:20 a.m. UTC | #1
Hi Michal,

thank you for the patch. Unfortunately I didn't have time to test it
until yesterday.
The patch has almost no influence on the reported behaviour. As soon as
I enable intel_idle, the firmware stops working.
We also did more intensive testing and found that even with intel_idle
disabled, the card sometimes stops working in AP mode after a long time
or under really heavy load.
Normally SWBA overruns are reported shortly before the card stops
responding.

If we disable

CONFIG_IRQ_DOMAIN=n
CONFIG_IRQ_DOMAIN_DEBUG=n
CONFIG_PM_RUNTIME=n

the crash still occurs after 1-3 days, but the card keeps working.

At the moment we are investigating the behavior with hibernation and
cpu_idle disabled.
For now it seems to work. But this could change in the next few days...
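
When comparing runs with and without idle states, it can help to record which C-states the kernel actually exposes and whether they are disabled. The following is only a sketch: it assumes the Linux cpuidle sysfs layout (for example /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with the files name, latency and disable), and the helper names read_cstate and print_cstates are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One idle state as exposed by the cpuidle sysfs directory. */
struct cstate_info {
    char name[32];
    unsigned latency_us;   /* exit latency reported by the kernel */
    int disabled;          /* non-zero if the state is disabled */
};

/* Read the first line of a file into buf and strip the newline. */
static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Fill *out from <base>/state<idx>/; returns 0 on success, -1 otherwise. */
static int read_cstate(const char *base, int idx, struct cstate_info *out)
{
    char path[256], buf[64];

    assert(base != NULL && out != NULL);

    snprintf(path, sizeof(path), "%s/state%d/name", base, idx);
    if (read_line(path, out->name, sizeof(out->name)) < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/state%d/latency", base, idx);
    if (read_line(path, buf, sizeof(buf)) < 0 ||
        sscanf(buf, "%u", &out->latency_us) != 1)
        return -1;

    snprintf(path, sizeof(path), "%s/state%d/disable", base, idx);
    if (read_line(path, buf, sizeof(buf)) < 0 ||
        sscanf(buf, "%d", &out->disabled) != 1)
        return -1;

    return 0;
}

/* Print every state found under base; returns the number of states. */
static int print_cstates(const char *base)
{
    struct cstate_info info;
    int n = 0;

    while (read_cstate(base, n, &info) == 0) {
        printf("state%d: %s, exit latency %u us, %s\n", n, info.name,
               info.latency_us, info.disabled ? "disabled" : "enabled");
        n++;
    }
    return n;
}
```

Calling print_cstates("/sys/devices/system/cpu/cpu0/cpuidle") before each test run would give a record of the idle-state setup to attach to a crash report.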

This problem drives me really crazy.

Regards,
Fabian


Adrian Chadd March 19, 2015, 3:44 p.m. UTC | #2
It's possible that you're entering a sleep state, the whole socket +
dram controller is going to sleep, and the latency that the wakeup
causes is confusing the firmware and/or DMA engine.



-adrian
Fabian Wittenberg March 19, 2015, 3:57 p.m. UTC | #3
Yes, I guessed something like that, but this should be a firmware bug :-\
I'm quite surprised that nobody else has this problem!?
There are so many configuration combinations that trigger this...

Fabian

Adrian Chadd March 19, 2015, 4:05 p.m. UTC | #4
On 19 March 2015 at 08:57, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Yes, I guessed something like that, but this should be a firmware bug :-\
> I'm quite surprised that nobody else has this problem!?
> There are so many configuration combinations that trigger this...

The sleep depth, and the time a socket sleep state can take to wake up
to do DMA, is highly variable. It depends on chipset, BIOS and sleep
settings.
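
One way to test the wakeup-latency theory without rebuilding the kernel might be the PM QoS interface: a process that writes a 32-bit latency limit (in microseconds) to /dev/cpu_dma_latency and keeps the file open asks the kernel to avoid idle states with a longer exit latency. The sketch below assumes that interface is available and that the caller has permission to open it; the function name hold_cpu_latency is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write a latency limit to the PM QoS device at 'path' (normally
 * /dev/cpu_dma_latency). The request stays active only while the
 * returned descriptor is open, so the caller must keep it open for
 * the duration of the test and close() it afterwards.
 * Returns the open descriptor, or -1 on failure.
 */
static int hold_cpu_latency(const char *path, int32_t max_latency_us)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
        (ssize_t)sizeof(max_latency_us)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If the firmware survives a long run while a process holds hold_cpu_latency("/dev/cpu_dma_latency", 0), that would point towards wakeup latency rather than the idle driver itself, although it would not prove it.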

IIRC the ath10k firmware wasn't really debugged with hostap-on-intel
as a supported option, with all the varying things there. So yeah,
someone with more detailed DMA/PCIe bridge documentation for QCA988x
is going to have to dig into the DMA register settings to see what's
going on. Maybe it's just exceeding the transaction timeout and that
should be easy to fix.

(I currently don't have all of the register documentation for the
QCA988x as I do for the pre-11ac chips.)


adrian

Fabian Wittenberg March 19, 2015, 4:18 p.m. UTC | #5
I don't have them either, even though we have an NDA with QCA.
There seem to be several NDA levels at QCA. It's really hard to get these
documents.
It's a pain in the butt...

Regards,
Fabian

Adrian Chadd March 19, 2015, 4:23 p.m. UTC | #6
On 19 March 2015 at 09:18, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> I don't have them either, even though we have an NDA with QCA.
> There seem to be several NDA levels at QCA. It's really hard to get these
> documents.
> It's a pain in the butt...

So do I.

I may have to drive down the road again and try to have lunch with the
MAC team...




-adrian
Fabian Wittenberg March 20, 2015, 10:46 a.m. UTC | #7
Today we encountered a similar issue on a QCA9558 as well.
However, it's really rare to see it on this chipset.
This is a MIPS-based SoC that is widely used in access points.
I really think you could be right with your guess.
But that should be a task for QCA, as it's really related to their h/w.
We are using backports/ath10k for both the PCI card and the SoC.

Regards,
Fabian


Patch

--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2531,6 +2531,11 @@  static int ath10k_pci_claim(struct ath10k *ar)

        pci_set_master(pdev);

+       /* Disable RETRY_TIMEOUT register to prevent PCI Tx retries from
+        * interfering with C3 CPU state.
+        */
+       pci_write_config_byte(pdev, 0x41, 0);
+
        /* Workaround: Disable ASPM */
        pci_read_config_dword(pdev, 0x80, &lcr_val);
        pci_write_config_dword(pdev, 0x80, (lcr_val & 0xffffff00));
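
To make the effect of the two config-space writes easier to see, here is a small user-space sketch that replays them on an in-memory copy of a 256-byte config space. It is not driver code: the buffer, the macro names and the helpers cfg_read32, cfg_write32 and apply_patch_writes are invented for illustration, and the meaning of offsets 0x41 and 0x80 is taken only from the comments in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE    256
#define RETRY_TIMEOUT_OFF 0x41  /* byte cleared by the patch */
#define LCR_OFF           0x80  /* dword touched by the ASPM workaround */

/* Read a 32-bit little-endian value, as PCI config space is laid out. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned off)
{
    assert(off + 4 <= CFG_SPACE_SIZE);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Store a 32-bit value in little-endian byte order. */
static void cfg_write32(uint8_t *cfg, unsigned off, uint32_t val)
{
    assert(off + 4 <= CFG_SPACE_SIZE);
    cfg[off]     = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)((val >> 8) & 0xff);
    cfg[off + 2] = (uint8_t)((val >> 16) & 0xff);
    cfg[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Replay the writes from the patch on the in-memory copy. */
static void apply_patch_writes(uint8_t *cfg)
{
    uint32_t lcr_val;

    /* pci_write_config_byte(pdev, 0x41, 0): only this one byte changes. */
    cfg[RETRY_TIMEOUT_OFF] = 0;

    /* Read-modify-write that clears the low 8 bits of the dword at 0x80
     * and keeps the upper 24 bits unchanged. */
    lcr_val = cfg_read32(cfg, LCR_OFF);
    cfg_write32(cfg, LCR_OFF, lcr_val & 0xffffff00);
}
```

Starting from a buffer filled with 0xff, only the bytes at 0x41 and 0x80 end up as zero; their neighbours stay untouched, which is the behaviour the byte write and the 0xffffff00 mask are meant to give.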