Message ID | CA+BoTQkQq5RG4dDBf0whxBfe5yNvjSb-57AYfoDWeHk-CuZvAw@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Hi Michal,

thank you for the patch. Unfortunately I didn't have time to test it until yesterday. The patch has almost no effect on the reported behaviour: as soon as I enable intel_idle, the firmware stops working.

We also did more intensive testing and found that, even with intel_idle disabled, the card sometimes stops working in AP mode after a long period of time or under really heavy load. Usually SWBA overruns are reported shortly before the card stops responding.

If we build with CONFIG_IRQ_DOMAIN=n, CONFIG_IRQ_DOMAIN_DEBUG=n and CONFIG_PM_RUNTIME=n, the crash still occurs after 1-3 days, but the card keeps working. At the moment we are investigating the behaviour with hibernation and cpu_idle disabled. So far it seems to work, but this could change in the next few days...

This problem is really driving me crazy.

Regards,
Fabian

On 02.03.2015 at 13:20, Michal Kazior wrote:
> On 23 February 2015 at 15:41, Fabian Wittenberg
> <Fabian.Wittenberg@sophos.com> wrote:
>> Hi Michal,
>>
>> I already tried this approach. It works fine and is the current
>> workaround to get the product out, but I would like to know what the
>> underlying problem is.
>> Power consumption on idle devices increases by ~1.25W if you disable
>> C-states. That is not a real problem, but low-memory corruption is.
>> So I suspect a bug in the ath10k driver/firmware.
> Hi Fabian,
>
> Can you try the following diff with _INTEL_IDLE=y, please?
>
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -2531,6 +2531,11 @@ static int ath10k_pci_claim(struct ath10k *ar)
>
>  	pci_set_master(pdev);
>
> +	/* Disable RETRY_TIMEOUT register to prevent PCI Tx retries from
> +	 * interfering with C3 CPU state.
> +	 */
> +	pci_write_config_byte(pdev, 0x41, 0);
> +
>  	/* Workaround: Disable ASPM */
>  	pci_read_config_dword(pdev, 0x80, &lcr_val);
>  	pci_write_config_dword(pdev, 0x80, (lcr_val & 0xffffff00));
>
>
> Michał
It's possible that you're entering a sleep state in which the whole socket + DRAM controller goes to sleep, and the wakeup latency is confusing the firmware and/or DMA engine.

-adrian
Yes, I guessed something like that, but this would be a firmware bug :-\
I'm quite surprised that nobody else has this problem!?
There are so many configuration combinations that trigger this...

Fabian

On 19.03.2015 at 16:44, Adrian Chadd wrote:
> It's possible that you're entering a sleep state, the whole socket +
> dram controller is going to sleep, and the latency that the wakeup
> causes is confusing the firmware and/or DMA engine.
>
> -adrian
On 19 March 2015 at 08:57, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Yes, I guessed something like that, but this would be a firmware bug :-\
> I'm quite surprised that nobody else has this problem!?
> There are so many configuration combinations that trigger this...

The sleep depth / time that a socket-sleep state can take to wake up to
do DMA is highly variable. It depends on chipset, BIOS and sleep
settings.

IIRC the ath10k firmware wasn't really debugged with hostap-on-intel
as a supported option, with all the varying things there. So yeah,
someone with more detailed DMA/PCIe bridge documentation for QCA988x
is going to have to dig into the DMA register settings to see what's
going on. Maybe it's just exceeding the transaction timeout and that
should be easy to fix.

(I currently don't have all of the register documentation for the
QCA988x as I do for the pre-11ac chips.)

adrian
I don't have them either, even though we have an NDA with QCA. There seem to be several NDA levels at QCA. It's really hard to get these documents. It's a pain in the butt...

Regards,
Fabian

On 19.03.2015 at 17:05, Adrian Chadd wrote:
> The sleep depth / time that a socket-sleep state can take to wake up to
> do DMA is highly variable. It depends on chipset, BIOS and sleep
> settings.
>
> IIRC the ath10k firmware wasn't really debugged with hostap-on-intel
> as a supported option, with all the varying things there. So yeah,
> someone with more detailed DMA/PCIe bridge documentation for QCA988x
> is going to have to dig into the DMA register settings to see what's
> going on. Maybe it's just exceeding the transaction timeout and that
> should be easy to fix.
>
> (I currently don't have all of the register documentation for the
> QCA988x as I do for the pre-11ac chips.)
>
> adrian
On 19 March 2015 at 09:18, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> I don't have them either, even though we have an NDA with QCA.
> There seem to be several NDA levels at QCA. It's really hard to get these
> documents.
> It's a pain in the butt...

So do I. I may have to drive down the road again and try to have lunch
with the MAC team...

-adrian
Today we encountered a similar issue on a QCA9558 as well, although it's really rare on this chipset. This is a MIPS-based SoC widely used in access points.
I really think your guess could be right, but that should be a task for QCA, as it is really related to their h/w. We are using backports/ath10k for the PCI card and the SoC.

Regards,
Fabian

On 19.03.2015 at 17:05, Adrian Chadd wrote:
> The sleep depth / time that a socket-sleep state can take to wake up to
> do DMA is highly variable. It depends on chipset, BIOS and sleep
> settings.
>
> IIRC the ath10k firmware wasn't really debugged with hostap-on-intel
> as a supported option, with all the varying things there. So yeah,
> someone with more detailed DMA/PCIe bridge documentation for QCA988x
> is going to have to dig into the DMA register settings to see what's
> going on. Maybe it's just exceeding the transaction timeout and that
> should be easy to fix.
>
> (I currently don't have all of the register documentation for the
> QCA988x as I do for the pre-11ac chips.)
>
> adrian
```diff
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2531,6 +2531,11 @@ static int ath10k_pci_claim(struct ath10k *ar)
 
 	pci_set_master(pdev);
 
+	/* Disable RETRY_TIMEOUT register to prevent PCI Tx retries from
+	 * interfering with C3 CPU state.
+	 */
+	pci_write_config_byte(pdev, 0x41, 0);
+
 	/* Workaround: Disable ASPM */
 	pci_read_config_dword(pdev, 0x80, &lcr_val);
 	pci_write_config_dword(pdev, 0x80, (lcr_val & 0xffffff00));
```