diff mbox

[3.19] ath9k: fix race condition in irq processing during hardware reset

Message ID 1421241456-29085-1-git-send-email-nbd@openwrt.org (mailing list archive)
State Accepted
Delegated to: Kalle Valo
Headers show

Commit Message

Felix Fietkau Jan. 14, 2015, 1:17 p.m. UTC
To fix invalid hardware accesses, the commit
"ath9k: do not access hardware on IRQs during reset" made the irq
handler ignore interrupts emitted after queueing a hardware reset (which
disables the IRQ). This left a small time window for the IRQ to get
re-enabled by the tasklet, which caused IRQ storms.
Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
IRQ entirely for the duration of the reset.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
 drivers/net/wireless/ath/ath9k/main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Sujith Manoharan Jan. 15, 2015, 9:09 a.m. UTC | #1
Felix Fietkau wrote:
> To fix invalid hardware accesses, the commit
> "ath9k: do not access hardware on IRQs during reset" made the irq
> handler ignore interrupts emitted after queueing a hardware reset (which
> disables the IRQ). This left a small time window for the IRQ to get
> re-enabled by the tasklet, which caused IRQ storms.
> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
> IRQ entirely for the duration of the reset.

Doesn't this make the kill_interrupts() that was added in the earlier
commit unnecessary now ?

Sujith
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Fietkau Jan. 15, 2015, 9:46 a.m. UTC | #2
On 2015-01-15 10:09, Sujith Manoharan wrote:
> Felix Fietkau wrote:
>> To fix invalid hardware accesses, the commit
>> "ath9k: do not access hardware on IRQs during reset" made the irq
>> handler ignore interrupts emitted after queueing a hardware reset (which
>> disables the IRQ). This left a small time window for the IRQ to get
>> re-enabled by the tasklet, which caused IRQ storms.
>> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
>> IRQ entirely for the duration of the reset.
> 
> Doesn't this make the kill_interrupts() that was added in the earlier
> commit unnecessary now ?
I think it's still a good idea to try to silence interrupts between
queueing a reset and actually performing it.

- Felix
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rico Derrer Jan. 16, 2015, 8 a.m. UTC | #3
Hi Felix

Felix Fietkau wrote:
> diff --git a/drivers/net/wireless/ath/ath9k/main.c
> b/drivers/net/wireless/ath/ath9k/main.c
> index 9a72640..62b0bf4 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
> ath9k_channel *hchan)
> 
> __ath_cancel_work(sc);
> 
> + disable_irq(sc->irq);
> tasklet_disable(&sc->intr_tq);
> tasklet_disable(&sc->bcon_tasklet);
> spin_lock_bh(&sc->sc_pcu_lock);
> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
> ath9k_channel *hchan)
> r = -EIO;
> 
> out:
> + enable_irq(sc->irq);
> spin_unlock_bh(&sc->sc_pcu_lock);
> tasklet_enable(&sc->bcon_tasklet);
> tasklet_enable(&sc->intr_tq);

This part completely blocks the system on a AR9350. Loading the module works but configuring it hangs the system until watchdog restarts it.

Regards
Rico
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Fietkau Jan. 16, 2015, 10:41 a.m. UTC | #4
On 2015-01-16 09:00, Rico Derrer wrote:
> Hi Felix
> 
> Felix Fietkau wrote:
>> diff --git a/drivers/net/wireless/ath/ath9k/main.c
>> b/drivers/net/wireless/ath/ath9k/main.c
>> index 9a72640..62b0bf4 100644
>> --- a/drivers/net/wireless/ath/ath9k/main.c
>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>> ath9k_channel *hchan)
>> 
>> __ath_cancel_work(sc);
>> 
>> + disable_irq(sc->irq);
>> tasklet_disable(&sc->intr_tq);
>> tasklet_disable(&sc->bcon_tasklet);
>> spin_lock_bh(&sc->sc_pcu_lock);
>> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>> ath9k_channel *hchan)
>> r = -EIO;
>> 
>> out:
>> + enable_irq(sc->irq);
>> spin_unlock_bh(&sc->sc_pcu_lock);
>> tasklet_enable(&sc->bcon_tasklet);
>> tasklet_enable(&sc->intr_tq);
> 
> This part completely blocks the system on a AR9350. Loading the
> module works but configuring it hangs the system until watchdog restarts it.
What kernel/software are you running on there? When I committed this
patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
(both generic and in the ath79 platform code).

You can find the fixes that I've made here:
http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch

I've already submitted the first one to linux-mips.

- Felix
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rico Derrer Jan. 19, 2015, 8:26 a.m. UTC | #5
> On 2015-01-16 09:00, Rico Derrer wrote:
>> Hi Felix
>> 
>> Felix Fietkau wrote:
>>> diff --git a/drivers/net/wireless/ath/ath9k/main.c
>>> b/drivers/net/wireless/ath/ath9k/main.c
>>> index 9a72640..62b0bf4 100644
>>> --- a/drivers/net/wireless/ath/ath9k/main.c
>>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>>> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>>> ath9k_channel *hchan)
>>> 
>>> __ath_cancel_work(sc);
>>> 
>>> + disable_irq(sc->irq);
>>> tasklet_disable(&sc->intr_tq);
>>> tasklet_disable(&sc->bcon_tasklet);
>>> spin_lock_bh(&sc->sc_pcu_lock);
>>> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>>> ath9k_channel *hchan)
>>> r = -EIO;
>>> 
>>> out:
>>> + enable_irq(sc->irq);
>>> spin_unlock_bh(&sc->sc_pcu_lock);
>>> tasklet_enable(&sc->bcon_tasklet);
>>> tasklet_enable(&sc->intr_tq);
>> 
>> This part completely blocks the system on a AR9350. Loading the
>> module works but configuring it hangs the system until watchdog restarts it.
> What kernel/software are you running on there? When I committed this
> patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
> (both generic and in the ath79 platform code).
> 
> You can find the fixes that I've made here:
> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch
> 
> I've already submitted the first one to linux-mips.
> 
> - Felix

Thank you, these patches fixed the problem.

Rico
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kalle Valo Jan. 19, 2015, 12:36 p.m. UTC | #6
Felix Fietkau <nbd@openwrt.org> writes:

> To fix invalid hardware accesses, the commit
> "ath9k: do not access hardware on IRQs during reset" made the irq
> handler ignore interrupts emitted after queueing a hardware reset (which
> disables the IRQ). This left a small time window for the IRQ to get
> re-enabled by the tasklet, which caused IRQ storms.
> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
> IRQ entirely for the duration of the reset.
>
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>

Thanks, applied to wireless-drivers.git. I made a small change and added
the commit id to the commit log:

commit e3f31175a3eeb492a6ab788e4fa136c19b43aab4
Author: Felix Fietkau <nbd@openwrt.org>
Date:   Wed Jan 14 14:17:36 2015 +0100

    ath9k: fix race condition in irq processing during hardware reset
    
    To fix invalid hardware accesses, the commit 872b5d814f99 ("ath9k: do not
    access hardware on IRQs during reset") made the irq handler ignore interrupts
    emitted after queueing a hardware reset (which disables the IRQ). This left a
    small time window for the IRQ to get re-enabled by the tasklet, which caused
    IRQ storms.  Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable
    the IRQ entirely for the duration of the reset.
    
    Signed-off-by: Felix Fietkau <nbd@openwrt.org>
    Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Kalle Valo Jan. 19, 2015, 12:37 p.m. UTC | #7
Rico Derrer <rico.derrer@neratec.com> writes:

>>> This part completely blocks the system on a AR9350. Loading the
>>> module works but configuring it hangs the system until watchdog
>>> restarts it.
>>
>> What kernel/software are you running on there? When I committed this
>> patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
>> (both generic and in the ath79 platform code).
>> 
>> You can find the fixes that I've made here:
>> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
>> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch
>> 
>> I've already submitted the first one to linux-mips.
>
> Thank you, these patches fixed the problem.

Thanks for confirming that, makes it a lot safer for me to apply the
patch.
diff mbox

Patch

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 9a72640..62b0bf4 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -285,6 +285,7 @@  static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
 
 	__ath_cancel_work(sc);
 
+	disable_irq(sc->irq);
 	tasklet_disable(&sc->intr_tq);
 	tasklet_disable(&sc->bcon_tasklet);
 	spin_lock_bh(&sc->sc_pcu_lock);
@@ -331,6 +332,7 @@  static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
 		r = -EIO;
 
 out:
+	enable_irq(sc->irq);
 	spin_unlock_bh(&sc->sc_pcu_lock);
 	tasklet_enable(&sc->bcon_tasklet);
 	tasklet_enable(&sc->intr_tq);
@@ -512,9 +514,6 @@  irqreturn_t ath_isr(int irq, void *dev)
 	if (!ah || test_bit(ATH_OP_INVALID, &common->op_flags))
 		return IRQ_NONE;
 
-	if (!AR_SREV_9100(ah) && test_bit(ATH_OP_HW_RESET, &common->op_flags))
-		return IRQ_NONE;
-
 	/* shared irq, not for us */
 	if (!ath9k_hw_intrpend(ah))
 		return IRQ_NONE;
@@ -529,7 +528,7 @@  irqreturn_t ath_isr(int irq, void *dev)
 	ath9k_debug_sync_cause(sc, sync_cause);
 	status &= ah->imask;	/* discard unasked-for bits */
 
-	if (AR_SREV_9100(ah) && test_bit(ATH_OP_HW_RESET, &common->op_flags))
+	if (test_bit(ATH_OP_HW_RESET, &common->op_flags))
 		return IRQ_HANDLED;
 
 	/*