diff mbox

regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

Message ID 20161218155938.GP3924@linux.vnet.ibm.com (mailing list archive)
State Superseded
Delegated to: Kalle Valo
Headers show

Commit Message

Paul E. McKenney Dec. 18, 2016, 3:59 p.m. UTC
On Sun, Dec 18, 2016 at 02:52:48PM +0100, Gabriel C wrote:
> Hello,
> 
> while testing kernel 4.9 I run into a weird issue with the ath9k driver.
> 
> I can boot the box in console mode and it stay up sometime but is not usable.

Looks to me like someone forgot an rcu_read_unlock() somewhere.  Given that
the unmatched rcu_read_lock() appears in ath_tx_edma_tasklet(), perhaps
that is also where the missing rcu_read_unlock() is.  And sure enough,
in the middle of this function we have the following:

		fifo_list = &txq->txq_fifo[txq->txq_tailidx];
		if (list_empty(fifo_list)) {
			ath_txq_unlock(sc, txq);
			return;
		}

This will of course return while still in an RCU read-side critical
section.  The caller cannot tell the difference between a return here
and falling off the end of the function, so this is likely the bug.
Or one of the bugs, anyway.  Copying the author and committer for
their thoughts.

Please try the patch at the end of this email.

							Thanx, Paul

> from dmesg :
> 
> ===============================
> [ INFO: suspicious RCU usage. ]
> 4.9-fw1 #1 Tainted: G          I
> -------------------------------
> kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!
> 
> other info that might help us debug this:
> 
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 1
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
>  #0:  (rcu_read_lock){......}, at: [<ffffffffa0ee0240>] ath_tx_edma_tasklet+0x0/0x460 [ath9k]
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I     4.9-fw1 #1
> Hardware name: FUJITSU                          PRIMERGY TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709              02/04/2013
>  ffff88043ee03f38 ffffffff812cf0f3 ffffffff81a11540 0000000000000001
>  ffff88043ee03f68 ffffffff810b7865 ffffffff81a55d58 ffff88043efcedc0
>  ffff88083cb1ca00 00000000000000d1 ffff88043ee03f88 ffffffff810dbfe8
> Call Trace:
>  <IRQ>
>  [<ffffffff812cf0f3>] dump_stack+0x86/0xc3
>  [<ffffffff810b7865>] lockdep_rcu_suspicious+0xc5/0x100
>  [<ffffffff810dbfe8>] rcu_eqs_enter_common.constprop.62+0x128/0x130
>  [<ffffffff810ddc78>] rcu_irq_exit+0x38/0x70
>  [<ffffffff81067ec4>] irq_exit+0x74/0xd0
>  [<ffffffff8101e561>] do_IRQ+0x71/0x130
>  [<ffffffff8158700c>] common_interrupt+0x8c/0x8c
>  <EOI>
>  [<ffffffff81472836>] ? cpuidle_enter_state+0x156/0x220
>  [<ffffffff81472922>] cpuidle_enter+0x12/0x20
>  [<ffffffff810ad23e>] call_cpuidle+0x1e/0x40
>  [<ffffffff810ad46d>] cpu_startup_entry+0x11d/0x210
>  [<ffffffff8157892c>] rest_init+0x12c/0x140
>  [<ffffffff81d02ec3>] start_kernel+0x40f/0x41c
>  [<ffffffff81d02120>] ? early_idt_handler_array+0x120/0x120
>  [<ffffffff81d02299>] x86_64_start_reservations+0x2a/0x2c
>  [<ffffffff81d02386>] x86_64_start_kernel+0xeb/0xf8

------------------------------------------------------------------------

commit 5a16fed76936184a7ac22e466cf39bd8bb5ee65e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Sun Dec 18 07:49:00 2016 -0800

    drivers/ath: Add missing rcu_read_unlock() to ath_tx_edma_tasklet()
    
    Commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb where possible")
    added rcu_read_lock() and rcu_read_unlock() around the body of
    ath_tx_edma_tasklet(), but failed to add the needed rcu_read_unlock()
    before a "return" in the middle of this function.  This commit therefore
    adds the missing rcu_read_unlock().
    
    Reported-by: Gabriel C <nix.or.die@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Felix Fietkau <nbd@nbd.name>
    Cc: Kalle Valo <kvalo@qca.qualcomm.com>
    Cc: QCA ath9k Development <ath9k-devel@qca.qualcomm.com>
    Cc: <linux-wireless@vger.kernel.org?
    Cc: <ath9k-devel@lists.ath9k.org>

Comments

Tobias Klausmann Dec. 18, 2016, 4:17 p.m. UTC | #1
Hi,

A patch for this is already floating on the ML for a while now latest: 
(ath9k: do not return early to fix rcu unlocking)

Hopefully Kalle will include it in one of his upcoming pull requests.

Greetings,

Tobias


On 18.12.2016 16:59, Paul E. McKenney wrote:
> On Sun, Dec 18, 2016 at 02:52:48PM +0100, Gabriel C wrote:
>> Hello,
>>
>> while testing kernel 4.9 I run into a weird issue with the ath9k driver.
>>
>> I can boot the box in console mode and it stay up sometime but is not usable.
> Looks to me like someone forgot an rcu_read_unlock() somewhere.  Given that
> the unmatched rcu_read_lock() appears in ath_tx_edma_tasklet(), perhaps
> that is also where the missing rcu_read_unlock() is.  And sure enough,
> in the middle of this function we have the following:
>
> 		fifo_list = &txq->txq_fifo[txq->txq_tailidx];
> 		if (list_empty(fifo_list)) {
> 			ath_txq_unlock(sc, txq);
> 			return;
> 		}
>
> This will of course return while still in an RCU read-side critical
> section.  The caller cannot tell the difference between a return here
> and falling off the end of the function, so this is likely the bug.
> Or one of the bugs, anyway.  Copying the author and committer for
> their thoughts.
>
> Please try the patch at the end of this email.
>
> 							Thanx, Paul
>
>> from dmesg :
>>
>> ===============================
>> [ INFO: suspicious RCU usage. ]
>> 4.9-fw1 #1 Tainted: G          I
>> -------------------------------
>> kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!
>>
>> other info that might help us debug this:
>>
>>
>> RCU used illegally from idle CPU!
>> rcu_scheduler_active = 1, debug_locks = 1
>> RCU used illegally from extended quiescent state!
>> 1 lock held by swapper/0/0:
>>   #0:  (rcu_read_lock){......}, at: [<ffffffffa0ee0240>] ath_tx_edma_tasklet+0x0/0x460 [ath9k]
>>
>> stack backtrace:
>> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I     4.9-fw1 #1
>> Hardware name: FUJITSU                          PRIMERGY TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709              02/04/2013
>>   ffff88043ee03f38 ffffffff812cf0f3 ffffffff81a11540 0000000000000001
>>   ffff88043ee03f68 ffffffff810b7865 ffffffff81a55d58 ffff88043efcedc0
>>   ffff88083cb1ca00 00000000000000d1 ffff88043ee03f88 ffffffff810dbfe8
>> Call Trace:
>>   <IRQ>
>>   [<ffffffff812cf0f3>] dump_stack+0x86/0xc3
>>   [<ffffffff810b7865>] lockdep_rcu_suspicious+0xc5/0x100
>>   [<ffffffff810dbfe8>] rcu_eqs_enter_common.constprop.62+0x128/0x130
>>   [<ffffffff810ddc78>] rcu_irq_exit+0x38/0x70
>>   [<ffffffff81067ec4>] irq_exit+0x74/0xd0
>>   [<ffffffff8101e561>] do_IRQ+0x71/0x130
>>   [<ffffffff8158700c>] common_interrupt+0x8c/0x8c
>>   <EOI>
>>   [<ffffffff81472836>] ? cpuidle_enter_state+0x156/0x220
>>   [<ffffffff81472922>] cpuidle_enter+0x12/0x20
>>   [<ffffffff810ad23e>] call_cpuidle+0x1e/0x40
>>   [<ffffffff810ad46d>] cpu_startup_entry+0x11d/0x210
>>   [<ffffffff8157892c>] rest_init+0x12c/0x140
>>   [<ffffffff81d02ec3>] start_kernel+0x40f/0x41c
>>   [<ffffffff81d02120>] ? early_idt_handler_array+0x120/0x120
>>   [<ffffffff81d02299>] x86_64_start_reservations+0x2a/0x2c
>>   [<ffffffff81d02386>] x86_64_start_kernel+0xeb/0xf8
> ------------------------------------------------------------------------
>
> commit 5a16fed76936184a7ac22e466cf39bd8bb5ee65e
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Sun Dec 18 07:49:00 2016 -0800
>
>      drivers/ath: Add missing rcu_read_unlock() to ath_tx_edma_tasklet()
>      
>      Commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb where possible")
>      added rcu_read_lock() and rcu_read_unlock() around the body of
>      ath_tx_edma_tasklet(), but failed to add the needed rcu_read_unlock()
>      before a "return" in the middle of this function.  This commit therefore
>      adds the missing rcu_read_unlock().
>      
>      Reported-by: Gabriel C <nix.or.die@gmail.com>
>      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>      Cc: Felix Fietkau <nbd@nbd.name>
>      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
>      Cc: QCA ath9k Development <ath9k-devel@qca.qualcomm.com>
>      Cc: <linux-wireless@vger.kernel.org?
>      Cc: <ath9k-devel@lists.ath9k.org>
>
> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index 52bfbb988611..857d5ae09a1d 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2787,6 +2787,7 @@ void ath_tx_edma_tasklet(struct ath_softc *sc)
>   		fifo_list = &txq->txq_fifo[txq->txq_tailidx];
>   		if (list_empty(fifo_list)) {
>   			ath_txq_unlock(sc, txq);
> +			rcu_read_unlock();
>   			return;
>   		}
>   
>
Kalle Valo Dec. 18, 2016, 7:57 p.m. UTC | #2
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> writes:

> A patch for this is already floating on the ML for a while now latest:
> (ath9k: do not return early to fix rcu unlocking)

It's here:

https://patchwork.kernel.org/patch/9472709/

> Hopefully Kalle will include it in one of his upcoming pull requests.

Yes, I'll try to get it to 4.10-rc2.
Tobias Klausmann Dec. 18, 2016, 8:04 p.m. UTC | #3
On 18.12.2016 20:57, Valo, Kalle wrote:
> Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> writes:
>
>> A patch for this is already floating on the ML for a while now latest:
>> (ath9k: do not return early to fix rcu unlocking)
> It's here:
>
> https://patchwork.kernel.org/patch/9472709/

Good to know!
>
>> Hopefully Kalle will include it in one of his upcoming pull requests.
> Yes, I'll try to get it to 4.10-rc2.

Thanks for the update!
Paul E. McKenney Dec. 18, 2016, 8:14 p.m. UTC | #4
On Sun, Dec 18, 2016 at 07:57:42PM +0000, Valo, Kalle wrote:
> Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> writes:
> 
> > A patch for this is already floating on the ML for a while now latest:
> > (ath9k: do not return early to fix rcu unlocking)
> 
> It's here:
> 
> https://patchwork.kernel.org/patch/9472709/

Feel free to add:

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

							Thanx, Paul

> > Hopefully Kalle will include it in one of his upcoming pull requests.
> 
> Yes, I'll try to get it to 4.10-rc2.
> 
> -- 
> Kalle Valo
Gabriel C Dec. 18, 2016, 11:11 p.m. UTC | #5
On 18.12.2016 21:14, Paul E. McKenney wrote:
> On Sun, Dec 18, 2016 at 07:57:42PM +0000, Valo, Kalle wrote:
>> Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> writes:
>>
>>> A patch for this is already floating on the ML for a while now latest:
>>> (ath9k: do not return early to fix rcu unlocking)
>>
>> It's here:
>>
>> https://patchwork.kernel.org/patch/9472709/
>
> Feel free to add:
>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>

This patch fixes the bug for me.

You can add my Tested-by if you wish.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>


Regards,

Gabriel
Kalle Valo Dec. 20, 2016, 2:20 p.m. UTC | #6
Gabriel C <nix.or.die@gmail.com> writes:

> On 18.12.2016 21:14, Paul E. McKenney wrote:
>> On Sun, Dec 18, 2016 at 07:57:42PM +0000, Valo, Kalle wrote:
>>> Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> writes:
>>>
>>>> A patch for this is already floating on the ML for a while now latest:
>>>> (ath9k: do not return early to fix rcu unlocking)
>>>
>>> It's here:
>>>
>>> https://patchwork.kernel.org/patch/9472709/
>>
>> Feel free to add:
>>
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>
>
> This patch fixes the bug for me.
>
> You can add my Tested-by if you wish.
>
> Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>

I added both of these to the commit log, thanks.
diff mbox

Patch

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 52bfbb988611..857d5ae09a1d 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2787,6 +2787,7 @@  void ath_tx_edma_tasklet(struct ath_softc *sc)
 		fifo_list = &txq->txq_fifo[txq->txq_tailidx];
 		if (list_empty(fifo_list)) {
 			ath_txq_unlock(sc, txq);
+			rcu_read_unlock();
 			return;
 		}