
ath9k: break out of irq handler after 5 jiffies

Message ID: 1517958324-13536-1-git-send-email-greearb@candelatech.com
State: Changes Requested
Delegated to: Kalle Valo

Commit Message

Ben Greear Feb. 6, 2018, 11:05 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

In cases where the system is sluggish, we should probably break out
early.  Maybe this will fix issues where the OS thinks the IRQ handler
is not responding and disables the IRQ because 'nobody cared'.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath9k/recv.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Felix Fietkau Feb. 7, 2018, 9:16 a.m. UTC | #1
On 2018-02-07 00:05, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> In case where the system is sluggish, we should probably break out
> early.  Maybe this will fix issues where the OS thinks the IRQ handler
> is not responding and disables the IRQ because 'nobody cared'
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
5 jiffies as a hardcoded value is a bad idea, since it produces
different behavior based on CONFIG_HZ.

- Felix
Johannes Berg Feb. 7, 2018, 10:55 a.m. UTC | #2
On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
> 5 jiffies as a hardcoded value is a bad idea, since it produces
> different behavior based on CONFIG_HZ.

Also, err, NAPI? Or is something else going on here?

johannes
Ben Greear Feb. 7, 2018, 3:39 p.m. UTC | #3
On 02/07/2018 02:55 AM, Johannes Berg wrote:
> On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
>> 5 jiffies as a hardcoded value is a bad idea, since it produces
>> different behavior based on CONFIG_HZ.

I figured that was a benefit, since it would run for a shorter duration on
systems with a faster HZ clock.

>
> Also, err, NAPI? Or is something else going on here?

I don't really know, but part of my test was running traffic while creating
1200 stations, so there was likely a lot of higher-level lock contention that
slowed down sending pkts up the stack.

I got a bunch of errors about IRQs being ignored because nobody cared.  I noticed
that the ath9k loop could handle up to 500 or so frames, and that seemed like too
many for my particular test case.

Once I put in this patch, I did not see the 'nobody cared' error again.

There could easily be a better fix.  If you all want me to use a fixed time instead
of HZ, then please suggest a value.  I was testing with HZ of 1000, btw.

Thanks,
Ben

Ben Greear Feb. 26, 2018, 9:39 p.m. UTC | #4
On 02/07/2018 07:39 AM, Ben Greear wrote:

Hello,

I don't mind changing this patch, but I could use some guidance as to what
values you all want me to use.

Should I use a millisecond-based clock instead of jiffies?

What time duration do you want if 5 jiffies (or 5 ms) is not desired?

Thanks,
Ben
Arend van Spriel Feb. 26, 2018, 10:08 p.m. UTC | #5
On 2/26/2018 10:39 PM, Ben Greear wrote:
> Hello,
>
> I don't mind changing this patch, but I could use some guidance as to what
> values you all want me to use.
>
> Should I use a millisecond based clock instead of jiffies?
>
> What time duration do you want if 5 Jiffies (or 5ms) is not desired?

Hi Ben,

Instead of using some time unit, you could consider breaking out after
handling 'x' frames and making 'x' configurable through debugfs.

Regards,
Arend
Ben Greear Feb. 26, 2018, 10:40 p.m. UTC | #6
On 02/26/2018 02:08 PM, Arend van Spriel wrote:
> Hi Ben,
>
> Instead of using some time unit you could consider breaking out after handing 'x' number of frames and make 'x' configurable through debugfs.

I don't see why you would care about the number of packets... it is just a proxy for time, right?

So in that case, using jiffies (or some other fast timer) seems the most useful.

Thanks,
Ben


Patch

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index b90ea2b..274814c 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -1084,6 +1084,7 @@  int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
 	dma_addr_t new_buf_addr;
 	unsigned int budget = 512;
 	struct ieee80211_hdr *hdr;
+	unsigned long expires_jiffies = jiffies + 5;
 
 	if (edma)
 		dma_type = DMA_BIDIRECTIONAL;
@@ -1241,6 +1242,9 @@  int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
 
 		if (!budget--)
 			break;
+
+		if (time_is_before_jiffies(expires_jiffies))
+			break;
 	} while (1);
 
 	if (!(ah->imask & ATH9K_INT_RXEOL)) {