Message ID | 20180506132500.16888-1-erik.stromdahl@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Erik Stromdahl <erik.stromdahl@gmail.com> wrote:

> ath10k_mac_tx_push_txq returns either a positive integer (length) on
> success or a negative error code on error.
>
> The "if (ret) break;" statement will thus always break out of the loop
> immediately after ath10k_mac_tx_push_txq has returned (making the loop
> pointless).
>
> A side effect of this fix is that we will iterate the queue until
> ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
> not added back to ar->txqs when it is empty. This could potentially
> improve performance somewhat (I have seen a small improvement with SDIO
> devices).
>
> Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

e3148cc5fecf ath10k: fix return value check in wake_tx_q op
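For context, this is the loop from ath10k_mac_op_wake_tx_queue as it appears in the diff quoted later in this thread, with the before/after behaviour spelled out in comments. Only the loop is shown; the surrounding locking and requeue logic of the function is omitted here.

```c
/* Before the fix: ath10k_mac_tx_push_txq() returns the pushed frame
 * length (a positive value) on success, so "if (ret)" is also true on
 * success and the loop always exits after a single frame.
 */
while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
	ret = ath10k_mac_tx_push_txq(hw, f_txq);
	if (ret)		/* true for positive lengths too */
		break;
}

/* After the fix: only a negative error code (e.g. -ENOENT when the
 * queue is empty) terminates the loop, so the queue is drained until
 * it is empty or the max iteration count is used up.
 */
while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
	ret = ath10k_mac_tx_push_txq(hw, f_txq);
	if (ret < 0)
		break;
}
```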
Hi Erik,

We have met a performance issue on our two-core system after applying your
patch. In WDS mode, we found that the peak throughput in TCP-DL and UDP-DL
dropped more than 10% compared with the previous one. And in some cases,
though throughput stays the same, one CPU's usage rises by about 20%, which
leads to a 10% rise in total CPU usage. With your change, I think the driver
will try its best to push as many packets as it can. During this time, the
driver's queue lock will be held for too long on one CPU and, as a result,
the other CPU will be blocked if it wants to acquire the same lock. Working
in this way does not seem efficient.

So I think it is better to revert the change until we come up with a new
solution.

>> Subject: [PATCH] ath10k: fix return value check in wake_tx_q op
>>
>> ath10k_mac_tx_push_txq returns either a positive integer (length) on
>> success or a negative error code on error.
>>
>> The "if (ret) break;" statement will thus always break out of the loop
>> immediately after ath10k_mac_tx_push_txq has returned (making the loop
>> pointless).
>>
>> A side effect of this fix is that we will iterate the queue until
>> ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
>> not added back to ar->txqs when it is empty. This could potentially
>> improve performance somewhat (I have seen a small improvement with SDIO
>> devices).
>>
>> Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
>> ---
>>  drivers/net/wireless/ath/ath10k/mac.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/mac.c
>> b/drivers/net/wireless/ath/ath10k/mac.c
>> index 3d7119ad7c7a..487a7a7380fd 100644
>> --- a/drivers/net/wireless/ath/ath10k/mac.c
>> +++ b/drivers/net/wireless/ath/ath10k/mac.c
>> @@ -4290,7 +4290,7 @@ static void ath10k_mac_op_wake_tx_queue(struct
>> ieee80211_hw *hw,
>>
>>  	while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
>>  		ret = ath10k_mac_tx_push_txq(hw, f_txq);
>> -		if (ret)
>> +		if (ret < 0)
>>  			break;
>>  	}
>>  	if (ret != -ENOENT)
>> --
>> 2.17.0
(please don't top post)

yiboz <yiboz@codeaurora.org> writes:

> We have met a performance issue on our two-core system after applying
> your patch. In WDS mode, we found that the peak throughput in TCP-DL
> and UDP-DL dropped more than 10% compared with the previous one. And in
> some cases, though throughput stays the same, one CPU's usage rises by
> about 20%, which leads to a 10% rise in total CPU usage. With your
> change, I think the driver will try its best to push as many packets as
> it can. During this time, the driver's queue lock will be held for too
> long on one CPU and, as a result, the other CPU will be blocked if it
> wants to acquire the same lock. Working in this way does not seem
> efficient.
>
> So I think it is better to revert the change until we come up with a
> new solution.

I don't think reverting is a clear option at this stage because that
again creates problems for SDIO. IIRC without this patch SDIO was
sending one packet at a time (or something like that, can't remember all
the details right now).

Why does this happen only in WDS mode? Did you test other modes, like AP
or client mode?
On 2019-02-07 22:25, Kalle Valo wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
>
>> We have met a performance issue on our two-core system after applying
>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>> and UDP-DL dropped more than 10% compared with the previous one. And in
>> some cases, though throughput stays the same, one CPU's usage rises by
>> about 20%, which leads to a 10% rise in total CPU usage. With your
>> change, I think the driver will try its best to push as many packets as
>> it can. During this time, the driver's queue lock will be held for too
>> long on one CPU and, as a result, the other CPU will be blocked if it
>> wants to acquire the same lock. Working in this way does not seem
>> efficient.
>>
>> So I think it is better to revert the change until we come up with a
>> new solution.
>
> I don't think reverting is a clear option at this stage because that
> again creates problems for SDIO. IIRC without this patch SDIO was
> sending one packet at a time (or something like that, can't remember all
> the details right now).

Below is the aqm result after a 1 min test with Erik's patch.

target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

We see no difference between tx-packets and new-flows.
Whereas if we revert the patch, we get:

target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

new-flows are roughly one-third of the total tx-packets.

IMHO, Erik's change has changed the fq scheduling behavior, and it looks
like there are no other packets in the fq after a packet has been dequeued.
As a result, this flow's deficit will be refilled and the flow then removed
from the fq list at once on the same CPU. During this time, the other CPU
could be blocked. When a new packet comes, the same thing happens. So we
get equal new-flows and tx-packets.

Things would be different without Erik's change. After a packet has been
dequeued, this flow's deficit will not be refilled immediately on CPU0. It
is possible for the deficit to be refilled on CPU1 while at the same time
CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
to FW in the aqm stats.

> Why does this happen only in WDS mode? Did you test other modes, like AP
> or client mode?

AP mode has the same issue. UDP throughput drops more than 10%. As for TCP,
CPU usage rises a lot although throughput stays similar. So, I'd say Erik's
change does not work for us.
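For readers unfamiliar with the mac80211 fq scheduler, below is a much simplified, self-contained sketch of the bookkeeping behind the new-flows observation above. This is not the actual fq_impl.h code; all names are hypothetical and only one behaviour is modelled: a flow that is fully drained is dropped from the schedule, so the very next packet enqueued to it is counted as a new flow again.

```c
#include <stdbool.h>

/* Hypothetical, much simplified flow/tin state -- illustration only,
 * not the actual mac80211 fq_impl.h code. */
struct flow {
	bool linked;		/* currently on the tin's flow list? */
	int deficit;		/* DRR deficit in bytes */
	int backlog;		/* queued bytes */
};

struct tin {
	unsigned long new_flows;	/* the "new-flows" column in aqm */
	unsigned long tx_packets;	/* the "tx-packets" column in aqm */
	int quantum;
};

/* Enqueue: a flow that is not currently scheduled is (re)attached and
 * counted as a new flow. */
static void tin_enqueue(struct tin *tin, struct flow *flow, int len)
{
	if (!flow->linked) {
		flow->linked = true;
		tin->new_flows++;
	}
	flow->backlog += len;
}

/* Dequeue one packet of 'len' bytes from the flow. */
static void tin_dequeue(struct tin *tin, struct flow *flow, int len)
{
	flow->backlog -= len;
	flow->deficit -= len;
	tin->tx_packets++;

	if (flow->deficit <= 0)
		flow->deficit += tin->quantum;	/* refill for the next round */

	if (flow->backlog == 0)
		flow->linked = false;		/* drained: drop from schedule */
}
```

Under this model, a driver that drains the txq completely on every wake_tx_queue call leaves the flow unlinked after each packet, so new_flows and tx_packets advance in lockstep, as in the first aqm dump. A driver that pushes only one frame per call usually leaves the flow linked with a backlog, so new_flows grows much more slowly, as in the second dump.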
On 2019-02-25 12:40, Yibo Zhao wrote:
> On 2019-02-07 22:25, Kalle Valo wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>
>>> We have met a performance issue on our two-core system after applying
>>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>>> and UDP-DL dropped more than 10% compared with the previous one. And in
>>> some cases, though throughput stays the same, one CPU's usage rises by
>>> about 20%, which leads to a 10% rise in total CPU usage. With your
>>> change, I think the driver will try its best to push as many packets as
>>> it can. During this time, the driver's queue lock will be held for too
>>> long on one CPU and, as a result, the other CPU will be blocked if it
>>> wants to acquire the same lock. Working in this way does not seem
>>> efficient.
>>>
>>> So I think it is better to revert the change until we come up with a
>>> new solution.
>>
>> I don't think reverting is a clear option at this stage because that
>> again creates problems for SDIO. IIRC without this patch SDIO was
>> sending one packet at a time (or something like that, can't remember all
>> the details right now).
>
> Below is the aqm result after a 1 min test with Erik's patch.
>
> target 19999us interval 99999us ecn yes
> tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
> 0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
> 1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
> 7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
> 8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
>
> We see no difference between tx-packets and new-flows.
> Whereas if we revert the patch, we get:
>
> target 19999us interval 99999us ecn yes
> tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
> 0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
> 1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
> 7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
> 8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
>
> new-flows are roughly one-third of the total tx-packets.
>
> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
> like there are no other packets in the fq after a packet has been dequeued.
> As a result, this flow's deficit will be refilled and the flow then removed
> from the fq list at once on the same CPU. During this time, the other CPU
> could be blocked. When a new packet comes, the same thing happens. So we
> get equal new-flows and tx-packets.
>
> Things would be different without Erik's change. After a packet has been
> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
> is possible for the deficit to be refilled on CPU1 while at the same time
> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
> to FW in the aqm stats.
>
>> Why does this happen only in WDS mode? Did you test other modes, like AP
>> or client mode?
>
> AP mode has the same issue. UDP throughput drops more than 10%. As for
> TCP, CPU usage rises a lot although throughput stays similar.
> So, I'd say Erik's change does not work for us.

Hi Kalle,

May I have your comments?
Hi Yibo,

Sorry for a late reply, but I have been busy with other projects lately.
I have added my comments below.

On 3/4/19 2:56 AM, Yibo Zhao wrote:
> On 2019-02-25 12:40, Yibo Zhao wrote:
>> On 2019-02-07 22:25, Kalle Valo wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>
>>>> We have met a performance issue on our two-core system after applying
>>>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>>>> and UDP-DL dropped more than 10% compared with the previous one. And in
>>>> some cases, though throughput stays the same, one CPU's usage rises by
>>>> about 20%, which leads to a 10% rise in total CPU usage. With your
>>>> change, I think the driver will try its best to push as many packets as
>>>> it can. During this time, the driver's queue lock will be held for too
>>>> long on one CPU and, as a result, the other CPU will be blocked if it
>>>> wants to acquire the same lock. Working in this way does not seem
>>>> efficient.
>>>>
>>>> So I think it is better to revert the change until we come up with a
>>>> new solution.
>>>
>>> I don't think reverting is a clear option at this stage because that
>>> again creates problems for SDIO. IIRC without this patch SDIO was
>>> sending one packet at a time (or something like that, can't remember all
>>> the details right now).

I have a few patches related to bundling of TX packets on my private repo.
I have not yet had the time to prepare them for submission.
This patch is related to that work, but I decided to submit it separately
since I considered it a bugfix.

<snip>

>> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
>> like there are no other packets in the fq after a packet has been dequeued.
>> As a result, this flow's deficit will be refilled and the flow then removed
>> from the fq list at once on the same CPU. During this time, the other CPU
>> could be blocked. When a new packet comes, the same thing happens. So we
>> get equal new-flows and tx-packets.
>>
>> Things would be different without Erik's change. After a packet has been
>> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
>> is possible for the deficit to be refilled on CPU1 while at the same time
>> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
>> to FW in the aqm stats.
>>
>>> Why does this happen only in WDS mode? Did you test other modes, like AP
>>> or client mode?
>>
>> AP mode has the same issue. UDP throughput drops more than 10%. As for
>> TCP, CPU usage rises a lot although throughput stays similar.
>> So, I'd say Erik's change does not work for us.
>
> Hi Kalle,
>
> May I have your comments?

As I wrote in the commit message, the original code will always break out of
the loop after just one iteration.

It is OK with me to bring back the old logic, but I think we should skip the
loop entirely then.

Something like this:

	if (ath10k_mac_tx_can_push(hw, txq)) {
		ath10k_mac_tx_push_txq(hw, txq);
	}

Btw, I noticed that the "fair scheduling" mechanism (derived from ath9k) from
Toke has been integrated.

I haven't rebased my tree for a while, so I will most likely have to rewrite
my patches anyway in order to make them work with the new TX scheduling.
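A minimal sketch of what this suggestion could look like in place of the loop, assuming the function context visible in the quoted diff. The locking, txq selection, and requeue logic around the hunk are not shown in the diff and are only hinted at as comments here.

```c
static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
					 struct ieee80211_txq *txq)
{
	/* ... existing locking and txq selection, as in the driver ... */

	/* Push at most one frame per wake_tx_queue call, restoring the
	 * pre-patch behaviour without the now-pointless loop.
	 */
	if (ath10k_mac_tx_can_push(hw, txq))
		ath10k_mac_tx_push_txq(hw, txq);

	/* ... existing requeue of the txq to ar->txqs, as in the driver ... */
}
```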
On 2019-03-11 14:44, Erik Stromdahl wrote:
> Hi Yibo,
>
> Sorry for a late reply, but I have been busy with other projects lately.
> I have added my comments below.
>
> On 3/4/19 2:56 AM, Yibo Zhao wrote:
>> On 2019-02-25 12:40, Yibo Zhao wrote:
>>> On 2019-02-07 22:25, Kalle Valo wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>
> I have a few patches related to bundling of TX packets on my private repo.
> I have not yet had the time to prepare them for submission.
> This patch is related to that work, but I decided to submit it separately
> since I considered it a bugfix.

Great! Really looking forward to your new patch.

> <snip>
>
>>> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
>>> like there are no other packets in the fq after a packet has been dequeued.
>>> As a result, this flow's deficit will be refilled and the flow then removed
>>> from the fq list at once on the same CPU. During this time, the other CPU
>>> could be blocked. When a new packet comes, the same thing happens. So we
>>> get equal new-flows and tx-packets.
>>>
>>> Things would be different without Erik's change. After a packet has been
>>> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
>>> is possible for the deficit to be refilled on CPU1 while at the same time
>>> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
>>> to FW in the aqm stats.
>>>
>>>> Why does this happen only in WDS mode? Did you test other modes, like AP
>>>> or client mode?
>>> AP mode has the same issue. UDP throughput drops more than 10%. As for
>>> TCP, CPU usage rises a lot although throughput stays similar.
>>> So, I'd say Erik's change does not work for us.
>>
>> Hi Kalle,
>>
>> May I have your comments?
>
> As I wrote in the commit message, the original code will always break out of
> the loop after just one iteration.
>
> It is OK with me to bring back the old logic, but I think we should skip the
> loop entirely then.
>
> Something like this:
>
> 	if (ath10k_mac_tx_can_push(hw, txq)) {
> 		ath10k_mac_tx_push_txq(hw, txq);
> 	}

Yes, it is the exact way we tried in our private repo, and it works fine in
our setup so far. Not sure it is OK for other situations. Have you tested
this change on your SDIO setup? Any issues observed?

> Btw, I noticed that the "fair scheduling" mechanism (derived from ath9k) from
> Toke has been integrated.
>
> I haven't rebased my tree for a while, so I will most likely have to rewrite
> my patches anyway in order to make them work with the new TX scheduling.
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 3d7119ad7c7a..487a7a7380fd 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4290,7 +4290,7 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
 
 	while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
 		ret = ath10k_mac_tx_push_txq(hw, f_txq);
-		if (ret)
+		if (ret < 0)
 			break;
 	}
 	if (ret != -ENOENT)
ath10k_mac_tx_push_txq returns either a positive integer (length) on success
or a negative error code on error.

The "if (ret) break;" statement will thus always break out of the loop
immediately after ath10k_mac_tx_push_txq has returned (making the loop
pointless).

A side effect of this fix is that we will iterate the queue until
ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
not added back to ar->txqs when it is empty. This could potentially improve
performance somewhat (I have seen a small improvement with SDIO devices).

Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
---
 drivers/net/wireless/ath/ath10k/mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)