Message ID | 20180506132500.16888-1-erik.stromdahl@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Erik Stromdahl <erik.stromdahl@gmail.com> wrote:

> ath10k_mac_tx_push_txq returns either a positive integer (length) on
> success or a negative error code on error.
>
> The "if (ret) break;" statement will thus always break out of the loop
> immediately after ath10k_mac_tx_push_txq has returned (making the loop
> pointless).
>
> A side effect of this fix is that we will iterate the queue until
> ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
> not added back to ar->txqs when it is empty. This could potentially
> improve performance somewhat (I have seen a small improvement with SDIO
> devices).
>
> Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

e3148cc5fecf ath10k: fix return value check in wake_tx_q op
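For context, this is the loop from ath10k_mac_op_wake_tx_queue as it appears in the diff quoted later in this thread, with the before/after behaviour spelled out in comments. Only the loop is shown; the surrounding locking and requeue logic of the function is omitted here.

```c
/* Before the fix: ath10k_mac_tx_push_txq() returns the pushed frame
 * length (a positive value) on success, so "if (ret)" is also true on
 * success and the loop always exits after a single frame.
 */
while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
	ret = ath10k_mac_tx_push_txq(hw, f_txq);
	if (ret)		/* true for positive lengths too */
		break;
}

/* After the fix: only a negative error code (e.g. -ENOENT when the
 * queue is empty) terminates the loop, so the queue is drained until
 * it is empty or the max iteration count is used up.
 */
while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
	ret = ath10k_mac_tx_push_txq(hw, f_txq);
	if (ret < 0)
		break;
}
```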
Hi Erik,

We have met a performance issue on our two-core system after applying your
patch. In WDS mode, we found that the peak throughput in TCP-DL and UDP-DL
dropped more than 10% compared with the previous one. And in some cases,
though throughput stays the same, one CPU's usage rises by about 20%, which
leads to a 10% rise in total CPU usage. With your change, I think the driver
will try its best to push as many packets as it can. During this time, the
driver's queue lock will be held for too long on one CPU and, as a result,
the other CPU will be blocked if it wants to acquire the same lock. Working
in this way does not seem efficient.

So I think it is better to revert the change until we come up with a new
solution.

>> Subject: [PATCH] ath10k: fix return value check in wake_tx_q op
>>
>> ath10k_mac_tx_push_txq returns either a positive integer (length) on
>> success or a negative error code on error.
>>
>> The "if (ret) break;" statement will thus always break out of the loop
>> immediately after ath10k_mac_tx_push_txq has returned (making the loop
>> pointless).
>>
>> A side effect of this fix is that we will iterate the queue until
>> ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
>> not added back to ar->txqs when it is empty. This could potentially
>> improve performance somewhat (I have seen a small improvement with SDIO
>> devices).
>>
>> Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
>> ---
>>  drivers/net/wireless/ath/ath10k/mac.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/mac.c
>> b/drivers/net/wireless/ath/ath10k/mac.c
>> index 3d7119ad7c7a..487a7a7380fd 100644
>> --- a/drivers/net/wireless/ath/ath10k/mac.c
>> +++ b/drivers/net/wireless/ath/ath10k/mac.c
>> @@ -4290,7 +4290,7 @@ static void ath10k_mac_op_wake_tx_queue(struct
>> ieee80211_hw *hw,
>>
>>  	while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
>>  		ret = ath10k_mac_tx_push_txq(hw, f_txq);
>> -		if (ret)
>> +		if (ret < 0)
>>  			break;
>>  	}
>>  	if (ret != -ENOENT)
>> --
>> 2.17.0
(please don't top post)

yiboz <yiboz@codeaurora.org> writes:

> We have met a performance issue on our two-core system after applying
> your patch. In WDS mode, we found that the peak throughput in TCP-DL
> and UDP-DL dropped more than 10% compared with the previous one. And in
> some cases, though throughput stays the same, one CPU's usage rises by
> about 20%, which leads to a 10% rise in total CPU usage. With your
> change, I think the driver will try its best to push as many packets as
> it can. During this time, the driver's queue lock will be held for too
> long on one CPU and, as a result, the other CPU will be blocked if it
> wants to acquire the same lock. Working in this way does not seem
> efficient.
>
> So I think it is better to revert the change until we come up with a
> new solution.

I don't think reverting is a clear option at this stage because that
again creates problems for SDIO. IIRC without this patch SDIO was
sending one packet at a time (or something like that, can't remember all
the details right now).

Why does this happen only in WDS mode? Did you test other modes, like AP
or client mode?
On 2019-02-07 22:25, Kalle Valo wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
>
>> We have met a performance issue on our two-core system after applying
>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>> and UDP-DL dropped more than 10% compared with the previous one. And in
>> some cases, though throughput stays the same, one CPU's usage rises by
>> about 20%, which leads to a 10% rise in total CPU usage. With your
>> change, I think the driver will try its best to push as many packets as
>> it can. During this time, the driver's queue lock will be held for too
>> long on one CPU and, as a result, the other CPU will be blocked if it
>> wants to acquire the same lock. Working in this way does not seem
>> efficient.
>>
>> So I think it is better to revert the change until we come up with a
>> new solution.
>
> I don't think reverting is a clear option at this stage because that
> again creates problems for SDIO. IIRC without this patch SDIO was
> sending one packet at a time (or something like that, can't remember all
> the details right now).

Below is the aqm result after a 1 min test with Erik's patch.

target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

We see no difference between tx-packets and new-flows.
Whereas if we revert the patch, we get:

target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

new-flows are roughly one-third of the total tx-packets.

IMHO, Erik's change has changed the fq scheduling behavior, and it looks
like there are no other packets in the fq after a packet has been dequeued.
As a result, this flow's deficit will be refilled and the flow then removed
from the fq list at once on the same CPU. During this time, the other CPU
could be blocked. When a new packet comes, the same thing happens. So we
get equal new-flows and tx-packets.

Things would be different without Erik's change. After a packet has been
dequeued, this flow's deficit will not be refilled immediately on CPU0. It
is possible for the deficit to be refilled on CPU1 while at the same time
CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
to FW in the aqm stats.

> Why does this happen only in WDS mode? Did you test other modes, like AP
> or client mode?

AP mode has the same issue. UDP throughput drops more than 10%. As for TCP,
CPU usage rises a lot although throughput stays similar. So, I'd say Erik's
change does not work for us.
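For readers unfamiliar with the mac80211 fq scheduler, below is a much simplified, self-contained sketch of the bookkeeping behind the new-flows observation above. This is not the actual fq_impl.h code; all names are hypothetical and only one behaviour is modelled: a flow that is fully drained is dropped from the schedule, so the very next packet enqueued to it is counted as a new flow again.

```c
#include <stdbool.h>

/* Hypothetical, much simplified flow/tin state -- illustration only,
 * not the actual mac80211 fq_impl.h code. */
struct flow {
	bool linked;		/* currently on the tin's flow list? */
	int deficit;		/* DRR deficit in bytes */
	int backlog;		/* queued bytes */
};

struct tin {
	unsigned long new_flows;	/* the "new-flows" column in aqm */
	unsigned long tx_packets;	/* the "tx-packets" column in aqm */
	int quantum;
};

/* Enqueue: a flow that is not currently scheduled is (re)attached and
 * counted as a new flow. */
static void tin_enqueue(struct tin *tin, struct flow *flow, int len)
{
	if (!flow->linked) {
		flow->linked = true;
		tin->new_flows++;
	}
	flow->backlog += len;
}

/* Dequeue one packet of 'len' bytes from the flow. */
static void tin_dequeue(struct tin *tin, struct flow *flow, int len)
{
	flow->backlog -= len;
	flow->deficit -= len;
	tin->tx_packets++;

	if (flow->deficit <= 0)
		flow->deficit += tin->quantum;	/* refill for the next round */

	if (flow->backlog == 0)
		flow->linked = false;		/* drained: drop from schedule */
}
```

Under this model, a driver that drains the txq completely on every wake_tx_queue call leaves the flow unlinked after each packet, so new_flows and tx_packets advance in lockstep, as in the first aqm dump. A driver that pushes only one frame per call usually leaves the flow linked with a backlog, so new_flows grows much more slowly, as in the second dump.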
On 2019-02-25 12:40, Yibo Zhao wrote:
> On 2019-02-07 22:25, Kalle Valo wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>
>>> We have met a performance issue on our two-core system after applying
>>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>>> and UDP-DL dropped more than 10% compared with the previous one. And in
>>> some cases, though throughput stays the same, one CPU's usage rises by
>>> about 20%, which leads to a 10% rise in total CPU usage. With your
>>> change, I think the driver will try its best to push as many packets as
>>> it can. During this time, the driver's queue lock will be held for too
>>> long on one CPU and, as a result, the other CPU will be blocked if it
>>> wants to acquire the same lock. Working in this way does not seem
>>> efficient.
>>>
>>> So I think it is better to revert the change until we come up with a
>>> new solution.
>>
>> I don't think reverting is a clear option at this stage because that
>> again creates problems for SDIO. IIRC without this patch SDIO was
>> sending one packet at a time (or something like that, can't remember all
>> the details right now).
>
> Below is the aqm result after a 1 min test with Erik's patch.
>
> target 19999us interval 99999us ecn yes
> tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
> 0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
> 1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
> 7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
> 8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
>
> We see no difference between tx-packets and new-flows.
> Whereas if we revert the patch, we get:
>
> target 19999us interval 99999us ecn yes
> tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
> 0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
> 1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
> 7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
> 8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
> 15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
>
> new-flows are roughly one-third of the total tx-packets.
>
> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
> like there are no other packets in the fq after a packet has been dequeued.
> As a result, this flow's deficit will be refilled and the flow then removed
> from the fq list at once on the same CPU. During this time, the other CPU
> could be blocked. When a new packet comes, the same thing happens. So we
> get equal new-flows and tx-packets.
>
> Things would be different without Erik's change. After a packet has been
> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
> is possible for the deficit to be refilled on CPU1 while at the same time
> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
> to FW in the aqm stats.
>
>> Why does this happen only in WDS mode? Did you test other modes, like AP
>> or client mode?
>
> AP mode has the same issue. UDP throughput drops more than 10%. As for
> TCP, CPU usage rises a lot although throughput stays similar.
> So, I'd say Erik's change does not work for us.

Hi Kalle,

May I have your comments?
Hi Yibo,

Sorry for a late reply, but I have been busy with other projects lately.
I have added my comments below.

On 3/4/19 2:56 AM, Yibo Zhao wrote:
> On 2019-02-25 12:40, Yibo Zhao wrote:
>> On 2019-02-07 22:25, Kalle Valo wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>
>>>> We have met a performance issue on our two-core system after applying
>>>> your patch. In WDS mode, we found that the peak throughput in TCP-DL
>>>> and UDP-DL dropped more than 10% compared with the previous one. And in
>>>> some cases, though throughput stays the same, one CPU's usage rises by
>>>> about 20%, which leads to a 10% rise in total CPU usage. With your
>>>> change, I think the driver will try its best to push as many packets as
>>>> it can. During this time, the driver's queue lock will be held for too
>>>> long on one CPU and, as a result, the other CPU will be blocked if it
>>>> wants to acquire the same lock. Working in this way does not seem
>>>> efficient.
>>>>
>>>> So I think it is better to revert the change until we come up with a
>>>> new solution.
>>>
>>> I don't think reverting is a clear option at this stage because that
>>> again creates problems for SDIO. IIRC without this patch SDIO was
>>> sending one packet at a time (or something like that, can't remember all
>>> the details right now).

I have a few patches related to bundling of TX packets on my private repo.
I have not yet had the time to prepare them for submission.
This patch is related to that work, but I decided to submit it separately
since I considered it a bugfix.

<snip>

>> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
>> like there are no other packets in the fq after a packet has been dequeued.
>> As a result, this flow's deficit will be refilled and the flow then removed
>> from the fq list at once on the same CPU. During this time, the other CPU
>> could be blocked. When a new packet comes, the same thing happens. So we
>> get equal new-flows and tx-packets.
>>
>> Things would be different without Erik's change. After a packet has been
>> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
>> is possible for the deficit to be refilled on CPU1 while at the same time
>> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
>> to FW in the aqm stats.
>>
>>> Why does this happen only in WDS mode? Did you test other modes, like AP
>>> or client mode?
>>
>> AP mode has the same issue. UDP throughput drops more than 10%. As for
>> TCP, CPU usage rises a lot although throughput stays similar.
>> So, I'd say Erik's change does not work for us.
>
> Hi Kalle,
>
> May I have your comments?

As I wrote in the commit message, the original code will always break out of
the loop after just one iteration.

It is OK with me to bring back the old logic, but I think we should skip the
loop entirely then.

Something like this:

	if (ath10k_mac_tx_can_push(hw, txq)) {
		ath10k_mac_tx_push_txq(hw, txq);
	}

Btw, I noticed that the "fair scheduling" mechanism (derived from ath9k) from
Toke has been integrated.

I haven't rebased my tree for a while, so I will most likely have to rewrite
my patches anyway in order to make them work with the new TX scheduling.
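A minimal sketch of what this suggestion could look like in place of the loop, assuming the function context visible in the quoted diff. The locking, txq selection, and requeue logic around the hunk are not shown in the diff and are only hinted at as comments here.

```c
static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
					 struct ieee80211_txq *txq)
{
	/* ... existing locking and txq selection, as in the driver ... */

	/* Push at most one frame per wake_tx_queue call, restoring the
	 * pre-patch behaviour without the now-pointless loop.
	 */
	if (ath10k_mac_tx_can_push(hw, txq))
		ath10k_mac_tx_push_txq(hw, txq);

	/* ... existing requeue of the txq to ar->txqs, as in the driver ... */
}
```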
On 2019-03-11 14:44, Erik Stromdahl wrote:
> Hi Yibo,
>
> Sorry for a late reply, but I have been busy with other projects lately.
> I have added my comments below.
>
> On 3/4/19 2:56 AM, Yibo Zhao wrote:
>> On 2019-02-25 12:40, Yibo Zhao wrote:
>>> On 2019-02-07 22:25, Kalle Valo wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>
> I have a few patches related to bundling of TX packets on my private repo.
> I have not yet had the time to prepare them for submission.
> This patch is related to that work, but I decided to submit it separately
> since I considered it a bugfix.

Great! Really looking forward to your new patch.

> <snip>
>
>>> IMHO, Erik's change has changed the fq scheduling behavior, and it looks
>>> like there are no other packets in the fq after a packet has been dequeued.
>>> As a result, this flow's deficit will be refilled and the flow then removed
>>> from the fq list at once on the same CPU. During this time, the other CPU
>>> could be blocked. When a new packet comes, the same thing happens. So we
>>> get equal new-flows and tx-packets.
>>>
>>> Things would be different without Erik's change. After a packet has been
>>> dequeued, this flow's deficit will not be refilled immediately on CPU0. It
>>> is possible for the deficit to be refilled on CPU1 while at the same time
>>> CPU0 can fetch data from ethernet. So we can see more tx-packets delivered
>>> to FW in the aqm stats.
>>>
>>>> Why does this happen only in WDS mode? Did you test other modes, like AP
>>>> or client mode?
>>> AP mode has the same issue. UDP throughput drops more than 10%. As for
>>> TCP, CPU usage rises a lot although throughput stays similar.
>>> So, I'd say Erik's change does not work for us.
>>
>> Hi Kalle,
>>
>> May I have your comments?
>
> As I wrote in the commit message, the original code will always break out of
> the loop after just one iteration.
>
> It is OK with me to bring back the old logic, but I think we should skip the
> loop entirely then.
>
> Something like this:
>
> 	if (ath10k_mac_tx_can_push(hw, txq)) {
> 		ath10k_mac_tx_push_txq(hw, txq);
> 	}

Yes, it is the exact way we tried in our private repo, and it works fine in
our setup so far. Not sure it is OK for other situations. Have you tested
this change on your SDIO setup? Any issues observed?

> Btw, I noticed that the "fair scheduling" mechanism (derived from ath9k) from
> Toke has been integrated.
>
> I haven't rebased my tree for a while, so I will most likely have to rewrite
> my patches anyway in order to make them work with the new TX scheduling.
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 3d7119ad7c7a..487a7a7380fd 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4290,7 +4290,7 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
 
 	while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
 		ret = ath10k_mac_tx_push_txq(hw, f_txq);
-		if (ret)
+		if (ret < 0)
 			break;
 	}
 	if (ret != -ENOENT)
ath10k_mac_tx_push_txq returns either a positive integer (length) on success
or a negative error code on error.

The "if (ret) break;" statement will thus always break out of the loop
immediately after ath10k_mac_tx_push_txq has returned (making the loop
pointless).

A side effect of this fix is that we will iterate the queue until
ath10k_mac_tx_push_txq returns -ENOENT. This will make sure the queue is
not added back to ar->txqs when it is empty. This could potentially improve
performance somewhat (I have seen a small improvement with SDIO devices).

Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
---
 drivers/net/wireless/ath/ath10k/mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)