diff mbox

[RFC] mac80211: add stop/start logic for software TXQs

Message ID 1526672192-3873-1-git-send-email-mpubbise@codeaurora.org (mailing list archive)
State RFC
Delegated to: Johannes Berg
Headers show

Commit Message

Manikanta Pubbisetty May 18, 2018, 7:36 p.m. UTC
Sometimes, it is required to stop the transmissions momentarily and
resume it later; stopping the txqs becomes very critical in scenarios where
the packet transmission has to be ceased completely. For example, during
the hardware restart, during off channel operations,
when initiating CSA(upon detecting a radar on the DFS channel), etc.

The TX queue stop/start logic in mac80211 works well in stopping the TX
when drivers make use of netdev queues, i.e, when Qdiscs in network layer
take care of traffic scheduling. Since the devices implementing
wake_tx_queue run without Qdiscs, packets will be handed to mac80211
directly without queueing them in the netdev queues. Also, mac80211
does not invoke any of the netif_stop_*/netif_wake_* APIs to stop the
transmissions and this might very well lead to undesirabled things.

For example,
During hardware restart, we stop the netdev queues so that packets are
not sent to the driver. Since ath10k implements wake_tx_queue,
TX queues will not be stopped and packets might reach the hardware while
it is restarting; this can make hardware unresponsive and can be
recovered only by rebooting the system.

There is another problem to this, it is observed that the packets
were sent on the DFS channel for a prolonged duration after radar detection
impacting the channel closing times.

We can still invoke netif stop/wake APIs when wake_tx_queue is implemented
but this could lead to packet drops in network layer; adding stop/start
logic for software TXQs in mac80211 instead makes more sense; the change
proposed adds the same in mac80211.

Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
---
 net/mac80211/agg-tx.c      | 4 ++++
 net/mac80211/ieee80211_i.h | 5 +++++
 net/mac80211/sta_info.c    | 3 +++
 net/mac80211/tx.c          | 3 +++
 net/mac80211/util.c        | 7 ++++++-
 5 files changed, 21 insertions(+), 1 deletion(-)

Comments

Johannes Berg May 23, 2018, 9:46 a.m. UTC | #1
On Sat, 2018-05-19 at 01:06 +0530, Manikanta Pubbisetty wrote:
> Sometimes, it is required to stop the transmissions momentarily and
> resume it later; stopping the txqs becomes very critical in scenarios where
> the packet transmission has to be ceased completely. For example, during
> the hardware restart, during off channel operations,
> when initiating CSA(upon detecting a radar on the DFS channel), etc.
> 
> The TX queue stop/start logic in mac80211 works well in stopping the TX
> when drivers make use of netdev queues, i.e, when Qdiscs in network layer
> take care of traffic scheduling. Since the devices implementing
> wake_tx_queue run without Qdiscs, packets will be handed to mac80211
> directly without queueing them in the netdev queues. Also, mac80211
> does not invoke any of the netif_stop_*/netif_wake_* APIs to stop the
> transmissions and this might very well lead to undesirabled things.

I was ready to say the drivers can handle this ;-)

> For example,
> During hardware restart, we stop the netdev queues so that packets are
> not sent to the driver. Since ath10k implements wake_tx_queue,
> TX queues will not be stopped and packets might reach the hardware while
> it is restarting; this can make hardware unresponsive and can be
> recovered only by rebooting the system.

But yeah, you do have a point here.

> There is another problem to this, it is observed that the packets
> were sent on the DFS channel for a prolonged duration after radar detection
> impacting the channel closing times.

Makes sense.

> We can still invoke netif stop/wake APIs when wake_tx_queue is implemented
> but this could lead to packet drops in network layer; adding stop/start
> logic for software TXQs in mac80211 instead makes more sense; the change
> proposed adds the same in mac80211.

Agree, that seems to make sense.

> +++ b/net/mac80211/agg-tx.c
> @@ -213,11 +213,15 @@ static void
>  ieee80211_agg_start_txq(struct sta_info *sta, int tid, bool enable)
>  {
>  	struct ieee80211_txq *txq = sta->sta.txq[tid];
> +	struct ieee80211_local *local = sta->local;
>  	struct txq_info *txqi;
>  
>  	if (!txq)
>  		return;
>  
> +	if (local->txqs_stopped)
> +		return;
> +
>  	txqi = to_txq_info(txq);

This doesn't seem right, all the logic that clears the TXQ_STOP etc.
still needs to be invoked, otherwise your TXQ will get stuck completely
since you'll never run this code again.

> +	/* pause/resume logic for intermediate software queues,
> +	 * applicable when wake_tx_queue is defined.
> +	 */
> +	unsigned long txqs_stopped;

bool? at least you don't seem to treat it like a bitmap which unsigned
long seems to imply.

> +++ b/net/mac80211/sta_info.c
> @@ -1244,6 +1244,9 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
>  			if (!txq_has_queue(sta->sta.txq[i]))
>  				continue;
>  
> +			if (local->txqs_stopped)
> +				continue;
> +
>  			drv_wake_tx_queue(local, to_txq_info(sta->sta.txq[i]));
>  		}

That seems like it's in an odd place, why bother going through all the
logic first?

Also, you have the same problem as above - you never re-run this code so
 you'd get stuck completely.

> +++ b/net/mac80211/tx.c
> @@ -1559,6 +1559,9 @@ static bool ieee80211_queue_skb(struct ieee80211_local *local,
>  	    sdata->vif.type == NL80211_IFTYPE_MONITOR)
>  		return false;
>  
> +	if (local->txqs_stopped)
> +		return false;
> +
>  	if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN)
>  		sdata = container_of(sdata->bss,
>  				     struct ieee80211_sub_if_data, u.ap);

That also seems too early.

> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index 2d82c88..da7ae8b 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -298,6 +298,9 @@ static void __ieee80211_wake_queue(struct ieee80211_hw *hw, int queue,
>  	if (local->q_stop_reasons[queue][reason] == 0)
>  		__clear_bit(reason, &local->queue_stop_reasons[queue]);
>  
> +	if (local->ops->wake_tx_queue)
> +		__clear_bit(reason, &local->txqs_stopped);

Ah, never mind, you do use it as a bitmap.

But I think your approach is wrong, somehow you have to make sure you
restart.

Perhaps a good approach would be to have a check when the driver pulls
an SKB, and then record the queue it was pulled. Then, when the stop
bitmap goes to 0, you can drv_wake_tx_queue() all the queues that were
marked as "pulled while stopped" to recover.

johannes
Manikanta Pubbisetty May 28, 2018, 7:19 a.m. UTC | #2
>> +++ b/net/mac80211/agg-tx.c
>> @@ -213,11 +213,15 @@ static void
>>   ieee80211_agg_start_txq(struct sta_info *sta, int tid, bool enable)
>>   {
>>   	struct ieee80211_txq *txq = sta->sta.txq[tid];
>> +	struct ieee80211_local *local = sta->local;
>>   	struct txq_info *txqi;
>>   
>>   	if (!txq)
>>   		return;
>>   
>> +	if (local->txqs_stopped)
>> +		return;
>> +
>>   	txqi = to_txq_info(txq);
> This doesn't seem right, all the logic that clears the TXQ_STOP etc.
> still needs to be invoked, otherwise your TXQ will get stuck completely
> since you'll never run this code again.

If the queues are blocked more than the block ack timeout then the 
traffic should be sent only in the next BA session.
I am not really sure whether the queues will be blocked more than the 
block ack timeout value unless the queues are stopped because of 
multiple reasons.
In any case, traffic will be pushed out in subsequent BA sessions, no?


[snip]

> +++ b/net/mac80211/sta_info.c
> @@ -1244,6 +1244,9 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
>   			if (!txq_has_queue(sta->sta.txq[i]))
>   				continue;
>   
> +			if (local->txqs_stopped)
> +				continue;
> +
>   			drv_wake_tx_queue(local, to_txq_info(sta->sta.txq[i]));
>   		}
> That seems like it's in an odd place, why bother going through all the
> logic first?
>
> Also, you have the same problem as above - you never re-run this code so
>   you'd get stuck completely.

I didn't get your point here. By the time the queues gets started again, 
there could be possibility that the station might have been back to 
sleep. In this case, it is better not to send the traffic, no?
Anyways, station would receive the traffic in the next cycle when it is 
out of sleep. Considering codel logic, there could be frame drops 
though; maybe I am missing something?

>> +++ b/net/mac80211/tx.c
>> @@ -1559,6 +1559,9 @@ static bool ieee80211_queue_skb(struct ieee80211_local *local,
>>   	    sdata->vif.type == NL80211_IFTYPE_MONITOR)
>>   		return false;
>>   
>> +	if (local->txqs_stopped)
>> +		return false;
>> +
>>   	if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN)
>>   		sdata = container_of(sdata->bss,
>>   				     struct ieee80211_sub_if_data, u.ap);
> That also seems too early.
>
>> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
>> index 2d82c88..da7ae8b 100644
>> --- a/net/mac80211/util.c
>> +++ b/net/mac80211/util.c
>> @@ -298,6 +298,9 @@ static void __ieee80211_wake_queue(struct ieee80211_hw *hw, int queue,
>>   	if (local->q_stop_reasons[queue][reason] == 0)
>>   		__clear_bit(reason, &local->queue_stop_reasons[queue]);
>>   
>> +	if (local->ops->wake_tx_queue)
>> +		__clear_bit(reason, &local->txqs_stopped);
> Ah, never mind, you do use it as a bitmap.
>
> But I think your approach is wrong, somehow you have to make sure you
> restart.
>
> Perhaps a good approach would be to have a check when the driver pulls
> an SKB, and then record the queue it was pulled. Then, when the stop
> bitmap goes to 0, you can drv_wake_tx_queue() all the queues that were
> marked as "pulled while stopped" to recover.
Good idea!!
A much better way to handle this, I will try out this and see how it fares.

Thanks,
Manikanta
Johannes Berg May 28, 2018, 8:06 a.m. UTC | #3
On Mon, 2018-05-28 at 12:49 +0530, Manikanta Pubbisetty wrote:
> > 
> > This doesn't seem right, all the logic that clears the TXQ_STOP etc.
> > still needs to be invoked, otherwise your TXQ will get stuck completely
> > since you'll never run this code again.
> 
> If the queues are blocked more than the block ack timeout then the 
> traffic should be sent only in the next BA session.
> I am not really sure whether the queues will be blocked more than the 
> block ack timeout value unless the queues are stopped because of 
> multiple reasons.
> In any case, traffic will be pushed out in subsequent BA sessions, no?

I'm not really sure what you're saying, but it sounds almost like you're
confusing a "BA session" with a single A-MPDU? The session will get
stuck if you do the code this way, I think.

> > Also, you have the same problem as above - you never re-run this code so
> >   you'd get stuck completely.
> 
> I didn't get your point here. By the time the queues gets started again, 
> there could be possibility that the station might have been back to 
> sleep. In this case, it is better not to send the traffic, no?
> Anyways, station would receive the traffic in the next cycle when it is 
> out of sleep. Considering codel logic, there could be frame drops 
> though; maybe I am missing something?

But this is still the old code before cycling, so you never get here
during the TX cycle you're thinking of?

johannes
Manikanta Pubbisetty May 29, 2018, 7:32 a.m. UTC | #4
On 5/28/2018 1:36 PM, Johannes Berg wrote:
> On Mon, 2018-05-28 at 12:49 +0530, Manikanta Pubbisetty wrote:
>>> This doesn't seem right, all the logic that clears the TXQ_STOP etc.
>>> still needs to be invoked, otherwise your TXQ will get stuck completely
>>> since you'll never run this code again.
>> If the queues are blocked more than the block ack timeout then the
>> traffic should be sent only in the next BA session.
>> I am not really sure whether the queues will be blocked more than the
>> block ack timeout value unless the queues are stopped because of
>> multiple reasons.
>> In any case, traffic will be pushed out in subsequent BA sessions, no?
> I'm not really sure what you're saying, but it sounds almost like you're
> confusing a "BA session" with a single A-MPDU? The session will get
> stuck if you do the code this way, I think.

Okay.

>>> Also, you have the same problem as above - you never re-run this code so
>>>    you'd get stuck completely.
>> I didn't get your point here. By the time the queues gets started again,
>> there could be possibility that the station might have been back to
>> sleep. In this case, it is better not to send the traffic, no?
>> Anyways, station would receive the traffic in the next cycle when it is
>> out of sleep. Considering codel logic, there could be frame drops
>> though; maybe I am missing something?
> But this is still the old code before cycling, so you never get here
> during the TX cycle you're thinking of?

I did not understand your question fully ;-)
But then, as discussed earlier, it is best to mark the queues on which 
driver attempted to dequeue and then release the packets once queues are 
woken up.

Thanks,
Manikanta
diff mbox

Patch

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 595c662..7cfec25 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -213,11 +213,15 @@  static void
 ieee80211_agg_start_txq(struct sta_info *sta, int tid, bool enable)
 {
 	struct ieee80211_txq *txq = sta->sta.txq[tid];
+	struct ieee80211_local *local = sta->local;
 	struct txq_info *txqi;
 
 	if (!txq)
 		return;
 
+	if (local->txqs_stopped)
+		return;
+
 	txqi = to_txq_info(txq);
 
 	if (enable)
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index d1978aa..dd64e8c 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1137,6 +1137,11 @@  struct ieee80211_local {
 	/* also used to protect ampdu_ac_queue and amdpu_ac_stop_refcnt */
 	spinlock_t queue_stop_reason_lock;
 
+	/* pause/resume logic for intermediate software queues,
+	 * applicable when wake_tx_queue is defined.
+	 */
+	unsigned long txqs_stopped;
+
 	int open_count;
 	int monitors, cooked_mntrs;
 	/* number of interfaces with corresponding FIF_ flags */
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 6428f1a..03ea152 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1244,6 +1244,9 @@  void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
 			if (!txq_has_queue(sta->sta.txq[i]))
 				continue;
 
+			if (local->txqs_stopped)
+				continue;
+
 			drv_wake_tx_queue(local, to_txq_info(sta->sta.txq[i]));
 		}
 	}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 8275a58..67156cb 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1559,6 +1559,9 @@  static bool ieee80211_queue_skb(struct ieee80211_local *local,
 	    sdata->vif.type == NL80211_IFTYPE_MONITOR)
 		return false;
 
+	if (local->txqs_stopped)
+		return false;
+
 	if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN)
 		sdata = container_of(sdata->bss,
 				     struct ieee80211_sub_if_data, u.ap);
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 2d82c88..da7ae8b 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -298,6 +298,9 @@  static void __ieee80211_wake_queue(struct ieee80211_hw *hw, int queue,
 	if (local->q_stop_reasons[queue][reason] == 0)
 		__clear_bit(reason, &local->queue_stop_reasons[queue]);
 
+	if (local->ops->wake_tx_queue)
+		__clear_bit(reason, &local->txqs_stopped);
+
 	if (local->queue_stop_reasons[queue] != 0)
 		/* someone still has this queue stopped */
 		return;
@@ -351,8 +354,10 @@  static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
 	if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
 		return;
 
-	if (local->ops->wake_tx_queue)
+	if (local->ops->wake_tx_queue) {
+		__set_bit(reason, &local->txqs_stopped);
 		return;
+	}
 
 	if (local->hw.queues < IEEE80211_NUM_ACS)
 		n_acs = 1;