diff mbox series

[RFC] ath11k: Don't drop tx_status when peer cannot be found

Message ID 20230801-ath11k-ack_status_leak-v1-1-539cb72c55bc@narfation.org (mailing list archive)
State RFC
Delegated to: Kalle Valo
Headers show
Series [RFC] ath11k: Don't drop tx_status when peer cannot be found | expand

Commit Message

Sven Eckelmann Aug. 1, 2023, 5:38 p.m. UTC
When a station idles for a long time, hostapd will try to send a QoS Null
frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
purpose. And the skb will be added to ack_status_frame - waiting for a
tx_complete via ieee80211_tx_status*();

But when the peer was already removed before the tx_complete arrives, the
peer will be missing and thus the entry will not be removed from
ack_status_frame. This IDR will therefore run full after 8K clients which
disappeared this way - the access point will then just stall and not allow
any new clients because idr_alloc for ack_status_frame will fail.

Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1

Fixes: 6257c702264c ("wifi: ath11k: fix tx status reporting in encap offload mode")
Fixes: 94739d45c388 ("ath11k: switch to using ieee80211_tx_status_ext()")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
This problem can be seen with QCA's ath11k fork as:

  attach ack fail -28

when new clients try to connect - and connection attempt will obviously
fail. Most likely with an "deauthenticated due to inactivity (timer
DEAUTH/REMOVE)" by hostapd.

And the fix (required for both platches) would then be something like:

  --- a/drivers/net/wireless/ath/ath11k/dp_tx.c
  +++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
  @@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
   			   "dp_tx: failed to find the peer with peer_id %d\n",
   			    ts->peer_id);
   		spin_unlock_bh(&ab->base_lock);
  -		dev_kfree_skb_any(msdu);
  -		goto exit;
  +		rcu_read_unlock();
  +
  +		if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
  +			ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
  +		else
  +			ieee80211_tx_status(ar->hw, msdu);
  +
  +		return;
   	}
   	arsta = (struct ath11k_sta *)peer->sta->drv_priv;
   	status.sta = peer->sta;

But this is not possible any longer because Felix Fietkau removed
ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
(required for this task) is currently not exported. And the sta information
is required to reach the ieee80211_sta_tx_notify code section in
ieee80211_tx_status_ext()
---
 drivers/net/wireless/ath/ath11k/dp_tx.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)


---
base-commit: 1d7dd5aa35474e553b8671b58579e0749b560779
change-id: 20230801-ath11k-ack_status_leak-2338eda5a721

Best regards,

Comments

Felix Fietkau Aug. 1, 2023, 6:11 p.m. UTC | #1
On 01.08.23 19:38, Sven Eckelmann wrote:
> When a station idles for a long time, hostapd will try to send a QoS Null
> frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
> purpose. And the skb will be added to ack_status_frame - waiting for a
> tx_complete via ieee80211_tx_status*();
> 
> But when the peer was already removed before the tx_complete arrives, the
> peer will be missing and thus the entry will not be removed from
> ack_status_frame. This IDR will therefore run full after 8K clients which
> disappeared this way - the access point will then just stall and not allow
> any new clients because idr_alloc for ack_status_frame will fail.
> 
> Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1
> 
> Fixes: 6257c702264c ("wifi: ath11k: fix tx status reporting in encap offload mode")
> Fixes: 94739d45c388 ("ath11k: switch to using ieee80211_tx_status_ext()")
> Signed-off-by: Sven Eckelmann <sven@narfation.org>
> ---
> This problem can be seen with QCA's ath11k fork as:
> 
>    attach ack fail -28
> 
> when new clients try to connect - and connection attempt will obviously
> fail. Most likely with an "deauthenticated due to inactivity (timer
> DEAUTH/REMOVE)" by hostapd.
> 
> And the fix (required for both platches) would then be something like:
> 
>    --- a/drivers/net/wireless/ath/ath11k/dp_tx.c
>    +++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
>    @@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
>     			   "dp_tx: failed to find the peer with peer_id %d\n",
>     			    ts->peer_id);
>     		spin_unlock_bh(&ab->base_lock);
>    -		dev_kfree_skb_any(msdu);
>    -		goto exit;
>    +		rcu_read_unlock();
>    +
>    +		if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
>    +			ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
>    +		else
>    +			ieee80211_tx_status(ar->hw, msdu);
>    +
>    +		return;
>     	}
>     	arsta = (struct ath11k_sta *)peer->sta->drv_priv;
>     	status.sta = peer->sta;
> 
> But this is not possible any longer because Felix Fietkau removed
> ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
> ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
> (required for this task) is currently not exported. And the sta information
> is required to reach the ieee80211_sta_tx_notify code section in
> ieee80211_tx_status_ext()

This does not make much sense to me. ieee80211_sta_tx_notify is specific 
to interfaces running in client mode, thus unrelated to anything hostapd 
is doing. It's a different kind of probing than the one you're looking into.

If the status information is irrelevant to mac80211/hostapd, then there 
really is no need to call ieee80211_tx_status* here.

The main bug is the fact that dev_kfree_skb* must not be called for tx 
packets passed from mac80211. If you replace it with a call to 
ieee80211_free_txskb, the bug goes away.

One more note regarding ieee80211_tx_status_8023 - I removed it not only 
because it was unused, but because it should never be used at all. Its 
call to ieee80211_lookup_ra_sta is guaranteed to be broken whenever 
4-address mode AP_VLAN is being used (since the driver cannot pass the 
correct vif).

- Felix
Sven Eckelmann Aug. 1, 2023, 9:41 p.m. UTC | #2
On Tuesday, 1 August 2023 20:11:51 CEST Felix Fietkau wrote:
[...]
> > when new clients try to connect - and connection attempt will obviously
> > fail. Most likely with an "deauthenticated due to inactivity (timer
> > DEAUTH/REMOVE)" by hostapd.
> > 
> > And the fix (required for both platches) would then be something like:
> > 
> >    --- a/drivers/net/wireless/ath/ath11k/dp_tx.c
> >    +++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
> >    @@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
> >     			   "dp_tx: failed to find the peer with peer_id %d\n",
> >     			    ts->peer_id);
> >     		spin_unlock_bh(&ab->base_lock);
> >    -		dev_kfree_skb_any(msdu);
> >    -		goto exit;
> >    +		rcu_read_unlock();
> >    +
> >    +		if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
> >    +			ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
> >    +		else
> >    +			ieee80211_tx_status(ar->hw, msdu);
> >    +
> >    +		return;
> >     	}
> >     	arsta = (struct ath11k_sta *)peer->sta->drv_priv;
> >     	status.sta = peer->sta;
> > 
> > But this is not possible any longer because Felix Fietkau removed
> > ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
> > ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
> > (required for this task) is currently not exported. And the sta information
> > is required to reach the ieee80211_sta_tx_notify code section in
> > ieee80211_tx_status_ext()
> 
> This does not make much sense to me. ieee80211_sta_tx_notify is specific 
> to interfaces running in client mode, thus unrelated to anything hostapd 
> is doing. It's a different kind of probing than the one you're looking into.

Sorry, copied something to my notes and then mixed up basically everything 
after that. Interesting for the fix was only that it reaches 
ieee80211_report_ack_skb via ieee80211_report_used_skb. This can either be 
done via __ieee80211_tx_status/ieee80211_tx_status_ext (which doesn't have any 
dependency to the sta - which I incorrectly said earlier) or via 
ieee80211_free_txskb (which I missed earlier)

[...]
> The main bug is the fact that dev_kfree_skb* must not be called for tx 
> packets passed from mac80211. If you replace it with a call to 
> ieee80211_free_txskb, the bug goes away.

Thanks for the hint. Will submit an actual patch with your recommended
replacement.

Kind regards,
	Sven
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath11k/dp_tx.c b/drivers/net/wireless/ath/ath11k/dp_tx.c
index a34833de7c67..c25283a11d27 100644
--- a/drivers/net/wireless/ath/ath11k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
@@ -369,7 +369,14 @@  ath11k_dp_tx_htt_tx_complete_buf(struct ath11k_base *ab,
 			   "dp_tx: failed to find the peer with peer_id %d\n",
 			    ts->peer_id);
 		spin_unlock_bh(&ab->base_lock);
-		dev_kfree_skb_any(msdu);
+
+		/* report back to clean up resources - even when the peer
+		 * is not known by ath11k.
+		 *
+		 * TODO handle also for ATH11K_SKB_HW_80211_ENCAP
+		 */
+		if (!(skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP))
+			ieee80211_tx_status(ar->hw, msdu);
 		return;
 	}
 	spin_unlock_bh(&ab->base_lock);
@@ -624,7 +631,14 @@  static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
 			   "dp_tx: failed to find the peer with peer_id %d\n",
 			    ts->peer_id);
 		spin_unlock_bh(&ab->base_lock);
-		dev_kfree_skb_any(msdu);
+
+		/* report back to clean up resources - even when the peer
+		 * is not known by ath11k.
+		 *
+		 * TODO handle also for ATH11K_SKB_HW_80211_ENCAP
+		 */
+		if (!(skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP))
+			ieee80211_tx_status(ar->hw, msdu);
 		return;
 	}
 	arsta = (struct ath11k_sta *)peer->sta->drv_priv;