mbox series

[0/4] wifi: ath11k: fix connection failure due to unexpected peer delete

Message ID 20240123025700.2929-1-quic_bqiang@quicinc.com (mailing list archive)
Headers show
Series wifi: ath11k: fix connection failure due to unexpected peer delete | expand

Message

Baochen Qiang Jan. 23, 2024, 2:56 a.m. UTC
Currently ath11k_mac_op_unassign_vif_chanctx() deletes peer but
ath11k_mac_op_assign_vif_chanctx() doesn't create it. This results in
connection failure if MAC80211 calls drv_unassign_vif_chanctx() and
drv_assign_vif_chanctx() during AUTH and ASSOC, see below log:

[  102.372431] wlan0: authenticated
[  102.372585] ath11k_pci 0000:01:00.0: wlan0: disabling HT/VHT/HE as WMM/QoS is not supported by the AP
[  102.372593] ath11k_pci 0000:01:00.0: mac chanctx unassign ptr ffff895084638598 vdev_id 0
[  102.372808] ath11k_pci 0000:01:00.0: WMI vdev stop id 0x0
[  102.383114] ath11k_pci 0000:01:00.0: vdev stopped for vdev id 0
[  102.384689] ath11k_pci 0000:01:00.0: WMI peer delete vdev_id 0 peer_addr 20:e5:2a:21:c4:51
[  102.396676] ath11k_pci 0000:01:00.0: htt peer unmap vdev 0 peer 20:e5:2a:21:c4:51 id 3
[  102.396711] ath11k_pci 0000:01:00.0: peer delete resp for vdev id 0 addr 20:e5:2a:21:c4:51
[  102.396722] ath11k_pci 0000:01:00.0: mac removed peer 20:e5:2a:21:c4:51  vdev 0 after vdev stop
[  102.396780] ath11k_pci 0000:01:00.0: mac chanctx assign ptr ffff895084639c18 vdev_id 0
[  102.400628] wlan0: associate with 20:e5:2a:21:c4:51 (try 1/3)
[  102.508864] wlan0: associate with 20:e5:2a:21:c4:51 (try 2/3)
[  102.612815] wlan0: associate with 20:e5:2a:21:c4:51 (try 3/3)
[  102.720846] wlan0: association with 20:e5:2a:21:c4:51 timed out

The peer delete logic in ath11k_mac_op_unassign_vif_chanctx() is
introduced by commit b4a0f54156ac ("ath11k: move peer delete after
vdev stop of station for QCA6390 and WCN6855") to fix firmware
crash issue caused by unexpected vdev stop/peer delete sequence.

Actually for a STA interface peer should be deleted in
ath11k_mac_op_sta_state() when STA's state changes from
IEEE80211_STA_NONE to IEEE80211_STA_NOTEXIST, which also coincides
with current peer creation design that peer is created during
IEEE80211_STA_NOTEXIST -> IEEE80211_STA_NONE transition. So move
peer delete back to ath11k_mac_op_sta_state(), also stop vdev before
deleting peer to fix the firmware crash issue mentioned there. In
this way the connection failure mentioned here is also fixed.

Also do some cleanups in patch "wifi: ath11k: remove invalid peer
create logic", and refactor in patches "wifi: ath11k: rename
ath11k_start_vdev_delay()" and "wifi: ath11k: avoid forward declaration
of ath11k_mac_start_vdev_delay()".

Tested this patch set using QCA6390 and WCN6855 on both STA and SAP
interfaces. Basic connection and ping work well.

Baochen Qiang (4):
  wifi: ath11k: remove invalid peer create logic
  wifi: ath11k: rename ath11k_start_vdev_delay()
  wifi: ath11k: avoid forward declaration of
    ath11k_mac_start_vdev_delay()
  wifi: ath11k: fix connection failure due to unexpected peer delete

 drivers/net/wireless/ath/ath11k/mac.c | 564 +++++++++++++-------------
 1 file changed, 288 insertions(+), 276 deletions(-)


base-commit: 8ff464a183f92836d7fd99edceef50a89d8ea797

Comments

James Prestwood Jan. 29, 2024, 6:54 p.m. UTC | #1
Hi Baochen,

On 1/22/24 6:56 PM, Baochen Qiang wrote:
> Currently ath11k_mac_op_unassign_vif_chanctx() deletes peer but
> ath11k_mac_op_assign_vif_chanctx() doesn't create it. This results in
> connection failure if MAC80211 calls drv_unassign_vif_chanctx() and
> drv_assign_vif_chanctx() during AUTH and ASSOC, see below log:
>
> [  102.372431] wlan0: authenticated
> [  102.372585] ath11k_pci 0000:01:00.0: wlan0: disabling HT/VHT/HE as WMM/QoS is not supported by the AP
> [  102.372593] ath11k_pci 0000:01:00.0: mac chanctx unassign ptr ffff895084638598 vdev_id 0
> [  102.372808] ath11k_pci 0000:01:00.0: WMI vdev stop id 0x0
> [  102.383114] ath11k_pci 0000:01:00.0: vdev stopped for vdev id 0
> [  102.384689] ath11k_pci 0000:01:00.0: WMI peer delete vdev_id 0 peer_addr 20:e5:2a:21:c4:51
> [  102.396676] ath11k_pci 0000:01:00.0: htt peer unmap vdev 0 peer 20:e5:2a:21:c4:51 id 3
> [  102.396711] ath11k_pci 0000:01:00.0: peer delete resp for vdev id 0 addr 20:e5:2a:21:c4:51
> [  102.396722] ath11k_pci 0000:01:00.0: mac removed peer 20:e5:2a:21:c4:51  vdev 0 after vdev stop
> [  102.396780] ath11k_pci 0000:01:00.0: mac chanctx assign ptr ffff895084639c18 vdev_id 0
> [  102.400628] wlan0: associate with 20:e5:2a:21:c4:51 (try 1/3)
> [  102.508864] wlan0: associate with 20:e5:2a:21:c4:51 (try 2/3)
> [  102.612815] wlan0: associate with 20:e5:2a:21:c4:51 (try 3/3)
> [  102.720846] wlan0: association with 20:e5:2a:21:c4:51 timed out
>
> The peer delete logic in ath11k_mac_op_unassign_vif_chanctx() is
> introduced by commit b4a0f54156ac ("ath11k: move peer delete after
> vdev stop of station for QCA6390 and WCN6855") to fix firmware
> crash issue caused by unexpected vdev stop/peer delete sequence.
>
> Actually for a STA interface peer should be deleted in
> ath11k_mac_op_sta_state() when STA's state changes from
> IEEE80211_STA_NONE to IEEE80211_STA_NOTEXIST, which also coincides
> with current peer creation design that peer is created during
> IEEE80211_STA_NOTEXIST -> IEEE80211_STA_NONE transition. So move
> peer delete back to ath11k_mac_op_sta_state(), also stop vdev before
> deleting peer to fix the firmware crash issue mentioned there. In
> this way the connection failure mentioned here is also fixed.
>
> Also do some cleanups in patch "wifi: ath11k: remove invalid peer
> create logic", and refactor in patches "wifi: ath11k: rename
> ath11k_start_vdev_delay()" and "wifi: ath11k: avoid forward declaration
> of ath11k_mac_start_vdev_delay()".
>
> Tested this patch set using QCA6390 and WCN6855 on both STA and SAP
> interfaces. Basic connection and ping work well.

I wanted to let you know I'm seeing similar behavior in my own testing, 
with these patches applied. Granted I've hacked things up quite a bit 
but it appears to happen after a firmware crash. It may be related to my 
own changes of course, but I've connected and created/started a monitor 
vdev, then wait. At some point the firmware crashes likely due to my 
botched patches, but once it starts up again I see this same timeout 
behavior before a retry which connects successfully.

> Baochen Qiang (4):
>    wifi: ath11k: remove invalid peer create logic
>    wifi: ath11k: rename ath11k_start_vdev_delay()
>    wifi: ath11k: avoid forward declaration of
>      ath11k_mac_start_vdev_delay()
>    wifi: ath11k: fix connection failure due to unexpected peer delete
>
>   drivers/net/wireless/ath/ath11k/mac.c | 564 +++++++++++++-------------
>   1 file changed, 288 insertions(+), 276 deletions(-)
>
>
> base-commit: 8ff464a183f92836d7fd99edceef50a89d8ea797