Message ID | 20231117003919.26218-1-quic_bqiang@quicinc.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5082b3e3027eae393a4e86874bffb4ce3f83c26e |
Delegated to: | Kalle Valo |
Headers | show |
Series | wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early | expand |
On 11/16/2023 4:39 PM, Baochen Qiang wrote: > We are seeing below error randomly in the case where only > one MSI vector is configured: > > kernel: ath11k_pci 0000:03:00.0: wmi command 16387 timeout > > The reason is, currently, in ath11k_pcic_ext_irq_enable(), > ATH11K_FLAG_EXT_IRQ_ENABLED is set before NAPI is enabled. > This results in a race condition: after > ATH11K_FLAG_EXT_IRQ_ENABLED is set but before NAPI enabled, > CE interrupt breaks in. Since IRQ is shared by CE and data > path, ath11k_pcic_ext_interrupt_handler() is also called > where we call disable_irq_nosync() to disable IRQ. Then > napi_schedule() is called but it does nothing because NAPI > is not enabled at that time, meaning > ath11k_pcic_ext_grp_napi_poll() will never run, so we have > no chance to call enable_irq() to enable IRQ back. Finally > we get above error. > > Fix it by setting ATH11K_FLAG_EXT_IRQ_ENABLED after all > NAPI and IRQ work are done. With the fix, we are sure that > by the time ATH11K_FLAG_EXT_IRQ_ENABLED is set, NAPI is > enabled. > > Note that the fix above also introduce some side effects: > if ath11k_pcic_ext_interrupt_handler() breaks in after NAPI > enabled but before ATH11K_FLAG_EXT_IRQ_ENABLED set, nothing > will be done by the handler this time, the work will be > postponed till the next time the IRQ fires. > > Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Baochen Qiang <quic_bqiang@quicinc.com> wrote: > We are seeing below error randomly in the case where only > one MSI vector is configured: > > kernel: ath11k_pci 0000:03:00.0: wmi command 16387 timeout > > The reason is, currently, in ath11k_pcic_ext_irq_enable(), > ATH11K_FLAG_EXT_IRQ_ENABLED is set before NAPI is enabled. > This results in a race condition: after > ATH11K_FLAG_EXT_IRQ_ENABLED is set but before NAPI enabled, > CE interrupt breaks in. Since IRQ is shared by CE and data > path, ath11k_pcic_ext_interrupt_handler() is also called > where we call disable_irq_nosync() to disable IRQ. Then > napi_schedule() is called but it does nothing because NAPI > is not enabled at that time, meaning > ath11k_pcic_ext_grp_napi_poll() will never run, so we have > no chance to call enable_irq() to enable IRQ back. Finally > we get above error. > > Fix it by setting ATH11K_FLAG_EXT_IRQ_ENABLED after all > NAPI and IRQ work are done. With the fix, we are sure that > by the time ATH11K_FLAG_EXT_IRQ_ENABLED is set, NAPI is > enabled. > > Note that the fix above also introduce some side effects: > if ath11k_pcic_ext_interrupt_handler() breaks in after NAPI > enabled but before ATH11K_FLAG_EXT_IRQ_ENABLED set, nothing > will be done by the handler this time, the work will be > postponed till the next time the IRQ fires. > > Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Patch applied to ath-next branch of ath.git, thanks. 5082b3e3027e wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early
diff --git a/drivers/net/wireless/ath/ath11k/pcic.c b/drivers/net/wireless/ath/ath11k/pcic.c index 16d1e332193f..e602d4130105 100644 --- a/drivers/net/wireless/ath/ath11k/pcic.c +++ b/drivers/net/wireless/ath/ath11k/pcic.c @@ -460,8 +460,6 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab) { int i; - set_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags); - for (i = 0; i < ATH11K_EXT_IRQ_GRP_NUM_MAX; i++) { struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i]; @@ -471,6 +469,8 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab) } ath11k_pcic_ext_grp_enable(irq_grp); } + + set_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags); } EXPORT_SYMBOL(ath11k_pcic_ext_irq_enable);
We are seeing below error randomly in the case where only one MSI vector is configured: kernel: ath11k_pci 0000:03:00.0: wmi command 16387 timeout The reason is, currently, in ath11k_pcic_ext_irq_enable(), ATH11K_FLAG_EXT_IRQ_ENABLED is set before NAPI is enabled. This results in a race condition: after ATH11K_FLAG_EXT_IRQ_ENABLED is set but before NAPI enabled, CE interrupt breaks in. Since IRQ is shared by CE and data path, ath11k_pcic_ext_interrupt_handler() is also called where we call disable_irq_nosync() to disable IRQ. Then napi_schedule() is called but it does nothing because NAPI is not enabled at that time, meaning ath11k_pcic_ext_grp_napi_poll() will never run, so we have no chance to call enable_irq() to enable IRQ back. Finally we get above error. Fix it by setting ATH11K_FLAG_EXT_IRQ_ENABLED after all NAPI and IRQ work are done. With the fix, we are sure that by the time ATH11K_FLAG_EXT_IRQ_ENABLED is set, NAPI is enabled. Note that the fix above also introduce some side effects: if ath11k_pcic_ext_interrupt_handler() breaks in after NAPI enabled but before ATH11K_FLAG_EXT_IRQ_ENABLED set, nothing will be done by the handler this time, the work will be postponed till the next time the IRQ fires. Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> --- drivers/net/wireless/ath/ath11k/pcic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) base-commit: 9a36440d929d134c56030a8492405708a143f580