From patchwork Tue Sep 14 19:25:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Linus_L=C3=BCssing?= X-Patchwork-Id: 12494515 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BA90C433EF for ; Tue, 14 Sep 2021 19:33:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E5A8361168 for ; Tue, 14 Sep 2021 19:33:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233319AbhINTe6 (ORCPT ); Tue, 14 Sep 2021 15:34:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232815AbhINTey (ORCPT ); Tue, 14 Sep 2021 15:34:54 -0400 X-Greylist: delayed 484 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Tue, 14 Sep 2021 12:33:36 PDT Received: from mail.aperture-lab.de (mail.aperture-lab.de [IPv6:2a01:4f8:c2c:665b::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CFFCC061574; Tue, 14 Sep 2021 12:33:36 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 7FA8241008; Tue, 14 Sep 2021 21:25:30 +0200 (CEST) From: =?utf-8?q?Linus_L=C3=BCssing?= To: Kalle Valo , Felix Fietkau , Sujith Manoharan , ath9k-devel@qca.qualcomm.com Cc: linux-wireless@vger.kernel.org, "David S . Miller" , Jakub Kicinski , "John W . Linville" , Felix Fietkau , Simon Wunderlich , Sven Eckelmann , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, =?utf-8?q?Linus_L=C3=BCssing?= Subject: [PATCH 1/3] ath9k: add option to reset the wifi chip via debugfs Date: Tue, 14 Sep 2021 21:25:13 +0200 Message-Id: <20210914192515.9273-2-linus.luessing@c0d3.blue> In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue> References: <20210914192515.9273-1-linus.luessing@c0d3.blue> MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Linus Lüssing Sometimes, in yet unknown cases the wifi chip stops working. To allow a watchdog in userspace to easily and quickly reset the wifi chip, add the according functionality to userspace. A reset can then be triggered via: $ echo 1 > /sys/kernel/debug/ieee80211/phy0/ath9k/reset The number of user resets can further be tracked in the row "User reset" in the same file. So far people usually used "iw scan" to fix ath9k chip hangs from userspace. Which triggers the ath9k_queue_reset(), too. The reset file however has the advantage of less overhead, which makes debugging bugs within ath9k_queue_reset() easier. Signed-off-by: Linus Lüssing --- drivers/net/wireless/ath/ath9k/debug.c | 57 ++++++++++++++++++++++++-- drivers/net/wireless/ath/ath9k/debug.h | 1 + 2 files changed, 54 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c index 4c81b1d7f417..fb7a2952d0ce 100644 --- a/drivers/net/wireless/ath/ath9k/debug.c +++ b/drivers/net/wireless/ath/ath9k/debug.c @@ -749,9 +749,9 @@ static int read_file_misc(struct seq_file *file, void *data) static int read_file_reset(struct seq_file *file, void *data) { - struct ieee80211_hw *hw = dev_get_drvdata(file->private); - struct ath_softc *sc = hw->priv; + struct ath_softc *sc = file->private; static const char * const reset_cause[__RESET_TYPE_MAX] = { + [RESET_TYPE_USER] = "User reset", [RESET_TYPE_BB_HANG] = "Baseband Hang", [RESET_TYPE_BB_WATCHDOG] = "Baseband Watchdog", [RESET_TYPE_FATAL_INT] = "Fatal HW Error", @@ -779,6 +779,55 @@ static int read_file_reset(struct seq_file *file, void *data) return 0; } +static int open_file_reset(struct inode *inode, struct file *f) +{ + return single_open(f, read_file_reset, inode->i_private); +} + +static ssize_t write_file_reset(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct ath_softc *sc = file_inode(file)->i_private; + struct ath_hw *ah = sc->sc_ah; + struct ath_common *common = ath9k_hw_common(ah); + unsigned long val; + char buf[32]; + ssize_t len; + + len = min(count, sizeof(buf) - 1); + if (copy_from_user(buf, user_buf, len)) + return -EFAULT; + + buf[len] = '\0'; + if (kstrtoul(buf, 0, &val)) + return -EINVAL; + + if (val != 1) + return -EINVAL; + + /* avoid rearming hw_reset_work on shutdown */ + mutex_lock(&sc->mutex); + if (test_bit(ATH_OP_INVALID, &common->op_flags)) { + mutex_unlock(&sc->mutex); + return -EBUSY; + } + + ath9k_queue_reset(sc, RESET_TYPE_USER); + mutex_unlock(&sc->mutex); + + return count; +} + +static const struct file_operations fops_reset = { + .read = seq_read, + .write = write_file_reset, + .open = open_file_reset, + .owner = THIS_MODULE, + .llseek = seq_lseek, + .release = single_release, +}; + void ath_debug_stat_tx(struct ath_softc *sc, struct ath_buf *bf, struct ath_tx_status *ts, struct ath_txq *txq, unsigned int flags) @@ -1393,8 +1442,8 @@ int ath9k_init_debug(struct ath_hw *ah) read_file_queues); debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy, read_file_misc); - debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy, - read_file_reset); + debugfs_create_file("reset", 0600, sc->debug.debugfs_phy, + sc, &fops_reset); ath9k_cmn_debug_recv(sc->debug.debugfs_phy, &sc->debug.stats.rxstats); ath9k_cmn_debug_phy_err(sc->debug.debugfs_phy, &sc->debug.stats.rxstats); diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h index 33826aa13687..389459c04d14 100644 --- a/drivers/net/wireless/ath/ath9k/debug.h +++ b/drivers/net/wireless/ath/ath9k/debug.h @@ -39,6 +39,7 @@ struct fft_sample_tlv; #endif enum ath_reset_type { + RESET_TYPE_USER, RESET_TYPE_BB_HANG, RESET_TYPE_BB_WATCHDOG, RESET_TYPE_FATAL_INT, From patchwork Tue Sep 14 19:25:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Linus_L=C3=BCssing?= X-Patchwork-Id: 12494519 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8754C433EF for ; Tue, 14 Sep 2021 19:33:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D32F761175 for ; Tue, 14 Sep 2021 19:33:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233206AbhINTe5 (ORCPT ); Tue, 14 Sep 2021 15:34:57 -0400 Received: from mail.aperture-lab.de ([116.203.183.178]:44902 "EHLO mail.aperture-lab.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230061AbhINTey (ORCPT ); Tue, 14 Sep 2021 15:34:54 -0400 X-Greylist: delayed 482 seconds by postgrey-1.27 at vger.kernel.org; Tue, 14 Sep 2021 15:34:53 EDT Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 173B841012; Tue, 14 Sep 2021 21:25:33 +0200 (CEST) From: =?utf-8?q?Linus_L=C3=BCssing?= To: Kalle Valo , Felix Fietkau , Sujith Manoharan , ath9k-devel@qca.qualcomm.com Cc: linux-wireless@vger.kernel.org, "David S . Miller" , Jakub Kicinski , "John W . Linville" , Felix Fietkau , Simon Wunderlich , Sven Eckelmann , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, =?utf-8?q?Linus_L=C3=BCssing?= , =?utf-8?q?Linus_L?= =?utf-8?q?=C3=BCssing?= Subject: [PATCH 2/3] ath9k: Fix potential interrupt storm on queue reset Date: Tue, 14 Sep 2021 21:25:14 +0200 Message-Id: <20210914192515.9273-3-linus.luessing@c0d3.blue> In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue> References: <20210914192515.9273-1-linus.luessing@c0d3.blue> MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Linus Lüssing In tests with two Lima boards from 8devices (QCA4531 based) on OpenWrt 19.07 we could force a silent restart of a device with no serial output when we were sending a high amount of UDP traffic (iperf3 at 80 MBit/s in both directions from external hosts, saturating the wifi and causing a load of about 4.5 to 6) and were then triggering an ath9k_queue_reset(). Further debugging showed that the restart was caused by the ath79 watchdog. With disabled watchdog we could observe that the device was constantly going into ath_isr() interrupt handler and was returning early after the ATH_OP_HW_RESET flag test, without clearing any interrupts. Even though ath9k_queue_reset() calls ath9k_hw_kill_interrupts(). With JTAG we could observe the following race condition: 1) ath9k_queue_reset() ... -> ath9k_hw_kill_interrupts() -> set_bit(ATH_OP_HW_RESET, &common->op_flags); ... <- returns 2) ath9k_tasklet() ... -> ath9k_hw_resume_interrupts() ... <- returns 3) loops around: ... handle_int() -> ath_isr() ... -> if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) return IRQ_HANDLED; x) ath_reset_internal(): => never reached <= And in ath_isr() we would typically see the following interrupts / interrupt causes: * status: 0x00111030 or 0x00110030 * async_cause: 2 (AR_INTR_MAC_IPQ) * sync_cause: 0 So the ath9k_tasklet() reenables the ath9k interrupts through ath9k_hw_resume_interrupts() which ath9k_queue_reset() had just disabled. And ath_isr() then keeps firing because it returns IRQ_HANDLED without actually clearing the interrupt. To fix this IRQ storm also clear/disable the interrupts again when we are in reset state. Cc: Sven Eckelmann Cc: Simon Wunderlich Cc: Linus Lüssing Fixes: 872b5d814f99 ("ath9k: do not access hardware on IRQs during reset") Signed-off-by: Linus Lüssing --- drivers/net/wireless/ath/ath9k/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index 139831539da3..98090e40e1cf 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -533,8 +533,10 @@ irqreturn_t ath_isr(int irq, void *dev) ath9k_debug_sync_cause(sc, sync_cause); status &= ah->imask; /* discard unasked-for bits */ - if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) + if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) { + ath9k_hw_kill_interrupts(sc->sc_ah); return IRQ_HANDLED; + } /* * If there are no status bits set, then this interrupt was not From patchwork Tue Sep 14 19:25:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Linus_L=C3=BCssing?= X-Patchwork-Id: 12494517 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EA75C4332F for ; Tue, 14 Sep 2021 19:33:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 82A2160E9B for ; Tue, 14 Sep 2021 19:33:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233416AbhINTfA (ORCPT ); Tue, 14 Sep 2021 15:35:00 -0400 Received: from mail.aperture-lab.de ([116.203.183.178]:44930 "EHLO mail.aperture-lab.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232113AbhINTey (ORCPT ); Tue, 14 Sep 2021 15:34:54 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id D4D2641015; Tue, 14 Sep 2021 21:25:34 +0200 (CEST) From: =?utf-8?q?Linus_L=C3=BCssing?= To: Kalle Valo , Felix Fietkau , Sujith Manoharan , ath9k-devel@qca.qualcomm.com Cc: linux-wireless@vger.kernel.org, "David S . Miller" , Jakub Kicinski , "John W . Linville" , Felix Fietkau , Simon Wunderlich , Sven Eckelmann , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, =?utf-8?q?Linus_L=C3=BCssing?= , =?utf-8?q?Linus_L?= =?utf-8?q?=C3=BCssing?= Subject: [PATCH 3/3] ath9k: Fix potential hw interrupt resume during reset Date: Tue, 14 Sep 2021 21:25:15 +0200 Message-Id: <20210914192515.9273-4-linus.luessing@c0d3.blue> In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue> References: <20210914192515.9273-1-linus.luessing@c0d3.blue> MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Linus Lüssing There is a small risk of the ath9k hw interrupts being reenabled in the following way: 1) ath_reset_internal() ... -> disable_irq() ... <- returns 2) ath9k_tasklet() ... -> ath9k_hw_resume_interrupts() ... 1) ath_reset_internal() continued: -> tasklet_disable(&sc->intr_tq); (= ath9k_tasklet() off) By first disabling the ath9k interrupt there is a small window afterwards which allows ath9k hw interrupts being reenabled through the ath9k_tasklet() before we disable this tasklet in ath_reset_internal(). Leading to having the ath9k hw interrupts enabled during the reset, which we should avoid. Fixing this by first disabling all ath9k tasklets. And only after they are not running anymore also disabling the overall ath9k interrupt. Either ath9k_queue_reset()->ath9k_kill_hw_interrupts() or ath_reset_internal()->disable_irq()->ath_isr()->ath9k_kill_hw_interrupts() should then have ensured that no ath9k hw interrupts are running during the actual ath9k reset. We could reproduce this issue with two Lima boards from 8devices (QCA4531) on OpenWrt 19.07 while sending UDP traffic between the two and triggering an ath9k_queue_reset() and with added msleep()s between disable_irq() and tasklet_disable() in ath_reset_internal(). Cc: Sven Eckelmann Cc: Simon Wunderlich Cc: Linus Lüssing Fixes: e3f31175a3ee ("ath9k: fix race condition in irq processing during hardware reset") Signed-off-by: Linus Lüssing --- drivers/net/wireless/ath/ath9k/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index 98090e40e1cf..b9f9a8ae3b56 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -292,9 +292,9 @@ static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan) __ath_cancel_work(sc); - disable_irq(sc->irq); tasklet_disable(&sc->intr_tq); tasklet_disable(&sc->bcon_tasklet); + disable_irq(sc->irq); spin_lock_bh(&sc->sc_pcu_lock); if (!sc->cur_chan->offchannel) {