Message ID | 20191225120002.11163-2-wgong@codeaurora.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Series | start recovery process when payload length overflow for sdio |
This does not only affect SDIO.

Why a semaphore / count? Could the conf_mutex be held earlier, or
perhaps change the state to ATH10K_STATE_RESTARTING first?

ath10k_reconfig_complete is also called in mac.c when channel is
changed so

On Wed, Dec 25, 2019 at 4:01 AM Wen Gong <wgong@codeaurora.org> wrote:
>
> When more than one restart_work is queued at the same time, the 2nd
> restart_work can easily break the 1st restart work and make
> recovery fail.
>
> Add a ref count so that only one restart work runs until the
> device has successfully recovered.
>
> This patch only affects SDIO chips.
>
> Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029.
>
> Signed-off-by: Wen Gong <wgong@codeaurora.org>
> ---
>  drivers/net/wireless/ath/ath10k/core.c | 8 ++++++++
>  drivers/net/wireless/ath/ath10k/core.h | 2 ++
>  drivers/net/wireless/ath/ath10k/mac.c  | 1 +
>  3 files changed, 11 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 91f131b87efc..4e0e8c86bdd4 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -2199,6 +2199,14 @@ static void ath10k_core_restart(struct work_struct *work)
>  {
>  	struct ath10k *ar = container_of(work, struct ath10k, restart_work);
>  	int ret;
> +	int restart_count;
> +
> +	restart_count = atomic_inc_and_test(&ar->restart_count);
> +	if (restart_count > 1) {
> +		ath10k_warn(ar, "can not restart, count: %d\n", restart_count);
> +		atomic_dec(&ar->restart_count);
> +		return;
> +	}
>
>  	set_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags);
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
> index e57b2e7235e3..810c99f2dc0e 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -982,6 +982,8 @@ struct ath10k {
>  	/* protected by conf_mutex */
>  	u8 ps_state_enable;
>
> +	atomic_t restart_count;
> +
>  	bool nlo_enabled;
>  	bool p2p;
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 3856edba7915..bc1574145e66 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -7208,6 +7208,7 @@ static void ath10k_reconfig_complete(struct ieee80211_hw *hw,
>  		ath10k_info(ar, "device successfully recovered\n");
>  		ar->state = ATH10K_STATE_ON;
>  		ieee80211_wake_queues(ar->hw);
> +		atomic_dec(&ar->restart_count);
>  	}
>
>  	mutex_unlock(&ar->conf_mutex);
> --
> 2.23.0
On 2019-12-25 23:14, Justin Capella wrote:
> This does not only affect SDIO.
>
> Why a semaphore / count? Could the conf_mutex be held earlier, or
> perhaps change the state to ATH10K_STATE_RESTARTING first?
> ath10k_reconfig_complete is also called in mac.c when channel is
> changed so

patch v2:
https://patchwork.kernel.org/patch/11313853/
https://patchwork.kernel.org/patch/11313859/
Instead of the atomic restart count, can the state be updated to
ATH10K_STATE_RESTARTING while holding
mutex_unlock(&ar->conf_mutex);

I don't understand the bundles, but I wonder about the case when there
are multiple packets (n_rx_pkts) and if pkt_bundle_len might be the
one to check. Also if there needs to be a check that the len > sizeof
HTC HDR.

On Tue, Dec 31, 2019 at 1:37 AM <wgong@codeaurora.org> wrote:
>
> On 2019-12-25 23:14, Justin Capella wrote:
> > This does not only affect SDIO.
> >
> > Why a semaphore / count? Could the conf_mutex be held earlier, or
> > perhaps change the state to ATH10K_STATE_RESTARTING first?
> > ath10k_reconfig_complete is also called in mac.c when channel is
> > changed so
> patch v2:
> https://patchwork.kernel.org/patch/11313853/
> https://patchwork.kernel.org/patch/11313859/
On 2020-01-01 19:10, Justin Capella wrote:
> Instead of the atomic restart count, can the state be updated to
> ATH10K_STATE_RESTARTING while holding
> mutex_unlock(&ar->conf_mutex);
>

The recovery process begins with ath10k_core_restart and ends with
ath10k_reconfig_complete. ath10k_core_restart already takes
mutex_lock(&ar->conf_mutex) and releases it with
mutex_unlock(&ar->conf_mutex), but that is not enough. For example:
the 1st recovery has finished ath10k_core_restart but has not yet
reached ath10k_reconfig_complete when the 2nd recovery enters
ath10k_core_restart. The 2nd recovery then destroys the 1st one and
makes it fail.

With this patch applied, after about 18000+ recoveries the device
can still connect/scan/ping successfully.

> I don't understand the bundles, but I wonder about the case when there
> are multiple packets (n_rx_pkts) and if pkt_bundle_len might be the
> one to check. Also if there needs to be a check that the len > sizeof
> HTC HDR.
>

htc_hdr->len is the length of the payload, so it is allowed to be
smaller than sizeof HTC HDR, but it must not exceed
ATH10K_HTC_MBOX_MAX_PAYLOAD_LENGTH.

pkt_bundle is only used when there are many packets on the rx side;
otherwise rx is not bundled.

patch v3:
https://patchwork.kernel.org/patch/11313853/
https://patchwork.kernel.org/patch/11313859/

> On Tue, Dec 31, 2019 at 1:37 AM <wgong@codeaurora.org> wrote:
>>
>> On 2019-12-25 23:14, Justin Capella wrote:
>> > This does not only affect SDIO.
>> >
>> > Why a semaphore / count? Could the conf_mutex be held earlier, or
>> > perhaps change the state to ATH10K_STATE_RESTARTING first?
>> > ath10k_reconfig_complete is also called in mac.c when channel is
>> > changed so
>> patch v2:
>> https://patchwork.kernel.org/patch/11313853/
>> https://patchwork.kernel.org/patch/11313859/
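The payload-length rule stated in the reply above (a length below the header size is acceptable, a length above the maximum is not) can be sketched in plain C. This is only an illustration, not the driver's code: `sketch_htc_hdr`, `sketch_payload_len_ok` and `SKETCH_MAX_PAYLOAD_LENGTH` are hypothetical names, the struct models only the `len` field, and the value 2048 is a placeholder rather than the driver's actual ATH10K_HTC_MBOX_MAX_PAYLOAD_LENGTH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder limit for this sketch only; the driver defines its own
 * ATH10K_HTC_MBOX_MAX_PAYLOAD_LENGTH. */
#define SKETCH_MAX_PAYLOAD_LENGTH 2048u

/* Hypothetical, simplified stand-in for the HTC header: only the field
 * discussed in the thread is modelled. */
struct sketch_htc_hdr {
	uint16_t len; /* payload length, header bytes not included */
};

/* Returns true when the payload length is acceptable. A value smaller
 * than the header size passes because len counts payload bytes only;
 * only a value above the maximum is rejected. */
bool sketch_payload_len_ok(const struct sketch_htc_hdr *hdr)
{
	return hdr->len <= SKETCH_MAX_PAYLOAD_LENGTH;
}
```

Going by the series title, a caller that gets `false` here would presumably start the recovery process instead of processing the packet.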
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 91f131b87efc..4e0e8c86bdd4 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -2199,6 +2199,14 @@ static void ath10k_core_restart(struct work_struct *work)
 {
 	struct ath10k *ar = container_of(work, struct ath10k, restart_work);
 	int ret;
+	int restart_count;
+
+	restart_count = atomic_inc_and_test(&ar->restart_count);
+	if (restart_count > 1) {
+		ath10k_warn(ar, "can not restart, count: %d\n", restart_count);
+		atomic_dec(&ar->restart_count);
+		return;
+	}
 
 	set_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags);
 
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index e57b2e7235e3..810c99f2dc0e 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -982,6 +982,8 @@ struct ath10k {
 	/* protected by conf_mutex */
 	u8 ps_state_enable;
 
+	atomic_t restart_count;
+
 	bool nlo_enabled;
 	bool p2p;
 
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 3856edba7915..bc1574145e66 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -7208,6 +7208,7 @@ static void ath10k_reconfig_complete(struct ieee80211_hw *hw,
 		ath10k_info(ar, "device successfully recovered\n");
 		ar->state = ATH10K_STATE_ON;
 		ieee80211_wake_queues(ar->hw);
+		atomic_dec(&ar->restart_count);
 	}
 
 	mutex_unlock(&ar->conf_mutex);
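One detail in the core.c hunk is worth a second look. The kernel's atomic_inc_and_test() increments the counter and returns true only when the new value is zero, so its result is 0 or 1 and the `restart_count > 1` test would not fire as posted. atomic_inc_return() is the variant that returns the new value. The user-space model below uses C11 `<stdatomic.h>` to show the difference. `model_inc_and_test` and `model_inc_return` are hypothetical helpers written for this sketch, not kernel functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models the kernel's atomic_inc_and_test(): increment, then report
 * whether the new value is zero. */
bool model_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Models the kernel's atomic_inc_return(): increment, then return the
 * new value. */
int model_inc_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}
```

Starting from zero, two calls to `model_inc_and_test` both yield false, which an `int` would store as 0, while two calls to `model_inc_return` yield 1 and then 2. Only the latter gives the guard a count it can compare against 1.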
When more than one restart_work is queued at the same time, the 2nd
restart_work can easily break the 1st restart work and make
recovery fail.

Add a ref count so that only one restart work runs until the
device has successfully recovered.

This patch only affects SDIO chips.

Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/core.c | 8 ++++++++
 drivers/net/wireless/ath/ath10k/core.h | 2 ++
 drivers/net/wireless/ath/ath10k/mac.c  | 1 +
 3 files changed, 11 insertions(+)
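The intended lifecycle of the ref count can be modelled outside the kernel. The sketch below is a user-space approximation with C11 atomics, not the driver code: `sketch_dev`, `sketch_restart_begin` and `sketch_recovery_complete` are hypothetical names standing in for struct ath10k, the entry check in ath10k_core_restart, and the decrement in ath10k_reconfig_complete. It compares the new counter value after the increment, which is an assumption about what the check is meant to do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the restart_count member of struct ath10k. */
struct sketch_dev {
	atomic_int restart_count;
};

/* Called at the top of the restart work. Returns true only for the
 * first caller; later callers undo their increment and back off. */
bool sketch_restart_begin(struct sketch_dev *dev)
{
	int count = atomic_fetch_add(&dev->restart_count, 1) + 1;

	if (count > 1) {
		atomic_fetch_sub(&dev->restart_count, 1);
		return false;
	}
	return true;
}

/* Called once the device is reported as recovered; reopens the gate
 * so that a later crash can start a new recovery. */
void sketch_recovery_complete(struct sketch_dev *dev)
{
	atomic_fetch_sub(&dev->restart_count, 1);
}
```

One property of this scheme, at least in the sketch: the gate reopens only in `sketch_recovery_complete`, so a recovery that never reaches that point leaves the counter at 1 and every later restart attempt is refused.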