Message ID | 1542163824-795-1-git-send-email-wgong@codeaurora.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Series | ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash |
On Wed, 14 Nov 2018 at 03:51, Wen Gong <wgong@codeaurora.org> wrote:
>
> When test simulate firmware crash, it is easy to trigger error.
> command:
> echo soft > /sys/kernel/debug/ieee80211/phyxx/ath10k/simulate_fw_crash.
>
> If input more than two times continuously, then it will have error.
> Error message:
> ath10k_pci 0000:02:00.0: failed to set vdev 1 RX wake policy: -108
> ath10k_pci 0000:02:00.0: device is wedged, will not restart
>
> It is because the state has not changed to ATH10K_STATE_ON immediately,
> then it will have more than two simulate crash process running meanwhile,
> and complete/wakeup some field twice, it destroy the normal recovery
> process.

This was intended to allow testing not only firmware crash path (and
recovery) but also firmware crash while recovering from a firmware crash.

Michał
> > It is because the state has not changed to ATH10K_STATE_ON
> > immediately, then it will have more than two simulate crash process
> > running meanwhile, and complete/wakeup some field twice, it destroy
> > the normal recovery process.
>
> This was intended to allow testing not only firmware crash path (and
> recovery) but also firmware crash while recovering from a firmware crash.
>
If the firmware is recovering from a crash, then simulating a new crash will trigger an error.
So this patch removes it.
>
> Michał
>
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
On Mon, 7 Jan 2019 at 08:16, Wen Gong <wgong@qti.qualcomm.com> wrote:
>
> > > It is because the state has not changed to ATH10K_STATE_ON
> > > immediately, then it will have more than two simulate crash process
> > > running meanwhile, and complete/wakeup some field twice, it destroy
> > > the normal recovery process.
> >
> > This was intended to allow testing not only firmware crash path (and
> > recovery) but also firmware crash while recovering from a firmware crash.
> >
> If firmware is recovering from crash, then simulate a new crash will trigger error.
> So remove it.

That's actually a feature, not a bug. If firmware crashes while the driver
is restarting after a crash then it's likely going to fail again and
again, causing a crash-restart loop which can affect system performance
and responsiveness. It's better to give up and let the system admin
take over.

If it's still bothering you then please consider a crash counter
threshold so that, e.g. after 5 crash-while-restarting it's going to
give up. However I doubt it's worth the effort. My experience tells me
firmware crashes during recovery are rarely, if at all, transient.

The simulated fw crash is not representative here. It's a mere tool to
test driver code.

Michał
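The crash counter threshold suggested above could look roughly like the sketch below. This is a standalone model, not ath10k code: the `fw_recovery` struct, the `FW_*` states, and both function names are hypothetical and only illustrate the idea of giving up after a fixed number of crashes that arrive while a restart is already in progress.

```c
#include <stdbool.h>

/* Hypothetical recovery model; these names do not exist in ath10k. */
enum fw_state { FW_ON, FW_RESTARTING, FW_WEDGED };

struct fw_recovery {
	enum fw_state state;
	unsigned int nested_crashes; /* crashes seen while already restarting */
	unsigned int max_nested;     /* give-up threshold, e.g. 5 */
};

/* Report a crash. Returns true if a restart should be attempted. */
static bool fw_recovery_on_crash(struct fw_recovery *r)
{
	switch (r->state) {
	case FW_ON:
		/* First crash of a sequence: start recovering. */
		r->nested_crashes = 0;
		r->state = FW_RESTARTING;
		return true;
	case FW_RESTARTING:
		/* Crash-while-restarting: count it and give up at the threshold. */
		if (++r->nested_crashes >= r->max_nested) {
			r->state = FW_WEDGED;
			return false;
		}
		return true;
	case FW_WEDGED:
	default:
		/* Already given up; leave it to the system admin. */
		return false;
	}
}

/* Report that recovery finished and the device is usable again. */
static void fw_recovery_on_ready(struct fw_recovery *r)
{
	if (r->state == FW_RESTARTING) {
		r->state = FW_ON;
		r->nested_crashes = 0;
	}
}
```

With `max_nested` set to 5, the initial crash and the first four nested crashes each trigger a restart attempt, and the fifth nested crash moves the model to the wedged state.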
> >
> > > > It is because the state has not changed to ATH10K_STATE_ON
> > > > immediately, then it will have more than two simulate crash
> > > > process running meanwhile, and complete/wakeup some field twice,
> > > > it destroy the normal recovery process.
> > >
> > > This was intended to allow testing not only firmware crash path (and
> > > recovery) but also firmware crash while recovering from a firmware crash.
> > >
> > If firmware is recovering from crash, then simulate a new crash will trigger error.
> > So remove it.
>
> That's actually a feature, not a bug. If firmware crashes while driver is
> restarting after a crash then its likely going to fail again and again causing a
> crash-restart loop which can affect system performance and responsiveness.
> It's better to give up and let the system admin take over.
>
> If it's still bothering you then please consider a crash counter threshold so
> that, e.g. after 5 crash-while-restarting it's going to give up. However I doubt
> it's worth the effort. My experience tells me firmware crashes during
> recovery are rarely, if at all, transient.
>
> The simulated fw crash is not representative here. It's a mere tool to test
> driver code.

The simulated fw crash is only a tool for the user to trigger a fw crash with a command.
The purpose of this change is to prevent the user from triggering a fw crash when the fw
is not in a normal state.

If the fw is in a recovering state, whether triggered by the user's command or by the fw
itself, the user is not allowed to run the command to trigger another fw crash until the
fw returns to a normal state.
>
>
> Michał
Wen Gong <wgong@qti.qualcomm.com> writes:

>> > > > It is because the state has not changed to ATH10K_STATE_ON
>> > > > immediately, then it will have more than two simulate crash
>> > > > process running meanwhile, and complete/wakeup some field twice,
>> > > > it destroy the normal recovery process.
>> > >
>> > > This was intended to allow testing not only firmware crash path (and
>> > > recovery) but also firmware crash while recovering from a firmware crash.
>> > >
>> > If firmware is recovering from crash, then simulate a new crash will trigger error.
>> > So remove it.
>>
>> That's actually a feature, not a bug. If firmware crashes while driver is
>> restarting after a crash then its likely going to fail again and again causing a
>> crash-restart loop which can affect system performance and responsiveness.
>> It's better to give up and let the system admin take over.
>>
>> If it's still bothering you then please consider a crash counter threshold so
>> that, e.g. after 5 crash-while-restarting it's going to give up. However I doubt
>> it's worth the effort. My experience tells me firmware crashes during
>> recovery are rarely, if at all, transient.
>>
>> The simulated fw crash is not representative here. It's a mere tool to test
>> driver code.
>
> The simulated fw crash is only a tool for user to trigger fw crash
> with command

I think Michal knows what simulate_fw_crash is, as he is the one who
implemented it in commit 278c4a85e626 :)

> This change's purpose is to disallow user to trigger fw crash if the fw is not in a
> Normal state.
>
> If the fw is in recovering state triggered by user's command or by fw, then it will
> disallow user to run command to trigger fw crash again until fw become to a normal
> State.

I agree with Michal here and his proposal about having a crash counter
sounds like a good to me. So I'm dropping this patch.
Kalle Valo <kvalo@codeaurora.org> writes:

>> This change's purpose is to disallow user to trigger fw crash if the fw is not in a
>> Normal state.
>>
>> If the fw is in recovering state triggered by user's command or by fw, then it will
>> disallow user to run command to trigger fw crash again until fw become to a normal
>> State.
>
> I agree with Michal here and his proposal about having a crash counter
> sounds like a good to me. So I'm dropping this patch.

Bah, missed a word again. I meant "sounds like a good idea to me".
> -----Original Message-----
> From: Michał Kazior <kazikcz@gmail.com>
> Sent: Monday, January 7, 2019 4:36 PM
> To: Wen Gong <wgong@qti.qualcomm.com>
> Cc: Wen Gong <wgong@codeaurora.org>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org
> Subject: [EXT] Re: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
> That's actually a feature, not a bug. If firmware crashes while driver
> is restarting after a crash then its likely going to fail again and
> again causing a crash-restart loop which can affect system performance
> and responsiveness. It's better to give up and let the system admin
> take over.
>
> If it's still bothering you then please consider a crash counter
> threshold so that, e.g. after 5 crash-while-restarting it's going to
> give up. However I doubt it's worth the effort. My experience tells me
> firmware crashes during recovery are rarely, if at all, transient.
>
> The simulated fw crash is not representative here. It's a mere tool to
> test driver code.
>
Hi Michal,
There is a stress test case for the simulated fw crash. It simulates a fw crash
within a very short time in each test, which causes the stress test to fail.
The simulated fw crash processes should not run in parallel. With this patch, the
stress test case passes.
>
> Michał
> -----Original Message-----
> From: Wen Gong
> Sent: Monday, April 1, 2019 2:11 PM
> To: 'Michał Kazior' <kazikcz@gmail.com>
> Cc: Wen Gong <wgong@codeaurora.org>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org
> Subject: RE: [EXT] Re: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
>
> >
> > If it's still bothering you then please consider a crash counter
> > threshold so that, e.g. after 5 crash-while-restarting it's going to
> > give up. However I doubt it's worth the effort. My experience tells me
> > firmware crashes during recovery are rarely, if at all, transient.
> >
> > The simulated fw crash is not representative here. It's a mere tool to
> > test driver code.
> >
> Hi Michal,
> There have a stress test case for the simulate fw crash, it will simulate fw crash
> in a very short time for each test, this will trigger the stress test fail.
> The simulate fw crash process should not be run parallel, after this patch, the
> Stress test case will pass.
>
>
Hi Michał,
Do you have any new comments?
>
> Michał
On Mon, 8 Apr 2019 at 12:20, Wen Gong <wgong@qti.qualcomm.com> wrote:
> > -----Original Message-----
> > From: Wen Gong
> > Sent: Monday, April 1, 2019 2:11 PM
> > To: 'Michał Kazior' <kazikcz@gmail.com>
> > Cc: Wen Gong <wgong@codeaurora.org>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org
> > Subject: RE: [EXT] Re: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
> >
> > >
> > > If it's still bothering you then please consider a crash counter
> > > threshold so that, e.g. after 5 crash-while-restarting it's going to
> > > give up. However I doubt it's worth the effort. My experience tells me
> > > firmware crashes during recovery are rarely, if at all, transient.
> > >
> > > The simulated fw crash is not representative here. It's a mere tool to
> > > test driver code.
> > >
> > Hi Michal,
> > There have a stress test case for the simulate fw crash, it will simulate fw crash
> > in a very short time for each test, this will trigger the stress test fail.
> > The simulate fw crash process should not be run parallel, after this patch, the
> > Stress test case will pass.
> >
> >
> Hi Michał,
> Do you have some new comments?

My original use case was to be able to exercise the driver's
robustness in handling nested fw crashes, IOW crash-within-a-crash.

Your test case, as far as I understand, intends to perform
consecutive, non-nested fw crash simulation stress test.

Both of these are mutually exclusive and your patch fixes your test
case at the expense of breaking my original case.

To satisfy both I would suggest you either expose ar->state via
debugfs and make your test procedure wait for that to get back into ON
state before simulating a crash again, or to extend the set of current
simulate_fw_crash commands (currently just: soft, hard, assert,
hw-restart) to something that allows expressing the intent whether
crash-in-crash prevention is intended (your case) or not (my original
case).

This could be for example something like this:
  echo soft wait-ready > simulate_fw_crash

The "wait-ready" extra keyword would imply crash-in-crash prevention.
This would keep existing tools working (both behavior and syntax) and
would allow your test case to be implemented.

Michał
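The optional "wait-ready" keyword proposed above could be parsed along the lines of the sketch below. It is a userspace model under stated assumptions, not the driver's debugfs write handler: `crash_request`, the `CRASH_*` constants, and `parse_crash_request` are hypothetical names. Only the four command names and the "wait-ready" keyword come from the thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a parsed simulate_fw_crash command. */
enum crash_kind {
	CRASH_SOFT, CRASH_HARD, CRASH_ASSERT, CRASH_HW_RESTART, CRASH_INVALID
};

struct crash_request {
	enum crash_kind kind;
	bool wait_ready; /* true: caller asked for crash-in-crash prevention */
};

/* Parse "<command> [wait-ready]"; anything else yields CRASH_INVALID. */
static struct crash_request parse_crash_request(const char *input)
{
	static const struct {
		const char *name;
		enum crash_kind kind;
	} kinds[] = {
		{ "soft", CRASH_SOFT },
		{ "hard", CRASH_HARD },
		{ "assert", CRASH_ASSERT },
		{ "hw-restart", CRASH_HW_RESTART },
	};
	struct crash_request req = { CRASH_INVALID, false };
	size_t len, i;

	/* First token: the crash command itself. */
	input += strspn(input, " \t\n");
	len = strcspn(input, " \t\n");
	for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
		if (strlen(kinds[i].name) == len &&
		    strncmp(input, kinds[i].name, len) == 0) {
			req.kind = kinds[i].kind;
			break;
		}
	}
	if (req.kind == CRASH_INVALID)
		return req;

	/* Optional second token: absent means the existing behavior. */
	input += len;
	input += strspn(input, " \t\n");
	len = strcspn(input, " \t\n");
	if (len == 0)
		return req;

	if (len == strlen("wait-ready") &&
	    strncmp(input, "wait-ready", len) == 0) {
		input += len;
		input += strspn(input, " \t\n");
		if (*input == '\0') {
			req.wait_ready = true;
			return req;
		}
	}

	/* Unknown or trailing keyword: reject the whole request. */
	req.kind = CRASH_INVALID;
	return req;
}
```

Because a bare command such as "soft" still parses with `wait_ready` left false, existing tools would keep their current syntax and behavior in this model, which is the compatibility property the proposal asks for.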
> From: Michał Kazior <kazikcz@gmail.com>
> Sent: Tuesday, April 9, 2019 1:27 AM
> To: Wen Gong <wgong@qti.qualcomm.com>
> Cc: Wen Gong <wgong@codeaurora.org>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org
> Subject: [EXT] Re: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
>
> > > Hi Michal,
> > > There have a stress test case for the simulate fw crash, it will simulate fw crash
> > > in a very short time for each test, this will trigger the stress test fail.
> > > The simulate fw crash process should not be run parallel, after this patch, the
> > > Stress test case will pass.
> > >
> > >
> > Hi Michał,
> > Do you have some new comments?
>
> My original use case was to be able to exercise the driver's
> robustness in handling nested fw crashes, IOW crash-within-a-crash.
>
> Your test case, as far as I understand, intends to perform
> consecutive, non-nested fw crash simulation stress test.
>
> Both of these are mutually exclusive and your patch fixes your test
> case at the expense of breaking my original case.
>
> To satisfy both I would suggest you either expose ar->state via
> debugfs and make your test procedure wait for that to get back into ON
> state before simulating a crash again, or to extend the set of current
> simulate_fw_crash commands (currently just: soft, hard, assert,
> hw-restart) to something that allows expressing the intent whether
> crash-in-crash prevention is intended (your case) or not (my original
> case).
>
> This could be for example something like this:
>   echo soft wait-ready > simulate_fw_crash
>
> The "wait-ready" extra keyword would imply crash-in-crash prevention.
> This would keep existing tools working (both behavior and syntax) and
> would allow your test case to be implemented.
>
Is it easy to change your existing tools?
I want to change it to: echo soft skip-ready > simulate_fw_crash
The "skip-ready" extra keyword would imply crash-in-crash, *not* prevention.
My test tools are hard to change.
>
> Michał
On Mon, Apr 8, 2019 at 10:09 PM Wen Gong <wgong@qti.qualcomm.com> wrote:
> > From: Michał Kazior <kazikcz@gmail.com>
> > To satisfy both I would suggest you either expose ar->state via
> > debugfs and make your test procedure wait for that to get back into ON
> > state before simulating a crash again, or to extend the set of current
> > simulate_fw_crash commands (currently just: soft, hard, assert,
> > hw-restart) to something that allows expressing the intent whether
> > crash-in-crash prevention is intended (your case) or not (my original
> > case).
> >
> > This could be for example something like this:
> >   echo soft wait-ready > simulate_fw_crash
> >
> > The "wait-ready" extra keyword would imply crash-in-crash prevention.
> > This would keep existing tools working (both behavior and syntax) and
> > would allow your test case to be implemented.
> >
> Is it easy to change your existing tools?
> I want to change it to: echo soft skip-ready > simulate_fw_crash
> The "skip-ready" extra keyword would imply crash-in-crash, *not* prevention.
> My test tools is hard to change.

In case you're talking about the test framework we run for ChromeOS
validation, no, it's not hard at all to change. As long as there's a
good reason.

I haven't closely followed this, but judging by the above summary,
it's probably more reasonable for our test framework to only simulate
FW crashes after the driver returns to "ready" (or at least, if we do
crash-in-crash, don't expect the driver to recover?). I expect we can
work with whatever mechanism you implement for that (exposing the
"state", or providing a new simulate_fw_crash mode).

Brian
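On the test-framework side, waiting for the driver to return to "ready" before the next simulated crash could be structured as a bounded polling loop, sketched below. Everything here is hypothetical: no debugfs state file exists in the thread's code, so readiness is abstracted behind a callback (`ready_fn`), and `fake_driver` only stands in for a device that recovers after a few polls.

```c
#include <stdbool.h>

/* Callback reporting whether the driver is back in its ON state.
 * A real harness might implement this by reading an exposed state
 * (hypothetical; the thread only proposes exposing ar->state). */
typedef bool (*ready_fn)(void *ctx);

/* Poll is_ready up to max_polls times.
 * Returns the number of failed polls before readiness, or -1 on timeout. */
static int wait_until_ready(ready_fn is_ready, void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (is_ready(ctx))
			return i;
		/* A real harness would sleep between polls here. */
	}
	return -1;
}

/* Stand-in device for illustration: ready after a fixed number of polls. */
struct fake_driver {
	int polls_until_on;
};

static bool fake_is_ready(void *ctx)
{
	struct fake_driver *d = ctx;

	if (d->polls_until_on > 0) {
		d->polls_until_on--;
		return false;
	}
	return true;
}
```

A stress test built this way would issue the next crash command only after `wait_until_ready` returns a non-negative value, and would treat -1 as a failed recovery rather than crashing the firmware again.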
> -----Original Message-----
> From: Brian Norris <briannorris@chromium.org>
> Sent: Wednesday, April 10, 2019 7:25 AM
> To: Wen Gong <wgong@qti.qualcomm.com>
> Cc: Michał Kazior <kazikcz@gmail.com>; Wen Gong <wgong@codeaurora.org>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org
> Subject: [EXT] Re: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
>
> On Mon, Apr 8, 2019 at 10:09 PM Wen Gong <wgong@qti.qualcomm.com> wrote:
> > > From: Michał Kazior <kazikcz@gmail.com>
> > > To satisfy both I would suggest you either expose ar->state via
> > > debugfs and make your test procedure wait for that to get back into ON
> > > state before simulating a crash again, or to extend the set of current
> > > simulate_fw_crash commands (currently just: soft, hard, assert,
> > > hw-restart) to something that allows expressing the intent whether
> > > crash-in-crash prevention is intended (your case) or not (my original
> > > case).
> > >
> > > This could be for example something like this:
> > >   echo soft wait-ready > simulate_fw_crash
> > >
> > > The "wait-ready" extra keyword would imply crash-in-crash prevention.
> > > This would keep existing tools working (both behavior and syntax) and
> > > would allow your test case to be implemented.
> > >
> > Is it easy to change your existing tools?
> > I want to change it to: echo soft skip-ready > simulate_fw_crash
> > The "skip-ready" extra keyword would imply crash-in-crash, *not* prevention.
> > My test tools is hard to change.
>
> In case you're talking about the test framework we run for ChromeOS
> validation, no, it's not hard at all to change. As long as there's a
> good reason.
>
> I haven't closely followed this, but judging by the above summary,
> it's probably more reasonable for our test framework to only simulate
> FW crashes after the driver returns to "ready" (or at least, if we do
> crash-in-crash, don't expect the driver to recover?). I expect we can
> work with whatever mechanism you implement for that (exposing the
> "state", or providing a new simulate_fw_crash mode).
>
If the ChromeOS tool is easy to change,
I think I will change the mechanism of simulate_fw_crash.
Then all tools will work normally.
> Brian
> -----Original Message-----
> From: ath10k <ath10k-bounces@lists.infradead.org> On Behalf Of Wen Gong
> Sent: Wednesday, April 10, 2019 10:45 AM
> To: Brian Norris <briannorris@chromium.org>
> Cc: Michał Kazior <kazikcz@gmail.com>; linux-wireless <linux-wireless@vger.kernel.org>; ath10k@lists.infradead.org; Wen Gong <wgong@codeaurora.org>
> Subject: [EXT] RE: [PATCH] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
>
> If ChromeOS is easy to change tool,
> I think I will change the mechanism of the simulate_fw_crash.
> Then all tools will work normally.
>
New patch uploaded:
https://patchwork.kernel.org/patch/10897587/
[v2] ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index ada29a4..dc8700b 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -569,8 +569,7 @@ static ssize_t ath10k_write_simulate_fw_crash(struct file *file,
 
 	mutex_lock(&ar->conf_mutex);
 
-	if (ar->state != ATH10K_STATE_ON &&
-	    ar->state != ATH10K_STATE_RESTARTED) {
+	if (ar->state != ATH10K_STATE_ON) {
 		ret = -ENETDOWN;
 		goto exit;
 	}
When testing simulated firmware crashes, it is easy to trigger an error.
Command:
echo soft > /sys/kernel/debug/ieee80211/phyxx/ath10k/simulate_fw_crash

If the command is entered more than two times in a row, it fails.
Error message:
ath10k_pci 0000:02:00.0: failed to set vdev 1 RX wake policy: -108
ath10k_pci 0000:02:00.0: device is wedged, will not restart

This happens because the state does not change back to ATH10K_STATE_ON
immediately, so more than two simulated crash processes run at the same
time and complete/wake up some fields twice, which destroys the normal
recovery process.

Tested with QCA6174 PCI with firmware WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but
this will also affect QCA9377 PCI. It's not a regression with new firmware
releases.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/debug.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
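The effect of the one-line change can be seen in isolation with the small model below. It is a sketch, not the driver code: the `MODEL_STATE_*` enum and `simulate_crash_gate` are hypothetical stand-ins for the `ar->state` check in `ath10k_write_simulate_fw_crash`, and the `allow_restarted` flag selects between the check before the patch (true) and after it (false).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver states named in the diff. */
enum model_state {
	MODEL_STATE_OFF,
	MODEL_STATE_ON,
	MODEL_STATE_RESTARTING,
	MODEL_STATE_RESTARTED,
};

/*
 * Decide whether a simulated crash request is accepted.
 * allow_restarted = true models the check before the patch,
 * allow_restarted = false models the check after the patch.
 * Returns 0 if accepted, -ENETDOWN otherwise (as in the diff).
 */
static int simulate_crash_gate(enum model_state state, bool allow_restarted)
{
	if (state == MODEL_STATE_ON)
		return 0;
	if (allow_restarted && state == MODEL_STATE_RESTARTED)
		return 0;
	return -ENETDOWN;
}
```

In this model the only input whose result differs between the two variants is the restarted state: the older check accepts a new crash there, which is the crash-while-recovering case discussed in the thread, while the patched check rejects it.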