Message ID | 1397710914-10061-2-git-send-email-bzhao@marvell.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Hi Bing,

Assuming the timeout happened due to a firmware bug: is the firmware
able to recover after setting adapter->cmd_sent = false, and could it
then accept new commands without locking up?

It seems this is the bug I was encountering, where I couldn't access
the mlan0 interface anymore...

Is there a way to force the firmware to reset without rebooting?

Regards,

john

On Wed, Apr 16, 2014 at 10:01 PM, Bing Zhao <bzhao@marvell.com> wrote:
> From: Amitkumar Karwar <akarwar@marvell.com>
>
> When command timeout occurs due to a firmware/hardware bug,
> there is no chance of next command being successful. We will
> keep cmd_sent flag on so that next command won't be sent to
> firmware.
>
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> Signed-off-by: Bing Zhao <bzhao@marvell.com>
> ---
>  drivers/net/wireless/mwifiex/cmdevt.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/wireless/mwifiex/cmdevt.c b/drivers/net/wireless/mwifiex/cmdevt.c
> index 1062c91..8dee6c8 100644
> --- a/drivers/net/wireless/mwifiex/cmdevt.c
> +++ b/drivers/net/wireless/mwifiex/cmdevt.c
> @@ -955,8 +955,6 @@ mwifiex_cmd_timeout_func(unsigned long function_context)
>                         adapter->cmd_wait_q.status = -ETIMEDOUT;
>                         wake_up_interruptible(&adapter->cmd_wait_q.wait);
>                         mwifiex_cancel_pending_ioctl(adapter);
> -                       /* reset cmd_sent flag to unblock new commands */
> -                       adapter->cmd_sent = false;
>                 }
>         }
>         if (adapter->hw_status == MWIFIEX_HW_STATUS_INITIALIZING)
> --
> 1.8.2.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi John,

> Hi Bing,
>
> Assuming the timeout happened due to a firmware bug. Does the firmware
> able to recover after setting adapter->cmd_sent = false and the
> firmware could accept a new commands without locking?.

That "adapter->cmd_sent = false" was hoping the firmware is still alive and can respond to a new command. The reality is that the timeout usually indicates the firmware has already hung. Sending another command won't recover it in this case.

> Seems, this is the bug I was encountering and couldn't access the
> mlan0 interface anymore...

This patch itself doesn't solve any existing issue. It helps keep the scene of the initial timeout for investigation.

>
> Is there a way to force the firmware to reset it without rebooting it?.

I guess you are using SDIO chip. If your host controller supports MMC_POWER_OFF/UP, you can reset the chip with this approach:

        mmc_remove_host(host);
        /* some delay */
        mmc_add_host(host);

Regards,
Bing
On Thu, Apr 17, 2014 at 04:33:58PM -0700, Bing Zhao wrote:
> Hi John,
>
> > Hi Bing,
> >
> > Assuming the timeout happened due to a firmware bug. Does the
> > firmware able to recover after setting adapter->cmd_sent = false
> > and the firmware could accept a new commands without locking?.
>
> That "adapter->cmd_sent = false" was hoping the firmware is still
> alive and can respond to a new command. The reality is that the
> timeout usually indicates the firmware has already hung. Sending
> another command won't recover it in this case.

I'm dealing with a firmware hang when more than 13 nodes are in an
ad-hoc IBSS, and I've just found out it isn't entirely a firmware
hang, in that we can see beacons and probe responses from the card,
using tcpdump and monitor mode.

I'm interested to know if the "firmware hangs" that you experiment
with prevent autonomous RF TX, or if RF TX typically proceeds.

> I guess you are using SDIO chip. If your host controller supports
> MMC_POWER_OFF/UP, you can reset the chip with this approach:
>
>         mmc_remove_host(host);
>         /* some delay */
>         mmc_add_host(host);

Thanks, adding that to my list of things to try, as I am using SDIO
too.
Hi James,

> > That "adapter->cmd_sent = false" was hoping the firmware is still
> > alive and can respond to a new command. The reality is that the
> > timeout usually indicates the firmware has already hung. Sending
> > another command won't recover it in this case.
>
> I'm dealing with a firmware hang when more than 13 nodes are in an ad-hoc
> IBSS, and I've just found out isn't entirely a firmware hang; in that we can see
> beacons and probe responses from the card, using tcpdump and monitor
> mode.
>
> I'm interested to know if the "firmware hangs" that you experiment with
> prevent autonomous RF TX, or if RF TX typically proceeds.

It depends. Even if the firmware hangs, the hardware is still alive.
So you could see beacons and probe responses from the card if the
hardware was programmed before the firmware hung.

> > I guess you are using SDIO chip. If your host controller supports
> > MMC_POWER_OFF/UP, you can reset the chip with this approach:
> >
> >         mmc_remove_host(host);
> >         /* some delay */
> >         mmc_add_host(host);
>
> Thanks, adding that to my list of things to try, as I am using SDIO too.

This code (with a 20ms delay) is already in the latest driver. Your
platform and controller may require a longer delay.

Regards,
Bing
On Fri, Apr 18, 2014 at 12:16:07PM -0700, Bing Zhao wrote:
> Hi James,
>
> > > That "adapter->cmd_sent = false" was hoping the firmware is
> > > still alive and can respond to a new command. The reality is
> > > that the timeout usually indicates the firmware has already
> > > hung. Sending another command won't recover it in this case.
> >
> > I'm dealing with a firmware hang when more than 13 nodes are in an
> > ad-hoc IBSS, and I've just found out isn't entirely a firmware
> > hang; in that we can see beacons and probe responses from the
> > card, using tcpdump and monitor mode.
> >
> > I'm interested to know if the "firmware hangs" that you experiment
> > with prevent autonomous RF TX, or if RF TX typically proceeds.
>
> It depends. Even if firmware hangs the hardware is still alive.
> So you could see beacons and probe responses from the card if
> hardware has been programmed before firmware hangs.

Thanks. I neglected to mention the time period; beacons and probe
responses are seen for many minutes after the timeout reported by the
driver, and I have not yet tested for how long this lasts. The probe
responses are in reply to new probe requests. It makes me think the
card is working fine, apart from not communicating with the host.

HOST_INSTATUS_REG, RD_BITMAP_{U,L} are all zero when read at the
timeout.

I am reliably reproducing this particular problem.

> > > I guess you are using SDIO chip. If your host controller
> > > supports MMC_POWER_OFF/UP, you can reset the chip with this
> > > approach:
> > >
> > >         mmc_remove_host(host);
> > >         /* some delay */
> > >         mmc_add_host(host);
> >
> > Thanks, adding that to my list of things to try, as I am using
> > SDIO too.
>
> This code (with 20ms delay) is already in latest driver. Your
> platform and controller may require a longer delay.

Thanks. This is the patch I found:

    mwifiex: add support for SDIO card reset

and it isn't in our tree yet.

Yes, we may need to test the delay required. We have a host GPIO
that drives power to the card. We have discharge clamps on that path
as well. mmc_* is configured through device-tree to use the GPIO,
which we use for suspend and resume. We have power-delay-ms
properties but they aren't used.

I've been testing the patch with a 3000ms delay, and additional output:

    pr_err("Resetting card (3000ms) ...\n");
    mmc_remove_host(reset_host);
    pr_err("removed host\n");
    mdelay(3000);
    pr_err("delayed\n");
    mmc_add_host(reset_host);
    pr_err("added host\n");

If the host joins an IBSS with 10 peers, and three more peers are
added, the wireless LED stays on, and:

[ 105.023274] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865681.433582) = 0xa4, act = 0x0
[ 105.033735] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
[ 105.039533] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
[ 105.045235] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
[ 105.045245] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
[ 105.055866] mwifiex_sdio mmc0:0001:1: last_cmd_index = 3
[ 105.061148] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 2
[ 105.066868] mwifiex_sdio mmc0:0001:1: last_event_index = 3
[ 105.072320] mwifiex_sdio mmc0:0001:1: data_sent=0 cmd_sent=1
[ 105.077944] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
[ 105.083408] mwifiex_sdio: Resetting card (3000ms) ...
[ 105.083408] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
[ 105.098195] mwifiex_sdio mmc0:0001:1: cmd timeout

This is mmc_remove_host not returning. I've no idea why yet. +CC cjb.

If the host joins an IBSS with 13 peers, the wireless LED goes off,
and:

[ 83.603038] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865805.48239) = 0x10, act = 0x1
[ 83.613425] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
[ 83.613425] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
[ 83.624911] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
[ 83.624918] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
[ 83.635542] mwifiex_sdio mmc0:0001:1: last_cmd_index = 2
[ 83.640833] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 1
[ 83.646542] mwifiex_sdio mmc0:0001:1: last_event_index = 2
[ 83.652002] mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
[ 83.657612] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
[ 83.663071] mwifiex_sdio: Resetting card (3000ms) ...
[ 83.668157] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
[ 83.677902] mwifiex_sdio mmc0:0001:1: failed to get signal information
[ 83.684925] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
[ 83.713537] mmc0: card 0001 removed
[ 83.713537] mwifiex_sdio: removed host
[ 87.660599] mwifiex_sdio: delayed
[ 87.703045] mwifiex_sdio: added host
[ 87.740247] mmc0: new high speed SDIO card at address 0001
[ 97.911584] mwifiex_sdio mmc0:0001:1: FW failed to be active in time

But bringing the card back to life has failed. It seems to depend on
what command was outstanding; get RSSI vs MAC multicast address.

Is there another patch needed? I looked through all the patches but
none seemed to relate to this.

What about forcing a reset instead of using power? We have a host
GPIO tied to the reset input on the card.
Hi James,

May I know what processor are you using?.

Thanks,

john
On Fri, Apr 18, 2014 at 05:42:57PM -0700, John Tobias wrote:
> May I know what processor are you using?.
Sure.
OLPC XO-4 laptop, with Marvell PXA2128 system on a chip.
The arm-3.5 branch at git://dev.laptop.org/git/olpc-kernel/
On Sat, Apr 19, 2014 at 10:34:10AM +1000, James Cameron wrote:
> On Fri, Apr 18, 2014 at 12:16:07PM -0700, Bing Zhao wrote:
> > Hi James,
> >
> > > > That "adapter->cmd_sent = false" was hoping the firmware is
> > > > still alive and can respond to a new command. The reality is
> > > > that the timeout usually indicates the firmware has already
> > > > hung. Sending another command won't recover it in this case.
> > >
> > > I'm dealing with a firmware hang when more than 13 nodes are in
> > > an ad-hoc IBSS, and I've just found out isn't entirely a
> > > firmware hang; in that we can see beacons and probe responses
> > > from the card, using tcpdump and monitor mode.
> > >
> > > I'm interested to know if the "firmware hangs" that you
> > > experiment with prevent autonomous RF TX, or if RF TX typically
> > > proceeds.
> >
> > It depends. Even if firmware hangs the hardware is still alive.
> > So you could see beacons and probe responses from the card if
> > hardware has been programmed before firmware hangs.
>
> Thanks. I neglected to mention the time period; beacons and probe
> responses are seen for many minutes after the timeout report by the
> driver, and I have not yet tested for how long this lasts. The
> probe responses are in reply to new probe requests. It makes me
> think the card is working fine, apart from not communicating with
> the host.

Downgrading wireless firmware to 14.66.9.p80 has fixed this problem.
Hi James,

> > > I'm interested to know if the "firmware hangs" that you experiment
> > > with prevent autonomous RF TX, or if RF TX typically proceeds.
> >
> > It depends. Even if firmware hangs the hardware is still alive.
> > So you could see beacons and probe responses from the card if hardware
> > has been programmed before firmware hangs.
>
> Thanks. I neglected to mention the time period; beacons and probe
> responses are seen for many minutes after the timeout report by the driver,
> and I have not yet tested for how long this lasts. The probe responses are in
> reply to new probe requests. It makes me think the card is working fine,
> apart from not communicating with the host.
>
> HOST_INSTATUS_REG, RD_BITMAP_{U,L} are all zero when read at the
> timeout.

This means that the firmware does not have any packet (command
response, event, or rx data) for the host.

> [ 83.663071] mwifiex_sdio: Resetting card (3000ms) ...
> [ 83.668157] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
> [ 83.677902] mwifiex_sdio mmc0:0001:1: failed to get signal information
> [ 83.684925] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
> [ 83.713537] mmc0: card 0001 removed
> [ 83.713537] mwifiex_sdio: removed host
> [ 87.660599] mwifiex_sdio: delayed
> [ 87.703045] mwifiex_sdio: added host
> [ 87.740247] mmc0: new high speed SDIO card at address 0001
> [ 97.911584] mwifiex_sdio mmc0:0001:1: FW failed to be active in time
>
> But bringing the card back to life has failed. It seems to depend on what
> command was outstanding; get RSSI vs MAC multicast address.

Unlikely; it's just a coincidence.

>
> Is there another patch needed? I looked through all the patches but none
> seemed to relate to this.

No other patch is needed if mmc host power off/up is implemented.

>
> What about forcing a reset instead of using power? We have a host GPIO
> tied to the reset input on the card.

Usually toggling the 8787 PDn pin is sufficient to power cycle the
chip. But if that's not working for whatever reason, it's worth trying
the RESETn pin.

Thanks,
Bing
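The register check discussed above can be sketched in plain C. This is a user-space illustration only, not driver code: the function names and the bit values are assumptions made for the example (the real definitions live in the driver's SDIO header and should be checked there). It shows why an interrupt status of zero together with both read-bitmap bytes at zero is read as "the card has nothing queued for the host".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only; verify against the driver. */
#define MODEL_UP_LD_HOST_INT 0x01u /* card has a packet to upload to the host */
#define MODEL_DN_LD_HOST_INT 0x02u /* card can accept a download from the host */

/* Combine the upper and lower read-bitmap bytes into one 16-bit port map. */
static uint16_t model_rd_bitmap(uint8_t upper, uint8_t lower)
{
    return (uint16_t)(((uint16_t)upper << 8) | lower);
}

/* True if the card advertises any packet (response, event, rx data). */
static bool model_card_has_data(uint8_t int_status, uint8_t rd_u, uint8_t rd_l)
{
    return (int_status & MODEL_UP_LD_HOST_INT) != 0 ||
           model_rd_bitmap(rd_u, rd_l) != 0;
}
```

With all three values at zero, as reported at the timeout, `model_card_has_data(0, 0, 0)` is false: under these assumptions the host is not missing a response that is already waiting on the card; the card never produced one.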
Hi James,

> > > > I'm interested to know if the "firmware hangs" that you experiment
> > > > with prevent autonomous RF TX, or if RF TX typically proceeds.
> > >
> > > It depends. Even if firmware hangs the hardware is still alive.
> > > So you could see beacons and probe responses from the card if
> > > hardware has been programmed before firmware hangs.
> >
> > Thanks. I neglected to mention the time period; beacons and probe
> > responses are seen for many minutes after the timeout report by the
> > driver, and I have not yet tested for how long this lasts. The probe
> > responses are in reply to new probe requests. It makes me think the
> > card is working fine, apart from not communicating with the host.
>
> Downgrading wireless firmware to 14.66.9.p80 has fixed this problem.

Wow! It means that p96 firmware had introduced the problem.

Thanks,
Bing
Hi James,

How do you know that downgrading the firmware solved the problem? Did
you see a scenario or flow in the driver that occurred with both
firmware versions but worked on p80?

I am asking because sometimes the bug does not occur often.

Regards,

john

On Thu, Apr 24, 2014 at 12:28 AM, Bing Zhao <bzhao@marvell.com> wrote:
> Hi James,
>
>> > > > I'm interested to know if the "firmware hangs" that you experiment
>> > > > with prevent autonomous RF TX, or if RF TX typically proceeds.
>> > >
>> > > It depends. Even if firmware hangs the hardware is still alive.
>> > > So you could see beacons and probe responses from the card if
>> > > hardware has been programmed before firmware hangs.
>> >
>> > Thanks. I neglected to mention the time period; beacons and probe
>> > responses are seen for many minutes after the timeout report by the
>> > driver, and I have not yet tested for how long this lasts. The probe
>> > responses are in reply to new probe requests. It makes me think the
>> > card is working fine, apart from not communicating with the host.
>>
>> Downgrading wireless firmware to 14.66.9.p80 has fixed this problem.
>
> Wow! It means that p96 firmware had introduced the problem.
>
> Thanks,
> Bing
On Thu, Apr 24, 2014 at 09:45:37AM -0700, John Tobias wrote:
> How did you know that by downgrading the firmware the problem has
> been solved?. Did you see a scenario or flow in the driver that both
> occurred when using the two different firmware but able to work on
> the p80?.

I knew it had been solved because the problem stopped happening,
whereas before it would always happen.

No, there was nothing in the driver changed, and nothing I could see
in the driver had any effect on the problem.

> The reason why I am asking is because sometimes the bug/s did not
> occur often.

We are probably facing different problems. For the problem I am
working on, it always happens; all I have to do is boot 14 or more
laptops.

The sequence is:

- the automatic starting or joining of an ad-hoc network, by the Sugar
  learning software,

- a stream of 13 adhoc station connect events from the card, one for
  each station in the network,

- at about the point that the 14th beacon is seen on RF, the card
  firmware hangs,

- a command is sent by the host (e.g. to get RSSI to update the
  display),

- no interrupt occurs, and so the mwifiex driver reports a command
  timeout.

http://dev.laptop.org/ticket/12763 has some of the details.

I have an instrumented kernel that reports the adhoc station connect
and disconnect events, and counts the number of stations that the card
knows about.

There's some sort of timer used by the card to issue the adhoc station
disconnect event when no beacons from the station have been heard for
a few seconds. So increasing the beacon interval to 10000 TU also
avoided the problem.

I doubt your problem is caused by firmware, but you could test for it.
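For scale, the beacon interval mentioned above can be converted to wall-clock time. An 802.11 time unit (TU) is 1024 microseconds, so 10000 TU is 10.24 seconds, compared with 102.4 ms for the commonly used 100 TU interval. That is longer than the "few seconds" the card appears to wait before issuing a disconnect event. The helper below is a minimal sketch of that arithmetic; the function name is made up for this example.

```c
#include <stdint.h>

/* One 802.11 time unit (TU) is 1024 microseconds. */
#define US_PER_TU 1024u

/* Convert a beacon interval expressed in TU to microseconds. */
static uint64_t tu_to_us(uint32_t tu)
{
    return (uint64_t)tu * US_PER_TU;
}
```

For example, `tu_to_us(10000)` yields 10240000 microseconds.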
diff --git a/drivers/net/wireless/mwifiex/cmdevt.c b/drivers/net/wireless/mwifiex/cmdevt.c
index 1062c91..8dee6c8 100644
--- a/drivers/net/wireless/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/mwifiex/cmdevt.c
@@ -955,8 +955,6 @@ mwifiex_cmd_timeout_func(unsigned long function_context)
                         adapter->cmd_wait_q.status = -ETIMEDOUT;
                         wake_up_interruptible(&adapter->cmd_wait_q.wait);
                         mwifiex_cancel_pending_ioctl(adapter);
-                        /* reset cmd_sent flag to unblock new commands */
-                        adapter->cmd_sent = false;
                 }
         }
         if (adapter->hw_status == MWIFIEX_HW_STATUS_INITIALIZING)
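The effect of the two removed lines can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver's code: `struct model_adapter`, `model_send_cmd()` and `model_cmd_timeout()` are hypothetical names, and returning `-EBUSY` is this model's choice for "command not sent" (how the driver itself holds back a command is not shown in the patch). The point is only that leaving `cmd_sent` set after a timeout keeps any later command from reaching the firmware.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's adapter state. */
struct model_adapter {
    bool cmd_sent;    /* a command is outstanding with the firmware */
    int  wait_status; /* result reported to the waiting caller */
    int  cmds_issued; /* commands that actually reached the "firmware" */
};

/* Hand a command to the firmware; refuse while one is outstanding. */
static int model_send_cmd(struct model_adapter *a)
{
    if (a->cmd_sent)
        return -EBUSY;
    a->cmd_sent = true;
    a->cmds_issued++;
    return 0;
}

/*
 * Timeout path. clear_flag = true models the code before the patch
 * (flag reset, new commands unblocked); false models the patched code.
 */
static void model_cmd_timeout(struct model_adapter *a, bool clear_flag)
{
    a->wait_status = -ETIMEDOUT;
    if (clear_flag)
        a->cmd_sent = false;
}
```

In this model, a timeout handled with `clear_flag = false` leaves the adapter refusing every further `model_send_cmd()` call, which matches the stated intent of the patch: nothing more is sent to firmware that is presumed hung, and the state at the first timeout is preserved for investigation.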