
[2/2] mwifiex: don't clear cmd_sent flag in timeout handler

Message ID 1397710914-10061-2-git-send-email-bzhao@marvell.com (mailing list archive)
State Not Applicable, archived

Commit Message

Bing Zhao April 17, 2014, 5:01 a.m. UTC
From: Amitkumar Karwar <akarwar@marvell.com>

When a command timeout occurs due to a firmware/hardware bug,
there is no chance of the next command succeeding. Keep the
cmd_sent flag set so that the next command won't be sent to
the firmware.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
---
 drivers/net/wireless/mwifiex/cmdevt.c | 2 --
 1 file changed, 2 deletions(-)
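A minimal userspace model of what this patch changes, assuming (as the removed comment says) that new commands are only dispatched while cmd_sent is false. struct sketch_adapter, sketch_send_cmd() and sketch_cmd_timeout() are hypothetical stand-ins for illustration, not mwifiex code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the adapter state. */
struct sketch_adapter {
	bool cmd_sent;      /* a command is outstanding at the firmware */
	int cmds_delivered; /* commands actually handed to the firmware */
};

/* Dispatch a command only when no other command is outstanding. */
static bool sketch_send_cmd(struct sketch_adapter *a)
{
	assert(a != NULL);
	if (a->cmd_sent)
		return false; /* blocked: previous command never completed */
	a->cmd_sent = true;
	a->cmds_delivered++;
	return true;
}

/*
 * Timeout handler model. Before the patch the flag was cleared here
 * (clear_flag == true); after the patch it stays set, so no further
 * commands reach the firmware after the first timeout.
 */
static void sketch_cmd_timeout(struct sketch_adapter *a, bool clear_flag)
{
	assert(a != NULL);
	if (clear_flag)
		a->cmd_sent = false;
}
```

With clear_flag set (the old behaviour) a second command is delivered to firmware that has most likely already hung; with it cleared, the command is refused and the state at the first timeout is left as it was.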

Comments

John Tobias April 17, 2014, 9:41 p.m. UTC | #1
Hi Bing,

Assuming the timeout happened due to a firmware bug, is the firmware
able to recover after setting adapter->cmd_sent = false, so that it
could accept new commands without locking up?
It seems this is the bug I was encountering: I couldn't access the
mlan0 interface anymore...

Is there a way to force the firmware to reset without rebooting?

Regards,

john


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bing Zhao April 17, 2014, 11:33 p.m. UTC | #2
Hi John,

> Hi Bing,
> 
> Assuming the timeout happened due to a firmware bug. Does the firmware
> able to recover after setting adapter->cmd_sent = false and the
> firmware could accept a new commands without locking?.

That "adapter->cmd_sent = false" was hoping the firmware is still alive and can respond to a new command. The reality is that the timeout usually indicates the firmware has already hung. Sending another command won't recover it in this case.

> Seems, this is the bug I was encountering and couldn't access the
> mlan0 interface anymore...

This patch itself doesn't solve any existing issue. It helps keep the scene of the initial timeout for investigation.

> 
> Is there a way to force the firmware to reset it without rebooting it?.

I guess you are using SDIO chip. If your host controller supports MMC_POWER_OFF/UP, you can reset the chip with this approach:

        mmc_remove_host(host);
        /* some delay */
        mmc_add_host(host);

Regards,
Bing
James Cameron April 18, 2014, 4:46 a.m. UTC | #3
On Thu, Apr 17, 2014 at 04:33:58PM -0700, Bing Zhao wrote:
> Hi John,
> 
> > Hi Bing,
> > 
> > Assuming the timeout happened due to a firmware bug. Does the
> > firmware able to recover after setting adapter->cmd_sent = false
> > and the firmware could accept a new commands without locking?.
> 
> That "adapter->cmd_sent = false" was hoping the firmware is still
> alive and can respond to a new command. The reality is that the
> timeout usually indicates the firmware has already hung. Sending
> another command won't recover it in this case.

I'm dealing with a firmware hang when more than 13 nodes are in an
ad-hoc IBSS, and I've just found out that it isn't entirely a firmware
hang, in that we can see beacons and probe responses from the card
using tcpdump and monitor mode.

I'm interested to know whether the "firmware hangs" that you experiment
with prevent autonomous RF TX, or whether RF TX typically proceeds.

> I guess you are using SDIO chip. If your host controller supports
> MMC_POWER_OFF/UP, you can reset the chip with this approach:
> 
>         mmc_remove_host(host);
>         /* some delay */
>         mmc_add_host(host);

Thanks, adding that to my list of things to try, as I am using SDIO too.
Bing Zhao April 18, 2014, 7:16 p.m. UTC | #4
Hi James,

> > That "adapter->cmd_sent = false" was hoping the firmware is still
> > alive and can respond to a new command. The reality is that the
> > timeout usually indicates the firmware has already hung. Sending
> > another command won't recover it in this case.
> 
> I'm dealing with a firmware hang when more than 13 nodes are in an ad-hoc
> IBSS, and I've just found out isn't entirely a firmware hang; in that we can see
> beacons and probe responses from the card, using tcpdump and monitor
> mode.
> 
> I'm interested to know if the "firmware hangs" that you experiment with
> prevent autonomous RF TX, or if RF TX typically proceeds.

It depends. Even if the firmware hangs, the hardware is still alive.
So you could see beacons and probe responses from the card if the hardware was programmed before the firmware hung.

> > I guess you are using SDIO chip. If your host controller supports
> > MMC_POWER_OFF/UP, you can reset the chip with this approach:
> >
> >         mmc_remove_host(host);
> >         /* some delay */
> >         mmc_add_host(host);
> 
> Thanks, adding that to my list of things to try, as I am using SDIO too.

This code (with a 20 ms delay) is already in the latest driver. Your platform and controller may require a longer delay.
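As a sketch of the sequence above with the delay made a parameter: the steps are returned as a small table rather than executed, so the ordering can be checked outside the kernel. The step names, sketch_reset_plan() and the choice of 0 meaning "use the 20 ms default" are assumptions for illustration; in a driver each step would map to mmc_remove_host(), a sleep, and mmc_add_host() on the SDIO host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step kinds for the remove/delay/add reset sequence. */
enum sketch_step_op { SKETCH_REMOVE_HOST, SKETCH_DELAY_MS, SKETCH_ADD_HOST };

struct sketch_step {
	enum sketch_step_op op;
	unsigned int arg; /* milliseconds for SKETCH_DELAY_MS, else unused */
};

/* Delay used when the caller passes 0 (the 20 ms mentioned above). */
#define SKETCH_DEFAULT_RESET_DELAY_MS 20u

/*
 * Fill out[0..2] (room for at least 3 entries) with the reset sequence
 * and return the step count. In a driver this would correspond to:
 *   SKETCH_REMOVE_HOST -> mmc_remove_host(host)
 *   SKETCH_DELAY_MS    -> a sleep such as msleep(arg)
 *   SKETCH_ADD_HOST    -> mmc_add_host(host)
 */
static size_t sketch_reset_plan(struct sketch_step *out, unsigned int delay_ms)
{
	assert(out != NULL);
	if (delay_ms == 0)
		delay_ms = SKETCH_DEFAULT_RESET_DELAY_MS;
	out[0] = (struct sketch_step){ SKETCH_REMOVE_HOST, 0 };
	out[1] = (struct sketch_step){ SKETCH_DELAY_MS, delay_ms };
	out[2] = (struct sketch_step){ SKETCH_ADD_HOST, 0 };
	return 3;
}
```

A platform whose card power rail takes longer to discharge could pass a larger value, for example sketch_reset_plan(steps, 3000).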

Regards,
Bing
James Cameron April 19, 2014, 12:34 a.m. UTC | #5
On Fri, Apr 18, 2014 at 12:16:07PM -0700, Bing Zhao wrote:
> Hi James,
> 
> > > That "adapter->cmd_sent = false" was hoping the firmware is
> > > still alive and can respond to a new command. The reality is
> > > that the timeout usually indicates the firmware has already
> > > hung. Sending another command won't recover it in this case.
> > 
> > I'm dealing with a firmware hang when more than 13 nodes are in an
> > ad-hoc IBSS, and I've just found out isn't entirely a firmware
> > hang; in that we can see beacons and probe responses from the
> > card, using tcpdump and monitor mode.
> > 
> > I'm interested to know if the "firmware hangs" that you experiment
> > with prevent autonomous RF TX, or if RF TX typically proceeds.
> 
> It depends. Even if firmware hangs the hardware is still alive.
> So you could see beacons and probe responses from the card if
> hardware has been programmed before firmware hangs.

Thanks.  I neglected to mention the time period; beacons and probe
responses are seen for many minutes after the timeout report by the
driver, and I have not yet tested for how long this lasts.  The probe
responses are in reply to new probe requests.  It makes me think the
card is working fine, apart from not communicating with the host.

HOST_INSTATUS_REG, RD_BITMAP_{U,L} are all zero when read at the
timeout.

I am reliably reproducing this particular problem.

> > > I guess you are using SDIO chip. If your host controller
> > > supports MMC_POWER_OFF/UP, you can reset the chip with this
> > > approach:
> > >
> > >         mmc_remove_host(host);
> > >         /* some delay */
> > >         mmc_add_host(host);
> > 
> > Thanks, adding that to my list of things to try, as I am using
> > SDIO too.
> 
> This code (with 20ms delay) is already in latest driver. Your
> platform and controller may require a longer delay.

Thanks.  This is the patch I found:

	mwifiex: add support for SDIO card reset

and it isn't in our tree yet.

Yes, we may need to test the delay required.  We have a host GPIO
that drives power to the card.  We have discharge clamps on that path
as well.  mmc_* is configured through device-tree to use the GPIO,
which we use for suspend and resume.  We have power-delay-ms
properties but they aren't used.

I've been testing the patch with a 3000 ms delay and some additional output:

	pr_err("Resetting card (3000ms) ...\n");
	mmc_remove_host(reset_host);
	pr_err("removed host\n");
	mdelay(3000);
	pr_err("delayed\n");
	mmc_add_host(reset_host);
	pr_err("added host\n");

If the host joins an IBSS with 10 peers and three more peers are then
added, the wireless LED stays on, and:

[  105.023274] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865681.433582) = 0xa4, act = 0x0
[  105.033735] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
[  105.039533] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
[  105.045235] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
[  105.045245] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
[  105.055866] mwifiex_sdio mmc0:0001:1: last_cmd_index = 3
[  105.061148] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 2
[  105.066868] mwifiex_sdio mmc0:0001:1: last_event_index = 3
[  105.072320] mwifiex_sdio mmc0:0001:1: data_sent=0 cmd_sent=1
[  105.077944] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
[  105.083408] mwifiex_sdio: Resetting card (3000ms) ...
[  105.083408] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
[  105.098195] mwifiex_sdio mmc0:0001:1: cmd timeout

This is mmc_remove_host not returning.  I've no idea why yet.  +CC cjb.

If the host joins an IBSS with 13 peers, the wireless LED goes
off, and:

[   83.603038] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865805.48239) = 0x10, act = 0x1
[   83.613425] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
[   83.613425] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
[   83.624911] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
[   83.624918] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
[   83.635542] mwifiex_sdio mmc0:0001:1: last_cmd_index = 2
[   83.640833] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 1
[   83.646542] mwifiex_sdio mmc0:0001:1: last_event_index = 2
[   83.652002] mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
[   83.657612] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
[   83.663071] mwifiex_sdio: Resetting card (3000ms) ...
[   83.668157] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
[   83.677902] mwifiex_sdio mmc0:0001:1: failed to get signal information
[   83.684925] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
[   83.713537] mmc0: card 0001 removed
[   83.713537] mwifiex_sdio: removed host
[   87.660599] mwifiex_sdio: delayed
[   87.703045] mwifiex_sdio: added host
[   87.740247] mmc0: new high speed SDIO card at address 0001
[   97.911584] mwifiex_sdio mmc0:0001:1: FW failed to be active in time

But bringing the card back to life has failed.  It seems to depend on
what command was outstanding; get RSSI vs MAC multicast address.
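The final "FW failed to be active in time" line suggests the driver polls the firmware status a bounded number of times after the host is re-added. A generic poll-with-timeout loop is sketched below; sketch_wait_fw_active() and the mock probe sketch_ready_after() are hypothetical and do not reproduce the actual mwifiex status check or its timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: true once the firmware reports it is running. */
typedef bool (*sketch_fw_ready_fn)(void *ctx, unsigned int attempt);

/* Mock probe for illustration: ready from attempt *(unsigned int *)ctx on. */
static bool sketch_ready_after(void *ctx, unsigned int attempt)
{
	return attempt >= *(const unsigned int *)ctx;
}

/*
 * Poll up to max_polls times. Returns the 0-based attempt on which the
 * firmware came up, or -1 if it never did (the "failed to be active in
 * time" case).
 */
static int sketch_wait_fw_active(sketch_fw_ready_fn ready, void *ctx,
				 unsigned int max_polls)
{
	assert(ready != NULL);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (ready(ctx, i))
			return (int)i;
		/* a real driver would sleep between attempts here */
	}
	return -1;
}
```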

Is there another patch needed?  I looked through all the patches but
none seemed to relate to this.

What about forcing a reset instead of using power?  We have a host
GPIO tied to the reset input on the card.
John Tobias April 19, 2014, 12:42 a.m. UTC | #6
Hi James,

May I know what processor you are using?

Thanks,

john

James Cameron April 19, 2014, 12:48 a.m. UTC | #7
On Fri, Apr 18, 2014 at 05:42:57PM -0700, John Tobias wrote:
> May I know what processor are you using?.

Sure.

OLPC XO-4 laptop, with Marvell PXA2128 system on a chip.

The arm-3.5 branch at git://dev.laptop.org/git/olpc-kernel/
James Cameron April 24, 2014, 6:11 a.m. UTC | #8
On Sat, Apr 19, 2014 at 10:34:10AM +1000, James Cameron wrote:
> On Fri, Apr 18, 2014 at 12:16:07PM -0700, Bing Zhao wrote:
> > Hi James,
> > 
> > > > That "adapter->cmd_sent = false" was hoping the firmware is
> > > > still alive and can respond to a new command. The reality is
> > > > that the timeout usually indicates the firmware has already
> > > > hung. Sending another command won't recover it in this case.
> > > 
> > > I'm dealing with a firmware hang when more than 13 nodes are in
> > > an ad-hoc IBSS, and I've just found out isn't entirely a
> > > firmware hang; in that we can see beacons and probe responses
> > > from the card, using tcpdump and monitor mode.
> > > 
> > > I'm interested to know if the "firmware hangs" that you
> > > experiment with prevent autonomous RF TX, or if RF TX typically
> > > proceeds.
> > 
> > It depends. Even if firmware hangs the hardware is still alive.
> > So you could see beacons and probe responses from the card if
> > hardware has been programmed before firmware hangs.
> 
> Thanks.  I neglected to mention the time period; beacons and probe
> responses are seen for many minutes after the timeout report by the
> driver, and I have not yet tested for how long this lasts.  The
> probe responses are in reply to new probe requests.  It makes me
> think the card is working fine, apart from not communicating with
> the host.

Downgrading wireless firmware to 14.66.9.p80 has fixed this problem.
Bing Zhao April 24, 2014, 7:22 a.m. UTC | #9
Hi James,

> > > I'm interested to know if the "firmware hangs" that you experiment
> > > with prevent autonomous RF TX, or if RF TX typically proceeds.
> >
> > It depends. Even if firmware hangs the hardware is still alive.
> > So you could see beacons and probe responses from the card if hardware
> > has been programmed before firmware hangs.
> 
> Thanks.  I neglected to mention the time period; beacons and probe
> responses are seen for many minutes after the timeout report by the driver,
> and I have not yet tested for how long this lasts.  The probe responses are in
> reply to new probe requests.  It makes me think the card is working fine,
> apart from not communicating with the host.
> 
> HOST_INSTATUS_REG, RD_BITMAP_{U,L} are all zero when read at the
> timeout.

This means that the firmware does not have any packet (command response, event, rx data) for host.
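A sketch of that reading of the registers. It assumes, as in the mwifiex SDIO code of this period, that RD_BITMAP_U and RD_BITMAP_L form the high and low bytes of a 16-bit read-port bitmap and that port 0 carries command responses; the struct and function names are hypothetical, and the bit positions should be checked against the driver and datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the three registers read at the timeout. */
struct sketch_sdio_status {
	uint8_t host_instatus;
	uint8_t rd_bitmap_u; /* assumed: high byte of the read-port bitmap */
	uint8_t rd_bitmap_l; /* assumed: low byte of the read-port bitmap */
};

static uint16_t sketch_rd_bitmap(const struct sketch_sdio_status *s)
{
	assert(s != NULL);
	return (uint16_t)(((unsigned int)s->rd_bitmap_u << 8) | s->rd_bitmap_l);
}

/* True if the card signals anything (cmd response, event, rx data). */
static bool sketch_card_has_data(const struct sketch_sdio_status *s)
{
	return s->host_instatus != 0 || sketch_rd_bitmap(s) != 0;
}

/* True if the command-response port (assumed to be port 0) is ready. */
static bool sketch_cmd_resp_ready(const struct sketch_sdio_status *s)
{
	return (sketch_rd_bitmap(s) & 0x1u) != 0;
}
```

With all three registers zero, both checks are false: nothing is waiting for the host, which matches a firmware that never produced the command response.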

> [   83.663071] mwifiex_sdio: Resetting card (3000ms) ...
> [   83.668157] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
> [   83.677902] mwifiex_sdio mmc0:0001:1: failed to get signal information
> [   83.684925] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
> [   83.713537] mmc0: card 0001 removed
> [   83.713537] mwifiex_sdio: removed host
> [   87.660599] mwifiex_sdio: delayed
> [   87.703045] mwifiex_sdio: added host
> [   87.740247] mmc0: new high speed SDIO card at address 0001
> [   97.911584] mwifiex_sdio mmc0:0001:1: FW failed to be active in time
> 
> But bringing the card back to life has failed.  It seems to depend on what
> command was outstanding; get RSSI vs MAC multicast address.

Unlikely, it's just a coincidence.

> 
> Is there another patch needed?  I looked through all the patches but none
> seemed to relate to this.

No other patch is needed if mmc host power off/up is implemented.

> 
> What about forcing a reset instead of using power?  We have a host GPIO
> tied to the reset input on the card.

Usually toggling the 8787 PDn pin is sufficient to power-cycle the chip. But if that's not working for whatever reason, it's worth trying the RESETn pin.
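The suggested order (PDn first, RESETn only if that fails) can be written down as a list of pin transitions. This assumes both pins are active-low, as the trailing "n" conventionally indicates; the enum, struct, hold time and the pdn_worked flag are hypothetical, and board code would map each entry to its own GPIO call and delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two control lines. */
enum sketch_pin { SKETCH_PIN_PDN, SKETCH_PIN_RESETN };

struct sketch_pin_event {
	enum sketch_pin pin;
	int level;            /* 0 = asserted (low), 1 = released (high) */
	unsigned int hold_ms; /* time to wait after setting this level */
};

/* Append one low-then-high pulse on 'pin'; returns the new event count. */
static size_t sketch_pulse(struct sketch_pin_event *ev, size_t n,
			   enum sketch_pin pin, unsigned int hold_ms)
{
	assert(ev != NULL);
	ev[n++] = (struct sketch_pin_event){ pin, 0, hold_ms };
	ev[n++] = (struct sketch_pin_event){ pin, 1, 0 };
	return n;
}

/*
 * ev must have room for 4 entries. Always pulse PDn; add a RESETn
 * pulse only when the PDn power-cycle did not bring the chip back.
 */
static size_t sketch_recover(struct sketch_pin_event *ev, bool pdn_worked,
			     unsigned int hold_ms)
{
	size_t n = sketch_pulse(ev, 0, SKETCH_PIN_PDN, hold_ms);
	if (!pdn_worked)
		n = sketch_pulse(ev, n, SKETCH_PIN_RESETN, hold_ms);
	return n;
}
```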

Thanks,
Bing

Bing Zhao April 24, 2014, 7:28 a.m. UTC | #10
Hi James,

> > > > I'm interested to know if the "firmware hangs" that you experiment
> > > > with prevent autonomous RF TX, or if RF TX typically proceeds.
> > >
> > > It depends. Even if firmware hangs the hardware is still alive.
> > > So you could see beacons and probe responses from the card if
> > > hardware has been programmed before firmware hangs.
> >
> > Thanks.  I neglected to mention the time period; beacons and probe
> > responses are seen for many minutes after the timeout report by the
> > driver, and I have not yet tested for how long this lasts.  The probe
> > responses are in reply to new probe requests.  It makes me think the
> > card is working fine, apart from not communicating with the host.
> 
> Downgrading wireless firmware to 14.66.9.p80 has fixed this problem.

Wow! It means that p96 firmware had introduced the problem.

Thanks,
Bing

John Tobias April 24, 2014, 4:45 p.m. UTC | #11
Hi James,

How did you know that downgrading the firmware solved the problem?
Did you see a scenario or flow in the driver that occurred with both
firmware versions but only worked with p80?

The reason I am asking is that sometimes the bug(s) did not occur often.

Regards,

john


James Cameron April 24, 2014, 8:48 p.m. UTC | #12
On Thu, Apr 24, 2014 at 09:45:37AM -0700, John Tobias wrote:
> How did you know that by downgrading the firmware the problem has
> been solved?. Did you see a scenario or flow in the driver that both
> occurred when using the two different firmware but able to work on
> the p80?.

I knew it had been solved because the problem stopped happening,
whereas before it would always happen.

No, nothing in the driver changed, and nothing I could see in the
driver had any effect on the problem.

> The reason why I am asking is because sometimes the bug/s did not
> occur often.

We are probably facing different problems.  For the problem I am
working on, it always happens; all I have to do is boot 14 or more
laptops.  The sequence is:

- the automatic starting or joining of an ad-hoc network, by the Sugar
  learning software,

- a stream of 13 adhoc station connect events from the card, one for
  each station in the network,

- at about the point that the 14th beacon is seen on RF, the card
  firmware hangs,

- a command is sent by the host (e.g. to get RSSI to update the
  display),

- no interrupt occurs, and so the mwifiex driver reports a command
  timeout.

http://dev.laptop.org/ticket/12763 has some of the details.

I have an instrumented kernel that reports the adhoc station connect
and disconnect events, and counts the number of stations that the card
knows about.
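Instrumentation along those lines might look like the following: count ad-hoc station connect and disconnect events and flag when the count passes a limit (13 here, matching the point at which the hang was observed). The event enum and counter struct are hypothetical; the real mwifiex event IDs are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event identifiers, not the firmware's real event codes. */
enum sketch_evt { SKETCH_EVT_ADHOC_STA_CONNECT, SKETCH_EVT_ADHOC_STA_DISCONNECT };

struct sketch_sta_counter {
	unsigned int current; /* stations the card currently reports */
	unsigned int peak;    /* highest count seen so far */
};

/* Update the count for one event; returns true when 'limit' is exceeded. */
static bool sketch_count_event(struct sketch_sta_counter *c,
			       enum sketch_evt evt, unsigned int limit)
{
	assert(c != NULL);
	if (evt == SKETCH_EVT_ADHOC_STA_CONNECT) {
		c->current++;
		if (c->current > c->peak)
			c->peak = c->current;
	} else if (c->current > 0) {
		c->current--; /* ignore a disconnect with no known station */
	}
	return c->current > limit;
}
```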

The card uses some sort of timer to issue the ad-hoc station
disconnect event when no beacons from the station have been heard for
a few seconds.  So increasing the beacon interval to 10000 TU also
avoided the problem.
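The effect of the beacon interval can be put in numbers. One TU is 1024 µs, so 10000 TU is 10.24 s, against 102.4 ms for the commonly used 100 TU. Taking a hypothetical 3 s as the card's "few seconds" disconnect timeout, the sketch counts how many beacons a station would send within that window: 29 at 100 TU, none at 10000 TU. One plausible reading is that at 10000 TU stations are timed out between beacons, so the card never tracks enough of them at once to hang.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 802.11 time unit: 1 TU = 1024 microseconds. */
#define SKETCH_US_PER_TU 1024u

static uint64_t sketch_tu_to_us(uint32_t tu)
{
	return (uint64_t)tu * SKETCH_US_PER_TU;
}

/*
 * Beacons from one station expected within window_ms milliseconds
 * (integer division, so a partial interval rounds down).
 */
static uint64_t sketch_beacons_in_window(uint32_t beacon_tu, uint32_t window_ms)
{
	assert(beacon_tu > 0);
	return ((uint64_t)window_ms * 1000u) / sketch_tu_to_us(beacon_tu);
}
```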

I doubt your problem is caused by firmware, but you could test for it.

Patch

diff --git a/drivers/net/wireless/mwifiex/cmdevt.c b/drivers/net/wireless/mwifiex/cmdevt.c
index 1062c91..8dee6c8 100644
--- a/drivers/net/wireless/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/mwifiex/cmdevt.c
@@ -955,8 +955,6 @@  mwifiex_cmd_timeout_func(unsigned long function_context)
 			adapter->cmd_wait_q.status = -ETIMEDOUT;
 			wake_up_interruptible(&adapter->cmd_wait_q.wait);
 			mwifiex_cancel_pending_ioctl(adapter);
-			/* reset cmd_sent flag to unblock new commands */
-			adapter->cmd_sent = false;
 		}
 	}
 	if (adapter->hw_status == MWIFIEX_HW_STATUS_INITIALIZING)