Message ID | OFAF74C95C.6058723F-ON48257E93.00322982-48257E93.00363C6C@zte.com.cn (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote: > scsi_error: should not get sense for timeout IO in scsi error handler > > When an IO timeout occurs, the IO will be aborted in > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). > So when scsi error handler starts, it will get sense for this > timeout IO and the scmd of the IO request will be reused. In that > case, the scmd may be double released when racing with io_done(), > which will result in crash. > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. > The bug maybe reproduced when the link between host and disk is > unstable. > > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> > Signed-off-by: Long Chun <long.chun@zte.com.cn> > Reviewed-by: Tan Hu <tan.hu@zte.com.cn> > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> > Reviewed-by: Cai Qu <cai.qu@zte.com.cn> > > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c > --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 > +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * > struct Scsi_Host *shost; > int rtn; > > + /* > + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, > + * should not get sense. > + */ > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || > - SCSI_SENSE_VALID(scmd)) > + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || > + SCSI_SENSE_VALID(scmd)) > continue; > > shost = scmd->device->host; > -- _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_ SCSI_EH_ABORT_SCHEDULED. Not every driver is required to implement and/or support asynchronous command aborts, and those will be setting SCSI_EH_CANCEL_CMD even though they've run into a timeout. Cheers, Hannes
bGludXgtc2NzaS1vd25lckB2Z2VyLmtlcm5lbC5vcmcgd3JvdGUgb24gMjAxNS8wNy8zMSAyMTox NzozMzoNCg0KPiBIYW5uZXMgUmVpbmVja2UgPGhhcmVAc3VzZS5kZT4gDQo+IOWPkeS7tuS6ujog IGxpbnV4LXNjc2ktb3duZXJAdmdlci5rZXJuZWwub3JnDQo+IA0KPiAyMDE1LzA3LzMxIDIxOjE3 DQo+IA0KPiDmlLbku7bkuroNCj4gDQo+IGppYW5nLmJpYW8yQHp0ZS5jb20uY24sIGxpbnV4LXNj c2lAdmdlci5rZXJuZWwub3JnLCBKQm90dG9tbGV5QG9kaW4uY29tLCANCg0KPiANCj4g5oqE6YCB DQo+IA0KPiDkuLvpopgNCj4gDQo+IFJlOiBbUGF0Y2hdIHNjc2lfZXJyb3I6IHNob3VsZCBub3Qg Z2V0IHNlbnNlIGZvciB0aW1lb3V0IElPIGluIHNjc2kgDQo+IGVycm9yIGhhbmRsZXINCj4gDQo+ IE9uIDA3LzMxLzIwMTUgMTE6NTIgQU0sIGppYW5nLmJpYW8yQHp0ZS5jb20uY24gd3JvdGU6DQo+ ID4gc2NzaV9lcnJvcjogc2hvdWxkIG5vdCBnZXQgc2Vuc2UgZm9yIHRpbWVvdXQgSU8gaW4gc2Nz aSBlcnJvciBoYW5kbGVyDQo+ID4gDQo+ID4gV2hlbiBhbiBJTyB0aW1lb3V0IG9jY3VycywgdGhl IElPIHdpbGwgYmUgYWJvcnRlZCBpbg0KPiA+IHNjc2lfYWJvcnRfY29tbWFuZCgpIGFuZCBTQ1NJ X0VIX0FCT1JUX1NDSEVEVUxFRCB3aWxsIGJlIHNldC4gQmVjYXVzZQ0KPiA+IG9mIHRoYXQsIHRo ZSBTQ1NJX0VIX0NBTkNFTF9DTUQgd2lsbCBiZSBjbGVhciBpbiBzY3NpX2VoX3NjbWRfYWRkKCku DQo+ID4gU28gd2hlbiBzY3NpIGVycm9yIGhhbmRsZXIgc3RhcnRzLCBpdCB3aWxsIGdldCBzZW5z ZSBmb3IgdGhpcw0KPiA+IHRpbWVvdXQgSU8gYW5kIHRoZSBzY21kIG9mIHRoZSBJTyByZXF1ZXN0 IHdpbGwgYmUgcmV1c2VkLiBJbiB0aGF0DQo+ID4gY2FzZSwgdGhlIHNjbWQgbWF5IGJlIGRvdWJs ZSByZWxlYXNlZCB3aGVuIHJhY2luZyB3aXRoIGlvX2RvbmUoKSwNCj4gPiB3aGljaCB3aWxsIHJl c3VsdCBpbiBjcmFzaC4NCj4gPiBTTyBTQ1NJX0VIX0FCT1JUX1NDSEVEVUxFRCBzaG91bGQgYWxz byBiZSBjaGVja2VkIHdoZW4gZ2V0dGluZyBzZW5zZS4NCj4gPiBUaGUgYnVnIG1heWJlIHJlcHJv ZHVjZWQgd2hlbiB0aGUgbGluayBiZXR3ZWVuIGhvc3QgYW5kIGRpc2sgaXMNCj4gPiB1bnN0YWJs ZS4NCj4gPiANCj4gPiBTaWduZWQtb2ZmLWJ5OiBKaWFuZyBCaWFvIDxqaWFuZy5iaWFvMkB6dGUu Y29tLmNuPg0KPiA+IFNpZ25lZC1vZmYtYnk6IExvbmcgQ2h1biA8bG9uZy5jaHVuQHp0ZS5jb20u Y24+DQo+ID4gUmV2aWV3ZWQtYnk6IFRhbiBIdSA8dGFuLmh1QHp0ZS5jb20uY24+DQo+ID4gUmV2 aWV3ZWQtYnk6IENoZW4gRG9uZ2hhaSA8Y2hlbi5kb25naGFpQHp0ZS5jb20uY24+DQo+ID4gUmV2 aWV3ZWQtYnk6IENhaSBRdSA8Y2FpLnF1QHp0ZS5jb20uY24+DQo+ID4gDQo+ID4gZGlmZiAtdXBy TiBkcml2ZXJzL3Njc2kvc2NzaV9lcnJvci5jIGRyaXZlcnNfbmV3L3Njc2kvc2NzaV9lcnJvci5j DQo+ID4gLS0tIHNjc2kvc2NzaV9lcnJvci5jICAgMjAxNS0wNy0zMSAxNjowMzoxOC4wMDAwMDAw MDAgKzA4MDANCj4gPiArKysgc2NzaV9uZXcvc2NzaV9lcnJvci5jICAgICAgIDIwMTUtMDctMzEg MTY6Mjk6MjUuMDAwMDAwMDAwICswODAwDQo+ID4gQEAgLTExNTYsOSArMTE1NiwxNCBAQCBpbnQg c2NzaV9laF9nZXRfc2Vuc2Uoc3RydWN0IGxpc3RfaGVhZCAqDQo+ID4gICAgICAgICBzdHJ1Y3Qg U2NzaV9Ib3N0ICpzaG9zdDsNCj4gPiAgICAgICAgIGludCBydG47DQo+ID4gDQo+ID4gKyAgICAg ICAvKg0KPiA+ICsgICAgICAgICogSWYgU0NTSV9FSF9BQk9SVF9TQ0hFRFVMRUQgaGFzIGJlZW4g c2V0LCBpdCBpcyB0aW1lb3V0IElPLA0KPiA+ICsgICAgICAgICogc2hvdWxkIG5vdCBnZXQgc2Vu c2UuDQo+ID4gKyAgICAgICAgKi8NCj4gPiAgICAgICAgIGxpc3RfZm9yX2VhY2hfZW50cnlfc2Fm ZShzY21kLCBuZXh0LCB3b3JrX3EsIGVoX2VudHJ5KSB7DQo+ID4gICAgICAgICAgICAgICAgIGlm ICgoc2NtZC0+ZWhfZWZsYWdzICYgU0NTSV9FSF9DQU5DRUxfQ01EKSB8fA0KPiA+IC0gICAgICAg ICAgICAgICAgICAgU0NTSV9TRU5TRV9WQUxJRChzY21kKSkNCj4gPiArICAgICAgICAgICAgICAg ICAgIChzY21kLT5laF9lZmxhZ3MgJiBTQ1NJX0VIX0FCT1JUX1NDSEVEVUxFRCkgfHwNCj4gPiAr ICAgICAgICAgICAgICAgICAgICBTQ1NJX1NFTlNFX1ZBTElEKHNjbWQpKQ0KPiA+ICAgICAgICAg ICAgICAgICAgICAgICAgIGNvbnRpbnVlOw0KPiA+IA0KPiA+ICAgICAgICAgICAgICAgICBzaG9z dCA9IHNjbWQtPmRldmljZS0+aG9zdDsNCj4gPiAtLQ0KPiBfQWN0dWFsbHlfIHlvdSBuZWVkIHRv IHRlc3QgZm9yIGJvdGgsIFNDU0lfRUhfQ0FOQ0VMX0NNRCBfYW5kXw0KPiBTQ1NJX0VIX0FCT1JU X1NDSEVEVUxFRC4NCj4gTm90IGV2ZXJ5IGRyaXZlciBpcyByZXF1aXJlZCB0byBpbXBsZW1lbnQg YW5kL29yIHN1cHBvcnQNCj4gYXN5bmNocm9ub3VzIGNvbW1hbmQgYWJvcnRzLCBhbmQgdGhvc2Ug d2lsbCBiZSBzZXR0aW5nDQo+IFNDU0lfRUhfQ0FOQ0VMX0NNRCBldmVuIHRob3VnaCB0aGV5J3Zl IHJ1biBpbnRvIGEgdGltZW91dC4NCj4gDQpUaGF0J3MgcmlnaHQsIGJ1dCBTQ1NJX0VIX0NBTkNF TF9DTUQgX2hhc18gYWxyZWFkeSBiZWVuIHRlc3RlZCANCmluIHRoZSBjdXJyZW50IGNvZGUsIHNv IHRoZXJlJ3Mgbm8gbmVlZCB0byBhZGQgaW4gdGhlIHBhdGNoLiANCkFmdGVyIHBhdGNoZWQsIGJv dGggU0NTSV9FSF9DQU5DRUxfQ01EIF9hbmRfIA0KU0NTSV9FSF9BQk9SVF9TQ0hFRFVMRUQgYXJl IHRlc3RlZCBoZXJlLCB0aGF0J2xsIGVuc3VyZSBubyANCmdldHRpbmcgc2Vuc2UgZm9yICp0aW1l b3V0IGlvKi4NCg0KVGhhbmtzLg0K -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/01/2015 04:26 AM, jiang.biao2@zte.com.cn wrote: > Hannes Reinecke <hare@suse.de> ?? 2015/07/31 21:17:33: > >> Hannes Reinecke <hare@suse.de> >> 2015/07/31 21:17 >> >> ??? >> >> jiang.biao2@zte.com.cn, linux-scsi@vger.kernel.org, JBottomley@odin.com, > >> >> ?? >> >> ?? >> >> Re: [Patch] scsi_error: should not get sense for timeout IO in scsi >> error handler >> >> On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote: >>> scsi_error: should not get sense for timeout IO in scsi error handler >>> >>> When an IO timeout occurs, the IO will be aborted in >>> scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because >>> of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). >>> So when scsi error handler starts, it will get sense for this >>> timeout IO and the scmd of the IO request will be reused. In that >>> case, the scmd may be double released when racing with io_done(), >>> which will result in crash. >>> SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. >>> The bug maybe reproduced when the link between host and disk is >>> unstable. >>> >>> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> >>> Signed-off-by: Long Chun <long.chun@zte.com.cn> >>> Reviewed-by: Tan Hu <tan.hu@zte.com.cn> >>> Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> >>> Reviewed-by: Cai Qu <cai.qu@zte.com.cn> >>> >>> diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c >>> --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 >>> +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 >>> @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * >>> struct Scsi_Host *shost; >>> int rtn; >>> >>> + /* >>> + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, >>> + * should not get sense. >>> + */ >>> list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >>> if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || >>> - SCSI_SENSE_VALID(scmd)) >>> + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || >>> + SCSI_SENSE_VALID(scmd)) >>> continue; >>> >>> shost = scmd->device->host; >>> -- >> _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_ >> SCSI_EH_ABORT_SCHEDULED. >> Not every driver is required to implement and/or support >> asynchronous command aborts, and those will be setting >> SCSI_EH_CANCEL_CMD even though they've run into a timeout. > That's right, but SCSI_EH_CANCEL_CMD _has_ already been tested > in the current code, so there's no need to add in the patch. > After patched, both SCSI_EH_CANCEL_CMD _and_ > SCSI_EH_ABORT_SCHEDULED are tested here, that'll ensure no > getting sense for timeout io. > Ah, right. I've misread the patch. Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes
On Fri, 2015-07-31 at 17:52 +0800, jiang.biao2@zte.com.cn wrote: > scsi_error: should not get sense for timeout IO in scsi error handler > > When an IO timeout occurs, the IO will be aborted in > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). > So when scsi error handler starts, it will get sense for this > timeout IO and the scmd of the IO request will be reused. In that > case, the scmd may be double released when racing with io_done(), > which will result in crash. > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. > The bug maybe reproduced when the link between host and disk is > unstable. > > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> > Signed-off-by: Long Chun <long.chun@zte.com.cn> > Reviewed-by: Tan Hu <tan.hu@zte.com.cn> > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> > Reviewed-by: Cai Qu <cai.qu@zte.com.cn> > > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c to apply easily, diffs need to start at the top of the tree, please. > --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 > +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * > struct Scsi_Host *shost; > int rtn; > > + /* > + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, > + * should not get sense. > + */ > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { and here all the tabs have been converted to spaces; you need to read Documentation/email-clients.txt for details on how to avoid this. I managed to fix it up this time, but won't again. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * struct Scsi_Host *shost; int rtn; + /* + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, + * should not get sense. + */ list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || - SCSI_SENSE_VALID(scmd)) + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || + SCSI_SENSE_VALID(scmd)) continue; shost = scmd->device->host;