Message ID | 20200930080256.90964-1-hare@suse.de (mailing list archive) |
---|---|
Series | scsi: remove devices in ALUA transitioning status |
Hannes,

> during testing we found that there is an issue with dev_loss_tmo and
> devices in ALUA transitioning state. What happens is that I/O gets
> requeued via BLK_STS_RESOURCE for these devices, so when dev_loss_tmo
> triggers the SCSI core cannot flush the request list as I/O is simply
> requeued.
>
> So when the driver is trying to re-establish the device it'll wait for
> that last reference to drop in order to re-attach the device, but as
> I/O is still outstanding on the (old) device it'll wait for ever.
>
> Fix this by returning 'BLK_STS_AGAIN' from scsi_dh_alua when the
> device is in ALUA transitioning, and also set the 'transitioning'
> state when scsi_dh_alua is receiving a sense code, and not only after
> scsi_dh_alua successfully received the response to a REPORT TARGET
> PORT GROUPS command.

It would be good to get this revived/reviewed. Thanks!

On Wed, 2020-09-30 at 10:02 +0200, Hannes Reinecke wrote:
> Hi all,
>
> during testing we found that there is an issue with dev_loss_tmo and
> devices in ALUA transitioning state.
> What happens is that I/O gets requeued via BLK_STS_RESOURCE for these
> devices, so when dev_loss_tmo triggers the SCSI core cannot flush the
> request list as I/O is simply requeued.
>
> So when the driver is trying to re-establish the device it'll wait for
> that last reference to drop in order to re-attach the device, but as I/O
> is still outstanding on the (old) device it'll wait for ever.
>
> Fix this by returning 'BLK_STS_AGAIN' from scsi_dh_alua when the device
> is in ALUA transitioning, and also set the 'transitioning' state when
> scsi_dh_alua is receiving a sense code, and not only after scsi_dh_alua
> successfully received the response to a REPORT TARGET PORT GROUPS
> command.
>
> Hannes Reinecke (4):
>   block: return status code in blk_mq_end_request()
>   scsi_dh_alua: return BLK_STS_AGAIN for ALUA transitioning state
>   scsi_dh_alua: set 'transitioning' state on unit attention
>   scsi: return BLK_STS_AGAIN for ALUA transitioning
>
>  block/blk-mq.c                             |  2 +-
>  drivers/scsi/device_handler/scsi_dh_alua.c | 10 +++++++++-
>  drivers/scsi/scsi_lib.c                    |  8 ++++++++
>  3 files changed, 18 insertions(+), 2 deletions(-)
>

We had a report of I/O hangs during storage controller resets and
analysis of the kernel state showed the sdev in ALUA transitioning.

The patch set fixes the ALUA transitioning issue, it looks good.
There was a reproducible test case.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

-Ewan

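The hang described in the cover letter can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the series: every `toy_` name is hypothetical, the status values merely echo the kernel's BLK_STS_RESOURCE and BLK_STS_AGAIN, and the model assumes that a request completed with the "again" status leaves the request list while a "resource" status puts it straight back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for blk_status_t values; this is not kernel code. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE, TOY_STS_AGAIN };

enum toy_alua_state { TOY_ALUA_OPTIMAL, TOY_ALUA_TRANSITIONING };

struct toy_dev {
    enum toy_alua_state state;
    bool fail_on_transition; /* false: requeue (old), true: fail (proposed) */
    size_t pending;          /* requests still on the request list */
};

/* Decide what happens to one request, based on the ALUA state. */
static enum toy_status toy_prep(const struct toy_dev *dev)
{
    if (dev->state != TOY_ALUA_TRANSITIONING)
        return TOY_STS_OK;
    return dev->fail_on_transition ? TOY_STS_AGAIN : TOY_STS_RESOURCE;
}

/*
 * Model of flushing the request list when dev_loss_tmo fires.
 * Returns the number of requests still pending after max_passes.
 */
static size_t toy_flush(struct toy_dev *dev, unsigned int max_passes)
{
    assert(dev != NULL);

    for (unsigned int pass = 0; pass < max_passes && dev->pending > 0; pass++) {
        size_t n = dev->pending;

        for (size_t i = 0; i < n; i++) {
            if (toy_prep(dev) == TOY_STS_RESOURCE)
                continue;   /* requeued: the request stays on the list */
            dev->pending--; /* completed, successfully or with an error */
        }
    }
    return dev->pending;
}
```

In this model, a transitioning device with `fail_on_transition` set to `false` never drains no matter how many passes `toy_flush()` makes, which corresponds to the last reference never being dropped. With the flag set to `true` the list empties in a single pass.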
Ewan, Hannes,

>> during testing we found that there is an issue with dev_loss_tmo and
>> devices in ALUA transitioning state. What happens is that I/O gets
>> requeued via BLK_STS_RESOURCE for these devices, so when dev_loss_tmo
>> triggers the SCSI core cannot flush the request list as I/O is simply
>> requeued.

Applied to 5.11/scsi-staging, thanks!