Message ID | 20181007083537.89131-2-hare@suse.de (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | libfc state machine fixes | expand |
This change would permit a device that was OFFLINE (for any reason) to go back to RUNNING via: OFFLINE -> BLOCKED -> RUNNING Obviously that was your intention, but e.g. if a device was put OFFLINE due to exceeding max_medium_access_timeouts a fabric event and recovery would put it back online. Same would be true if someone had put the device offline via sysfs. If an rport goes away due to devloss, and comes back, then we rescan so this doesn't matter because we recreate the sdev anyway, right? -Ewan On Sun, 2018-10-07 at 10:35 +0200, Hannes Reinecke wrote: > From: Hannes Reinecke <hare@suse.com> > > When an RSCN gets delayed (or not being sent at all) the transport > class will detect an error, EH kicks in, and eventually will be > setting the device to offline. > If we receive an RSCN after that the device will stay in 'offline'. > This patch allows for an 'offline' to 'blocked' transition, thereby > allowing the device to become active again. > > Signed-off-by: Hannes Reinecke <hare@suse.com> > --- > drivers/scsi/scsi_lib.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index aaa1819b0a69..7db3c5fae469 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -2764,6 +2764,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) > switch (oldstate) { > case SDEV_RUNNING: > case SDEV_CREATED_BLOCK: > + case SDEV_OFFLINE: > break; > default: > goto illegal;
On 10/8/18 10:22 PM, Ewan D. Milne wrote: > This change would permit a device that was OFFLINE (for any reason) > to go back to RUNNING via: OFFLINE -> BLOCKED -> RUNNING > Correct. > Obviously that was your intention, but e.g. if a device was put > OFFLINE due to exceeding max_medium_access_timeouts a fabric > event and recovery would put it back online. > > Same would be true if someone had put the device offline via sysfs. > > If an rport goes away due to devloss, and comes back, then we rescan > so this doesn't matter because we recreate the sdev anyway, right? > Exactly. Once devloss_tmo triggers the sdev will be recreated and we lose all settings we might have done. And experience shows that about 95% of all SCSI EH invocations on FC are due to a SAN configuration problems; I think I've seen a 'real' SCSI error only 3 times, and only one of them registered as a medium error. So rechecking the device after RSCN is a sensible choice. Cheers, Hannes
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index aaa1819b0a69..7db3c5fae469 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2764,6 +2764,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) switch (oldstate) { case SDEV_RUNNING: case SDEV_CREATED_BLOCK: + case SDEV_OFFLINE: break; default: goto illegal;