diff mbox series

[1/3] scsi: Allow state transitions from OFFLINE to BLOCKED

Message ID 20181007083537.89131-2-hare@suse.de (mailing list archive)
State Accepted
Headers show
Series libfc state machine fixes | expand

Commit Message

Hannes Reinecke Oct. 7, 2018, 8:35 a.m. UTC
From: Hannes Reinecke <hare@suse.com>

When an RSCN gets delayed (or not being sent at all) the transport
class will detect an error, EH kicks in, and eventually will be
setting the device to offline.
If we receive an RSCN after that the device will stay in 'offline'.
This patch allows for an 'offline' to 'blocked' transition, thereby
allowing the device to become active again.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ewan Milne Oct. 8, 2018, 8:22 p.m. UTC | #1
This change would permit a device that was OFFLINE (for any reason)
to go back to RUNNING via:  OFFLINE -> BLOCKED -> RUNNING

Obviously that was your intention, but e.g. if a device was put
OFFLINE due to exceeding max_medium_access_timeouts a fabric
event and recovery would put it back online.

Same would be true if someone had put the device offline via sysfs.

If an rport goes away due to devloss, and comes back, then we rescan
so this doesn't matter because we recreate the sdev anyway, right?

-Ewan


On Sun, 2018-10-07 at 10:35 +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> When an RSCN gets delayed (or not being sent at all) the transport
> class will detect an error, EH kicks in, and eventually will be
> setting the device to offline.
> If we receive an RSCN after that the device will stay in 'offline'.
> This patch allows for an 'offline' to 'blocked' transition, thereby
> allowing the device to become active again.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/scsi_lib.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index aaa1819b0a69..7db3c5fae469 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2764,6 +2764,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		switch (oldstate) {
>  		case SDEV_RUNNING:
>  		case SDEV_CREATED_BLOCK:
> +		case SDEV_OFFLINE:
>  			break;
>  		default:
>  			goto illegal;
Hannes Reinecke Oct. 9, 2018, 6:05 a.m. UTC | #2
On 10/8/18 10:22 PM, Ewan D. Milne wrote:
> This change would permit a device that was OFFLINE (for any reason)
> to go back to RUNNING via:  OFFLINE -> BLOCKED -> RUNNING
> 
Correct.

> Obviously that was your intention, but e.g. if a device was put
> OFFLINE due to exceeding max_medium_access_timeouts a fabric
> event and recovery would put it back online.
> 
> Same would be true if someone had put the device offline via sysfs.
> 
> If an rport goes away due to devloss, and comes back, then we rescan
> so this doesn't matter because we recreate the sdev anyway, right?
> 
Exactly. Once devloss_tmo triggers the sdev will be recreated and we 
lose all settings we might have done.
And experience shows that about 95% of all SCSI EH invocations on FC are 
due to a SAN configuration problems; I think I've seen a 'real' SCSI 
error only 3 times, and only one of them registered as a medium error. 
So rechecking the device after RSCN is a sensible choice.

Cheers,

Hannes
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index aaa1819b0a69..7db3c5fae469 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2764,6 +2764,7 @@  scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		switch (oldstate) {
 		case SDEV_RUNNING:
 		case SDEV_CREATED_BLOCK:
+		case SDEV_OFFLINE:
 			break;
 		default:
 			goto illegal;