[1/3] scsi: Allow state transitions from OFFLINE to BLOCKED

Message ID	20181007083537.89131-2-hare@suse.de (mailing list archive)
State	Accepted
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Hannes Reinecke <hare@suse.de> To: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de>, James Bottomley <james.bottomley@hansenpartnership.com>, linux-scsi@vger.kernel.org, Chad Dupuis <chad.dupuis@cavium.com>, Arun Easi <arun.easi@cavium.com>, Hannes Reinecke <hare@suse.com> Subject: [PATCH 1/3] scsi: Allow state transitions from OFFLINE to BLOCKED Date: Sun, 7 Oct 2018 10:35:35 +0200 Message-Id: <20181007083537.89131-2-hare@suse.de> In-Reply-To: <20181007083537.89131-1-hare@suse.de> References: <20181007083537.89131-1-hare@suse.de> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	libfc state machine fixes \| expand [0/3] libfc state machine fixes [1/3] scsi: Allow state transitions from OFFLINE to BLOCKED [2/3] libfc: check fc_frame_payload_get() return value for null [3/3] libfc: retry PRLI if we cannot analyse the payload

Message ID

20181007083537.89131-2-hare@suse.de (mailing list archive)

State

Accepted

Headers

From: Hannes Reinecke <hare@suse.de>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
        James Bottomley <james.bottomley@hansenpartnership.com>,
        linux-scsi@vger.kernel.org, Chad Dupuis <chad.dupuis@cavium.com>,
        Arun Easi <arun.easi@cavium.com>,
        Hannes Reinecke <hare@suse.com>
Subject: [PATCH 1/3] scsi: Allow state transitions from OFFLINE to BLOCKED
Date: Sun,  7 Oct 2018 10:35:35 +0200
Message-Id: <20181007083537.89131-2-hare@suse.de>
In-Reply-To: <20181007083537.89131-1-hare@suse.de>
References: <20181007083537.89131-1-hare@suse.de>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk

Series

libfc state machine fixes | expand

Commit Message

Hannes Reinecke Oct. 7, 2018, 8:35 a.m. UTC

From: Hannes Reinecke <hare@suse.com>

When an RSCN gets delayed (or not being sent at all) the transport
class will detect an error, EH kicks in, and eventually will be
setting the device to offline.
If we receive an RSCN after that the device will stay in 'offline'.
This patch allows for an 'offline' to 'blocked' transition, thereby
allowing the device to become active again.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ewan Milne Oct. 8, 2018, 8:22 p.m. UTC | #1

This change would permit a device that was OFFLINE (for any reason)
to go back to RUNNING via:  OFFLINE -> BLOCKED -> RUNNING

Obviously that was your intention, but e.g. if a device was put
OFFLINE due to exceeding max_medium_access_timeouts a fabric
event and recovery would put it back online.

Same would be true if someone had put the device offline via sysfs.

If an rport goes away due to devloss, and comes back, then we rescan
so this doesn't matter because we recreate the sdev anyway, right?

-Ewan


On Sun, 2018-10-07 at 10:35 +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> When an RSCN gets delayed (or not being sent at all) the transport
> class will detect an error, EH kicks in, and eventually will be
> setting the device to offline.
> If we receive an RSCN after that the device will stay in 'offline'.
> This patch allows for an 'offline' to 'blocked' transition, thereby
> allowing the device to become active again.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/scsi_lib.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index aaa1819b0a69..7db3c5fae469 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2764,6 +2764,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		switch (oldstate) {
>  		case SDEV_RUNNING:
>  		case SDEV_CREATED_BLOCK:
> +		case SDEV_OFFLINE:
>  			break;
>  		default:
>  			goto illegal;

Hannes Reinecke Oct. 9, 2018, 6:05 a.m. UTC | #2

On 10/8/18 10:22 PM, Ewan D. Milne wrote:
> This change would permit a device that was OFFLINE (for any reason)
> to go back to RUNNING via:  OFFLINE -> BLOCKED -> RUNNING
> 
Correct.

> Obviously that was your intention, but e.g. if a device was put
> OFFLINE due to exceeding max_medium_access_timeouts a fabric
> event and recovery would put it back online.
> 
> Same would be true if someone had put the device offline via sysfs.
> 
> If an rport goes away due to devloss, and comes back, then we rescan
> so this doesn't matter because we recreate the sdev anyway, right?
> 
Exactly. Once devloss_tmo triggers the sdev will be recreated and we 
lose all settings we might have done.
And experience shows that about 95% of all SCSI EH invocations on FC are 
due to a SAN configuration problems; I think I've seen a 'real' SCSI 
error only 3 times, and only one of them registered as a medium error. 
So rechecking the device after RSCN is a sensible choice.

Cheers,

Hannes

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index aaa1819b0a69..7db3c5fae469 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2764,6 +2764,7 @@  scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		switch (oldstate) {
 		case SDEV_RUNNING:
 		case SDEV_CREATED_BLOCK:
+		case SDEV_OFFLINE:
 			break;
 		default:
 			goto illegal;

[1/3] scsi: Allow state transitions from OFFLINE to BLOCKED

Commit Message

Comments

Patch