diff mbox

sd: succeed check_event if device is not removable

Message ID 1516292559-26649-1-git-send-email-jinpu.wangl@profitbricks.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Jinpu Wang Jan. 18, 2018, 4:22 p.m. UTC
From: Jack Wang <jinpu.wang@profitbricks.com>

The check_events interface was added for check if device changes,
mainly for device is removable eg. CDROM

In sd_open, it checks if device is removable then check_disk_change.

when the device is not removable, we can simple succeeds the call
without send TUR.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/sd.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

James Bottomley Jan. 18, 2018, 4:32 p.m. UTC | #1
On Thu, 2018-01-18 at 17:22 +0100, Jack Wang wrote:
> From: Jack Wang <jinpu.wang@profitbricks.com>
> 
> The check_events interface was added for check if device changes,
> mainly for device is removable eg. CDROM
> 
> In sd_open, it checks if device is removable then check_disk_change.
> 
> when the device is not removable, we can simple succeeds the call
> without send TUR.
> 
> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
> ---
>  drivers/scsi/sd.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ab75ebd..773ce81 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1576,6 +1576,10 @@ static unsigned int sd_check_events(struct
> gendisk *disk, unsigned int clearing)
>  	sdp = sdkp->device;
>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
> "sd_check_events\n"));
>  
> +	if (!sdp->removable) {
> +		retval = 0;
> +		goto out;
> +	}

This looks like very much the wrong place to fix whatever problem
you're seeing is.  We could simply avoid setting up the events work
function in genhd.c:device_add_disk(), which would be way more
efficient.  However, I think some devices do require being probed
occasionally with a TUR because its the only way SCSI gets AENs (not
that we make much use of them).

So first of all, what's the actual problem?

James
Jinpu Wang Jan. 19, 2018, 9:59 a.m. UTC | #2
Hi James,

Thanks for prompt reply.

On Thu, Jan 18, 2018 at 5:32 PM, James Bottomley
<jejb@linux.vnet.ibm.com> wrote:
> On Thu, 2018-01-18 at 17:22 +0100, Jack Wang wrote:
>> From: Jack Wang <jinpu.wang@profitbricks.com>
>>
>> The check_events interface was added for check if device changes,
>> mainly for device is removable eg. CDROM
>>
>> In sd_open, it checks if device is removable then check_disk_change.
>>
>> when the device is not removable, we can simple succeeds the call
>> without send TUR.
>>
>> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
>> ---
>>  drivers/scsi/sd.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index ab75ebd..773ce81 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1576,6 +1576,10 @@ static unsigned int sd_check_events(struct
>> gendisk *disk, unsigned int clearing)
>>       sdp = sdkp->device;
>>       SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
>> "sd_check_events\n"));
>>
>> +     if (!sdp->removable) {
>> +             retval = 0;
>> +             goto out;
>> +     }
>
> This looks like very much the wrong place to fix whatever problem
> you're seeing is.  We could simply avoid setting up the events work
> function in genhd.c:device_add_disk(), which would be way more
> efficient.  However, I think some devices do require being probed
> occasionally with a TUR because its the only way SCSI gets AENs (not
> that we make much use of them).
>
> So first of all, what's the actual problem?
>
> James
>

Eagle eyes! The real problem I want to workaround is deadlock between
check_event and device removal, reported here:
https://patchwork.kernel.org/patch/10057977/

We've see in our production during incident, storage failure lead scsi
error handle and
offlined both devices, so both multipaths down, raid1 failed one leg (the
dm device).

During incident recovery, when tried to delete the broken scsi device,
there are processes in D state.

I failed to reproduce on my test machines, this is why I want to
workaround the problem by avoiding TUR.

Thanks,
diff mbox

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ab75ebd..773ce81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1576,6 +1576,10 @@  static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
 	sdp = sdkp->device;
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_check_events\n"));
 
+	if (!sdp->removable) {
+		retval = 0;
+		goto out;
+	}
 	/*
 	 * If the device is offline, don't send any commands - just pretend as
 	 * if the command failed.  If the device ever comes back online, we