diff mbox series

scsi: sd: mark the scsi device in shutdown as deleted

Message ID 20230330164943.11607-1-thenzl@redhat.com (mailing list archive)
State Superseded
Headers show
Series scsi: sd: mark the scsi device in shutdown as deleted | expand

Commit Message

Tomas Henzl March 30, 2023, 4:49 p.m. UTC
Set the state to deleted in sd_shutdown so that the attached LLD
doesn't receive new I/O (can happen when in kexec) later after
LLD's shutdown function has been called.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
---
 drivers/scsi/sd.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Bart Van Assche March 30, 2023, 5:08 p.m. UTC | #1
On 3/30/23 09:49, Tomas Henzl wrote:
> Set the state to deleted in sd_shutdown so that the attached LLD
> doesn't receive new I/O (can happen when in kexec) later after
> LLD's shutdown function has been called.
> 
> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
> ---
>   drivers/scsi/sd.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4f28dd617eca..8095f0302e66 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   static void sd_shutdown(struct device *dev)
>   {
>   	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +	struct scsi_device *sdp;
>   
>   	if (!sdkp)
>   		return;         /* this can happen */
>   
> +	sdp = sdkp->device;
> +
>   	if (pm_runtime_suspended(dev))
>   		return;
>   
> @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>   		sd_start_stop_device(sdkp, 0);
>   	}
> +
> +	mutex_lock(&sdp->state_mutex);
> +	scsi_device_set_state(sdp, SDEV_DEL);
> +	mutex_unlock(&sdp->state_mutex);
>   }

Shouldn't new I/O to the SCSI disk be prevented whether or not the SCSI 
disk has been runtime suspended?

Thanks,

Bart.
Mike Christie March 30, 2023, 5:12 p.m. UTC | #2
On 3/30/23 11:49 AM, Tomas Henzl wrote:
> Set the state to deleted in sd_shutdown so that the attached LLD
> doesn't receive new I/O (can happen when in kexec) later after
> LLD's shutdown function has been called.
> 
> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
> ---
>  drivers/scsi/sd.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4f28dd617eca..8095f0302e66 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  static void sd_shutdown(struct device *dev)
>  {
>  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +	struct scsi_device *sdp;
>  
>  	if (!sdkp)
>  		return;         /* this can happen */
>  
> +	sdp = sdkp->device;
> +
>  	if (pm_runtime_suspended(dev))
>  		return;
>  
> @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>  		sd_start_stop_device(sdkp, 0);
>  	}
> +
> +	mutex_lock(&sdp->state_mutex);
> +	scsi_device_set_state(sdp, SDEV_DEL);
> +	mutex_unlock(&sdp->state_mutex);
>  }

If this is run for device removal what state will be in here?

Are we going to do:
1. __scsi_remove_device sets the state to SDEV_CANCEL at the beginning
of the function
2. device_unregister causes sd_remove to be called and sd_shutdown sets
the state to SDEV_DEL
3. then __scsi_remove_device sets the state to SDEV_DEL at the bottom,
so we get "Illegal state transition" errors printed.
Tomas Henzl March 31, 2023, 9:03 a.m. UTC | #3
On 3/30/23 19:08, Bart Van Assche wrote:
> On 3/30/23 09:49, Tomas Henzl wrote:
>> Set the state to deleted in sd_shutdown so that the attached LLD
>> doesn't receive new I/O (can happen when in kexec) later after
>> LLD's shutdown function has been called.
>>
>> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
>> ---
>>   drivers/scsi/sd.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 4f28dd617eca..8095f0302e66 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>>   static void sd_shutdown(struct device *dev)
>>   {
>>   	struct scsi_disk *sdkp = dev_get_drvdata(dev);
>> +	struct scsi_device *sdp;
>>   
>>   	if (!sdkp)
>>   		return;         /* this can happen */
>>   
>> +	sdp = sdkp->device;
>> +
>>   	if (pm_runtime_suspended(dev))
>>   		return;
>>   
>> @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
>>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>>   		sd_start_stop_device(sdkp, 0);
>>   	}
>> +
>> +	mutex_lock(&sdp->state_mutex);
>> +	scsi_device_set_state(sdp, SDEV_DEL);
>> +	mutex_unlock(&sdp->state_mutex);
>>   }
> 
> Shouldn't new I/O to the SCSI disk be prevented whether or not the SCSI 
> disk has been runtime suspended?
Thanks, I'll fix that.
Tomas

> 
> Thanks,
> 
> Bart.
>
Tomas Henzl March 31, 2023, 9:10 a.m. UTC | #4
On 3/30/23 19:12, Mike Christie wrote:
> On 3/30/23 11:49 AM, Tomas Henzl wrote:
>> Set the state to deleted in sd_shutdown so that the attached LLD
>> doesn't receive new I/O (can happen when in kexec) later after
>> LLD's shutdown function has been called.
>>
>> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
>> ---
>>  drivers/scsi/sd.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 4f28dd617eca..8095f0302e66 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>>  static void sd_shutdown(struct device *dev)
>>  {
>>  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
>> +	struct scsi_device *sdp;
>>  
>>  	if (!sdkp)
>>  		return;         /* this can happen */
>>  
>> +	sdp = sdkp->device;
>> +
>>  	if (pm_runtime_suspended(dev))
>>  		return;
>>  
>> @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
>>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>>  		sd_start_stop_device(sdkp, 0);
>>  	}
>> +
>> +	mutex_lock(&sdp->state_mutex);
>> +	scsi_device_set_state(sdp, SDEV_DEL);
>> +	mutex_unlock(&sdp->state_mutex);
>>  }
> 
> If this is run for device removal what state will be in here?
> 
> Are we going to do:
> 1. __scsi_remove_device sets the state to SDEV_CANCEL at the beginning
> of the function
> 2. device_unregister causes sd_remove to be called and sd_shutdown sets
> the state to SDEV_DEL
> 3. then ide sets the state to SDEV_DEL at the bottom,
> so we get "Illegal state transition" errors printed.
> 
Thanks for looking.
A state change from SDEV_DEL to SDEV_DEL isn't illegal (state ==
oldstate) or am I wrong?
James Bottomley March 31, 2023, 11:48 a.m. UTC | #5
On Thu, 2023-03-30 at 12:12 -0500, Mike Christie wrote:
> On 3/30/23 11:49 AM, Tomas Henzl wrote:
> > Set the state to deleted in sd_shutdown so that the attached LLD
> > doesn't receive new I/O (can happen when in kexec) later after
> > LLD's shutdown function has been called.
> > 
> > Signed-off-by: Tomas Henzl <thenzl@redhat.com>
> > ---
> >  drivers/scsi/sd.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index 4f28dd617eca..8095f0302e66 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct
> > scsi_disk *sdkp, int start)
> >  static void sd_shutdown(struct device *dev)
> >  {
> >         struct scsi_disk *sdkp = dev_get_drvdata(dev);
> > +       struct scsi_device *sdp;
> >  
> >         if (!sdkp)
> >                 return;         /* this can happen */
> >  
> > +       sdp = sdkp->device;
> > +
> >         if (pm_runtime_suspended(dev))
> >                 return;
> >  
> > @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
> >                 sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
> >                 sd_start_stop_device(sdkp, 0);
> >         }
> > +
> > +       mutex_lock(&sdp->state_mutex);
> > +       scsi_device_set_state(sdp, SDEV_DEL);
> > +       mutex_unlock(&sdp->state_mutex);
> >  }
> 
> If this is run for device removal what state will be in here?
> 
> Are we going to do:
> 1. __scsi_remove_device sets the state to SDEV_CANCEL at the
> beginning of the function

It will also interfere with target and host device removal.  They
traverse their own lists and assume that anything in DEL is already
being removed, which won't be the case here.  So basically, after this
happens it's impossible to clean the device trees.  It also means any
I/O to the root device wouldn't be allowed.

I assume the contention is that if we get here, we're either going for
immediate shutdown or all the root device remounting to read only has
already been done?  If so, could you say that?

James
Tomas Henzl March 31, 2023, 2:11 p.m. UTC | #6
On 3/31/23 13:48, James Bottomley wrote:
> On Thu, 2023-03-30 at 12:12 -0500, Mike Christie wrote:
>> On 3/30/23 11:49 AM, Tomas Henzl wrote:
>>> Set the state to deleted in sd_shutdown so that the attached LLD
>>> doesn't receive new I/O (can happen when in kexec) later after
>>> LLD's shutdown function has been called.
>>>
>>> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
>>> ---
>>>  drivers/scsi/sd.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index 4f28dd617eca..8095f0302e66 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -3694,10 +3694,13 @@ static int sd_start_stop_device(struct
>>> scsi_disk *sdkp, int start)
>>>  static void sd_shutdown(struct device *dev)
>>>  {
>>>         struct scsi_disk *sdkp = dev_get_drvdata(dev);
>>> +       struct scsi_device *sdp;
>>>  
>>>         if (!sdkp)
>>>                 return;         /* this can happen */
>>>  
>>> +       sdp = sdkp->device;
>>> +
>>>         if (pm_runtime_suspended(dev))
>>>                 return;
>>>  
>>> @@ -3710,6 +3713,10 @@ static void sd_shutdown(struct device *dev)
>>>                 sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>>>                 sd_start_stop_device(sdkp, 0);
>>>         }
>>> +
>>> +       mutex_lock(&sdp->state_mutex);
>>> +       scsi_device_set_state(sdp, SDEV_DEL);
>>> +       mutex_unlock(&sdp->state_mutex);
>>>  }
>>
>> If this is run for device removal what state will be in here?
>>
>> Are we going to do:
>> 1. __scsi_remove_device sets the state to SDEV_CANCEL at the
>> beginning of the function
> 
> It will also interfere with target and host device removal.  They
> traverse their own lists and assume that anything in DEL is already
> being removed, which won't be the case here.  So basically, after this
> happens it's impossible to clean the device trees.  It also means any
> I/O to the root device wouldn't be allowed.
How will it interfere? After a return from sd_remove or via
device_unregister->__scsi_remove_device the device state is SDEV_DEL
regardless whether this patch has been added or not. Or is sd_shutdown
called directly?

> I assume the contention is that if we get here, we're either going for
> immediate shutdown or all the root device remounting to read only has
> already been done?  If so, could you say that?
I can't say that, quite the opposite (see body of the mail). When the
system goes shutdown the individual device's .shutdown is called. Just
moments after sd shutdown the LLD shutdown is entered and the driver
stops any I/O immediately anyway. With this patch the I/O is stopped
before reaching LLD with a reasonable message and without error
correction mechanism in place.

I also assume that no I/O after sd_shutdown was projected when it was
written as there is a cache sync followed by a device power down.

Tomas

> 
> James
>
James Bottomley March 31, 2023, 3:06 p.m. UTC | #7
On Fri, 2023-03-31 at 16:11 +0200, Tomas Henzl wrote:
> On 3/31/23 13:48, James Bottomley wrote:
> > On Thu, 2023-03-30 at 12:12 -0500, Mike Christie wrote:
[...]
> > > Are we going to do:
> > > 1. __scsi_remove_device sets the state to SDEV_CANCEL at the
> > > beginning of the function
> > 
> > It will also interfere with target and host device removal.  They
> > traverse their own lists and assume that anything in DEL is already
> > being removed, which won't be the case here.  So basically, after
> > this happens it's impossible to clean the device trees.  It also
> > means any I/O to the root device wouldn't be allowed.
>  
> How will it interfere? After a return from sd_remove or via
> device_unregister->__scsi_remove_device the device state is SDEV_DEL
> regardless whether this patch has been added or not. Or is
> sd_shutdown called directly?

I thought it was called directly from the restart logic via the
device_shutdown() call (see kernel/reboot.c:kernel_restart_prepare())
and was completely independent of any other state transitions within
the device model ... however, I'd really like *you* to confirm this.

I think by the time it's called, we're already in the system death
throes, so if there were going to be an orderly shut down it's already
happened, but if not, once you set a devices state to DEL, it will
defeat any later attempt to do orderly remove of the host or target.

> > I assume the contention is that if we get here, we're either going
> > for immediate shutdown or all the root device remounting to read
> > only has already been done?  If so, could you say that?
>  
> I can't say that, quite the opposite (see body of the mail). When the
> system goes shutdown the individual device's .shutdown is called.
> Just moments after sd shutdown the LLD shutdown is entered and the
> driver stops any I/O immediately anyway. With this patch the I/O is
> stopped before reaching LLD with a reasonable message and without
> error correction mechanism in place.
> 
> I also assume that no I/O after sd_shutdown was projected when it was
> written as there is a cache sync followed by a device power down.

It seems reasonable, but can you validate that?  Shutdown is called
both from reboot and kexec and if we stop IO to a quiescing root device
before it completes, so that all filesystems come back dirty, we'll
have a lot of unhappy users ...

James
Tomas Henzl March 31, 2023, 6:32 p.m. UTC | #8
On 3/31/23 17:06, James Bottomley wrote:
> On Fri, 2023-03-31 at 16:11 +0200, Tomas Henzl wrote:
>> On 3/31/23 13:48, James Bottomley wrote:
>>> On Thu, 2023-03-30 at 12:12 -0500, Mike Christie wrote:
> [...]
>>>> Are we going to do:
>>>> 1. __scsi_remove_device sets the state to SDEV_CANCEL at the
>>>> beginning of the function
>>>
>>> It will also interfere with target and host device removal.  They
>>> traverse their own lists and assume that anything in DEL is already
>>> being removed, which won't be the case here.  So basically, after
>>> this happens it's impossible to clean the device trees.  It also
>>> means any I/O to the root device wouldn't be allowed.
>>  
>> How will it interfere? After a return from sd_remove or via
>> device_unregister->__scsi_remove_device the device state is SDEV_DEL
>> regardless whether this patch has been added or not. Or is
>> sd_shutdown called directly?
> 
> I thought it was called directly from the restart logic via the
> device_shutdown() call (see kernel/reboot.c:kernel_restart_prepare())
> and was completely independent of any other state transitions within
> the device model ... however, I'd really like *you* to confirm this.
> 
> I think by the time it's called, we're already in the system death
> throes, so if there were going to be an orderly shut down it's already
> happened, but if not, once you set a devices state to DEL, it will
> defeat any later attempt to do orderly remove of the host or target.
After the first device's shutdown method in device_shutdown has been
called there is now way back (we don't have an opposite to shutdown like
for suspend resume is) so it can't be undone and also it doesn't matter
if some structures remain in memory. To me it seems to be the only
logical consequence but I can't prove it.

>>> I assume the contention is that if we get here, we're either going
>>> for immediate shutdown or all the root device remounting to read
>>> only has already been done?  If so, could you say that?
>>  
>> I can't say that, quite the opposite (see body of the mail). When the
>> system goes shutdown the individual device's .shutdown is called.
>> Just moments after sd shutdown the LLD shutdown is entered and the
>> driver stops any I/O immediately anyway. With this patch the I/O is
>> stopped before reaching LLD with a reasonable message and without
>> error correction mechanism in place.
>>
>> I also assume that no I/O after sd_shutdown was projected when it was
>> written as there is a cache sync followed by a device power down.
> 
> It seems reasonable, but can you validate that?  Shutdown is called
> both from reboot and kexec and if we stop IO to a quiescing root device
> before it completes, so that all filesystems come back dirty, we'll
> have a lot of unhappy users ...
Validate, I don't know how or what you mean. Making sure that everything
is in a state when the kernel shutdown procedure may be entered is for
example on my system done by userspace systemd-shutdown service like so
:"systemd-shutdown[1]: Syncing filesystems and block devices ..."
When the kernel shutdown procedure is entered the I/O will be cut off
abruptly by a LLD.shutdown. That means it depends on scripts like
systemd and that works well on all systems I checked for reboot but not
for kexec on everywhere.
The patch may stop the I/O few msec sooner that is true. But the clearer
 messages may bring a script writer to fix it.

The patch doesn't fix a real bug so it isn't urgent nor important,
seeing the congestion it creates please just drop it.

Tomas

> 
> James
>
Bart Van Assche March 31, 2023, 6:48 p.m. UTC | #9
On 3/31/23 11:32, Tomas Henzl wrote:
> The patch doesn't fix a real bug so it isn't urgent nor important,
> seeing the congestion it creates please just drop it.

Hi Tomas,

I'm interested in failing future I/O from inside sd_shutdown() because 
it would allow me to remove the I/O quiescing code from the UFS driver 
shutdown callback.

Thanks,

Bart.
Tomas Henzl April 2, 2023, 9:49 p.m. UTC | #10
On 3/31/23 20:48, Bart Van Assche wrote:
> On 3/31/23 11:32, Tomas Henzl wrote:
>> The patch doesn't fix a real bug so it isn't urgent nor important,
>> seeing the congestion it creates please just drop it.
> 
> Hi Tomas,
> 
> I'm interested in failing future I/O from inside sd_shutdown() because 
> it would allow me to remove the I/O quiescing code from the UFS driver 
> shutdown callback.
I'm aware of this, other drivers do have similar code and so it would
help elsewhere as well. The patch as it is doesn't however ensure that
there isn't for example an I/O started before sd.shutdown which may
arrive in a driver after his shutdown has been called. Because of that I
haven't used this as an argument in the discussion here.

Regards,
Tomas

> 
> Thanks,
> 
> Bart.
>
Bart Van Assche April 3, 2023, 7:56 p.m. UTC | #11
On 4/2/23 14:49, Tomas Henzl wrote:
> On 3/31/23 20:48, Bart Van Assche wrote:
>> I'm interested in failing future I/O from inside sd_shutdown() because
>> it would allow me to remove the I/O quiescing code from the UFS driver
>> shutdown callback.
> I'm aware of this, other drivers do have similar code and so it would
> help elsewhere as well. The patch as it is doesn't however ensure that
> there isn't for example an I/O started before sd.shutdown which may
> arrive in a driver after his shutdown has been called. Because of that I
> haven't used this as an argument in the discussion here.

Has it been considered to call blk_mq_freeze_queue() and 
blk_mq_unfreeze_queue() to wait for I/O that started earlier?

Has it been considered to set QUEUE_FLAG_DYING to make future SCSI 
commands fail? See also blk_mq_destroy_queue().

Thanks,

Bart.
Tomas Henzl April 4, 2023, 9:29 a.m. UTC | #12
On 4/3/23 21:56, Bart Van Assche wrote:
> On 4/2/23 14:49, Tomas Henzl wrote:
>> On 3/31/23 20:48, Bart Van Assche wrote:
>>> I'm interested in failing future I/O from inside sd_shutdown() because
>>> it would allow me to remove the I/O quiescing code from the UFS driver
>>> shutdown callback.
>> I'm aware of this, other drivers do have similar code and so it would
>> help elsewhere as well. The patch as it is doesn't however ensure that
>> there isn't for example an I/O started before sd.shutdown which may
>> arrive in a driver after his shutdown has been called. Because of that I
>> haven't used this as an argument in the discussion here.
> 
> Has it been considered to call blk_mq_freeze_queue() and 
> blk_mq_unfreeze_queue() to wait for I/O that started earlier?
> 
> Has it been considered to set QUEUE_FLAG_DYING to make future SCSI 
> commands fail? See also blk_mq_destroy_queue().

I haven't been considering anything from above. I just had noticed on a
system timeouts when in kexec and wanted to have that fixed and for this
case it is sufficient when the blocking is only close to 100% (and the
patch is simple).
I, like you, see the benefit of moving a bit of functionality from LLD
to scsi layer but my patch wasn't accepted and I doubt that something
more complex could be.

> 
> Thanks,
> 
> Bart.
>
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4f28dd617eca..8095f0302e66 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3694,10 +3694,13 @@  static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 static void sd_shutdown(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	struct scsi_device *sdp;
 
 	if (!sdkp)
 		return;         /* this can happen */
 
+	sdp = sdkp->device;
+
 	if (pm_runtime_suspended(dev))
 		return;
 
@@ -3710,6 +3713,10 @@  static void sd_shutdown(struct device *dev)
 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
 		sd_start_stop_device(sdkp, 0);
 	}
+
+	mutex_lock(&sdp->state_mutex);
+	scsi_device_set_state(sdp, SDEV_DEL);
+	mutex_unlock(&sdp->state_mutex);
 }
 
 static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)