diff mbox series

scsi: sd: add runtime pm to open / release

Message ID 20200623111018.31954-1-martin.kepplinger@puri.sm (mailing list archive)
State Changes Requested
Headers show
Series scsi: sd: add runtime pm to open / release | expand

Commit Message

Martin Kepplinger June 23, 2020, 11:10 a.m. UTC
This add a very conservative but simple implementation for runtime PM
to the sd scsi driver:
Resume when opened (mounted) and suspend when released (unmounted).

Improvements that allow suspending while a device is "open" can
be added later, but now we save power when no filesystem is mounted
and runtime PM is enabled.

Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm>
---
 drivers/scsi/sd.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Bart Van Assche June 24, 2020, 1:33 p.m. UTC | #1
On 2020-06-23 04:10, Martin Kepplinger wrote:
> This add a very conservative but simple implementation for runtime PM
> to the sd scsi driver:
> Resume when opened (mounted) and suspend when released (unmounted).
> 
> Improvements that allow suspending while a device is "open" can
> be added later, but now we save power when no filesystem is mounted
> and runtime PM is enabled.
> 
> Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm>
> ---
>  drivers/scsi/sd.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d90fefffe31b..fe4cb7c50ec1 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1372,6 +1372,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
>  
>  	sdev = sdkp->device;
> +	scsi_autopm_get_device(sdev);
>  
>  	/*
>  	 * If the device is in error recovery, wait until it is done.
> @@ -1418,6 +1419,9 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
>  
>  error_out:
>  	scsi_disk_put(sdkp);
> +
> +	scsi_autopm_put_device(sdev);
> +
>  	return retval;	
>  }
>  
> @@ -1441,6 +1445,8 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
>  
>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
>  
> +	scsi_autopm_put_device(sdev);
> +
>  	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
>  		if (scsi_block_when_processing_errors(sdev))
>  			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);

My understanding of the above patch is that it introduces a regression,
namely by disabling runtime suspend as long as an sd device is held open.

Bart.
Martin Kepplinger June 25, 2020, 8:16 a.m. UTC | #2
On 24.06.20 15:33, Bart Van Assche wrote:
> On 2020-06-23 04:10, Martin Kepplinger wrote:
>> This add a very conservative but simple implementation for runtime PM
>> to the sd scsi driver:
>> Resume when opened (mounted) and suspend when released (unmounted).
>>
>> Improvements that allow suspending while a device is "open" can
>> be added later, but now we save power when no filesystem is mounted
>> and runtime PM is enabled.
>>
>> Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm>
>> ---
>>  drivers/scsi/sd.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d90fefffe31b..fe4cb7c50ec1 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1372,6 +1372,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
>>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
>>  
>>  	sdev = sdkp->device;
>> +	scsi_autopm_get_device(sdev);
>>  
>>  	/*
>>  	 * If the device is in error recovery, wait until it is done.
>> @@ -1418,6 +1419,9 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
>>  
>>  error_out:
>>  	scsi_disk_put(sdkp);
>> +
>> +	scsi_autopm_put_device(sdev);
>> +
>>  	return retval;	
>>  }
>>  
>> @@ -1441,6 +1445,8 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
>>  
>>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
>>  
>> +	scsi_autopm_put_device(sdev);
>> +
>>  	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
>>  		if (scsi_block_when_processing_errors(sdev))
>>  			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
> 
> My understanding of the above patch is that it introduces a regression,
> namely by disabling runtime suspend as long as an sd device is held open.
> 
> Bart.
> 
> 

hi Bart,

Alan says the same (on block request, the block layer should initiate a
runtime resume), so merging with the thread from
https://lore.kernel.org/linux-usb/8738e4d3-62b1-0144-107d-ff42000ed6c6@puri.sm/T/
now and answer to both Bart and Alan here:]

I see scsi-pm.c using the blk-pm.c API but I'm not sure how the block
layer would itself resume the scsi device (I use it via usb_storage, so
that usb_stor_resume() follows in my case but I guess that doesn't
matter here):

my understanding of "sd" is: enable runtime pm in probe(), so *allow*
the device to be suspended (if enabled by the user), but never
resume(?). Also, why isn't "autopm" used in its ioctl() implementation
(as opposed to in "sr")?

here's roughly what happens when enabling runtime PM in sysfs (again,
because sd_probe() calls autopm_put() and thus allows it:

[   27.384446] sd 0:0:0:0: scsi_runtime_suspend
[   27.432282] blk_pre_runtime_suspend
[   27.435783] sd_suspend_common
[   27.438782] blk_post_runtime_suspend
[   27.442427] scsi target0:0:0: scsi_runtime_suspend
[   27.447303] scsi host0: scsi_runtime_suspend

then I "mount /dev/sda1 /mnt" and none of the resume() functions get
called. To me it looks like the sd driver should initiate resuming, and
that's not implemented.

what am I doing wrong or overlooking? how exactly does (or should) the
block layer initiate resume here?

thanks again for your time,

                               martin
Alan Stern June 25, 2020, 2:52 p.m. UTC | #3
On Thu, Jun 25, 2020 at 10:16:06AM +0200, Martin Kepplinger wrote:
> On 24.06.20 15:33, Bart Van Assche wrote:
> > On 2020-06-23 04:10, Martin Kepplinger wrote:
> >> This add a very conservative but simple implementation for runtime PM
> >> to the sd scsi driver:
> >> Resume when opened (mounted) and suspend when released (unmounted).
> >>
> >> Improvements that allow suspending while a device is "open" can
> >> be added later, but now we save power when no filesystem is mounted
> >> and runtime PM is enabled.
> >>
> >> Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm>
> >> ---
> >>  drivers/scsi/sd.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index d90fefffe31b..fe4cb7c50ec1 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1372,6 +1372,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
> >>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
> >>  
> >>  	sdev = sdkp->device;
> >> +	scsi_autopm_get_device(sdev);
> >>  
> >>  	/*
> >>  	 * If the device is in error recovery, wait until it is done.
> >> @@ -1418,6 +1419,9 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
> >>  
> >>  error_out:
> >>  	scsi_disk_put(sdkp);
> >> +
> >> +	scsi_autopm_put_device(sdev);
> >> +
> >>  	return retval;	
> >>  }
> >>  
> >> @@ -1441,6 +1445,8 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
> >>  
> >>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
> >>  
> >> +	scsi_autopm_put_device(sdev);
> >> +
> >>  	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
> >>  		if (scsi_block_when_processing_errors(sdev))
> >>  			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
> > 
> > My understanding of the above patch is that it introduces a regression,
> > namely by disabling runtime suspend as long as an sd device is held open.
> > 
> > Bart.
> > 
> > 
> 
> hi Bart,
> 
> Alan says the same (on block request, the block layer should initiate a
> runtime resume), so merging with the thread from
> https://lore.kernel.org/linux-usb/8738e4d3-62b1-0144-107d-ff42000ed6c6@puri.sm/T/
> now and answer to both Bart and Alan here:]
> 
> I see scsi-pm.c using the blk-pm.c API but I'm not sure how the block
> layer would itself resume the scsi device (I use it via usb_storage, so
> that usb_stor_resume() follows in my case but I guess that doesn't
> matter here):

The block layer does this in block/blk-core.c:blk_queue_enter(), as part 
of the condition check in the call to wait_event() near the end of the 
function.  The blk_pm_request_resume() inline routine calls 
pm_request_resume().

At least, that's what is _supposed_ to happen.  See commit 0d25bd072b49 
("block: Schedule runtime resume earlier").

> my understanding of "sd" is: enable runtime pm in probe(), so *allow*
> the device to be suspended (if enabled by the user), but never
> resume(?). Also, why isn't "autopm" used in its ioctl() implementation
> (as opposed to in "sr")?

I don't remember the reason.  It may be that the code in sr.c isn't 
needed.

> here's roughly what happens when enabling runtime PM in sysfs (again,
> because sd_probe() calls autopm_put() and thus allows it:
> 
> [   27.384446] sd 0:0:0:0: scsi_runtime_suspend
> [   27.432282] blk_pre_runtime_suspend
> [   27.435783] sd_suspend_common
> [   27.438782] blk_post_runtime_suspend
> [   27.442427] scsi target0:0:0: scsi_runtime_suspend
> [   27.447303] scsi host0: scsi_runtime_suspend
> 
> then I "mount /dev/sda1 /mnt" and none of the resume() functions get
> called. To me it looks like the sd driver should initiate resuming, and
> that's not implemented.
> 
> what am I doing wrong or overlooking? how exactly does (or should) the
> block layer initiate resume here?

I don't know what's going wrong.  Bart, can you look into it?  As far as I 
can tell, you're the last person to touch the block-layer's runtime PM 
code.

Alan Stern
Bart Van Assche June 26, 2020, 3:53 a.m. UTC | #4
On 2020-06-25 01:16, Martin Kepplinger wrote:
> Also, why isn't "autopm" used in its ioctl() implementation
> (as opposed to in "sr")?

Some of the scsi_autopm_{get,put}_device() calls in the sr driver
have been introduced by me before I fully understood runtime pm.
I will have a another look to see whether these are really
necessary and if not, post a patch to remove these again.

Bart.
Bart Van Assche June 26, 2020, 3:07 p.m. UTC | #5
On 2020-06-25 01:16, Martin Kepplinger wrote:
> here's roughly what happens when enabling runtime PM in sysfs (again,
> because sd_probe() calls autopm_put() and thus allows it:
> 
> [   27.384446] sd 0:0:0:0: scsi_runtime_suspend
> [   27.432282] blk_pre_runtime_suspend
> [   27.435783] sd_suspend_common
> [   27.438782] blk_post_runtime_suspend
> [   27.442427] scsi target0:0:0: scsi_runtime_suspend
> [   27.447303] scsi host0: scsi_runtime_suspend
> 
> then I "mount /dev/sda1 /mnt" and none of the resume() functions get
> called. To me it looks like the sd driver should initiate resuming, and
> that's not implemented.
> 
> what am I doing wrong or overlooking? how exactly does (or should) the
> block layer initiate resume here?

As far as I know runtime power management support in the sd driver is working
fine and is being used intensively by the UFS driver. The following commit was
submitted to fix a bug encountered by an UFS developer: 05d18ae1cc8a ("scsi:
pm: Balance pm_only counter of request queue during system resume") # v5.7.
I'm not sure which bug is causing trouble on your setup but I think it's likely
that the root cause is somewhere else than in the block layer, the SCSI core
or the SCSI sd driver.

Bart.
Alan Stern June 26, 2020, 3:44 p.m. UTC | #6
On Fri, Jun 26, 2020 at 08:07:51AM -0700, Bart Van Assche wrote:
> On 2020-06-25 01:16, Martin Kepplinger wrote:
> > here's roughly what happens when enabling runtime PM in sysfs (again,
> > because sd_probe() calls autopm_put() and thus allows it:
> > 
> > [   27.384446] sd 0:0:0:0: scsi_runtime_suspend
> > [   27.432282] blk_pre_runtime_suspend
> > [   27.435783] sd_suspend_common
> > [   27.438782] blk_post_runtime_suspend
> > [   27.442427] scsi target0:0:0: scsi_runtime_suspend
> > [   27.447303] scsi host0: scsi_runtime_suspend
> > 
> > then I "mount /dev/sda1 /mnt" and none of the resume() functions get
> > called. To me it looks like the sd driver should initiate resuming, and
> > that's not implemented.
> > 
> > what am I doing wrong or overlooking? how exactly does (or should) the
> > block layer initiate resume here?
> 
> As far as I know runtime power management support in the sd driver is working
> fine and is being used intensively by the UFS driver. The following commit was
> submitted to fix a bug encountered by an UFS developer: 05d18ae1cc8a ("scsi:
> pm: Balance pm_only counter of request queue during system resume") # v5.7.

I just looked at that commit for the first time.

Instead of making the SCSI driver do the work of deciding what routine to 
call, why not redefine blk_set_runtime_active(q) to simply call 
blk_post_runtime_resume(q, 0)?  Or vice versa: if err == 0 have 
blk_post_runtime_resume call blk_set_runtime_active?

After all, the two routines do almost the same thing -- and the bug 
addressed by this commit was caused by the difference in their behaviors.

If the device was already runtime-active during the system suspend, doing 
an extra clear of the pm_only counter won't hurt anything.

> I'm not sure which bug is causing trouble on your setup but I think it's likely
> that the root cause is somewhere else than in the block layer, the SCSI core
> or the SCSI sd driver.
> 
> Bart.

Martin's best approach would be to add some debugging code to find out why 
blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
doesn't lead to pm_request_resume().

Alan Stern
Bart Van Assche June 28, 2020, 2:37 a.m. UTC | #7
On 2020-06-26 08:44, Alan Stern wrote:
> On Fri, Jun 26, 2020 at 08:07:51AM -0700, Bart Van Assche wrote:
>> As far as I know runtime power management support in the sd driver is working
>> fine and is being used intensively by the UFS driver. The following commit was
>> submitted to fix a bug encountered by an UFS developer: 05d18ae1cc8a ("scsi:
>> pm: Balance pm_only counter of request queue during system resume") # v5.7.
> 
> I just looked at that commit for the first time.
> 
> Instead of making the SCSI driver do the work of deciding what routine to 
> call, why not redefine blk_set_runtime_active(q) to simply call 
> blk_post_runtime_resume(q, 0)?  Or vice versa: if err == 0 have 
> blk_post_runtime_resume call blk_set_runtime_active?
> 
> After all, the two routines do almost the same thing -- and the bug 
> addressed by this commit was caused by the difference in their behaviors.
> 
> If the device was already runtime-active during the system suspend, doing 
> an extra clear of the pm_only counter won't hurt anything.

Hi Alan,

Do you want to submit a patch that implements this change or do you
perhaps expect me to do that?

Thanks,

Bart.
Alan Stern June 28, 2020, 1:10 p.m. UTC | #8
On Sat, Jun 27, 2020 at 07:37:54PM -0700, Bart Van Assche wrote:
> On 2020-06-26 08:44, Alan Stern wrote:
> > On Fri, Jun 26, 2020 at 08:07:51AM -0700, Bart Van Assche wrote:
> >> As far as I know runtime power management support in the sd driver is working
> >> fine and is being used intensively by the UFS driver. The following commit was
> >> submitted to fix a bug encountered by an UFS developer: 05d18ae1cc8a ("scsi:
> >> pm: Balance pm_only counter of request queue during system resume") # v5.7.
> > 
> > I just looked at that commit for the first time.
> > 
> > Instead of making the SCSI driver do the work of deciding what routine to 
> > call, why not redefine blk_set_runtime_active(q) to simply call 
> > blk_post_runtime_resume(q, 0)?  Or vice versa: if err == 0 have 
> > blk_post_runtime_resume call blk_set_runtime_active?
> > 
> > After all, the two routines do almost the same thing -- and the bug 
> > addressed by this commit was caused by the difference in their behaviors.
> > 
> > If the device was already runtime-active during the system suspend, doing 
> > an extra clear of the pm_only counter won't hurt anything.
> 
> Hi Alan,
> 
> Do you want to submit a patch that implements this change or do you
> perhaps expect me to do that?

I'll submit a patch in a few days.  I just wanted to check first that the 
idea was sound.

Alan Stern
Martin Kepplinger June 29, 2020, 9:42 a.m. UTC | #9
On 26.06.20 17:44, Alan Stern wrote:
> On Fri, Jun 26, 2020 at 08:07:51AM -0700, Bart Van Assche wrote:
>> On 2020-06-25 01:16, Martin Kepplinger wrote:
>>> here's roughly what happens when enabling runtime PM in sysfs (again,
>>> because sd_probe() calls autopm_put() and thus allows it:
>>>
>>> [   27.384446] sd 0:0:0:0: scsi_runtime_suspend
>>> [   27.432282] blk_pre_runtime_suspend
>>> [   27.435783] sd_suspend_common
>>> [   27.438782] blk_post_runtime_suspend
>>> [   27.442427] scsi target0:0:0: scsi_runtime_suspend
>>> [   27.447303] scsi host0: scsi_runtime_suspend
>>>
>>> then I "mount /dev/sda1 /mnt" and none of the resume() functions get
>>> called. To me it looks like the sd driver should initiate resuming, and
>>> that's not implemented.
>>>
>>> what am I doing wrong or overlooking? how exactly does (or should) the
>>> block layer initiate resume here?
>>
>> As far as I know runtime power management support in the sd driver is working
>> fine and is being used intensively by the UFS driver. The following commit was
>> submitted to fix a bug encountered by an UFS developer: 05d18ae1cc8a ("scsi:
>> pm: Balance pm_only counter of request queue during system resume") # v5.7.
> 
> I just looked at that commit for the first time.
> 
> Instead of making the SCSI driver do the work of deciding what routine to 
> call, why not redefine blk_set_runtime_active(q) to simply call 
> blk_post_runtime_resume(q, 0)?  Or vice versa: if err == 0 have 
> blk_post_runtime_resume call blk_set_runtime_active?
> 
> After all, the two routines do almost the same thing -- and the bug 
> addressed by this commit was caused by the difference in their behaviors.
> 
> If the device was already runtime-active during the system suspend, doing 
> an extra clear of the pm_only counter won't hurt anything.
> 
>> I'm not sure which bug is causing trouble on your setup but I think it's likely
>> that the root cause is somewhere else than in the block layer, the SCSI core
>> or the SCSI sd driver.
>>
>> Bart.
> 
> Martin's best approach would be to add some debugging code to find out why 
> blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
> doesn't lead to pm_request_resume().
> 
> Alan Stern
> 

Hi Alan,

blk_queue_enter() always - especially when sd is runtime suspended and I
try to mount as above - sets success to be true for me, so never
continues down to bkl_pm_request_resume(). All I see is "PM: Removing
info for No Bus:sda1".

thanks,
                                martin
Alan Stern June 29, 2020, 4:15 p.m. UTC | #10
On Mon, Jun 29, 2020 at 11:42:59AM +0200, Martin Kepplinger wrote:
> 
> 
> On 26.06.20 17:44, Alan Stern wrote:
> > Martin's best approach would be to add some debugging code to find out why 
> > blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
> > doesn't lead to pm_request_resume().
> > 
> > Alan Stern
> > 
> 
> Hi Alan,
> 
> blk_queue_enter() always - especially when sd is runtime suspended and I
> try to mount as above - sets success to be true for me, so never
> continues down to bkl_pm_request_resume(). All I see is "PM: Removing
> info for No Bus:sda1".

Aha.  Looking at this more closely, it's apparent that the code in 
blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
flag is set then the request can be issued regardless of the queue's 
runtime status.  That is not correct when the queue is suspended.

Below is my attempt to fix this up.  I'm not sure that the patch is 
entirely correct, but it should fix this logic bug.  I would appreciate a 
critical review.

Martin, does this fix the problem?

Alan Stern



Index: usb-devel/block/blk-core.c
===================================================================
--- usb-devel.orig/block/blk-core.c
+++ usb-devel/block/blk-core.c
@@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue
 			 * responsible for ensuring that that counter is
 			 * globally visible before the queue is unfrozen.
 			 */
-			if (pm || !blk_queue_pm_only(q)) {
+			if ((pm && q->rpm_status != RPM_SUSPENDED) ||
+			    !blk_queue_pm_only(q)) {
 				success = true;
 			} else {
 				percpu_ref_put(&q->q_usage_counter);
@@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue
 
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
-			    (pm || (blk_pm_request_resume(q),
-				    !blk_queue_pm_only(q)))) ||
+			    blk_pm_resume_queue(pm, q)) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
Index: usb-devel/block/blk-pm.h
===================================================================
--- usb-devel.orig/block/blk-pm.h
+++ usb-devel/block/blk-pm.h
@@ -6,11 +6,14 @@
 #include <linux/pm_runtime.h>
 
 #ifdef CONFIG_PM
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
 {
-	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
-		       q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	if (!q->dev || !blk_queue_pm_only(q))
+		return 1;	/* Nothing to do */
+	if (pm && q->rpm_status != RPM_SUSPENDED)
+		return 1;	/* Request allowed */
+	pm_request_resume(q->dev);
+	return 0;
 }
 
 static inline void blk_pm_mark_last_busy(struct request *rq)
@@ -44,8 +47,9 @@ static inline void blk_pm_put_request(st
 		--rq->q->nr_pending;
 }
 #else
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
 {
+	return 1;
 }
 
 static inline void blk_pm_mark_last_busy(struct request *rq)
Bart Van Assche June 29, 2020, 4:56 p.m. UTC | #11
On 2020-06-29 09:15, Alan Stern wrote:
> Aha.  Looking at this more closely, it's apparent that the code in 
> blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> flag is set then the request can be issued regardless of the queue's 
> runtime status.  That is not correct when the queue is suspended.

Please clarify why this is not correct.

> Index: usb-devel/block/blk-core.c
> ===================================================================
> --- usb-devel.orig/block/blk-core.c
> +++ usb-devel/block/blk-core.c
> @@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue
>  			 * responsible for ensuring that that counter is
>  			 * globally visible before the queue is unfrozen.
>  			 */
> -			if (pm || !blk_queue_pm_only(q)) {
> +			if ((pm && q->rpm_status != RPM_SUSPENDED) ||
> +			    !blk_queue_pm_only(q)) {
>  				success = true;
>  			} else {
>  				percpu_ref_put(&q->q_usage_counter);

Does the above change make it impossible to bring a suspended device
back to the RPM_ACTIVE state if the BLK_MQ_REQ_NOWAIT flag is set?

> @@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue
>  
>  		wait_event(q->mq_freeze_wq,
>  			   (!q->mq_freeze_depth &&
> -			    (pm || (blk_pm_request_resume(q),
> -				    !blk_queue_pm_only(q)))) ||
> +			    blk_pm_resume_queue(pm, q)) ||
>  			   blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> Index: usb-devel/block/blk-pm.h
> ===================================================================
> --- usb-devel.orig/block/blk-pm.h
> +++ usb-devel/block/blk-pm.h
> @@ -6,11 +6,14 @@
>  #include <linux/pm_runtime.h>
>  
>  #ifdef CONFIG_PM
> -static inline void blk_pm_request_resume(struct request_queue *q)
> +static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
>  {
> -	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
> -		       q->rpm_status == RPM_SUSPENDING))
> -		pm_request_resume(q->dev);
> +	if (!q->dev || !blk_queue_pm_only(q))
> +		return 1;	/* Nothing to do */
> +	if (pm && q->rpm_status != RPM_SUSPENDED)
> +		return 1;	/* Request allowed */
> +	pm_request_resume(q->dev);
> +	return 0;
>  }

Does the above change, especially the " && q->rpm_status !=
RPM_SUSPENDED" part, make it impossible to bring a suspended device back
to the RPM_ACTIVE state?

Thanks,

Bart.
Alan Stern June 29, 2020, 5:40 p.m. UTC | #12
On Mon, Jun 29, 2020 at 09:56:49AM -0700, Bart Van Assche wrote:
> On 2020-06-29 09:15, Alan Stern wrote:
> > Aha.  Looking at this more closely, it's apparent that the code in 
> > blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> > flag is set then the request can be issued regardless of the queue's 
> > runtime status.  That is not correct when the queue is suspended.
> 
> Please clarify why this is not correct.

As I understand it, BLK_MQ_REQ_PREEMPT is supposed to mean (among other 
things) that this request may be issued as part of the procedure for 
putting a device into a low-power state or returning it to a high-power 
state.  Consequently, requests with that flag set must be allowed while 
the queue is in the RPM_SUSPENDING or RPM_RESUMING runtime states -- as 
opposed to ordinary requests, which are allowed only in the RPM_ACTIVE 
state.

In the RPM_SUSPENDED state, however, the queue is entirely inactive.  Even 
if a request were to be issued somehow, it would fail because the system 
would not be able to transmit it to the device.  In other words, when the 
queue is in the RPM_SUSPENDED state, a resume must be requested before 
_any_ request can be issued.

> > Index: usb-devel/block/blk-core.c
> > ===================================================================
> > --- usb-devel.orig/block/blk-core.c
> > +++ usb-devel/block/blk-core.c
> > @@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue
> >  			 * responsible for ensuring that that counter is
> >  			 * globally visible before the queue is unfrozen.
> >  			 */
> > -			if (pm || !blk_queue_pm_only(q)) {
> > +			if ((pm && q->rpm_status != RPM_SUSPENDED) ||
> > +			    !blk_queue_pm_only(q)) {
> >  				success = true;
> >  			} else {
> >  				percpu_ref_put(&q->q_usage_counter);
> 
> Does the above change make it impossible to bring a suspended device
> back to the RPM_ACTIVE state if the BLK_MQ_REQ_NOWAIT flag is set?

The only case affected by this change is when BLK_MQ_REQ_PREEMPT is set 
and the queue is in the RPM_SUSPENDED state.  If BLK_MQ_REQ_NOWAIT was 
also set, the original code would set "success" to true, allowing the 
request to proceed even though it could not be carried out immediately -- 
a bug.

With the patch, such a request will fail without resuming the device.  I 
don't know whether that is the desired behavior or not, but at least it's 
not obviously a bug.

It does seem odd that blk_queue_enter() tests the queue's pm_only status 
and the request flag in two different spots (here and below).  Why does it 
do this?  It seems like an invitation for bugs.

> > @@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue
> >  
> >  		wait_event(q->mq_freeze_wq,
> >  			   (!q->mq_freeze_depth &&
> > -			    (pm || (blk_pm_request_resume(q),
> > -				    !blk_queue_pm_only(q)))) ||
> > +			    blk_pm_resume_queue(pm, q)) ||
> >  			   blk_queue_dying(q));
> >  		if (blk_queue_dying(q))
> >  			return -ENODEV;
> > Index: usb-devel/block/blk-pm.h
> > ===================================================================
> > --- usb-devel.orig/block/blk-pm.h
> > +++ usb-devel/block/blk-pm.h
> > @@ -6,11 +6,14 @@
> >  #include <linux/pm_runtime.h>
> >  
> >  #ifdef CONFIG_PM
> > -static inline void blk_pm_request_resume(struct request_queue *q)
> > +static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
> >  {
> > -	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
> > -		       q->rpm_status == RPM_SUSPENDING))
> > -		pm_request_resume(q->dev);
> > +	if (!q->dev || !blk_queue_pm_only(q))
> > +		return 1;	/* Nothing to do */
> > +	if (pm && q->rpm_status != RPM_SUSPENDED)
> > +		return 1;	/* Request allowed */
> > +	pm_request_resume(q->dev);
> > +	return 0;
> >  }
> 
> Does the above change, especially the " && q->rpm_status !=
> RPM_SUSPENDED" part, make it impossible to bring a suspended device back
> to the RPM_ACTIVE state?

Just the opposite -- the change makes it _possible_ for a 
BLK_MQ_REQ_PREEMPT request to bring a suspended device back to the 
RPM_ACTIVE state.

Look at the existing code: If pm is true then blk_pm_request_resume() will 
be skipped, so the device won't be resumed.  With this patch -- in 
particular with the "&& q->rpm_status != RPM_SUSPENDED" part added -- the 
call won't be skipped and so the resume will take place.

The rather complicated syntax of the wait_event() call in the existing 
code contributes to this confusion.  One of the things my patch tries to 
do is make the code more straightforward and easier to grasp.

I admit that there are parts to this thing I don't understand.  The 
wait_event() call in blk_queue_enter(), for example: If we are waiting for 
the device to leave the RPM_SUSPENDED state (or enter the RPM_ACTIVE 
state), where does q->mq_freeze_wq get woken up?  There's no obvious spot 
in blk_{pre|post}_runtime_resume().

Alan Stern
Martin Kepplinger June 30, 2020, 3:33 a.m. UTC | #13
On 29.06.20 18:15, Alan Stern wrote:
> On Mon, Jun 29, 2020 at 11:42:59AM +0200, Martin Kepplinger wrote:
>>
>>
>> On 26.06.20 17:44, Alan Stern wrote:
>>> Martin's best approach would be to add some debugging code to find out why 
>>> blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
>>> doesn't lead to pm_request_resume().
>>>
>>> Alan Stern
>>>
>>
>> Hi Alan,
>>
>> blk_queue_enter() always - especially when sd is runtime suspended and I
>> try to mount as above - sets success to be true for me, so never
>> continues down to bkl_pm_request_resume(). All I see is "PM: Removing
>> info for No Bus:sda1".
> 
> Aha.  Looking at this more closely, it's apparent that the code in 
> blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> flag is set then the request can be issued regardless of the queue's 
> runtime status.  That is not correct when the queue is suspended.
> 
> Below is my attempt to fix this up.  I'm not sure that the patch is 
> entirely correct, but it should fix this logic bug.  I would appreciate a 
> critical review.
> 
> Martin, does this fix the problem?
> 

not quite: mounting works and resuming itself indeed happens now when
copying a file, but the I/O itself doesn't, but says "device offline or
changed":

[  167.167615] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[  167.167630] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
[  167.167638] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
[  167.167648] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 24
c2 00 00 01 00
[  167.167658] blk_update_request: I/O error, dev sda, sector 9410 op
0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  167.178327] FAT-fs (sda1): FAT read failed (blocknr 1218)
[  167.183895] sd 0:0:0:0: [sda] tag#0 device offline or changed
[  167.189695] blk_update_request: I/O error, dev sda, sector 5101888 op
0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
[  167.200510] sd 0:0:0:0: [sda] tag#0 device offline or changed


and a later try to copy a file only yields (mostly my own debug prints):


[  371.110798] blk_queue_enter: wait_event: pm=0
[  371.300666] scsi_runtime_resume
[  371.303834] scsi_runtime_resume
[  371.307007] scsi_runtime_resume
[  371.310213] sd 0:0:0:0: [sda] tag#0 device offline or changed
[  371.316011] blk_update_request: I/O error, dev sda, sector 5101888 op
0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  372.560690] scsi_runtime_suspend
[  372.563968] scsi_runtime_suspend
[  372.567237] scsi_runtime_suspend

thanks Alan for taking the time and trying to fix this! you're close.
what is missing?

                                martin
Alan Stern June 30, 2020, 1:38 p.m. UTC | #14
On Tue, Jun 30, 2020 at 05:33:25AM +0200, Martin Kepplinger wrote:
> > Martin, does this fix the problem?
> > 
> 
> not quite: mounting works and resuming itself indeed happens now when
> copying a file, but the I/O itself doesn't, but says "device offline or
> changed":
> 
> [  167.167615] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> [  167.167630] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> [  167.167638] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0

That code stands for "Not-ready to ready transition".  It isn't really an 
error, just a notification.  The command should have been retried.

> [  167.167648] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 24
> c2 00 00 01 00
> [  167.167658] blk_update_request: I/O error, dev sda, sector 9410 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  167.178327] FAT-fs (sda1): FAT read failed (blocknr 1218)

And it should not have failed.  Martin or James, any ideas about this?

> [  167.183895] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [  167.189695] blk_update_request: I/O error, dev sda, sector 5101888 op
> 0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
> [  167.200510] sd 0:0:0:0: [sda] tag#0 device offline or changed
> 
> 
> and a later try to copy a file only yields (mostly my own debug prints):
> 
> 
> [  371.110798] blk_queue_enter: wait_event: pm=0
> [  371.300666] scsi_runtime_resume
> [  371.303834] scsi_runtime_resume
> [  371.307007] scsi_runtime_resume
> [  371.310213] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [  371.316011] blk_update_request: I/O error, dev sda, sector 5101888 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

No way to tell from the log what caused this error.

> [  372.560690] scsi_runtime_suspend
> [  372.563968] scsi_runtime_suspend
> [  372.567237] scsi_runtime_suspend
> 
> thanks Alan for taking the time and trying to fix this! you're close.
> what is missing?

At this point I don't know.

Alan Stern
Bart Van Assche June 30, 2020, 3:59 p.m. UTC | #15
On 2020-06-29 09:15, Alan Stern wrote:
> Aha.  Looking at this more closely, it's apparent that the code in 
> blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> flag is set then the request can be issued regardless of the queue's 
> runtime status.  That is not correct when the queue is suspended.

Are you sure of this? In the past (legacy block layer) no requests were
processed for queues in state RPM_SUSPENDED. However, that function and
its successor blk_pm_allow_request() are gone. The following code was
removed by commit 7cedffec8e75 ("block: Make blk_get_request() block for
non-PM requests while suspended").

static struct request *blk_pm_peek_request(struct request_queue *q,
                                           struct request *rq)
{
        if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
            (q->rpm_status != RPM_ACTIVE && !(rq->cmd_flags & REQ_PM))))
                return NULL;
        else
                return rq;
}

Bart.
Alan Stern June 30, 2020, 6:02 p.m. UTC | #16
On Tue, Jun 30, 2020 at 08:59:00AM -0700, Bart Van Assche wrote:
> On 2020-06-29 09:15, Alan Stern wrote:
> > Aha.  Looking at this more closely, it's apparent that the code in 
> > blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> > flag is set then the request can be issued regardless of the queue's 
> > runtime status.  That is not correct when the queue is suspended.
> 
> Are you sure of this? In the past (legacy block layer) no requests were
> processed for queues in state RPM_SUSPENDED. However, that function and
> its successor blk_pm_allow_request() are gone. The following code was
> removed by commit 7cedffec8e75 ("block: Make blk_get_request() block for
> non-PM requests while suspended").
> 
> static struct request *blk_pm_peek_request(struct request_queue *q,
>                                            struct request *rq)
> {
>         if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
>             (q->rpm_status != RPM_ACTIVE && !(rq->cmd_flags & REQ_PM))))
>                 return NULL;
>         else
>                 return rq;
> }

No, it wasn't.  Another routine, blk_pm_allow_request(), was removed by 
that commit, but almost no other code was deleted.  Maybe you're thinking 
of a different commit?

In any case, I don't understand the point you're trying to make.

Here's what I _do_ understand: While the queue is in the RPM_SUSPENDED 
state, it can't carry out any requests at all.  If a request is added to 
the queue, the queue must be resumed before the request can be issued to 
the lower-layer drivers -- no matter what flags are set in the request.

Right now there doesn't seem to be any mechanism for resuming the queue 
if an REQ_PREEMPT request is added while the queue is suspended.

Alan Stern
Bart Van Assche June 30, 2020, 7:23 p.m. UTC | #17
On 2020-06-30 11:02, Alan Stern wrote:
> Right now there doesn't seem to be any mechanism for resuming the queue 
> if an REQ_PREEMPT request is added while the queue is suspended.

I do not agree with the above statement. My understanding is that resuming
happens as follows if a request is submitted against a runtime suspended
queue owned by a SCSI LLD:

blk_queue_enter()
  -> blk_pm_request_resume()
    -> pm_request_resume(dev)
      -> __pm_runtime_resume(dev, RPM_ASYNC)
        -> rpm_resume(dev, RPM_ASYNC)
          -> dev->power.request = RPM_REQ_RESUME;
          -> queue_work(pm_wq, &dev->power.work)
            -> pm_runtime_work()
              -> rpm_resume(dev, RPM_NOWAIT)
                -> callback = scsi_runtime_resume;
                -> rpm_callback(callback, dev);
                  -> scsi_runtime_resume(dev);
                    -> sdev_runtime_resume(dev);
                      -> blk_pre_runtime_resume(sdev->request_queue);
                        -> q->rpm_status = RPM_RESUMING;
                      -> sd_resume(dev);
                        -> sd_start_stop_device(sdkp);
                          -> sd_pm_ops.runtime_resume == scsi_execute(sdp, START);
                            -> blk_get_request(..., ..., BLK_MQ_REQ_PREEMPT)
                              -> blk_mq_alloc_request()
                                -> blk_queue_enter()
                                -> __blk_mq_alloc_request()
                            -> blk_execute_rq()
                            -> blk_put_request()
                      -> blk_post_runtime_resume(sdev->request_queue);
                        -> q->rpm_status = RPM_ACTIVE;
                        -> pm_runtime_mark_last_busy(q->dev);
                        -> pm_request_autosuspend(q->dev);

Bart.
Alan Stern June 30, 2020, 7:38 p.m. UTC | #18
On Tue, Jun 30, 2020 at 12:23:12PM -0700, Bart Van Assche wrote:
> On 2020-06-30 11:02, Alan Stern wrote:
> > Right now there doesn't seem to be any mechanism for resuming the queue 
> > if an REQ_PREEMPT request is added while the queue is suspended.
> 
> I do not agree with the above statement. My understanding is that resuming
> happens as follows if a request is submitted against a runtime suspended
> queue owned by a SCSI LLD:
> 
> blk_queue_enter()
>   -> blk_pm_request_resume()

Assume that BLK_MQ_REQ_PREEMPT is set in flags.  Then where exactly 
does blk_queue_enter(q, flags) call blk_pm_request_resume(q)?

Nowhere, as far as I can see.

Alan Stern

>     -> pm_request_resume(dev)
>       -> __pm_runtime_resume(dev, RPM_ASYNC)
>         -> rpm_resume(dev, RPM_ASYNC)
>           -> dev->power.request = RPM_REQ_RESUME;
>           -> queue_work(pm_wq, &dev->power.work)
>             -> pm_runtime_work()
>               -> rpm_resume(dev, RPM_NOWAIT)
>                 -> callback = scsi_runtime_resume;
>                 -> rpm_callback(callback, dev);
>                   -> scsi_runtime_resume(dev);
>                     -> sdev_runtime_resume(dev);
>                       -> blk_pre_runtime_resume(sdev->request_queue);
>                         -> q->rpm_status = RPM_RESUMING;
>                       -> sd_resume(dev);
>                         -> sd_start_stop_device(sdkp);
>                           -> sd_pm_ops.runtime_resume == scsi_execute(sdp, START);
>                             -> blk_get_request(..., ..., BLK_MQ_REQ_PREEMPT)
>                               -> blk_mq_alloc_request()
>                                 -> blk_queue_enter()
>                                 -> __blk_mq_alloc_request()
>                             -> blk_execute_rq()
>                             -> blk_put_request()
>                       -> blk_post_runtime_resume(sdev->request_queue);
>                         -> q->rpm_status = RPM_ACTIVE;
>                         -> pm_runtime_mark_last_busy(q->dev);
>                         -> pm_request_autosuspend(q->dev);
> 
> Bart.
Bart Van Assche June 30, 2020, 11:31 p.m. UTC | #19
On 2020-06-30 12:38, Alan Stern wrote:
> Assume that BLK_MQ_REQ_PREEMPT is set in flags.  Then where exactly 
> does blk_queue_enter(q, flags) call blk_pm_request_resume(q)?

Please take a look at how the *current* implementation of runtime power
management works. Your question is relevant for the old implementation
of runtime power management but not for the current implementation.

Bart.
Alan Stern July 1, 2020, 12:49 a.m. UTC | #20
On Tue, Jun 30, 2020 at 04:31:58PM -0700, Bart Van Assche wrote:
> On 2020-06-30 12:38, Alan Stern wrote:
> > Assume that BLK_MQ_REQ_PREEMPT is set in flags.  Then where exactly 
> > does blk_queue_enter(q, flags) call blk_pm_request_resume(q)?
> 
> Please take a look at how the *current* implementation of runtime power
> management works. Your question is relevant for the old implementation
> of runtime power management but not for the current implementation.

What do you mean by "current"?  I have been looking at the implementation 
in 5.8-rc3 from Linus's tree.  Should I look somewhere else?

Alan Stern
Alan Stern July 6, 2020, 4:41 p.m. UTC | #21
On Tue, Jun 30, 2020 at 20:49:58PM -0400, Alan Stern wrote:
> On Tue, Jun 30, 2020 at 04:31:58PM -0700, Bart Van Assche wrote:
> > On 2020-06-30 12:38, Alan Stern wrote:
> > > Assume that BLK_MQ_REQ_PREEMPT is set in flags.  Then where exactly 
> > > does blk_queue_enter(q, flags) call blk_pm_request_resume(q)?
> > 
> > Please take a look at how the *current* implementation of runtime power
> > management works. Your question is relevant for the old implementation
> > of runtime power management but not for the current implementation.
> 
> What do you mean by "current"?  I have been looking at the implementation 
> in 5.8-rc3 from Linus's tree.  Should I look somewhere else?

Any reply to this, or further concerns about the proposed patch?

I'd like to fix up this API, and it appears that you are the only
person to have worked on it significantly in the last two years.

Alan Stern
Martin Kepplinger July 28, 2020, 7:02 a.m. UTC | #22
On 06.07.20 18:41, Alan Stern wrote:
> On Tue, Jun 30, 2020 at 20:49:58PM -0400, Alan Stern wrote:
>> On Tue, Jun 30, 2020 at 04:31:58PM -0700, Bart Van Assche wrote:
>>> On 2020-06-30 12:38, Alan Stern wrote:
>>>> Assume that BLK_MQ_REQ_PREEMPT is set in flags.  Then where exactly 
>>>> does blk_queue_enter(q, flags) call blk_pm_request_resume(q)?
>>>
>>> Please take a look at how the *current* implementation of runtime power
>>> management works. Your question is relevant for the old implementation
>>> of runtime power management but not for the current implementation.
>>
>> What do you mean by "current"?  I have been looking at the implementation 
>> in 5.8-rc3 from Linus's tree.  Should I look somewhere else?
> 
> Any reply to this, or further concerns about the proposed patch?
> 
> I'd like to fix up this API, and it appears that you are the only
> person to have worked on it significantly in the last two years.
> 
> Alan Stern
> 

Hi Alan,

Any API cleanup is of course welcome. I just wanted to remind you that
the underlying problem: broken block device runtime pm. Your initial
proposed fix "almost" did it and mounting works but during file access,
it still just looks like a runtime_resume is missing somewhere.

As we need to have that working at some point, I might look into it, but
someone who has experience in the block layer can surely do it more
efficiently.

to test, turn off polling:
echo 0 > /sys/module/block/parameters/events_dfl_poll_msecs

and enable runtime pm in your (scsi) device: power/autosuspend_delay_ms
and of course power/control (set to auto).

thanks,
                              martin
Alan Stern July 28, 2020, 8:02 p.m. UTC | #23
On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger wrote:
> Hi Alan,
> 
> Any API cleanup is of course welcome. I just wanted to remind you that
> the underlying problem: broken block device runtime pm. Your initial
> proposed fix "almost" did it and mounting works but during file access,
> it still just looks like a runtime_resume is missing somewhere.

Well, I have tested that proposed fix several times, and on my system 
it's working perfectly.  When I stop accessing a drive it autosuspends, 
and when I access it again it gets resumed and works -- as you would 
expect.

> As we need to have that working at some point, I might look into it, but
> someone who has experience in the block layer can surely do it more
> efficiently.

I suspect that any problems you still face are caused by something else.

Alan Stern
Martin Kepplinger July 29, 2020, 2:12 p.m. UTC | #24
On 28.07.20 22:02, Alan Stern wrote:
> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger wrote:
>> Hi Alan,
>>
>> Any API cleanup is of course welcome. I just wanted to remind you that
>> the underlying problem: broken block device runtime pm. Your initial
>> proposed fix "almost" did it and mounting works but during file access,
>> it still just looks like a runtime_resume is missing somewhere.
> 
> Well, I have tested that proposed fix several times, and on my system 
> it's working perfectly.  When I stop accessing a drive it autosuspends, 
> and when I access it again it gets resumed and works -- as you would 
> expect.

that's weird. when I mount, everything looks good, "sda1". But as soon
as I cd to the mountpoint and do "ls" (on another SD card "ls" works but
actual file reading leads to the exact same errors), I get:

[   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
[   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
[   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
40 00 00 01 00
[   77.474678] blk_update_request: I/O error, dev sda, sector 24640 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   77.485836] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   77.491628] blk_update_request: I/O error, dev sda, sector 24641 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   77.502275] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   77.508051] blk_update_request: I/O error, dev sda, sector 24642 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   77.518651] sd 0:0:0:0: [sda] tag#0 device offline or changed
(...)
[   77.947653] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   77.953434] FAT-fs (sda1): Directory bread(block 16448) failed
[   77.959333] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   77.965118] FAT-fs (sda1): Directory bread(block 16449) failed
[   77.971014] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   77.976802] FAT-fs (sda1): Directory bread(block 16450) failed
[   77.982698] sd 0:0:0:0: [sda] tag#0 device offline or changed
(...)
[   78.384929] FAT-fs (sda1): Filesystem has been set read-only
[  103.070973] sd 0:0:0:0: [sda] tag#0 device offline or changed
[  103.076751] print_req_error: 118 callbacks suppressed
[  103.076760] blk_update_request: I/O error, dev sda, sector 9748 op
0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[  103.087428] Buffer I/O error on dev sda1, logical block 1556, lost
async page write
[  103.095309] sd 0:0:0:0: [sda] tag#0 device offline or changed
[  103.101123] blk_update_request: I/O error, dev sda, sector 17162 op
0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[  103.111883] Buffer I/O error on dev sda1, logical block 8970, lost
async page write

> 
>> As we need to have that working at some point, I might look into it, but
>> someone who has experience in the block layer can surely do it more
>> efficiently.
> 
> I suspect that any problems you still face are caused by something else.
> 

I then formatted sda1 to ext2 (on the runtime suspend system testing
your patch) and that seems to have worked!

Again accessing the mountpoint then yield the very same "device offline
or changed" errors.

What kind of device are you testing? You should be easily able to
reproduce this using an "sd" device.

The problems must lie in the different other drivers we use I guess.

                             martin


> Alan Stern
>
Alan Stern July 29, 2020, 2:32 p.m. UTC | #25
On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
> On 28.07.20 22:02, Alan Stern wrote:
> > On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger wrote:
> >> Hi Alan,
> >>
> >> Any API cleanup is of course welcome. I just wanted to remind you that
> >> the underlying problem: broken block device runtime pm. Your initial
> >> proposed fix "almost" did it and mounting works but during file access,
> >> it still just looks like a runtime_resume is missing somewhere.
> > 
> > Well, I have tested that proposed fix several times, and on my system 
> > it's working perfectly.  When I stop accessing a drive it autosuspends, 
> > and when I access it again it gets resumed and works -- as you would 
> > expect.
> 
> that's weird. when I mount, everything looks good, "sda1". But as soon
> as I cd to the mountpoint and do "ls" (on another SD card "ls" works but
> actual file reading leads to the exact same errors), I get:
> 
> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
> 40 00 00 01 00

This error report comes from the SCSI layer, not the block layer.

> [   77.474678] blk_update_request: I/O error, dev sda, sector 24640 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   77.485836] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   77.491628] blk_update_request: I/O error, dev sda, sector 24641 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   77.502275] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   77.508051] blk_update_request: I/O error, dev sda, sector 24642 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   77.518651] sd 0:0:0:0: [sda] tag#0 device offline or changed
> (...)
> [   77.947653] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   77.953434] FAT-fs (sda1): Directory bread(block 16448) failed
> [   77.959333] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   77.965118] FAT-fs (sda1): Directory bread(block 16449) failed
> [   77.971014] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   77.976802] FAT-fs (sda1): Directory bread(block 16450) failed
> [   77.982698] sd 0:0:0:0: [sda] tag#0 device offline or changed
> (...)
> [   78.384929] FAT-fs (sda1): Filesystem has been set read-only
> [  103.070973] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [  103.076751] print_req_error: 118 callbacks suppressed
> [  103.076760] blk_update_request: I/O error, dev sda, sector 9748 op
> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
> [  103.087428] Buffer I/O error on dev sda1, logical block 1556, lost
> async page write
> [  103.095309] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [  103.101123] blk_update_request: I/O error, dev sda, sector 17162 op
> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
> [  103.111883] Buffer I/O error on dev sda1, logical block 8970, lost
> async page write

I can't tell why you're getting that error.  In one of my tests the 
device returned the same kind of error status (Sense Key = 6, ASC = 
0x28) but the operation was then retried successfully.  Perhaps the 
problem lies in the device you are testing.

> >> As we need to have that working at some point, I might look into it, but
> >> someone who has experience in the block layer can surely do it more
> >> efficiently.
> > 
> > I suspect that any problems you still face are caused by something else.
> > 
> 
> I then formatted sda1 to ext2 (on the runtime suspend system testing
> your patch) and that seems to have worked!
> 
> Again accessing the mountpoint then yield the very same "device offline
> or changed" errors.
> 
> What kind of device are you testing? You should be easily able to
> reproduce this using an "sd" device.

I tested two devices: a SanDisk Cruzer USB flash drive and a 
g-mass-storage gadget running under dummy-hcd.  They each showed up as 
/dev/sdb on my system.

I haven't tried testing with an SD card.  If you have any specific 
sequence of commands you would like me to run, let me know.

> The problems must lie in the different other drivers we use I guess.

Or the devices.  Have you tried testing with a USB flash drive?

Alan Stern
Martin K. Petersen July 29, 2020, 2:44 p.m. UTC | #26
Alan,

>> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
>> 40 00 00 01 00
>
> This error report comes from the SCSI layer, not the block layer.

This the device telling us that the media (SD card?) has changed.
James Bottomley July 29, 2020, 2:46 p.m. UTC | #27
On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
> > On 28.07.20 22:02, Alan Stern wrote:
> > > On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
> > > wrote:
> > > > Hi Alan,
> > > > 
> > > > Any API cleanup is of course welcome. I just wanted to remind
> > > > you that the underlying problem: broken block device runtime
> > > > pm. Your initial proposed fix "almost" did it and mounting
> > > > works but during file access, it still just looks like a
> > > > runtime_resume is missing somewhere.
> > > 
> > > Well, I have tested that proposed fix several times, and on my
> > > system it's working perfectly.  When I stop accessing a drive it
> > > autosuspends, and when I access it again it gets resumed and
> > > works -- as you would expect.
> > 
> > that's weird. when I mount, everything looks good, "sda1". But as
> > soon as I cd to the mountpoint and do "ls" (on another SD card "ls"
> > works but actual file reading leads to the exact same errors), I
> > get:
> > 
> > [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> > hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> > [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> > [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> > [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00
> > 60 40 00 00 01 00
> 
> This error report comes from the SCSI layer, not the block layer.

That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
CHANGED" so it sounds like it something we should be ignoring.  Usually
this signals a problem, like you changed the medium manually (ejected
the CD).  But in this case you can tell us to expect this by setting

sdev->expecting_cc_ua

And we'll retry.  I think you need to set this on all resumed devices.

James
James Bottomley July 29, 2020, 2:53 p.m. UTC | #28
On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
> On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
> > On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
> > > On 28.07.20 22:02, Alan Stern wrote:
> > > > On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
> > > > wrote:
> > > > > Hi Alan,
> > > > > 
> > > > > Any API cleanup is of course welcome. I just wanted to remind
> > > > > you that the underlying problem: broken block device runtime
> > > > > pm. Your initial proposed fix "almost" did it and mounting
> > > > > works but during file access, it still just looks like a
> > > > > runtime_resume is missing somewhere.
> > > > 
> > > > Well, I have tested that proposed fix several times, and on my
> > > > system it's working perfectly.  When I stop accessing a drive
> > > > it autosuspends, and when I access it again it gets resumed and
> > > > works -- as you would expect.
> > > 
> > > that's weird. when I mount, everything looks good, "sda1". But as
> > > soon as I cd to the mountpoint and do "ls" (on another SD card
> > > "ls" works but actual file reading leads to the exact same
> > > errors), I get:
> > > 
> > > [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> > > hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> > > [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> > > [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> > > [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00
> > > 00 60 40 00 00 01 00
> > 
> > This error report comes from the SCSI layer, not the block layer.
> 
> That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> CHANGED" so it sounds like it something we should be
> ignoring.  Usually this signals a problem, like you changed the
> medium manually (ejected the CD).  But in this case you can tell us
> to expect this by setting
> 
> sdev->expecting_cc_ua
> 
> And we'll retry.  I think you need to set this on all resumed
> devices.

Actually, it's not quite that easy, we filter out this ASC/ASCQ
combination from the check because we should never ignore medium might
have changed events on running devices.  We could ignore it if we had a
flag to say the power has been yanked (perhaps an additional sdev flag
you set on resume) but we would still miss the case where you really
had powered off the drive and then changed the media ... if you can
regard this as the user's problem, then we might have a solution.

James
Alan Stern July 29, 2020, 2:56 p.m. UTC | #29
On Wed, Jul 29, 2020 at 10:44:26AM -0400, Martin K. Petersen wrote:
> 
> Alan,
> 
> >> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> >> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> >> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> >> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> >> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
> >> 40 00 00 01 00
> >
> > This error report comes from the SCSI layer, not the block layer.
> 
> This the device telling us that the media (SD card?) has changed.

Ah yes, thank you.  I knew that SK=6 ASC=0x28 meant "Not Ready to Ready 
Transition", but I had forgotten the "(Media May Have Changed)" part.

This makes sense and is a reasonable thing to see, since many SD card 
readers lose track of whether or not the card has been changed when they 
go into suspend.

Alan Stern
Martin Kepplinger July 29, 2020, 3:40 p.m. UTC | #30
On 29.07.20 16:53, James Bottomley wrote:
> On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
>> On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
>>> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
>>>> On 28.07.20 22:02, Alan Stern wrote:
>>>>> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
>>>>> wrote:
>>>>>> Hi Alan,
>>>>>>
>>>>>> Any API cleanup is of course welcome. I just wanted to remind
>>>>>> you that the underlying problem: broken block device runtime
>>>>>> pm. Your initial proposed fix "almost" did it and mounting
>>>>>> works but during file access, it still just looks like a
>>>>>> runtime_resume is missing somewhere.
>>>>>
>>>>> Well, I have tested that proposed fix several times, and on my
>>>>> system it's working perfectly.  When I stop accessing a drive
>>>>> it autosuspends, and when I access it again it gets resumed and
>>>>> works -- as you would expect.
>>>>
>>>> that's weird. when I mount, everything looks good, "sda1". But as
>>>> soon as I cd to the mountpoint and do "ls" (on another SD card
>>>> "ls" works but actual file reading leads to the exact same
>>>> errors), I get:
>>>>
>>>> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>>>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>>>> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>>>> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>>>> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00
>>>> 00 60 40 00 00 01 00
>>>
>>> This error report comes from the SCSI layer, not the block layer.
>>
>> That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
>> CHANGED" so it sounds like it something we should be
>> ignoring.  Usually this signals a problem, like you changed the
>> medium manually (ejected the CD).  But in this case you can tell us
>> to expect this by setting
>>
>> sdev->expecting_cc_ua
>>
>> And we'll retry.  I think you need to set this on all resumed
>> devices.
> 
> Actually, it's not quite that easy, we filter out this ASC/ASCQ
> combination from the check because we should never ignore medium might
> have changed events on running devices.  We could ignore it if we had a
> flag to say the power has been yanked (perhaps an additional sdev flag
> you set on resume) but we would still miss the case where you really
> had powered off the drive and then changed the media ... if you can
> regard this as the user's problem, then we might have a solution.
> 
> James
>  

oh I see what you mean now, thanks for the ellaboration.

if I do the following change, things all look normal and runtime pm
works. I'm not 100% sure if just setting expecting_cc_ua in resume() is
"correct" but that looks like it is what you're talking about:

(note that this is of course with the one block layer diff applied that
Alan posted a few emails back)


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
                 * so that we can deal with it there.
                 */
                if (scmd->device->expecting_cc_ua) {
-                       /*
-                        * Because some device does not queue unit
-                        * attentions correctly, we carefully check
-                        * additional sense code and qualifier so as
-                        * not to squash media change unit attention.
-                        */
-                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
-                               scmd->device->expecting_cc_ua = 0;
-                               return NEEDS_RETRY;
-                       }
+                       scmd->device->expecting_cc_ua = 0;
+                       return NEEDS_RETRY;
                }
                /*
                 * we might also expect a cc/ua if another LUN on the target
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..5ad847fed8b9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
        if (!sdkp)      /* E.g.: runtime resume at the start of
sd_probe() */
                return 0;

+       sdkp->device->expecting_cc_ua = 1;
+
        if (!sdkp->device->manage_start_stop)
                return 0;
Alan Stern July 29, 2020, 3:40 p.m. UTC | #31
On Wed, Jul 29, 2020 at 07:53:52AM -0700, James Bottomley wrote:
> On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
> > On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
> > > On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
> > > > On 28.07.20 22:02, Alan Stern wrote:
> > > > > On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
> > > > > wrote:
> > > > > > Hi Alan,
> > > > > > 
> > > > > > Any API cleanup is of course welcome. I just wanted to remind
> > > > > > you that the underlying problem: broken block device runtime
> > > > > > pm. Your initial proposed fix "almost" did it and mounting
> > > > > > works but during file access, it still just looks like a
> > > > > > runtime_resume is missing somewhere.
> > > > > 
> > > > > Well, I have tested that proposed fix several times, and on my
> > > > > system it's working perfectly.  When I stop accessing a drive
> > > > > it autosuspends, and when I access it again it gets resumed and
> > > > > works -- as you would expect.
> > > > 
> > > > that's weird. when I mount, everything looks good, "sda1". But as
> > > > soon as I cd to the mountpoint and do "ls" (on another SD card
> > > > "ls" works but actual file reading leads to the exact same
> > > > errors), I get:
> > > > 
> > > > [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> > > > hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> > > > [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> > > > [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> > > > [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00
> > > > 00 60 40 00 00 01 00
> > > 
> > > This error report comes from the SCSI layer, not the block layer.
> > 
> > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> > CHANGED" so it sounds like it something we should be
> > ignoring.  Usually this signals a problem, like you changed the
> > medium manually (ejected the CD).  But in this case you can tell us
> > to expect this by setting
> > 
> > sdev->expecting_cc_ua
> > 
> > And we'll retry.  I think you need to set this on all resumed
> > devices.
> 
> Actually, it's not quite that easy, we filter out this ASC/ASCQ
> combination from the check because we should never ignore medium might
> have changed events on running devices.  We could ignore it if we had a
> flag to say the power has been yanked (perhaps an additional sdev flag
> you set on resume) but we would still miss the case where you really
> had powered off the drive and then changed the media ... if you can
> regard this as the user's problem, then we might have a solution.

Indeed, I was going to make the same point.

The only reasonable conclusion is that suspending these SD card readers 
isn't safe unless they don't contain a card -- or maybe just if the 
device file isn't open or mounted.

Although support for this sort of thing could be added to the kernel, 
for now it's best to rely on userspace doing the right thing.  The 
kernel doesn't even know which devices suffer from this problem.

Alan Stern
James Bottomley July 29, 2020, 3:44 p.m. UTC | #32
On Wed, 2020-07-29 at 17:40 +0200, Martin Kepplinger wrote:
> On 29.07.20 16:53, James Bottomley wrote:
> > On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
> > > On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
[...]
> > > > This error report comes from the SCSI layer, not the block
> > > > layer.
> > > 
> > > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> > > CHANGED" so it sounds like it something we should be
> > > ignoring.  Usually this signals a problem, like you changed the
> > > medium manually (ejected the CD).  But in this case you can tell
> > > us to expect this by setting
> > > 
> > > sdev->expecting_cc_ua
> > > 
> > > And we'll retry.  I think you need to set this on all resumed
> > > devices.
> > 
> > Actually, it's not quite that easy, we filter out this ASC/ASCQ
> > combination from the check because we should never ignore medium
> > might have changed events on running devices.  We could ignore it
> > if we had a flag to say the power has been yanked (perhaps an
> > additional sdev flag you set on resume) but we would still miss the
> > case where you really had powered off the drive and then changed
> > the media ... if you can regard this as the user's problem, then we
> > might have a solution.
> > 
> > James
> >  
> 
> oh I see what you mean now, thanks for the ellaboration.
> 
> if I do the following change, things all look normal and runtime pm
> works. I'm not 100% sure if just setting expecting_cc_ua in resume()
> is "correct" but that looks like it is what you're talking about:
> 
> (note that this is of course with the one block layer diff applied
> that Alan posted a few emails back)
> 
> 
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>                  * so that we can deal with it there.
>                  */
>                 if (scmd->device->expecting_cc_ua) {
> -                       /*
> -                        * Because some device does not queue unit
> -                        * attentions correctly, we carefully check
> -                        * additional sense code and qualifier so as
> -                        * not to squash media change unit attention.
> -                        */
> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00)
> {
> -                               scmd->device->expecting_cc_ua = 0;
> -                               return NEEDS_RETRY;
> -                       }
> +                       scmd->device->expecting_cc_ua = 0;
> +                       return NEEDS_RETRY;

Well, yes, but you can't do this because it would lose us media change
events in the non-suspend/resume case which we really don't want. 
That's why I was suggesting a new flag.

James
James Bottomley July 29, 2020, 3:49 p.m. UTC | #33
On Wed, 2020-07-29 at 11:40 -0400, Alan Stern wrote:
> On Wed, Jul 29, 2020 at 07:53:52AM -0700, James Bottomley wrote:
> > On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
[...]
> > > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> > > CHANGED" so it sounds like it something we should be
> > > ignoring.  Usually this signals a problem, like you changed the
> > > medium manually (ejected the CD).  But in this case you can tell
> > > us to expect this by setting
> > > 
> > > sdev->expecting_cc_ua
> > > 
> > > And we'll retry.  I think you need to set this on all resumed
> > > devices.
> > 
> > Actually, it's not quite that easy, we filter out this ASC/ASCQ
> > combination from the check because we should never ignore medium
> > might have changed events on running devices.  We could ignore it
> > if we had a flag to say the power has been yanked (perhaps an
> > additional sdev flag you set on resume) but we would still miss the
> > case where you really had powered off the drive and then changed
> > the media ... if you can regard this as the user's problem, then we
> > might have a solution.
> 
> Indeed, I was going to make the same point.
> 
> The only reasonable conclusion is that suspending these SD card
> readers isn't safe unless they don't contain a card -- or maybe just
> if the device file isn't open or mounted.

For standard removable media, I could see issuing an eject before
suspend to emphasize the point.  However, sd cards don't work like that
... they're not ejectable except manually and mostly used as the HDD of
embedded, so the 99% use case is that the user will suspend and resume
the device with the same sdd insert.  It does sound like a use case we
should support.

> Although support for this sort of thing could be added to the
> kernel, for now it's best to rely on userspace doing the right
> thing.  The kernel doesn't even know which devices suffer from this
> problem.

Can we solve from userspace the case where the user hasn't changed the
media and the resume fails because we don't know?

James
Martin Kepplinger July 29, 2020, 3:52 p.m. UTC | #34
On 29.07.20 17:40, Alan Stern wrote:
> On Wed, Jul 29, 2020 at 07:53:52AM -0700, James Bottomley wrote:
>> On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
>>> On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
>>>> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
>>>>> On 28.07.20 22:02, Alan Stern wrote:
>>>>>> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
>>>>>> wrote:
>>>>>>> Hi Alan,
>>>>>>>
>>>>>>> Any API cleanup is of course welcome. I just wanted to remind
>>>>>>> you that the underlying problem: broken block device runtime
>>>>>>> pm. Your initial proposed fix "almost" did it and mounting
>>>>>>> works but during file access, it still just looks like a
>>>>>>> runtime_resume is missing somewhere.
>>>>>>
>>>>>> Well, I have tested that proposed fix several times, and on my
>>>>>> system it's working perfectly.  When I stop accessing a drive
>>>>>> it autosuspends, and when I access it again it gets resumed and
>>>>>> works -- as you would expect.
>>>>>
>>>>> that's weird. when I mount, everything looks good, "sda1". But as
>>>>> soon as I cd to the mountpoint and do "ls" (on another SD card
>>>>> "ls" works but actual file reading leads to the exact same
>>>>> errors), I get:
>>>>>
>>>>> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>>>>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>>>>> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>>>>> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>>>>> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00
>>>>> 00 60 40 00 00 01 00
>>>>
>>>> This error report comes from the SCSI layer, not the block layer.
>>>
>>> That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
>>> CHANGED" so it sounds like it something we should be
>>> ignoring.  Usually this signals a problem, like you changed the
>>> medium manually (ejected the CD).  But in this case you can tell us
>>> to expect this by setting
>>>
>>> sdev->expecting_cc_ua
>>>
>>> And we'll retry.  I think you need to set this on all resumed
>>> devices.
>>
>> Actually, it's not quite that easy, we filter out this ASC/ASCQ
>> combination from the check because we should never ignore medium might
>> have changed events on running devices.  We could ignore it if we had a
>> flag to say the power has been yanked (perhaps an additional sdev flag
>> you set on resume) but we would still miss the case where you really
>> had powered off the drive and then changed the media ... if you can
>> regard this as the user's problem, then we might have a solution.
> 
> Indeed, I was going to make the same point.
> 
> The only reasonable conclusion is that suspending these SD card readers 
> isn't safe unless they don't contain a card -- or maybe just if the 
> device file isn't open or mounted.
> 
> Although support for this sort of thing could be added to the kernel, 
> for now it's best to rely on userspace doing the right thing.  The 
> kernel doesn't even know which devices suffer from this problem.
> 
> 

well, userspace can do something like "automatically unmount if not
used" but that is *way* more inefficient than if the kernel would
support this is some way or another, for SD cards.
Alan Stern July 29, 2020, 4:17 p.m. UTC | #35
On Wed, Jul 29, 2020 at 08:49:34AM -0700, James Bottomley wrote:
> On Wed, 2020-07-29 at 11:40 -0400, Alan Stern wrote:
> > On Wed, Jul 29, 2020 at 07:53:52AM -0700, James Bottomley wrote:
> > > On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
> [...]
> > > > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> > > > CHANGED" so it sounds like it something we should be
> > > > ignoring.  Usually this signals a problem, like you changed the
> > > > medium manually (ejected the CD).  But in this case you can tell
> > > > us to expect this by setting
> > > > 
> > > > sdev->expecting_cc_ua
> > > > 
> > > > And we'll retry.  I think you need to set this on all resumed
> > > > devices.
> > > 
> > > Actually, it's not quite that easy, we filter out this ASC/ASCQ
> > > combination from the check because we should never ignore medium
> > > might have changed events on running devices.  We could ignore it
> > > if we had a flag to say the power has been yanked (perhaps an
> > > additional sdev flag you set on resume) but we would still miss the
> > > case where you really had powered off the drive and then changed
> > > the media ... if you can regard this as the user's problem, then we
> > > might have a solution.
> > 
> > Indeed, I was going to make the same point.
> > 
> > The only reasonable conclusion is that suspending these SD card
> > readers isn't safe unless they don't contain a card -- or maybe just
> > if the device file isn't open or mounted.
> 
> For standard removable media, I could see issuing an eject before
> suspend to emphasize the point.

That's not a workable approach in general.  For example, you wouldn't 
want to eject a CD just because it hadn't been used for a couple of 
minutes and the drive was therefore about to be suspended.

>  However, sd cards don't work like that
> ... they're not ejectable except manually and mostly used as the HDD of
> embedded, so the 99% use case is that the user will suspend and resume
> the device with the same sdd insert.  It does sound like a use case we
> should support.

Agreed.

> > Although support for this sort of thing could be added to the
> > kernel, for now it's best to rely on userspace doing the right
> > thing.  The kernel doesn't even know which devices suffer from this
> > problem.
> 
> Can we solve from userspace the case where the user hasn't changed the
> media and the resume fails because we don't know?

The difficulty is recognizing situations where the media really was 
changed while the device was suspended.  Most devices, AFAIK, don't have 
any problem with this, but card readers often do.

I suppose the kernel could just rely on users not to change media in 
drives while they are mounted.  Then any problems that arise would be 
the users' fault.  Yes, this is nothing more than passing the buck, but 
it's hard to come up with any better approach.  Doesn't Windows do 
something like this?

Alan Stern
Martin Kepplinger July 29, 2020, 4:43 p.m. UTC | #36
Am 29. Juli 2020 17:44:42 MESZ schrieb James Bottomley <James.Bottomley@HansenPartnership.com>:
>On Wed, 2020-07-29 at 17:40 +0200, Martin Kepplinger wrote:
>> On 29.07.20 16:53, James Bottomley wrote:
>> > On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
>> > > On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
>[...]
>> > > > This error report comes from the SCSI layer, not the block
>> > > > layer.
>> > > 
>> > > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
>> > > CHANGED" so it sounds like it something we should be
>> > > ignoring.  Usually this signals a problem, like you changed the
>> > > medium manually (ejected the CD).  But in this case you can tell
>> > > us to expect this by setting
>> > > 
>> > > sdev->expecting_cc_ua
>> > > 
>> > > And we'll retry.  I think you need to set this on all resumed
>> > > devices.
>> > 
>> > Actually, it's not quite that easy, we filter out this ASC/ASCQ
>> > combination from the check because we should never ignore medium
>> > might have changed events on running devices.  We could ignore it
>> > if we had a flag to say the power has been yanked (perhaps an
>> > additional sdev flag you set on resume) but we would still miss the
>> > case where you really had powered off the drive and then changed
>> > the media ... if you can regard this as the user's problem, then we
>> > might have a solution.
>> > 
>> > James
>> >  
>> 
>> oh I see what you mean now, thanks for the ellaboration.
>> 
>> if I do the following change, things all look normal and runtime pm
>> works. I'm not 100% sure if just setting expecting_cc_ua in resume()
>> is "correct" but that looks like it is what you're talking about:
>> 
>> (note that this is of course with the one block layer diff applied
>> that Alan posted a few emails back)
>> 
>> 
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>                  * so that we can deal with it there.
>>                  */
>>                 if (scmd->device->expecting_cc_ua) {
>> -                       /*
>> -                        * Because some device does not queue unit
>> -                        * attentions correctly, we carefully check
>> -                        * additional sense code and qualifier so as
>> -                        * not to squash media change unit attention.
>> -                        */
>> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00)
>> {
>> -                               scmd->device->expecting_cc_ua = 0;
>> -                               return NEEDS_RETRY;
>> -                       }
>> +                       scmd->device->expecting_cc_ua = 0;
>> +                       return NEEDS_RETRY;
>
>Well, yes, but you can't do this because it would lose us media change
>events in the non-suspend/resume case which we really don't want. 
>That's why I was suggesting a new flag.
>
>James

also if I set expecting_cc_ua in resume() only, like I did?
Douglas Gilbert July 29, 2020, 6:10 p.m. UTC | #37
On 2020-07-29 10:32 a.m., Alan Stern wrote:
> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
>> On 28.07.20 22:02, Alan Stern wrote:
>>> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger wrote:
>>>> Hi Alan,
>>>>
>>>> Any API cleanup is of course welcome. I just wanted to remind you that
>>>> the underlying problem: broken block device runtime pm. Your initial
>>>> proposed fix "almost" did it and mounting works but during file access,
>>>> it still just looks like a runtime_resume is missing somewhere.
>>>
>>> Well, I have tested that proposed fix several times, and on my system
>>> it's working perfectly.  When I stop accessing a drive it autosuspends,
>>> and when I access it again it gets resumed and works -- as you would
>>> expect.
>>
>> that's weird. when I mount, everything looks good, "sda1". But as soon
>> as I cd to the mountpoint and do "ls" (on another SD card "ls" works but
>> actual file reading leads to the exact same errors), I get:
>>
>> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
>> 40 00 00 01 00
> 
> This error report comes from the SCSI layer, not the block layer.

SCSI's first 11 byte command! I'm guessing the first byte is being
repeated and it's actually:
     28 00 00 00 60 40 00 00 01 00  [READ(10)]

That should be fixed. It should be something like: "...CDB in hex: 28 00 ...".

Doug Gilbert

>> [   77.474678] blk_update_request: I/O error, dev sda, sector 24640 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.485836] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.491628] blk_update_request: I/O error, dev sda, sector 24641 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.502275] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.508051] blk_update_request: I/O error, dev sda, sector 24642 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.518651] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (...)
>> [   77.947653] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.953434] FAT-fs (sda1): Directory bread(block 16448) failed
>> [   77.959333] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.965118] FAT-fs (sda1): Directory bread(block 16449) failed
>> [   77.971014] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.976802] FAT-fs (sda1): Directory bread(block 16450) failed
>> [   77.982698] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (...)
>> [   78.384929] FAT-fs (sda1): Filesystem has been set read-only
>> [  103.070973] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [  103.076751] print_req_error: 118 callbacks suppressed
>> [  103.076760] blk_update_request: I/O error, dev sda, sector 9748 op
>> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
>> [  103.087428] Buffer I/O error on dev sda1, logical block 1556, lost
>> async page write
>> [  103.095309] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [  103.101123] blk_update_request: I/O error, dev sda, sector 17162 op
>> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
>> [  103.111883] Buffer I/O error on dev sda1, logical block 8970, lost
>> async page write
> 
> I can't tell why you're getting that error.  In one of my tests the
> device returned the same kind of error status (Sense Key = 6, ASC =
> 0x28) but the operation was then retried successfully.  Perhaps the
> problem lies in the device you are testing.
> 
>>>> As we need to have that working at some point, I might look into it, but
>>>> someone who has experience in the block layer can surely do it more
>>>> efficiently.
>>>
>>> I suspect that any problems you still face are caused by something else.
>>>
>>
>> I then formatted sda1 to ext2 (on the runtime suspend system testing
>> your patch) and that seems to have worked!
>>
>> Again accessing the mountpoint then yield the very same "device offline
>> or changed" errors.
>>
>> What kind of device are you testing? You should be easily able to
>> reproduce this using an "sd" device.
> 
> I tested two devices: a SanDisk Cruzer USB flash drive and a
> g-mass-storage gadget running under dummy-hcd.  They each showed up as
> /dev/sdb on my system.
> 
> I haven't tried testing with an SD card.  If you have any specific
> sequence of commands you would like me to run, let me know.
> 
>> The problems must lie in the different other drivers we use I guess.
> 
> Or the devices.  Have you tried testing with a USB flash drive?
> 
> Alan Stern
>
Alan Stern July 29, 2020, 6:25 p.m. UTC | #38
On Wed, Jul 29, 2020 at 06:43:48PM +0200, Martin Kepplinger wrote:
> 
> 
> Am 29. Juli 2020 17:44:42 MESZ schrieb James Bottomley <James.Bottomley@HansenPartnership.com>:
> >On Wed, 2020-07-29 at 17:40 +0200, Martin Kepplinger wrote:
> >> On 29.07.20 16:53, James Bottomley wrote:
> >> > On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
> >> > > On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
> >[...]
> >> > > > This error report comes from the SCSI layer, not the block
> >> > > > layer.
> >> > > 
> >> > > That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
> >> > > CHANGED" so it sounds like it something we should be
> >> > > ignoring.  Usually this signals a problem, like you changed the
> >> > > medium manually (ejected the CD).  But in this case you can tell
> >> > > us to expect this by setting
> >> > > 
> >> > > sdev->expecting_cc_ua
> >> > > 
> >> > > And we'll retry.  I think you need to set this on all resumed
> >> > > devices.
> >> > 
> >> > Actually, it's not quite that easy, we filter out this ASC/ASCQ
> >> > combination from the check because we should never ignore medium
> >> > might have changed events on running devices.  We could ignore it
> >> > if we had a flag to say the power has been yanked (perhaps an
> >> > additional sdev flag you set on resume) but we would still miss the
> >> > case where you really had powered off the drive and then changed
> >> > the media ... if you can regard this as the user's problem, then we
> >> > might have a solution.
> >> > 
> >> > James
> >> >  
> >> 
> >> oh I see what you mean now, thanks for the ellaboration.
> >> 
> >> if I do the following change, things all look normal and runtime pm
> >> works. I'm not 100% sure if just setting expecting_cc_ua in resume()
> >> is "correct" but that looks like it is what you're talking about:
> >> 
> >> (note that this is of course with the one block layer diff applied
> >> that Alan posted a few emails back)
> >> 
> >> 
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
> >>                  * so that we can deal with it there.
> >>                  */
> >>                 if (scmd->device->expecting_cc_ua) {
> >> -                       /*
> >> -                        * Because some device does not queue unit
> >> -                        * attentions correctly, we carefully check
> >> -                        * additional sense code and qualifier so as
> >> -                        * not to squash media change unit attention.
> >> -                        */
> >> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00)
> >> {
> >> -                               scmd->device->expecting_cc_ua = 0;
> >> -                               return NEEDS_RETRY;
> >> -                       }
> >> +                       scmd->device->expecting_cc_ua = 0;
> >> +                       return NEEDS_RETRY;
> >
> >Well, yes, but you can't do this because it would lose us media change
> >events in the non-suspend/resume case which we really don't want. 
> >That's why I was suggesting a new flag.
> >
> >James
> 
> also if I set expecting_cc_ua in resume() only, like I did?

That wouldn't make any difference.  The information sent by your card 
reader has sshdr.asc == 0x28 and sshdr.ascq == 0x00 (you can see it in 
the log).  So because of the code here in scsi_check_sense(), which you 
can't change, the Unit Attention sent by the card reader would not be 
retried even if you do set the flag in resume().

Alan Stern
James Bottomley July 29, 2020, 6:29 p.m. UTC | #39
On Wed, 2020-07-29 at 14:25 -0400, Alan Stern wrote:
> On Wed, Jul 29, 2020 at 06:43:48PM +0200, Martin Kepplinger wrote:
[...]
> > > > --- a/drivers/scsi/scsi_error.c
> > > > +++ b/drivers/scsi/scsi_error.c
> > > > @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd
> > > > *scmd)
> > > >                  * so that we can deal with it there.
> > > >                  */
> > > >                 if (scmd->device->expecting_cc_ua) {
> > > > -                       /*
> > > > -                        * Because some device does not queue
> > > > unit
> > > > -                        * attentions correctly, we carefully
> > > > check
> > > > -                        * additional sense code and qualifier
> > > > so as
> > > > -                        * not to squash media change unit
> > > > attention.
> > > > -                        */
> > > > -                       if (sshdr.asc != 0x28 || sshdr.ascq !=
> > > > 0x00)
> > > > {
> > > > -                               scmd->device->expecting_cc_ua =
> > > > 0;
> > > > -                               return NEEDS_RETRY;
> > > > -                       }
> > > > +                       scmd->device->expecting_cc_ua = 0;
> > > > +                       return NEEDS_RETRY;
> > > 
> > > Well, yes, but you can't do this because it would lose us media
> > > change events in the non-suspend/resume case which we really
> > > don't want.  That's why I was suggesting a new flag.
> > > 
> > > James
> > 
> > also if I set expecting_cc_ua in resume() only, like I did?
> 
> That wouldn't make any difference.  The information sent by your
> card reader has sshdr.asc == 0x28 and sshdr.ascq == 0x00 (you can see
> it in the log).  So because of the code here in scsi_check_sense(),
> which you can't change, the Unit Attention sent by the card reader
> would not be  retried even if you do set the flag in resume().

But if we had a new flag, like expecting_media_change, you could set
that in resume and we could condition the above test in the code on it
and reset it and do a retry if it gets set.  I'm not saying we have to
do it this way, but it sounds like we have to do something in the
kernel, so I think the flag will become necessary but there might be a
bit of policy based dance around how it gets set in the kernel (to
avoid losing accurate media change events).

James
Martin Kepplinger July 30, 2020, 8:05 a.m. UTC | #40
On 29.06.20 18:15, Alan Stern wrote:
> On Mon, Jun 29, 2020 at 11:42:59AM +0200, Martin Kepplinger wrote:
>>
>>
>> On 26.06.20 17:44, Alan Stern wrote:
>>> Martin's best approach would be to add some debugging code to find out why 
>>> blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
>>> doesn't lead to pm_request_resume().
>>>
>>> Alan Stern
>>>
>>
>> Hi Alan,
>>
>> blk_queue_enter() always - especially when sd is runtime suspended and I
>> try to mount as above - sets success to be true for me, so never
>> continues down to bkl_pm_request_resume(). All I see is "PM: Removing
>> info for No Bus:sda1".
> 
> Aha.  Looking at this more closely, it's apparent that the code in 
> blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> flag is set then the request can be issued regardless of the queue's 
> runtime status.  That is not correct when the queue is suspended.
> 
> Below is my attempt to fix this up.  I'm not sure that the patch is 
> entirely correct, but it should fix this logic bug.  I would appreciate a 
> critical review.
> 
> Martin, does this fix the problem?
> 
> Alan Stern

Hi Alan,

So in the block layer your change below indeed fixes the problem and if
you want to submit that 1:1 feel free to add

Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>

thanks for your help in this!

                       martin


> 
> 
> 
> Index: usb-devel/block/blk-core.c
> ===================================================================
> --- usb-devel.orig/block/blk-core.c
> +++ usb-devel/block/blk-core.c
> @@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue
>  			 * responsible for ensuring that that counter is
>  			 * globally visible before the queue is unfrozen.
>  			 */
> -			if (pm || !blk_queue_pm_only(q)) {
> +			if ((pm && q->rpm_status != RPM_SUSPENDED) ||
> +			    !blk_queue_pm_only(q)) {
>  				success = true;
>  			} else {
>  				percpu_ref_put(&q->q_usage_counter);
> @@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue
>  
>  		wait_event(q->mq_freeze_wq,
>  			   (!q->mq_freeze_depth &&
> -			    (pm || (blk_pm_request_resume(q),
> -				    !blk_queue_pm_only(q)))) ||
> +			    blk_pm_resume_queue(pm, q)) ||
>  			   blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> Index: usb-devel/block/blk-pm.h
> ===================================================================
> --- usb-devel.orig/block/blk-pm.h
> +++ usb-devel/block/blk-pm.h
> @@ -6,11 +6,14 @@
>  #include <linux/pm_runtime.h>
>  
>  #ifdef CONFIG_PM
> -static inline void blk_pm_request_resume(struct request_queue *q)
> +static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
>  {
> -	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
> -		       q->rpm_status == RPM_SUSPENDING))
> -		pm_request_resume(q->dev);
> +	if (!q->dev || !blk_queue_pm_only(q))
> +		return 1;	/* Nothing to do */
> +	if (pm && q->rpm_status != RPM_SUSPENDED)
> +		return 1;	/* Request allowed */
> +	pm_request_resume(q->dev);
> +	return 0;
>  }
>  
>  static inline void blk_pm_mark_last_busy(struct request *rq)
> @@ -44,8 +47,9 @@ static inline void blk_pm_put_request(st
>  		--rq->q->nr_pending;
>  }
>  #else
> -static inline void blk_pm_request_resume(struct request_queue *q)
> +static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
>  {
> +	return 1;
>  }
>  
>  static inline void blk_pm_mark_last_busy(struct request *rq)
>
Martin Kepplinger July 30, 2020, 8:52 a.m. UTC | #41
On 29.07.20 20:29, James Bottomley wrote:
> On Wed, 2020-07-29 at 14:25 -0400, Alan Stern wrote:
>> On Wed, Jul 29, 2020 at 06:43:48PM +0200, Martin Kepplinger wrote:
> [...]
>>>>> --- a/drivers/scsi/scsi_error.c
>>>>> +++ b/drivers/scsi/scsi_error.c
>>>>> @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd
>>>>> *scmd)
>>>>>                  * so that we can deal with it there.
>>>>>                  */
>>>>>                 if (scmd->device->expecting_cc_ua) {
>>>>> -                       /*
>>>>> -                        * Because some device does not queue
>>>>> unit
>>>>> -                        * attentions correctly, we carefully
>>>>> check
>>>>> -                        * additional sense code and qualifier
>>>>> so as
>>>>> -                        * not to squash media change unit
>>>>> attention.
>>>>> -                        */
>>>>> -                       if (sshdr.asc != 0x28 || sshdr.ascq !=
>>>>> 0x00)
>>>>> {
>>>>> -                               scmd->device->expecting_cc_ua =
>>>>> 0;
>>>>> -                               return NEEDS_RETRY;
>>>>> -                       }
>>>>> +                       scmd->device->expecting_cc_ua = 0;
>>>>> +                       return NEEDS_RETRY;
>>>>
>>>> Well, yes, but you can't do this because it would lose us media
>>>> change events in the non-suspend/resume case which we really
>>>> don't want.  That's why I was suggesting a new flag.
>>>>
>>>> James
>>>
>>> also if I set expecting_cc_ua in resume() only, like I did?
>>
>> That wouldn't make any difference.  The information sent by your
>> card reader has sshdr.asc == 0x28 and sshdr.ascq == 0x00 (you can see
>> it in the log).  So because of the code here in scsi_check_sense(),
>> which you can't change, the Unit Attention sent by the card reader
>> would not be  retried even if you do set the flag in resume().
> 
> But if we had a new flag, like expecting_media_change, you could set
> that in resume and we could condition the above test in the code on it
> and reset it and do a retry if it gets set.  I'm not saying we have to
> do it this way, but it sounds like we have to do something in the
> kernel, so I think the flag will become necessary but there might be a
> bit of policy based dance around how it gets set in the kernel (to
> avoid losing accurate media change events).
> 
> James
> 

Maybe I should just start a new discussion with a patch, but the below
is what makes sense to me (when I understand you correctly) and seems to
work. I basically add a new flag, so that the old flags behave unchanged
and only call it during *runtime* resume for SD cards:


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -553,15 +553,21 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
                 * information that we should pass up to the upper-level
driver
                 * so that we can deal with it there.
                 */
-               if (scmd->device->expecting_cc_ua) {
+               if (scmd->device->expecting_cc_ua ||
+                   scmd->device->expecting_media_change) {
                        /*
                         * Because some device does not queue unit
                         * attentions correctly, we carefully check
                         * additional sense code and qualifier so as
-                        * not to squash media change unit attention.
+                        * not to squash media change unit attention;
+                        * unless expecting_media_change is set, indicating
+                        * that the media (most likely) didn't change
+                        * but a device only believes so (for example
+                        * because of suspend/resume).
                         */
-                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
-                               scmd->device->expecting_cc_ua = 0;
+                       if ((sshdr.asc != 0x28 || sshdr.ascq != 0x00) ||
+                           scmd->device->expecting_media_change) {
+                               scmd->device->expecting_media_change = 0;
                                return NEEDS_RETRY;
                        }
                }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..b647fab2b663 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -114,6 +114,7 @@ static void sd_shutdown(struct device *);
 static int sd_suspend_system(struct device *);
 static int sd_suspend_runtime(struct device *);
 static int sd_resume(struct device *);
+static int sd_resume_runtime(struct device *);
 static void sd_rescan(struct device *);
 static blk_status_t sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
@@ -574,7 +575,7 @@ static const struct dev_pm_ops sd_pm_ops = {
        .poweroff               = sd_suspend_system,
        .restore                = sd_resume,
        .runtime_suspend        = sd_suspend_runtime,
-       .runtime_resume         = sd_resume,
+       .runtime_resume         = sd_resume_runtime,
 };

 static struct scsi_driver sd_template = {
@@ -3652,6 +3653,21 @@ static int sd_resume(struct device *dev)
        return ret;
 }

+static int sd_resume_runtime(struct device *dev)
+{
+       struct scsi_disk *sdkp = dev_get_drvdata(dev);
+
+       /* Some SD cardreaders report media change when resuming from
suspend
+        * because they can't keep track during suspend. */
+
+       /* XXX This is not unproblematic though: We won't notice when a card
+        * was really changed during runtime suspend! We basically rely
on users
+        * to unmount or suspend before doing so. */
+       sdkp->device->expecting_media_change = 1;
+
+       return sd_resume(dev);
+}
+
 /**
  *     init_sd - entry point for this driver (both when built in or when
  *     a module).
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..8c8f053f71c8 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,8 @@ struct scsi_device {
                                 * this device */
        unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
                                     * because we did a bus reset. */
+       unsigned expecting_media_change:1; /* Expecting media change
ASC/ASCQ
+                                             when it actually doesn't
change */
        unsigned use_10_for_rw:1; /* first try 10-byte read / write */
        unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
        unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
Martin Kepplinger July 30, 2020, 8:54 a.m. UTC | #42
On 30.07.20 10:52, Martin Kepplinger wrote:
> On 29.07.20 20:29, James Bottomley wrote:
>> On Wed, 2020-07-29 at 14:25 -0400, Alan Stern wrote:
>>> On Wed, Jul 29, 2020 at 06:43:48PM +0200, Martin Kepplinger wrote:
>> [...]
>>>>>> --- a/drivers/scsi/scsi_error.c
>>>>>> +++ b/drivers/scsi/scsi_error.c
>>>>>> @@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd
>>>>>> *scmd)
>>>>>>                  * so that we can deal with it there.
>>>>>>                  */
>>>>>>                 if (scmd->device->expecting_cc_ua) {
>>>>>> -                       /*
>>>>>> -                        * Because some device does not queue
>>>>>> unit
>>>>>> -                        * attentions correctly, we carefully
>>>>>> check
>>>>>> -                        * additional sense code and qualifier
>>>>>> so as
>>>>>> -                        * not to squash media change unit
>>>>>> attention.
>>>>>> -                        */
>>>>>> -                       if (sshdr.asc != 0x28 || sshdr.ascq !=
>>>>>> 0x00)
>>>>>> {
>>>>>> -                               scmd->device->expecting_cc_ua =
>>>>>> 0;
>>>>>> -                               return NEEDS_RETRY;
>>>>>> -                       }
>>>>>> +                       scmd->device->expecting_cc_ua = 0;
>>>>>> +                       return NEEDS_RETRY;
>>>>>
>>>>> Well, yes, but you can't do this because it would lose us media
>>>>> change events in the non-suspend/resume case which we really
>>>>> don't want.  That's why I was suggesting a new flag.
>>>>>
>>>>> James
>>>>
>>>> also if I set expecting_cc_ua in resume() only, like I did?
>>>
>>> That wouldn't make any difference.  The information sent by your
>>> card reader has sshdr.asc == 0x28 and sshdr.ascq == 0x00 (you can see
>>> it in the log).  So because of the code here in scsi_check_sense(),
>>> which you can't change, the Unit Attention sent by the card reader
>>> would not be  retried even if you do set the flag in resume().
>>
>> But if we had a new flag, like expecting_media_change, you could set
>> that in resume and we could condition the above test in the code on it
>> and reset it and do a retry if it gets set.  I'm not saying we have to
>> do it this way, but it sounds like we have to do something in the
>> kernel, so I think the flag will become necessary but there might be a
>> bit of policy based dance around how it gets set in the kernel (to
>> avoid losing accurate media change events).
>>
>> James
>>
> 
> Maybe I should just start a new discussion with a patch, but the below
> is what makes sense to me (when I understand you correctly) and seems to
> work. I basically add a new flag, so that the old flags behave unchanged
> and only call it during *runtime* resume for SD cards:
> 
> 
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -553,15 +553,21 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>                  * information that we should pass up to the upper-level
> driver
>                  * so that we can deal with it there.
>                  */
> -               if (scmd->device->expecting_cc_ua) {
> +               if (scmd->device->expecting_cc_ua ||
> +                   scmd->device->expecting_media_change) {
>                         /*
>                          * Because some device does not queue unit
>                          * attentions correctly, we carefully check
>                          * additional sense code and qualifier so as
> -                        * not to squash media change unit attention.
> +                        * not to squash media change unit attention;
> +                        * unless expecting_media_change is set, indicating
> +                        * that the media (most likely) didn't change
> +                        * but a device only believes so (for example
> +                        * because of suspend/resume).
>                          */
> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
> -                               scmd->device->expecting_cc_ua = 0;
> +                       if ((sshdr.asc != 0x28 || sshdr.ascq != 0x00) ||
> +                           scmd->device->expecting_media_change) {
> +                               scmd->device->expecting_media_change = 0;

oops, I missed expecting_cc_ua here, but you get the idea.

>                                 return NEEDS_RETRY;
>                         }
>                 }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d90fefffe31b..b647fab2b663 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -114,6 +114,7 @@ static void sd_shutdown(struct device *);
>  static int sd_suspend_system(struct device *);
>  static int sd_suspend_runtime(struct device *);
>  static int sd_resume(struct device *);
> +static int sd_resume_runtime(struct device *);
>  static void sd_rescan(struct device *);
>  static blk_status_t sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
> @@ -574,7 +575,7 @@ static const struct dev_pm_ops sd_pm_ops = {
>         .poweroff               = sd_suspend_system,
>         .restore                = sd_resume,
>         .runtime_suspend        = sd_suspend_runtime,
> -       .runtime_resume         = sd_resume,
> +       .runtime_resume         = sd_resume_runtime,
>  };
> 
>  static struct scsi_driver sd_template = {
> @@ -3652,6 +3653,21 @@ static int sd_resume(struct device *dev)
>         return ret;
>  }
> 
> +static int sd_resume_runtime(struct device *dev)
> +{
> +       struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +
> +       /* Some SD cardreaders report media change when resuming from
> suspend
> +        * because they can't keep track during suspend. */
> +
> +       /* XXX This is not unproblematic though: We won't notice when a card
> +        * was really changed during runtime suspend! We basically rely
> on users
> +        * to unmount or suspend before doing so. */
> +       sdkp->device->expecting_media_change = 1;
> +
> +       return sd_resume(dev);
> +}
> +
>  /**
>   *     init_sd - entry point for this driver (both when built in or when
>   *     a module).
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index bc5909033d13..8c8f053f71c8 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -169,6 +169,8 @@ struct scsi_device {
>                                  * this device */
>         unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>                                      * because we did a bus reset. */
> +       unsigned expecting_media_change:1; /* Expecting media change
> ASC/ASCQ
> +                                             when it actually doesn't
> change */
>         unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>         unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>         unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
>
Alan Stern July 30, 2020, 3:10 p.m. UTC | #43
On Thu, Jul 30, 2020 at 10:52:14AM +0200, Martin Kepplinger wrote:
> Maybe I should just start a new discussion with a patch, but the below
> is what makes sense to me (when I understand you correctly) and seems to
> work. I basically add a new flag, so that the old flags behave unchanged
> and only call it during *runtime* resume for SD cards:
> 
> 
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -553,15 +553,21 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>                  * information that we should pass up to the upper-level
> driver
>                  * so that we can deal with it there.
>                  */
> -               if (scmd->device->expecting_cc_ua) {
> +               if (scmd->device->expecting_cc_ua ||
> +                   scmd->device->expecting_media_change) {
>                         /*
>                          * Because some device does not queue unit
>                          * attentions correctly, we carefully check
>                          * additional sense code and qualifier so as
> -                        * not to squash media change unit attention.
> +                        * not to squash media change unit attention;
> +                        * unless expecting_media_change is set, indicating
> +                        * that the media (most likely) didn't change
> +                        * but a device only believes so (for example
> +                        * because of suspend/resume).
>                          */
> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
> -                               scmd->device->expecting_cc_ua = 0;
> +                       if ((sshdr.asc != 0x28 || sshdr.ascq != 0x00) ||
> +                           scmd->device->expecting_media_change) {
> +                               scmd->device->expecting_media_change = 0;
>                                 return NEEDS_RETRY;
>                         }
>                 }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d90fefffe31b..b647fab2b663 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -114,6 +114,7 @@ static void sd_shutdown(struct device *);
>  static int sd_suspend_system(struct device *);
>  static int sd_suspend_runtime(struct device *);
>  static int sd_resume(struct device *);
> +static int sd_resume_runtime(struct device *);
>  static void sd_rescan(struct device *);
>  static blk_status_t sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
> @@ -574,7 +575,7 @@ static const struct dev_pm_ops sd_pm_ops = {
>         .poweroff               = sd_suspend_system,
>         .restore                = sd_resume,
>         .runtime_suspend        = sd_suspend_runtime,
> -       .runtime_resume         = sd_resume,
> +       .runtime_resume         = sd_resume_runtime,
>  };
> 
>  static struct scsi_driver sd_template = {
> @@ -3652,6 +3653,21 @@ static int sd_resume(struct device *dev)
>         return ret;
>  }
> 
> +static int sd_resume_runtime(struct device *dev)
> +{
> +       struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +
> +       /* Some SD cardreaders report media change when resuming from
> suspend
> +        * because they can't keep track during suspend. */
> +
> +       /* XXX This is not unproblematic though: We won't notice when a card
> +        * was really changed during runtime suspend! We basically rely
> on users
> +        * to unmount or suspend before doing so. */
> +       sdkp->device->expecting_media_change = 1;
> +
> +       return sd_resume(dev);
> +}
> +
>  /**
>   *     init_sd - entry point for this driver (both when built in or when
>   *     a module).
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index bc5909033d13..8c8f053f71c8 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -169,6 +169,8 @@ struct scsi_device {
>                                  * this device */
>         unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>                                      * because we did a bus reset. */
> +       unsigned expecting_media_change:1; /* Expecting media change
> ASC/ASCQ
> +                                             when it actually doesn't
> change */
>         unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>         unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>         unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */

That's pretty much what James was suggesting, except for one thing: You 
must not set sdkp->device->expecting_media_change to 1 for all devices 
in sd_runtime_resume().  Only for devices which may generate a spurious 
Unit Attention following runtime resume -- and maybe not even for all of 
them, depending on what the user wants.

Alan Stern
Alan Stern July 30, 2020, 3:14 p.m. UTC | #44
On Thu, Jul 30, 2020 at 10:05:50AM +0200, Martin Kepplinger wrote:
> On 29.06.20 18:15, Alan Stern wrote:
> > On Mon, Jun 29, 2020 at 11:42:59AM +0200, Martin Kepplinger wrote:
> >>
> >>
> >> On 26.06.20 17:44, Alan Stern wrote:
> >>> Martin's best approach would be to add some debugging code to find out why 
> >>> blk_queue_enter() isn't calling bkl_pm_request_resume(), or why that call 
> >>> doesn't lead to pm_request_resume().
> >>>
> >>> Alan Stern
> >>>
> >>
> >> Hi Alan,
> >>
> >> blk_queue_enter() always - especially when sd is runtime suspended and I
> >> try to mount as above - sets success to be true for me, so never
> >> continues down to bkl_pm_request_resume(). All I see is "PM: Removing
> >> info for No Bus:sda1".
> > 
> > Aha.  Looking at this more closely, it's apparent that the code in 
> > blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT 
> > flag is set then the request can be issued regardless of the queue's 
> > runtime status.  That is not correct when the queue is suspended.
> > 
> > Below is my attempt to fix this up.  I'm not sure that the patch is 
> > entirely correct, but it should fix this logic bug.  I would appreciate a 
> > critical review.
> > 
> > Martin, does this fix the problem?
> > 
> > Alan Stern
> 
> Hi Alan,
> 
> So in the block layer your change below indeed fixes the problem and if
> you want to submit that 1:1 feel free to add
> 
> Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
> 
> thanks for your help in this!

Thank you for _your_ help!

The next merge window is coming up soon.  I think I'll wait until it is 
over before submitting the patch (maintainers tend to be too busy to 
consider new patches during a merge window).

But I am still open to comments or criticism of the patch in the 
meantime.  There hasn't been any feedback since Bart's initial set of 
questions.

Alan Stern
Martin Kepplinger Aug. 4, 2020, 9:39 a.m. UTC | #45
On 30.07.20 17:10, Alan Stern wrote:
> On Thu, Jul 30, 2020 at 10:52:14AM +0200, Martin Kepplinger wrote:
>> Maybe I should just start a new discussion with a patch, but the below
>> is what makes sense to me (when I understand you correctly) and seems to
>> work. I basically add a new flag, so that the old flags behave unchanged
>> and only call it during *runtime* resume for SD cards:
>>
>>
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -553,15 +553,21 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>                  * information that we should pass up to the upper-level
>> driver
>>                  * so that we can deal with it there.
>>                  */
>> -               if (scmd->device->expecting_cc_ua) {
>> +               if (scmd->device->expecting_cc_ua ||
>> +                   scmd->device->expecting_media_change) {
>>                         /*
>>                          * Because some device does not queue unit
>>                          * attentions correctly, we carefully check
>>                          * additional sense code and qualifier so as
>> -                        * not to squash media change unit attention.
>> +                        * not to squash media change unit attention;
>> +                        * unless expecting_media_change is set, indicating
>> +                        * that the media (most likely) didn't change
>> +                        * but a device only believes so (for example
>> +                        * because of suspend/resume).
>>                          */
>> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
>> -                               scmd->device->expecting_cc_ua = 0;
>> +                       if ((sshdr.asc != 0x28 || sshdr.ascq != 0x00) ||
>> +                           scmd->device->expecting_media_change) {
>> +                               scmd->device->expecting_media_change = 0;
>>                                 return NEEDS_RETRY;
>>                         }
>>                 }
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d90fefffe31b..b647fab2b663 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -114,6 +114,7 @@ static void sd_shutdown(struct device *);
>>  static int sd_suspend_system(struct device *);
>>  static int sd_suspend_runtime(struct device *);
>>  static int sd_resume(struct device *);
>> +static int sd_resume_runtime(struct device *);
>>  static void sd_rescan(struct device *);
>>  static blk_status_t sd_init_command(struct scsi_cmnd *SCpnt);
>>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>> @@ -574,7 +575,7 @@ static const struct dev_pm_ops sd_pm_ops = {
>>         .poweroff               = sd_suspend_system,
>>         .restore                = sd_resume,
>>         .runtime_suspend        = sd_suspend_runtime,
>> -       .runtime_resume         = sd_resume,
>> +       .runtime_resume         = sd_resume_runtime,
>>  };
>>
>>  static struct scsi_driver sd_template = {
>> @@ -3652,6 +3653,21 @@ static int sd_resume(struct device *dev)
>>         return ret;
>>  }
>>
>> +static int sd_resume_runtime(struct device *dev)
>> +{
>> +       struct scsi_disk *sdkp = dev_get_drvdata(dev);
>> +
>> +       /* Some SD cardreaders report media change when resuming from
>> suspend
>> +        * because they can't keep track during suspend. */
>> +
>> +       /* XXX This is not unproblematic though: We won't notice when a card
>> +        * was really changed during runtime suspend! We basically rely
>> on users
>> +        * to unmount or suspend before doing so. */
>> +       sdkp->device->expecting_media_change = 1;
>> +
>> +       return sd_resume(dev);
>> +}
>> +
>>  /**
>>   *     init_sd - entry point for this driver (both when built in or when
>>   *     a module).
>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>> index bc5909033d13..8c8f053f71c8 100644
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -169,6 +169,8 @@ struct scsi_device {
>>                                  * this device */
>>         unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>>                                      * because we did a bus reset. */
>> +       unsigned expecting_media_change:1; /* Expecting media change
>> ASC/ASCQ
>> +                                             when it actually doesn't
>> change */
>>         unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>>         unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>>         unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
> 
> That's pretty much what James was suggesting, except for one thing: You 
> must not set sdkp->device->expecting_media_change to 1 for all devices 
> in sd_runtime_resume().  Only for devices which may generate a spurious 
> Unit Attention following runtime resume -- and maybe not even for all of 
> them, depending on what the user wants.
> 
> Alan Stern
> 

when I mount the SD card myself or via Nautilus, things work. When I put
sth like:

/dev/sda1 /mnt/sda1 auto auto,nofail 0 2

into fstab, I *still* get (constantly) when accessing the files:

[   50.838061] sd 0:0:0:0: [sda] tag#0 device offline or changed

why could that be? is there another place we would add such a new flag
(not only resume())?

                              martin
Martin Kepplinger Aug. 7, 2020, 9:51 a.m. UTC | #46
On 04.08.20 11:39, Martin Kepplinger wrote:
> On 30.07.20 17:10, Alan Stern wrote:
>> On Thu, Jul 30, 2020 at 10:52:14AM +0200, Martin Kepplinger wrote:
>>> Maybe I should just start a new discussion with a patch, but the below
>>> is what makes sense to me (when I understand you correctly) and seems to
>>> work. I basically add a new flag, so that the old flags behave unchanged
>>> and only call it during *runtime* resume for SD cards:
>>>
>>>
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -553,15 +553,21 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>>                  * information that we should pass up to the upper-level
>>> driver
>>>                  * so that we can deal with it there.
>>>                  */
>>> -               if (scmd->device->expecting_cc_ua) {
>>> +               if (scmd->device->expecting_cc_ua ||
>>> +                   scmd->device->expecting_media_change) {
>>>                         /*
>>>                          * Because some device does not queue unit
>>>                          * attentions correctly, we carefully check
>>>                          * additional sense code and qualifier so as
>>> -                        * not to squash media change unit attention.
>>> +                        * not to squash media change unit attention;
>>> +                        * unless expecting_media_change is set, indicating
>>> +                        * that the media (most likely) didn't change
>>> +                        * but a device only believes so (for example
>>> +                        * because of suspend/resume).
>>>                          */
>>> -                       if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
>>> -                               scmd->device->expecting_cc_ua = 0;
>>> +                       if ((sshdr.asc != 0x28 || sshdr.ascq != 0x00) ||
>>> +                           scmd->device->expecting_media_change) {
>>> +                               scmd->device->expecting_media_change = 0;
>>>                                 return NEEDS_RETRY;
>>>                         }
>>>                 }
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index d90fefffe31b..b647fab2b663 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -114,6 +114,7 @@ static void sd_shutdown(struct device *);
>>>  static int sd_suspend_system(struct device *);
>>>  static int sd_suspend_runtime(struct device *);
>>>  static int sd_resume(struct device *);
>>> +static int sd_resume_runtime(struct device *);
>>>  static void sd_rescan(struct device *);
>>>  static blk_status_t sd_init_command(struct scsi_cmnd *SCpnt);
>>>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>>> @@ -574,7 +575,7 @@ static const struct dev_pm_ops sd_pm_ops = {
>>>         .poweroff               = sd_suspend_system,
>>>         .restore                = sd_resume,
>>>         .runtime_suspend        = sd_suspend_runtime,
>>> -       .runtime_resume         = sd_resume,
>>> +       .runtime_resume         = sd_resume_runtime,
>>>  };
>>>
>>>  static struct scsi_driver sd_template = {
>>> @@ -3652,6 +3653,21 @@ static int sd_resume(struct device *dev)
>>>         return ret;
>>>  }
>>>
>>> +static int sd_resume_runtime(struct device *dev)
>>> +{
>>> +       struct scsi_disk *sdkp = dev_get_drvdata(dev);
>>> +
>>> +       /* Some SD cardreaders report media change when resuming from
>>> suspend
>>> +        * because they can't keep track during suspend. */
>>> +
>>> +       /* XXX This is not unproblematic though: We won't notice when a card
>>> +        * was really changed during runtime suspend! We basically rely
>>> on users
>>> +        * to unmount or suspend before doing so. */
>>> +       sdkp->device->expecting_media_change = 1;
>>> +
>>> +       return sd_resume(dev);
>>> +}
>>> +
>>>  /**
>>>   *     init_sd - entry point for this driver (both when built in or when
>>>   *     a module).
>>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>>> index bc5909033d13..8c8f053f71c8 100644
>>> --- a/include/scsi/scsi_device.h
>>> +++ b/include/scsi/scsi_device.h
>>> @@ -169,6 +169,8 @@ struct scsi_device {
>>>                                  * this device */
>>>         unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>>>                                      * because we did a bus reset. */
>>> +       unsigned expecting_media_change:1; /* Expecting media change
>>> ASC/ASCQ
>>> +                                             when it actually doesn't
>>> change */
>>>         unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>>>         unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>>>         unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
>>
>> That's pretty much what James was suggesting, except for one thing: You 
>> must not set sdkp->device->expecting_media_change to 1 for all devices 
>> in sd_runtime_resume().  Only for devices which may generate a spurious 
>> Unit Attention following runtime resume -- and maybe not even for all of 
>> them, depending on what the user wants.
>>
>> Alan Stern
>>
> 
> when I mount the SD card myself or via Nautilus, things work. When I put
> sth like:
> 
> /dev/sda1 /mnt/sda1 auto auto,nofail 0 2
> 
> into fstab, I *still* get (constantly) when accessing the files:
> 
> [   50.838061] sd 0:0:0:0: [sda] tag#0 device offline or changed
> 
> why could that be? is there another place we would add such a new flag
> (not only resume())?
> 
>                               martin
> 


it's really strange: below is the change I'm trying. Of course that's
only for testing the functionality, nothing how a patch could look like.

While I remember it had worked, now (weirdly since I tried that mounting
via fstab) it doesn't anymore!

What I understand (not much): I handle the error with "retry" via the
new flag, but scsi_decide_disposition() returns SUCCESS because of "no
more retries"; but it's the first and only time it's called.

How can this be? What am I missing?



--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,13 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 				return NEEDS_RETRY;
 			}
 		}
+		if (scmd->device->expecting_media_change) {
+			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+				scmd->device->expecting_media_change = 0;
+				return NEEDS_RETRY;
+			}
+		}
+
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
 		 * reported a UA with an ASC/ASCQ of 3F 0E -
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;

+	sdkp->device->expecting_media_change = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
 				 * this device */
 	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
 				     * because we did a bus reset. */
+	unsigned expecting_media_change:1;
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
Alan Stern Aug. 7, 2020, 2:30 p.m. UTC | #47
On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
> it's really strange: below is the change I'm trying. Of course that's
> only for testing the functionality, nothing how a patch could look like.
> 
> While I remember it had worked, now (weirdly since I tried that mounting
> via fstab) it doesn't anymore!
> 
> What I understand (not much): I handle the error with "retry" via the
> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
> more retries"; but it's the first and only time it's called.

Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
returning a nonzero value?  Whichever is true, why does it happen that 
way?

What is the failing command?  Is it a READ(10)?

> How can this be? What am I missing?

It's kind of hard to tell without seeing the error messages or system 
log or any debugging information.

Alan Stern

> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -565,6 +565,13 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>  				return NEEDS_RETRY;
>  			}
>  		}
> +		if (scmd->device->expecting_media_change) {
> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
> +				scmd->device->expecting_media_change = 0;
> +				return NEEDS_RETRY;
> +			}
> +		}
> +
>  		/*
>  		 * we might also expect a cc/ua if another LUN on the target
>  		 * reported a UA with an ASC/ASCQ of 3F 0E -
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d90fefffe31b..bb583e403b81 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
>  	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
>  		return 0;
> 
> +	sdkp->device->expecting_media_change = 1;
> +
>  	if (!sdkp->device->manage_start_stop)
>  		return 0;
> 
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index bc5909033d13..f5fc1af68e00 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -169,6 +169,7 @@ struct scsi_device {
>  				 * this device */
>  	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>  				     * because we did a bus reset. */
> +	unsigned expecting_media_change:1;
>  	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>  	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>  	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
Martin Kepplinger Aug. 8, 2020, 6:59 a.m. UTC | #48
On 07.08.20 16:30, Alan Stern wrote:
> On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
>> it's really strange: below is the change I'm trying. Of course that's
>> only for testing the functionality, nothing how a patch could look like.
>>
>> While I remember it had worked, now (weirdly since I tried that mounting
>> via fstab) it doesn't anymore!
>>
>> What I understand (not much): I handle the error with "retry" via the
>> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
>> more retries"; but it's the first and only time it's called.
> 
> Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
> returning a nonzero value?  Whichever is true, why does it happen that 
> way?

scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed).

why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
(status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true,
then at the end it returns 1 because of REQ_FAILFAST_DEV.

that seems to come from the block layer. why and when? could I change
that so that the scsi error handling stays in control?

> 
> What is the failing command?  Is it a READ(10)?

Not sure how I'd answer that, but here's the test to trigger the error:

mount /dev/sda1 /mnt
cd /mnt
ls
cp file ~/ (if ls "works" and doesn't yet trigger the error)

and that's the (familiar looking) logs when doing so. again: despite the
mentioned workaround in scsi_error and the new expected_media_change
flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems
to override what I want to do is scsi_error:

[   55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x00 driverbyte=0x08 cmd_age=0s
[   55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
[   55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
[   55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc
e0 00 00 01 00
[   55.557666] blk_update_request: I/O error, dev sda, sector 589024 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   55.574691] blk_update_request: I/O error, dev sda, sector 589025 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
[   55.591562] blk_update_request: I/O error, dev sda, sector 589026 op
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
(... goes on with the same)


> 
>> How can this be? What am I missing?
> 
> It's kind of hard to tell without seeing the error messages or system 
> log or any debugging information.

thanks a lot for getting back to me,

                           martin


> 
> Alan Stern
> 
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -565,6 +565,13 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>  				return NEEDS_RETRY;
>>  			}
>>  		}
>> +		if (scmd->device->expecting_media_change) {
>> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
>> +				scmd->device->expecting_media_change = 0;
>> +				return NEEDS_RETRY;
>> +			}
>> +		}
>> +
>>  		/*
>>  		 * we might also expect a cc/ua if another LUN on the target
>>  		 * reported a UA with an ASC/ASCQ of 3F 0E -
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d90fefffe31b..bb583e403b81 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
>>  	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
>>  		return 0;
>>
>> +	sdkp->device->expecting_media_change = 1;
>> +
>>  	if (!sdkp->device->manage_start_stop)
>>  		return 0;
>>
>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>> index bc5909033d13..f5fc1af68e00 100644
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -169,6 +169,7 @@ struct scsi_device {
>>  				 * this device */
>>  	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
>>  				     * because we did a bus reset. */
>> +	unsigned expecting_media_change:1;
>>  	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
>>  	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
>>  	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
Alan Stern Aug. 8, 2020, 3:05 p.m. UTC | #49
On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote:
> On 07.08.20 16:30, Alan Stern wrote:
> > On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
> >> it's really strange: below is the change I'm trying. Of course that's
> >> only for testing the functionality, nothing how a patch could look like.
> >>
> >> While I remember it had worked, now (weirdly since I tried that mounting
> >> via fstab) it doesn't anymore!
> >>
> >> What I understand (not much): I handle the error with "retry" via the
> >> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
> >> more retries"; but it's the first and only time it's called.
> > 
> > Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
> > returning a nonzero value?  Whichever is true, why does it happen that 
> > way?
> 
> scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed).
> 
> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
> (status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true,
> then at the end it returns 1 because of REQ_FAILFAST_DEV.
> 
> that seems to come from the block layer. why and when? could I change
> that so that the scsi error handling stays in control?

The only place I see where that flag might get set is in 
blk_mq_bio_to_request() in block/blk-mq.c, which does:

	if (bio->bi_opf & REQ_RAHEAD)
		rq->cmd_flags |= REQ_FAILFAST_MASK;

So apparently read-ahead reads are supposed to fail fast (i.e., without 
retries), presumably because they are optional after all.

> > What is the failing command?  Is it a READ(10)?
> 
> Not sure how I'd answer that, but here's the test to trigger the error:
> 
> mount /dev/sda1 /mnt
> cd /mnt
> ls
> cp file ~/ (if ls "works" and doesn't yet trigger the error)
> 
> and that's the (familiar looking) logs when doing so. again: despite the
> mentioned workaround in scsi_error and the new expected_media_change
> flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems
> to override what I want to do is scsi_error:
> 
> [   55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> [   55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> [   55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> [   55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc
> e0 00 00 01 00

Yes, 0x28 is READ(10).  Likely this is a read-ahead request, although I 
don't know how we can tell for sure.

> [   55.557666] blk_update_request: I/O error, dev sda, sector 589024 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   55.574691] blk_update_request: I/O error, dev sda, sector 589025 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [   55.591562] blk_update_request: I/O error, dev sda, sector 589026 op
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
> (... goes on with the same)

Is such a drastic response really appropriate for the failure of a 
read-ahead request?  It seems like a more logical response would be to 
let the request fail but keep the device online.

Of course, that would only solve part of your problem -- your log would 
still get filled with those "I/O error" messages even though they 
wouldn't be fatal.  Probably a better approach would be to make the new 
expecting_media_change flag override scsi_no_retry_cmd().

But this is not my area of expertise.  Maybe someone else will have more 
to say.

Alan Stern
Martin Kepplinger Aug. 9, 2020, 9:20 a.m. UTC | #50
On 08.08.20 17:05, Alan Stern wrote:
> On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote:
>> On 07.08.20 16:30, Alan Stern wrote:
>>> On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
>>>> it's really strange: below is the change I'm trying. Of course that's
>>>> only for testing the functionality, nothing how a patch could look like.
>>>>
>>>> While I remember it had worked, now (weirdly since I tried that mounting
>>>> via fstab) it doesn't anymore!
>>>>
>>>> What I understand (not much): I handle the error with "retry" via the
>>>> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
>>>> more retries"; but it's the first and only time it's called.
>>>
>>> Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
>>> returning a nonzero value?  Whichever is true, why does it happen that 
>>> way?
>>
>> scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed).
>>
>> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
>> (status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true,
>> then at the end it returns 1 because of REQ_FAILFAST_DEV.
>>
>> that seems to come from the block layer. why and when? could I change
>> that so that the scsi error handling stays in control?
> 
> The only place I see where that flag might get set is in 
> blk_mq_bio_to_request() in block/blk-mq.c, which does:
> 
> 	if (bio->bi_opf & REQ_RAHEAD)
> 		rq->cmd_flags |= REQ_FAILFAST_MASK;
> 
> So apparently read-ahead reads are supposed to fail fast (i.e., without 
> retries), presumably because they are optional after all.
> 
>>> What is the failing command?  Is it a READ(10)?
>>
>> Not sure how I'd answer that, but here's the test to trigger the error:
>>
>> mount /dev/sda1 /mnt
>> cd /mnt
>> ls
>> cp file ~/ (if ls "works" and doesn't yet trigger the error)
>>
>> and that's the (familiar looking) logs when doing so. again: despite the
>> mentioned workaround in scsi_error and the new expected_media_change
>> flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems
>> to override what I want to do is scsi_error:
>>
>> [   55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc
>> e0 00 00 01 00
> 
> Yes, 0x28 is READ(10).  Likely this is a read-ahead request, although I 
> don't know how we can tell for sure.
> 
>> [   55.557666] blk_update_request: I/O error, dev sda, sector 589024 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.574691] blk_update_request: I/O error, dev sda, sector 589025 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.591562] blk_update_request: I/O error, dev sda, sector 589026 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (... goes on with the same)
> 
> Is such a drastic response really appropriate for the failure of a 
> read-ahead request?  It seems like a more logical response would be to 
> let the request fail but keep the device online.
> 
> Of course, that would only solve part of your problem -- your log would 
> still get filled with those "I/O error" messages even though they 
> wouldn't be fatal.  Probably a better approach would be to make the new 
> expecting_media_change flag override scsi_no_retry_cmd().
> 
> But this is not my area of expertise.  Maybe someone else will have more 
> to say.
> 
> Alan Stern
> 

Hey Alan, I'm really glad for that, I suspected some of this but I have
little experience in scsi/block layers, so that is super helpful.

I'd appreciate an opinion on the below workaround that *seems* to work
now (let's see, I had thought so before :)

Whether or not this helps to find a real solution, let's see. But
integration of such a flag in the error handling paths is what's
interesting for now:


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 				return NEEDS_RETRY;
 			}
 		}
+		if (scmd->device->expecting_media_change) {
+			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+				/* clear expecting_media_change in
+				 * scsi_noretry_cmd() because we need
+				 * to override possible "failfast" overrides
+				 * that block readahead can cause.
+				 */
+				return NEEDS_RETRY;
+			}
+		}
+
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
 		 * reported a UA with an ASC/ASCQ of 3F 0E -
@@ -1744,6 +1755,15 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DRIVER);
 	}

+	/*
+	 * We need to have retries when expecting_media_change is set.
+	 * So we need to return success and clear the flag here.
+	 */
+	if (scmd->device->expecting_media_change) {
+		scmd->device->expecting_media_change = 0;
+		return 0;
+	}
+
 	if (status_byte(scmd->result) != CHECK_CONDITION)
 		return 0;

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;

+	sdkp->device->expecting_media_change = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
 				 * this device */
 	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
 				     * because we did a bus reset. */
+	unsigned expecting_media_change:1;
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
--
Alan Stern Aug. 9, 2020, 3:26 p.m. UTC | #51
On Sun, Aug 09, 2020 at 11:20:22AM +0200, Martin Kepplinger wrote:
> Hey Alan, I'm really glad for that, I suspected some of this but I have
> little experience in scsi/block layers, so that is super helpful.
> 
> I'd appreciate an opinion on the below workaround that *seems* to work
> now (let's see, I had thought so before :)
> 
> Whether or not this helps to find a real solution, let's see. But
> integration of such a flag in the error handling paths is what's
> interesting for now:
> 
> 
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>  				return NEEDS_RETRY;
>  			}
>  		}
> +		if (scmd->device->expecting_media_change) {
> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
> +				/* clear expecting_media_change in
> +				 * scsi_noretry_cmd() because we need
> +				 * to override possible "failfast" overrides
> +				 * that block readahead can cause.
> +				 */
> +				return NEEDS_RETRY;

This is a somewhat fragile approach.  You don't know for certain that 
scsi_noretry_cmd will be called.  Also, scsi_noretry_cmd can be called 
from other places.

It would be better to clear the expecting_media_change flag just before 
returning from scsi_decide_disposition.  That way its use is localized 
to one routine, not spread out between two.

Alan Stern
Martin Kepplinger Aug. 10, 2020, 12:03 p.m. UTC | #52
On 09.08.20 17:26, Alan Stern wrote:
> On Sun, Aug 09, 2020 at 11:20:22AM +0200, Martin Kepplinger wrote:
>> Hey Alan, I'm really glad for that, I suspected some of this but I have
>> little experience in scsi/block layers, so that is super helpful.
>>
>> I'd appreciate an opinion on the below workaround that *seems* to work
>> now (let's see, I had thought so before :)
>>
>> Whether or not this helps to find a real solution, let's see. But
>> integration of such a flag in the error handling paths is what's
>> interesting for now:
>>
>>
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>  				return NEEDS_RETRY;
>>  			}
>>  		}
>> +		if (scmd->device->expecting_media_change) {
>> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
>> +				/* clear expecting_media_change in
>> +				 * scsi_noretry_cmd() because we need
>> +				 * to override possible "failfast" overrides
>> +				 * that block readahead can cause.
>> +				 */
>> +				return NEEDS_RETRY;
> 
> This is a somewhat fragile approach.  You don't know for certain that 
> scsi_noretry_cmd will be called.  Also, scsi_noretry_cmd can be called 
> from other places.
> 
> It would be better to clear the expecting_media_change flag just before 
> returning from scsi_decide_disposition.  That way its use is localized 
> to one routine, not spread out between two.
> 
> Alan Stern
> 

Hi Alan,

maybe you're right. I initially just thought that I'd allow for specific
error codes in scsi_noretry_cmd() to return non-NULL (BUS_BUSY, PARITY,
ERROR) despite having the flag set.

The below version works equally fine for me but I'm not sure if it's
actually more safe.

James, when exposing a new writable sysfs option like
"suspend_no_media_change"(?) that drivers can check before setting the
new "expecting_media_change" flag (during resume), would this addition
make sense to you?

thanks,

                      martin



--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,18 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 				return NEEDS_RETRY;
 			}
 		}
+		if (scmd->device->expecting_media_change) {
+			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+				/*
+				 * clear the expecting_media_change in
+				 * scsi_decide_disposition() because we
+				 * need to catch possible "fail fast" overrides
+				 * that block readahead can cause.
+				 */
+				return NEEDS_RETRY;
+			}
+		}
+
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
 		 * reported a UA with an ASC/ASCQ of 3F 0E -
@@ -1944,9 +1956,19 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	 * the request was not marked fast fail.  Note that above,
 	 * even if the request is marked fast fail, we still requeue
 	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
-	if ((++scmd->retries) <= scmd->allowed
-	    && !scsi_noretry_cmd(scmd)) {
-		return NEEDS_RETRY;
+	if ((++scmd->retries) <= scmd->allowed) {
+		/*
+		 * but scsi_noretry_cmd() cannot override the
+		 * expecting_media_change flag.
+		 */
+		if (!scsi_noretry_cmd(scmd) ||
+		    scmd->device->expecting_media_change) {
+			scmd->device->expecting_media_change = 0;
+			return NEEDS_RETRY;
+		} else {
+			/* marked fast fail and not expected. */
+			return SUCCESS;
+		}
 	} else {
 		/*
 		 * no more retries - report this one back to upper level.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;

+	sdkp->device->expecting_media_change = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
 				 * this device */
 	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
 				     * because we did a bus reset. */
+	unsigned expecting_media_change:1;
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
--
Alan Stern Aug. 10, 2020, 2:13 p.m. UTC | #53
On Mon, Aug 10, 2020 at 02:03:17PM +0200, Martin Kepplinger wrote:
> On 09.08.20 17:26, Alan Stern wrote:
> > This is a somewhat fragile approach.  You don't know for certain that 
> > scsi_noretry_cmd will be called.  Also, scsi_noretry_cmd can be called 
> > from other places.
> > 
> > It would be better to clear the expecting_media_change flag just before 
> > returning from scsi_decide_disposition.  That way its use is localized 
> > to one routine, not spread out between two.
> > 
> > Alan Stern
> > 
> 
> Hi Alan,
> 
> maybe you're right. I initially just thought that I'd allow for specific
> error codes in scsi_noretry_cmd() to return non-NULL (BUS_BUSY, PARITY,
> ERROR) despite having the flag set.
> 
> The below version works equally fine for me but I'm not sure if it's
> actually more safe.
> 
> James, when exposing a new writable sysfs option like
> "suspend_no_media_change"(?) that drivers can check before setting the
> new "expecting_media_change" flag (during resume), would this addition
> make sense to you?
> 
> thanks,
> 
>                       martin
> 
> 
> 
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -565,6 +565,18 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>  				return NEEDS_RETRY;
>  			}
>  		}
> +		if (scmd->device->expecting_media_change) {
> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
> +				/*
> +				 * clear the expecting_media_change in
> +				 * scsi_decide_disposition() because we
> +				 * need to catch possible "fail fast" overrides
> +				 * that block readahead can cause.
> +				 */
> +				return NEEDS_RETRY;
> +			}
> +		}
> +
>  		/*
>  		 * we might also expect a cc/ua if another LUN on the target
>  		 * reported a UA with an ASC/ASCQ of 3F 0E -
> @@ -1944,9 +1956,19 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
>  	 * the request was not marked fast fail.  Note that above,
>  	 * even if the request is marked fast fail, we still requeue
>  	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
> -	if ((++scmd->retries) <= scmd->allowed
> -	    && !scsi_noretry_cmd(scmd)) {
> -		return NEEDS_RETRY;
> +	if ((++scmd->retries) <= scmd->allowed) {
> +		/*
> +		 * but scsi_noretry_cmd() cannot override the
> +		 * expecting_media_change flag.
> +		 */
> +		if (!scsi_noretry_cmd(scmd) ||
> +		    scmd->device->expecting_media_change) {
> +			scmd->device->expecting_media_change = 0;
> +			return NEEDS_RETRY;
> +		} else {
> +			/* marked fast fail and not expected. */
> +			return SUCCESS;
> +		}
>  	} else {

This may not matter...  but it's worth pointing out that 
expecting_media_change doesn't get cleared if ++scmd->retries > 
scmd->allowed.

It also doesn't get cleared in cases where the device _doesn't_ 
report a Unit Attention.

Alan Stern
Martin Kepplinger Aug. 11, 2020, 7:55 a.m. UTC | #54
On 10.08.20 16:13, Alan Stern wrote:
> On Mon, Aug 10, 2020 at 02:03:17PM +0200, Martin Kepplinger wrote:
>> On 09.08.20 17:26, Alan Stern wrote:
>>> This is a somewhat fragile approach.  You don't know for certain that 
>>> scsi_noretry_cmd will be called.  Also, scsi_noretry_cmd can be called 
>>> from other places.
>>>
>>> It would be better to clear the expecting_media_change flag just before 
>>> returning from scsi_decide_disposition.  That way its use is localized 
>>> to one routine, not spread out between two.
>>>
>>> Alan Stern
>>>
>>
>> Hi Alan,
>>
>> maybe you're right. I initially just thought that I'd allow for specific
>> error codes in scsi_noretry_cmd() to return non-NULL (BUS_BUSY, PARITY,
>> ERROR) despite having the flag set.
>>
>> The below version works equally fine for me but I'm not sure if it's
>> actually more safe.
>>
>> James, when exposing a new writable sysfs option like
>> "suspend_no_media_change"(?) that drivers can check before setting the
>> new "expecting_media_change" flag (during resume), would this addition
>> make sense to you?
>>
>> thanks,
>>
>>                       martin
>>
>>
>>
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -565,6 +565,18 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>  				return NEEDS_RETRY;
>>  			}
>>  		}
>> +		if (scmd->device->expecting_media_change) {
>> +			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
>> +				/*
>> +				 * clear the expecting_media_change in
>> +				 * scsi_decide_disposition() because we
>> +				 * need to catch possible "fail fast" overrides
>> +				 * that block readahead can cause.
>> +				 */
>> +				return NEEDS_RETRY;
>> +			}
>> +		}
>> +
>>  		/*
>>  		 * we might also expect a cc/ua if another LUN on the target
>>  		 * reported a UA with an ASC/ASCQ of 3F 0E -
>> @@ -1944,9 +1956,19 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
>>  	 * the request was not marked fast fail.  Note that above,
>>  	 * even if the request is marked fast fail, we still requeue
>>  	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
>> -	if ((++scmd->retries) <= scmd->allowed
>> -	    && !scsi_noretry_cmd(scmd)) {
>> -		return NEEDS_RETRY;
>> +	if ((++scmd->retries) <= scmd->allowed) {
>> +		/*
>> +		 * but scsi_noretry_cmd() cannot override the
>> +		 * expecting_media_change flag.
>> +		 */
>> +		if (!scsi_noretry_cmd(scmd) ||
>> +		    scmd->device->expecting_media_change) {
>> +			scmd->device->expecting_media_change = 0;
>> +			return NEEDS_RETRY;
>> +		} else {
>> +			/* marked fast fail and not expected. */
>> +			return SUCCESS;
>> +		}
>>  	} else {
> 
> This may not matter...  but it's worth pointing out that 
> expecting_media_change doesn't get cleared if ++scmd->retries > 
> scmd->allowed.

absolutely worth pointing out and I'm not yet sure about that one.

> 
> It also doesn't get cleared in cases where the device _doesn't_ 
> report a Unit Attention.

true. but don't we set the flag for a future UA we don't yet know of? If
we would want to clear it outside of a UA, I think we'd need to keep
track of a suspend/resume cycle and if we see that we *had* successfully
"done requests" after resuming, we could clear it...

> 
> Alan Stern
>
Alan Stern Aug. 11, 2020, 1:48 p.m. UTC | #55
On Tue, Aug 11, 2020 at 09:55:54AM +0200, Martin Kepplinger wrote:
> On 10.08.20 16:13, Alan Stern wrote:
> > This may not matter...  but it's worth pointing out that 
> > expecting_media_change doesn't get cleared if ++scmd->retries > 
> > scmd->allowed.
> 
> absolutely worth pointing out and I'm not yet sure about that one.
> 
> > 
> > It also doesn't get cleared in cases where the device _doesn't_ 
> > report a Unit Attention.
> 
> true. but don't we set the flag for a future UA we don't yet know of? If
> we would want to clear it outside of a UA, I think we'd need to keep
> track of a suspend/resume cycle and if we see that we *had* successfully
> "done requests" after resuming, we could clear it...

The point is that expecting_media_change should apply only to the _next_ 
command.  It should be cleared after _every_ command.  You do not want 
to leave it hanging around -- if you do then you will miss real media 
changes.

There's a potential issue when a bunch of commands get sent and 
completed all at once, but hopefully it won't matter here.

Alan Stern
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..fe4cb7c50ec1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1372,6 +1372,7 @@  static int sd_open(struct block_device *bdev, fmode_t mode)
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
 
 	sdev = sdkp->device;
+	scsi_autopm_get_device(sdev);
 
 	/*
 	 * If the device is in error recovery, wait until it is done.
@@ -1418,6 +1419,9 @@  static int sd_open(struct block_device *bdev, fmode_t mode)
 
 error_out:
 	scsi_disk_put(sdkp);
+
+	scsi_autopm_put_device(sdev);
+
 	return retval;	
 }
 
@@ -1441,6 +1445,8 @@  static void sd_release(struct gendisk *disk, fmode_t mode)
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
 
+	scsi_autopm_put_device(sdev);
+
 	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
 		if (scsi_block_when_processing_errors(sdev))
 			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);