Message ID | 20230731003956.572414-1-dlemoal@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | ata,scsi: do not issue START STOP UNIT on resume | expand |
Tested-by: Tanner Watkins On 7/30/23 18:39, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. > > Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > --- > drivers/ata/libata-scsi.c | 7 +++++++ > drivers/scsi/sd.c | 9 ++++++--- > include/scsi/scsi_device.h | 1 + > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 370d18aca71e..c6ece32de8e3 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) > } > } else { > sdev->sector_size = ata_id_logical_sector_size(dev->id); > + /* > + * Stop the drive on suspend but do not issue START STOP UNIT > + * on resume as this is not necessary and may fail: the device > + * will be woken up by ata_port_pm_resume() with a port reset > + * and device revalidation. > + */ > sdev->manage_start_stop = 1; > + sdev->no_start_on_resume = 1; > } > > /* > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 68b12afa0721..3c668cfb146d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) > static int sd_resume(struct device *dev) > { > struct scsi_disk *sdkp = dev_get_drvdata(dev); > - int ret; > + int ret = 0; > > if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ > return 0; > @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev) > if (!sdkp->device->manage_start_stop) > return 0; > > - sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > - ret = sd_start_stop_device(sdkp, 1); > + if (!sdkp->device->no_start_on_resume) { > + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > + ret = sd_start_stop_device(sdkp, 1); > + } > + > if (!ret) > opal_unlock_from_suspend(sdkp->opal_dev); > return ret; > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 75b2235b99e2..b9230b6add04 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -194,6 +194,7 @@ struct scsi_device { > unsigned no_start_on_add:1; /* do not issue start on add */ > unsigned allow_restart:1; /* issue START_UNIT in error handler */ > unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ > + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ > unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ > unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ > unsigned select_no_atn:1;
On 7/31/23 12:48, TW wrote: > Tested-by: Tanner Watkins A tested-by tag must have the form: Tested-by: Tanner Watkins <dalzot@gmail.com> Also, please do not top-post. That is, reply with your text (this tag), below the original message (which you can trim). Thanks. > > On 7/30/23 18:39, Damien Le Moal wrote: >> During system resume, ata_port_pm_resume() triggers ata EH to >> 1) Resume the controller >> 2) Reset and rescan the ports >> 3) Revalidate devices >> This EH execution is started asynchronously from ata_port_pm_resume(), >> which means that when sd_resume() is executed, none or only part of the >> above processing may have been executed. However, sd_resume() issues a >> START STOP UNIT to wake up the drive from sleep mode. This command is >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >> device. However, depending on the state of execution of the EH process >> and revalidation triggerred by ata_port_pm_resume(), two things may >> happen: >> 1) The START STOP UNIT fails if it is received before the controller has >> been reenabled at the beginning of the EH execution. This is visible >> with error messages like: >> >> ata10.00: device reported invalid CHS sector 0 >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >> sd 9:0:0:0: PM: failed to resume async: error -5 >> >> 2) The START STOP UNIT command is received while the EH process is >> on-going, which mean that it is stopped and must wait for its >> completion, at which point the command is rather useless as the drive >> is already fully spun up already. This case results also in a >> significant delay in sd_resume() which is observable by users as >> the entire system resume completion is delayed. >> >> Given that ATA devices will be woken up by libata activity on resume, >> sd_resume() has no need to issue a START STOP UNIT command, which solves >> the above mentioned problems. Do not issue this command by introducing >> the new scsi_device flag no_start_on_resume and setting this flag to 1 >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >> UNIT command only if this flag is not set. >> >> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 >> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> >> --- >> drivers/ata/libata-scsi.c | 7 +++++++ >> drivers/scsi/sd.c | 9 ++++++--- >> include/scsi/scsi_device.h | 1 + >> 3 files changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >> index 370d18aca71e..c6ece32de8e3 100644 >> --- a/drivers/ata/libata-scsi.c >> +++ b/drivers/ata/libata-scsi.c >> @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) >> } >> } else { >> sdev->sector_size = ata_id_logical_sector_size(dev->id); >> + /* >> + * Stop the drive on suspend but do not issue START STOP UNIT >> + * on resume as this is not necessary and may fail: the device >> + * will be woken up by ata_port_pm_resume() with a port reset >> + * and device revalidation. >> + */ >> sdev->manage_start_stop = 1; >> + sdev->no_start_on_resume = 1; >> } >> >> /* >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index 68b12afa0721..3c668cfb146d 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) >> static int sd_resume(struct device *dev) >> { >> struct scsi_disk *sdkp = dev_get_drvdata(dev); >> - int ret; >> + int ret = 0; >> >> if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ >> return 0; >> @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev) >> if (!sdkp->device->manage_start_stop) >> return 0; >> >> - sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); >> - ret = sd_start_stop_device(sdkp, 1); >> + if (!sdkp->device->no_start_on_resume) { >> + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); >> + ret = sd_start_stop_device(sdkp, 1); >> + } >> + >> if (!ret) >> opal_unlock_from_suspend(sdkp->opal_dev); >> return ret; >> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h >> index 75b2235b99e2..b9230b6add04 100644 >> --- a/include/scsi/scsi_device.h >> +++ b/include/scsi/scsi_device.h >> @@ -194,6 +194,7 @@ struct scsi_device { >> unsigned no_start_on_add:1; /* do not issue start on add */ >> unsigned allow_restart:1; /* issue START_UNIT in error handler */ >> unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ >> + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ >> unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ >> unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ >> unsigned select_no_atn:1;
On 7/30/23 18:39, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. > > Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > --- > drivers/ata/libata-scsi.c | 7 +++++++ > drivers/scsi/sd.c | 9 ++++++--- > include/scsi/scsi_device.h | 1 + > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 370d18aca71e..c6ece32de8e3 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) > } > } else { > sdev->sector_size = ata_id_logical_sector_size(dev->id); > + /* > + * Stop the drive on suspend but do not issue START STOP UNIT > + * on resume as this is not necessary and may fail: the device > + * will be woken up by ata_port_pm_resume() with a port reset > + * and device revalidation. > + */ > sdev->manage_start_stop = 1; > + sdev->no_start_on_resume = 1; > } > > /* > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 68b12afa0721..3c668cfb146d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) > static int sd_resume(struct device *dev) > { > struct scsi_disk *sdkp = dev_get_drvdata(dev); > - int ret; > + int ret = 0; > > if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ > return 0; > @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev) > if (!sdkp->device->manage_start_stop) > return 0; > > - sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > - ret = sd_start_stop_device(sdkp, 1); > + if (!sdkp->device->no_start_on_resume) { > + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > + ret = sd_start_stop_device(sdkp, 1); > + } > + > if (!ret) > opal_unlock_from_suspend(sdkp->opal_dev); > return ret; > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 75b2235b99e2..b9230b6add04 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -194,6 +194,7 @@ struct scsi_device { > unsigned no_start_on_add:1; /* do not issue start on add */ > unsigned allow_restart:1; /* issue START_UNIT in error handler */ > unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ > + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ > unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ > unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ > unsigned select_no_atn:1; Tested-by: Tanner Watkins<dalzot@gmail.com>
On 7/31/23 02:39, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. > Q: As libata starts up the drive internally via reset/revalidate, why do we have to sent START STOP UNIT for shutdown? Wouldn't it be better to disable START STOP UNIT completely for libata, and let everything be handled via ATA command (IDLE IMMEDIATE, SLEEP) ? Hmm? Otherwise a really good idea. We've fallen into this trap several times ourselves. Incidentally, USB could benefit from the same mechanism ... Cheers, Hannes
On 7/30/23 17:39, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. > > Reported-by: Paul Ausbeck<paula@soe.ucsc.edu> > Closes:https://bugzilla.kernel.org/show_bug.cgi?id=215880 > Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > Signed-off-by: Damien Le Moal<dlemoal@kernel.org> Tested-by: Paul Ausbeck <paula@soe.ucsc.edu>
On 8/1/23 01:13, Hannes Reinecke wrote: > On 7/31/23 02:39, Damien Le Moal wrote: >> During system resume, ata_port_pm_resume() triggers ata EH to >> 1) Resume the controller >> 2) Reset and rescan the ports >> 3) Revalidate devices >> This EH execution is started asynchronously from ata_port_pm_resume(), >> which means that when sd_resume() is executed, none or only part of the >> above processing may have been executed. However, sd_resume() issues a >> START STOP UNIT to wake up the drive from sleep mode. This command is >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >> device. However, depending on the state of execution of the EH process >> and revalidation triggerred by ata_port_pm_resume(), two things may >> happen: >> 1) The START STOP UNIT fails if it is received before the controller has >> been reenabled at the beginning of the EH execution. This is visible >> with error messages like: >> >> ata10.00: device reported invalid CHS sector 0 >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >> sd 9:0:0:0: PM: failed to resume async: error -5 >> >> 2) The START STOP UNIT command is received while the EH process is >> on-going, which mean that it is stopped and must wait for its >> completion, at which point the command is rather useless as the drive >> is already fully spun up already. This case results also in a >> significant delay in sd_resume() which is observable by users as >> the entire system resume completion is delayed. >> >> Given that ATA devices will be woken up by libata activity on resume, >> sd_resume() has no need to issue a START STOP UNIT command, which solves >> the above mentioned problems. Do not issue this command by introducing >> the new scsi_device flag no_start_on_resume and setting this flag to 1 >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >> UNIT command only if this flag is not set. >> > Q: As libata starts up the drive internally via reset/revalidate, why do > we have to sent START STOP UNIT for shutdown? To spin down the drive and put it to sleep. > Wouldn't it be better to disable START STOP UNIT completely for libata, > and let everything be handled via ATA command (IDLE IMMEDIATE, SLEEP) ? > Hmm? I am not sure this buys us much. Could try though, but that is a little too much changes for a bug fix for this cycle. I think we need to think about that in the context of reworking ata suspend/resume to be correct with regard to synchronizing ata device and scsi device suspend/resume with a direct device link, which we do not have right now. > Otherwise a really good idea. We've fallen into this trap several times > ourselves. > Incidentally, USB could benefit from the same mechanism ... Could look into it, but these bug reports, if any, do not come to me :) > > Cheers, > > Hannes
On 8/1/23 05:44, Damien Le Moal wrote: > On 8/1/23 01:13, Hannes Reinecke wrote: >> On 7/31/23 02:39, Damien Le Moal wrote: >>> During system resume, ata_port_pm_resume() triggers ata EH to >>> 1) Resume the controller >>> 2) Reset and rescan the ports >>> 3) Revalidate devices >>> This EH execution is started asynchronously from ata_port_pm_resume(), >>> which means that when sd_resume() is executed, none or only part of the >>> above processing may have been executed. However, sd_resume() issues a >>> START STOP UNIT to wake up the drive from sleep mode. This command is >>> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >>> device. However, depending on the state of execution of the EH process >>> and revalidation triggerred by ata_port_pm_resume(), two things may >>> happen: >>> 1) The START STOP UNIT fails if it is received before the controller has >>> been reenabled at the beginning of the EH execution. This is visible >>> with error messages like: >>> >>> ata10.00: device reported invalid CHS sector 0 >>> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >>> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >>> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >>> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >>> sd 9:0:0:0: PM: failed to resume async: error -5 >>> >>> 2) The START STOP UNIT command is received while the EH process is >>> on-going, which mean that it is stopped and must wait for its >>> completion, at which point the command is rather useless as the drive >>> is already fully spun up already. This case results also in a >>> significant delay in sd_resume() which is observable by users as >>> the entire system resume completion is delayed. >>> >>> Given that ATA devices will be woken up by libata activity on resume, >>> sd_resume() has no need to issue a START STOP UNIT command, which solves >>> the above mentioned problems. Do not issue this command by introducing >>> the new scsi_device flag no_start_on_resume and setting this flag to 1 >>> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >>> UNIT command only if this flag is not set. >>> >> Q: As libata starts up the drive internally via reset/revalidate, why do >> we have to sent START STOP UNIT for shutdown? > > To spin down the drive and put it to sleep. > >> Wouldn't it be better to disable START STOP UNIT completely for libata, >> and let everything be handled via ATA command (IDLE IMMEDIATE, SLEEP) ? >> Hmm? > > I am not sure this buys us much. Could try though, but that is a little too much > changes for a bug fix for this cycle. > Fair point. > I think we need to think about that in the context of reworking ata > suspend/resume to be correct with regard to synchronizing ata device and scsi > device suspend/resume with a direct device link, which we do not have right now. > Yeah, that's been on the agenda for quite some time. So you can add: Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes
On 7/30/23 17:39, Damien Le Moal wrote: > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 7/31/23 09:39, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. Martin, Are you OK with this patch ? Can I get your Acked-by please ?
On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. Hi Damien, Last week I noticed that a basic test in our validation started failing, then I noticed that it was subsequent quick suspend and autoresume using rtcwake that was problematic. I couldn't collect any specific log that was pointing to some useful direction. After a painful bisect I got to this patch. After reverting in from the top of our tree, the tests are back to life. The issue was that the subsequent quick suspend-resume (sometimes the second, sometimes third or even sixth) was simply hanging the machine in different points at Suspend. So, maybe we have some kind of disks/configuration out there where this start upon resume is needed? Maybe it is just a matter of timming to ensure some firmware underneath is up and back to life? Well, please let me know the best way to report this issue to you and what kind of logs I should get. Meanwhile if this ends up blocking our CI we can keep a revert in a topic branch for CI. Thanks, Rodrigo. > > Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > --- > drivers/ata/libata-scsi.c | 7 +++++++ > drivers/scsi/sd.c | 9 ++++++--- > include/scsi/scsi_device.h | 1 + > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 370d18aca71e..c6ece32de8e3 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) > } > } else { > sdev->sector_size = ata_id_logical_sector_size(dev->id); > + /* > + * Stop the drive on suspend but do not issue START STOP UNIT > + * on resume as this is not necessary and may fail: the device > + * will be woken up by ata_port_pm_resume() with a port reset > + * and device revalidation. > + */ > sdev->manage_start_stop = 1; > + sdev->no_start_on_resume = 1; > } > > /* > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 68b12afa0721..3c668cfb146d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) > static int sd_resume(struct device *dev) > { > struct scsi_disk *sdkp = dev_get_drvdata(dev); > - int ret; > + int ret = 0; > > if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ > return 0; > @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev) > if (!sdkp->device->manage_start_stop) > return 0; > > - sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > - ret = sd_start_stop_device(sdkp, 1); > + if (!sdkp->device->no_start_on_resume) { > + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); > + ret = sd_start_stop_device(sdkp, 1); > + } > + > if (!ret) > opal_unlock_from_suspend(sdkp->opal_dev); > return ret; > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 75b2235b99e2..b9230b6add04 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -194,6 +194,7 @@ struct scsi_device { > unsigned no_start_on_add:1; /* do not issue start on add */ > unsigned allow_restart:1; /* issue START_UNIT in error handler */ > unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ > + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ > unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ > unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ > unsigned select_no_atn:1; > -- > 2.41.0 >
On 8/25/23 03:28, Rodrigo Vivi wrote: > On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote: >> During system resume, ata_port_pm_resume() triggers ata EH to >> 1) Resume the controller >> 2) Reset and rescan the ports >> 3) Revalidate devices >> This EH execution is started asynchronously from ata_port_pm_resume(), >> which means that when sd_resume() is executed, none or only part of the >> above processing may have been executed. However, sd_resume() issues a >> START STOP UNIT to wake up the drive from sleep mode. This command is >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >> device. However, depending on the state of execution of the EH process >> and revalidation triggerred by ata_port_pm_resume(), two things may >> happen: >> 1) The START STOP UNIT fails if it is received before the controller has >> been reenabled at the beginning of the EH execution. This is visible >> with error messages like: >> >> ata10.00: device reported invalid CHS sector 0 >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >> sd 9:0:0:0: PM: failed to resume async: error -5 >> >> 2) The START STOP UNIT command is received while the EH process is >> on-going, which mean that it is stopped and must wait for its >> completion, at which point the command is rather useless as the drive >> is already fully spun up already. This case results also in a >> significant delay in sd_resume() which is observable by users as >> the entire system resume completion is delayed. >> >> Given that ATA devices will be woken up by libata activity on resume, >> sd_resume() has no need to issue a START STOP UNIT command, which solves >> the above mentioned problems. Do not issue this command by introducing >> the new scsi_device flag no_start_on_resume and setting this flag to 1 >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >> UNIT command only if this flag is not set. > > Hi Damien, > > Last week I noticed that a basic test in our validation started failing, > then I noticed that it was subsequent quick suspend and autoresume using > rtcwake that was problematic. Arg... Again... Since the change of scsi layer to use async PM operations for suspend/resume, ATA side has been a constant source of issues. When the change was done, I did not notice any issue but several users reported problems. > I couldn't collect any specific log that was pointing to some useful direction. > After a painful bisect I got to this patch. After reverting in from the > top of our tree, the tests are back to life. > > The issue was that the subsequent quick suspend-resume (sometimes the > second, sometimes third or even sixth) was simply hanging the machine > in different points at Suspend. I would have expected issues on the resume side. But it seems you are getting a hang on suspend, which is new. How quick are your suspend/resume cycles ? I did use rtcqake for my tests as well, but I was setting the wake timer at +5s or more and suspending with "systemctl suspend". > So, maybe we have some kind of disks/configuration out there where this > start upon resume is needed? Maybe it is just a matter of timming to > ensure some firmware underneath is up and back to life? I do not think so. Suspend will issue a start stop unit command to put the drive to sleep and resume will reset the port (which should wake up the drive) and then issue an IDENTIFY command (which will also wake up the drive) and other read logs etc to rescan the drive. In both cases, if the commands do not complete, we would see errors/timeout and likely port reset/drive gone events. So I think this is likely another subtle race between scsi suspend and ata suspend that is causing a deadlock. The main issue I think is that there is no direct ancestry between the ata port (device) and scsi device, so the change to scsi async pm ops made a mess of the suspend/resume operations ordering. For suspend, scsi device (child of ata port) should be first, then ata port device (parent). For resume, the reverse order is needed. PM normally ensures that parent/child ordering, but we lack that parent/child relationship. I am working on fixing that but it is very slow progress because I have been so far enable to recreate any of the issues that have been reported. I am patching "blind"... Any chance you could get a thread stack dump when the system hangs ? echo t > /proc/sysrq-trigger And: echo d > /proc/sysrq-trigger would be useful as well... That is, unless you really have a hard lockup... > Well, please let me know the best way to report this issue to you and what > kind of logs I should get. If you can get the above ? dmesg output as well with PM debug messages turned on. Also, what is your setup ? What machine, adapter and drive are you using ?
Damien, > The main issue I think is that there is no direct ancestry between the > ata port (device) and scsi device, I really think this should be fixed. It is a major deficiency as far as I'm concerned and it affects other things than power management.
On 8/25/23 10:31, Martin K. Petersen wrote: > > Damien, > >> The main issue I think is that there is no direct ancestry between the >> ata port (device) and scsi device, > > I really think this should be fixed. It is a major deficiency as far as > I'm concerned and it affects other things than power management. Working on it :) >
On 8/25/23 03:28, Rodrigo Vivi wrote: > On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote: >> During system resume, ata_port_pm_resume() triggers ata EH to >> 1) Resume the controller >> 2) Reset and rescan the ports >> 3) Revalidate devices >> This EH execution is started asynchronously from ata_port_pm_resume(), >> which means that when sd_resume() is executed, none or only part of the >> above processing may have been executed. However, sd_resume() issues a >> START STOP UNIT to wake up the drive from sleep mode. This command is >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >> device. However, depending on the state of execution of the EH process >> and revalidation triggerred by ata_port_pm_resume(), two things may >> happen: >> 1) The START STOP UNIT fails if it is received before the controller has >> been reenabled at the beginning of the EH execution. This is visible >> with error messages like: >> >> ata10.00: device reported invalid CHS sector 0 >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >> sd 9:0:0:0: PM: failed to resume async: error -5 >> >> 2) The START STOP UNIT command is received while the EH process is >> on-going, which mean that it is stopped and must wait for its >> completion, at which point the command is rather useless as the drive >> is already fully spun up already. This case results also in a >> significant delay in sd_resume() which is observable by users as >> the entire system resume completion is delayed. >> >> Given that ATA devices will be woken up by libata activity on resume, >> sd_resume() has no need to issue a START STOP UNIT command, which solves >> the above mentioned problems. Do not issue this command by introducing >> the new scsi_device flag no_start_on_resume and setting this flag to 1 >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >> UNIT command only if this flag is not set. > > Hi Damien, > > Last week I noticed that a basic test in our validation started failing, > then I noticed that it was subsequent quick suspend and autoresume using > rtcwake that was problematic. > > I couldn't collect any specific log that was pointing to some useful direction. > After a painful bisect I got to this patch. After reverting in from the > top of our tree, the tests are back to life. > > The issue was that the subsequent quick suspend-resume (sometimes the > second, sometimes third or even sixth) was simply hanging the machine > in different points at Suspend. > > So, maybe we have some kind of disks/configuration out there where this > start upon resume is needed? Maybe it is just a matter of timming to > ensure some firmware underneath is up and back to life? > > Well, please let me know the best way to report this issue to you and what > kind of logs I should get. > > Meanwhile if this ends up blocking our CI we can keep a revert in a > topic branch for CI. Can you try adding the patch attached to this email ? Thanks.
On Fri, Aug 25, 2023 at 08:42:13AM +0900, Damien Le Moal wrote: > On 8/25/23 03:28, Rodrigo Vivi wrote: > > On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote: > >> During system resume, ata_port_pm_resume() triggers ata EH to > >> 1) Resume the controller > >> 2) Reset and rescan the ports > >> 3) Revalidate devices > >> This EH execution is started asynchronously from ata_port_pm_resume(), > >> which means that when sd_resume() is executed, none or only part of the > >> above processing may have been executed. However, sd_resume() issues a > >> START STOP UNIT to wake up the drive from sleep mode. This command is > >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the > >> device. However, depending on the state of execution of the EH process > >> and revalidation triggerred by ata_port_pm_resume(), two things may > >> happen: > >> 1) The START STOP UNIT fails if it is received before the controller has > >> been reenabled at the beginning of the EH execution. This is visible > >> with error messages like: > >> > >> ata10.00: device reported invalid CHS sector 0 > >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > >> sd 9:0:0:0: PM: failed to resume async: error -5 > >> > >> 2) The START STOP UNIT command is received while the EH process is > >> on-going, which mean that it is stopped and must wait for its > >> completion, at which point the command is rather useless as the drive > >> is already fully spun up already. This case results also in a > >> significant delay in sd_resume() which is observable by users as > >> the entire system resume completion is delayed. > >> > >> Given that ATA devices will be woken up by libata activity on resume, > >> sd_resume() has no need to issue a START STOP UNIT command, which solves > >> the above mentioned problems. Do not issue this command by introducing > >> the new scsi_device flag no_start_on_resume and setting this flag to 1 > >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > >> UNIT command only if this flag is not set. > > > > Hi Damien, > > > > Last week I noticed that a basic test in our validation started failing, > > then I noticed that it was subsequent quick suspend and autoresume using > > rtcwake that was problematic. > > Arg... Again... Since the change of scsi layer to use async PM operations for > suspend/resume, ATA side has been a constant source of issues. When the change > was done, I did not notice any issue but several users reported problems. > > > I couldn't collect any specific log that was pointing to some useful direction. > > After a painful bisect I got to this patch. After reverting in from the > > top of our tree, the tests are back to life. > > > > The issue was that the subsequent quick suspend-resume (sometimes the > > second, sometimes third or even sixth) was simply hanging the machine > > in different points at Suspend. > > I would have expected issues on the resume side. But it seems you are getting a > hang on suspend, which is new. How quick are your suspend/resume cycles ? I did > use rtcqake for my tests as well, but I was setting the wake timer at +5s or > more and suspending with "systemctl suspend". I meant resume. It hangs during random parts at the resume sequence. Brain thought resume, fingers typed 'Suspend', I'm sorry! > > > So, maybe we have some kind of disks/configuration out there where this > > start upon resume is needed? Maybe it is just a matter of timming to > > ensure some firmware underneath is up and back to life? > > I do not think so. Suspend will issue a start stop unit command to put the drive > to sleep and resume will reset the port (which should wake up the drive) and > then issue an IDENTIFY command (which will also wake up the drive) and other > read logs etc to rescan the drive. > In both cases, if the commands do not complete, we would see errors/timeout and > likely port reset/drive gone events. So I think this is likely another subtle > race between scsi suspend and ata suspend that is causing a deadlock. > > The main issue I think is that there is no direct ancestry between the ata port > (device) and scsi device, so the change to scsi async pm ops made a mess of the > suspend/resume operations ordering. For suspend, scsi device (child of ata port) > should be first, then ata port device (parent). For resume, the reverse order is > needed. PM normally ensures that parent/child ordering, but we lack that > parent/child relationship. I am working on fixing that but it is very slow > progress because I have been so far enable to recreate any of the issues that > have been reported. I am patching "blind"... I believe your suspicious makes sense. And on these lines, that patch you attached earlier would fix that. However my initial tries of that didn't help. I'm going to run more tests and get back to you. > > Any chance you could get a thread stack dump when the system hangs ? > > echo t > /proc/sysrq-trigger > > And: > > echo d > /proc/sysrq-trigger > > would be useful as well... > > That is, unless you really have a hard lockup... no chance... it is a hard lockup. > > > Well, please let me know the best way to report this issue to you and what > > kind of logs I should get. > > If you can get the above ? dmesg output as well with PM debug messages turned on. I have the CONFIG_PM_DEBUG here with no_console_suspend. If you remember any other config or parameter that would help, please let me know that I collect it again: [ 104.571459] PM: suspend entry (deep) [ 104.585967] Filesystems sync: 0.010 seconds [ 104.618685] Freezing user space processes [ 104.625374] Freezing user space processes completed (elapsed 0.002 seconds) [ 104.632448] OOM killer disabled. [ 104.635712] Freezing remaining freezable tasks [ 104.641899] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 104.669767] wlp6s0: deauthenticating from 08:36:c9:85:df:ef by local choice (Reason: 3=DEAUTH_LEAVING) [ 104.679812] serial 00:01: disabled [ 104.683466] sd 7:0:0:0: [sdc] Synchronizing SCSI cache [ 104.688902] sd 7:0:0:0: [sdc] Stopping disk [ 104.693176] sd 5:0:0:0: [sdb] Synchronizing SCSI cache [ 104.698419] sd 4:0:0:0: [sda] Synchronizing SCSI cache [ 104.703730] sd 4:0:0:0: [sda] Stopping disk [ 104.885912] sd 5:0:0:0: [sdb] Stopping disk [ 106.163215] PM: suspend devices took 1.514 seconds [ 107.003217] serial 00:01: activated [ 107.076779] nvme nvme0: 16/0/0 default/read/poll queues [ 107.123917] r8169 0000:07:00.0 enp7s0: Link is Down [ 107.208945] PM: resume devices took 0.241 seconds [ 107.214746] pcieport 0000:00:1c.0: PCI bridge to [bus 06] [ 107.220274] pcieport 0000:00:1c.0: bridge window [mem 0x43700000-0x437fffff] [ 107.227538] OOM killer enabled. [ 107.230710] Restarting tasks ... [ 107.231803] pcieport 0000:00:1c.2: PCI bridge to [bus 07] [ 107.236474] done. [ 107.240599] pcieport 0000:00:1c.2: bridge window [io 0x4000-0x4fff] [ 107.242574] random: crng reseeded on system resumption [ 107.249119] pcieport 0000:00:1c.2: bridge window [mem 0x43600000-0x436fffff] [ 107.249405] pcieport 0000:00:1c.6: PCI bridge to [bus 08] [ 107.259714] PM: suspend exit [ 107.261623] pcieport 0000:00:1c.6: bridge window [io 0x3000-0x3fff] [ 107.276554] pcieport 0000:00:1c.6: bridge window [mem 0x43500000-0x435fffff] [ 107.283849] pcieport 0000:00:1c.6: bridge window [mem 0x70900000-0x709fffff 64bit pref] [ 107.293567] ata7: SATA link down (SStatus 4 SControl 300) [ 107.304150] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 107.310975] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 107.319173] ata5.00: configured for UDMA/133 [ 107.324620] ata5.00: Enabling discard_zeroes_data [ 107.398370] ata6.00: configured for UDMA/133 [ 108.563229] PM: suspend entry (deep) [ 108.573610] Filesystems sync: 0.006 seconds [ 108.580617] Freezing user space processes [ 108.586774] Freezing user space processes completed (elapsed 0.002 seconds) [ 108.593793] OOM killer disabled. [ 108.597055] Freezing remaining freezable tasks [ 108.603246] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 108.621515] sd 7:0:0:0: [sdc] Synchronizing SCSI cache [ 108.621522] sd 5:0:0:0: [sdb] Synchronizing SCSI cache [ 108.622018] serial 00:01: disabled [ 108.635420] sd 4:0:0:0: [sda] Synchronizing SCSI cache [ 108.640747] sd 4:0:0:0: [sda] Stopping disk [ 108.644148] sd 5:0:0:0: [sdb] Stopping disk [ 108.983487] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 108.991923] ata8.00: configured for UDMA/133 [ 108.996423] sd 7:0:0:0: [sdc] Stopping disk [ 109.973722] PM: suspend devices took 1.363 seconds [ 110.721600] serial 00:01: activated [ 110.802094] nvme nvme0: 16/0/0 default/read/poll queues [ 110.873036] r8169 0000:07:00.0 enp7s0: Link is Down [ 111.032278] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 111.038583] ata7: SATA link down (SStatus 4 SControl 300) [ 111.044065] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 111.051326] ata5.00: configured for UDMA/133 [ 111.056118] ata5.00: Enabling discard_zeroes_data [ 111.131795] ata6.00: configured for UDMA/133 [ 112.713764] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 112.724250] ata8.00: configured for UDMA/133 > > Also, what is your setup ? What machine, adapter and drive are you using ? $ sudo lshw -class disk *-namespace:0 description: NVMe disk physical id: 0 logical name: hwmon1 *-namespace:1 description: NVMe disk physical id: 2 logical name: /dev/ng0n1 *-namespace:2 description: NVMe disk physical id: 1 bus info: nvme@0:1 logical name: /dev/nvme0n1 size: 465GiB (500GB) capabilities: gpt-1.00 partitioned partitioned:gpt configuration: guid=fd1cdad8-4a22-4fc9-8479-cce1748e78d5 logicalsectorsize=512 sectorsize=512 wwid=eui.e8238fa6bf530001001b448b45c5252c *-disk:0 description: ATA Disk product: INTEL SSDSC2BB48 physical id: 0 bus info: scsi@4:0.0.0 logical name: /dev/sda version: 0370 serial: BTWL409509Q6480QGN size: 447GiB (480GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=000226cf *-disk:1 description: ATA Disk product: INTEL SSDSC2BW48 physical id: 1 bus info: scsi@5:0.0.0 logical name: /dev/sdb version: DC32 serial: PHDA4166003M4805GN size: 447GiB (480GB) capabilities: gpt-1.00 partitioned partitioned:gpt configuration: ansiversion=5 guid=b6c93ada-cbb1-412e-92e9-d644a476cacc logicalsectorsize=512 sectorsize=512 *-disk:2 description: ATA Disk product: ST1000DM010-2EP1 physical id: 0.0.0 bus info: scsi@7:0.0.0 logical name: /dev/sdc version: CC46 serial: ZN1VGAFZ size: 931GiB (1TB) capabilities: gpt-1.00 partitioned partitioned:gpt configuration: ansiversion=5 guid=8bd323ef-07b9-49c6-99c9-6581274d2f65 logicalsectorsize=512 sectorsize=4096 $ lspci -vv 00:00.0 Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02) DeviceName: Onboard - Other Subsystem: Gigabyte Technology Co., Ltd Device 5000 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02) (prog-if 00 [Normal decode]) Subsystem: Gigabyte Technology Co., Ltd Device 5000 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 120 Bus: primary=00, secondary=01, subordinate=04, sec-latency=0 I/O behind bridge: [disabled] [16-bit] Memory behind bridge: 42000000-433fffff [size=20M] [32-bit] Prefetchable memory behind bridge: 60000000-707fffff [size=264M] [32-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:02.0 Display controller: Intel Corporation AlderLake-S GT1 (rev 0c) DeviceName: Onboard - Video Subsystem: Gigabyte Technology Co., Ltd Device d000 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 255 Region 0: Memory at 41000000 (64-bit, non-prefetchable) [size=16M] Region 2: Memory at 50000000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at 5000 [disabled] [size=64] Capabilities: <access denied> Kernel modules: i915, xe 00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin D routed to IRQ 121 Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 I/O behind bridge: [disabled] [16-bit] Memory behind bridge: 43800000-438fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Region 0: Memory at 43910000 (64-bit, non-prefetchable) [size=32K] Capabilities: <access denied> Kernel driver in use: intel_vsec Kernel modules: intel_vsec 00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11) (prog-if 30 [XHCI]) DeviceName: Onboard - Other Subsystem: Gigabyte Technology Co., Ltd Device 5007 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 126 Region 0: Memory at 43900000 (64-bit, non-prefetchable) [size=64K] Capabilities: <access denied> Kernel driver in use: xhci_hcd 00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11) DeviceName: Onboard - Other Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Region 0: Memory at 4391c000 (64-bit, non-prefetchable) [disabled] [size=16K] Region 2: Memory at 4392b000 (64-bit, non-prefetchable) [disabled] [size=4K] Capabilities: <access denied> 00:15.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 27 Region 0: Memory at 40400000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:15.1 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 40 Region 0: Memory at 40401000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:15.2 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 29 Region 0: Memory at 40402000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:15.3 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #3 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin D routed to IRQ 43 Region 0: Memory at 40403000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11) DeviceName: Onboard - Other Subsystem: Gigabyte Technology Co., Ltd Device 1c3a Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 146 Region 0: Memory at 43927000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: mei_me Kernel modules: mei_me 00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11) (prog-if 01 [AHCI 1.0]) DeviceName: Onboard - SATA Subsystem: Gigabyte Technology Co., Ltd Device b005 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 125 Region 0: Memory at 43920000 (32-bit, non-prefetchable) [size=8K] Region 1: Memory at 43926000 (32-bit, non-prefetchable) [size=256] Region 2: I/O ports at 5090 [size=8] Region 3: I/O ports at 5080 [size=4] Region 4: I/O ports at 5060 [size=32] Region 5: Memory at 43925000 (32-bit, non-prefetchable) [size=2K] Capabilities: <access denied> Kernel driver in use: ahci 00:19.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #4 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 31 Region 0: Memory at 40404000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:19.1 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #5 (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 32 Region 0: Memory at 40405000 (64-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: intel-lpss 00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11) (prog-if 00 [Normal decode]) Subsystem: Gigabyte Technology Co., Ltd Device 5001 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 122 Bus: primary=00, secondary=06, subordinate=06, sec-latency=0 I/O behind bridge: [disabled] [16-bit] Memory behind bridge: 43700000-437fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:1c.2 PCI bridge: Intel Corporation Device 7aba (rev 11) (prog-if 00 [Normal decode]) Subsystem: Gigabyte Technology Co., Ltd Device 5001 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 123 Bus: primary=00, secondary=07, subordinate=07, sec-latency=0 I/O behind bridge: 4000-4fff [size=4K] [16-bit] Memory behind bridge: 43600000-436fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:1c.6 PCI bridge: Intel Corporation Device 7abe (rev 11) (prog-if 00 [Normal decode]) Subsystem: Gigabyte Technology Co., Ltd Device 5001 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 124 Bus: primary=00, secondary=08, subordinate=08, sec-latency=0 I/O behind bridge: 3000-3fff [size=4K] [16-bit] Memory behind bridge: 43500000-435fffff [size=1M] [32-bit] Prefetchable memory behind bridge: 70900000-709fffff [size=1M] [32-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:1f.0 ISA bridge: Intel Corporation Device 7a86 (rev 11) DeviceName: Onboard - Other Subsystem: Gigabyte Technology Co., Ltd Device 5001 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11) DeviceName: Onboard - Sound Subsystem: Gigabyte Technology Co., Ltd Device a194 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 148 Region 0: Memory at 43918000 (64-bit, non-prefetchable) [size=16K] Region 4: Memory at 43400000 (64-bit, non-prefetchable) [size=1M] Capabilities: <access denied> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl 00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11) DeviceName: Onboard - Other Subsystem: Gigabyte Technology Co., Ltd Device 5001 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin C routed to IRQ 18 Region 0: Memory at 43922000 (64-bit, non-prefetchable) [size=256] Region 4: I/O ports at efa0 [size=32] Kernel driver in use: i801_smbus Kernel modules: i2c_i801 00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11) DeviceName: Onboard - Other Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Region 0: Memory at 40406000 (32-bit, non-prefetchable) [size=4K] 01:00.0 PCI bridge: Intel Corporation Device 4fa0 (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at 70000000 (64-bit, prefetchable) [size=8M] Bus: primary=01, secondary=02, subordinate=04, sec-latency=0 I/O behind bridge: [disabled] [32-bit] Memory behind bridge: 42000000-433fffff [size=20M] [32-bit] Prefetchable memory behind bridge: 60000000-6fffffff [size=256M] [32-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 02:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode]) Subsystem: Intel Corporation Device 4fa4 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Bus: primary=02, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: [disabled] [16-bit] Memory behind bridge: 42000000-431fffff [size=18M] [32-bit] Prefetchable memory behind bridge: 60000000-6fffffff [size=256M] [32-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 02:04.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode]) Subsystem: Intel Corporation Device 0000 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Bus: primary=02, secondary=04, subordinate=04, sec-latency=0 I/O behind bridge: [disabled] [16-bit] Memory behind bridge: 43300000-433fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev 08) (prog-if 00 [VGA controller]) Subsystem: ASRock Incorporation Device 6002 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 145 Region 0: Memory at 42000000 (64-bit, non-prefetchable) [size=16M] Region 2: Memory at 60000000 (64-bit, prefetchable) [size=256M] Expansion ROM at 43000000 [disabled] [size=2M] Capabilities: <access denied> Kernel driver in use: xe Kernel modules: i915, xe 04:00.0 Audio device: Intel Corporation DG2 Audio Controller Subsystem: ASRock Incorporation Device 6002 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 149 Region 0: Memory at 43300000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 05:00.0 Non-Volatile memory controller: Sandisk Corp Device 501e (prog-if 02 [NVM Express]) Subsystem: Sandisk Corp Device 501e Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 NUMA node: 0 Region 0: Memory at 43800000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: nvme Kernel modules: nvme 06:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10) Subsystem: Intel Corporation Device 2110 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 147 Region 0: Memory at 43700000 (64-bit, non-prefetchable) [size=8K] Capabilities: <access denied> Kernel driver in use: iwlwifi Kernel modules: iwlwifi 07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 16) Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 18 Region 0: I/O ports at 4000 [size=256] Region 2: Memory at 43604000 (64-bit, non-prefetchable) [size=4K] Region 4: Memory at 43600000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: r8169 Kernel modules: r8169 08:00.0 Serial controller: Device 1c00:3253 (rev 10) (prog-if 05 [16850]) Subsystem: Device 1c00:3253 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 18 Region 0: I/O ports at 3000 [size=256] Region 1: Memory at 70900000 (32-bit, prefetchable) [size=32K] Region 2: I/O ports at 3100 [size=4] Expansion ROM at 43500000 [disabled] [size=32K] Capabilities: <access denied> Kernel driver in use: serial anything else or more specific? Thanks, Rodrigo. > > -- > Damien Le Moal > Western Digital Research >
On 8/26/23 02:09, Rodrigo Vivi wrote: >> I would have expected issues on the resume side. But it seems you are getting a >> hang on suspend, which is new. How quick are your suspend/resume cycles ? I did >> use rtcqake for my tests as well, but I was setting the wake timer at +5s or >> more and suspending with "systemctl suspend". > > I meant resume. It hangs during random parts at the resume sequence. > Brain thought resume, fingers typed 'Suspend', I'm sorry! That makes more sense :) >>> So, maybe we have some kind of disks/configuration out there where this >>> start upon resume is needed? Maybe it is just a matter of timming to >>> ensure some firmware underneath is up and back to life? >> >> I do not think so. Suspend will issue a start stop unit command to put the drive >> to sleep and resume will reset the port (which should wake up the drive) and >> then issue an IDENTIFY command (which will also wake up the drive) and other >> read logs etc to rescan the drive. >> In both cases, if the commands do not complete, we would see errors/timeout and >> likely port reset/drive gone events. So I think this is likely another subtle >> race between scsi suspend and ata suspend that is causing a deadlock. >> >> The main issue I think is that there is no direct ancestry between the ata port >> (device) and scsi device, so the change to scsi async pm ops made a mess of the >> suspend/resume operations ordering. For suspend, scsi device (child of ata port) >> should be first, then ata port device (parent). For resume, the reverse order is >> needed. PM normally ensures that parent/child ordering, but we lack that >> parent/child relationship. I am working on fixing that but it is very slow >> progress because I have been so far enable to recreate any of the issues that >> have been reported. I am patching "blind"... > > I believe your suspicious makes sense. And on these lines, that patch you > attached earlier would fix that. However my initial tries of that didn't > help. I'm going to run more tests and get back to you. Yes, I now better understand what is going on, somewhat. The patch I sent corrects the suspend side: the scsi device will be suspended first, which triggers issuing an IDLE command to spin down the drive, then that ATA port is suspended. All good. However, this patch does not address the resume side. The ata port is resumed first. This operation is rather quick and only trigger ata EH asynchronously. The scsi device is resumed after the ata port. This operation triggers issuing a VERIFY command to spinup the drive and put it back in the active power state. Media access commands like VERIFY are the only commands that restore a disk to active state. However, the "no start on resume" patch I sent disables that, which is wrong. But reverting it is also not a full solution as there is no way to know when the VERIFY command will be issued with regard to ata EH triggered. That verify command should be issued after the port is resumed and before the revalidation start ! I suspect that you get your lockup with the no start on resume patch because the revalidation never proceeds correctly as the drive is still sleeping. And things work with the patch reverted as you get lucky and the VERIFY command makes it through at the right timing... Would need to add more messages to confirm all this. I still cannot really explain the hard lockup though as in the end, you should be seeing command timeouts, port reset, retries, and eventually drive removal. Anyway, what is clear is that using scsi manage_start_stop cannot possibly work for resume. So more patches are needed to move the IDLE and VERIFY commands to libata EH so that these commands are issued within the EH context instead of the scsi device suspend/resume context. Working on that. I should have patches ready for you Monday. >> Any chance you could get a thread stack dump when the system hangs ? >> >> echo t > /proc/sysrq-trigger >> >> And: >> >> echo d > /proc/sysrq-trigger >> >> would be useful as well... >> >> That is, unless you really have a hard lockup... > > no chance... it is a hard lockup. Too bad... I really would like to understand where things get deadlocked. > I have the CONFIG_PM_DEBUG here with no_console_suspend. > If you remember any other config or parameter that would help, please let > me know that I collect it again: > > [ 104.571459] PM: suspend entry (deep) > [ 104.585967] Filesystems sync: 0.010 seconds > [ 104.618685] Freezing user space processes > [ 104.625374] Freezing user space processes completed (elapsed 0.002 seconds) > [ 104.632448] OOM killer disabled. > [ 104.635712] Freezing remaining freezable tasks > [ 104.641899] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) > [ 104.669767] wlp6s0: deauthenticating from 08:36:c9:85:df:ef by local choice (Reason: 3=DEAUTH_LEAVING) > [ 104.679812] serial 00:01: disabled > [ 104.683466] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > [ 104.688902] sd 7:0:0:0: [sdc] Stopping disk > [ 104.693176] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > [ 104.698419] sd 4:0:0:0: [sda] Synchronizing SCSI cache > [ 104.703730] sd 4:0:0:0: [sda] Stopping disk > [ 104.885912] sd 5:0:0:0: [sdb] Stopping disk So we have: 4:0:0:0 -> ata5 port 5:0:0:0 -> ata6 port 7:0:0:0 -> ata8 port > [ 106.163215] PM: suspend devices took 1.514 seconds > [ 107.003217] serial 00:01: activated > [ 107.076779] nvme nvme0: 16/0/0 default/read/poll queues > [ 107.123917] r8169 0000:07:00.0 enp7s0: Link is Down > [ 107.208945] PM: resume devices took 0.241 seconds > [ 107.214746] pcieport 0000:00:1c.0: PCI bridge to [bus 06] > [ 107.220274] pcieport 0000:00:1c.0: bridge window [mem 0x43700000-0x437fffff] > [ 107.227538] OOM killer enabled. > [ 107.230710] Restarting tasks ... > [ 107.231803] pcieport 0000:00:1c.2: PCI bridge to [bus 07] > [ 107.236474] done. > [ 107.240599] pcieport 0000:00:1c.2: bridge window [io 0x4000-0x4fff] > [ 107.242574] random: crng reseeded on system resumption > [ 107.249119] pcieport 0000:00:1c.2: bridge window [mem 0x43600000-0x436fffff] > [ 107.249405] pcieport 0000:00:1c.6: PCI bridge to [bus 08] > [ 107.259714] PM: suspend exit > [ 107.261623] pcieport 0000:00:1c.6: bridge window [io 0x3000-0x3fff] > [ 107.276554] pcieport 0000:00:1c.6: bridge window [mem 0x43500000-0x435fffff] > [ 107.283849] pcieport 0000:00:1c.6: bridge window [mem 0x70900000-0x709fffff 64bit pref] The messages below come from libata EH which was triggered asynchronously and end up running after PM exit, which is fine as PM does not need to wait for the drive to spin up. > [ 107.293567] ata7: SATA link down (SStatus 4 SControl 300) > [ 107.304150] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 107.310975] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 107.319173] ata5.00: configured for UDMA/133 > [ 107.324620] ata5.00: Enabling discard_zeroes_data > [ 107.398370] ata6.00: configured for UDMA/133 Looks like libata EH is finished here. And then comes the next suspend. > [ 108.563229] PM: suspend entry (deep) > [ 108.573610] Filesystems sync: 0.006 seconds > [ 108.580617] Freezing user space processes > [ 108.586774] Freezing user space processes completed (elapsed 0.002 seconds) > [ 108.593793] OOM killer disabled. > [ 108.597055] Freezing remaining freezable tasks > [ 108.603246] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) > [ 108.621515] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > [ 108.621522] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > [ 108.622018] serial 00:01: disabled > [ 108.635420] sd 4:0:0:0: [sda] Synchronizing SCSI cache > [ 108.640747] sd 4:0:0:0: [sda] Stopping disk > [ 108.644148] sd 5:0:0:0: [sdb] Stopping disk > [ 108.983487] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 108.991923] ata8.00: configured for UDMA/133 Looks like ata EH from the previous resume was actually not finished and completes here. > [ 108.996423] sd 7:0:0:0: [sdc] Stopping disk > [ 109.973722] PM: suspend devices took 1.363 seconds > [ 110.721600] serial 00:01: activated > [ 110.802094] nvme nvme0: 16/0/0 default/read/poll queues > [ 110.873036] r8169 0000:07:00.0 enp7s0: Link is Down > [ 111.032278] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 111.038583] ata7: SATA link down (SStatus 4 SControl 300) > [ 111.044065] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 111.051326] ata5.00: configured for UDMA/133 > [ 111.056118] ata5.00: Enabling discard_zeroes_data > [ 111.131795] ata6.00: configured for UDMA/133 > [ 112.713764] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 112.724250] ata8.00: configured for UDMA/133 Looks like the drives have all come back but there is no "PM exit" to be seen, so I suspect the scsi device resume side is stuck... Not sure why, especially with the "no start on resume" patch. Will send patches to fix all this as soon as possible.
On 8/26/23 02:09, Rodrigo Vivi wrote: >>> So, maybe we have some kind of disks/configuration out there where this >>> start upon resume is needed? Maybe it is just a matter of timming to >>> ensure some firmware underneath is up and back to life? >> >> I do not think so. Suspend will issue a start stop unit command to put the drive >> to sleep and resume will reset the port (which should wake up the drive) and >> then issue an IDENTIFY command (which will also wake up the drive) and other >> read logs etc to rescan the drive. >> In both cases, if the commands do not complete, we would see errors/timeout and >> likely port reset/drive gone events. So I think this is likely another subtle >> race between scsi suspend and ata suspend that is causing a deadlock. >> >> The main issue I think is that there is no direct ancestry between the ata port >> (device) and scsi device, so the change to scsi async pm ops made a mess of the >> suspend/resume operations ordering. For suspend, scsi device (child of ata port) >> should be first, then ata port device (parent). For resume, the reverse order is >> needed. PM normally ensures that parent/child ordering, but we lack that >> parent/child relationship. I am working on fixing that but it is very slow >> progress because I have been so far enable to recreate any of the issues that >> have been reported. I am patching "blind"... > > I believe your suspicious makes sense. And on these lines, that patch you > attached earlier would fix that. However my initial tries of that didn't > help. I'm going to run more tests and get back to you. Rodrigo, I pushed the resume-v2 branch to libata tree: git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata (or https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git) This branch adds 13 patches on top of 6.5.0 to cleanup libata suspend/resume and other device shutdown issues. The first 4 patches are the main ones to fix suspend resume. I tested that on 2 different machines with different drives and with qemu. All seems fine. Could you try to run this through your CI ? I am very interested in seeing if it survives your suspend/resume tests. If you can confirm that all issues are fixed, I will rebase this on for-next and post. Thanks !
On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: > On 8/26/23 02:09, Rodrigo Vivi wrote: > >>> So, maybe we have some kind of disks/configuration out there where this > >>> start upon resume is needed? Maybe it is just a matter of timming to > >>> ensure some firmware underneath is up and back to life? > >> > >> I do not think so. Suspend will issue a start stop unit command to put the drive > >> to sleep and resume will reset the port (which should wake up the drive) and > >> then issue an IDENTIFY command (which will also wake up the drive) and other > >> read logs etc to rescan the drive. > >> In both cases, if the commands do not complete, we would see errors/timeout and > >> likely port reset/drive gone events. So I think this is likely another subtle > >> race between scsi suspend and ata suspend that is causing a deadlock. > >> > >> The main issue I think is that there is no direct ancestry between the ata port > >> (device) and scsi device, so the change to scsi async pm ops made a mess of the > >> suspend/resume operations ordering. For suspend, scsi device (child of ata port) > >> should be first, then ata port device (parent). For resume, the reverse order is > >> needed. PM normally ensures that parent/child ordering, but we lack that > >> parent/child relationship. I am working on fixing that but it is very slow > >> progress because I have been so far enable to recreate any of the issues that > >> have been reported. I am patching "blind"... > > > > I believe your suspicious makes sense. And on these lines, that patch you > > attached earlier would fix that. However my initial tries of that didn't > > help. I'm going to run more tests and get back to you. > > Rodrigo, > > I pushed the resume-v2 branch to libata tree: > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata > (or https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git) > > This branch adds 13 patches on top of 6.5.0 to cleanup libata suspend/resume and > other device shutdown issues. The first 4 patches are the main ones to fix > suspend resume. I tested that on 2 different machines with different drives and > with qemu. All seems fine. > > Could you try to run this through your CI ? I am very interested in seeing if it > survives your suspend/resume tests. well, in the end this didn't affect the CI machinery as I was afraid. it is only in my local DG2. https://intel-gfx-ci.01.org/tree/intel-xe/bat-all.html?testfilter=suspend (bat-dg2-oem2 one) I just got these 13 patches and applied to my branch and tested it again and it still *fails* for me. [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-DP-2, SUCCESS [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest pipe-B-DP-2 [ 80.375042] PM: suspend entry (deep) [ 80.380799] Filesystems sync: 0.002 seconds [ 80.386476] Freezing user space processes [ 80.392286] Freezing user space processes completed (elapsed 0.001 seconds) [ 80.399294] OOM killer disabled. [ 80.402536] Freezing remaining freezable tasks [ 80.408335] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache [ 80.439716] serial 00:01: disabled [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache [ 80.453600] ata6.00: Entering standby power mode [ 80.464217] ata5.00: Entering standby power mode [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 80.818520] ata8.00: Entering active power mode [ 80.842989] ata8.00: configured for UDMA/133 [ 80.847660] ata8.00: Entering standby power mode [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended [ 81.800508] PM: suspend of devices complete after 1367.829 msecs [ 81.806661] PM: start suspend of devices complete after 1390.859 msecs [ 81.813244] PM: suspend devices took 1.398 seconds [ 81.820101] PM: late suspend of devices complete after 2.036 msecs �[ 82.403857] serial 00:01: activated [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down [ 82.581444] xe REG[0x223a8-0x223af]: allow read access [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access [ 82.623634] xe REG[0x263a8-0x263af]: allow read access [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 82.739773] ata5.00: Entering active power mode [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 82.750618] ata6.00: Entering active power mode [ 82.755961] ata5.00: configured for UDMA/133 [ 82.760479] ata5.00: Enabling discard_zeroes_data [ 82.836266] ata6.00: configured for UDMA/133 [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 84.466354] ata8.00: Entering active power mode [ 84.497256] ata8.00: configured for UDMA/133 ... > > If you can confirm that all issues are fixed, I will rebase this on for-next and > post. > > Thanks ! > > > -- > Damien Le Moal > Western Digital Research >
On 8/31/23 07:14, Rodrigo Vivi wrote: > On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: >> On 8/26/23 02:09, Rodrigo Vivi wrote: >>>>> So, maybe we have some kind of disks/configuration out there where this >>>>> start upon resume is needed? Maybe it is just a matter of timming to >>>>> ensure some firmware underneath is up and back to life? >>>> >>>> I do not think so. Suspend will issue a start stop unit command to put the drive >>>> to sleep and resume will reset the port (which should wake up the drive) and >>>> then issue an IDENTIFY command (which will also wake up the drive) and other >>>> read logs etc to rescan the drive. >>>> In both cases, if the commands do not complete, we would see errors/timeout and >>>> likely port reset/drive gone events. So I think this is likely another subtle >>>> race between scsi suspend and ata suspend that is causing a deadlock. >>>> >>>> The main issue I think is that there is no direct ancestry between the ata port >>>> (device) and scsi device, so the change to scsi async pm ops made a mess of the >>>> suspend/resume operations ordering. For suspend, scsi device (child of ata port) >>>> should be first, then ata port device (parent). For resume, the reverse order is >>>> needed. PM normally ensures that parent/child ordering, but we lack that >>>> parent/child relationship. I am working on fixing that but it is very slow >>>> progress because I have been so far enable to recreate any of the issues that >>>> have been reported. I am patching "blind"... >>> >>> I believe your suspicious makes sense. And on these lines, that patch you >>> attached earlier would fix that. However my initial tries of that didn't >>> help. I'm going to run more tests and get back to you. >> >> Rodrigo, >> >> I pushed the resume-v2 branch to libata tree: >> >> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata >> (or https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git) >> >> This branch adds 13 patches on top of 6.5.0 to cleanup libata suspend/resume and >> other device shutdown issues. The first 4 patches are the main ones to fix >> suspend resume. I tested that on 2 different machines with different drives and >> with qemu. All seems fine. >> >> Could you try to run this through your CI ? I am very interested in seeing if it >> survives your suspend/resume tests. > > well, in the end this didn't affect the CI machinery as I was afraid. > it is only in my local DG2. > > https://intel-gfx-ci.01.org/tree/intel-xe/bat-all.html?testfilter=suspend > (bat-dg2-oem2 one) > > I just got these 13 patches and applied to my branch and tested it again > and it still *fails* for me. That is annoying... But I think the messages give us a hint as to what is going on. See below. > > [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-DP-2, SUCCESS > [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest pipe-B-DP-2 > [ 80.375042] PM: suspend entry (deep) > [ 80.380799] Filesystems sync: 0.002 seconds > [ 80.386476] Freezing user space processes > [ 80.392286] Freezing user space processes completed (elapsed 0.001 seconds) > [ 80.399294] OOM killer disabled. > [ 80.402536] Freezing remaining freezable tasks > [ 80.408335] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) > [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > [ 80.439716] serial 00:01: disabled > [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache > [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > [ 80.453600] ata6.00: Entering standby power mode This is sd 5:0:0:0. All good, ordered properly with the "Synchronizing SCSI cache". > [ 80.464217] ata5.00: Entering standby power mode Same here for sd 4:0:0:0. > [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 80.818520] ata8.00: Entering active power mode > [ 80.842989] ata8.00: configured for UDMA/133 Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI cache" tells us that it was suspending and libata EH did not yet put that drive to standby... > [ 80.847660] ata8.00: Entering standby power mode ... which happens here. So it looks like libata EH had both the suspend and resume requests at the same time, which is totally weird. > [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended > [ 81.800508] PM: suspend of devices complete after 1367.829 msecs > [ 81.806661] PM: start suspend of devices complete after 1390.859 msecs > [ 81.813244] PM: suspend devices took 1.398 seconds > [ 81.820101] PM: late suspend of devices complete after 2.036 msecs ...and PM suspend completes here. Resume "starts" now (but clearly it started earlier already given that sd 7:0:0:0 was reactivated. > �[ 82.403857] serial 00:01: activated > [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues > [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down > [ 82.581444] xe REG[0x223a8-0x223af]: allow read access > [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access > [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access > [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access > [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access > [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access > [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access > [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access > [ 82.623634] xe REG[0x263a8-0x263af]: allow read access > [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed > [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) > [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 82.739773] ata5.00: Entering active power mode > [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 82.750618] ata6.00: Entering active power mode > [ 82.755961] ata5.00: configured for UDMA/133 > [ 82.760479] ata5.00: Enabling discard_zeroes_data > [ 82.836266] ata6.00: configured for UDMA/133 > [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 84.466354] ata8.00: Entering active power mode > [ 84.497256] ata8.00: configured for UDMA/133 > ... And this looks all normal, the drives have all been transitioned to active power mode as expected. And yet, your system is stuck after this, right ? Can you try to boot with "sysrq_always_enabled" and try to see if sending "ctrl-sysrq-t" keys can give you a stack backtrace of the tasks to see where they are stuck ? I am going to try something like you do with very short resume rtc timer and multiple disks to see if I can reproduce. But it is starting to look like PM is starting resuming before suspend completes... Not sure that is correct. But the end result may be that libata EH endup getting stuck. Let me dig further.
On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote: > On 8/31/23 07:14, Rodrigo Vivi wrote: > > On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: > > > On 8/26/23 02:09, Rodrigo Vivi wrote: > > > > > > So, maybe we have some kind of disks/configuration out > > > > > > there where this > > > > > > start upon resume is needed? Maybe it is just a matter of > > > > > > timming to > > > > > > ensure some firmware underneath is up and back to life? > > > > > > > > > > I do not think so. Suspend will issue a start stop unit > > > > > command to put the drive > > > > > to sleep and resume will reset the port (which should wake up > > > > > the drive) and > > > > > then issue an IDENTIFY command (which will also wake up the > > > > > drive) and other > > > > > read logs etc to rescan the drive. > > > > > In both cases, if the commands do not complete, we would see > > > > > errors/timeout and > > > > > likely port reset/drive gone events. So I think this is > > > > > likely another subtle > > > > > race between scsi suspend and ata suspend that is causing a > > > > > deadlock. > > > > > > > > > > The main issue I think is that there is no direct ancestry > > > > > between the ata port > > > > > (device) and scsi device, so the change to scsi async pm ops > > > > > made a mess of the > > > > > suspend/resume operations ordering. For suspend, scsi device > > > > > (child of ata port) > > > > > should be first, then ata port device (parent). For resume, > > > > > the reverse order is > > > > > needed. PM normally ensures that parent/child ordering, but > > > > > we lack that > > > > > parent/child relationship. I am working on fixing that but it > > > > > is very slow > > > > > progress because I have been so far enable to recreate any of > > > > > the issues that > > > > > have been reported. I am patching "blind"... > > > > > > > > I believe your suspicious makes sense. And on these lines, that > > > > patch you > > > > attached earlier would fix that. However my initial tries of > > > > that didn't > > > > help. I'm going to run more tests and get back to you. > > > > > > Rodrigo, > > > > > > I pushed the resume-v2 branch to libata tree: > > > > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata > > > (or > > > https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi > > > t) > > > > > > This branch adds 13 patches on top of 6.5.0 to cleanup libata > > > suspend/resume and > > > other device shutdown issues. The first 4 patches are the main > > > ones to fix > > > suspend resume. I tested that on 2 different machines with > > > different drives and > > > with qemu. All seems fine. > > > > > > Could you try to run this through your CI ? I am very interested > > > in seeing if it > > > survives your suspend/resume tests. > > > > well, in the end this didn't affect the CI machinery as I was > > afraid. > > it is only in my local DG2. > > > > https://intel-gfx-ci.01.org/tree/intel-xe/bat- > > all.html?testfilter=suspend > > (bat-dg2-oem2 one) > > > > I just got these 13 patches and applied to my branch and tested it > > again > > and it still *fails* for me. > > That is annoying... But I think the messages give us a hint as to > what is going > on. See below. > > > > > [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A- > > DP-2, SUCCESS > > [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest > > pipe-B-DP-2 > > [ 80.375042] PM: suspend entry (deep) > > [ 80.380799] Filesystems sync: 0.002 seconds > > [ 80.386476] Freezing user space processes > > [ 80.392286] Freezing user space processes completed (elapsed > > 0.001 seconds) > > [ 80.399294] OOM killer disabled. > > [ 80.402536] Freezing remaining freezable tasks > > [ 80.408335] Freezing remaining freezable tasks completed > > (elapsed 0.001 seconds) > > [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > > [ 80.439716] serial 00:01: disabled > > [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache > > [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > > [ 80.453600] ata6.00: Entering standby power mode > > This is sd 5:0:0:0. All good, ordered properly with the > "Synchronizing SCSI cache". > > > [ 80.464217] ata5.00: Entering standby power mode > > Same here for sd 4:0:0:0. > > > [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl > > 300) > > [ 80.818520] ata8.00: Entering active power mode > > [ 80.842989] ata8.00: configured for UDMA/133 > > Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI > cache" tells > us that it was suspending and libata EH did not yet put that drive to > standby... > > > [ 80.847660] ata8.00: Entering standby power mode > > ... which happens here. So it looks like libata EH had both the > suspend and > resume requests at the same time, which is totally weird. although it looks weird, it totally matches the 'use case'. I mean, if I suspend, resume, and wait a bit before suspend and resume again, it will work 100% of the time. The issue is really only when another suspend comes right after the resume, in a loop without any wait. > > > [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended > > [ 81.800508] PM: suspend of devices complete after 1367.829 msecs > > [ 81.806661] PM: start suspend of devices complete after 1390.859 > > msecs > > [ 81.813244] PM: suspend devices took 1.398 seconds > > [ 81.820101] PM: late suspend of devices complete after 2.036 > > msecs > > ...and PM suspend completes here. Resume "starts" now (but clearly it > started > earlier already given that sd 7:0:0:0 was reactivated. that is weird. > > > �[ 82.403857] serial 00:01: activated > > [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues > > [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down > > [ 82.581444] xe REG[0x223a8-0x223af]: allow read access > > [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access > > [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access > > [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access > > [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access > > [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access > > [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access > > [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access > > [ 82.623634] xe REG[0x263a8-0x263af]: allow read access > > [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed > > [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) > > [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl > > 300) > > [ 82.739773] ata5.00: Entering active power mode > > [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl > > 300) > > [ 82.750618] ata6.00: Entering active power mode > > [ 82.755961] ata5.00: configured for UDMA/133 > > [ 82.760479] ata5.00: Enabling discard_zeroes_data > > [ 82.836266] ata6.00: configured for UDMA/133 > > [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl > > 300) > > [ 84.466354] ata8.00: Entering active power mode > > [ 84.497256] ata8.00: configured for UDMA/133 > > ... > > And this looks all normal, the drives have all been transitioned to > active > power mode as expected. And yet, your system is stuck after this, > right ? yes > Can you try to boot with "sysrq_always_enabled" and try to see if > sending > "ctrl-sysrq-t" keys can give you a stack backtrace of the tasks to > see where > they are stuck ? I will try tomorrow. > > I am going to try something like you do with very short resume rtc > timer and > multiple disks to see if I can reproduce. But it is starting to look > like PM is > starting resuming before suspend completes... yes, this is what it looks like. > Not sure that is correct. But the > end result may be that libata EH endup getting stuck. Let me dig > further. Thank you so much for the hard work on this!
On 8/31/23 10:48, Vivi, Rodrigo wrote: > On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote: >> On 8/31/23 07:14, Rodrigo Vivi wrote: >>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: >>>> On 8/26/23 02:09, Rodrigo Vivi wrote: >>>>>>> So, maybe we have some kind of disks/configuration out >>>>>>> there where this >>>>>>> start upon resume is needed? Maybe it is just a matter of >>>>>>> timming to >>>>>>> ensure some firmware underneath is up and back to life? >>>>>> >>>>>> I do not think so. Suspend will issue a start stop unit >>>>>> command to put the drive >>>>>> to sleep and resume will reset the port (which should wake up >>>>>> the drive) and >>>>>> then issue an IDENTIFY command (which will also wake up the >>>>>> drive) and other >>>>>> read logs etc to rescan the drive. >>>>>> In both cases, if the commands do not complete, we would see >>>>>> errors/timeout and >>>>>> likely port reset/drive gone events. So I think this is >>>>>> likely another subtle >>>>>> race between scsi suspend and ata suspend that is causing a >>>>>> deadlock. >>>>>> >>>>>> The main issue I think is that there is no direct ancestry >>>>>> between the ata port >>>>>> (device) and scsi device, so the change to scsi async pm ops >>>>>> made a mess of the >>>>>> suspend/resume operations ordering. For suspend, scsi device >>>>>> (child of ata port) >>>>>> should be first, then ata port device (parent). For resume, >>>>>> the reverse order is >>>>>> needed. PM normally ensures that parent/child ordering, but >>>>>> we lack that >>>>>> parent/child relationship. I am working on fixing that but it >>>>>> is very slow >>>>>> progress because I have been so far enable to recreate any of >>>>>> the issues that >>>>>> have been reported. I am patching "blind"... >>>>> >>>>> I believe your suspicious makes sense. And on these lines, that >>>>> patch you >>>>> attached earlier would fix that. However my initial tries of >>>>> that didn't >>>>> help. I'm going to run more tests and get back to you. >>>> >>>> Rodrigo, >>>> >>>> I pushed the resume-v2 branch to libata tree: >>>> >>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata >>>> (or >>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi >>>> t) >>>> >>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata >>>> suspend/resume and >>>> other device shutdown issues. The first 4 patches are the main >>>> ones to fix >>>> suspend resume. I tested that on 2 different machines with >>>> different drives and >>>> with qemu. All seems fine. >>>> >>>> Could you try to run this through your CI ? I am very interested >>>> in seeing if it >>>> survives your suspend/resume tests. >>> >>> well, in the end this didn't affect the CI machinery as I was >>> afraid. >>> it is only in my local DG2. >>> >>> https://intel-gfx-ci.01.org/tree/intel-xe/bat- >>> all.html?testfilter=suspend >>> (bat-dg2-oem2 one) >>> >>> I just got these 13 patches and applied to my branch and tested it >>> again >>> and it still *fails* for me. >> >> That is annoying... But I think the messages give us a hint as to >> what is going >> on. See below. >> >>> >>> [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A- >>> DP-2, SUCCESS >>> [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest >>> pipe-B-DP-2 >>> [ 80.375042] PM: suspend entry (deep) >>> [ 80.380799] Filesystems sync: 0.002 seconds >>> [ 80.386476] Freezing user space processes >>> [ 80.392286] Freezing user space processes completed (elapsed >>> 0.001 seconds) >>> [ 80.399294] OOM killer disabled. >>> [ 80.402536] Freezing remaining freezable tasks >>> [ 80.408335] Freezing remaining freezable tasks completed >>> (elapsed 0.001 seconds) >>> [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache >>> [ 80.439716] serial 00:01: disabled >>> [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache >>> [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache >>> [ 80.453600] ata6.00: Entering standby power mode >> >> This is sd 5:0:0:0. All good, ordered properly with the >> "Synchronizing SCSI cache". >> >>> [ 80.464217] ata5.00: Entering standby power mode >> >> Same here for sd 4:0:0:0. >> >>> [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 80.818520] ata8.00: Entering active power mode >>> [ 80.842989] ata8.00: configured for UDMA/133 >> >> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI >> cache" tells >> us that it was suspending and libata EH did not yet put that drive to >> standby... >> >>> [ 80.847660] ata8.00: Entering standby power mode >> >> ... which happens here. So it looks like libata EH had both the >> suspend and >> resume requests at the same time, which is totally weird. > > although it looks weird, it totally matches the 'use case'. > I mean, if I suspend, resume, and wait a bit before suspend and resume > again, it will work 100% of the time. > The issue is really only when another suspend comes right after the > resume, in a loop without any wait. > >> >>> [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended >>> [ 81.800508] PM: suspend of devices complete after 1367.829 msecs >>> [ 81.806661] PM: start suspend of devices complete after 1390.859 >>> msecs >>> [ 81.813244] PM: suspend devices took 1.398 seconds >>> [ 81.820101] PM: late suspend of devices complete after 2.036 >>> msecs >> >> ...and PM suspend completes here. Resume "starts" now (but clearly it >> started >> earlier already given that sd 7:0:0:0 was reactivated. > > that is weird. > >> >>> �[ 82.403857] serial 00:01: activated >>> [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues >>> [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down >>> [ 82.581444] xe REG[0x223a8-0x223af]: allow read access >>> [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access >>> [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access >>> [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access >>> [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access >>> [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access >>> [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access >>> [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access >>> [ 82.623634] xe REG[0x263a8-0x263af]: allow read access >>> [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed >>> [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) >>> [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 82.739773] ata5.00: Entering active power mode >>> [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 82.750618] ata6.00: Entering active power mode >>> [ 82.755961] ata5.00: configured for UDMA/133 >>> [ 82.760479] ata5.00: Enabling discard_zeroes_data >>> [ 82.836266] ata6.00: configured for UDMA/133 >>> [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 84.466354] ata8.00: Entering active power mode >>> [ 84.497256] ata8.00: configured for UDMA/133 >>> ... >> >> And this looks all normal, the drives have all been transitioned to >> active >> power mode as expected. And yet, your system is stuck after this, >> right ? > > yes > >> Can you try to boot with "sysrq_always_enabled" and try to see if >> sending >> "ctrl-sysrq-t" keys can give you a stack backtrace of the tasks to >> see where >> they are stuck ? > > I will try tomorrow. After fixing my qemu setup to get rtcwake to work, I succedded in recreating the hang ! With multiple disks attached to the VM, suspending the VM with an rtcwake timer set to +1 seconds triggers the resume in the middle of the suspend and it hangs. And I found the problem: it is a race in ata_port_request_pm() where the call to ata_port_wait_eh() is completely innefective, causing suspend and resume to be scheduled simultaneously, and so one being lost since the pm_msg field is per port. I have a fix. I keep doing suspend/resume in the VM and all looks good now. Cleaning up the patches and I will push a resume-v3 branch soon for you to test. > >> >> I am going to try something like you do with very short resume rtc >> timer and >> multiple disks to see if I can reproduce. But it is starting to look >> like PM is >> starting resuming before suspend completes... > > yes, this is what it looks like. It seems that the PM code is fine with that. The issue really was with libata :)
On 8/26/23 02:09, Rodrigo Vivi wrote: > I have the CONFIG_PM_DEBUG here with no_console_suspend. > If you remember any other config or parameter that would help, please let > me know that I collect it again: Coming back to this previous dmesg output as I am in fact seeing something very similar now. See below. > [ 104.571459] PM: suspend entry (deep) > [ 104.585967] Filesystems sync: 0.010 seconds > [ 104.618685] Freezing user space processes > [ 104.625374] Freezing user space processes completed (elapsed 0.002 seconds) > [ 104.632448] OOM killer disabled. > [ 104.635712] Freezing remaining freezable tasks > [ 104.641899] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) suspend start and freezes the user tasks, as expected. > [ 104.669767] wlp6s0: deauthenticating from 08:36:c9:85:df:ef by local choice (Reason: 3=DEAUTH_LEAVING) > [ 104.679812] serial 00:01: disabled > [ 104.683466] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > [ 104.688902] sd 7:0:0:0: [sdc] Stopping disk > [ 104.693176] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > [ 104.698419] sd 4:0:0:0: [sda] Synchronizing SCSI cache > [ 104.703730] sd 4:0:0:0: [sda] Stopping disk > [ 104.885912] sd 5:0:0:0: [sdb] Stopping disk > [ 106.163215] PM: suspend devices took 1.514 seconds All devices are suspended. No issues. Then resume starts... > [ 107.003217] serial 00:01: activated > [ 107.076779] nvme nvme0: 16/0/0 default/read/poll queues > [ 107.123917] r8169 0000:07:00.0 enp7s0: Link is Down > [ 107.208945] PM: resume devices took 0.241 seconds > [ 107.214746] pcieport 0000:00:1c.0: PCI bridge to [bus 06] > [ 107.220274] pcieport 0000:00:1c.0: bridge window [mem 0x43700000-0x437fffff] > [ 107.227538] OOM killer enabled. > [ 107.230710] Restarting tasks ... ... remember this one ... > [ 107.231803] pcieport 0000:00:1c.2: PCI bridge to [bus 07] > [ 107.236474] done. > [ 107.240599] pcieport 0000:00:1c.2: bridge window [io 0x4000-0x4fff] > [ 107.242574] random: crng reseeded on system resumption > [ 107.249119] pcieport 0000:00:1c.2: bridge window [mem 0x43600000-0x436fffff] > [ 107.249405] pcieport 0000:00:1c.6: PCI bridge to [bus 08] > [ 107.259714] PM: suspend exit > [ 107.261623] pcieport 0000:00:1c.6: bridge window [io 0x3000-0x3fff] > [ 107.276554] pcieport 0000:00:1c.6: bridge window [mem 0x43500000-0x435fffff] > [ 107.283849] pcieport 0000:00:1c.6: bridge window [mem 0x70900000-0x709fffff 64bit pref] > [ 107.293567] ata7: SATA link down (SStatus 4 SControl 300) > [ 107.304150] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 107.310975] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 107.319173] ata5.00: configured for UDMA/133 > [ 107.324620] ata5.00: Enabling discard_zeroes_data > [ 107.398370] ata6.00: configured for UDMA/133 started bringing back up devices, but suspend comes back again. > [ 108.563229] PM: suspend entry (deep) > [ 108.573610] Filesystems sync: 0.006 seconds > [ 108.580617] Freezing user space processes > [ 108.586774] Freezing user space processes completed (elapsed 0.002 seconds) > [ 108.593793] OOM killer disabled. > [ 108.597055] Freezing remaining freezable tasks > [ 108.603246] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) which freezes the tasks, again. > [ 108.621515] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > [ 108.621522] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > [ 108.622018] serial 00:01: disabled > [ 108.635420] sd 4:0:0:0: [sda] Synchronizing SCSI cache > [ 108.640747] sd 4:0:0:0: [sda] Stopping disk > [ 108.644148] sd 5:0:0:0: [sdb] Stopping disk > [ 108.983487] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 108.991923] ata8.00: configured for UDMA/133 > [ 108.996423] sd 7:0:0:0: [sdc] Stopping disk > [ 109.973722] PM: suspend devices took 1.363 seconds And suspend finishes, followed by resume starting. > [ 110.721600] serial 00:01: activated > [ 110.802094] nvme nvme0: 16/0/0 default/read/poll queues > [ 110.873036] r8169 0000:07:00.0 enp7s0: Link is Down > [ 111.032278] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 111.038583] ata7: SATA link down (SStatus 4 SControl 300) > [ 111.044065] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 111.051326] ata5.00: configured for UDMA/133 > [ 111.056118] ata5.00: Enabling discard_zeroes_data > [ 111.131795] ata6.00: configured for UDMA/133 > [ 112.713764] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 112.724250] ata8.00: configured for UDMA/133 Resume restarted the devices, but there is no "Restarting tasks" ! So the user tasks are frozen, hence the "hang". There are no user processes to run, so no coming back to the shell. So even without the libata patches fixing the resume mess, this issue does not look like a libata issue. I think your bisect hitting the no_start_on_resume patch is simply because that patch changes the timing of things and makes it easier to hit this issue. Adding PM folks because I am out of my depth on this one... PM folks, Rodrigo hit an issue doing suspend+resume cycles in a loop, with the resume triggered a very short time (wakealarm) after starting resume. This ends up with a hang. I could recreate the same in qemu. I cannot get wakealarm to work for some reason, but I have a virtio device not supporting suspend which triggers the resume in the middle of suspend. And repeating "systemctl suspend" again and again, I end up with a similar hang: [ 98.455929] PM: suspend entry (s2idle) [ 98.466622] Filesystems sync: 0.008 seconds [ 98.473871] Freezing user space processes [ 98.476372] Freezing user space processes completed (elapsed 0.001 seconds) [ 98.478197] OOM killer disabled. [ 98.479199] Freezing remaining freezable tasks [ 98.481887] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) suspend starts, freezineg the tasks. [ 98.493890] virtio-fs: suspend/resume not yet supported [ 98.493971] sd 2:0:0:0: [sdc] Synchronizing SCSI cache [ 98.499658] virtio-pci 0000:00:05.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -95 [ 98.504241] virtio-pci 0000:00:05.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x230 returns -95 [ 98.507762] virtio-pci 0000:00:05.0: PM: failed to suspend async: error -95 Failure of the virtio device suspend which will trigger pm_recover+resume. [ 98.509451] ata3.00: Entering standby power mode [ 98.511197] sd 1:0:0:0: [sdb] Synchronizing SCSI cache [ 98.512917] sd 0:0:0:0: [sda] Synchronizing SCSI cache Here, the scsi devices already suspended (asynchronously) do their thing. [ 98.515562] PM: Some devices failed to suspend, or early wake event detected [ 98.521098] virtio_blk virtio3: 4/0/0 default/read/poll queues [ 98.548229] OOM killer enabled. [ 98.549338] Restarting tasks ... done. The tasks are restarted due to the suspend error above. [ 98.551734] random: crng reseeded on system resumption [ 98.559697] PM: suspend exit [ 98.561038] PM: suspend entry (s2idle) [ 98.566692] Filesystems sync: 0.004 seconds [ 98.574461] Freezing user space processes [ 98.577265] Freezing user space processes completed (elapsed 0.001 seconds) [ 98.579351] OOM killer disabled. [ 98.580341] Freezing remaining freezable tasks [ 98.583109] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) but here, for whatever reason, the tasks are suspended again despite the fact that we should be resuming. After this, I only see the scsi&ata devices finishing suspend and then being resumed all normally. But no coming back to the shell. It is frozen... Any idea ?
On 8/31/23 10:48, Vivi, Rodrigo wrote: > On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote: >> On 8/31/23 07:14, Rodrigo Vivi wrote: >>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: >>>> On 8/26/23 02:09, Rodrigo Vivi wrote: >>>>>>> So, maybe we have some kind of disks/configuration out >>>>>>> there where this >>>>>>> start upon resume is needed? Maybe it is just a matter of >>>>>>> timming to >>>>>>> ensure some firmware underneath is up and back to life? >>>>>> >>>>>> I do not think so. Suspend will issue a start stop unit >>>>>> command to put the drive >>>>>> to sleep and resume will reset the port (which should wake up >>>>>> the drive) and >>>>>> then issue an IDENTIFY command (which will also wake up the >>>>>> drive) and other >>>>>> read logs etc to rescan the drive. >>>>>> In both cases, if the commands do not complete, we would see >>>>>> errors/timeout and >>>>>> likely port reset/drive gone events. So I think this is >>>>>> likely another subtle >>>>>> race between scsi suspend and ata suspend that is causing a >>>>>> deadlock. >>>>>> >>>>>> The main issue I think is that there is no direct ancestry >>>>>> between the ata port >>>>>> (device) and scsi device, so the change to scsi async pm ops >>>>>> made a mess of the >>>>>> suspend/resume operations ordering. For suspend, scsi device >>>>>> (child of ata port) >>>>>> should be first, then ata port device (parent). For resume, >>>>>> the reverse order is >>>>>> needed. PM normally ensures that parent/child ordering, but >>>>>> we lack that >>>>>> parent/child relationship. I am working on fixing that but it >>>>>> is very slow >>>>>> progress because I have been so far enable to recreate any of >>>>>> the issues that >>>>>> have been reported. I am patching "blind"... >>>>> >>>>> I believe your suspicious makes sense. And on these lines, that >>>>> patch you >>>>> attached earlier would fix that. However my initial tries of >>>>> that didn't >>>>> help. I'm going to run more tests and get back to you. >>>> >>>> Rodrigo, >>>> >>>> I pushed the resume-v2 branch to libata tree: >>>> >>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata >>>> (or >>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi >>>> t) >>>> >>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata >>>> suspend/resume and >>>> other device shutdown issues. The first 4 patches are the main >>>> ones to fix >>>> suspend resume. I tested that on 2 different machines with >>>> different drives and >>>> with qemu. All seems fine. >>>> >>>> Could you try to run this through your CI ? I am very interested >>>> in seeing if it >>>> survives your suspend/resume tests. >>> >>> well, in the end this didn't affect the CI machinery as I was >>> afraid. >>> it is only in my local DG2. >>> >>> https://intel-gfx-ci.01.org/tree/intel-xe/bat- >>> all.html?testfilter=suspend >>> (bat-dg2-oem2 one) >>> >>> I just got these 13 patches and applied to my branch and tested it >>> again >>> and it still *fails* for me. >> >> That is annoying... But I think the messages give us a hint as to >> what is going >> on. See below. >> >>> >>> [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A- >>> DP-2, SUCCESS >>> [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest >>> pipe-B-DP-2 >>> [ 80.375042] PM: suspend entry (deep) >>> [ 80.380799] Filesystems sync: 0.002 seconds >>> [ 80.386476] Freezing user space processes >>> [ 80.392286] Freezing user space processes completed (elapsed >>> 0.001 seconds) >>> [ 80.399294] OOM killer disabled. >>> [ 80.402536] Freezing remaining freezable tasks >>> [ 80.408335] Freezing remaining freezable tasks completed >>> (elapsed 0.001 seconds) >>> [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache >>> [ 80.439716] serial 00:01: disabled >>> [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache >>> [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache >>> [ 80.453600] ata6.00: Entering standby power mode >> >> This is sd 5:0:0:0. All good, ordered properly with the >> "Synchronizing SCSI cache". >> >>> [ 80.464217] ata5.00: Entering standby power mode >> >> Same here for sd 4:0:0:0. >> >>> [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 80.818520] ata8.00: Entering active power mode >>> [ 80.842989] ata8.00: configured for UDMA/133 >> >> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI >> cache" tells >> us that it was suspending and libata EH did not yet put that drive to >> standby... >> >>> [ 80.847660] ata8.00: Entering standby power mode >> >> ... which happens here. So it looks like libata EH had both the >> suspend and >> resume requests at the same time, which is totally weird. > > although it looks weird, it totally matches the 'use case'. > I mean, if I suspend, resume, and wait a bit before suspend and resume > again, it will work 100% of the time. > The issue is really only when another suspend comes right after the > resume, in a loop without any wait. > >> >>> [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended >>> [ 81.800508] PM: suspend of devices complete after 1367.829 msecs >>> [ 81.806661] PM: start suspend of devices complete after 1390.859 >>> msecs >>> [ 81.813244] PM: suspend devices took 1.398 seconds >>> [ 81.820101] PM: late suspend of devices complete after 2.036 >>> msecs >> >> ...and PM suspend completes here. Resume "starts" now (but clearly it >> started >> earlier already given that sd 7:0:0:0 was reactivated. > > that is weird. > >> >>> �[ 82.403857] serial 00:01: activated >>> [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues >>> [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down >>> [ 82.581444] xe REG[0x223a8-0x223af]: allow read access >>> [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access >>> [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access >>> [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access >>> [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access >>> [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access >>> [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access >>> [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access >>> [ 82.623634] xe REG[0x263a8-0x263af]: allow read access >>> [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed >>> [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) >>> [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 82.739773] ata5.00: Entering active power mode >>> [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 82.750618] ata6.00: Entering active power mode >>> [ 82.755961] ata5.00: configured for UDMA/133 >>> [ 82.760479] ata5.00: Enabling discard_zeroes_data >>> [ 82.836266] ata6.00: configured for UDMA/133 >>> [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl >>> 300) >>> [ 84.466354] ata8.00: Entering active power mode >>> [ 84.497256] ata8.00: configured for UDMA/133 >>> ... >> >> And this looks all normal, the drives have all been transitioned to >> active >> power mode as expected. And yet, your system is stuck after this, >> right ? > > yes I think I have now figured it out, and fixed. I could reliably recreate the same hang both with qemu using a failed suspend (using a device not supporting suspend) and real hardware with a short rtc wake. It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is scheduled asynchronously from PM context on resume. With quick suspend after a resume, suspend may win the race against that ata_scsi_dev_rescan() task execution and we endup calling scsi_rescan_device() on a suspended device, causing that function to wait with the device_lock() held, which causes PM to deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended to fix that, but it did so less than ideally and the fix has a race on the scsi power state check, thus not always preventing the resume hang. I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the libata tree: https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git This works very well for me. Using this script on real hardware: for (( i=0; i<20; i++ )); do echo "+2" > /sys/class/rtc/rtc0/wakealarm echo mem > /sys/power/state done The system repeatedly suspends and resumes and comes back OK. Of note is that if I set the delay to +1 second, then I sometime do not see the system resume and the script stops. But using wakeup-on-lan (wol command) from another machine to wake it up, the machine resumes normally and continues executing the script. So it seems that setting the rtc alarm unreasonably early result in it being lost and the system suspending wating to be woken up. I also tested this in qemu. As mentioned before, I cannot get rtc alarm to wake up the VM guest though. However, using a virtio device that does not support suspend, resume strats in the middle of the suspend operation due to the suspend error reported by that device. And it turns out that systemd really insists on suspending the system despite the error, so when running "systemctl suspend" I see a retry for suspend right after the first failed one. That is enough to trigger the issue without the patches. Please test ! Thanks.
On Tue, Sep 05, 2023 at 02:20:23PM +0900, Damien Le Moal wrote: > On 8/31/23 10:48, Vivi, Rodrigo wrote: > > On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote: > >> On 8/31/23 07:14, Rodrigo Vivi wrote: > >>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote: > >>>> On 8/26/23 02:09, Rodrigo Vivi wrote: > >>>>>>> So, maybe we have some kind of disks/configuration out > >>>>>>> there where this > >>>>>>> start upon resume is needed? Maybe it is just a matter of > >>>>>>> timming to > >>>>>>> ensure some firmware underneath is up and back to life? > >>>>>> > >>>>>> I do not think so. Suspend will issue a start stop unit > >>>>>> command to put the drive > >>>>>> to sleep and resume will reset the port (which should wake up > >>>>>> the drive) and > >>>>>> then issue an IDENTIFY command (which will also wake up the > >>>>>> drive) and other > >>>>>> read logs etc to rescan the drive. > >>>>>> In both cases, if the commands do not complete, we would see > >>>>>> errors/timeout and > >>>>>> likely port reset/drive gone events. So I think this is > >>>>>> likely another subtle > >>>>>> race between scsi suspend and ata suspend that is causing a > >>>>>> deadlock. > >>>>>> > >>>>>> The main issue I think is that there is no direct ancestry > >>>>>> between the ata port > >>>>>> (device) and scsi device, so the change to scsi async pm ops > >>>>>> made a mess of the > >>>>>> suspend/resume operations ordering. For suspend, scsi device > >>>>>> (child of ata port) > >>>>>> should be first, then ata port device (parent). For resume, > >>>>>> the reverse order is > >>>>>> needed. PM normally ensures that parent/child ordering, but > >>>>>> we lack that > >>>>>> parent/child relationship. I am working on fixing that but it > >>>>>> is very slow > >>>>>> progress because I have been so far enable to recreate any of > >>>>>> the issues that > >>>>>> have been reported. I am patching "blind"... > >>>>> > >>>>> I believe your suspicious makes sense. And on these lines, that > >>>>> patch you > >>>>> attached earlier would fix that. However my initial tries of > >>>>> that didn't > >>>>> help. I'm going to run more tests and get back to you. > >>>> > >>>> Rodrigo, > >>>> > >>>> I pushed the resume-v2 branch to libata tree: > >>>> > >>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata > >>>> (or > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi > >>>> t) > >>>> > >>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata > >>>> suspend/resume and > >>>> other device shutdown issues. The first 4 patches are the main > >>>> ones to fix > >>>> suspend resume. I tested that on 2 different machines with > >>>> different drives and > >>>> with qemu. All seems fine. > >>>> > >>>> Could you try to run this through your CI ? I am very interested > >>>> in seeing if it > >>>> survives your suspend/resume tests. > >>> > >>> well, in the end this didn't affect the CI machinery as I was > >>> afraid. > >>> it is only in my local DG2. > >>> > >>> https://intel-gfx-ci.01.org/tree/intel-xe/bat- > >>> all.html?testfilter=suspend > >>> (bat-dg2-oem2 one) > >>> > >>> I just got these 13 patches and applied to my branch and tested it > >>> again > >>> and it still *fails* for me. > >> > >> That is annoying... But I think the messages give us a hint as to > >> what is going > >> on. See below. > >> > >>> > >>> [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A- > >>> DP-2, SUCCESS > >>> [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest > >>> pipe-B-DP-2 > >>> [ 80.375042] PM: suspend entry (deep) > >>> [ 80.380799] Filesystems sync: 0.002 seconds > >>> [ 80.386476] Freezing user space processes > >>> [ 80.392286] Freezing user space processes completed (elapsed > >>> 0.001 seconds) > >>> [ 80.399294] OOM killer disabled. > >>> [ 80.402536] Freezing remaining freezable tasks > >>> [ 80.408335] Freezing remaining freezable tasks completed > >>> (elapsed 0.001 seconds) > >>> [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache > >>> [ 80.439716] serial 00:01: disabled > >>> [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache > >>> [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache > >>> [ 80.453600] ata6.00: Entering standby power mode > >> > >> This is sd 5:0:0:0. All good, ordered properly with the > >> "Synchronizing SCSI cache". > >> > >>> [ 80.464217] ata5.00: Entering standby power mode > >> > >> Same here for sd 4:0:0:0. > >> > >>> [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl > >>> 300) > >>> [ 80.818520] ata8.00: Entering active power mode > >>> [ 80.842989] ata8.00: configured for UDMA/133 > >> > >> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI > >> cache" tells > >> us that it was suspending and libata EH did not yet put that drive to > >> standby... > >> > >>> [ 80.847660] ata8.00: Entering standby power mode > >> > >> ... which happens here. So it looks like libata EH had both the > >> suspend and > >> resume requests at the same time, which is totally weird. > > > > although it looks weird, it totally matches the 'use case'. > > I mean, if I suspend, resume, and wait a bit before suspend and resume > > again, it will work 100% of the time. > > The issue is really only when another suspend comes right after the > > resume, in a loop without any wait. > > > >> > >>> [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended > >>> [ 81.800508] PM: suspend of devices complete after 1367.829 msecs > >>> [ 81.806661] PM: start suspend of devices complete after 1390.859 > >>> msecs > >>> [ 81.813244] PM: suspend devices took 1.398 seconds > >>> [ 81.820101] PM: late suspend of devices complete after 2.036 > >>> msecs > >> > >> ...and PM suspend completes here. Resume "starts" now (but clearly it > >> started > >> earlier already given that sd 7:0:0:0 was reactivated. > > > > that is weird. > > > >> > >>> �[ 82.403857] serial 00:01: activated > >>> [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues > >>> [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down > >>> [ 82.581444] xe REG[0x223a8-0x223af]: allow read access > >>> [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access > >>> [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access > >>> [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access > >>> [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access > >>> [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access > >>> [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access > >>> [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access > >>> [ 82.623634] xe REG[0x263a8-0x263af]: allow read access > >>> [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed > >>> [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300) > >>> [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl > >>> 300) > >>> [ 82.739773] ata5.00: Entering active power mode > >>> [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl > >>> 300) > >>> [ 82.750618] ata6.00: Entering active power mode > >>> [ 82.755961] ata5.00: configured for UDMA/133 > >>> [ 82.760479] ata5.00: Enabling discard_zeroes_data > >>> [ 82.836266] ata6.00: configured for UDMA/133 > >>> [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl > >>> 300) > >>> [ 84.466354] ata8.00: Entering active power mode > >>> [ 84.497256] ata8.00: configured for UDMA/133 > >>> ... > >> > >> And this looks all normal, the drives have all been transitioned to > >> active > >> power mode as expected. And yet, your system is stuck after this, > >> right ? > > > > yes > > I think I have now figured it out, and fixed. I could reliably recreate the same > hang both with qemu using a failed suspend (using a device not supporting > suspend) and real hardware with a short rtc wake. > > It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is > scheduled asynchronously from PM context on resume. With quick suspend after a > resume, suspend may win the race against that ata_scsi_dev_rescan() task > execution and we endup calling scsi_rescan_device() on a suspended device, > causing that function to wait with the device_lock() held, which causes PM to > deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85 > ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended > to fix that, but it did so less than ideally and the fix has a race on the scsi > power state check, thus not always preventing the resume hang. > > I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the > libata tree: > > https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git > > This works very well for me. Using this script on real hardware: > > for (( i=0; i<20; i++ )); do > echo "+2" > /sys/class/rtc/rtc0/wakealarm > echo mem > /sys/power/state > done > > The system repeatedly suspends and resumes and comes back OK. Of note is that if > I set the delay to +1 second, then I sometime do not see the system resume and > the script stops. But using wakeup-on-lan (wol command) from another machine to > wake it up, the machine resumes normally and continues executing the script. So > it seems that setting the rtc alarm unreasonably early result in it being lost > and the system suspending wating to be woken up. > > I also tested this in qemu. As mentioned before, I cannot get rtc alarm to wake > up the VM guest though. However, using a virtio device that does not support > suspend, resume strats in the middle of the suspend operation due to the suspend > error reported by that device. And it turns out that systemd really insists on > suspending the system despite the error, so when running "systemctl suspend" I > see a retry for suspend right after the first failed one. That is enough to > trigger the issue without the patches. > > Please test ! \o/ works for me! Feel free to use: Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Thank you so much, Rodrigo. > > Thanks. > > -- > Damien Le Moal > Western Digital Research >
On 9/6/23 02:17, Rodrigo Vivi wrote: >> I think I have now figured it out, and fixed. I could reliably recreate the same >> hang both with qemu using a failed suspend (using a device not supporting >> suspend) and real hardware with a short rtc wake. >> >> It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is >> scheduled asynchronously from PM context on resume. With quick suspend after a >> resume, suspend may win the race against that ata_scsi_dev_rescan() task >> execution and we endup calling scsi_rescan_device() on a suspended device, >> causing that function to wait with the device_lock() held, which causes PM to >> deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85 >> ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended >> to fix that, but it did so less than ideally and the fix has a race on the scsi >> power state check, thus not always preventing the resume hang. >> >> I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the >> libata tree: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git >> >> This works very well for me. Using this script on real hardware: >> >> for (( i=0; i<20; i++ )); do >> echo "+2" > /sys/class/rtc/rtc0/wakealarm >> echo mem > /sys/power/state >> done >> >> The system repeatedly suspends and resumes and comes back OK. Of note is that if >> I set the delay to +1 second, then I sometime do not see the system resume and >> the script stops. But using wakeup-on-lan (wol command) from another machine to >> wake it up, the machine resumes normally and continues executing the script. So >> it seems that setting the rtc alarm unreasonably early result in it being lost >> and the system suspending wating to be woken up. >> >> I also tested this in qemu. As mentioned before, I cannot get rtc alarm to wake >> up the VM guest though. However, using a virtio device that does not support >> suspend, resume strats in the middle of the suspend operation due to the suspend >> error reported by that device. And it turns out that systemd really insists on >> suspending the system despite the error, so when running "systemctl suspend" I >> see a retry for suspend right after the first failed one. That is enough to >> trigger the issue without the patches. >> >> Please test ! > > \o/ works for me! > > Feel free to use: > Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Awesome ! Thank you for testing. I will rebase the patches and post the official version for 6.6 fixes (and the other cleanup patches for 6.7), after retesting again. Never know :)
Hi Damien, On Mon, 31 Jul 2023, Damien Le Moal wrote: > During system resume, ata_port_pm_resume() triggers ata EH to > 1) Resume the controller > 2) Reset and rescan the ports > 3) Revalidate devices > This EH execution is started asynchronously from ata_port_pm_resume(), > which means that when sd_resume() is executed, none or only part of the > above processing may have been executed. However, sd_resume() issues a > START STOP UNIT to wake up the drive from sleep mode. This command is > translated to ATA with ata_scsi_start_stop_xlat() and issued to the > device. However, depending on the state of execution of the EH process > and revalidation triggerred by ata_port_pm_resume(), two things may > happen: > 1) The START STOP UNIT fails if it is received before the controller has > been reenabled at the beginning of the EH execution. This is visible > with error messages like: > > ata10.00: device reported invalid CHS sector 0 > sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > sd 9:0:0:0: PM: failed to resume async: error -5 > > 2) The START STOP UNIT command is received while the EH process is > on-going, which mean that it is stopped and must wait for its > completion, at which point the command is rather useless as the drive > is already fully spun up already. This case results also in a > significant delay in sd_resume() which is observable by users as > the entire system resume completion is delayed. > > Given that ATA devices will be woken up by libata activity on resume, > sd_resume() has no need to issue a START STOP UNIT command, which solves > the above mentioned problems. Do not issue this command by introducing > the new scsi_device flag no_start_on_resume and setting this flag to 1 > in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > UNIT command only if this flag is not set. > > Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Thanks for your patch, which is now commit 0a8589055936d8fe ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. Sorry for being late to the party, but this commit landed upstream during my summer holidays, and apparently I wasn't that focussed on noticing small behavioral changes after getting back to work... I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to this commit: when accessing the drive after system resume, there is now a delay of ca. 5s before data is returned, presumably due to starting the drive, and having to wait for it to spin up to be able to read data. But the good news is that the actual system resume takes less time than before (reduced by even more than ca. 5s!), so this looks like a net win... I'm not sure this is to be called a regression, I just wanted you to be aware of it. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 9/13/23 02:39, Geert Uytterhoeven wrote: > Hi Damien, > > On Mon, 31 Jul 2023, Damien Le Moal wrote: >> During system resume, ata_port_pm_resume() triggers ata EH to >> 1) Resume the controller >> 2) Reset and rescan the ports >> 3) Revalidate devices >> This EH execution is started asynchronously from ata_port_pm_resume(), >> which means that when sd_resume() is executed, none or only part of the >> above processing may have been executed. However, sd_resume() issues a >> START STOP UNIT to wake up the drive from sleep mode. This command is >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >> device. However, depending on the state of execution of the EH process >> and revalidation triggerred by ata_port_pm_resume(), two things may >> happen: >> 1) The START STOP UNIT fails if it is received before the controller has >> been reenabled at the beginning of the EH execution. This is visible >> with error messages like: >> >> ata10.00: device reported invalid CHS sector 0 >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >> sd 9:0:0:0: PM: failed to resume async: error -5 >> >> 2) The START STOP UNIT command is received while the EH process is >> on-going, which mean that it is stopped and must wait for its >> completion, at which point the command is rather useless as the drive >> is already fully spun up already. This case results also in a >> significant delay in sd_resume() which is observable by users as >> the entire system resume completion is delayed. >> >> Given that ATA devices will be woken up by libata activity on resume, >> sd_resume() has no need to issue a START STOP UNIT command, which solves >> the above mentioned problems. Do not issue this command by introducing >> the new scsi_device flag no_start_on_resume and setting this flag to 1 >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >> UNIT command only if this flag is not set. >> >> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 >> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > > Thanks for your patch, which is now commit 0a8589055936d8fe > ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. > Sorry for being late to the party, but this commit landed upstream > during my summer holidays, and apparently I wasn't that focussed on > noticing small behavioral changes after getting back to work... > > I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car > H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to > this commit: when accessing the drive after system resume, there is now > a delay of ca. 5s before data is returned, presumably due to starting > the drive, and having to wait for it to spin up to be able to read data. > But the good news is that the actual system resume takes less time than > before (reduced by even more than ca. 5s!), so this looks like a net > win... Thanks for the report. The delay for the first data access from user space right after resume is 100% expected, with or without this patch. The reason is that waking up the drive (spinning it up) is done asynchronously from the PM resume context, so when you get "PM suspend exit" message signaling that the system is resumed, the drive may not yet be spinning. Any access will wait for that to happen before proceeding. Depending on the drive that can take up to 10s or so. I am not entirely sure where the net win you see come from. But the patch you mention is in fact completely wrong and does not fix the underlying issues with ata suspend/resume and potential deadlocks in PM due to ata ports relationship with scsi devices. So I have been working on fixing this for the last 2 weeks, after another user also reported issues with the patch you mention [1]. Could you try libata for-next branch on your system ? There are 7 fix patches in there that I plan to send out for 6.6-rc2 to fix the patch in question and other issues potentially causing deadlocks on resume. The patches were posted also [2]. https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t > I'm not sure this is to be called a regression, I just wanted you to be > aware of it. The delay on access is not a regression. You likely where not seeing it before because of the lack of synchronization between ata & scsi on resume was likely making the resume processing take longer than it needs to be, which gave time to the drive to spin up. The patches in libata for-next should fix all that. Please test ! Thanks ! [1] https://lore.kernel.org/linux-ide/8be9c370-2f1-5815-431-f68ab868669@linux-m68k.org/T/#m066ae988869d588c1b8c684cb52773e7708ade4c [2] https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t
Hi Damien, On Wed, Sep 13, 2023 at 12:58 AM Damien Le Moal <dlemoal@kernel.org> wrote: > On 9/13/23 02:39, Geert Uytterhoeven wrote: > > On Mon, 31 Jul 2023, Damien Le Moal wrote: > >> During system resume, ata_port_pm_resume() triggers ata EH to > >> 1) Resume the controller > >> 2) Reset and rescan the ports > >> 3) Revalidate devices > >> This EH execution is started asynchronously from ata_port_pm_resume(), > >> which means that when sd_resume() is executed, none or only part of the > >> above processing may have been executed. However, sd_resume() issues a > >> START STOP UNIT to wake up the drive from sleep mode. This command is > >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the > >> device. However, depending on the state of execution of the EH process > >> and revalidation triggerred by ata_port_pm_resume(), two things may > >> happen: > >> 1) The START STOP UNIT fails if it is received before the controller has > >> been reenabled at the beginning of the EH execution. This is visible > >> with error messages like: > >> > >> ata10.00: device reported invalid CHS sector 0 > >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > >> sd 9:0:0:0: PM: failed to resume async: error -5 > >> > >> 2) The START STOP UNIT command is received while the EH process is > >> on-going, which mean that it is stopped and must wait for its > >> completion, at which point the command is rather useless as the drive > >> is already fully spun up already. This case results also in a > >> significant delay in sd_resume() which is observable by users as > >> the entire system resume completion is delayed. > >> > >> Given that ATA devices will be woken up by libata activity on resume, > >> sd_resume() has no need to issue a START STOP UNIT command, which solves > >> the above mentioned problems. Do not issue this command by introducing > >> the new scsi_device flag no_start_on_resume and setting this flag to 1 > >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > >> UNIT command only if this flag is not set. > >> > >> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > >> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > > > > Thanks for your patch, which is now commit 0a8589055936d8fe > > ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. > > Sorry for being late to the party, but this commit landed upstream > > during my summer holidays, and apparently I wasn't that focussed on > > noticing small behavioral changes after getting back to work... > > > > I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car > > H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to > > this commit: when accessing the drive after system resume, there is now > > a delay of ca. 5s before data is returned, presumably due to starting > > the drive, and having to wait for it to spin up to be able to read data. > > But the good news is that the actual system resume takes less time than > > before (reduced by even more than ca. 5s!), so this looks like a net > > win... > > Thanks for the report. The delay for the first data access from user space right > after resume is 100% expected, with or without this patch. The reason is that > waking up the drive (spinning it up) is done asynchronously from the PM resume > context, so when you get "PM suspend exit" message signaling that the system is > resumed, the drive may not yet be spinning. Any access will wait for that to > happen before proceeding. Depending on the drive that can take up to 10s or so. That does not match with what I am seeing: before this patch, there was no delay on first data access from user space, as the drive is fully spun up when system resume returns. With this patch, system resume returns earlier, and the drive is only spun up when user space starts accessing data. Note that I do not have any file system mounted, and use "hd /dev/sda | head -70" to access the disk. > I am not entirely sure where the net win you see come from. But the patch you > mention is in fact completely wrong and does not fix the underlying issues with > ata suspend/resume and potential deadlocks in PM due to ata ports relationship > with scsi devices. So I have been working on fixing this for the last 2 weeks, > after another user also reported issues with the patch you mention [1]. > > Could you try libata for-next branch on your system ? There are 7 fix patches in > there that I plan to send out for 6.6-rc2 to fix the patch in question and other > issues potentially causing deadlocks on resume. The patches were posted also [2]. > > https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t Unfortunately that didn't work, as /dev/sda no longer exists. Will reply to the patch I bisected the issue to... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Damien, On Wed, Sep 13, 2023 at 12:21 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Wed, Sep 13, 2023 at 12:58 AM Damien Le Moal <dlemoal@kernel.org> wrote: > > On 9/13/23 02:39, Geert Uytterhoeven wrote: > > > On Mon, 31 Jul 2023, Damien Le Moal wrote: > > >> During system resume, ata_port_pm_resume() triggers ata EH to > > >> 1) Resume the controller > > >> 2) Reset and rescan the ports > > >> 3) Revalidate devices > > >> This EH execution is started asynchronously from ata_port_pm_resume(), > > >> which means that when sd_resume() is executed, none or only part of the > > >> above processing may have been executed. However, sd_resume() issues a > > >> START STOP UNIT to wake up the drive from sleep mode. This command is > > >> translated to ATA with ata_scsi_start_stop_xlat() and issued to the > > >> device. However, depending on the state of execution of the EH process > > >> and revalidation triggerred by ata_port_pm_resume(), two things may > > >> happen: > > >> 1) The START STOP UNIT fails if it is received before the controller has > > >> been reenabled at the beginning of the EH execution. This is visible > > >> with error messages like: > > >> > > >> ata10.00: device reported invalid CHS sector 0 > > >> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > > >> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > > >> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > > >> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > > >> sd 9:0:0:0: PM: failed to resume async: error -5 > > >> > > >> 2) The START STOP UNIT command is received while the EH process is > > >> on-going, which mean that it is stopped and must wait for its > > >> completion, at which point the command is rather useless as the drive > > >> is already fully spun up already. This case results also in a > > >> significant delay in sd_resume() which is observable by users as > > >> the entire system resume completion is delayed. > > >> > > >> Given that ATA devices will be woken up by libata activity on resume, > > >> sd_resume() has no need to issue a START STOP UNIT command, which solves > > >> the above mentioned problems. Do not issue this command by introducing > > >> the new scsi_device flag no_start_on_resume and setting this flag to 1 > > >> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > > >> UNIT command only if this flag is not set. > > >> > > >> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > > >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > > >> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > > >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > > > > > > Thanks for your patch, which is now commit 0a8589055936d8fe > > > ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. > > > Sorry for being late to the party, but this commit landed upstream > > > during my summer holidays, and apparently I wasn't that focussed on > > > noticing small behavioral changes after getting back to work... > > > > > > I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car > > > H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to > > > this commit: when accessing the drive after system resume, there is now > > > a delay of ca. 5s before data is returned, presumably due to starting > > > the drive, and having to wait for it to spin up to be able to read data. > > > But the good news is that the actual system resume takes less time than > > > before (reduced by even more than ca. 5s!), so this looks like a net > > > win... > > > > Thanks for the report. The delay for the first data access from user space right > > after resume is 100% expected, with or without this patch. The reason is that > > waking up the drive (spinning it up) is done asynchronously from the PM resume > > context, so when you get "PM suspend exit" message signaling that the system is > > resumed, the drive may not yet be spinning. Any access will wait for that to > > happen before proceeding. Depending on the drive that can take up to 10s or so. > > That does not match with what I am seeing: before this patch, there > was no delay on first data access from user space, as the drive is fully > spun up when system resume returns. > With this patch, system resume returns earlier, and the drive is only > spun up when user space starts accessing data. > > Note that I do not have any file system mounted, and use > "hd /dev/sda | head -70" to access the disk. > > > I am not entirely sure where the net win you see come from. But the patch you > > mention is in fact completely wrong and does not fix the underlying issues with > > ata suspend/resume and potential deadlocks in PM due to ata ports relationship > > with scsi devices. So I have been working on fixing this for the last 2 weeks, > > after another user also reported issues with the patch you mention [1]. > > > > Could you try libata for-next branch on your system ? There are 7 fix patches in > > there that I plan to send out for 6.6-rc2 to fix the patch in question and other > > issues potentially causing deadlocks on resume. The patches were posted also [2]. > > > > https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t > > Unfortunately that didn't work, as /dev/sda no longer exists. > Will reply to the patch I bisected the issue to... With libata/for-next (fa2259a59966c005 ("ata: libata: Cleanup inline DMA helper functions")) and commit 99626085d036ec32 ("ata: libata-scsi: link ata port and scsi device") reverted, it behaves as before (disk is spun up when system resume completes, no delay when accessing the disk from userspace). Gr{oetje,eeting}s, Geert
On 9/13/23 19:21, Geert Uytterhoeven wrote: >> Thanks for the report. The delay for the first data access from user space right >> after resume is 100% expected, with or without this patch. The reason is that >> waking up the drive (spinning it up) is done asynchronously from the PM resume >> context, so when you get "PM suspend exit" message signaling that the system is >> resumed, the drive may not yet be spinning. Any access will wait for that to >> happen before proceeding. Depending on the drive that can take up to 10s or so. > > That does not match with what I am seeing: before this patch, there > was no delay on first data access from user space, as the drive is fully > spun up when system resume returns. Yes, that is a possibility, but not by design. Some users have complained about the long resume times which causes laptop screens to be "black" until disks spin up. That did not happen before 5.16, when the change to scsi using the PM async ops to do suspend/resume created all the issues with suspend/resume on ata side. I am going to run 5.15 again to check. The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove this delay. But it is a bad one because the ATA specs define that a drive can get out of standby (spun down) power state only with a media access command (a VERIFY command is used to spin up the disk). And furthermore, the specs also says that even a reset shall not change the device power state. So issuing a VERIFY command to spin up the drive is required per specs. Note that I do see many of my drives (I have hundreds in the lab) spinning up on reset, which is against the specs. But not all of them. So with the patch "do not issue START STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout errors on IDENTIFY command issued on resume will get the drive removed). > With this patch, system resume returns earlier, and the drive is only > spun up when user space starts accessing data. Yes, but "when user space starts accessing data" -> "user space accesses are processed only after the drive completes spinning up" would be more accurate. That is by design and expected. This is the behavior one would see if the drive is set to use standby timers (to go to standby on its own after some idle time) or if runtime suspend is used. I do not see any issue with this behavior. Is this causing any issue on your end ? Do you have concerns about this approach ? Having the resume process wait for the drive to fully spin-up is easy to do. But as I mentioned, many users are really not happy about how slow resuming become with that. If I do that, you will get back the previous behavior you mention, but I will be again getting emails about "resume is broken". I made a decision: no waiting for spinup. That causes a delay for the user on first access, but that makes resume overall far faster. I do not want to go back on that, unless you can confirm that there is a real regression/error/data corruption happening. > Note that I do not have any file system mounted, and use > "hd /dev/sda | head -70" to access the disk. That is fine. dd would do as well. Any media access, from user or from an FS will be delayed until the disk spins up. The default timeout for commands is 30s, which is plenty of time for the drive to complete spinup. So unless one starts pounding the drive with some crazy workload right after user space resumes, there should be no timeout errors or any issues with that. >> I am not entirely sure where the net win you see come from. But the patch you >> mention is in fact completely wrong and does not fix the underlying issues with >> ata suspend/resume and potential deadlocks in PM due to ata ports relationship >> with scsi devices. So I have been working on fixing this for the last 2 weeks, >> after another user also reported issues with the patch you mention [1]. >> >> Could you try libata for-next branch on your system ? There are 7 fix patches in >> there that I plan to send out for 6.6-rc2 to fix the patch in question and other >> issues potentially causing deadlocks on resume. The patches were posted also [2]. >> >> https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t > > Unfortunately that didn't work, as /dev/sda no longer exists. > Will reply to the patch I bisected the issue to... Now that is a problem...
On 9/13/23 19:34, Geert Uytterhoeven wrote: > Hi Damien, > > On Wed, Sep 13, 2023 at 12:21 PM Geert Uytterhoeven > <geert@linux-m68k.org> wrote: >> On Wed, Sep 13, 2023 at 12:58 AM Damien Le Moal <dlemoal@kernel.org> wrote: >>> On 9/13/23 02:39, Geert Uytterhoeven wrote: >>>> On Mon, 31 Jul 2023, Damien Le Moal wrote: >>>>> During system resume, ata_port_pm_resume() triggers ata EH to >>>>> 1) Resume the controller >>>>> 2) Reset and rescan the ports >>>>> 3) Revalidate devices >>>>> This EH execution is started asynchronously from ata_port_pm_resume(), >>>>> which means that when sd_resume() is executed, none or only part of the >>>>> above processing may have been executed. However, sd_resume() issues a >>>>> START STOP UNIT to wake up the drive from sleep mode. This command is >>>>> translated to ATA with ata_scsi_start_stop_xlat() and issued to the >>>>> device. However, depending on the state of execution of the EH process >>>>> and revalidation triggerred by ata_port_pm_resume(), two things may >>>>> happen: >>>>> 1) The START STOP UNIT fails if it is received before the controller has >>>>> been reenabled at the beginning of the EH execution. This is visible >>>>> with error messages like: >>>>> >>>>> ata10.00: device reported invalid CHS sector 0 >>>>> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK >>>>> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] >>>>> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command >>>>> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 >>>>> sd 9:0:0:0: PM: failed to resume async: error -5 >>>>> >>>>> 2) The START STOP UNIT command is received while the EH process is >>>>> on-going, which mean that it is stopped and must wait for its >>>>> completion, at which point the command is rather useless as the drive >>>>> is already fully spun up already. This case results also in a >>>>> significant delay in sd_resume() which is observable by users as >>>>> the entire system resume completion is delayed. >>>>> >>>>> Given that ATA devices will be woken up by libata activity on resume, >>>>> sd_resume() has no need to issue a START STOP UNIT command, which solves >>>>> the above mentioned problems. Do not issue this command by introducing >>>>> the new scsi_device flag no_start_on_resume and setting this flag to 1 >>>>> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP >>>>> UNIT command only if this flag is not set. >>>>> >>>>> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> >>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 >>>>> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") >>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> >>>> >>>> Thanks for your patch, which is now commit 0a8589055936d8fe >>>> ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. >>>> Sorry for being late to the party, but this commit landed upstream >>>> during my summer holidays, and apparently I wasn't that focussed on >>>> noticing small behavioral changes after getting back to work... >>>> >>>> I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car >>>> H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to >>>> this commit: when accessing the drive after system resume, there is now >>>> a delay of ca. 5s before data is returned, presumably due to starting >>>> the drive, and having to wait for it to spin up to be able to read data. >>>> But the good news is that the actual system resume takes less time than >>>> before (reduced by even more than ca. 5s!), so this looks like a net >>>> win... >>> >>> Thanks for the report. The delay for the first data access from user space right >>> after resume is 100% expected, with or without this patch. The reason is that >>> waking up the drive (spinning it up) is done asynchronously from the PM resume >>> context, so when you get "PM suspend exit" message signaling that the system is >>> resumed, the drive may not yet be spinning. Any access will wait for that to >>> happen before proceeding. Depending on the drive that can take up to 10s or so. >> >> That does not match with what I am seeing: before this patch, there >> was no delay on first data access from user space, as the drive is fully >> spun up when system resume returns. >> With this patch, system resume returns earlier, and the drive is only >> spun up when user space starts accessing data. >> >> Note that I do not have any file system mounted, and use >> "hd /dev/sda | head -70" to access the disk. >> >>> I am not entirely sure where the net win you see come from. But the patch you >>> mention is in fact completely wrong and does not fix the underlying issues with >>> ata suspend/resume and potential deadlocks in PM due to ata ports relationship >>> with scsi devices. So I have been working on fixing this for the last 2 weeks, >>> after another user also reported issues with the patch you mention [1]. >>> >>> Could you try libata for-next branch on your system ? There are 7 fix patches in >>> there that I plan to send out for 6.6-rc2 to fix the patch in question and other >>> issues potentially causing deadlocks on resume. The patches were posted also [2]. >>> >>> https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t >> >> Unfortunately that didn't work, as /dev/sda no longer exists. >> Will reply to the patch I bisected the issue to... > > With libata/for-next (fa2259a59966c005 ("ata: libata: Cleanup inline > DMA helper functions")) and commit 99626085d036ec32 ("ata: libata-scsi: > link ata port and scsi device") reverted, it behaves as before (disk > is spun up when system resume completes, no delay when accessing the > disk from userspace). I will check the ata platform driver for R-CAR. I may have overlooked something in that area. I tested with AHCI and libsas adapters only as I do not have ATA on the few ARM SBC boards I have. And I do not have an R-CAR board. What surprises me is that you need to revert ata: libata: Cleanup inline DMA helper functions". This patch has 0 functional changes and really is only a code cleanup... Nothing should change with it. Can you confirm that you really need to revert that patch to get things working again ?
Hi Damien, On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@kernel.org> wrote: > On 9/13/23 19:21, Geert Uytterhoeven wrote: > >> Thanks for the report. The delay for the first data access from user space right > >> after resume is 100% expected, with or without this patch. The reason is that > >> waking up the drive (spinning it up) is done asynchronously from the PM resume > >> context, so when you get "PM suspend exit" message signaling that the system is > >> resumed, the drive may not yet be spinning. Any access will wait for that to > >> happen before proceeding. Depending on the drive that can take up to 10s or so. > > > > That does not match with what I am seeing: before this patch, there > > was no delay on first data access from user space, as the drive is fully > > spun up when system resume returns. > > Yes, that is a possibility, but not by design. Some users have complained about > the long resume times which causes laptop screens to be "black" until disks spin > up. That did not happen before 5.16, when the change to scsi using the PM async > ops to do suspend/resume created all the issues with suspend/resume on ata side. > I am going to run 5.15 again to check. > > The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove > this delay. But it is a bad one because the ATA specs define that a drive can > get out of standby (spun down) power state only with a media access command (a > VERIFY command is used to spin up the disk). And furthermore, the specs also > says that even a reset shall not change the device power state. So issuing a > VERIFY command to spin up the drive is required per specs. Note that I do see > many of my drives (I have hundreds in the lab) spinning up on reset, which is > against the specs. But not all of them. So with the patch "do not issue START > STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout > errors on IDENTIFY command issued on resume will get the drive removed). > > > With this patch, system resume returns earlier, and the drive is only > > spun up when user space starts accessing data. > > Yes, but "when user space starts accessing data" -> "user space accesses are > processed only after the drive completes spinning up" would be more accurate. Sure, I wrote it down in terms of what the user is experiencing, which may not be identical to what's happening under the hood. > That is by design and expected. This is the behavior one would see if the drive > is set to use standby timers (to go to standby on its own after some idle time) > or if runtime suspend is used. I do not see any issue with this behavior. Is > this causing any issue on your end ? Do you have concerns about this approach ? > > Having the resume process wait for the drive to fully spin-up is easy to do. But > as I mentioned, many users are really not happy about how slow resuming become > with that. If I do that, you will get back the previous behavior you mention, > but I will be again getting emails about "resume is broken". > > I made a decision: no waiting for spinup. That causes a delay for the user on > first access, but that makes resume overall far faster. I do not want to go back > on that, unless you can confirm that there is a real regression/error/data > corruption happening. I agree that is fine. Gr{oetje,eeting}s, Geert
On 9/14/23 15:53, Geert Uytterhoeven wrote: > Hi Damien, > > On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@kernel.org> wrote: >> On 9/13/23 19:21, Geert Uytterhoeven wrote: >>>> Thanks for the report. The delay for the first data access from user space right >>>> after resume is 100% expected, with or without this patch. The reason is that >>>> waking up the drive (spinning it up) is done asynchronously from the PM resume >>>> context, so when you get "PM suspend exit" message signaling that the system is >>>> resumed, the drive may not yet be spinning. Any access will wait for that to >>>> happen before proceeding. Depending on the drive that can take up to 10s or so. >>> >>> That does not match with what I am seeing: before this patch, there >>> was no delay on first data access from user space, as the drive is fully >>> spun up when system resume returns. >> >> Yes, that is a possibility, but not by design. Some users have complained about >> the long resume times which causes laptop screens to be "black" until disks spin >> up. That did not happen before 5.16, when the change to scsi using the PM async >> ops to do suspend/resume created all the issues with suspend/resume on ata side. >> I am going to run 5.15 again to check. >> >> The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove >> this delay. But it is a bad one because the ATA specs define that a drive can >> get out of standby (spun down) power state only with a media access command (a >> VERIFY command is used to spin up the disk). And furthermore, the specs also >> says that even a reset shall not change the device power state. So issuing a >> VERIFY command to spin up the drive is required per specs. Note that I do see >> many of my drives (I have hundreds in the lab) spinning up on reset, which is >> against the specs. But not all of them. So with the patch "do not issue START >> STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout >> errors on IDENTIFY command issued on resume will get the drive removed). >> >>> With this patch, system resume returns earlier, and the drive is only >>> spun up when user space starts accessing data. >> >> Yes, but "when user space starts accessing data" -> "user space accesses are >> processed only after the drive completes spinning up" would be more accurate. > > Sure, I wrote it down in terms of what the user is experiencing, which may > not be identical to what's happening under the hood. > >> That is by design and expected. This is the behavior one would see if the drive >> is set to use standby timers (to go to standby on its own after some idle time) >> or if runtime suspend is used. I do not see any issue with this behavior. Is >> this causing any issue on your end ? Do you have concerns about this approach ? >> >> Having the resume process wait for the drive to fully spin-up is easy to do. But >> as I mentioned, many users are really not happy about how slow resuming become >> with that. If I do that, you will get back the previous behavior you mention, >> but I will be again getting emails about "resume is broken". >> >> I made a decision: no waiting for spinup. That causes a delay for the user on >> first access, but that makes resume overall far faster. I do not want to go back >> on that, unless you can confirm that there is a real regression/error/data >> corruption happening. > > I agree that is fine. Still, R-CAR is not working for you with the patch adding the link. Looking into it now. Trying to verify things with qemu & regular AHCI on PCs too as I do not have an R-CAR board. Hard to debug :) Preparing some patches for you to get more debug info. > > Gr{oetje,eeting}s, > > Geert >
Hi Damien, On Thu, Sep 14, 2023 at 12:08 AM Damien Le Moal <dlemoal@kernel.org> wrote: > On 9/13/23 19:34, Geert Uytterhoeven wrote: > > On Wed, Sep 13, 2023 at 12:21 PM Geert Uytterhoeven > > <geert@linux-m68k.org> wrote: > >> On Wed, Sep 13, 2023 at 12:58 AM Damien Le Moal <dlemoal@kernel.org> wrote: > >>> On 9/13/23 02:39, Geert Uytterhoeven wrote: > >>>> On Mon, 31 Jul 2023, Damien Le Moal wrote: > >>>>> During system resume, ata_port_pm_resume() triggers ata EH to > >>>>> 1) Resume the controller > >>>>> 2) Reset and rescan the ports > >>>>> 3) Revalidate devices > >>>>> This EH execution is started asynchronously from ata_port_pm_resume(), > >>>>> which means that when sd_resume() is executed, none or only part of the > >>>>> above processing may have been executed. However, sd_resume() issues a > >>>>> START STOP UNIT to wake up the drive from sleep mode. This command is > >>>>> translated to ATA with ata_scsi_start_stop_xlat() and issued to the > >>>>> device. However, depending on the state of execution of the EH process > >>>>> and revalidation triggerred by ata_port_pm_resume(), two things may > >>>>> happen: > >>>>> 1) The START STOP UNIT fails if it is received before the controller has > >>>>> been reenabled at the beginning of the EH execution. This is visible > >>>>> with error messages like: > >>>>> > >>>>> ata10.00: device reported invalid CHS sector 0 > >>>>> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK > >>>>> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] > >>>>> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command > >>>>> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 > >>>>> sd 9:0:0:0: PM: failed to resume async: error -5 > >>>>> > >>>>> 2) The START STOP UNIT command is received while the EH process is > >>>>> on-going, which mean that it is stopped and must wait for its > >>>>> completion, at which point the command is rather useless as the drive > >>>>> is already fully spun up already. This case results also in a > >>>>> significant delay in sd_resume() which is observable by users as > >>>>> the entire system resume completion is delayed. > >>>>> > >>>>> Given that ATA devices will be woken up by libata activity on resume, > >>>>> sd_resume() has no need to issue a START STOP UNIT command, which solves > >>>>> the above mentioned problems. Do not issue this command by introducing > >>>>> the new scsi_device flag no_start_on_resume and setting this flag to 1 > >>>>> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP > >>>>> UNIT command only if this flag is not set. > >>>>> > >>>>> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> > >>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 > >>>>> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") > >>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > >>>> > >>>> Thanks for your patch, which is now commit 0a8589055936d8fe > >>>> ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5. > >>>> Sorry for being late to the party, but this commit landed upstream > >>>> during my summer holidays, and apparently I wasn't that focussed on > >>>> noticing small behavioral changes after getting back to work... > >>>> > >>>> I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car > >>>> H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to > >>>> this commit: when accessing the drive after system resume, there is now > >>>> a delay of ca. 5s before data is returned, presumably due to starting > >>>> the drive, and having to wait for it to spin up to be able to read data. > >>>> But the good news is that the actual system resume takes less time than > >>>> before (reduced by even more than ca. 5s!), so this looks like a net > >>>> win... > >>> > >>> Thanks for the report. The delay for the first data access from user space right > >>> after resume is 100% expected, with or without this patch. The reason is that > >>> waking up the drive (spinning it up) is done asynchronously from the PM resume > >>> context, so when you get "PM suspend exit" message signaling that the system is > >>> resumed, the drive may not yet be spinning. Any access will wait for that to > >>> happen before proceeding. Depending on the drive that can take up to 10s or so. > >> > >> That does not match with what I am seeing: before this patch, there > >> was no delay on first data access from user space, as the drive is fully > >> spun up when system resume returns. > >> With this patch, system resume returns earlier, and the drive is only > >> spun up when user space starts accessing data. > >> > >> Note that I do not have any file system mounted, and use > >> "hd /dev/sda | head -70" to access the disk. > >> > >>> I am not entirely sure where the net win you see come from. But the patch you > >>> mention is in fact completely wrong and does not fix the underlying issues with > >>> ata suspend/resume and potential deadlocks in PM due to ata ports relationship > >>> with scsi devices. So I have been working on fixing this for the last 2 weeks, > >>> after another user also reported issues with the patch you mention [1]. > >>> > >>> Could you try libata for-next branch on your system ? There are 7 fix patches in > >>> there that I plan to send out for 6.6-rc2 to fix the patch in question and other > >>> issues potentially causing deadlocks on resume. The patches were posted also [2]. > >>> > >>> https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t > >> > >> Unfortunately that didn't work, as /dev/sda no longer exists. > >> Will reply to the patch I bisected the issue to... > > > > With libata/for-next (fa2259a59966c005 ("ata: libata: Cleanup inline > > DMA helper functions")) and commit 99626085d036ec32 ("ata: libata-scsi: > > link ata port and scsi device") reverted, it behaves as before (disk > > is spun up when system resume completes, no delay when accessing the > > disk from userspace). > > I will check the ata platform driver for R-CAR. I may have overlooked something > in that area. I tested with AHCI and libsas adapters only as I do not have ATA > on the few ARM SBC boards I have. And I do not have an R-CAR board. > > What surprises me is that you need to revert ata: libata: Cleanup inline DMA > helper functions". This patch has 0 functional changes and really is only a code > cleanup... Nothing should change with it. Can you confirm that you really need > to revert that patch to get things working again ? My apologies, I should have written: With libata/for-next (which is currently at fa2259a59966c005 ("ata: libata: Cleanup inline DMA helper functions")), and commit 99626085d036ec32 ("ata: libata-scsi: link ata port and scsi device") reverted, ... So 99626085d036ec32 was the only commit I had to revert. Sorry for the confusion. Gr{oetje,eeting}s, Geert
Damien Le Moal <dlemoal@kernel.org> writes: > VERIFY command is used to spin up the disk). And furthermore, the specs also > says that even a reset shall not change the device power state. So issuing a > VERIFY command to spin up the drive is required per specs. Note that I do see > many of my drives (I have hundreds in the lab) spinning up on reset, which is > against the specs. But not all of them. So with the patch "do not issue START > STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout > errors on IDENTIFY command issued on resume will get the drive removed). I tried to get some patches merged a few years back along these lines to be able to keep some ATA disks in a mostly permanent state of being powered down. I don't recall the specs saying that a reset would not change power states. What I do recall is that as a general rule, ATA disks automatically spin up after power on, unless the Power Up in Standby feature is activated, which at least for Western Digital drives, requires a jumper to short some pins on the drive. IIRC, I had a patch to skip the START UNIT command on resume based on a sysfs setting. I think it was rejected because SCSI disks can not be accessed without the START UNIT command, and even though the scsi eh path will notice when a drive needs this and issue it, there was something not good about the eh path. I think it was something along the lines of all IO to all disks on the whole scsi host are suspended during eh. I also tried to have the runtime pm notice whether the drive auto powered up or not and set the runtime pm state to correctly indicate whether the drive is running or not, as well as to use the deeper ATA SLEEP state to save more power over STANDBY and allow the ATA link to be runtime pm suspended.
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 370d18aca71e..c6ece32de8e3 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) } } else { sdev->sector_size = ata_id_logical_sector_size(dev->id); + /* + * Stop the drive on suspend but do not issue START STOP UNIT + * on resume as this is not necessary and may fail: the device + * will be woken up by ata_port_pm_resume() with a port reset + * and device revalidation. + */ sdev->manage_start_stop = 1; + sdev->no_start_on_resume = 1; } /* diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 68b12afa0721..3c668cfb146d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) static int sd_resume(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); - int ret; + int ret = 0; if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ return 0; @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev) if (!sdkp->device->manage_start_stop) return 0; - sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); - ret = sd_start_stop_device(sdkp, 1); + if (!sdkp->device->no_start_on_resume) { + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); + ret = sd_start_stop_device(sdkp, 1); + } + if (!ret) opal_unlock_from_suspend(sdkp->opal_dev); return ret; diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 75b2235b99e2..b9230b6add04 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -194,6 +194,7 @@ struct scsi_device { unsigned no_start_on_add:1; /* do not issue start on add */ unsigned allow_restart:1; /* issue START_UNIT in error handler */ unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ unsigned select_no_atn:1;
During system resume, ata_port_pm_resume() triggers ata EH to 1) Resume the controller 2) Reset and rescan the ports 3) Revalidate devices This EH execution is started asynchronously from ata_port_pm_resume(), which means that when sd_resume() is executed, none or only part of the above processing may have been executed. However, sd_resume() issues a START STOP UNIT to wake up the drive from sleep mode. This command is translated to ATA with ata_scsi_start_stop_xlat() and issued to the device. However, depending on the state of execution of the EH process and revalidation triggerred by ata_port_pm_resume(), two things may happen: 1) The START STOP UNIT fails if it is received before the controller has been reenabled at the beginning of the EH execution. This is visible with error messages like: ata10.00: device reported invalid CHS sector 0 sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 sd 9:0:0:0: PM: failed to resume async: error -5 2) The START STOP UNIT command is received while the EH process is on-going, which mean that it is stopped and must wait for its completion, at which point the command is rather useless as the drive is already fully spun up already. This case results also in a significant delay in sd_resume() which is observable by users as the entire system resume completion is delayed. Given that ATA devices will be woken up by libata activity on resume, sd_resume() has no need to issue a START STOP UNIT command, which solves the above mentioned problems. Do not issue this command by introducing the new scsi_device flag no_start_on_resume and setting this flag to 1 in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP UNIT command only if this flag is not set. Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> --- drivers/ata/libata-scsi.c | 7 +++++++ drivers/scsi/sd.c | 9 ++++++--- include/scsi/scsi_device.h | 1 + 3 files changed, 14 insertions(+), 3 deletions(-)