Message ID | 20240709160013.634308-1-tadamsjr@google.com |
---|---|
State | Changes Requested |
Series | scsi: pm80xx: Remove msleep() loop from pm8001_dev_gone_notify() |
On 09/07/2024 17:00, TJ Adams wrote:
> From: Igor Pylypiv <ipylypiv@google.com>
>
> It's possible to end up in a state where pm8001_dev->running_req never
> reaches zero.

Is that a driver bug then?

> In that state we will be sleeping forever.
>
> sas_execute_internal_abort_dev() can wait for a response for
> up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> for pm8001_dev->running_req to get to zero.

May I suggest you drop running_req at some stage, and use other methods
to find how many IOs are active?

>
> Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> Signed-off-by: TJ Adams <tadamsjr@google.com>
> ---
>  drivers/scsi/pm8001/pm8001_sas.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index a5a31dfa4512..513e9a49838c 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -712,8 +712,11 @@ static void pm8001_dev_gone_notify(struct domain_device *dev)
>  		if (atomic_read(&pm8001_dev->running_req)) {
>  			spin_unlock_irqrestore(&pm8001_ha->lock, flags);
>  			sas_execute_internal_abort_dev(dev, 0, NULL);
> -			while (atomic_read(&pm8001_dev->running_req))
> -				msleep(20);
> +			if (atomic_read(&pm8001_dev->running_req)) {
> +				pm8001_dbg(pm8001_ha, FAIL,
> +					   "device_id: %u: Failed to abort %d requests!\n",
> +					   device_id, atomic_read(&pm8001_dev->running_req));
> +			}
>  			spin_lock_irqsave(&pm8001_ha->lock, flags);
>  		}
>  		PM8001_CHIP_DISP->dereg_dev_req(pm8001_ha, device_id);
Sorry for the late response.

On Tue, Jul 9, 2024 at 9:09 AM John Garry <john.g.garry@oracle.com> wrote:
> > It's possible to end up in a state where pm8001_dev->running_req never
> > reaches zero.
>
> Is that a driver bug then?

I haven't seen this happen except when artificially creating the situation.
This is a preventative change rather than a response to a specific issue
we've seen.

> > In that state we will be sleeping forever.
> >
> > sas_execute_internal_abort_dev() can wait for a response for
> > up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> > for pm8001_dev->running_req to get to zero.
>
> May I suggest you drop running_req at some stage, and use other methods
> to find how many IOs are active?

I haven't given much thought to better ways to keep track of active IOs,
so that will have to come later, but definitely noted!
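One way to act on John's suggestion, sketched here purely as an illustration, is to compute the number of active IOs on demand by scanning the table of outstanding commands for entries bound to the device, so there is no separate counter that can drift out of sync. This is a userspace C model under stated assumptions: `struct dev_stub`, `struct cmd_slot`, and `count_active_cmds()` are hypothetical simplifications, not the pm8001 driver's actual data structures, and a real version would presumably also need the driver's locking around the scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's device and
 * command-slot structures; NOT the real pm8001 definitions. */
struct dev_stub {
	unsigned int device_id;
};

struct cmd_slot {
	bool in_use;                   /* slot holds an outstanding command */
	const struct dev_stub *device; /* device the command was issued to */
};

/*
 * Derive the number of outstanding commands for @dev by scanning the
 * command table, instead of trusting a separately maintained per-device
 * counter that stays wrong forever if a decrement is ever missed.
 */
static size_t count_active_cmds(const struct cmd_slot *slots, size_t nslots,
				const struct dev_stub *dev)
{
	size_t active = 0;

	assert(slots != NULL || nslots == 0);
	for (size_t i = 0; i < nslots; i++) {
		if (slots[i].in_use && slots[i].device == dev)
			active++;
	}
	return active;
}
```

The trade-off is a linear scan per query in place of a constant-time counter read; in exchange, the count cannot disagree with the table it is derived from.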
```diff
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index a5a31dfa4512..513e9a49838c 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -712,8 +712,11 @@ static void pm8001_dev_gone_notify(struct domain_device *dev)
 		if (atomic_read(&pm8001_dev->running_req)) {
 			spin_unlock_irqrestore(&pm8001_ha->lock, flags);
 			sas_execute_internal_abort_dev(dev, 0, NULL);
-			while (atomic_read(&pm8001_dev->running_req))
-				msleep(20);
+			if (atomic_read(&pm8001_dev->running_req)) {
+				pm8001_dbg(pm8001_ha, FAIL,
+					   "device_id: %u: Failed to abort %d requests!\n",
+					   device_id, atomic_read(&pm8001_dev->running_req));
+			}
 			spin_lock_irqsave(&pm8001_ha->lock, flags);
 		}
 		PM8001_CHIP_DISP->dereg_dev_req(pm8001_ha, device_id);
```
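The patch removes the poll loop and relies on the time bound of sas_execute_internal_abort_dev() (3 retries x 20 seconds = 60 seconds, per the commit message). For comparison only, the sketch below models in userspace a middle ground the patch does not take: keep polling after the abort, but cap the number of polls so a counter that never reaches zero can no longer hang the caller. `atomic_int` stands in for the kernel's `atomic_t`, `nanosleep()` for `msleep()`, and `fprintf()` for `pm8001_dbg()`; `wait_for_zero_bounded()` and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Sleep for roughly @ms milliseconds (userspace stand-in for msleep();
 * an interrupted nanosleep() may return early, which is fine here). */
static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: poll @running_req every @poll_ms milliseconds, at
 * most @max_polls times. Unlike "while (atomic_read(...)) msleep(20);",
 * this always returns, even if the counter is stuck above zero.
 * Returns the count still outstanding (0 means everything drained).
 */
static int wait_for_zero_bounded(atomic_int *running_req, long poll_ms,
				 int max_polls, unsigned int device_id)
{
	int left;

	assert(poll_ms >= 0 && max_polls >= 0);
	left = atomic_load(running_req);
	for (int i = 0; left && i < max_polls; i++) {
		sleep_ms(poll_ms);
		left = atomic_load(running_req);
	}
	if (left)
		fprintf(stderr, "device_id: %u: Failed to abort %d requests!\n",
			device_id, left);
	return left;
}
```

The worst-case wait is roughly `poll_ms` x `max_polls`; with `max_polls` set to 0 the helper degenerates to the single check-and-log that the patch itself performs.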