Message ID | 1516800025-193614-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Hello, On Wed, Jan 24, 2018 at 09:20:25PM +0800, chenxiang wrote: > In ata_eh_reset, it will reset three times at most for sata disk. For > some drivers through libsas, it calls sas_ata_hard_reset at last. When > device is gone, function sas_ata_hard_reset will return -ENODEV. But > it will still try to reset three times for offline device. This process > lasts a long time: > > [11248.344323] ata13.00: status: { ERR } > [11248.344324] ata13.00: error: { ABRT } > [11248.344327] ata13: hard resetting link > [11248.503557] sas: ata: ex 500e004aaaaaaa1f phy02:U:A attached:0000000000000000 (no device) > [11249.359524] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta03d:19h:35m:17s] > [11249.365692] ata13: reset failed (errno=-19), retrying in 9 secs > [11258.451402] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops][eta 03d:22h:10m:48s] > [11259.411508] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 03d:22h:28m:05s] > [11259.417683] ata13: reset failed (errno=-19), retrying in 10 secs > [11268.695401] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:01h:03m:37s] > [11269.699513] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:01h:20m:54s] > [11269.705689] ata13: reset failed (errno=-19), retrying in 34 secs > [11304.275393] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:11h:25m:43s] > [11305.283516] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:11h:43m:00s] > [11305.289692] ata13: reset failed, giving up > [11305.293785] ata13.00: disabled > > Actually it is no need to reset three times for this scenario. So add > a check to avoid it. I'm a bit reluctant in changing this per-driver. Does this actually hurt something? Thanks.
Hi Tejun, 在 2018/2/13 0:51, Tejun Heo 写道: > Hello, > > On Wed, Jan 24, 2018 at 09:20:25PM +0800, chenxiang wrote: >> In ata_eh_reset, it will reset three times at most for sata disk. For >> some drivers through libsas, it calls sas_ata_hard_reset at last. When >> device is gone, function sas_ata_hard_reset will return -ENODEV. But >> it will still try to reset three times for offline device. This process >> lasts a long time: >> >> [11248.344323] ata13.00: status: { ERR } >> [11248.344324] ata13.00: error: { ABRT } >> [11248.344327] ata13: hard resetting link >> [11248.503557] sas: ata: ex 500e004aaaaaaa1f phy02:U:A attached:0000000000000000 (no device) >> [11249.359524] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta03d:19h:35m:17s] >> [11249.365692] ata13: reset failed (errno=-19), retrying in 9 secs >> [11258.451402] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops][eta 03d:22h:10m:48s] >> [11259.411508] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 03d:22h:28m:05s] >> [11259.417683] ata13: reset failed (errno=-19), retrying in 10 secs >> [11268.695401] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:01h:03m:37s] >> [11269.699513] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:01h:20m:54s] >> [11269.705689] ata13: reset failed (errno=-19), retrying in 34 secs >> [11304.275393] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:11h:25m:43s] >> [11305.283516] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:11h:43m:00s] >> [11305.289692] ata13: reset failed, giving up >> [11305.293785] ata13.00: disabled >> >> Actually it is no need to reset three times for this scenario. So add >> a check to avoid it. > I'm a bit reluctant in changing this per-driver. Does this actually > hurt something? For those drivers using libsas, i think they have the same issue. It takes about 1 minute to recover but actually device is gone, so this recover is useless for this scenario (when enter EH, all normal IOs are blocked actually, so it will cause normal IOs are blocked one more minute which user doesn't want to). Actually in sas_ata_hard_reset, there are two situations returned -ENODEV which represent device is gone: - LLDD directly returns -ENODEV through lldd_I_T_nexus_reset; - It sends SMP DISCOVER to check local phy in smp_ata_check_ready, and find it is gone; > Thanks. >
Hello, On Tue, Feb 13, 2018 at 09:44:53AM +0800, chenxiang (M) wrote: > For those drivers using libsas, i think they have the same issue. > It takes about 1 minute to > recover but actually device is gone, so this recover is useless for > this scenario (when enter EH, > all normal IOs are blocked actually, so it will cause normal IOs are > blocked one more minute which > user doesn't want to). Right, it'd block other devices sharing the port. Doesn't sas map each ata device to its own port tho? > Actually in sas_ata_hard_reset, there are two situations returned > -ENODEV which represent device is gone: > - LLDD directly returns -ENODEV through lldd_I_T_nexus_reset; > - It sends SMP DISCOVER to check local phy in smp_ata_check_ready, > and find it is gone; So, if there are real consequences, we can definitely add a way to short-circuit the recovery logic but let's do that by adding proper signaling rathr than testing for driver type. Thanks.
Hi Tejun, Sorry for my late reply as i have a vacation last week. 在 2018/2/13 22:27, Tejun Heo 写道: > Hello, > > On Tue, Feb 13, 2018 at 09:44:53AM +0800, chenxiang (M) wrote: >> For those drivers using libsas, i think they have the same issue. >> It takes about 1 minute to >> recover but actually device is gone, so this recover is useless for >> this scenario (when enter EH, >> all normal IOs are blocked actually, so it will cause normal IOs are >> blocked one more minute which >> user doesn't want to). > Right, it'd block other devices sharing the port. Doesn't sas map > each ata device to its own port tho? Yes, for ata devices connected with expander, all the devices share the same port. > >> Actually in sas_ata_hard_reset, there are two situations returned >> -ENODEV which represent device is gone: >> - LLDD directly returns -ENODEV through lldd_I_T_nexus_reset; >> - It sends SMP DISCOVER to check local phy in smp_ata_check_ready, >> and find it is gone; > So, if there are real consequences, we can definitely add a way to > short-circuit the recovery logic but let's do that by adding proper > signaling rathr than testing for driver type. I am not familiar with ata recovery logic, and do you have idea about how to add a way to short-circuit the recovery logic by adding proper signaling? > > Thanks. >
Hello, On Mon, Feb 26, 2018 at 07:45:37PM +0800, chenxiang (M) wrote: > >So, if there are real consequences, we can definitely add a way to > >short-circuit the recovery logic but let's do that by adding proper > >signaling rathr than testing for driver type. > > I am not familiar with ata recovery logic, and do you have idea > about how to add a way to > short-circuit the recovery logic by adding proper signaling? e.g. define a return value from reset function which indicates no retry or introduce a port flag. Basically, something which doens't add special casing logic to the core logic. Thanks.
Hi Tejun, 在 2018/2/28 2:19, Tejun Heo 写道: > Hello, > > On Mon, Feb 26, 2018 at 07:45:37PM +0800, chenxiang (M) wrote: >>> So, if there are real consequences, we can definitely add a way to >>> short-circuit the recovery logic but let's do that by adding proper >>> signaling rathr than testing for driver type. >> I am not familiar with ata recovery logic, and do you have idea >> about how to add a way to >> short-circuit the recovery logic by adding proper signaling? > e.g. define a return value from reset function which indicates no > retry or introduce a port flag. Basically, something which doens't > add special casing logic to the core logic. If we can introduce a port flags such as ATA_LFLAG_DISABLED or ATA_EHI_NO_RECOVERY before ata_eh_recover, it will skip recovery. But we only get device status NODEV from ata_do_reset which is after ata_eh_recover, i don't think it is used to introduce a port flags at that time. From function ata_eh_reset, it seems there are two situations that end the recovery logic in current code: 1. Retry 3 times; 2. Reset function return value -ERESTART, but this return value seems be specal situation for ATA; > > Thanks. >
Hello, On Wed, Feb 28, 2018 at 03:18:39PM +0800, chenxiang (M) wrote: > If we can introduce a port flags such as ATA_LFLAG_DISABLED or > ATA_EHI_NO_RECOVERY before ata_eh_recover, > it will skip recovery. But we only get device status NODEV from > ata_do_reset which is after ata_eh_recover, i don't > think it is used to introduce a port flags at that time. > From function ata_eh_reset, it seems there are two situations that > end the recovery logic in current code: > 1. Retry 3 times; > 2. Reset function return value -ERESTART, but this return value > seems be specal situation for ATA; So, we have -ERESTART for restarting, -EPIPE for speeding down. We can just add another special return code - say, -ENOENT - to indicate that there's nothing and it shouldn't keep trying. Thanks.
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 11c3137..23a8946 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -3032,6 +3032,15 @@ int ata_eh_reset(struct ata_link *link, int classify, goto out; } + if (rc == -ENODEV && ap->flags & ATA_FLAG_SAS_HOST) { + ata_link_warn(failed_link, + "reset failed (errno=%d, device is offline for SAS host\n)", + rc); + if (ata_is_host_link(link)) + ata_eh_thaw_port(ap); + goto out; + } + now = jiffies; if (time_before(now, deadline)) { unsigned long delta = deadline - now;
In ata_eh_reset, it will reset three times at most for sata disk. For some drivers through libsas, it calls sas_ata_hard_reset at last. When device is gone, function sas_ata_hard_reset will return -ENODEV. But it will still try to reset three times for offline device. This process lasts a long time: [11248.344323] ata13.00: status: { ERR } [11248.344324] ata13.00: error: { ABRT } [11248.344327] ata13: hard resetting link [11248.503557] sas: ata: ex 500e004aaaaaaa1f phy02:U:A attached:0000000000000000 (no device) [11249.359524] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta03d:19h:35m:17s] [11249.365692] ata13: reset failed (errno=-19), retrying in 9 secs [11258.451402] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops][eta 03d:22h:10m:48s] [11259.411508] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 03d:22h:28m:05s] [11259.417683] ata13: reset failed (errno=-19), retrying in 10 secs [11268.695401] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:01h:03m:37s] [11269.699513] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:01h:20m:54s] [11269.705689] ata13: reset failed (errno=-19), retrying in 34 secs [11304.275393] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 04d:11h:25m:43s] [11305.283516] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 04d:11h:43m:00s] [11305.289692] ata13: reset failed, giving up [11305.293785] ata13.00: disabled Actually it is no need to reset three times for this scenario. So add a check to avoid it. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> --- drivers/ata/libata-eh.c | 9 +++++++++ 1 file changed, 9 insertions(+)