Message ID | 20160114214602.GC3520@mtj.duckdns.org (mailing list archive) |
---|---|
State | Rejected, archived |
Headers | show |
On 01/14/2016 10:46 PM, Tejun Heo wrote: > SCSI command completion path bumps ioerr_cnt whenever scsi_cmd->result > isn't zero; unfortunately, this means that non-error sense reporting > bumps the counter too. This is pronounced with ATA passthrough > commands because most of them explicitly request the resulting > taskfile to be transported via sense data bumping the count > unconditionally. > > Don't bump the counter if scsi_cmd->result simply indicates that sense > data is available. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Dave Jones <dsj@fb.com> > --- > drivers/scsi/scsi_lib.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index fa6b2c4..e90e3f7 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request *rq) > INIT_LIST_HEAD(&cmd->eh_entry); > > atomic_inc(&cmd->device->iodone_cnt); > - if (cmd->result) > + if (cmd->result && > + cmd->result != ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)) > atomic_inc(&cmd->device->ioerr_cnt); > > disposition = scsi_decide_disposition(cmd); Reviewed-by: Hannes Reinecke <hare@suse.com> Hannes
On Thu, 2016-01-14 at 16:46 -0500, Tejun Heo wrote: > SCSI command completion path bumps ioerr_cnt whenever scsi_cmd > ->result isn't zero; unfortunately, this means that non-error sense > reporting bumps the counter too. This is pronounced with ATA > passthrough commands because most of them explicitly request the > resulting taskfile to be transported via sense data bumping the count > unconditionally. > > Don't bump the counter if scsi_cmd->result simply indicates that > sense data is available. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Dave Jones <dsj@fb.com> > --- > drivers/scsi/scsi_lib.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index fa6b2c4..e90e3f7 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request > *rq) > INIT_LIST_HEAD(&cmd->eh_entry); > > atomic_inc(&cmd->device->iodone_cnt); > - if (cmd->result) > + if (cmd->result && > + cmd->result != ((DRIVER_SENSE << 24) | > SAM_STAT_CHECK_CONDITION)) > atomic_inc(&cmd->device->ioerr_cnt); OK, it makes sense to me that we don't include non-error check conditions. However, then you shouldn't be checking DRIVER_SENSE. We still have a few drivers that rely on the error handler to fetch sense explicitly ... they could eventually return non-error conditions as well. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-01-15 at 07:46 -0800, James Bottomley wrote: > On Thu, 2016-01-14 at 16:46 -0500, Tejun Heo wrote: > > SCSI command completion path bumps ioerr_cnt whenever scsi_cmd > > ->result isn't zero; unfortunately, this means that non-error sense > > reporting bumps the counter too. This is pronounced with ATA > > passthrough commands because most of them explicitly request the > > resulting taskfile to be transported via sense data bumping the > > count > > unconditionally. > > > > Don't bump the counter if scsi_cmd->result simply indicates that > > sense data is available. > > > > Signed-off-by: Tejun Heo <tj@kernel.org> > > Reported-by: Dave Jones <dsj@fb.com> > > --- > > drivers/scsi/scsi_lib.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index fa6b2c4..e90e3f7 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request > > *rq) > > INIT_LIST_HEAD(&cmd->eh_entry); > > > > atomic_inc(&cmd->device->iodone_cnt); > > - if (cmd->result) > > + if (cmd->result && > > + cmd->result != ((DRIVER_SENSE << 24) | > > SAM_STAT_CHECK_CONDITION)) > > atomic_inc(&cmd->device->ioerr_cnt); > > OK, it makes sense to me that we don't include non-error check > conditions. However, then you shouldn't be checking DRIVER_SENSE. > We still have a few drivers that rely on the error handler to fetch > sense explicitly ... they could eventually return non-error > conditions as well. Actually, I take this back: if we add your proposal, we never increment the ioerr_cnt even for sense returns indicating failure. That looks to be even worse than incrementing it too often. The other problem is that if we do this for you, we should do the same for SCSI with BUSY and QUEUE_FULL ... they indicate temporary retry conditions and shouldn't be treated as errors. I'll stop looking now before I find any more problems with the statistics code ... I think it needs a rethink. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16-01-15 04:55 PM, James Bottomley wrote: > On Fri, 2016-01-15 at 07:46 -0800, James Bottomley wrote: >> On Thu, 2016-01-14 at 16:46 -0500, Tejun Heo wrote: >>> SCSI command completion path bumps ioerr_cnt whenever scsi_cmd >>> ->result isn't zero; unfortunately, this means that non-error sense >>> reporting bumps the counter too. This is pronounced with ATA >>> passthrough commands because most of them explicitly request the >>> resulting taskfile to be transported via sense data bumping the >>> count >>> unconditionally. >>> >>> Don't bump the counter if scsi_cmd->result simply indicates that >>> sense data is available. >>> >>> Signed-off-by: Tejun Heo <tj@kernel.org> >>> Reported-by: Dave Jones <dsj@fb.com> >>> --- >>> drivers/scsi/scsi_lib.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >>> index fa6b2c4..e90e3f7 100644 >>> --- a/drivers/scsi/scsi_lib.c >>> +++ b/drivers/scsi/scsi_lib.c >>> @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request >>> *rq) >>> INIT_LIST_HEAD(&cmd->eh_entry); >>> >>> atomic_inc(&cmd->device->iodone_cnt); >>> - if (cmd->result) >>> + if (cmd->result && >>> + cmd->result != ((DRIVER_SENSE << 24) | >>> SAM_STAT_CHECK_CONDITION)) >>> atomic_inc(&cmd->device->ioerr_cnt); >> >> OK, it makes sense to me that we don't include non-error check >> conditions. However, then you shouldn't be checking DRIVER_SENSE. >> We still have a few drivers that rely on the error handler to fetch >> sense explicitly ... they could eventually return non-error >> conditions as well. > > Actually, I take this back: if we add your proposal, we never increment > the ioerr_cnt even for sense returns indicating failure. That looks to > be even worse than incrementing it too often. > > The other problem is that if we do this for you, we should do the same > for SCSI with BUSY and QUEUE_FULL ... they indicate temporary retry > conditions and shouldn't be treated as errors. > > I'll stop looking now before I find any more problems with the > statistics code ... I think it needs a rethink. SCSI status and sense data is non-trivial to decode. It looks like someone thought that one-liner would bypass a lot of hard work. Most of the time sense data indicates an error but not always. Even worse, it can contain vendor specific codes. If this statistic is on the fast path then IMO it should be retired (and any others like it). For backward compatibility set it to 0 once at initialization and document the change. Or you could have a discouraging kernel config option such as CONFIG_EXPENSIVE_SCSI_STATISTICS ("nonsensical" is another term that comes to mind). Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-01-15 at 17:50 +0100, Douglas Gilbert wrote: > On 16-01-15 04:55 PM, James Bottomley wrote: > > On Fri, 2016-01-15 at 07:46 -0800, James Bottomley wrote: > > > On Thu, 2016-01-14 at 16:46 -0500, Tejun Heo wrote: > > > > SCSI command completion path bumps ioerr_cnt whenever scsi_cmd > > > > ->result isn't zero; unfortunately, this means that non-error > > > > sense > > > > reporting bumps the counter too. This is pronounced with ATA > > > > passthrough commands because most of them explicitly request > > > > the > > > > resulting taskfile to be transported via sense data bumping the > > > > count > > > > unconditionally. > > > > > > > > Don't bump the counter if scsi_cmd->result simply indicates > > > > that > > > > sense data is available. > > > > > > > > Signed-off-by: Tejun Heo <tj@kernel.org> > > > > Reported-by: Dave Jones <dsj@fb.com> > > > > --- > > > > drivers/scsi/scsi_lib.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > > index fa6b2c4..e90e3f7 100644 > > > > --- a/drivers/scsi/scsi_lib.c > > > > +++ b/drivers/scsi/scsi_lib.c > > > > @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct > > > > request > > > > *rq) > > > > INIT_LIST_HEAD(&cmd->eh_entry); > > > > > > > > atomic_inc(&cmd->device->iodone_cnt); > > > > - if (cmd->result) > > > > + if (cmd->result && > > > > + cmd->result != ((DRIVER_SENSE << 24) | > > > > SAM_STAT_CHECK_CONDITION)) > > > > atomic_inc(&cmd->device->ioerr_cnt); > > > > > > OK, it makes sense to me that we don't include non-error check > > > conditions. However, then you shouldn't be checking > > > DRIVER_SENSE. > > > We still have a few drivers that rely on the error handler to > > > fetch > > > sense explicitly ... they could eventually return non-error > > > conditions as well. > > > > Actually, I take this back: if we add your proposal, we never > > increment > > the ioerr_cnt even for sense returns indicating failure. That > > looks to > > be even worse than incrementing it too often. > > > > The other problem is that if we do this for you, we should do the > > same > > for SCSI with BUSY and QUEUE_FULL ... they indicate temporary retry > > conditions and shouldn't be treated as errors. > > > > I'll stop looking now before I find any more problems with the > > statistics code ... I think it needs a rethink. > > SCSI status and sense data is non-trivial to decode. It looks > like someone thought that one-liner would bypass a lot of hard > work. Most of the time sense data indicates an error but not > always. Even worse, it can contain vendor specific codes. If > this statistic is on the fast path then IMO it should be retired > (and any others like it). For backward compatibility set it to > 0 once at initialization and document the change. Or you could > have a discouraging kernel config option such as > CONFIG_EXPENSIVE_SCSI_STATISTICS ("nonsensical" is another term > that comes to mind). Well, I can see sense in having an error count of everything that comes back that's not good status because it's easy and has a well defined meaning (calling it the "error count" is more debatable, agreed). It appears that Dave and Tejun want the count to mean something else. Lets treat this as a feature exercise: Dave and Tejun, what do you want, then we can see if we could add an additional counter giving you that. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, James. On Fri, Jan 15, 2016 at 10:35:34AM -0800, James Bottomley wrote: > Well, I can see sense in having an error count of everything that comes > back that's not good status because it's easy and has a well defined > meaning (calling it the "error count" is more debatable, agreed). It > appears that Dave and Tejun want the count to mean something else. > Lets treat this as a feature exercise: Dave and Tejun, what do you > want, then we can see if we could add an additional counter giving you > that. Well, currently, for libata devices, all passthrough commands bump it up because the result taskfile is reported via sense data, which is pretty misleading for something named ioerr_cnt. Maybe we can ignore CHECK_CONDITION for ATA passthrough commands but special casing them can be confusing in other ways. Thanks.
On Fri, 2016-01-15 at 13:42 -0500, Tejun Heo wrote: > Hello, James. > > On Fri, Jan 15, 2016 at 10:35:34AM -0800, James Bottomley wrote: > > Well, I can see sense in having an error count of everything that > > comes > > back that's not good status because it's easy and has a well > > defined > > meaning (calling it the "error count" is more debatable, agreed). > > It > > appears that Dave and Tejun want the count to mean something else. > > Lets treat this as a feature exercise: Dave and Tejun, what do you > > want, then we can see if we could add an additional counter giving > > you > > that. > > Well, currently, for libata devices, all passthrough commands bump it > up because the result taskfile is reported via sense data, which is > pretty misleading for something named ioerr_cnt. Maybe we can ignore > CHECK_CONDITION for ATA passthrough commands but special casing them > can be confusing in other ways. That doesn't really help me pin down what you want. I already said ioerr_cnt is misleading, but we're stuck with the name, becuase it's an ABI. Under the definition I gave, the behaviour you currently see is correct: all commands with non good status count as errors. That includes a ton of stuff I wouldn't classify as real errors, like queue full status, deferred sense codes and even auto correct, which is why I believe the term "error count" is misleading, but, as I said, we're stuck with it. If we want to change what is being counted, we have to change the definition, so what is the definition you want to see for counting errors? and what's the reason driving this change? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, James. On Fri, Jan 15, 2016 at 11:09:16AM -0800, James Bottomley wrote: > If we want to change what is being counted, we have to change the > definition, so what is the definition you want to see for counting > errors? and what's the reason driving this change? There isn't anything else to it than what I already wrote. It goes up for things which aren't failures making the counter essentially useless. It's confusing from userland as it's unintuitive that these commands are wrapped in SCSI ATA passthrough commands which use sense data for result reporting and doing equivalent operations on native SCSI and ATA devices lead to different outcomes. The counter as-is is just useless for libata devices. If it can be rectified easily, great. If not, it isn't anything critical. Thanks.
On Fri, 2016-01-15 at 14:27 -0500, Tejun Heo wrote: > Hello, James. > > On Fri, Jan 15, 2016 at 11:09:16AM -0800, James Bottomley wrote: > > If we want to change what is being counted, we have to change the > > definition, so what is the definition you want to see for counting > > errors? and what's the reason driving this change? > > There isn't anything else to it than what I already wrote. It goes > up > for things which aren't failures making the counter essentially > useless. It's confusing from userland as it's unintuitive that these > commands are wrapped in SCSI ATA passthrough commands which use sense > data for result reporting and doing equivalent operations on native > SCSI and ATA devices lead to different outcomes. The counter as-is > is > just useless for libata devices. If it can be rectified easily, > great. If not, it isn't anything critical. That's the point we were making: it can't be rectified easily. Simply counting non-good returns is easy. Counting something that's closer to errors is hard because first we have to define what an "error" is (I could see queue full being not an error, but what about not ready or initializing command required?) and then implement and infrastructure to classify and count the returns, which means an awful lot of sense data parsing. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 15, 2016 at 11:36:08AM -0800, James Bottomley wrote: > That's the point we were making: it can't be rectified easily. Simply > counting non-good returns is easy. Counting something that's closer to > errors is hard because first we have to define what an "error" is (I > could see queue full being not an error, but what about not ready or > initializing command required?) and then implement and infrastructure > to classify and count the returns, which means an awful lot of sense > data parsing. Yeap, no objection. Leaving as-is fine.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index fa6b2c4..e90e3f7 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request *rq) INIT_LIST_HEAD(&cmd->eh_entry); atomic_inc(&cmd->device->iodone_cnt); - if (cmd->result) + if (cmd->result && + cmd->result != ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)) atomic_inc(&cmd->device->ioerr_cnt); disposition = scsi_decide_disposition(cmd);
SCSI command completion path bumps ioerr_cnt whenever scsi_cmd->result isn't zero; unfortunately, this means that non-error sense reporting bumps the counter too. This is pronounced with ATA passthrough commands because most of them explicitly request the resulting taskfile to be transported via sense data bumping the count unconditionally. Don't bump the counter if scsi_cmd->result simply indicates that sense data is available. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dave Jones <dsj@fb.com> --- drivers/scsi/scsi_lib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html