diff mbox

[blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

Message ID 1523955817-15790-1-git-send-email-maier@linux.ibm.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Steffen Maier April 17, 2018, 9:03 a.m. UTC
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
---
 tests/scsi/004     |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |    3 ++
 2 files changed, 62 insertions(+), 0 deletions(-)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

Comments

Johannes Thumshirn April 17, 2018, 9:41 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Omar Sandoval April 19, 2018, 6:53 p.m. UTC | #2
On Tue, Apr 17, 2018 at 11:03:37AM +0200, Steffen Maier wrote:
> Signed-off-by: Steffen Maier <maier@linux.ibm.com>
> ---
>  tests/scsi/004     |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/scsi/004.out |    3 ++
>  2 files changed, 62 insertions(+), 0 deletions(-)
>  create mode 100755 tests/scsi/004
>  create mode 100644 tests/scsi/004.out
> 
> diff --git a/tests/scsi/004 b/tests/scsi/004
> new file mode 100755
> index 0000000..4852efc
> --- /dev/null
> +++ b/tests/scsi/004
> @@ -0,0 +1,59 @@
> +#!/bin/bash
> +#
> +# Ensure repeated SAM_STAT_TASK_SET_FULL results in EIO on timing out command.
> +#
> +# Regression test for commit cbe095e2b584 ("Revert "scsi: core: return
> +# BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"")
> +#
> +# Found independently of corresponding commit mail threads while
> +# experimenting with storage mirroring. This test is a storage-independent
> +# reproducer for the error case I ran into.
> +#
> +# Copyright IBM Corp. 2018
> +# Author: Steffen Maier <maier@linux.ibm.com>
> +#
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. common/scsi_debug
> +
> +DESCRIPTION="ensure repeated TASK SET FULL results in EIO on timing out command"
> +
> +requires() {
> +	_have_scsi_debug
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +
> +	if ! _init_scsi_debug add_host=1 max_luns=1 statistics=1 every_nth=1; then
> +	    return 1
> +	fi
> +	# every_nth RW with full queue gets SAM_STAT_TASK_SET_FULL
> +	echo 0x800 > /sys/bus/pseudo/drivers/scsi_debug/opts
> +	# delay to reduce response repetition: around 1..10sec depending on HZ
> +	echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
> +	# a single command fills device queue to satisfy 0x800 opts condition
> +	echo 1 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/queue_depth"
> +	dd if="/dev/${SCSI_DEBUG_DEVICES[0]}" iflag=direct of=/dev/null bs=512 count=1 |& grep -o "Input/output error"
> +	# stop injection
> +	echo 0 > /sys/bus/pseudo/drivers/scsi_debug/opts
> +	# dd closing SCSI disk causes implicit TUR also being delayed once
> +	while grep -q -F "in_use_bm BUSY:" "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
> +		sleep 1
> +	done
> +	echo 1 > /sys/bus/pseudo/drivers/scsi_debug/delay
> +	_exit_scsi_debug
> +
> +	echo "Test complete"
> +}
> diff --git a/tests/scsi/004.out b/tests/scsi/004.out
> new file mode 100644
> index 0000000..b1126fb
> --- /dev/null
> +++ b/tests/scsi/004.out
> @@ -0,0 +1,3 @@
> +Running scsi/004
> +Input/output error
> +Test complete
> -- 
> 1.7.1
> 

Thanks for the test! Applied.
Omar Sandoval April 19, 2018, 7:13 p.m. UTC | #3
On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> Thanks for the test! Applied.

Side note, it's unfortunate that this test takes 180 seconds to run only
because we have to wait for the command timeout. We should be able to
export request_queue->rq_timeout writeable in sysfs. Would you be
interested in doing that?
Bart Van Assche April 19, 2018, 7:41 p.m. UTC | #4
On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:

> > Thanks for the test! Applied.

> 

> Side note, it's unfortunate that this test takes 180 seconds to run only

> because we have to wait for the command timeout. We should be able to

> export request_queue->rq_timeout writeable in sysfs. Would you be

> interested in doing that?


Hello Omar,

Is this perhaps what you are looking for?
# ls -l /sys/class/scsi_device/*/*/timeout             
-rw-r--r-- 1 root root 4096 Apr 19 08:52 /sys/class/scsi_device/2:0:0:0/device/timeout
-rw-r--r-- 1 root root 4096 Apr 19 12:39 /sys/class/scsi_device/8:0:0:1/device/timeout

Thanks,

Bart.
Jens Axboe April 19, 2018, 7:44 p.m. UTC | #5
On 4/19/18 1:41 PM, Bart Van Assche wrote:
> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
>> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
>>> Thanks for the test! Applied.
>>
>> Side note, it's unfortunate that this test takes 180 seconds to run only
>> because we have to wait for the command timeout. We should be able to
>> export request_queue->rq_timeout writeable in sysfs. Would you be
>> interested in doing that?
> 
> Hello Omar,
> 
> Is this perhaps what you are looking for?
> # ls -l /sys/class/scsi_device/*/*/timeout             
> -rw-r--r-- 1 root root 4096 Apr 19 08:52 /sys/class/scsi_device/2:0:0:0/device/timeout
> -rw-r--r-- 1 root root 4096 Apr 19 12:39 /sys/class/scsi_device/8:0:0:1/device/timeout

We should have it generically available though, not just for SCSI. In
retrospect, it should have been under queue/ from the start, now we'll
end up with duplicate entries for SCSI.
Omar Sandoval April 19, 2018, 8:18 p.m. UTC | #6
On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
> On 4/19/18 1:41 PM, Bart Van Assche wrote:
> > On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
> >> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> >>> Thanks for the test! Applied.
> >>
> >> Side note, it's unfortunate that this test takes 180 seconds to run only
> >> because we have to wait for the command timeout. We should be able to
> >> export request_queue->rq_timeout writeable in sysfs. Would you be
> >> interested in doing that?
> > 
> > Hello Omar,
> > 
> > Is this perhaps what you are looking for?
> > # ls -l /sys/class/scsi_device/*/*/timeout             
> > -rw-r--r-- 1 root root 4096 Apr 19 08:52 /sys/class/scsi_device/2:0:0:0/device/timeout
> > -rw-r--r-- 1 root root 4096 Apr 19 12:39 /sys/class/scsi_device/8:0:0:1/device/timeout
> 
> We should have it generically available though, not just for SCSI. In
> retrospect, it should have been under queue/ from the start, now we'll
> end up with duplicate entries for SCSI.

For the sake of this test, I just decreased the timeout through SCSI.
Steffen Maier April 23, 2018, 12:25 p.m. UTC | #7
On 04/19/2018 10:18 PM, Omar Sandoval wrote:
> On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
>> On 4/19/18 1:41 PM, Bart Van Assche wrote:
>>> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
>>>> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
>>>>> Thanks for the test! Applied.
>>>>
>>>> Side note, it's unfortunate that this test takes 180 seconds to run only
>>>> because we have to wait for the command timeout. We should be able to
>>>> export request_queue->rq_timeout writeable in sysfs. Would you be
>>>> interested in doing that?
>>>
>>> Hello Omar,
>>>
>>> Is this perhaps what you are looking for?
>>> # ls -l /sys/class/scsi_device/*/*/timeout
>>> -rw-r--r-- 1 root root 4096 Apr 19 08:52 /sys/class/scsi_device/2:0:0:0/device/timeout
>>> -rw-r--r-- 1 root root 4096 Apr 19 12:39 /sys/class/scsi_device/8:0:0:1/device/timeout
>>
>> We should have it generically available though, not just for SCSI. In
>> retrospect, it should have been under queue/ from the start, now we'll
>> end up with duplicate entries for SCSI.
> 
> For the sake of this test, I just decreased the timeout through SCSI.

Great idea.

> 	echo 5 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/timeout"

However, the timeout should be sufficiently larger than scsi_debug/delay,
in order not to run into the command timeout.
It may be unfortunate that scsi_debug/delay uses jiffies as unit and
can thus differ in a range of an order of magnitude for different kernel configs.

> 	# delay to reduce response repetition: around 1..10sec depending on HZ
> 	echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay

On s390, we typically have HZ=100, so 1000 jiffies are 10 seconds.

We can increase the sdev cmd timeout or decrease the scsi_debug/delay.
100 instead of 1000 for scsi_debug/delay worked for me;
but for some reason the loop checking for busy did not work (any more?)
causing an unexpected test case error:

> # ./check scsi/004
> scsi/004 (ensure repeated TASK SET FULL results in EIO on timing out command) [failed]
> runtime  31.892s  ...  31.720s
>     --- tests/scsi/004.out	2018-04-16 11:47:19.105931872 +0200
>     +++ results/nodev/scsi/004.out.bad	2018-04-23 14:07:33.615445253 +0200
>     @@ -1,3 +1,3 @@
>      Running scsi/004
>     -Input/output error
>     +modprobe: FATAL: Module scsi_debug is in use.
>      Test complete

so I added another sleep hack:

         # dd closing SCSI disk causes implicit TUR also being delayed once
+        # sleep over time window where READ was done and TUR not yet queued
+        sleep 2
         while grep -q -F "in_use_bm BUSY:" "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do

What do you think?
Omar Sandoval April 24, 2018, 11:14 p.m. UTC | #8
On Mon, Apr 23, 2018 at 02:25:03PM +0200, Steffen Maier wrote:
> 
> On 04/19/2018 10:18 PM, Omar Sandoval wrote:
> > On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
> >> On 4/19/18 1:41 PM, Bart Van Assche wrote:
> >>> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
> >>>> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> >>>>> Thanks for the test! Applied.
> >>>>
> >>>> Side note, it's unfortunate that this test takes 180 seconds to run only
> >>>> because we have to wait for the command timeout. We should be able to
> >>>> export request_queue->rq_timeout writeable in sysfs. Would you be
> >>>> interested in doing that?
> >>>
> >>> Hello Omar,
> >>>
> >>> Is this perhaps what you are looking for?
> >>> # ls -l /sys/class/scsi_device/*/*/timeout
> >>> -rw-r--r-- 1 root root 4096 Apr 19 08:52 /sys/class/scsi_device/2:0:0:0/device/timeout
> >>> -rw-r--r-- 1 root root 4096 Apr 19 12:39 /sys/class/scsi_device/8:0:0:1/device/timeout
> >>
> >> We should have it generically available though, not just for SCSI. In
> >> retrospect, it should have been under queue/ from the start, now we'll
> >> end up with duplicate entries for SCSI.
> > 
> > For the sake of this test, I just decreased the timeout through SCSI.
> 
> Great idea.
> 
> > 	echo 5 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/timeout"
> 
> However, the timeout should be sufficiently larger than scsi_debug/delay,
> in order not to run into the command timeout.
> It may be unfortunate that scsi_debug/delay uses jiffies as unit and
> can thus differ in a range of an order of magnitude for different kernel configs.
> 
> > 	# delay to reduce response repetition: around 1..10sec depending on HZ
> > 	echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
> 
> On s390, we typically have HZ=100, so 1000 jiffies are 10 seconds.

Good catch, I just switched this to use ndelay in nanoseconds instead of
delay.

> We can increase the sdev cmd timeout or decrease the scsi_debug/delay.
> 100 instead of 1000 for scsi_debug/delay worked for me;
> but for some reason the loop checking for busy did not work (any more?)
> causing an unexpected test case error:
> 
> > # ./check scsi/004
> > scsi/004 (ensure repeated TASK SET FULL results in EIO on timing out command) [failed]
> > runtime  31.892s  ...  31.720s
> >     --- tests/scsi/004.out	2018-04-16 11:47:19.105931872 +0200
> >     +++ results/nodev/scsi/004.out.bad	2018-04-23 14:07:33.615445253 +0200
> >     @@ -1,3 +1,3 @@
> >      Running scsi/004
> >     -Input/output error
> >     +modprobe: FATAL: Module scsi_debug is in use.
> >      Test complete
> 
> so I added another sleep hack:
> 
>          # dd closing SCSI disk causes implicit TUR also being delayed once
> +        # sleep over time window where READ was done and TUR not yet queued
> +        sleep 2
>          while grep -q -F "in_use_bm BUSY:" "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
> 
> What do you think?

I've been hitting this on and off on all of the scsi-debug tests for
awhile, and I can't figure out where the lingering reference comes from.
I don't think it's related, but I'll look into it.
diff mbox

Patch

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..4852efc
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,59 @@ 
+#!/bin/bash
+#
+# Ensure repeated SAM_STAT_TASK_SET_FULL results in EIO on timing out command.
+#
+# Regression test for commit cbe095e2b584 ("Revert "scsi: core: return
+# BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"")
+#
+# Found independently of corresponding commit mail threads while
+# experimenting with storage mirroring. This test is a storage-independent
+# reproducer for the error case I ran into.
+#
+# Copyright IBM Corp. 2018
+# Author: Steffen Maier <maier@linux.ibm.com>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. common/scsi_debug
+
+DESCRIPTION="ensure repeated TASK SET FULL results in EIO on timing out command"
+
+requires() {
+	_have_scsi_debug
+}
+
+test() {
+	echo "Running ${TEST_NAME}"
+
+	if ! _init_scsi_debug add_host=1 max_luns=1 statistics=1 every_nth=1; then
+	    return 1
+	fi
+	# every_nth RW with full queue gets SAM_STAT_TASK_SET_FULL
+	echo 0x800 > /sys/bus/pseudo/drivers/scsi_debug/opts
+	# delay to reduce response repetition: around 1..10sec depending on HZ
+	echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
+	# a single command fills device queue to satisfy 0x800 opts condition
+	echo 1 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/queue_depth"
+	dd if="/dev/${SCSI_DEBUG_DEVICES[0]}" iflag=direct of=/dev/null bs=512 count=1 |& grep -o "Input/output error"
+	# stop injection
+	echo 0 > /sys/bus/pseudo/drivers/scsi_debug/opts
+	# dd closing SCSI disk causes implicit TUR also being delayed once
+	while grep -q -F "in_use_bm BUSY:" "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
+		sleep 1
+	done
+	echo 1 > /sys/bus/pseudo/drivers/scsi_debug/delay
+	_exit_scsi_debug
+
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..b1126fb
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,3 @@ 
+Running scsi/004
+Input/output error
+Test complete