Message ID | 20180416215247.27290-1-alan.christopher.jenkins@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
> > Without this fix, I get an IO error in this test:
> >
> >     # dd if=/dev/sda of=/dev/null iflag=direct & \
> >       while killall -SIGUSR1 dd; do sleep 0.1; done & \
> >       echo mem > /sys/power/state ; \
> >       sleep 5; killall dd  # stop after 5 seconds
>
> linux-block insisted they wanted this, based on my reproducer above.
> If you start wondering why you wouldn't base it on scsi_debug with a new
> "quiesce" option, that makes two of us.

Thanks for doing this:

> +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
> +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
> +	   (( !HAVE_BARE_METAL_SCSI )); then
> +		SKIP_REASON=\
> +"Hypervisor detected, but this test wants bare-metal SCSI timings.
> +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
> +		return 1
> +	fi

I don't think we need this. If people want to shoot themselves in the
foot by running a possibly destructive test suite in a HV, we
shouldn't stop them.

> +
> +	_have_fio

Not needed anymore as per below comment?

> +	# I tried using fio with the exact same IO pattern,
> +	# sadly it was not 100% reliable at reproducing this
> +	# issue.
> +	#
> +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
> +	dd_pid=$!
> +
> +#	fio --output="$FULL" --loops=65535 \
> +#		--thread --exitall_on_error \
> +#		--direct=1 \
> +#		--bs=$bs --rw=read --name=reads \
> +#		--filename="$TEST_DEV" &
> +#	fio_pid=$!

I think we can just zap the commented-out fio part and the comment
about it.
On 17/04/18 08:21, Johannes Thumshirn wrote:
> On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
>>> Without this fix, I get an IO error in this test:
>>>
>>>     # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>       while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>       echo mem > /sys/power/state ; \
>>>       sleep 5; killall dd  # stop after 5 seconds
>> linux-block insisted they wanted this, based on my reproducer above.
>> If you start wondering why you wouldn't base it on scsi_debug with a new
>> "quiesce" option, that makes two of us.
> Thanks for doing this:
>> +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
>> +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
>> +	   (( !HAVE_BARE_METAL_SCSI )); then
>> +		SKIP_REASON=\
>> +"Hypervisor detected, but this test wants bare-metal SCSI timings.
>> +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
>> +		return 1
>> +	fi
> I don't think we need this, if people want to shoot themselves in the
> foot by running a possibly destructive test suite in a HV we
> shouldn't stop them.

Thanks, this is what I was hoping to get discussion on :).

What is meant by HV?

This is phrased to solve a problem I hadn't mentioned previously:

+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.

I haven't tested this reproducer against devices emulated in software.
It's written against real hardware which takes a whole second to bring
the SATA link up. (And maybe spin up the hdd as well?)

I'm not familiar with linux-block's testing, to know how prevalent that
bare-metal access is. I would like to avoid putting out a lot of "pass"
where it's effectively being skipped.

(The comment you did quote just refers to this check being annoying in
dom0, because I assume the check detects dom0 as being virtualized,
despite it having access to bare-metal scsi devices. It's not a great
comment; I will clarify it a bit.)
Yes, if this reason turns out to be marginal, I would be back to the
concern about the reliability of VM suspend, and wanting the test to be
opt-in with DO_PM_TEST=1 in the config file or something. As a newb to
blktests I would be very frustrated to spin up virt-manager with a
virtual test device, only to run into what turns out to be an unfixed
kernel bug.

I'm happy to have the simpler check for DO_PM_TEST (with a more verbose
SKIP_REASON) if it works; maybe the autodetection just added extra
complexity.

>> +
>> +	_have_fio
> Not needed anymore as per below comment?

Good point, yes (but see below).

>> +	# I tried using fio with the exact same IO pattern,
>> +	# sadly it was not 100% reliable at reproducing this
>> +	# issue.
>> +	#
>> +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
>> +	dd_pid=$!
>> +
>> +#	fio --output="$FULL" --loops=65535 \
>> +#		--thread --exitall_on_error \
>> +#		--direct=1 \
>> +#		--bs=$bs --rw=read --name=reads \
>> +#		--filename="$TEST_DEV" &
>> +#	fio_pid=$!
> I think we can just zap the commented out fio part and the comment
> about it.

fio was attractive as a way to avoid the failure case for ludicrously
small/fast devices. Later I actually hit that case, from running this
test on a small unused partition :).

I think I've worked out the reliability now. So I can start using fio,
and have a solid answer to both issues.

Alan
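The simpler opt-in gate Alan floats above might look roughly like the following `requires()` sketch. This is only an illustration: the `DO_PM_TEST` config variable and the skip message wording are hypothetical, not part of the posted patch or of blktests itself.

```shell
# Hypothetical sketch of an opt-in check for the suspend test.
# DO_PM_TEST is an assumed blktests config-file variable, not a real one.
requires() {
	if (( !DO_PM_TEST )); then
		SKIP_REASON=\
"This test simulates a full system suspend/resume cycle and expects
bare-metal SCSI timings. Set DO_PM_TEST=1 in your config to opt in."
		return 1
	fi
	return 0
}
```

With an explicit knob like this, the hypervisor autodetection (and its dom0 false positive) could be dropped entirely, at the cost of the test being skipped by default.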
On Tue, Apr 17, 2018 at 04:55:51PM +0100, Alan Jenkins wrote:
> On 17/04/18 08:21, Johannes Thumshirn wrote:
> > On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
> > > > Without this fix, I get an IO error in this test:
> > > >
> > > >     # dd if=/dev/sda of=/dev/null iflag=direct & \
> > > >       while killall -SIGUSR1 dd; do sleep 0.1; done & \
> > > >       echo mem > /sys/power/state ; \
> > > >       sleep 5; killall dd  # stop after 5 seconds
> > > linux-block insisted they wanted this, based on my reproducer above.
> > > If you start wondering why you wouldn't base it on scsi_debug with a new
> > > "quiesce" option, that makes two of us.
> > Thanks for doing this:
> > > +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
> > > +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
> > > +	   (( !HAVE_BARE_METAL_SCSI )); then
> > > +		SKIP_REASON=\
> > > +"Hypervisor detected, but this test wants bare-metal SCSI timings.
> > > +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
> > > +		return 1
> > > +	fi
> > I don't think we need this, if people want to shoot themselves in the
> > foot by running a possibly destructive test suite in a HV we
> > shouldn't stop them.
>
> Thanks, this is what I was hoping to get discussion on :).
>
> What is meant by HV?

Hypervisor, sorry.

> This is phrased to solve a problem I hadn't mentioned previously:
>
> +	# I can't expect to hit the window using bash, if the device is
> +	# emulated by cpu.
>
> I haven't tested this reproducer against devices emulated in software. It's
> written against real hardware which takes a whole second to bring the SATA
> link up. (And maybe spin up the hdd as well?)
>
> I'm not familiar with linux-block's testing, to know how prevalent that
> bare-metal access is. I would like to avoid putting out a lot of "pass"
> where it's effectively being skipped.
> (The comment you did quote just refers to this check being annoying in dom0,
> because I assume the check detects dom0 as being virtualized, despite it
> having access to bare-metal scsi devices. It's not a great comment; I will
> clarify it a bit.)
>
> Yes, if this reason turns out to be marginal, I would be back to the concern
> about the reliability of VM suspend, and wanting the test to be opt-in with
> DO_PM_TEST=1 in the config file or something. As a newb to blktests I
> would be very frustrated to spin up virt-manager with a virtual test device,
> only to run into what turns out to be an unfixed kernel bug.

Well, it's a test result as well, isn't it ;-). I personally run all my
testing in VMs (with optional PCI passthrough if I need special
hardware). But yes, this is just my personal preference.

> I'm happy to have the simpler check for DO_PM_TEST (with a more verbose
> SKIP_REASON) if it works; maybe the autodetection just added extra
> complexity.
>
> > > +
> > > +	_have_fio
> > Not needed anymore as per below comment?
>
> Good point, yes (but see below).
>
> > > +	# I tried using fio with the exact same IO pattern,
> > > +	# sadly it was not 100% reliable at reproducing this
> > > +	# issue.
> > > +	#
> > > +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
> > > +	dd_pid=$!
> > > +
> > > +#	fio --output="$FULL" --loops=65535 \
> > > +#		--thread --exitall_on_error \
> > > +#		--direct=1 \
> > > +#		--bs=$bs --rw=read --name=reads \
> > > +#		--filename="$TEST_DEV" &
> > > +#	fio_pid=$!
> > I think we can just zap the commented-out fio part and the comment
> > about it.
>
> fio was attractive as a way to avoid the failure case for ludicrously
> small/fast devices. Later I actually hit that case, from running this
> test on a small unused partition :).
>
> I think I've worked out the reliability now. So I can start using fio, and
> have a solid answer to both issues.

OK

Thanks,
Johannes
diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..ef42033
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# >     # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >       while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >       echo mem > /sys/power/state ; \
+# >       sleep 5; killall dd  # stop after 5 seconds
+#
+# AJ: linux-block insisted they wanted this, based on my reproducer above.
+# If you start wondering why you wouldn't base it on scsi_debug with a new
+# "quiesce" option, that makes two of us.
+# Thread: http://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com
+#
+#
+# RATIONALE for a suspend test:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I can not test
+# it directly :(. (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward". I.e. freeze, modify the
+# queue structure, unfreeze.)
+#
+# However, this led to a regression with a decent reproducer. In v4.15 the
+# same interruptible wait was also used for SCSI suspend/resume. SCSI resume
+# can take a second or so... hence we like to do it asynchronously. This
+# means we can observe the wait at resume time, and we can test whether it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer. That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+# it literally just sets sdev->sdev_state to SDEV_QUIESCE.)
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   (( !HAVE_BARE_METAL_SCSI )); then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	# is due to a storage enclosure that does not inject
+	# staggered spin-ups. For safety, make resume
+	# synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test. Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	dd_pid=
+	subshell_pid=
+
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages about the killed process
+		exec 3>&1 4>&2
+		exec >>"$FULL" 2>&1
+
+		kill "$pid" >&3 2>&4
+
+		# Don't try to re-redirect output from `wait` just in case;
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [[ -n "$subshell_pid" ]]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [[ -n "$dd_pid" ]]; then
+			echo "Killing dd..."
+			cleanup_pid "$dd_pid"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+
+		echo "Resetting pm_test_delay"
+		if [[ -n "$saved_pm_test_delay" ]]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+	}
+	trap cleanup EXIT
+
+	# Have not tested on devices which require larger IO,
+	# but let's not die if we see one.
+	bs="$(_test_dev_queue_get hw_sector_size)"
+
+	# Start dd, as a background process which submits IOs
+	# and yells when one fails. We want to test the block
+	# layer, so we use direct IO to avoid being served
+	# from the page cache.
+	#
+	# I tried using fio with the exact same IO pattern;
+	# sadly it was not 100% reliable at reproducing this
+	# issue.
+	#
+	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
+	dd_pid=$!
+
+#	fio --output="$FULL" --loops=65535 \
+#		--thread --exitall_on_error \
+#		--direct=1 \
+#		--bs=$bs --rw=read --name=reads \
+#		--filename="$TEST_DEV" &
+#	fio_pid=$!
+
+	# Keep sending signals to 'dd'. Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races. But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $dd_pid 2>/dev/null &&
+		      kill -CONT $dd_pid 2>/dev/null; do
+			sleep 0.001
+		done
+
+		# Wait to be killed. Simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending. The user might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait another second. This might be useful in the case dd got
+	# blocked on a page fault during the suspend; it will have a
+	# second to get sorted out, so it can potentially receive an IO
+	# error and exit.
+	sleep 1
+
+	if ! kill -0 $dd_pid 2>/dev/null; then
+		# dd exited before we entered cleanup.
+		# Read its exit status.
+		wait $dd_pid
+		ret=$?
+		dd_pid=
+
+		if [[ $ret == 0 ]]; then
+			echo "'dd' exited early, without error."
+			echo "Is your scsi device implausibly fast or small?"
+		else
+			# The test should already fail at this point due to
+			# error messages, but let's log it while we're here.
+			echo "'dd' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+	do_test_device
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..4085f62
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,7 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing dd...
+Resetting pm_test
+Resetting pm_test_delay
+Test complete
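The liveness check at the end of do_test_device relies on a standard shell pattern: `kill -0` only probes whether a pid still exists, while `wait` reaps the process and yields its exit status. A minimal standalone sketch (the background commands here are stand-ins, not the test's real dd):

```shell
# Illustration of the kill -0 / wait pattern used on the background dd.
sleep 5 &                  # stand-in for a healthy background dd
live_pid=$!
# kill -0 sends no signal; it only checks that the pid is alive.
kill -0 "$live_pid" 2>/dev/null && alive=yes || alive=no

( exit 42 ) &              # stand-in for a dd that died with an IO error
dead_pid=$!
wait "$dead_pid"           # reap it; $? becomes the child's exit status
ret=$?

echo "alive=$alive ret=$ret"   # → alive=yes ret=42
kill "$live_pid"           # clean up the stand-in
```

This is why the test can distinguish "dd is still running" (pass so far) from "dd exited with an error" (the interruptible-wait bug) and "dd exited cleanly too early" (device implausibly fast or small).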