From patchwork Tue Apr 17 15:10:00 2018
X-Patchwork-Submitter: Alan Jenkins
X-Patchwork-Id: 10345423
From: Alan Jenkins
To: Johannes Thumshirn, Jens Axboe, linux-block@vger.kernel.org
Cc: Bart Van Assche, Alan Jenkins
Subject: [PATCH v3] blktests: regression test "block: do not use interruptible wait anywhere"
Date: Tue, 17 Apr 2018 16:10:00 +0100
Message-Id: <20180417151000.9931-1-alan.christopher.jenkins@gmail.com>
In-Reply-To: <78254510-5ba0-1bba-dd4a-87fdc88e55ee@gmail.com>
References: <78254510-5ba0-1bba-dd4a-87fdc88e55ee@gmail.com>

> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd # stop after 5 seconds

linux-block specifically asked for a test derived from this reproducer.
They didn't come up with any suggestion for testing the code more directly
(and robustly). So this test uses system suspend, automated with pm_test.

Signed-off-by: Alan Jenkins
---

v3: Switch from dd to fio, clarify some comments.

    The HAVE_BARE_METAL_SCSI check is left unchanged,
    waiting for further discussion.

 tests/scsi/004     | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |  12 +++
 2 files changed, 267 insertions(+)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..2a7f794
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,255 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# > # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >   echo mem > /sys/power/state ; \
+# >   sleep 5; killall dd # stop after 5 seconds
+#
+# AJ: linux-block specifically asked for a test derived from this reproducer.
+# They didn't come up with any suggestion for testing the code more directly
+# (and robustly). So this test uses system suspend, automated with pm_test.
+#
+#
+# Rationale for the test needing system suspend:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I cannot test
+# it directly :(. (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward". I.e. freeze, modify the
+# queue structure, unfreeze).
+#
+# However, this led to a kernel regression with a decent reproducer. In
+# v4.15 the same interruptible wait was also used for SCSI suspend/resume.
+# SCSI resume can take a second or so, hence we like to do it asynchronously.
+# This means we can observe the wait at resume time, and we can test if it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer. That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+# it literally just sets sdev->sdev_state to SDEV_QUIESCE).
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying to see this message on Xen dom0,
+	# but I'm guessing that's not common.
+	#
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   (( !HAVE_BARE_METAL_SCSI )); then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	# is due to a storage enclosure that does not inject
+	# staggered spin-ups. For safety, make resume
+	# synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test. Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	fio_pid=
+	subshell_pid=
+
+	# Fail the test early, in cases where it should not continue.
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	# Terminate child process
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages about killed process.
+		# The messages would vary, causing the test to fail.
+		exec 3>&1 4>&2
+		exec >>"$FULL" 2>&1
+
+		# Send terminate signal. Also send the continue signal,
+		# in case the process was currently stopped.
+		(kill "$pid" && kill -CONT "$pid") >&3 2>&4
+
+		# Don't try to re-redirect output from `wait`, just in case:
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [[ -n "$subshell_pid" ]]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [[ -n "$fio_pid" ]]; then
+			echo "Killing fio..."
+			cleanup_pid "$fio_pid"
+		fi
+
+		echo "Resetting pm_test_delay"
+		if [[ -n "$saved_pm_test_delay" ]]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+	}
+	trap cleanup EXIT
+
+	# Start fio, as a background process which submits IOs and stops
+	# with an error when one fails. Use threads instead of separate
+	# processes, so it's easier to send signals to the IO thread.
+	#
+	# This is the same behaviour as dd, except that we loop in case the
+	# device is tiny. (Strictly speaking, the block size is different too).
+	#
+	fio --output="${FULL}.fio" --filename="$TEST_DEV" \
+	    --thread --exitall_on_error --loops=1G \
+	    --direct=1 --rw=read --name=reads &
+	fio_pid=$!
+
+	# Keep sending signals to 'fio'. Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races. But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $fio_pid 2>>"$FULL" &&
+		      kill -CONT $fio_pid 2>>"$FULL"; do
+
+			sleep 0.001
+		done
+
+		# fio exited. Wait to be killed, it simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	# Here's the real race condition.
+	#
+	# We only want to suspend once both child processes have reached their
+	# main loops. Otherwise we get a false pass. We use the following
+	# mitigations:
+	#
+	# 1. Wait 1 second first.
+	#
+	# 2. Make sure to call this function twice, so hopefully the second
+	#    time will not have to wait to page anything in.
+	#
+	# 3. Wait for any pending writes first. I think that this is redundant in
+	#    principle, but will make for more consistent timings.
+	#
+	# (You can actually solve this precisely using strace or the like...
+	# but it still looks weird, and adds another dependency)
+	#
+	sync
+	sleep 1
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending. User might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait again. This will be useful in the case fio got blocked on a
+	# page fault during the suspend; it will have a second to get sorted out,
+	# so it can potentially receive an IO error and exit.
+	sleep 1
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	if ! kill -0 $fio_pid 2>/dev/null; then
+		# fio exited before we entered cleanup.
+		# Read its exit status
+		wait $fio_pid
+		ret=$?
+		fio_pid=
+
+		if [[ $ret == 0 ]]; then
+			fail "'fio' exited early, without error. Please report this as a bug."
+		else
+			# Test should already fail at this point due to
+			# error messages. But let's log it while we're here,
+			# and also not run the second iteration of the test.
+			fail "'fio' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+
+	# Run the test twice. Hopefully the second iteration will
+	# have everything in page cache for consistent timings.
+	do_test_device && do_test_device
+
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..7211b4d
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,12 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Test complete
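
Note for readers (not part of the patch): the final check in do_test_device() uses `kill -0` to see whether fio is still alive, then `wait` to collect its exit status. A minimal sketch of that pattern in isolation follows. The helper name `reap_if_exited`, the variable `job_status`, and the short-lived stand-in job are all hypothetical and are used here only so the pattern can be tried without fio, root, or a test device.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch). If the background job is still
# alive, record "running"; otherwise collect its exit status with `wait`.
# The result goes into the global variable job_status.
reap_if_exited() {
	local pid="$1"

	if kill -0 "$pid" 2>/dev/null; then
		job_status=running
		return 0
	fi

	# The job is gone. `wait` still returns its exit status, provided
	# it runs in the same shell that started the job.
	job_status=0
	wait "$pid" || job_status=$?
}

# Stand-in for fio (assumption): a job that fails quickly with status 3.
(exit 3) &
job_pid=$!
sleep 0.2	# give the job time to exit

reap_if_exited "$job_pid"
echo "job status: $job_status"
```

The helper sets a variable instead of printing its result, because calling it through `$(...)` would run `wait` in a subshell, where the job is not a child of that shell. This is the same caveat the patch notes in cleanup_pid().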