diff mbox series

[blktests,1/2] block/011: recover test target devices to online or live status

Message ID 20231129103145.655612-2-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Headers show
Series improve block/011 | expand

Commit Message

Shinichiro Kawasaki Nov. 29, 2023, 10:31 a.m. UTC
The test case runs fio while disabling and enabling PCI device of the
test target block device. This often leaves the devices in offline or
dead status. For example, when the block device is a HDD connected to
HBA, kernel makes the device into offline mode with this message:

    sd x:x:x:x Device offlined - not ready after error recovery

This causes following test cases to fail. To avoid the failure, remove
and rescan the devices to get them back to online or live status. This
improvement is similar as the commit f8f33218eca7 ("block/011: recover
test target NVME device capacity"). While at this change, improve code
comments for the commit f8f33218eca7, and add missing local variable
declarations.

Of note is that the added rescan operation triggers a lockdep WARN if
the system has devices which depend on P2SB [1].

[1] https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 tests/block/011 | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

Comments

Bart Van Assche Nov. 29, 2023, 6:05 p.m. UTC | #1
On 11/29/23 02:31, Shin'ichiro Kawasaki wrote:
> +	if [[ -r $TEST_DEV_SYSFS/device/state ]]; then
> +		state=$(cat "$TEST_DEV_SYSFS/device/state")

Why the separate -r test and cat instead of this?

state=$({ cat "$TEST_DEV_SYSFS/device/state"; } 2>/dev/null)

Thanks,

Bart.
Shinichiro Kawasaki Dec. 4, 2023, 12:43 a.m. UTC | #2
On Nov 29, 2023 / 10:05, Bart Van Assche wrote:
> On 11/29/23 02:31, Shin'ichiro Kawasaki wrote:
> > +	if [[ -r $TEST_DEV_SYSFS/device/state ]]; then
> > +		state=$(cat "$TEST_DEV_SYSFS/device/state")
> 
> Why the separate -r test and cat instead of this?
> 
> state=$({ cat "$TEST_DEV_SYSFS/device/state"; } 2>/dev/null)

Thanks Bart, this looks the better. Will reflect this change.
diff mbox series

Patch

diff --git a/tests/block/011 b/tests/block/011
index a4230f4..2a967d1 100755
--- a/tests/block/011
+++ b/tests/block/011
@@ -38,6 +38,8 @@  device_requires() {
 test_device() {
 	echo "Running ${TEST_NAME}"
 
+	local pdev size rescan=false state i
+
 	pdev="$(_get_pci_dev_from_blkdev)"
 
 	if _test_dev_is_rotational; then
@@ -60,17 +62,36 @@  test_device() {
 
 	echo "Test complete"
 
-	# This test triggers NVME controller resets. When any failure happens
-	# during the resets, the driver marks the NVME block devices with zero
-	# capacity. Then following tests fail with the zero capacity devices. To
-	# get back the correct capacity, remove and rescan the devices.
+	# This test triggers NVME controller resets. When failures happen during
+	# the resets, the driver marks the NVME block devices as zero capacity.
+	# Remove and rescan the devices to regain the correct capacity.
 	if ((!$(<"$TEST_DEV_SYSFS/size"))); then
-		echo "$TEST_DEV has zero capacity" >> "$FULL"
-		if [[ -w $TEST_DEV_SYSFS/device/device/remove ]] &&
+		echo "$TEST_DEV has zero capacity. Rescan it." >> "$FULL"
+		rescan=true
+	fi
+
+	# This test case often makes NVME or HDDs connected to HBAs in offline
+	# or dead mode. Remove and rescan the devices to make them online again.
+	if [[ -r $TEST_DEV_SYSFS/device/state ]]; then
+		state=$(cat "$TEST_DEV_SYSFS/device/state")
+		if [[ $state == offline || $state == dead ]]; then
+			echo "$TEST_DEV is $state. Rescan it." >> "$FULL"
+			rescan=true
+		fi
+	fi
+
+	if [[ $rescan == true ]]; then
+		if [[ -w /sys/bus/pci/devices/$pdev/remove ]] &&
 			   [[ -w /sys/bus/pci/rescan ]]; then
-			echo "Rescan to tegain the correct capacity" >> "$FULL"
-			echo 1 > "$TEST_DEV_SYSFS/device/device/remove"
+			echo 1 > "/sys/bus/pci/devices/$pdev/remove"
 			echo 1 > /sys/bus/pci/rescan
+		else
+			echo "Can not rescan PCI device for recovery"
+			return 1
 		fi
+		for ((i = 0; i < 10; i++)); do
+			[[ -w $TEST_DEV ]] && break
+			sleep 5
+		done
 	fi
 }