[15/34] check: run tests in a private pid/mount namespace

Message ID	173870406337.546134.5825194290554919668.stgit@frogsfrogsfrogs
State	New
Headers
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A49125A62C; Tue, 4 Feb 2025 21:26:14 +0000 (UTC)
Date: Tue, 04 Feb 2025 13:26:13 -0800
Subject: [PATCH 15/34] check: run tests in a private pid/mount namespace
From: "Darrick J. Wong" <djwong@kernel.org>
To: zlang@redhat.com, djwong@kernel.org
Cc: fstests@vger.kernel.org, linux-xfs@vger.kernel.org
Message-ID: <173870406337.546134.5825194290554919668.stgit@frogsfrogsfrogs>
In-Reply-To: <173870406063.546134.14070590745847431026.stgit@frogsfrogsfrogs>
References: <173870406063.546134.14070590745847431026.stgit@frogsfrogsfrogs>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Series
[01/34] generic/476: fix fsstress process management
[02/34] metadump: make non-local function variables more obvious
[03/34] metadump: fix cleanup for v1 metadump testing
[04/34] generic/019: don't fail if fio crashes while shutting down
[05/34] fuzzy: do not set _FSSTRESS_PID when exercising fsx
[06/34] common/rc: revert recursive unmount in _clear_mount_stack
[07/34] common/dump: don't replace pids arbitrarily
[08/34] common/populate: correct the parent pointer name creation formulae
[09/34] generic/759,760: fix MADV_COLLAPSE detection and inclusion
[10/34] generic/759,760: skip test if we can't set up a hugepage for IO
[11/34] common/rc: create a wrapper for the su command
[12/34] fuzzy: kill subprocesses with SIGPIPE, not SIGINT
[13/34] common/rc: hoist pkill to a helper function
[14/34] common: fix pkill by running test program in a separate session
[15/34] check: run tests in a private pid/mount namespace
[16/34] check: deprecate using process sessions to isolate test instances
[17/34] common/rc: don't copy fsstress to $TEST_DIR
[18/34] unmount: resume logging of stdout and stderr for filtering
[19/34] mkfs: don't hardcode log size
[20/34] common/rc: return mount_ret in _try_scratch_mount
[21/34] preamble: fix missing _kill_fsstress
[22/34] generic/650: revert SOAK DURATION changes
[23/34] generic/032: fix pinned mount failure
[24/34] fuzzy: stop __stress_scrub_fsx_loop if fsx fails
[25/34] fuzzy: don't use readarray for xfsfind output
[26/34] fuzzy: always stop the scrub fsstress loop on error
[27/34] fuzzy: port fsx and fsstress loop to use --duration
[28/34] fix _require_scratch_duperemove ordering
[29/34] fsstress: fix a memory leak
[30/34] fsx: fix leaked log file pointer
[31/34] misc: don't put nr_cpus into the fsstress -n argument
[32/34] common/config: add $here to FSSTRESS_PROG
[33/34] config: add FSX_PROG variable
[34/34] build: initialize stack variables to zero by default

diff --git a/check b/check
index d854515ed1aac5..24b21cf139f927 100755
--- a/check
+++ b/check
@@ -674,6 +674,11 @@ _stash_test_status() {
 	esac
 }
 
+# Can we run in a private pid/mount namespace?
+HAVE_PRIVATENS=
+./tools/run_seq_pidns bash -c "exit 77"
+test $? -eq 77 && HAVE_PRIVATENS=yes
+
 # Can we run systemd scopes?
 HAVE_SYSTEMD_SCOPES=
 systemctl reset-failed "fstests-check" &>/dev/null
@@ -691,22 +696,29 @@ _adjust_oom_score -500
 # the system runs out of memory it'll be the test that gets killed and not the
 # test framework. The test is run in a separate process without any of our
 # functions, so we open-code adjusting the OOM score.
-#
-# If systemd is available, run the entire test script in a scope so that we can
-# kill all subprocesses of the test if it fails to clean up after itself. This
-# is essential for ensuring that the post-test unmount succeeds. Note that
-# systemd doesn't automatically remove transient scopes that fail to terminate
-# when systemd tells them to terminate (e.g. programs stuck in D state when
-# systemd sends SIGKILL), so we use reset-failed to tear down the scope.
-#
-# Use setsid to run the test program with a separate session id so that we
-# can pkill only the processes started by this test.
 _run_seq() {
 	local res
 	unset CHILDPID
 	unset FSTESTS_ISOL	# set by tools/run_seq_*
 
-	if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
+	if [ -n "${HAVE_PRIVATENS}" ]; then
+		# If pid and mount namespaces are available, run the whole test
+		# inside them so that the test cannot access any process or
+		# /tmp contents that it does not itself create. The ./$seq
+		# process is considered the "init" process of the pid
+		# namespace, so all subprocesses will be sent SIGKILL when it
+		# terminates.
+		./tools/run_seq_pidns "./$seq"
+		res=$?
+	elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
+		# If systemd is available, run the entire test script in a
+		# scope so that we can kill all subprocesses of the test if it
+		# fails to clean up after itself. This is essential for
+		# ensuring that the post-test unmount succeeds. Note that
+		# systemd doesn't automatically remove transient scopes that
+		# fail to terminate when systemd tells them to terminate (e.g.
+		# programs stuck in D state when systemd sends SIGKILL), so we
+		# use reset-failed to tear down the scope.
 		local unit="$(systemd-escape "fs$seq").scope"
 		systemctl reset-failed "${unit}" &> /dev/null
 		systemd-run --quiet --unit "${unit}" --scope \
diff --git a/common/rc b/common/rc
index bda80995f8dd55..25900533acb974 100644
--- a/common/rc
+++ b/common/rc
@@ -33,7 +33,11 @@ _test_sync()
 # Kill only the processes started by this test.
 _pkill()
 {
-	pkill --session 0 "$@"
+	if [ "$FSTESTS_ISOL" = "setsid" ]; then
+		pkill --session 0 "$@"
+	else
+		pkill "$@"
+	fi
 }
 
 # Common execution handling for fsstress invocation.
@@ -2736,7 +2740,11 @@ _require_user_exists()
 # not, passing $SHELL in this manner works both for "su" and "su -c cmd".
 _su()
 {
-	su --session-command $SHELL "$@"
+	if [ "$FSTESTS_ISOL" = "setsid" ]; then
+		su --session-command $SHELL "$@"
+	else
+		su "$@"
+	fi
 }
 
 # check if a user exists and is able to execute commands.
diff --git a/src/nsexec.c b/src/nsexec.c
index 750d52df129716..5c0bc922153514 100644
--- a/src/nsexec.c
+++ b/src/nsexec.c
@@ -54,6 +54,7 @@ usage(char *pname)
     fpe(" If -M or -G is specified, -U is required\n");
     fpe("-s Set uid/gid to 0 in the new user namespace\n");
     fpe("-v Display verbose messages\n");
+    fpe("-z Return child's return value\n");
     fpe("\n");
     fpe("Map strings for -M and -G consist of records of the form:\n");
     fpe("\n");
@@ -144,6 +145,8 @@ int
 main(int argc, char *argv[])
 {
     int flags, opt;
+    int return_child_status = 0;
+    int child_status = 0;
     pid_t child_pid;
     struct child_args args;
     char *uid_map, *gid_map;
@@ -161,7 +164,7 @@ main(int argc, char *argv[])
     setid = 0;
     gid_map = NULL;
     uid_map = NULL;
-    while ((opt = getopt(argc, argv, "+imnpuUM:G:vs")) != -1) {
+    while ((opt = getopt(argc, argv, "+imnpuUM:G:vsz")) != -1) {
         switch (opt) {
         case 'i': flags |= CLONE_NEWIPC; break;
         case 'm': flags |= CLONE_NEWNS; break;
@@ -173,6 +176,7 @@ main(int argc, char *argv[])
         case 'G': gid_map = optarg; break;
         case 'U': flags |= CLONE_NEWUSER; break;
         case 's': setid = 1; break;
+        case 'z': return_child_status = 1; break;
         default: usage(argv[0]);
         }
     }
@@ -229,11 +233,19 @@ main(int argc, char *argv[])
 
     close(args.pipe_fd[1]);
 
-    if (waitpid(child_pid, NULL, 0) == -1) /* Wait for child */
+    if (waitpid(child_pid, &child_status, 0) == -1) /* Wait for child */
         errExit("waitpid");
 
     if (verbose)
-        printf("%s: terminating\n", argv[0]);
+        printf("%s: terminating %d\n", argv[0], WEXITSTATUS(child_status));
+
+    if (return_child_status) {
+        if (WIFEXITED(child_status))
+            exit(WEXITSTATUS(child_status));
+        if (WIFSIGNALED(child_status))
+            exit(WTERMSIG(child_status) + 128); /* like sh */
+        exit(EXIT_FAILURE);
+    }
 
     exit(EXIT_SUCCESS);
 }
diff --git a/tests/generic/504 b/tests/generic/504
index 271c040e7b842a..96f18a0bbc7ba2 100755
--- a/tests/generic/504
+++ b/tests/generic/504
@@ -18,7 +18,7 @@ _cleanup()
 {
 	exec {test_fd}<&-
 	cd /
-	rm -f $tmp.*
+	rm -r -f $tmp.*
 }
 
 # Import common functions.
@@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full
 # Create new fd by exec
 exec {test_fd}> $testfile
 
-# flock locks the fd then exits, we should see the lock info even the owner is dead
+# flock locks the fd then exits, we should see the lock info even the owner is
+# dead. If we're using pid namespace isolation we have to move /proc so that
+# we can access the /proc/locks from the init_pid_ns.
+if [ "$FSTESTS_ISOL" = "privatens" ]; then
+	move_proc="$tmp.procdir"
+	mkdir -p "$move_proc"
+	mount --move /proc "$move_proc"
+fi
 flock -x $test_fd
 cat /proc/locks >> $seqres.full
 
 # Checking
 grep -q ":$tf_inode " /proc/locks || echo "lock info not found"
 
+if [ -n "$move_proc" ]; then
+	mount --move "$move_proc" /proc
+fi
+
 # success, all done
 status=0
 echo "Silence is golden"
diff --git a/tools/run_seq_pidns b/tools/run_seq_pidns
new file mode 100755
index 00000000000000..df94974ab30c3c
--- /dev/null
+++ b/tools/run_seq_pidns
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Oracle. All Rights Reserved.
+#
+# Try starting things in a private pid/mount namespace with a private /tmp
+# and /proc so that child process trees cannot interfere with each other.
+
+if [ -n "${FSTESTS_ISOL}" ]; then
+	for path in /proc /tmp; do
+		mount --make-private "$path"
+	done
+	mount -t proc proc /proc
+	mount -t tmpfs tmpfs /tmp
+
+	# Allow the test to become a target of the oom killer
+	oom_knob="/proc/self/oom_score_adj"
+	test -w "${oom_knob}" && echo 250 > "${oom_knob}"
+
+	exec "$@"
+fi
+
+if [ -z "$1" ] || [ "$1" = "--help" ]; then
+	echo "Usage: $0 command [args...]"
+	exit 1
+fi
+
+FSTESTS_ISOL=privatens exec "$(dirname "$0")/../src/nsexec" -z -m -p "$0" "$@"
