Message ID | 173870406337.546134.5825194290554919668.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [01/34] generic/476: fix fsstress process management | expand |
On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > As mentioned in the previous patch, trying to isolate processes from > separate test instances through the use of distinct Unix process > sessions is annoying due to the many complications with signal handling. > > Instead, we could just use nsexec to run the test program with a private > pid namespace so that each test instance can only see its own processes; > and private mount namespace so that tests writing to /tmp cannot clobber > other tests or the stuff running on the main system. > > However, it's not guaranteed that a particular kernel has pid and mount > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > been around for a long time, but there's no hard requirement for the > latter to be enabled in the kernel. Therefore, this bugfix slips > namespace support in alongside the session id thing. > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > support should be a separate conversation, not something that I have to > do in a bug fix to get mainline QA back up. > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > --- > check | 34 +++++++++++++++++++++++----------- > common/rc | 12 ++++++++++-- > src/nsexec.c | 18 +++++++++++++++--- > tests/generic/504 | 15 +++++++++++++-- > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > 5 files changed, 89 insertions(+), 18 deletions(-) > create mode 100755 tools/run_seq_pidns Same question as for session ids - is this all really necessary (or desired) if check-parallel executes check in it's own private PID namespace? If so, then the code is fine apart from the same nit about tools/run_seq_pidns - call it run_pidns because this helper will also be used by check-parallel to run check in it's own private mount and PID namespaces... > diff --git a/tests/generic/504 b/tests/generic/504 > index 271c040e7b842a..96f18a0bbc7ba2 100755 > --- a/tests/generic/504 > +++ b/tests/generic/504 > @@ -18,7 +18,7 @@ _cleanup() > { > exec {test_fd}<&- > cd / > - rm -f $tmp.* > + rm -r -f $tmp.* > } > > # Import common functions. > @@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full > > # Create new fd by exec > exec {test_fd}> $testfile > -# flock locks the fd then exits, we should see the lock info even the owner is dead > +# flock locks the fd then exits, we should see the lock info even the owner is > +# dead. If we're using pid namespace isolation we have to move /proc so that > +# we can access the /proc/locks from the init_pid_ns. > +if [ "$FSTESTS_ISOL" = "privatens" ]; then > + move_proc="$tmp.procdir" > + mkdir -p "$move_proc" > + mount --move /proc "$move_proc" > +fi > flock -x $test_fd > cat /proc/locks >> $seqres.full > > # Checking > grep -q ":$tf_inode " /proc/locks || echo "lock info not found" > > +if [ -n "$move_proc" ]; then > + mount --move "$move_proc" /proc > +fi > + > # success, all done > status=0 > echo "Silence is golden" Urk. That explains the failure I've noticed but not had time to debug from check-parallel when using a private pidns. Do you know why /proc/locks in the overlaid mount does not show the locks taken from within that namespace? Is that a bug in the namespace/lock code? Regardless, the code looks ok so with the helper renamed: Reviewed-by: Dave Chinner <dchinner@redhat.com>
On Wed, Feb 05, 2025 at 11:37:00AM +1100, Dave Chinner wrote: > On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > As mentioned in the previous patch, trying to isolate processes from > > separate test instances through the use of distinct Unix process > > sessions is annoying due to the many complications with signal handling. > > > > Instead, we could just use nsexec to run the test program with a private > > pid namespace so that each test instance can only see its own processes; > > and private mount namespace so that tests writing to /tmp cannot clobber > > other tests or the stuff running on the main system. > > > > However, it's not guaranteed that a particular kernel has pid and mount > > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > > been around for a long time, but there's no hard requirement for the > > latter to be enabled in the kernel. Therefore, this bugfix slips > > namespace support in alongside the session id thing. > > > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > > support should be a separate conversation, not something that I have to > > do in a bug fix to get mainline QA back up. > > > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > --- > > check | 34 +++++++++++++++++++++++----------- > > common/rc | 12 ++++++++++-- > > src/nsexec.c | 18 +++++++++++++++--- > > tests/generic/504 | 15 +++++++++++++-- > > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > > 5 files changed, 89 insertions(+), 18 deletions(-) > > create mode 100755 tools/run_seq_pidns > > Same question as for session ids - is this all really necessary (or > desired) if check-parallel executes check in it's own private PID > namespace? > > If so, then the code is fine apart from the same nit about > tools/run_seq_pidns - call it run_pidns because this helper will > also be used by check-parallel to run check in it's own private > mount and PID namespaces... I prefer to name it tools/run_privatens since it creates more than just a pid namespace. At some point we might even decide to privatize more namespaces (e.g. do we want a private network namespace for nfs?) and I don't want this to become lsfmmbpfbbq'd, as it were. > > diff --git a/tests/generic/504 b/tests/generic/504 > > index 271c040e7b842a..96f18a0bbc7ba2 100755 > > --- a/tests/generic/504 > > +++ b/tests/generic/504 > > @@ -18,7 +18,7 @@ _cleanup() > > { > > exec {test_fd}<&- > > cd / > > - rm -f $tmp.* > > + rm -r -f $tmp.* > > } > > > > # Import common functions. > > @@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full > > > > # Create new fd by exec > > exec {test_fd}> $testfile > > -# flock locks the fd then exits, we should see the lock info even the owner is dead > > +# flock locks the fd then exits, we should see the lock info even the owner is > > +# dead. If we're using pid namespace isolation we have to move /proc so that > > +# we can access the /proc/locks from the init_pid_ns. > > +if [ "$FSTESTS_ISOL" = "privatens" ]; then > > + move_proc="$tmp.procdir" > > + mkdir -p "$move_proc" > > + mount --move /proc "$move_proc" > > +fi > > flock -x $test_fd > > cat /proc/locks >> $seqres.full > > > > # Checking > > grep -q ":$tf_inode " /proc/locks || echo "lock info not found" > > > > +if [ -n "$move_proc" ]; then > > + mount --move "$move_proc" /proc > > +fi > > + > > # success, all done > > status=0 > > echo "Silence is golden" > > Urk. That explains the failure I've noticed but not had time to > debug from check-parallel when using a private pidns. Do you know > why /proc/locks in the overlaid mount does not show the locks taken > from within that namespace? Is that a bug in the namespace/lock > code? I /think/ this happens because the code in fs/locks.c records the pid of "flock -x $test_fd" as the owner of the lock. But then flock exits, so that pid is no longer recorded in the pid_namespace and this code in locks_translate_pid: pid = find_pid_ns(fl->flc_pid, &init_pid_ns); vnr = pid_nr_ns(pid, ns); returns with vnr == 0, which causes locks_show to skip the lock. However, the underlying /proc is associated with init_pid_ns, so locks_translate_pid always returns a nonzero pid. Unfortunately, that means we can't have tools/run_privatens unmount the /proc it inherits before mounting the pidns-specific /proc. I'll note this in the commit message. > Regardless, the code looks ok so with the helper renamed: > > Reviewed-by: Dave Chinner <dchinner@redhat.com> Thanks! --D > -- > Dave Chinner > david@fromorbit.com >
On Wed, Feb 05, 2025 at 10:00:48AM -0800, Darrick J. Wong wrote: > On Wed, Feb 05, 2025 at 11:37:00AM +1100, Dave Chinner wrote: > > On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > As mentioned in the previous patch, trying to isolate processes from > > > separate test instances through the use of distinct Unix process > > > sessions is annoying due to the many complications with signal handling. > > > > > > Instead, we could just use nsexec to run the test program with a private > > > pid namespace so that each test instance can only see its own processes; > > > and private mount namespace so that tests writing to /tmp cannot clobber > > > other tests or the stuff running on the main system. > > > > > > However, it's not guaranteed that a particular kernel has pid and mount > > > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > > > been around for a long time, but there's no hard requirement for the > > > latter to be enabled in the kernel. Therefore, this bugfix slips > > > namespace support in alongside the session id thing. > > > > > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > > > support should be a separate conversation, not something that I have to > > > do in a bug fix to get mainline QA back up. > > > > > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > > > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > > --- > > > check | 34 +++++++++++++++++++++++----------- > > > common/rc | 12 ++++++++++-- > > > src/nsexec.c | 18 +++++++++++++++--- > > > tests/generic/504 | 15 +++++++++++++-- > > > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > > > 5 files changed, 89 insertions(+), 18 deletions(-) > > > create mode 100755 tools/run_seq_pidns > > > > Same question as for session ids - is this all really necessary (or > > desired) if check-parallel executes check in it's own private PID > > namespace? Forgot to respond to this question -- Because check-parallel runs (will run?) each child ./check instance in a private namepsace, each of those instances will be isolated from the others. So no, it's probably not absolutely necessary. However, there are a couple of reasons to let it happen: (a) the private ns that ./check creates in _run_seq() isolates the actual test code from its parent ./check process; and (b) the process started by nsexec is considered to be the "init" process for that namespace, so when it dies, the kernel will kill -9 all other processes in that namespace, so we won't have any stray fsstress processes that bleed into the next test. --D > > If so, then the code is fine apart from the same nit about > > tools/run_seq_pidns - call it run_pidns because this helper will > > also be used by check-parallel to run check in it's own private > > mount and PID namespaces... > > I prefer to name it tools/run_privatens since it creates more than just > a pid namespace. At some point we might even decide to privatize more > namespaces (e.g. do we want a private network namespace for nfs?) and I > don't want this to become lsfmmbpfbbq'd, as it were. > > > > diff --git a/tests/generic/504 b/tests/generic/504 > > > index 271c040e7b842a..96f18a0bbc7ba2 100755 > > > --- a/tests/generic/504 > > > +++ b/tests/generic/504 > > > @@ -18,7 +18,7 @@ _cleanup() > > > { > > > exec {test_fd}<&- > > > cd / > > > - rm -f $tmp.* > > > + rm -r -f $tmp.* > > > } > > > > > > # Import common functions. > > > @@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full > > > > > > # Create new fd by exec > > > exec {test_fd}> $testfile > > > -# flock locks the fd then exits, we should see the lock info even the owner is dead > > > +# flock locks the fd then exits, we should see the lock info even the owner is > > > +# dead. If we're using pid namespace isolation we have to move /proc so that > > > +# we can access the /proc/locks from the init_pid_ns. > > > +if [ "$FSTESTS_ISOL" = "privatens" ]; then > > > + move_proc="$tmp.procdir" > > > + mkdir -p "$move_proc" > > > + mount --move /proc "$move_proc" > > > +fi > > > flock -x $test_fd > > > cat /proc/locks >> $seqres.full > > > > > > # Checking > > > grep -q ":$tf_inode " /proc/locks || echo "lock info not found" > > > > > > +if [ -n "$move_proc" ]; then > > > + mount --move "$move_proc" /proc > > > +fi > > > + > > > # success, all done > > > status=0 > > > echo "Silence is golden" > > > > Urk. That explains the failure I've noticed but not had time to > > debug from check-parallel when using a private pidns. Do you know > > why /proc/locks in the overlaid mount does not show the locks taken > > from within that namespace? Is that a bug in the namespace/lock > > code? > > I /think/ this happens because the code in fs/locks.c records the pid of > "flock -x $test_fd" as the owner of the lock. But then flock exits, so > that pid is no longer recorded in the pid_namespace and this code in > locks_translate_pid: > > pid = find_pid_ns(fl->flc_pid, &init_pid_ns); > vnr = pid_nr_ns(pid, ns); > > returns with vnr == 0, which causes locks_show to skip the lock. > However, the underlying /proc is associated with init_pid_ns, so > locks_translate_pid always returns a nonzero pid. Unfortunately, that > means we can't have tools/run_privatens unmount the /proc it inherits > before mounting the pidns-specific /proc. > > I'll note this in the commit message. > > > Regardless, the code looks ok so with the helper renamed: > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > Thanks! > > --D > > > -- > > Dave Chinner > > david@fromorbit.com > > >
On Wed, Feb 05, 2025 at 10:00:48AM -0800, Darrick J. Wong wrote: > On Wed, Feb 05, 2025 at 11:37:00AM +1100, Dave Chinner wrote: > > On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > As mentioned in the previous patch, trying to isolate processes from > > > separate test instances through the use of distinct Unix process > > > sessions is annoying due to the many complications with signal handling. > > > > > > Instead, we could just use nsexec to run the test program with a private > > > pid namespace so that each test instance can only see its own processes; > > > and private mount namespace so that tests writing to /tmp cannot clobber > > > other tests or the stuff running on the main system. > > > > > > However, it's not guaranteed that a particular kernel has pid and mount > > > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > > > been around for a long time, but there's no hard requirement for the > > > latter to be enabled in the kernel. Therefore, this bugfix slips > > > namespace support in alongside the session id thing. > > > > > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > > > support should be a separate conversation, not something that I have to > > > do in a bug fix to get mainline QA back up. > > > > > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > > > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > > --- > > > check | 34 +++++++++++++++++++++++----------- > > > common/rc | 12 ++++++++++-- > > > src/nsexec.c | 18 +++++++++++++++--- > > > tests/generic/504 | 15 +++++++++++++-- > > > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > > > 5 files changed, 89 insertions(+), 18 deletions(-) > > > create mode 100755 tools/run_seq_pidns > > > > Same question as for session ids - is this all really necessary (or > > desired) if check-parallel executes check in it's own private PID > > namespace? > > > > If so, then the code is fine apart from the same nit about > > tools/run_seq_pidns - call it run_pidns because this helper will > > also be used by check-parallel to run check in it's own private > > mount and PID namespaces... > > I prefer to name it tools/run_privatens since it creates more than just > a pid namespace. I'm fine with that. It was only the "seq" part of the name that triggered me. > At some point we might even decide to privatize more > namespaces (e.g. do we want a private network namespace for nfs?) and I > don't want this to become lsfmmbpfbbq'd, as it were. *nod* -Dave.
On Wed, Feb 05, 2025 at 10:19:59AM -0800, Darrick J. Wong wrote: > On Wed, Feb 05, 2025 at 10:00:48AM -0800, Darrick J. Wong wrote: > > On Wed, Feb 05, 2025 at 11:37:00AM +1100, Dave Chinner wrote: > > > On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > As mentioned in the previous patch, trying to isolate processes from > > > > separate test instances through the use of distinct Unix process > > > > sessions is annoying due to the many complications with signal handling. > > > > > > > > Instead, we could just use nsexec to run the test program with a private > > > > pid namespace so that each test instance can only see its own processes; > > > > and private mount namespace so that tests writing to /tmp cannot clobber > > > > other tests or the stuff running on the main system. > > > > > > > > However, it's not guaranteed that a particular kernel has pid and mount > > > > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > > > > been around for a long time, but there's no hard requirement for the > > > > latter to be enabled in the kernel. Therefore, this bugfix slips > > > > namespace support in alongside the session id thing. > > > > > > > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > > > > support should be a separate conversation, not something that I have to > > > > do in a bug fix to get mainline QA back up. > > > > > > > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > > > > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > > > --- > > > > check | 34 +++++++++++++++++++++++----------- > > > > common/rc | 12 ++++++++++-- > > > > src/nsexec.c | 18 +++++++++++++++--- > > > > tests/generic/504 | 15 +++++++++++++-- > > > > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > > > > 5 files changed, 89 insertions(+), 18 deletions(-) > > > > create mode 100755 tools/run_seq_pidns > > > > > > Same question as for session ids - is this all really necessary (or > > > desired) if check-parallel executes check in it's own private PID > > > namespace? > > Forgot to respond to this question -- > > Because check-parallel runs (will run?) each child ./check instance in a > private namepsace, each of those instances will be isolated from the > others. So no, it's probably not absolutely necessary. > > However, there are a couple of reasons to let it happen: (a) the private > ns that ./check creates in _run_seq() isolates the actual test code from > its parent ./check process; and (b) the process started by nsexec is > considered to be the "init" process for that namespace, so when it dies, > the kernel will kill -9 all other processes in that namespace, so we > won't have any stray fsstress processes that bleed into the next test. Ok - that might be worth adding to the commit message so that anyone looking at it in a few months time doesn't need to remember this detail. -Dave.
On Thu, Feb 06, 2025 at 08:15:03AM +1100, Dave Chinner wrote: > On Wed, Feb 05, 2025 at 10:19:59AM -0800, Darrick J. Wong wrote: > > On Wed, Feb 05, 2025 at 10:00:48AM -0800, Darrick J. Wong wrote: > > > On Wed, Feb 05, 2025 at 11:37:00AM +1100, Dave Chinner wrote: > > > > On Tue, Feb 04, 2025 at 01:26:13PM -0800, Darrick J. Wong wrote: > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > > > As mentioned in the previous patch, trying to isolate processes from > > > > > separate test instances through the use of distinct Unix process > > > > > sessions is annoying due to the many complications with signal handling. > > > > > > > > > > Instead, we could just use nsexec to run the test program with a private > > > > > pid namespace so that each test instance can only see its own processes; > > > > > and private mount namespace so that tests writing to /tmp cannot clobber > > > > > other tests or the stuff running on the main system. > > > > > > > > > > However, it's not guaranteed that a particular kernel has pid and mount > > > > > namespaces enabled. Mount (2.4.19) and pid (2.6.24) namespaces have > > > > > been around for a long time, but there's no hard requirement for the > > > > > latter to be enabled in the kernel. Therefore, this bugfix slips > > > > > namespace support in alongside the session id thing. > > > > > > > > > > Declaring CONFIG_PID_NS=n a deprecated configuration and removing > > > > > support should be a separate conversation, not something that I have to > > > > > do in a bug fix to get mainline QA back up. > > > > > > > > > > Cc: <fstests@vger.kernel.org> # v2024.12.08 > > > > > Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management") > > > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > > > > --- > > > > > check | 34 +++++++++++++++++++++++----------- > > > > > common/rc | 12 ++++++++++-- > > > > > src/nsexec.c | 18 +++++++++++++++--- > > > > > tests/generic/504 | 15 +++++++++++++-- > > > > > tools/run_seq_pidns | 28 ++++++++++++++++++++++++++++ > > > > > 5 files changed, 89 insertions(+), 18 deletions(-) > > > > > create mode 100755 tools/run_seq_pidns > > > > > > > > Same question as for session ids - is this all really necessary (or > > > > desired) if check-parallel executes check in it's own private PID > > > > namespace? > > > > Forgot to respond to this question -- > > > > Because check-parallel runs (will run?) each child ./check instance in a > > private namepsace, each of those instances will be isolated from the > > others. So no, it's probably not absolutely necessary. > > > > However, there are a couple of reasons to let it happen: (a) the private > > ns that ./check creates in _run_seq() isolates the actual test code from > > its parent ./check process; and (b) the process started by nsexec is > > considered to be the "init" process for that namespace, so when it dies, > > the kernel will kill -9 all other processes in that namespace, so we > > won't have any stray fsstress processes that bleed into the next test. > > Ok - that might be worth adding to the commit message so that anyone > looking at it in a few months time doesn't need to remember this > detail. It /does/ say that in _run_seq() but yeah I'll add it to the commit message too: "Instead, we could just use nsexec to run the test program with a private pid namespace so that each test instance can only see its own processes; and private mount namespace so that tests writing to /tmp cannot clobber other tests or the stuff running on the main system. Further, the process created by the clone(CLONE_NEWPID) call is considered the init process of that pid namespace, so all processes will be SIGKILL'd when the init process terminates, so we no longer need systemd scopes for externally enforced cleanup." --D > > -Dave. > -- > Dave Chinner > david@fromorbit.com >
diff --git a/check b/check index d854515ed1aac5..24b21cf139f927 100755 --- a/check +++ b/check @@ -674,6 +674,11 @@ _stash_test_status() { esac } +# Can we run in a private pid/mount namespace? +HAVE_PRIVATENS= +./tools/run_seq_pidns bash -c "exit 77" +test $? -eq 77 && HAVE_PRIVATENS=yes + # Can we run systemd scopes? HAVE_SYSTEMD_SCOPES= systemctl reset-failed "fstests-check" &>/dev/null @@ -691,22 +696,29 @@ _adjust_oom_score -500 # the system runs out of memory it'll be the test that gets killed and not the # test framework. The test is run in a separate process without any of our # functions, so we open-code adjusting the OOM score. -# -# If systemd is available, run the entire test script in a scope so that we can -# kill all subprocesses of the test if it fails to clean up after itself. This -# is essential for ensuring that the post-test unmount succeeds. Note that -# systemd doesn't automatically remove transient scopes that fail to terminate -# when systemd tells them to terminate (e.g. programs stuck in D state when -# systemd sends SIGKILL), so we use reset-failed to tear down the scope. -# -# Use setsid to run the test program with a separate session id so that we -# can pkill only the processes started by this test. _run_seq() { local res unset CHILDPID unset FSTESTS_ISOL # set by tools/run_seq_* - if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then + if [ -n "${HAVE_PRIVATENS}" ]; then + # If pid and mount namespaces are available, run the whole test + # inside them so that the test cannot access any process or + # /tmp contents that it does not itself create. The ./$seq + # process is considered the "init" process of the pid + # namespace, so all subprocesses will be sent SIGKILL when it + # terminates. + ./tools/run_seq_pidns "./$seq" + res=$? + elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then + # If systemd is available, run the entire test script in a + # scope so that we can kill all subprocesses of the test if it + # fails to clean up after itself. This is essential for + # ensuring that the post-test unmount succeeds. Note that + # systemd doesn't automatically remove transient scopes that + # fail to terminate when systemd tells them to terminate (e.g. + # programs stuck in D state when systemd sends SIGKILL), so we + # use reset-failed to tear down the scope. local unit="$(systemd-escape "fs$seq").scope" systemctl reset-failed "${unit}" &> /dev/null systemd-run --quiet --unit "${unit}" --scope \ diff --git a/common/rc b/common/rc index bda80995f8dd55..25900533acb974 100644 --- a/common/rc +++ b/common/rc @@ -33,7 +33,11 @@ _test_sync() # Kill only the processes started by this test. _pkill() { - pkill --session 0 "$@" + if [ "$FSTESTS_ISOL" = "setsid" ]; then + pkill --session 0 "$@" + else + pkill "$@" + fi } # Common execution handling for fsstress invocation. @@ -2736,7 +2740,11 @@ _require_user_exists() # not, passing $SHELL in this manner works both for "su" and "su -c cmd". _su() { - su --session-command $SHELL "$@" + if [ "$FSTESTS_ISOL" = "setsid" ]; then + su --session-command $SHELL "$@" + else + su "$@" + fi } # check if a user exists and is able to execute commands. diff --git a/src/nsexec.c b/src/nsexec.c index 750d52df129716..5c0bc922153514 100644 --- a/src/nsexec.c +++ b/src/nsexec.c @@ -54,6 +54,7 @@ usage(char *pname) fpe(" If -M or -G is specified, -U is required\n"); fpe("-s Set uid/gid to 0 in the new user namespace\n"); fpe("-v Display verbose messages\n"); + fpe("-z Return child's return value\n"); fpe("\n"); fpe("Map strings for -M and -G consist of records of the form:\n"); fpe("\n"); @@ -144,6 +145,8 @@ int main(int argc, char *argv[]) { int flags, opt; + int return_child_status = 0; + int child_status = 0; pid_t child_pid; struct child_args args; char *uid_map, *gid_map; @@ -161,7 +164,7 @@ main(int argc, char *argv[]) setid = 0; gid_map = NULL; uid_map = NULL; - while ((opt = getopt(argc, argv, "+imnpuUM:G:vs")) != -1) { + while ((opt = getopt(argc, argv, "+imnpuUM:G:vsz")) != -1) { switch (opt) { case 'i': flags |= CLONE_NEWIPC; break; case 'm': flags |= CLONE_NEWNS; break; @@ -173,6 +176,7 @@ main(int argc, char *argv[]) case 'G': gid_map = optarg; break; case 'U': flags |= CLONE_NEWUSER; break; case 's': setid = 1; break; + case 'z': return_child_status = 1; break; default: usage(argv[0]); } } @@ -229,11 +233,19 @@ main(int argc, char *argv[]) close(args.pipe_fd[1]); - if (waitpid(child_pid, NULL, 0) == -1) /* Wait for child */ + if (waitpid(child_pid, &child_status, 0) == -1) /* Wait for child */ errExit("waitpid"); if (verbose) - printf("%s: terminating\n", argv[0]); + printf("%s: terminating %d\n", argv[0], WEXITSTATUS(child_status)); + + if (return_child_status) { + if (WIFEXITED(child_status)) + exit(WEXITSTATUS(child_status)); + if (WIFSIGNALED(child_status)) + exit(WTERMSIG(child_status) + 128); /* like sh */ + exit(EXIT_FAILURE); + } exit(EXIT_SUCCESS); } diff --git a/tests/generic/504 b/tests/generic/504 index 271c040e7b842a..96f18a0bbc7ba2 100755 --- a/tests/generic/504 +++ b/tests/generic/504 @@ -18,7 +18,7 @@ _cleanup() { exec {test_fd}<&- cd / - rm -f $tmp.* + rm -r -f $tmp.* } # Import common functions. @@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full # Create new fd by exec exec {test_fd}> $testfile -# flock locks the fd then exits, we should see the lock info even the owner is dead +# flock locks the fd then exits, we should see the lock info even the owner is +# dead. If we're using pid namespace isolation we have to move /proc so that +# we can access the /proc/locks from the init_pid_ns. +if [ "$FSTESTS_ISOL" = "privatens" ]; then + move_proc="$tmp.procdir" + mkdir -p "$move_proc" + mount --move /proc "$move_proc" +fi flock -x $test_fd cat /proc/locks >> $seqres.full # Checking grep -q ":$tf_inode " /proc/locks || echo "lock info not found" +if [ -n "$move_proc" ]; then + mount --move "$move_proc" /proc +fi + # success, all done status=0 echo "Silence is golden" diff --git a/tools/run_seq_pidns b/tools/run_seq_pidns new file mode 100755 index 00000000000000..df94974ab30c3c --- /dev/null +++ b/tools/run_seq_pidns @@ -0,0 +1,28 @@ +#!/bin/bash + +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2025 Oracle. All Rights Reserved. +# +# Try starting things in a private pid/mount namespace with a private /tmp +# and /proc so that child process trees cannot interfere with each other. + +if [ -n "${FSTESTS_ISOL}" ]; then + for path in /proc /tmp; do + mount --make-private "$path" + done + mount -t proc proc /proc + mount -t tmpfs tmpfs /tmp + + # Allow the test to become a target of the oom killer + oom_knob="/proc/self/oom_score_adj" + test -w "${oom_knob}" && echo 250 > "${oom_knob}" + + exec "$@" +fi + +if [ -z "$1" ] || [ "$1" = "--help" ]; then + echo "Usage: $0 command [args...]" + exit 1 +fi + +FSTESTS_ISOL=privatens exec "$(dirname "$0")/../src/nsexec" -z -m -p "$0" "$@"