Message ID | 20220619134657.1846292-4-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | aborted fstests may leave frozen fs behind | expand |
On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Those tests failed to cleanup background jobs after test > is interrupted. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > tests/xfs/422 | 8 ++++++++ > tests/xfs/517 | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/tests/xfs/422 b/tests/xfs/422 > index a83a66df..8e9a3576 100755 > --- a/tests/xfs/422 > +++ b/tests/xfs/422 > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > _register_cleanup "_cleanup" BUS > > +# Override the default cleanup function. > +_cleanup() > +{ > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > + cd / > + rm -rf $tmp.* > +} > + > # Import common functions. > . ./common/filter > . ./common/fuzzy > diff --git a/tests/xfs/517 b/tests/xfs/517 > index 961668e3..18404248 100755 > --- a/tests/xfs/517 > +++ b/tests/xfs/517 > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > # Override the default cleanup function. > _cleanup() > { > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 Besides the missing wait, this cleanup is still racy, because stress_loop can spawn a new fstress process after killall /proc iteration. Worst, freeze_loop can freeze after killall /proc iteration. The correct solution (I think) is to record the pid of the XXX_loop sub-shells and kill those, which should also solve the "Terminated" false positive error. Will give that a shot. Thanks, Amir.
On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > Those tests failed to cleanup background jobs after test > > is interrupted. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > tests/xfs/422 | 8 ++++++++ > > tests/xfs/517 | 1 + > > 2 files changed, 9 insertions(+) > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > index a83a66df..8e9a3576 100755 > > --- a/tests/xfs/422 > > +++ b/tests/xfs/422 > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > _register_cleanup "_cleanup" BUS > > > > +# Override the default cleanup function. > > +_cleanup() > > +{ > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > + cd / > > + rm -rf $tmp.* > > +} > > + > > # Import common functions. > > . ./common/filter > > . ./common/fuzzy > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > index 961668e3..18404248 100755 > > --- a/tests/xfs/517 > > +++ b/tests/xfs/517 > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > # Override the default cleanup function. > > _cleanup() > > { > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > Besides the missing wait, this cleanup is still racy, because stress_loop > can spawn a new fstress process after killall /proc iteration. > > Worst, freeze_loop can freeze after killall /proc iteration. > > The correct solution (I think) is to record the pid of the XXX_loop > sub-shells and kill those, which should also solve the "Terminated" > false positive error. > > Will give that a shot. FWIW I already have patches reworking 422 and all the other scrub-related stress tests: https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes I swear I'll send all these some day, if I ever get caught up... Delegating LTS maintenance is a big help, but as it is I still can't stay abreast of all the mainline patchsets /and/ send my own stuff. :( --D > Thanks, > Amir.
On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > Those tests failed to cleanup background jobs after test > > > is interrupted. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > tests/xfs/422 | 8 ++++++++ > > > tests/xfs/517 | 1 + > > > 2 files changed, 9 insertions(+) > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > index a83a66df..8e9a3576 100755 > > > --- a/tests/xfs/422 > > > +++ b/tests/xfs/422 > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > _register_cleanup "_cleanup" BUS > > > > > > +# Override the default cleanup function. > > > +_cleanup() > > > +{ > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > + cd / > > > + rm -rf $tmp.* > > > +} > > > + > > > # Import common functions. > > > . ./common/filter > > > . ./common/fuzzy > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > index 961668e3..18404248 100755 > > > --- a/tests/xfs/517 > > > +++ b/tests/xfs/517 > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > # Override the default cleanup function. > > > _cleanup() > > > { > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > can spawn a new fstress process after killall /proc iteration. > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > The correct solution (I think) is to record the pid of the XXX_loop > > sub-shells and kill those, which should also solve the "Terminated" > > false positive error. > > > > Will give that a shot. > > FWIW I already have patches reworking 422 and all the other > scrub-related stress tests: > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > That looks very nice. The cleanup looks very similar to the one I posted for v2, but instead of "killall xfs_io fsstress" you would be better off with "kill $__SCRUB_STRESS_FREEZE_PID" killing the child process and waiting for its parent bash process has the side effect of the bash parent emitting "Terminated" or "Killed" message with breaks golden output if the inner loop did not already exit, which is the behavior that got me started on doing this cleanup. > I swear I'll send all these some day, if I ever get caught up... > Delegating LTS maintenance is a big help, but as it is I still can't > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > I will respond to that in another email. Thanks, Amir.
[+fsdevel] > > I swear I'll send all these some day, if I ever get caught up... > Delegating LTS maintenance is a big help, but as it is I still can't > stay abreast of all the mainline patchsets /and/ send my own stuff. :( I have to repeat what I said in LSFMM about the LTP project and what fstests could be like. Companies put dedicated engineers to work as proactive LTP maintainers. There is absolutely no reason that companies won't do the same for fstests. If only every large corp. that employs >10 fs developers will assign a halftime engineer for fstests maintenance, their fs developer's work would become so much more productive and their fs product will become more reliable. I think the fact that this is not happening is a failure on our part to communicate that to our managers. From my experience, if you had sent stuff like your fstests cleanups to the LTP maintainers and ask for their help to land it, they would thank you for the work you did and take care of all the testing on all platforms and fixing all the code style and framework issues. LTP maintainers constantly work on improving the framework and providing more features to test writers as well as on converting old tests to use new infrastructure. Stuff like Dave's work on sorting up the cleanup mess or the groups cleanup and groups speedup - all of those do not have to add load to busy maintainers - life can be different! Taking responsibility away from developers to deliver their own tests is a slippery slope, but getting help and working together is essential for offloading work from busy maintainers. Thanks, Amir.
On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > Those tests failed to cleanup background jobs after test > > > > is interrupted. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > tests/xfs/422 | 8 ++++++++ > > > > tests/xfs/517 | 1 + > > > > 2 files changed, 9 insertions(+) > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > index a83a66df..8e9a3576 100755 > > > > --- a/tests/xfs/422 > > > > +++ b/tests/xfs/422 > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > +# Override the default cleanup function. > > > > +_cleanup() > > > > +{ > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > + cd / > > > > + rm -rf $tmp.* > > > > +} > > > > + > > > > # Import common functions. > > > > . ./common/filter > > > > . ./common/fuzzy > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > index 961668e3..18404248 100755 > > > > --- a/tests/xfs/517 > > > > +++ b/tests/xfs/517 > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > # Override the default cleanup function. > > > > _cleanup() > > > > { > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > can spawn a new fstress process after killall /proc iteration. > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > sub-shells and kill those, which should also solve the "Terminated" > > > false positive error. > > > > > > Will give that a shot. > > > > FWIW I already have patches reworking 422 and all the other > > scrub-related stress tests: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > That looks very nice. > The cleanup looks very similar to the one I posted for v2, > but instead of "killall xfs_io fsstress" you would be better off with > "kill $__SCRUB_STRESS_FREEZE_PID" > > killing the child process and waiting for its parent bash > process has the side effect of the bash parent emitting > "Terminated" or "Killed" message with breaks golden output > if the inner loop did not already exit, which is the behavior > that got me started on doing this cleanup. > > > I swear I'll send all these some day, if I ever get caught up... > > Delegating LTS maintenance is a big help, but as it is I still can't > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > I will respond to that in another email. Hi Amir, So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? Thanks, Zorro > > Thanks, > Amir. >
On Fri, Jun 24, 2022 at 9:31 AM Zorro Lang <zlang@redhat.com> wrote: > > On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > Those tests failed to cleanup background jobs after test > > > > > is interrupted. > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > tests/xfs/422 | 8 ++++++++ > > > > > tests/xfs/517 | 1 + > > > > > 2 files changed, 9 insertions(+) > > > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > > index a83a66df..8e9a3576 100755 > > > > > --- a/tests/xfs/422 > > > > > +++ b/tests/xfs/422 > > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > > > +# Override the default cleanup function. > > > > > +_cleanup() > > > > > +{ > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > + cd / > > > > > + rm -rf $tmp.* > > > > > +} > > > > > + > > > > > # Import common functions. > > > > > . ./common/filter > > > > > . ./common/fuzzy > > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > > index 961668e3..18404248 100755 > > > > > --- a/tests/xfs/517 > > > > > +++ b/tests/xfs/517 > > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > > # Override the default cleanup function. > > > > > _cleanup() > > > > > { > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > > can spawn a new fstress process after killall /proc iteration. > > > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > > sub-shells and kill those, which should also solve the "Terminated" > > > > false positive error. > > > > > > > > Will give that a shot. > > > > > > FWIW I already have patches reworking 422 and all the other > > > scrub-related stress tests: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > > > > That looks very nice. > > The cleanup looks very similar to the one I posted for v2, > > but instead of "killall xfs_io fsstress" you would be better off with > > "kill $__SCRUB_STRESS_FREEZE_PID" > > > > killing the child process and waiting for its parent bash > > process has the side effect of the bash parent emitting > > "Terminated" or "Killed" message with breaks golden output > > if the inner loop did not already exit, which is the behavior > > that got me started on doing this cleanup. > > > > > I swear I'll send all these some day, if I ever get caught up... > > > Delegating LTS maintenance is a big help, but as it is I still can't > > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > > > > I will respond to that in another email. > > Hi Amir, > > So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? > I don't understand. V2 patch should be reviewed https://lore.kernel.org/fstests/20220621173729.2135249-1-amir73il@gmail.com/ The fact that Darrick has future work pending to cleanup xfs/422 should not stop a cleanup patch to be applied now. Am I missing something? Thanks, Amir.
On Fri, Jun 24, 2022 at 09:41:34AM +0300, Amir Goldstein wrote: > On Fri, Jun 24, 2022 at 9:31 AM Zorro Lang <zlang@redhat.com> wrote: > > > > On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > > > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > > Those tests failed to cleanup background jobs after test > > > > > > is interrupted. > > > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > > --- > > > > > > tests/xfs/422 | 8 ++++++++ > > > > > > tests/xfs/517 | 1 + > > > > > > 2 files changed, 9 insertions(+) > > > > > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > > > index a83a66df..8e9a3576 100755 > > > > > > --- a/tests/xfs/422 > > > > > > +++ b/tests/xfs/422 > > > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > > > > > +# Override the default cleanup function. > > > > > > +_cleanup() > > > > > > +{ > > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > + cd / > > > > > > + rm -rf $tmp.* > > > > > > +} > > > > > > + > > > > > > # Import common functions. > > > > > > . ./common/filter > > > > > > . ./common/fuzzy > > > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > > > index 961668e3..18404248 100755 > > > > > > --- a/tests/xfs/517 > > > > > > +++ b/tests/xfs/517 > > > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > > > # Override the default cleanup function. > > > > > > _cleanup() > > > > > > { > > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > > > can spawn a new fstress process after killall /proc iteration. > > > > > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > > > sub-shells and kill those, which should also solve the "Terminated" > > > > > false positive error. > > > > > > > > > > Will give that a shot. > > > > > > > > FWIW I already have patches reworking 422 and all the other > > > > scrub-related stress tests: > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > > > > > > > That looks very nice. > > > The cleanup looks very similar to the one I posted for v2, > > > but instead of "killall xfs_io fsstress" you would be better off with > > > "kill $__SCRUB_STRESS_FREEZE_PID" > > > > > > killing the child process and waiting for its parent bash > > > process has the side effect of the bash parent emitting > > > "Terminated" or "Killed" message with breaks golden output > > > if the inner loop did not already exit, which is the behavior > > > that got me started on doing this cleanup. > > > > > > > I swear I'll send all these some day, if I ever get caught up... > > > > Delegating LTS maintenance is a big help, but as it is I still can't > > > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > > > > > > > I will respond to that in another email. > > > > Hi Amir, > > > > So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? > > > > I don't understand. > > V2 patch should be reviewed > > https://lore.kernel.org/fstests/20220621173729.2135249-1-amir73il@gmail.com/ > > The fact that Darrick has future work pending to cleanup xfs/422 > should not stop a cleanup patch to be applied now. Oh, maybe I misunderstood. I thought Darrick meant his patchset (which haven't been sent out) contains what you did on xfs/422. If that's misunderstanding, I'll review and merge your V2 :) Thanks, Zorro > > Am I missing something? > > Thanks, > Amir. >
On Fri, Jun 24, 2022 at 07:49:29AM +0300, Amir Goldstein wrote: > [+fsdevel] > > > > > I swear I'll send all these some day, if I ever get caught up... > > Delegating LTS maintenance is a big help, but as it is I still can't > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > I have to repeat what I said in LSFMM about the LTP project and > what fstests could be like. > > Companies put dedicated engineers to work as proactive LTP > maintainers. There is absolutely no reason that companies won't > do the same for fstests. > > If only every large corp. that employs >10 fs developers will assign > a halftime engineer for fstests maintenance, their fs developer's > work would become so much more productive and their fs product > will become more reliable. > > I think the fact that this is not happening is a failure on our part to > communicate that to our managers. I'd enjoy that too; I'll bring it up the next time they start asking about budgeting here (which means in 2 weeks). > From my experience, if you had sent stuff like your fstests cleanups > to the LTP maintainers and ask for their help to land it, they would > thank you for the work you did and take care of all the testing > on all platforms and fixing all the code style and framework issues. Though to be fair -- a lot of the fstests changes backing up in djwong-dev exist to enable testing of the online fsck feature. This whole year I've deprioritized sending any of those patches to concentrate on writing the design documentation for online fsck[1]. Now that I've submitted *that*, I'm hoping to start code review once I convince a few people to grok the design doc. So perhaps next week I'll resume the patchbombing that I've become infamous for doing. In the mean time, no objections to merging /this/ series. The group labelling is a little odd and I think that should be separate fix from adding _require_freeze), but if zorro's ok with its present form then so be it. --D [1] https://lore.kernel.org/linux-xfs/165456652256.167418.912764930038710353.stgit@magnolia/ > LTP maintainers constantly work on improving the framework and > providing more features to test writers as well as on converting old > tests to use new infrastructure. > > Stuff like Dave's work on sorting up the cleanup mess or the groups > cleanup and groups speedup - all of those do not have to add load > to busy maintainers - life can be different! > > Taking responsibility away from developers to deliver their own tests > is a slippery slope, but getting help and working together is essential > for offloading work from busy maintainers. > > Thanks, > Amir.
diff --git a/tests/xfs/422 b/tests/xfs/422 index a83a66df..8e9a3576 100755 --- a/tests/xfs/422 +++ b/tests/xfs/422 @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze _register_cleanup "_cleanup" BUS +# Override the default cleanup function. +_cleanup() +{ + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 + cd / + rm -rf $tmp.* +} + # Import common functions. . ./common/filter . ./common/fuzzy diff --git a/tests/xfs/517 b/tests/xfs/517 index 961668e3..18404248 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS # Override the default cleanup function. _cleanup() { + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 cd / rm -rf $tmp.* }
Those tests failed to cleanup background jobs after test is interrupted. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- tests/xfs/422 | 8 ++++++++ tests/xfs/517 | 1 + 2 files changed, 9 insertions(+)