diff mbox series

kunit: Fix race condition in try-catch completion

Message ID 20240412025903.489470-1-davidgow@google.com (mailing list archive)
State Accepted
Commit 1eb69ded805103ce3ddc8b1a207abd8c24ca9e63
Delegated to: Brendan Higgins
Headers show
Series kunit: Fix race condition in try-catch completion | expand

Commit Message

David Gow April 12, 2024, 2:59 a.m. UTC
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition, where
the kthread exits before we can wait on it.

Keep a copy of vfork_done, which is taken before we wake_up_process()
and so valid, and wait on that instead.

Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.org/
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: David Gow <davidgow@google.com>
---
 lib/kunit/try-catch.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Rae Moar April 12, 2024, 9:32 p.m. UTC | #1
On Thu, Apr 11, 2024 at 10:59 PM David Gow <davidgow@google.com> wrote:
>
> KUnit's try-catch infrastructure now uses vfork_done, which is always
> set to a valid completion when a kthread is created, but which is set to
> NULL once the thread terminates. This creates a race condition, where
> the kthread exits before we can wait on it.
>
> Keep a copy of vfork_done, which is taken before we wake_up_process()
> and so valid, and wait on that instead.
>
> Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.org/
> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Acked-by: Mickaël Salaün <mic@digikod.net>
> Signed-off-by: David Gow <davidgow@google.com>

Hello,

This fix looks good to me. I have tested it and besides the fortify
test error discussed in the previous patch series I am happy.

Thanks!
-Rae

Reviewed-by: Rae Moar <rmoar@google.com>


> ---
>  lib/kunit/try-catch.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
> index fa687278ccc9..6bbe0025b079 100644
> --- a/lib/kunit/try-catch.c
> +++ b/lib/kunit/try-catch.c
> @@ -63,6 +63,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
>  {
>         struct kunit *test = try_catch->test;
>         struct task_struct *task_struct;
> +       struct completion *task_done;
>         int exit_code, time_remaining;
>
>         try_catch->context = context;
> @@ -75,13 +76,16 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
>                 return;
>         }
>         get_task_struct(task_struct);
> -       wake_up_process(task_struct);
>         /*
>          * As for a vfork(2), task_struct->vfork_done (pointing to the
>          * underlying kthread->exited) can be used to wait for the end of a
> -        * kernel thread.
> +        * kernel thread. It is set to NULL when the thread exits, so we
> +        * keep a copy here.
>          */
> -       time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
> +       task_done = task_struct->vfork_done;
> +       wake_up_process(task_struct);
> +
> +       time_remaining = wait_for_completion_timeout(task_done,
>                                                      kunit_test_timeout());
>         if (time_remaining == 0) {
>                 try_catch->try_result = -ETIMEDOUT;
> --
> 2.44.0.683.g7961c838ac-goog
>
Miguel Ojeda April 13, 2024, 9:04 p.m. UTC | #2
On Thu, Apr 11, 2024 at 10:59 PM David Gow <davidgow@google.com> wrote:
>
> KUnit's try-catch infrastructure now uses vfork_done, which is always
> set to a valid completion when a kthread is created, but which is set to
> NULL once the thread terminates. This creates a race condition, where
> the kthread exits before we can wait on it.
>
> Keep a copy of vfork_done, which is taken before we wake_up_process()
> and so valid, and wait on that instead.
>
> Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.org/
> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Acked-by: Mickaël Salaün <mic@digikod.net>
> Signed-off-by: David Gow <davidgow@google.com>

I noticed it with the Rust tests too, and indeed this fixed it:

Tested-by: Miguel Ojeda <ojeda@kernel.org>

Thanks!

Cheers,
Miguel
diff mbox series

Patch

diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index fa687278ccc9..6bbe0025b079 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -63,6 +63,7 @@  void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
 {
 	struct kunit *test = try_catch->test;
 	struct task_struct *task_struct;
+	struct completion *task_done;
 	int exit_code, time_remaining;
 
 	try_catch->context = context;
@@ -75,13 +76,16 @@  void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
 		return;
 	}
 	get_task_struct(task_struct);
-	wake_up_process(task_struct);
 	/*
 	 * As for a vfork(2), task_struct->vfork_done (pointing to the
 	 * underlying kthread->exited) can be used to wait for the end of a
-	 * kernel thread.
+	 * kernel thread. It is set to NULL when the thread exits, so we
+	 * keep a copy here.
 	 */
-	time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
+	task_done = task_struct->vfork_done;
+	wake_up_process(task_struct);
+
+	time_remaining = wait_for_completion_timeout(task_done,
 						     kunit_test_timeout());
 	if (time_remaining == 0) {
 		try_catch->try_result = -ETIMEDOUT;