| Message ID | 20230316123028.2890338-1-elver@google.com (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | bcb7ee79029dcaeb09668a4d1489de256829a7cc |
| Series | [v6,1/2] posix-timers: Prefer delivery of signals to the current thread |
On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> Gentle ping... Thanks, -- Marco
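For readers coming to this thread cold, the API under discussion is a process CPU-time POSIX timer created without SIGEV_THREAD_ID, where POSIX leaves the receiving thread unspecified. Below is a minimal illustrative sketch, not taken from the patch: the function name and intervals are made up, error checking is omitted, and older glibc needs -lrt for timer_create().

```c
#include <signal.h>
#include <time.h>

/* Sketch only: arm a CLOCK_PROCESS_CPUTIME_ID timer with plain
 * SIGEV_SIGNAL (no SIGEV_THREAD_ID). Any thread in the group may
 * receive SIGALRM; the patch makes the kernel prefer the thread
 * that is running when the timer fires. */
static timer_t arm_process_cpu_timer(void)
{
	struct sigevent sev = {
		.sigev_notify = SIGEV_SIGNAL,
		.sigev_signo  = SIGALRM,
	};
	struct itimerspec its = {
		.it_value.tv_nsec    = 1000 * 1000, /* first expiry after 1ms of CPU time */
		.it_interval.tv_nsec = 1000 * 1000, /* then every 1ms of CPU time */
	};
	timer_t id;

	timer_create(CLOCK_PROCESS_CPUTIME_ID, &sev, &id);
	timer_settime(id, 0, &its, NULL);
	return id;
}
```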
On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> > --- > v6: > - Split test from this patch. > - Update wording on what this patch aims to improve. > > v5: > - Rebased onto v6.2. > > v4: > - Restructured checks in send_sigqueue() as suggested. > > v3: > - Switched to a completely different implementation (much simpler) > based on Oleg's idea. > > RFC v2: > - Added additional Cc as Thomas asked. > --- > kernel/signal.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/signal.c b/kernel/signal.c > index 8cb28f1df294..605445fa27d4 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) > /* > * Now find a thread we can wake up to take the signal off the queue. > * > - * If the main thread wants the signal, it gets first crack. > - * Probably the least surprising to the average bear. > + * Try the suggested task first (may or may not be the main thread). > */ > if (wants_signal(sig, p)) > t = p; > @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > > ret = -1; > rcu_read_lock(); > + /* > + * This function is used by POSIX timers to deliver a timer signal. > + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID > + * set), the signal must be delivered to the specific thread (queues > + * into t->pending). > + * > + * Where type is not PIDTYPE_PID, signals must just be delivered to the > + * current process. In this case, prefer to deliver to current if it is > + * in the same thread group as the target, as it avoids unnecessarily > + * waking up a potentially idle task. > + */ > t = pid_task(pid, type); > - if (!t || !likely(lock_task_sighand(t, &flags))) > + if (!t) > + goto ret; > + if (type != PIDTYPE_PID && same_thread_group(t, current)) > + t = current; > + if (!likely(lock_task_sighand(t, &flags))) > goto ret; > > ret = 1; /* the signal is ignored */ > @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > q->info.si_overrun = 0; > > signalfd_notify(t, sig); > + /* > + * If the type is not PIDTYPE_PID, we just use shared_pending, which > + * won't guarantee that the specified task will receive the signal, but > + * is sufficient if t==current in the common case. > + */ > pending = (type != PIDTYPE_PID) ? 
&t->signal->shared_pending : &t->pending; > list_add_tail(&q->list, &pending->list); > sigaddset(&pending->signal, sig); > -- One last semi-gentle ping. ;-) 1. We're seeing that in some applications that use POSIX timers heavily, but where the main thread is mostly idle, the main thread receives a disproportionate amount of the signals along with being woken up constantly. This is bad, because the main thread usually waits with the help of a futex or really long sleeps. Now the main thread will steal time (to go back to sleep) from another thread that could have instead just proceeded with whatever it was doing. 2. Delivering signals to random threads is currently way too expensive. We need to resort to this crazy algorithm: 1) receive timer signal, 2) check if main thread, 3) if main thread (which is likely), pick a random thread and do tgkill. To find a random thread, iterate /proc/self/task, but that's just abysmal for various reasons. Other alternatives, like inherited task clock perf events are too expensive as soon as we need to enable/disable the timers (does IPIs), and maintaining O(#threads) timers is just as horrible. This patch solves both the above issues. We acknowledge the unfortunate situation of attributing this patch to one clear subsystem and owner: it straddles into signal delivery and POSIX timers territory, and perhaps some scheduling. The patch itself only touches kernel/signal.c. If anyone has serious objections, please shout (soon'ish). Given the patch has been reviewed by Oleg, and scrutinized by Dmitry and myself, presumably we need to find a tree that currently takes kernel/signal.c patches? Thanks! -- Marco
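To make the workaround in point 2 above concrete, here is a rough sketch of the userspace re-targeting dance being described. All names are hypothetical, and note that opendir() and rand() are not async-signal-safe, which is part of why the approach is as bad as described:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of the workaround: if the timer signal landed on the main
 * thread, forward it to a randomly chosen sibling via tgkill(). */
static void retarget_handler(int sig)
{
	pid_t tids[256];
	int n = 0;

	if (gettid() != getpid())
		return; /* already on a non-main thread: take the sample here */

	/* The abysmal part: enumerate sibling threads via /proc/self/task. */
	DIR *d = opendir("/proc/self/task");
	struct dirent *de;

	while (d && (de = readdir(d)) && n < 256) {
		pid_t tid = atoi(de->d_name); /* "." and ".." parse to 0 */

		if (tid > 0 && tid != getpid())
			tids[n++] = tid;
	}
	if (d)
		closedir(d);
	if (n > 0)
		syscall(SYS_tgkill, getpid(), tids[rand() % n], sig);
}
```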
On Thu, Apr 06, 2023 at 04:12:04PM +0200, Marco Elver wrote: > On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > One last semi-gentle ping. ;-) > > 1. We're seeing that in some applications that use POSIX timers > heavily, but where the main thread is mostly idle, the main thread > receives a disproportionate amount of the signals along with being > woken up constantly. This is bad, because the main thread usually > waits with the help of a futex or really long sleeps. Now the main > thread will steal time (to go back to sleep) from another thread that > could have instead just proceeded with whatever it was doing. > > 2. Delivering signals to random threads is currently way too > expensive. We need to resort to this crazy algorithm: 1) receive timer > signal, 2) check if main thread, 3) if main thread (which is likely), > pick a random thread and do tgkill. To find a random thread, iterate > /proc/self/task, but that's just abysmal for various reasons. Other > alternatives, like inherited task clock perf events are too expensive > as soon as we need to enable/disable the timers (does IPIs), and > maintaining O(#threads) timers is just as horrible. > > This patch solves both the above issues. > > We acknowledge the unfortunate situation of attributing this patch to > one clear subsystem and owner: it straddles into signal delivery and > POSIX timers territory, and perhaps some scheduling. The patch itself > only touches kernel/signal.c. > > If anyone has serious objections, please shout (soon'ish). Given the > patch has been reviewed by Oleg, and scrutinized by Dmitry and myself, > presumably we need to find a tree that currently takes kernel/signal.c > patches? > > Thanks! Thanks for the reminder! In the very unlikely case Thomas ignores this before the next merge window, I'll tentatively do a pull request to Linus. Thanks. > > -- Marco
On Thu, Mar 16, 2023 at 01:30:27PM +0100, Marco Elver wrote: > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/signal.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/signal.c b/kernel/signal.c > index 8cb28f1df294..605445fa27d4 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) > /* > * Now find a thread we can wake up to take the signal off the queue. > * > - * If the main thread wants the signal, it gets first crack. > - * Probably the least surprising to the average bear. > + * Try the suggested task first (may or may not be the main thread). > */ > if (wants_signal(sig, p)) > t = p; > @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > > ret = -1; > rcu_read_lock(); > + /* > + * This function is used by POSIX timers to deliver a timer signal. > + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID > + * set), the signal must be delivered to the specific thread (queues > + * into t->pending). > + * > + * Where type is not PIDTYPE_PID, signals must just be delivered to the > + * current process. In this case, prefer to deliver to current if it is > + * in the same thread group as the target, as it avoids unnecessarily > + * waking up a potentially idle task. > + */ > t = pid_task(pid, type); > - if (!t || !likely(lock_task_sighand(t, &flags))) > + if (!t) > + goto ret; > + if (type != PIDTYPE_PID && same_thread_group(t, current)) > + t = current; > + if (!likely(lock_task_sighand(t, &flags))) > goto ret; > > ret = 1; /* the signal is ignored */ > @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > q->info.si_overrun = 0; > > signalfd_notify(t, sig); > + /* > + * If the type is not PIDTYPE_PID, we just use shared_pending, which > + * won't guarantee that the specified task will receive the signal, but > + * is sufficient if t==current in the common case. > + */ > pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; > list_add_tail(&q->list, &pending->list); > sigaddset(&pending->signal, sig); > -- > 2.40.0.rc1.284.g88254d51c5-goog >
On Thu, Mar 16, 2023 at 5:30 AM Marco Elver <elver@google.com> wrote: > > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> Apologies for dredging up this old thread. I wanted to ask if anyone had objections to including this in the -stable trees? After this and the follow-on patch e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads") landed, folks testing older kernels with the latest selftests started to see the new test checking for this behavior stall. Thomas did submit an adjustment to the test here to avoid the stall: https://lore.kernel.org/lkml/20230606142031.071059989@linutronix.de/, but it didn't seem to land; however, that would just result in the test failing instead of hanging. This change does seem to cherry-pick cleanly back to at least stable/linux-5.10.y, so it looks simple to pull this change back. But I wanted to make sure there wasn't anything subtle I was missing before sending patches. thanks -john
On Mon, 1 Apr 2024 at 22:17, John Stultz <jstultz@google.com> wrote: > > On Thu, Mar 16, 2023 at 5:30 AM Marco Elver <elver@google.com> wrote: > > > > From: Dmitry Vyukov <dvyukov@google.com> > > > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > > thread of a thread group for signal delivery. However, this has a > > significant downside: it requires waking up a potentially idle thread. > > > > Instead, prefer to deliver signals to the current thread (in the same > > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > > change guaranteed semantics, since POSIX process CPU time timers have > > never guaranteed that signal delivery is to a specific thread (without > > SIGEV_THREAD_ID set). > > > > The effect is that we no longer wake up potentially idle threads, and > > the kernel is no longer biased towards delivering the timer signal to > > any particular thread (which better distributes the timer signals esp. > > when multiple timers fire concurrently). > > > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > > Suggested-by: Oleg Nesterov <oleg@redhat.com> > > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > > Signed-off-by: Marco Elver <elver@google.com> > > Apologies for dredging up this old thread. > > I wanted to ask if anyone had objections to including this in the -stable trees? > > After this and the follow-on patch e797203fb3ba > ("selftests/timers/posix_timers: Test delivery of signals across > threads") landed, folks testing older kernels with the latest > selftests started to see the new test checking for this behavior > stall. Thomas did submit an adjustment to the test here to avoid the > stall: https://lore.kernel.org/lkml/20230606142031.071059989@linutronix.de/, > but it didn't seem to land; however, that would just result in the test > failing instead of hanging. > > This change does seem to cherry-pick cleanly back to at least > stable/linux-5.10.y, so it looks simple to pull this change > back. But I wanted to make sure there wasn't anything subtle I was > missing before sending patches. I don't have objections per se. But I wonder how other tests deal with such situations. It should happen for any test for new functionality. Can we do the same as other tests are doing?
On Mon, Apr 01 2024 at 13:17, John Stultz wrote: > Apologies for dredging up this old thread. > I wanted to ask if anyone had objections to including this in the -stable trees? > > After this and the follow-on patch e797203fb3ba > ("selftests/timers/posix_timers: Test delivery of signals across > threads") landed, folks testing older kernels with the latest > selftests started to see the new test checking for this behavior > stall. Thomas did submit an adjustment to the test here to avoid the > stall: https://lore.kernel.org/lkml/20230606142031.071059989@linutronix.de/, > but it didn't seem to land; however, that would just result in the test > failing instead of hanging. Thanks for reminding me about this series. I completely forgot about it. > This change does seem to cherry-pick cleanly back to at least > stable/linux-5.10.y, so it looks simple to pull this change > back. But I wanted to make sure there wasn't anything subtle I was > missing before sending patches. This test in particular exercises new functionality/behaviour, which really has no business to be backported into stable just to make the relevant test usable on older kernels. Why would testing with the latest tests against an older kernel be valid per se? Thanks, tglx
On Tue, Apr 2, 2024 at 7:57 AM Thomas Gleixner <tglx@linutronix.de> wrote: > On Mon, Apr 01 2024 at 13:17, John Stultz wrote: > > This change does seem to cherry-pick cleanly back to at least > > stable/linux-5.10.y, so it looks simple to pull this change > > back. But I wanted to make sure there wasn't anything subtle I was > > missing before sending patches. > > This test in particular exercises new functionality/behaviour, which > really has no business to be backported into stable just to make the > relevant test usable on older kernels. That's fair. I didn't have all the context around what motivated the change and the follow-on test, which is why I'm asking here. > Why would testing with the latest tests against an older kernel be valid per > se? So yeah, it definitely can get fuzzy trying to split hairs between when a change in behavior is a "new feature" or a "fix". Greg could probably articulate it better, but my understanding is the main point for running newer tests on older kernels is that newer tests will have more coverage of what is expected of the kernel. For features that older kernels don't support, ideally the tests will check for that functionality like userland applications would, and skip that portion of the test if it's unsupported. This way, we're able to find issues (important enough to warrant tests having been created) that have not yet been patched in the -stable trees. In this case, there is a behavioral change combined with a compliance test, which makes it look a bit more like a fix, rather than a feature (additionally the lack of a way for userland to probe for this new "feature" makes it seem fix-like). But the intended result of this is just spurring this discussion to see if it makes sense to backport or not. Disabling/ignoring the test (maybe after Thomas' fix to keep it from hanging :) is a fine solution too, but not one I'd want folks to do until they've synced with maintainers and had full context. thanks -john
On Tue, Apr 02 2024 at 10:23, John Stultz wrote: > On Tue, Apr 2, 2024 at 7:57 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> This test in particular exercises new functionality/behaviour, which >> really has no business to be backported into stable just to make the >> relevant test usable on older kernels. > > That's fair. I didn't have all the context around what motivated the > change and the follow-on test, which is why I'm asking here. It's a performance enhancement to avoid waking up idle threads for signal delivery instead of just delivering it to the current running thread which made the CPU timer fire. So it does not qualify as a fix. >> Why would testing with the latest tests against an older kernel be valid per >> se? > > So yeah, it definitely can get fuzzy trying to split hairs between > when a change in behavior is a "new feature" or a "fix". > > Greg could probably articulate it better, but my understanding is the > main point for running newer tests on older kernels is that newer > tests will have more coverage of what is expected of the kernel. For > features that older kernels don't support, ideally the tests will > check for that functionality like userland applications would, and > skip that portion of the test if it's unsupported. This way, we're > able to find issues (important enough to warrant tests having been > created) that have not yet been patched in the -stable trees. > > In this case, there is a behavioral change combined with a compliance > test, which makes it look a bit more like a fix, rather than a feature > (additionally the lack of a way for userland to probe for this new > "feature" makes it seem fix-like). But the intended result of this is > just spurring this discussion to see if it makes sense to backport or > not. Disabling/ignoring the test (maybe after Thomas' fix to keep it > from hanging :) is a fine solution too, but not one I'd want folks to > do until they've synced with maintainers and had full context. I was staring at this test because it hangs even on upstream on a regular basis, at least in a VM. The timeout change I posted prevents the hang, but still the posixtimer test will not have 0 fails. The test is fragile as hell as there is absolutely no guarantee that the signal target distribution is as expected. The expectation is based on a statistical assumption which does not really hold. So I came up with a modified variant of that, which can deduce pretty reliably that the test runs on an older kernel. Thanks, tglx --- Subject: selftests/timers/posix_timers: Make signal distribution test less fragile From: Thomas Gleixner <tglx@linutronix.de> Date: Mon, 15 May 2023 00:40:10 +0200 The signal distribution test has a tendency to hang for a long time as the signal delivery is not really evenly distributed. In fact it might never be distributed across all threads ever in the way it is written. Address this by: 1) Adding a timeout which aborts the test 2) Letting the test threads do a usleep() once they got a signal instead of running continuously. That ensures that the other threads will expire the timer and get the signal 3) Adding a detection whether all signals arrived at the main thread, which allows to run the test on older kernels. While at it get rid of the pointless atomic operation on the thread local variable in the signal handler. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- tools/testing/selftests/timers/posix_timers.c | 48 +++++++++++++++++--------- 1 file changed, 32 insertions(+), 16 deletions(-) --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -184,18 +184,22 @@ static int check_timer_create(int which) return 0; } -int remain; -__thread int got_signal; +static int remain; +static __thread int got_signal; static void *distribution_thread(void *arg) { - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); - return NULL; + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) { + if (got_signal) + usleep(10); + } + + return (void *)got_signal; } static void distribution_handler(int nr) { - if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED)) + if (++got_signal == 1) __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED); } @@ -205,8 +209,6 @@ static void distribution_handler(int nr) */ static int check_timer_distribution(void) { - int err, i; - timer_t id; const int nthreads = 10; pthread_t threads[nthreads]; struct itimerspec val = { @@ -215,7 +217,11 @@ static int check_timer_distribution(void .it_interval.tv_sec = 0, .it_interval.tv_nsec = 1000 * 1000, }; + int err, i, nsigs; + time_t start, now; + timer_t id; + done = 0; remain = nthreads + 1; /* worker threads + this thread */ signal(SIGALRM, distribution_handler); err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); @@ -231,7 +237,7 @@ static int check_timer_distribution(void for (i = 0; i < nthreads; i++) { err = pthread_create(&threads[i], NULL, distribution_thread, - NULL); + thread_sigs + i); if (err) { ksft_print_msg("Can't create thread: %s (%d)\n", strerror(errno), errno); @@ -240,23 +246,33 @@ static int check_timer_distribution(void } /* Wait for all threads to receive the signal. */ - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); + now = start = time(NULL); + while (__atomic_load_n(&remain, __ATOMIC_RELAXED)) { + now = time(NULL); + if (now - start > 5) + break; + } + done = 1; - for (i = 0; i < nthreads; i++) { + if (timer_delete(id)) { + ksft_perror("Can't delete timer\n"); + return -1; + } + + for (i = 0, nsigs = 0; i < nthreads; i++) { err = pthread_join(threads[i], NULL); if (err) { ksft_print_msg("Can't join thread: %s (%d)\n", strerror(errno), errno); return -1; } + nsigs += thread_sigs[i]; } - if (timer_delete(id)) { - ksft_perror("Can't delete timer"); - return -1; - } - - ksft_test_result_pass("check_timer_distribution\n"); + if (!nsigs) + ksft_test_result_skip("No signal distribution. Assuming old kernel\n"); + else + ksft_test_result(now - start < 5, "check_timer_distribution\n"); return 0; }
On 04/03, Thomas Gleixner wrote: > > The test is fragile as hell as there is absolutely no guarantee that the > signal target distribution is as expected. The expectation is based on a > statistical assumption which does not really hold. Agreed. I too never liked this test-case. I forgot everything about this patch and test-case, I can't really read your patch right now (sorry), so I am sure I missed something, but > static void *distribution_thread(void *arg) > { > - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); > - return NULL; > + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) { > + if (got_signal) > + usleep(10); > + } > + > + return (void *)got_signal; > } Why distribution_thread() can't simply exit if got_signal != 0 ? See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ Oleg.
On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote: > On 04/03, Thomas Gleixner wrote: >> The test is fragile as hell as there is absolutely no guarantee that the >> signal target distribution is as expected. The expectation is based on a >> statistical assumption which does not really hold. > > Agreed. I too never liked this test-case. > > I forgot everything about this patch and test-case, I can't really read > your patch right now (sorry), so I am sure I missed something, but > >> static void *distribution_thread(void *arg) >> { >> - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); >> - return NULL; >> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) { >> + if (got_signal) >> + usleep(10); >> + } >> + >> + return (void *)got_signal; >> } > > Why distribution_thread() can't simply exit if got_signal != 0 ? > > See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ Indeed. It's too obvious :)
On Wed, Apr 03 2024 at 17:43, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote: > >> > >> Why distribution_thread() can't simply exit if got_signal != 0 ? > >> > >> See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ > > Indeed. It's too obvious :) Revised simpler version below. Thanks, tglx --- Subject: selftests/timers/posix_timers: Make signal distribution test less fragile From: Thomas Gleixner <tglx@linutronix.de> The signal distribution test has a tendency to hang for a long time as the signal delivery is not really evenly distributed. In fact it might never be distributed across all threads ever in the way it is written. Address this by: 1) Adding a timeout which aborts the test 2) Letting the test threads exit once they got a signal instead of running continuously. That ensures that the other threads will have a chance to expire the timer and get the signal. 3) Adding a detection whether all signals arrived at the main thread, which allows to run the test on older kernels and emit 'SKIP'. While at it get rid of the pointless atomic operation on the thread local variable in the signal handler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- tools/testing/selftests/timers/posix_timers.c | 41 ++++++++++++++++---------- 1 file changed, 26 insertions(+), 15 deletions(-) --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -184,18 +184,19 @@ static int check_timer_create(int which) return 0; } -int remain; -__thread int got_signal; +static int remain; +static __thread int got_signal; static void *distribution_thread(void *arg) { - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); + while (!done && !got_signal); + return NULL; } static void distribution_handler(int nr) { - if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED)) + if (++got_signal == 1) __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED); } @@ -205,8 +206,6 @@ static void distribution_handler(int nr) */ static int check_timer_distribution(void) { - int err, i; - timer_t id; const int nthreads = 10; pthread_t threads[nthreads]; struct itimerspec val = { @@ -215,7 +214,11 @@ static int check_timer_distribution(void .it_interval.tv_sec = 0, .it_interval.tv_nsec = 1000 * 1000, }; + time_t start, now; + timer_t id; + int err, i; + done = 0; remain = nthreads + 1; /* worker threads + this thread */ signal(SIGALRM, distribution_handler); err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); @@ -230,8 +233,7 @@ static int check_timer_distribution(void } for (i = 0; i < nthreads; i++) { - err = pthread_create(&threads[i], NULL, distribution_thread, - NULL); + err = pthread_create(&threads[i], NULL, distribution_thread, NULL); if (err) { ksft_print_msg("Can't create thread: %s (%d)\n", strerror(errno), errno); @@ -240,7 +242,18 @@ static int check_timer_distribution(void } /* Wait for all threads to receive the signal. 
*/ - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); + now = start = time(NULL); + while (__atomic_load_n(&remain, __ATOMIC_RELAXED)) { + now = time(NULL); + if (now - start > 2) + break; + } + done = 1; + + if (timer_delete(id)) { + ksft_perror("Can't delete timer\n"); + return -1; + } for (i = 0; i < nthreads; i++) { err = pthread_join(threads[i], NULL); @@ -251,12 +264,10 @@ static int check_timer_distribution(void } } - if (timer_delete(id)) { - ksft_perror("Can't delete timer"); - return -1; - } - - ksft_test_result_pass("check_timer_distribution\n"); + if (__atomic_load_n(&remain, __ATOMIC_RELAXED) == nthreads) + ksft_test_result_skip("No signal distribution. Assuming old kernel\n"); + else + ksft_test_result(now - start <= 2, "check signal distribution\n"); return 0; }
On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronix.de> wrote: > Subject: selftests/timers/posix_timers: Make signal distribution test less fragile > From: Thomas Gleixner <tglx@linutronix.de> > > The signal distribution test has a tendency to hang for a long time as the > signal delivery is not really evenly distributed. In fact it might never be > distributed across all threads ever in the way it is written. > > Address this by: > > 1) Adding a timeout which aborts the test > > 2) Letting the test threads exit once they got a signal instead of > running continuously. That ensures that the other threads will > have a chance to expire the timer and get the signal. > > 3) Adding a detection whether all signals arrived at the main thread, > which allows to run the test on older kernels and emit 'SKIP'. > > While at it get rid of the pointless atomic operation on the thread local > variable in the signal handler. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Thanks for this, Thomas! Just FYI: testing with 6.1, the test no longer hangs, but I don't see the SKIP behavior. It just fails: not ok 6 check signal distribution # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0 I've not had time yet to dig into what's going on, but let me know if you need any further details. thanks -john
On Wed, Apr 03 2024 at 11:16, John Stultz wrote: > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronix.de> wrote: > Thanks for this, Thomas! > > Just FYI: testing with 6.1, the test no longer hangs, but I don't see > the SKIP behavior. It just fails: > not ok 6 check signal distribution > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0 > > I've not had time yet to dig into what's going on, but let me know if > you need any further details. That's weird. I ran it on my laptop with 6.1.y ... What kind of machine is that?
On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, Apr 03 2024 at 11:16, John Stultz wrote: > > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Thanks for this, Thomas! > > > > Just FYI: testing with 6.1, the test no longer hangs, but I don't see > > the SKIP behavior. It just fails: > > not ok 6 check signal distribution > > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0 > > > > I've not had time yet to dig into what's going on, but let me know if > > you need any further details. > > That's weird. I ran it on my laptop with 6.1.y ... > > What kind of machine is that? I was running it in a VM. Interestingly with 64cpus it sometimes will do the skip behavior, but with 4 cpus it seems to always fail. thanks -john
On Wed, Apr 03 2024 at 12:35, John Stultz wrote: > On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> >> On Wed, Apr 03 2024 at 11:16, John Stultz wrote: >> > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> > Thanks for this, Thomas! >> > >> > Just FYI: testing with 6.1, the test no longer hangs, but I don't see >> > the SKIP behavior. It just fails: >> > not ok 6 check signal distribution >> > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0 >> > >> > I've not had time yet to dig into what's going on, but let me know if >> > you need any further details. >> >> That's weird. I ran it on my laptop with 6.1.y ... >> >> What kind of machine is that? > > I was running it in a VM. > > Interestingly with 64cpus it sometimes will do the skip behavior, but > with 4 cpus it seems to always fail. Duh, yes. The problem is that any thread might grab the signal as it is process-wide. What was I thinking? Not much, obviously. The distribution mechanism is only targeting the wakeup at signal queuing time and therefore avoids the wakeup of idle tasks. But it does not guarantee that the signal is evenly distributed to the threads on actual signal delivery. Even with the change to stop the worker threads when they got a signal it's not guaranteed that the last worker will actually get one within the timeout simply because the main thread can win the race to collect the signal every time. I just managed to make the patched test fail in one out of 100 runs. IOW, we cannot test this reliably at all with the current approach. I'll think about it tomorrow again with brain awake. Thanks, tglx
On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote: > > On 04/03, Thomas Gleixner wrote: > >> The test if fragile as hell as there is absolutely no guarantee that the > >> signal target distribution is as expected. The expectation is based on a > >> statistical assumption which does not really hold. > > > > Agreed. I too never liked this test-case. > > > > I forgot everything about this patch and test-case, I can't really read > > your patch right now (sorry), so I am sure I missed something, but > > > >> static void *distribution_thread(void *arg) > >> { > >> - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); > >> - return NULL; > >> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) { > >> + if (got_signal) > >> + usleep(10); > >> + } > >> + > >> + return (void *)got_signal; > >> } > > > > Why distribution_thread() can't simply exit if got_signal != 0 ? > > > > See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ > > Indeed. It's too obvious :) This test models the intended use-case that was the motivation for the change: We want to sample execution of a running multi-threaded program, it has multiple active threads (that don't exit), since all threads are running and consuming CPU, they all should get a signal eventually. If threads will exit once they get a signal, then the test will pass even if signal delivery is biased towards a single running thread all the time (the previous kernel impl).
Perhaps I am totally confused, but. On 04/04, Dmitry Vyukov wrote: > > On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > Why distribution_thread() can't simply exit if got_signal != 0 ? > > > > > > See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ > > > > Indeed. It's too obvious :) > > This test models the intended use-case that was the motivation for the change: > We want to sample execution of a running multi-threaded program, it > has multiple active threads (that don't exit), since all threads are > running and consuming CPU, Yes, > they all should get a signal eventually. Well, yes and no. No, in a sense that the motivation was not to ensure that all threads get a signal, the motivation was to ensure that cpu_timer_fire() paths will use the current task as the default target for signal_wake_up/etc. This is just an optimization. But yes, all should get a signal eventually. And this will happen with or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of signals to the current thread"). Any thread can dequeue a shared signal, say, on return from interrupt. Just without that commit this "eventually" means A_LOT_OF_TIME statistically. > If threads will exit once they get a signal, just in case, the main thread should not exit ... > then the test will pass > even if signal delivery is biased towards a single running thread all > the time (the previous kernel impl). See above. But yes, I agree, if a thread exits once it gets a signal, then A_LOT_OF_TIME will be significantly decreased. But again, this is just a statistical issue, I do not see how we can test the commit bcb7ee79029dca reliably. OTOH. If the threads do not exit after they get a signal, then _in theory_ nothing can guarantee that this test-case will ever complete even with that commit. It is possible that one of the threads will "never" have a chance to run cpu_timer_fire(). In short, I leave this to you and Thomas. I have no idea how to write a "good" test for that commit. Well... perhaps the main thread should just sleep in pause(), and distribution_handler() should check that gettid() != getpid() ? Something like this maybe... We need to ensure that the main thread enters pause before timer_settime(). Oleg.
On 04/04, Thomas Gleixner wrote: > > IOW, we cannot test this reliably at all with the current approach. Agreed! So how about a REALLY SIMPLE test-case below? Lacks error checking, should be updated to match tools/testing/selftests. Without commit bcb7ee79029dca assert(sig_cnt > SIG_CNT) fails, the very 1st tick wakes the leader up. With that commit it doesn't fail. Oleg. #include <stdio.h> #include <unistd.h> #include <signal.h> #include <pthread.h> #include <time.h> #include <assert.h> #define SIG_CNT 100 static volatile int sig_cnt; static void alarm_func(int sig) { ++sig_cnt; } static void *thread_func(void *arg) { // one second before the 1st tick to ensure the leader sleeps struct itimerspec val = { .it_value.tv_sec = 1, .it_value.tv_nsec = 0, .it_interval.tv_sec = 0, .it_interval.tv_nsec = 1000 * 1000, }; timer_t id; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); while (sig_cnt < SIG_CNT) ; // wake up the leader kill(getpid(), SIGALRM); return NULL; } int main(void) { pthread_t thread; signal(SIGALRM, alarm_func); pthread_create(&thread, NULL, thread_func, NULL); pause(); assert(sig_cnt > SIG_CNT); // likely SIG_CNT + 1 return 0; }
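A note for anyone trying Oleg's reproducer above: it presumably builds with something like `gcc -O2 test.c -pthread -lrt` (timer_create() lives in librt on older glibc), and per the description the assertion should only trip on kernels that lack bcb7ee79029dca.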
On Thu, Apr 04 2024 at 15:43, Oleg Nesterov wrote: > On 04/04, Dmitry Vyukov wrote: >> they all should get a signal eventually. > > Well, yes and no. > > No, in a sense that the motivation was not to ensure that all threads > get a signal, the motivation was to ensure that cpu_timer_fire() paths > will use the current task as the default target for signal_wake_up/etc. > This is just an optimization. > > But yes, all should get a signal eventually. And this will happen with > or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of > signals to the current thread"). Any thread can dequeue a shared signal, > say, on return from interrupt. > > Just without that commit this "eventually" means A_LOT_OF_TIME > statistically. bcb7ee79029dca only directs the wakeup to current, but the signal is still queued in the process-wide shared pending list. So the thread which sees sigpending() first will grab and deliver it to itself. > But yes, I agree, if a thread exits once it gets a signal, then A_LOT_OF_TIME > will be significantly decreased. But again, this is just a statistical issue, > I do not see how we can test the commit bcb7ee79029dca reliably. We can't. What we can actually test is the avoidance of waking up the main thread by doing the following in the main thread: start_threads(); barrier_wait(); nanosleep(2 seconds); done = 1; stop_threads(); and in the first thread which is started: first_thread() barrier_wait(); start_timer(); loop() On a pre-6.3 kernel nanosleep() will return early because the main thread is woken up and will eventually win the race to deliver the signal. On a 6.3 and later kernel nanosleep() will not return early because the main thread is not woken up as the wakeup is directed at current, i.e. a worker thread, which is running anyway and will consume the signal. > OTOH. If the threads do not exit after they get a signal, then _in theory_ > nothing can guarantee that this test-case will ever complete even with > that commit. It is possible that one of the threads will "never" have a > chance to run cpu_timer_fire(). Even with the exit I managed to make one out of 100 runs run into the timeout because the main thread always won the race. > In short, I leave this to you and Thomas. I have no idea how to write a > "good" test for that commit. > > Well... perhaps the main thread should just sleep in pause(), and > distribution_handler() should check that gettid() != getpid() ? > Something like this maybe... We need to ensure that the main thread > enters pause before timer_settime(). I'm testing a modification which implements something like the above and the success condition is that the main thread does not return early from nanosleep() and has no signal accounted. It survived 2000 iterations by now. Let me polish it up. Thanks, tglx
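For concreteness, here is a minimal sketch of the detection Thomas outlines above. The thread count, the 100ms first expiry, and all names are assumptions, error checking is omitted, and it presumably builds with -pthread -lrt. On a patched kernel the idle main thread should sleep through the full two seconds; on a pre-6.3 kernel the process-wide signal is liable to wake it early:

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4

static pthread_barrier_t barrier;
static volatile int done;

static void handler(int sig) { /* empty: delivery itself is what we measure */ }

static void *worker(void *arg)
{
	pthread_barrier_wait(&barrier);
	if (arg) {
		/* The first worker arms the process CPU-time timer. */
		struct sigevent sev = { .sigev_notify = SIGEV_SIGNAL,
					.sigev_signo  = SIGALRM };
		struct itimerspec its = {
			.it_value.tv_nsec    = 100 * 1000 * 1000, /* 100ms of CPU time */
			.it_interval.tv_nsec = 1000 * 1000,       /* then every 1ms */
		};
		timer_t id;

		timer_create(CLOCK_PROCESS_CPUTIME_ID, &sev, &id);
		timer_settime(id, 0, &its, NULL);
	}
	while (!done)
		; /* burn CPU so expiry happens while a worker is current */
	return NULL;
}

int main(void)
{
	struct timespec ts = { .tv_sec = 2 };
	pthread_t threads[NTHREADS];
	int early, i;

	signal(SIGALRM, handler);
	pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, worker, (void *)(long)(i == 0));
	pthread_barrier_wait(&barrier);

	/* nanosleep() is never restarted by SA_RESTART, so an early return
	 * means the timer signal was delivered to this idle thread. */
	early = nanosleep(&ts, NULL);

	done = 1;
	for (i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);

	printf("%s\n", early ? "woken early (old behaviour)" :
	       "slept through (signal stayed on the workers)");
	return early ? 1 : 0;
}
```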
On 04/04, Thomas Gleixner wrote: > > On Thu, Apr 04 2024 at 15:43, Oleg Nesterov wrote: > > > And this will happen with > > or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of > > signals to the current thread"). Any thread can dequeue a shared signal, > > say, on return from interrupt. > > > > Just without that commit this "eventually" means A_LOT_OF_TIME > > statistically. > > bcb7ee79029dca only directs the wakeup to current, but the signal is > still queued in the process-wide shared pending list. So the thread > which sees sigpending() first will grab and deliver it to itself. This is what I tried to say above. > What we can actually test is the avoidance of waking up the main thread > by doing the following in the main thread: Hmm... I think it can be even simpler, > I'm testing a modification which implements something like the above and > the success condition is that the main thread does not return early from > nanosleep() and has no signal accounted. It survived 2000 iterations by > now. Yes, but please see a trivial test-case I sent you a few minutes ago. Oleg.
On Thu, Apr 04 2024 at 16:54, Oleg Nesterov wrote: > On 04/04, Thomas Gleixner wrote: >> >> IOW, we cannot test this reliably at all with the current approach. > > Agreed! > > So how about a REALLY SIMPLE test-case below? > > Lacks error checking, should be updated to match tools/testing/selftests. > > Without commit bcb7ee79029dca assert(sig_cnt > SIG_CNT) fails, the very > 1st tick wakes the leader up. > > With that commit it doesn't fail. Clever!
On Thu, 4 Apr 2024 at 15:45, Oleg Nesterov <oleg@redhat.com> wrote: > > Perhaps I am totally confused, but. > > On 04/04, Dmitry Vyukov wrote: > > > > On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > > Why distribution_thread() can't simply exit if got_signal != 0 ? > > > > > > > > See https://lore.kernel.org/all/20230128195641.GA14906@redhat.com/ > > > > > > Indeed. It's too obvious :) > > > > This test models the intended use-case that was the motivation for the change: > > We want to sample execution of a running multi-threaded program, it > > has multiple active threads (that don't exit), since all threads are > > running and consuming CPU, > > Yes, > > > they all should get a signal eventually. > > Well, yes and no. > > No, in a sense that the motivation was not to ensure that all threads > get a signal, the motivation was to ensure that cpu_timer_fire() paths > will use the current task as the default target for signal_wake_up/etc. > This is just an optimization. > > But yes, all should get a signal eventually. And this will happen with > or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of > signals to the current thread"). Any thread can dequeue a shared signal, > say, on return from interrupt. > > Just without that commit this "eventually" means A_LOT_OF_TIME statistically. I agree that any thread can pick the signal, but this A_LOT_OF_TIME makes it impossible for the test to reliably and repeatedly pass w/o the change in any reasonable testing system. With the change the test was finishing/passing for me immediately all the time. Again, if the test causes practical problems (flaky), then I don't mind relaxing it (flaky tests suck). I was just against giving up on testing proactively just in case. > > If threads will exit once they get a signal, > > just in case, the main thread should not exit ... > > > then the test will pass > > even if signal delivery is biased towards a single running thread all > > the time (the previous kernel impl). > > See above. > > But yes, I agree, if a thread exits once it gets a signal, then A_LOT_OF_TIME > will be significantly decreased. But again, this is just a statistical issue, > I do not see how we can test the commit bcb7ee79029dca reliably. > > OTOH. If the threads do not exit after they get a signal, then _in theory_ > nothing can guarantee that this test-case will ever complete even with > that commit. It is possible that one of the threads will "never" have a > chance to run cpu_timer_fire(). > > In short, I leave this to you and Thomas. I have no idea how to write a > "good" test for that commit. > > Well... perhaps the main thread should just sleep in pause(), and > distribution_handler() should check that gettid() != getpid() ? > Something like this maybe... We need to ensure that the main thread > enters pause before timer_settime(). > > Oleg. >
diff --git a/kernel/signal.c b/kernel/signal.c index 8cb28f1df294..605445fa27d4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) /* * Now find a thread we can wake up to take the signal off the queue. * - * If the main thread wants the signal, it gets first crack. - * Probably the least surprising to the average bear. + * Try the suggested task first (may or may not be the main thread). */ if (wants_signal(sig, p)) t = p; @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) ret = -1; rcu_read_lock(); + /* + * This function is used by POSIX timers to deliver a timer signal. + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID + * set), the signal must be delivered to the specific thread (queues + * into t->pending). + * + * Where type is not PIDTYPE_PID, signals must just be delivered to the + * current process. In this case, prefer to deliver to current if it is + * in the same thread group as the target, as it avoids unnecessarily + * waking up a potentially idle task. + */ t = pid_task(pid, type); - if (!t || !likely(lock_task_sighand(t, &flags))) + if (!t) + goto ret; + if (type != PIDTYPE_PID && same_thread_group(t, current)) + t = current; + if (!likely(lock_task_sighand(t, &flags))) goto ret; ret = 1; /* the signal is ignored */ @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) q->info.si_overrun = 0; signalfd_notify(t, sig); + /* + * If the type is not PIDTYPE_PID, we just use shared_pending, which + * won't guarantee that the specified task will receive the signal, but + * is sufficient if t==current in the common case. + */ pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; list_add_tail(&q->list, &pending->list); sigaddset(&pending->signal, sig);