Message ID | 20210122102041.27031-1-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | coroutine-sigaltstack: Keep SIGUSR2 handler up | expand |
On 1/22/21 4:20 AM, Max Reitz wrote: > Modifying signal handlers is a process-global operation. When two > threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, > they may interfere with each other: One of them may revert the SIGUSR2 > handler back to the default between the other thread setting up > coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 > will then lead to the process exiting. > > Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can > thus keep the signal handler installed all the time. > CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its > stack is set up so a new coroutine is to be launched (i.e., it should > invoke sigsetjmp()), or not (i.e., the signal came from an external > source and we should just perform the default action, which is to exit > the process). Not just exit the process, but exit due to a signal. It matters... > > Note that in user-mode emulation, the guest can register signal handlers > for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 > handler, sigaltstack coroutines will break from then on. However, we do > not use coroutines for user-mode emulation, so that is fine. > > Suggested-by: Laszlo Ersek <lersek@redhat.com> > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- > 1 file changed, 29 insertions(+), 27 deletions(-) > > @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) > /* Get the thread specific information */ > coTS = coroutine_get_thread_state(); > self = coTS->tr_handler; > + > + if (!self) { > + /* > + * This SIGUSR2 came from an external source, not from > + * qemu_coroutine_new(), so perform the default action. > + */ > + exit(0); > + } ...here. Silently exiting with status 0 is wrong; the correct response is to use signal(SIGUSR2, SIG_DFL); raise(SIGUSR2) to terminate the process in the same manner as if we had not installed our hander at all. With that fixed, Reviewed-by: Eric Blake <eblake@redhat.com>
On 01/22/21 11:20, Max Reitz wrote: > Modifying signal handlers is a process-global operation. When two > threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, > they may interfere with each other: One of them may revert the SIGUSR2 > handler back to the default between the other thread setting up > coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 > will then lead to the process exiting. > > Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can > thus keep the signal handler installed all the time. > CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its > stack is set up so a new coroutine is to be launched (i.e., it should > invoke sigsetjmp()), or not (i.e., the signal came from an external > source and we should just perform the default action, which is to exit > the process). > > Note that in user-mode emulation, the guest can register signal handlers > for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 > handler, sigaltstack coroutines will break from then on. However, we do > not use coroutines for user-mode emulation, so that is fine. > > Suggested-by: Laszlo Ersek <lersek@redhat.com> > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- > 1 file changed, 29 insertions(+), 27 deletions(-) (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the comment on the qemu_init_main_loop() declaration, in "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no longer a "free for thread-internal usage" signal. > > diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c > index aade82afb8..2d32afc322 100644 > --- a/util/coroutine-sigaltstack.c > +++ b/util/coroutine-sigaltstack.c > @@ -59,6 +59,8 @@ typedef struct { > > static pthread_key_t thread_state_key; > > +static void coroutine_trampoline(int signal); > + > static CoroutineThreadState *coroutine_get_thread_state(void) > { > CoroutineThreadState *s = pthread_getspecific(thread_state_key); > @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) > > static void __attribute__((constructor)) coroutine_init(void) > { > + struct sigaction sa; > int ret; > > ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup); > @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void) > fprintf(stderr, "unable to create leader key: %s\n", strerror(errno)); > abort(); > } > + > + /* > + * Establish the SIGUSR2 signal handler. This is a process-wide > + * operation, and so will apply to all threads from here on. > + */ > + sa = (struct sigaction) { > + .sa_handler = coroutine_trampoline, > + .sa_flags = SA_ONSTACK, > + }; > + > + if (sigaction(SIGUSR2, &sa, NULL) != 0) { > + perror("Unable to install SIGUSR2 handler"); > + abort(); > + } > } > > /* "boot" function > @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) > /* Get the thread specific information */ > coTS = coroutine_get_thread_state(); > self = coTS->tr_handler; > + > + if (!self) { > + /* > + * This SIGUSR2 came from an external source, not from > + * qemu_coroutine_new(), so perform the default action. > + */ > + exit(0); > + } (2) exit() is generally unsafe to call in signal handlers. We could reason whether or not it is safe in this particular case (POSIX describes the exact conditions -- <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), but it's much simpler to just call _exit(). (3) "Performing the default action" would be slightly different from calling _exit(). When a process is terminated with a signal, the parent can distinguish that, when reaping the child. See waitpid() / WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). So for the "default action", we'd have to: - restore the SIGUSR2 handler to SIG_DFL, and - re-raise the signal for the thread, and - because the signal being handled is automatically blocked unless SA_NODEFER was set: unblock the signal for the thread. The pthread_sigmask() call, made for the last step, would be the one that would not return. *However*, all of this complexity is not really useful here. We don't really want to simulate being "forcefully terminated" by the unexpected (asynchronous) SIGUSR2. We just want to exit. Therefore, _exit() is fine. But, we should use an exit code different from 0, as this is definitely not a pristine exit from QEMU. I'm not sure if a convention exists for nonzero exit codes in QEMU; if not, then just pass EXIT_FAILURE to _exit(). (4) Furthermore, please update the comment, because "perform the default action" is not precise. > + > coTS->tr_called = 1; > + coTS->tr_handler = NULL; > co = &self->base; > > /* (5) I see that "tr_called" has type "volatile sig_atomic_t", which is great (I think that "sig_atomic_t" is not even necessary here, because of the careful signal masking that we do, and because the signal is raised synchronously). "volatile" is certainly justified though (the compiler may not be able to trace the call through the signal), and that's what I'm missing from "tr_handler" too. IMO, the "tr_handler" field should be decalered as follow: volatile void * volatile tr_handler; wherein the second "volatile" serves just the same purpose as the "volatile" in "tr_called", and the first "volatile" follows from *that* -- wherever we chase the *chain of pointers* rooted in "tr_handler" should not be optimized out by the compiler. But: my point (5) is orthogonal to this patch. In practice, it may not matter at all. So feel free to ignore now, I guess. > @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) > { > CoroutineSigAltStack *co; > CoroutineThreadState *coTS; > - struct sigaction sa; > - struct sigaction osa; > stack_t ss; > stack_t oss; > sigset_t sigs; > - sigset_t osigs; > sigjmp_buf old_env; > > /* The way to manipulate stack is with the sigaltstack function. We > @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) > co->stack = qemu_alloc_stack(&co->stack_size); > co->base.entry_arg = &old_env; /* stash away our jmp_buf */ > > - coTS = coroutine_get_thread_state(); > - coTS->tr_handler = co; > - > - /* > - * Preserve the SIGUSR2 signal state, block SIGUSR2, > - * and establish our signal handler. The signal will > - * later transfer control onto the signal stack. > - */ > - sigemptyset(&sigs); > - sigaddset(&sigs, SIGUSR2); > - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); > - sa.sa_handler = coroutine_trampoline; > - sigfillset(&sa.sa_mask); > - sa.sa_flags = SA_ONSTACK; > - if (sigaction(SIGUSR2, &sa, &osa) != 0) { > - abort(); > - } > - > /* > * Set the new stack. > */ > @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) > * signal can be delivered the first time sigsuspend() is > * called. > */ > + coTS = coroutine_get_thread_state(); > + coTS->tr_handler = co; > coTS->tr_called = 0; > pthread_kill(pthread_self(), SIGUSR2); > sigfillset(&sigs); > @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) > sigaltstack(&oss, NULL); > } > > - /* > - * Restore the old SIGUSR2 signal handler and mask > - */ > - sigaction(SIGUSR2, &osa, NULL); > - pthread_sigmask(SIG_SETMASK, &osigs, NULL); > - > /* > * Now enter the trampoline again, but this time not as a signal > * handler. Instead we jump into it directly. The functionally > (6) This change, for qemu_coroutine_new(), is too heavy-handed, in my opinion. I suggest (a) removing only the sigaction() calls and their directly needed aux variables, and (b) *not* moving around operations. In particular, you remove the blocking of SIGUSR2 for the thread -- the pthread_sigmask() call. That means the sigsuspend() later on becomes superfluous, as the signal will not be delivered inside sigsuspend(), but inside pthread_kill(). With the signal not blocked, *generation* and *delivery* will coalesce into a single event. The general logic should stay the same, only the signal action manipulation should be removed. IOW, for this function, I propose the following change only: > diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c > index aade82afb8c0..722fed7b2502 100644 > --- a/util/coroutine-sigaltstack.c > +++ b/util/coroutine-sigaltstack.c > @@ -149,107 +149,97 @@ static void coroutine_trampoline(int signal) > Coroutine *qemu_coroutine_new(void) > { > CoroutineSigAltStack *co; > CoroutineThreadState *coTS; > - struct sigaction sa; > - struct sigaction osa; > stack_t ss; > stack_t oss; > sigset_t sigs; > sigset_t osigs; > sigjmp_buf old_env; > > /* The way to manipulate stack is with the sigaltstack function. We > * prepare a stack, with it delivering a signal to ourselves and then > * put sigsetjmp/siglongjmp where needed. > * This has been done keeping coroutine-ucontext as a model and with the > * pth ideas (GNU Portable Threads). See coroutine-ucontext for the basics > * of the coroutines and see pth_mctx.c (from the pth project) for the > * sigaltstack way of manipulating stacks. > */ > > co = g_malloc0(sizeof(*co)); > co->stack_size = COROUTINE_STACK_SIZE; > co->stack = qemu_alloc_stack(&co->stack_size); > co->base.entry_arg = &old_env; /* stash away our jmp_buf */ > > coTS = coroutine_get_thread_state(); > coTS->tr_handler = co; > > /* > - * Preserve the SIGUSR2 signal state, block SIGUSR2, > - * and establish our signal handler. The signal will > - * later transfer control onto the signal stack. > + * Block SIGUSR2. The signal will later transfer control onto the signal > + * stack. > */ > sigemptyset(&sigs); > sigaddset(&sigs, SIGUSR2); > pthread_sigmask(SIG_BLOCK, &sigs, &osigs); > - sa.sa_handler = coroutine_trampoline; > - sigfillset(&sa.sa_mask); > - sa.sa_flags = SA_ONSTACK; > - if (sigaction(SIGUSR2, &sa, &osa) != 0) { > - abort(); > - } > > /* > * Set the new stack. > */ > ss.ss_sp = co->stack; > ss.ss_size = co->stack_size; > ss.ss_flags = 0; > if (sigaltstack(&ss, &oss) < 0) { > abort(); > } > > /* > * Now transfer control onto the signal stack and set it up. > * It will return immediately via "return" after the sigsetjmp() > * was performed. Be careful here with race conditions. The > * signal can be delivered the first time sigsuspend() is > * called. > */ > coTS->tr_called = 0; > pthread_kill(pthread_self(), SIGUSR2); > sigfillset(&sigs); > sigdelset(&sigs, SIGUSR2); > while (!coTS->tr_called) { > sigsuspend(&sigs); > } > > /* > * Inform the system that we are back off the signal stack by > * removing the alternative signal stack. Be careful here: It > * first has to be disabled, before it can be removed. > */ > sigaltstack(NULL, &ss); > ss.ss_flags = SS_DISABLE; > if (sigaltstack(&ss, NULL) < 0) { > abort(); > } > sigaltstack(NULL, &ss); > if (!(oss.ss_flags & SS_DISABLE)) { > sigaltstack(&oss, NULL); > } > > /* > - * Restore the old SIGUSR2 signal handler and mask > + * Restore the old mask > */ > - sigaction(SIGUSR2, &osa, NULL); > pthread_sigmask(SIG_SETMASK, &osigs, NULL); > > /* > * Now enter the trampoline again, but this time not as a signal > * handler. Instead we jump into it directly. The functionally > * redundant ping-pong pointer arithmetic is necessary to avoid > * type-conversion warnings related to the `volatile' qualifier and > * the fact that `jmp_buf' usually is an array type. > */ > if (!sigsetjmp(old_env, 0)) { > siglongjmp(coTS->tr_reenter, 1); > } > > /* > * Ok, we returned again, so now we're finished > */ > > return &co->base; > } (7) The sigaction() call has not been moved entirely correctly from qemu_coroutine_new() to coroutine_init(), in my opinion. Namely, the original call site fills the "sa_mask" member, meaning that, while the signal handler is executing, not only SIGUSR2 itself should be blocked, but *all* signals. This is missing from the new call site, in coroutine_init() -- the "sa_mask" member is left zeroed. Now, in practice, this may not matter a whole lot, because "sa_mask" is additive, and at the only place we allow the signal to be delivered, namely in sigsuspend(), we already have everything blocked, except for SIGUSR2. So when "sa_mask" is ORed with the set of blocked signals, upon delivery of SIGUSR2, there is no actual change to the signal mask. *But*, I feel that such a change would really deserve its own argument, i.e. a separate patch, or at least a separate paragraph in the commit message. And I suggest not doing those; instead please just faithfully transfer the "sa_mask" setting too, to coroutine_init(). (My impression is that the effective removal of the "sa_mask" population was an oversight in this patch, not a conscious decision.) Thanks Laszlo
On 01/22/21 11:20, Max Reitz wrote: > Modifying signal handlers is a process-global operation. When two > threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, > they may interfere with each other: One of them may revert the SIGUSR2 > handler back to the default between the other thread setting up > coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 > will then lead to the process exiting. > > Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can > thus keep the signal handler installed all the time. > CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its > stack is set up so a new coroutine is to be launched (i.e., it should > invoke sigsetjmp()), or not (i.e., the signal came from an external > source and we should just perform the default action, which is to exit > the process). > > Note that in user-mode emulation, the guest can register signal handlers > for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 > handler, sigaltstack coroutines will break from then on. However, we do > not use coroutines for user-mode emulation, so that is fine. > > Suggested-by: Laszlo Ersek <lersek@redhat.com> > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- > 1 file changed, 29 insertions(+), 27 deletions(-) > > diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c > index aade82afb8..2d32afc322 100644 > --- a/util/coroutine-sigaltstack.c > +++ b/util/coroutine-sigaltstack.c > @@ -59,6 +59,8 @@ typedef struct { > > static pthread_key_t thread_state_key; > > +static void coroutine_trampoline(int signal); > + > static CoroutineThreadState *coroutine_get_thread_state(void) > { > CoroutineThreadState *s = pthread_getspecific(thread_state_key); > @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) > > static void __attribute__((constructor)) coroutine_init(void) > { > + struct sigaction sa; > int ret; > > ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup); > @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void) > fprintf(stderr, "unable to create leader key: %s\n", strerror(errno)); > abort(); > } > + > + /* > + * Establish the SIGUSR2 signal handler. This is a process-wide > + * operation, and so will apply to all threads from here on. > + */ > + sa = (struct sigaction) { > + .sa_handler = coroutine_trampoline, > + .sa_flags = SA_ONSTACK, > + }; > + > + if (sigaction(SIGUSR2, &sa, NULL) != 0) { > + perror("Unable to install SIGUSR2 handler"); > + abort(); > + } > } > > /* "boot" function > @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) > /* Get the thread specific information */ > coTS = coroutine_get_thread_state(); > self = coTS->tr_handler; > + > + if (!self) { > + /* > + * This SIGUSR2 came from an external source, not from > + * qemu_coroutine_new(), so perform the default action. > + */ > + exit(0); > + } > + > coTS->tr_called = 1; > + coTS->tr_handler = NULL; > co = &self->base; > > /* (8) There's a further complication here, assuming we really want to recognize the case when the handler is executing unexpectedly: - pthread_getspecific() is not necessarily async-signal-safe, according to POSIX, so calling coroutine_get_thread_state() in the "unexpected" case (e.g. in response to an asynchronously generated SIGUSR2) is problematic in its own right, - if the SIGUSR2 is delivered to a thread that has never called coroutine_get_thread_state() before, then we'll reach g_malloc0() inside coroutine_get_thread_state(), in signal handler context, which is very bad. You'd have to block SIGUSR2 for the entire process (all threads) at all times, and only temporarily unblock it for a particular coroutine thread, with the sigsuspend(). The above check would suffice, that way. Such blocking is possible by calling pthread_sigmask() from the main thread, before any other thread is created (the signal mask is inherited across pthread_create()). I guess it could be done in coroutine_init() too. And *then* the pthread_sigmask() calls should indeed be removed from qemu_coroutine_new(). (Apologies if my feedback is difficult to understand, it's my fault. I could propose a patch, if (and only if) you want that.) Thanks Laszlo > @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) > { > CoroutineSigAltStack *co; > CoroutineThreadState *coTS; > - struct sigaction sa; > - struct sigaction osa; > stack_t ss; > stack_t oss; > sigset_t sigs; > - sigset_t osigs; > sigjmp_buf old_env; > > /* The way to manipulate stack is with the sigaltstack function. We > @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) > co->stack = qemu_alloc_stack(&co->stack_size); > co->base.entry_arg = &old_env; /* stash away our jmp_buf */ > > - coTS = coroutine_get_thread_state(); > - coTS->tr_handler = co; > - > - /* > - * Preserve the SIGUSR2 signal state, block SIGUSR2, > - * and establish our signal handler. The signal will > - * later transfer control onto the signal stack. > - */ > - sigemptyset(&sigs); > - sigaddset(&sigs, SIGUSR2); > - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); > - sa.sa_handler = coroutine_trampoline; > - sigfillset(&sa.sa_mask); > - sa.sa_flags = SA_ONSTACK; > - if (sigaction(SIGUSR2, &sa, &osa) != 0) { > - abort(); > - } > - > /* > * Set the new stack. > */ > @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) > * signal can be delivered the first time sigsuspend() is > * called. > */ > + coTS = coroutine_get_thread_state(); > + coTS->tr_handler = co; > coTS->tr_called = 0; > pthread_kill(pthread_self(), SIGUSR2); > sigfillset(&sigs); > @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) > sigaltstack(&oss, NULL); > } > > - /* > - * Restore the old SIGUSR2 signal handler and mask > - */ > - sigaction(SIGUSR2, &osa, NULL); > - pthread_sigmask(SIG_SETMASK, &osigs, NULL); > - > /* > * Now enter the trampoline again, but this time not as a signal > * handler. Instead we jump into it directly. The functionally >
On 22.01.21 17:38, Laszlo Ersek wrote: > On 01/22/21 11:20, Max Reitz wrote: >> Modifying signal handlers is a process-global operation. When two >> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >> they may interfere with each other: One of them may revert the SIGUSR2 >> handler back to the default between the other thread setting up >> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >> will then lead to the process exiting. >> >> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >> thus keep the signal handler installed all the time. >> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >> stack is set up so a new coroutine is to be launched (i.e., it should >> invoke sigsetjmp()), or not (i.e., the signal came from an external >> source and we should just perform the default action, which is to exit >> the process). >> >> Note that in user-mode emulation, the guest can register signal handlers >> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >> handler, sigaltstack coroutines will break from then on. However, we do >> not use coroutines for user-mode emulation, so that is fine. >> >> Suggested-by: Laszlo Ersek <lersek@redhat.com> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >> 1 file changed, 29 insertions(+), 27 deletions(-) > > (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the > comment on the qemu_init_main_loop() declaration, in > "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no > longer a "free for thread-internal usage" signal. It’s possible that I haven’t understood that comment, but I haven’t adjusted it because I didn’t interpret it to mean that the signals listed there were free to use. For example, SIGUSR1 (aliased to SIG_IPI) is generated by cpus_kick_thread() and caught by KVM and HVF, so it doesn’t seem like it would be free for thread-internal usage either. Instead, I understood it to mean that the signals listed there do not have to be blocked by non-main threads, because it is OK for non-main threads to catch them. I can’t think of a reason why SIGUSR2 should be blocked by any thread, so I decided not to change the comment. But perhaps I just didn’t understand the whole comment. That’s definitely possible, given that I don’t even see a place where non-main threads would block the signals not listed there. >> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >> index aade82afb8..2d32afc322 100644 >> --- a/util/coroutine-sigaltstack.c >> +++ b/util/coroutine-sigaltstack.c >> @@ -59,6 +59,8 @@ typedef struct { >> >> static pthread_key_t thread_state_key; >> >> +static void coroutine_trampoline(int signal); >> + >> static CoroutineThreadState *coroutine_get_thread_state(void) >> { >> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) >> >> static void __attribute__((constructor)) coroutine_init(void) >> { >> + struct sigaction sa; >> int ret; >> >> ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup); >> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void) >> fprintf(stderr, "unable to create leader key: %s\n", strerror(errno)); >> abort(); >> } >> + >> + /* >> + * Establish the SIGUSR2 signal handler. This is a process-wide >> + * operation, and so will apply to all threads from here on. >> + */ >> + sa = (struct sigaction) { >> + .sa_handler = coroutine_trampoline, >> + .sa_flags = SA_ONSTACK, >> + }; >> + >> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >> + perror("Unable to install SIGUSR2 handler"); >> + abort(); >> + } >> } >> >> /* "boot" function >> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >> /* Get the thread specific information */ >> coTS = coroutine_get_thread_state(); >> self = coTS->tr_handler; >> + >> + if (!self) { >> + /* >> + * This SIGUSR2 came from an external source, not from >> + * qemu_coroutine_new(), so perform the default action. >> + */ >> + exit(0); >> + } > > (2) exit() is generally unsafe to call in signal handlers. > > We could reason whether or not it is safe in this particular case (POSIX > describes the exact conditions -- > <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), > but it's much simpler to just call _exit(). > > > (3) "Performing the default action" would be slightly different from > calling _exit(). When a process is terminated with a signal, the parent > can distinguish that, when reaping the child. See waitpid() / > WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). > > So for the "default action", we'd have to: > - restore the SIGUSR2 handler to SIG_DFL, and > - re-raise the signal for the thread, and > - because the signal being handled is automatically blocked unless > SA_NODEFER was set: unblock the signal for the thread. > > The pthread_sigmask() call, made for the last step, would be the one > that would not return. > > *However*, all of this complexity is not really useful here. We don't > really want to simulate being "forcefully terminated" by the unexpected > (asynchronous) SIGUSR2. We just want to exit. > > Therefore, _exit() is fine. But, we should use an exit code different > from 0, as this is definitely not a pristine exit from QEMU. I'm not > sure if a convention exists for nonzero exit codes in QEMU; if not, then > just pass EXIT_FAILURE to _exit(). I’m fine with calling _exit(). I hope, Eric is, too (as long as the comment no longer claims this were the default behavior). Given that SIGUSR2 is not something that qemu is prepared to receive from the outside, EXIT_FAILURE seems right to me. (Even abort() could be justified, but I don’t think generating a core dump does any good here.) (As for qemu-specific exit code conventions, the only ones I know of are for certain qemu-img subcommands. I’m sure you grepped, too, but I can’t find anything for the system emulator.) > (4) Furthermore, please update the comment, because "perform the default > action" is not precise. Sure, that’s definitely easier than to re-raise SIGUSR2. >> + >> coTS->tr_called = 1; >> + coTS->tr_handler = NULL; >> co = &self->base; >> >> /* > > (5) I see that "tr_called" has type "volatile sig_atomic_t", which is > great (I think that "sig_atomic_t" is not even necessary here, because > of the careful signal masking that we do, and because the signal is > raised synchronously). > > "volatile" is certainly justified though (the compiler may not be able > to trace the call through the signal), and that's what I'm missing from > "tr_handler" too. IMO, the "tr_handler" field should be decalered as > follow: > > volatile void * volatile tr_handler; > > wherein the second "volatile" serves just the same purpose as the > "volatile" in "tr_called", and the first "volatile" follows from *that* > -- wherever we chase the *chain of pointers* rooted in "tr_handler" > should not be optimized out by the compiler. > > But: my point (5) is orthogonal to this patch. In practice, it may not > matter at all. So feel free to ignore now, I guess. I suppose signal handlers are indeed preempting functions (i.e., the compiler is not aware of them executing). I overlooked that, given that logically, we more or less explicitly invoke the SIGUSR2 handler, but of course, technically, we just trigger the signal and the handler is then invoked preemptively. I suspect the pthread_kill() function is sufficiently complex that the compiler can’t prove that it won’t access *coTS (e.g. because perhaps it contains a syscall?), but better be safe than sorry. But I don’t really like the “volatile void *volatile tr_handler”, particularly the “volatile void *” combinations. I find volatile voids weird. Why is tr_handler even a void pointer, and not a pointer to CoroutineSigAltStack, if all it can point to is such an object? “volatile CoroutineSigAltStack *” would make more sense to me. Given that perhaps the CoroutineSigAltStack object should be volatile, shouldn’t also the CoroutineThreadState object be volatile? If it were, we could drop the volatile from tr_called and wouldn’t need to make the pointer value tr_handler volatile. OTOH, if I look more through the code, making everything volatile seems a bit excessive to me. I understand correctly that sigsetjmp()/siglongjmp() act as barriers to the compiler, don’t they? The only value that I can see the in-coroutine code writing that the calling code reads (before the next sigsetjmp()) is tr_called. So shouldn’t it be sufficient to insert a barrier() before the pthread_kill(), and then it’d be sufficient to keep tr_called volatile, but the rest could stay as it is? >> @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) >> { >> CoroutineSigAltStack *co; >> CoroutineThreadState *coTS; >> - struct sigaction sa; >> - struct sigaction osa; >> stack_t ss; >> stack_t oss; >> sigset_t sigs; >> - sigset_t osigs; >> sigjmp_buf old_env; >> >> /* The way to manipulate stack is with the sigaltstack function. We >> @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) >> co->stack = qemu_alloc_stack(&co->stack_size); >> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >> >> - coTS = coroutine_get_thread_state(); >> - coTS->tr_handler = co; >> - >> - /* >> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >> - * and establish our signal handler. The signal will >> - * later transfer control onto the signal stack. >> - */ >> - sigemptyset(&sigs); >> - sigaddset(&sigs, SIGUSR2); >> - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >> - sa.sa_handler = coroutine_trampoline; >> - sigfillset(&sa.sa_mask); >> - sa.sa_flags = SA_ONSTACK; >> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >> - abort(); >> - } >> - >> /* >> * Set the new stack. >> */ >> @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) >> * signal can be delivered the first time sigsuspend() is >> * called. >> */ >> + coTS = coroutine_get_thread_state(); >> + coTS->tr_handler = co; >> coTS->tr_called = 0; >> pthread_kill(pthread_self(), SIGUSR2); >> sigfillset(&sigs); >> @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) >> sigaltstack(&oss, NULL); >> } >> >> - /* >> - * Restore the old SIGUSR2 signal handler and mask >> - */ >> - sigaction(SIGUSR2, &osa, NULL); >> - pthread_sigmask(SIG_SETMASK, &osigs, NULL); >> - >> /* >> * Now enter the trampoline again, but this time not as a signal >> * handler. Instead we jump into it directly. The functionally >> > > (6) This change, for qemu_coroutine_new(), is too heavy-handed, in my > opinion. I suggest (a) removing only the sigaction() calls and their > directly needed aux variables, and (b) *not* moving around operations. > > In particular, you remove the blocking of SIGUSR2 for the thread -- the > pthread_sigmask() call. That means the sigsuspend() later on becomes > superfluous, as the signal will not be delivered inside sigsuspend(), > but inside pthread_kill(). With the signal not blocked, *generation* and > *delivery* will coalesce into a single event. Are you saying that (a) is a problem because this is too heavy-handed of a change for a single patch, or because it would actually be a problem to deliver the signal inside pthread_kill()? (Either way will result in me dropping the change from this patch, so it’s just a question out of curiosity.) As for (b), just FYI, I deliberately moved around the operation, so that tr_handler is only set once the coroutine stack is set up. Otherwise it might be a problem if an external SIGUSR2 comes in before then. But if we keep SIGUSR2 blocked, there is no reason for that movement, hence the “just FYI”. > The general logic should stay the same, only the signal action > manipulation should be removed. IOW, for this function, I propose the > following change only: > >> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >> index aade82afb8c0..722fed7b2502 100644 >> --- a/util/coroutine-sigaltstack.c >> +++ b/util/coroutine-sigaltstack.c >> @@ -149,107 +149,97 @@ static void coroutine_trampoline(int signal) >> Coroutine *qemu_coroutine_new(void) >> { >> CoroutineSigAltStack *co; >> CoroutineThreadState *coTS; >> - struct sigaction sa; >> - struct sigaction osa; >> stack_t ss; >> stack_t oss; >> sigset_t sigs; >> sigset_t osigs; >> sigjmp_buf old_env; >> >> /* The way to manipulate stack is with the sigaltstack function. We >> * prepare a stack, with it delivering a signal to ourselves and then >> * put sigsetjmp/siglongjmp where needed. >> * This has been done keeping coroutine-ucontext as a model and with the >> * pth ideas (GNU Portable Threads). See coroutine-ucontext for the basics >> * of the coroutines and see pth_mctx.c (from the pth project) for the >> * sigaltstack way of manipulating stacks. >> */ >> >> co = g_malloc0(sizeof(*co)); >> co->stack_size = COROUTINE_STACK_SIZE; >> co->stack = qemu_alloc_stack(&co->stack_size); >> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >> >> coTS = coroutine_get_thread_state(); >> coTS->tr_handler = co; >> >> /* >> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >> - * and establish our signal handler. The signal will >> - * later transfer control onto the signal stack. >> + * Block SIGUSR2. The signal will later transfer control onto the signal >> + * stack. >> */ >> sigemptyset(&sigs); >> sigaddset(&sigs, SIGUSR2); >> pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >> - sa.sa_handler = coroutine_trampoline; >> - sigfillset(&sa.sa_mask); >> - sa.sa_flags = SA_ONSTACK; >> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >> - abort(); >> - } >> >> /* >> * Set the new stack. >> */ >> ss.ss_sp = co->stack; >> ss.ss_size = co->stack_size; >> ss.ss_flags = 0; >> if (sigaltstack(&ss, &oss) < 0) { >> abort(); >> } >> >> /* >> * Now transfer control onto the signal stack and set it up. >> * It will return immediately via "return" after the sigsetjmp() >> * was performed. Be careful here with race conditions. The >> * signal can be delivered the first time sigsuspend() is >> * called. >> */ >> coTS->tr_called = 0; >> pthread_kill(pthread_self(), SIGUSR2); >> sigfillset(&sigs); >> sigdelset(&sigs, SIGUSR2); >> while (!coTS->tr_called) { >> sigsuspend(&sigs); >> } >> >> /* >> * Inform the system that we are back off the signal stack by >> * removing the alternative signal stack. Be careful here: It >> * first has to be disabled, before it can be removed. >> */ >> sigaltstack(NULL, &ss); >> ss.ss_flags = SS_DISABLE; >> if (sigaltstack(&ss, NULL) < 0) { >> abort(); >> } >> sigaltstack(NULL, &ss); >> if (!(oss.ss_flags & SS_DISABLE)) { >> sigaltstack(&oss, NULL); >> } >> >> /* >> - * Restore the old SIGUSR2 signal handler and mask >> + * Restore the old mask >> */ >> - sigaction(SIGUSR2, &osa, NULL); >> pthread_sigmask(SIG_SETMASK, &osigs, NULL); >> >> /* >> * Now enter the trampoline again, but this time not as a signal >> * handler. Instead we jump into it directly. The functionally >> * redundant ping-pong pointer arithmetic is necessary to avoid >> * type-conversion warnings related to the `volatile' qualifier and >> * the fact that `jmp_buf' usually is an array type. >> */ >> if (!sigsetjmp(old_env, 0)) { >> siglongjmp(coTS->tr_reenter, 1); >> } >> >> /* >> * Ok, we returned again, so now we're finished >> */ >> >> return &co->base; >> } > > > (7) The sigaction() call has not been moved entirely correctly from > qemu_coroutine_new() to coroutine_init(), in my opinion. > > Namely, the original call site fills the "sa_mask" member, meaning that, > while the signal handler is executing, not only SIGUSR2 itself should be > blocked, but *all* signals. > > This is missing from the new call site, in coroutine_init() -- the > "sa_mask" member is left zeroed. > > Now, in practice, this may not matter a whole lot, because "sa_mask" is > additive, and at the only place we allow the signal to be delivered, > namely in sigsuspend(), we already have everything blocked, except for > SIGUSR2. So when "sa_mask" is ORed with the set of blocked signals, upon > delivery of SIGUSR2, there is no actual change to the signal mask. > > *But*, I feel that such a change would really deserve its own argument, > i.e. a separate patch, or at least a separate paragraph in the commit > message. And I suggest not doing those; instead please just faithfully > transfer the "sa_mask" setting too, to coroutine_init(). (My impression > is that the effective removal of the "sa_mask" population was an > oversight in this patch, not a conscious decision.) Yes, it was totally an oversight. Thanks a lot for your detailed review! I have absolutely no experience with coroutine-sigaltstack in particular, and very little experience with signal handling in projects where correctness actually matters. I’m grateful that you inspect this patch with great scrutiny. (Btw, that’s why I was very close to just ending a new version of the mutex patch just with the commit message adjusted, because that felt like I could do much less wrong than here. Turns out I was right. O:)) Max
On 22.01.21 18:09, Laszlo Ersek wrote: > On 01/22/21 11:20, Max Reitz wrote: >> Modifying signal handlers is a process-global operation. When two >> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >> they may interfere with each other: One of them may revert the SIGUSR2 >> handler back to the default between the other thread setting up >> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >> will then lead to the process exiting. >> >> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >> thus keep the signal handler installed all the time. >> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >> stack is set up so a new coroutine is to be launched (i.e., it should >> invoke sigsetjmp()), or not (i.e., the signal came from an external >> source and we should just perform the default action, which is to exit >> the process). >> >> Note that in user-mode emulation, the guest can register signal handlers >> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >> handler, sigaltstack coroutines will break from then on. However, we do >> not use coroutines for user-mode emulation, so that is fine. >> >> Suggested-by: Laszlo Ersek <lersek@redhat.com> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >> 1 file changed, 29 insertions(+), 27 deletions(-) >> >> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >> index aade82afb8..2d32afc322 100644 >> --- a/util/coroutine-sigaltstack.c >> +++ b/util/coroutine-sigaltstack.c >> @@ -59,6 +59,8 @@ typedef struct { >> >> static pthread_key_t thread_state_key; >> >> +static void coroutine_trampoline(int signal); >> + >> static CoroutineThreadState *coroutine_get_thread_state(void) >> { >> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) >> >> static void __attribute__((constructor)) coroutine_init(void) >> { >> + struct sigaction sa; >> int ret; >> >> ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup); >> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void) >> fprintf(stderr, "unable to create leader key: %s\n", strerror(errno)); >> abort(); >> } >> + >> + /* >> + * Establish the SIGUSR2 signal handler. This is a process-wide >> + * operation, and so will apply to all threads from here on. >> + */ >> + sa = (struct sigaction) { >> + .sa_handler = coroutine_trampoline, >> + .sa_flags = SA_ONSTACK, >> + }; >> + >> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >> + perror("Unable to install SIGUSR2 handler"); >> + abort(); >> + } >> } >> >> /* "boot" function >> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >> /* Get the thread specific information */ >> coTS = coroutine_get_thread_state(); >> self = coTS->tr_handler; >> + >> + if (!self) { >> + /* >> + * This SIGUSR2 came from an external source, not from >> + * qemu_coroutine_new(), so perform the default action. >> + */ >> + exit(0); >> + } >> + >> coTS->tr_called = 1; >> + coTS->tr_handler = NULL; >> co = &self->base; >> >> /* > > (8) There's a further complication here, assuming we really want to > recognize the case when the handler is executing unexpectedly: > > - pthread_getspecific() is not necessarily async-signal-safe, according > to POSIX, so calling coroutine_get_thread_state() in the "unexpected" > case (e.g. in response to an asynchronously generated SIGUSR2) is > problematic in its own right, That’s a shame. > - if the SIGUSR2 is delivered to a thread that has never called > coroutine_get_thread_state() before, then we'll reach g_malloc0() inside > coroutine_get_thread_state(), in signal handler context, which is very bad. Could be solved with a coroutine_try_get_thread_state() that will never malloc, but return NULL then. > You'd have to block SIGUSR2 for the entire process (all threads) at all > times, and only temporarily unblock it for a particular coroutine > thread, with the sigsuspend(). The above check would suffice, that way. Yes, that’s what I was originally afraid of. I feel like that may be the complexity drop that pushes this change too far out of my comfort zone. (And as evidenced by your review, it already was pretty much outside as it was.) > Such blocking is possible by calling pthread_sigmask() from the main > thread, before any other thread is created (the signal mask is inherited > across pthread_create()). I guess it could be done in coroutine_init() too. > > And *then* the pthread_sigmask() calls should indeed be removed from > qemu_coroutine_new(). OTOH, that does sound rather simple... > (Apologies if my feedback is difficult to understand, it's my fault. I > could propose a patch, if (and only if) you want that.) I can’t say I wouldn’t be happy with a patch for this code that doesn’t bear my S-o-b. ;) I feel conflicted. I can send a v2 that addresses this (probably consisting of multiple patches then, e.g. I’d split the SIGUSR2 blocking off the main patch), but to me, this bug is really more of a nuisance that just blocks me from sending a pull request for my block branch... So I’d rather not drag it out forever. OTOH, sending a quick and bad fix just because I can’t wait is just bad. I suppose I’ll have to decide over the weekend. Though if you’re itching to write a patch yourself, I’d definitely be grateful. Max
On 1/22/21 11:58 AM, Max Reitz wrote: >>> + if (!self) { >>> + /* >>> + * This SIGUSR2 came from an external source, not from >>> + * qemu_coroutine_new(), so perform the default action. >>> + */ >>> + exit(0); >>> + } >> >> (2) exit() is generally unsafe to call in signal handlers. >> >> We could reason whether or not it is safe in this particular case (POSIX >> describes the exact conditions -- >> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), >> >> but it's much simpler to just call _exit(). >> >> >> (3) "Performing the default action" would be slightly different from >> calling _exit(). When a process is terminated with a signal, the parent >> can distinguish that, when reaping the child. See waitpid() / >> WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). >> >> So for the "default action", we'd have to: >> - restore the SIGUSR2 handler to SIG_DFL, and >> - re-raise the signal for the thread, and >> - because the signal being handled is automatically blocked unless >> SA_NODEFER was set: unblock the signal for the thread. >> >> The pthread_sigmask() call, made for the last step, would be the one >> that would not return. >> >> *However*, all of this complexity is not really useful here. We don't >> really want to simulate being "forcefully terminated" by the unexpected >> (asynchronous) SIGUSR2. We just want to exit. >> >> Therefore, _exit() is fine. But, we should use an exit code different >> from 0, as this is definitely not a pristine exit from QEMU. I'm not >> sure if a convention exists for nonzero exit codes in QEMU; if not, then >> just pass EXIT_FAILURE to _exit(). > > I’m fine with calling _exit(). I hope, Eric is, too (as long as the > comment no longer claims this were the default behavior). Using _exit(nonzero) is fine by me as long as the comment is accurate. There are signals like SIGINT where you really DO want to terminate by signal rather than using _exit(SIGINT+128), because shells can tell the difference [1]; but SIGUSR2 is not one of the signals where shells special-case their behavior. [1] https://www.cons.org/cracauer/sigint.html > > Given that SIGUSR2 is not something that qemu is prepared to receive > from the outside, EXIT_FAILURE seems right to me. (Even abort() could > be justified, but I don’t think generating a core dump does any good here.) > > (As for qemu-specific exit code conventions, the only ones I know of are > for certain qemu-img subcommands. I’m sure you grepped, too, but I > can’t find anything for the system emulator.) > >> (4) Furthermore, please update the comment, because "perform the default >> action" is not precise. > > Sure, that’s definitely easier than to re-raise SIGUSR2. Works for me as well.
On 01/22/21 18:58, Max Reitz wrote: > On 22.01.21 17:38, Laszlo Ersek wrote: >> On 01/22/21 11:20, Max Reitz wrote: >>> Modifying signal handlers is a process-global operation. When two >>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >>> they may interfere with each other: One of them may revert the SIGUSR2 >>> handler back to the default between the other thread setting up >>> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >>> will then lead to the process exiting. >>> >>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >>> thus keep the signal handler installed all the time. >>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >>> stack is set up so a new coroutine is to be launched (i.e., it should >>> invoke sigsetjmp()), or not (i.e., the signal came from an external >>> source and we should just perform the default action, which is to exit >>> the process). >>> >>> Note that in user-mode emulation, the guest can register signal handlers >>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >>> handler, sigaltstack coroutines will break from then on. However, we do >>> not use coroutines for user-mode emulation, so that is fine. >>> >>> Suggested-by: Laszlo Ersek <lersek@redhat.com> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >>> 1 file changed, 29 insertions(+), 27 deletions(-) >> >> (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the >> comment on the qemu_init_main_loop() declaration, in >> "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no >> longer a "free for thread-internal usage" signal. > > It’s possible that I haven’t understood that comment, but I haven’t > adjusted it because I didn’t interpret it to mean that the signals > listed there were free to use. For example, SIGUSR1 (aliased to > SIG_IPI) is generated by cpus_kick_thread() and caught by KVM and HVF, > so it doesn’t seem like it would be free for thread-internal usage either. > > Instead, I understood it to mean that the signals listed there do not > have to be blocked by non-main threads, because it is OK for non-main > threads to catch them. I can’t think of a reason why SIGUSR2 should be > blocked by any thread, so I decided not to change the comment. > > But perhaps I just didn’t understand the whole comment. That’s > definitely possible, given that I don’t even see a place where non-main > threads would block the signals not listed there. OK, then I agree that I don't understand the comment on qemu_init_main_loop() either. :) If you consciously decided not to change the comment, then I won't request otherwise. > >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8..2d32afc322 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -59,6 +59,8 @@ typedef struct { >>> >>> static pthread_key_t thread_state_key; >>> >>> +static void coroutine_trampoline(int signal); >>> + >>> static CoroutineThreadState *coroutine_get_thread_state(void) >>> { >>> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void >>> *opaque) >>> >>> static void __attribute__((constructor)) coroutine_init(void) >>> { >>> + struct sigaction sa; >>> int ret; >>> >>> ret = pthread_key_create(&thread_state_key, >>> qemu_coroutine_thread_cleanup); >>> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) >>> coroutine_init(void) >>> fprintf(stderr, "unable to create leader key: %s\n", >>> strerror(errno)); >>> abort(); >>> } >>> + >>> + /* >>> + * Establish the SIGUSR2 signal handler. This is a process-wide >>> + * operation, and so will apply to all threads from here on. >>> + */ >>> + sa = (struct sigaction) { >>> + .sa_handler = coroutine_trampoline, >>> + .sa_flags = SA_ONSTACK, >>> + }; >>> + >>> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >>> + perror("Unable to install SIGUSR2 handler"); >>> + abort(); >>> + } >>> } >>> >>> /* "boot" function >>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >>> /* Get the thread specific information */ >>> coTS = coroutine_get_thread_state(); >>> self = coTS->tr_handler; >>> + >>> + if (!self) { >>> + /* >>> + * This SIGUSR2 came from an external source, not from >>> + * qemu_coroutine_new(), so perform the default action. >>> + */ >>> + exit(0); >>> + } >> >> (2) exit() is generally unsafe to call in signal handlers. >> >> We could reason whether or not it is safe in this particular case (POSIX >> describes the exact conditions -- >> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), >> >> but it's much simpler to just call _exit(). >> >> >> (3) "Performing the default action" would be slightly different from >> calling _exit(). When a process is terminated with a signal, the parent >> can distinguish that, when reaping the child. See waitpid() / >> WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). >> >> So for the "default action", we'd have to: >> - restore the SIGUSR2 handler to SIG_DFL, and >> - re-raise the signal for the thread, and >> - because the signal being handled is automatically blocked unless >> SA_NODEFER was set: unblock the signal for the thread. >> >> The pthread_sigmask() call, made for the last step, would be the one >> that would not return. >> >> *However*, all of this complexity is not really useful here. We don't >> really want to simulate being "forcefully terminated" by the unexpected >> (asynchronous) SIGUSR2. We just want to exit. >> >> Therefore, _exit() is fine. But, we should use an exit code different >> from 0, as this is definitely not a pristine exit from QEMU. I'm not >> sure if a convention exists for nonzero exit codes in QEMU; if not, then >> just pass EXIT_FAILURE to _exit(). > > I’m fine with calling _exit(). I hope, Eric is, too (as long as the > comment no longer claims this were the default behavior). > > Given that SIGUSR2 is not something that qemu is prepared to receive > from the outside, EXIT_FAILURE seems right to me. (Even abort() could > be justified, but I don’t think generating a core dump does any good here.) > > (As for qemu-specific exit code conventions, the only ones I know of are > for certain qemu-img subcommands. I’m sure you grepped, too, but I > can’t find anything for the system emulator.) > >> (4) Furthermore, please update the comment, because "perform the default >> action" is not precise. > > Sure, that’s definitely easier than to re-raise SIGUSR2. > >>> + >>> coTS->tr_called = 1; >>> + coTS->tr_handler = NULL; >>> co = &self->base; >>> >>> /* >> >> (5) I see that "tr_called" has type "volatile sig_atomic_t", which is >> great (I think that "sig_atomic_t" is not even necessary here, because >> of the careful signal masking that we do, and because the signal is >> raised synchronously). >> >> "volatile" is certainly justified though (the compiler may not be able >> to trace the call through the signal), and that's what I'm missing from >> "tr_handler" too. IMO, the "tr_handler" field should be decalered as >> follow: >> >> volatile void * volatile tr_handler; >> >> wherein the second "volatile" serves just the same purpose as the >> "volatile" in "tr_called", and the first "volatile" follows from *that* >> -- wherever we chase the *chain of pointers* rooted in "tr_handler" >> should not be optimized out by the compiler. >> >> But: my point (5) is orthogonal to this patch. In practice, it may not >> matter at all. So feel free to ignore now, I guess. > > I suppose signal handlers are indeed preempting functions (i.e., the > compiler is not aware of them executing). I overlooked that, given that > logically, we more or less explicitly invoke the SIGUSR2 handler, but of > course, technically, we just trigger the signal and the handler is then > invoked preemptively. > > I suspect the pthread_kill() function is sufficiently complex that the > compiler can’t prove that it won’t access *coTS (e.g. because perhaps it > contains a syscall?), but better be safe than sorry. > > But I don’t really like the “volatile void *volatile tr_handler”, > particularly the “volatile void *” combinations. I find volatile voids > weird. > > Why is tr_handler even a void pointer, and not a pointer to > CoroutineSigAltStack, if all it can point to is such an object? > “volatile CoroutineSigAltStack *” would make more sense to me. > > Given that perhaps the CoroutineSigAltStack object should be volatile, > shouldn’t also the CoroutineThreadState object be volatile? If it were, > we could drop the volatile from tr_called and wouldn’t need to make the > pointer value tr_handler volatile. Good point... > > > OTOH, if I look more through the code, making everything volatile seems > a bit excessive to me. I understand correctly that > sigsetjmp()/siglongjmp() act as barriers to the compiler, don’t they? Hmmm... https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html "I guess so". > > The only value that I can see the in-coroutine code writing that the > calling code reads (before the next sigsetjmp()) is tr_called. So > shouldn’t it be sufficient to insert a barrier() before the > pthread_kill(), and then it’d be sufficient to keep tr_called volatile, > but the rest could stay as it is? Well, I guess one could argue that pthread_kill() itself should work as a barrier. It's hard to find a reason why it should not be a barrier. > >>> @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) >>> { >>> CoroutineSigAltStack *co; >>> CoroutineThreadState *coTS; >>> - struct sigaction sa; >>> - struct sigaction osa; >>> stack_t ss; >>> stack_t oss; >>> sigset_t sigs; >>> - sigset_t osigs; >>> sigjmp_buf old_env; >>> >>> /* The way to manipulate stack is with the sigaltstack >>> function. We >>> @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) >>> co->stack = qemu_alloc_stack(&co->stack_size); >>> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >>> >>> - coTS = coroutine_get_thread_state(); >>> - coTS->tr_handler = co; >>> - >>> - /* >>> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >>> - * and establish our signal handler. The signal will >>> - * later transfer control onto the signal stack. >>> - */ >>> - sigemptyset(&sigs); >>> - sigaddset(&sigs, SIGUSR2); >>> - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >>> - sa.sa_handler = coroutine_trampoline; >>> - sigfillset(&sa.sa_mask); >>> - sa.sa_flags = SA_ONSTACK; >>> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >>> - abort(); >>> - } >>> - >>> /* >>> * Set the new stack. >>> */ >>> @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) >>> * signal can be delivered the first time sigsuspend() is >>> * called. >>> */ >>> + coTS = coroutine_get_thread_state(); >>> + coTS->tr_handler = co; >>> coTS->tr_called = 0; >>> pthread_kill(pthread_self(), SIGUSR2); >>> sigfillset(&sigs); >>> @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) >>> sigaltstack(&oss, NULL); >>> } >>> >>> - /* >>> - * Restore the old SIGUSR2 signal handler and mask >>> - */ >>> - sigaction(SIGUSR2, &osa, NULL); >>> - pthread_sigmask(SIG_SETMASK, &osigs, NULL); >>> - >>> /* >>> * Now enter the trampoline again, but this time not as a signal >>> * handler. Instead we jump into it directly. The functionally >>> >> >> (6) This change, for qemu_coroutine_new(), is too heavy-handed, in my >> opinion. I suggest (a) removing only the sigaction() calls and their >> directly needed aux variables, and (b) *not* moving around operations. >> >> In particular, you remove the blocking of SIGUSR2 for the thread -- the >> pthread_sigmask() call. That means the sigsuspend() later on becomes >> superfluous, as the signal will not be delivered inside sigsuspend(), >> but inside pthread_kill(). With the signal not blocked, *generation* and >> *delivery* will coalesce into a single event. > > Are you saying that (a) is a problem because this is too heavy-handed of > a change for a single patch, or because it would actually be a problem > to deliver the signal inside pthread_kill()? The former: all the things together that the patch does to qemu_coroutine_new() are too heavy-weight for a single patch. They also render sigsuspend() useless, while keeping sigsuspend() in place. > > (Either way will result in me dropping the change from this patch, so > it’s just a question out of curiosity.) > > As for (b), just FYI, I deliberately moved around the operation, so that > tr_handler is only set once the coroutine stack is set up. Otherwise it > might be a problem if an external SIGUSR2 comes in before then. > > But if we keep SIGUSR2 blocked, there is no reason for that movement, > hence the “just FYI”. > >> The general logic should stay the same, only the signal action >> manipulation should be removed. IOW, for this function, I propose the >> following change only: >> >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8c0..722fed7b2502 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -149,107 +149,97 @@ static void coroutine_trampoline(int signal) >>> Coroutine *qemu_coroutine_new(void) >>> { >>> CoroutineSigAltStack *co; >>> CoroutineThreadState *coTS; >>> - struct sigaction sa; >>> - struct sigaction osa; >>> stack_t ss; >>> stack_t oss; >>> sigset_t sigs; >>> sigset_t osigs; >>> sigjmp_buf old_env; >>> >>> /* The way to manipulate stack is with the sigaltstack >>> function. We >>> * prepare a stack, with it delivering a signal to ourselves >>> and then >>> * put sigsetjmp/siglongjmp where needed. >>> * This has been done keeping coroutine-ucontext as a model and >>> with the >>> * pth ideas (GNU Portable Threads). See coroutine-ucontext for >>> the basics >>> * of the coroutines and see pth_mctx.c (from the pth project) >>> for the >>> * sigaltstack way of manipulating stacks. >>> */ >>> >>> co = g_malloc0(sizeof(*co)); >>> co->stack_size = COROUTINE_STACK_SIZE; >>> co->stack = qemu_alloc_stack(&co->stack_size); >>> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >>> >>> coTS = coroutine_get_thread_state(); >>> coTS->tr_handler = co; >>> >>> /* >>> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >>> - * and establish our signal handler. The signal will >>> - * later transfer control onto the signal stack. >>> + * Block SIGUSR2. The signal will later transfer control onto >>> the signal >>> + * stack. >>> */ >>> sigemptyset(&sigs); >>> sigaddset(&sigs, SIGUSR2); >>> pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >>> - sa.sa_handler = coroutine_trampoline; >>> - sigfillset(&sa.sa_mask); >>> - sa.sa_flags = SA_ONSTACK; >>> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >>> - abort(); >>> - } >>> >>> /* >>> * Set the new stack. >>> */ >>> ss.ss_sp = co->stack; >>> ss.ss_size = co->stack_size; >>> ss.ss_flags = 0; >>> if (sigaltstack(&ss, &oss) < 0) { >>> abort(); >>> } >>> >>> /* >>> * Now transfer control onto the signal stack and set it up. >>> * It will return immediately via "return" after the sigsetjmp() >>> * was performed. Be careful here with race conditions. The >>> * signal can be delivered the first time sigsuspend() is >>> * called. >>> */ >>> coTS->tr_called = 0; >>> pthread_kill(pthread_self(), SIGUSR2); >>> sigfillset(&sigs); >>> sigdelset(&sigs, SIGUSR2); >>> while (!coTS->tr_called) { >>> sigsuspend(&sigs); >>> } >>> >>> /* >>> * Inform the system that we are back off the signal stack by >>> * removing the alternative signal stack. Be careful here: It >>> * first has to be disabled, before it can be removed. >>> */ >>> sigaltstack(NULL, &ss); >>> ss.ss_flags = SS_DISABLE; >>> if (sigaltstack(&ss, NULL) < 0) { >>> abort(); >>> } >>> sigaltstack(NULL, &ss); >>> if (!(oss.ss_flags & SS_DISABLE)) { >>> sigaltstack(&oss, NULL); >>> } >>> >>> /* >>> - * Restore the old SIGUSR2 signal handler and mask >>> + * Restore the old mask >>> */ >>> - sigaction(SIGUSR2, &osa, NULL); >>> pthread_sigmask(SIG_SETMASK, &osigs, NULL); >>> >>> /* >>> * Now enter the trampoline again, but this time not as a signal >>> * handler. Instead we jump into it directly. The functionally >>> * redundant ping-pong pointer arithmetic is necessary to avoid >>> * type-conversion warnings related to the `volatile' qualifier >>> and >>> * the fact that `jmp_buf' usually is an array type. >>> */ >>> if (!sigsetjmp(old_env, 0)) { >>> siglongjmp(coTS->tr_reenter, 1); >>> } >>> >>> /* >>> * Ok, we returned again, so now we're finished >>> */ >>> >>> return &co->base; >>> } >> >> >> (7) The sigaction() call has not been moved entirely correctly from >> qemu_coroutine_new() to coroutine_init(), in my opinion. >> >> Namely, the original call site fills the "sa_mask" member, meaning that, >> while the signal handler is executing, not only SIGUSR2 itself should be >> blocked, but *all* signals. >> >> This is missing from the new call site, in coroutine_init() -- the >> "sa_mask" member is left zeroed. >> >> Now, in practice, this may not matter a whole lot, because "sa_mask" is >> additive, and at the only place we allow the signal to be delivered, >> namely in sigsuspend(), we already have everything blocked, except for >> SIGUSR2. So when "sa_mask" is ORed with the set of blocked signals, upon >> delivery of SIGUSR2, there is no actual change to the signal mask. >> >> *But*, I feel that such a change would really deserve its own argument, >> i.e. a separate patch, or at least a separate paragraph in the commit >> message. And I suggest not doing those; instead please just faithfully >> transfer the "sa_mask" setting too, to coroutine_init(). (My impression >> is that the effective removal of the "sa_mask" population was an >> oversight in this patch, not a conscious decision.) > > Yes, it was totally an oversight. > > Thanks a lot for your detailed review! I have absolutely no experience > with coroutine-sigaltstack in particular, and very little experience > with signal handling in projects where correctness actually matters. I’m > grateful that you inspect this patch with great scrutiny. > > (Btw, that’s why I was very close to just ending a new version of the > mutex patch just with the commit message adjusted, because that felt > like I could do much less wrong than here. Turns out I was right. O:)) TBH I started liking the mutex version too. :) It's possible we're working just too hard for solving this issue. Laszlo
On 01/22/21 19:19, Eric Blake wrote: > On 1/22/21 11:58 AM, Max Reitz wrote: > >>>> + if (!self) { >>>> + /* >>>> + * This SIGUSR2 came from an external source, not from >>>> + * qemu_coroutine_new(), so perform the default action. >>>> + */ >>>> + exit(0); >>>> + } >>> >>> (2) exit() is generally unsafe to call in signal handlers. >>> >>> We could reason whether or not it is safe in this particular case (POSIX >>> describes the exact conditions -- >>> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), >>> >>> but it's much simpler to just call _exit(). >>> >>> >>> (3) "Performing the default action" would be slightly different from >>> calling _exit(). When a process is terminated with a signal, the parent >>> can distinguish that, when reaping the child. See waitpid() / >>> WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). >>> >>> So for the "default action", we'd have to: >>> - restore the SIGUSR2 handler to SIG_DFL, and >>> - re-raise the signal for the thread, and >>> - because the signal being handled is automatically blocked unless >>> SA_NODEFER was set: unblock the signal for the thread. >>> >>> The pthread_sigmask() call, made for the last step, would be the one >>> that would not return. >>> >>> *However*, all of this complexity is not really useful here. We don't >>> really want to simulate being "forcefully terminated" by the unexpected >>> (asynchronous) SIGUSR2. We just want to exit. >>> >>> Therefore, _exit() is fine. But, we should use an exit code different >>> from 0, as this is definitely not a pristine exit from QEMU. I'm not >>> sure if a convention exists for nonzero exit codes in QEMU; if not, then >>> just pass EXIT_FAILURE to _exit(). >> >> I’m fine with calling _exit(). I hope, Eric is, too (as long as the >> comment no longer claims this were the default behavior). > > Using _exit(nonzero) is fine by me as long as the comment is accurate. > There are signals like SIGINT where you really DO want to terminate by > signal rather than using _exit(SIGINT+128), because shells can tell the > difference [1]; but SIGUSR2 is not one of the signals where shells > special-case their behavior. > > [1] https://www.cons.org/cracauer/sigint.html Good point! > >> >> Given that SIGUSR2 is not something that qemu is prepared to receive >> from the outside, EXIT_FAILURE seems right to me. (Even abort() could >> be justified, but I don’t think generating a core dump does any good here.) >> >> (As for qemu-specific exit code conventions, the only ones I know of are >> for certain qemu-img subcommands. I’m sure you grepped, too, but I >> can’t find anything for the system emulator.) >> >>> (4) Furthermore, please update the comment, because "perform the default >>> action" is not precise. >> >> Sure, that’s definitely easier than to re-raise SIGUSR2. > > Works for me as well. > Thanks. Laszlo
On 01/22/21 19:05, Max Reitz wrote: > On 22.01.21 18:09, Laszlo Ersek wrote: >> On 01/22/21 11:20, Max Reitz wrote: >>> Modifying signal handlers is a process-global operation. When two >>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >>> they may interfere with each other: One of them may revert the SIGUSR2 >>> handler back to the default between the other thread setting up >>> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >>> will then lead to the process exiting. >>> >>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >>> thus keep the signal handler installed all the time. >>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >>> stack is set up so a new coroutine is to be launched (i.e., it should >>> invoke sigsetjmp()), or not (i.e., the signal came from an external >>> source and we should just perform the default action, which is to exit >>> the process). >>> >>> Note that in user-mode emulation, the guest can register signal handlers >>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >>> handler, sigaltstack coroutines will break from then on. However, we do >>> not use coroutines for user-mode emulation, so that is fine. >>> >>> Suggested-by: Laszlo Ersek <lersek@redhat.com> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >>> 1 file changed, 29 insertions(+), 27 deletions(-) >>> >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8..2d32afc322 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -59,6 +59,8 @@ typedef struct { >>> static pthread_key_t thread_state_key; >>> +static void coroutine_trampoline(int signal); >>> + >>> static CoroutineThreadState *coroutine_get_thread_state(void) >>> { >>> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void >>> *opaque) >>> static void __attribute__((constructor)) coroutine_init(void) >>> { >>> + struct sigaction sa; >>> int ret; >>> ret = pthread_key_create(&thread_state_key, >>> qemu_coroutine_thread_cleanup); >>> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) >>> coroutine_init(void) >>> fprintf(stderr, "unable to create leader key: %s\n", >>> strerror(errno)); >>> abort(); >>> } >>> + >>> + /* >>> + * Establish the SIGUSR2 signal handler. This is a process-wide >>> + * operation, and so will apply to all threads from here on. >>> + */ >>> + sa = (struct sigaction) { >>> + .sa_handler = coroutine_trampoline, >>> + .sa_flags = SA_ONSTACK, >>> + }; >>> + >>> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >>> + perror("Unable to install SIGUSR2 handler"); >>> + abort(); >>> + } >>> } >>> /* "boot" function >>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >>> /* Get the thread specific information */ >>> coTS = coroutine_get_thread_state(); >>> self = coTS->tr_handler; >>> + >>> + if (!self) { >>> + /* >>> + * This SIGUSR2 came from an external source, not from >>> + * qemu_coroutine_new(), so perform the default action. >>> + */ >>> + exit(0); >>> + } >>> + >>> coTS->tr_called = 1; >>> + coTS->tr_handler = NULL; >>> co = &self->base; >>> /* >> >> (8) There's a further complication here, assuming we really want to >> recognize the case when the handler is executing unexpectedly: >> >> - pthread_getspecific() is not necessarily async-signal-safe, according >> to POSIX, so calling coroutine_get_thread_state() in the "unexpected" >> case (e.g. in response to an asynchronously generated SIGUSR2) is >> problematic in its own right, > > That’s a shame. > >> - if the SIGUSR2 is delivered to a thread that has never called >> coroutine_get_thread_state() before, then we'll reach g_malloc0() inside >> coroutine_get_thread_state(), in signal handler context, which is very >> bad. > > Could be solved with a coroutine_try_get_thread_state() that will never > malloc, but return NULL then. > >> You'd have to block SIGUSR2 for the entire process (all threads) at all >> times, and only temporarily unblock it for a particular coroutine >> thread, with the sigsuspend(). The above check would suffice, that way. > > Yes, that’s what I was originally afraid of. I feel like that may be > the complexity drop that pushes this change too far out of my comfort > zone. (And as evidenced by your review, it already was pretty much > outside as it was.) > >> Such blocking is possible by calling pthread_sigmask() from the main >> thread, before any other thread is created (the signal mask is inherited >> across pthread_create()). I guess it could be done in coroutine_init() >> too. >> >> And *then* the pthread_sigmask() calls should indeed be removed from >> qemu_coroutine_new(). > > OTOH, that does sound rather simple... > >> (Apologies if my feedback is difficult to understand, it's my fault. I >> could propose a patch, if (and only if) you want that.) > > I can’t say I wouldn’t be happy with a patch for this code that doesn’t > bear my S-o-b. ;) > > I feel conflicted. I can send a v2 that addresses this (probably > consisting of multiple patches then, e.g. I’d split the SIGUSR2 blocking > off the main patch), but to me, this bug is really more of a nuisance > that just blocks me from sending a pull request for my block branch... > So I’d rather not drag it out forever. OTOH, sending a quick and bad > fix just because I can’t wait is just bad. > > I suppose I’ll have to decide over the weekend. Though if you’re > itching to write a patch yourself, I’d definitely be grateful. OK, I'll try my hand at it; I hope I won't be eating my words. Laszlo
On 01/22/21 18:58, Max Reitz wrote: > On 22.01.21 17:38, Laszlo Ersek wrote: >> (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the >> comment on the qemu_init_main_loop() declaration, in >> "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no >> longer a "free for thread-internal usage" signal. > > It’s possible that I haven’t understood that comment, but I haven’t > adjusted it because I didn’t interpret it to mean that the signals > listed there were free to use. For example, SIGUSR1 (aliased to > SIG_IPI) is generated by cpus_kick_thread() and caught by KVM and HVF, > so it doesn’t seem like it would be free for thread-internal usage either. > > Instead, I understood it to mean that the signals listed there do not > have to be blocked by non-main threads, because it is OK for non-main > threads to catch them. I can’t think of a reason why SIGUSR2 should be > blocked by any thread, so I decided not to change the comment. > > But perhaps I just didn’t understand the whole comment. That’s > definitely possible, given that I don’t even see a place where non-main > threads would block the signals not listed there. Upon re-reading the comment, and also looking at the qemu_init_main_loop() and qemu_signal_init() function definitions, I now think that the consider these signals to be safe language simply means that the main thread does not intend to handle the listed signals. Your SIG_IPI example is great, because, while SIG_IPI does not satisfy my original characterization ("free for thread-internal usage"), it does satisfy "not handled by the main thread". If we accept that interpretation, then we shouldn't in fact modify the comment on qemu_init_main_loop(). Because, our dedicating SIGUSR2 to coroutine-sigaltstack remains consistent with the main thread not intending to handle SIGUSR2. Put differently, what we're doing with SIGUSR2 now -- which is on the list in the comment -- has been done *earlier* with SIG_IPI -- SIGUSR1, also on the list --, namely, using the signal for a particular inter-thread, or intra-thread, purpose, while making sure the main thread does not intend to handle the signal. Thanks Laszlo
On 01/22/21 19:29, Laszlo Ersek wrote:
> OK, I'll try my hand at it; I hope I won't be eating my words.
The more I look at it, the less comfortable I am with the existent code.
For example, I don't understand why the original commit -- 3194c8ceeba0,
"coroutine: adding sigaltstack method (.c source)", 2012-03-12 --
wrapped the sigsuspend() in a loop, dependent on the "tr_called" flag.
That looks like a platform bug workaround -- it suggests that
sigsuspend() could wake spuriously, i.e., not in response to the pending
SIGUSR2.
That seems bogus, per POSIX, given that all signals except SIGUSR2 are
included in the mask passed to sigsuspend().
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsuspend.html
Also, the comment says, "the signal can be delivered the first time
sigsuspend() is called", which is misleading -- the signal *IS*
delivered the first time sigsuspend() is called, given that we call
pthread_kill(pthread_self()) just before, with SIGUSR2 masked. So by the
time we reach sigsuspend(), the signal is pending.
(The synchronous nature of pthread_kill(pthread_self()) is confirmed by:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/raise.html
which explains (a) the equivalence of raise() with
pthread_kill(pthread_self()), and (b) the fact that raise() does not
return until after the signal handler does, if a signal handler is
called. Given that delivery is dependent on generation, and delivery is
synchronous per description, generation *cannot* be asynchronous.)
All of this makes me super uncomfortable with the code. Either the
platform(s) where it was written & tested did not conform to POSIX, or
the original author missed something, or *I* am missing something. In
each case, I should not be modifying this code; I'm flying blind.
I'm drifting towards an overhaul of coroutine-sigaltstack, based on my
personal understanding of POSIX, but given that I can absolutely not
*test* coroutine-sigaltstack on the platforms where it actually matters,
an "overhaul" by me would be reckless.
I didn't expect these skeletons when I first read Max's "Thread safety
of coroutine-sigaltstack" email :/
Max, after having worked on top of your patch for a few hours, I
officially endorse your mutex approach. I can't encourage you or myself
to touch this code, in good conscience. It's not that it's "bad"; it's
inexplicable and (to me) untestable.
Laszlo
On 01/22/21 22:26, Laszlo Ersek wrote: > I'm drifting towards an overhaul of coroutine-sigaltstack, based on my > personal understanding of POSIX, but given that I can absolutely not > *test* coroutine-sigaltstack on the platforms where it actually matters, > an "overhaul" by me would be reckless. > > I didn't expect these skeletons when I first read Max's "Thread safety > of coroutine-sigaltstack" email :/ > > Max, after having worked on top of your patch for a few hours, I > officially endorse your mutex approach. I can't encourage you or myself > to touch this code, in good conscience. It's not that it's "bad"; it's > inexplicable and (to me) untestable. I'm attaching a patch (based on 0e3246263068). I'm not convinced that I should take responsibility for this, given the lack of testability on my end. So I'm not posting it stand-alone even as an RFC. I've built it and have booted one of my existent domains with it, but that's all. Thanks Laszlo
On 22/01/21 22:26, Laszlo Ersek wrote: > That seems bogus, per POSIX, given that all signals except SIGUSR2 are > included in the mask passed to sigsuspend(). What happens if you get a SIGSTOP at exactly the wrong time? (Yeah I know how incredibly unlikely that would be). BTW if we are in a mood for cleanup, there's no reason to use pthread_key_t instead of __thread + qemu_thread_atexit_add (adding a Notifier to struct CoroutineThreadState). That would fix the issue with async-signal safety of pthread_getspecific. (It makes sense for the function not to be async-signal safe since it can in principle allocate memory for the data. In practice it's most likely okay if the function has been called before on this thread). Paolo
On 23.01.21 01:41, Laszlo Ersek wrote: > On 01/22/21 22:26, Laszlo Ersek wrote: > >> I'm drifting towards an overhaul of coroutine-sigaltstack, based on my >> personal understanding of POSIX, but given that I can absolutely not >> *test* coroutine-sigaltstack on the platforms where it actually matters, >> an "overhaul" by me would be reckless. >> >> I didn't expect these skeletons when I first read Max's "Thread safety >> of coroutine-sigaltstack" email :/ >> >> Max, after having worked on top of your patch for a few hours, I >> officially endorse your mutex approach. I can't encourage you or myself >> to touch this code, in good conscience. It's not that it's "bad"; it's >> inexplicable and (to me) untestable. On one hand, that’s too bad; on the other perhaps it’s just for the better to get all of this out of our minds again (for now)... O:) > I'm attaching a patch (based on 0e3246263068). I'm not convinced that I > should take responsibility for this, given the lack of testability on my > end. So I'm not posting it stand-alone even as an RFC. I've built it and > have booted one of my existent domains with it, but that's all. FWIW, it looks good to me. We should keep it in mind if in the future for some reason sigaltstack becomes more important, but for now I’m not too sad to abort any improvement efforts. Max
On 01/23/21 23:13, Paolo Bonzini wrote: > On 22/01/21 22:26, Laszlo Ersek wrote: >> That seems bogus, per POSIX, given that all signals except SIGUSR2 are >> included in the mask passed to sigsuspend(). > > What happens if you get a SIGSTOP at exactly the wrong time? (Yeah I > know how incredibly unlikely that would be). Nothing; it is impossible to ignore or catch SIGSTOP: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html And therefore it is also impossible to block it (but an attempt to block it will not cause issues); see e.g. https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsuspend.html ("It is not possible to block signals that cannot be ignored. This is enforced by the system without causing an error to be indicated.") So the process will simply stop executing. The more interesting question is SIGCONT. SIGCONT will always behave like it should: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html The interesting part is that it's possible to install a handler for SIGCONT, which will act *in addition* to the normal SIGCONT effect. Such a handler is actually useful for (e.g.) restarting timers that may have been thrown off while the process was not scheduled for a while (due to SIGSTOP). If there is no handler installed for SIGCONT, nothing particular happens when it is delivered (the process doesn't notice). QEMU does seem to install a SIGCONT handler, in "chardev/char-stdio.c". But that's not a problem (at least with the patch I had attached) because the mask that sigsuspend() puts in place, temporarily, for unblocking SIGUSR2, also blocks SIGCONT (together with everything else). If a STOP/CONT occurs exactly then, then SIGCONT will resume the process, but the handler for SIGCONT will run precisely *after* sigsuspend() -- including the SIGUSR2 handling -- returns, and SIGCONT becomes unblocked by virtue of sigsuspend() restoring the original mask. (Side comment: I'm not convinced the existent SIGCONT handling in "chardev/char-stdio.c" is safe, regardless of the topic at hand -- SIGCONT does not seem to be masked for all threads except one, and so any thread could execute term_stdio_handler(). While the only called function in that is tcsetattr(), which is async signal safe, I don't know whether accessing "stdio_echo_state" and "stdio_allow_signal" is also safe. It may be.) BTW, the signal(7) Linux manual documents spurious wakeups for sigtimedwait(2) and sigwaitinfo(2), indeed in connection with SIGCONT. However, sigsuspend() is not listed in that section (I had checked): https://man7.org/linux/man-pages/man7/signal.7.html Interruption of system calls and library functions by stop signals On Linux, even in the absence of signal handlers, certain blocking interfaces can fail with the error EINTR after the process is stopped by one of the stop signals and then resumed via SIGCONT. This behavior is not sanctioned by POSIX.1, and doesn't occur on other systems. The Linux interfaces that display this behavior are: * "Input" socket interfaces, when a timeout (SO_RCVTIMEO) has been set on the socket using setsockopt(2): accept(2), recv(2), recvfrom(2), recvmmsg(2) (also with a non-NULL timeout argument), and recvmsg(2). * "Output" socket interfaces, when a timeout (SO_RCVTIMEO) has been set on the socket using setsockopt(2): connect(2), send(2), sendto(2), and sendmsg(2), if a send timeout (SO_SNDTIMEO) has been set. * epoll_wait(2), epoll_pwait(2). * semop(2), semtimedop(2). * sigtimedwait(2), sigwaitinfo(2). * Linux 3.7 and earlier: read(2) from an inotify(7) file descriptor * Linux 2.6.21 and earlier: futex(2) FUTEX_WAIT, sem_timedwait(3), sem_wait(3). * Linux 2.6.8 and earlier: msgrcv(2), msgsnd(2). * Linux 2.4 and earlier: nanosleep(2). If other host OSes that QEMU supports have similar non-conformance notes, and sigsuspend() is affected on them, then removing the loop around sigsuspend() is unsafe. I only know where to check POSIX and the Linux manuals. > BTW if we are in a mood for cleanup, there's no reason to use > pthread_key_t instead of __thread + qemu_thread_atexit_add (adding a > Notifier to struct CoroutineThreadState). That would fix the issue with > async-signal safety of pthread_getspecific. > > (It makes sense for the function not to be async-signal safe since it > can in principle allocate memory for the data. In practice it's most > likely okay if the function has been called before on this thread). This is a good idea, but with the patch I had attached, it's not urgent -- pthread_getspecific is only called from a safe context. Per POSIX, there is a problem only when an unsafe function is executed while the signal interrupts an unsafe function: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 In the presence of signals, all functions defined by this volume of POSIX.1-2017 shall behave as defined when called from or interrupted by a signal-catching function, with the exception that when a signal interrupts an unsafe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()) and the signal-catching function calls an unsafe function, the behavior is undefined. And with the proposed patch, the only function that can be interrupted by SIGUSR2 is sigsuspend(), and sigsuspend() is listed in the table of safe functions. Thanks Laszlo
On 01/25/21 11:57, Max Reitz wrote: > On 23.01.21 01:41, Laszlo Ersek wrote: >> On 01/22/21 22:26, Laszlo Ersek wrote: >> >>> I'm drifting towards an overhaul of coroutine-sigaltstack, based on my >>> personal understanding of POSIX, but given that I can absolutely not >>> *test* coroutine-sigaltstack on the platforms where it actually matters, >>> an "overhaul" by me would be reckless. >>> >>> I didn't expect these skeletons when I first read Max's "Thread safety >>> of coroutine-sigaltstack" email :/ >>> >>> Max, after having worked on top of your patch for a few hours, I >>> officially endorse your mutex approach. I can't encourage you or myself >>> to touch this code, in good conscience. It's not that it's "bad"; it's >>> inexplicable and (to me) untestable. > > On one hand, that’s too bad; on the other perhaps it’s just for the > better to get all of this out of our minds again (for now)... O:) > >> I'm attaching a patch (based on 0e3246263068). I'm not convinced that I >> should take responsibility for this, given the lack of testability on my >> end. So I'm not posting it stand-alone even as an RFC. I've built it and >> have booted one of my existent domains with it, but that's all. > > FWIW, it looks good to me. We should keep it in mind if in the future > for some reason sigaltstack becomes more important, but for now I’m not > too sad to abort any improvement efforts. OK -- so do you plan to post your mutex approach stand-alone? (Sorry if that's already been done and I've missed it.) I actually feel somewhat safe regarding my patch, as long as it is *only* executed on Linux hosts, and on other hosts that claim conformance to POSIX. But, of course, this whole thing has come up because of OSX... and I don't have the first clue about OSX. (For example whether sigsuspend() on OSX is prone to a spurious wakeup in response to SIGCONT.) Thanks Laszlo
On 01/25/21 22:13, Laszlo Ersek wrote: > On 01/23/21 23:13, Paolo Bonzini wrote: >> On 22/01/21 22:26, Laszlo Ersek wrote: >>> That seems bogus, per POSIX, given that all signals except SIGUSR2 are >>> included in the mask passed to sigsuspend(). >> >> What happens if you get a SIGSTOP at exactly the wrong time? (Yeah I >> know how incredibly unlikely that would be). > > Nothing; it is impossible to ignore or catch SIGSTOP: > > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html > > And therefore it is also impossible to block it (but an attempt to block > it will not cause issues); see e.g. > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsuspend.html > > ("It is not possible to block signals that cannot be ignored. This is > enforced by the system without causing an error to be indicated.") > > So the process will simply stop executing. > > > The more interesting question is SIGCONT. SIGCONT will always behave > like it should: > > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html > > The interesting part is that it's possible to install a handler for > SIGCONT, which will act *in addition* to the normal SIGCONT effect. > > Such a handler is actually useful for (e.g.) restarting timers that may > have been thrown off while the process was not scheduled for a while > (due to SIGSTOP). > > If there is no handler installed for SIGCONT, nothing particular happens > when it is delivered (the process doesn't notice). Sorry, I need to correct my terminology here (no change to my description of the behavior, just the terms) -- basically SIGCONT resumes the target process (if stopped) as soon as it is generated (= made pending) for the target process; the disposition and/or masking of SIGCONT in the target has no effect on this. If a handler and/or masking are used in the target process, those only affect delivery; and in that case, delivery of SIGCONT is no different from that of other signals. In the end, SIGCONT first resumes the process (if stopped), regardless of disposition / masking, and *then* it is delivered, subject to disposition and masking. (If SIGCONT is not blocked but it is caught, then of course generation and delivery are indistinguishable in the target process.) (A program I had written earlier relied on this specifically, for restarting a timer -- it would block SIGCONT at all times, except in a very tight "event fetch" call, such as pselect(). So there could easily be sections of code that would run with SIGCONT actively pending. In fact, async SIGCONT could be used to force a timer restart, without ever sending SIGSTOP!) So the short paragraph I'm quoting right above is inaccurate because, if no handler is installed, SIGCONT is not "delivered" at all. Thanks Laszlo > > QEMU does seem to install a SIGCONT handler, in "chardev/char-stdio.c". > But that's not a problem (at least with the patch I had attached) > because the mask that sigsuspend() puts in place, temporarily, for > unblocking SIGUSR2, also blocks SIGCONT (together with everything else). > If a STOP/CONT occurs exactly then, then SIGCONT will resume the > process, but the handler for SIGCONT will run precisely *after* > sigsuspend() -- including the SIGUSR2 handling -- returns, and SIGCONT > becomes unblocked by virtue of sigsuspend() restoring the original mask. > > (Side comment: I'm not convinced the existent SIGCONT handling in > "chardev/char-stdio.c" is safe, regardless of the topic at hand -- > SIGCONT does not seem to be masked for all threads except one, and so > any thread could execute term_stdio_handler(). While the only called > function in that is tcsetattr(), which is async signal safe, I don't > know whether accessing "stdio_echo_state" and "stdio_allow_signal" is > also safe. It may be.) > > > BTW, the signal(7) Linux manual documents spurious wakeups for > sigtimedwait(2) and sigwaitinfo(2), indeed in connection with SIGCONT. > However, sigsuspend() is not listed in that section (I had checked): > > https://man7.org/linux/man-pages/man7/signal.7.html > > Interruption of system calls and library functions by stop signals > On Linux, even in the absence of signal handlers, certain > blocking interfaces can fail with the error EINTR after the > process is stopped by one of the stop signals and then resumed > via SIGCONT. This behavior is not sanctioned by POSIX.1, and > doesn't occur on other systems. > > The Linux interfaces that display this behavior are: > > * "Input" socket interfaces, when a timeout (SO_RCVTIMEO) has > been set on the socket using setsockopt(2): accept(2), recv(2), > recvfrom(2), recvmmsg(2) (also with a non-NULL timeout > argument), and recvmsg(2). > > * "Output" socket interfaces, when a timeout (SO_RCVTIMEO) has > been set on the socket using setsockopt(2): connect(2), > send(2), sendto(2), and sendmsg(2), if a send timeout > (SO_SNDTIMEO) has been set. > > * epoll_wait(2), epoll_pwait(2). > > * semop(2), semtimedop(2). > > * sigtimedwait(2), sigwaitinfo(2). > > * Linux 3.7 and earlier: read(2) from an inotify(7) file > descriptor > > * Linux 2.6.21 and earlier: futex(2) FUTEX_WAIT, > sem_timedwait(3), sem_wait(3). > > * Linux 2.6.8 and earlier: msgrcv(2), msgsnd(2). > > * Linux 2.4 and earlier: nanosleep(2). > > If other host OSes that QEMU supports have similar non-conformance > notes, and sigsuspend() is affected on them, then removing the loop > around sigsuspend() is unsafe. > > I only know where to check POSIX and the Linux manuals. > > >> BTW if we are in a mood for cleanup, there's no reason to use >> pthread_key_t instead of __thread + qemu_thread_atexit_add (adding a >> Notifier to struct CoroutineThreadState). That would fix the issue with >> async-signal safety of pthread_getspecific. >> >> (It makes sense for the function not to be async-signal safe since it >> can in principle allocate memory for the data. In practice it's most >> likely okay if the function has been called before on this thread). > > This is a good idea, but with the patch I had attached, it's not urgent > -- pthread_getspecific is only called from a safe context. Per POSIX, > there is a problem only when an unsafe function is executed while the > signal interrupts an unsafe function: > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 > > In the presence of signals, all functions defined by this volume of > POSIX.1-2017 shall behave as defined when called from or interrupted > by a signal-catching function, with the exception that when a signal > interrupts an unsafe function or equivalent (such as the processing > equivalent to exit() performed after a return from the initial call > to main()) and the signal-catching function calls an unsafe > function, the behavior is undefined. > > And with the proposed patch, the only function that can be interrupted > by SIGUSR2 is sigsuspend(), and sigsuspend() is listed in the table of > safe functions. > > Thanks > Laszlo >
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c index aade82afb8..2d32afc322 100644 --- a/util/coroutine-sigaltstack.c +++ b/util/coroutine-sigaltstack.c @@ -59,6 +59,8 @@ typedef struct { static pthread_key_t thread_state_key; +static void coroutine_trampoline(int signal); + static CoroutineThreadState *coroutine_get_thread_state(void) { CoroutineThreadState *s = pthread_getspecific(thread_state_key); @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) static void __attribute__((constructor)) coroutine_init(void) { + struct sigaction sa; int ret; ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup); @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void) fprintf(stderr, "unable to create leader key: %s\n", strerror(errno)); abort(); } + + /* + * Establish the SIGUSR2 signal handler. This is a process-wide + * operation, and so will apply to all threads from here on. + */ + sa = (struct sigaction) { + .sa_handler = coroutine_trampoline, + .sa_flags = SA_ONSTACK, + }; + + if (sigaction(SIGUSR2, &sa, NULL) != 0) { + perror("Unable to install SIGUSR2 handler"); + abort(); + } } /* "boot" function @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) /* Get the thread specific information */ coTS = coroutine_get_thread_state(); self = coTS->tr_handler; + + if (!self) { + /* + * This SIGUSR2 came from an external source, not from + * qemu_coroutine_new(), so perform the default action. + */ + exit(0); + } + coTS->tr_called = 1; + coTS->tr_handler = NULL; co = &self->base; /* @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) { CoroutineSigAltStack *co; CoroutineThreadState *coTS; - struct sigaction sa; - struct sigaction osa; stack_t ss; stack_t oss; sigset_t sigs; - sigset_t osigs; sigjmp_buf old_env; /* The way to manipulate stack is with the sigaltstack function. We @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) co->stack = qemu_alloc_stack(&co->stack_size); co->base.entry_arg = &old_env; /* stash away our jmp_buf */ - coTS = coroutine_get_thread_state(); - coTS->tr_handler = co; - - /* - * Preserve the SIGUSR2 signal state, block SIGUSR2, - * and establish our signal handler. The signal will - * later transfer control onto the signal stack. - */ - sigemptyset(&sigs); - sigaddset(&sigs, SIGUSR2); - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); - sa.sa_handler = coroutine_trampoline; - sigfillset(&sa.sa_mask); - sa.sa_flags = SA_ONSTACK; - if (sigaction(SIGUSR2, &sa, &osa) != 0) { - abort(); - } - /* * Set the new stack. */ @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) * signal can be delivered the first time sigsuspend() is * called. */ + coTS = coroutine_get_thread_state(); + coTS->tr_handler = co; coTS->tr_called = 0; pthread_kill(pthread_self(), SIGUSR2); sigfillset(&sigs); @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) sigaltstack(&oss, NULL); } - /* - * Restore the old SIGUSR2 signal handler and mask - */ - sigaction(SIGUSR2, &osa, NULL); - pthread_sigmask(SIG_SETMASK, &osigs, NULL); - /* * Now enter the trampoline again, but this time not as a signal * handler. Instead we jump into it directly. The functionally
Modifying signal handlers is a process-global operation. When two threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, they may interfere with each other: One of them may revert the SIGUSR2 handler back to the default between the other thread setting up coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 will then lead to the process exiting. Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can thus keep the signal handler installed all the time. CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its stack is set up so a new coroutine is to be launched (i.e., it should invoke sigsetjmp()), or not (i.e., the signal came from an external source and we should just perform the default action, which is to exit the process). Note that in user-mode emulation, the guest can register signal handlers for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 handler, sigaltstack coroutines will break from then on. However, we do not use coroutines for user-mode emulation, so that is fine. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> --- util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 27 deletions(-)