Message ID | 20190529161157.GA27659@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()) | expand |
From: Oleg Nesterov > Sent: 29 May 2019 17:12 > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(&sigint); > sigaddset(&sigint, SIGINT); > sigprocmask(SIG_BLOCK, &sigint, NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset(&empty); // so pselect() unblocks SIGINT > > ret = pselect(..., &empty); ^^^^^ sigint > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. Personally I think that is wrong. Given code like the above that has: while (!interrupted) { pselect(..., &sigint); // process available data } You want the signal handler to be executed even if one of the fds always has available data. Otherwise you can't interrupt a process that is always busy. One option is to return -EINTR if a signal is pending when the mask is updated - before even looking at anything else. Signals that happen later on (eg after a timeout) need not be reported (until the next time around the loop). > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply can't > avoid this). You mean any signal that isn't blocked when pselect() is called.... > This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), > now this is broken by the signal_pending() check in restore_user_sigmask(). > > This patch https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.kernel@gmail.com/ > turns 0 into -EINTR if signal_pending(), but I think we should simply restore > the old behaviour and simplify the code. > > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. > > Let me show the code with the patch applied. I am using epoll_pwait() as an > example because it looks very simple. > > > static inline void set_restore_sigmask(void) > { > // WARN_ON(!TIF_SIGPENDING) was removed by this patch > current->restore_sigmask = true; > } > > int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > sigset_t *kmask; ^ no '*' here, add & before uses. > > if (!umask) > return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning > // until the syscall returns. > set_restore_sigmask(); > current->saved_sigmask = current->blocked; > set_current_blocked(kmask); > > return 0; > } > > > void update_xxx(bool interrupted) > { > // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was "moved" > // from set_restore_sigmask() above. > if (interrupted) > WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > else > restore_saved_sigmask(); > } I looked at the code earlier, but failed to find the code that actually delivers the signals. It may be 'racy' with update_xxx() regardless of whether that is looking for -EINTR or just a pending signal. I assume that TIF_SIGPENGING is used to (not) short-circuit the system call return path so that signals get delivered. So that it is important that update_xxx() calls restore_saved_sigmask() if there is no signal pending. (Although a signal can happen after the test - which can/will be ignored until the signal is enabled again.) restore_saved_sigmask() must itself be able to set TIF_SIGPENDING (the inner sigmask could be more restrictive!). If restore_saved_sigmask() isn't called here, the syscall return path must do it after calling all the handlers and after clearing TIF_SIGPENDING, and then call unmasked handlers again. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Resending due to inadvertent conversion of prior message to html. On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov <oleg@redhat.com> wrote: > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(&sigint); > sigaddset(&sigint, SIGINT); > sigprocmask(SIG_BLOCK, &sigint, NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset(&empty); // so pselect() unblocks SIGINT > > ret = pselect(..., &empty); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. > > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply can't > avoid this). > > This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), > now this is broken by the signal_pending() check in restore_user_sigmask(). > > This patch https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.kernel@gmail.com/ > turns 0 into -EINTR if signal_pending(), but I think we should simply restore > the old behaviour and simplify the code. > > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. > > Let me show the code with the patch applied. I am using epoll_pwait() as an > example because it looks very simple. > > > static inline void set_restore_sigmask(void) > { > // WARN_ON(!TIF_SIGPENDING) was removed by this patch > current->restore_sigmask = true; > } > > int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > sigset_t *kmask; > > if (!umask) > return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning > // until the syscall returns. > set_restore_sigmask(); > current->saved_sigmask = current->blocked; > set_current_blocked(kmask); > > return 0; > } > > > void update_xxx(bool interrupted) > { > // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was "moved" > // from set_restore_sigmask() above. > if (interrupted) > WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > else > restore_saved_sigmask(); > } > > SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, > int, maxevents, int, timeout, const sigset_t __user *, sigmask, > size_t, sigsetsize) > { > int error; > > error = set_xxx(sigmask, sigsetsize); > if (error) > return error; > > error = do_epoll_wait(epfd, events, maxevents, timeout); > update_xxx(error == -EINTR); > > return error; > } > > Oleg. > --- > > fs/aio.c | 40 ++++++++++++++--------- > fs/eventpoll.c | 12 +++---- > fs/io_uring.c | 12 +++---- > fs/select.c | 40 +++++++---------------- > include/linux/compat.h | 4 +-- > include/linux/sched/signal.h | 2 -- > include/linux/signal.h | 6 ++-- > kernel/signal.c | 77 ++++++++++++++++---------------------------- > 8 files changed, 74 insertions(+), 119 deletions(-) > > > diff --git a/fs/aio.c b/fs/aio.c > index 3490d1f..8315bd2 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents, > const struct __aio_sigset __user *, usig) > { > struct __aio_sigset ksig = { NULL, }; > - sigset_t ksigmask, sigsaved; > struct timespec64 ts; > + bool interrupted; > int ret; > > if (timeout && unlikely(get_timespec64(&ts, timeout))) > @@ -2103,13 +2103,15 @@ SYSCALL_DEFINE6(io_pgetevents, I already admitted I was wrong about this. And the behavior you point out is consistent with man page. I'm not sure why we are not understanding that I'm agreeing with you: https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6A@mail.gmail.com/ This was true until 854a6ed56839a("signal: Add restore_user_sigmask()"), now this is broken by the signal_pending() check in restore_user_sigmask(). This patch is wrong because I did not know about the above behavior before. I also admitted to this. And proposed another way to revert the patch. This patch https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.kernel@gmail.com/ turns 0 into -EINTR if signal_pending(), but I think we should simply restore the old behaviour and simplify the code. I agree that the patch that turns 0 to -EINTR should not be applied. It was my misunderstanding. The part I'm not sure is that I cannot find anything that is wrong with 854a6ed56839a("signal: Add restore_user_sigmask()") that was not already before. But, that is ok, we can do the revert and I will investigate with the test application if I can find something. -Deepa > if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) > return -EFAULT; > > - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); > - restore_user_sigmask(ksig.sigmask, &sigsaved); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); > + if (interrupted && !ret) > ret = -ERESTARTNOHAND; > > return ret; > @@ -2126,8 +2128,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, > const struct __aio_sigset __user *, usig) > { > struct __aio_sigset ksig = { NULL, }; > - sigset_t ksigmask, sigsaved; > struct timespec64 ts; > + bool interrupted; > int ret; > > if (timeout && unlikely(get_old_timespec32(&ts, timeout))) > @@ -2137,13 +2139,15 @@ SYSCALL_DEFINE6(io_pgetevents_time32, > return -EFAULT; > > > - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); > - restore_user_sigmask(ksig.sigmask, &sigsaved); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); > + if (interrupted && !ret) > ret = -ERESTARTNOHAND; > > return ret; > @@ -2191,8 +2195,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, > const struct __compat_aio_sigset __user *, usig) > { > struct __compat_aio_sigset ksig = { NULL, }; > - sigset_t ksigmask, sigsaved; > struct timespec64 t; > + bool interrupted; > int ret; > > if (timeout && get_old_timespec32(&t, timeout)) > @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, > if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) > return -EFAULT; > > - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL); > - restore_user_sigmask(ksig.sigmask, &sigsaved); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); > + if (interrupted && !ret) > ret = -ERESTARTNOHAND; > > return ret; > @@ -2224,8 +2230,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, > const struct __compat_aio_sigset __user *, usig) > { > struct __compat_aio_sigset ksig = { NULL, }; > - sigset_t ksigmask, sigsaved; > struct timespec64 t; > + bool interrupted; > int ret; > > if (timeout && get_timespec64(&t, timeout)) > @@ -2234,13 +2240,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, > if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) > return -EFAULT; > > - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL); > - restore_user_sigmask(ksig.sigmask, &sigsaved); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); > + if (interrupted && !ret) > ret = -ERESTARTNOHAND; > > return ret; > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index 4a0e98d..0b1a337 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -2318,19 +2318,17 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, > size_t, sigsetsize) > { > int error; > - sigset_t ksigmask, sigsaved; > > /* > * If the caller wants a certain signal mask to be set during the wait, > * we apply it here. > */ > - error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + error = set_xxx(sigmask, sigsetsize); > if (error) > return error; > > error = do_epoll_wait(epfd, events, maxevents, timeout); > - > - restore_user_sigmask(sigmask, &sigsaved); > + update_xxx(error == -EINTR); > > return error; > } > @@ -2343,19 +2341,17 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd, > compat_size_t, sigsetsize) > { > long err; > - sigset_t ksigmask, sigsaved; > > /* > * If the caller wants a certain signal mask to be set during the wait, > * we apply it here. > */ > - err = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + err = set_compat_xxx(sigmask, sigsetsize); > if (err) > return err; > > err = do_epoll_wait(epfd, events, maxevents, timeout); > - > - restore_user_sigmask(sigmask, &sigsaved); > + update_xxx(err == -EINTR); > > return err; > } > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 310f8d1..b5b99e2 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -2180,7 +2180,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, > const sigset_t __user *sig, size_t sigsz) > { > struct io_cq_ring *ring = ctx->cq_ring; > - sigset_t ksigmask, sigsaved; > int ret; > > if (io_cqring_events(ring) >= min_events) > @@ -2189,24 +2188,21 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, > if (sig) { > #ifdef CONFIG_COMPAT > if (in_compat_syscall()) > - ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig, > - &ksigmask, &sigsaved, sigsz); > + ret = set_compat_xxx((const compat_sigset_t __user *)sig, > + sigsz); > else > #endif > - ret = set_user_sigmask(sig, &ksigmask, > - &sigsaved, sigsz); > + ret = set_xxx(sig, sigsz); > > if (ret) > return ret; > } > > ret = wait_event_interruptible(ctx->wait, io_cqring_events(ring) >= min_events); > + update_xxx(ret == -ERESTARTSYS); > if (ret == -ERESTARTSYS) > ret = -EINTR; > > - if (sig) > - restore_user_sigmask(sig, &sigsaved); > - > return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0; > } > > diff --git a/fs/select.c b/fs/select.c > index 6cbc9ff..7eab132 100644 > --- a/fs/select.c > +++ b/fs/select.c > @@ -730,7 +730,6 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp, > const sigset_t __user *sigmask, size_t sigsetsize, > enum poll_time_type type) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -753,15 +752,14 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp, > return -EINVAL; > } > > - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = core_sys_select(n, inp, outp, exp, to); > + update_xxx(ret == -ERESTARTNOHAND); > ret = poll_select_copy_remaining(&end_time, tsp, type, ret); > > - restore_user_sigmask(sigmask, &sigsaved); > - > return ret; > } > > @@ -1087,7 +1085,6 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds, > struct __kernel_timespec __user *, tsp, const sigset_t __user *, sigmask, > size_t, sigsetsize) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -1100,18 +1097,16 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds, > return -EINVAL; > } > > - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = do_sys_poll(ufds, nfds, to); > > - restore_user_sigmask(sigmask, &sigsaved); > - > + update_xxx(ret == -EINTR); > /* We can restart this syscall, usually */ > if (ret == -EINTR) > ret = -ERESTARTNOHAND; > - > ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret); > > return ret; > @@ -1123,7 +1118,6 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds, > struct old_timespec32 __user *, tsp, const sigset_t __user *, sigmask, > size_t, sigsetsize) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -1136,18 +1130,16 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds, > return -EINVAL; > } > > - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = do_sys_poll(ufds, nfds, to); > > - restore_user_sigmask(sigmask, &sigsaved); > - > + update_xxx(ret == -EINTR); > /* We can restart this syscall, usually */ > if (ret == -EINTR) > ret = -ERESTARTNOHAND; > - > ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret); > > return ret; > @@ -1322,7 +1314,6 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp, > void __user *tsp, compat_sigset_t __user *sigmask, > compat_size_t sigsetsize, enum poll_time_type type) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -1345,15 +1336,14 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp, > return -EINVAL; > } > > - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_compat_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = compat_core_sys_select(n, inp, outp, exp, to); > + update_xxx(ret == -ERESTARTNOHAND); > ret = poll_select_copy_remaining(&end_time, tsp, type, ret); > > - restore_user_sigmask(sigmask, &sigsaved); > - > return ret; > } > > @@ -1406,7 +1396,6 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, > unsigned int, nfds, struct old_timespec32 __user *, tsp, > const compat_sigset_t __user *, sigmask, compat_size_t, sigsetsize) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -1419,18 +1408,16 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, > return -EINVAL; > } > > - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_compat_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = do_sys_poll(ufds, nfds, to); > > - restore_user_sigmask(sigmask, &sigsaved); > - > + update_xxx(ret == -EINTR); > /* We can restart this syscall, usually */ > if (ret == -EINTR) > ret = -ERESTARTNOHAND; > - > ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret); > > return ret; > @@ -1442,7 +1429,6 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds, > unsigned int, nfds, struct __kernel_timespec __user *, tsp, > const compat_sigset_t __user *, sigmask, compat_size_t, sigsetsize) > { > - sigset_t ksigmask, sigsaved; > struct timespec64 ts, end_time, *to = NULL; > int ret; > > @@ -1455,18 +1441,16 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds, > return -EINVAL; > } > > - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); > + ret = set_compat_xxx(sigmask, sigsetsize); > if (ret) > return ret; > > ret = do_sys_poll(ufds, nfds, to); > > - restore_user_sigmask(sigmask, &sigsaved); > - > + update_xxx(ret == -EINTR); > /* We can restart this syscall, usually */ > if (ret == -EINTR) > ret = -ERESTARTNOHAND; > - > ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret); > > return ret; > diff --git a/include/linux/compat.h b/include/linux/compat.h > index ebddcb6..b20b001 100644 > --- a/include/linux/compat.h > +++ b/include/linux/compat.h > @@ -138,9 +138,7 @@ typedef struct { > compat_sigset_word sig[_COMPAT_NSIG_WORDS]; > } compat_sigset_t; > > -int set_compat_user_sigmask(const compat_sigset_t __user *usigmask, > - sigset_t *set, sigset_t *oldset, > - size_t sigsetsize); > +int set_compat_xxx(const compat_sigset_t __user *umask, size_t sigsetsize); > > struct compat_sigaction { > #ifndef __ARCH_HAS_IRIX_SIGACTION > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 38a0f07..8b5b537 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -417,7 +417,6 @@ void task_join_group_stop(struct task_struct *task); > static inline void set_restore_sigmask(void) > { > set_thread_flag(TIF_RESTORE_SIGMASK); > - WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > } > > static inline void clear_tsk_restore_sigmask(struct task_struct *task) > @@ -448,7 +447,6 @@ static inline bool test_and_clear_restore_sigmask(void) > static inline void set_restore_sigmask(void) > { > current->restore_sigmask = true; > - WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > } > static inline void clear_tsk_restore_sigmask(struct task_struct *task) > { > diff --git a/include/linux/signal.h b/include/linux/signal.h > index 9702016..65f84ac 100644 > --- a/include/linux/signal.h > +++ b/include/linux/signal.h > @@ -273,10 +273,8 @@ extern int group_send_sig_info(int sig, struct kernel_siginfo *info, > struct task_struct *p, enum pid_type type); > extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *); > extern int sigprocmask(int, sigset_t *, sigset_t *); > -extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set, > - sigset_t *oldset, size_t sigsetsize); > -extern void restore_user_sigmask(const void __user *usigmask, > - sigset_t *sigsaved); > +extern int set_xxx(const sigset_t __user *umask, size_t sigsetsize); > +extern void update_xxx(bool interupted); > extern void set_current_blocked(sigset_t *); > extern void __set_current_blocked(const sigset_t *); > extern int show_unhandled_signals; > diff --git a/kernel/signal.c b/kernel/signal.c > index d7b9d14..0a1ec68 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -2861,79 +2861,56 @@ EXPORT_SYMBOL(sigprocmask); > * > * This is useful for syscalls such as ppoll, pselect, io_pgetevents and > * epoll_pwait where a new sigmask is passed from userland for the syscalls. > + * > + * Note that it does set_restore_sigmask() in advance, so it must be always > + * paired with update_xxx() before return from syscall. > */ > -int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set, > - sigset_t *oldset, size_t sigsetsize) > +int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > - if (!usigmask) > - return 0; > + sigset_t *kmask; > > + if (!umask) > + return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > - if (copy_from_user(set, usigmask, sizeof(sigset_t))) > + if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > - *oldset = current->blocked; > - set_current_blocked(set); > + set_restore_sigmask(); > + current->saved_sigmask = current->blocked; > + set_current_blocked(kmask); > > return 0; > } > -EXPORT_SYMBOL(set_user_sigmask); > + > +void update_xxx(bool interrupted) > +{ > + if (interrupted) > + WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > + else > + restore_saved_sigmask(); > +} > > #ifdef CONFIG_COMPAT > -int set_compat_user_sigmask(const compat_sigset_t __user *usigmask, > - sigset_t *set, sigset_t *oldset, > - size_t sigsetsize) > +int set_compat_xxx(const compat_sigset_t __user *umask, size_t sigsetsize) > { > - if (!usigmask) > - return 0; > + sigset_t *kmask; > > + if (!umask) > + return 0; > if (sigsetsize != sizeof(compat_sigset_t)) > return -EINVAL; > - if (get_compat_sigset(set, usigmask)) > + if (get_compat_sigset(kmask, umask)) > return -EFAULT; > > - *oldset = current->blocked; > - set_current_blocked(set); > + set_restore_sigmask(); > + current->saved_sigmask = current->blocked; > + set_current_blocked(kmask); > > return 0; > } > -EXPORT_SYMBOL(set_compat_user_sigmask); > #endif > > -/* > - * restore_user_sigmask: > - * usigmask: sigmask passed in from userland. > - * sigsaved: saved sigmask when the syscall started and changed the sigmask to > - * usigmask. > - * > - * This is useful for syscalls such as ppoll, pselect, io_pgetevents and > - * epoll_pwait where a new sigmask is passed in from userland for the syscalls. > - */ > -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) > -{ > - > - if (!usigmask) > - return; > - /* > - * When signals are pending, do not restore them here. > - * Restoring sigmask here can lead to delivering signals that the above > - * syscalls are intended to block because of the sigmask passed in. > - */ > - if (signal_pending(current)) { > - current->saved_sigmask = *sigsaved; > - set_restore_sigmask(); > - return; > - } > - > - /* > - * This is needed because the fast syscall return path does not restore > - * saved_sigmask when signals are not pending. > - */ > - set_current_blocked(sigsaved); > -} > -EXPORT_SYMBOL(restore_user_sigmask); > - > /** > * sys_rt_sigprocmask - change the list of currently blocked signals > * @how: whether to add, remove, or set signals >
On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov <oleg@redhat.com> wrote: > > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(&sigint); > sigaddset(&sigint, SIGINT); > sigprocmask(SIG_BLOCK, &sigint, NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset(&empty); // so pselect() unblocks SIGINT > > ret = pselect(..., &empty); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. I do not think we discussed this part earlier. But, if this is true then this is what is wrong as part of 854a6ed56839a. I missed that before. > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply can't > avoid this). This patch is wrong because I did not know that it was ok to deliver a signal and not set the errno before. I also admitted to this. And proposed another way to revert the patch.: https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6A@mail.gmail.com/ -Deepa
David Laight <David.Laight@ACULAB.COM> wrote: > From: Oleg Nesterov > > Sent: 29 May 2019 17:12 > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > > > I think that the following code is correct > > > > int interrupted = 0; > > > > void sigint_handler(int sig) > > { > > interrupted = 1; > > } > > > > int main(void) > > { > > sigset_t sigint, empty; > > > > sigemptyset(&sigint); > > sigaddset(&sigint, SIGINT); > > sigprocmask(SIG_BLOCK, &sigint, NULL); > > > > signal(SIGINT, sigint_handler); > > > > sigemptyset(&empty); // so pselect() unblocks SIGINT > > > > ret = pselect(..., &empty); > ^^^^^ sigint > > > > if (ret >= 0) // sucess or timeout > > assert(!interrupted); > > > > if (interrupted) > > assert(ret == -EINTR); > > } > > > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > > signal should not be delivered if a ready fd was found or timeout. The signal > > handle should only run if ret == -EINTR. > > Personally I think that is wrong. > Given code like the above that has: > while (!interrupted) { > pselect(..., &sigint); > // process available data > } > > You want the signal handler to be executed even if one of the fds > always has available data. > Otherwise you can't interrupt a process that is always busy. Agreed... I believe cmogstored has always had a bug in the way it uses epoll_pwait because it failed to check interrupts if: a) an FD is ready + interrupt b) epoll_pwait returns 0 on interrupt The bug remains in userspace for a), which I will fix by adding an interrupt check when an FD is ready. The window is very small for a) and difficult to trigger, and also in a rare code path. The b) case is the kernel bug introduced in 854a6ed56839a40f ("signal: Add restore_user_sigmask()"). I don't think there's any disagreement that b) is a kernel bug. So the confusion is for a), and POSIX is not clear w.r.t. how pselect/poll works when there's both FD readiness and an interrupt. Thus I'm inclined to believe *select/*poll/epoll_*wait should follow POSIX read() semantics: If a read() is interrupted by a signal before it reads any data, it shall return −1 with errno set to [EINTR]. If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes read. > One option is to return -EINTR if a signal is pending when the mask > is updated - before even looking at anything else. > > Signals that happen later on (eg after a timeout) need not be reported > (until the next time around the loop). I'm not sure that's necessary and it would cause delays in signal handling.
On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov <oleg@redhat.com> wrote: > > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. Thanks for the elaborate explanation in this patch, it all starts making sense to me now. I also looked at your patch in detail and thought I had found a few mistakes at first but those all turned out to be mistakes in my reading. > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. I think this is a nice simplification, but it would help not to mix up the minimal regression fix with the rewrite of those functions. For the stable kernels, I think we want just the addition of the 'bool interrupted' argument to restore_user_sigmask() to close the race that was introduced 854a6ed56839a. Following up on that for future kernels, your patch improves the readability, but we can probably take it even further. > - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); > - restore_user_sigmask(ksig.sigmask, &sigsaved); > - if (signal_pending(current) && !ret) > + > + interrupted = signal_pending(current); > + update_xxx(interrupted); Maybe name this restore_saved_sigmask_if(!interrupted); and make restore_saved_sigmask_if() an inline function next to restore_saved_sigmask()? > @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, > if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) > return -EFAULT; > > - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); > if (ret) > return ret; With some of the recent discussions about compat syscall handling, I now think that we want to just fold set_compat_user_sigmask() into set_user_sigmask() (whatever they get called in the end) with an in_compat_syscall() conditional inside it, and completely get rid of the COMPAT_SYSCALL_DEFINEx() definitions for those system calls for which this is the only difference. Unfortunately we still need the time32/time64 distinction, but removing syscall handlers is a significant cleanup here already, and we can move most of the function body of sys_io_pgetevents() into do_io_getevents() in the process. Same for some of the other calls. Not sure about the order of the cleanups, but probably something like this would work: 1. fix the race (to be backported) 2. unify set_compat_user_sigmask/set_user_sigmask 3. remove unneeded compat handlers 4. replace restore_user_sigmask with restore_saved_sigmask_if() 5. also unify compat_get_fd_set()/get_fd_set() and kill off compat select() variants. Arnd
Arnd Bergmann <arnd@arndb.de> writes: > On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov <oleg@redhat.com> wrote: >> >> Al, Linus, Eric, please help. >> >> The previous discussion was very confusing, we simply can not understand each >> other. >> >> To me everything looks very simple and clear, but perhaps I missed something >> obvious? Please correct me. > > Thanks for the elaborate explanation in this patch, it all starts making sense > to me now. I also looked at your patch in detail and thought I had found > a few mistakes at first but those all turned out to be mistakes in my reading. > >> See the compile-tested patch at the end. Of course, the new _xxx() helpers >> should be renamed somehow. fs/aio.c doesn't look right with or without this >> patch, but iiuc this is what it did before 854a6ed56839a. > > I think this is a nice simplification, but it would help not to mix up the > minimal regression fix with the rewrite of those functions. For the stable > kernels, I think we want just the addition of the 'bool interrupted' argument > to restore_user_sigmask() to close the race that was introduced > 854a6ed56839a. Following up on that for future kernels, your patch > improves the readability, but we can probably take it even further. > >> - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); >> + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); >> if (ret) >> return ret; >> >> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); >> - restore_user_sigmask(ksig.sigmask, &sigsaved); >> - if (signal_pending(current) && !ret) >> + >> + interrupted = signal_pending(current); >> + update_xxx(interrupted); > > Maybe name this > > restore_saved_sigmask_if(!interrupted); > > and make restore_saved_sigmask_if() an inline function > next to restore_saved_sigmask()? > >> @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, >> if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) >> return -EFAULT; >> >> - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); >> + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); >> if (ret) >> return ret; > > With some of the recent discussions about compat syscall handling, > I now think that we want to just fold set_compat_user_sigmask() > into set_user_sigmask() (whatever they get called in the end) > with an in_compat_syscall() conditional inside it, and completely get > rid of the COMPAT_SYSCALL_DEFINEx() definitions for those > system calls for which this is the only difference. > > Unfortunately we still need the time32/time64 distinction, but removing > syscall handlers is a significant cleanup here already, and we can > move most of the function body of sys_io_pgetevents() into > do_io_getevents() in the process. Same for some of the other calls. > > Not sure about the order of the cleanups, but probably something like > this would work: > > 1. fix the race (to be backported) > 2. unify set_compat_user_sigmask/set_user_sigmask > 3. remove unneeded compat handlers > 4. replace restore_user_sigmask with restore_saved_sigmask_if() > 5. also unify compat_get_fd_set()/get_fd_set() and kill off > compat select() variants. Are new system calls added preventing a revert of the patch in question for stable kernels? Eric
From: Eric Wong > Sent: 29 May 2019 19:50 ... > > Personally I think that is wrong. > > Given code like the above that has: > > while (!interrupted) { > > pselect(..., &sigint); > > // process available data > > } > > > > You want the signal handler to be executed even if one of the fds > > always has available data. > > Otherwise you can't interrupt a process that is always busy. FWIW in the past I've seen a process that loops select-accept-fork-exec get its priority reduced to the point where it never blocked in select. The client side retries made it go badly wrong! If it had limited when SIG_INT was detected it would have been a little more difficult to kill. > Agreed... I believe cmogstored has always had a bug in the way > it uses epoll_pwait because it failed to check interrupts if: > > a) an FD is ready + interrupt > b) epoll_pwait returns 0 on interrupt But the kernel code seems to only call the signal handler (for signals that are enabled during pselect() (etc)) if the underlying wait is interrupted. > The bug remains in userspace for a), which I will fix by adding > an interrupt check when an FD is ready. The window is very > small for a) and difficult to trigger, and also in a rare code > path. > > The b) case is the kernel bug introduced in 854a6ed56839a40f > ("signal: Add restore_user_sigmask()"). > > I don't think there's any disagreement that b) is a kernel bug. If the signal is raised after the wait has timed out but before the signal mask is restored. > So the confusion is for a), and POSIX is not clear w.r.t. how > pselect/poll works when there's both FD readiness and an > interrupt. > > Thus I'm inclined to believe *select/*poll/epoll_*wait should > follow POSIX read() semantics: > > If a read() is interrupted by a signal before it reads any data, it shall > return −1 with errno set to [EINTR]. > > If a read() is interrupted by a signal after it has successfully read > some data, it shall return the number of bytes read. Except that above you want different semantics :-) For read() any signal handler is always called. And the return value of a non-blocking read that returns no data is not affected by any pending signals. For pselect() that would mean that if the wait timed out (result 0) and then a signal was raised (before the mask got changed back) then the return value would be zero and the signal handler would be called. So your (b) above is not a bug. Even select() can return 'timeout' and have a signal handler called. There are really two separate issues: 1) Signals that are pending when the new mask is applied. 2) Signals that are raised after the wait terminates (success or timeout). If the signal handlers for (2) are not called then they become (1) next time around the application loop. Maybe the 'restore sigmask' function should be passed an indication of whether it is allowed to let signal handler be called and return whether they would be called (ie whether it restored the signal mask or left it for the syscall exit code to do after calling the signal handlers). That would allow epoll() to convert timeout+pending signal to EINTR, or to allow all handlers be called regardless of the return value. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Oleg Nesterov <oleg@redhat.com> writes: > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. If I have read this thread correctly the core issue is that ther is a program that used to work and that fails now. The question is why. There are two ways the semantics for a sigmask changing system call can be defined. The naive way and by reading posix. I will pick on pselect. The naive way: int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask) { sigset_t oldmask; int ret; if (sigmask) sigprocmask(SIG_SETMASK, sigmask, &oldmask); ret = select(nfds, readfds, writefds, exceptfds, timeout); if (sigmask) sigprocmask(SIG_SETMASK, &oldmask, NULL); return ret; } The standard for pselect behavior at https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html says: > If sigmask is not a null pointer, then the pselect() function shall > replace the signal mask of the caller by the set of signals pointed to > by sigmask before examining the descriptors, and shall restore the > signal mask of the calling thread before returning. ... > On failure, the objects pointed to by the readfds, writefds, and > errorfds arguments shall not be modified. If the timeout interval > expires without the specified condition being true for any of the > specified file descriptors, the objects pointed to by the readfds, > writefds, and errorfds arguments shall have all bits set to 0. So the standard specified behavior is if the return value is -EINTR you know you were interrupted by a signal, for any other return value you don't know. An error value just indicates that the file descriptor sets have not been updated. That is what the standard and reasonable people will expect from the system call. Looking at set_user_sigmask and restore_user_sigmask I believe we are don't violate those semantics today. Which means I believe we have a semantically valid change in behavior that is causing a regression. From my inspection of the code the change in behavior is from these two pieces of code: From the v4.18 epoll_pwait. /* * If we changed the signal mask, we need to restore the original one. * In case we've got a signal while waiting, we do not restore the * signal mask yet, and we allow do_signal() to deliver the signal on * the way back to userspace, before the signal mask is restored. */ if (sigmask) { if (error == -EINTR) { memcpy(¤t->saved_sigmask, &sigsaved, sizeof(sigsaved)); set_restore_sigmask(); } else set_current_blocked(&sigsaved); } /* * restore_user_sigmask: * usigmask: sigmask passed in from userland. * sigsaved: saved sigmask when the syscall started and changed the sigmask to * usigmask. * * This is useful for syscalls such as ppoll, pselect, io_pgetevents and * epoll_pwait where a new sigmask is passed in from userland for the syscalls. */ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) { if (!usigmask) return; /* * When signals are pending, do not restore them here. * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ if (signal_pending(current)) { current->saved_sigmask = *sigsaved; set_restore_sigmask(); return; } /* * This is needed because the fast syscall return path does not restore * saved_sigmask when signals are not pending. */ set_current_blocked(sigsaved); } Which has been reported results in a return value of 0, and a signal delivered, where previously that did not happen. They only way I can see that happening is that set_current_blocked in __set_task_blocked clears pending signals (that will be blocked) and calls retarget_shared_pending. Frankly the only reason this appears to be worth touching is that we have a userspace regression. Otherwise I would call the current behavior more correct and desirable than ignoring the signal longer. If I am reading sitaution properly I suggest we go back to resoring the sigmask by hand in epoll_pwait, and put in a big fat comment about a silly userspace program depending on that behavior. That will be easier to backport and it will just fix the regression and not overfix the problem for reasonable applications. Oleg your cleanup also seems reasonable. But I would like to make certain we understand and fix the regression first. We seem to be talking very small windows for both cases. Eric
Eric Wong <e@80x24.org> writes: > Agreed... I believe cmogstored has always had a bug in the way > it uses epoll_pwait because it failed to check interrupts if: > > a) an FD is ready + interrupt > b) epoll_pwait returns 0 on interrupt > > The bug remains in userspace for a), which I will fix by adding > an interrupt check when an FD is ready. The window is very > small for a) and difficult to trigger, and also in a rare code > path. > > The b) case is the kernel bug introduced in 854a6ed56839a40f > ("signal: Add restore_user_sigmask()"). > > I don't think there's any disagreement that b) is a kernel bug. See my reply to Oleg. I think (b) is a regression that needs to be fixed. I do not think that (b) is a kernel bug. Both versions of the of what sigmask means posix and naive will allow (b). Because fundamentally the sigmask is restored after the rest of the system call happens. Eric
On 05/30, Arnd Bergmann wrote: > > I think this is a nice simplification, but it would help not to mix up the > minimal regression fix with the rewrite of those functions. Yes, yes, agreed. Plus every file touched by this patch asks for more cleanups. Say, do_poll() should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly EINTR->ERESTARTNOHAND in its callers. And more. > For the stable > kernels, I think we want just the addition of the 'bool interrupted' argument > to restore_user_sigmask() or simply revert this patch. I will check if this is possible today... At first glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did "ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow. > > - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > > if (ret) > > return ret; > > > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); > > - restore_user_sigmask(ksig.sigmask, &sigsaved); > > - if (signal_pending(current) && !ret) > > + > > + interrupted = signal_pending(current); > > + update_xxx(interrupted); > > Maybe name this > > restore_saved_sigmask_if(!interrupted); Yes, I thought about restore_if(), but to me restore_saved_sigmask_if(ret != -EINTR); doesn't look readable... May be restore_saved_sigmask_unless(ret == -EINTR); ? but actually I agree with any naming. > and make restore_saved_sigmask_if() an inline function > next to restore_saved_sigmask()? agreed, > With some of the recent discussions about compat syscall handling, > I now think that we want to just fold set_compat_user_sigmask() > into set_user_sigmask() agreed, and I thought about this too. But again, I'd prefer to do this and other cleanups later, on top of this patch. Oleg.
From: Eric W. Biederman > Sent: 30 May 2019 14:01 > Oleg Nesterov <oleg@redhat.com> writes: > > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > If I have read this thread correctly the core issue is that ther is a > program that used to work and that fails now. The question is why. > > There are two ways the semantics for a sigmask changing system call > can be defined. The naive way and by reading posix. I will pick > on pselect. > > The naive way: > int pselect(int nfds, fd_set *readfds, fd_set *writefds, > fd_set *exceptfds, const struct timespec *timeout, > const sigset_t *sigmask) > { > sigset_t oldmask; > int ret; > > if (sigmask) > sigprocmask(SIG_SETMASK, sigmask, &oldmask); > > ret = select(nfds, readfds, writefds, exceptfds, timeout); > > if (sigmask) > sigprocmask(SIG_SETMASK, &oldmask, NULL); > > return ret; > } > > The standard for pselect behavior at > https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html > says: > > If sigmask is not a null pointer, then the pselect() function shall > > replace the signal mask of the caller by the set of signals pointed to > > by sigmask before examining the descriptors, and shall restore the > > signal mask of the calling thread before returning. Right, but that bit says nothing about when signals are 'delivered'. Section 2.4 of http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html isn't very enlightening either - well, pretty impenetrable really. But since the ToG specs don't require a user-kernel boundary I believe that the signal handler is expected to be called as soon as it is unmasked. Now, for async signals, the kernel can always pretend that they happened slightly later than they actually did. So signal delivery is delayed until 'return to user'. In some cases system calls are restarted to make the whole thing transparent. For pselect() etc I think this means: 1) Signal handlers can only be called if EINTR is returned. 2) If a signal is deliverable after the mask is changed then the signal hander must be called. This means EINTR must be returned. ie this must be detected before checking anything else. 3) A signal that is raised after the wait completes can be 'deemed' to have happened after the mask is restored and left masked until the applications next pselect() call. 4) As an optimisation a signal that arrives after the timer expires, but before the mask is restored can be 'deemed' to have happened before the timeout expired and EINTR returned. Now not all system calls that use a temporary mask may want to work that way. In particular they may want to return 'success' and have the signal handlers called. So maybe the set_xxx() function that installs to user's mask should return: -ve: errno value 0: no signal pending 1: signal pending. And the update_xxx() function that restores the saved mask should take a parameter to indicate whether signal handlers should be called maybe 'bool call_handlers' and return 0/1 depending on whether signals were pending. pselect() then contains: rval = set_xxx(); if (rval) { if (rval < 0) return rval; update_xxx(true); return -EINTR: } rval = wait....(); #if 0 // don't report late signals update_xxx(rval == -EINTR); #else // report signals after timeout if (update_xxx(!rval || rval == -EINTR)) rval == -EINTR; #endif return rval; } David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
ebiederm@xmission.com (Eric W. Biederman) writes: > Which means I believe we have a semantically valid change in behavior > that is causing a regression. I haven't made a survey of all of the functions yet but fucntions return -ENORESTARTNOHAND will never return -EINTR and are immune from this problem. AKA pselect is fine. While epoll_pwait can be affected. Has anyone contacted Omar Kilani to see if that is his issue? https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=Pa2HJUS3JS+G9VswFHNQQynPMHGVQ@mail.gmail.com/ So far the only regression report I am seeing is from Eric Wong. AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/ Are there any others? How did we get to be talking about more than just epoll_pwait? Eric
> On May 30, 2019, at 8:38 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > ebiederm@xmission.com (Eric W. Biederman) writes: > >> Which means I believe we have a semantically valid change in behavior >> that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. > > AKA pselect is fine. While epoll_pwait can be affected. This was my understanding as well. > Has anyone contacted Omar Kilani to see if that is his issue? > https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=Pa2HJUS3JS+G9VswFHNQQynPMHGVQ@mail.gmail.com/ Omar was cc-ed when this regression was reported. I did cc him on fix and asked if he could try it. We have not heard from him. > So far the only regression report I am seeing is from Eric Wong. > AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/ > Are there any others? How did we get to be talking about more > than just epoll_pwait? This is the only report that I know of. I’m not sure why people started talking about pselect. I was also confused why instead of reviewing the patch and discussing the fix, we ended up talking about how to simplify the code. We have deviated much from what should have been a code review. -Deepa
On 05/30, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > > Al, Linus, Eric, please help. > > > > The previous discussion was very confusing, we simply can not understand each > > other. > > > > To me everything looks very simple and clear, but perhaps I missed something > > obvious? Please correct me. > > If I have read this thread correctly the core issue is that ther is a > program that used to work and that fails now. The question is why. I didn't even try to investigate, I wasn't cc'ed initially and I then I had enough confusion when I started to look at the patch. However, 854a6ed56839a40f6 obviously introduced the user-visible change so I am not surprised it breaks something. And yes, personally I think this change is not right. > Which means I believe we have a semantically valid change in behavior > that is causing a regression. See below, > void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) > { > > if (!usigmask) > return; > /* > * When signals are pending, do not restore them here. > * Restoring sigmask here can lead to delivering signals that the above > * syscalls are intended to block because of the sigmask passed in. > */ > if (signal_pending(current)) { > current->saved_sigmask = *sigsaved; > set_restore_sigmask(); > return; > } > > /* > * This is needed because the fast syscall return path does not restore > * saved_sigmask when signals are not pending. > */ > set_current_blocked(sigsaved); > } > > Which has been reported results in a return value of 0, 0 or success > and a signal > delivered, where previously that did not happen. Yes. And to me this doesn't look right. OK, OK, probably this is because I never read the docs, only the source code in fs/select.c. But to me pselect() should either return success/timeout or deliver a signal. Not both. Even if the signal was already pending before pselect() was called. To clarify, "a signal" means a signal which was blocked before pselect(sigmask) and temporary unblocked in this syscall. And even if this doesn't violate posix, I see no reason to change the historic behaviour. And this regression probably means we can't ;) Oleg.
On 05/30, Eric W. Biederman wrote: > > ebiederm@xmission.com (Eric W. Biederman) writes: > > > Which means I believe we have a semantically valid change in behavior > > that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. Hmm. handle_signal: case -ERESTARTNOHAND: regs->ax = -EINTR; break; but I am not sure I understand which problem do you mean.. Oleg.
On 05/30, David Laight wrote: > > 4) As an optimisation a signal that arrives after the timer > expires, but before the mask is restored can be 'deemed' > to have happened before the timeout expired and EINTR > returned. This is what pselect/ppoll already does. Oleg.
From: Eric W. Biederman > Sent: 30 May 2019 16:38 > ebiederm@xmission.com (Eric W. Biederman) writes: > > > Which means I believe we have a semantically valid change in behavior > > that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. Eh? ERESTARTNOHAND just makes the system call restart if there is no signal handler, EINTR should still be returned if there is a handler. All the functions that have a temporary signal mask are likely to be expected to work the same way and thus have the same bugs. http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html# isn't overly helpful. But I think it should return EINTR even if there is no handler unless SA_RESTART is set: [EINTR] The function was interrupted while blocked waiting for any of the selected descriptors to become ready and before the timeout interval expired. If SA_RESTART has been set for the interrupting signal, it is implementation-defined whether the function restarts or returns with [EINTR]. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, May 30, 2019 at 8:48 AM Deepa Dinamani <deepa.kernel@gmail.com> wrote: > > > On May 30, 2019, at 8:38 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > ebiederm@xmission.com (Eric W. Biederman) writes: > > > >> Which means I believe we have a semantically valid change in behavior > >> that is causing a regression. > > > > I haven't made a survey of all of the functions yet but > > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > > immune from this problem. > > > > AKA pselect is fine. While epoll_pwait can be affected. > > This was my understanding as well. I think I was misremembered here. I had noted this before: https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=P3QQ@mail.gmail.com/ "sys_io_pgetevents() does not seem to have this problem as we are still checking signal_pending() here. sys_pselect6() seems to have a similar problem. The changes to sys_pselect6() also impact sys_select() as the changes are in the common code path." This was the code replaced for io_pgetevents by 854a6ed56839a40f6b is as below. No matter what events completed, there was signal_pending() check after the return from do_io_getevents(). --- a/fs/aio.c +++ b/fs/aio.c @@ -2110,18 +2110,9 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); - if (signal_pending(current)) { - if (ksig.sigmask) { - current->saved_sigmask = sigsaved; - set_restore_sigmask(); - } - - if (!ret) - ret = -ERESTARTNOHAND; - } else { - if (ksig.sigmask) - sigprocmask(SIG_SETMASK, &sigsaved, NULL); - } + restore_user_sigmask(ksig.sigmask, &sigsaved); + if (signal_pending(current) && !ret) + ret = -ERESTARTNOHAND; Can I ask a simple question for my understanding? man page for epoll_pwait says EINTR The call was interrupted by a signal handler before either any of the requested events occurred or the timeout expired; see signal(7). But it is not clear to me if we can figure out(without race) the chronological order if one of the requested events are completed or a signal came first. Is this a correct exectation? Also like pointed out above, this behavior is not consistent for all such syscalls(io_pgetevents). Was this also by design? -Deepa
Oleg Nesterov <oleg@redhat.com> writes: > On 05/30, Eric W. Biederman wrote: >> >> ebiederm@xmission.com (Eric W. Biederman) writes: >> >> > Which means I believe we have a semantically valid change in behavior >> > that is causing a regression. >> >> I haven't made a survey of all of the functions yet but >> fucntions return -ENORESTARTNOHAND will never return -EINTR and are >> immune from this problem. > > Hmm. handle_signal: > > case -ERESTARTNOHAND: > regs->ax = -EINTR; > break; > > but I am not sure I understand which problem do you mean.. Yes. My mistake. I looked at the transparent restart case for when a signal is not pending and failed to look at what happens when a signal is delivered. So yes. Everything changed does appear to have a behavioral difference where they can now succeed and not return -EINTR. Eric
On Thu, May 30, 2019 at 3:54 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > Arnd Bergmann <arnd@arndb.de> writes: > > On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov <oleg@redhat.com> wrote: > > > > Not sure about the order of the cleanups, but probably something like > > this would work: > > > > 1. fix the race (to be backported) > > 2. unify set_compat_user_sigmask/set_user_sigmask > > 3. remove unneeded compat handlers > > 4. replace restore_user_sigmask with restore_saved_sigmask_if() > > 5. also unify compat_get_fd_set()/get_fd_set() and kill off > > compat select() variants. > > Are new system calls added preventing a revert of the patch in question > for stable kernels? Yes, a straight revert would not work, as it was done as a cleanup in order to simplify the following conversion. I suppose one could undo the cleanup in both the time32 and time64 versions of each syscall, but I would consider that a more risky change than just fixing the bug that was reported. Arnd
On Thu, May 30, 2019 at 4:41 PM Oleg Nesterov <oleg@redhat.com> wrote: > On 05/30, Arnd Bergmann wrote: > Plus every file touched by this patch asks for more cleanups. Say, do_poll() > should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly > EINTR->ERESTARTNOHAND in its callers. And more. > > > For the stable > > kernels, I think we want just the addition of the 'bool interrupted' argument > > to restore_user_sigmask() > > or simply revert this patch. I will check if this is possible today... At first > glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did > "ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can > turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow. Right, there were several differences between the system calls that Deepa's original change got rid of. I don't know if any ones besides the do_pselect() return code can be observed in practice. > > > - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); > > > + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); > > > if (ret) > > > return ret; > > > > > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); > > > - restore_user_sigmask(ksig.sigmask, &sigsaved); > > > - if (signal_pending(current) && !ret) > > > + > > > + interrupted = signal_pending(current); > > > + update_xxx(interrupted); > > > > Maybe name this > > > > restore_saved_sigmask_if(!interrupted); > > Yes, I thought about restore_if(), but to me > > restore_saved_sigmask_if(ret != -EINTR); > > doesn't look readable... May be > > restore_saved_sigmask_unless(ret == -EINTR); > > ? but actually I agree with any naming. Yes, restore_saved_sigmask_unless() probably better. > > With some of the recent discussions about compat syscall handling, > > I now think that we want to just fold set_compat_user_sigmask() > > into set_user_sigmask() > > agreed, and I thought about this too. But again, I'd prefer to do this > and other cleanups later, on top of this patch. Ok, fair enough. I don't care much about the order as long as the regression fix comes first. Arnd
"Eric W. Biederman" <ebiederm@xmission.com> wrote: > Frankly the only reason this appears to be worth touching is that we > have a userspace regression. Otherwise I would call the current > behavior more correct and desirable than ignoring the signal longer. > > If I am reading sitaution properly I suggest we go back to resoring the > sigmask by hand in epoll_pwait, and put in a big fat comment about a > silly userspace program depending on that behavior. > > That will be easier to backport and it will just fix the regression and > not overfix the problem for reasonable applications. Fwiw, the cmogstored (userspace program which regressed) only hit this regression in an obscure code path for tuning; that code path probably sees no use outside of the test suite. Add to that, cmogstored is an unofficial and optional component to the obscure, largely-forgotten MogileFS. Finally, the main users of cmogstored are on FreeBSD. They hit the kqueue+self-pipe code path instead of epoll_pwait. I only used epoll_pwait on Linux since I figured I could save a few bytes of memory by skipping eventfd/self-pipe... Anyways, I updated cmogstored a few weeks back to call `note_run' (the signal dispatcher) if epoll_pwait (wrapped by `mog_idleq_wait_intr') returns 0 instead of -1 (EINTR) to workaround this kernel change: https://bogomips.org/cmogstored-public/20190511075630.17811-1-e@80x24.org/ I could easily make a change to call `note_run' unconditionally when `mog_idleq_wait_intr' returns, too. But that said, there could be other users hitting the same problem I did. So maybe cmogstored's primary use on Linux these days is finding regressions :>
diff --git a/fs/aio.c b/fs/aio.c index 3490d1f..8315bd2 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents, const struct __aio_sigset __user *, usig) { struct __aio_sigset ksig = { NULL, }; - sigset_t ksigmask, sigsaved; struct timespec64 ts; + bool interrupted; int ret; if (timeout && unlikely(get_timespec64(&ts, timeout))) @@ -2103,13 +2103,15 @@ SYSCALL_DEFINE6(io_pgetevents, if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) return -EFAULT; - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); if (ret) return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); - restore_user_sigmask(ksig.sigmask, &sigsaved); - if (signal_pending(current) && !ret) + + interrupted = signal_pending(current); + update_xxx(interrupted); + if (interrupted && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2126,8 +2128,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, const struct __aio_sigset __user *, usig) { struct __aio_sigset ksig = { NULL, }; - sigset_t ksigmask, sigsaved; struct timespec64 ts; + bool interrupted; int ret; if (timeout && unlikely(get_old_timespec32(&ts, timeout))) @@ -2137,13 +2139,15 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return -EFAULT; - ret = set_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); + ret = set_xxx(ksig.sigmask, ksig.sigsetsize); if (ret) return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL); - restore_user_sigmask(ksig.sigmask, &sigsaved); - if (signal_pending(current) && !ret) + + interrupted = signal_pending(current); + update_xxx(interrupted); + if (interrupted && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2191,8 +2195,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, const struct __compat_aio_sigset __user *, usig) { struct __compat_aio_sigset ksig = { NULL, }; - sigset_t ksigmask, sigsaved; struct timespec64 t; + bool interrupted; int ret; if (timeout && get_old_timespec32(&t, timeout)) @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) return -EFAULT; - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); if (ret) return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL); - restore_user_sigmask(ksig.sigmask, &sigsaved); - if (signal_pending(current) && !ret) + + interrupted = signal_pending(current); + update_xxx(interrupted); + if (interrupted && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2224,8 +2230,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, const struct __compat_aio_sigset __user *, usig) { struct __compat_aio_sigset ksig = { NULL, }; - sigset_t ksigmask, sigsaved; struct timespec64 t; + bool interrupted; int ret; if (timeout && get_timespec64(&t, timeout)) @@ -2234,13 +2240,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, if (usig && copy_from_user(&ksig, usig, sizeof(ksig))) return -EFAULT; - ret = set_compat_user_sigmask(ksig.sigmask, &ksigmask, &sigsaved, ksig.sigsetsize); + ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize); if (ret) return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL); - restore_user_sigmask(ksig.sigmask, &sigsaved); - if (signal_pending(current) && !ret) + + interrupted = signal_pending(current); + update_xxx(interrupted); + if (interrupted && !ret) ret = -ERESTARTNOHAND; return ret; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4a0e98d..0b1a337 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2318,19 +2318,17 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, size_t, sigsetsize) { int error; - sigset_t ksigmask, sigsaved; /* * If the caller wants a certain signal mask to be set during the wait, * we apply it here. */ - error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + error = set_xxx(sigmask, sigsetsize); if (error) return error; error = do_epoll_wait(epfd, events, maxevents, timeout); - - restore_user_sigmask(sigmask, &sigsaved); + update_xxx(error == -EINTR); return error; } @@ -2343,19 +2341,17 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd, compat_size_t, sigsetsize) { long err; - sigset_t ksigmask, sigsaved; /* * If the caller wants a certain signal mask to be set during the wait, * we apply it here. */ - err = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + err = set_compat_xxx(sigmask, sigsetsize); if (err) return err; err = do_epoll_wait(epfd, events, maxevents, timeout); - - restore_user_sigmask(sigmask, &sigsaved); + update_xxx(err == -EINTR); return err; } diff --git a/fs/io_uring.c b/fs/io_uring.c index 310f8d1..b5b99e2 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2180,7 +2180,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, const sigset_t __user *sig, size_t sigsz) { struct io_cq_ring *ring = ctx->cq_ring; - sigset_t ksigmask, sigsaved; int ret; if (io_cqring_events(ring) >= min_events) @@ -2189,24 +2188,21 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, if (sig) { #ifdef CONFIG_COMPAT if (in_compat_syscall()) - ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig, - &ksigmask, &sigsaved, sigsz); + ret = set_compat_xxx((const compat_sigset_t __user *)sig, + sigsz); else #endif - ret = set_user_sigmask(sig, &ksigmask, - &sigsaved, sigsz); + ret = set_xxx(sig, sigsz); if (ret) return ret; } ret = wait_event_interruptible(ctx->wait, io_cqring_events(ring) >= min_events); + update_xxx(ret == -ERESTARTSYS); if (ret == -ERESTARTSYS) ret = -EINTR; - if (sig) - restore_user_sigmask(sig, &sigsaved); - return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0; } diff --git a/fs/select.c b/fs/select.c index 6cbc9ff..7eab132 100644 --- a/fs/select.c +++ b/fs/select.c @@ -730,7 +730,6 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp, const sigset_t __user *sigmask, size_t sigsetsize, enum poll_time_type type) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -753,15 +752,14 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp, return -EINVAL; } - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_xxx(sigmask, sigsetsize); if (ret) return ret; ret = core_sys_select(n, inp, outp, exp, to); + update_xxx(ret == -ERESTARTNOHAND); ret = poll_select_copy_remaining(&end_time, tsp, type, ret); - restore_user_sigmask(sigmask, &sigsaved); - return ret; } @@ -1087,7 +1085,6 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds, struct __kernel_timespec __user *, tsp, const sigset_t __user *, sigmask, size_t, sigsetsize) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -1100,18 +1097,16 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds, return -EINVAL; } - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_xxx(sigmask, sigsetsize); if (ret) return ret; ret = do_sys_poll(ufds, nfds, to); - restore_user_sigmask(sigmask, &sigsaved); - + update_xxx(ret == -EINTR); /* We can restart this syscall, usually */ if (ret == -EINTR) ret = -ERESTARTNOHAND; - ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret); return ret; @@ -1123,7 +1118,6 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds, struct old_timespec32 __user *, tsp, const sigset_t __user *, sigmask, size_t, sigsetsize) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -1136,18 +1130,16 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds, return -EINVAL; } - ret = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_xxx(sigmask, sigsetsize); if (ret) return ret; ret = do_sys_poll(ufds, nfds, to); - restore_user_sigmask(sigmask, &sigsaved); - + update_xxx(ret == -EINTR); /* We can restart this syscall, usually */ if (ret == -EINTR) ret = -ERESTARTNOHAND; - ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret); return ret; @@ -1322,7 +1314,6 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp, void __user *tsp, compat_sigset_t __user *sigmask, compat_size_t sigsetsize, enum poll_time_type type) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -1345,15 +1336,14 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp, return -EINVAL; } - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_compat_xxx(sigmask, sigsetsize); if (ret) return ret; ret = compat_core_sys_select(n, inp, outp, exp, to); + update_xxx(ret == -ERESTARTNOHAND); ret = poll_select_copy_remaining(&end_time, tsp, type, ret); - restore_user_sigmask(sigmask, &sigsaved); - return ret; } @@ -1406,7 +1396,6 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds, struct old_timespec32 __user *, tsp, const compat_sigset_t __user *, sigmask, compat_size_t, sigsetsize) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -1419,18 +1408,16 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, return -EINVAL; } - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_compat_xxx(sigmask, sigsetsize); if (ret) return ret; ret = do_sys_poll(ufds, nfds, to); - restore_user_sigmask(sigmask, &sigsaved); - + update_xxx(ret == -EINTR); /* We can restart this syscall, usually */ if (ret == -EINTR) ret = -ERESTARTNOHAND; - ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret); return ret; @@ -1442,7 +1429,6 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds, unsigned int, nfds, struct __kernel_timespec __user *, tsp, const compat_sigset_t __user *, sigmask, compat_size_t, sigsetsize) { - sigset_t ksigmask, sigsaved; struct timespec64 ts, end_time, *to = NULL; int ret; @@ -1455,18 +1441,16 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds, return -EINVAL; } - ret = set_compat_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize); + ret = set_compat_xxx(sigmask, sigsetsize); if (ret) return ret; ret = do_sys_poll(ufds, nfds, to); - restore_user_sigmask(sigmask, &sigsaved); - + update_xxx(ret == -EINTR); /* We can restart this syscall, usually */ if (ret == -EINTR) ret = -ERESTARTNOHAND; - ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret); return ret; diff --git a/include/linux/compat.h b/include/linux/compat.h index ebddcb6..b20b001 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -138,9 +138,7 @@ typedef struct { compat_sigset_word sig[_COMPAT_NSIG_WORDS]; } compat_sigset_t; -int set_compat_user_sigmask(const compat_sigset_t __user *usigmask, - sigset_t *set, sigset_t *oldset, - size_t sigsetsize); +int set_compat_xxx(const compat_sigset_t __user *umask, size_t sigsetsize); struct compat_sigaction { #ifndef __ARCH_HAS_IRIX_SIGACTION diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 38a0f07..8b5b537 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -417,7 +417,6 @@ void task_join_group_stop(struct task_struct *task); static inline void set_restore_sigmask(void) { set_thread_flag(TIF_RESTORE_SIGMASK); - WARN_ON(!test_thread_flag(TIF_SIGPENDING)); } static inline void clear_tsk_restore_sigmask(struct task_struct *task) @@ -448,7 +447,6 @@ static inline bool test_and_clear_restore_sigmask(void) static inline void set_restore_sigmask(void) { current->restore_sigmask = true; - WARN_ON(!test_thread_flag(TIF_SIGPENDING)); } static inline void clear_tsk_restore_sigmask(struct task_struct *task) { diff --git a/include/linux/signal.h b/include/linux/signal.h index 9702016..65f84ac 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -273,10 +273,8 @@ extern int group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type); extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *); extern int sigprocmask(int, sigset_t *, sigset_t *); -extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set, - sigset_t *oldset, size_t sigsetsize); -extern void restore_user_sigmask(const void __user *usigmask, - sigset_t *sigsaved); +extern int set_xxx(const sigset_t __user *umask, size_t sigsetsize); +extern void update_xxx(bool interupted); extern void set_current_blocked(sigset_t *); extern void __set_current_blocked(const sigset_t *); extern int show_unhandled_signals; diff --git a/kernel/signal.c b/kernel/signal.c index d7b9d14..0a1ec68 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2861,79 +2861,56 @@ EXPORT_SYMBOL(sigprocmask); * * This is useful for syscalls such as ppoll, pselect, io_pgetevents and * epoll_pwait where a new sigmask is passed from userland for the syscalls. + * + * Note that it does set_restore_sigmask() in advance, so it must be always + * paired with update_xxx() before return from syscall. */ -int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set, - sigset_t *oldset, size_t sigsetsize) +int set_xxx(const sigset_t __user *umask, size_t sigsetsize) { - if (!usigmask) - return 0; + sigset_t *kmask; + if (!umask) + return 0; if (sigsetsize != sizeof(sigset_t)) return -EINVAL; - if (copy_from_user(set, usigmask, sizeof(sigset_t))) + if (copy_from_user(kmask, umask, sizeof(sigset_t))) return -EFAULT; - *oldset = current->blocked; - set_current_blocked(set); + set_restore_sigmask(); + current->saved_sigmask = current->blocked; + set_current_blocked(kmask); return 0; } -EXPORT_SYMBOL(set_user_sigmask); + +void update_xxx(bool interrupted) +{ + if (interrupted) + WARN_ON(!test_thread_flag(TIF_SIGPENDING)); + else + restore_saved_sigmask(); +} #ifdef CONFIG_COMPAT -int set_compat_user_sigmask(const compat_sigset_t __user *usigmask, - sigset_t *set, sigset_t *oldset, - size_t sigsetsize) +int set_compat_xxx(const compat_sigset_t __user *umask, size_t sigsetsize) { - if (!usigmask) - return 0; + sigset_t *kmask; + if (!umask) + return 0; if (sigsetsize != sizeof(compat_sigset_t)) return -EINVAL; - if (get_compat_sigset(set, usigmask)) + if (get_compat_sigset(kmask, umask)) return -EFAULT; - *oldset = current->blocked; - set_current_blocked(set); + set_restore_sigmask(); + current->saved_sigmask = current->blocked; + set_current_blocked(kmask); return 0; } -EXPORT_SYMBOL(set_compat_user_sigmask); #endif -/* - * restore_user_sigmask: - * usigmask: sigmask passed in from userland. - * sigsaved: saved sigmask when the syscall started and changed the sigmask to - * usigmask. - * - * This is useful for syscalls such as ppoll, pselect, io_pgetevents and - * epoll_pwait where a new sigmask is passed in from userland for the syscalls. - */ -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) -{ - - if (!usigmask) - return; - /* - * When signals are pending, do not restore them here. - * Restoring sigmask here can lead to delivering signals that the above - * syscalls are intended to block because of the sigmask passed in. - */ - if (signal_pending(current)) { - current->saved_sigmask = *sigsaved; - set_restore_sigmask(); - return; - } - - /* - * This is needed because the fast syscall return path does not restore - * saved_sigmask when signals are not pending. - */ - set_current_blocked(sigsaved); -} -EXPORT_SYMBOL(restore_user_sigmask); - /** * sys_rt_sigprocmask - change the list of currently blocked signals * @how: whether to add, remove, or set signals