Message ID | 20190109114744.10936-6-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] x86: load FPU registers on return to userland | expand |
On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote: > Since ->initialized is always true for user tasks and kernel threads > don't get this far, Yeah, this is commit message is too laconic. Don'g get this far "where"? > we always save the registers directly to userspace. We don't save registers to userspace - please write stuff out. So from looking at what you're removing I can barely rhyme up what you're doing but this needs a lot more explanation why it is OK to remove the else case. Hell, why was the thing added in the first place if ->initialized is always true? And why is it ok to save registers directly to the user task's buffers? So please be more verbose even at the risk of explaning the obvious. This is the FPU code, remember? :) Thx.
On 2019-01-16 20:36:03 [+0100], Borislav Petkov wrote: > On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote: > > Since ->initialized is always true for user tasks and kernel threads > > don't get this far, > > Yeah, this is commit message is too laconic. Don'g get this far "where"? To reach copy_fpregs_to_sigframe(). A kernel thread never invokes copy_fpregs_to_sigframe(). Which means only user threads invoke copy_fpregs_to_sigframe() and they have ->initialized set to one. > > we always save the registers directly to userspace. > > We don't save registers to userspace - please write stuff out. Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to task's stack frame which is userspace memory. > So from looking at what you're removing I can barely rhyme up what > you're doing but this needs a lot more explanation why it is OK to > remove the else case. Hell, why was the thing added in the first place > if ->initialized is always true? I think *parts* of the ->initialized field was wrongly converted while lazy-FPU was removed *or* it was forgotten to be removed afterwards. Or I don't know but it looks like a leftover. At the beginning (while it was added) it was part of the lazy-FPU code. So if tasks's FPU register are not active then they are saved in task's FPU struct. So in this case (the else path) it does __copy_to_user(buf_fx, xsave, fpu_user_xstate_size) In the other case (task's FPU struct is not up-to date, the current FPU register content is in CPU's registers) it does copy_fpregs_to_sigframe(buf_fx) which copies CPU's registers. In both cases it copies them (the FPU registers) to the task's stack frame (the same location). Easy so far? How does using_compacted_format() fit in here? The point is that the "compacted" format is never exposed to userland so it requires normal xsave. So far so good, right? But how does it work in in the '->initialized = 0' case right? It was introduced in commit 99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use") and it probably does not explain why this works, right? So *either* fpregs_active() was always true if the task used FPU *once* or if it used FPU *recently* and task's FPU register are active (I don't remember anymore). Anyway: a) we don't get here because caller checks for fpregs_active() before invoking copy_fpstate_to_sigframe() b) a preemption check resets fpregs_active() after the first check then we do "xsave", xsaves traps because FPU is off/disabled, trap loads task's FPU registers, gets back to "xsave", "xsave" saves CPU's register to the stack frame. The b part does not work like that since commit bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error") but then at that point it was "okay" because fpregs_active() would return true if the task used FPU registers at least once. If it did not use them then it would not invoke that function (the caller checks for fpregs_active()). > And why is it ok to save registers directly to the user task's buffers? So I can't tell you why it is okay but I can explain why it is done (well, that part I puzzled together). The task is running and using FPU registers. Then an evil mind sends a signal. The task goes into kernel, prepares itself and is about to handle the signal in userland. It saves its FPU registers on the stack frame. It zeros its current FPU registers (ready for a fresh start), loads the address of the signal handler and returns to user land handling the signal. Now. The signal handler may use FPU registers and the signal handler maybe be preempted so you need to save the FPU registers of the signal handler and you can't mix them up with the FPU register's of the task (before it started handling the signal). So in order to avoid a second FPU struct it saves them on user's stack frame. I *think* this (avoiding a second FPU struct) is the primary motivation. A bonus point might be that the signal handler has a third argument the `context'. That means you can use can access the task's FPU registers from the signal handler. Not sure *why* you want to do so but yo can. I can't imagine a use case and I was looking for a user and expecting it to be glibc but I didn't find anything in the glibc that would explain it. Intel even defines a few bytes as "user reserved" which are used by "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on restore. The only user that seems to make use of that is `criu' (or it looked like it does use it). I would prefer to add a second struct-FPU and use that for the signal handler. This would avoid the whole dance here. And `criu' could maybe become a proper interface. I don't think as of now that it will break something in userland if the signal handler suddenly does not have a pointer to the FPU struct. > So please be more verbose even at the risk of explaning the obvious. > This is the FPU code, remember? :) Okay. So I was verbose *now*. Depending on what you say (or don't) I will try to recycle this into commit message in a few days. > Thx. Sebastian
On Wed, Jan 16, 2019 at 11:40:37PM +0100, Sebastian Andrzej Siewior wrote: > Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to > task's stack frame which is userspace memory. I know we do - I was only pointing at the not optimal choice of words - "save registers to userspace" and to rather say "save hardware registers to user buffers" or so. > I think *parts* of the ->initialized field was wrongly converted while > lazy-FPU was removed *or* it was forgotten to be removed afterwards. Or > I don't know but it looks like a leftover. > > At the beginning (while it was added) it was part of the lazy-FPU code. > So if tasks's FPU register are not active then they are saved in task's > FPU struct. So in this case (the else path) it does > __copy_to_user(buf_fx, xsave, fpu_user_xstate_size) So far, so good. Comment above says so too: * If the fpu, extended register state is live, save the state directly * to the user frame pointed by the aligned pointer 'buf_fx'. Otherwise, * copy the thread's fpu state to the user frame starting at 'buf_fx'. > In the other case (task's FPU struct is not up-to date, the current > FPU register content is in CPU's registers) it does > copy_fpregs_to_sigframe(buf_fx) ACK. > How does using_compacted_format() fit in here? > The point is that the "compacted" format is never exposed to > userland so it requires normal xsave. So far so good, right? But how > does it work in in the '->initialized = 0' case right? It was > introduced in commit > 99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use") > > and it probably does not explain why this works, right? I think this was imposed by our inability to handle XSAVES compacted format. And that should be fixed now, AFAICR. > So *either* fpregs_active() was always true if the task used FPU *once* > or if it used FPU *recently* and task's FPU register are active (I don't > remember anymore). Anyway: > a) we don't get here because caller checks for fpregs_active() before > invoking copy_fpstate_to_sigframe() Ok. > b) a preemption check resets fpregs_active() after the first check > then we do "xsave", xsaves traps because FPU is off/disabled, trap > loads task's FPU registers, gets back to "xsave", "xsave" saves > CPU's register to the stack frame. > > The b part does not work like that since commit > bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error") > > but then at that point it was "okay" because fpregs_active() would > return true if the task used FPU registers at least once. If it did not > use them then it would not invoke that function (the caller checks for > fpregs_active()). Right, AFAICT, we were moving to eager FPU at that time and this commit is part of the lazy FPU removal stuff. > So I can't tell you why it is okay but I can explain why it is done > (well, that part I puzzled together). I hate the fact that we have to puzzle stuff together for the FPU code. ;-\ > The task is running and using FPU registers. Then an evil mind sends a > signal. The task goes into kernel, prepares itself and is about to > handle the signal in userland. It saves its FPU registers on the stack > frame. It zeros its current FPU registers (ready for a fresh start), > loads the address of the signal handler and returns to user land > handling the signal. > > Now. The signal handler may use FPU registers and the signal handler > maybe be preempted so you need to save the FPU registers of the signal > handler and you can't mix them up with the FPU register's of the task > (before it started handling the signal). > > So in order to avoid a second FPU struct it saves them on user's stack > frame. I *think* this (avoiding a second FPU struct) is the primary > motivation. Yah, makes sense. Sounds like something we'd do :-) > A bonus point might be that the signal handler has a third > argument the `context'. That means you can use can access the task's FPU > registers from the signal handler. Not sure *why* you want to do so but > yo can. For <raisins>. > I can't imagine a use case and I was looking for a user and expecting it > to be glibc but I didn't find anything in the glibc that would explain > it. Intel even defines a few bytes as "user reserved" which are used by > "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on > restore. > The only user that seems to make use of that is `criu' (or it looked > like it does use it). I would prefer to add a second struct-FPU and use > that for the signal handler. This would avoid the whole dance here. That would be interesting from the perspective of making the code straight-forward and not having to document all that dance somewhere. > And `criu' could maybe become a proper interface. I don't think as of > now that it will break something in userland if the signal handler > suddenly does not have a pointer to the FPU struct. Well, but allocating a special FPU pointer for the signal handler context sounds simple and clean, no? Or are we afraid that that would slowdown signal handling, the whole allocation and assignment and stuff...? > Okay. So I was verbose *now*. Depending on what you say (or don't) I > will try to recycle this into commit message in a few days. Yeah, much much better. Thanks a lot for the effort!
tl;dr The kernel saves task's FPU registers on user's signal stack before entering the signal handler. Can we avoid that and have in-kernel memory for that? Does someone rely on the FPU registers from the task in the signal handler? On 2019-01-17 13:22:53 [+0100], Borislav Petkov wrote: > > The task is running and using FPU registers. Then an evil mind sends a > > signal. The task goes into kernel, prepares itself and is about to > > handle the signal in userland. It saves its FPU registers on the stack > > frame. It zeros its current FPU registers (ready for a fresh start), > > loads the address of the signal handler and returns to user land > > handling the signal. > > > > Now. The signal handler may use FPU registers and the signal handler > > maybe be preempted so you need to save the FPU registers of the signal > > handler and you can't mix them up with the FPU register's of the task > > (before it started handling the signal). > > > > So in order to avoid a second FPU struct it saves them on user's stack > > frame. I *think* this (avoiding a second FPU struct) is the primary > > motivation. > > Yah, makes sense. Sounds like something we'd do :-) > > > A bonus point might be that the signal handler has a third > > argument the `context'. That means you can use can access the task's FPU > > registers from the signal handler. Not sure *why* you want to do so but > > yo can. > > For <raisins>. > > > I can't imagine a use case and I was looking for a user and expecting it > > to be glibc but I didn't find anything in the glibc that would explain > > it. Intel even defines a few bytes as "user reserved" which are used by > > "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on > > restore. > > The only user that seems to make use of that is `criu' (or it looked > > like it does use it). I would prefer to add a second struct-FPU and use > > that for the signal handler. This would avoid the whole dance here. > > That would be interesting from the perspective of making the code > straight-forward and not having to document all that dance somewhere. > > > And `criu' could maybe become a proper interface. I don't think as of > > now that it will break something in userland if the signal handler > > suddenly does not have a pointer to the FPU struct. > > Well, but allocating a special FPU pointer for the signal handler > context sounds simple and clean, no? Or are we afraid that that would > slowdown signal handling, the whole allocation and assignment and > stuff...? So I *think* we could allocate a second struct fpu for the signal handler at task creation time and use it. It should not slow-down signal handling. So instead saving it to user's stack we would save it to "our" memory. On the restore path we could trust our buffer and simply load it again. Sebastian
On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote: > The kernel saves task's FPU registers on user's signal stack before > entering the signal handler. Can we avoid that and have in-kernel memory > for that? Does someone rely on the FPU registers from the task in the > signal handler? This is part of our ABI for *sure*. Inspecting that state is how userspace makes sense of MPX or protection keys faults. We even use this in selftests/.
On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote: > On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote: > > The kernel saves task's FPU registers on user's signal stack before > > entering the signal handler. Can we avoid that and have in-kernel memory > > for that? Does someone rely on the FPU registers from the task in the > > signal handler? > > This is part of our ABI for *sure*. I missed that part. I will try to look it up and look see if says something about optional part. But ABI means we must keep doing it even if there are no users? > Inspecting that state is how > userspace makes sense of MPX or protection keys faults. We even use > this in selftests/. Okay. MPX does not check for FP_XSTATE_MAGIC[12] and simply assumes it is there. That is why I didn't find it. So we would break MPX. But then MPX is on its way out, so… Sebastian
On 1/18/19 1:37 PM, Sebastian Andrzej Siewior wrote: > On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote: >> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote: >>> The kernel saves task's FPU registers on user's signal stack before >>> entering the signal handler. Can we avoid that and have in-kernel memory >>> for that? Does someone rely on the FPU registers from the task in the >>> signal handler? >> >> This is part of our ABI for *sure*. > > I missed that part. I will try to look it up and look see if says > something about optional part. > But ABI means we must keep doing it even if there are no users? I'd bet a large sum of money there are users. Google for 'uc_mcontext fpregs'.
On 01/18, Dave Hansen wrote: > > On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote: > > The kernel saves task's FPU registers on user's signal stack before > > entering the signal handler. Can we avoid that and have in-kernel memory > > for that? Does someone rely on the FPU registers from the task in the > > signal handler? > > This is part of our ABI for *sure*. Inspecting that state is how > userspace makes sense of MPX or protection keys faults. We even use > this in selftests/. Yes. And in any case I do not understand the idea to use the second in-kernel struct fpu. A signal handler can be interrupted by another signal, this will need to save/restore the FPU state again. Oleg.
On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote: > And in any case I do not understand the idea to use the second > in-kernel struct fpu. A signal handler can be interrupted by another > signal, this will need to save/restore the FPU state again. Well, we were just speculating whether doing that would simplify the code around get_sigframe() et al. But if that is an ABI, then we can't really touch it. Btw, where is that whole ABI deal about saving FPU regs on the user signal stack documented? Thx.
On 01/22, Borislav Petkov wrote: > > On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote: > > And in any case I do not understand the idea to use the second > > in-kernel struct fpu. A signal handler can be interrupted by another > > signal, this will need to save/restore the FPU state again. > > Well, we were just speculating whether doing that would simplify the > code around get_sigframe() et al. But if that is an ABI, then we can't > really touch it. > > Btw, where is that whole ABI deal about saving FPU regs on the user > signal stack documented? I don't know... tried to google, found nothing. the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical reasons, this didn't help. Oleg.
On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote: > I don't know... tried to google, found nothing. > > the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical > reasons, this didn't help. So I'm being told by one of the psABI folks that this is not really written down somewhere explicitly but it is the result from the POSIX and psABI treatise of signal handlers, what they're supposed to do, caller- and callee-saved registers, etc. And FPU registers are volatile, i.e., caller-saved. Which means, the handler itself doesn't save them but the caller, which, doesn't really expect any signals - they are async. So the kernel must do that and slap the FPU regs onto the user stack... Hohumm. Makes sense.
On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote: > > This is part of our ABI for *sure*. Inspecting that state is how > > userspace makes sense of MPX or protection keys faults. We even use > > this in selftests/. > > Yes. > > And in any case I do not understand the idea to use the second in-kernel struct fpu. > A signal handler can be interrupted by another signal, this will need to save/restore > the FPU state again. So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the current signal is handled. So no interruption. But then SIGSEGV is probably the exception which will interrupt SIGUSR1. So we would need a third one… The idea was to save the FPU state in-kernel so we don't have to revalidate everything because userspace had access to it and could do things. Some things are complicated and not documented why they are the way they are. For instance on 64bit (based on the code) the signal handler can remove SSE from the state-mask and the kernel loads the "default-empty" SSE registers and the enabled states from user. This isn't done on 32bit. Also: we save with XSAVE and allocate the buffer on stack. But if we can't find the FP_XSTATE_MAGIC* or the buffer is not properly aligned then we fallback to FXSR and assume that we have a FXSR buffer in front of us. > Oleg. Sebastian
On 2019-01-22 18:00:23 [+0100], Borislav Petkov wrote: > On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote: > > I don't know... tried to google, found nothing. > > > > the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical > > reasons, this didn't help. > > So I'm being told by one of the psABI folks that this is not really > written down somewhere explicitly but it is the result from the POSIX > and psABI treatise of signal handlers, what they're supposed to do, > caller- and callee-saved registers, etc. > > And FPU registers are volatile, i.e., caller-saved. Which means, the > handler itself doesn't save them but the caller, which, doesn't really > expect any signals - they are async. So the kernel must do that and > slap the FPU regs onto the user stack... My point was save them somewhere else if it is possible. So we could save a few cycles during signal delivery and it would make the code a little simpler. Let me finish the series and then we can think how we could improve it. > Hohumm. Makes sense. > Sebastian
Hi Sebastian, Sorry, I just noticed your email... On 02/05, Sebastian Andrzej Siewior wrote: > > On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote: > > > This is part of our ABI for *sure*. Inspecting that state is how > > > userspace makes sense of MPX or protection keys faults. We even use > > > this in selftests/. > > > > Yes. > > > > And in any case I do not understand the idea to use the second in-kernel struct fpu. > > A signal handler can be interrupted by another signal, this will need to save/restore > > the FPU state again. > > So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the > current signal is handled. So no interruption. But then SIGSEGV is > probably the exception which will interrupt SIGUSR1. So we would need a > third one… I guess you do not need my answer, but just in case. SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt the handler if SA_NODEFER was used. > The idea was to save the FPU state in-kernel so we don't have to > revalidate everything because userspace had access to it and could do > things. I understand, but this simply can't work, see above. Oleg.
On 2019-02-26 17:38:22 [+0100], Oleg Nesterov wrote: > Hi Sebastian, Hi Oleg, > Sorry, I just noticed your email... no worries. > > So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the > > current signal is handled. So no interruption. But then SIGSEGV is > > probably the exception which will interrupt SIGUSR1. So we would need a > > third one… > > I guess you do not need my answer, but just in case. > > SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other > signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt > the handler if SA_NODEFER was used. okay, understood. My understanding was that since signal sending is not very deterministic and as such you can't reliably control if one signal arrived before the other finished it should not matter. But well, this all is gone now… Thank you for the explanation. > Oleg. Sebastian
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index c0cdcb9b7de5a..c136a4327659d 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -157,7 +157,6 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf) int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size) { struct fpu *fpu = ¤t->thread.fpu; - struct xregs_state *xsave = &fpu->state.xsave; struct task_struct *tsk = current; int ia32_fxstate = (buf != buf_fx); @@ -172,29 +171,12 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size) sizeof(struct user_i387_ia32_struct), NULL, (struct _fpstate_32 __user *) buf) ? -1 : 1; - if (fpu->initialized || using_compacted_format()) { - /* Save the live register state to the user directly. */ - if (copy_fpregs_to_sigframe(buf_fx)) - return -1; - /* Update the thread's fxstate to save the fsave header. */ - if (ia32_fxstate) - copy_fxregs_to_kernel(fpu); - } else { - /* - * It is a *bug* if kernel uses compacted-format for xsave - * area and we copy it out directly to a signal frame. It - * should have been handled above by saving the registers - * directly. - */ - if (boot_cpu_has(X86_FEATURE_XSAVES)) { - WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n"); - return -1; - } - - fpstate_sanitize_xstate(fpu); - if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size)) - return -1; - } + /* Save the live register state to the user directly. */ + if (copy_fpregs_to_sigframe(buf_fx)) + return -1; + /* Update the thread's fxstate to save the fsave header. */ + if (ia32_fxstate) + copy_fxregs_to_kernel(fpu); /* Save the fsave header for the 32-bit frames. */ if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
Since ->initialized is always true for user tasks and kernel threads don't get this far, we always save the registers directly to userspace. Remove check for ->initialized because it is always true and remove the false condition. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- arch/x86/kernel/fpu/signal.c | 30 ++++++------------------------ 1 file changed, 6 insertions(+), 24 deletions(-)