Message ID | 20190221115020.12385-15-bigeasy@linutronix.de |
---|---|
State | New, archived |
Series | [01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() |
On 2/21/19 3:50 AM, Sebastian Andrzej Siewior wrote:
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 67e4805bccb6f..05f6fce62e9f1 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -562,8 +562,24 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   */
>  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
>  {
> -	if (static_cpu_has(X86_FEATURE_FPU))
> -		__fpregs_load_activate(new_fpu, cpu);
> +	struct pkru_state *pk;
> +	u32 pkru_val = 0;
> +
> +	if (!static_cpu_has(X86_FEATURE_FPU))
> +		return;
> +
> +	__fpregs_load_activate(new_fpu, cpu);

This is still a bit light on comments.

Maybe:

	/* PKRU state is switched eagerly because... */

> +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
> +		return;
> +
> +	if (current->mm) {
> +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> +		WARN_ON_ONCE(!pk);

This can trip on us if the 'init optimization' is in play because get_xsave_addr() checks xsave->header.xfeatures. That's unlikely today because we usually set PKRU to a restrictive value. But, it's also not *guaranteed*.

Userspace could easily do an XRSTOR that puts PKRU back in its init state if it wanted to; then this would end up with pk==NULL.

We might actually want a selftest that *does* that. I don't think we have one.

> +		if (pk)
> +			pkru_val = pk->pkru;
> +	}
> +	__write_pkru(pkru_val);
>  }

A comment above __write_pkru() would be nice to say that it only actually does the slow instruction on changes to the value.

BTW, this has the implicit behavior of always trying to do a __write_pkru(0) on switches to kernel threads. That seems a bit weird and it is likely to impose WRPKRU overhead on switches between user and kernel threads.

The 0 value is also the most permissive, which is not great considering that user mm's can be active in the page tables when running kernel threads if we're being lazy.

Seems like we should either leave PKRU alone or have 'init_pkru_value' be the default. That gives good security properties and is likely to match the application value, removing the WRPKRU overhead.
On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
> On 2/21/19 3:50 AM, Sebastian Andrzej Siewior wrote:
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 67e4805bccb6f..05f6fce62e9f1 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -562,8 +562,24 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> >   */
> >  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> >  {
> > -	if (static_cpu_has(X86_FEATURE_FPU))
> > -		__fpregs_load_activate(new_fpu, cpu);
> > +	struct pkru_state *pk;
> > +	u32 pkru_val = 0;
> > +
> > +	if (!static_cpu_has(X86_FEATURE_FPU))
> > +		return;
> > +
> > +	__fpregs_load_activate(new_fpu, cpu);
>
> This is still a bit light on comments.
>
> Maybe:
> 	/* PKRU state is switched eagerly because... */

okay, will update.

> > +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
> > +		return;
> > +
> > +	if (current->mm) {
> > +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> > +		WARN_ON_ONCE(!pk);
>
> This can trip on us if the 'init optimization' is in play because
> get_xsave_addr() checks xsave->header.xfeatures. That's unlikely today
> because we usually set PKRU to a restrictive value. But, it's also not
> *guaranteed*.
>
> Userspace could easily do an XRSTOR that puts PKRU back in its init
> state if it wanted to; then this would end up with pk==NULL.
>
> We might actually want a selftest that *does* that. I don't think we
> have one.

So you are saying that the above warning might trigger and be "okay"?
My understanding is that the in-kernel XSAVE will always save everything, so we should never "lose" the XFEATURE_PKRU no matter what user space does.

So as a test case you want

	xsave  (-1 & ~XFEATURE_PKRU)
	xrstor (-1 & ~XFEATURE_PKRU)

in userland and then a context switch to see if the warning above triggers?

> > +		if (pk)
> > +			pkru_val = pk->pkru;
> > +	}
> > +	__write_pkru(pkru_val);
> >  }
>
> A comment above __write_pkru() would be nice to say that it only
> actually does the slow instruction on changes to the value.

Could we please not do this? It is a comment above one of the caller functions and we have two or three. And we have that comment already within __write_pkru().

> BTW, this has the implicit behavior of always trying to do a
> __write_pkru(0) on switches to kernel threads. That seems a bit weird
> and it is likely to impose WRPKRU overhead on switches between user and
> kernel threads.
>
> The 0 value is also the most permissive, which is not great considering
> that user mm's can be active in the page tables when running kernel
> threads if we're being lazy.
>
> Seems like we should either leave PKRU alone or have 'init_pkru_value'
> be the default. That gives good security properties and is likely to
> match the application value, removing the WRPKRU overhead.

Last time we talked about this we agreed (or this was my impression) that 0 should be written so that the kernel thread should always be able to write to user space in case it borrowed its mm (otherwise it has none and it would fail anyway).
We didn't want to leave PKRU alone because the outcome (whether or not the write by the kernel thread succeeds) should not depend on the last running task (and be random) but should be deterministic.

I am personally open to each outcome you decide :) If you want to use `init_pkru_value' instead of 0 then I can change this. If you want to skip the possible update for kernel threads then okay, but maybe this should be documented somehow.

Sebastian
On 3/8/19 10:08 AM, Sebastian Andrzej Siewior wrote:
> On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
>>> +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
>>> +		return;
>>> +
>>> +	if (current->mm) {
>>> +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
>>> +		WARN_ON_ONCE(!pk);
>>
>> This can trip on us if the 'init optimization' is in play because
>> get_xsave_addr() checks xsave->header.xfeatures. That's unlikely today
>> because we usually set PKRU to a restrictive value. But, it's also not
>> *guaranteed*.
>>
>> Userspace could easily do an XRSTOR that puts PKRU back in its init
>> state if it wanted to; then this would end up with pk==NULL.
>>
>> We might actually want a selftest that *does* that. I don't think we
>> have one.
>
> So you are saying that the above warning might trigger and be "okay"?

Nothing will break, but the warning will trigger, which isn't nice.

> My understanding is that the in-kernel XSAVE will always save everything,
> so we should never "lose" the XFEATURE_PKRU no matter what user space
> does.
>
> So as a test case you want
> 	xsave  (-1 & ~XFEATURE_PKRU)
> 	xrstor (-1 & ~XFEATURE_PKRU)
>
> in userland and then a context switch to see if the warning above
> triggers?

I think you need an XRSTOR with RFBM=-1 (or at least with the PKRU bit set) and the PKRU bit in the XFEATURES field in the XSAVE buffer set to 0.

>>> +		if (pk)
>>> +			pkru_val = pk->pkru;
>>> +	}
>>> +	__write_pkru(pkru_val);
>>>  }
>>
>> A comment above __write_pkru() would be nice to say that it only
>> actually does the slow instruction on changes to the value.
>
> Could we please not do this? It is a comment above one of the caller
> functions and we have two or three. And we have that comment already
> within __write_pkru().

I looked at this code and thought "writing PKRU is slow", and "this writes PKRU unconditionally", and "the __ version of the function shouldn't have much logic in it".

I got 2/3 wrong. To me that means this site needs a 1-line comment. Feel free to move one of the other comments to here if you think it's over-commented, but this site needs one.

>> BTW, this has the implicit behavior of always trying to do a
>> __write_pkru(0) on switches to kernel threads. That seems a bit weird
>> and it is likely to impose WRPKRU overhead on switches between user and
>> kernel threads.
>>
>> The 0 value is also the most permissive, which is not great considering
>> that user mm's can be active in the page tables when running kernel
>> threads if we're being lazy.
>>
>> Seems like we should either leave PKRU alone or have 'init_pkru_value'
>> be the default. That gives good security properties and is likely to
>> match the application value, removing the WRPKRU overhead.
>
> Last time we talked about this we agreed (or this was my impression) that
> 0 should be written so that the kernel thread should always be able to
> write to user space in case it borrowed its mm (otherwise it has none
> and it would fail anyway).

We can't write to userspace when borrowing an mm. If the kernel borrows an mm, we might as well be on the init_mm which has no userspace mappings.

> We didn't want to leave PKRU alone because the outcome (whether or not
> the write by the kernel thread succeeds) should not depend on the last
> running task (and be random) but should be deterministic.

Right, so let's make it deterministically restrictive: either init_pkru_value, or -1, since kernel threads shouldn't be touching userspace in the first place.
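A minimal userspace sketch of the sequence described above could look like the following. This is an illustration only, not the tc-xsave.c test case linked later in the thread; it assumes a CPU with OSPKE enabled, that PKRU is XSAVE component 9, and that the XSAVE header's XSTATE_BV field sits at offset 512 of the save area:

```c
#include <sched.h>
#include <stdint.h>

#define XFEATURE_MASK_PKRU	(1ULL << 9)	/* PKRU is XSAVE component 9 */
#define XSTATE_BV_OFFSET	512		/* header follows the 512-byte legacy area */

/* Must be 64-byte aligned; 8 KiB is plenty for the standard-format area here. */
static uint8_t xbuf[8192] __attribute__((aligned(64)));

static void xsave(void *buf, uint64_t rfbm)
{
	asm volatile("xsave (%0)"
		     : : "r" (buf), "a" ((uint32_t)rfbm), "d" ((uint32_t)(rfbm >> 32))
		     : "memory");
}

static void xrstor(void *buf, uint64_t rfbm)
{
	asm volatile("xrstor (%0)"
		     : : "r" (buf), "a" ((uint32_t)rfbm), "d" ((uint32_t)(rfbm >> 32))
		     : "memory");
}

int main(void)
{
	uint64_t *xstate_bv = (uint64_t *)(xbuf + XSTATE_BV_OFFSET);

	/* Save everything, including the current PKRU value. */
	xsave(xbuf, ~0ULL);

	/* Mark PKRU as "not present" in the header. */
	*xstate_bv &= ~XFEATURE_MASK_PKRU;

	/* RFBM=-1 but XSTATE_BV says PKRU is absent -> PKRU returns to its init state. */
	xrstor(xbuf, ~0ULL);

	/* Force a context switch, then watch dmesg for the WARN_ON_ONCE(). */
	sched_yield();
	return 0;
}
```

Running something like this and then checking the kernel log would show whether the WARN_ON_ONCE() in switch_fpu_finish() can fire.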
On 2019-03-08 11:01:25 [-0800], Dave Hansen wrote:
> On 3/8/19 10:08 AM, Sebastian Andrzej Siewior wrote:
> > On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
> >>> +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
> >>> +		return;
> >>> +
> >>> +	if (current->mm) {
> >>> +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> >>> +		WARN_ON_ONCE(!pk);
…
> Nothing will break, but the warning will trigger, which isn't nice.

The warning should trigger if something goes south; I was not expecting it to happen.

> > My understanding is that the in-kernel XSAVE will always save everything,
> > so we should never "lose" the XFEATURE_PKRU no matter what user space
> > does.
> >
> > So as a test case you want
> > 	xsave  (-1 & ~XFEATURE_PKRU)
> > 	xrstor (-1 & ~XFEATURE_PKRU)
> >
> > in userland and then a context switch to see if the warning above
> > triggers?
>
> I think you need an XRSTOR with RFBM=-1 (or at least with the PKRU bit
> set) and the PKRU bit in the XFEATURES field in the XSAVE buffer set to 0.

Let me check that, write a test case in userland, and come back with the results. I can remove that warning, but I wasn't expecting it to trigger, so let me verify that first.

> >>> +		if (pk)
> >>> +			pkru_val = pk->pkru;
> >>> +	}
> >>> +	__write_pkru(pkru_val);
> >>>  }
> >>
> >> A comment above __write_pkru() would be nice to say that it only
> >> actually does the slow instruction on changes to the value.
> >
> > Could we please not do this? It is a comment above one of the caller
> > functions and we have two or three. And we have that comment already
> > within __write_pkru().
>
> I looked at this code and thought "writing PKRU is slow", and "this
> writes PKRU unconditionally", and "the __ version of the function
> shouldn't have much logic in it".
>
> I got 2/3 wrong. To me that means this site needs a 1-line comment.
> Feel free to move one of the other comments to here if you think it's
> over-commented, but this site needs one.

Right, because things changed as part of the patch series. You wanted __write_pkru() to have the same semantics as __read_pkru(), which is currently the case because __write_pkru() has the check. It would be great if we could rename it to something else and avoid the comment. (Because if this user gets a comment then others should, too, and I think that is overkill.)

> > Last time we talked about this we agreed (or this was my impression) that
> > 0 should be written so that the kernel thread should always be able to
> > write to user space in case it borrowed its mm (otherwise it has none
> > and it would fail anyway).
>
> We can't write to userspace when borrowing an mm. If the kernel borrows
> an mm, we might as well be on the init_mm which has no userspace mappings.

If a kernel thread borrows an mm from a user task via use_mm() then it _can_ write to that task's userland memory from a kthread.

> > We didn't want to leave PKRU alone because the outcome (whether or not
> > the write by the kernel thread succeeds) should not depend on the last
> > running task (and be random) but should be deterministic.
>
> Right, so let's make it deterministically restrictive: either
> init_pkru_value, or -1, since kernel threads shouldn't be touching
> userspace in the first place.

I'm fine either way, just tell me what you want. Just consider the use_mm() part I wrote above. (I remember you/luto suggested having an API for something like that so that the PKRU value can be …

Sebastian
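For reference, the use_mm() pattern mentioned above looks roughly like this in a kernel thread. This is a sketch only: the function name and the mm/buffer/length parameters are made up for illustration, while use_mm()/unuse_mm() and copy_to_user() are the kernel APIs of that era:

```c
#include <linux/mmu_context.h>	/* use_mm(), unuse_mm() */
#include <linux/uaccess.h>	/* copy_to_user() */

/*
 * Hypothetical helper: a kernel thread temporarily adopts a user task's mm
 * so that copy_to_user() resolves against that task's address space.  This
 * is the point where the PKRU value active on the CPU matters.
 */
static int kthread_write_to_user(struct mm_struct *mm, void __user *ubuf,
				 const void *kbuf, size_t len)
{
	int ret = 0;

	use_mm(mm);			/* borrow the user mm */
	if (copy_to_user(ubuf, kbuf, len))
		ret = -EFAULT;		/* fails if PKRU forbids the write */
	unuse_mm(mm);			/* drop it again */

	return ret;
}
```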
On 2019-03-11 12:06:05 [+0100], To Dave Hansen wrote:
> On 2019-03-08 11:01:25 [-0800], Dave Hansen wrote:
> > On 3/8/19 10:08 AM, Sebastian Andrzej Siewior wrote:
> > > On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
> > >>> +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
> > >>> +		return;
> > >>> +
> > >>> +	if (current->mm) {
> > >>> +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> > >>> +		WARN_ON_ONCE(!pk);
> …
> > Nothing will break, but the warning will trigger, which isn't nice.
>
> The warning should trigger if something goes south; I was not expecting
> it to happen.
>
> > > My understanding is that the in-kernel XSAVE will always save everything,
> > > so we should never "lose" the XFEATURE_PKRU no matter what user space
> > > does.
> > >
> > > So as a test case you want
> > > 	xsave  (-1 & ~XFEATURE_PKRU)
> > > 	xrstor (-1 & ~XFEATURE_PKRU)
> > >
> > > in userland and then a context switch to see if the warning above
> > > triggers?
> >
> > I think you need an XRSTOR with RFBM=-1 (or at least with the PKRU bit
> > set) and the PKRU bit in the XFEATURES field in the XSAVE buffer set to 0.
>
> Let me check that, write a test case in userland, and come back with
> the results. I can remove that warning, but I wasn't expecting it to
> trigger, so let me verify that first.

So I made this: https://breakpoint.cc/tc-xsave.c and it doesn't trigger.

XSAVE saves what is specified and enabled. XRSTOR restores what is specified, available in the header and enabled. Which means that even if userland disables a bit in the header, the state is still available during a context switch via the kernel's XSAVE as long as it is set in XSTATE_BV. The user can't use XSETBV (it can only query via XGETBV), which means that XFEATURE_PKRU can't be removed by the user.

But you got me thinking: during signal delivery we save the task's FPU content on the signal stack. If the signal handler removes the XFEATURE_PKRU bit then __fpu__restore_sig() will load the "missing state" from init_fpstate, which means that the protection key will be set to zero. Not sure whether we want this or not, but this is the case. An XRSTOR in userland without XFEATURE_PKRU would leave the PKRU state unchanged. The test case from above does this if you want to double check.

Sebastian
On 2019-03-12 17:06:18 [-0700], Dave Hansen wrote:
> Thanks for doing this. I'll see if I can dig into the test case and
> figure out why it's not doing what I expected. It might be that the CPU
> _could_ go back to the init state but chooses not to, or that something
> else is taking us out of the init state before we reach the kernel code
> in question.

I'm going to drop that WARN_ON() check. The reason is that we are fully preemptible during fpstate_init() and a context switch during memset() and fpstate_init_xstate() triggers this warning.

Sebastian
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 67e4805bccb6f..05f6fce62e9f1 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -562,8 +562,24 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU))
-		__fpregs_load_activate(new_fpu, cpu);
+	struct pkru_state *pk;
+	u32 pkru_val = 0;
+
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	__fpregs_load_activate(new_fpu, cpu);
+
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
+		return;
+
+	if (current->mm) {
+		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		WARN_ON_ONCE(!pk);
+		if (pk)
+			pkru_val = pk->pkru;
+	}
+	__write_pkru(pkru_val);
 }
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index fbe41f808e5d8..4e18a837223ff 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <asm/processor.h>
 #include <linux/uaccess.h>
+#include <asm/user.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
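Taking the review feedback into account (the eager-switch comment, a note at the __write_pkru() call site, init_pkru_value as the restrictive default instead of 0, and the WARN_ON_ONCE() dropped as discussed above), the hunk could end up looking roughly like the sketch below. This is only an illustration of the direction discussed in the thread, not the final committed code:

```c
static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
{
	struct pkru_state *pk;
	u32 pkru_val = init_pkru_value;	/* restrictive default, also used for kernel threads */

	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	__fpregs_load_activate(new_fpu, cpu);

	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
		return;

	/*
	 * PKRU state is switched eagerly because it needs to be valid before
	 * any userspace access is attempted on behalf of the incoming task.
	 */
	if (current->mm) {
		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
		if (pk)
			pkru_val = pk->pkru;
	}

	/* __write_pkru() only issues the (slow) WRPKRU when the value changes. */
	__write_pkru(pkru_val);
}
```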