Message ID | 20180710222639.8241-19-yu-cheng.yu@intel.com (mailing list archive) |
---|---|
State | New, archived |
> +/*
> + * WRUSS is a kernel instruction but writes to user
> + * shadow stack memory.  When a fault occurs, both
> + * X86_PF_USER and X86_PF_SHSTK are set.
> + */
> +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
> +{
> +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
> +}

I thought X86_PF_USER was set based on the mode in which the fault
occurred.  Does this mean that the architecture of this bit is different
now?

That seems like something we need to call out if so.  It also means we
need to update the SDM because some of the text is wrong.

> static void
> show_fault_oops(struct pt_regs *regs, unsigned long error_code,
> 		unsigned long address)
> @@ -848,7 +859,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> 	struct task_struct *tsk = current;
>
> 	/* User mode accesses just cause a SIGSEGV */
> -	if (error_code & X86_PF_USER) {
> +	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
> 		/*
> 		 * It's possible to have interrupts off here:
> 		 */

This needs commenting about why is_wruss() is special.
On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> WRUSS is a new kernel-mode instruction but writes directly
> to user shadow stack memory.  This is used to construct
> a return address on the shadow stack for the signal
> handler.
>
> This instruction can fault if the user shadow stack is
> invalid shadow stack memory.  In that case, the kernel does
> fixup.

> +static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
> +{
> +	int err = 0;
> +
> +	asm volatile("1: wrussq %[val], (%[addr])\n"
> +		     "xor %[err], %[err]\n"

this XOR is superfluous, you already cleared @err above.

> +		     "2:\n"
> +		     ".section .fixup,\"ax\"\n"
> +		     "3: mov $-1, %[err]; jmp 2b\n"
> +		     ".previous\n"
> +		     _ASM_EXTABLE(1b, 3b)
> +		     : [err] "=a" (err)
> +		     : [val] "S" (val), [addr] "D" (addr));
> +
> +	return err;
> +}
> +#endif /* CONFIG_X86_INTEL_CET */
> +
> #define nop() asm volatile ("nop")

What happened to:

  https://lkml.kernel.org/r/1528729376.4526.0.camel@2b52.sc.intel.com
On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index e0b85930dd77..72bb7c48a7df 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
> f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
> f2: ANDN Gy,By,Ey (v)
> f3: Grp17 (1A)
> -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
> EndTable

Where are all the other instructions? ISTR that documentation patch
listing a whole bunch of new instructions, not just WRUSS.
On Wed, 2018-07-11 at 11:45 +0200, Peter Zijlstra wrote:
> On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> > diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> > index e0b85930dd77..72bb7c48a7df 100644
> > --- a/arch/x86/lib/x86-opcode-map.txt
> > +++ b/arch/x86/lib/x86-opcode-map.txt
> > @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
> > f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
> > f2: ANDN Gy,By,Ey (v)
> > f3: Grp17 (1A)
> > -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> > +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> > f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> > f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
> > EndTable
> Where are all the other instructions? ISTR that documentation patch
> listing a whole bunch of new instructions, not just WRUSS.

Currently we only use WRUSS in the kernel code.  Do we want to add all
instructions here?

Yu-cheng
On Wed, 2018-07-11 at 11:44 +0200, Peter Zijlstra wrote:
> On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> > WRUSS is a new kernel-mode instruction but writes directly
> > to user shadow stack memory.  This is used to construct
> > a return address on the shadow stack for the signal
> > handler.
> >
> > This instruction can fault if the user shadow stack is
> > invalid shadow stack memory.  In that case, the kernel does
> > fixup.
> >
> > +static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
> > +{
> > +	int err = 0;
> > +
> > +	asm volatile("1: wrussq %[val], (%[addr])\n"
> > +		     "xor %[err], %[err]\n"
>
> this XOR is superfluous, you already cleared @err above.

I will fix it.

> > +		     "2:\n"
> > +		     ".section .fixup,\"ax\"\n"
> > +		     "3: mov $-1, %[err]; jmp 2b\n"
> > +		     ".previous\n"
> > +		     _ASM_EXTABLE(1b, 3b)
> > +		     : [err] "=a" (err)
> > +		     : [val] "S" (val), [addr] "D" (addr));
> > +
> > +	return err;
> > +}
> > +#endif /* CONFIG_X86_INTEL_CET */
> > +
> > #define nop() asm volatile ("nop")
>
> What happened to:
>
> https://lkml.kernel.org/r/1528729376.4526.0.camel@2b52.sc.intel.com

Yes, I put that in once and realized we only need to skip the
instruction and return err.  Do you think we still need a handler for
that?

Yu-cheng
On Wed, Jul 11, 2018 at 07:58:09AM -0700, Yu-cheng Yu wrote:
> On Wed, 2018-07-11 at 11:45 +0200, Peter Zijlstra wrote:
> > On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> > > diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> > > index e0b85930dd77..72bb7c48a7df 100644
> > > --- a/arch/x86/lib/x86-opcode-map.txt
> > > +++ b/arch/x86/lib/x86-opcode-map.txt
> > > @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
> > > f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
> > > f2: ANDN Gy,By,Ey (v)
> > > f3: Grp17 (1A)
> > > -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> > > +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> > > f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> > > f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
> > > EndTable
> > Where are all the other instructions? ISTR that documentation patch
> > listing a whole bunch of new instructions, not just WRUSS.
>
> Currently we only use WRUSS in the kernel code.  Do we want to add all
> instructions here?

Yes, since we also use the in-kernel decoder to decode random userspace
code.
On Wed, Jul 11, 2018 at 08:06:55AM -0700, Yu-cheng Yu wrote:
> On Wed, 2018-07-11 at 11:44 +0200, Peter Zijlstra wrote:
> > What happened to:
> >
> > https://lkml.kernel.org/r/1528729376.4526.0.camel@2b52.sc.intel.com
>
> Yes, I put that in once and realized we only need to skip the
> instruction and return err.  Do you think we still need a handler for
> that?

I find that other form more readable, but then there's Nadav doing asm
macros to shrink inline asm thingies so maybe he has another suggestion.
On Wed, 2018-07-11 at 17:27 +0200, Peter Zijlstra wrote:
> On Wed, Jul 11, 2018 at 07:58:09AM -0700, Yu-cheng Yu wrote:
> > On Wed, 2018-07-11 at 11:45 +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 10, 2018 at 03:26:30PM -0700, Yu-cheng Yu wrote:
> > > > diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> > > > index e0b85930dd77..72bb7c48a7df 100644
> > > > --- a/arch/x86/lib/x86-opcode-map.txt
> > > > +++ b/arch/x86/lib/x86-opcode-map.txt
> > > > @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
> > > > f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
> > > > f2: ANDN Gy,By,Ey (v)
> > > > f3: Grp17 (1A)
> > > > -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> > > > +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> > > > f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> > > > f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
> > > > EndTable
> > > Where are all the other instructions? ISTR that documentation patch
> > > listing a whole bunch of new instructions, not just WRUSS.
> > Currently we only use WRUSS in the kernel code.  Do we want to add all
> > instructions here?
> Yes, since we also use the in-kernel decoder to decode random userspace
> code.

I will add other instructions.

Yu-cheng
On Tue, 2018-07-10 at 16:48 -0700, Dave Hansen wrote:
> > +/*
> > + * WRUSS is a kernel instruction but writes to user
> > + * shadow stack memory.  When a fault occurs, both
> > + * X86_PF_USER and X86_PF_SHSTK are set.
> > + */
> > +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
> > +{
> > +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> > +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
> > +}
> I thought X86_PF_USER was set based on the mode in which the fault
> occurred.  Does this mean that the architecture of this bit is different
> now?

Yes.

> That seems like something we need to call out if so.  It also means we
> need to update the SDM because some of the text is wrong.

It needs to mention the WRUSS case.

> > static void
> > show_fault_oops(struct pt_regs *regs, unsigned long error_code,
> > 		unsigned long address)
> > @@ -848,7 +859,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> > 	struct task_struct *tsk = current;
> >
> > 	/* User mode accesses just cause a SIGSEGV */
> > -	if (error_code & X86_PF_USER) {
> > +	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
> > 		/*
> > 		 * It's possible to have interrupts off here:
> > 		 */
> This needs commenting about why is_wruss() is special.

Ok.
On 07/12/2018 03:59 PM, Yu-cheng Yu wrote:
> On Tue, 2018-07-10 at 16:48 -0700, Dave Hansen wrote:
>>> +/*
>>> + * WRUSS is a kernel instruction but writes to user
>>> + * shadow stack memory.  When a fault occurs, both
>>> + * X86_PF_USER and X86_PF_SHSTK are set.
>>> + */
>>> +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
>>> +{
>>> +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
>>> +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
>>> +}
>> I thought X86_PF_USER was set based on the mode in which the fault
>> occurred.  Does this mean that the architecture of this bit is different
>> now?
>
> Yes.
>
>> That seems like something we need to call out if so.  It also means we
>> need to update the SDM because some of the text is wrong.
>
> It needs to mention the WRUSS case.

Ugh.  The documentation for this is not pretty.  But, I guess this is
not fundamentally different from access to U=1 pages when SMAP is in
place and we've set EFLAGS.AC=1.  But, sheesh, we need to call this out
really explicitly and make it crystal clear what is going on.

We need to go through the page fault code very carefully and audit all
the X86_PF_USER spots and make sure there's no impact to those.  SMAP
should mean that we already dealt with these, but we still need an
audit.

The docs[1] are clear as mud on this though:

	"Page entry has user privilege (U=1) for a supervisor-level
	shadow-stack-load, shadow-stack-store-intent or shadow-stack-store
	access except those that originate from the WRUSS instruction."

Or, in short: "Page has U=1 ... except those that originate from the
WRUSS instruction."  Which is backwards from what you said.

I really wish those docs had reused the established SDM language instead
of reinventing their own way of saying things.

1. https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
On 07/12/2018 04:49 PM, Dave Hansen wrote:
>>> That seems like something we need to call out if so.  It also means we
>>> need to update the SDM because some of the text is wrong.
>> It needs to mention the WRUSS case.
> Ugh.  The documentation for this is not pretty.  But, I guess this is
> not fundamentally different from access to U=1 pages when SMAP is in
> place and we've set EFLAGS.AC=1.

I was wrong and misread the docs.  We do not get X86_PF_USER set when
EFLAGS.AC=1.

But, we *do* get X86_PF_USER (otherwise defined to be set when in ring3)
when running in ring0 with the WRUSS instruction and some other various
shadow-stack-access-related things.  I'm sure folks had a good reason
for this architecture, but it is a pretty fundamentally *new*
architecture that we have to account for.

This new architecture is also not spelled out or accounted for in the
SDM as of yet.  It's only called out here as far as I know:

	https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Which reminds me: Yu-cheng, do you have a link to the docs anywhere in
your set?  If not, you really should.
> On Jul 12, 2018, at 6:50 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 07/12/2018 04:49 PM, Dave Hansen wrote:
>>>> That seems like something we need to call out if so.  It also means we
>>>> need to update the SDM because some of the text is wrong.
>>> It needs to mention the WRUSS case.
>> Ugh.  The documentation for this is not pretty.  But, I guess this is
>> not fundamentally different from access to U=1 pages when SMAP is in
>> place and we've set EFLAGS.AC=1.
>
> I was wrong and misread the docs.  We do not get X86_PF_USER set when
> EFLAGS.AC=1.
>
> But, we *do* get X86_PF_USER (otherwise defined to be set when in ring3)
> when running in ring0 with the WRUSS instruction and some other various
> shadow-stack-access-related things.  I'm sure folks had a good reason
> for this architecture, but it is a pretty fundamentally *new*
> architecture that we have to account for.

I think it makes (some) sense.  The USER bit is set for a page fault
that was done with user privilege.  So a descriptor table fault at CPL 3
has USER clear (regardless of the cause of the fault) and WRUSS has USER
set.

> This new architecture is also not spelled out or accounted for in the
> SDM as of yet.  It's only called out here as far as I know:
> https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
>
> Which reminds me: Yu-cheng, do you have a link to the docs anywhere in
> your set?  If not, you really should.

I am tempted to suggest that the whole series not be merged until there
are actual docs.  It’s not a fantastic precedent.
On 07/12/2018 07:21 PM, Andy Lutomirski wrote:
> I am tempted to suggest that the whole series not be merged until
> there are actual docs.  It’s not a fantastic precedent.

Do you mean Documentation or manpages, or are you talking about hardware
documentation?

	https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
On 07/12/2018 09:16 PM, Dave Hansen wrote:
> On 07/12/2018 07:21 PM, Andy Lutomirski wrote:
>> I am tempted to suggest that the whole series not be merged until
>> there are actual docs.  It’s not a fantastic precedent.
>
> Do you mean Documentation or manpages, or are you talking about hardware
> documentation?
> https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Hit send too soon...

We do need manpages as well.  If I had to do it for protection keys,
everyone else has to suffer too. :)

Yu-cheng, I really do think selftests are a necessity before this gets
merged.
> On Jul 12, 2018, at 9:16 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 07/12/2018 07:21 PM, Andy Lutomirski wrote:
>> I am tempted to suggest that the whole series not be merged until
>> there are actual docs.  It’s not a fantastic precedent.
>
> Do you mean Documentation or manpages, or are you talking about hardware
> documentation?
> https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

I mean hardware docs.  The “preview” is a little bit dubious IMO.
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
> +{
> +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
> +}
> +
> static void
> show_fault_oops(struct pt_regs *regs, unsigned long error_code,
> 		unsigned long address)
> @@ -848,7 +859,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> 	struct task_struct *tsk = current;
>
> 	/* User mode accesses just cause a SIGSEGV */
> -	if (error_code & X86_PF_USER) {
> +	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
> 		/*
> 		 * It's possible to have interrupts off here:
> 		 */

Please don't do it this way.

We have two styles of page fault:

1. User page faults: find a VMA, try to handle (allocate memory et al.),
   kill process if we can't handle.
2. Kernel page faults: search for a *discrete* set of conditions that
   can be handled, including faults in instructions marked in exception
   tables.

X86_PF_USER *means*: do user page fault handling.  In the places where
the hardware doesn't set it, but we still want user page fault handling,
we manually set it, like this where we "downgrade" an implicit
supervisor access to a user access:

	if (user_mode(regs)) {
		local_irq_enable();
		error_code |= X86_PF_USER;
		flags |= FAULT_FLAG_USER;

So, just please *clear* X86_PF_USER if !user_mode(regs) and X86_PF_SHSTK
is set.  We do not want user page fault handling, thus we should not
keep the bit set.
On Fri, 2018-07-13 at 05:12 -0700, Dave Hansen wrote:
> On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> > +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
> > +{
> > +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> > +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
> > +}
> > +
> > static void
> > show_fault_oops(struct pt_regs *regs, unsigned long error_code,
> > 		unsigned long address)
> > @@ -848,7 +859,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> > 	struct task_struct *tsk = current;
> >
> > 	/* User mode accesses just cause a SIGSEGV */
> > -	if (error_code & X86_PF_USER) {
> > +	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
> > 		/*
> > 		 * It's possible to have interrupts off here:
> > 		 */
> Please don't do it this way.
>
> We have two styles of page fault:
> 1. User page faults: find a VMA, try to handle (allocate memory et al.),
>    kill process if we can't handle.
> 2. Kernel page faults: search for a *discrete* set of conditions that
>    can be handled, including faults in instructions marked in exception
>    tables.
>
> X86_PF_USER *means*: do user page fault handling.  In the places where
> the hardware doesn't set it, but we still want user page fault handling,
> we manually set it, like this where we "downgrade" an implicit
> supervisor access to a user access:
>
> 	if (user_mode(regs)) {
> 		local_irq_enable();
> 		error_code |= X86_PF_USER;
> 		flags |= FAULT_FLAG_USER;
>
> So, just please *clear* X86_PF_USER if !user_mode(regs) and X86_PF_SHSTK
> is set.  We do not want user page fault handling, thus we should not
> keep the bit set.

Agree.  I will change that.

Yu-cheng
On Thu, 2018-07-12 at 21:18 -0700, Dave Hansen wrote:
> On 07/12/2018 09:16 PM, Dave Hansen wrote:
> > On 07/12/2018 07:21 PM, Andy Lutomirski wrote:
> > > I am tempted to suggest that the whole series not be merged until
> > > there are actual docs.  It’s not a fantastic precedent.
> > Do you mean Documentation or manpages, or are you talking about hardware
> > documentation?
> > https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
> Hit send too soon...
>
> We do need manpages as well.  If I had to do it for protection keys,
> everyone else has to suffer too. :)
>
> Yu-cheng, I really do think selftests are a necessity before this gets
> merged.

We already have some.  I will put those in patches.

Yu-cheng
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..c69d8d6b457f 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -237,6 +237,51 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_INTEL_CET
+
+#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	int err;
+
+	asm volatile("1: wrussd %[val], (%[addr])\n"
+		     "xor %[err], %[err]\n"
+		     "2:\n"
+		     ".section .fixup,\"ax\"\n"
+		     "3: mov $-1, %[err]; jmp 2b\n"
+		     ".previous\n"
+		     _ASM_EXTABLE(1b, 3b)
+		     : [err] "=a" (err)
+		     : [val] "S" (val), [addr] "D" (addr));
+
+	return err;
+}
+#else
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	BUG();
+	return 0;
+}
+#endif
+
+static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
+{
+	int err = 0;
+
+	asm volatile("1: wrussq %[val], (%[addr])\n"
+		     "xor %[err], %[err]\n"
+		     "2:\n"
+		     ".section .fixup,\"ax\"\n"
+		     "3: mov $-1, %[err]; jmp 2b\n"
+		     ".previous\n"
+		     _ASM_EXTABLE(1b, 3b)
+		     : [err] "=a" (err)
+		     : [val] "S" (val), [addr] "D" (addr));
+
+	return err;
+}
+#endif /* CONFIG_X86_INTEL_CET */
+
 #define nop() asm volatile ("nop")
 
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..72bb7c48a7df 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fcd5739151f9..92f178b8b598 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -641,6 +641,17 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
 	return 0;
 }
 
+/*
+ * WRUSS is a kernel instruction but writes to user
+ * shadow stack memory.  When a fault occurs, both
+ * X86_PF_USER and X86_PF_SHSTK are set.
+ */
+static int is_wruss(struct pt_regs *regs, unsigned long error_code)
+{
+	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
+		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
+}
+
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address)
@@ -848,7 +859,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	struct task_struct *tsk = current;
 
 	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & X86_PF_USER) {
+	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
 		/*
 		 * It's possible to have interrupts off here:
 		 */
diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..72bb7c48a7df 100644
--- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
@@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
WRUSS is a new kernel-mode instruction but writes directly
to user shadow stack memory.  This is used to construct
a return address on the shadow stack for the signal
handler.

This instruction can fault if the user shadow stack is
invalid shadow stack memory.  In that case, the kernel does
fixup.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/special_insns.h          | 45 +++++++++++++++++++
 arch/x86/lib/x86-opcode-map.txt               |  2 +-
 arch/x86/mm/fault.c                           | 13 +++++-
 tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
 4 files changed, 59 insertions(+), 3 deletions(-)