Message ID | 20200501225838.9866-10-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86: Support for CET Supervisor Shadow Stacks | expand |
On 02.05.2020 00:58, Andrew Cooper wrote: > When executing an IRET-to-self, the shadow stack must agree with the regular > stack. We can't manipulate SSP directly, so have to fake a shadow IRET frame > by executing 3 CALLs, then editing the result to look correct. > > This is not a fastpath, is called on the BSP long before CET can be set up, > and may be called on the crash path after CET is disabled. Use the fact that > INCSSP is allocated from the hint nop space to construct a test for CET being > active which is safe on all processors. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> albeit with a question which may make a further change necessary: > --- a/xen/include/asm-x86/processor.h > +++ b/xen/include/asm-x86/processor.h > @@ -544,17 +544,40 @@ static inline void enable_nmis(void) > { > unsigned long tmp; > > - asm volatile ( "mov %%rsp, %[tmp] \n\t" > - "push %[ss] \n\t" > - "push %[tmp] \n\t" > - "pushf \n\t" > - "push %[cs] \n\t" > - "lea 1f(%%rip), %[tmp] \n\t" > - "push %[tmp] \n\t" > - "iretq; 1: \n\t" > - : [tmp] "=&r" (tmp) > + asm volatile ( "mov %%rsp, %[rsp] \n\t" > + "lea .Ldone(%%rip), %[rip] \n\t" > +#ifdef CONFIG_XEN_SHSTK > + /* Check for CET-SS being active. */ > + "mov $1, %k[ssp] \n\t" > + "rdsspq %[ssp] \n\t" > + "cmp $1, %k[ssp] \n\t" > + "je .Lshstk_done \n\t" > + > + /* Push 3 words on the shadow stack */ > + ".rept 3 \n\t" > + "call 1f; nop; 1: \n\t" > + ".endr \n\t" > + > + /* Fixup to be an IRET shadow stack frame */ > + "wrssq %q[cs], -1*8(%[ssp]) \n\t" > + "wrssq %[rip], -2*8(%[ssp]) \n\t" > + "wrssq %[ssp], -3*8(%[ssp]) \n\t" > + > + ".Lshstk_done:" > +#endif > + /* Write an IRET regular frame */ > + "push %[ss] \n\t" > + "push %[rsp] \n\t" > + "pushf \n\t" > + "push %q[cs] \n\t" > + "push %[rip] \n\t" > + "iretq \n\t" > + ".Ldone: \n\t" > + : [rip] "=&r" (tmp), > + [rsp] "=&r" (tmp), > + [ssp] "=&r" (tmp) Even after an hour of reading and searching through the gcc docs I can't convince myself that this utilizes defined behavior. We do tie multiple outputs to the same C variable elsewhere, yes, but only in cases where we don't really care about the register, or where the register is a fixed one anyway. What I can't find is a clear statement that gcc wouldn't ever, now or in the future, use the same register for all three outputs. I think one can deduce this in certain ways, and experimentation also seems to confirm it, but still. Jan
On 05/05/2020 15:48, Jan Beulich wrote: > [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe. > > On 02.05.2020 00:58, Andrew Cooper wrote: >> When executing an IRET-to-self, the shadow stack must agree with the regular >> stack. We can't manipulate SSP directly, so have to fake a shadow IRET frame >> by executing 3 CALLs, then editing the result to look correct. >> >> This is not a fastpath, is called on the BSP long before CET can be set up, >> and may be called on the crash path after CET is disabled. Use the fact that >> INCSSP is allocated from the hint nop space to construct a test for CET being >> active which is safe on all processors. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > albeit with a question which may make a further change necessary: > >> --- a/xen/include/asm-x86/processor.h >> +++ b/xen/include/asm-x86/processor.h >> @@ -544,17 +544,40 @@ static inline void enable_nmis(void) >> { >> unsigned long tmp; >> >> - asm volatile ( "mov %%rsp, %[tmp] \n\t" >> - "push %[ss] \n\t" >> - "push %[tmp] \n\t" >> - "pushf \n\t" >> - "push %[cs] \n\t" >> - "lea 1f(%%rip), %[tmp] \n\t" >> - "push %[tmp] \n\t" >> - "iretq; 1: \n\t" >> - : [tmp] "=&r" (tmp) >> + asm volatile ( "mov %%rsp, %[rsp] \n\t" >> + "lea .Ldone(%%rip), %[rip] \n\t" >> +#ifdef CONFIG_XEN_SHSTK >> + /* Check for CET-SS being active. */ >> + "mov $1, %k[ssp] \n\t" >> + "rdsspq %[ssp] \n\t" >> + "cmp $1, %k[ssp] \n\t" >> + "je .Lshstk_done \n\t" >> + >> + /* Push 3 words on the shadow stack */ >> + ".rept 3 \n\t" >> + "call 1f; nop; 1: \n\t" >> + ".endr \n\t" >> + >> + /* Fixup to be an IRET shadow stack frame */ >> + "wrssq %q[cs], -1*8(%[ssp]) \n\t" >> + "wrssq %[rip], -2*8(%[ssp]) \n\t" >> + "wrssq %[ssp], -3*8(%[ssp]) \n\t" >> + >> + ".Lshstk_done:" >> +#endif >> + /* Write an IRET regular frame */ >> + "push %[ss] \n\t" >> + "push %[rsp] \n\t" >> + "pushf \n\t" >> + "push %q[cs] \n\t" >> + "push %[rip] \n\t" >> + "iretq \n\t" >> + ".Ldone: \n\t" >> + : [rip] "=&r" (tmp), >> + [rsp] "=&r" (tmp), >> + [ssp] "=&r" (tmp) > Even after an hour of reading and searching through the gcc docs > I can't convince myself that this utilizes defined behavior. We > do tie multiple outputs to the same C variable elsewhere, yes, > but only in cases where we don't really care about the register, > or where the register is a fixed one anyway. What I can't find > is a clear statement that gcc wouldn't ever, now or in the > future, use the same register for all three outputs. I think one > can deduce this in certain ways, and experimentation also seems > to confirm it, but still. I don't see how different local variable will make any difference. The variables themselves will be dropped for being write-only before register scheduling happens. GCC and Clang reliably use the callee preserved registers first, then start spilling caller preserved registers, and finally refuse to assemble as soon as we hit 16 registers of this form, citing impossible constraints (GCC) or too many registers (Clang). The important point is that they are listed as separate outputs, so cannot share register allocations. If this were to be violated, loads of code would malfunction. ~Andrew
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index 54e1a8b605..654d46a6f4 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -544,17 +544,40 @@ static inline void enable_nmis(void) { unsigned long tmp; - asm volatile ( "mov %%rsp, %[tmp] \n\t" - "push %[ss] \n\t" - "push %[tmp] \n\t" - "pushf \n\t" - "push %[cs] \n\t" - "lea 1f(%%rip), %[tmp] \n\t" - "push %[tmp] \n\t" - "iretq; 1: \n\t" - : [tmp] "=&r" (tmp) + asm volatile ( "mov %%rsp, %[rsp] \n\t" + "lea .Ldone(%%rip), %[rip] \n\t" +#ifdef CONFIG_XEN_SHSTK + /* Check for CET-SS being active. */ + "mov $1, %k[ssp] \n\t" + "rdsspq %[ssp] \n\t" + "cmp $1, %k[ssp] \n\t" + "je .Lshstk_done \n\t" + + /* Push 3 words on the shadow stack */ + ".rept 3 \n\t" + "call 1f; nop; 1: \n\t" + ".endr \n\t" + + /* Fixup to be an IRET shadow stack frame */ + "wrssq %q[cs], -1*8(%[ssp]) \n\t" + "wrssq %[rip], -2*8(%[ssp]) \n\t" + "wrssq %[ssp], -3*8(%[ssp]) \n\t" + + ".Lshstk_done:" +#endif + /* Write an IRET regular frame */ + "push %[ss] \n\t" + "push %[rsp] \n\t" + "pushf \n\t" + "push %q[cs] \n\t" + "push %[rip] \n\t" + "iretq \n\t" + ".Ldone: \n\t" + : [rip] "=&r" (tmp), + [rsp] "=&r" (tmp), + [ssp] "=&r" (tmp) : [ss] "i" (__HYPERVISOR_DS), - [cs] "i" (__HYPERVISOR_CS) ); + [cs] "r" (__HYPERVISOR_CS) ); } void sysenter_entry(void);
When executing an IRET-to-self, the shadow stack must agree with the regular stack. We can't manipulate SSP directly, so have to fake a shadow IRET frame by executing 3 CALLs, then editing the result to look correct. This is not a fastpath, is called on the BSP long before CET can be set up, and may be called on the crash path after CET is disabled. Use the fact that INCSSP is allocated from the hint nop space to construct a test for CET being active which is safe on all processors. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wl@xen.org> CC: Roger Pau Monné <roger.pau@citrix.com> --- xen/include/asm-x86/processor.h | 43 +++++++++++++++++++++++++++++++---------- 1 file changed, 33 insertions(+), 10 deletions(-)