Message ID | 20220224194853.17774-2-andrew.cooper3@citrix.com (mailing list archive)
---|---
State | New, archived
Series | x86/CET: Fix S3 resume with shadow stacks active
On 24.02.2022 20:48, Andrew Cooper wrote:
> The original shadow stack support has an error on S3 resume with very bizarre
> fallout. The BSP comes back up, but APs fail with:
>
> (XEN) Enabling non-boot CPUs ...
> (XEN) Stuck ??
> (XEN) Error bringing CPU1 up: -5
>
> and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
> to be scheduled on the BSP dies with:
>
> (XEN) d1v0 Unexpected vmexit: reason 3
> (XEN) domain_crash called from vmx.c:4304
> (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
>
> The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
> scheduled vCPU, and will be addressed in a subsequent patch. It is a
> consequence of the APs triple faulting.
>
> The reason the APs triple fault is because we don't tear down the stacks on
> suspend. The idle/play_dead loop is killed in the middle of running, meaning
> that the supervisor token is left busy.
>
> On resume, SETSSBSY finds the token already busy, suffers #CP and triple
> faults because the IDT isn't configured this early.
>
> Rework the AP bringup path to (re)create the supervisor token. This ensures
> the primary stack is non-busy before use.
>
> Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
> Link: https://github.com/QubesOS/qubes-issues/issues/7283
> Reported-by: Thiner Logoer <logoerthiner1@163.com>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Tested-by: Thiner Logoer <logoerthiner1@163.com>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Slightly RFC. This does fix the crash encountered, but it occurs to me that
> there's a race condition when S3 platform powerdown is incident with an
> NMI/#MC, where more than just the primary shadow stack can end up busy on
> resume.
>
> A larger fix would be to change how we allocate tokens, and always have each
> CPU set up its own tokens. I didn't do this originally in the hopes of having
> WRSSQ generally disabled, but that plan failed when encountering reality...

While I think this wants fixing one way or another, I also think this
shouldn't block the immediate fix here (which addresses an unconditional
crash rather than a pretty unlikely one).

Jan
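For readers unfamiliar with CET supervisor shadow stack tokens, the following is a rough C model of the SETSSBSY check described in the commit message above. It is not Xen code; the function and constant names are illustrative, and only the busy-bit behaviour matters.

```c
/*
 * Rough model of the SETSSBSY token check (not Xen code; names are
 * illustrative).  A supervisor shadow stack token holds its own linear
 * address, with bit 0 acting as the "busy" flag.
 */
#include <stdbool.h>
#include <stdint.h>

#define SSTK_TOKEN_BUSY  (1ull << 0)

static bool setssbsy_model(uint64_t pl0_ssp /* MSR_PL0_SSP */, uint64_t *token)
{
    /*
     * The token must equal the address MSR_PL0_SSP points at, with the
     * busy bit clear.  A token left busy by a suspended idle/play_dead
     * loop fails this check and raises #CP; with no IDT set up yet on
     * the AP, that escalates to a triple fault.
     */
    if ( *token != pl0_ssp )
        return false;

    *token |= SSTK_TOKEN_BUSY;   /* mark the shadow stack as in use */
    return true;                 /* SSP now points at the token */
}
```

The patch below sidesteps the stale-busy case by writing a fresh, non-busy token before SETSSBSY runs.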
On 25/02/2022 08:38, Jan Beulich wrote:
> On 24.02.2022 20:48, Andrew Cooper wrote:
>> The original shadow stack support has an error on S3 resume with very bizarre
>> fallout. The BSP comes back up, but APs fail with:
>>
>> (XEN) Enabling non-boot CPUs ...
>> (XEN) Stuck ??
>> (XEN) Error bringing CPU1 up: -5
>>
>> and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
>> to be scheduled on the BSP dies with:
>>
>> (XEN) d1v0 Unexpected vmexit: reason 3
>> (XEN) domain_crash called from vmx.c:4304
>> (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
>>
>> The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
>> scheduled vCPU, and will be addressed in a subsequent patch. It is a
>> consequence of the APs triple faulting.
>>
>> The reason the APs triple fault is because we don't tear down the stacks on
>> suspend. The idle/play_dead loop is killed in the middle of running, meaning
>> that the supervisor token is left busy.
>>
>> On resume, SETSSBSY finds the token already busy, suffers #CP and triple
>> faults because the IDT isn't configured this early.
>>
>> Rework the AP bringup path to (re)create the supervisor token. This ensures
>> the primary stack is non-busy before use.
>>
>> Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
>> Link: https://github.com/QubesOS/qubes-issues/issues/7283
>> Reported-by: Thiner Logoer <logoerthiner1@163.com>
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Tested-by: Thiner Logoer <logoerthiner1@163.com>
>> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>> Slightly RFC. This does fix the crash encountered, but it occurs to me that
>> there's a race condition when S3 platform powerdown is incident with an
>> NMI/#MC, where more than just the primary shadow stack can end up busy on
>> resume.
>>
>> A larger fix would be to change how we allocate tokens, and always have each
>> CPU set up its own tokens. I didn't do this originally in the hopes of having
>> WRSSQ generally disabled, but that plan failed when encountering reality...
> While I think this wants fixing one way or another, I also think this
> shouldn't block the immediate fix here (which addresses an unconditional
> crash rather than a pretty unlikely one).

Fair point. I'll get this committed now, and work on the IST shstks later.

As a note for backporting, this depends on the adjustments made in c/s
311434bfc9d1 so isn't safe to backport in exactly this form. I'll sort
something out in due course.

~Andrew
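As a sketch of the "larger fix" direction mentioned above (each CPU rewriting its own tokens on bringup), something like the following could reset a token to its non-busy state. This is not the follow-up patch; the wrss_u64() wrapper and the layout are hypothetical, and WRSSQ requires CR4.CET plus WRSS being enabled in MSR_S_CET.

```c
/*
 * Sketch only, not the follow-up patch.  wrss_u64() and the layout are
 * hypothetical; WRSSQ only works with CET active and WRSS enabled.
 */
#include <stdint.h>

#define PAGE_SIZE 4096u

static inline void wrss_u64(uint64_t *addr, uint64_t val)
{
    /* Shadow-stack-qualified 8-byte write. */
    asm volatile ( "wrssq %[val], %[ptr]"
                   : [ptr] "+m" (*addr) : [val] "r" (val) );
}

/*
 * (Re)write a fresh, non-busy supervisor token in the last 8 bytes of a
 * shadow stack page, e.g. on every AP bringup rather than only when the
 * stack is first allocated.
 */
static void reset_shstk_token(void *shstk_page)
{
    uint64_t *token = (uint64_t *)((char *)shstk_page + PAGE_SIZE - 8);

    wrss_u64(token, (uint64_t)token);   /* value == own address, busy bit clear */
}
```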
```diff
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index fa41990dde0f..5d12937a0e40 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -51,13 +51,21 @@ ENTRY(__high_start)
         test    $CET_SHSTK_EN, %al
         jz      .L_ap_cet_done
 
-        /* Derive MSR_PL0_SSP from %rsp (token written when stack is allocated). */
-        mov     $MSR_PL0_SSP, %ecx
+        /* Derive the supervisor token address from %rsp. */
         mov     %rsp, %rdx
+        and     $~(STACK_SIZE - 1), %rdx
+        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %rdx
+
+        /*
+         * Write a new supervisor token.  Doesn't matter on boot, but for S3
+         * resume this clears the busy bit.
+         */
+        wrssq   %rdx, (%rdx)
+
+        /* Point MSR_PL0_SSP at the token. */
+        mov     $MSR_PL0_SSP, %ecx
+        mov     %edx, %eax
         shr     $32, %rdx
-        mov     %esp, %eax
-        and     $~(STACK_SIZE - 1), %eax
-        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %eax
         wrmsr
 
         setssbsy
```
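The address arithmetic in the hunk is easier to follow in C. A standalone sketch, assuming an 8-page per-CPU stack and slot 0 for the primary shadow stack (both values are assumptions here, not taken from the Xen headers), of how the token address is derived from the stack pointer and split into EDX:EAX for wrmsr:

```c
/*
 * Standalone illustration of the address math in the hunk above.
 * STACK_SIZE and PRIMARY_SHSTK_SLOT are assumed values for the example.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE            4096ull
#define STACK_SIZE           (8 * PAGE_SIZE)   /* assumed 8-page per-CPU stack */
#define PRIMARY_SHSTK_SLOT   0                 /* assumed primary shstk slot */

int main(void)
{
    uint64_t rsp   = 0xffff830000107f00ull;           /* arbitrary example %rsp */
    uint64_t base  = rsp & ~(STACK_SIZE - 1);         /* and $~(STACK_SIZE-1), %rdx */
    uint64_t token = base | ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8);

    /* wrmsr consumes the 64-bit value split across EDX:EAX. */
    uint32_t eax = (uint32_t)token;
    uint32_t edx = (uint32_t)(token >> 32);

    printf("token at %#llx (edx=%#x eax=%#x)\n",
           (unsigned long long)token, edx, eax);
    return 0;
}
```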