Message ID | 20220411164636.74866-1-jon@nutanix.com (mailing list archive)
---|---
State | New, archived
Series | x86/speculation, KVM: respect user IBPB configuration
On Mon, Apr 11, 2022, Jon Kohler wrote:
> On vmx_vcpu_load_vmcs and svm_vcpu_load, respect user IBPB config and only
> attempt IBPB MSR if either always_ibpb or cond_ibpb and the vcpu thread
> has TIF_SPEC_IB.
>
> A vcpu thread will have TIF_SPEC_IB on qemu-kvm using -sandbox on if
> kernel cmdline spectre_v2_user=seccomp, which would indicate that the user
> is looking for a higher security environment and has workloads that need
> to be secured from each other.
>
> Note: The behavior of spectre_v2_user recently changed in 5.16 on
> commit 2f46993d83ff ("x86: change default to
> spec_store_bypass_disable=prctl spectre_v2_user=prctl")
>
> Prior to that, qemu-kvm with -sandbox on would also have TIF_SPEC_IB
> if spectre_v2_user=auto.
>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>  arch/x86/include/asm/spec-ctrl.h | 12 ++++++++++++
>  arch/x86/kernel/cpu/bugs.c       |  6 ++++--
>  arch/x86/kvm/svm/svm.c           |  2 +-
>  arch/x86/kvm/vmx/vmx.c           |  2 +-
>  4 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
> index 5393babc0598..552757847d5b 100644
> --- a/arch/x86/include/asm/spec-ctrl.h
> +++ b/arch/x86/include/asm/spec-ctrl.h
> @@ -85,4 +85,16 @@ static inline void speculative_store_bypass_ht_init(void) { }
>  extern void speculation_ctrl_update(unsigned long tif);
>  extern void speculation_ctrl_update_current(void);
>
> +/*
> + * Always issue IBPB if switch_mm_always_ibpb and respect conditional
> + * IBPB if this thread has TIF_SPEC_IB.
> + */
> +static inline void maybe_indirect_branch_prediction_barrier(void)

I think it makes sense to give this a virtualization specific name, e.g.
x86_virt_cond_indirect_branch_prediction_barrier() or x86_virt_cond_ibpb(),
to follow x86_virt_spec_ctrl().  Or if "cond" is misleading in the "always"
case, perhaps x86_virt_guest_switch_ibpb()?

> +{
> +	if (static_key_enabled(&switch_mm_always_ibpb) ||
> +	    (static_key_enabled(&switch_mm_cond_ibpb) &&
> +	     test_thread_flag(TIF_SPEC_IB)))

The cond_ibpb case in particular needs a more detailed comment.  Specifically
it should call out why this path doesn't do IBPB when switching from a task
with TIF_SPEC_IB to a task without TIF_SPEC_IB, whereas cond_mitigation()
does emit IBPB when switching mms in this case.

But stepping back, why does KVM do its own IBPB in the first place?  The goal
is to prevent one vCPU from attacking the next vCPU run on the same pCPU.
But unless userspace is running multiple VMs in the same process/mm_struct,
switching vCPUs, i.e. switching tasks, will also switch mm_structs and thus
do IBPB via cond_mitigation().

If userspace runs multiple VMs in the same process, enables cond_ibpb, _and_
sets TIF_SPEC_IB, then it's being stupid and isn't getting full protection
in any case, e.g. if userspace is handling an exit-to-userspace condition
for two vCPUs from different VMs, then the kernel could switch between those
two vCPUs' tasks without bouncing through KVM and thus without doing KVM's
IBPB.

I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of
spectre and is naively running multiple VMs in the same process.

What am I missing?
> On Apr 15, 2022, at 10:28 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Apr 11, 2022, Jon Kohler wrote:
>> On vmx_vcpu_load_vmcs and svm_vcpu_load, respect user IBPB config and only
>> attempt IBPB MSR if either always_ibpb or cond_ibpb and the vcpu thread
>> has TIF_SPEC_IB.
>>
>> [...]
>>
>> +static inline void maybe_indirect_branch_prediction_barrier(void)
>
> I think it makes sense to give this a virtualization specific name, e.g.
> x86_virt_cond_indirect_branch_prediction_barrier() or x86_virt_cond_ibpb(),
> to follow x86_virt_spec_ctrl().  Or if "cond" is misleading in the "always"
> case, perhaps x86_virt_guest_switch_ibpb()?

+Bijan

Yea, good suggestion. Bijan and I were going back and forth on this in our
internal review and couldn't land on a good name. I like this naming better.

>> +{
>> +	if (static_key_enabled(&switch_mm_always_ibpb) ||
>> +	    (static_key_enabled(&switch_mm_cond_ibpb) &&
>> +	     test_thread_flag(TIF_SPEC_IB)))
>
> The cond_ibpb case in particular needs a more detailed comment.  Specifically
> it should call out why this path doesn't do IBPB when switching from a task
> with TIF_SPEC_IB to a task without TIF_SPEC_IB, whereas cond_mitigation()
> does emit IBPB when switching mms in this case.
>
> But stepping back, why does KVM do its own IBPB in the first place?  The goal
> is to prevent one vCPU from attacking the next vCPU run on the same pCPU.
> But unless userspace is running multiple VMs in the same process/mm_struct,
> switching vCPUs, i.e. switching tasks, will also switch mm_structs and thus
> do IBPB via cond_mitigation().

Good question, I couldn't figure out the answer to this by walking the code
and looking at git history/blame for this area. Are there VMMs that even run
multiple VMs within the same process? The only case I could think of is a
nested situation?

> If userspace runs multiple VMs in the same process, enables cond_ibpb, _and_
> sets TIF_SPEC_IB, then it's being stupid and isn't getting full protection
> in any case, e.g. if userspace is handling an exit-to-userspace condition
> for two vCPUs from different VMs, then the kernel could switch between those
> two vCPUs' tasks without bouncing through KVM and thus without doing KVM's
> IBPB.

Exactly, meaning that the only time this would make sense is for some sort of
nested situation or some other funky VMM tomfoolery, but that nested
hypervisor might not be KVM, so it's a farce, yea? Meaning that even in that
case, there is zero guarantee from the host kernel perspective that barriers
within that process are being issued on switch, which would make this
security posture just window dressing?

> I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of
> spectre and is naively running multiple VMs in the same process.

Agreed. I've thought of always_ibpb as "paranoid mode": if a user signs up
for that, they rarely care about the fast path / performance implications,
even if some of the security surface area is just complete window dressing :(

Looking forward, what if we simplified this to have KVM issue barriers IFF
always_ibpb, and dropped the cond's, since switching mm_structs should take
care of that case?

The nice part is that the cond_mitigation() path then handles the
switching-to-a-flagged-thread and switching-from-a-flagged-thread cases
gracefully, and we don't need to duplicate that smarts in KVM code or
somewhere else.

> What am I missing?
On Fri, Apr 15, 2022, Jon Kohler wrote:
>
>> On Apr 15, 2022, at 10:28 AM, Sean Christopherson <seanjc@google.com> wrote:
>> But stepping back, why does KVM do its own IBPB in the first place?  The goal
>> is to prevent one vCPU from attacking the next vCPU run on the same pCPU.
>> But unless userspace is running multiple VMs in the same process/mm_struct,
>> switching vCPUs, i.e. switching tasks, will also switch mm_structs and thus
>> do IBPB via cond_mitigation().
>
> Good question, I couldn't figure out the answer to this by walking the code
> and looking at git history/blame for this area. Are there VMMs that even run
> multiple VMs within the same process? The only case I could think of is a
> nested situation?

Selftests? :-)

>> If userspace runs multiple VMs in the same process, enables cond_ibpb, _and_
>> sets TIF_SPEC_IB, then it's being stupid and isn't getting full protection
>> in any case, e.g. if userspace is handling an exit-to-userspace condition
>> for two vCPUs from different VMs, then the kernel could switch between those
>> two vCPUs' tasks without bouncing through KVM and thus without doing KVM's
>> IBPB.
>
> Exactly, meaning that the only time this would make sense is for some sort of
> nested situation or some other funky VMM tomfoolery, but that nested
> hypervisor might not be KVM, so it's a farce, yea? Meaning that even in that
> case, there is zero guarantee from the host kernel perspective that barriers
> within that process are being issued on switch, which would make this
> security posture just window dressing?
>
>> I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of
>> spectre and is naively running multiple VMs in the same process.
>
> Agreed. I've thought of always_ibpb as "paranoid mode": if a user signs up
> for that, they rarely care about the fast path / performance implications,
> even if some of the security surface area is just complete window dressing :(
>
> Looking forward, what if we simplified this to have KVM issue barriers IFF
> always_ibpb, and dropped the cond's, since switching mm_structs should take
> care of that case?
>
> The nice part is that the cond_mitigation() path then handles the
> switching-to-a-flagged-thread and switching-from-a-flagged-thread cases
> gracefully, and we don't need to duplicate that smarts in KVM code or
> somewhere else.

Unless there's an edge case we're overlooking, that has my vote.  And if the
above is captured in a comment, then there shouldn't be any confusion as to
why the kernel/KVM is consuming a flag named "switch_mm" when switching
vCPUs, i.e. when there may or may not have been a change in mm_structs.
> On Apr 18, 2022, at 12:28 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Apr 15, 2022, Jon Kohler wrote:
>>> But stepping back, why does KVM do its own IBPB in the first place? [...]
>>
>> Good question, I couldn't figure out the answer to this by walking the code
>> and looking at git history/blame for this area. Are there VMMs that even run
>> multiple VMs within the same process? The only case I could think of is a
>> nested situation?
>
> Selftests? :-)

Ah! I'll take a mulligan, I was only thinking about run-of-the-mill user
stuff, not the tests, thx.

> [...]
>
> Unless there's an edge case we're overlooking, that has my vote.  And if the
> above is captured in a comment, then there shouldn't be any confusion as to
> why the kernel/KVM is consuming a flag named "switch_mm" when switching
> vCPUs, i.e. when there may or may not have been a change in mm_structs.

Ok great. I'll work up a v2 and send it out.
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index 5393babc0598..552757847d5b 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -85,4 +85,16 @@ static inline void speculative_store_bypass_ht_init(void) { }
 extern void speculation_ctrl_update(unsigned long tif);
 extern void speculation_ctrl_update_current(void);
 
+/*
+ * Always issue IBPB if switch_mm_always_ibpb and respect conditional
+ * IBPB if this thread has TIF_SPEC_IB.
+ */
+static inline void maybe_indirect_branch_prediction_barrier(void)
+{
+	if (static_key_enabled(&switch_mm_always_ibpb) ||
+	    (static_key_enabled(&switch_mm_cond_ibpb) &&
+	     test_thread_flag(TIF_SPEC_IB)))
+		indirect_branch_prediction_barrier();
+}
+
 #endif
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6296e1ebed1d..737826bf974c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -66,10 +66,12 @@ u64 __ro_after_init x86_amd_ls_cfg_ssbd_mask;
 
 /* Control conditional STIBP in switch_to() */
 DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp);
-/* Control conditional IBPB in switch_mm() */
+/* Control conditional IBPB in switch_mm() and vmcs/vmcb load */
 DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
-/* Control unconditional IBPB in switch_mm() */
+EXPORT_SYMBOL_GPL(switch_mm_cond_ibpb);
+/* Control unconditional IBPB in switch_mm() and vmcs/vmcb load */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
+EXPORT_SYMBOL_GPL(switch_mm_always_ibpb);
 
 /* Control MDS CPU buffer clear before returning to user space */
 DEFINE_STATIC_KEY_FALSE(mds_user_clear);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bd4c64b362d2..7762ca1197b5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1302,7 +1302,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	if (sd->current_vmcb != svm->vmcb) {
 		sd->current_vmcb = svm->vmcb;
-		indirect_branch_prediction_barrier();
+		maybe_indirect_branch_prediction_barrier();
 	}
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_load(vcpu, cpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 04d170c4b61e..baaf658263b5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1270,7 +1270,7 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 		 * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
 		 */
 		if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
-			indirect_branch_prediction_barrier();
+			maybe_indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
On vmx_vcpu_load_vmcs and svm_vcpu_load, respect user IBPB config and only
attempt IBPB MSR if either always_ibpb or cond_ibpb and the vcpu thread
has TIF_SPEC_IB.

A vcpu thread will have TIF_SPEC_IB on qemu-kvm using -sandbox on if
kernel cmdline spectre_v2_user=seccomp, which would indicate that the user
is looking for a higher security environment and has workloads that need
to be secured from each other.

Note: The behavior of spectre_v2_user recently changed in 5.16 on
commit 2f46993d83ff ("x86: change default to
spec_store_bypass_disable=prctl spectre_v2_user=prctl")

Prior to that, qemu-kvm with -sandbox on would also have TIF_SPEC_IB
if spectre_v2_user=auto.

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Waiman Long <longman@redhat.com>
---
 arch/x86/include/asm/spec-ctrl.h | 12 ++++++++++++
 arch/x86/kernel/cpu/bugs.c       |  6 ++++--
 arch/x86/kvm/svm/svm.c           |  2 +-
 arch/x86/kvm/vmx/vmx.c           |  2 +-
 4 files changed, 18 insertions(+), 4 deletions(-)