| Message ID | 20230531150112.76156-1-jon@nutanix.com |
|---|---|
| State | New, archived |
| Series | KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host() |
On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
> Remove barrier_nospec(), which translates to LFENCE, in
> vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
> already exist prior to this point.
>
> This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
>
> For posterity, vmx_spec_ctrl_restore_host will also execute WRMSR to
> IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest/host MSR
> value does not match, which serves as an additional RSB-barrier.
>
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
>
> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")

Sorry, I knew I should have put a comment there.

The goal of this barrier_nospec() is to prevent speculative execution
from bypassing the SPEC_CTRL write (due to misprediction of the
conditional branch, Spectre v1 style). Otherwise the next indirect
branch or unbalanced RET could be an attack target.

So any previous LFENCEs before that conditional branch won't help here.
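As a minimal stand-alone illustration of the pattern Josh describes (a
conditional SPEC_CTRL write followed by barrier_nospec()), the sketch below
compiles on x86-64 with GCC/Clang. The feature flag, the host value and
write_spec_ctrl() are stand-in stubs rather than kernel APIs; the real code is
the vmx_spec_ctrl_restore_host() hunk quoted later in this thread.

#include <stdbool.h>
#include <stdint.h>

static bool kernel_ibrs;          /* stand-in for X86_FEATURE_KERNEL_IBRS */
static uint64_t host_spec_ctrl;   /* stand-in for x86_spec_ctrl_current */

/* Stand-in for the WRMSR to IA32_SPEC_CTRL. */
static void write_spec_ctrl(uint64_t val)
{
    (void)val;
}

/* barrier_nospec() boils down to an LFENCE on x86. */
static inline void barrier_nospec(void)
{
    __asm__ __volatile__("lfence" ::: "memory");
}

static void restore_host_spec_ctrl(uint64_t guest_spec_ctrl)
{
    /*
     * If this branch is mispredicted, the MSR write below can be
     * speculatively skipped.  The trailing LFENCE forces the branch to
     * resolve before any later indirect branch or unbalanced RET can
     * execute speculatively with the guest's SPEC_CTRL still in effect.
     */
    if (kernel_ibrs || guest_spec_ctrl != host_spec_ctrl)
        write_spec_ctrl(host_spec_ctrl);

    barrier_nospec();
}

int main(void)
{
    restore_host_spec_ctrl(0);
    return 0;
}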
> On May 31, 2023, at 7:18 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
>> Remove barrier_nospec(), which translates to LFENCE, in
>> vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
>> already exist prior to this point.
>>
>> [...]
>>
>> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
>> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
>
> Sorry, I knew I should have put a comment there.
>
> The goal of this barrier_nospec() is to prevent speculative execution
> from bypassing the SPEC_CTRL write (due to misprediction of the
> conditional branch, Spectre v1 style). Otherwise the next indirect
> branch or unbalanced RET could be an attack target.
>
> So any previous LFENCEs before that conditional branch won't help here.
>
> --
> Josh

Ah interesting. Ok, to be clear, that's a guest -> host attack, correct? And
such an attack would not at all be thwarted by the first CALL retire + LFENCE
that was added in commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
protections")?

Sorry to be long winded, just wanting to triple check, because the
aforementioned commit was added slightly after the original one, and I want to
make extra sure that they aren't solving the same thing.

If that is indeed the case, does that commit need to be revisited at all?

Or are we saying that this Intel vulnerability needs *two* LFENCEs to keep the
host secure? If that's the case, that's fine, and I'm happy to transform this
commit into some comments in this area to capture the issue for future
onlookers.
On 31/05/2023 4:01 pm, Jon Kohler wrote:
> Remove barrier_nospec(), which translates to LFENCE, in
> vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
> already exist prior to this point.
>
> [...]
>
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html

Yeah, unfortunately PBRSB is insidious.

From memory (please correct me if I'm wrong), the required safety
property is this: After a VMExit (if IBRS was set prior to exit) or the
write to MSR_SPEC_CTRL setting IBRS (if IBRS was not set prior to exit),
one single CALL instruction must architecturally retire before any RET
instructions execute (speculatively).

There are several ways to arrange this, but they all basically boil down
to having some serialising instruction between the first CALL and first
RET on any reachable path from VMExit.

~Andrew
On Wed, May 31, 2023 at 11:58:12PM +0000, Jon Kohler wrote:
> > The goal of this barrier_nospec() is to prevent speculative execution
> > from bypassing the SPEC_CTRL write (due to misprediction of the
> > conditional branch, Spectre v1 style). Otherwise the next indirect
> > branch or unbalanced RET could be an attack target.
> >
> > So any previous LFENCEs before that conditional branch won't help here.
>
> Ah interesting. Ok, to be clear, that's a guest -> host attack, correct? And such
> an attack would not at all be thwarted by the first CALL retire + LFENCE that
> was added in commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
> protections")?

Right.

> Sorry to be long winded, just wanting to triple check, because
> the aforementioned commit was added slightly after the original one, and I
> want to make extra sure that they aren't solving the same thing.
>
> If that is indeed the case, does that commit need to be revisited at all?
>
> Or are we saying that this Intel vulnerability needs *two* LFENCEs to keep
> the host secure?

The first LFENCE (FILL_RETURN_BUFFER) forces the CALL to retire so the
RSB stuffing is guaranteed to take effect before the next unbalanced RET
can be speculatively executed.

The second LFENCE (vmx_spec_ctrl_restore_host) forces the conditional
branch to retire so the SPEC_CTRL write (potential IBRS/eIBRS enablement)
is guaranteed to take effect before the next indirect branch and/or
unbalanced RET can be speculatively executed.

So each LFENCE has a distinct purpose. That said, there are no indirect
branches or unbalanced RETs between them. So it should be fine to combine
them into a single LFENCE after both.

You could for example just remove the first LFENCE. But only for that
usage site, i.e. not for other users of FILL_RETURN_BUFFER.

Or, remove them both and add an LFENCE in vmx_vmexit() right after the
call to vmx_spec_ctrl_restore_host(). That might be clearer. Then it
could have a comment describing its dual purposes.
On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> So each LFENCE has a distinct purpose. That said, there are no indirect
> branches or unbalanced RETs between them.

How lucky are you feeling?

You're in C at this point, which means the compiler could have emitted a
call to mem{cpy,cmp}() in place of a simple assignment/comparison.

~Andrew
On Thu, Jun 01, 2023 at 01:29:12AM +0100, Andrew Cooper wrote:
> On 31/05/2023 4:01 pm, Jon Kohler wrote:
> > Remove barrier_nospec(), which translates to LFENCE, in
> > vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
> > already exist prior to this point.
> >
> > [...]
>
> Yeah, unfortunately PBRSB is insidious.
>
> From memory (please correct me if I'm wrong), the required safety
> property is this: After a VMExit (if IBRS was set prior to exit) or the
> write to MSR_SPEC_CTRL setting IBRS (if IBRS was not set prior to exit),
> one single CALL instruction must architecturally retire before any RET
> instructions execute (speculatively).
>
> There are several ways to arrange this, but they all basically boil down
> to having some serialising instruction between the first CALL and first
> RET on any reachable path from VMExit.

The document says the problem is *unbalanced* RET, i.e. RSB underflow.
So the mitigation needs a single RSB stuff (i.e., unbalanced CALL) and
then an LFENCE anytime before the next unbalanced RET.
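The shape Josh describes (one unbalanced CALL that retires, then an LFENCE,
before any unbalanced RET) can be sketched in GCC/Clang inline asm for x86-64
as below. This mirrors only the gist of the mitigation, not the kernel's actual
FILL_RETURN_BUFFER macro, and it assumes no live red-zone data (e.g. kernel
context, or code built with -mno-red-zone).

/* Sketch only: one CALL retires, then LFENCE, before any unbalanced RET. */
static inline void pbrsb_barrier_sketch(void)
{
    __asm__ __volatile__(
        "call 1f\n\t"          /* the single CALL that must retire        */
        "int3\n"               /* speculation trap for the paired RET     */
        "1:\n\t"
        "add $8, %%rsp\n\t"    /* discard the return address, leaving the
                                  CALL unbalanced (the "RSB stuff")       */
        "lfence"               /* ensure the CALL has retired before any
                                  later RET can execute                   */
        ::: "memory", "cc");
}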
On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose. That said, there are no indirect
> > branches or unbalanced RETs between them.
>
> How lucky are you feeling?
>
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.

But it's only unbalanced RETs we're concerned about, per Intel.
On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose. That said, there are no indirect
> > branches or unbalanced RETs between them.
>
> How lucky are you feeling?
>
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.

Moving the second LFENCE to the else part of the WRMSR should be possible?
So that the serialization can be achieved either by WRMSR or LFENCE. This
saves an LFENCE when the host and guest values of MSR_SPEC_CTRL differ.

---

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d2d6e1b6c788..d32e6d172dd6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7157,8 +7157,8 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
 	    vmx->spec_ctrl != hostval)
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
+	else
+		barrier_nospec();
 }
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
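For clarity, a stand-alone sketch of the if/else shape Pawan's diff produces is
below; the helpers are stubs rather than the kernel's implementations, and the
point is only that serialization comes from either the WRMSR or the LFENCE,
never both.

#include <stdbool.h>
#include <stdint.h>

static bool kernel_ibrs;
static uint64_t host_spec_ctrl;

static void write_spec_ctrl(uint64_t val) { (void)val; }  /* WRMSR stub */

static inline void barrier_nospec(void)
{
    __asm__ __volatile__("lfence" ::: "memory");
}

void restore_host_spec_ctrl_else_variant(uint64_t guest_spec_ctrl)
{
    if (kernel_ibrs || guest_spec_ctrl != host_spec_ctrl)
        write_spec_ctrl(host_spec_ctrl);   /* the WRMSR itself serializes */
    else
        barrier_nospec();                  /* otherwise pay one LFENCE */
}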
On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> > On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > > So each LFENCE has a distinct purpose. That said, there are no indirect
> > > branches or unbalanced RETs between them.
> >
> > How lucky are you feeling?
> >
> > You're in C at this point, which means the compiler could have emitted a
> > call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>
> Moving the second LFENCE to the else part of the WRMSR should be possible?
> So that the serialization can be achieved either by WRMSR or LFENCE. This
> saves an LFENCE when the host and guest values of MSR_SPEC_CTRL differ.

Yes. Though in practice it might not make much of a difference. With
wrmsr+lfence, the lfence has nothing to do so it might be almost
instantaneous anyway.
> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
>>> [...]
>>
>> Moving the second LFENCE to the else part of the WRMSR should be possible?
>> So that the serialization can be achieved either by WRMSR or LFENCE. This
>> saves an LFENCE when the host and guest values of MSR_SPEC_CTRL differ.
>
> Yes. Though in practice it might not make much of a difference. With
> wrmsr+lfence, the lfence has nothing to do so it might be almost
> instantaneous anyway.
>
> --
> Josh

Coming back to this, what if we hoisted the call to vmx_spec_ctrl_restore_host
above FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?

That way, we wouldn't have to mess with the internal LFENCE in nospec-branch.h,
and that would act as the "final line of defense" LFENCE.

Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur before any
sort of calls no matter what?

Thanks,
Jon
On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
> > On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > [...]
> >
> > Yes. Though in practice it might not make much of a difference. With
> > wrmsr+lfence, the lfence has nothing to do so it might be almost
> > instantaneous anyway.
>
> Coming back to this, what if we hoisted the call to vmx_spec_ctrl_restore_host
> above FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>
> That way, we wouldn't have to mess with the internal LFENCE in nospec-branch.h,
> and that would act as the "final line of defense" LFENCE.
>
> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur before any
> sort of calls no matter what?

If we go by Intel's statement that only unbalanced RETs are a concern,
that *might* be ok as long as there's a nice comment above the
FILL_RETURN_BUFFER usage site describing the two purposes for the
LFENCE.

However, based on Andy's concerns, which I've discussed with him
privately (but I'm not qualified to agree or disagree with), we may want
to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
sorry. My original implementation of that function was actually asm. I
can try to dig up that code.
> On Jun 5, 2023, at 12:35 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>> [...]
>>
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur before any
>> sort of calls no matter what?
>
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
>
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> sorry. My original implementation of that function was actually asm. I
> can try to dig up that code.
>
> --
> Josh

RE ASM - that'd be nice; given that the restore-guest pre-vmentry path is now
asm, it would match nicely. Then all the code is in one spot, in one file, in
one language.

If you could dig that up so we didn't have to start from scratch, that'd be
great (imho).

Jon
On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
> >> Coming back to this, what if we hoisted the call to vmx_spec_ctrl_restore_host
> >> above FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> >>
> >> [...]
> >
> > If we go by Intel's statement that only unbalanced RETs are a concern,
> > that *might* be ok as long as there's a nice comment above the
> > FILL_RETURN_BUFFER usage site describing the two purposes for the
> > LFENCE.

We would then need FILL_RETURN_BUFFER to unconditionally execute LFENCE
to account for the wrmsr branch misprediction. Currently the LFENCE is not
executed for !X86_BUG_EIBRS_PBRSB.

> > However, based on Andy's concerns, which I've discussed with him
> > privately (but I'm not qualified to agree or disagree with), we may want
> > to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> > sorry. My original implementation of that function was actually asm. I
> > can try to dig up that code.

Note:

  VMexit
  CALL
  RET
  RET    <---- This is also a problem if the first call hasn't retired yet.
  LFENCE

Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
of this.
On Mon, Jun 05, 2023, Pawan Gupta wrote:
> On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
> > > However, based on Andy's concerns, which I've discussed with him
> > > privately (but I'm not qualified to agree or disagree with), we may want
> > > to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> > > sorry. My original implementation of that function was actually asm. I
> > > can try to dig up that code.
>
> Note:
>
>   VMexit
>   CALL
>   RET
>   RET    <---- This is also a problem if the first call hasn't retired yet.
>   LFENCE
>
> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
> of this.

Is there an actual bug here, or are we just micro-optimizing something that may
or may not need additional optimization? Unless there's a bug to be fixed,
moving code into ASM and increasing complexity doesn't seem worthwhile.
> On Jun 5, 2023, at 2:31 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jun 05, 2023, Pawan Gupta wrote:
>> [...]
>>
>> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
>> of this.
>
> Is there an actual bug here, or are we just micro-optimizing something that may
> or may not need additional optimization? Unless there's a bug to be fixed,
> moving code into ASM and increasing complexity doesn't seem worthwhile.

The (slight) bug here is that on systems where host != guest SPEC_CTRL, they
get hit with LFENCE + WRMSR to SPEC_CTRL + LFENCE, when in reality they should
just get LFENCE + WRMSR to SPEC_CTRL and that's it.

That would be satisfied with Pawan's suggestion to move the LFENCE into the
else condition in the last branch of vmx_spec_ctrl_restore_host().

The optimization on top of that would be to see if we could whack that 2x
LFENCE down to 1x LFENCE. It just feels wrong to have two LFENCEs in the
critical path, even if the second one ends up being fairly snappy because of
the first one (and/or the WRMSR).

So to recap: fixing "the bug" does not require moving to ASM. Optimizing the 2x
LFENCE into 1x LFENCE (probably) requires going to ASM, from the sounds of it?
On Mon, Jun 05, 2023 at 11:31:34AM -0700, Sean Christopherson wrote:
> Is there an actual bug here, or are we just micro-optimizing something that may
> or may not need additional optimization? Unless there's a bug to be fixed,
> moving code into ASM and increasing complexity doesn't seem worthwhile.

I can't really argue with that. FWIW, here's the (completely untested) patch.

---8<---

From: Josh Poimboeuf <jpoimboe@kernel.org>
Subject: [PATCH] KVM: VMX: Convert vmx_spec_ctrl_restore_host() to assembly

Convert vmx_spec_ctrl_restore_host() to assembly. This allows the
removal of a redundant LFENCE. It also "feels" safer as it doesn't
allow the compiler to insert any surprises. And it's more symmetrical
with the pre-vmentry SPEC_CTRL handling, which is also done in assembly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kvm/vmx/vmenter.S | 71 ++++++++++++++++++++++++++++++++------
 arch/x86/kvm/vmx/vmx.c     | 25 --------------
 arch/x86/kvm/vmx/vmx.h     |  1 -
 3 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 631fd7da2bc3..977f3487469c 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -108,7 +108,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	lea (%_ASM_SP), %_ASM_ARG2
 	call vmx_update_host_rsp
 
-	ALTERNATIVE "jmp .Lspec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+	ALTERNATIVE "jmp .Lguest_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
 
 	/*
 	 * SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
@@ -122,13 +122,13 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	movl VMX_spec_ctrl(%_ASM_DI), %edi
 	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
 	cmp %edi, %esi
-	je .Lspec_ctrl_done
+	je .Lguest_spec_ctrl_done
 	mov $MSR_IA32_SPEC_CTRL, %ecx
 	xor %edx, %edx
 	mov %edi, %eax
 	wrmsr
 
-.Lspec_ctrl_done:
+.Lguest_spec_ctrl_done:
 
 	/*
 	 * Since vmentry is serializing on affected CPUs, there's no need for
@@ -253,9 +253,65 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 #endif
 
 	/*
-	 * IMPORTANT: RSB filling and SPEC_CTRL handling must be done before
-	 * the first unbalanced RET after vmexit!
+	 * IMPORTANT: The below SPEC_CTRL handling and RSB filling must be done
+	 * before the first RET after vmexit!
+	 */
+
+	ALTERNATIVE "jmp .Lhost_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+
+	pop %_ASM_SI	/* @flags */
+	pop %_ASM_DI	/* @vmx */
+
+	/*
+	 * if (flags & VMX_RUN_SAVE_SPEC_CTRL)
+	 *	vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+	 */
+	test $VMX_RUN_SAVE_SPEC_CTRL, %_ASM_SI
+	jz .Lhost_spec_ctrl_check
+
+	mov $MSR_IA32_SPEC_CTRL, %ecx
+	rdmsr
+	mov %eax, VMX_spec_ctrl(%_ASM_DI)
+
+.Lhost_spec_ctrl_check:
+	/*
+	 * If the guest/host SPEC_CTRL values differ, restore the host value.
 	 *
+	 * For legacy IBRS, the IBRS bit always needs to be written after
+	 * transitioning from a less privileged predictor mode, regardless of
+	 * whether the guest/host values differ.
+	 *
+	 * if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
+	 *     vmx->spec_ctrl != this_cpu_read(x86_spec_ctrl_current))
+	 *	native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
+	 */
+	ALTERNATIVE "", "jmp .Lhost_spec_ctrl_write", X86_FEATURE_KERNEL_IBRS
+	movl VMX_spec_ctrl(%_ASM_DI), %edi
+	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
+	cmp %edi, %esi
+	je .Lhost_spec_ctrl_done_lfence
+
+.Lhost_spec_ctrl_write:
+	mov $MSR_IA32_SPEC_CTRL, %ecx
+	xor %edx, %edx
+	mov %esi, %eax
+	wrmsr
+
+.Lhost_spec_ctrl_done_lfence:
+	/*
+	 * This ensures that speculative execution doesn't incorrectly bypass
+	 * the above SPEC_CTRL wrmsr by mispredicting the 'je'.
+	 *
+	 * It's only needed if the below FILL_RETURN_BUFFER doesn't do an
+	 * LFENCE. Thus the X86_FEATURE_RSB_VMEXIT{_LITE} alternatives.
+	 */
+	ALTERNATIVE_2 "lfence", \
+		      "", X86_FEATURE_RSB_VMEXIT, \
+		      "", X86_FEATURE_RSB_VMEXIT_LITE
+
+.Lhost_spec_ctrl_done:
+
+	/*
 	 * For retpoline or IBRS, RSB filling is needed to prevent poisoned RSB
 	 * entries and (in some cases) RSB underflow.
 	 *
@@ -267,11 +323,6 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 	FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
 			   X86_FEATURE_RSB_VMEXIT_LITE
 
-	pop %_ASM_ARG2	/* @flags */
-	pop %_ASM_ARG1	/* @vmx */
-
-	call vmx_spec_ctrl_restore_host
-
 	/* Put return value in AX */
 	mov %_ASM_BX, %_ASM_AX
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..d353b0e23918 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7109,31 +7109,6 @@ void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 	}
 }
 
-void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
-					unsigned int flags)
-{
-	u64 hostval = this_cpu_read(x86_spec_ctrl_current);
-
-	if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
-		return;
-
-	if (flags & VMX_RUN_SAVE_SPEC_CTRL)
-		vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
-
-	/*
-	 * If the guest/host SPEC_CTRL values differ, restore the host value.
-	 *
-	 * For legacy IBRS, the IBRS bit always needs to be written after
-	 * transitioning from a less privileged predictor mode, regardless of
-	 * whether the guest/host values differ.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
-	    vmx->spec_ctrl != hostval)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
-}
-
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
 	switch (to_vmx(vcpu)->exit_reason.basic) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..f9fab33f4d76 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -420,7 +420,6 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr);
 void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
-void vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx, unsigned int flags);
 unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx);
 bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs, unsigned int flags);
On 05/06/2023 5:35 pm, Josh Poimboeuf wrote:
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>> Coming back to this, what if we hoisted the call to vmx_spec_ctrl_restore_host
>> above FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>
>> That way, we wouldn't have to mess with the internal LFENCE in nospec-branch.h,
>> and that would act as the "final line of defense" LFENCE.
>>
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur before any
>> sort of calls no matter what?
>
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
>
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> sorry. My original implementation of that function was actually asm. I
> can try to dig up that code.

Lemme see if I can remember the whole safety position.

I've just gone through a year's worth of notes and conversations, including
the following gems:

From Intel: "And then on top of that, the LFENCE in
vmx_spec_ctrl_restore_host would save you. Fingers crossed."

From me: "The how and why is important. Not for right now, but for the
years to come when all of this logic inevitably gets
tweaked/refactored/moved".

Date-check says 11 months...

__vmx_vcpu_run() is a leaf function as far as the kernel stack is concerned,
so to a first approximation, it behaves as such:

  VMEXIT
  RET

The RET gets a prediction from the top of the RSB (a guest-controlled value),
but during execute the prediction is discovered to be wrong, so a resteer
occurs, causing it to restart from __vmx_vcpu_run()'s caller.

Skipping the middle ground with a 32-entry RSB stuffing loop, we have the
following behaviour on eIBRS parts:

  VMEXIT (flush RSB if IBRS was 1 in guest)
  Restore Host MSR_SPEC_CTRL (flush RSB if IBRS was 0 in guest)
  RET

where the RET (in theory) encounters an RSB-empty condition and falls back to
an indirect prediction.

PBRSB is a missing corner case in the hardware RSB flush, which is only
corrected after one CALL instruction architecturally retires. The problem with
mitigating it, however, is that it is heavily entangled with BCBS, so I refer
you to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/analysis-speculative-execution-side-channels.html#inpage-nav-undefined-1-undefined
which describes one example of how RETs can go wrong.

The critical observation is that while for PBRSB it's the first unbalanced RET
which causes problems, as soon as *any* RET has executed and got a bad resteer
(even temporarily), you've lost all ability to contain the damage.

So, to protect against PBRSB, one CALL instruction must retire before *any*
RET instruction executes.

Pawan's patch to turn the unconditional lfence into an else-lfence should be
safe, seeing as Intel's guidance
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
explicitly says you can use the WRMSR to restore the host MSR_SPEC_CTRL value
as the speculation barrier.

But, the safety of vmx_spec_ctrl_restore_host() in the first place depends on
the early return never ever becoming a conditional, and the compiler never
emitting a call to memcpy()/memset()/whatever behind your back - something
which is not prohibited by noinstr.

I hope this clears things up.

~Andrew
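Andrew's last point can be made concrete with a small hypothetical example
(the struct and function names are invented for illustration): with a
sufficiently large struct, a plain assignment may be lowered by the compiler
to a memcpy() call, inserting a CALL/RET pair that never appears in the
source, and noinstr alone does not forbid that.

struct vmx_scratch {
    unsigned char buf[256];
};

/* Plain C, with no function call written by the author ... */
void copy_scratch(struct vmx_scratch *dst, const struct vmx_scratch *src)
{
    /*
     * ... yet the compiler is free to emit "call memcpy" for this line,
     * adding a CALL/RET pair into what looks like straight-line code.
     */
    *dst = *src;
}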
On Tue, Jun 06, 2023 at 01:20:52AM +0100, Andrew Cooper wrote:

<clip very useful summary which belongs in git somewhere>

> But, the safety of vmx_spec_ctrl_restore_host() in the first place
> depends on the early return never ever becoming a conditional,

Good point. And that would be easier to overlook in C.

> and the compiler never emitting a call to memcpy()/memset()/whatever
> behind your back - something which is not prohibited by noinstr.

Au contraire, objtool has checking for that:

	if (state->noinstr && state->instr <= 0 &&
	    !noinstr_call_dest(file, insn, insn_call_dest(insn))) {
		WARN_INSN(insn, "call to %s() leaves .noinstr.text section",
			  call_dest_name(insn));
		return 1;
	}

Regardless, despite being the person who wrote this thing in C to begin
with, I believe asm really is a better fit due to the delicate and
precise nature of the mitigations.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..98ca8b79aab1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7130,8 +7130,6 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
 	    vmx->spec_ctrl != hostval)
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
 }
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
Remove barrier_nospec(), which translates to LFENCE, in
vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1]) already
exist prior to this point.

This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.

For posterity, vmx_spec_ctrl_restore_host will also execute WRMSR to
IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest/host MSR
value does not match, which serves as an additional RSB-barrier.

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html

Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
Signed-off-by: Jon Kohler <jon@nutanix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Daniel Sneddon <daniel.sneddon@linux.intel.com>
CC: Josh Poimboeuf <jpoimboe@kernel.org>
CC: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
CC: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/vmx/vmx.c | 2 --
 1 file changed, 2 deletions(-)