Message ID | 443ba725-01b7-9174-3298-66f44ba3f1ec@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | XSA-292 follow-up | expand |
On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote: > The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in > particular not when loading nested guest state. Can't you use the current vcpu to check if the guest is in nested mode, and avoid having to explicitly pass the noflush parameter? > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/arch/x86/hvm/emulate.c > +++ b/xen/arch/x86/hvm/emulate.c > @@ -2080,6 +2080,8 @@ static int hvmemul_write_cr( > HVMTRACE_LONG_2D(CR_WRITE, reg, TRC_PAR_LONG(val)); > switch ( reg ) > { > + bool noflush; > + > case 0: > rc = hvm_set_cr0(val, true); > break; > @@ -2090,7 +2092,10 @@ static int hvmemul_write_cr( > break; > > case 3: > - rc = hvm_set_cr3(val, true); > + noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH); > + if ( noflush ) > + val &= ~X86_CR3_NOFLUSH; > + rc = hvm_set_cr3(val, noflush, true); > break; > > case 4: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -2059,12 +2059,17 @@ int hvm_mov_to_cr(unsigned int cr, unsig > > switch ( cr ) > { > + bool noflush; > + > case 0: > rc = hvm_set_cr0(val, true); > break; > > case 3: > - rc = hvm_set_cr3(val, true); > + noflush = hvm_pcid_enabled(curr) && (val & X86_CR3_NOFLUSH); > + if ( noflush ) > + val &= ~X86_CR3_NOFLUSH; > + rc = hvm_set_cr3(val, noflush, true); > break; > > case 4: > @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo > return X86EMUL_OKAY; > } > > -int hvm_set_cr3(unsigned long value, bool may_defer) > +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer) I would rather introduce a flush parameter instead, I think negated booleans are harder to parse mentally, and easier to get wrong. Thanks, Roger.
On 12.09.2019 13:35, Roger Pau Monné wrote: > On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote: >> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in >> particular not when loading nested guest state. > > Can't you use the current vcpu to check if the guest is in nested > mode, and avoid having to explicitly pass the noflush parameter? Even if this implication held today (it doesn't according to the uses in hvmemul_write_cr() and hvm_mov_to_cr()), I don't think introducing such a dependency would be a good idea. >> @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo >> return X86EMUL_OKAY; >> } >> >> -int hvm_set_cr3(unsigned long value, bool may_defer) >> +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer) > > I would rather introduce a flush parameter instead, I think negated > booleans are harder to parse mentally, and easier to get wrong. I did actually consider this, but decided against for the reason of this "no flush" behavior being a later addition to the effects CR3 writes have, i.e. I'd intentionally like it to be in line with X86_CR3_NOFLUSH. Jan
On Thu, Sep 12, 2019 at 01:52:12PM +0200, Jan Beulich wrote: > On 12.09.2019 13:35, Roger Pau Monné wrote: > > On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote: > >> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in > >> particular not when loading nested guest state. > > > > Can't you use the current vcpu to check if the guest is in nested > > mode, and avoid having to explicitly pass the noflush parameter? > > Even if this implication held today (it doesn't according to > the uses in hvmemul_write_cr() and hvm_mov_to_cr()), I don't > think introducing such a dependency would be a good idea. Oh, I see. Even when running a nested guest hvm_mov_to_cr could still cause a no flush load. > >> @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo > >> return X86EMUL_OKAY; > >> } > >> > >> -int hvm_set_cr3(unsigned long value, bool may_defer) > >> +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer) > > > > I would rather introduce a flush parameter instead, I think negated > > booleans are harder to parse mentally, and easier to get wrong. > > I did actually consider this, but decided against for the > reason of this "no flush" behavior being a later addition to > the effects CR3 writes have, i.e. I'd intentionally like it > to be in line with X86_CR3_NOFLUSH. IMO the hardware addition is a noflush flag in order to keep the previous behaviour of a cr3 write (ie: no flush has to be an opt-in). Here it's a software interface that already requires you to change the call sites anyway, and hence I would have preferred a flush parameter. I see however there's already quite some functions using such a noflush parameter, so I'm not going to oppose. On a different question, AFAICT hvm_set_cr3 should never be called with X86_CR3_NOFLUSH set? If so, do you think it would make sense to add an assert to that regard? Thanks, Roger.
On 12.09.2019 16:44, Roger Pau Monné wrote: > On a different question, AFAICT hvm_set_cr3 should never be called > with X86_CR3_NOFLUSH set? If so, do you think it would make sense to > add an assert to that regard? I've been debating this with myself, and decided against for now. We don't know what meaning the bit may gain eventually in the actual register. Jan
On Thu, Sep 12, 2019 at 04:47:17PM +0200, Jan Beulich wrote: > On 12.09.2019 16:44, Roger Pau Monné wrote: > > On a different question, AFAICT hvm_set_cr3 should never be called > > with X86_CR3_NOFLUSH set? If so, do you think it would make sense to > > add an assert to that regard? > > I've been debating this with myself, and decided against for now. > We don't know what meaning the bit may gain eventually in the > actual register. I'm slightly lost here, the noflush bit is actually defined in the Intel SDM for cr3, and hence won't gain any other meaning? Or else you might still risk writing a cr3 with noflush set in case the callers somehow misbehave? Thanks, Roger.
On 12.09.2019 17:42, Roger Pau Monné wrote: > On Thu, Sep 12, 2019 at 04:47:17PM +0200, Jan Beulich wrote: >> On 12.09.2019 16:44, Roger Pau Monné wrote: >>> On a different question, AFAICT hvm_set_cr3 should never be called >>> with X86_CR3_NOFLUSH set? If so, do you think it would make sense to >>> add an assert to that regard? >> >> I've been debating this with myself, and decided against for now. >> We don't know what meaning the bit may gain eventually in the >> actual register. > > I'm slightly lost here, the noflush bit is actually defined in the > Intel SDM for cr3, and hence won't gain any other meaning? The noflush bit is a operation one, i.e. taking effect on the MOV-to-CR3, without getting written to the underlying register. Therefore there may well appear a meaning for the actual register bit, but I agree it doesn't seem very likely for such an overload to get put in place. Jan
--- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -2080,6 +2080,8 @@ static int hvmemul_write_cr( HVMTRACE_LONG_2D(CR_WRITE, reg, TRC_PAR_LONG(val)); switch ( reg ) { + bool noflush; + case 0: rc = hvm_set_cr0(val, true); break; @@ -2090,7 +2092,10 @@ static int hvmemul_write_cr( break; case 3: - rc = hvm_set_cr3(val, true); + noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH); + if ( noflush ) + val &= ~X86_CR3_NOFLUSH; + rc = hvm_set_cr3(val, noflush, true); break; case 4: --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2059,12 +2059,17 @@ int hvm_mov_to_cr(unsigned int cr, unsig switch ( cr ) { + bool noflush; + case 0: rc = hvm_set_cr0(val, true); break; case 3: - rc = hvm_set_cr3(val, true); + noflush = hvm_pcid_enabled(curr) && (val & X86_CR3_NOFLUSH); + if ( noflush ) + val &= ~X86_CR3_NOFLUSH; + rc = hvm_set_cr3(val, noflush, true); break; case 4: @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo return X86EMUL_OKAY; } -int hvm_set_cr3(unsigned long value, bool may_defer) +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer) { struct vcpu *v = current; struct page_info *page; unsigned long old = v->arch.hvm.guest_cr[3]; - bool noflush = false; if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled & monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) ) @@ -2299,17 +2303,12 @@ int hvm_set_cr3(unsigned long value, boo /* The actual write will occur in hvm_do_resume(), if permitted. */ v->arch.vm_event->write_data.do_write.cr3 = 1; v->arch.vm_event->write_data.cr3 = value; + v->arch.vm_event->write_data.cr3_noflush = noflush; return X86EMUL_OKAY; } } - if ( hvm_pcid_enabled(v) ) /* Clear the noflush bit. */ - { - noflush = value & X86_CR3_NOFLUSH; - value &= ~X86_CR3_NOFLUSH; - } - if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) && (value != v->arch.hvm.guest_cr[3]) ) { @@ -3004,7 +3003,7 @@ void hvm_task_switch( if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) ) goto out; - rc = hvm_set_cr3(tss.cr3, true); + rc = hvm_set_cr3(tss.cr3, false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); if ( rc != X86EMUL_OKAY ) --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -324,7 +324,7 @@ static int nsvm_vcpu_hostrestore(struct v->arch.guest_table = pagetable_null(); /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */ } - rc = hvm_set_cr3(n1vmcb->_cr3, true); + rc = hvm_set_cr3(n1vmcb->_cr3, false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); if (rc != X86EMUL_OKAY) @@ -584,7 +584,7 @@ static int nsvm_vmcb_prepare4vmrun(struc nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb); /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */ - rc = hvm_set_cr3(ns_vmcb->_cr3, true); + rc = hvm_set_cr3(ns_vmcb->_cr3, false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); if (rc != X86EMUL_OKAY) @@ -598,7 +598,7 @@ static int nsvm_vmcb_prepare4vmrun(struc * we assume it intercepts page faults. */ /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */ - rc = hvm_set_cr3(ns_vmcb->_cr3, true); + rc = hvm_set_cr3(ns_vmcb->_cr3, false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); if (rc != X86EMUL_OKAY) --- a/xen/arch/x86/hvm/vm_event.c +++ b/xen/arch/x86/hvm/vm_event.c @@ -110,7 +110,7 @@ void hvm_vm_event_do_resume(struct vcpu if ( unlikely(w->do_write.cr3) ) { - if ( hvm_set_cr3(w->cr3, false) == X86EMUL_EXCEPTION ) + if ( hvm_set_cr3(w->cr3, w->cr3_noflush, false) == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); w->do_write.cr3 = 0; --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1032,7 +1032,7 @@ static void load_shadow_guest_state(stru if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); - rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), true); + rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); @@ -1246,7 +1246,7 @@ static void load_vvmcs_host_state(struct if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); - rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), true); + rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), false, true); if ( rc == X86EMUL_EXCEPTION ) hvm_inject_hw_exception(TRAP_gp_fault, 0); --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -274,6 +274,8 @@ struct monitor_write_data { unsigned int cr4 : 1; } do_write; + bool cr3_noflush; + uint32_t msr; uint64_t value; uint64_t cr0; --- a/xen/include/asm-x86/hvm/support.h +++ b/xen/include/asm-x86/hvm/support.h @@ -135,7 +135,7 @@ void hvm_shadow_handle_cd(struct vcpu *v */ int hvm_set_efer(uint64_t value); int hvm_set_cr0(unsigned long value, bool may_defer); -int hvm_set_cr3(unsigned long value, bool may_defer); +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer); int hvm_set_cr4(unsigned long value, bool may_defer); int hvm_descriptor_access_intercept(uint64_t exit_info, uint64_t vmx_exit_qualification,
The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in particular not when loading nested guest state. Signed-off-by: Jan Beulich <jbeulich@suse.com>