Message ID | 20191224132616.47441-2-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86: improve assisted tlb flush and use it in guest mode | expand |
On 24/12/2019 13:26, Roger Pau Monne wrote: > There's no need to call paging_update_cr3 unless CR3 trapping is > enabled, and that's only the case when using shadow paging or when > requested for introspection purposes, otherwise there's no need to > pause all the vCPUs of the domain in order to perform the flush. > > Check whether CR3 trapping is currently in use in order to decide > whether the vCPUs should be paused, otherwise just perform the flush. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> I agree that the existing logic poor, but this direction looks to be even more fragile. Instead, I think it would be better to follow the EPT invalidation example; mark all vcpus as needing a tlb flush, and IPI the domain dirty mask, having the return-to-guest path do the flushing. This avoids all vcpu pausing/unpausing activities, and the cost of the flush is incurred by the target vcpu, rather than the vcpu making the hypercall accumulate the cost for everything, as well as a large amount of remote VMCS accesses. It can probably also remove the need for the flush_vcpu() callback which is going to be expensive due to retpoline, and whose contents are trivial. ~Andrew
On Fri, Dec 27, 2019 at 02:52:17PM +0000, Andrew Cooper wrote: > On 24/12/2019 13:26, Roger Pau Monne wrote: > > There's no need to call paging_update_cr3 unless CR3 trapping is > > enabled, and that's only the case when using shadow paging or when > > requested for introspection purposes, otherwise there's no need to > > pause all the vCPUs of the domain in order to perform the flush. > > > > Check whether CR3 trapping is currently in use in order to decide > > whether the vCPUs should be paused, otherwise just perform the flush. > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > I agree that the existing logic poor, but this direction looks to be > even more fragile. > > Instead, I think it would be better to follow the EPT invalidation > example; mark all vcpus as needing a tlb flush, and IPI the domain dirty > mask, having the return-to-guest path do the flushing. AFAICT there's no need to call the tlb flush, the vmexit/vmentry itself will perform the necessary flushes, so the only requirement is to IPI the pCPUs in order to force a vmexit. > This avoids all vcpu pausing/unpausing activities, and the cost of the > flush is incurred by the target vcpu, rather than the vcpu making the > hypercall accumulate the cost for everything, as well as a large amount > of remote VMCS accesses. Hm, then we would need a way to pin the vCPUs to the pCPUs they are running on, or else in the introspection-enabled case you could end up calling paging_update_cr3 on vCPUs of other domains (maybe that's fine, but it could mess up with introspection I guess). AFAICT the call to paging_update_cr3 needs to be done from hvm_flush_vcpu_tlb or else we would have to freeze the scheduler so that vCPUs don't move around pCPUs (or get de-scheduled), I think we still need the pause in the introspection case, but the open coded pause loop could be replaced with domain_pause_except_self. > It can probably also remove the need for the flush_vcpu() callback which > is going to be expensive due to retpoline, and whose contents are trivial. I was planning to look into this, but wanted to send this version first since it's already a big improvement in terms of performance. Thanks, Roger.
On 31.12.2019 13:10, Roger Pau Monné wrote: > On Fri, Dec 27, 2019 at 02:52:17PM +0000, Andrew Cooper wrote: >> On 24/12/2019 13:26, Roger Pau Monne wrote: >>> There's no need to call paging_update_cr3 unless CR3 trapping is >>> enabled, and that's only the case when using shadow paging or when >>> requested for introspection purposes, otherwise there's no need to >>> pause all the vCPUs of the domain in order to perform the flush. >>> >>> Check whether CR3 trapping is currently in use in order to decide >>> whether the vCPUs should be paused, otherwise just perform the flush. >>> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> >> I agree that the existing logic poor, but this direction looks to be >> even more fragile. >> >> Instead, I think it would be better to follow the EPT invalidation >> example; mark all vcpus as needing a tlb flush, and IPI the domain dirty >> mask, having the return-to-guest path do the flushing. > > AFAICT there's no need to call the tlb flush, the vmexit/vmentry > itself will perform the necessary flushes, so the only requirement is > to IPI the pCPUs in order to force a vmexit. TLB flushing is at best conditional upon VM entry - see the callers of hvm_asid_handle_vmenter(). Jan
On 24.12.2019 14:26, Roger Pau Monne wrote: > There's no need to call paging_update_cr3 unless CR3 trapping is > enabled, and that's only the case when using shadow paging or when > requested for introspection purposes, otherwise there's no need to > pause all the vCPUs of the domain in order to perform the flush. > > Check whether CR3 trapping is currently in use in order to decide > whether the vCPUs should be paused, otherwise just perform the flush. First of all - with the commit introducing the pausing not saying anything on the "why", you must have gained some understanding there. Could you share this? I can't see why this was needed, and sh_update_cr3() also doesn't look to have any respective ASSERT() or alike. I'm having even more trouble seeing why in HAP mode the pausing would be needed. As a result I wonder whether, rather than determining whether pausing is needed inside the function, this shouldn't be a flag in struct paging_mode. Next I seriously doubt introspection hooks should be called here. Introspection should be about guest actions, and us calling paging_update_cr3() is an implementation detail of Xen, not something the guest controls. Even more, there not being any CR3 change here I wonder whether the call by the hooks to hvm_update_guest_cr3() couldn't be suppressed altogether in this case. Quite possibly in the shadow case there could be more steps that aren't really needed, so perhaps a separate hook might be on order. Jan
On Fri, Jan 03, 2020 at 04:17:14PM +0100, Jan Beulich wrote: > On 24.12.2019 14:26, Roger Pau Monne wrote: > > There's no need to call paging_update_cr3 unless CR3 trapping is > > enabled, and that's only the case when using shadow paging or when > > requested for introspection purposes, otherwise there's no need to > > pause all the vCPUs of the domain in order to perform the flush. > > > > Check whether CR3 trapping is currently in use in order to decide > > whether the vCPUs should be paused, otherwise just perform the flush. > > First of all - with the commit introducing the pausing not saying > anything on the "why", you must have gained some understanding > there. Could you share this? hap_update_cr3 does a "v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3]" unconditionally, and such access would be racy if the vCPU is running and also modifying cr3 at the same time AFAICT. Just pausing each vCPU before calling paging_update_cr3 should be fine and would have a smaller performance penalty. > I can't see why this was needed, and > sh_update_cr3() also doesn't look to have any respective ASSERT() > or alike. I'm having even more trouble seeing why in HAP mode the > pausing would be needed. > > As a result I wonder whether, rather than determining whether > pausing is needed inside the function, this shouldn't be a flag > in struct paging_mode. > > Next I seriously doubt introspection hooks should be called here. > Introspection should be about guest actions, and us calling > paging_update_cr3() is an implementation detail of Xen, not > something the guest controls. Even more, there not being any CR3 > change here I wonder whether the call by the hooks to > hvm_update_guest_cr3() couldn't be suppressed altogether in this > case. Quite possibly in the shadow case there could be more > steps that aren't really needed, so perhaps a separate hook might > be on order. Right, I guess just having a hook that does a flush would be enough. Let me try to propose something slightly better. Thanks, Roger.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 4dfaf35566..7dcc16afc6 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3985,25 +3985,36 @@ bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), static DEFINE_PER_CPU(cpumask_t, flush_cpumask); cpumask_t *mask = &this_cpu(flush_cpumask); struct domain *d = current->domain; + /* + * CR3 trapping is only enabled when running with shadow paging or when + * requested for introspection purposes, otherwise there's no need to call + * paging_update_cr3 and hence pause all vCPUs. + */ + bool trap_cr3 = !paging_mode_hap(d) || + (d->arch.monitor.write_ctrlreg_enabled & + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)); struct vcpu *v; - /* Avoid deadlock if more than one vcpu tries this at the same time. */ - if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) - return false; + if ( trap_cr3 ) + { + /* Avoid deadlock if more than one vcpu tries this at the same time. */ + if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) + return false; - /* Pause all other vcpus. */ - for_each_vcpu ( d, v ) - if ( v != current && flush_vcpu(ctxt, v) ) - vcpu_pause_nosync(v); + /* Pause all other vcpus. */ + for_each_vcpu ( d, v ) + if ( v != current && flush_vcpu(ctxt, v) ) + vcpu_pause_nosync(v); - /* Now that all VCPUs are signalled to deschedule, we wait... */ - for_each_vcpu ( d, v ) - if ( v != current && flush_vcpu(ctxt, v) ) - while ( !vcpu_runnable(v) && v->is_running ) - cpu_relax(); + /* Now that all VCPUs are signalled to deschedule, we wait... */ + for_each_vcpu ( d, v ) + if ( v != current && flush_vcpu(ctxt, v) ) + while ( !vcpu_runnable(v) && v->is_running ) + cpu_relax(); - /* All other vcpus are paused, safe to unlock now. */ - spin_unlock(&d->hypercall_deadlock_mutex); + /* All other vcpus are paused, safe to unlock now. */ + spin_unlock(&d->hypercall_deadlock_mutex); + } cpumask_clear(mask); @@ -4015,8 +4026,15 @@ bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), if ( !flush_vcpu(ctxt, v) ) continue; - paging_update_cr3(v, false); + if ( trap_cr3 ) + paging_update_cr3(v, false); + /* + * It's correct to do this flush without pausing the vCPUs: any vCPU + * context switch will already flush the tlb and the worse that could + * happen is that Xen ends up performing flushes on pCPUs that are no + * longer running the target vCPUs. + */ cpu = read_atomic(&v->dirty_cpu); if ( is_vcpu_dirty_cpu(cpu) ) __cpumask_set_cpu(cpu, mask); @@ -4026,9 +4044,10 @@ bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), flush_tlb_mask(mask); /* Done. */ - for_each_vcpu ( d, v ) - if ( v != current && flush_vcpu(ctxt, v) ) - vcpu_unpause(v); + if ( trap_cr3 ) + for_each_vcpu ( d, v ) + if ( v != current && flush_vcpu(ctxt, v) ) + vcpu_unpause(v); return true; }
There's no need to call paging_update_cr3 unless CR3 trapping is enabled, and that's only the case when using shadow paging or when requested for introspection purposes, otherwise there's no need to pause all the vCPUs of the domain in order to perform the flush. Check whether CR3 trapping is currently in use in order to decide whether the vCPUs should be paused, otherwise just perform the flush. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/hvm/hvm.c | 55 ++++++++++++++++++++++++++++-------------- 1 file changed, 37 insertions(+), 18 deletions(-)