Message ID | 18fcf499-a2ae-ab48-a66f-ca0499097e8a@suse.com (mailing list archive) |
---|---|
State | Superseded |
Series | x86: reduce cache flushing overhead |
On 19/04/2023 11:46 am, Jan Beulich wrote:
> There's no need to write back caches on all CPUs upon seeing a WBINVD
> exit; ones that a vCPU hasn't run on since the last writeback (or since
> it was started) can't hold data which may need writing back.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I find it unlikely that this is an improvement in any way at all.

You're adding a memory allocation, and making the fastpath slower, for all HVM domains, even the ~100% of them in practice which never get given a device in the first place.

Just so you can skip the WBINVD side effect on the L1/L2 caches of the CPUs this domain happens not to have run on since the last time they flushed (which is already an underestimate). Note how this does not change the behaviour for the L3 caches, which form the overwhelming majority of the WBINVD overhead in the first place.

So my response was going to be "definitely not without the per-domain 'reduced cacheability permitted' setting we've discussed". And even then, not without numbers suggesting it's a problem in the first place, or at least a better explanation of why it might plausibly be an issue.


But, in writing this, I've realised a real bug.

Cache snoops can occur and pull lines sideways for microarchitectural reasons. And even if we want to hand-wave that away as being unlikely (it is), you can't hand-wave away rogue speculation in the directmap.

By not issuing WBINVD on all cores, you've got a real chance of letting some lines escape the attempt to evict them.

But it's worse than that - even IPIing all cores, there's a speculation pattern which can cause some lines to survive. Rare, sure, but not impossible.

Right now, I'm not sure that WBINVD can even be used safely without the extra careful use of CR0.{CD,NW}, which provides a workaround for native, but nothing helpful for hypervisors...

~Andrew
On 19.04.2023 23:55, Andrew Cooper wrote:
> On 19/04/2023 11:46 am, Jan Beulich wrote:
>> There's no need to write back caches on all CPUs upon seeing a WBINVD
>> exit; ones that a vCPU hasn't run on since the last writeback (or since
>> it was started) can't hold data which may need writing back.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> I find it unlikely that this is an improvement in any way at all.
>
> You're adding a memory allocation, and making the fastpath slower, for all HVM domains, even the ~100% of them in practice which never get given a device in the first place.
>
> Just so you can skip the WBINVD side effect on the L1/L2 caches of the CPUs this domain happens not to have run on since the last time they flushed (which is already an underestimate). Note how this does not change the behaviour for the L3 caches, which form the overwhelming majority of the WBINVD overhead in the first place.

You're thinking of only single-node systems here, it seems? I view this change as particularly relevant for node-constrained domains on NUMA systems.

As to "making the fastpath slower": That can only be the __cpumask_set_cpu() added to hvm_do_resume(). What do you suggest? I can certainly add a conditional there (and then I could also make the memory allocation conditional), but I thought a LOCK-less RMW memory operation might better be done in straight-line code.

As an aside - after having thought of making this change, I did go and check what KVM does. And (surprise or not) they do exactly this.

> So my response was going to be "definitely not without the per-domain 'reduced cacheability permitted' setting we've discussed". And even then, not without numbers suggesting it's a problem in the first place, or at least a better explanation of why it might plausibly be an issue.
>
>
> But, in writing this, I've realised a real bug.
>
> Cache snoops can occur and pull lines sideways for microarchitectural reasons. And even if we want to hand-wave that away as being unlikely (it is), you can't hand-wave away rogue speculation in the directmap.
>
> By not issuing WBINVD on all cores, you've got a real chance of letting some lines escape the attempt to evict them.
>
> But it's worse than that - even IPIing all cores, there's a speculation pattern which can cause some lines to survive. Rare, sure, but not impossible.
>
> Right now, I'm not sure that WBINVD can even be used safely without the extra careful use of CR0.{CD,NW}, which provides a workaround for native, but nothing helpful for hypervisors...

Wait: See the title and the earlier patches. We're not talking of "evict" here anymore, but of "write-back". The few cases of "evict" are left alone by this change. If any of those are affected by what you say (and it looks like some might be), then fixing that definitely is unrelated work. (You may have meant that latter part of your reply like this, but you haven't said so in any way I would recognize.)

Jan
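[Editor's note: purely for illustration, the conditional Jan mentions for hvm_do_resume() might look roughly like the sketch below, keeping the mask update (and, correspondingly, the allocation) limited to domains that can actually issue a cache flush. This is a sketch of the idea only, not part of the posted patch.]

    /* Hypothetical alternative to the unconditional update in
     * hvm_do_resume(): only maintain the dirty-CPU mask for domains
     * which are permitted to flush caches at all, at the cost of a
     * branch on the resume path. */
    if ( unlikely(cache_flush_permitted(v->domain)) )
        __cpumask_set_cpu(v->processor, v->arch.hvm.cache_dirty_mask);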
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -537,6 +537,8 @@ void hvm_do_resume(struct vcpu *v)
         v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
     }
 
+    __cpumask_set_cpu(v->processor, v->arch.hvm.cache_dirty_mask);
+
     if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
     {
         struct x86_event info;
@@ -1592,6 +1594,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
     if ( rc )
         goto fail6;
 
+    rc = -ENOMEM;
+    if ( !zalloc_cpumask_var(&v->arch.hvm.cache_dirty_mask) )
+        goto fail6;
+
     rc = ioreq_server_add_vcpu_all(d, v);
     if ( rc != 0 )
         goto fail6;
@@ -1621,6 +1627,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
     hvm_vcpu_cacheattr_destroy(v);
  fail1:
     viridian_vcpu_deinit(v);
+    FREE_CPUMASK_VAR(v->arch.hvm.cache_dirty_mask);
 
     return rc;
 }
@@ -1628,6 +1635,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
 {
     viridian_vcpu_deinit(v);
 
+    FREE_CPUMASK_VAR(v->arch.hvm.cache_dirty_mask);
+
     ioreq_server_remove_vcpu_all(v->domain, v);
 
     if ( hvm_altp2m_supported() )
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2363,8 +2363,14 @@ static void svm_vmexit_mce_intercept(
 
 static void cf_check svm_wbinvd_intercept(void)
 {
-    if ( cache_flush_permitted(current->domain) )
-        flush_all(FLUSH_WRITEBACK);
+    struct vcpu *curr = current;
+
+    if ( !cache_flush_permitted(curr->domain) )
+        return;
+
+    flush_mask(curr->arch.hvm.cache_dirty_mask, FLUSH_WRITEBACK);
+    cpumask_copy(curr->arch.hvm.cache_dirty_mask,
+                 cpumask_of(curr->processor));
 }
 
 static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3710,11 +3710,17 @@ static void vmx_do_extint(struct cpu_use
 
 static void cf_check vmx_wbinvd_intercept(void)
 {
-    if ( !cache_flush_permitted(current->domain) || iommu_snoop )
+    struct vcpu *curr = current;
+
+    if ( !cache_flush_permitted(curr->domain) || iommu_snoop )
         return;
 
     if ( cpu_has_wbinvd_exiting )
-        flush_all(FLUSH_WRITEBACK);
+    {
+        flush_mask(curr->arch.hvm.cache_dirty_mask, FLUSH_WRITEBACK);
+        cpumask_copy(curr->arch.hvm.cache_dirty_mask,
+                     cpumask_of(curr->processor));
+    }
     else
         wbnoinvd();
 }
--- a/xen/arch/x86/include/asm/hvm/vcpu.h
+++ b/xen/arch/x86/include/asm/hvm/vcpu.h
@@ -161,6 +161,8 @@ struct hvm_vcpu {
         struct svm_vcpu svm;
     };
 
+    cpumask_var_t cache_dirty_mask;
+
     struct tasklet assert_evtchn_irq_tasklet;
 
     struct nestedvcpu nvcpu;
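[Editor's note: as a standalone illustration of the bookkeeping the patch introduces - not Xen code; all names and types below are invented for the example - the idea reduces to a per-vCPU mask that accumulates the CPUs the vCPU has run on and is collapsed back to just the current CPU whenever a write-back is performed.]

#include <stdint.h>
#include <stdio.h>

/* Toy model: one bit per physical CPU, up to 64 CPUs. */
typedef uint64_t cpumask_t;

struct toy_vcpu {
    unsigned int processor;   /* CPU the vCPU is currently running on */
    cpumask_t cache_dirty;    /* CPUs whose caches may hold its data  */
};

/* Called on every resume: remember that this CPU's caches were used. */
static void toy_do_resume(struct toy_vcpu *v)
{
    v->cache_dirty |= 1ULL << v->processor;
}

/* Called on a WBINVD intercept: write back only the recorded CPUs,
 * then reset the mask to the CPU we are currently running on. */
static void toy_wbinvd_intercept(struct toy_vcpu *v)
{
    printf("write back caches on CPU mask %#llx\n",
           (unsigned long long)v->cache_dirty);
    v->cache_dirty = 1ULL << v->processor;
}

int main(void)
{
    struct toy_vcpu v = { .processor = 2, .cache_dirty = 0 };

    toy_do_resume(&v);          /* ran on CPU 2 */
    v.processor = 5;
    toy_do_resume(&v);          /* migrated to CPU 5 */
    toy_wbinvd_intercept(&v);   /* targets CPUs {2,5} only, not all CPUs */
    toy_wbinvd_intercept(&v);   /* a second flush now targets CPU 5 only */
    return 0;
}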
There's no need to write back caches on all CPUs upon seeing a WBINVD exit; ones that a vCPU hasn't run on since the last writeback (or since it was started) can't hold data which may need writing back.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
With us not running AMD IOMMUs in non-coherent ways, I wonder whether svm_wbinvd_intercept() really needs to do anything (or whether it couldn't check iommu_snoop just like VMX does, knowing that as of c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with HAP's"] that's always set; this would largely serve as grep fodder then, to make sure this code is updated once / when we do away with this global variable, and it would be the penultimate step to being able to fold SVM's and VT-x'es functions).
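[Editor's note: purely as a sketch of that remark, not part of the patch: if SVM mirrored the VMX-side iommu_snoop check, the intercept from the diff above might collapse to something like the following, bringing the two functions close enough to fold later.]

static void cf_check svm_wbinvd_intercept(void)
{
    struct vcpu *curr = current;

    /* Hypothetical: rely on iommu_snoop (always set as of c609108b2190)
     * the same way the VMX intercept does. */
    if ( !cache_flush_permitted(curr->domain) || iommu_snoop )
        return;

    flush_mask(curr->arch.hvm.cache_dirty_mask, FLUSH_WRITEBACK);
    cpumask_copy(curr->arch.hvm.cache_dirty_mask,
                 cpumask_of(curr->processor));
}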