Message ID | aad47c43b7cd7a391492b8be7b881cd37e9764c7.1607686878.git.havanur@amazon.com (mailing list archive)
---|---
State | New, archived
Series | [XEN,v1,1/1] Invalidate cache for cpus affinitized to the domain
CCing Andy and Jan

Is restricting the cache flush to the set of CPUs bound to the domain the
right thing to do?

On Fri, 2020-12-11 at 11:44 +0000, Harsha Shamsundara Havanur wrote:
> A HVM domain flushes cache on all the cpus using
> `flush_all` macro which uses cpu_online_map, during
> i) creation of a new domain
> ii) when device-model op is performed
> iii) when domain is destructed.
>
> This triggers IPI on all the cpus, thus affecting other
> domains that are pinned to different pcpus. This patch
> restricts cache flush to the set of cpus affinitized to
> the current domain using `domain->dirty_cpumask`.
>
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> ---
>  xen/arch/x86/hvm/hvm.c     | 2 +-
>  xen/arch/x86/hvm/mtrr.c    | 6 +++---
>  xen/arch/x86/hvm/svm/svm.c | 2 +-
>  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 54e32e4fe8..ec247c7010 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2219,7 +2219,7 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value)
>              domain_pause_nosync(v->domain);
>
>              /* Flush physical caches. */
> -            flush_all(FLUSH_CACHE);
> +            flush_mask(v->domain->dirty_cpumask, FLUSH_CACHE);
>              hvm_set_uc_mode(v, 1);
>
>              domain_unpause(v->domain);
> diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> index fb051d59c3..0d804c1fa0 100644
> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -631,7 +631,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
>                          break;
>                      /* fall through */
>                  default:
> -                    flush_all(FLUSH_CACHE);
> +                    flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>                      break;
>                  }
>                  return 0;
> @@ -683,7 +683,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
>      list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
>      p2m_memory_type_changed(d);
>      if ( type != PAT_TYPE_WRBACK )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>
>      return 0;
>  }
> @@ -785,7 +785,7 @@ void memory_type_changed(struct domain *d)
>           d->vcpu && d->vcpu[0] )
>      {
>          p2m_memory_type_changed(d);
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>      }
>  }
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index cfea5b5523..383e763d7d 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2395,7 +2395,7 @@ static void svm_vmexit_mce_intercept(
>  static void svm_wbinvd_intercept(void)
>  {
>      if ( cache_flush_permitted(current->domain) )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
>  }
>
>  static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 86b8916a5d..a05c7036c4 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3349,7 +3349,7 @@ static void vmx_wbinvd_intercept(void)
>          return;
>
>      if ( cpu_has_wbinvd_exiting )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
>      else
>          wbinvd();
>  }
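For readers not familiar with the primitives involved: in the Xen tree this
patch is against, flush_all() is essentially flush_mask() applied to
cpu_online_map, so the change amounts to narrowing the target mask to the
domain's dirty_cpumask. The sketch below illustrates that relationship and
the before/after call under that assumption; it is not verbatim Xen code.

/*
 * Illustrative sketch only, not verbatim Xen code.  flush_all() is
 * (modulo exact spelling) defined along the lines of:
 *
 *     #define flush_all(flags) flush_mask(&cpu_online_map, flags)
 *
 * so the patch narrows the FLUSH_CACHE IPI from every online pCPU to just
 * the pCPUs in d->dirty_cpumask.
 */
static void flush_domain_caches_example(struct domain *d)
{
    flush_all(FLUSH_CACHE);                     /* before: all online pCPUs */
    flush_mask(d->dirty_cpumask, FLUSH_CACHE);  /* after: dirty pCPUs only  */
}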
On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> A HVM domain flushes cache on all the cpus using
> `flush_all` macro which uses cpu_online_map, during
> i) creation of a new domain
> ii) when device-model op is performed
> iii) when domain is destructed.
>
> This triggers IPI on all the cpus, thus affecting other
> domains that are pinned to different pcpus. This patch
> restricts cache flush to the set of cpus affinitized to
> the current domain using `domain->dirty_cpumask`.

But then you need to effect cache flushing when a CPU gets
taken out of domain->dirty_cpumask. I don't think you/we want
to do that.

Jan
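Jan's concern, expressed as code (a hypothetical sketch, not an existing Xen
code path; the function name is made up for illustration): once FLUSH_CACHE
is limited to d->dirty_cpumask, whichever path removes a pCPU from that mask
would itself have to flush first, otherwise lines dirtied there would be
missed by every later, mask-restricted flush.

/*
 * Hypothetical sketch only.  If cache flushes were restricted to
 * d->dirty_cpumask, dropping the current pCPU from the mask would need to
 * flush its caches first, or stale lines could survive unnoticed.
 */
static void hypothetical_clear_dirty_cpu(struct domain *d, unsigned int cpu)
{
    ASSERT(cpu == smp_processor_id());

    wbinvd();                                  /* flush this pCPU's caches ...   */
    cpumask_clear_cpu(cpu, d->dirty_cpumask);  /* ... before it leaves the mask. */
}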
On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> > A HVM domain flushes cache on all the cpus using
> > `flush_all` macro which uses cpu_online_map, during
> > i) creation of a new domain
> > ii) when device-model op is performed
> > iii) when domain is destructed.
> >
> > This triggers IPI on all the cpus, thus affecting other
> > domains that are pinned to different pcpus. This patch
> > restricts cache flush to the set of cpus affinitized to
> > the current domain using `domain->dirty_cpumask`.
>
> But then you need to effect cache flushing when a CPU gets
> taken out of domain->dirty_cpumask. I don't think you/we want
> to do that.
>
If we do not restrict, it could lead to DoS attack, where a malicious
guest could keep writing to MTRR registers or do a cache flush through
DM Op and keep sending IPIs to other neighboring guests.

-Harsha

> Jan
>
On 14.12.2020 10:26, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>> A HVM domain flushes cache on all the cpus using
>>> `flush_all` macro which uses cpu_online_map, during
>>> i) creation of a new domain
>>> ii) when device-model op is performed
>>> iii) when domain is destructed.
>>>
>>> This triggers IPI on all the cpus, thus affecting other
>>> domains that are pinned to different pcpus. This patch
>>> restricts cache flush to the set of cpus affinitized to
>>> the current domain using `domain->dirty_cpumask`.
>>
>> But then you need to effect cache flushing when a CPU gets
>> taken out of domain->dirty_cpumask. I don't think you/we want
>> to do that.
>>
> If we do not restrict, it could lead to DoS attack, where a malicious
> guest could keep writing to MTRR registers or do a cache flush through
> DM Op and keep sending IPIs to other neighboring guests.

Could you outline how this can become a DoS? Throughput may be (heavily)
impacted, yes, but I don't see how this could suppress forward progress
altogether.

Improved accounting may be desirable here, such that the time spent in
the flushes gets subtracted from the initiator's credits rather than the
vCPU's which happens to run on the subject pCPU at the time. This is a
more general topic though, which was previously brought up: time spent
in servicing interrupts should in general not be accounted to the vCPU,
the running of which happened to be interrupted. It's just that for the
majority of interrupts the amount of time needed to handle them is
pretty well bounded, albeit very high interrupt rates could have the
same effect as a single interrupt taking very long to service.

An intermediate option (albeit presumably somewhat intrusive) might be
to use e.g. a tasklet on each CPU to effect the flushing. This wouldn't
reduce the overall hit on the system, but would at least avoid
penalizing other vCPU-s as to their scheduling time slices. The issuing
vCPU would then need pausing until all of the flushes got carried out.

Jan
Hi Harsha,

On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>> A HVM domain flushes cache on all the cpus using
>>> `flush_all` macro which uses cpu_online_map, during
>>> i) creation of a new domain
>>> ii) when device-model op is performed
>>> iii) when domain is destructed.
>>>
>>> This triggers IPI on all the cpus, thus affecting other
>>> domains that are pinned to different pcpus. This patch
>>> restricts cache flush to the set of cpus affinitized to
>>> the current domain using `domain->dirty_cpumask`.
>>
>> But then you need to effect cache flushing when a CPU gets
>> taken out of domain->dirty_cpumask. I don't think you/we want
>> to do that.
>>
> If we do not restrict, it could lead to DoS attack, where a malicious
> guest could keep writing to MTRR registers or do a cache flush through
> DM Op and keep sending IPIs to other neighboring guests.

I saw Jan already answered about the alleged DoS, so I will just focus
on the resolution.

I agree that in the ideal situation we want to limit the impact on the
other vCPUs. However, we also need to make sure the cure is not worse
than the symptoms.

The cache flush cannot be restricted in all the pinning situations
because pinning doesn't imply the pCPU will be dedicated to a given
vCPU or even that the vCPU will stick to a pCPU (we may allow floating
on a NUMA socket). Although your setup may offer this guarantee.

My knowledge in this area is quite limited. But below are a few
questions that hopefully will help to make a decision.

The first question to answer is: can the flush be restricted in a setup
where each vCPU is running on a dedicated pCPU (i.e. a partitioned
system)?

If the answer is yes, then we should figure out whether using
domain->dirty_cpumask would always be correct? For instance, a vCPU may
not have yet run, so can we consider the associated pCPU cache would be
consistent?

Another line of question is what can we do on systems supporting
self-snooping? IOW, would it be possible to restrict the flush for all
the setups?

Cheers,
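For reference, the self-snoop capability Julien mentions is advertised by
CPUID leaf 1, EDX bit 27 ("SS"). A minimal, Xen-independent check (plain C,
using GCC/clang's cpuid.h) looks like this:

/* Check CPUID.01H:EDX[27] (SS, self-snoop).  Build: cc -o ss ss.c */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
    {
        puts("CPUID leaf 1 not available");
        return 1;
    }

    printf("self-snoop (SS): %s\n", (edx & (1u << 27)) ? "yes" : "no");
    return 0;
}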
On 14/12/2020 10:56, Julien Grall wrote:
> Hi Harsha,
>
> On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
>> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>>> A HVM domain flushes cache on all the cpus using
>>>> `flush_all` macro which uses cpu_online_map, during
>>>> i) creation of a new domain
>>>> ii) when device-model op is performed
>>>> iii) when domain is destructed.
>>>>
>>>> This triggers IPI on all the cpus, thus affecting other
>>>> domains that are pinned to different pcpus. This patch
>>>> restricts cache flush to the set of cpus affinitized to
>>>> the current domain using `domain->dirty_cpumask`.
>>>
>>> But then you need to effect cache flushing when a CPU gets
>>> taken out of domain->dirty_cpumask. I don't think you/we want
>>> to do that.
>>>
>> If we do not restrict, it could lead to DoS attack, where a malicious
>> guest could keep writing to MTRR registers or do a cache flush through
>> DM Op and keep sending IPIs to other neighboring guests.
>
> I saw Jan already answered about the alleged DoS, so I will just focus
> on the resolution.
>
> I agree that in the ideal situation we want to limit the impact on the
> other vCPUs. However, we also need to make sure the cure is not worse
> than the symptoms.

And specifically, only a change which is correct. This patch very
definitely isn't.

Lines can get cached on other cpus from, e.g. qemu mappings and PV
backends.

> The cache flush cannot be restricted in all the pinning situations
> because pinning doesn't imply the pCPU will be dedicated to a given
> vCPU or even that the vCPU will stick to a pCPU (we may allow floating
> on a NUMA socket). Although your setup may offer this guarantee.
>
> My knowledge in this area is quite limited. But below are a few
> questions that hopefully will help to make a decision.
>
> The first question to answer is: can the flush be restricted in a setup
> where each vCPU is running on a dedicated pCPU (i.e. a partitioned
> system)?

Not really. Lines can become cached even from speculation in the
directmap.

If you need to flush the caches (and don't have a virtual mapping to
issue clflush/clflushopt/clwb over), it must be on all CPUs.

> If the answer is yes, then we should figure out whether using
> domain->dirty_cpumask would always be correct? For instance, a vCPU may
> not have yet run, so can we consider the associated pCPU cache would be
> consistent?
>
> Another line of question is what can we do on systems supporting
> self-snooping? IOW, would it be possible to restrict the flush for all
> the setups?

Right - this is the avenue which ought to be investigated. (Working)
self-snoop ought to remove the need for some of these cache flushes.
Others look like they're not correct to begin with. Others, such as the
wbinvd intercepts absolutely must stay as they are.

~Andrew
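For context on the clflush/clflushopt/clwb aside: when a virtual mapping of
the data is available, individual lines can be written back or invalidated
locally instead of broadcasting a WBINVD IPI. A rough sketch of such a loop
follows; it assumes a 64-byte line size and plain CLFLUSH, whereas real code
would derive the line size from CPUID and prefer CLFLUSHOPT/CLWB with the
appropriate fencing when available. This is not Xen's actual helper.

/*
 * Rough sketch: flush a virtually-mapped buffer line by line with CLFLUSH
 * instead of a system-wide WBINVD.
 */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption; derive from CPUID in real code */

static void flush_lines(const void *p, size_t size)
{
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end  = (uintptr_t)p + size;

    for ( ; addr < end; addr += CACHE_LINE_SIZE )
        __asm__ volatile ( "clflush %0" :: "m" (*(const char *)addr) );

    __asm__ volatile ( "mfence" ::: "memory" );   /* order the flushes */
}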
On Mon, 2020-12-14 at 16:01 +0000, Andrew Cooper wrote:
> On 14/12/2020 10:56, Julien Grall wrote:
> > Hi Harsha,
> > 
> > On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
> > > On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
> > > > On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> > > > > A HVM domain flushes cache on all the cpus using
> > > > > `flush_all` macro which uses cpu_online_map, during
> > > > > i) creation of a new domain
> > > > > ii) when device-model op is performed
> > > > > iii) when domain is destructed.
> > > > > 
> > > > > This triggers IPI on all the cpus, thus affecting other
> > > > > domains that are pinned to different pcpus. This patch
> > > > > restricts cache flush to the set of cpus affinitized to
> > > > > the current domain using `domain->dirty_cpumask`.
> > > > 
> > > > But then you need to effect cache flushing when a CPU gets
> > > > taken out of domain->dirty_cpumask. I don't think you/we want
> > > > to do that.
> > > > 
> > > If we do not restrict, it could lead to DoS attack, where a
> > > malicious guest could keep writing to MTRR registers or do a
> > > cache flush through DM Op and keep sending IPIs to other
> > > neighboring guests.
> > 
> > I saw Jan already answered about the alleged DoS, so I will just
> > focus on the resolution.
> > 
> > I agree that in the ideal situation we want to limit the impact on
> > the other vCPUs. However, we also need to make sure the cure is not
> > worse than the symptoms.
> 
> And specifically, only a change which is correct. This patch very
> definitely isn't.
> 
> Lines can get cached on other cpus from, e.g. qemu mappings and PV
> backends.
> 
> > The cache flush cannot be restricted in all the pinning situations
> > because pinning doesn't imply the pCPU will be dedicated to a given
> > vCPU or even that the vCPU will stick to a pCPU (we may allow
> > floating on a NUMA socket). Although your setup may offer this
> > guarantee.
> > 
> > My knowledge in this area is quite limited. But below are a few
> > questions that hopefully will help to make a decision.
> > 
> > The first question to answer is: can the flush be restricted in a
> > setup where each vCPU is running on a dedicated pCPU (i.e. a
> > partitioned system)?
> 
> Not really. Lines can become cached even from speculation in the
> directmap.
> 
> If you need to flush the caches (and don't have a virtual mapping to
> issue clflush/clflushopt/clwb over), it must be on all CPUs.

If lines are cached due to aggressive speculation from a different
guest, wouldn't they be invalidated at the speculation boundary, since
it's a wrong speculation? Would it still require to be flushed
explicitly?

-Harsha

> > If the answer is yes, then we should figure out whether using
> > domain->dirty_cpumask would always be correct? For instance, a vCPU
> > may not have yet run, so can we consider the associated pCPU cache
> > would be consistent?
> > 
> > Another line of question is what can we do on systems supporting
> > self-snooping? IOW, would it be possible to restrict the flush for
> > all the setups?
> 
> Right - this is the avenue which ought to be investigated. (Working)
> self-snoop ought to remove the need for some of these cache flushes.
> Others look like they're not correct to begin with. Others, such as
> the wbinvd intercepts absolutely must stay as they are.
> 
> ~Andrew
On 14/12/2020 19:05, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 16:01 +0000, Andrew Cooper wrote:
>> On 14/12/2020 10:56, Julien Grall wrote:
>>> Hi Harsha,
>>>
>>> On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
>>>> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>>>>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>>>>> A HVM domain flushes cache on all the cpus using
>>>>>> `flush_all` macro which uses cpu_online_map, during
>>>>>> i) creation of a new domain
>>>>>> ii) when device-model op is performed
>>>>>> iii) when domain is destructed.
>>>>>>
>>>>>> This triggers IPI on all the cpus, thus affecting other
>>>>>> domains that are pinned to different pcpus. This patch
>>>>>> restricts cache flush to the set of cpus affinitized to
>>>>>> the current domain using `domain->dirty_cpumask`.
>>>>> But then you need to effect cache flushing when a CPU gets
>>>>> taken out of domain->dirty_cpumask. I don't think you/we want
>>>>> to do that.
>>>>>
>>>> If we do not restrict, it could lead to DoS attack, where a
>>>> malicious guest could keep writing to MTRR registers or do a
>>>> cache flush through DM Op and keep sending IPIs to other
>>>> neighboring guests.
>>> I saw Jan already answered about the alleged DoS, so I will just
>>> focus on the resolution.
>>>
>>> I agree that in the ideal situation we want to limit the impact on
>>> the other vCPUs. However, we also need to make sure the cure is not
>>> worse than the symptoms.
>> And specifically, only a change which is correct. This patch very
>> definitely isn't.
>>
>> Lines can get cached on other cpus from, e.g. qemu mappings and PV
>> backends.
>>
>>> The cache flush cannot be restricted in all the pinning situations
>>> because pinning doesn't imply the pCPU will be dedicated to a given
>>> vCPU or even that the vCPU will stick to a pCPU (we may allow
>>> floating on a NUMA socket). Although your setup may offer this
>>> guarantee.
>>>
>>> My knowledge in this area is quite limited. But below are a few
>>> questions that hopefully will help to make a decision.
>>>
>>> The first question to answer is: can the flush be restricted in a
>>> setup where each vCPU is running on a dedicated pCPU (i.e. a
>>> partitioned system)?
>> Not really. Lines can become cached even from speculation in the
>> directmap.
>>
>> If you need to flush the caches (and don't have a virtual mapping to
>> issue clflush/clflushopt/clwb over), it must be on all CPUs.
> If lines are cached due to aggressive speculation from a different
> guest, wouldn't they be invalidated at the speculation boundary, since
> it's a wrong speculation? Would it still require to be flushed
> explicitly?

No. Caches are microarchitectural state (just like TLBs, linefill
buffers, etc.)

The entire mess surrounding speculative security issues is that the
perturbances from bad speculation survive, and can be recovered at a
later point.

~Andrew
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 54e32e4fe8..ec247c7010 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2219,7 +2219,7 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value)
             domain_pause_nosync(v->domain);
 
             /* Flush physical caches. */
-            flush_all(FLUSH_CACHE);
+            flush_mask(v->domain->dirty_cpumask, FLUSH_CACHE);
             hvm_set_uc_mode(v, 1);
 
             domain_unpause(v->domain);
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index fb051d59c3..0d804c1fa0 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -631,7 +631,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
                         break;
                     /* fall through */
                 default:
-                    flush_all(FLUSH_CACHE);
+                    flush_mask(d->dirty_cpumask, FLUSH_CACHE);
                     break;
                 }
                 return 0;
@@ -683,7 +683,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
     list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
     p2m_memory_type_changed(d);
     if ( type != PAT_TYPE_WRBACK )
-        flush_all(FLUSH_CACHE);
+        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
 
     return 0;
 }
@@ -785,7 +785,7 @@ void memory_type_changed(struct domain *d)
          d->vcpu && d->vcpu[0] )
     {
         p2m_memory_type_changed(d);
-        flush_all(FLUSH_CACHE);
+        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
     }
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index cfea5b5523..383e763d7d 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2395,7 +2395,7 @@ static void svm_vmexit_mce_intercept(
 static void svm_wbinvd_intercept(void)
 {
     if ( cache_flush_permitted(current->domain) )
-        flush_all(FLUSH_CACHE);
+        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
 }
 
 static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 86b8916a5d..a05c7036c4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3349,7 +3349,7 @@ static void vmx_wbinvd_intercept(void)
         return;
 
     if ( cpu_has_wbinvd_exiting )
-        flush_all(FLUSH_CACHE);
+        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
     else
         wbinvd();
 }
A HVM domain flushes cache on all the cpus using
`flush_all` macro which uses cpu_online_map, during
i) creation of a new domain
ii) when device-model op is performed
iii) when domain is destructed.

This triggers IPI on all the cpus, thus affecting other
domains that are pinned to different pcpus. This patch
restricts cache flush to the set of cpus affinitized to
the current domain using `domain->dirty_cpumask`.

Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
---
 xen/arch/x86/hvm/hvm.c     | 2 +-
 xen/arch/x86/hvm/mtrr.c    | 6 +++---
 xen/arch/x86/hvm/svm/svm.c | 2 +-
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)