Message ID | aad47c43b7cd7a391492b8be7b881cd37e9764c7.1607686878.git.havanur@amazon.com (mailing list archive)
---|---
State | New, archived
Series | [XEN,v1,1/1] Invalidate cache for cpus affinitized to the domain
CCing Andy and Jan

Is restricting the cache flush to the set of CPUs bound to the domain the
right thing to do?

On Fri, 2020-12-11 at 11:44 +0000, Harsha Shamsundara Havanur wrote:
> A HVM domain flushes cache on all the cpus using
> `flush_all` macro which uses cpu_online_map, during
> i) creation of a new domain
> ii) when device-model op is performed
> iii) when domain is destructed.
>
> This triggers IPI on all the cpus, thus affecting other
> domains that are pinned to different pcpus. This patch
> restricts cache flush to the set of cpus affinitized to
> the current domain using `domain->dirty_cpumask`.
>
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> ---
>  xen/arch/x86/hvm/hvm.c     | 2 +-
>  xen/arch/x86/hvm/mtrr.c    | 6 +++---
>  xen/arch/x86/hvm/svm/svm.c | 2 +-
>  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 54e32e4fe8..ec247c7010 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2219,7 +2219,7 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value)
>              domain_pause_nosync(v->domain);
>
>              /* Flush physical caches. */
> -            flush_all(FLUSH_CACHE);
> +            flush_mask(v->domain->dirty_cpumask, FLUSH_CACHE);
>              hvm_set_uc_mode(v, 1);
>
>              domain_unpause(v->domain);
> diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> index fb051d59c3..0d804c1fa0 100644
> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -631,7 +631,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
>                          break;
>                      /* fall through */
>                  default:
> -                    flush_all(FLUSH_CACHE);
> +                    flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>                      break;
>                  }
>                  return 0;
> @@ -683,7 +683,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
>      list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
>      p2m_memory_type_changed(d);
>      if ( type != PAT_TYPE_WRBACK )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>
>      return 0;
>  }
> @@ -785,7 +785,7 @@ void memory_type_changed(struct domain *d)
>           d->vcpu && d->vcpu[0] )
>      {
>          p2m_memory_type_changed(d);
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
>      }
>  }
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index cfea5b5523..383e763d7d 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2395,7 +2395,7 @@ static void svm_vmexit_mce_intercept(
>  static void svm_wbinvd_intercept(void)
>  {
>      if ( cache_flush_permitted(current->domain) )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
>  }
>
>  static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 86b8916a5d..a05c7036c4 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3349,7 +3349,7 @@ static void vmx_wbinvd_intercept(void)
>          return;
>
>      if ( cpu_has_wbinvd_exiting )
> -        flush_all(FLUSH_CACHE);
> +        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
>      else
>          wbinvd();
>  }
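For readers not familiar with the primitives involved: in the Xen tree this
patch is against, flush_all() is essentially flush_mask() applied to
cpu_online_map, so the change amounts to narrowing the target mask to the
domain's dirty_cpumask. The sketch below illustrates that relationship and
the before/after call under that assumption; it is not verbatim Xen code.

/*
 * Illustrative sketch only, not verbatim Xen code.  flush_all() is
 * (modulo exact spelling) defined along the lines of:
 *
 *     #define flush_all(flags) flush_mask(&cpu_online_map, flags)
 *
 * so the patch narrows the FLUSH_CACHE IPI from every online pCPU to just
 * the pCPUs in d->dirty_cpumask.
 */
static void flush_domain_caches_example(struct domain *d)
{
    flush_all(FLUSH_CACHE);                     /* before: all online pCPUs */
    flush_mask(d->dirty_cpumask, FLUSH_CACHE);  /* after: dirty pCPUs only  */
}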
On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> A HVM domain flushes cache on all the cpus using
> `flush_all` macro which uses cpu_online_map, during
> i) creation of a new domain
> ii) when device-model op is performed
> iii) when domain is destructed.
>
> This triggers IPI on all the cpus, thus affecting other
> domains that are pinned to different pcpus. This patch
> restricts cache flush to the set of cpus affinitized to
> the current domain using `domain->dirty_cpumask`.

But then you need to effect cache flushing when a CPU gets
taken out of domain->dirty_cpumask. I don't think you/we want
to do that.

Jan
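Jan's concern, expressed as code (a hypothetical sketch, not an existing Xen
code path; the function name is made up for illustration): once FLUSH_CACHE
is limited to d->dirty_cpumask, whichever path removes a pCPU from that mask
would itself have to flush first, otherwise lines dirtied there would be
missed by every later, mask-restricted flush.

/*
 * Hypothetical sketch only.  If cache flushes were restricted to
 * d->dirty_cpumask, dropping the current pCPU from the mask would need to
 * flush its caches first, or stale lines could survive unnoticed.
 */
static void hypothetical_clear_dirty_cpu(struct domain *d, unsigned int cpu)
{
    ASSERT(cpu == smp_processor_id());

    wbinvd();                                  /* flush this pCPU's caches ...   */
    cpumask_clear_cpu(cpu, d->dirty_cpumask);  /* ... before it leaves the mask. */
}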
On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> > A HVM domain flushes cache on all the cpus using
> > `flush_all` macro which uses cpu_online_map, during
> > i) creation of a new domain
> > ii) when device-model op is performed
> > iii) when domain is destructed.
> >
> > This triggers IPI on all the cpus, thus affecting other
> > domains that are pinned to different pcpus. This patch
> > restricts cache flush to the set of cpus affinitized to
> > the current domain using `domain->dirty_cpumask`.
>
> But then you need to effect cache flushing when a CPU gets
> taken out of domain->dirty_cpumask. I don't think you/we want
> to do that.
>
If we do not restrict, it could lead to DoS attack, where a malicious
guest could keep writing to MTRR registers or do a cache flush through
DM Op and keep sending IPIs to other neighboring guests.

-Harsha

> Jan
>
On 14.12.2020 10:26, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>> A HVM domain flushes cache on all the cpus using
>>> `flush_all` macro which uses cpu_online_map, during
>>> i) creation of a new domain
>>> ii) when device-model op is performed
>>> iii) when domain is destructed.
>>>
>>> This triggers IPI on all the cpus, thus affecting other
>>> domains that are pinned to different pcpus. This patch
>>> restricts cache flush to the set of cpus affinitized to
>>> the current domain using `domain->dirty_cpumask`.
>>
>> But then you need to effect cache flushing when a CPU gets
>> taken out of domain->dirty_cpumask. I don't think you/we want
>> to do that.
>>
> If we do not restrict, it could lead to DoS attack, where a malicious
> guest could keep writing to MTRR registers or do a cache flush through
> DM Op and keep sending IPIs to other neighboring guests.

Could you outline how this can become a DoS? Throughput may be (heavily)
impacted, yes, but I don't see how this could suppress forward progress
altogether.

Improved accounting may be desirable here, such that the time spent in
the flushes gets subtracted from the initiator's credits rather than the
vCPU's which happens to run on the subject pCPU at the time. This is a
more general topic though, which was previously brought up: time spent
in servicing interrupts should in general not be accounted to the vCPU,
the running of which happened to be interrupted. It's just that for the
majority of interrupts the amount of time needed to handle them is
pretty well bounded, albeit very high interrupt rates could have the
same effect as a single interrupt taking very long to service.

An intermediate option (albeit presumably somewhat intrusive) might be
to use e.g. a tasklet on each CPU to effect the flushing. This wouldn't
reduce the overall hit on the system, but would at least avoid
penalizing other vCPU-s as to their scheduling time slices. The issuing
vCPU would then need pausing until all of the flushes got carried out.

Jan
Hi Harsha,

On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>> A HVM domain flushes cache on all the cpus using
>>> `flush_all` macro which uses cpu_online_map, during
>>> i) creation of a new domain
>>> ii) when device-model op is performed
>>> iii) when domain is destructed.
>>>
>>> This triggers IPI on all the cpus, thus affecting other
>>> domains that are pinned to different pcpus. This patch
>>> restricts cache flush to the set of cpus affinitized to
>>> the current domain using `domain->dirty_cpumask`.
>>
>> But then you need to effect cache flushing when a CPU gets
>> taken out of domain->dirty_cpumask. I don't think you/we want
>> to do that.
>>
> If we do not restrict, it could lead to DoS attack, where a malicious
> guest could keep writing to MTRR registers or do a cache flush through
> DM Op and keep sending IPIs to other neighboring guests.

I saw Jan already answered about the alleged DoS, so I will just focus
on the resolution.

I agree that in the ideal situation we want to limit the impact on the
other vCPUs. However, we also need to make sure the cure is not worse
than the symptoms.

The cache flush cannot be restricted in all the pinning situations
because pinning doesn't imply the pCPU will be dedicated to a given
vCPU or even that the vCPU will stick to a pCPU (we may allow floating
on a NUMA socket). Although your setup may offer this guarantee.

My knowledge in this area is quite limited. But below are a few
questions that hopefully will help to make a decision.

The first question to answer is: can the flush be restricted in a setup
where each vCPU is running on a dedicated pCPU (i.e. a partitioned
system)?

If the answer is yes, then we should figure out whether using
domain->dirty_cpumask would always be correct? For instance, a vCPU may
not have yet run, so can we consider the associated pCPU cache would be
consistent?

Another line of question is what can we do on systems supporting
self-snooping? IOW, would it be possible to restrict the flush for all
the setups?

Cheers,
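For reference, the self-snoop capability Julien mentions is advertised by
CPUID leaf 1, EDX bit 27 ("SS"). A minimal, Xen-independent check (plain C,
using GCC/clang's cpuid.h) looks like this:

/* Check CPUID.01H:EDX[27] (SS, self-snoop).  Build: cc -o ss ss.c */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
    {
        puts("CPUID leaf 1 not available");
        return 1;
    }

    printf("self-snoop (SS): %s\n", (edx & (1u << 27)) ? "yes" : "no");
    return 0;
}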
On 14/12/2020 10:56, Julien Grall wrote:
> Hi Harsha,
>
> On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
>> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>>> A HVM domain flushes cache on all the cpus using
>>>> `flush_all` macro which uses cpu_online_map, during
>>>> i) creation of a new domain
>>>> ii) when device-model op is performed
>>>> iii) when domain is destructed.
>>>>
>>>> This triggers IPI on all the cpus, thus affecting other
>>>> domains that are pinned to different pcpus. This patch
>>>> restricts cache flush to the set of cpus affinitized to
>>>> the current domain using `domain->dirty_cpumask`.
>>>
>>> But then you need to effect cache flushing when a CPU gets
>>> taken out of domain->dirty_cpumask. I don't think you/we want
>>> to do that.
>>>
>> If we do not restrict, it could lead to DoS attack, where a malicious
>> guest could keep writing to MTRR registers or do a cache flush through
>> DM Op and keep sending IPIs to other neighboring guests.
>
> I saw Jan already answered about the alleged DoS, so I will just focus
> on the resolution.
>
> I agree that in the ideal situation we want to limit the impact on the
> other vCPUs. However, we also need to make sure the cure is not worse
> than the symptoms.

And specifically, only a change which is correct. This patch very
definitely isn't.

Lines can get cached on other cpus from, e.g. qemu mappings and PV
backends.

> The cache flush cannot be restricted in all the pinning situations
> because pinning doesn't imply the pCPU will be dedicated to a given
> vCPU or even that the vCPU will stick to a pCPU (we may allow floating
> on a NUMA socket). Although your setup may offer this guarantee.
>
> My knowledge in this area is quite limited. But below are a few
> questions that hopefully will help to make a decision.
>
> The first question to answer is: can the flush be restricted in a setup
> where each vCPU is running on a dedicated pCPU (i.e. a partitioned
> system)?

Not really. Lines can become cached even from speculation in the
directmap.

If you need to flush the caches (and don't have a virtual mapping to
issue clflush/clflushopt/clwb over), it must be on all CPUs.

> If the answer is yes, then we should figure out whether using
> domain->dirty_cpumask would always be correct? For instance, a vCPU may
> not have yet run, so can we consider the associated pCPU cache would be
> consistent?
>
> Another line of question is what can we do on systems supporting
> self-snooping? IOW, would it be possible to restrict the flush for all
> the setups?

Right - this is the avenue which ought to be investigated. (Working)
self-snoop ought to remove the need for some of these cache flushes.
Others look like they're not correct to begin with. Others, such as the
wbinvd intercepts absolutely must stay as they are.

~Andrew
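For context on the clflush/clflushopt/clwb aside: when a virtual mapping of
the data is available, individual lines can be written back or invalidated
locally instead of broadcasting a WBINVD IPI. A rough sketch of such a loop
follows; it assumes a 64-byte line size and plain CLFLUSH, whereas real code
would derive the line size from CPUID and prefer CLFLUSHOPT/CLWB with the
appropriate fencing when available. This is not Xen's actual helper.

/*
 * Rough sketch: flush a virtually-mapped buffer line by line with CLFLUSH
 * instead of a system-wide WBINVD.
 */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption; derive from CPUID in real code */

static void flush_lines(const void *p, size_t size)
{
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end  = (uintptr_t)p + size;

    for ( ; addr < end; addr += CACHE_LINE_SIZE )
        __asm__ volatile ( "clflush %0" :: "m" (*(const char *)addr) );

    __asm__ volatile ( "mfence" ::: "memory" );   /* order the flushes */
}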
On Mon, 2020-12-14 at 16:01 +0000, Andrew Cooper wrote:
> On 14/12/2020 10:56, Julien Grall wrote:
> > Hi Harsha,
> > 
> > On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
> > > On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
> > > > On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
> > > > > A HVM domain flushes cache on all the cpus using
> > > > > `flush_all` macro which uses cpu_online_map, during
> > > > > i) creation of a new domain
> > > > > ii) when device-model op is performed
> > > > > iii) when domain is destructed.
> > > > > 
> > > > > This triggers IPI on all the cpus, thus affecting other
> > > > > domains that are pinned to different pcpus. This patch
> > > > > restricts cache flush to the set of cpus affinitized to
> > > > > the current domain using `domain->dirty_cpumask`.
> > > > 
> > > > But then you need to effect cache flushing when a CPU gets
> > > > taken out of domain->dirty_cpumask. I don't think you/we want
> > > > to do that.
> > > > 
> > > If we do not restrict, it could lead to DoS attack, where a
> > > malicious guest could keep writing to MTRR registers or do a
> > > cache flush through DM Op and keep sending IPIs to other
> > > neighboring guests.
> > 
> > I saw Jan already answered about the alleged DoS, so I will just
> > focus on the resolution.
> > 
> > I agree that in the ideal situation we want to limit the impact on
> > the other vCPUs. However, we also need to make sure the cure is not
> > worse than the symptoms.
> 
> And specifically, only a change which is correct. This patch very
> definitely isn't.
> 
> Lines can get cached on other cpus from, e.g. qemu mappings and PV
> backends.
> 
> > The cache flush cannot be restricted in all the pinning situations
> > because pinning doesn't imply the pCPU will be dedicated to a given
> > vCPU or even that the vCPU will stick to a pCPU (we may allow
> > floating on a NUMA socket). Although your setup may offer this
> > guarantee.
> > 
> > My knowledge in this area is quite limited. But below are a few
> > questions that hopefully will help to make a decision.
> > 
> > The first question to answer is: can the flush be restricted in a
> > setup where each vCPU is running on a dedicated pCPU (i.e. a
> > partitioned system)?
> 
> Not really. Lines can become cached even from speculation in the
> directmap.
> 
> If you need to flush the caches (and don't have a virtual mapping to
> issue clflush/clflushopt/clwb over), it must be on all CPUs.

If lines are cached due to aggressive speculation from a different
guest, wouldn't they be invalidated at the speculation boundary, since
it's a wrong speculation? Would it still require to be flushed
explicitly?

-Harsha

> > If the answer is yes, then we should figure out whether using
> > domain->dirty_cpumask would always be correct? For instance, a vCPU
> > may not have yet run, so can we consider the associated pCPU cache
> > would be consistent?
> > 
> > Another line of question is what can we do on systems supporting
> > self-snooping? IOW, would it be possible to restrict the flush for
> > all the setups?
> 
> Right - this is the avenue which ought to be investigated. (Working)
> self-snoop ought to remove the need for some of these cache flushes.
> Others look like they're not correct to begin with. Others, such as
> the wbinvd intercepts absolutely must stay as they are.
> 
> ~Andrew
On 14/12/2020 19:05, Shamsundara Havanur, Harsha wrote:
> On Mon, 2020-12-14 at 16:01 +0000, Andrew Cooper wrote:
>> On 14/12/2020 10:56, Julien Grall wrote:
>>> Hi Harsha,
>>>
>>> On 14/12/2020 09:26, Shamsundara Havanur, Harsha wrote:
>>>> On Mon, 2020-12-14 at 09:52 +0100, Jan Beulich wrote:
>>>>> On 11.12.2020 12:44, Harsha Shamsundara Havanur wrote:
>>>>>> A HVM domain flushes cache on all the cpus using
>>>>>> `flush_all` macro which uses cpu_online_map, during
>>>>>> i) creation of a new domain
>>>>>> ii) when device-model op is performed
>>>>>> iii) when domain is destructed.
>>>>>>
>>>>>> This triggers IPI on all the cpus, thus affecting other
>>>>>> domains that are pinned to different pcpus. This patch
>>>>>> restricts cache flush to the set of cpus affinitized to
>>>>>> the current domain using `domain->dirty_cpumask`.
>>>>> But then you need to effect cache flushing when a CPU gets
>>>>> taken out of domain->dirty_cpumask. I don't think you/we want
>>>>> to do that.
>>>>>
>>>> If we do not restrict, it could lead to DoS attack, where a
>>>> malicious guest could keep writing to MTRR registers or do a
>>>> cache flush through DM Op and keep sending IPIs to other
>>>> neighboring guests.
>>> I saw Jan already answered about the alleged DoS, so I will just
>>> focus on the resolution.
>>>
>>> I agree that in the ideal situation we want to limit the impact on
>>> the other vCPUs. However, we also need to make sure the cure is not
>>> worse than the symptoms.
>> And specifically, only a change which is correct. This patch very
>> definitely isn't.
>>
>> Lines can get cached on other cpus from, e.g. qemu mappings and PV
>> backends.
>>
>>> The cache flush cannot be restricted in all the pinning situations
>>> because pinning doesn't imply the pCPU will be dedicated to a given
>>> vCPU or even that the vCPU will stick to a pCPU (we may allow
>>> floating on a NUMA socket). Although your setup may offer this
>>> guarantee.
>>>
>>> My knowledge in this area is quite limited. But below are a few
>>> questions that hopefully will help to make a decision.
>>>
>>> The first question to answer is: can the flush be restricted in a
>>> setup where each vCPU is running on a dedicated pCPU (i.e. a
>>> partitioned system)?
>> Not really. Lines can become cached even from speculation in the
>> directmap.
>>
>> If you need to flush the caches (and don't have a virtual mapping to
>> issue clflush/clflushopt/clwb over), it must be on all CPUs.
> If lines are cached due to aggressive speculation from a different
> guest, wouldn't they be invalidated at the speculation boundary, since
> it's a wrong speculation? Would it still require to be flushed
> explicitly?

No. Caches are microarchitectural state (just like TLBs, linefill
buffers, etc.)

The entire mess surrounding speculative security issues is that the
perturbances from bad speculation survive, and can be recovered at a
later point.

~Andrew
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 54e32e4fe8..ec247c7010 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2219,7 +2219,7 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value)
             domain_pause_nosync(v->domain);
 
             /* Flush physical caches. */
-            flush_all(FLUSH_CACHE);
+            flush_mask(v->domain->dirty_cpumask, FLUSH_CACHE);
             hvm_set_uc_mode(v, 1);
 
             domain_unpause(v->domain);
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index fb051d59c3..0d804c1fa0 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -631,7 +631,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
                         break;
                     /* fall through */
                 default:
-                    flush_all(FLUSH_CACHE);
+                    flush_mask(d->dirty_cpumask, FLUSH_CACHE);
                     break;
                 }
                 return 0;
@@ -683,7 +683,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
     list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
     p2m_memory_type_changed(d);
     if ( type != PAT_TYPE_WRBACK )
-        flush_all(FLUSH_CACHE);
+        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
 
     return 0;
 }
@@ -785,7 +785,7 @@ void memory_type_changed(struct domain *d)
          d->vcpu && d->vcpu[0] )
     {
         p2m_memory_type_changed(d);
-        flush_all(FLUSH_CACHE);
+        flush_mask(d->dirty_cpumask, FLUSH_CACHE);
     }
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index cfea5b5523..383e763d7d 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2395,7 +2395,7 @@ static void svm_vmexit_mce_intercept(
 static void svm_wbinvd_intercept(void)
 {
     if ( cache_flush_permitted(current->domain) )
-        flush_all(FLUSH_CACHE);
+        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
 }
 
 static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 86b8916a5d..a05c7036c4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3349,7 +3349,7 @@ static void vmx_wbinvd_intercept(void)
         return;
 
     if ( cpu_has_wbinvd_exiting )
-        flush_all(FLUSH_CACHE);
+        flush_mask(current->domain->dirty_cpumask, FLUSH_CACHE);
     else
         wbinvd();
 }
A HVM domain flushes cache on all the cpus using
`flush_all` macro which uses cpu_online_map, during
i) creation of a new domain
ii) when device-model op is performed
iii) when domain is destructed.

This triggers IPI on all the cpus, thus affecting other
domains that are pinned to different pcpus. This patch
restricts cache flush to the set of cpus affinitized to
the current domain using `domain->dirty_cpumask`.

Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
---
 xen/arch/x86/hvm/hvm.c     | 2 +-
 xen/arch/x86/hvm/mtrr.c    | 6 +++---
 xen/arch/x86/hvm/svm/svm.c | 2 +-
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)