
[v9,1/3] x86/tlb: introduce a flush HVM ASIDs flag

Message ID 20200406105703.79201-2-roger.pau@citrix.com
State Superseded
Series x86/guest: use assisted TLB flush in guest mode

Commit Message

Roger Pau Monné April 6, 2020, 10:57 a.m. UTC
Introduce a specific flag to request a HVM guest linear TLB flush,
which is an ASID/VPID tickle that forces a guest linear to guest
physical TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP). Note that shadow paging code already takes care
of issuing the necessary flushes when the shadow page tables are
modified.

In order to keep the previous behavior modify all shadow code TLB
flushes to also flush the guest linear to physical TLB if the guest is
HVM. I haven't looked at each specific shadow code TLB flush in order
to figure out whether it actually requires a guest TLB flush or not,
so there might be room for improvement in that regard.

Also perform ASID/VPID flushes when modifying the p2m tables as it's a
requirement for AMD hardware. Finally keep the flush in
switch_cr3_cr4, as it's not clear whether code could rely on
switch_cr3_cr4 also performing a guest linear TLB flush. A following
patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
not be necessary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v8:
 - Don't flush host TLB on HAP changes.
 - Introduce a helper for shadow changes that only flushes ASIDs/VPIDs
   when the guest is HVM.
 - Introduce a helper for HAP that only flushes ASIDs/VPIDs.

Changes since v7:
 - Do not perform an ASID flush in filtered_flush_tlb_mask: the
   requested flush is related to the page need_tlbflush field and not
   to p2m changes (applies to both callers).

Changes since v6:
 - Add ASID/VPID flushes when modifying the p2m.
 - Keep the ASID/VPID flush in switch_cr3_cr4.

Changes since v5:
 - Rename FLUSH_GUESTS_TLB to FLUSH_HVM_ASID_CORE.
 - Clarify commit message.
 - Define FLUSH_HVM_ASID_CORE to 0 when !CONFIG_HVM.
---
 xen/arch/x86/flushtlb.c          |  6 ++++--
 xen/arch/x86/mm/hap/hap.c        |  8 ++++----
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/hap/private.h    |  5 +++++
 xen/arch/x86/mm/p2m-pt.c         |  2 +-
 xen/arch/x86/mm/paging.c         |  3 ++-
 xen/arch/x86/mm/shadow/common.c  | 18 +++++++++---------
 xen/arch/x86/mm/shadow/hvm.c     |  2 +-
 xen/arch/x86/mm/shadow/multi.c   | 17 +++++++++--------
 xen/arch/x86/mm/shadow/private.h |  6 ++++++
 xen/include/asm-x86/flushtlb.h   |  6 ++++++
 11 files changed, 48 insertions(+), 27 deletions(-)
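
A minimal sketch of the mechanism (the diff body is not reproduced in this
archive view; the flag value and the helper below are illustrative
stand-ins, while FLUSH_TLB, flush_tlb_local() and hvm_flush_guest_tlbs()
are existing Xen names):

#ifdef CONFIG_HVM
# define FLUSH_HVM_ASID_CORE 0x200 /* illustrative value */
#else
# define FLUSH_HVM_ASID_CORE 0     /* compiles away for !HVM builds */
#endif

/* Stand-in for the relevant part of flush_area_local(). */
static void handle_flush_flags(unsigned int flags)
{
    if ( flags & FLUSH_TLB )
        flush_tlb_local();      /* host linear TLB flush, as before */

    if ( flags & FLUSH_HVM_ASID_CORE )
        hvm_flush_guest_tlbs(); /* bump this core's ASID/VPID generation,
                                 * forcing a guest linear -> guest physical
                                 * flush on the next VM entry; previously
                                 * done unconditionally in pre_flush() */
}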

Comments

Jan Beulich April 8, 2020, 11:25 a.m. UTC | #1
On 06.04.2020 12:57, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest linear TLB flush,
> which is an ASID/VPID tickle that forces a guest linear to guest
> physical TLB flush for all HVM guests.
> 
> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP). Note that shadow paging code already takes care
> of issuing the necessary flushes when the shadow page tables are
> modified.
> 
> In order to keep the previous behavior modify all shadow code TLB
> flushes to also flush the guest linear to physical TLB if the guest is
> HVM. I haven't looked at each specific shadow code TLB flush in order
> to figure out whether it actually requires a guest TLB flush or not,
> so there might be room for improvement in that regard.
> 
> Also perform ASID/VPID flushes when modifying the p2m tables as it's a
> requirement for AMD hardware. Finally keep the flush in
> switch_cr3_cr4, as it's not clear whether code could rely on
> switch_cr3_cr4 also performing a guest linear TLB flush. A following
> patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
> not be necessary.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one really minor remark:

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
>  
>      p2m_unlock(p2m);
>  
> -    flush_tlb_mask(d->dirty_cpumask);
> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
> +                                 FLUSH_HVM_ASID_CORE);

In cases where one case is assumed to be more likely than the other
putting the more likely one first can be viewed as a mild hint to
the compiler, and hence an extra ! may be warranted in an if() or
a conditional expression. Here, however, I don't think we can
really consider one case more likely than the other, and hence I'd
suggest to avoid the !, flipping the other two expressions
accordingly. I may take the liberty to adjust this while committing
(if I'm to be the one).

Jan
Roger Pau Monné April 8, 2020, 3:10 p.m. UTC | #2
On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
> On 06.04.2020 12:57, Roger Pau Monne wrote:
> > Introduce a specific flag to request a HVM guest linear TLB flush,
> > which is an ASID/VPID tickle that forces a guest linear to guest
> > physical TLB flush for all HVM guests.
> > 
> > This was previously unconditionally done in each pre_flush call, but
> > that's not required: HVM guests not using shadow don't require linear
> > TLB flushes as Xen doesn't modify the guest page tables in that case
> > (ie: when using HAP). Note that shadow paging code already takes care
> > of issuing the necessary flushes when the shadow page tables are
> > modified.
> > 
> > In order to keep the previous behavior modify all shadow code TLB
> > flushes to also flush the guest linear to physical TLB if the guest is
> > HVM. I haven't looked at each specific shadow code TLB flush in order
> > to figure out whether it actually requires a guest TLB flush or not,
> > so there might be room for improvement in that regard.
> > 
> > Also perform ASID/VPID flushes when modifying the p2m tables as it's a
> > requirement for AMD hardware. Finally keep the flush in
> > switch_cr3_cr4, as it's not clear whether code could rely on
> > switch_cr3_cr4 also performing a guest linear TLB flush. A following
> > patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
> > not be necessary.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one really minor remark:
> 
> > --- a/xen/arch/x86/mm/paging.c
> > +++ b/xen/arch/x86/mm/paging.c
> > @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
> >  
> >      p2m_unlock(p2m);
> >  
> > -    flush_tlb_mask(d->dirty_cpumask);
> > +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
> > +                                 FLUSH_HVM_ASID_CORE);
> 
> In cases where one case is assumed to be more likely than the other
> putting the more likely one first can be viewed as a mild hint to
> the compiler, and hence an extra ! may be warranted in an if() or
> a conditional expression. Here, however, I don't think we can
> really consider one case more likely than the other, and hence I'd
> suggest to avoid the !, flipping the other two expressions
> accordingly. I may take the liberty to adjust this while committing
> (if I'm to be the one).

That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.

Roger.
Jan Beulich April 9, 2020, 11:16 a.m. UTC | #3
On 08.04.2020 17:10, Roger Pau Monné wrote:
> On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm/paging.c
>>> +++ b/xen/arch/x86/mm/paging.c
>>> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
>>>  
>>>      p2m_unlock(p2m);
>>>  
>>> -    flush_tlb_mask(d->dirty_cpumask);
>>> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
>>> +                                 FLUSH_HVM_ASID_CORE);
>>
>> In cases where one case is assumed to be more likely than the other
>> putting the more likely one first can be viewed as a mild hint to
>> the compiler, and hence an extra ! may be warranted in an if() or
>> a conditional expression. Here, however, I don't think we can
>> really consider one case more likely than the other, and hence I'd
>> suggest to avoid the !, flipping the other two expressions
>> accordingly. I may take the liberty to adjust this while committing
>> (if I'm to be the one).
> 
> That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.

Thinking about it with the other HVM-related changes in v9, shouldn't
this then be

    flush_mask(d->dirty_cpumask, (hap_enabled(d) ? 0 : FLUSH_TLB) |
                                 (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));

Or wait - the only caller lives in hap.c. As a result the FLUSH_TLB
part can be dropped altogether. And I question the need of flushing
guest TLBs - this is purely a p2m operation. I'll go look at the
history of this function, but for now I think the call should be
dropped (albeit then maybe better in a separate patch).

Jan
Jan Beulich April 9, 2020, 11:54 a.m. UTC | #4
On 06.04.2020 12:57, Roger Pau Monne wrote:
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
>              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
>                                    p2m_ram_rw, p2m_ram_logdirty);
>  
> -            flush_tlb_mask(d->dirty_cpumask);
> +            hap_flush_tlb_mask(d->dirty_cpumask);
>  
>              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
>          }
> @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>           * to be read-only, or via hardware-assisted log-dirty.
>           */
>          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> -        flush_tlb_mask(d->dirty_cpumask);
> +        hap_flush_tlb_mask(d->dirty_cpumask);
>      }
>      return 0;
>  }
> @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
>       * be read-only, or via hardware-assisted log-dirty.
>       */
>      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> -    flush_tlb_mask(d->dirty_cpumask);
> +    hap_flush_tlb_mask(d->dirty_cpumask);
>  }
>  
>  /************************************************/
> @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
>  
>      safe_write_pte(p, new);
>      if ( old_flags & _PAGE_PRESENT )
> -        flush_tlb_mask(d->dirty_cpumask);
> +        hap_flush_tlb_mask(d->dirty_cpumask);
>  
>      paging_unlock(d);
>  

Following up on my earlier mail about paging_log_dirty_range(), I'm
now of the opinion that all of these flushes should go away too. I
can only assume that they got put where they are when HAP code was
cloned from the shadow one. These are only p2m operations, and hence
p2m level TLB flushing is all that's needed here.

> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -84,7 +84,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>      safe_write_pte(p, new);
>  
>      if (old_flags & _PAGE_PRESENT)
> -        flush_tlb_mask(p2m->dirty_cpumask);
> +        hap_flush_tlb_mask(p2m->dirty_cpumask);

Same here then presumably.

As suggested in my earlier reply, the plain removals of flush
invocations would probably better be split out into a separate
patch.

> --- a/xen/arch/x86/mm/hap/private.h
> +++ b/xen/arch/x86/mm/hap/private.h
> @@ -47,4 +47,9 @@ unsigned long hap_p2m_ga_to_gfn_4_levels(struct vcpu *v,
>      struct p2m_domain *p2m, unsigned long cr3,
>      paddr_t ga, uint32_t *pfec, unsigned int *page_order);
>  
> +static inline void hap_flush_tlb_mask(const cpumask_t *mask)
> +{
> +    flush_mask(mask, FLUSH_HVM_ASID_CORE);
> +}

With the above introduction of this would then become unnecessary.

Jan
Wei Liu April 10, 2020, 3:48 p.m. UTC | #5
On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
> On 06.04.2020 12:57, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
> >              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
> >                                    p2m_ram_rw, p2m_ram_logdirty);
> >  
> > -            flush_tlb_mask(d->dirty_cpumask);
> > +            hap_flush_tlb_mask(d->dirty_cpumask);
> >  
> >              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
> >          }
> > @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
> >           * to be read-only, or via hardware-assisted log-dirty.
> >           */
> >          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> > -        flush_tlb_mask(d->dirty_cpumask);
> > +        hap_flush_tlb_mask(d->dirty_cpumask);
> >      }
> >      return 0;
> >  }
> > @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
> >       * be read-only, or via hardware-assisted log-dirty.
> >       */
> >      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> > -    flush_tlb_mask(d->dirty_cpumask);
> > +    hap_flush_tlb_mask(d->dirty_cpumask);
> >  }
> >  
> >  /************************************************/
> > @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> >  
> >      safe_write_pte(p, new);
> >      if ( old_flags & _PAGE_PRESENT )
> > -        flush_tlb_mask(d->dirty_cpumask);
> > +        hap_flush_tlb_mask(d->dirty_cpumask);
> >  
> >      paging_unlock(d);
> >  
> 
> Following up on my earlier mail about paging_log_dirty_range(), I'm
> now of the opinion that all of these flushes should go away too. I
> can only assume that they got put where they are when HAP code was
> cloned from the shadow one. These are only p2m operations, and hence
> p2m level TLB flushing is all that's needed here.
> 
> > --- a/xen/arch/x86/mm/hap/nested_hap.c
> > +++ b/xen/arch/x86/mm/hap/nested_hap.c
> > @@ -84,7 +84,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> >      safe_write_pte(p, new);
> >  
> >      if (old_flags & _PAGE_PRESENT)
> > -        flush_tlb_mask(p2m->dirty_cpumask);
> > +        hap_flush_tlb_mask(p2m->dirty_cpumask);
> 
> Same here then presumably.
> 
> As suggested in my earlier reply, the plain removals of flush
> invocations would probably better be split out into a separate
> patch.
> 
> > --- a/xen/arch/x86/mm/hap/private.h
> > +++ b/xen/arch/x86/mm/hap/private.h
> > @@ -47,4 +47,9 @@ unsigned long hap_p2m_ga_to_gfn_4_levels(struct vcpu *v,
> >      struct p2m_domain *p2m, unsigned long cr3,
> >      paddr_t ga, uint32_t *pfec, unsigned int *page_order);
> >  
> > +static inline void hap_flush_tlb_mask(const cpumask_t *mask)
> > +{
> > +    flush_mask(mask, FLUSH_HVM_ASID_CORE);
> > +}
> 
> With the above introduction of this would then become unnecessary.

I had planned to make the required adjustment(s) and commit v9 today.
But seeing your comment it appears v10 is warranted.

Wei.

> 
> Jan
Jan Beulich April 14, 2020, 6:28 a.m. UTC | #6
On 10.04.2020 17:48, Wei Liu wrote:
> On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm/hap/private.h
>>> +++ b/xen/arch/x86/mm/hap/private.h
>>> @@ -47,4 +47,9 @@ unsigned long hap_p2m_ga_to_gfn_4_levels(struct vcpu *v,
>>>      struct p2m_domain *p2m, unsigned long cr3,
>>>      paddr_t ga, uint32_t *pfec, unsigned int *page_order);
>>>  
>>> +static inline void hap_flush_tlb_mask(const cpumask_t *mask)
>>> +{
>>> +    flush_mask(mask, FLUSH_HVM_ASID_CORE);
>>> +}
>>
>> With the above introduction of this would then become unnecessary.
> 
> I had planned to make the required adjustment(s) and commit v9 today.

Actually I came to make these comments in the course of preparing
to commit the series, while making the small adjustments.

Jan
Roger Pau Monné April 14, 2020, 7:52 a.m. UTC | #7
On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
> On 06.04.2020 12:57, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
> >              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
> >                                    p2m_ram_rw, p2m_ram_logdirty);
> >  
> > -            flush_tlb_mask(d->dirty_cpumask);
> > +            hap_flush_tlb_mask(d->dirty_cpumask);
> >  
> >              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
> >          }
> > @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
> >           * to be read-only, or via hardware-assisted log-dirty.
> >           */
> >          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> > -        flush_tlb_mask(d->dirty_cpumask);
> > +        hap_flush_tlb_mask(d->dirty_cpumask);
> >      }
> >      return 0;
> >  }
> > @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
> >       * be read-only, or via hardware-assisted log-dirty.
> >       */
> >      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> > -    flush_tlb_mask(d->dirty_cpumask);
> > +    hap_flush_tlb_mask(d->dirty_cpumask);
> >  }
> >  
> >  /************************************************/
> > @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> >  
> >      safe_write_pte(p, new);
> >      if ( old_flags & _PAGE_PRESENT )
> > -        flush_tlb_mask(d->dirty_cpumask);
> > +        hap_flush_tlb_mask(d->dirty_cpumask);
> >  
> >      paging_unlock(d);
> >  
> 
> Following up on my earlier mail about paging_log_dirty_range(), I'm
> now of the opinion that all of these flushes should go away too. I
> can only assume that they got put where they are when HAP code was
> cloned from the shadow one. These are only p2m operations, and hence
> p2m level TLB flushing is all that's needed here.

IIRC without those ASID flushes NPT won't work correctly, as I think
it relies on those flushes when updating the p2m.

We could see about moving those flushes inside the NPT functions that
modify the p2m, AFAICT p2m_pt_set_entry which calls
p2m->write_p2m_entry relies on the latter doing the ASID flushes, as it
doesn't perform any.
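
Purely as an illustration of that alternative (npt_write_p2m_entry() is a
hypothetical name, and this is not the direction the thread ends up
taking):

static int npt_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
                               l1_pgentry_t *p, l1_pgentry_t new,
                               unsigned int level)
{
    l1_pgentry_t old = *p;

    safe_write_pte(p, new);

    /* Flush ASIDs here instead of relying on the caller to do it. */
    if ( l1e_get_flags(old) & _PAGE_PRESENT )
        flush_mask(p2m->domain->dirty_cpumask, FLUSH_HVM_ASID_CORE);

    return 0;
}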

Thanks, Roger.
Roger Pau Monné April 14, 2020, 8:01 a.m. UTC | #8
On Thu, Apr 09, 2020 at 01:16:57PM +0200, Jan Beulich wrote:
> On 08.04.2020 17:10, Roger Pau Monné wrote:
> > On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
> >> On 06.04.2020 12:57, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm/paging.c
> >>> +++ b/xen/arch/x86/mm/paging.c
> >>> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
> >>>  
> >>>      p2m_unlock(p2m);
> >>>  
> >>> -    flush_tlb_mask(d->dirty_cpumask);
> >>> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
> >>> +                                 FLUSH_HVM_ASID_CORE);
> >>
> >> In cases where one case is assumed to be more likely than the other
> >> putting the more likely one first can be viewed as a mild hint to
> >> the compiler, and hence an extra ! may be warranted in an if() or
> >> a conditional expression. Here, however, I don't think we can
> >> really consider one case more likely than the other, and hence I'd
> >> suggest to avoid the !, flipping the other two expressions
> >> accordingly. I may take the liberty to adjust this while committing
> >> (if I'm to be the one).
> > 
> > That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.
> 
> Thinking about it with the other HVM-related changes in v9, shouldn't
> this then be
> 
>     flush_mask(d->dirty_cpumask, (hap_enabled(d) ? 0 : FLUSH_TLB) |
>                                  (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
> 
> Or wait - the only caller lives in hap.c. As a result the FLUSH_TLB
> part can be dropped altogether. And I question the need of flushing
> guest TLBs - this is purely a p2m operation. I'll go look at the
> history of this function, but for now I think the call should be
> dropped (albeit then maybe better in a separate patch).

The ASID flush needs to stay unless it's moved into p2m_pt_set_entry,
as p2m_pt_set_entry itself doesn't perform any ASID flush and won't
work correctly.

I think it's safe to remove the TLB flush, as the code is only called
from HAP, and hence is not used by shadow (which is what would require
a plain TLB flush). The placement of this function seems misleading to
me, as it looks like it's used by both shadow and HAP. It might be
better to move it to hap.c if it's only to be used by HAP code.
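
Concretely, the tail of paging_log_dirty_range() would then reduce to
something like (a sketch, assuming the HAP-only-caller reasoning holds):

    p2m_unlock(p2m);

    /* Only HAP reaches this code, so no plain TLB flush is needed;
     * keep the ASID tickle for the NPT case. */
    flush_mask(d->dirty_cpumask, FLUSH_HVM_ASID_CORE);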

Thanks, Roger.
Jan Beulich April 14, 2020, 9:01 a.m. UTC | #9
On 14.04.2020 09:52, Roger Pau Monné wrote:
> On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
>>>              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
>>>                                    p2m_ram_rw, p2m_ram_logdirty);
>>>  
>>> -            flush_tlb_mask(d->dirty_cpumask);
>>> +            hap_flush_tlb_mask(d->dirty_cpumask);
>>>  
>>>              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
>>>          }
>>> @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>>           * to be read-only, or via hardware-assisted log-dirty.
>>>           */
>>>          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
>>> -        flush_tlb_mask(d->dirty_cpumask);
>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
>>>      }
>>>      return 0;
>>>  }
>>> @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
>>>       * be read-only, or via hardware-assisted log-dirty.
>>>       */
>>>      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
>>> -    flush_tlb_mask(d->dirty_cpumask);
>>> +    hap_flush_tlb_mask(d->dirty_cpumask);
>>>  }
>>>  
>>>  /************************************************/
>>> @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
>>>  
>>>      safe_write_pte(p, new);
>>>      if ( old_flags & _PAGE_PRESENT )
>>> -        flush_tlb_mask(d->dirty_cpumask);
>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
>>>  
>>>      paging_unlock(d);
>>>  
>>
>> Following up on my earlier mail about paging_log_dirty_range(), I'm
>> now of the opinion that all of these flushes should go away too. I
>> can only assume that they got put where they are when HAP code was
>> cloned from the shadow one. These are only p2m operations, and hence
>> p2m level TLB flushing is all that's needed here.
> 
> IIRC without those ASID flushes NPT won't work correctly, as I think
> it relies on those flushes when updating the p2m.

Hmm, yes - at least for this last one (in patch context) I definitely
agree: NPT's TLB invalidation is quite different from EPT's (and I
was too focused on the latter when writing the earlier reply).
Therefore how about keeping hap_flush_tlb_mask() (and its uses), but
teaching it to do nothing for EPT, while doing an ASID based flush
for NPT?

There may then even be the option to have a wider purpose helper,
dealing with most (all?) of the flushes needed from underneath
x86/mm/, setting the TLB and HVM_ASID_CORE flags as appropriate. In
the EPT case the function would still be a no-op (afaict).
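
A minimal sketch of that suggestion, assuming cpu_has_svm is an acceptable
stand-in for "this p2m uses NPT":

static inline void hap_flush_tlb_mask(const cpumask_t *mask)
{
    /* NPT TLB entries are tagged with the guest's ASID - tickle them. */
    if ( cpu_has_svm )
        flush_mask(mask, FLUSH_HVM_ASID_CORE);
    /* EPT: combined mappings are invalidated via INVEPT at the p2m
     * level, so there is nothing to do here. */
}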

> We could see about moving those flushes inside the NPT functions that
> modify the p2m, AFAICT p2m_pt_set_entry which calls
> p2m->write_p2m_entry relies on the later doing the ASID flushes, as it
> doesn't perform any.

I don't think we want to go this far at this point.

Jan
Jan Beulich April 14, 2020, 9:09 a.m. UTC | #10
On 14.04.2020 10:01, Roger Pau Monné wrote:
> On Thu, Apr 09, 2020 at 01:16:57PM +0200, Jan Beulich wrote:
>> On 08.04.2020 17:10, Roger Pau Monné wrote:
>>> On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
>>>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
>>>>>  
>>>>>      p2m_unlock(p2m);
>>>>>  
>>>>> -    flush_tlb_mask(d->dirty_cpumask);
>>>>> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
>>>>> +                                 FLUSH_HVM_ASID_CORE);
>>>>
>>>> In cases where one case is assumed to be more likely than the other
>>>> putting the more likely one first can be viewed as a mild hint to
>>>> the compiler, and hence an extra ! may be warranted in an if() or
>>>> a conditional expression. Here, however, I don't think we can
>>>> really consider one case more likely than the other, and hence I'd
>>>> suggest to avoid the !, flipping the other two expressions
>>>> accordingly. I may take the liberty to adjust this while committing
>>>> (if I'm to be the one).
>>>
>>> That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.
>>
>> Thinking about it with the other HVM-related changes in v9, shouldn't
>> this then be
>>
>>     flush_mask(d->dirty_cpumask, (hap_enabled(d) ? 0 : FLUSH_TLB) |
>>                                  (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
>>
>> Or wait - the only caller lives in hap.c. As a result the FLUSH_TLB
>> part can be dropped altogether. And I question the need of flushing
>> guest TLBs - this is purely a p2m operation. I'll go look at the
>> history of this function, but for now I think the call should be
>> dropped (albeit then maybe better in a separate patch).
> 
> The ASID flush needs to stay unless it's moved into p2m_pt_set_entry,
> as p2m_pt_set_entry itself doesn't perform any ASID flush and won't
> work correctly.

As said in the other reply sent a few minutes ago - yes
for NPT, but no for EPT.

> I think it's safe to remove the TLB flush, as the code is only called
> from HAP, and hence is not used by shadow (which is what would require
> a plain TLB flush). The placement of this function seems misleading to
> me, as it looks like it's used by both shadow and HAP. It might be
> better to move it to hap.c if it's only to be used by HAP code.

Either placement has its problems, I think. The function is meant to
be a paging layer one, but is needed by HAP only right now. I'm
pondering whether to wrap it in #ifdef CONFIG_HVM (plus perhaps a
respective ASSERT_UNREACHABLE()).
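
For example (a sketch of the pondered guard, not committed code):

static inline void hap_flush_tlb_mask(const cpumask_t *mask)
{
#ifdef CONFIG_HVM
    flush_mask(mask, FLUSH_HVM_ASID_CORE);
#else
    ASSERT_UNREACHABLE(); /* no HAP without HVM support built in */
#endif
}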

In the end, just like in the other cases, this may be a valid further
user of the more generic helper that I did suggest (resulting in no
flushing on EPT and an ASID-based one on NPT).

Jan
Roger Pau Monné April 14, 2020, 9:34 a.m. UTC | #11
On Tue, Apr 14, 2020 at 11:09:43AM +0200, Jan Beulich wrote:
> On 14.04.2020 10:01, Roger Pau Monné wrote:
> > On Thu, Apr 09, 2020 at 01:16:57PM +0200, Jan Beulich wrote:
> >> On 08.04.2020 17:10, Roger Pau Monné wrote:
> >>> On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
> >>>> On 06.04.2020 12:57, Roger Pau Monne wrote:
> >>>>> --- a/xen/arch/x86/mm/paging.c
> >>>>> +++ b/xen/arch/x86/mm/paging.c
> >>>>> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
> >>>>>  
> >>>>>      p2m_unlock(p2m);
> >>>>>  
> >>>>> -    flush_tlb_mask(d->dirty_cpumask);
> >>>>> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
> >>>>> +                                 FLUSH_HVM_ASID_CORE);
> >>>>
> >>>> In cases where one case is assumed to be more likely than the other
> >>>> putting the more likely one first can be viewed as a mild hint to
> >>>> the compiler, and hence an extra ! may be warranted in an if() or
> >>>> a conditional expression. Here, however, I don't think we can
> >>>> really consider one case more likely than the other, and hence I'd
> >>>> suggest to avoid the !, flipping the other two expressions
> >>>> accordingly. I may take the liberty to adjust this while committing
> >>>> (if I'm to be the one).
> >>>
> >>> That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.
> >>
> >> Thinking about it with the other HVM-related changes in v9, shouldn't
> >> this then be
> >>
> >>     flush_mask(d->dirty_cpumask, (hap_enabled(d) ? 0 : FLUSH_TLB) |
> >>                                  (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
> >>
> >> Or wait - the only caller lives in hap.c. As a result the FLUSH_TLB
> >> part can be dropped altogether. And I question the need of flushing
> >> guest TLBs - this is purely a p2m operation. I'll go look at the
> >> history of this function, but for now I think the call should be
> >> dropped (albeit then maybe better in a separate patch).
> > 
> > The ASID flush needs to stay unless it's moved into p2m_pt_set_entry,
> > as p2m_pt_set_entry itself doesn't perform any ASID flush and won't
> > work correctly.
> 
> As said in the other reply sent a few minutes ago - yes
> for NPT, but no for EPT.

It's not strictly wrong for EPT as it won't cause EPT domains to
malfunction, it's just redundant.

> > I think it's safe to remove the TLB flush, as the code is only called
> > from HAP, and hence is not used by shadow (which is what would require
> > a plain TLB flush). The placement of this function seems misleading to
> > me, as it looks like it's used by both shadow and HAP. It might be
> > better to move it to hap.c if it's only to be used by HAP code.
> 
> Either placement has its problems, I think. The function is meant to
> be a paging layer one, but is needed by HAP only right now. I'm
> pondering whether to wrap it in #ifdef CONFIG_HVM (plus perhaps a
> respective ASSERT_UNREACHABLE()).

IMO if a TLB flush is not performed here we should add an
ASSERT_UNREACHABLE if called from a shadow mode domain, or else we
risk someone trying to use it in shadow later without realizing it's
missing a TLB flush.
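
Something along these lines (a sketch, expressed with a plain ASSERT();
the domain argument is added here purely so the assertion has something to
check - the helper in the patch takes only the mask):

static inline void hap_flush_tlb_mask(const struct domain *d,
                                      const cpumask_t *mask)
{
    /* Shadow would also need FLUSH_TLB - catch any such misuse early. */
    ASSERT(hap_enabled(d));
    flush_mask(mask, FLUSH_HVM_ASID_CORE);
}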

Thanks, Roger.
Jan Beulich April 14, 2020, 9:52 a.m. UTC | #12
On 14.04.2020 11:34, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 11:09:43AM +0200, Jan Beulich wrote:
>> On 14.04.2020 10:01, Roger Pau Monné wrote:
>>> On Thu, Apr 09, 2020 at 01:16:57PM +0200, Jan Beulich wrote:
>>>> On 08.04.2020 17:10, Roger Pau Monné wrote:
>>>>> On Wed, Apr 08, 2020 at 01:25:14PM +0200, Jan Beulich wrote:
>>>>>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -613,7 +613,8 @@ void paging_log_dirty_range(struct domain *d,
>>>>>>>  
>>>>>>>      p2m_unlock(p2m);
>>>>>>>  
>>>>>>> -    flush_tlb_mask(d->dirty_cpumask);
>>>>>>> +    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
>>>>>>> +                                 FLUSH_HVM_ASID_CORE);
>>>>>>
>>>>>> In cases where one case is assumed to be more likely than the other
>>>>>> putting the more likely one first can be viewed as a mild hint to
>>>>>> the compiler, and hence an extra ! may be warranted in an if() or
>>>>>> a conditional expression. Here, however, I don't think we can
>>>>>> really consider one case more likely than the other, and hence I'd
>>>>>> suggest to avoid the !, flipping the other two expressions
>>>>>> accordingly. I may take the liberty to adjust this while committing
>>>>>> (if I'm to be the one).
>>>>>
>>>>> That's fine, thanks. Somehow '!hap -> flush' was clearer in my mind.
>>>>
>>>> Thinking about it with the other HVM-related changes in v9, shouldn't
>>>> this then be
>>>>
>>>>     flush_mask(d->dirty_cpumask, (hap_enabled(d) ? 0 : FLUSH_TLB) |
>>>>                                  (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
>>>>
>>>> Or wait - the only caller lives in hap.c. As a result the FLUSH_TLB
>>>> part can be dropped altogether. And I question the need of flushing
>>>> guest TLBs - this is purely a p2m operation. I'll go look at the
>>>> history of this function, but for now I think the call should be
>>>> dropped (albeit then maybe better in a separate patch).
>>>
>>> The ASID flush needs to stay unless it's moved into p2m_pt_set_entry,
>>> as p2m_pt_set_entry itself doesn't perform any ASID flush and won't
>>> work correctly.
>>
>> Just like for said in the other reply sent a few minutes ago - yes
>> for NPT, but no for EPT.
> 
> It's not strictly wrong for EPT as it won't cause EPT domains to
> malfunction, it's just redundant.

Right - it's wrong just in the sense of being pointless extra
overhead.

>>> I think it's safe to remove the TLB flush, as the code is only called
>>> from HAP, and hence is not used by shadow (which is what would require
>>> a plain TLB flush). The placement of this function seems misleading to
>>> me, as it looks like it's used by both shadow and HAP. It might be
>>> better to move it to hap.c if it's only to be used by HAP code.
>>
>> Either placement has its problems, I think. The function is meant to
>> be a paging layer one, but is needed by HAP only right now. I'm
>> pondering whether to wrap it in #ifdef CONFIG_HVM (plus perhaps a
>> respective ASSERT_UNREACHABLE()).
> 
> IMO if a TLB flush is not performed here we should add an
> ASSERT_UNREACHABLE if called from a shadow mode domain, or else we
> risk someone trying to use it in shadow later without realizing it's
> missing a TLB flush.

This would be fine with me.

Jan
Roger Pau Monné April 14, 2020, 10:02 a.m. UTC | #13
On Tue, Apr 14, 2020 at 11:01:06AM +0200, Jan Beulich wrote:
> On 14.04.2020 09:52, Roger Pau Monné wrote:
> > On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
> >> On 06.04.2020 12:57, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
> >>>              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
> >>>                                    p2m_ram_rw, p2m_ram_logdirty);
> >>>  
> >>> -            flush_tlb_mask(d->dirty_cpumask);
> >>> +            hap_flush_tlb_mask(d->dirty_cpumask);
> >>>  
> >>>              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
> >>>          }
> >>> @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
> >>>           * to be read-only, or via hardware-assisted log-dirty.
> >>>           */
> >>>          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> >>> -        flush_tlb_mask(d->dirty_cpumask);
> >>> +        hap_flush_tlb_mask(d->dirty_cpumask);
> >>>      }
> >>>      return 0;
> >>>  }
> >>> @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
> >>>       * be read-only, or via hardware-assisted log-dirty.
> >>>       */
> >>>      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> >>> -    flush_tlb_mask(d->dirty_cpumask);
> >>> +    hap_flush_tlb_mask(d->dirty_cpumask);
> >>>  }
> >>>  
> >>>  /************************************************/
> >>> @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> >>>  
> >>>      safe_write_pte(p, new);
> >>>      if ( old_flags & _PAGE_PRESENT )
> >>> -        flush_tlb_mask(d->dirty_cpumask);
> >>> +        hap_flush_tlb_mask(d->dirty_cpumask);
> >>>  
> >>>      paging_unlock(d);
> >>>  
> >>
> >> Following up on my earlier mail about paging_log_dirty_range(), I'm
> >> now of the opinion that all of these flushes should go away too. I
> >> can only assume that they got put where they are when HAP code was
> >> cloned from the shadow one. These are only p2m operations, and hence
> >> p2m level TLB flushing is all that's needed here.
> > 
> > IIRC without those ASID flushes NPT won't work correctly, as I think
> > it relies on those flushes when updating the p2m.
> 
> Hmm, yes - at least for this last one (in patch context) I definitely
> agree: NPT's TLB invalidation is quite different from EPT's (and I
> was too focused on the latter when writing the earlier reply).
> Therefore how about keeping hap_flush_tlb_mask() (and its uses), but
> teaching it to do nothing for EPT, while doing an ASID based flush
> for NPT?

I could give that a try. I'm slightly worried that EPT code might rely
on some of those ASID/VPID flushes. It seems like it's trying to do
VPID flushes when needed, but issues could be masked by the ASID/VPID
flushes done by the callers.

> There may then even be the option to have a wider purpose helper,
> dealing with most (all?) of the flushes needed from underneath
> x86/mm/, setting the TLB and HVM_ASID_CORE flags as appropriate. In
> the EPT case the function would still be a no-op (afaict).

That seems nice, we would have to be careful however as reducing the
number of ASID/VPID flushes could uncover issues in the existing code.
I guess you mean something like:

static inline void guest_flush_tlb_mask(const struct domain *d,
                                        const cpumask_t *mask)
{
    flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
                                                                : 0) |
                     (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
                                                      : 0));
}

I think this should work, but I would rather do it in a separate
patch. I'm also not fully convinced guest_flush_tlb_mask is the best
name, but I couldn't think of anything else more descriptive of the
purpose of the function.

Thanks, Roger.
Jan Beulich April 14, 2020, 10:13 a.m. UTC | #14
On 14.04.2020 12:02, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 11:01:06AM +0200, Jan Beulich wrote:
>> On 14.04.2020 09:52, Roger Pau Monné wrote:
>>> On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
>>>> On 06.04.2020 12:57, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/mm/hap/hap.c
>>>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>>>> @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
>>>>>              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
>>>>>                                    p2m_ram_rw, p2m_ram_logdirty);
>>>>>  
>>>>> -            flush_tlb_mask(d->dirty_cpumask);
>>>>> +            hap_flush_tlb_mask(d->dirty_cpumask);
>>>>>  
>>>>>              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
>>>>>          }
>>>>> @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>>>>           * to be read-only, or via hardware-assisted log-dirty.
>>>>>           */
>>>>>          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
>>>>> -        flush_tlb_mask(d->dirty_cpumask);
>>>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
>>>>>      }
>>>>>      return 0;
>>>>>  }
>>>>> @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
>>>>>       * be read-only, or via hardware-assisted log-dirty.
>>>>>       */
>>>>>      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
>>>>> -    flush_tlb_mask(d->dirty_cpumask);
>>>>> +    hap_flush_tlb_mask(d->dirty_cpumask);
>>>>>  }
>>>>>  
>>>>>  /************************************************/
>>>>> @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
>>>>>  
>>>>>      safe_write_pte(p, new);
>>>>>      if ( old_flags & _PAGE_PRESENT )
>>>>> -        flush_tlb_mask(d->dirty_cpumask);
>>>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
>>>>>  
>>>>>      paging_unlock(d);
>>>>>  
>>>>
>>>> Following up on my earlier mail about paging_log_dirty_range(), I'm
>>>> now of the opinion that all of these flushes should go away too. I
>>>> can only assume that they got put where they are when HAP code was
>>>> cloned from the shadow one. These are only p2m operations, and hence
>>>> p2m level TLB flushing is all that's needed here.
>>>
>>> IIRC without those ASID flushes NPT won't work correctly, as I think
>>> it relies on those flushes when updating the p2m.
>>
>> Hmm, yes - at least for this last one (in patch context) I definitely
>> agree: NPT's TLB invalidation is quite different from EPT's (and I
>> was too focused on the latter when writing the earlier reply).
>> Therefore how about keeping hap_flush_tlb_mask() (and its uses), but
>> teaching it to do nothing for EPT, while doing an ASID based flush
>> for NPT?
> 
> I could give that a try. I'm slightly worried that EPT code might rely
> on some of those ASID/VPID flushes. It seems like it's trying to do
> VPID flushes when needed, but issues could be masked by the ASID/VPID
> flushes done by the callers.

I can't seem to find any EPT code doing VPID flushes, and I'd also
not expect to. There's VMX code doing so, yes. EPT should be fully
agnostic to guest virtual address space.

>> There may then even be the option to have a wider purpose helper,
>> dealing with most (all?) of the flushes needed from underneath
>> x86/mm/, setting the TLB and HVM_ASID_CORE flags as appropriate. In
>> the EPT case the function would still be a no-op (afaict).
> 
> That seems nice, we would have to be careful however as reducing the
> number of ASID/VPID flushes could uncover issues in the existing code.
> I guess you mean something like:
> 
> static inline void guest_flush_tlb_mask(const struct domain *d,
>                                         const cpumask_t *mask)
> {
>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
>                                                                 : 0) |
>                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>                                                       : 0));
> }

Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
Or am I overlooking a need to do ASID flushes also in shadow mode?

Also I'd suggest to calculate the flags up front, to avoid calling
flush_mask() in the first place in case (EPT) neither bit is set.
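
Taken together, the two remarks would turn the proposal into something
like (a sketch only; whether hap_enabled() rather than is_hvm_domain() is
the right condition is exactly what the rest of the thread debates):

static inline void guest_flush_tlb_mask(const struct domain *d,
                                        const cpumask_t *mask)
{
    unsigned int flags =
        (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB : 0) |
        (hap_enabled(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE : 0);

    /* For EPT both bits end up clear - skip the flush machinery. */
    if ( flags )
        flush_mask(mask, flags);
}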

> I think this should work, but I would rather do it in a separate
> patch.

Yes, just like the originally (wrongly, as you validly say) suggested
full removal of them, putting this in a separate patch would indeed
seem better.

> I'm also not fully convinced guest_flush_tlb_mask is the best
> name, but I couldn't think of anything else more descriptive of the
> purpose of the function.

That's the name I was thinking of, too, despite also not being
entirely happy with it.

Jan
Roger Pau Monné April 14, 2020, 11:19 a.m. UTC | #15
On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
> On 14.04.2020 12:02, Roger Pau Monné wrote:
> > On Tue, Apr 14, 2020 at 11:01:06AM +0200, Jan Beulich wrote:
> >> On 14.04.2020 09:52, Roger Pau Monné wrote:
> >>> On Thu, Apr 09, 2020 at 01:54:40PM +0200, Jan Beulich wrote:
> >>>> On 06.04.2020 12:57, Roger Pau Monne wrote:
> >>>>> --- a/xen/arch/x86/mm/hap/hap.c
> >>>>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>>>> @@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
> >>>>>              p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
> >>>>>                                    p2m_ram_rw, p2m_ram_logdirty);
> >>>>>  
> >>>>> -            flush_tlb_mask(d->dirty_cpumask);
> >>>>> +            hap_flush_tlb_mask(d->dirty_cpumask);
> >>>>>  
> >>>>>              memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
> >>>>>          }
> >>>>> @@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
> >>>>>           * to be read-only, or via hardware-assisted log-dirty.
> >>>>>           */
> >>>>>          p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> >>>>> -        flush_tlb_mask(d->dirty_cpumask);
> >>>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
> >>>>>      }
> >>>>>      return 0;
> >>>>>  }
> >>>>> @@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
> >>>>>       * be read-only, or via hardware-assisted log-dirty.
> >>>>>       */
> >>>>>      p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
> >>>>> -    flush_tlb_mask(d->dirty_cpumask);
> >>>>> +    hap_flush_tlb_mask(d->dirty_cpumask);
> >>>>>  }
> >>>>>  
> >>>>>  /************************************************/
> >>>>> @@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> >>>>>  
> >>>>>      safe_write_pte(p, new);
> >>>>>      if ( old_flags & _PAGE_PRESENT )
> >>>>> -        flush_tlb_mask(d->dirty_cpumask);
> >>>>> +        hap_flush_tlb_mask(d->dirty_cpumask);
> >>>>>  
> >>>>>      paging_unlock(d);
> >>>>>  
> >>>>
> >>>> Following up on my earlier mail about paging_log_dirty_range(), I'm
> >>>> now of the opinion that all of these flushes should go away too. I
> >>>> can only assume that they got put where they are when HAP code was
> >>>> cloned from the shadow one. These are only p2m operations, and hence
> >>>> p2m level TLB flushing is all that's needed here.
> >>>
> >>> IIRC without those ASID flushes NPT won't work correctly, as I think
> >>> it relies on those flushes when updating the p2m.
> >>
> >> Hmm, yes - at least for this last one (in patch context) I definitely
> >> agree: NPT's TLB invalidation is quite different from EPT's (and I
> >> was too focused on the latter when writing the earlier reply).
> >> Therefore how about keeping hap_flush_tlb_mask() (and its uses), but
> >> teaching it to do nothing for EPT, while doing an ASID based flush
> >> for NPT?
> > 
> > I could give that a try. I'm slightly worried that EPT code might rely
> > on some of those ASID/VPID flushes. It seems like it's trying to do
> > VPID flushes when needed, but issues could be masked by the ASID/VPID
> > flushes done by the callers.
> 
> I can't seem to find any EPT code doing VPID flushes, and I'd also
> not expect to. There's VMX code doing so, yes. EPT should be fully
> agnostic to guest virtual address space.

I got confused.

> >> There may then even be the option to have a wider purpose helper,
> >> dealing with most (all?) of the flushes needed from underneath
> >> x86/mm/, setting the TLB and HVM_ASID_CORE flags as appropriate. In
> >> the EPT case the function would still be a no-op (afaict).
> > 
> > That seems nice, we would have to be careful however as reducing the
> > number of ASID/VPID flushes could uncover issues in the existing code.
> > I guess you mean something like:
> > 
> > static inline void guest_flush_tlb_mask(const struct domain *d,
> >                                         const cpumask_t *mask)
> > {
> >     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
> >                                                                 : 0) |
> >                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >                                                       : 0));
> > }
> 
> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
> Or am I overlooking a need to do ASID flushes also in shadow mode?

I think so, I've used is_hvm_domain in order to cover for HVM domains
running in shadow mode on AMD hardware, I think those also need the
ASID flushes.

> Also I'd suggest to calculate the flags up front, to avoid calling
> flush_mask() in the first place in case (EPT) neither bit is set.
> 
> > I think this should work, but I would rather do it in a separate
> > patch.
> 
> Yes, just like the originally (wrongly, as you validly say) suggested
> full removal of them, putting this in a separate patch would indeed
> seem better.

Would you like me to resend with the requested fix to
paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
flush_mask for HAP guests running on AMD) then?

Thanks, Roger.
Jan Beulich April 14, 2020, 1:50 p.m. UTC | #16
On 14.04.2020 13:19, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
>> On 14.04.2020 12:02, Roger Pau Monné wrote:
>>> That seems nice, we would have to be careful however as reducing the
>>> number of ASID/VPID flushes could uncover issues in the existing code.
>>> I guess you mean something like:
>>>
>>> static inline void guest_flush_tlb_mask(const struct domain *d,
>>>                                         const cpumask_t *mask)
>>> {
>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
>>>                                                                 : 0) |
>>>     		     (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>> 		                                      : 0));
>>> }
>>
> >> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
>> Or am I overlooking a need to do ASID flushes also in shadow mode?
> 
> I think so, I've used is_hvm_domain in order to cover for HVM domains
> running in shadow mode on AMD hardware, I think those also need the
> ASID flushes.

I'm unconvinced: The entire section "TLB Management" in the PM gives
the impression that "ordinary" TLB flushing covers all ASIDs anyway.
It's not stated anywhere (I could find) explicitly though.

>> Also I'd suggest to calculate the flags up front, to avoid calling
>> flush_mask() in the first place in case (EPT) neither bit is set.
>>
>>> I think this should work, but I would rather do it in a separate
>>> patch.
>>
>> Yes, just like the originally (wrongly, as you validly say) suggested
>> full removal of them, putting this in a separate patch would indeed
>> seem better.
> 
> Would you like me to resend with the requested fix to
> paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
> flush_mask for HAP guests running on AMD) then?

Well, ideally I'd see that function also make use of the intended
new helper function, if at all possible (and suitable).

Jan
Roger Pau Monné April 14, 2020, 2:53 p.m. UTC | #17
On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
> On 14.04.2020 13:19, Roger Pau Monné wrote:
> > On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
> >> On 14.04.2020 12:02, Roger Pau Monné wrote:
> >>> That seems nice, we would have to be careful however as reducing the
> >>> number of ASID/VPID flushes could uncover issues in the existing code.
> >>> I guess you mean something like:
> >>>
> >>> static inline void guest_flush_tlb_mask(const struct domain *d,
> >>>                                         const cpumask_t *mask)
> >>> {
> >>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
> >>>                                                                 : 0) |
> >>>                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >>>                                                       : 0));
> >>> }
> >>
> >> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
> >> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
> >> Or am I overlooking a need to do ASID flushes also in shadow mode?
> > 
> > I think so, I've used is_hvm_domain in order to cover for HVM domains
> > running in shadow mode on AMD hardware, I think those also need the
> > ASID flushes.
> 
> I'm unconvinced: The entire section "TLB Management" in the PM gives
> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
> It's not stated anywhere (I could find) explicitly though.

Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
code wasn't modified to do ASID flushes instead of plain TLB flushes.
Even if it's just to stay on the safe side I would perform ASID
flushes for HVM guests with shadow running on AMD.

> >> Also I'd suggest to calculate the flags up front, to avoid calling
> >> flush_mask() in the first place in case (EPT) neither bit is set.
> >>
> >>> I think this should work, but I would rather do it in a separate
> >>> patch.
> >>
> >> Yes, just like the originally (wrongly, as you validly say) suggested
> >> full removal of them, putting this in a separate patch would indeed
> >> seem better.
> > 
> > Would you like me to resend with the requested fix to
> > paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
> > flush_mask for HAP guests running on AMD) then?
> 
> Well, ideally I'd see that function also make use of the intended
> new helper function, if at all possible (and suitable).

Oh, OK. Just to make sure I understand what you are asking for, you
would like me to resend introducing the new guest_flush_tlb_mask
helper and use it in the flush_tlb_mask callers modified by this
patch?

Thanks, Roger.
Jan Beulich April 14, 2020, 3:06 p.m. UTC | #18
On 14.04.2020 16:53, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
>> On 14.04.2020 13:19, Roger Pau Monné wrote:
>>> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
>>>> On 14.04.2020 12:02, Roger Pau Monné wrote:
>>>>> That seems nice, we would have to be careful however as reducing the
>>>>> number of ASID/VPID flushes could uncover issues in the existing code.
>>>>> I guess you mean something like:
>>>>>
>>>>> static inline void guest_flush_tlb_mask(const struct domain *d,
>>>>>                                         const cpumask_t *mask)
>>>>> {
>>>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
>>>>>                                                                 : 0) |
>>>>>                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>>>>                                                       : 0));
>>>>> }
>>>>
>>>> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
>>>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
>>>> Or am I overlooking a need to do ASID flushes also in shadow mode?
>>>
>>> I think so, I've used is_hvm_domain in order to cover for HVM domains
>>> running in shadow mode on AMD hardware, I think those also need the
>>> ASID flushes.
>>
>> I'm unconvinced: The entire section "TLB Management" in the PM gives
>> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
>> It's not stated anywhere (I could find) explicitly though.
> 
> Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
> code wasn't modified to do ASID flushes instead of plain TLB flushes.

Well, that's clear from e.g. p2m_pt_set_entry() not doing any
flushing itself.

> Even if it's just to stay on the safe side I would perform ASID
> flushes for HVM guests with shadow running on AMD.

Tim, any chance you could voice your thoughts here? To me it seems
odd to do an all-ASIDs flush followed by an ASID one.

>>>> Also I'd suggest to calculate the flags up front, to avoid calling
>>>> flush_mask() in the first place in case (EPT) neither bit is set.
>>>>
>>>>> I think this should work, but I would rather do it in a separate
>>>>> patch.
>>>>
>>>> Yes, just like the originally (wrongly, as you validly say) suggested
>>>> full removal of them, putting this in a separate patch would indeed
>>>> seem better.
>>>
>>> Would you like me to resend with the requested fix to
>>> paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
>>> flush_mask for HAP guests running on AMD) then?
>>
>> Well, ideally I'd see that function also make use of the intended
>> new helper function, if at all possible (and suitable).
> 
> Oh, OK. Just to make sure I understand what you are asking for, you
> would like me to resend introducing the new guest_flush_tlb_mask
> helper and use it in the flush_tlb_mask callers modified by this
> patch?

Yes (and I now realize it may not make sense to split it off into a
separate change).

Jan
Roger Pau Monné April 15, 2020, 11:49 a.m. UTC | #19
On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
> On 14.04.2020 16:53, Roger Pau Monné wrote:
> > On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
> >> On 14.04.2020 13:19, Roger Pau Monné wrote:
> >>>>> I think this should work, but I would rather do it in a separate
> >>>>> patch.
> >>>>
> >>>> Yes, just like the originally (wrongly, as you validly say) suggested
> >>>> full removal of them, putting this in a separate patch would indeed
> >>>> seem better.
> >>>
> >>> Would you like me to resend with the requested fix to
> >>> paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
> >>> flush_mask for HAP guests running on AMD) then?
> >>
> >> Well, ideally I'd see that function also make use of the intended
> >> new helper function, if at all possible (and suitable).
> > 
> > Oh, OK. Just to make sure I understand what you are asking for, you
> > would like me to resend introducing the new guest_flush_tlb_mask
> > helper and use it in the flush_tlb_mask callers modified by this
> > patch?
> 
> Yes (and I now realize it may not make sense to split it off into a
> separate change).

I could do a pre-patch that introduces guest_flush_tlb_mask as a
simple wrapper around flush_tlb_mask and replace the callers that I
modify in this patch. Then this patch would only introduce
FLUSH_HVM_ASID_CORE and modify guest_flush_tlb_mask to use it when
required.

It might make it more complicated to see which callers require the
ASID flush, but if you think it's better I can arrange the patches in
that way.

Thanks, Roger.
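
The pre-patch staging Roger describes would start from a pure rename,
something like the sketch below (illustrative only):

    /* Pre-patch version: behaves exactly like flush_tlb_mask(); the
     * FLUSH_HVM_ASID_CORE logic would only be folded in by the
     * follow-up patch, keeping each change small and bisectable. */
    static inline void guest_flush_tlb_mask(const struct domain *d,
                                            const cpumask_t *mask)
    {
        flush_tlb_mask(mask);
    }
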
Jan Beulich April 15, 2020, 11:51 a.m. UTC | #20
On 15.04.2020 13:49, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
>> On 14.04.2020 16:53, Roger Pau Monné wrote:
>>> On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
>>>> On 14.04.2020 13:19, Roger Pau Monné wrote:
>>>>>>> I think this should work, but I would rather do it in a separate
>>>>>>> patch.
>>>>>>
>>>>>> Yes, just like the originally (wrongly, as you validly say) suggested
>>>>>> full removal of them, putting this in a separate patch would indeed
>>>>>> seem better.
>>>>>
>>>>> Would you like me to resend with the requested fix to
>>>>> paging_log_dirty_range (ie: drop the FLUSH_TLB and only call
>>>>> flush_mask for HAP guests running on AMD) then?
>>>>
>>>> Well, ideally I'd see that function also make use of the intended
>>>> new helper function, if at all possible (and suitable).
>>>
>>> Oh, OK. Just to make sure I understand what you are asking for, you
>>> would like me to resend introducing the new guest_flush_tlb_mask
>>> helper and use it in the flush_tlb_mask callers modified by this
>>> patch?
>>
>> Yes (and I now realize it may not make sense to split it off into a
>> separate change).
> 
> I could do a pre-patch that introduces guest_flush_tlb_mask as a
> simple wrapper around flush_tlb_mask and replace the callers that I
> modify in this patch. Then this patch would only introduce
> FLUSH_HVM_ASID_CORE and modify guest_flush_tlb_mask to use it when
> required.
> 
> It might make it more complicated to see which callers require the
> ASID flush, but if you think it's better I can arrange the patches in
> that way.

No, I think it's better to leave this as a single patch. The idea with
splitting was that you'd (fully) take care of some of the sites
needing modification ahead of what is now patch 1. I.e. this would
have been a suitable approach only if some use sites could really
have the call dropped altogether.

Jan
Roger Pau Monné April 15, 2020, 2:49 p.m. UTC | #21
On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
> On 14.04.2020 16:53, Roger Pau Monné wrote:
> > On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
> >> On 14.04.2020 13:19, Roger Pau Monné wrote:
> >>> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
> >>>> On 14.04.2020 12:02, Roger Pau Monné wrote:
> >>>>> That seems nice, we would have to be careful however as reducing the
> >>>>> number of ASID/VPID flushes could uncover issues in the existing code.
> >>>>> I guess you mean something like:
> >>>>>
> >>>>> static inline void guest_flush_tlb_mask(const struct domain *d,
> >>>>>                                         const cpumask_t *mask)
> >>>>> {
> >>>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
> >>>>>                                                                 : 0) |
> >>>>>                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >>>>>                                                       : 0));
> >>>>> }
> >>>>
> >>>> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
> >>>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
> >>>> Or am I overlooking a need to do ASID flushes also in shadow mode?
> >>>
> >>> I think so, I've used is_hvm_domain in order to cover for HVM domains
> >>> running in shadow mode on AMD hardware, I think those also need the
> >>> ASID flushes.
> >>
> >> I'm unconvinced: The entire section "TLB Management" in the PM gives
> >> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
> >> It's not stated anywhere (I could find) explicitly though.
> > 
> > Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
> > code wasn't modified to do ASID flushes instead of plain TLB flushes.
> 
> Well, that's clear from e.g. p2m_pt_set_entry() not doing any
> flushing itself.
> 
> > Even if it's just to stay on the safe side I would perform ASID
> > flushes for HVM guests with shadow running on AMD.
> 
> Tim, any chance you could voice your thoughts here? To me it seems
> odd to do an all-ASIDs flush followed by an ASID one.

I've been reading a bit more into this, and section 15.16.1 states:

"TLB flush operations must not be assumed to affect all ASIDs."

Since the VMM runs on its own ASID, it's my understanding that doing a
TLB flush from the VMM does not flush any of the guests' TLBs, and
hence an ASID flush is still required.

Thanks, Roger.
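
The mechanics behind this are worth spelling out: Xen keeps the host in
its own ASID and implements hvm_flush_guest_tlbs() by bumping a per-core
generation counter, so each vCPU is moved onto a fresh ASID at its next
VM entry rather than having guest TLB entries invalidated directly. A
much simplified sketch of that scheme follows (the real code lives in
xen/arch/x86/hvm/asid.c; the names, fields and limit below are
illustrative, not Xen's exact ones):

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_ASID 0xffffu   /* illustrative; hardware-dependent */

    struct asid_core {
        uint64_t generation;   /* bumped on every "flush" */
        uint32_t next_asid;    /* next free ASID on this core */
    };

    /* Core-wide ASID invalidation: no TLB entry is touched, but every
     * previously handed-out guest ASID becomes stale.  Boot-time init
     * would call this once to set things up. */
    static void asid_flush_core(struct asid_core *core)
    {
        core->generation++;
        core->next_asid = 1;   /* ASID 0 stays reserved for the host */
    }

    /* On VM entry, give the vCPU a fresh ASID if its current one is
     * from an old generation.  Returns whether the hardware TLB must
     * also be flushed, which is only needed once the ASID space wraps
     * and old ASIDs start being recycled. */
    static bool asid_handle_vmenter(struct asid_core *core,
                                    uint64_t *vcpu_gen,
                                    uint32_t *vcpu_asid)
    {
        bool need_tlb_flush = false;

        if ( *vcpu_gen == core->generation )
            return false;              /* current ASID is still valid */

        if ( core->next_asid > MAX_ASID )
        {
            asid_flush_core(core);     /* recycle the ASID space ... */
            need_tlb_flush = true;     /* ... flushing stale entries */
        }

        *vcpu_asid = core->next_asid++;
        *vcpu_gen = core->generation;
        return need_tlb_flush;
    }

Since a MOV-to-CR3/CR4 in the VMM only acts on the VMM's own (host)
translations, the generation bump is the only thing that ever
invalidates guest-linear entries tagged with other ASIDs, which is why
the patch has to request FLUSH_HVM_ASID_CORE explicitly wherever guest
mappings change.
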
Jan Beulich April 15, 2020, 3:42 p.m. UTC | #22
On 15.04.2020 16:49, Roger Pau Monné wrote:
> On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
>> On 14.04.2020 16:53, Roger Pau Monné wrote:
>>> On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
>>>> On 14.04.2020 13:19, Roger Pau Monné wrote:
>>>>> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
>>>>>> On 14.04.2020 12:02, Roger Pau Monné wrote:
>>>>>>> That seems nice, we would have to be careful however as reducing the
>>>>>>> number of ASID/VPID flushes could uncover issues in the existing code.
>>>>>>> I guess you mean something like:
>>>>>>>
>>>>>>> static inline void guest_flush_tlb_mask(const struct domain *d,
>>>>>>>                                         const cpumask_t *mask)
>>>>>>> {
>>>>>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
>>>>>>>                                                                 : 0) |
>>>>>>>     		     (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>>>>>> 		                                      : 0));
>>>>>>> }
>>>>>>
> >>>>>> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
>>>>>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
>>>>>> Or am I overlooking a need to do ASID flushes also in shadow mode?
>>>>>
>>>>> I think so, I've used is_hvm_domain in order to cover for HVM domains
>>>>> running in shadow mode on AMD hardware, I think those also need the
>>>>> ASID flushes.
>>>>
>>>> I'm unconvinced: The entire section "TLB Management" in the PM gives
>>>> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
>>>> It's not stated anywhere (I could find) explicitly though.
>>>
>>> Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
>>> code wasn't modified to do ASID flushes instead of plain TLB flushes.
>>
>> Well, that's clear from e.g. p2m_pt_set_entry() not doing any
>> flushing itself.
>>
>>> Even if it's just to stay on the safe side I would perform ASID
>>> flushes for HVM guests with shadow running on AMD.
>>
>> Tim, any chance you could voice your thoughts here? To me it seems
>> odd to do an all-ASIDs flush followed by an ASID one.
> 
> I've been reading a bit more into this, and section 15.16.1 states:
> 
> "TLB flush operations must not be assumed to affect all ASIDs."

That's the section talking about the tlb_control VMCB field. It is
in this context that the sentence needs to be interpreted, imo.

Jan
Roger Pau Monné April 15, 2020, 3:54 p.m. UTC | #23
On Wed, Apr 15, 2020 at 05:42:20PM +0200, Jan Beulich wrote:
> On 15.04.2020 16:49, Roger Pau Monné wrote:
> > On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
> >> On 14.04.2020 16:53, Roger Pau Monné wrote:
> >>> On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
> >>>> On 14.04.2020 13:19, Roger Pau Monné wrote:
> >>>>> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
> >>>>>> On 14.04.2020 12:02, Roger Pau Monné wrote:
> >>>>>>> That seems nice, we would have to be careful however as reducing the
> >>>>>>> number of ASID/VPID flushes could uncover issues in the existing code.
> >>>>>>> I guess you mean something like:
> >>>>>>>
> >>>>>>> static inline void guest_flush_tlb_mask(const struct domain *d,
> >>>>>>>                                         const cpumask_t *mask)
> >>>>>>> {
> >>>>>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
> >>>>>>>                                                                 : 0) |
> >>>>>>>                      (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >>>>>>>                                                       : 0));
> >>>>>>> }
> >>>>>>
> >>>>>> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
> >>>>>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
> >>>>>> Or am I overlooking a need to do ASID flushes also in shadow mode?
> >>>>>
> >>>>> I think so, I've used is_hvm_domain in order to cover for HVM domains
> >>>>> running in shadow mode on AMD hardware, I think those also need the
> >>>>> ASID flushes.
> >>>>
> >>>> I'm unconvinced: The entire section "TLB Management" in the PM gives
> >>>> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
> >>>> It's not stated anywhere (I could find) explicitly though.
> >>>
> >>> Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
> >>> code wasn't modified to do ASID flushes instead of plain TLB flushes.
> >>
> >> Well, that's clear from e.g. p2m_pt_set_entry() not doing any
> >> flushing itself.
> >>
> >>> Even if it's just to stay on the safe side I would perform ASID
> >>> flushes for HVM guests with shadow running on AMD.
> >>
> >> Tim, any chance you could voice your thoughts here? To me it seems
> >> odd to do an all-ASIDs flush followed by an ASID one.
> > 
> > I've been reading a bit more into this, and section 15.16.1 states:
> > 
> > "TLB flush operations must not be assumed to affect all ASIDs."
> 
> That's the section talking about the tlb_control VMCB field. It is
> in this context that the sentence needs to be interpreted, imo.

It explicitly mentions move-to-cr3 and move-to-cr4 before that phrase:

"TLB flush operations function identically whether or not SVM is
enabled (e.g., MOV-TO-CR3 flushes non-global mappings, whereas
MOV-TO-CR4 flushes global and non-global mappings). TLB flush
operations must not be assumed to affect all ASIDs."

So my reading is that normal flush operations not involving the
tlb_control VMCB field should not be assumed to flush all ASIDs. Again
this is of course my interpretation of the text in the PM. I will go
with my suggested approach, which is safer and should cause no
functional issues AFAICT. The only downside is a possibly redundant
flush, but that's safe.

Thanks, Roger.
Jan Beulich April 15, 2020, 3:59 p.m. UTC | #24
On 15.04.2020 17:54, Roger Pau Monné wrote:
> On Wed, Apr 15, 2020 at 05:42:20PM +0200, Jan Beulich wrote:
>> On 15.04.2020 16:49, Roger Pau Monné wrote:
>>> On Tue, Apr 14, 2020 at 05:06:23PM +0200, Jan Beulich wrote:
>>>> On 14.04.2020 16:53, Roger Pau Monné wrote:
>>>>> On Tue, Apr 14, 2020 at 03:50:15PM +0200, Jan Beulich wrote:
>>>>>> On 14.04.2020 13:19, Roger Pau Monné wrote:
>>>>>>> On Tue, Apr 14, 2020 at 12:13:04PM +0200, Jan Beulich wrote:
>>>>>>>> On 14.04.2020 12:02, Roger Pau Monné wrote:
>>>>>>>>> That seems nice, we would have to be careful however as reducing the
>>>>>>>>> number of ASID/VPID flushes could uncover issues in the existing code.
>>>>>>>>> I guess you mean something like:
>>>>>>>>>
>>>>>>>>> static inline void guest_flush_tlb_mask(const struct domain *d,
>>>>>>>>>                                         const cpumask_t *mask)
>>>>>>>>> {
>>>>>>>>>     flush_mask(mask, (is_pv_domain(d) || shadow_mode_enabled(d) ? FLUSH_TLB
>>>>>>>>>                                                                 : 0) |
>>>>>>>>>     		     (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>>>>>>>> 		                                      : 0));
>>>>>>>>> }
>>>>>>>>
> >>>>>>>> Almost - is_hvm_domain(d) && cpu_has_svm seems too wide for me. I'd
>>>>>>>> rather use hap_enabled() && cpu_has_svm, which effectively means NPT.
>>>>>>>> Or am I overlooking a need to do ASID flushes also in shadow mode?
>>>>>>>
>>>>>>> I think so, I've used is_hvm_domain in order to cover for HVM domains
>>>>>>> running in shadow mode on AMD hardware, I think those also need the
>>>>>>> ASID flushes.
>>>>>>
>>>>>> I'm unconvinced: The entire section "TLB Management" in the PM gives
>>>>>> the impression that "ordinary" TLB flushing covers all ASIDs anyway.
>>>>>> It's not stated anywhere (I could find) explicitly though.
>>>>>
>>>>> Hm, I don't think so. XenRT found a myriad of issues with NPT when p2m
>>>>> code wasn't modified to do ASID flushes instead of plain TLB flushes.
>>>>
>>>> Well, that's clear from e.g. p2m_pt_set_entry() not doing any
>>>> flushing itself.
>>>>
>>>>> Even if it's just to stay on the safe side I would perform ASID
>>>>> flushes for HVM guests with shadow running on AMD.
>>>>
>>>> Tim, any chance you could voice your thoughts here? To me it seems
>>>> odd to do an all-ASIDs flush followed by an ASID one.
>>>
>>> I've been reading a bit more into this, and section 15.16.1 states:
>>>
>>> "TLB flush operations must not be assumed to affect all ASIDs."
>>
>> That's the section talking about the tlb_control VMCB field. It is
>> in this context that the sentence needs to be interpreted, imo.
> 
> It explicitly mentions move-to-cr3 and move-to-cr4 before that phrase:
> 
> "TLB flush operations function identically whether or not SVM is
> enabled (e.g., MOV-TO-CR3 flushes non-global mappings, whereas
> MOV-TO-CR4 flushes global and non-global mappings). TLB flush
> operations must not be assumed to affect all ASIDs."

Hmm, indeed. How did I not spot this already when reading this the
other day?

> So my reading is that normal flush operations not involving
> tlb_control VMCB field should not assume to flush all ASIDs. Again
> this is of course my interpretation of the text in the PM. I will go
> with my suggested approach, which is safer and should cause no
> functional issues AFAICT. The only downside is the maybe redundant
> flush, but that's safe.

Okay. And I'm sorry for having attempted to mislead you.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..c81e53c0ae 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -59,8 +59,6 @@  static u32 pre_flush(void)
         raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-    hvm_flush_guest_tlbs();
-
     return t2;
 }
 
@@ -118,6 +116,7 @@  void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     local_irq_save(flags);
 
     t = pre_flush();
+    hvm_flush_guest_tlbs();
 
     old_cr4 = read_cr4();
     ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
@@ -221,6 +220,9 @@  unsigned int flush_area_local(const void *va, unsigned int flags)
             do_tlb_flush();
     }
 
+    if ( flags & FLUSH_HVM_ASID_CORE )
+        hvm_flush_guest_tlbs();
+
     if ( flags & FLUSH_CACHE )
     {
         const struct cpuinfo_x86 *c = &current_cpu_data;
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a6d5e39b02..12856245be 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -118,7 +118,7 @@  int hap_track_dirty_vram(struct domain *d,
             p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
                                   p2m_ram_rw, p2m_ram_logdirty);
 
-            flush_tlb_mask(d->dirty_cpumask);
+            hap_flush_tlb_mask(d->dirty_cpumask);
 
             memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
         }
@@ -205,7 +205,7 @@  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
          * to be read-only, or via hardware-assisted log-dirty.
          */
         p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-        flush_tlb_mask(d->dirty_cpumask);
+        hap_flush_tlb_mask(d->dirty_cpumask);
     }
     return 0;
 }
@@ -234,7 +234,7 @@  static void hap_clean_dirty_bitmap(struct domain *d)
      * be read-only, or via hardware-assisted log-dirty.
      */
     p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-    flush_tlb_mask(d->dirty_cpumask);
+    hap_flush_tlb_mask(d->dirty_cpumask);
 }
 
 /************************************************/
@@ -798,7 +798,7 @@  hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
 
     safe_write_pte(p, new);
     if ( old_flags & _PAGE_PRESENT )
-        flush_tlb_mask(d->dirty_cpumask);
+        hap_flush_tlb_mask(d->dirty_cpumask);
 
     paging_unlock(d);
 
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index abe5958a52..02a5ae75c0 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -84,7 +84,7 @@  nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
     safe_write_pte(p, new);
 
     if (old_flags & _PAGE_PRESENT)
-        flush_tlb_mask(p2m->dirty_cpumask);
+        hap_flush_tlb_mask(p2m->dirty_cpumask);
 
     paging_unlock(d);
 
diff --git a/xen/arch/x86/mm/hap/private.h b/xen/arch/x86/mm/hap/private.h
index 973fbe8be5..7ee8480d83 100644
--- a/xen/arch/x86/mm/hap/private.h
+++ b/xen/arch/x86/mm/hap/private.h
@@ -47,4 +47,9 @@  unsigned long hap_p2m_ga_to_gfn_4_levels(struct vcpu *v,
     struct p2m_domain *p2m, unsigned long cr3,
     paddr_t ga, uint32_t *pfec, unsigned int *page_order);
 
+static inline void hap_flush_tlb_mask(const cpumask_t *mask)
+{
+    flush_mask(mask, FLUSH_HVM_ASID_CORE);
+}
+
 #endif /* __HAP_PRIVATE_H__ */
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index eb66077496..c90032dc88 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -896,7 +896,7 @@  static void p2m_pt_change_entry_type_global(struct p2m_domain *p2m,
     unmap_domain_page(tab);
 
     if ( changed )
-         flush_tlb_mask(p2m->domain->dirty_cpumask);
+         flush_mask(p2m->domain->dirty_cpumask, FLUSH_HVM_ASID_CORE);
 }
 
 static int p2m_pt_change_entry_type_range(struct p2m_domain *p2m,
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 469bb76429..d0bccaf7eb 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -613,7 +613,8 @@  void paging_log_dirty_range(struct domain *d,
 
     p2m_unlock(p2m);
 
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, (!hap_enabled(d) ? FLUSH_TLB : 0) |
+                                 FLUSH_HVM_ASID_CORE);
 }
 
 /*
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 75dd414a6e..467e0d3fe1 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -363,7 +363,7 @@  static int oos_remove_write_access(struct vcpu *v, mfn_t gmfn,
     }
 
     if ( ftlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
 
     return 0;
 }
@@ -939,7 +939,7 @@  static void _shadow_prealloc(struct domain *d, unsigned int pages)
                 /* See if that freed up enough space */
                 if ( d->arch.paging.shadow.free_pages >= pages )
                 {
-                    flush_tlb_mask(d->dirty_cpumask);
+                    sh_flush_tlb_mask(d, d->dirty_cpumask);
                     return;
                 }
             }
@@ -993,7 +993,7 @@  static void shadow_blow_tables(struct domain *d)
                                pagetable_get_mfn(v->arch.shadow_table[i]), 0);
 
     /* Make sure everyone sees the unshadowings */
-    flush_tlb_mask(d->dirty_cpumask);
+    sh_flush_tlb_mask(d, d->dirty_cpumask);
 }
 
 void shadow_blow_tables_per_domain(struct domain *d)
@@ -1102,7 +1102,7 @@  mfn_t shadow_alloc(struct domain *d,
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
-            flush_tlb_mask(&mask);
+            sh_flush_tlb_mask(d, &mask);
         }
         /* Now safe to clear the page for reuse */
         clear_domain_page(page_to_mfn(sp));
@@ -2293,7 +2293,7 @@  void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all)
 
     /* Need to flush TLBs now, so that linear maps are safe next time we
      * take a fault. */
-    flush_tlb_mask(d->dirty_cpumask);
+    sh_flush_tlb_mask(d, d->dirty_cpumask);
 
     paging_unlock(d);
 }
@@ -3008,7 +3008,7 @@  static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
         {
             sh_remove_all_shadows_and_parents(d, mfn);
             if ( sh_remove_all_mappings(d, mfn, _gfn(gfn)) )
-                flush_tlb_mask(d->dirty_cpumask);
+                sh_flush_tlb_mask(d, d->dirty_cpumask);
         }
     }
 
@@ -3048,7 +3048,7 @@  static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
                 }
                 omfn = mfn_add(omfn, 1);
             }
-            flush_tlb_mask(&flushmask);
+            sh_flush_tlb_mask(d, &flushmask);
 
             if ( npte )
                 unmap_domain_page(npte);
@@ -3335,7 +3335,7 @@  int shadow_track_dirty_vram(struct domain *d,
         }
     }
     if ( flush_tlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
     goto out;
 
 out_sl1ma:
@@ -3405,7 +3405,7 @@  bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     }
 
     /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
+    sh_flush_tlb_mask(d, mask);
 
     /* Done. */
     for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index 1e6024c71f..149f346a48 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -591,7 +591,7 @@  static void validate_guest_pt_write(struct vcpu *v, mfn_t gmfn,
 
     if ( rc & SHADOW_SET_FLUSH )
         /* Need to flush TLBs to pick up shadow PT changes */
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
 
     if ( rc & SHADOW_SET_ERROR )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f6b1628742..17af28cdbd 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3067,7 +3067,7 @@  static int sh_page_fault(struct vcpu *v,
         perfc_incr(shadow_rm_write_flush_tlb);
         smp_wmb();
         atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
     }
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3576,7 +3576,8 @@  static bool sh_invlpg(struct vcpu *v, unsigned long linear)
     if ( mfn_to_page(sl1mfn)->u.sh.type
          == SH_type_fl1_shadow )
     {
-        flush_tlb_local();
+        flush_local(FLUSH_TLB |
+                    (is_hvm_domain(v->domain) ? FLUSH_HVM_ASID_CORE : 0));
         return false;
     }
 
@@ -3811,7 +3812,7 @@  sh_update_linear_entries(struct vcpu *v)
          * table entry. But, without this change, it would fetch the wrong
          * value due to a stale TLB.
          */
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
     }
 }
 
@@ -4012,7 +4013,7 @@  sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
      * (old) shadow linear maps in the writeable mapping heuristics. */
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
@@ -4036,7 +4037,7 @@  sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            sh_flush_tlb_mask(d, d->dirty_cpumask);
         /* Now install the new shadows. */
         for ( i = 0; i < 4; i++ )
         {
@@ -4057,7 +4058,7 @@  sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
@@ -4503,7 +4504,7 @@  static void sh_pagetable_dying(paddr_t gpa)
         }
     }
     if ( flush )
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
 
     /* Remember that we've seen the guest use this interface, so we
      * can rely on it using it in future, instead of guessing at
@@ -4540,7 +4541,7 @@  static void sh_pagetable_dying(paddr_t gpa)
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
-        flush_tlb_mask(d->dirty_cpumask);
+        sh_flush_tlb_mask(d, d->dirty_cpumask);
     }
 
     /* Remember that we've seen the guest use this interface, so we
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index e8b028a365..2404ca4ff8 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -818,6 +818,12 @@  static inline int sh_check_page_has_no_refs(struct page_info *page)
 bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                       void *ctxt);
 
+static inline void sh_flush_tlb_mask(const struct domain *d,
+                                     const cpumask_t *mask)
+{
+    flush_mask(mask, FLUSH_TLB | (is_hvm_domain(d) ? FLUSH_HVM_ASID_CORE : 0));
+}
+
 #endif /* _XEN_SHADOW_PRIVATE_H */
 
 /*
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cfe4e6e97..579dc56803 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -105,6 +105,12 @@  void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #define FLUSH_VCPU_STATE 0x1000
  /* Flush the per-cpu root page table */
 #define FLUSH_ROOT_PGTBL 0x2000
+#if CONFIG_HVM
+ /* Flush all HVM guests' linear TLBs (using ASID/VPID) */
+#define FLUSH_HVM_ASID_CORE 0x4000
+#else
+#define FLUSH_HVM_ASID_CORE 0
+#endif
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
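
A final note on the flushtlb.h hunk: because FLUSH_HVM_ASID_CORE is
defined to 0 on !CONFIG_HVM builds, callers can OR it into the flags
unconditionally. For instance, a hypothetical call site wanting both a
host TLB flush and a guest ASID/VPID tickle would simply be:

    /* Example only: flush host TLBs and tickle guest ASIDs/VPIDs on
     * every CPU that may hold state for domain d.  On !CONFIG_HVM
     * builds FLUSH_HVM_ASID_CORE is 0, so this degrades to a plain
     * TLB flush with no #ifdef needed at the call site. */
    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_HVM_ASID_CORE);
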