diff mbox series

[v2] x86: do not enable global pages when virtualized on AMD hardware

Message ID 20191204104420.34418-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series [v2] x86: do not enable global pages when virtualized on AMD hardware | expand

Commit Message

Roger Pau Monné Dec. 4, 2019, 10:44 a.m. UTC
When using global pages a full tlb flush can only be performed by
toggling the PGE bit in CR4, which is usually quite expensive in terms
of performance when running virtualized. This is specially relevant on
AMD hardware, which doesn't have the ability to do selective CR4
trapping, but can also be relevant on Intel if the underlying
hypervisor also traps accesses to the PGE CR4 bit.

In order to avoid this performance penalty, do not use global pages
when running virtualized on AMD hardware. A command line option
'global-pages' is provided in order to allow the user to select
whether global pages will be enabled for PV guests.

The above figures are from a PV shim running on AMD hardware with
32 vCPUs:

PGE enabled, x2APIC mode:

(XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked
(XEN)   lock:1841883(1375128998543), block:1658716(10193054890781)

Average lock time:   746588ns
Average block time: 6145147ns

PGE disabled, x2APIC mode:

(XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked
(XEN)   lock:2730175(657505389886), block:2039716(2963768247738)

Average lock time:   240829ns
Average block time: 1453029ns

As seen from the above figures the lock and block time of the flush
lock is reduced to approximately 1/3 of the original value.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Provide command line option to enable/disable PGE.
 - Only disable PGE on AMD hardware when virtualized.
 - Document the global-pages option.
---
 docs/misc/xen-command-line.pandoc | 13 +++++++++++++
 xen/arch/x86/pv/domain.c          |  9 ++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

Comments

Jan Beulich Dec. 4, 2019, 11:05 a.m. UTC | #1
On 04.12.2019 11:44, Roger Pau Monne wrote:
> When using global pages a full tlb flush can only be performed by
> toggling the PGE bit in CR4, which is usually quite expensive in terms
> of performance when running virtualized. This is specially relevant on
> AMD hardware, which doesn't have the ability to do selective CR4
> trapping, but can also be relevant on Intel if the underlying
> hypervisor also traps accesses to the PGE CR4 bit.
> 
> In order to avoid this performance penalty, do not use global pages
> when running virtualized on AMD hardware. A command line option
> 'global-pages' is provided in order to allow the user to select
> whether global pages will be enabled for PV guests.
> 
> The above figures are from a PV shim running on AMD hardware with
> 32 vCPUs:
> 
> PGE enabled, x2APIC mode:
> 
> (XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked
> (XEN)   lock:1841883(1375128998543), block:1658716(10193054890781)
> 
> Average lock time:   746588ns
> Average block time: 6145147ns
> 
> PGE disabled, x2APIC mode:
> 
> (XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked
> (XEN)   lock:2730175(657505389886), block:2039716(2963768247738)
> 
> Average lock time:   240829ns
> Average block time: 1453029ns
> 
> As seen from the above figures the lock and block time of the flush
> lock is reduced to approximately 1/3 of the original value.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Provide command line option to enable/disable PGE.
>  - Only disable PGE on AMD hardware when virtualized.
>  - Document the global-pages option.
> ---
>  docs/misc/xen-command-line.pandoc | 13 +++++++++++++
>  xen/arch/x86/pv/domain.c          |  9 ++++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index d9495ef6b9..7be30f2766 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1087,6 +1087,19 @@ value settable via Xen tools.
>  
>  Dom0 is using this value for sizing its maptrack table.
>  
> +### global-pages (x86)
> +> `= <boolean>`
> +
> +> Default: `true` unless running virtualized on AMD hardware
> +
> +Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the
> +usage of global pages, and thus the need to perform tlb flushes by writing to
> +CR4.
> +
> +Note it's disabled by default when running virtualized on AMD hardware since
> +AMD SVM doesn't support selective trapping of CR4, so global pages are not
> +enabled in order to reduce the overhead of tlb flushes.
> +
>  ### guest_loglvl
>  > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
>  
> diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
> index 4b6f48dea2..93fb823d63 100644
> --- a/xen/arch/x86/pv/domain.c
> +++ b/xen/arch/x86/pv/domain.c
> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
>              (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
>  }
>  
> +static int opt_global_pages = -1;

int8_t __read_mostly

> +boolean_runtime_param("global-pages", opt_global_pages);
> +
>  unsigned long pv_make_cr4(const struct vcpu *v)
>  {
>      const struct domain *d = v->domain;
>      unsigned long cr4 = mmu_cr4_features &
>          ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
> +    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
> +                                         boot_cpu_data.x86_vendor !=
> +                                         X86_VENDOR_AMD)
> +                                      : !!opt_global_pages;

Let's avoid re-doing this evaluation each time we come here.
Post boot the value can only change to 0 or 1. Hence in some
__init function you can apply the default calculation done
here.

Jan
Roger Pau Monné Dec. 4, 2019, 11:52 a.m. UTC | #2
On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote:
> On 04.12.2019 11:44, Roger Pau Monne wrote:
> > When using global pages a full tlb flush can only be performed by
> > toggling the PGE bit in CR4, which is usually quite expensive in terms
> > of performance when running virtualized. This is specially relevant on
> > AMD hardware, which doesn't have the ability to do selective CR4
> > trapping, but can also be relevant on Intel if the underlying
> > hypervisor also traps accesses to the PGE CR4 bit.
> > 
> > In order to avoid this performance penalty, do not use global pages
> > when running virtualized on AMD hardware. A command line option
> > 'global-pages' is provided in order to allow the user to select
> > whether global pages will be enabled for PV guests.
> > 
> > The above figures are from a PV shim running on AMD hardware with
> > 32 vCPUs:
> > 
> > PGE enabled, x2APIC mode:
> > 
> > (XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked
> > (XEN)   lock:1841883(1375128998543), block:1658716(10193054890781)
> > 
> > Average lock time:   746588ns
> > Average block time: 6145147ns
> > 
> > PGE disabled, x2APIC mode:
> > 
> > (XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked
> > (XEN)   lock:2730175(657505389886), block:2039716(2963768247738)
> > 
> > Average lock time:   240829ns
> > Average block time: 1453029ns
> > 
> > As seen from the above figures the lock and block time of the flush
> > lock is reduced to approximately 1/3 of the original value.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Provide command line option to enable/disable PGE.
> >  - Only disable PGE on AMD hardware when virtualized.
> >  - Document the global-pages option.
> > ---
> >  docs/misc/xen-command-line.pandoc | 13 +++++++++++++
> >  xen/arch/x86/pv/domain.c          |  9 ++++++++-
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index d9495ef6b9..7be30f2766 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1087,6 +1087,19 @@ value settable via Xen tools.
> >  
> >  Dom0 is using this value for sizing its maptrack table.
> >  
> > +### global-pages (x86)
> > +> `= <boolean>`
> > +
> > +> Default: `true` unless running virtualized on AMD hardware
> > +
> > +Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the
> > +usage of global pages, and thus the need to perform tlb flushes by writing to
> > +CR4.
> > +
> > +Note it's disabled by default when running virtualized on AMD hardware since
> > +AMD SVM doesn't support selective trapping of CR4, so global pages are not
> > +enabled in order to reduce the overhead of tlb flushes.
> > +
> >  ### guest_loglvl
> >  > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
> >  
> > diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
> > index 4b6f48dea2..93fb823d63 100644
> > --- a/xen/arch/x86/pv/domain.c
> > +++ b/xen/arch/x86/pv/domain.c
> > @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
> >              (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
> >  }
> >  
> > +static int opt_global_pages = -1;
> 
> int8_t __read_mostly
> 
> > +boolean_runtime_param("global-pages", opt_global_pages);
> > +
> >  unsigned long pv_make_cr4(const struct vcpu *v)
> >  {
> >      const struct domain *d = v->domain;
> >      unsigned long cr4 = mmu_cr4_features &
> >          ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
> > +    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
> > +                                         boot_cpu_data.x86_vendor !=
> > +                                         X86_VENDOR_AMD)
> > +                                      : !!opt_global_pages;
> 
> Let's avoid re-doing this evaluation each time we come here.
> Post boot the value can only change to 0 or 1. Hence in some
> __init function you can apply the default calculation done
> here.

I've assumed that boolean_runtime_param can be changed during runtime
by the user, and hence the value calculated at boot time would become
stale if the user changes it after boot, which should be fine for this
option.

Thanks, Roger.
Jan Beulich Dec. 4, 2019, 1:15 p.m. UTC | #3
On 04.12.2019 12:52, Roger Pau Monné wrote:
> On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote:
>> On 04.12.2019 11:44, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/pv/domain.c
>>> +++ b/xen/arch/x86/pv/domain.c
>>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
>>>              (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
>>>  }
>>>  
>>> +static int opt_global_pages = -1;
>>
>> int8_t __read_mostly
>>
>>> +boolean_runtime_param("global-pages", opt_global_pages);
>>> +
>>>  unsigned long pv_make_cr4(const struct vcpu *v)
>>>  {
>>>      const struct domain *d = v->domain;
>>>      unsigned long cr4 = mmu_cr4_features &
>>>          ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
>>> +    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
>>> +                                         boot_cpu_data.x86_vendor !=
>>> +                                         X86_VENDOR_AMD)
>>> +                                      : !!opt_global_pages;
>>
>> Let's avoid re-doing this evaluation each time we come here.
>> Post boot the value can only change to 0 or 1. Hence in some
>> __init function you can apply the default calculation done
>> here.
> 
> I've assumed that boolean_runtime_param can be changed during runtime
> by the user, and hence the value calculated at boot time would become
> stale if the user changes it after boot, which should be fine for this
> option.

I'm afraid I can't decide whether you agree or disagree with my
comment.

Jan
Roger Pau Monné Dec. 4, 2019, 1:17 p.m. UTC | #4
On Wed, Dec 04, 2019 at 02:15:58PM +0100, Jan Beulich wrote:
> On 04.12.2019 12:52, Roger Pau Monné wrote:
> > On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote:
> >> On 04.12.2019 11:44, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/pv/domain.c
> >>> +++ b/xen/arch/x86/pv/domain.c
> >>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
> >>>              (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
> >>>  }
> >>>  
> >>> +static int opt_global_pages = -1;
> >>
> >> int8_t __read_mostly
> >>
> >>> +boolean_runtime_param("global-pages", opt_global_pages);
> >>> +
> >>>  unsigned long pv_make_cr4(const struct vcpu *v)
> >>>  {
> >>>      const struct domain *d = v->domain;
> >>>      unsigned long cr4 = mmu_cr4_features &
> >>>          ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
> >>> +    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
> >>> +                                         boot_cpu_data.x86_vendor !=
> >>> +                                         X86_VENDOR_AMD)
> >>> +                                      : !!opt_global_pages;
> >>
> >> Let's avoid re-doing this evaluation each time we come here.
> >> Post boot the value can only change to 0 or 1. Hence in some
> >> __init function you can apply the default calculation done
> >> here.
> > 
> > I've assumed that boolean_runtime_param can be changed during runtime
> > by the user, and hence the value calculated at boot time would become
> > stale if the user changes it after boot, which should be fine for this
> > option.
> 
> I'm afraid I can't decide whether you agree or disagree with my
> comment.

Oh sorry fro not being clear. I was meant to disagree, calculating a
value at init time would be wrong, since opt_global_pages can change
during runtime.

Thanks, Roger.
Jan Beulich Dec. 4, 2019, 2:22 p.m. UTC | #5
On 04.12.2019 14:17, Roger Pau Monné  wrote:
> On Wed, Dec 04, 2019 at 02:15:58PM +0100, Jan Beulich wrote:
>> On 04.12.2019 12:52, Roger Pau Monné wrote:
>>> On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote:
>>>> On 04.12.2019 11:44, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/pv/domain.c
>>>>> +++ b/xen/arch/x86/pv/domain.c
>>>>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
>>>>>              (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
>>>>>  }
>>>>>  
>>>>> +static int opt_global_pages = -1;
>>>>
>>>> int8_t __read_mostly
>>>>
>>>>> +boolean_runtime_param("global-pages", opt_global_pages);
>>>>> +
>>>>>  unsigned long pv_make_cr4(const struct vcpu *v)
>>>>>  {
>>>>>      const struct domain *d = v->domain;
>>>>>      unsigned long cr4 = mmu_cr4_features &
>>>>>          ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
>>>>> +    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
>>>>> +                                         boot_cpu_data.x86_vendor !=
>>>>> +                                         X86_VENDOR_AMD)
>>>>> +                                      : !!opt_global_pages;
>>>>
>>>> Let's avoid re-doing this evaluation each time we come here.
>>>> Post boot the value can only change to 0 or 1. Hence in some
>>>> __init function you can apply the default calculation done
>>>> here.
>>>
>>> I've assumed that boolean_runtime_param can be changed during runtime
>>> by the user, and hence the value calculated at boot time would become
>>> stale if the user changes it after boot, which should be fine for this
>>> option.
>>
>> I'm afraid I can't decide whether you agree or disagree with my
>> comment.
> 
> Oh sorry fro not being clear. I was meant to disagree, calculating a
> value at init time would be wrong, since opt_global_pages can change
> during runtime.

But that's what I've explained in my earlier reply: At run time,
the value can only change to 0 or 1, but not to -1. Hence once a
runtime update occurred, the default calculation won't be used
anymore (as much as it wouldn't be used if there was a respective
command line option).

Jan
diff mbox series

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d9495ef6b9..7be30f2766 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1087,6 +1087,19 @@  value settable via Xen tools.
 
 Dom0 is using this value for sizing its maptrack table.
 
+### global-pages (x86)
+> `= <boolean>`
+
+> Default: `true` unless running virtualized on AMD hardware
+
+Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the
+usage of global pages, and thus the need to perform tlb flushes by writing to
+CR4.
+
+Note it's disabled by default when running virtualized on AMD hardware since
+AMD SVM doesn't support selective trapping of CR4, so global pages are not
+enabled in order to reduce the overhead of tlb flushes.
+
 ### guest_loglvl
 > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
 
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 4b6f48dea2..93fb823d63 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -118,11 +118,18 @@  unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4)
             (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
 }
 
+static int opt_global_pages = -1;
+boolean_runtime_param("global-pages", opt_global_pages);
+
 unsigned long pv_make_cr4(const struct vcpu *v)
 {
     const struct domain *d = v->domain;
     unsigned long cr4 = mmu_cr4_features &
         ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
+    bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor ||
+                                         boot_cpu_data.x86_vendor !=
+                                         X86_VENDOR_AMD)
+                                      : !!opt_global_pages;
 
     /*
      * PCIDE or PGE depends on the PCID/XPTI settings, but must not both be
@@ -130,7 +137,7 @@  unsigned long pv_make_cr4(const struct vcpu *v)
      */
     if ( d->arch.pv.pcid )
         cr4 |= X86_CR4_PCIDE;
-    else if ( !d->arch.pv.xpti )
+    else if ( !d->arch.pv.xpti && pge )
         cr4 |= X86_CR4_PGE;
 
     /*