Message ID | 20191204104420.34418-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] x86: do not enable global pages when virtualized on AMD hardware | expand |
On 04.12.2019 11:44, Roger Pau Monne wrote: > When using global pages a full tlb flush can only be performed by > toggling the PGE bit in CR4, which is usually quite expensive in terms > of performance when running virtualized. This is specially relevant on > AMD hardware, which doesn't have the ability to do selective CR4 > trapping, but can also be relevant on Intel if the underlying > hypervisor also traps accesses to the PGE CR4 bit. > > In order to avoid this performance penalty, do not use global pages > when running virtualized on AMD hardware. A command line option > 'global-pages' is provided in order to allow the user to select > whether global pages will be enabled for PV guests. > > The above figures are from a PV shim running on AMD hardware with > 32 vCPUs: > > PGE enabled, x2APIC mode: > > (XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked > (XEN) lock:1841883(1375128998543), block:1658716(10193054890781) > > Average lock time: 746588ns > Average block time: 6145147ns > > PGE disabled, x2APIC mode: > > (XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked > (XEN) lock:2730175(657505389886), block:2039716(2963768247738) > > Average lock time: 240829ns > Average block time: 1453029ns > > As seen from the above figures the lock and block time of the flush > lock is reduced to approximately 1/3 of the original value. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > Changes since v1: > - Provide command line option to enable/disable PGE. > - Only disable PGE on AMD hardware when virtualized. > - Document the global-pages option. > --- > docs/misc/xen-command-line.pandoc | 13 +++++++++++++ > xen/arch/x86/pv/domain.c | 9 ++++++++- > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc > index d9495ef6b9..7be30f2766 100644 > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -1087,6 +1087,19 @@ value settable via Xen tools. > > Dom0 is using this value for sizing its maptrack table. > > +### global-pages (x86) > +> `= <boolean>` > + > +> Default: `true` unless running virtualized on AMD hardware > + > +Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the > +usage of global pages, and thus the need to perform tlb flushes by writing to > +CR4. > + > +Note it's disabled by default when running virtualized on AMD hardware since > +AMD SVM doesn't support selective trapping of CR4, so global pages are not > +enabled in order to reduce the overhead of tlb flushes. > + > ### guest_loglvl > > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all` > > diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c > index 4b6f48dea2..93fb823d63 100644 > --- a/xen/arch/x86/pv/domain.c > +++ b/xen/arch/x86/pv/domain.c > @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) > (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); > } > > +static int opt_global_pages = -1; int8_t __read_mostly > +boolean_runtime_param("global-pages", opt_global_pages); > + > unsigned long pv_make_cr4(const struct vcpu *v) > { > const struct domain *d = v->domain; > unsigned long cr4 = mmu_cr4_features & > ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); > + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || > + boot_cpu_data.x86_vendor != > + X86_VENDOR_AMD) > + : !!opt_global_pages; Let's avoid re-doing this evaluation each time we come here. Post boot the value can only change to 0 or 1. Hence in some __init function you can apply the default calculation done here. Jan
On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote: > On 04.12.2019 11:44, Roger Pau Monne wrote: > > When using global pages a full tlb flush can only be performed by > > toggling the PGE bit in CR4, which is usually quite expensive in terms > > of performance when running virtualized. This is specially relevant on > > AMD hardware, which doesn't have the ability to do selective CR4 > > trapping, but can also be relevant on Intel if the underlying > > hypervisor also traps accesses to the PGE CR4 bit. > > > > In order to avoid this performance penalty, do not use global pages > > when running virtualized on AMD hardware. A command line option > > 'global-pages' is provided in order to allow the user to select > > whether global pages will be enabled for PV guests. > > > > The above figures are from a PV shim running on AMD hardware with > > 32 vCPUs: > > > > PGE enabled, x2APIC mode: > > > > (XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked > > (XEN) lock:1841883(1375128998543), block:1658716(10193054890781) > > > > Average lock time: 746588ns > > Average block time: 6145147ns > > > > PGE disabled, x2APIC mode: > > > > (XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked > > (XEN) lock:2730175(657505389886), block:2039716(2963768247738) > > > > Average lock time: 240829ns > > Average block time: 1453029ns > > > > As seen from the above figures the lock and block time of the flush > > lock is reduced to approximately 1/3 of the original value. > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > Changes since v1: > > - Provide command line option to enable/disable PGE. > > - Only disable PGE on AMD hardware when virtualized. > > - Document the global-pages option. > > --- > > docs/misc/xen-command-line.pandoc | 13 +++++++++++++ > > xen/arch/x86/pv/domain.c | 9 ++++++++- > > 2 files changed, 21 insertions(+), 1 deletion(-) > > > > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc > > index d9495ef6b9..7be30f2766 100644 > > --- a/docs/misc/xen-command-line.pandoc > > +++ b/docs/misc/xen-command-line.pandoc > > @@ -1087,6 +1087,19 @@ value settable via Xen tools. > > > > Dom0 is using this value for sizing its maptrack table. > > > > +### global-pages (x86) > > +> `= <boolean>` > > + > > +> Default: `true` unless running virtualized on AMD hardware > > + > > +Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the > > +usage of global pages, and thus the need to perform tlb flushes by writing to > > +CR4. > > + > > +Note it's disabled by default when running virtualized on AMD hardware since > > +AMD SVM doesn't support selective trapping of CR4, so global pages are not > > +enabled in order to reduce the overhead of tlb flushes. > > + > > ### guest_loglvl > > > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all` > > > > diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c > > index 4b6f48dea2..93fb823d63 100644 > > --- a/xen/arch/x86/pv/domain.c > > +++ b/xen/arch/x86/pv/domain.c > > @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) > > (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); > > } > > > > +static int opt_global_pages = -1; > > int8_t __read_mostly > > > +boolean_runtime_param("global-pages", opt_global_pages); > > + > > unsigned long pv_make_cr4(const struct vcpu *v) > > { > > const struct domain *d = v->domain; > > unsigned long cr4 = mmu_cr4_features & > > ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); > > + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || > > + boot_cpu_data.x86_vendor != > > + X86_VENDOR_AMD) > > + : !!opt_global_pages; > > Let's avoid re-doing this evaluation each time we come here. > Post boot the value can only change to 0 or 1. Hence in some > __init function you can apply the default calculation done > here. I've assumed that boolean_runtime_param can be changed during runtime by the user, and hence the value calculated at boot time would become stale if the user changes it after boot, which should be fine for this option. Thanks, Roger.
On 04.12.2019 12:52, Roger Pau Monné wrote: > On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote: >> On 04.12.2019 11:44, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/pv/domain.c >>> +++ b/xen/arch/x86/pv/domain.c >>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) >>> (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); >>> } >>> >>> +static int opt_global_pages = -1; >> >> int8_t __read_mostly >> >>> +boolean_runtime_param("global-pages", opt_global_pages); >>> + >>> unsigned long pv_make_cr4(const struct vcpu *v) >>> { >>> const struct domain *d = v->domain; >>> unsigned long cr4 = mmu_cr4_features & >>> ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); >>> + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || >>> + boot_cpu_data.x86_vendor != >>> + X86_VENDOR_AMD) >>> + : !!opt_global_pages; >> >> Let's avoid re-doing this evaluation each time we come here. >> Post boot the value can only change to 0 or 1. Hence in some >> __init function you can apply the default calculation done >> here. > > I've assumed that boolean_runtime_param can be changed during runtime > by the user, and hence the value calculated at boot time would become > stale if the user changes it after boot, which should be fine for this > option. I'm afraid I can't decide whether you agree or disagree with my comment. Jan
On Wed, Dec 04, 2019 at 02:15:58PM +0100, Jan Beulich wrote: > On 04.12.2019 12:52, Roger Pau Monné wrote: > > On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote: > >> On 04.12.2019 11:44, Roger Pau Monne wrote: > >>> --- a/xen/arch/x86/pv/domain.c > >>> +++ b/xen/arch/x86/pv/domain.c > >>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) > >>> (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); > >>> } > >>> > >>> +static int opt_global_pages = -1; > >> > >> int8_t __read_mostly > >> > >>> +boolean_runtime_param("global-pages", opt_global_pages); > >>> + > >>> unsigned long pv_make_cr4(const struct vcpu *v) > >>> { > >>> const struct domain *d = v->domain; > >>> unsigned long cr4 = mmu_cr4_features & > >>> ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); > >>> + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || > >>> + boot_cpu_data.x86_vendor != > >>> + X86_VENDOR_AMD) > >>> + : !!opt_global_pages; > >> > >> Let's avoid re-doing this evaluation each time we come here. > >> Post boot the value can only change to 0 or 1. Hence in some > >> __init function you can apply the default calculation done > >> here. > > > > I've assumed that boolean_runtime_param can be changed during runtime > > by the user, and hence the value calculated at boot time would become > > stale if the user changes it after boot, which should be fine for this > > option. > > I'm afraid I can't decide whether you agree or disagree with my > comment. Oh sorry fro not being clear. I was meant to disagree, calculating a value at init time would be wrong, since opt_global_pages can change during runtime. Thanks, Roger.
On 04.12.2019 14:17, Roger Pau Monné wrote: > On Wed, Dec 04, 2019 at 02:15:58PM +0100, Jan Beulich wrote: >> On 04.12.2019 12:52, Roger Pau Monné wrote: >>> On Wed, Dec 04, 2019 at 12:05:35PM +0100, Jan Beulich wrote: >>>> On 04.12.2019 11:44, Roger Pau Monne wrote: >>>>> --- a/xen/arch/x86/pv/domain.c >>>>> +++ b/xen/arch/x86/pv/domain.c >>>>> @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) >>>>> (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); >>>>> } >>>>> >>>>> +static int opt_global_pages = -1; >>>> >>>> int8_t __read_mostly >>>> >>>>> +boolean_runtime_param("global-pages", opt_global_pages); >>>>> + >>>>> unsigned long pv_make_cr4(const struct vcpu *v) >>>>> { >>>>> const struct domain *d = v->domain; >>>>> unsigned long cr4 = mmu_cr4_features & >>>>> ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); >>>>> + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || >>>>> + boot_cpu_data.x86_vendor != >>>>> + X86_VENDOR_AMD) >>>>> + : !!opt_global_pages; >>>> >>>> Let's avoid re-doing this evaluation each time we come here. >>>> Post boot the value can only change to 0 or 1. Hence in some >>>> __init function you can apply the default calculation done >>>> here. >>> >>> I've assumed that boolean_runtime_param can be changed during runtime >>> by the user, and hence the value calculated at boot time would become >>> stale if the user changes it after boot, which should be fine for this >>> option. >> >> I'm afraid I can't decide whether you agree or disagree with my >> comment. > > Oh sorry fro not being clear. I was meant to disagree, calculating a > value at init time would be wrong, since opt_global_pages can change > during runtime. But that's what I've explained in my earlier reply: At run time, the value can only change to 0 or 1, but not to -1. Hence once a runtime update occurred, the default calculation won't be used anymore (as much as it wouldn't be used if there was a respective command line option). Jan
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index d9495ef6b9..7be30f2766 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1087,6 +1087,19 @@ value settable via Xen tools. Dom0 is using this value for sizing its maptrack table. +### global-pages (x86) +> `= <boolean>` + +> Default: `true` unless running virtualized on AMD hardware + +Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the +usage of global pages, and thus the need to perform tlb flushes by writing to +CR4. + +Note it's disabled by default when running virtualized on AMD hardware since +AMD SVM doesn't support selective trapping of CR4, so global pages are not +enabled in order to reduce the overhead of tlb flushes. + ### guest_loglvl > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all` diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c index 4b6f48dea2..93fb823d63 100644 --- a/xen/arch/x86/pv/domain.c +++ b/xen/arch/x86/pv/domain.c @@ -118,11 +118,18 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, unsigned long cr4) (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK)); } +static int opt_global_pages = -1; +boolean_runtime_param("global-pages", opt_global_pages); + unsigned long pv_make_cr4(const struct vcpu *v) { const struct domain *d = v->domain; unsigned long cr4 = mmu_cr4_features & ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD); + bool pge = opt_global_pages == -1 ? (!cpu_has_hypervisor || + boot_cpu_data.x86_vendor != + X86_VENDOR_AMD) + : !!opt_global_pages; /* * PCIDE or PGE depends on the PCID/XPTI settings, but must not both be @@ -130,7 +137,7 @@ unsigned long pv_make_cr4(const struct vcpu *v) */ if ( d->arch.pv.pcid ) cr4 |= X86_CR4_PCIDE; - else if ( !d->arch.pv.xpti ) + else if ( !d->arch.pv.xpti && pge ) cr4 |= X86_CR4_PGE; /*
When using global pages a full tlb flush can only be performed by toggling the PGE bit in CR4, which is usually quite expensive in terms of performance when running virtualized. This is specially relevant on AMD hardware, which doesn't have the ability to do selective CR4 trapping, but can also be relevant on Intel if the underlying hypervisor also traps accesses to the PGE CR4 bit. In order to avoid this performance penalty, do not use global pages when running virtualized on AMD hardware. A command line option 'global-pages' is provided in order to allow the user to select whether global pages will be enabled for PV guests. The above figures are from a PV shim running on AMD hardware with 32 vCPUs: PGE enabled, x2APIC mode: (XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked (XEN) lock:1841883(1375128998543), block:1658716(10193054890781) Average lock time: 746588ns Average block time: 6145147ns PGE disabled, x2APIC mode: (XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=a8bfa8bf, not locked (XEN) lock:2730175(657505389886), block:2039716(2963768247738) Average lock time: 240829ns Average block time: 1453029ns As seen from the above figures the lock and block time of the flush lock is reduced to approximately 1/3 of the original value. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Changes since v1: - Provide command line option to enable/disable PGE. - Only disable PGE on AMD hardware when virtualized. - Document the global-pages option. --- docs/misc/xen-command-line.pandoc | 13 +++++++++++++ xen/arch/x86/pv/domain.c | 9 ++++++++- 2 files changed, 21 insertions(+), 1 deletion(-)