Message ID | 20221115132657.97864-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Series | amd/virt_ssbd: refactoring and fixes |
On 15/11/2022 13:26, Roger Pau Monne wrote:
> Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
> on vm{entry,exit} there's no need to use a synthetic feature bit for
> it anymore.
>
> Remove the bit and instead use a global variable.
>
> No functional change intended.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Release-acked-by: Henry Wang <Henry.Wang@arm.com>

This is definitely not appropriate for 4.17, but it's a performance
regression in general, hence my firm and repeated objection to this
style of patch.

General synthetic bits have existed for several decades longer than
alternatives.  It has never ever been a rule, or even a recommendation,
to aggressively purge the non-alternative bits, because it's a provably
bad thing to do.

You are attempting a micro-optimisation, that won't produce any
improvement at all in centuries, while...

> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index a332087604..9e3b9094d3 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
>  /* Signal whether the ACPI C1E quirk is required. */
>  bool __read_mostly amd_acpi_c1e_quirk;
>  bool __ro_after_init amd_legacy_ssbd;
> +bool __ro_after_init amd_virt_spec_ctrl;

... actually expanding .rodata with something 8 times less efficiently
packed, and ...

>
>  static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
>                                   unsigned int *hi)
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 822f9ace10..acc2f606ce 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -3,6 +3,7 @@
>  #include <xen/param.h>
>  #include <xen/sched.h>
>  #include <xen/nospec.h>
> +#include <asm/amd.h>

... (Specific to this instance) making life harder for the people trying
to make CONFIG_AMD work, and ...

> diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
> index 4e53056624..0b94af6b86 100644
> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
>             (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
>              boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
>              boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
> -            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
> +            amd_virt_spec_ctrl ||

... breaking apart a single TEST instruction, which not only adds an
extra conditional merge, but now hits a cold-ish cache line everywhere
it's used.

Count how many synthetic feature bits it will actually take to change
the per-cpu data size, and realise that, when it will take more than 200
years at the current rate of accumulation, any belief that this is an
improvement to be had disappears.

Yes, it is only a micro regression, but you need a far better
justification than "there is a gain of 64 bytes per CPU which will be
non-theoretical in more than 200 years" when traded off vs the actual
512 bytes, plus extra code bloat, plus reduced locality of data
that this "improvement" genuinely costs today.

~Andrew
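The "single TEST instruction" point above can be illustrated with a minimal sketch. It assumes the flags of interest share one 32-bit capability word; the bit positions, `synth_caps` and `virt_spec_ctrl_flag` are hypothetical stand-ins, not Xen's real `X86_SYNTH()` layout or `boot_cpu_has()` implementation. Whether a compiler actually merges the checks depends on how they are written.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit positions within one 32-bit capability word.
 * Illustrative only; they do not match Xen's X86_SYNTH() numbering.
 */
#define FEAT_SC_MSR_HVM      (UINT32_C(1) << 0)
#define FEAT_SC_RSB_HVM      (UINT32_C(1) << 1)
#define FEAT_IBPB_ENTRY_HVM  (UINT32_C(1) << 2)
#define FEAT_VIRT_SC_MSR_HVM (UINT32_C(1) << 3)

/* Stand-in for one word of the boot CPU's capability bitmap. */
uint32_t synth_caps;

/* Stand-in for a separate global boolean such as amd_virt_spec_ctrl. */
bool virt_spec_ctrl_flag;

/* All four conditions live in one word: one load and one mask test. */
bool any_hvm_mitigation_bits(void)
{
    const uint32_t mask = FEAT_SC_MSR_HVM | FEAT_SC_RSB_HVM |
                          FEAT_IBPB_ENTRY_HVM | FEAT_VIRT_SC_MSR_HVM;

    return (synth_caps & mask) != 0;
}

/* Three conditions in the word plus one elsewhere: a second load and test. */
bool any_hvm_mitigation_mixed(void)
{
    const uint32_t mask = FEAT_SC_MSR_HVM | FEAT_SC_RSB_HVM |
                          FEAT_IBPB_ENTRY_HVM;

    return (synth_caps & mask) != 0 || virt_spec_ctrl_flag;
}
```

Both functions answer the same question; the second reads an extra memory location, which is the cost being argued about for code paths where it matters.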
On 15.11.2022 17:21, Andrew Cooper wrote:
> On 15/11/2022 13:26, Roger Pau Monne wrote:
>> Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
>> on vm{entry,exit} there's no need to use a synthetic feature bit for
>> it anymore.
>>
>> Remove the bit and instead use a global variable.
>>
>> No functional change intended.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
>
> This is definitely not appropriate for 4.17, but it's a performance
> regression in general, hence my firm and repeated objection to this
> style of patch.
>
> General synthetic bits have existed for several decades longer than
> alternatives.  It has never ever been a rule, or even a recommendation,
> to aggressively purge the non-alternative bits, because it's a provably
> bad thing to do.

There we are again - you state something as bad without really saying
why it is bad. My view is that synthetic bits were wrong to introduce
when they don't stand a chance of being used in an alternative.

I agree though that there's no strong need for this to make 4.17. It
may end up making backports slightly easier, as no such bit existed
in 4.16.

> You are attempting a micro-optimisation, that won't produce any
> improvement at all in centuries, while...
>
>> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
>> index a332087604..9e3b9094d3 100644
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
>>  /* Signal whether the ACPI C1E quirk is required. */
>>  bool __read_mostly amd_acpi_c1e_quirk;
>>  bool __ro_after_init amd_legacy_ssbd;
>> +bool __ro_after_init amd_virt_spec_ctrl;
>
> ... actually expanding .rodata with something 8 times less efficiently
> packed, and ...

... as long as you're talking of just a single CPU. The break-even is
at 8 CPUs (8 bits used either way).

>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
>>             (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
>>              boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
>>              boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
>> -            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
>> +            amd_virt_spec_ctrl ||
>
> ... breaking apart a single TEST instruction, which not only adds an
> extra conditional merge, but now hits a cold-ish cache line everywhere
> it's used.
>
> Count how many synthetic feature bits it will actually take to change
> the per-cpu data size, and realise that, when it will take more than 200
> years at the current rate of accumulation, any belief that this is an
> improvement to be had disappears.
>
> Yes, it is only a micro regression, but you need a far better
> justification than "there is a gain of 64 bytes per CPU which will be
> non-theoretical in more than 200 years" when traded off vs the actual
> 512 bytes, plus extra code bloat, plus reduced locality of data
> that this "improvement" genuinely costs today.

I don't see Roger stating anything like this.

I think we need to settle on at least halfway firm rules on when to use
synthetic feature bits and when to use plain global booleans. Without
that the tastes of the three of us are going to collide again every once
in a while.

Jan
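The break-even arithmetic above can be written out as a small sketch. It assumes a synthetic feature costs one bit in each CPU's capability bitmap, that a global `bool` occupies `sizeof(bool)` bytes once, and that the bitmap is stored in whole 32-bit words; the function names are hypothetical and the word size is an assumption about the layout, not a statement of Xen's actual data structures.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bits consumed by one feature flag replicated in every CPU's bitmap. */
size_t per_cpu_flag_bits(size_t nr_cpus)
{
    return nr_cpus;
}

/* Bits consumed by a single global bool, independent of the CPU count. */
size_t global_bool_bits(void)
{
    return sizeof(bool) * CHAR_BIT;
}

/*
 * Number of 32-bit words needed to hold nr_bits flags. The result only
 * grows when a word boundary is crossed, so adding a flag to a partly
 * used word leaves the per-CPU footprint unchanged.
 */
size_t capability_words(size_t nr_bits)
{
    return (nr_bits + 31) / 32;
}
```

With a one-byte `bool`, `per_cpu_flag_bits(8)` equals `global_bool_bits()`, which is the 8-CPU break-even. `capability_words()` models the counter-argument: the bitmap size stays the same whether or not a spare bit in an existing word is in use.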
On Tue, Nov 15, 2022 at 04:21:07PM +0000, Andrew Cooper wrote:
> On 15/11/2022 13:26, Roger Pau Monne wrote:
> > Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
> > on vm{entry,exit} there's no need to use a synthetic feature bit for
> > it anymore.
> >
> > Remove the bit and instead use a global variable.
> >
> > No functional change intended.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > Release-acked-by: Henry Wang <Henry.Wang@arm.com>
>
> This is definitely not appropriate for 4.17, but it's a performance
> regression in general, hence my firm and repeated objection to this
> style of patch.

While I don't have any objections to deferring this past 4.17, none of
the modified paths are performance sensitive AFAICT.

> General synthetic bits have existed for several decades longer than
> alternatives.  It has never ever been a rule, or even a recommendation,
> to aggressively purge the non-alternative bits, because it's a provably
> bad thing to do.
>
>
> You are attempting a micro-optimisation, that won't produce any
> improvement at all in centuries, while...

Oh, I wasn't attempting any micro-optimizations TBH, just didn't see
the need to keep this as a synthetic feature, and generally consider it
better to use a global variable because it's IMO easier to follow.

> > diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> > index a332087604..9e3b9094d3 100644
> > --- a/xen/arch/x86/cpu/amd.c
> > +++ b/xen/arch/x86/cpu/amd.c
> > @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
> >  /* Signal whether the ACPI C1E quirk is required. */
> >  bool __read_mostly amd_acpi_c1e_quirk;
> >  bool __ro_after_init amd_legacy_ssbd;
> > +bool __ro_after_init amd_virt_spec_ctrl;
>
> ... actually expanding .rodata with something 8 times less efficiently
> packed, and ...
>
> >
> >  static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
> >                                   unsigned int *hi)
> > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> > index 822f9ace10..acc2f606ce 100644
> > --- a/xen/arch/x86/cpuid.c
> > +++ b/xen/arch/x86/cpuid.c
> > @@ -3,6 +3,7 @@
> >  #include <xen/param.h>
> >  #include <xen/sched.h>
> >  #include <xen/nospec.h>
> > +#include <asm/amd.h>
>
> ... (Specific to this instance) making life harder for the people trying
> to make CONFIG_AMD work, and ...

That's indeed a point, albeit I think adding a `#define
amd_virt_spec_ctrl false` won't be the biggest of the problems when
dealing with CONFIG_AMD, and will need to be done for other AMD
specific variables anyway.

> > diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
> > index 4e53056624..0b94af6b86 100644
> > --- a/xen/arch/x86/spec_ctrl.c
> > +++ b/xen/arch/x86/spec_ctrl.c
> > @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
> >             (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
> >              boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
> >              boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
> > -            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
> > +            amd_virt_spec_ctrl ||
>
> ... breaking apart a single TEST instruction, which not only adds an
> extra conditional merge, but now hits a cold-ish cache line everywhere
> it's used.

Why does performance matter here?  It's an init function that prints
the speculation related settings to the screen, so that's likely to be
many times slower than accessing a cold cache line.

> Count how many synthetic feature bits it will actually take to change
> the per-cpu data size, and realise that, when it will take more than 200
> years at the current rate of accumulation, any belief that this is an
> improvement to be had disappears.
>
> Yes, it is only a micro regression, but you need a far better
> justification than "there is a gain of 64 bytes per CPU which will be
> non-theoretical in more than 200 years" when traded off vs the actual
> 512 bytes, plus extra code bloat, plus reduced locality of data
> that this "improvement" genuinely costs today.

I wasn't considering any of the above when proposing the change; my
only motivation was that global variables are clearer to use than
synthetic features, and I didn't see a need for a synthetic feature in
this case.

If we agree the above possible performance regressions are worth it,
I'm fine keeping it as-is.  Now that I realize it, amd_virt_spec_ctrl
could even be plain __init.

Thanks, Roger.
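The `#define amd_virt_spec_ctrl false` idea mentioned above might look roughly like the following sketch. `CONFIG_AMD` is the option named in the thread; the guard shape and the helper `should_expose_virt_ssbd()` are hypothetical and only show that callers can use the same name in both configurations.

```c
#include <stdbool.h>

/*
 * Hypothetical header fragment. How Xen would actually guard this
 * variable is an assumption made for illustration.
 */
#ifdef CONFIG_AMD
bool amd_virt_spec_ctrl;            /* would be set during boot */
#else
#define amd_virt_spec_ctrl false    /* AMD support compiled out */
#endif

/* Callers test the same name whether or not AMD support is built in. */
bool should_expose_virt_ssbd(void)
{
    return amd_virt_spec_ctrl;
}
```

When AMD support is compiled out, the condition becomes a compile-time constant, so the compiler is free to drop the dependent code.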
On 15/11/2022 16:44, Jan Beulich wrote:
> On 15.11.2022 17:21, Andrew Cooper wrote:
>> On 15/11/2022 13:26, Roger Pau Monne wrote:
>>> Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
>>> on vm{entry,exit} there's no need to use a synthetic feature bit for
>>> it anymore.
>>>
>>> Remove the bit and instead use a global variable.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
>> This is definitely not appropriate for 4.17, but it's a performance
>> regression in general, hence my firm and repeated objection to this
>> style of patch.
>>
>> General synthetic bits have existed for several decades longer than
>> alternatives.  It has never ever been a rule, or even a recommendation,
>> to aggressively purge the non-alternative bits, because it's a provably
>> bad thing to do.
> There we are again - you state something as bad without really saying
> why it is bad.

You may not agree with the reasoning, but you are lying to yourself, if
no-one else, by claiming that no justification was presented.

> My view is that synthetic bits were wrong to introduce
> when they don't stand a chance of being used in an alternative.

Your view is incompatible with a linear interpretation of history, as
has been pointed out repeatedly before by the fact that 1/3 of Xen's
synthetic features fully predate the introduction of alternatives.

"I don't like using synthetic bits in this way" is a point of view, but
is not something that counters technical reasoning about the tradeoff in
question.

>
> I agree though that there's no strong need for this to make 4.17. It
> may end up making backports slightly easier, as no such bit existed
> in 4.16.

*This* is a good justification to take the change.  As is Roger's
subsequent observation that it can actually live in __initdata.

>> You are attempting a micro-optimisation, that won't produce any
>> improvement at all in centuries, while...
>>
>>> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
>>> index a332087604..9e3b9094d3 100644
>>> --- a/xen/arch/x86/cpu/amd.c
>>> +++ b/xen/arch/x86/cpu/amd.c
>>> @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
>>>  /* Signal whether the ACPI C1E quirk is required. */
>>>  bool __read_mostly amd_acpi_c1e_quirk;
>>>  bool __ro_after_init amd_legacy_ssbd;
>>> +bool __ro_after_init amd_virt_spec_ctrl;
>> ... actually expanding .rodata with something 8 times less efficiently
>> packed, and ...
> ... as long as you're talking of just a single CPU. The break-even is
> at 8 CPUs (8 bits used either way).

And still irrelevant when the size of the per-cpu data area doesn't
change for several centuries in the argued case.

> I think we need to settle on at least halfway firm rules on when to use
> synthetic feature bits and when to use plain global booleans. Without
> that the tastes of the three of us are going to collide again every once
> in a while.

It's very easy.  All other things being equal, synthetic features are the
most efficient option.

In most cases, things aren't all equal, and literally any
technically-credible justification will do.

If a tradeoff doesn't plausibly work within a decade, then it's probably
a waste of time raising, and definitely not a point to legitimately
object with.  Especially as in the past, I've already given you an
alternative course of action where the synthetic features aren't
per-cpu...

~Andrew
On 16.11.2022 00:54, Andrew Cooper wrote:
> On 15/11/2022 16:44, Jan Beulich wrote:
>> I think we need to settle on at least halfway firm rules on when to use
>> synthetic feature bits and when to use plain global booleans. Without
>> that the tastes of the three of us are going to collide again every once
>> in a while.
>
> It's very easy.  All other things being equal, synthetic features are the
> most efficient option.

See Roger's better wording of "why use a more complicated construct when
a simple one will do". Yes, generated code may be better in certain cases,
but no, we don't always judge by that aspect alone. Source simplicity is
an important criterion, which at other times I recall you also weighing
higher than the performance of resulting code (especially when dealing
with performance aspects when they don't really matter at most/all use
sites of whichever construct).

Jan
On Wed, Nov 16, 2022 at 08:41:06AM +0100, Jan Beulich wrote:
> On 16.11.2022 00:54, Andrew Cooper wrote:
> > On 15/11/2022 16:44, Jan Beulich wrote:
> >> I think we need to settle on at least halfway firm rules on when to use
> >> synthetic feature bits and when to use plain global booleans. Without
> >> that the tastes of the three of us are going to collide again every once
> >> in a while.
> >
> > It's very easy.  All other things being equal, synthetic features are the
> > most efficient option.
>
> See Roger's better wording of "why use a more complicated construct when
> a simple one will do". Yes, generated code may be better in certain cases,
> but no, we don't always judge by that aspect alone. Source simplicity is
> an important criterion, which at other times I recall you also weighing
> higher than the performance of resulting code (especially when dealing
> with performance aspects when they don't really matter at most/all use
> sites of whichever construct).

I think it would be easier if we could discuss this in one of our x86
related meetings.  It's still unclear to me why a synthetic feature
would be preferred over a global variable in most cases (like the
one here, even if the variable didn't end up having the __init
attribute).

Thanks, Roger.
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index a332087604..9e3b9094d3 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
 /* Signal whether the ACPI C1E quirk is required. */
 bool __read_mostly amd_acpi_c1e_quirk;
 bool __ro_after_init amd_legacy_ssbd;
+bool __ro_after_init amd_virt_spec_ctrl;
 
 static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
                                  unsigned int *hi)
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 822f9ace10..acc2f606ce 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -3,6 +3,7 @@
 #include <xen/param.h>
 #include <xen/sched.h>
 #include <xen/nospec.h>
+#include <asm/amd.h>
 #include <asm/cpuid.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/nestedhvm.h>
@@ -543,9 +544,9 @@ static void __init calculate_hvm_max_policy(void)
 
     /*
      * VIRT_SSBD is exposed in the default policy as a result of
-     * VIRT_SC_MSR_HVM being set, it also needs exposing in the max policy.
+     * amd_virt_spec_ctrl being set, it also needs exposing in the max policy.
      */
-    if ( boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) )
+    if ( amd_virt_spec_ctrl )
         __set_bit(X86_FEATURE_VIRT_SSBD, hvm_featureset);
 
     /*
@@ -606,9 +607,9 @@ static void __init calculate_hvm_def_policy(void)
 
     /*
      * Only expose VIRT_SSBD if AMD_SSBD is not available, and thus
-     * VIRT_SC_MSR_HVM is set.
+     * amd_virt_spec_ctrl is set.
      */
-    if ( boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) )
+    if ( amd_virt_spec_ctrl )
         __set_bit(X86_FEATURE_VIRT_SSBD, hvm_featureset);
 
     sanitise_featureset(hvm_featureset);
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 6a42f68542..a975d3de26 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -152,6 +152,7 @@ extern bool amd_acpi_c1e_quirk;
 void amd_check_disable_c1e(unsigned int port, u8 value);
 
 extern bool amd_legacy_ssbd;
+extern bool amd_virt_spec_ctrl;
 bool amd_setup_legacy_ssbd(void);
 void amd_set_legacy_ssbd(bool enable);
 
diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
index 3895de4faf..efd3a667ef 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF, X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-XEN_CPUFEATURE(VIRT_SC_MSR_HVM,   X86_SYNTH(12)) /* MSR_VIRT_SPEC_CTRL exposed to HVM */
+/* Bit 12 unused. */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4e53056624..0b94af6b86 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
-            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
+            amd_virt_spec_ctrl ||
             opt_eager_fpu || opt_md_clear_hvm)       ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
-            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM)) ? " MSR_VIRT_SPEC_CTRL"
-                                                       : "",
+            amd_virt_spec_ctrl)                      ? " MSR_VIRT_SPEC_CTRL"
+                                                     : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
            opt_md_clear_hvm                          ? " MD_CLEAR"      : "",
@@ -1247,7 +1247,7 @@ void __init init_speculation_mitigations(void)
     /* Support VIRT_SPEC_CTRL.SSBD if AMD_SSBD is not available. */
     if ( opt_msr_sc_hvm && !cpu_has_amd_ssbd &&
          (cpu_has_virt_ssbd || (amd_legacy_ssbd && amd_setup_legacy_ssbd())) )
-        setup_force_cpu_cap(X86_FEATURE_VIRT_SC_MSR_HVM);
+        amd_virt_spec_ctrl = true;
 
     /* Figure out default_xen_spec_ctrl. */
     if ( has_spec_ctrl && ibrs )