Message ID | 5da4ed2e-8eb8-0b18-3c1f-9d419371c08a@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86emul: further work | expand |
On 05/05/2020 09:18, Jan Beulich wrote: > AMD's PM specifies that MPERF (and its r/o counterpart) reads are > affected by the TSC ratio. Hence when processing such reads in software > we too should scale the values. While we don't currently (yet) expose > the underlying feature flags, besides us allowing the MSRs to be read > nevertheless, RDPRU is going to expose the values even to user space. > > Furthermore, due to the not exposed feature flags, this change has the > effect of making properly inaccessible (for reads) the two MSRs. > > Note that writes to MPERF (and APERF) continue to be unsupported. Linux is now using MPERF/APERF for its frequency-invariant scheduling logic. Irritatingly, via its read/write alias rather than its read-only alias. Even more irritatingly, Intel's reference algorithm recommends writing to both, despite this being being far less efficient than (one of) AMD's (two) algorithm(s) which tells you just to subtract the values you last sampled. On the one hand, I'm tempted to suggest that we offer EFRO on Intel and update Linux to use it. OTOH, that would VMExit as Intel CPUs don't understand the EFRO interface. I can't see any sane way to virtualise the write behaviour for MPERF/APERF. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > v3: New. > --- > I did consider whether to put the code in guest_rdmsr() instead, but > decided that it's better to have it next to TSC handling. Please do put it in guest_rdmsr(). This is code hygene just as much as bool_t or style fixes are. The relationship to TSC is passing-at-best. > > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -3478,6 +3478,22 @@ int hvm_msr_read_intercept(unsigned int > *msr_content = v->arch.hvm.msr_tsc_adjust; > break; > > + case MSR_MPERF_RD_ONLY: > + if ( !d->arch.cpuid->extd.efro ) > + { > + goto gp_fault; > + > + case MSR_IA32_MPERF: > + if ( !(d->arch.cpuid->basic.raw[6].c & > + CPUID6_ECX_APERFMPERF_CAPABILITY) ) > + goto gp_fault; > + } > + if ( rdmsr_safe(msr, *msr_content) ) > + goto gp_fault; > + if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ) I suspect we want to gain amd_like() outside of the emulator. > + *msr_content = hvm_get_guest_tsc_fixed(v, *msr_content); > + break; > + > case MSR_APIC_BASE: > *msr_content = vcpu_vlapic(v)->hw.apic_base_msr; > break; > --- a/xen/include/asm-x86/msr-index.h > +++ b/xen/include/asm-x86/msr-index.h > @@ -405,6 +405,9 @@ > #define MSR_IA32_MPERF 0x000000e7 > #define MSR_IA32_APERF 0x000000e8 > > +#define MSR_MPERF_RD_ONLY 0xc00000e7 > +#define MSR_APERF_RD_ONLY 0xc00000e8 S/RD_ONLY/RO/ ? No loss of meaning. Also, above the legacy line please. ~Andrew > + > #define MSR_IA32_THERM_CONTROL 0x0000019a > #define MSR_IA32_THERM_INTERRUPT 0x0000019b > #define MSR_IA32_THERM_STATUS 0x0000019c >
--- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3478,6 +3478,22 @@ int hvm_msr_read_intercept(unsigned int *msr_content = v->arch.hvm.msr_tsc_adjust; break; + case MSR_MPERF_RD_ONLY: + if ( !d->arch.cpuid->extd.efro ) + { + goto gp_fault; + + case MSR_IA32_MPERF: + if ( !(d->arch.cpuid->basic.raw[6].c & + CPUID6_ECX_APERFMPERF_CAPABILITY) ) + goto gp_fault; + } + if ( rdmsr_safe(msr, *msr_content) ) + goto gp_fault; + if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ) + *msr_content = hvm_get_guest_tsc_fixed(v, *msr_content); + break; + case MSR_APIC_BASE: *msr_content = vcpu_vlapic(v)->hw.apic_base_msr; break; --- a/xen/include/asm-x86/msr-index.h +++ b/xen/include/asm-x86/msr-index.h @@ -405,6 +405,9 @@ #define MSR_IA32_MPERF 0x000000e7 #define MSR_IA32_APERF 0x000000e8 +#define MSR_MPERF_RD_ONLY 0xc00000e7 +#define MSR_APERF_RD_ONLY 0xc00000e8 + #define MSR_IA32_THERM_CONTROL 0x0000019a #define MSR_IA32_THERM_INTERRUPT 0x0000019b #define MSR_IA32_THERM_STATUS 0x0000019c
AMD's PM specifies that MPERF (and its r/o counterpart) reads are affected by the TSC ratio. Hence when processing such reads in software we too should scale the values. While we don't currently (yet) expose the underlying feature flags, besides us allowing the MSRs to be read nevertheless, RDPRU is going to expose the values even to user space. Furthermore, due to the not exposed feature flags, this change has the effect of making properly inaccessible (for reads) the two MSRs. Note that writes to MPERF (and APERF) continue to be unsupported. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v3: New. --- I did consider whether to put the code in guest_rdmsr() instead, but decided that it's better to have it next to TSC handling.