Message ID | 946e5494801866c93332cc5d9ec0fa03a4df00d7.1689686046.git.simon@invisiblethingslab.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [XEN] x86/msr: Allow hardware domain to read package C-state residency counters | expand |
On 18/07/2023 2:17 pm, Simon Gaiser wrote: > Since it's limited to the hardware domain it should be safe and it's > very useful to have access to this directly in dom0 when debugging power > related things for example S0ix. You need a SoB. But, this is an area there things are subtly broken. For package-scope MSRs on single socket systems (which does include client systems), then this happens to function. It does not function for core-scoped MSRs, or at all in a multi-socket system. In such scenarios, dom0 can be rescheduled to a CPU in a different scope while it thinks it is sampling a single scope. This is one of the areas where dom0 and Xen end up fighting over the system. I agree that we want some way for dom0 to get this information, but I'm afraid it's not as simple as just permitting access to the MSRs like this. ~Andrew
On 18.07.2023 15:17, Simon Gaiser wrote: > --- a/xen/arch/x86/pv/emul-priv-op.c > +++ b/xen/arch/x86/pv/emul-priv-op.c > @@ -965,6 +965,20 @@ static int cf_check read_msr( > *val = 0; > return X86EMUL_OKAY; > > + case MSR_PKG_C2_RESIDENCY: > + case MSR_PKG_C3_RESIDENCY: > + case MSR_PKG_C6_RESIDENCY: > + case MSR_PKG_C7_RESIDENCY: > + case MSR_PKG_C8_RESIDENCY: > + case MSR_PKG_C9_RESIDENCY: > + case MSR_PKG_C10_RESIDENCY: > + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) > + break; > + if ( is_hardware_domain(currd) ) > + goto normal; > + *val = 0; > + return X86EMUL_OKAY; In addition to what Andrew said: Why would we suddenly allow these reads to succeed for DomU-s? Jan
Jan Beulich: > On 18.07.2023 15:17, Simon Gaiser wrote: >> --- a/xen/arch/x86/pv/emul-priv-op.c >> +++ b/xen/arch/x86/pv/emul-priv-op.c >> @@ -965,6 +965,20 @@ static int cf_check read_msr( >> *val = 0; >> return X86EMUL_OKAY; >> >> + case MSR_PKG_C2_RESIDENCY: >> + case MSR_PKG_C3_RESIDENCY: >> + case MSR_PKG_C6_RESIDENCY: >> + case MSR_PKG_C7_RESIDENCY: >> + case MSR_PKG_C8_RESIDENCY: >> + case MSR_PKG_C9_RESIDENCY: >> + case MSR_PKG_C10_RESIDENCY: >> + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) >> + break; >> + if ( is_hardware_domain(currd) ) >> + goto normal; >> + *val = 0; >> + return X86EMUL_OKAY; > > In addition to what Andrew said: Why would we suddenly allow these > reads to succeed for DomU-s? That patch wouldn't actually allow those reads, but fake a 0 response, or do I miss something. If you mean that behavior: I just mirrored what is done there in some of the other cases. If you prefer something else, for example treating it as unimplemented, I can change that. Simon
Andrew Cooper: > On 18/07/2023 2:17 pm, Simon Gaiser wrote: >> Since it's limited to the hardware domain it should be safe and it's >> very useful to have access to this directly in dom0 when debugging power >> related things for example S0ix. > > You need a SoB. Yeah, sorry. > But, this is an area there things are subtly broken. For package-scope > MSRs on single socket systems (which does include client systems), then > this happens to function. > > It does not function for core-scoped MSRs, or at all in a multi-socket > system. In such scenarios, dom0 can be rescheduled to a CPU in a > different scope while it thinks it is sampling a single scope. > > This is one of the areas where dom0 and Xen end up fighting over the system. > > I agree that we want some way for dom0 to get this information, but I'm > afraid it's not as simple as just permitting access to the MSRs like this. I see. So a generic solution is not so easy. Also even if there would be an interface for dom0, my main motivation was to be able to just use existing code like /sys/kernel/debug/pmc_core/package_cstate_show and turbostat. You can already read those PC-states via Xen's debug interface, but that's less convenient. For those package-scoped MSRs, how about limiting them to single-socket systems? Simon
On Wed, Jul 19, 2023 at 12:38:39AM +0200, Simon Gaiser wrote: > Jan Beulich: > > On 18.07.2023 15:17, Simon Gaiser wrote: > >> --- a/xen/arch/x86/pv/emul-priv-op.c > >> +++ b/xen/arch/x86/pv/emul-priv-op.c > >> @@ -965,6 +965,20 @@ static int cf_check read_msr( > >> *val = 0; > >> return X86EMUL_OKAY; > >> > >> + case MSR_PKG_C2_RESIDENCY: > >> + case MSR_PKG_C3_RESIDENCY: > >> + case MSR_PKG_C6_RESIDENCY: > >> + case MSR_PKG_C7_RESIDENCY: > >> + case MSR_PKG_C8_RESIDENCY: > >> + case MSR_PKG_C9_RESIDENCY: > >> + case MSR_PKG_C10_RESIDENCY: > >> + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) > >> + break; > >> + if ( is_hardware_domain(currd) ) > >> + goto normal; > >> + *val = 0; > >> + return X86EMUL_OKAY; > > > > In addition to what Andrew said: Why would we suddenly allow these > > reads to succeed for DomU-s? > > That patch wouldn't actually allow those reads, but fake a 0 response, > or do I miss something. If you mean that behavior: I just mirrored what > is done there in some of the other cases. If you prefer something else, > for example treating it as unimplemented, I can change that. I think Jan meant exactly this difference - faking 0, instead of failing the read.
On 19.07.2023 03:17, Marek Marczykowski-Górecki wrote: > On Wed, Jul 19, 2023 at 12:38:39AM +0200, Simon Gaiser wrote: >> Jan Beulich: >>> On 18.07.2023 15:17, Simon Gaiser wrote: >>>> --- a/xen/arch/x86/pv/emul-priv-op.c >>>> +++ b/xen/arch/x86/pv/emul-priv-op.c >>>> @@ -965,6 +965,20 @@ static int cf_check read_msr( >>>> *val = 0; >>>> return X86EMUL_OKAY; >>>> >>>> + case MSR_PKG_C2_RESIDENCY: >>>> + case MSR_PKG_C3_RESIDENCY: >>>> + case MSR_PKG_C6_RESIDENCY: >>>> + case MSR_PKG_C7_RESIDENCY: >>>> + case MSR_PKG_C8_RESIDENCY: >>>> + case MSR_PKG_C9_RESIDENCY: >>>> + case MSR_PKG_C10_RESIDENCY: >>>> + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) >>>> + break; >>>> + if ( is_hardware_domain(currd) ) >>>> + goto normal; >>>> + *val = 0; >>>> + return X86EMUL_OKAY; >>> >>> In addition to what Andrew said: Why would we suddenly allow these >>> reads to succeed for DomU-s? >> >> That patch wouldn't actually allow those reads, but fake a 0 response, >> or do I miss something. If you mean that behavior: I just mirrored what >> is done there in some of the other cases. If you prefer something else, >> for example treating it as unimplemented, I can change that. > > I think Jan meant exactly this difference - faking 0, instead of > failing the read. Indeed. Jan
On 19.07.2023 01:27, Simon Gaiser wrote: > Andrew Cooper: >> On 18/07/2023 2:17 pm, Simon Gaiser wrote: >>> Since it's limited to the hardware domain it should be safe and it's >>> very useful to have access to this directly in dom0 when debugging power >>> related things for example S0ix. >> >> You need a SoB. > > Yeah, sorry. > >> But, this is an area there things are subtly broken. For package-scope >> MSRs on single socket systems (which does include client systems), then >> this happens to function. >> >> It does not function for core-scoped MSRs, or at all in a multi-socket >> system. In such scenarios, dom0 can be rescheduled to a CPU in a >> different scope while it thinks it is sampling a single scope. >> >> This is one of the areas where dom0 and Xen end up fighting over the system. >> >> I agree that we want some way for dom0 to get this information, but I'm >> afraid it's not as simple as just permitting access to the MSRs like this. > > I see. So a generic solution is not so easy. Also even if there would be > an interface for dom0, my main motivation was to be able to just use > existing code like /sys/kernel/debug/pmc_core/package_cstate_show and > turbostat. You can already read those PC-states via Xen's debug > interface, but that's less convenient. > > For those package-scoped MSRs, how about limiting them to single-socket > systems? No, that's all hackery that in the extreme case may be okay to have in debug builds, but never in release ones. Even having such in debug builds is problematic, because then we won't routinely test code paths as used by release builds. When virtualized and trying to access real MSRs (rather than properly virtualized ones), you need some sort of PV solution. If you want stuff like that to show up in "standard interfaces", you'll have to teach the code providing those to use whatever the (perhaps to be added) PV interface is. Jan
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h index 4f861c0bb4..7e7255383d 100644 --- a/xen/arch/x86/include/asm/msr-index.h +++ b/xen/arch/x86/include/asm/msr-index.h @@ -704,4 +704,13 @@ #define MSR_PKGC9_IRTL 0x00000634 #define MSR_PKGC10_IRTL 0x00000635 +/* Package C-state residency counters */ +#define MSR_PKG_C2_RESIDENCY 0x0000060d +#define MSR_PKG_C3_RESIDENCY 0x000003f8 +#define MSR_PKG_C6_RESIDENCY 0x000003f9 +#define MSR_PKG_C7_RESIDENCY 0x000003fa +#define MSR_PKG_C8_RESIDENCY 0x00000630 +#define MSR_PKG_C9_RESIDENCY 0x00000631 +#define MSR_PKG_C10_RESIDENCY 0x00000632 + #endif /* __ASM_MSR_INDEX_H */ diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c index 142bc4818c..4593295ee2 100644 --- a/xen/arch/x86/pv/emul-priv-op.c +++ b/xen/arch/x86/pv/emul-priv-op.c @@ -965,6 +965,20 @@ static int cf_check read_msr( *val = 0; return X86EMUL_OKAY; + case MSR_PKG_C2_RESIDENCY: + case MSR_PKG_C3_RESIDENCY: + case MSR_PKG_C6_RESIDENCY: + case MSR_PKG_C7_RESIDENCY: + case MSR_PKG_C8_RESIDENCY: + case MSR_PKG_C9_RESIDENCY: + case MSR_PKG_C10_RESIDENCY: + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) + break; + if ( is_hardware_domain(currd) ) + goto normal; + *val = 0; + return X86EMUL_OKAY; + case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7): case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3): case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2: