Message ID | 6cd765e98fa4888b9e94215f3572a94e95fe2a4b.1698261255.git.edwin.torok@cloud.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vPMU bugfixes and support for PMUv5 | expand |
On 25/10/2023 8:29 pm, Edwin Török wrote: > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > index 483b5e4f70..b3cd851d9d 100644 > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) > } > break; > > + case MSR_K8_HWCR: > + if ( !(cp->x86_vendor & X86_VENDOR_AMD) || > + (val & ~K8_HWCR_IRPERF_EN) || > + wrmsr_safe(msr, val) != 0 ) > + goto gp_fault; > + break; > + > case MSR_AMD64_DE_CFG: > /* > * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP: > diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h > index 5faca0bf7a..40f74cd5e8 100644 > --- a/xen/include/public/arch-x86/cpufeatureset.h > +++ b/xen/include/public/arch-x86/cpufeatureset.h > @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ > > /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ > XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ > -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance Counter */ > +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance Counter */ > XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */ > XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */ > XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */ Last things first. You can advertise this bit to guests, but I'm looking at the AMD manuals and woefully little information on what this is an how it works. It does have an architectural enumeration, but there isn't even a a description of what HWCR.IPERF_EN does. The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but the IRPERF MSR says no such think, noting only that EFRO_LOCK makes the MSR non-writeable (which in turn means we can't always offer it to guests anyway. See below). At a guess, the HWCR bit activates the counter, but nothing states this. At a guess, it's a free-running, reset-able counter, but again nothing states this. The manual just says "is a dedicated counter" and doesn't even state whether it is (or is not) tied into the other core fixed counters and whether it is integrated with PMI or not. Which brings us on to the middle hunk. Putting it there short circuits the PV-specific handling, but you can't do that in general without arranging for HWCR to be context switched properly for vCPUs, nor in this case IPERF_COUNT itself. Unless you context switch HWCR and IPERF_COUNT, a guest will see it not counting, or the count going backwards, 50% of the time as the vCPU is scheduled around. So while in principle we could let guests use this facility (it would have to be off by default, because there doesn't appear to be any filtering capability in it at all, so we can't restrict it to instructions retired in guest context), it will need a far larger patch than this to do it safely. ~Andrew
> On 25 Oct 2023, at 21:59, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote: > > On 25/10/2023 8:29 pm, Edwin Török wrote: >> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c >> index 483b5e4f70..b3cd851d9d 100644 >> --- a/xen/arch/x86/msr.c >> +++ b/xen/arch/x86/msr.c >> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) >> } >> break; >> >> + case MSR_K8_HWCR: >> + if ( !(cp->x86_vendor & X86_VENDOR_AMD) || >> + (val & ~K8_HWCR_IRPERF_EN) || >> + wrmsr_safe(msr, val) != 0 ) >> + goto gp_fault; >> + break; >> + >> case MSR_AMD64_DE_CFG: >> /* >> * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP: >> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h >> index 5faca0bf7a..40f74cd5e8 100644 >> --- a/xen/include/public/arch-x86/cpufeatureset.h >> +++ b/xen/include/public/arch-x86/cpufeatureset.h >> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ >> >> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ >> XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ >> -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance Counter */ >> +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance Counter */ >> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */ >> XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */ >> XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */ > > Last things first. You can advertise this bit to guests, but I'm > looking at the AMD manuals and woefully little information on what this > is an how it works. > > It does have an architectural enumeration, but there isn't even a a > description of what HWCR.IPERF_EN does. > > The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but > the IRPERF MSR says no such think, > noting only that EFRO_LOCK makes the > MSR non-writeable (which in turn means we can't always offer it to > guests anyway. See below). > At a guess, the HWCR bit activates the counter, but nothing states > this. My version (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf) says: "This is a dedicated counter that is always counting instructions retired. It exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and its support is indicated by CPUID Fn8000_0008_EBX[1]." Although digging a bit more there are some erratas around deep C states on some chips and the HWCR register itself is not global but per CCD (which is neatly buried in the _ccd[7:0]_.... smallprint at the top: https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147 > At a guess, it's a free-running, reset-able counter, but again > nothing states this. The manual just says "is a dedicated counter" and > doesn't even state whether it is (or is not) tied into the other core > fixed counters and whether it is integrated with PMI or not. There is also a LWP 'Instructions Retired' described in 13.4.2.2, which is user-mode only and per-core, but I assume that is completely different from this MSR and part of the features that got dropped in newer cores. > > Which brings us on to the middle hunk. Putting it there short circuits > the PV-specific handling, but you can't do that in general without > arranging for HWCR to be context switched properly for vCPUs, nor in > this case IPERF_COUNT itself. > > Unless you context switch HWCR and IPERF_COUNT, a guest will see it not > counting, or the count going backwards, 50% of the time as the vCPU is > scheduled around. > > So while in principle we could let guests use this facility (it would > have to be off by default, because there doesn't appear to be any > filtering capability in it at all, so we can't restrict it to > instructions retired in guest context), it will need a far larger patch > than this to do it safely. Thanks, I'll move this patch into the 'needs more work' category. Thanks, --Edwin
On 26/10/2023 5:39 pm, Edwin Torok wrote: >> On 25 Oct 2023, at 21:59, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote: >> >> On 25/10/2023 8:29 pm, Edwin Török wrote: >>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c >>> index 483b5e4f70..b3cd851d9d 100644 >>> --- a/xen/arch/x86/msr.c >>> +++ b/xen/arch/x86/msr.c >>> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) >>> } >>> break; >>> >>> + case MSR_K8_HWCR: >>> + if ( !(cp->x86_vendor & X86_VENDOR_AMD) || >>> + (val & ~K8_HWCR_IRPERF_EN) || >>> + wrmsr_safe(msr, val) != 0 ) >>> + goto gp_fault; >>> + break; >>> + >>> case MSR_AMD64_DE_CFG: >>> /* >>> * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP: >>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h >>> index 5faca0bf7a..40f74cd5e8 100644 >>> --- a/xen/include/public/arch-x86/cpufeatureset.h >>> +++ b/xen/include/public/arch-x86/cpufeatureset.h >>> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ >>> >>> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ >>> XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ >>> -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance Counter */ >>> +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance Counter */ >>> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */ >>> XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */ >>> XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */ >> Last things first. You can advertise this bit to guests, but I'm >> looking at the AMD manuals and woefully little information on what this >> is an how it works. >> >> It does have an architectural enumeration, but there isn't even a a >> description of what HWCR.IPERF_EN does. >> >> The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but >> the IRPERF MSR says no such think, >> noting only that EFRO_LOCK makes the >> MSR non-writeable (which in turn means we can't always offer it to >> guests anyway. See below). >> At a guess, the HWCR bit activates the counter, but nothing states >> this. > > My version (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf) says: > "This is a dedicated counter that is always counting instructions retired. It exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and its support is indicated by CPUID Fn8000_0008_EBX[1]." > > Although digging a bit more there are some erratas around deep C states on some chips and the HWCR register itself is not global but per CCD (which is neatly buried in the _ccd[7:0]_.... smallprint at the top: > https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147 HWCR is per-thread, and that's what both screenshots there say. Ryzen's don't typically have CCDs, which is why CCD is missing from the scope description. Consider the implications of making bit 30 work if the MSR was CCD-wide, seeing as the counting really does need to be done in at the core level. Or for that matter bit 35 (the CPUID-faulting enable bit). >> At a guess, it's a free-running, reset-able counter, but again >> nothing states this. The manual just says "is a dedicated counter" and >> doesn't even state whether it is (or is not) tied into the other core >> fixed counters and whether it is integrated with PMI or not. > There is also a LWP 'Instructions Retired' described in 13.4.2.2, which is user-mode only and per-core, but I assume that is completely different from this MSR > and part of the features that got dropped in newer cores. LWP doesn't exist any more, even in older processors. It was the feature sacrificed to make enough microcode space to retrofit IBPB in microcode for speculation protections. ~Andrew
On 25.10.2023 21:29, Edwin Török wrote: > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) > } > break; > > + case MSR_K8_HWCR: > + if ( !(cp->x86_vendor & X86_VENDOR_AMD) || > + (val & ~K8_HWCR_IRPERF_EN) || > + wrmsr_safe(msr, val) != 0 ) > + goto gp_fault; > + break; Well, no: There's no-where you ever turn the bit off again. There's no respective adjustment to guest_rdmsr(). There's no migration of the bit guests gain control of. You also shouldn't permit writing the bit when the feature flag is clear. > --- a/xen/include/public/arch-x86/cpufeatureset.h > +++ b/xen/include/public/arch-x86/cpufeatureset.h > @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ > > /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ > XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ > -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance Counter */ > +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance Counter */ Again I don't see why you have ! here. There's also missing movement of code the previous patch added to recalculate_misc(), which now ought to be in calculate_host_policy(). Jan
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h index 061b07c7ae..1d94fe3a5b 100644 --- a/xen/arch/x86/include/asm/msr-index.h +++ b/xen/arch/x86/include/asm/msr-index.h @@ -393,6 +393,7 @@ #define MSR_K8_HWCR 0xc0010015 #define K8_HWCR_TSC_FREQ_SEL (1ULL << 24) +#define K8_HWCR_IRPERF_EN (1ULL << 30) #define K8_HWCR_CPUID_USER_DIS (1ULL << 35) #define MSR_K7_FID_VID_CTL 0xc0010041 diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 483b5e4f70..b3cd851d9d 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) } break; + case MSR_K8_HWCR: + if ( !(cp->x86_vendor & X86_VENDOR_AMD) || + (val & ~K8_HWCR_IRPERF_EN) || + wrmsr_safe(msr, val) != 0 ) + goto gp_fault; + break; + case MSR_AMD64_DE_CFG: /* * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP: diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 5faca0bf7a..40f74cd5e8 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance Counter */ +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance Counter */ XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */ XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */ XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */