
[RFC,22/22] x86/AMD: add IRPerf support

Message ID 6cd765e98fa4888b9e94215f3572a94e95fe2a4b.1698261255.git.edwin.torok@cloud.com (mailing list archive)
State New, archived
Series vPMU bugfixes and support for PMUv5

Commit Message

Edwin Torok Oct. 25, 2023, 7:29 p.m. UTC
From: Edwin Török <edvin.torok@citrix.com>

Add support for the Instructions Retired performance counter (IRPerfCount), enabled by writing a bit in HWCR.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
---
 xen/arch/x86/include/asm/msr-index.h        | 1 +
 xen/arch/x86/msr.c                          | 7 +++++++
 xen/include/public/arch-x86/cpufeatureset.h | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

Comments

Andrew Cooper Oct. 25, 2023, 8:59 p.m. UTC | #1
On 25/10/2023 8:29 pm, Edwin Török wrote:
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 483b5e4f70..b3cd851d9d 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>          }
>          break;
>  
> +    case MSR_K8_HWCR:
> +        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
> +             (val & ~K8_HWCR_IRPERF_EN) ||
> +             wrmsr_safe(msr, val) != 0 )
> +            goto gp_fault;
> +        break;
> +
>      case MSR_AMD64_DE_CFG:
>          /*
>           * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
> index 5faca0bf7a..40f74cd5e8 100644
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
>  
>  /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>  XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
> -XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance Counter */
> +XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance Counter */
>  XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
>  XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
>  XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */

Last things first.  You can advertise this bit to guests, but I'm
looking at the AMD manuals and finding woefully little information on
what this is and how it works.

It does have an architectural enumeration, but there isn't even a
description of what HWCR.IRPERF_EN does.

The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but
the IRPERF MSR says no such thing, noting only that EFRO_LOCK makes the
MSR non-writeable (which in turn means we can't always offer it to
guests anyway.  See below).

At a guess, the HWCR bit activates the counter, but nothing states
this.  At a guess, it's a free-running, resettable counter, but again
nothing states this.  The manual just says it "is a dedicated counter" and
doesn't even state whether it is (or is not) tied into the other core
fixed counters, nor whether it is integrated with PMI.

Which brings us on to the middle hunk.  Putting it there short-circuits
the PV-specific handling, but you can't do that in general without
arranging for HWCR to be context-switched properly for vCPUs, nor in
this case IRPERF_COUNT itself.

Unless you context-switch HWCR and IRPERF_COUNT, a guest will see it not
counting, or the count going backwards, 50% of the time as the vCPU is
scheduled around.

So while in principle we could let guests use this facility (it would
have to be off by default, because there doesn't appear to be any
filtering capability in it at all, so we can't restrict it to
instructions retired in guest context), it will need a far larger patch
than this to do it safely.

~Andrew
Edwin Torok Oct. 26, 2023, 4:39 p.m. UTC | #2
> On 25 Oct 2023, at 21:59, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> 
> On 25/10/2023 8:29 pm, Edwin Török wrote:
>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>> index 483b5e4f70..b3cd851d9d 100644
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>         }
>>         break;
>> 
>> +    case MSR_K8_HWCR:
>> +        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
>> +             (val & ~K8_HWCR_IRPERF_EN) ||
>> +             wrmsr_safe(msr, val) != 0 )
>> +            goto gp_fault;
>> +        break;
>> +
>>     case MSR_AMD64_DE_CFG:
>>         /*
>>          * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
>> index 5faca0bf7a..40f74cd5e8 100644
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
>> 
>> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>> XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
>> -XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance Counter */
>> +XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance Counter */
>> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
>> XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
>> XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
> 
> Last things first.  You can advertise this bit to guests, but I'm
> looking at the AMD manuals and finding woefully little information on
> what this is and how it works.
> 
> It does have an architectural enumeration, but there isn't even a
> description of what HWCR.IRPERF_EN does.
> 
> The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but
> the IRPERF MSR says no such thing,

> noting only that EFRO_LOCK makes the
> MSR non-writeable (which in turn means we can't always offer it to
> guests anyway.  See below).

> At a guess, the HWCR bit activates the counter, but nothing states
> this.


My version (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf) says:
"This is a dedicated counter that is always counting instructions retired. It exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and its support is indicated by CPUID Fn8000_0008_EBX[1]."

Although digging a bit more, there are some errata around deep C-states on some chips, and the HWCR register itself is not global but per-CCD (which is neatly buried in the _ccd[7:0]_.... small print at the top):
https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147


>   At a guess, it's a free-running, resettable counter, but again
> nothing states this.  The manual just says it "is a dedicated counter" and
> doesn't even state whether it is (or is not) tied into the other core
> fixed counters, nor whether it is integrated with PMI.

There is also an LWP 'Instructions Retired' counter described in 13.4.2.2,
which is user-mode only and per-core, but I assume that is completely
different from this MSR and part of the features that got dropped in newer
cores.

> 
> Which brings us on to the middle hunk.  Putting it there short-circuits
> the PV-specific handling, but you can't do that in general without
> arranging for HWCR to be context-switched properly for vCPUs, nor in
> this case IRPERF_COUNT itself.
> 
> Unless you context-switch HWCR and IRPERF_COUNT, a guest will see it not
> counting, or the count going backwards, 50% of the time as the vCPU is
> scheduled around.
> 
> So while in principle we could let guests use this facility (it would
> have to be off by default, because there doesn't appear to be any
> filtering capability in it at all, so we can't restrict it to
> instructions retired in guest context), it will need a far larger patch
> than this to do it safely.

Thanks, I'll move this patch into the 'needs more work' category.


Thanks,
--Edwin
Andrew Cooper Oct. 26, 2023, 8:38 p.m. UTC | #3
On 26/10/2023 5:39 pm, Edwin Torok wrote:
>> On 25 Oct 2023, at 21:59, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>
>> On 25/10/2023 8:29 pm, Edwin Török wrote:
>>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>>> index 483b5e4f70..b3cd851d9d 100644
>>> --- a/xen/arch/x86/msr.c
>>> +++ b/xen/arch/x86/msr.c
>>> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>>         }
>>>         break;
>>>
>>> +    case MSR_K8_HWCR:
>>> +        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
>>> +             (val & ~K8_HWCR_IRPERF_EN) ||
>>> +             wrmsr_safe(msr, val) != 0 )
>>> +            goto gp_fault;
>>> +        break;
>>> +
>>>     case MSR_AMD64_DE_CFG:
>>>         /*
>>>          * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
>>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
>>> index 5faca0bf7a..40f74cd5e8 100644
>>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>>> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
>>>
>>> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>>> XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
>>> -XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance Counter */
>>> +XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance Counter */
>>> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
>>> XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
>>> XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
>> Last things first.  You can advertise this bit to guests, but I'm
>> looking at the AMD manuals and finding woefully little information on
>> what this is and how it works.
>>
>> It does have an architectural enumeration, but there isn't even a
>> description of what HWCR.IRPERF_EN does.
>>
>> The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but
>> the IRPERF MSR says no such thing,
>> noting only that EFRO_LOCK makes the
>> MSR non-writeable (which in turn means we can't always offer it to
>> guests anyway.  See below).
>> At a guess, the HWCR bit activates the counter, but nothing states
>> this.
>
> My version (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf) says:
> "This is a dedicated counter that is always counting instructions retired. It exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and its support is indicated by CPUID Fn8000_0008_EBX[1]."
>
> Although digging a bit more there are some erratas around deep C states on some chips and the HWCR register itself is not global but per CCD (which is neatly buried in the _ccd[7:0]_.... smallprint at the top:
> https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147

HWCR is per-thread, and that's what both screenshots there say.  Ryzens
don't typically have CCDs, which is why CCD is missing from the scope
description.

Consider the implications of making bit 30 work if the MSR were CCD-wide,
seeing as the counting really does need to be done at the core level.

Or for that matter bit 35 (the CPUID-faulting enable bit).

>>   At a guess, it's a free-running, resettable counter, but again
>> nothing states this.  The manual just says it "is a dedicated counter" and
>> doesn't even state whether it is (or is not) tied into the other core
>> fixed counters, nor whether it is integrated with PMI.
> There is also a LWP 'Instructions Retired' described in 13.4.2.2, which is user-mode only and per-core, but I assume that is completely different from this MSR
> and part of the features that got dropped in newer cores.

LWP doesn't exist any more, even in older processors.

It was the feature sacrificed to free up enough microcode space to
retrofit IBPB for the speculation protections.

~Andrew
Jan Beulich July 22, 2024, 1:50 p.m. UTC | #4
On 25.10.2023 21:29, Edwin Török wrote:
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>          }
>          break;
>  
> +    case MSR_K8_HWCR:
> +        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
> +             (val & ~K8_HWCR_IRPERF_EN) ||
> +             wrmsr_safe(msr, val) != 0 )
> +            goto gp_fault;
> +        break;

Well, no: there's nowhere you ever turn the bit off again.  There's no
corresponding adjustment to guest_rdmsr().  There's no migration of the bit
guests gain control of.  You also shouldn't permit writing the bit when
the feature flag is clear.

> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
>  
>  /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>  XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
> -XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance Counter */
> +XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance Counter */

Again I don't see why you have the ! here.  There's also missing movement of
the code the previous patch added to recalculate_misc(), which now ought to
be in calculate_host_policy().

Jan

Patch

diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 061b07c7ae..1d94fe3a5b 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -393,6 +393,7 @@ 
 
 #define MSR_K8_HWCR			0xc0010015
 #define K8_HWCR_TSC_FREQ_SEL		(1ULL << 24)
+#define K8_HWCR_IRPERF_EN		(1ULL << 30)
 #define K8_HWCR_CPUID_USER_DIS		(1ULL << 35)
 
 #define MSR_K7_FID_VID_CTL		0xc0010041
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 483b5e4f70..b3cd851d9d 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -584,6 +584,13 @@  int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         }
         break;
 
+    case MSR_K8_HWCR:
+        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
+             (val & ~K8_HWCR_IRPERF_EN) ||
+             wrmsr_safe(msr, val) != 0 )
+            goto gp_fault;
+        break;
+
     case MSR_AMD64_DE_CFG:
         /*
          * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5faca0bf7a..40f74cd5e8 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -241,7 +241,7 @@  XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
 
 /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
 XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
-XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance Counter */
+XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance Counter */
 XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
 XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */