diff mbox series

x86/cstate: fix mwait hint target cstate calc

Message ID 64137e23-e374-4129-8e3c-dcd7606364d4@linux.alibaba.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series x86/cstate: fix mwait hint target cstate calc | expand

Commit Message

He Rongguang Feb. 28, 2024, 7:28 a.m. UTC
According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints
Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is
the corresponding cstate, and 0xF means C0, so fix the handling of
0xF -> C0.

Intel: "Value of 0 means C1; 1 means C2 and so on
Value of 01111B means C0".

AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place
the value F in EAX[7:4] and to request C1 is to place the value 0 in
EAX[7:4].".

Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
---
  arch/x86/kernel/acpi/cstate.c | 4 ++--
  drivers/idle/intel_idle.c     | 3 ++-
  2 files changed, 4 insertions(+), 3 deletions(-)


--
2.43.0

Comments

Rafael J. Wysocki Feb. 29, 2024, 5:22 p.m. UTC | #1
On Wed, Feb 28, 2024 at 8:28 AM He Rongguang
<herongguang@linux.alibaba.com> wrote:
>
> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints
> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is
> the corresponding cstate, and 0xF means C0, so fix the handling of
> 0xF -> C0.
>
> Intel: "Value of 0 means C1; 1 means C2 and so on
> Value of 01111B means C0".
>
> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place
> the value F in EAX[7:4] and to request C1 is to place the value 0 in
> EAX[7:4].".

Yes, 0x0F is defined to correspond to C0.  However, the value in
question is never equal to 0x0F in any of the functions modified by
this patch.

What's the purpose of the modification, then?

> Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
> ---
>   arch/x86/kernel/acpi/cstate.c | 4 ++--
>   drivers/idle/intel_idle.c     | 3 ++-
>   2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index 401808b47af3..f3ffd0a3a012 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -131,8 +131,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void
> *_cx)
>          cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
>
>          /* Check whether this particular cx_type (in CST) is supported
> or not */
> -       cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) &
> -                       MWAIT_CSTATE_MASK) + 1;
> +       cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) &
> +                       MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK;
>          edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
>          num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index bcf1198e8991..e486027f8b07 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1934,7 +1934,8 @@ static void __init spr_idle_state_table_update(void)
>
>   static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>   {
> -       unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
> +       unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
> +                                       MWAIT_CSTATE_MASK;
>          unsigned int num_substates = (mwait_substates >> mwait_cstate *
> 4) &
>                                          MWAIT_SUBSTATE_MASK;
>
> --
He Rongguang March 1, 2024, 10:02 a.m. UTC | #2
在 2024/3/1 1:22, Rafael J. Wysocki 写道:
> On Wed, Feb 28, 2024 at 8:28 AM He Rongguang
> <herongguang@linux.alibaba.com> wrote:
>>
>> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints
>> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is
>> the corresponding cstate, and 0xF means C0, so fix the handling of
>> 0xF -> C0.
>>
>> Intel: "Value of 0 means C1; 1 means C2 and so on
>> Value of 01111B means C0".
>>
>> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place
>> the value F in EAX[7:4] and to request C1 is to place the value 0 in
>> EAX[7:4].".
> 
> Yes, 0x0F is defined to correspond to C0.  However, the value in
> question is never equal to 0x0F in any of the functions modified by
> this patch.
> 
> What's the purpose of the modification, then?
> 

Hi, this is found when I tweak ACPI cstate table qemu presenting to VM.

Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI 
firmware from presenting a cstate (maybe C1+) but using a mwait address 
C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case, 
Linux erroneously treat this cstate as C16, while actually this should 
be legal C0 state.

As I think ACPI firmware is out of Linux kernel scope, so I think it’s 
better for Linux kernel to implement here referring to spec, how do you 
think? :)


>> Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
>> ---
>>    arch/x86/kernel/acpi/cstate.c | 4 ++--
>>    drivers/idle/intel_idle.c     | 3 ++-
>>    2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
>> index 401808b47af3..f3ffd0a3a012 100644
>> --- a/arch/x86/kernel/acpi/cstate.c
>> +++ b/arch/x86/kernel/acpi/cstate.c
>> @@ -131,8 +131,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void
>> *_cx)
>>           cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
>>
>>           /* Check whether this particular cx_type (in CST) is supported
>> or not */
>> -       cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) &
>> -                       MWAIT_CSTATE_MASK) + 1;
>> +       cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) &
>> +                       MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK;
>>           edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
>>           num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;
>>
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index bcf1198e8991..e486027f8b07 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -1934,7 +1934,8 @@ static void __init spr_idle_state_table_update(void)
>>
>>    static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>>    {
>> -       unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
>> +       unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
>> +                                       MWAIT_CSTATE_MASK;
>>           unsigned int num_substates = (mwait_substates >> mwait_cstate *
>> 4) &
>>                                           MWAIT_SUBSTATE_MASK;
>>
>> --
Rafael J. Wysocki March 1, 2024, 11:04 a.m. UTC | #3
On Fri, Mar 1, 2024 at 11:02 AM He Rongguang
<herongguang@linux.alibaba.com> wrote:
>
> 在 2024/3/1 1:22, Rafael J. Wysocki 写道:
> > On Wed, Feb 28, 2024 at 8:28 AM He Rongguang
> > <herongguang@linux.alibaba.com> wrote:
> >>
> >> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints
> >> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is
> >> the corresponding cstate, and 0xF means C0, so fix the handling of
> >> 0xF -> C0.
> >>
> >> Intel: "Value of 0 means C1; 1 means C2 and so on
> >> Value of 01111B means C0".
> >>
> >> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place
> >> the value F in EAX[7:4] and to request C1 is to place the value 0 in
> >> EAX[7:4].".
> >
> > Yes, 0x0F is defined to correspond to C0.  However, the value in
> > question is never equal to 0x0F in any of the functions modified by
> > this patch.
> >
> > What's the purpose of the modification, then?
> >
>
> Hi, this is found when I tweak ACPI cstate table qemu presenting to VM.
>
> Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI
> firmware from presenting a cstate (maybe C1+) but using a mwait address
> C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case,
> Linux erroneously treat this cstate as C16, while actually this should
> be legal C0 state.
>
> As I think ACPI firmware is out of Linux kernel scope, so I think it’s
> better for Linux kernel to implement here referring to spec, how do you
> think? :)

That's fair, but you need to put the above information into the patch
changelog.  Otherwise it is unclear why you want to make this change.
He Rongguang March 3, 2024, 4:13 a.m. UTC | #4
在 2024/3/1 19:04, Rafael J. Wysocki 写道:
> On Fri, Mar 1, 2024 at 11:02 AM He Rongguang
> <herongguang@linux.alibaba.com> wrote:
>>
>> 在 2024/3/1 1:22, Rafael J. Wysocki 写道:
>>> On Wed, Feb 28, 2024 at 8:28 AM He Rongguang
>>> <herongguang@linux.alibaba.com> wrote:
>>>>
>>>> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints
>>>> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is
>>>> the corresponding cstate, and 0xF means C0, so fix the handling of
>>>> 0xF -> C0.
>>>>
>>>> Intel: "Value of 0 means C1; 1 means C2 and so on
>>>> Value of 01111B means C0".
>>>>
>>>> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place
>>>> the value F in EAX[7:4] and to request C1 is to place the value 0 in
>>>> EAX[7:4].".
>>>
>>> Yes, 0x0F is defined to correspond to C0.  However, the value in
>>> question is never equal to 0x0F in any of the functions modified by
>>> this patch.
>>>
>>> What's the purpose of the modification, then?
>>>
>>
>> Hi, this is found when I tweak ACPI cstate table qemu presenting to VM.
>>
>> Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI
>> firmware from presenting a cstate (maybe C1+) but using a mwait address
>> C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case,
>> Linux erroneously treat this cstate as C16, while actually this should
>> be legal C0 state.
>>
>> As I think ACPI firmware is out of Linux kernel scope, so I think it’s
>> better for Linux kernel to implement here referring to spec, how do you
>> think? :)
> 
> That's fair, but you need to put the above information into the patch
> changelog.  Otherwise it is unclear why you want to make this change.

Ok, will send a V2 patch.
diff mbox series

Patch

diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 401808b47af3..f3ffd0a3a012 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -131,8 +131,8 @@  static long acpi_processor_ffh_cstate_probe_cpu(void 
*_cx)
         cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);

         /* Check whether this particular cx_type (in CST) is supported 
or not */
-       cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) &
-                       MWAIT_CSTATE_MASK) + 1;
+       cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) &
+                       MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK;
         edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
         num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index bcf1198e8991..e486027f8b07 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1934,7 +1934,8 @@  static void __init spr_idle_state_table_update(void)

  static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
  {
-       unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
+       unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
+                                       MWAIT_CSTATE_MASK;
         unsigned int num_substates = (mwait_substates >> mwait_cstate * 
4) &
                                         MWAIT_SUBSTATE_MASK;