Message ID | 64137e23-e374-4129-8e3c-dcd7606364d4@linux.alibaba.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | x86/cstate: fix mwait hint target cstate calc | expand |
On Wed, Feb 28, 2024 at 8:28 AM He Rongguang <herongguang@linux.alibaba.com> wrote: > > According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints > Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is > the corresponding cstate, and 0xF means C0, so fix the handling of > 0xF -> C0. > > Intel: "Value of 0 means C1; 1 means C2 and so on > Value of 01111B means C0". > > AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place > the value F in EAX[7:4] and to request C1 is to place the value 0 in > EAX[7:4].". Yes, 0x0F is defined to correspond to C0. However, the value in question is never equal to 0x0F in any of the functions modified by this patch. What's the purpose of the modification, then? > Signed-off-by: He Rongguang <herongguang@linux.alibaba.com> > --- > arch/x86/kernel/acpi/cstate.c | 4 ++-- > drivers/idle/intel_idle.c | 3 ++- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c > index 401808b47af3..f3ffd0a3a012 100644 > --- a/arch/x86/kernel/acpi/cstate.c > +++ b/arch/x86/kernel/acpi/cstate.c > @@ -131,8 +131,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void > *_cx) > cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx); > > /* Check whether this particular cx_type (in CST) is supported > or not */ > - cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) & > - MWAIT_CSTATE_MASK) + 1; > + cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) & > + MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK; > edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE); > num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK; > > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > index bcf1198e8991..e486027f8b07 100644 > --- a/drivers/idle/intel_idle.c > +++ b/drivers/idle/intel_idle.c > @@ -1934,7 +1934,8 @@ static void __init spr_idle_state_table_update(void) > > static bool __init intel_idle_verify_cstate(unsigned int mwait_hint) > { > - unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1; > + unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) & > + MWAIT_CSTATE_MASK; > unsigned int num_substates = (mwait_substates >> mwait_cstate * > 4) & > MWAIT_SUBSTATE_MASK; > > --
在 2024/3/1 1:22, Rafael J. Wysocki 写道: > On Wed, Feb 28, 2024 at 8:28 AM He Rongguang > <herongguang@linux.alibaba.com> wrote: >> >> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints >> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is >> the corresponding cstate, and 0xF means C0, so fix the handling of >> 0xF -> C0. >> >> Intel: "Value of 0 means C1; 1 means C2 and so on >> Value of 01111B means C0". >> >> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place >> the value F in EAX[7:4] and to request C1 is to place the value 0 in >> EAX[7:4].". > > Yes, 0x0F is defined to correspond to C0. However, the value in > question is never equal to 0x0F in any of the functions modified by > this patch. > > What's the purpose of the modification, then? > Hi, this is found when I tweak ACPI cstate table qemu presenting to VM. Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI firmware from presenting a cstate (maybe C1+) but using a mwait address C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case, Linux erroneously treat this cstate as C16, while actually this should be legal C0 state. As I think ACPI firmware is out of Linux kernel scope, so I think it’s better for Linux kernel to implement here referring to spec, how do you think? :) >> Signed-off-by: He Rongguang <herongguang@linux.alibaba.com> >> --- >> arch/x86/kernel/acpi/cstate.c | 4 ++-- >> drivers/idle/intel_idle.c | 3 ++- >> 2 files changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c >> index 401808b47af3..f3ffd0a3a012 100644 >> --- a/arch/x86/kernel/acpi/cstate.c >> +++ b/arch/x86/kernel/acpi/cstate.c >> @@ -131,8 +131,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void >> *_cx) >> cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx); >> >> /* Check whether this particular cx_type (in CST) is supported >> or not */ >> - cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) & >> - MWAIT_CSTATE_MASK) + 1; >> + cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) & >> + MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK; >> edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE); >> num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK; >> >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c >> index bcf1198e8991..e486027f8b07 100644 >> --- a/drivers/idle/intel_idle.c >> +++ b/drivers/idle/intel_idle.c >> @@ -1934,7 +1934,8 @@ static void __init spr_idle_state_table_update(void) >> >> static bool __init intel_idle_verify_cstate(unsigned int mwait_hint) >> { >> - unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1; >> + unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) & >> + MWAIT_CSTATE_MASK; >> unsigned int num_substates = (mwait_substates >> mwait_cstate * >> 4) & >> MWAIT_SUBSTATE_MASK; >> >> --
On Fri, Mar 1, 2024 at 11:02 AM He Rongguang <herongguang@linux.alibaba.com> wrote: > > 在 2024/3/1 1:22, Rafael J. Wysocki 写道: > > On Wed, Feb 28, 2024 at 8:28 AM He Rongguang > > <herongguang@linux.alibaba.com> wrote: > >> > >> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints > >> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is > >> the corresponding cstate, and 0xF means C0, so fix the handling of > >> 0xF -> C0. > >> > >> Intel: "Value of 0 means C1; 1 means C2 and so on > >> Value of 01111B means C0". > >> > >> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place > >> the value F in EAX[7:4] and to request C1 is to place the value 0 in > >> EAX[7:4].". > > > > Yes, 0x0F is defined to correspond to C0. However, the value in > > question is never equal to 0x0F in any of the functions modified by > > this patch. > > > > What's the purpose of the modification, then? > > > > Hi, this is found when I tweak ACPI cstate table qemu presenting to VM. > > Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI > firmware from presenting a cstate (maybe C1+) but using a mwait address > C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case, > Linux erroneously treat this cstate as C16, while actually this should > be legal C0 state. > > As I think ACPI firmware is out of Linux kernel scope, so I think it’s > better for Linux kernel to implement here referring to spec, how do you > think? :) That's fair, but you need to put the above information into the patch changelog. Otherwise it is unclear why you want to make this change.
在 2024/3/1 19:04, Rafael J. Wysocki 写道: > On Fri, Mar 1, 2024 at 11:02 AM He Rongguang > <herongguang@linux.alibaba.com> wrote: >> >> 在 2024/3/1 1:22, Rafael J. Wysocki 写道: >>> On Wed, Feb 28, 2024 at 8:28 AM He Rongguang >>> <herongguang@linux.alibaba.com> wrote: >>>> >>>> According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints >>>> Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is >>>> the corresponding cstate, and 0xF means C0, so fix the handling of >>>> 0xF -> C0. >>>> >>>> Intel: "Value of 0 means C1; 1 means C2 and so on >>>> Value of 01111B means C0". >>>> >>>> AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place >>>> the value F in EAX[7:4] and to request C1 is to place the value 0 in >>>> EAX[7:4].". >>> >>> Yes, 0x0F is defined to correspond to C0. However, the value in >>> question is never equal to 0x0F in any of the functions modified by >>> this patch. >>> >>> What's the purpose of the modification, then? >>> >> >> Hi, this is found when I tweak ACPI cstate table qemu presenting to VM. >> >> Usually, ACPI cstate table only contains C1+, but nothing prevents ACPI >> firmware from presenting a cstate (maybe C1+) but using a mwait address >> C0 (i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case, >> Linux erroneously treat this cstate as C16, while actually this should >> be legal C0 state. >> >> As I think ACPI firmware is out of Linux kernel scope, so I think it’s >> better for Linux kernel to implement here referring to spec, how do you >> think? :) > > That's fair, but you need to put the above information into the patch > changelog. Otherwise it is unclear why you want to make this change. Ok, will send a V2 patch.
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c index 401808b47af3..f3ffd0a3a012 100644 --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -131,8 +131,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void *_cx) cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx); /* Check whether this particular cx_type (in CST) is supported or not */ - cstate_type = ((cx->address >> MWAIT_SUBSTATE_SIZE) & - MWAIT_CSTATE_MASK) + 1; + cstate_type = (((cx->address >> MWAIT_SUBSTATE_SIZE) & + MWAIT_CSTATE_MASK) + 1) & MWAIT_CSTATE_MASK; edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE); num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK; diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index bcf1198e8991..e486027f8b07 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -1934,7 +1934,8 @@ static void __init spr_idle_state_table_update(void) static bool __init intel_idle_verify_cstate(unsigned int mwait_hint) { - unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1; + unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) & + MWAIT_CSTATE_MASK; unsigned int num_substates = (mwait_substates >> mwait_cstate * 4) & MWAIT_SUBSTATE_MASK;
According to x86 manual (Intel SDM Vol 2, Table 4-11. MWAIT Hints Register (EAX) and AMD manual Vol 3, MWAIT), mwait hint[7:4] adds 1 is the corresponding cstate, and 0xF means C0, so fix the handling of 0xF -> C0. Intel: "Value of 0 means C1; 1 means C2 and so on Value of 01111B means C0". AMD: "The processor C-state is EAX[7:4]+1, so to request C0 is to place the value F in EAX[7:4] and to request C1 is to place the value 0 in EAX[7:4].". Signed-off-by: He Rongguang <herongguang@linux.alibaba.com> --- arch/x86/kernel/acpi/cstate.c | 4 ++-- drivers/idle/intel_idle.c | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) -- 2.43.0