diff mbox series

target/i386: Fix CPUID encoding of Fn8000001E_ECX

Message ID 20231110170806.70962-1-babu.moger@amd.com (mailing list archive)
State New, archived
Headers show
Series target/i386: Fix CPUID encoding of Fn8000001E_ECX | expand

Commit Message

Babu Moger Nov. 10, 2023, 5:08 p.m. UTC
Observed the following failure while booting the SEV-SNP guest and the
guest fails to boot with the smp parameters:
"-smp 192,sockets=1,dies=12,cores=8,threads=2".

qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
qemu-system-x86_64: SEV-SNP: failed update CPUID page

Reason for the failure is due to overflowing of bits used for "Node per
processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
over into the reserved bits. In the case of SEV-SNP, this causes CPUID
enforcement failure and guest fails to boot.

The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
=================================================================
Bits    Description
31:11   Reserved.

10:8    NodesPerProcessor: Node per processor. Read-only.
        ValidValues:
        Value   Description
        0h      1 node per processor.
        7h-1h   Reserved.

7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
=================================================================

As in the spec, the valid value for "node per processor" is 0 and rest
are reserved.

Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
that there were cases where "node per processor" can be more than 1. It
is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
CPUs, the linux kernel does not use this information to build the L3
topology.

Also noted that the CPUID Function 0x8000001E_ECX is available only when
TOPOEXT feature is enabled. This feature is enabled only for EPYC(F17h)
or later processors. So, previous generation of processors do not not
enumerate 0x8000001E_ECX leaf.

There could be some corner cases where the older guests could enable the
TOPOEXT feature by running with -cpu host, in which case legacy guests
might notice the topology change. To address those cases introduced a
new CPU property "legacy-multi-node". It will be true for older machine
types to maintain compatibility. By default, it will be false, so new
decoding will be used going forward.

The documentation is taken from Preliminary Processor Programming
Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
Rev 0.25 - Oct 6, 2022.

Cc: qemu-stable@nongnu.org
Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 hw/i386/pc.c      |  4 +++-
 target/i386/cpu.c | 18 ++++++++++--------
 target/i386/cpu.h |  1 +
 3 files changed, 14 insertions(+), 9 deletions(-)

Comments

Babu Moger Dec. 13, 2023, 2:57 p.m. UTC | #1
Gentle reminder. Please let me know if there are any concerns or please
pull these patches for next update.
Thanks Babu

On 11/10/23 11:08, Babu Moger wrote:
> Observed the following failure while booting the SEV-SNP guest and the
> guest fails to boot with the smp parameters:
> "-smp 192,sockets=1,dies=12,cores=8,threads=2".
> 
> qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
> qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
> provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
> expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
> qemu-system-x86_64: SEV-SNP: failed update CPUID page
> 
> Reason for the failure is due to overflowing of bits used for "Node per
> processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
> can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
> over into the reserved bits. In the case of SEV-SNP, this causes CPUID
> enforcement failure and guest fails to boot.
> 
> The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
> =================================================================
> Bits    Description
> 31:11   Reserved.
> 
> 10:8    NodesPerProcessor: Node per processor. Read-only.
>         ValidValues:
>         Value   Description
>         0h      1 node per processor.
>         7h-1h   Reserved.
> 
> 7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
> =================================================================
> 
> As in the spec, the valid value for "node per processor" is 0 and rest
> are reserved.
> 
> Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
> that there were cases where "node per processor" can be more than 1. It
> is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
> CPUs, the linux kernel does not use this information to build the L3
> topology.
> 
> Also noted that the CPUID Function 0x8000001E_ECX is available only when
> TOPOEXT feature is enabled. This feature is enabled only for EPYC(F17h)
> or later processors. So, previous generation of processors do not not
> enumerate 0x8000001E_ECX leaf.
> 
> There could be some corner cases where the older guests could enable the
> TOPOEXT feature by running with -cpu host, in which case legacy guests
> might notice the topology change. To address those cases introduced a
> new CPU property "legacy-multi-node". It will be true for older machine
> types to maintain compatibility. By default, it will be false, so new
> decoding will be used going forward.
> 
> The documentation is taken from Preliminary Processor Programming
> Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
> Rev 0.25 - Oct 6, 2022.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  hw/i386/pc.c      |  4 +++-
>  target/i386/cpu.c | 18 ++++++++++--------
>  target/i386/cpu.h |  1 +
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 188bc9d0f8..624d5da146 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -77,7 +77,9 @@
>      { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
>      { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
>  
> -GlobalProperty pc_compat_8_1[] = {};
> +GlobalProperty pc_compat_8_1[] = {
> +    { TYPE_X86_CPU, "legacy-multi-node", "on" },
> +};
>  const size_t pc_compat_8_1_len = G_N_ELEMENTS(pc_compat_8_1);
>  
>  GlobalProperty pc_compat_8_0[] = {
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 358d9c0a65..baee9394a1 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -398,12 +398,9 @@ static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
>       * 31:11 Reserved.
>       * 10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
>       *      ValidValues:
> -     *      Value Description
> -     *      000b  1 node per processor.
> -     *      001b  2 nodes per processor.
> -     *      010b Reserved.
> -     *      011b 4 nodes per processor.
> -     *      111b-100b Reserved.
> +     *      Value   Description
> +     *      0h      1 node per processor.
> +     *      7h-1h   Reserved.
>       *  7:0 NodeId: Node ID. Read-only. Reset: XXh.
>       *
>       * NOTE: Hardware reserves 3 bits for number of nodes per processor.
> @@ -412,8 +409,12 @@ static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
>       * NodeId is combination of node and socket_id which is already decoded
>       * in apic_id. Just use it by shifting.
>       */
> -    *ecx = ((topo_info->dies_per_pkg - 1) << 8) |
> -           ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
> +    if (cpu->legacy_multi_node) {
> +        *ecx = ((topo_info->dies_per_pkg - 1) << 8) |
> +               ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
> +    } else {
> +        *ecx = (cpu->apic_id >> apicid_pkg_offset(topo_info)) & 0xFF;
> +    }
>  
>      *edx = 0;
>  }
> @@ -7894,6 +7895,7 @@ static Property x86_cpu_properties[] = {
>       * own cache information (see x86_cpu_load_def()).
>       */
>      DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
> +    DEFINE_PROP_BOOL("legacy-multi-node", X86CPU, legacy_multi_node, false),
>      DEFINE_PROP_BOOL("xen-vapic", X86CPU, xen_vapic, false),
>  
>      /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index cd2e295bd6..7b855924d6 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1988,6 +1988,7 @@ struct ArchCPU {
>       * If true present the old cache topology information
>       */
>      bool legacy_cache;
> +    bool legacy_multi_node;
>  
>      /* Compatibility bits for old machine types: */
>      bool enable_cpuid_0xb;
Zhao Liu Dec. 14, 2023, 2:08 p.m. UTC | #2
On Fri, Nov 10, 2023 at 11:08:06AM -0600, Babu Moger wrote:
> Date: Fri, 10 Nov 2023 11:08:06 -0600
> From: Babu Moger <babu.moger@amd.com>
> Subject: [PATCH] target/i386: Fix CPUID encoding of Fn8000001E_ECX
> X-Mailer: git-send-email 2.34.1
> 
> Observed the following failure while booting the SEV-SNP guest and the
> guest fails to boot with the smp parameters:
> "-smp 192,sockets=1,dies=12,cores=8,threads=2".
> 
> qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
> qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
> provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
> expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
> qemu-system-x86_64: SEV-SNP: failed update CPUID page
> 
> Reason for the failure is due to overflowing of bits used for "Node per
> processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
> can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
> over into the reserved bits. In the case of SEV-SNP, this causes CPUID
> enforcement failure and guest fails to boot.
> 
> The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
> =================================================================
> Bits    Description
> 31:11   Reserved.
> 
> 10:8    NodesPerProcessor: Node per processor. Read-only.
>         ValidValues:
>         Value   Description
>         0h      1 node per processor.
>         7h-1h   Reserved.
> 
> 7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
> =================================================================
> 
> As in the spec, the valid value for "node per processor" is 0 and rest
> are reserved.
> 
> Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
> that there were cases where "node per processor" can be more than 1. It
> is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
> CPUs, the linux kernel does not use this information to build the L3
> topology.
> 
> Also noted that the CPUID Function 0x8000001E_ECX is available only when
> TOPOEXT feature is enabled. 

One additional query, such dependency relationship is not reflected in
encode_topo_cpuid8000001e(), should TOPOEXT be checked in
encode_topo_cpuid8000001e()?

> This feature is enabled only for EPYC(F17h)
> or later processors. So, previous generation of processors do not not
> enumerate 0x8000001E_ECX leaf.
> 
> There could be some corner cases where the older guests could enable the
> TOPOEXT feature by running with -cpu host, in which case legacy guests
> might notice the topology change. To address those cases introduced a
> new CPU property "legacy-multi-node". It will be true for older machine
> types to maintain compatibility. By default, it will be false, so new
> decoding will be used going forward.
> 
> The documentation is taken from Preliminary Processor Programming
> Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
> Rev 0.25 - Oct 6, 2022.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

[snip]

> +++ b/target/i386/cpu.h
> @@ -1988,6 +1988,7 @@ struct ArchCPU {
>       * If true present the old cache topology information
>       */
>      bool legacy_cache;
> +    bool legacy_multi_node;

This property deserves a comment, as does legacy_cache above.

>  
>      /* Compatibility bits for old machine types: */
>      bool enable_cpuid_0xb;
> -- 
> 2.34.1
> 

Just the above nit, otherwise,
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Moger, Babu Dec. 14, 2023, 6:37 p.m. UTC | #3
Hi Zhao,

On 12/14/2023 8:08 AM, Zhao Liu wrote:
> On Fri, Nov 10, 2023 at 11:08:06AM -0600, Babu Moger wrote:
>> Date: Fri, 10 Nov 2023 11:08:06 -0600
>> From: Babu Moger <babu.moger@amd.com>
>> Subject: [PATCH] target/i386: Fix CPUID encoding of Fn8000001E_ECX
>> X-Mailer: git-send-email 2.34.1
>>
>> Observed the following failure while booting the SEV-SNP guest and the
>> guest fails to boot with the smp parameters:
>> "-smp 192,sockets=1,dies=12,cores=8,threads=2".
>>
>> qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
>> qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
>> provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
>> expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
>> qemu-system-x86_64: SEV-SNP: failed update CPUID page
>>
>> Reason for the failure is due to overflowing of bits used for "Node per
>> processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
>> can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
>> over into the reserved bits. In the case of SEV-SNP, this causes CPUID
>> enforcement failure and guest fails to boot.
>>
>> The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
>> =================================================================
>> Bits    Description
>> 31:11   Reserved.
>>
>> 10:8    NodesPerProcessor: Node per processor. Read-only.
>>          ValidValues:
>>          Value   Description
>>          0h      1 node per processor.
>>          7h-1h   Reserved.
>>
>> 7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
>> =================================================================
>>
>> As in the spec, the valid value for "node per processor" is 0 and rest
>> are reserved.
>>
>> Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
>> that there were cases where "node per processor" can be more than 1. It
>> is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
>> CPUs, the linux kernel does not use this information to build the L3
>> topology.
>>
>> Also noted that the CPUID Function 0x8000001E_ECX is available only when
>> TOPOEXT feature is enabled.
> One additional query, such dependency relationship is not reflected in
> encode_topo_cpuid8000001e(), should TOPOEXT be checked in
> encode_topo_cpuid8000001e()?
No. We don't need to check in encode_topo_cpuid8000001e. Dependency 
check is done earlier than this is called.
>
>> This feature is enabled only for EPYC(F17h)
>> or later processors. So, previous generation of processors do not not
>> enumerate 0x8000001E_ECX leaf.
>>
>> There could be some corner cases where the older guests could enable the
>> TOPOEXT feature by running with -cpu host, in which case legacy guests
>> might notice the topology change. To address those cases introduced a
>> new CPU property "legacy-multi-node". It will be true for older machine
>> types to maintain compatibility. By default, it will be false, so new
>> decoding will be used going forward.
>>
>> The documentation is taken from Preliminary Processor Programming
>> Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
>> Rev 0.25 - Oct 6, 2022.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> [snip]
>
>> +++ b/target/i386/cpu.h
>> @@ -1988,6 +1988,7 @@ struct ArchCPU {
>>        * If true present the old cache topology information
>>        */
>>       bool legacy_cache;
>> +    bool legacy_multi_node;
> This property deserves a comment, as does legacy_cache above.
Sure. Will do.
>
>>   
>>       /* Compatibility bits for old machine types: */
>>       bool enable_cpuid_0xb;
>> -- 
>> 2.34.1
>>
> Just the above nit, otherwise,
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>

Thank you.

Babu

>
diff mbox series

Patch

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 188bc9d0f8..624d5da146 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -77,7 +77,9 @@ 
     { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
     { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
 
-GlobalProperty pc_compat_8_1[] = {};
+GlobalProperty pc_compat_8_1[] = {
+    { TYPE_X86_CPU, "legacy-multi-node", "on" },
+};
 const size_t pc_compat_8_1_len = G_N_ELEMENTS(pc_compat_8_1);
 
 GlobalProperty pc_compat_8_0[] = {
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 358d9c0a65..baee9394a1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -398,12 +398,9 @@  static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
      * 31:11 Reserved.
      * 10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
      *      ValidValues:
-     *      Value Description
-     *      000b  1 node per processor.
-     *      001b  2 nodes per processor.
-     *      010b Reserved.
-     *      011b 4 nodes per processor.
-     *      111b-100b Reserved.
+     *      Value   Description
+     *      0h      1 node per processor.
+     *      7h-1h   Reserved.
      *  7:0 NodeId: Node ID. Read-only. Reset: XXh.
      *
      * NOTE: Hardware reserves 3 bits for number of nodes per processor.
@@ -412,8 +409,12 @@  static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
      * NodeId is combination of node and socket_id which is already decoded
      * in apic_id. Just use it by shifting.
      */
-    *ecx = ((topo_info->dies_per_pkg - 1) << 8) |
-           ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
+    if (cpu->legacy_multi_node) {
+        *ecx = ((topo_info->dies_per_pkg - 1) << 8) |
+               ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
+    } else {
+        *ecx = (cpu->apic_id >> apicid_pkg_offset(topo_info)) & 0xFF;
+    }
 
     *edx = 0;
 }
@@ -7894,6 +7895,7 @@  static Property x86_cpu_properties[] = {
      * own cache information (see x86_cpu_load_def()).
      */
     DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
+    DEFINE_PROP_BOOL("legacy-multi-node", X86CPU, legacy_multi_node, false),
     DEFINE_PROP_BOOL("xen-vapic", X86CPU, xen_vapic, false),
 
     /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cd2e295bd6..7b855924d6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1988,6 +1988,7 @@  struct ArchCPU {
      * If true present the old cache topology information
      */
     bool legacy_cache;
+    bool legacy_multi_node;
 
     /* Compatibility bits for old machine types: */
     bool enable_cpuid_0xb;