Message ID | 1578388729-55540-1-git-send-email-prime.zeng@hisilicon.com (mailing list archive)
State | New, archived |
Series | hw/arm/acpi: Pack the SRAT processors structure by node_id ascending order
On Tue, Jan 07, 2020 at 05:18:49PM +0800, Zeng Tao wrote:
> When booting the guest Linux with the following NUMA configuration:
>     -numa node,node_id=1,cpus=0-3
>     -numa node,node_id=0,cpus=4-7
> we get the following NUMA topology in the guest system:
>     Architecture:        aarch64
>     Byte Order:          Little Endian
>     CPU(s):              8
>     On-line CPU(s) list: 0-7
>     Thread(s) per core:  1
>     Core(s) per socket:  8
>     Socket(s):           1
>     NUMA node(s):        2
>     L1d cache:           unknown size
>     L1i cache:           unknown size
>     L2 cache:            unknown size
>     NUMA node0 CPU(s):   0-3
>     NUMA node1 CPU(s):   4-7
> CPUs 0-3 are assigned NUMA node 1 in QEMU but get NUMA node 0 in the
> guest.
>
> In fact, in the Linux kernel, numa_node_id is allocated in ACPI SRAT
> processor structure order, so CPU 0 is the first one to allocate its
> NUMA node id and therefore gets NUMA node 0.
>
> To fix this issue, we pack the SRAT processor structures in NUMA node
> id order rather than the default CPU number order.
>
> Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>

Does this matter? If yes, fixing Linux to take the node id from the
proximity field in ACPI seems cleaner ...

> ---
>  hw/arm/virt-acpi-build.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index bd5f771..497192b 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -520,7 +520,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      AcpiSystemResourceAffinityTable *srat;
>      AcpiSratProcessorGiccAffinity *core;
>      AcpiSratMemoryAffinity *numamem;
> -    int i, srat_start;
> +    int i, j, srat_start;
> +    uint32_t node_id;
>      uint64_t mem_base;
>      MachineClass *mc = MACHINE_GET_CLASS(vms);
>      MachineState *ms = MACHINE(vms);
> @@ -530,13 +531,19 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      srat = acpi_data_push(table_data, sizeof(*srat));
>      srat->reserved1 = cpu_to_le32(1);
>
> -    for (i = 0; i < cpu_list->len; ++i) {
> -        core = acpi_data_push(table_data, sizeof(*core));
> -        core->type = ACPI_SRAT_PROCESSOR_GICC;
> -        core->length = sizeof(*core);
> -        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> -        core->acpi_processor_uid = cpu_to_le32(i);
> -        core->flags = cpu_to_le32(1);
> +    for (i = 0; i < ms->numa_state->num_nodes; ++i) {
> +        for (j = 0; j < cpu_list->len; ++j) {

Hmm, O(n^2) isn't great ...

> +            node_id = cpu_to_le32(cpu_list->cpus[j].props.node_id);
> +            if (node_id != i) {
> +                continue;
> +            }
> +            core = acpi_data_push(table_data, sizeof(*core));
> +            core->type = ACPI_SRAT_PROCESSOR_GICC;
> +            core->length = sizeof(*core);
> +            core->proximity = node_id;
> +            core->acpi_processor_uid = cpu_to_le32(j);
> +            core->flags = cpu_to_le32(1);
> +        }
>      }

Is the issue arm-specific? Wouldn't it affect x86 too?

>      mem_base = vms->memmap[VIRT_MEM].base;
> --
> 2.8.1
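A side note on the O(n^2) remark above: the same node-ordered output can be
produced by sorting an index of the CPU list once and emitting the entries in
that order. Below is a minimal standalone C sketch of the idea; the struct and
its field names are illustrative stand-ins for QEMU's possible-CPU list, not
the actual QEMU API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-in for one entry of QEMU's possible-CPU list. */
struct cpu_entry {
    unsigned cpu_index; /* becomes acpi_processor_uid in the SRAT entry */
    unsigned node_id;   /* proximity domain the CPU was assigned to     */
};

/* Order by node_id first; qsort() is not stable, so break ties by CPU
 * index to keep CPU order within a node. */
static int by_node_then_cpu(const void *a, const void *b)
{
    const struct cpu_entry *x = a, *y = b;

    if (x->node_id != y->node_id) {
        return x->node_id < y->node_id ? -1 : 1;
    }
    return x->cpu_index < y->cpu_index ? -1 : (x->cpu_index > y->cpu_index);
}

int main(void)
{
    /* The configuration from the commit message: node 1 has cpus 0-3,
     * node 0 has cpus 4-7. */
    struct cpu_entry cpus[8];

    for (unsigned i = 0; i < 8; i++) {
        cpus[i].cpu_index = i;
        cpus[i].node_id = i < 4 ? 1 : 0;
    }

    /* One O(n log n) sort replaces the O(nodes * cpus) nested loop. */
    qsort(cpus, 8, sizeof(cpus[0]), by_node_then_cpu);

    for (unsigned i = 0; i < 8; i++) {
        printf("SRAT GICC entry: uid=%u proximity=%u\n",
               cpus[i].cpu_index, cpus[i].node_id);
    }
    return 0;
}
```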
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Tuesday, January 07, 2020 5:33 PM
> To: Zengtao (B)
> Cc: qemu-devel@nongnu.org; qemu-trivial@nongnu.org; Shannon Zhao;
> Peter Maydell; Igor Mammedov; qemu-arm@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> node_id ascending order
>
> On Tue, Jan 07, 2020 at 05:18:49PM +0800, Zeng Tao wrote:
> > When booting the guest Linux with the following NUMA configuration:
> >     -numa node,node_id=1,cpus=0-3
> >     -numa node,node_id=0,cpus=4-7
> [...]
> > Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
>
> Does this matter? If yes, fixing Linux to take the node id from the
> proximity field in ACPI seems cleaner ...

In fact, I just want to align the node_id concept in QEMU and Linux.
If we fix the kernel side, we need to align all platforms; I think
that may not be a good idea.

> > [...]
> > +    for (i = 0; i < ms->numa_state->num_nodes; ++i) {
> > +        for (j = 0; j < cpu_list->len; ++j) {
>
> Hmm, O(n^2) isn't great ...

Good suggestion, thank you.

> [...]
>
> Is the issue arm-specific? Wouldn't it affect x86 too?

Good question. I think it will affect x86, but I need to confirm.

> >      mem_base = vms->memmap[VIRT_MEM].base;
> > --
> > 2.8.1
On Tue, 7 Jan 2020 10:29:22 +0000
"Zengtao (B)" <prime.zeng@hisilicon.com> wrote:

> [...]
> > Does this matter? If yes, fixing Linux to take the node id from the
> > proximity field in ACPI seems cleaner ...
>
> In fact, I just want to align the node_id concept in QEMU and Linux.
> If we fix the kernel side, we need to align all platforms; I think
> that may not be a good idea.

If Linux makes up node IDs on its own, it would be hard for it to map
SRAT entries to other tables that also use the proximity id.

So it would need to maintain a map of [proximity id] -> [node id] (and
the reverse) somewhere, to resolve the mappings in the other tables.
If it doesn't do this, then it's broken and works just by accident;
if it does, the fix probably should be in that code and not in QEMU.
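To make the order-dependence concrete, here is a small standalone C sketch of
the first-come node id allocation being described, fed with the SRAT ordering
QEMU currently emits for the commit-message configuration. The names are
illustrative, not the actual kernel parser code.

```c
#include <stdio.h>

/* A GICC affinity entry as it appears in SRAT, reduced to the two
 * fields that matter here (names are illustrative). */
struct gicc_affinity {
    unsigned uid;       /* acpi_processor_uid (the CPU number)      */
    unsigned proximity; /* proximity domain QEMU assigned (node_id) */
};

int main(void)
{
    /* SRAT as QEMU emits it today for the commit-message config:
     * entries in CPU order, cpus 0-3 in proximity 1, cpus 4-7 in 0. */
    struct gicc_affinity srat[8];
    int pxm_to_node[2] = { -1, -1 };
    int next_node = 0;

    for (unsigned i = 0; i < 8; i++) {
        srat[i].uid = i;
        srat[i].proximity = i < 4 ? 1 : 0;
    }

    /* First-come allocation in table order: the first proximity value
     * encountered becomes node 0, the next new one node 1, and so on. */
    for (unsigned i = 0; i < 8; i++) {
        unsigned pxm = srat[i].proximity;

        if (pxm_to_node[pxm] < 0) {
            pxm_to_node[pxm] = next_node++;
        }
        printf("cpu %u: QEMU node %u -> guest node %d\n",
               srat[i].uid, pxm, pxm_to_node[pxm]);
    }
    /* cpus 0-3 end up in guest node 0 although QEMU called their node 1:
     * exactly the mismatch reported in the commit message. */
    return 0;
}
```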
> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: Tuesday, January 07, 2020 11:50 PM
> To: Zengtao (B)
> Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> node_id ascending order
>
> [...]
> If Linux makes up node IDs on its own, it would be hard for it to map
> SRAT entries to other tables that also use the proximity id.
>
> So it would need to maintain a map of [proximity id] -> [node id] (and
> the reverse) somewhere, to resolve the mappings in the other tables.
> If it doesn't do this, then it's broken and works just by accident;
> if it does, the fix probably should be in that code and not in QEMU.

Hmm, the problem is how to understand the concept of node id.

1. In the device tree (dts), there is a node id. Both QEMU and Linux
   can use it directly, so there is no conflict.
2. In ACPI, there is only the proximity domain, no node id; there has
   to be a mapping between them, and the kernel and QEMU each maintain
   their own rule, which unfortunately conflict with each other
   sometimes.

There is no specification saying how we should maintain the mapping,
so it's hard to align QEMU and Linux.

Any suggestion, or do we just accept it as a rule, since it doesn't
affect much?
On Wed, 8 Jan 2020 04:02:10 +0000
"Zengtao (B)" <prime.zeng@hisilicon.com> wrote:

> [...]
> Hmm, the problem is how to understand the concept of node id.
>
> 1. In the device tree (dts), there is a node id. Both QEMU and Linux
>    can use it directly, so there is no conflict.
> 2. In ACPI, there is only the proximity domain, no node id; there has
>    to be a mapping between them, and the kernel and QEMU each maintain
>    their own rule, which unfortunately conflict with each other
>    sometimes.
>
> There is no specification saying how we should maintain the mapping,
> so it's hard to align QEMU and Linux.
>
> Any suggestion, or do we just accept it as a rule, since it doesn't
> affect much?

If node id generation is driven by SRAT content, it might be reasonable
to ask for the SRAT parser in the kernel to create node ids using the
proximity value instead of the order in which ACPI_SRAT_PROCESSOR_GICC
structures are enumerated. That way the node id would match the ACPI
spec.

But even with that I wouldn't expect cpu ids to match, as they are
basically arbitrary numbers on both sides. One would need to use arch
specific ids to reliably match cpus on both sides (MPIDR in the ARM
case or APICID on x86).
> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: Thursday, January 09, 2020 12:39 AM
> To: Zengtao (B)
> Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> node_id ascending order
>
> [...]
> If node id generation is driven by SRAT content, it might be reasonable
> to ask for the SRAT parser in the kernel to create node ids using the
> proximity value instead of the order in which ACPI_SRAT_PROCESSOR_GICC
> structures are enumerated. That way the node id would match the ACPI
> spec.

I don't quite understand "That way the node id would match the ACPI
spec." I checked the ACPI 6.3 spec and didn't find any statement that
the node id should be equal to the proximity value; section 6.2.15 does
contain an example in which the node numbers equal the proximity
values.

Thanks

> But even with that I wouldn't expect cpu ids to match, as they are
> basically arbitrary numbers on both sides. One would need to use arch
> specific ids to reliably match cpus on both sides (MPIDR in the ARM
> case or APICID on x86).
On Thu, 9 Jan 2020 02:45:58 +0000
"Zengtao (B)" <prime.zeng@hisilicon.com> wrote:

> [...]
> I don't quite understand "That way the node id would match the ACPI
> spec." I checked the ACPI 6.3 spec and didn't find any statement that
> the node id should be equal to the proximity value; section 6.2.15 does
> contain an example in which the node numbers equal the proximity
> values.

There is only the proximity id in the spec (which conceptually is the
same as a node id). What I'm saying is that the Linux kernel ACPI
parser could use it as the node id instead of using a counter to make
up node ids.

(The issue with that is that proximity values can be sparse, so one
would need to audit the current node id users to make sure that using
the proximity id won't break anything.)
> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: Thursday, January 09, 2020 5:54 PM
> To: Zengtao (B)
> Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> node_id ascending order
>
> [...]
> There is only the proximity id in the spec (which conceptually is the
> same as a node id). What I'm saying is that the Linux kernel ACPI
> parser could use it as the node id instead of using a counter to make
> up node ids.
>
> (The issue with that is that proximity values can be sparse, so one
> would need to audit the current node id users to make sure that using
> the proximity id won't break anything.)

I think that is really a problem; in the Linux kernel:

    #if MAX_NUMNODES > 256
    #define MAX_PXM_DOMAINS MAX_NUMNODES
    #else
    #define MAX_PXM_DOMAINS 256
    #endif

It seems that using a counter is a better choice.
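The size mismatch behind the quoted macros is why a plain "node id =
proximity value" mapping is awkward: proximity values range over
MAX_PXM_DOMAINS (at least 256) and may be sparse, while node ids must stay
below MAX_NUMNODES. A toy standalone C sketch, with both limits shrunk for
readability (the constants and helper are made up for illustration, not
kernel code):

```c
#include <stdio.h>

#define TOY_MAX_NUMNODES 4   /* stand-in for the kernel's MAX_NUMNODES */
#define TOY_MAX_PXM      256 /* stand-in for MAX_PXM_DOMAINS           */
#define NO_NODE          (-1)

static int pxm_to_node_map[TOY_MAX_PXM];
static int nodes_found;

/* Counter allocation: the n-th distinct proximity domain seen becomes
 * node n, whatever its numeric value, so sparse or large proximity
 * values (e.g. 0 and 200) still fit into the small node id space. */
static int map_pxm_to_node(int pxm)
{
    if (pxm < 0 || pxm >= TOY_MAX_PXM) {
        return NO_NODE;
    }
    if (pxm_to_node_map[pxm] == NO_NODE) {
        if (nodes_found >= TOY_MAX_NUMNODES) {
            return NO_NODE; /* node id space exhausted */
        }
        pxm_to_node_map[pxm] = nodes_found++;
    }
    return pxm_to_node_map[pxm];
}

int main(void)
{
    for (int i = 0; i < TOY_MAX_PXM; i++) {
        pxm_to_node_map[i] = NO_NODE;
    }

    /* An identity mapping would need node id 200 here, well beyond
     * TOY_MAX_NUMNODES; the counter keeps ids dense: 0 -> 0, 200 -> 1. */
    printf("pxm 0   -> node %d\n", map_pxm_to_node(0));
    printf("pxm 200 -> node %d\n", map_pxm_to_node(200));
    return 0;
}
```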
On Fri, 10 Jan 2020 02:56:35 +0000
"Zengtao (B)" <prime.zeng@hisilicon.com> wrote:

> [...]
> > (The issue with that is that proximity values can be sparse, so one
> > would need to audit the current node id users to make sure that using
> > the proximity id won't break anything.)
>
> I think that is really a problem; in the Linux kernel:
>
>     #if MAX_NUMNODES > 256
>     #define MAX_PXM_DOMAINS MAX_NUMNODES
>     #else
>     #define MAX_PXM_DOMAINS 256
>     #endif
>
> It seems that using a counter is a better choice.

In the QEMU case the nodeid isn't sparse. So even if the kernel uses a
counter, it will work fine as long as it sorts SRAT by proximity value
(the same thing you are trying to do in this patch, but on the kernel
side). That way it would match what HW vendors think about node
numbering in most cases.

(We are comparing apples to oranges, so arguing for one case vs the
other is rather pointless.)

Another question: does this patch fix anything?
(If it doesn't, it might be better to just drop this idea.)
> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: Monday, January 13, 2020 5:06 PM
> To: Zengtao (B)
> Cc: Peter Maydell; Michael S. Tsirkin; qemu-trivial@nongnu.org;
> qemu-devel@nongnu.org; Shannon Zhao; qemu-arm@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure
> by node_id ascending order
>
> On Fri, 10 Jan 2020 02:56:35 +0000
> "Zengtao (B)" <prime.zeng@hisilicon.com> wrote:
>
> > > -----Original Message-----
> > > From: Igor Mammedov [mailto:imammedo@redhat.com]
> > > Sent: Thursday, January 09, 2020 5:54 PM
> > > To: Zengtao (B)
> > > Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> > > Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> > > node_id ascending order
> > >
> > > On Thu, 9 Jan 2020 02:45:58 +0000
> > > "Zengtao (B)" <prime.zeng@hisilicon.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Igor Mammedov [mailto:imammedo@redhat.com]
> > > > > Sent: Thursday, January 09, 2020 12:39 AM
> > > > > To: Zengtao (B)
> > > > > Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> > > > > Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> > > > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure
> > > > > by node_id ascending order
> > > > >
> > > > > On Wed, 8 Jan 2020 04:02:10 +0000
> > > > > "Zengtao (B)" <prime.zeng@hisilicon.com> wrote:
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Igor Mammedov [mailto:imammedo@redhat.com]
> > > > > > > Sent: Tuesday, January 07, 2020 11:50 PM
> > > > > > > To: Zengtao (B)
> > > > > > > Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-trivial@nongnu.org;
> > > > > > > Shannon Zhao; Peter Maydell; qemu-arm@nongnu.org
> > > > > > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure
> > > > > > > by node_id ascending order
> > > > > > >
> > > > > > > On Tue, 7 Jan 2020 10:29:22 +0000
> > > > > > > "Zengtao (B)" <prime.zeng@hisilicon.com> wrote:
> > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > > > > > > > Sent: Tuesday, January 07, 2020 5:33 PM
> > > > > > > > > To: Zengtao (B)
> > > > > > > > > Cc: qemu-devel@nongnu.org; qemu-trivial@nongnu.org; Shannon Zhao;
> > > > > > > > > Peter Maydell; Igor Mammedov; qemu-arm@nongnu.org
> > > > > > > > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure
> > > > > > > > > by node_id ascending order
> > > > > > > > >
> > > > > > > > > On Tue, Jan 07, 2020 at 05:18:49PM +0800, Zeng Tao wrote:
> > > > > > > > > > When booting the guest linux with the following numa configuration:
> > > > > > > > > > -numa node,node_id=1,cpus=0-3
> > > > > > > > > > -numa node,node_id=0,cpus=4-7
> > > > > > > > > > We can get the following numa topology in the guest system:
> > > > > > > > > > Architecture: aarch64
> > > > > > > > > > Byte Order: Little Endian
> > > > > > > > > > CPU(s): 8
> > > > > > > > > > On-line CPU(s) list: 0-7
> > > > > > > > > > Thread(s) per core: 1
> > > > > > > > > > Core(s) per socket: 8
> > > > > > > > > > Socket(s): 1
> > > > > > > > > > NUMA node(s): 2
> > > > > > > > > > L1d cache: unknown size
> > > > > > > > > > L1i cache: unknown size
> > > > > > > > > > L2 cache: unknown size
> > > > > > > > > > NUMA node0 CPU(s): 0-3
> > > > > > > > > > NUMA node1 CPU(s): 4-7
> > > > > > > > > > The Cpus 0-3 is assigned with NUMA node 1 in QEMU while it get NUMA node
> > > > > > > > > > 0 in the guest.
> > > > > > > > > >
> > > > > > > > > > In fact, In the linux kernel, numa_node_id is allocated per the ACPI
> > > > > > > > > > SRAT processors structure order,so the cpu 0 will be the first one to
> > > > > > > > > > allocate its NUMA node id, so it gets the NUMA node 0.
> > > > > > > > > >
> > > > > > > > > > To fix this issue, we pack the SRAT processors structure in numa node id
> > > > > > > > > > order but not the default cpu number order.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
> > > > > > > > >
> > > > > > > > > Does this matter? If yes fixing linux to take node id from proximity
> > > > > > > > > field in ACPI seems cleaner ...
> > > > > > > >
> > > > > > > > In fact, I just want to align the node_id concept in QEMU and Linux.
> > > > > > > > If we fix the kernel side, we need to align with all platforms.
> > > > > > > > i think maybe not a good idea.
> > > > > > >
> > > > > > > if linux makes up node IDs on its own, it would be hard for it to
> > > > > > > map SRAT entries to other tables that use proximity id as well.
> > > > > > >
> > > > > > > So it would need to maintain a map of [proximity id] -> [node id] (and
> > > > > > > reverse) somewhere to resolve mappings on other tables.
> > > > > > > If it doesn't do this then it's broken and works just by accident;
> > > > > > > if it does, the fix probably should be in that code and not in QEMU.
> > > > > >
> > > > > > Hmm, the problem is how to understand the concept of node id.
> > > > > > 1. In dts, there is node id. Both QEMU and Linux can use it
> > > > > >    directly, so no conflict.
> > > > > > 2. In ACPI, there is only a proximity domain, but no node id; there
> > > > > >    should be a mapping between them, while the kernel and QEMU maintain
> > > > > >    their own rules, and unfortunately they conflict with each other
> > > > > >    sometimes.
> > > > > >
> > > > > > There is no specification to indicate what we should do to maintain the
> > > > > > mapping, so it's hard to align QEMU and Linux.
> > > > > >
> > > > > > Any suggestion, or do we just accept it as a rule since it doesn't
> > > > > > affect much?
> > > > >
> > > > > If node id generation is driven by SRAT content, it might be reasonable
> > > > > to ask for the SRAT parser in the kernel to create node ids using the
> > > > > proximity value instead of the order ACPI_SRAT_PROCESSOR_GICC
> > > > > structures are enumerated.
> > > > > That way node id would match ACPI spec.
> > > >
> > > > I don't quite understand "That way node id would match ACPI spec."
> > > > I checked the ACPI 6.3 spec and didn't find any description that node id
> > > > should be equal to the proximity value; in section 6.2.15 there is indeed
> > > > an example in which the node numbers equal the proximity value.
> > >
> > > There is only proximity id in the spec (which conceptually is the same as
> > > node id).
> > > What I'm saying is that the linux kernel ACPI parser could use it as node
> > > id instead of using a counter for making up node ids.
> > >
> > > (the issue with it is that proximity values could be sparse, so one
> > > would need to audit current node id users to make sure that using
> > > proximity id won't break anything)
> >
> > I think that is really a problem, in the linux kernel:
> > #if MAX_NUMNODES > 256
> > #define MAX_PXM_DOMAINS MAX_NUMNODES
> > #else
> > #define MAX_PXM_DOMAINS 256
> > #endif
> > It seems that using a counter is a better choice.
>
> In the QEMU case nodeid isn't sparse. So even if the kernel uses a counter,
> it will work fine as long as it sorts SRAT by proximity value (the same thing
> you are trying to do in this patch, but on the kernel side).
> That way it would match what HW vendors think about node numbering in most
> cases.
>
> (We are comparing apples to oranges, so arguing for one case vs another is
> rather pointless.)
>
> Another question: does this patch fix anything?
> (If it does not, it might be better to just drop this idea.)

:) I am not insisting on this patch, but I think you have provided me with an
important clue: "Most HW vendors number pxm and node in the same way, and so
does QEMU", so I think it's reasonable for the linux kernel to do the same.

So I think we can end the discussion and just ignore this patch.

Thanks very much.

Regards
Zengtao
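For contrast, the proximity-as-node-id approach Igor floats would look
roughly like the sketch below. The helper is hypothetical, not kernel code,
and the bounds check merely stands in for the audit of sparse proximity
values the thread says would be needed:

#define MAX_NUMNODES 64   /* assumption: kernel-config dependent */
#define NUMA_NO_NODE (-1)

/* Hypothetical direct mapping: node id == proximity domain.  Dense,
 * small pxm values map cleanly; sparse firmware numbering (say pxm
 * 0x1000) would still need a remap, which is the audit concern above. */
static int pxm_to_node_direct(int pxm)
{
    if (pxm < 0 || pxm >= MAX_NUMNODES) {
        return NUMA_NO_NODE;
    }
    return pxm;
}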
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bd5f771..497192b 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -520,7 +520,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratProcessorGiccAffinity *core;
     AcpiSratMemoryAffinity *numamem;
-    int i, srat_start;
+    int i, j, srat_start;
+    uint32_t node_id;
     uint64_t mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(vms);
     MachineState *ms = MACHINE(vms);
@@ -530,13 +531,19 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     srat = acpi_data_push(table_data, sizeof(*srat));
     srat->reserved1 = cpu_to_le32(1);
 
-    for (i = 0; i < cpu_list->len; ++i) {
-        core = acpi_data_push(table_data, sizeof(*core));
-        core->type = ACPI_SRAT_PROCESSOR_GICC;
-        core->length = sizeof(*core);
-        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
-        core->acpi_processor_uid = cpu_to_le32(i);
-        core->flags = cpu_to_le32(1);
+    for (i = 0; i < ms->numa_state->num_nodes; ++i) {
+        for (j = 0; j < cpu_list->len; ++j) {
+            node_id = cpu_to_le32(cpu_list->cpus[j].props.node_id);
+            if (node_id != i) {
+                continue;
+            }
+            core = acpi_data_push(table_data, sizeof(*core));
+            core->type = ACPI_SRAT_PROCESSOR_GICC;
+            core->length = sizeof(*core);
+            core->proximity = node_id;
+            core->acpi_processor_uid = cpu_to_le32(j);
+            core->flags = cpu_to_le32(1);
+        }
     }
 
     mem_base = vms->memmap[VIRT_MEM].base;
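On the O(n^2) concern raised in the review, the same ordering can be produced
with one sort over an index array and a single emission pass. The following
is a rough sketch against the same build_srat() context, not code that was
ever merged (the patch was ultimately dropped); cmp_cpu_node() is a
hypothetical helper, not a QEMU API. As a side effect it also compares node
ids in host byte order and only converts with cpu_to_le32() at assignment
time, sidestepping the comparison of a cpu_to_le32() value against the
host-order loop index in the nested loop above.

/* Sketch only: order cpu indexes by node id in O(n log n), then emit the
 * GICC affinity structures in one pass. */
static gint cmp_cpu_node(gconstpointer a, gconstpointer b, gpointer user_data)
{
    const CPUArchIdList *cpus = user_data;
    int x = *(const int *)a;
    int y = *(const int *)b;
    int nx = cpus->cpus[x].props.node_id;
    int ny = cpus->cpus[y].props.node_id;

    /* order by node id, keep cpu index order within a node */
    return nx != ny ? nx - ny : x - y;
}

and in build_srat():

    int *order = g_new(int, cpu_list->len);

    for (i = 0; i < cpu_list->len; i++) {
        order[i] = i;
    }
    g_qsort_with_data(order, cpu_list->len, sizeof(*order),
                      cmp_cpu_node, (gpointer)cpu_list);
    for (i = 0; i < cpu_list->len; i++) {
        j = order[i];
        core = acpi_data_push(table_data, sizeof(*core));
        core->type = ACPI_SRAT_PROCESSOR_GICC;
        core->length = sizeof(*core);
        core->proximity = cpu_to_le32(cpu_list->cpus[j].props.node_id);
        core->acpi_processor_uid = cpu_to_le32(j);
        core->flags = cpu_to_le32(1);
    }
    g_free(order);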
When booting the guest linux with the following numa configuration:
-numa node,node_id=1,cpus=0-3
-numa node,node_id=0,cpus=4-7
we get the following numa topology in the guest system:

Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        2
L1d cache:           unknown size
L1i cache:           unknown size
L2 cache:            unknown size
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7

CPUs 0-3 are assigned to NUMA node 1 in QEMU, while they get NUMA node 0
in the guest.

In fact, in the linux kernel, the numa node id is allocated in ACPI SRAT
processor structure order, so cpu 0 is the first one to allocate its NUMA
node id and it gets NUMA node 0.

To fix this issue, we pack the SRAT processor structures in numa node id
order instead of the default cpu number order.

Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
---
 hw/arm/virt-acpi-build.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)