From patchwork Sat Feb 17 06:31:35 2018
From: Haozhong Zhang
To: qemu-devel@nongnu.org
Cc: Haozhong Zhang, Xiao Guangrong, mst@redhat.com, Eduardo Habkost,
    Stefan Hajnoczi, Paolo Bonzini, Marcel Apfelbaum, Igor Mammedov,
    Dan Williams, Richard Henderson
Date: Sat, 17 Feb 2018 14:31:35 +0800
Message-Id: <20180217063135.21550-1-haozhong.zhang@intel.com>
Subject: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
domain of an NVDIMM SPA range match the corresponding entry in the SRAT
table.
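For reference, the two proximity-domain fields that have to agree live in
two tables that QEMU builds in different places: the NFIT (hw/acpi/nvdimm.c)
and the SRAT (hw/i386/acpi-build.c). The rough C sketch below shows only the
fields relevant here; the layouts are abridged and the struct/field names are
illustrative, not the ones used in QEMU's headers:

    #include <stdint.h>

    /* NFIT SPA Range Structure (ACPI 6.2A Table 5-129), abridged sketch.
     * Real ACPI table structures are byte-packed. */
    struct nfit_spa_range {
        uint16_t type;               /* 0: SPA Range Structure */
        uint16_t length;
        uint16_t spa_index;
        uint16_t flags;
        uint32_t reserved;
        uint32_t proximity_domain;   /* must match an SRAT entry */
        uint8_t  type_guid[16];      /* e.g. the persistent-memory GUID */
        uint64_t spa_base;
        uint64_t spa_length;
        uint64_t mem_attr;
    };

    /* SRAT Memory Affinity Structure, abridged sketch (the spec encodes
     * base/length as low/high 32-bit halves). */
    struct srat_mem_affinity {
        uint8_t  type;               /* 1: Memory Affinity */
        uint8_t  length;
        uint32_t proximity_domain;   /* must cover the SPA range above */
        uint16_t reserved1;
        uint64_t base_address;
        uint64_t range_length;
        uint32_t reserved2;
        uint32_t flags;              /* hot-pluggable / enabled / non-volatile */
        uint64_t reserved3;
    };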
The address ranges of vNVDIMMs in QEMU are allocated from the
hot-pluggable address space, which is entirely covered by a single SRAT
memory affinity structure. However, users can set the vNVDIMM proximity
domain in the NFIT SPA range structure, via the 'node' property of
'-device nvdimm', to a value different from the one in that SRAT memory
affinity structure.

To resolve this proximity domain mismatch, this patch builds one SRAT
memory affinity structure for each NVDIMM device, using the proximity
domain reported in the NFIT. The remaining hot-pluggable address space
is covered by one or more SRAT memory affinity structures with the
proximity domain of the last node, as before.

Signed-off-by: Haozhong Zhang
---
 hw/acpi/nvdimm.c        | 15 +++++++++++++--
 hw/i386/acpi-build.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
 include/hw/mem/nvdimm.h | 11 +++++++++++
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 59d6e4254c..dff0818e77 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,12 +33,23 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
+static gint nvdimm_addr_sort(gconstpointer a, gconstpointer b)
+{
+    uint64_t addr0 = object_property_get_uint(OBJECT(NVDIMM(a)),
+                                              PC_DIMM_ADDR_PROP, NULL);
+    uint64_t addr1 = object_property_get_uint(OBJECT(NVDIMM(b)),
+                                              PC_DIMM_ADDR_PROP, NULL);
+
+    return addr0 < addr1 ? -1 :
+           addr0 > addr1 ? 1 : 0;
+}
+
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
     GSList **list = opaque;
 
     if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
-        *list = g_slist_append(*list, DEVICE(obj));
+        *list = g_slist_insert_sorted(*list, DEVICE(obj), nvdimm_addr_sort);
     }
 
     object_child_foreach(obj, nvdimm_device_list, opaque);
@@ -52,7 +63,7 @@ static int nvdimm_device_list(Object *obj, void *opaque)
  * Note: it is the caller's responsibility to free the list to avoid
  * memory leak.
  */
-static GSList *nvdimm_get_device_list(void)
+GSList *nvdimm_get_device_list(void)
 {
     GSList *list = NULL;
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index deb440f286..637ac3a8f0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2323,6 +2323,46 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
 #define HOLE_640K_START (640 * 1024)
 #define HOLE_640K_END (1024 * 1024)
 
+static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+                                           uint64_t len, int default_node)
+{
+    GSList *nvdimms = nvdimm_get_device_list();
+    GSList *ent = nvdimms;
+    NVDIMMDevice *dev;
+    uint64_t end = base + len, addr, size;
+    int node;
+    AcpiSratMemoryAffinity *numamem;
+
+    while (base < end) {
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+
+        if (!ent) {
+            build_srat_memory(numamem, base, end - base, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            break;
+        }
+
+        dev = NVDIMM(ent->data);
+        addr = object_property_get_uint(OBJECT(dev), PC_DIMM_ADDR_PROP, NULL);
+        size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP, NULL);
+        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, NULL);
+
+        if (base < addr) {
+            build_srat_memory(numamem, base, addr - base, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+        }
+        build_srat_memory(numamem, addr, size, node,
+                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED |
+                          MEM_AFFINITY_NON_VOLATILE);
+
+        base = addr + size;
+        ent = ent->next;
+    }
+
+    g_slist_free(nvdimms);
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2434,10 +2474,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
      * providing _PXM method if necessary.
      */
     if (hotplugabble_address_space_size) {
-        numamem = acpi_data_push(table_data, sizeof *numamem);
-        build_srat_memory(numamem, pcms->hotplug_memory.base,
-                          hotplugabble_address_space_size, pcms->numa_nodes - 1,
-                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+        build_srat_hotpluggable_memory(table_data, pcms->hotplug_memory.base,
+                                       hotplugabble_address_space_size,
+                                       pcms->numa_nodes - 1);
     }
 
     build_header(linker, table_data,
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 7fd87c4e1c..ca9d6aa714 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -144,4 +144,15 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        uint32_t ram_slots);
 void nvdimm_plug(AcpiNVDIMMState *state);
 void nvdimm_acpi_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev);
+
+/*
+ * Inquire NVDIMM devices and link them into the list which is
+ * returned to the caller and sorted in the ascending order of the
+ * base address of NVDIMM devices.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+GSList *nvdimm_get_device_list(void);
+
 #endif
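For illustration, a guest configuration of the kind the commit message
describes could look like the following (based on the NVDIMM setup in
QEMU's docs/nvdimm.txt; the backend path, sizes and IDs are made up for
the example):

    qemu-system-x86_64 -machine pc,nvdimm \
        -smp 4 -m 4G,slots=4,maxmem=32G \
        -numa node,nodeid=0,cpus=0-1,mem=2G \
        -numa node,nodeid=1,cpus=2-3,mem=2G \
        -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm1,size=4G \
        -device nvdimm,id=nvdimm1,memdev=mem1,node=0

Here the NFIT SPA range of nvdimm1 reports proximity domain 0 (from the
'node' property), while the whole hot-pluggable address space used to be
described by a single SRAT memory affinity structure using the last node
(node 1). With this patch, the NVDIMM's address range gets its own SRAT
memory affinity structure with proximity domain 0 (flagged hot-pluggable,
enabled and non-volatile), and the remaining hot-pluggable space keeps the
last node's proximity domain as before.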