From patchwork Fri Jun 16 09:26:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13282392 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D86FEB64DB for ; Fri, 16 Jun 2023 09:30:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343849AbjFPJay (ORCPT ); Fri, 16 Jun 2023 05:30:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344666AbjFPJaF (ORCPT ); Fri, 16 Jun 2023 05:30:05 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 217944493 for ; Fri, 16 Jun 2023 02:28:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1686907676; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6jEoy6ZNsKhLBjtmNt1kuWH2Fe86pUOaubwwitfFo6c=; b=Uhq9rPgV5dPD0p0HzY4x1/sRI3l6br2hpbng8OhUouZ3aeExQ2hsaL4Q0bdvGFyjkwBrKO d2S+dUHQOfFd94VwGvKoUk7Ro5s4ORcMe8jst+uZdPrcq4968vn3a3N7SGBXG7YfF0s1jj tDtf66USPBSeKiQt39k6KNuVpmpqZr4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-369-kUIF_0_XPLGQUNV91CYUHw-1; Fri, 16 Jun 2023 05:27:53 -0400 X-MC-Unique: kUIF_0_XPLGQUNV91CYUHw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 89ACE811E78; Fri, 16 Jun 2023 09:27:52 +0000 (UTC) Received: from t480s.fritz.box (unknown [10.39.194.44]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8FE031121314; Fri, 16 Jun 2023 09:27:49 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Gavin Shan , Alex Williamson , kvm@vger.kernel.org Subject: [PATCH v1 13/15] virtio-mem: Expose device memory via multiple memslots if enabled Date: Fri, 16 Jun 2023 11:26:52 +0200 Message-Id: <20230616092654.175518-14-david@redhat.com> In-Reply-To: <20230616092654.175518-1-david@redhat.com> References: <20230616092654.175518-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Having large virtio-mem devices that only expose little memory to a VM is currently a problem: we map the whole sparse memory region into the guest using a single memslot, resulting in one gigantic memslot in KVM. KVM allocates metadata for the whole memslot, which can result in quite some memory waste. Assuming we have a 1 TiB virtio-mem device and only expose little (e.g., 1 GiB) memory, we would create a single 1 TiB memslot and KVM has to allocate metadata for that 1 TiB memslot: on x86, this implies allocating a significant amount of memory for metadata: (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %) With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets allocated lazily when required for nested VMs (2) gfn_track: 2 bytes per 4 KiB -> For 1 TiB: 536870912 = ~512 MiB (0.05 %) (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %) (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page -> For 1 TiB: 536870912 = 64 MiB (0.006 %) So we primarily care about (1) and (2). The bad thing is, that the memory consumption *doubles* once SMM is enabled, because we create the memslot once for !SMM and once for SMM. Having a 1 TiB memslot without the TDP MMU consumes around: * With SMM: 5 GiB * Without SMM: 2.5 GiB Having a 1 TiB memslot with the TDP MMU consumes around: * With SMM: 1 GiB * Without SMM: 512 MiB ... and that's really something we want to optimize, to be able to just start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device that can grow very large (e.g., 1 TiB). Consequently, using multiple memslots and only mapping the memslots we really need can significantly reduce memory waste and speed up memslot-related operations. Let's expose the sparse RAM memory region using multiple memslots, mapping only the memslots we currently need into our device memory region container. * With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we only map the memslots that actually have memory plugged, and dynamically (un)map when (un)plugging memory blocks. * Without VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we always map the memslots covered by the usable region, and dynamically (un)map when resizing the usable region. We'll auto-determine the number of memslots to use based on the suggested memslot limit provided by the core. We'll use at most 1 memslot per gigabyte. Note that our global limit of memslots accross all memory devices is currently set to 256: even with multiple large virtio-mem devices, we'd still have a sane limit on the number of memslots used. The default is a single memslot for now ("multiple-memslots=off"). The optimization must be enabled manually using "multiple-memslots=on", because some vhost setups (e.g., hotplug of vhost-user devices) might be problematic until we support more memslots especially in vhost-user backends. Note that "multiple-memslots=on" is just a hint that multiple memslots *may* be used for internal optimizations, not that multiple memslots *must* be used. The actual number of memslots that are used is an internal detail: for example, once memslot metadata is no longer an issue, we could simply stop optimizing for that. Migration source and destination can differ on the setting of "multiple-memslots". Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem-pci.c | 21 +++ hw/virtio/virtio-mem.c | 265 ++++++++++++++++++++++++++++++++- include/hw/virtio/virtio-mem.h | 23 ++- 3 files changed, 304 insertions(+), 5 deletions(-) diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c index b85c12668d..8b403e7e78 100644 --- a/hw/virtio/virtio-mem-pci.c +++ b/hw/virtio/virtio-mem-pci.c @@ -48,6 +48,25 @@ static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md, return vmc->get_memory_region(vmem, errp); } +static void virtio_mem_pci_set_suggested_memslot_limit(MemoryDeviceState *md, + unsigned int limit) +{ + VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md); + VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev); + VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem); + + vmc->set_suggested_memslot_limit(vmem, limit); +} + +static unsigned int virtio_mem_pci_get_memslots(MemoryDeviceState *md) +{ + VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md); + VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev); + VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem); + + return vmc->get_memslots(vmem); +} + static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md, Error **errp) { @@ -109,6 +128,8 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, void *data) mdc->set_addr = virtio_mem_pci_set_addr; mdc->get_plugged_size = virtio_mem_pci_get_plugged_size; mdc->get_memory_region = virtio_mem_pci_get_memory_region; + mdc->set_suggested_memslot_limit = virtio_mem_pci_set_suggested_memslot_limit; + mdc->get_memslots = virtio_mem_pci_get_memslots; mdc->fill_device_info = virtio_mem_pci_fill_device_info; mdc->get_min_alignment = virtio_mem_pci_get_min_alignment; } diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index e24269e745..516370067a 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -66,6 +66,13 @@ static uint32_t virtio_mem_default_thp_size(void) return default_thp_size; } +/* + * The minimum memslot size depends on this setting ("sane default"), the + * device block size, and the memory backend page size. The last (or single) + * memslot might be smaller than this constant. + */ +#define VIRTIO_MEM_MIN_MEMSLOT_SIZE (1 * GiB) + /* * We want to have a reasonable default block size such that * 1. We avoid splitting THPs when unplugging memory, which degrades @@ -483,6 +490,94 @@ static bool virtio_mem_valid_range(const VirtIOMEM *vmem, uint64_t gpa, return true; } +static void virtio_mem_activate_memslot(VirtIOMEM *vmem, unsigned int idx) +{ + const uint64_t memslot_offset = idx * vmem->memslot_size; + + /* + * Instead of enabling/disabling memslot, we add/remove them. This should + * make address space updates faster, because we don't have to loop over + * many disabled subregions. + */ + if (memory_region_is_mapped(&vmem->memslots[idx])) { + return; + } + memory_region_add_subregion(vmem->mr, memslot_offset, &vmem->memslots[idx]); +} + +static void virtio_mem_deactivate_memslot(VirtIOMEM *vmem, unsigned int idx) +{ + if (!memory_region_is_mapped(&vmem->memslots[idx])) { + return; + } + memory_region_del_subregion(vmem->mr, &vmem->memslots[idx]); +} + +static void virtio_mem_activate_memslots_to_plug(VirtIOMEM *vmem, + uint64_t offset, uint64_t size) +{ + const unsigned int start_idx = offset / vmem->memslot_size; + const unsigned int end_idx = (offset + size + vmem->memslot_size - 1) / + vmem->memslot_size; + unsigned int idx; + + if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) { + /* All memslots covered by the usable region are always enabled. */ + return; + } + + /* Activate all involved memslots in a single transaction. */ + memory_region_transaction_begin(); + for (idx = start_idx; idx < end_idx; idx++) { + virtio_mem_activate_memslot(vmem, idx); + } + memory_region_transaction_commit(); +} + +static void virtio_mem_deactivate_unplugged_memslots(VirtIOMEM *vmem, + uint64_t offset, + uint64_t size) +{ + const uint64_t region_size = memory_region_size(&vmem->memdev->mr); + const unsigned int start_idx = offset / vmem->memslot_size; + const unsigned int end_idx = (offset + size + vmem->memslot_size - 1) / + vmem->memslot_size; + unsigned int idx; + + if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) { + /* All memslots covered by the usable region are always enabled. */ + return; + } + + /* Deactivate all memslots with unplugged blocks in a single transaction. */ + memory_region_transaction_begin(); + for (idx = start_idx; idx < end_idx; idx++) { + const uint64_t memslot_offset = idx * vmem->memslot_size; + uint64_t memslot_size = vmem->memslot_size; + + /* The size of the last memslot might be smaller. */ + if (memslot_offset + memslot_size > region_size) { + memslot_size = region_size - memslot_offset; + } + + /* + * Partially covered memslots might still have some blocks plugged and + * have to remain enabled if that's the case. + */ + if (offset > memslot_offset || + offset + size < memslot_offset + memslot_size) { + const uint64_t gpa = vmem->addr + memslot_offset; + + if (!virtio_mem_is_range_unplugged(vmem, gpa, memslot_size)) { + continue; + } + } + + virtio_mem_deactivate_memslot(vmem, idx); + } + memory_region_transaction_commit(); +} + static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, uint64_t size, bool plug) { @@ -500,6 +595,8 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, } virtio_mem_notify_unplug(vmem, offset, size); virtio_mem_set_range_unplugged(vmem, start_gpa, size); + /* Disable completely unplugged memslots after updating the state. */ + virtio_mem_deactivate_unplugged_memslots(vmem, offset, size); return 0; } @@ -527,7 +624,20 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, } if (!ret) { + /* + * Activate before notifying and rollback in case of any errors. + * + * When enabling a yet disabled memslot, memory notifiers will get + * notified about the added memory region and can register with the + * RamDiscardManager; this will traverse all plugged blocks and skip the + * blocks we are plugging here. The following notification will inform + * registered listeners about the blocks we're plugging. + */ + virtio_mem_activate_memslots_to_plug(vmem, offset, size); ret = virtio_mem_notify_plug(vmem, offset, size); + if (ret) { + virtio_mem_deactivate_unplugged_memslots(vmem, offset, size); + } } if (ret) { /* Could be preallocation or a notifier populated memory. */ @@ -602,6 +712,7 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem, { uint64_t newsize = MIN(memory_region_size(&vmem->memdev->mr), requested_size + VIRTIO_MEM_USABLE_EXTENT); + unsigned int idx; /* The usable region size always has to be multiples of the block size. */ newsize = QEMU_ALIGN_UP(newsize, vmem->block_size); @@ -616,17 +727,34 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem, trace_virtio_mem_resized_usable_region(vmem->usable_region_size, newsize); vmem->usable_region_size = newsize; + + if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) { + /* + * Activate all memslots covered by the usable region and deactivate the + * remaining ones in a single transaction. + */ + memory_region_transaction_begin(); + for (idx = 0; idx < vmem->nb_memslots; idx++) { + if (vmem->memslot_size * idx < vmem->usable_region_size) { + virtio_mem_activate_memslot(vmem, idx); + } else { + virtio_mem_deactivate_memslot(vmem, idx); + } + } + memory_region_transaction_commit(); + } } static int virtio_mem_unplug_all(VirtIOMEM *vmem) { + const uint64_t region_size = memory_region_size(&vmem->memdev->mr); RAMBlock *rb = vmem->memdev->mr.ram_block; if (virtio_mem_is_busy()) { return -EBUSY; } - if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) { + if (ram_block_discard_range(rb, 0, region_size)) { return -EBUSY; } virtio_mem_notify_unplug_all(vmem); @@ -636,6 +764,9 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem) vmem->size = 0; notifier_list_notify(&vmem->size_change_notifiers, &vmem->size); } + /* Deactivate all memslots after updating the state. */ + virtio_mem_deactivate_unplugged_memslots(vmem, 0, region_size); + trace_virtio_mem_unplugged_all(); virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); return 0; @@ -790,6 +921,43 @@ static void virtio_mem_system_reset(void *opaque) virtio_mem_unplug_all(vmem); } +static void virtio_mem_prepare_mr(VirtIOMEM *vmem) +{ + const uint64_t region_size = memory_region_size(&vmem->memdev->mr); + + g_assert(!vmem->mr); + vmem->mr = g_new0(MemoryRegion, 1); + memory_region_init(vmem->mr, OBJECT(vmem), "virtio-mem", + region_size); + vmem->mr->align = memory_region_get_alignment(&vmem->memdev->mr); +} + +static void virtio_mem_prepare_memslots(VirtIOMEM *vmem) +{ + const uint64_t region_size = memory_region_size(&vmem->memdev->mr); + unsigned int idx; + + g_assert(!vmem->memslots && vmem->nb_memslots); + vmem->memslots = g_new0(MemoryRegion, vmem->nb_memslots); + + /* Initialize our memslots, but don't map them yet. */ + for (idx = 0; idx < vmem->nb_memslots; idx++) { + const uint64_t memslot_offset = idx * vmem->memslot_size; + uint64_t memslot_size = vmem->memslot_size; + char name[20]; + + /* The size of the last memslot might be smaller. */ + if (idx == vmem->nb_memslots) { + memslot_size = region_size - memslot_offset; + } + + snprintf(name, sizeof(name), "memslot-%u", idx); + memory_region_init_alias(&vmem->memslots[idx], OBJECT(vmem), name, + &vmem->memdev->mr, memslot_offset, + memslot_size); + } +} + static void virtio_mem_device_realize(DeviceState *dev, Error **errp) { MachineState *ms = MACHINE(qdev_get_machine()); @@ -909,8 +1077,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) return; } - virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); - vmem->bitmap_size = memory_region_size(&vmem->memdev->mr) / vmem->block_size; vmem->bitmap = bitmap_new(vmem->bitmap_size); @@ -918,6 +1084,18 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) virtio_init(vdev, VIRTIO_ID_MEM, sizeof(struct virtio_mem_config)); vmem->vq = virtio_add_queue(vdev, 128, virtio_mem_handle_request); + if (!vmem->mr) { + virtio_mem_prepare_mr(vmem); + } + if (!vmem->nb_memslots || vmem->nb_memslots == 1) { + vmem->nb_memslots = 1; + vmem->memslot_size = memory_region_size(&vmem->memdev->mr); + } + if (!vmem->memslots) { + virtio_mem_prepare_memslots(vmem); + } + + virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); host_memory_backend_set_mapped(vmem->memdev, true); vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem)); if (vmem->early_migration) { @@ -951,6 +1129,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev) } vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem)); host_memory_backend_set_mapped(vmem->memdev, false); + virtio_mem_resize_usable_region(vmem, 0, true); virtio_del_queue(vdev, 0); virtio_cleanup(vdev); g_free(vmem->bitmap); @@ -1207,9 +1386,67 @@ static MemoryRegion *virtio_mem_get_memory_region(VirtIOMEM *vmem, Error **errp) if (!vmem->memdev) { error_setg(errp, "'%s' property must be set", VIRTIO_MEM_MEMDEV_PROP); return NULL; + } else if (!vmem->mr) { + virtio_mem_prepare_mr(vmem); + } + + return vmem->mr; +} + +static void virtio_mem_set_suggested_memslot_limit(VirtIOMEM *vmem, + unsigned int limit) +{ + uint64_t region_size, memslot_size, min_memslot_size; + unsigned int memslots; + RAMBlock *rb; + + /* We're called exactly once, before realizing the device. */ + g_assert(!vmem->nb_memslots); + + /* If realizing the device will fail, just assume a single memslot. */ + if (limit <= 1 || !vmem->multiple_memslots || !vmem->memdev || + !vmem->memdev->mr.ram_block) { + vmem->nb_memslots = 1; + return; + } + + rb = vmem->memdev->mr.ram_block; + region_size = memory_region_size(&vmem->memdev->mr); + + /* + * Determine the default block size now, to determine the minimum memslot + * size. We want the minimum slot size to be at least the device block size. + */ + if (!vmem->block_size) { + vmem->block_size = virtio_mem_default_block_size(rb); + } + /* If realizing the device will fail, just assume a single memslot. */ + if (vmem->block_size < qemu_ram_pagesize(rb) || + !QEMU_IS_ALIGNED(region_size, vmem->block_size)) { + vmem->nb_memslots = 1; + return; } - return &vmem->memdev->mr; + /* + * All memslots except the last one have a reasonable minimum size, and + * and all memslot sizes are aligned to the device block size. + */ + memslot_size = QEMU_ALIGN_UP(region_size / limit, vmem->block_size); + min_memslot_size = MAX(vmem->block_size, VIRTIO_MEM_MIN_MEMSLOT_SIZE); + memslot_size = MAX(memslot_size, min_memslot_size); + + memslots = QEMU_ALIGN_UP(region_size, memslot_size) / memslot_size; + if (memslots != 1) { + vmem->memslot_size = memslot_size; + } + vmem->nb_memslots = memslots; +} + +static unsigned int virtio_mem_get_memslots(VirtIOMEM *vmem) +{ + /* We're called after setting the suggested limit. */ + g_assert(vmem->nb_memslots); + return vmem->nb_memslots; } static void virtio_mem_add_size_change_notifier(VirtIOMEM *vmem, @@ -1349,6 +1586,21 @@ static void virtio_mem_instance_init(Object *obj) NULL, NULL); } +static void virtio_mem_instance_finalize(Object *obj) +{ + VirtIOMEM *vmem = VIRTIO_MEM(obj); + + /* + * Note: the core already dropped the references on all memory regions + * (it's passed as the owner to memory_region_init_*()) and finalized + * these objects. We can simply free the memory. + */ + g_free(vmem->memslots); + vmem->memslots = NULL; + g_free(vmem->mr); + vmem->mr = NULL; +} + static Property virtio_mem_properties[] = { DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0), DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0), @@ -1361,6 +1613,8 @@ static Property virtio_mem_properties[] = { #endif DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM, early_migration, true), + DEFINE_PROP_BOOL(VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP, VirtIOMEM, + multiple_memslots, false), DEFINE_PROP_END_OF_LIST(), }; @@ -1504,6 +1758,8 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) vmc->fill_device_info = virtio_mem_fill_device_info; vmc->get_memory_region = virtio_mem_get_memory_region; + vmc->set_suggested_memslot_limit = virtio_mem_set_suggested_memslot_limit; + vmc->get_memslots = virtio_mem_get_memslots; vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier; vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier; @@ -1520,6 +1776,7 @@ static const TypeInfo virtio_mem_info = { .parent = TYPE_VIRTIO_DEVICE, .instance_size = sizeof(VirtIOMEM), .instance_init = virtio_mem_instance_init, + .instance_finalize = virtio_mem_instance_finalize, .class_init = virtio_mem_class_init, .class_size = sizeof(VirtIOMEMClass), .interfaces = (InterfaceInfo[]) { diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index f15e561785..7fe9460c69 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -33,6 +33,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass, #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible" #define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration" #define VIRTIO_MEM_PREALLOC_PROP "prealloc" +#define VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP "multiple-memslots" struct VirtIOMEM { VirtIODevice parent_obj; @@ -44,7 +45,22 @@ struct VirtIOMEM { int32_t bitmap_size; unsigned long *bitmap; - /* assigned memory backend and memory region */ + /* Device memory region in which we map the individual memslots. */ + MemoryRegion *mr; + + /* The individual memslots (aliases into the memory backend). */ + MemoryRegion *memslots; + + /* The total number of memslots. */ + uint16_t nb_memslots; + + /* Size of one memslot (the last one can be smaller). */ + uint64_t memslot_size; + + /* + * Assigned memory backend with the RAM memory region we split into + * memslots, to map the individual memslots only on demand. + */ HostMemoryBackend *memdev; /* NUMA node */ @@ -82,6 +98,9 @@ struct VirtIOMEM { */ bool early_migration; + /* Whether we may use multiple memslots instead of only a single one. */ + bool multiple_memslots; + /* notifiers to notify when "size" changes */ NotifierList size_change_notifiers; @@ -96,6 +115,8 @@ struct VirtIOMEMClass { /* public */ void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *vi); MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp); + void (*set_suggested_memslot_limit)(VirtIOMEM *vmem, unsigned int limit); + unsigned int (*get_memslots)(VirtIOMEM *vmem); void (*add_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier); void (*remove_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier); };