From patchwork Fri Sep 8 14:21:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13377514 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BABD1EE800F for ; Fri, 8 Sep 2023 14:22:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243818AbjIHOW5 (ORCPT ); Fri, 8 Sep 2023 10:22:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243767AbjIHOWz (ORCPT ); Fri, 8 Sep 2023 10:22:55 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26BA61FC0 for ; Fri, 8 Sep 2023 07:22:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182930; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pjxTzzi+bUALobF5fcq+ZnLIGXFuzgDZ7k3W7Jwshcc=; b=OdQT8gbKAW4zGzvi71Uc07VAh//OskGsEnpAgVFW3oBrTViwprsaNq9dxygSzZwRIJ3GFf z0UJ1Dny4HBpx9JZ2Sk8LshVdGT5noLIFWHjCJhAgwpxWpjgdfA0vPf4MKDCLvH9LuLnwF /+9kkYXTQ5wJMZ5/GovtuDX4u857KZk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-658-dAuzfzXiOwKSFnAHbJ0VbQ-1; Fri, 08 Sep 2023 10:22:07 -0400 X-MC-Unique: dAuzfzXiOwKSFnAHbJ0VbQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 76A93181C283; Fri, 8 Sep 2023 14:22:06 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id DBAB2C03295; Fri, 8 Sep 2023 14:22:03 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 09/16] memory-device,vhost: Support memory devices that dynamically consume memslots Date: Fri, 8 Sep 2023 16:21:29 +0200 Message-ID: <20230908142136.403541-10-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org We want to support memory devices that have a dynamically managed memory region container as device memory region. This device memory region maps multiple RAM memory subregions (e.g., aliases to the same RAM memory region), whereby these subregions can be (un)mapped on demand. Each RAM subregion will consume a memslot in KVM and vhost, resulting in such a new device consuming memslots dynamically, and initially usually 0. We already track the number of used vs. required memslots for all memslots. From that, we can derive the number of reserved memslots that must not be used otherwise. The target use case is virtio-mem and the hyper-v balloon, which will dynamically map aliases to RAM memory region into their device memory region container. Properly document what's supported and what's not and extend the vhost memslot check accordingly. Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- hw/mem/memory-device.c | 29 +++++++++++++++++++++++++++-- hw/virtio/vhost.c | 18 ++++++++++++++---- include/hw/mem/memory-device.h | 7 +++++++ stubs/memory_device.c | 5 +++++ 4 files changed, 53 insertions(+), 6 deletions(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index d37cfbd65d..1b14ba5661 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -62,19 +62,44 @@ static unsigned int memory_device_get_memslots(MemoryDeviceState *md) return 1; } +/* + * Memslots that are reserved by memory devices (required but still reported + * as free from KVM / vhost). + */ +static unsigned int get_reserved_memslots(MachineState *ms) +{ + if (ms->device_memory->used_memslots > + ms->device_memory->required_memslots) { + /* This is unexpected, and we warned already in the memory notifier. */ + return 0; + } + return ms->device_memory->required_memslots - + ms->device_memory->used_memslots; +} + +unsigned int memory_devices_get_reserved_memslots(void) +{ + if (!current_machine->device_memory) { + return 0; + } + return get_reserved_memslots(current_machine); +} + static void memory_device_check_addable(MachineState *ms, MemoryDeviceState *md, MemoryRegion *mr, Error **errp) { const uint64_t used_region_size = ms->device_memory->used_region_size; const uint64_t size = memory_region_size(mr); const unsigned int required_memslots = memory_device_get_memslots(md); + const unsigned int reserved_memslots = get_reserved_memslots(ms); /* we will need memory slots for kvm and vhost */ - if (kvm_enabled() && kvm_get_free_memslots() < required_memslots) { + if (kvm_enabled() && + kvm_get_free_memslots() < required_memslots + reserved_memslots) { error_setg(errp, "hypervisor has not enough free memory slots left"); return; } - if (vhost_get_free_memslots() < required_memslots) { + if (vhost_get_free_memslots() < required_memslots + reserved_memslots) { error_setg(errp, "a used vhost backend has not enough free memory slots left"); return; } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 8e84dca246..f7e1ac12a8 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -23,6 +23,7 @@ #include "qemu/log.h" #include "standard-headers/linux/vhost_types.h" #include "hw/virtio/virtio-bus.h" +#include "hw/mem/memory-device.h" #include "migration/blocker.h" #include "migration/qemu-file-types.h" #include "sysemu/dma.h" @@ -1423,7 +1424,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, VhostBackendType backend_type, uint32_t busyloop_timeout, Error **errp) { - unsigned int used; + unsigned int used, reserved, limit; uint64_t features; int i, r, n_initialized_vqs = 0; @@ -1529,9 +1530,18 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, } else { used = used_memslots; } - if (used > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) { - error_setg(errp, "vhost backend memory slots limit is less" - " than current number of present memory slots"); + /* + * We assume that all reserved memslots actually require a real memslot + * in our vhost backend. This might not be true, for example, if the + * memslot would be ROM. If ever relevant, we can optimize for that -- + * but we'll need additional information about the reservations. + */ + reserved = memory_devices_get_reserved_memslots(); + limit = hdev->vhost_ops->vhost_backend_memslots_limit(hdev); + if (used + reserved > limit) { + error_setg(errp, "vhost backend memory slots limit (%d) is less" + " than current number of used (%d) and reserved (%d)" + " memory slots for memory devices.", limit, used, reserved); r = -EINVAL; goto fail_busyloop; } diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h index b51a579fb9..c7b624da6a 100644 --- a/include/hw/mem/memory-device.h +++ b/include/hw/mem/memory-device.h @@ -46,6 +46,12 @@ typedef struct MemoryDeviceState MemoryDeviceState; * single RAM memory region or a memory region container with subregions * that are RAM memory regions or aliases to RAM memory regions. Other * memory regions or subregions are not supported. + * + * If the device memory region returned via @get_memory_region is a + * memory region container, it's supported to dynamically (un)map subregions + * as long as the number of memslots returned by @get_memslots() won't + * be exceeded and as long as all memory regions are of the same kind (e.g., + * all RAM or all ROM). */ struct MemoryDeviceClass { /* private */ @@ -125,6 +131,7 @@ struct MemoryDeviceClass { MemoryDeviceInfoList *qmp_memory_device_list(void); uint64_t get_plugged_memory_size(void); +unsigned int memory_devices_get_reserved_memslots(void); void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms, const uint64_t *legacy_align, Error **errp); void memory_device_plug(MemoryDeviceState *md, MachineState *ms); diff --git a/stubs/memory_device.c b/stubs/memory_device.c index e75cac62dc..318a5d4187 100644 --- a/stubs/memory_device.c +++ b/stubs/memory_device.c @@ -10,3 +10,8 @@ uint64_t get_plugged_memory_size(void) { return (uint64_t)-1; } + +unsigned int memory_devices_get_reserved_memslots(void) +{ + return 0; +}