From patchwork Mon Mar 10 08:18:29 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009379
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini,
    Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 1/7] memory: Export a helper to get intersection of a
    MemoryRegionSection with a given range
Date: Mon, 10 Mar 2025 16:18:29 +0800
Message-ID: <20250310081837.13123-2-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>
References: <20250310081837.13123-1-chenyi.qiang@intel.com>

Rename the helper to
memory_region_section_intersect_range() to make it more generic. Meanwhile,
define @end as Int128 and replace the related operations with their int128_*
counterparts, since the helper is now exported as a wider API.

Suggested-by: Alexey Kardashevskiy
Reviewed-by: David Hildenbrand
Signed-off-by: Chenyi Qiang
---
Changes in v3:
    - No change

Changes in v2:
    - Make memory_region_section_intersect_range() an inline function.
    - Add Reviewed-by from David
    - Define @end as Int128 and use the related int128_* ops as a wider
      API (Alexey)
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b1a003736b..21f16e4912 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -244,28 +244,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -287,7 +265,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -319,7 +297,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (rdl2 == rdl) {
                 break;
             }
-            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
             rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..3bebc43d59 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1202,6 +1202,33 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
+                            int128_add(int128_make64(offset), int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *

From patchwork Mon Mar 10 08:18:30 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009380
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini,
    Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 2/7] memory: Change memory_region_set_ram_discard_manager()
    to return the result
Date: Mon, 10 Mar 2025 16:18:30 +0800
Message-ID: <20250310081837.13123-3-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>
References: <20250310081837.13123-1-chenyi.qiang@intel.com>

Modify memory_region_set_ram_discard_manager() to return an error if a
RamDiscardManager is already set on the MemoryRegion. The caller must
handle this failure, for example by having virtio-mem undo its actions
and fail the realize() process. Opportunistically move the call earlier
to avoid complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem.
After ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
a MemoryRegion at present.

Suggested-by: David Hildenbrand
Signed-off-by: Chenyi Qiang
---
Changes in v3:
    - Move set_ram_discard_manager() up to avoid a g_free()
    - Clean up set_ram_discard_manager() definition

Changes in v2:
    - Newly added.
---
 hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
 include/exec/memory.h  |  6 +++---
 system/memory.c        | 10 +++++++---
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 21f16e4912..d0d3a0240f 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1049,6 +1049,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -1124,13 +1135,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -1138,12 +1142,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));
 
@@ -1156,6 +1154,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3bebc43d59..390477b588 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2487,13 +2487,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Returns 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..62d6b410f0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2115,12 +2115,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }
 
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }
 
 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,

From patchwork Mon Mar 10 08:18:31 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009381
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini,
    Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 3/7] memory: Unify the definition of ReplayRamPopulate()
    and ReplayRamDiscard()
Date: Mon, 10 Mar 2025 16:18:31 +0800
Message-ID: <20250310081837.13123-4-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>
References: <20250310081837.13123-1-chenyi.qiang@intel.com>

Update the ReplayRamDiscard() callback to return a result, and unify
ReplayRamPopulate() and ReplayRamDiscard() into a single
ReplayRamStateChange() type at the same time, since their definitions
are now identical. This unification simplifies related structures such
as VirtIOMEMReplayData, making the code cleaner and more maintainable.
It also paves the way for future extensions in which replayed discard
operations need to report their results.

Signed-off-by: Chenyi Qiang
---
Changes in v3:
    - Newly added.
---
 hw/virtio/virtio-mem.c | 20 ++++++++++----------
 include/exec/memory.h  | 31 ++++++++++++++++---------------
 migration/ram.c        |  5 +++--
 system/memory.c        | 12 ++++++------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d0d3a0240f..816ae8abdd 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }
 
 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayRamStateChange fn;
     void *opaque;
 };
 
@@ -1741,12 +1741,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayRamStateChange replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -1765,14 +1765,14 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    data->fn(s, data->opaque);
     return 0;
 }
 
-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayRamStateChange replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -1781,8 +1781,8 @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };
 
     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
+    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                                 virtio_mem_rdm_replay_discarded_cb);
 }
 
 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 390477b588..aa67d84329 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -566,8 +566,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }
 
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+typedef int (*ReplayRamStateChange)(MemoryRegionSection *section, void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -641,36 +640,38 @@ struct RamDiscardManagerClass {
     /**
      * @replay_populated:
      *
-     * Call the #ReplayRamPopulate callback for all populated parts within the
+     * Call the #ReplayRamStateChange callback for all populated parts within the
     * #MemoryRegionSection via the #RamDiscardManager.
      *
      * In case any call fails, no further calls are made.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamPopulate callback
+     * @replay_fn: the #ReplayRamStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
     int (*replay_populated)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
-                            ReplayRamPopulate replay_fn, void *opaque);
+                            ReplayRamStateChange replay_fn, void *opaque);
 
     /**
      * @replay_discarded:
      *
-     * Call the #ReplayRamDiscard callback for all discarded parts within the
+     * Call the #ReplayRamStateChange callback for all discarded parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamDiscard callback
+     * @replay_fn: the #ReplayRamStateChange callback
      * @opaque: pointer to forward to the callback
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
      */
-    void (*replay_discarded)(const RamDiscardManager *rdm,
-                             MemoryRegionSection *section,
-                             ReplayRamDiscard replay_fn, void *opaque);
+    int (*replay_discarded)(const RamDiscardManager *rdm,
+                            MemoryRegionSection *section,
+                            ReplayRamStateChange replay_fn, void *opaque);
 
     /**
      * @register_listener:
@@ -713,13 +714,13 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayRamStateChange replay_fn,
                                          void *opaque);
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque);
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayRamStateChange replay_fn,
+                                         void *opaque);
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
diff --git a/migration/ram.c b/migration/ram.c
index ce28328141..053730367b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -816,8 +816,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
-static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+                                      void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
@@ -836,6 +836,7 @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
     }
     *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
     bitmap_clear(rb->bmap, start, npages);
+    return 0;
 }
 
 /*
diff --git a/system/memory.c b/system/memory.c
index 62d6b410f0..8622d17133 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayRamStateChange replay_fn,
                                          void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
@@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
     return rdmc->replay_populated(rdm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque)
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayRamStateChange replay_fn,
+                                         void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
 
     g_assert(rdmc->replay_discarded);
-    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
 }
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,

From patchwork Mon Mar 10 08:18:32 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009382
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini,
    Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 4/7] memory-attribute-manager: Introduce
    MemoryAttributeManager to manage RAMBlock with guest_memfd
Date: Mon, 10 Mar 2025 16:18:32 +0800
Message-ID: <20250310081837.13123-5-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>
References: <20250310081837.13123-1-chenyi.qiang@intel.com>

As commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted, some subsystems like VFIO may disable ram block
discard. However, guest_memfd relies on the discard operation to perform
page conversion between private and shared memory. This can lead to a
stale IOMMU mapping issue when assigning a hardware device to a
confidential VM via shared memory. To address this, it is crucial to
ensure that systems like VFIO refresh their IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. Effectively, page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce the RamDiscardManager to guest_memfd to
facilitate this process.

Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. One option would be to implement it in
HostMemoryBackend, but this is not appropriate because guest_memfd is
per RAMBlock, and some RAMBlocks have a memory backend while others do
not.
In particular, the ones like virtual BIOS calling memory_region_init_ram_guest_memfd() do not.

To manage the RAMBlocks with guest_memfd, define a new object named MemoryAttributeManager to implement the RamDiscardManager interface. The object stores guest_memfd information such as the shared_bitmap, and handles page conversion notification. The name MemoryAttributeManager is intended to be generic. The term "Memory" encompasses not only RAM but also private MMIO in TEE I/O, which might rely on this object/interface to handle page conversion events in the future. The term "Attribute" allows for the management of various attributes beyond shared and private. For instance, it could support scenarios where discard vs. populated and shared vs. private states co-exist, such as supporting virtio-mem or something similar in the future. In the current context, MemoryAttributeManager signifies the discarded state as private and the populated state as shared.

Memory state is tracked at host page size granularity, as the minimum memory conversion size can be one page per request. Additionally, VFIO expects the DMA mapping for a specific iova to be mapped and unmapped with the same granularity. Confidential VMs may perform partial conversions, e.g. a conversion that happens on a small region within a large region. To prevent such invalid cases, and until cut_mapping operation support is introduced, all operations are performed with 4K granularity.

Signed-off-by: Chenyi Qiang
---
Changes in v3:
- Some renames (bitmap_size->shared_bitmap_size, first_one/zero_bit->first_bit, etc.)
- Change shared_bitmap_size from uint32_t to unsigned
- Return mgr->mr->ram_block->page_size in get_block_size()
- Move set_ram_discard_manager() up to avoid a g_free() in the failure case.
- Add const for memory_attribute_manager_get_block_size()
- Unify ReplayRamPopulate and ReplayRamDiscard and the related callback.
Changes in v2:
- Rename the object to MemoryAttributeManager
- Rename the bitmap to shared_bitmap to make it clearer.
- Remove the block_size field and get it from a helper. In the future, we can get the page_size from the RAMBlock if necessary.
- Remove the unnecessary "struct" before GuestMemfdReplayData
- Remove the unnecessary g_free() for the bitmap
- Report an error when a callback fails for a populated/discarded section.
- Move the realize()/unrealize() definitions to this patch.
---
 include/system/memory-attribute-manager.h |  42 ++++
 system/memory-attribute-manager.c         | 283 ++++++++++++++++++++++
 system/meson.build                        |   1 +
 3 files changed, 326 insertions(+)
 create mode 100644 include/system/memory-attribute-manager.h
 create mode 100644 system/memory-attribute-manager.c

diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
new file mode 100644
index 0000000000..23375a14b8
--- /dev/null
+++ b/include/system/memory-attribute-manager.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU memory attribute manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+
+#include "system/hostmem.h"
+
+#define TYPE_MEMORY_ATTRIBUTE_MANAGER "memory-attribute-manager"
+
+OBJECT_DECLARE_TYPE(MemoryAttributeManager, MemoryAttributeManagerClass, MEMORY_ATTRIBUTE_MANAGER)
+
+struct MemoryAttributeManager {
+    Object parent;
+
+    MemoryRegion *mr;
+
+    /* A set bit means the corresponding page is populated (shared) */
+    unsigned shared_bitmap_size;
+    unsigned long *shared_bitmap;
+
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+struct MemoryAttributeManagerClass {
+    ObjectClass parent_class;
+};
+
+int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
+void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);
+
+#endif

diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
new file mode 100644
index 0000000000..7c3789cf49
--- /dev/null
+++ b/system/memory-attribute-manager.c
@@ -0,0 +1,283 @@
+/*
+ * QEMU memory attribute manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "exec/ramblock.h"
+#include "system/memory-attribute-manager.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(MemoryAttributeManager,
+                                   memory_attribute_manager,
+                                   MEMORY_ATTRIBUTE_MANAGER,
+                                   OBJECT,
+                                   { TYPE_RAM_DISCARD_MANAGER },
+                                   { })
+
+static size_t memory_attribute_manager_get_block_size(const MemoryAttributeManager *mgr)
+{
+    /*
+     * Page conversions are performed with a minimum size of 4K (4K-aligned),
+     * so use the host page size as the granularity to track the memory attributes.
+     */
+    g_assert(mgr && mgr->mr && mgr->mr->ram_block);
+    g_assert(mgr->mr->ram_block->page_size == qemu_real_host_page_size());
+    return mgr->mr->ram_block->page_size;
+}
+
+
+static bool memory_attribute_rdm_is_populated(const RamDiscardManager *rdm,
+                                              const MemoryRegionSection *section)
+{
+    const MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    uint64_t first_bit = section->offset_within_region / block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*memory_attribute_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int memory_attribute_notify_populate_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int memory_attribute_notify_discard_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+
+    return 0;
+}
+
+static int memory_attribute_for_each_populated_section(const MemoryAttributeManager *mgr,
+                                                       MemoryRegionSection *section,
+                                                       void *arg,
+                                                       memory_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size, first_bit);
+
+    while (first_bit < mgr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                  last_bit + 2);
+    }
+
+    return ret;
+}
+
+static int memory_attribute_for_each_discarded_section(const MemoryAttributeManager *mgr,
+                                                       MemoryRegionSection *section,
+                                                       void *arg,
+                                                       memory_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                   first_bit);
+
+    while (first_bit < mgr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                 first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                       last_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t memory_attribute_rdm_get_min_granularity(const RamDiscardManager *rdm,
+                                                         const MemoryRegion *mr)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+
+    g_assert(mr == mgr->mr);
+    return memory_attribute_manager_get_block_size(mgr);
+}
+
+static void memory_attribute_rdm_register_listener(RamDiscardManager *rdm,
+                                                   RamDiscardListener *rdl,
+                                                   MemoryRegionSection *section)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    int ret;
+
+    g_assert(section->mr == mgr->mr);
+    rdl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&mgr->rdl_list, rdl, next);
+
+    ret = memory_attribute_for_each_populated_section(mgr, section, rdl,
+                                                      memory_attribute_notify_populate_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void memory_attribute_rdm_unregister_listener(RamDiscardManager *rdm,
+                                                     RamDiscardListener *rdl)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    int ret;
+
+    g_assert(rdl->section);
+    g_assert(rdl->section->mr == mgr->mr);
+
+    ret = memory_attribute_for_each_populated_section(mgr, rdl->section, rdl,
+                                                      memory_attribute_notify_discard_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(rdl->section);
+    rdl->section = NULL;
+    QLIST_REMOVE(rdl, next);
+}
+
+typedef struct MemoryAttributeReplayData {
+    ReplayRamStateChange fn;
+    void *opaque;
+} MemoryAttributeReplayData;
+
+static int memory_attribute_rdm_replay_cb(MemoryRegionSection *section, void *arg)
+{
+    MemoryAttributeReplayData *data = arg;
+
+    return data->fn(section, data->opaque);
+}
+
+static int memory_attribute_rdm_replay_populated(const RamDiscardManager *rdm,
+                                                 MemoryRegionSection *section,
+                                                 ReplayRamStateChange replay_fn,
+                                                 void *opaque)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == mgr->mr);
+    return memory_attribute_for_each_populated_section(mgr, section, &data,
+                                                       memory_attribute_rdm_replay_cb);
+}
+
+static int memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                                 MemoryRegionSection *section,
+                                                 ReplayRamStateChange replay_fn,
+                                                 void *opaque)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == mgr->mr);
+    return memory_attribute_for_each_discarded_section(mgr, section, &data,
+                                                       memory_attribute_rdm_replay_cb);
+}
+
+int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
+{
+    uint64_t shared_bitmap_size;
+    const int block_size = qemu_real_host_page_size();
+    int ret;
+
+    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+
+    mgr->mr = mr;
+    ret = memory_region_set_ram_discard_manager(mgr->mr, RAM_DISCARD_MANAGER(mgr));
+    if (ret) {
+        return ret;
+    }
+    mgr->shared_bitmap_size = shared_bitmap_size;
+    mgr->shared_bitmap = bitmap_new(shared_bitmap_size);
+
+    return ret;
+}
+
+void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr)
+{
+    g_free(mgr->shared_bitmap);
+    memory_region_set_ram_discard_manager(mgr->mr, NULL);
+}
+
+static void memory_attribute_manager_init(Object *obj)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(obj);
+
+    QLIST_INIT(&mgr->rdl_list);
+}
+
+static void memory_attribute_manager_finalize(Object *obj)
+{
+}
+
+static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
+
+    rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
+    rdmc->register_listener = memory_attribute_rdm_register_listener;
+    rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
+    rdmc->is_populated = memory_attribute_rdm_is_populated;
+    rdmc->replay_populated = memory_attribute_rdm_replay_populated;
+    rdmc->replay_discarded = memory_attribute_rdm_replay_discarded;
+}

diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..ab07ff1442 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'memory-attribute-manager.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',

From patchwork Mon Mar 10 08:18:33 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009383
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 5/7] memory-attribute-manager: Introduce a callback to notify the shared/private state change
Date: Mon, 10 Mar 2025 16:18:33 +0800
Message-ID: <20250310081837.13123-6-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>

Introduce a new state_change() callback in MemoryAttributeManagerClass to efficiently notify all registered RamDiscardListeners, including VFIO listeners, about memory conversion events in guest_memfd.
The existing VFIO listener can dynamically DMA map/unmap the shared pages based on the conversion type:
- For conversions from shared to private, the VFIO system ensures the shared mapping is discarded from the IOMMU.
- For conversions from private to shared, it triggers the population of the shared mapping into the IOMMU.

Additionally, there can be some special conversion requests:
- When a conversion request is made for a page already in the desired state, the helper simply returns success.
- For requests involving a range only partially in the desired state, only the necessary segments are converted, ensuring the entire range ends up complying with the request. In this case, fall back to "one block at a time" handling.
- In scenarios where a conversion request is declined by another system, such as a failure from VFIO during notify_populate(), the helper rolls back the request, maintaining consistency.

Note that the bitmap status is updated before the notifier callbacks are invoked, so that a listener can handle the memory based on the latest status.

Opportunistically introduce a helper to trigger the state_change() callback of the class.

Signed-off-by: Chenyi Qiang
---
Changes in v3:
- Move the bitmap update before the notifier callbacks.
- Call the notifier callbacks directly in notify_discard/populate() with the expectation that the requested memory range is in the desired attribute.
- For the case where only part of the range is in the desired status, handle the range with block_size granularity for ease of rollback (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/)

Changes in v2:
- Do the alignment changes due to the rename to MemoryAttributeManager
- Move the state_change() helper definition into this patch.
---
 include/system/memory-attribute-manager.h |  18 +++
 system/memory-attribute-manager.c         | 188 ++++++++++++++++++++++
 2 files changed, 206 insertions(+)

diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
index 23375a14b8..3d9227d62a 100644
--- a/include/system/memory-attribute-manager.h
+++ b/include/system/memory-attribute-manager.h
@@ -34,8 +34,26 @@ struct MemoryAttributeManager {
 
 struct MemoryAttributeManagerClass {
     ObjectClass parent_class;
+
+    int (*state_change)(MemoryAttributeManager *mgr, uint64_t offset, uint64_t size,
+                        bool to_private);
 };
 
+static inline int memory_attribute_manager_state_change(MemoryAttributeManager *mgr, uint64_t offset,
+                                                        uint64_t size, bool to_private)
+{
+    MemoryAttributeManagerClass *klass;
+
+    if (mgr == NULL) {
+        return 0;
+    }
+
+    klass = MEMORY_ATTRIBUTE_MANAGER_GET_CLASS(mgr);
+
+    g_assert(klass->state_change);
+    return klass->state_change(mgr, offset, size, to_private);
+}
+
 int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
 void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);

diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
index 7c3789cf49..6456babc95 100644
--- a/system/memory-attribute-manager.c
+++ b/system/memory-attribute-manager.c
@@ -234,6 +234,191 @@ static int memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
                                                  memory_attribute_rdm_replay_cb);
 }
 
+static bool memory_attribute_is_valid_range(MemoryAttributeManager *mgr,
+                                            uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = mgr->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    int block_size = memory_attribute_manager_get_block_size(mgr);
+
+    if (!QEMU_IS_ALIGNED(offset, block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void memory_attribute_notify_discard(MemoryAttributeManager *mgr,
+                                            uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &mgr->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        rdl->notify_discard(rdl, &tmp);
+    }
+}
+
+static int memory_attribute_notify_populate(MemoryAttributeManager *mgr,
+                                            uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl, *rdl2;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &mgr->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        ret = rdl->notify_populate(rdl, &tmp);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(rdl2, &mgr->rdl_list, next) {
+            MemoryRegionSection tmp = *rdl2->section;
+
+            if (rdl2 == rdl) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+            rdl2->notify_discard(rdl2, &tmp);
+        }
+    }
+    return ret;
+}
+
+static bool memory_attribute_is_range_populated(MemoryAttributeManager *mgr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool memory_attribute_is_range_discarded(MemoryAttributeManager *mgr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static int memory_attribute_state_change(MemoryAttributeManager *mgr, uint64_t offset,
+                                         uint64_t size, bool to_private)
+{
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long nbits = size / block_size;
+    const uint64_t end = offset + size;
+    unsigned long bit;
+    uint64_t cur;
+    int ret = 0;
+
+    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if (to_private) {
+        if (memory_attribute_is_range_discarded(mgr, offset, size)) {
+            /* Already private */
+        } else if (!memory_attribute_is_range_populated(mgr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (!test_bit(bit, mgr->shared_bitmap)) {
+                    continue;
+                }
+                clear_bit(bit, mgr->shared_bitmap);
+                memory_attribute_notify_discard(mgr, cur, block_size);
+            }
+        } else {
+            /* Completely shared */
+            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
+            memory_attribute_notify_discard(mgr, offset, size);
+        }
+    } else {
+        if (memory_attribute_is_range_populated(mgr, offset, size)) {
+            /* Already shared */
+        } else if (!memory_attribute_is_range_discarded(mgr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            unsigned long *modified_bitmap = bitmap_new(nbits);
+
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (test_bit(bit, mgr->shared_bitmap)) {
+                    continue;
+                }
+                set_bit(bit, mgr->shared_bitmap);
+                ret = memory_attribute_notify_populate(mgr, cur, block_size);
+                if (!ret) {
+                    set_bit(bit - first_bit, modified_bitmap);
+                    continue;
+                }
+                clear_bit(bit, mgr->shared_bitmap);
+                break;
+            }
+
+            if (ret) {
+                /*
+                 * Very unexpected: something went wrong. Revert to the old
+                 * state, marking only the blocks as private that we converted
+                 * to shared.
+                 */
+                for (cur = offset; cur < end; cur += block_size) {
+                    bit = cur / block_size;
+                    if (!test_bit(bit - first_bit, modified_bitmap)) {
+                        continue;
+                    }
+                    assert(test_bit(bit, mgr->shared_bitmap));
+                    clear_bit(bit, mgr->shared_bitmap);
+                    memory_attribute_notify_discard(mgr, cur, block_size);
+                }
+            }
+            g_free(modified_bitmap);
+        } else {
+            /* Completely private */
+            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
+            ret = memory_attribute_notify_populate(mgr, offset, size);
+            if (ret) {
+                bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
+            }
+        }
+    }
+
+    return ret;
+}
+
 int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
 {
     uint64_t shared_bitmap_size;
@@ -272,8 +457,11 @@ static void memory_attribute_manager_finalize(Object *obj)
 
 static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
 {
+    MemoryAttributeManagerClass *mamc = MEMORY_ATTRIBUTE_MANAGER_CLASS(oc);
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
 
+    mamc->state_change = memory_attribute_state_change;
+
     rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
     rdmc->register_listener = memory_attribute_rdm_register_listener;
     rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;

From patchwork Mon Mar 10 08:18:34 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009384
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 6/7] memory: Attach MemoryAttributeManager to guest_memfd-backed RAMBlocks
Date: Mon, 10 Mar 2025 16:18:34 +0800
Message-ID: <20250310081837.13123-7-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>

Introduce a new field, memory_attribute_manager, in RAMBlock to link to a MemoryAttributeManager object. This change centralizes all guest_memfd state information (like fd and shared_bitmap) within a RAMBlock, making it easier to manage.

Use the realize()/unrealize() helpers to initialize/uninitialize the MemoryAttributeManager object. Register/unregister the object in the target RAMBlock's MemoryRegion when creating guest_memfd. In the kvm_convert_memory() function, manage memory state changes by using the shared_bitmap to call set_attribute() only on the specific memory range. Additionally, use the memory_attribute_manager_state_change() helper to notify the registered RamDiscardListeners of these changes.
Signed-off-by: Chenyi Qiang --- Changes in v3: - Use ram_discard_manager_reply_populated/discarded() to set the memory attribute and add the undo support if state_change() failed. - Didn't add Reviewed-by from Alexey due to the new changes in this commit. Changes in v2: - Introduce a new field memory_attribute_manager in RAMBlock. - Move the state_change() handling during page conversion in this patch. - Undo what we did if it fails to set. - Change the order of close(guest_memfd) and memory_attribute_manager cleanup. --- accel/kvm/kvm-all.c | 50 +++++++++++++++++++++++++++++++++++++++-- include/exec/ramblock.h | 2 ++ system/physmem.c | 13 +++++++++++ 3 files changed, 63 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index c1fea69d58..a89c5655e8 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -48,6 +48,7 @@ #include "kvm-cpus.h" #include "system/dirtylimit.h" #include "qemu/range.h" +#include "system/memory-attribute-manager.h" #include "hw/boards.h" #include "system/stats.h" @@ -3018,6 +3019,25 @@ static void kvm_eat_signals(CPUState *cpu) } while (sigismember(&chkset, SIG_IPI)); } +typedef struct SetMemoryAttribute { + bool to_private; +} SetMemoryAttribute; + +static int kvm_set_memory_attributes_cb(MemoryRegionSection *section, + void *opaque) +{ + hwaddr start = section->offset_within_address_space; + hwaddr size = section->size; + SetMemoryAttribute *args = opaque; + bool to_private = args->to_private; + + if (to_private) { + return kvm_set_memory_attributes_private(start, size); + } else { + return kvm_set_memory_attributes_shared(start, size); + } +} + int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) { MemoryRegionSection section; @@ -3026,6 +3046,7 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) RAMBlock *rb; void *addr; int ret = -EINVAL; + SetMemoryAttribute args = { .to_private = to_private }; trace_kvm_convert_memory(start, size, to_private ? 
                             "shared_to_private" : "private_to_shared");
@@ -3077,9 +3098,13 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     }

     if (to_private) {
-        ret = kvm_set_memory_attributes_private(start, size);
+        ret = ram_discard_manager_replay_populated(mr->rdm, &section,
+                                                   kvm_set_memory_attributes_cb,
+                                                   &args);
     } else {
-        ret = kvm_set_memory_attributes_shared(start, size);
+        ret = ram_discard_manager_replay_discarded(mr->rdm, &section,
+                                                   kvm_set_memory_attributes_cb,
+                                                   &args);
     }
     if (ret) {
         goto out_unref;
@@ -3088,6 +3113,27 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);

+    ret = memory_attribute_manager_state_change(MEMORY_ATTRIBUTE_MANAGER(mr->rdm),
+                                                offset, size, to_private);
+    if (ret) {
+        warn_report("Failed to notify the listener the state change of "
+                    "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                    start, size, to_private ? "private" : "shared");
+        args.to_private = !to_private;
+        if (to_private) {
+            ret = ram_discard_manager_replay_populated(mr->rdm, &section,
+                                                       kvm_set_memory_attributes_cb,
+                                                       &args);
+        } else {
+            ret = ram_discard_manager_replay_discarded(mr->rdm, &section,
+                                                       kvm_set_memory_attributes_cb,
+                                                       &args);
+        }
+        if (ret) {
+            goto out_unref;
+        }
+    }
+
     if (to_private) {
         if (rb->page_size != qemu_real_host_page_size()) {
             /*
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..06fd365326 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -23,6 +23,7 @@
 #include "cpu-common.h"
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
+#include "system/memory-attribute-manager.h"

 struct RAMBlock {
     struct rcu_head rcu;
@@ -42,6 +43,7 @@ struct RAMBlock {
     int fd;
     uint64_t fd_offset;
     int guest_memfd;
+    MemoryAttributeManager *memory_attribute_manager;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/system/physmem.c b/system/physmem.c
index c76503aea8..0ed394c5d2 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -54,6 +54,7 @@
 #include "system/hostmem.h"
 #include "system/hw_accel.h"
 #include "system/xen-mapcache.h"
+#include "system/memory-attribute-manager.h"
 #include "trace.h"

 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1885,6 +1886,16 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        new_block->memory_attribute_manager = MEMORY_ATTRIBUTE_MANAGER(object_new(TYPE_MEMORY_ATTRIBUTE_MANAGER));
+        if (memory_attribute_manager_realize(new_block->memory_attribute_manager, new_block->mr)) {
+            error_setg(errp, "Failed to realize memory attribute manager");
+            object_unref(OBJECT(new_block->memory_attribute_manager));
+            close(new_block->guest_memfd);
+            ram_block_discard_require(false);
+            qemu_mutex_unlock_ramlist();
+            goto out_free;
+        }
     }

     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -2138,6 +2149,8 @@ static void reclaim_ramblock(RAMBlock *block)
     }

     if (block->guest_memfd >= 0) {
+        memory_attribute_manager_unrealize(block->memory_attribute_manager);
+        object_unref(OBJECT(block->memory_attribute_manager));
         close(block->guest_memfd);
         ram_block_discard_require(false);
     }

From patchwork Mon Mar 10 08:18:35 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14009385
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v3 7/7] RAMBlock: Make guest_memfd require coordinate discard
Date: Mon, 10 Mar 2025 16:18:35 +0800
Message-ID: <20250310081837.13123-8-chenyi.qiang@intel.com>
In-Reply-To: <20250310081837.13123-1-chenyi.qiang@intel.com>

As guest_memfd is now managed by memory_attribute_manager with
RamDiscardManager, only block uncoordinated discard.

Signed-off-by: Chenyi Qiang
---
Changes in v3:
    - No change.

Changes in v2:
    - Change ram_block_discard_require(false) to
      ram_block_coordinated_discard_require(false).
---
 system/physmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 0ed394c5d2..a30cdd43ee 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1872,7 +1872,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         assert(kvm_enabled());
         assert(new_block->guest_memfd < 0);

-        ret = ram_block_discard_require(true);
+        ret = ram_block_coordinated_discard_require(true);
         if (ret < 0) {
             error_setg_errno(errp, -ret,
                              "cannot set up private guest memory: discard currently blocked");
@@ -1892,7 +1892,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             error_setg(errp, "Failed to realize memory attribute manager");
             object_unref(OBJECT(new_block->memory_attribute_manager));
             close(new_block->guest_memfd);
-            ram_block_discard_require(false);
+            ram_block_coordinated_discard_require(false);
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
@@ -2152,7 +2152,7 @@ static void reclaim_ramblock(RAMBlock *block)
         memory_attribute_manager_unrealize(block->memory_attribute_manager);
         object_unref(OBJECT(block->memory_attribute_manager));
         close(block->guest_memfd);
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
     }

     g_free(block);