From patchwork Thu Feb 16 14:48:45 2023
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 13143292
From: Thomas Hellström
To: dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Daniel Vetter, Christian Koenig, Dave Airlie, intel-xe@lists.freedesktop.org
Subject: [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager
Date: Thu, 16 Feb 2023 15:48:45 +0100
Message-Id: <20230216144847.216259-2-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>
References: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>

Initially we tried to leverage the amdgpu suballocation manager. It turns out, however, that it tries extremely hard not to enable signalling on the fences that hold the memory up for freeing, which makes it hard to understand and to fix potential issues with it.

So in a simplification effort, introduce a drm suballocation manager as a wrapper around an existing range allocator (drm_mm) that avoids using queues for freeing, thus avoiding throttling on free, which is undesirable since the throttling typically needs to be done uninterruptibly. This variant is probably more cpu-hungry but can be improved at the cost of additional complexity. Ideas for that are documented in the drm_suballoc.c file.
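The lifecycle the helper implements — allocate a range, free it against a fence, and only reuse the space once the fence has signaled — can be modeled in plain userspace C. This is a hypothetical standalone sketch, not the kernel API: `sa_manager`, `sa_alloc`, `sa_free` and the boolean `signaled` flag standing in for a dma-fence are all invented here for illustration.

```c
/*
 * Standalone model of fence-deferred suballocation. NOT the kernel API:
 * a "fence" is modeled as a plain signaled flag, and freeing only marks
 * the range; the space is reclaimed lazily on the next allocation attempt.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sa_range {
	size_t start, size;
	bool freed;     /* "drm_suballoc_free()" was called */
	bool signaled;  /* the protecting "fence" has signaled */
	struct sa_range *next;
};

struct sa_manager {
	size_t size;
	struct sa_range *ranges; /* kept sorted by start offset */
};

/* Reclaim ranges whose fence has signaled (the deferred-free step). */
static void sa_process_idle(struct sa_manager *m)
{
	struct sa_range **p = &m->ranges;

	while (*p) {
		if ((*p)->freed && (*p)->signaled) {
			struct sa_range *dead = *p;
			*p = dead->next;
			free(dead);
		} else {
			p = &(*p)->next;
		}
	}
}

/* First-fit allocation; returns the start offset, or (size_t)-1 on -ENOSPC. */
static size_t sa_alloc(struct sa_manager *m, size_t size)
{
	struct sa_range **p = &m->ranges;
	size_t hole = 0; /* end of the previous range, i.e. hole start */

	sa_process_idle(m);
	while (*p) {
		if ((*p)->start - hole >= size)
			break; /* hole before *p is big enough */
		hole = (*p)->start + (*p)->size;
		p = &(*p)->next;
	}
	if (m->size - hole < size)
		return (size_t)-1;

	struct sa_range *r = calloc(1, sizeof(*r)); /* assumed to succeed */
	r->start = hole;
	r->size = size;
	r->next = *p;
	*p = r;
	return hole;
}

/* Mark freed; the space only becomes reusable once the fence signals. */
static void sa_free(struct sa_manager *m, size_t start, bool fence_signaled)
{
	for (struct sa_range *r = m->ranges; r; r = r->next)
		if (r->start == start) {
			r->freed = true;
			r->signaled = fence_signaled;
		}
}
```

Note how, unlike the amdgpu manager it replaces, nothing here waits on a fence: a free whose fence has not signaled simply stays on the list, so there is no throttling on the free path.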
Signed-off-by: Thomas Hellström
Co-developed-by: Maarten Lankhorst
Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/Kconfig        |   4 +
 drivers/gpu/drm/Makefile       |   3 +
 drivers/gpu/drm/drm_suballoc.c | 301 +++++++++++++++++++++++++++++++++
 include/drm/drm_suballoc.h     | 112 ++++++++++++
 4 files changed, 420 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_suballoc.c
 create mode 100644 include/drm/drm_suballoc.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index dc0f94f02a82..8fbe57407c60 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -232,6 +232,10 @@ config DRM_GEM_SHMEM_HELPER
 	help
 	  Choose this if you need the GEM shmem helper functions
 
+config DRM_SUBALLOC_HELPER
+	tristate
+	depends on DRM
+
 config DRM_SCHED
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ab4460fcd63f..1e04d135e866 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -88,6 +88,9 @@ obj-$(CONFIG_DRM_GEM_DMA_HELPER) += drm_dma_helper.o
 drm_shmem_helper-y := drm_gem_shmem_helper.o
 obj-$(CONFIG_DRM_GEM_SHMEM_HELPER) += drm_shmem_helper.o
 
+drm_suballoc_helper-y := drm_suballoc.o
+obj-$(CONFIG_DRM_SUBALLOC_HELPER) += drm_suballoc_helper.o
+
 drm_vram_helper-y := drm_gem_vram_helper.o
 obj-$(CONFIG_DRM_VRAM_HELPER) += drm_vram_helper.o
diff --git a/drivers/gpu/drm/drm_suballoc.c b/drivers/gpu/drm/drm_suballoc.c
new file mode 100644
index 000000000000..6e0292dea548
--- /dev/null
+++ b/drivers/gpu/drm/drm_suballoc.c
@@ -0,0 +1,301 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <drm/drm_suballoc.h>
+
+/**
+ * DOC: Overview
+ *
+ * This suballocator intends to be a wrapper around a range allocator
+ * that is aware also of deferred range freeing with fences. Currently
+ * we hard-code the drm_mm as the range allocator.
+ * The approach, while rather simple, suffers from a number of performance
+ * issues that can all be fixed if needed, at the cost of more and/or
+ * more complex code:
+ *
+ * 1) It's cpu-hungry, the drm_mm allocator is overkill. Either code a
+ * much simpler range allocator, or let the caller decide by providing
+ * ops that wrap any range allocator. We could also avoid waking up
+ * waiters unless there is a reasonable chance of enough space in the
+ * range manager.
+ *
+ * 2) We unnecessarily install the fence callbacks too early, forcing
+ * enable_signaling() too early and causing extra driver effort. This is
+ * likely not an issue if used with the drm_scheduler since it calls
+ * enable_signaling() early anyway.
+ *
+ * 3) Long processing in irq (disabled) context. We've mostly worked around
+ * that already by using the idle_list. If that workaround is deemed too
+ * complex for little gain, we can remove it and use spin_lock_irq()
+ * throughout the manager. If we want to shorten processing in irq context
+ * even further, we can skip the spin_trylock in __drm_suballoc_free() and
+ * avoid freeing allocations from irq context altogether. However drm_mm
+ * should be quite fast at freeing ranges.
+ *
+ * 4) A shrinker could start processing the list items in 2) and 3) to play
+ * better with the rest of the system.
+ */
+
+static void drm_suballoc_process_idle(struct drm_suballoc_manager *sa_manager);
+
+/**
+ * drm_suballoc_manager_init() - Initialise the drm_suballoc_manager
+ * @sa_manager: pointer to the sa_manager
+ * @size: number of bytes we want to suballocate
+ * @align: alignment for each suballocated chunk
+ *
+ * Prepares the suballocation manager for suballocations.
+ */ +void drm_suballoc_manager_init(struct drm_suballoc_manager *sa_manager, + u64 size, u64 align) +{ + spin_lock_init(&sa_manager->lock); + spin_lock_init(&sa_manager->idle_list_lock); + mutex_init(&sa_manager->alloc_mutex); + drm_mm_init(&sa_manager->mm, 0, size); + init_waitqueue_head(&sa_manager->wq); + sa_manager->range_size = size; + sa_manager->alignment = align; + INIT_LIST_HEAD(&sa_manager->idle_list); +} +EXPORT_SYMBOL(drm_suballoc_manager_init); + +/** + * drm_suballoc_manager_fini() - Destroy the drm_suballoc_manager + * @sa_manager: pointer to the sa_manager + * + * Cleans up the suballocation manager after use. All fences added + * with drm_suballoc_free() must be signaled, or we cannot clean up + * the entire manager. + */ +void drm_suballoc_manager_fini(struct drm_suballoc_manager *sa_manager) +{ + drm_suballoc_process_idle(sa_manager); + drm_mm_takedown(&sa_manager->mm); + mutex_destroy(&sa_manager->alloc_mutex); +} +EXPORT_SYMBOL(drm_suballoc_manager_fini); + +static void __drm_suballoc_free(struct drm_suballoc *sa) +{ + struct drm_suballoc_manager *sa_manager = sa->manager; + struct dma_fence *fence; + + /* + * In order to avoid protecting the potentially lengthy drm_mm manager + * *allocation* processing with an irq-disabling lock, + * defer touching the drm_mm for freeing until we're in task context, + * with no irqs disabled, or happen to succeed in taking the manager + * lock. + */ + if (!in_task() || irqs_disabled()) { + unsigned long irqflags; + + if (spin_trylock(&sa_manager->lock)) + goto locked; + + spin_lock_irqsave(&sa_manager->idle_list_lock, irqflags); + list_add_tail(&sa->idle_link, &sa_manager->idle_list); + spin_unlock_irqrestore(&sa_manager->idle_list_lock, irqflags); + wake_up(&sa_manager->wq); + return; + } + + spin_lock(&sa_manager->lock); +locked: + drm_mm_remove_node(&sa->node); + + fence = sa->fence; + sa->fence = NULL; + spin_unlock(&sa_manager->lock); + /* Maybe only wake if first mm hole is sufficiently large? 
*/
+	wake_up(&sa_manager->wq);
+	dma_fence_put(fence);
+	kfree(sa);
+}
+
+/* Free all deferred idle allocations */
+static void drm_suballoc_process_idle(struct drm_suballoc_manager *sa_manager)
+{
+	/*
+	 * prepare_to_wait() / wake_up() semantics ensure that any list
+	 * addition that was done before wake_up() is visible when
+	 * this code is called from the wait loop.
+	 */
+	if (!list_empty_careful(&sa_manager->idle_list)) {
+		struct drm_suballoc *sa, *next;
+		unsigned long irqflags;
+		LIST_HEAD(list);
+
+		spin_lock_irqsave(&sa_manager->idle_list_lock, irqflags);
+		list_splice_init(&sa_manager->idle_list, &list);
+		spin_unlock_irqrestore(&sa_manager->idle_list_lock, irqflags);
+
+		list_for_each_entry_safe(sa, next, &list, idle_link)
+			__drm_suballoc_free(sa);
+	}
+}
+
+static void
+drm_suballoc_fence_signaled(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	struct drm_suballoc *sa = container_of(cb, typeof(*sa), cb);
+
+	__drm_suballoc_free(sa);
+}
+
+static int drm_suballoc_tryalloc(struct drm_suballoc *sa, u64 size)
+{
+	struct drm_suballoc_manager *sa_manager = sa->manager;
+	int err;
+
+	drm_suballoc_process_idle(sa_manager);
+	spin_lock(&sa_manager->lock);
+	err = drm_mm_insert_node_generic(&sa_manager->mm, &sa->node, size,
+					 sa_manager->alignment, 0,
+					 DRM_MM_INSERT_EVICT);
+	spin_unlock(&sa_manager->lock);
+	return err;
+}
+
+/**
+ * drm_suballoc_new() - Make a suballocation.
+ * @sa_manager: pointer to the sa_manager
+ * @size: number of bytes we want to suballocate.
+ * @gfp: Allocation context.
+ * @intr: Whether to sleep interruptibly if sleeping.
+ *
+ * Try to make a suballocation of size @size, which will be rounded
+ * up to the alignment specified in drm_suballoc_manager_init().
+ *
+ * Return: a new suballocated bo, or an ERR_PTR.
+ */
+struct drm_suballoc *
+drm_suballoc_new(struct drm_suballoc_manager *sa_manager, u64 size,
+		 gfp_t gfp, bool intr)
+{
+	struct drm_suballoc *sa;
+	DEFINE_WAIT(wait);
+	int err = 0;
+
+	if (size > sa_manager->range_size)
+		return ERR_PTR(-ENOSPC);
+
+	sa = kzalloc(sizeof(*sa), gfp);
+	if (!sa)
+		return ERR_PTR(-ENOMEM);
+
+	/* Avoid starvation using the alloc_mutex */
+	if (intr)
+		err = mutex_lock_interruptible(&sa_manager->alloc_mutex);
+	else
+		mutex_lock(&sa_manager->alloc_mutex);
+	if (err) {
+		kfree(sa);
+		return ERR_PTR(err);
+	}
+
+	sa->manager = sa_manager;
+	err = drm_suballoc_tryalloc(sa, size);
+	if (err != -ENOSPC)
+		goto out;
+
+	for (;;) {
+		prepare_to_wait(&sa_manager->wq, &wait,
+				intr ? TASK_INTERRUPTIBLE :
+				TASK_UNINTERRUPTIBLE);
+
+		err = drm_suballoc_tryalloc(sa, size);
+		if (err != -ENOSPC)
+			break;
+
+		if (intr && signal_pending(current)) {
+			err = -ERESTARTSYS;
+			break;
+		}
+
+		io_schedule();
+	}
+	finish_wait(&sa_manager->wq, &wait);
+
+out:
+	mutex_unlock(&sa_manager->alloc_mutex);
+	if (!sa->node.size) {
+		kfree(sa);
+		WARN_ON(!err);
+		sa = ERR_PTR(err);
+	}
+
+	return sa;
+}
+EXPORT_SYMBOL(drm_suballoc_new);
+
+/**
+ * drm_suballoc_free() - Free a suballocation
+ * @sa: pointer to the suballocation
+ * @fence: fence that signals when the suballocation is idle
+ *
+ * Free the suballocation. The suballocation can be re-used after @fence
+ * signals.
+ */ +void +drm_suballoc_free(struct drm_suballoc *sa, struct dma_fence *fence) +{ + if (!sa) + return; + + if (!fence || dma_fence_is_signaled(fence)) { + __drm_suballoc_free(sa); + return; + } + + sa->fence = dma_fence_get(fence); + if (dma_fence_add_callback(fence, &sa->cb, drm_suballoc_fence_signaled)) + __drm_suballoc_free(sa); +} +EXPORT_SYMBOL(drm_suballoc_free); + +#ifdef CONFIG_DEBUG_FS + +/** + * drm_suballoc_dump_debug_info() - Dump the suballocator state + * @sa_manager: The suballoc manager. + * @p: Pointer to a drm printer for output. + * @suballoc_base: Constant to add to the suballocated offsets on printout. + * + * This function dumps the suballocator state. Note that the caller has + * to explicitly order frees and calls to this function in order for the + * freed node to show up as protected by a fence. + */ +void drm_suballoc_dump_debug_info(struct drm_suballoc_manager *sa_manager, + struct drm_printer *p, u64 suballoc_base) +{ + const struct drm_mm_node *entry; + + spin_lock(&sa_manager->lock); + drm_mm_for_each_node(entry, &sa_manager->mm) { + struct drm_suballoc *sa = + container_of(entry, typeof(*sa), node); + + drm_printf(p, " "); + drm_printf(p, "[0x%010llx 0x%010llx] size %8lld", + (unsigned long long)suballoc_base + entry->start, + (unsigned long long)suballoc_base + entry->start + + entry->size, (unsigned long long)entry->size); + + if (sa->fence) + drm_printf(p, " protected by 0x%016llx on context %llu", + (unsigned long long)sa->fence->seqno, + (unsigned long long)sa->fence->context); + + drm_printf(p, "\n"); + } + spin_unlock(&sa_manager->lock); +} +EXPORT_SYMBOL(drm_suballoc_dump_debug_info); +#endif + +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Simple range suballocator helper"); +MODULE_LICENSE("GPL and additional rights"); diff --git a/include/drm/drm_suballoc.h b/include/drm/drm_suballoc.h new file mode 100644 index 000000000000..910952b3383b --- /dev/null +++ b/include/drm/drm_suballoc.h @@ -0,0 +1,112 @@ +/* 
SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+#ifndef _DRM_SUBALLOC_H_
+#define _DRM_SUBALLOC_H_
+
+#include <drm/drm_mm.h>
+
+#include <linux/dma-fence.h>
+#include <linux/types.h>
+
+/**
+ * struct drm_suballoc_manager - Wrapper for fenced range allocations
+ * @mm: The range manager. Protected by @lock.
+ * @range_size: The total size of the range.
+ * @alignment: Range alignment.
+ * @wq: Wait queue for sleeping allocations on contention.
+ * @idle_list: List of idle but not yet freed allocations. Protected by
+ * @idle_list_lock.
+ */
+struct drm_suballoc_manager {
+	/** @lock: Manager lock. Protects @mm. */
+	spinlock_t lock;
+	/**
+	 * @idle_list_lock: Lock to protect the idle_list.
+	 * Disable irqs when locking.
+	 */
+	spinlock_t idle_list_lock;
+	/** @alloc_mutex: Mutex to protect against starvation. */
+	struct mutex alloc_mutex;
+	struct drm_mm mm;
+	u64 range_size;
+	u64 alignment;
+	wait_queue_head_t wq;
+	struct list_head idle_list;
+};
+
+/**
+ * struct drm_suballoc - Suballocated range.
+ * @node: The drm_mm representation of the range.
+ * @fence: dma-fence indicating whether allocation is active or idle.
+ * Assigned on call to free the allocation so doesn't need protection.
+ * @cb: dma-fence callback structure. Used for callbacks when the fence signals.
+ * @manager: The struct drm_suballoc_manager the range belongs to. Immutable.
+ * @idle_link: Link for the manager idle_list. Protected by the
+ * drm_suballoc_manager::idle_list_lock.
+ */ +struct drm_suballoc { + struct drm_mm_node node; + struct dma_fence *fence; + struct dma_fence_cb cb; + struct drm_suballoc_manager *manager; + struct list_head idle_link; +}; + +void drm_suballoc_manager_init(struct drm_suballoc_manager *sa_manager, + u64 size, u64 align); + +void drm_suballoc_manager_fini(struct drm_suballoc_manager *sa_manager); + +struct drm_suballoc *drm_suballoc_new(struct drm_suballoc_manager *sa_manager, + u64 size, gfp_t gfp, bool intr); + +void drm_suballoc_free(struct drm_suballoc *sa, struct dma_fence *fence); + +/** + * drm_suballoc_soffset - Range start. + * @sa: The struct drm_suballoc. + * + * Return: The start of the allocated range. + */ +static inline u64 drm_suballoc_soffset(struct drm_suballoc *sa) +{ + return sa->node.start; +} + +/** + * drm_suballoc_eoffset - Range end. + * @sa: The struct drm_suballoc. + * + * Return: The end of the allocated range + 1. + */ +static inline u64 drm_suballoc_eoffset(struct drm_suballoc *sa) +{ + return sa->node.start + sa->node.size; +} + +/** + * drm_suballoc_size - Range size. + * @sa: The struct drm_suballoc. + * + * Return: The size of the allocated range. 
+ */
+static inline u64 drm_suballoc_size(struct drm_suballoc *sa)
+{
+	return sa->node.size;
+}
+
+#ifdef CONFIG_DEBUG_FS
+void drm_suballoc_dump_debug_info(struct drm_suballoc_manager *sa_manager,
+				  struct drm_printer *p, u64 suballoc_base);
+#else
+static inline void
+drm_suballoc_dump_debug_info(struct drm_suballoc_manager *sa_manager,
+			     struct drm_printer *p, u64 suballoc_base)
+{ }
+
+#endif
+
+#endif /* _DRM_SUBALLOC_H_ */

From patchwork Thu Feb 16 14:48:46 2023
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 13143293
From: Thomas Hellström
To: dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Daniel Vetter, Christian Koenig, Dave Airlie, intel-xe@lists.freedesktop.org
Subject: [PATCH 2/3] drm/amd: Convert amdgpu to use suballocation helper.
Date: Thu, 16 Feb 2023 15:48:46 +0100
Message-Id: <20230216144847.216259-3-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>
References: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>

From: Maarten Lankhorst

Now that we have a generic suballocation helper, use it in amdgpu. The debug output is slightly different and suballocation may be slightly more cpu-hungry.
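After the conversion, an amdgpu suballocation's GPU and CPU addresses are simply the manager's base address plus the generic drm_suballoc start offset (see the converted amdgpu_sa_bo_gpu_addr() / amdgpu_sa_bo_cpu_addr() in the diff below). A standalone sketch of that arithmetic, using simplified stand-in structs rather than the real kernel types:

```c
/*
 * Model of the converted address helpers. The structs are simplified
 * stand-ins for struct drm_mm_node / struct amdgpu_sa_manager, invented
 * here for illustration only.
 */
#include <assert.h>
#include <stdint.h>

struct sa_node {          /* stands in for struct drm_mm_node */
	uint64_t start;
	uint64_t size;
};

struct sa_manager {       /* stands in for struct amdgpu_sa_manager */
	uint64_t gpu_addr; /* base GPU address of the backing bo */
};

/* Mirrors drm_suballoc_soffset() / drm_suballoc_eoffset() */
static uint64_t sa_soffset(const struct sa_node *sa)
{
	return sa->start;
}

static uint64_t sa_eoffset(const struct sa_node *sa)
{
	return sa->start + sa->size;
}

/* Mirrors amdgpu_sa_bo_gpu_addr() after the conversion */
static uint64_t sa_gpu_addr(const struct sa_manager *m, const struct sa_node *sa)
{
	return m->gpu_addr + sa_soffset(sa);
}
```

The design point here is that per-allocation bookkeeping (olist/flist, soffset/eoffset) disappears from amdgpu entirely; only the base-plus-offset arithmetic remains driver-side.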
Signed-off-by: Maarten Lankhorst Co-developed-by: Thomas Hellström Signed-off-by: Thomas Hellström --- drivers/gpu/drm/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu.h | 26 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 23 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 320 ++------------------- 7 files changed, 43 insertions(+), 336 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 8fbe57407c60..73ddfdf3a894 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -77,6 +77,7 @@ config DRM_KUNIT_TEST select DRM_DISPLAY_HELPER select DRM_LIB_RANDOM select DRM_KMS_HELPER + select DRM_SUBALLOC_HELPER select DRM_BUDDY select DRM_EXPORT_FOR_TESTS if m select DRM_KUNIT_TEST_HELPERS diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 5341b6b242c3..0ed12171450b 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -18,6 +18,7 @@ config DRM_AMDGPU select BACKLIGHT_CLASS_DEVICE select INTERVAL_TREE select DRM_BUDDY + select DRM_SUBALLOC_HELPER # amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work # ACPI_VIDEO's dependencies must also be selected. select INPUT if ACPI diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 164141bc8b4a..dda88090f044 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -424,29 +424,11 @@ struct amdgpu_clock { * alignment). 
*/ -#define AMDGPU_SA_NUM_FENCE_LISTS 32 - struct amdgpu_sa_manager { - wait_queue_head_t wq; - struct amdgpu_bo *bo; - struct list_head *hole; - struct list_head flist[AMDGPU_SA_NUM_FENCE_LISTS]; - struct list_head olist; - unsigned size; - uint64_t gpu_addr; - void *cpu_ptr; - uint32_t domain; - uint32_t align; -}; - -/* sub-allocation buffer */ -struct amdgpu_sa_bo { - struct list_head olist; - struct list_head flist; - struct amdgpu_sa_manager *manager; - unsigned soffset; - unsigned eoffset; - struct dma_fence *fence; + struct drm_suballoc_manager base; + struct amdgpu_bo *bo; + uint64_t gpu_addr; + void *cpu_ptr; }; int amdgpu_fence_slab_init(void); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index bcccc348dbe2..5621b63c7f42 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -69,7 +69,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (size) { r = amdgpu_sa_bo_new(&adev->ib_pools[pool_type], - &ib->sa_bo, size, 256); + &ib->sa_bo, size); if (r) { dev_err(adev->dev, "failed to get a new IB (%d)\n", r); return r; @@ -309,8 +309,7 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev) for (i = 0; i < AMDGPU_IB_POOL_MAX; i++) { r = amdgpu_sa_bo_manager_init(adev, &adev->ib_pools[i], - AMDGPU_IB_POOL_SIZE, - AMDGPU_GPU_PAGE_SIZE, + AMDGPU_IB_POOL_SIZE, 256, AMDGPU_GEM_DOMAIN_GTT); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 93207badf83f..568baf15d5b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -336,15 +336,22 @@ uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev, /* * sub allocation */ +static inline struct amdgpu_sa_manager * +to_amdgpu_sa_manager(struct drm_suballoc_manager *manager) +{ + return container_of(manager, struct amdgpu_sa_manager, base); +} -static inline uint64_t 
amdgpu_sa_bo_gpu_addr(struct amdgpu_sa_bo *sa_bo) +static inline uint64_t amdgpu_sa_bo_gpu_addr(struct drm_suballoc *sa_bo) { - return sa_bo->manager->gpu_addr + sa_bo->soffset; + return to_amdgpu_sa_manager(sa_bo->manager)->gpu_addr + + drm_suballoc_soffset(sa_bo); } -static inline void * amdgpu_sa_bo_cpu_addr(struct amdgpu_sa_bo *sa_bo) +static inline void * amdgpu_sa_bo_cpu_addr(struct drm_suballoc *sa_bo) { - return sa_bo->manager->cpu_ptr + sa_bo->soffset; + return to_amdgpu_sa_manager(sa_bo->manager)->cpu_ptr + + drm_suballoc_soffset(sa_bo); } int amdgpu_sa_bo_manager_init(struct amdgpu_device *adev, @@ -355,11 +362,11 @@ void amdgpu_sa_bo_manager_fini(struct amdgpu_device *adev, int amdgpu_sa_bo_manager_start(struct amdgpu_device *adev, struct amdgpu_sa_manager *sa_manager); int amdgpu_sa_bo_new(struct amdgpu_sa_manager *sa_manager, - struct amdgpu_sa_bo **sa_bo, - unsigned size, unsigned align); + struct drm_suballoc **sa_bo, + unsigned size); void amdgpu_sa_bo_free(struct amdgpu_device *adev, - struct amdgpu_sa_bo **sa_bo, - struct dma_fence *fence); + struct drm_suballoc **sa_bo, + struct dma_fence *fence); #if defined(CONFIG_DEBUG_FS) void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 3989e755a5b4..018f36b10de8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -27,6 +27,7 @@ #include #include #include +#include struct amdgpu_device; struct amdgpu_ring; @@ -92,7 +93,7 @@ enum amdgpu_ib_pool_type { }; struct amdgpu_ib { - struct amdgpu_sa_bo *sa_bo; + struct drm_suballoc *sa_bo; uint32_t length_dw; uint64_t gpu_addr; uint32_t *ptr; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c index 524d10b21041..e7b3539e0294 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c @@ -44,327 
+44,61 @@ #include "amdgpu.h" -static void amdgpu_sa_bo_remove_locked(struct amdgpu_sa_bo *sa_bo); -static void amdgpu_sa_bo_try_free(struct amdgpu_sa_manager *sa_manager); - int amdgpu_sa_bo_manager_init(struct amdgpu_device *adev, struct amdgpu_sa_manager *sa_manager, - unsigned size, u32 align, u32 domain) + unsigned size, u32 suballoc_align, u32 domain) { - int i, r; - - init_waitqueue_head(&sa_manager->wq); - sa_manager->bo = NULL; - sa_manager->size = size; - sa_manager->domain = domain; - sa_manager->align = align; - sa_manager->hole = &sa_manager->olist; - INIT_LIST_HEAD(&sa_manager->olist); - for (i = 0; i < AMDGPU_SA_NUM_FENCE_LISTS; ++i) - INIT_LIST_HEAD(&sa_manager->flist[i]); + int r; - r = amdgpu_bo_create_kernel(adev, size, align, domain, &sa_manager->bo, + r = amdgpu_bo_create_kernel(adev, size, AMDGPU_GPU_PAGE_SIZE, domain, &sa_manager->bo, &sa_manager->gpu_addr, &sa_manager->cpu_ptr); if (r) { dev_err(adev->dev, "(%d) failed to allocate bo for manager\n", r); return r; } - memset(sa_manager->cpu_ptr, 0, sa_manager->size); + memset(sa_manager->cpu_ptr, 0, size); + drm_suballoc_manager_init(&sa_manager->base, size, suballoc_align); return r; } void amdgpu_sa_bo_manager_fini(struct amdgpu_device *adev, struct amdgpu_sa_manager *sa_manager) { - struct amdgpu_sa_bo *sa_bo, *tmp; - if (sa_manager->bo == NULL) { dev_err(adev->dev, "no bo for sa manager\n"); return; } - if (!list_empty(&sa_manager->olist)) { - sa_manager->hole = &sa_manager->olist, - amdgpu_sa_bo_try_free(sa_manager); - if (!list_empty(&sa_manager->olist)) { - dev_err(adev->dev, "sa_manager is not empty, clearing anyway\n"); - } - } - list_for_each_entry_safe(sa_bo, tmp, &sa_manager->olist, olist) { - amdgpu_sa_bo_remove_locked(sa_bo); - } + drm_suballoc_manager_fini(&sa_manager->base); amdgpu_bo_free_kernel(&sa_manager->bo, &sa_manager->gpu_addr, &sa_manager->cpu_ptr); - sa_manager->size = 0; } -static void amdgpu_sa_bo_remove_locked(struct amdgpu_sa_bo *sa_bo) -{ - struct 
amdgpu_sa_manager *sa_manager = sa_bo->manager; - if (sa_manager->hole == &sa_bo->olist) { - sa_manager->hole = sa_bo->olist.prev; - } - list_del_init(&sa_bo->olist); - list_del_init(&sa_bo->flist); - dma_fence_put(sa_bo->fence); - kfree(sa_bo); -} - -static void amdgpu_sa_bo_try_free(struct amdgpu_sa_manager *sa_manager) +int amdgpu_sa_bo_new(struct amdgpu_sa_manager *sa_manager, + struct drm_suballoc **sa_bo, + unsigned size) { - struct amdgpu_sa_bo *sa_bo, *tmp; + struct drm_suballoc *sa = drm_suballoc_new(&sa_manager->base, size, GFP_KERNEL, true); - if (sa_manager->hole->next == &sa_manager->olist) - return; + if (IS_ERR(sa)) { + *sa_bo = NULL; - sa_bo = list_entry(sa_manager->hole->next, struct amdgpu_sa_bo, olist); - list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) { - if (sa_bo->fence == NULL || - !dma_fence_is_signaled(sa_bo->fence)) { - return; - } - amdgpu_sa_bo_remove_locked(sa_bo); + return PTR_ERR(sa); } -} -static inline unsigned amdgpu_sa_bo_hole_soffset(struct amdgpu_sa_manager *sa_manager) -{ - struct list_head *hole = sa_manager->hole; - - if (hole != &sa_manager->olist) { - return list_entry(hole, struct amdgpu_sa_bo, olist)->eoffset; - } + *sa_bo = sa; return 0; } -static inline unsigned amdgpu_sa_bo_hole_eoffset(struct amdgpu_sa_manager *sa_manager) -{ - struct list_head *hole = sa_manager->hole; - - if (hole->next != &sa_manager->olist) { - return list_entry(hole->next, struct amdgpu_sa_bo, olist)->soffset; - } - return sa_manager->size; -} - -static bool amdgpu_sa_bo_try_alloc(struct amdgpu_sa_manager *sa_manager, - struct amdgpu_sa_bo *sa_bo, - unsigned size, unsigned align) -{ - unsigned soffset, eoffset, wasted; - - soffset = amdgpu_sa_bo_hole_soffset(sa_manager); - eoffset = amdgpu_sa_bo_hole_eoffset(sa_manager); - wasted = (align - (soffset % align)) % align; - - if ((eoffset - soffset) >= (size + wasted)) { - soffset += wasted; - - sa_bo->manager = sa_manager; - sa_bo->soffset = soffset; - sa_bo->eoffset = soffset + 
size; - list_add(&sa_bo->olist, sa_manager->hole); - INIT_LIST_HEAD(&sa_bo->flist); - sa_manager->hole = &sa_bo->olist; - return true; - } - return false; -} - -/** - * amdgpu_sa_event - Check if we can stop waiting - * - * @sa_manager: pointer to the sa_manager - * @size: number of bytes we want to allocate - * @align: alignment we need to match - * - * Check if either there is a fence we can wait for or - * enough free memory to satisfy the allocation directly - */ -static bool amdgpu_sa_event(struct amdgpu_sa_manager *sa_manager, - unsigned size, unsigned align) -{ - unsigned soffset, eoffset, wasted; - int i; - - for (i = 0; i < AMDGPU_SA_NUM_FENCE_LISTS; ++i) - if (!list_empty(&sa_manager->flist[i])) - return true; - - soffset = amdgpu_sa_bo_hole_soffset(sa_manager); - eoffset = amdgpu_sa_bo_hole_eoffset(sa_manager); - wasted = (align - (soffset % align)) % align; - - if ((eoffset - soffset) >= (size + wasted)) { - return true; - } - - return false; -} - -static bool amdgpu_sa_bo_next_hole(struct amdgpu_sa_manager *sa_manager, - struct dma_fence **fences, - unsigned *tries) -{ - struct amdgpu_sa_bo *best_bo = NULL; - unsigned i, soffset, best, tmp; - - /* if hole points to the end of the buffer */ - if (sa_manager->hole->next == &sa_manager->olist) { - /* try again with its beginning */ - sa_manager->hole = &sa_manager->olist; - return true; - } - - soffset = amdgpu_sa_bo_hole_soffset(sa_manager); - /* to handle wrap around we add sa_manager->size */ - best = sa_manager->size * 2; - /* go over all fence list and try to find the closest sa_bo - * of the current last - */ - for (i = 0; i < AMDGPU_SA_NUM_FENCE_LISTS; ++i) { - struct amdgpu_sa_bo *sa_bo; - - fences[i] = NULL; - - if (list_empty(&sa_manager->flist[i])) - continue; - - sa_bo = list_first_entry(&sa_manager->flist[i], - struct amdgpu_sa_bo, flist); - - if (!dma_fence_is_signaled(sa_bo->fence)) { - fences[i] = sa_bo->fence; - continue; - } - - /* limit the number of tries each ring gets */ - if 
(tries[i] > 2) { - continue; - } - - tmp = sa_bo->soffset; - if (tmp < soffset) { - /* wrap around, pretend it's after */ - tmp += sa_manager->size; - } - tmp -= soffset; - if (tmp < best) { - /* this sa bo is the closest one */ - best = tmp; - best_bo = sa_bo; - } - } - - if (best_bo) { - uint32_t idx = best_bo->fence->context; - - idx %= AMDGPU_SA_NUM_FENCE_LISTS; - ++tries[idx]; - sa_manager->hole = best_bo->olist.prev; - - /* we knew that this one is signaled, - so it's save to remote it */ - amdgpu_sa_bo_remove_locked(best_bo); - return true; - } - return false; -} - -int amdgpu_sa_bo_new(struct amdgpu_sa_manager *sa_manager, - struct amdgpu_sa_bo **sa_bo, - unsigned size, unsigned align) -{ - struct dma_fence *fences[AMDGPU_SA_NUM_FENCE_LISTS]; - unsigned tries[AMDGPU_SA_NUM_FENCE_LISTS]; - unsigned count; - int i, r; - signed long t; - - if (WARN_ON_ONCE(align > sa_manager->align)) - return -EINVAL; - - if (WARN_ON_ONCE(size > sa_manager->size)) - return -EINVAL; - - *sa_bo = kmalloc(sizeof(struct amdgpu_sa_bo), GFP_KERNEL); - if (!(*sa_bo)) - return -ENOMEM; - (*sa_bo)->manager = sa_manager; - (*sa_bo)->fence = NULL; - INIT_LIST_HEAD(&(*sa_bo)->olist); - INIT_LIST_HEAD(&(*sa_bo)->flist); - - spin_lock(&sa_manager->wq.lock); - do { - for (i = 0; i < AMDGPU_SA_NUM_FENCE_LISTS; ++i) - tries[i] = 0; - - do { - amdgpu_sa_bo_try_free(sa_manager); - - if (amdgpu_sa_bo_try_alloc(sa_manager, *sa_bo, - size, align)) { - spin_unlock(&sa_manager->wq.lock); - return 0; - } - - /* see if we can skip over some allocations */ - } while (amdgpu_sa_bo_next_hole(sa_manager, fences, tries)); - - for (i = 0, count = 0; i < AMDGPU_SA_NUM_FENCE_LISTS; ++i) - if (fences[i]) - fences[count++] = dma_fence_get(fences[i]); - - if (count) { - spin_unlock(&sa_manager->wq.lock); - t = dma_fence_wait_any_timeout(fences, count, false, - MAX_SCHEDULE_TIMEOUT, - NULL); - for (i = 0; i < count; ++i) - dma_fence_put(fences[i]); - - r = (t > 0) ? 
0 : t; - spin_lock(&sa_manager->wq.lock); - } else { - /* if we have nothing to wait for block */ - r = wait_event_interruptible_locked( - sa_manager->wq, - amdgpu_sa_event(sa_manager, size, align) - ); - } - - } while (!r); - - spin_unlock(&sa_manager->wq.lock); - kfree(*sa_bo); - *sa_bo = NULL; - return r; -} - -void amdgpu_sa_bo_free(struct amdgpu_device *adev, struct amdgpu_sa_bo **sa_bo, +void amdgpu_sa_bo_free(struct amdgpu_device *adev, struct drm_suballoc **sa_bo, struct dma_fence *fence) { - struct amdgpu_sa_manager *sa_manager; - if (sa_bo == NULL || *sa_bo == NULL) { return; } - sa_manager = (*sa_bo)->manager; - spin_lock(&sa_manager->wq.lock); - if (fence && !dma_fence_is_signaled(fence)) { - uint32_t idx; - - (*sa_bo)->fence = dma_fence_get(fence); - idx = fence->context % AMDGPU_SA_NUM_FENCE_LISTS; - list_add_tail(&(*sa_bo)->flist, &sa_manager->flist[idx]); - } else { - amdgpu_sa_bo_remove_locked(*sa_bo); - } - wake_up_all_locked(&sa_manager->wq); - spin_unlock(&sa_manager->wq.lock); + drm_suballoc_free(*sa_bo, fence); *sa_bo = NULL; } @@ -373,26 +107,8 @@ void amdgpu_sa_bo_free(struct amdgpu_device *adev, struct amdgpu_sa_bo **sa_bo, void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager *sa_manager, struct seq_file *m) { - struct amdgpu_sa_bo *i; - - spin_lock(&sa_manager->wq.lock); - list_for_each_entry(i, &sa_manager->olist, olist) { - uint64_t soffset = i->soffset + sa_manager->gpu_addr; - uint64_t eoffset = i->eoffset + sa_manager->gpu_addr; - if (&i->olist == sa_manager->hole) { - seq_printf(m, ">"); - } else { - seq_printf(m, " "); - } - seq_printf(m, "[0x%010llx 0x%010llx] size %8lld", - soffset, eoffset, eoffset - soffset); + struct drm_printer p = drm_seq_file_printer(m); - if (i->fence) - seq_printf(m, " protected by 0x%016llx on context %llu", - i->fence->seqno, i->fence->context); - - seq_printf(m, "\n"); - } - spin_unlock(&sa_manager->wq.lock); + drm_suballoc_dump_debug_info(&sa_manager->base, &p, sa_manager->gpu_addr); } #endif 
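The hunks above replace amdgpu's hand-rolled hole tracking with a thin wrapper: `drm_suballoc_new()` returns an `ERR_PTR`-encoded pointer, and `amdgpu_sa_bo_new()` converts that into the old errno-plus-out-parameter convention. Below is a standalone userspace sketch of that conversion pattern; the `ERR_PTR`/`IS_ERR`/`PTR_ERR` helpers and the toy `suballoc_new()` are stand-ins modelled on the kernel's, not the real kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in ERR_PTR helpers modelled on include/linux/err.h (illustrative only). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct suballoc { size_t size; };

/* Toy allocator standing in for drm_suballoc_new(): returns an
 * ERR_PTR-encoded errno on failure, a valid pointer on success. */
static struct suballoc *suballoc_new(size_t size)
{
	struct suballoc *sa;

	if (!size)
		return ERR_PTR(-ENOMEM);
	sa = malloc(sizeof(*sa));
	if (!sa)
		return ERR_PTR(-ENOMEM);
	sa->size = size;
	return sa;
}

/* Wrapper mirroring the shape of the new amdgpu_sa_bo_new():
 * decode the ERR_PTR into an errno return and NULL the out parameter,
 * so existing callers keep their "int ret, **out" convention. */
static int sa_bo_new(struct suballoc **out, size_t size)
{
	struct suballoc *sa = suballoc_new(size);

	if (IS_ERR(sa)) {
		*out = NULL;
		return (int)PTR_ERR(sa);
	}
	*out = sa;
	return 0;
}
```

The point of the wrapper is that only the sub-allocator's entry point changes; the many call sites that test `if (r)` and expect `*sa_bo == NULL` on failure need no edits.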
From patchwork Thu Feb 16 14:48:47 2023
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 13143294
From: Thomas Hellström
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 3/3] drm/radeon: Use the drm suballocation manager implementation.
Date: Thu, 16 Feb 2023 15:48:47 +0100
Message-Id: <20230216144847.216259-4-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>
References: <20230216144847.216259-1-thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter, Christian Koenig, Dave Airlie, intel-xe@lists.freedesktop.org

From: Maarten Lankhorst

Use the generic suballocation helper. Note that the generic suballocator only allows a single alignment, so we may waste a few more bytes for radeon_semaphore; that shouldn't be a big deal, and per-allocation alignment could be re-added if needed.

Also, similar to amdgpu, the debug output changes slightly and suballocator cpu usage may be slightly higher.
Signed-off-by: Maarten Lankhorst Co-developed-by: Thomas Hellström Signed-off-by: Thomas Hellström --- drivers/gpu/drm/radeon/radeon.h | 55 +--- drivers/gpu/drm/radeon/radeon_ib.c | 12 +- drivers/gpu/drm/radeon/radeon_object.h | 25 +- drivers/gpu/drm/radeon/radeon_sa.c | 314 ++-------------------- drivers/gpu/drm/radeon/radeon_semaphore.c | 6 +- 5 files changed, 55 insertions(+), 357 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 57e20780a458..d19a4b1c1a8f 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -79,6 +79,7 @@ #include #include +#include #include "radeon_family.h" #include "radeon_mode.h" @@ -511,52 +512,12 @@ struct radeon_bo { }; #define gem_to_radeon_bo(gobj) container_of((gobj), struct radeon_bo, tbo.base) -/* sub-allocation manager, it has to be protected by another lock. - * By conception this is an helper for other part of the driver - * like the indirect buffer or semaphore, which both have their - * locking. - * - * Principe is simple, we keep a list of sub allocation in offset - * order (first entry has offset == 0, last entry has the highest - * offset). - * - * When allocating new object we first check if there is room at - * the end total_size - (last_object_offset + last_object_size) >= - * alloc_size. If so we allocate new object there. - * - * When there is not enough room at the end, we start waiting for - * each sub object until we reach object_offset+object_size >= - * alloc_size, this object then become the sub object we return. - * - * Alignment can't be bigger than page size. - * - * Hole are not considered for allocation to keep things simple. - * Assumption is that there won't be hole (all object on same - * alignment). 
- */ struct radeon_sa_manager { - wait_queue_head_t wq; - struct radeon_bo *bo; - struct list_head *hole; - struct list_head flist[RADEON_NUM_RINGS]; - struct list_head olist; - unsigned size; - uint64_t gpu_addr; - void *cpu_ptr; - uint32_t domain; - uint32_t align; -}; - -struct radeon_sa_bo; - -/* sub-allocation buffer */ -struct radeon_sa_bo { - struct list_head olist; - struct list_head flist; - struct radeon_sa_manager *manager; - unsigned soffset; - unsigned eoffset; - struct radeon_fence *fence; + struct drm_suballoc_manager base; + struct radeon_bo *bo; + uint64_t gpu_addr; + void *cpu_ptr; + u32 domain; }; /* @@ -587,7 +548,7 @@ int radeon_mode_dumb_mmap(struct drm_file *filp, * Semaphores. */ struct radeon_semaphore { - struct radeon_sa_bo *sa_bo; + struct drm_suballoc *sa_bo; signed waiters; uint64_t gpu_addr; }; @@ -816,7 +777,7 @@ void radeon_irq_kms_disable_hpd(struct radeon_device *rdev, unsigned hpd_mask); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; + struct drm_suballoc *sa_bo; uint32_t length_dw; uint64_t gpu_addr; uint32_t *ptr; diff --git a/drivers/gpu/drm/radeon/radeon_ib.c b/drivers/gpu/drm/radeon/radeon_ib.c index 62b116727b4f..63fcfe65d814 100644 --- a/drivers/gpu/drm/radeon/radeon_ib.c +++ b/drivers/gpu/drm/radeon/radeon_ib.c @@ -61,7 +61,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, { int r; - r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &ib->sa_bo, size, 256); + r = radeon_sa_bo_new(&rdev->ring_tmp_bo, &ib->sa_bo, size); if (r) { dev_err(rdev->dev, "failed to get a new IB (%d)\n", r); return r; @@ -77,7 +77,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, /* ib pool is bound at RADEON_VA_IB_OFFSET in virtual address * space and soffset is the offset inside the pool bo */ - ib->gpu_addr = ib->sa_bo->soffset + RADEON_VA_IB_OFFSET; + ib->gpu_addr = drm_suballoc_soffset(ib->sa_bo) + RADEON_VA_IB_OFFSET; } else { ib->gpu_addr = radeon_sa_bo_gpu_addr(ib->sa_bo); } @@ -97,7 +97,7 @@ int radeon_ib_get(struct 
radeon_device *rdev, int ring, void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib *ib) { radeon_sync_free(rdev, &ib->sync, ib->fence); - radeon_sa_bo_free(rdev, &ib->sa_bo, ib->fence); + radeon_sa_bo_free(&ib->sa_bo, ib->fence); radeon_fence_unref(&ib->fence); } @@ -201,8 +201,7 @@ int radeon_ib_pool_init(struct radeon_device *rdev) if (rdev->family >= CHIP_BONAIRE) { r = radeon_sa_bo_manager_init(rdev, &rdev->ring_tmp_bo, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GPU_PAGE_SIZE, + RADEON_IB_POOL_SIZE*64*1024, 256, RADEON_GEM_DOMAIN_GTT, RADEON_GEM_GTT_WC); } else { @@ -210,8 +209,7 @@ int radeon_ib_pool_init(struct radeon_device *rdev) * to the command stream checking */ r = radeon_sa_bo_manager_init(rdev, &rdev->ring_tmp_bo, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GPU_PAGE_SIZE, + RADEON_IB_POOL_SIZE*64*1024, 256, RADEON_GEM_DOMAIN_GTT, 0); } if (r) { diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 0a6ef49e990a..b7c5087a7dbc 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,15 +169,22 @@ extern void radeon_bo_fence(struct radeon_bo *bo, struct radeon_fence *fence, /* * sub allocation */ +static inline struct radeon_sa_manager * +to_radeon_sa_manager(struct drm_suballoc_manager *manager) +{ + return container_of(manager, struct radeon_sa_manager, base); +} -static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo) +static inline uint64_t radeon_sa_bo_gpu_addr(struct drm_suballoc *sa_bo) { - return sa_bo->manager->gpu_addr + sa_bo->soffset; + return to_radeon_sa_manager(sa_bo->manager)->gpu_addr + + drm_suballoc_soffset(sa_bo); } -static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) +static inline void * radeon_sa_bo_cpu_addr(struct drm_suballoc *sa_bo) { - return sa_bo->manager->cpu_ptr + sa_bo->soffset; + return to_radeon_sa_manager(sa_bo->manager)->cpu_ptr + + drm_suballoc_soffset(sa_bo); } extern int 
radeon_sa_bo_manager_init(struct radeon_device *rdev, @@ -190,12 +197,10 @@ extern int radeon_sa_bo_manager_start(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); -extern int radeon_sa_bo_new(struct radeon_device *rdev, - struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); -extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo, +extern int radeon_sa_bo_new(struct radeon_sa_manager *sa_manager, + struct drm_suballoc **sa_bo, + unsigned size); +extern void radeon_sa_bo_free(struct drm_suballoc **sa_bo, struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 0981948bd9ed..b5555750aa0d 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -44,53 +44,31 @@ #include "radeon.h" -static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo); -static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager); - int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - unsigned size, u32 align, u32 domain, u32 flags) + unsigned size, u32 sa_align, u32 domain, u32 flags) { - int i, r; - - init_waitqueue_head(&sa_manager->wq); - sa_manager->bo = NULL; - sa_manager->size = size; - sa_manager->domain = domain; - sa_manager->align = align; - sa_manager->hole = &sa_manager->olist; - INIT_LIST_HEAD(&sa_manager->olist); - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - INIT_LIST_HEAD(&sa_manager->flist[i]); - } + int r; - r = radeon_bo_create(rdev, size, align, true, + r = radeon_bo_create(rdev, size, RADEON_GPU_PAGE_SIZE, true, domain, flags, NULL, NULL, &sa_manager->bo); if (r) { dev_err(rdev->dev, "(%d) failed to allocate 
bo for manager\n", r); return r; } + sa_manager->domain = domain; + + drm_suballoc_manager_init(&sa_manager->base, size, sa_align); + return r; } void radeon_sa_bo_manager_fini(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager) { - struct radeon_sa_bo *sa_bo, *tmp; - - if (!list_empty(&sa_manager->olist)) { - sa_manager->hole = &sa_manager->olist, - radeon_sa_bo_try_free(sa_manager); - if (!list_empty(&sa_manager->olist)) { - dev_err(rdev->dev, "sa_manager is not empty, clearing anyway\n"); - } - } - list_for_each_entry_safe(sa_bo, tmp, &sa_manager->olist, olist) { - radeon_sa_bo_remove_locked(sa_bo); - } + drm_suballoc_manager_fini(&sa_manager->base); radeon_bo_unref(&sa_manager->bo); - sa_manager->size = 0; } int radeon_sa_bo_manager_start(struct radeon_device *rdev, @@ -139,260 +117,33 @@ int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, return r; } -static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo) +int radeon_sa_bo_new(struct radeon_sa_manager *sa_manager, + struct drm_suballoc **sa_bo, + unsigned size) { - struct radeon_sa_manager *sa_manager = sa_bo->manager; - if (sa_manager->hole == &sa_bo->olist) { - sa_manager->hole = sa_bo->olist.prev; - } - list_del_init(&sa_bo->olist); - list_del_init(&sa_bo->flist); - radeon_fence_unref(&sa_bo->fence); - kfree(sa_bo); -} - -static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager) -{ - struct radeon_sa_bo *sa_bo, *tmp; - - if (sa_manager->hole->next == &sa_manager->olist) - return; + struct drm_suballoc *sa = drm_suballoc_new(&sa_manager->base, size, GFP_KERNEL, true); - sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo, olist); - list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) { - if (sa_bo->fence == NULL || !radeon_fence_signaled(sa_bo->fence)) { - return; - } - radeon_sa_bo_remove_locked(sa_bo); + if (IS_ERR(sa)) { + *sa_bo = NULL; + return PTR_ERR(sa); } -} -static inline unsigned radeon_sa_bo_hole_soffset(struct 
radeon_sa_manager *sa_manager) -{ - struct list_head *hole = sa_manager->hole; - - if (hole != &sa_manager->olist) { - return list_entry(hole, struct radeon_sa_bo, olist)->eoffset; - } + *sa_bo = sa; return 0; } -static inline unsigned radeon_sa_bo_hole_eoffset(struct radeon_sa_manager *sa_manager) -{ - struct list_head *hole = sa_manager->hole; - - if (hole->next != &sa_manager->olist) { - return list_entry(hole->next, struct radeon_sa_bo, olist)->soffset; - } - return sa_manager->size; -} - -static bool radeon_sa_bo_try_alloc(struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, - unsigned size, unsigned align) -{ - unsigned soffset, eoffset, wasted; - - soffset = radeon_sa_bo_hole_soffset(sa_manager); - eoffset = radeon_sa_bo_hole_eoffset(sa_manager); - wasted = (align - (soffset % align)) % align; - - if ((eoffset - soffset) >= (size + wasted)) { - soffset += wasted; - - sa_bo->manager = sa_manager; - sa_bo->soffset = soffset; - sa_bo->eoffset = soffset + size; - list_add(&sa_bo->olist, sa_manager->hole); - INIT_LIST_HEAD(&sa_bo->flist); - sa_manager->hole = &sa_bo->olist; - return true; - } - return false; -} - -/** - * radeon_sa_event - Check if we can stop waiting - * - * @sa_manager: pointer to the sa_manager - * @size: number of bytes we want to allocate - * @align: alignment we need to match - * - * Check if either there is a fence we can wait for or - * enough free memory to satisfy the allocation directly - */ -static bool radeon_sa_event(struct radeon_sa_manager *sa_manager, - unsigned size, unsigned align) -{ - unsigned soffset, eoffset, wasted; - int i; - - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (!list_empty(&sa_manager->flist[i])) { - return true; - } - } - - soffset = radeon_sa_bo_hole_soffset(sa_manager); - eoffset = radeon_sa_bo_hole_eoffset(sa_manager); - wasted = (align - (soffset % align)) % align; - - if ((eoffset - soffset) >= (size + wasted)) { - return true; - } - - return false; -} - -static bool 
radeon_sa_bo_next_hole(struct radeon_sa_manager *sa_manager, - struct radeon_fence **fences, - unsigned *tries) -{ - struct radeon_sa_bo *best_bo = NULL; - unsigned i, soffset, best, tmp; - - /* if hole points to the end of the buffer */ - if (sa_manager->hole->next == &sa_manager->olist) { - /* try again with its beginning */ - sa_manager->hole = &sa_manager->olist; - return true; - } - - soffset = radeon_sa_bo_hole_soffset(sa_manager); - /* to handle wrap around we add sa_manager->size */ - best = sa_manager->size * 2; - /* go over all fence list and try to find the closest sa_bo - * of the current last - */ - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - struct radeon_sa_bo *sa_bo; - - fences[i] = NULL; - - if (list_empty(&sa_manager->flist[i])) { - continue; - } - - sa_bo = list_first_entry(&sa_manager->flist[i], - struct radeon_sa_bo, flist); - - if (!radeon_fence_signaled(sa_bo->fence)) { - fences[i] = sa_bo->fence; - continue; - } - - /* limit the number of tries each ring gets */ - if (tries[i] > 2) { - continue; - } - - tmp = sa_bo->soffset; - if (tmp < soffset) { - /* wrap around, pretend it's after */ - tmp += sa_manager->size; - } - tmp -= soffset; - if (tmp < best) { - /* this sa bo is the closest one */ - best = tmp; - best_bo = sa_bo; - } - } - - if (best_bo) { - ++tries[best_bo->fence->ring]; - sa_manager->hole = best_bo->olist.prev; - - /* we knew that this one is signaled, - so it's save to remote it */ - radeon_sa_bo_remove_locked(best_bo); - return true; - } - return false; -} - -int radeon_sa_bo_new(struct radeon_device *rdev, - struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align) -{ - struct radeon_fence *fences[RADEON_NUM_RINGS]; - unsigned tries[RADEON_NUM_RINGS]; - int i, r; - - BUG_ON(align > sa_manager->align); - BUG_ON(size > sa_manager->size); - - *sa_bo = kmalloc(sizeof(struct radeon_sa_bo), GFP_KERNEL); - if ((*sa_bo) == NULL) { - return -ENOMEM; - } - (*sa_bo)->manager = sa_manager; - 
(*sa_bo)->fence = NULL; - INIT_LIST_HEAD(&(*sa_bo)->olist); - INIT_LIST_HEAD(&(*sa_bo)->flist); - - spin_lock(&sa_manager->wq.lock); - do { - for (i = 0; i < RADEON_NUM_RINGS; ++i) - tries[i] = 0; - - do { - radeon_sa_bo_try_free(sa_manager); - - if (radeon_sa_bo_try_alloc(sa_manager, *sa_bo, - size, align)) { - spin_unlock(&sa_manager->wq.lock); - return 0; - } - - /* see if we can skip over some allocations */ - } while (radeon_sa_bo_next_hole(sa_manager, fences, tries)); - - for (i = 0; i < RADEON_NUM_RINGS; ++i) - radeon_fence_ref(fences[i]); - - spin_unlock(&sa_manager->wq.lock); - r = radeon_fence_wait_any(rdev, fences, false); - for (i = 0; i < RADEON_NUM_RINGS; ++i) - radeon_fence_unref(&fences[i]); - spin_lock(&sa_manager->wq.lock); - /* if we have nothing to wait for block */ - if (r == -ENOENT) { - r = wait_event_interruptible_locked( - sa_manager->wq, - radeon_sa_event(sa_manager, size, align) - ); - } - - } while (!r); - - spin_unlock(&sa_manager->wq.lock); - kfree(*sa_bo); - *sa_bo = NULL; - return r; -} - -void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo, +void radeon_sa_bo_free(struct drm_suballoc **sa_bo, struct radeon_fence *fence) { - struct radeon_sa_manager *sa_manager; - if (sa_bo == NULL || *sa_bo == NULL) { return; } - sa_manager = (*sa_bo)->manager; - spin_lock(&sa_manager->wq.lock); - if (fence && !radeon_fence_signaled(fence)) { - (*sa_bo)->fence = radeon_fence_ref(fence); - list_add_tail(&(*sa_bo)->flist, - &sa_manager->flist[fence->ring]); - } else { - radeon_sa_bo_remove_locked(*sa_bo); - } - wake_up_all_locked(&sa_manager->wq); - spin_unlock(&sa_manager->wq.lock); + if (fence) + drm_suballoc_free(*sa_bo, &fence->base); + else + drm_suballoc_free(*sa_bo, NULL); + *sa_bo = NULL; } @@ -400,25 +151,8 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo, void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m) { - struct radeon_sa_bo *i; + struct 
drm_printer p = drm_seq_file_printer(m); - spin_lock(&sa_manager->wq.lock); - list_for_each_entry(i, &sa_manager->olist, olist) { - uint64_t soffset = i->soffset + sa_manager->gpu_addr; - uint64_t eoffset = i->eoffset + sa_manager->gpu_addr; - if (&i->olist == sa_manager->hole) { - seq_printf(m, ">"); - } else { - seq_printf(m, " "); - } - seq_printf(m, "[0x%010llx 0x%010llx] size %8lld", - soffset, eoffset, eoffset - soffset); - if (i->fence) { - seq_printf(m, " protected by 0x%016llx on ring %d", - i->fence->seq, i->fence->ring); - } - seq_printf(m, "\n"); - } - spin_unlock(&sa_manager->wq.lock); + drm_suballoc_dump_debug_info(&sa_manager->base, &p, sa_manager->gpu_addr); } #endif diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c index 221e59476f64..3e2b0bf0d55d 100644 --- a/drivers/gpu/drm/radeon/radeon_semaphore.c +++ b/drivers/gpu/drm/radeon/radeon_semaphore.c @@ -40,8 +40,8 @@ int radeon_semaphore_create(struct radeon_device *rdev, if (*semaphore == NULL) { return -ENOMEM; } - r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, - &(*semaphore)->sa_bo, 8, 8); + r = radeon_sa_bo_new(&rdev->ring_tmp_bo, + &(*semaphore)->sa_bo, 8); if (r) { kfree(*semaphore); *semaphore = NULL; @@ -100,7 +100,7 @@ void radeon_semaphore_free(struct radeon_device *rdev, dev_err(rdev->dev, "semaphore %p has more waiters than signalers," " hardware lockup imminent!\n", *semaphore); } - radeon_sa_bo_free(rdev, &(*semaphore)->sa_bo, fence); + radeon_sa_bo_free(&(*semaphore)->sa_bo, fence); kfree(*semaphore); *semaphore = NULL; }
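The commit message's note about wasting "a few more bytes for radeon_semaphore" follows from the padding formula the old per-allocation path used: each request advanced the hole's start offset by `(align - (soffset % align)) % align` before carving out the block. With the generic manager the pool alignment is fixed at init time (256 here), so the 8-byte semaphore allocations get 256-byte granularity instead. A minimal sketch of that padding computation (the function name is illustrative, not from the patch):

```c
#include <assert.h>

/* Padding needed to advance soffset to the next multiple of align --
 * the "wasted" expression from the removed radeon/amdgpu try_alloc
 * paths. Works for any non-zero align, not just powers of two. */
static unsigned int sa_align_waste(unsigned int soffset, unsigned int align)
{
	return (align - (soffset % align)) % align;
}
```

The outer `% align` handles the already-aligned case: when `soffset % align == 0`, the inner difference is `align`, and the final modulo folds it back to zero waste.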