[01/13] drm: execution context for GEM buffers v4

Message ID	20230504115159.2245-2-christian.koenig@amd.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: " =?utf-8?q?Christian_K=C3=B6nig?= " <ckoenig.leichtzumerken@gmail.com> To: francois.dugast@intel.com, felix.kuehling@amd.com, arunpravin.paneerselvam@amd.com, thomas_os@shipmail.org, dakr@redhat.com, luben.tuikov@amd.com, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH 01/13] drm: execution context for GEM buffers v4 Date: Thu, 4 May 2023 13:51:47 +0200 Message-Id: <20230504115159.2245-2-christian.koenig@amd.com> In-Reply-To: <20230504115159.2245-1-christian.koenig@amd.com> References: <20230504115159.2245-1-christian.koenig@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	[01/13] drm: execution context for GEM buffers v4 \| expand [01/13] drm: execution context for GEM buffers v4 [02/13] drm: add drm_exec selftests v2 [03/13] drm/amdkfd: switch over to using drm_exec v2 [04/13] drm/amdgpu: use drm_exec for GEM and CSA handling [05/13] drm/amdgpu: use drm_exec for MES testing [06/13] drm/amdgpu: use the new drm_exec object for CS v2 [07/13] drm/radeon: switch over to drm_exec [08/13] drm/qxl: switch to using drm_exec [09/13] drm/lima: switch to using drm_exec [10/13] drm/virtgpu: switch to using drm_exec [11/13] drm/panfrost: switch to using drm_exec [12/13] drm/v3d: switch to using drm_exec [13/13] drm: remove drm_gem_(un)lock_reservations

Christian König May 4, 2023, 11:51 a.m. UTC

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTMs execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstracts the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.

v2: drop xarray and use dynamic resized array instead, the locking
    overhead is unecessary and measurable.
v3: drop duplicate tracking, radeon is really the only one needing that.
v4: fixes issues pointed out by Danilo, some typos in comments and a
    helper for lock arrays of GEM objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   6 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 278 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 119 +++++++++++++++
 5 files changed, 417 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

Thomas Hellström (Intel) May 4, 2023, 2:02 p.m. UTC | #1

On 5/4/23 13:51, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
>
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
>
> v2: drop xarray and use dynamic resized array instead, the locking
>      overhead is unecessary and measurable.
> v3: drop duplicate tracking, radeon is really the only one needing that.
> v4: fixes issues pointed out by Danilo, some typos in comments and a
>      helper for lock arrays of GEM objects.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
...
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;

Hi, Christian. Did you consider using a list here with links embedded in 
gem objects, now that only locked objects are to be put on the list / array.

That should work as only the process owning the lock may access the list 
link. Apart from getting rid of reallocing this is beneficial for the 
more general types of ww transactions that are used by i915 (and to a 
minor extent xe as well, I think).

In those cases we would want to unlock a temporary held object within 
the while_not_all_locked() loop and would then have to search the entire 
array for the correct pointer. Instead one could just remove it from the 
list of locked objects.

Thanks,

Thomas

Danilo Krummrich May 25, 2023, 8:42 p.m. UTC | #2

On 5/4/23 13:51, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
> 
> v2: drop xarray and use dynamic resized array instead, the locking
>      overhead is unecessary and measurable.
> v3: drop duplicate tracking, radeon is really the only one needing that.
> v4: fixes issues pointed out by Danilo, some typos in comments and a
>      helper for lock arrays of GEM objects.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

Reviewed-by: Danilo Krummrich <dakr@redhat.com>

> ---
>   Documentation/gpu/drm-mm.rst |  12 ++
>   drivers/gpu/drm/Kconfig      |   6 +
>   drivers/gpu/drm/Makefile     |   2 +
>   drivers/gpu/drm/drm_exec.c   | 278 +++++++++++++++++++++++++++++++++++
>   include/drm/drm_exec.h       | 119 +++++++++++++++
>   5 files changed, 417 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a79fd3549ff8..a52e6f4117d6 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index ba3fb04bb691..2dc81eb062eb 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -201,6 +201,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index a33257d2bc7f..9c6446eb3c83 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..18071bff20f4
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,278 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +
> +	drm_gem_object_put(exec->prelocked);
> +	exec->prelocked = NULL;
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
> +	exec->num_objects = 0;
> +	exec->contended = DRM_EXEC_DUMMY;
> +	exec->prelocked = NULL;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	kvfree(exec->objects);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the array */
> +static int drm_exec_obj_locked(struct drm_exec *exec,
> +			       struct drm_gem_object *obj)
> +{
> +	if (unlikely(exec->num_objects == exec->max_objects)) {
> +		size_t size = exec->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		exec->objects = tmp;
> +		exec->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	exec->objects[exec->num_objects++] = obj;
> +
> +	return 0;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (unlikely(ret)) {
> +		dma_resv_unlock(obj->resv);
> +		goto error_dropref;
> +	}
> +
> +	swap(exec->prelocked, obj);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
> + * already locked, -ENOMEM when memory allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->prelocked == obj) {
> +		drm_gem_object_put(exec->prelocked);
> +		exec->prelocked = NULL;
> +
> +		return dma_resv_reserve_fences(obj->resv, num_fences);
> +	}
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret))
> +		return ret;
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (ret)
> +		goto error_unlock;
> +
> +	/* Keep locked when reserving fences fails */
> +	return dma_resv_reserve_fences(obj->resv, num_fences);
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aports on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	drm_exec_while_not_all_locked(exec) {
> +		for (unsigned int i = 0; i < num_objects; ++i) {
> +			ret = drm_exec_prepare_obj(exec, objects[i],
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);
> +			if (unlikely(ret))
> +				return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_array);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..7c7481ed088a
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for
> +	 */
> +	struct drm_gem_object	*contended;
> +
> +	/**
> +	 * @prelocked: already locked GEM object due to contention
> +	 */
> +	struct drm_gem_object *prelocked;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	for (index = 0, obj = (exec)->objects[0];		\
> +	     index < (exec)->num_objects;			\
> +	     ++index, obj = (exec)->objects[index])
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinal loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinal loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences);
> +
> +#endif

Boris Brezillon June 14, 2023, 12:23 p.m. UTC | #3

Hi Christian,

On Thu,  4 May 2023 13:51:47 +0200
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.

As many other drivers do already, we are considering using drm_exec()
for our resv locking in the PowerVR driver, so we might have more
questions/comments in the coming days/weeks, but I already have a
couple right now (see below).

> v3: drop duplicate tracking, radeon is really the only one needing that

I think we'd actually be interested in duplicate tracking. Is there any
way we can make it an optional feature through some extra helpers/flags?
Doesn't have to be done in this patch series, I'm just wondering if this
is something we can share as well.

[...]

> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *

Have you considered defining a drm_exec_try_prepare_obj_or_retry()
combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?

#define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
        ({ \
                int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
                if (unlikely(drm_exec_is_contended(exec))) \
                        continue; \
                __ret; \
        })

This way the following pattern

		ret = drm_exec_prepare_obj(&exec, boA, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

can be turned into something more conventional:

		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
		if (ret)
			goto error;

I guess we could even add static checks to make sure
drm_exec_try_prepare_obj() is called inside a
drm_exec_while_not_all_locked() loop.

> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */

[...]

> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aports on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	drm_exec_while_not_all_locked(exec) {
> +		for (unsigned int i = 0; i < num_objects; ++i) {
> +			ret = drm_exec_prepare_obj(exec, objects[i],
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);

I had a hard time understanding what the intent was here: we do want the
locking to keep going on contention (reset and retry), but we need to
break out of the inner loop for this to happen, which is what this
drm_exec_break_on_contention() is doing. My misunderstanding was coming
from the fact I was expecting drm_exec_break_on_contention() to stop
the process of preparing objects. Maybe it's just me, but I think it'd
be less confusing if we were getting rid of
drm_exec_{break,continue}_on_contention and have the loop slightly
adjusted:

	unsigned int obj_ptr = 0;

	drm_exec_while_not_all_locked(exec) {
		int ret;

		/* We acquired/prepared all objects, we can leave the loop now. */
		if (obj_ptr == num_objects)
			break;

		ret = drm_exec_try_prepare_obj_or_retry(exec, objects[obj_ptr++],
							num_fences);
		if (ret)
			return ret;
	}

	return 0;

Of course, this is just my personal view on this, and none of these
comments should be considered as blockers, but I thought I'd share
my concerns anyway.

Thanks again for your work!

Regards,

Boris

Christian König June 14, 2023, 12:30 p.m. UTC | #4

Am 14.06.23 um 14:23 schrieb Boris Brezillon:
> Hi Christian,
>
> On Thu,  4 May 2023 13:51:47 +0200
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> This adds the infrastructure for an execution context for GEM buffers
>> which is similar to the existing TTMs execbuf util and intended to replace
>> it in the long term.
>>
>> The basic functionality is that we abstracts the necessary loop to lock
>> many different GEM buffers with automated deadlock and duplicate handling.
> As many other drivers do already, we are considering using drm_exec()
> for our resv locking in the PowerVR driver, so we might have more
> questions/comments in the coming days/weeks, but I already have a
> couple right now (see below).
>
>> v3: drop duplicate tracking, radeon is really the only one needing that
> I think we'd actually be interested in duplicate tracking. Is there any
> way we can make it an optional feature through some extra helpers/flags?
> Doesn't have to be done in this patch series, I'm just wondering if this
> is something we can share as well.

You can still capture the -EALREADY error and act appropriately in your 
driver.

For radeon it just means ignoring the error code and going ahead, but 
that behavior doesn't seem to be desired in most cases.

Initially I though we need to separately track how many and how often 
BOs are duplicated, but there is simply no use for this.

>
> [...]
>
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the cleanup procedure
>> + * unlocks all previously locked GEM objects and locks the contended one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + *	struct drm_gem_object *obj;
>> + *	struct drm_exec exec;
>> + *	unsigned long index;
>> + *	int ret;
>> + *
>> + *	drm_exec_init(&exec, true);
>> + *	drm_exec_while_not_all_locked(&exec) {
>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *		drm_exec_continue_on_contention(&exec);
>> + *		if (ret)
>> + *			goto error;
>> + *
> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
>
> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
>          ({ \
>                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
>                  if (unlikely(drm_exec_is_contended(exec))) \
>                          continue; \
>                  __ret; \
>          })
>
> This way the following pattern
>
> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> 		drm_exec_continue_on_contention(&exec);
> 		if (ret)
> 			goto error;
>
> can be turned into something more conventional:
>
> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> 		if (ret)
> 			goto error;

Yeah, I was considering that as well. But then abandoned it as to 
complicated.

I really need to find some time to work on that anyway.

>
> I guess we could even add static checks to make sure
> drm_exec_try_prepare_obj() is called inside a
> drm_exec_while_not_all_locked() loop.

Interesting idea, but how would somebody do that?

Regards,
Christian.

>
>> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
>> + *		drm_exec_continue_on_contention(&exec);
>> + *		if (ret)
>> + *			goto error;
>> + *	}
>> + *
>> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *		...
>> + *	}
>> + *	drm_exec_fini(&exec);
>> + *
>> + * See struct dma_exec for more details.
>> + */
> [...]
>
>> +/**
>> + * drm_exec_prepare_array - helper to prepare an array of objects
>> + * @exec: the drm_exec object with the state
>> + * @objects: array of GEM object to prepare
>> + * @num_objects: number of GEM objects in the array
>> + * @num_fences: number of fences to reserve on each GEM object
>> + *
>> + * Prepares all GEM objects in an array, handles contention but aports on first
>> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
>> + *
>> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +			   struct drm_gem_object **objects,
>> +			   unsigned int num_objects,
>> +			   unsigned int num_fences)
>> +{
>> +	int ret;
>> +
>> +	drm_exec_while_not_all_locked(exec) {
>> +		for (unsigned int i = 0; i < num_objects; ++i) {
>> +			ret = drm_exec_prepare_obj(exec, objects[i],
>> +						   num_fences);
>> +			drm_exec_break_on_contention(exec);
> I had a hard time understanding what the intent was here: we do want the
> locking to keep going on contention (reset and retry), but we need to
> break out of the inner loop for this to happen, which is what this
> drm_exec_break_on_contention() is doing. My misunderstanding was coming
> from the fact I was expecting drm_exec_break_on_contention() to stop
> the process of preparing objects. Maybe it's just me, but I think it'd
> be less confusing if we were getting rid of
> drm_exec_{break,continue}_on_contention and have the loop slightly
> adjusted:
>
> 	unsigned int obj_ptr = 0;
>
> 	drm_exec_while_not_all_locked(exec) {
> 		int ret;
>
> 		/* We acquired/prepared all objects, we can leave the loop now. */
> 		if (obj_ptr == num_objects)
> 			break;
>
> 		ret = drm_exec_try_prepare_obj_or_retry(exec, objects[obj_ptr++],
> 							num_fences);
> 		if (ret)
> 			return ret;
> 	}
>
> 	return 0;
>
> Of course, this is just my personal view on this, and none of these
> comments should be considered as blockers, but I thought I'd share
> my concerns anyway.
>
> Thanks again for your work!
>
> Regards,
>
> Boris
>

Boris Brezillon June 14, 2023, 1:02 p.m. UTC | #5

On Wed, 14 Jun 2023 14:30:53 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 14.06.23 um 14:23 schrieb Boris Brezillon:
> > Hi Christian,
> >
> > On Thu,  4 May 2023 13:51:47 +0200
> > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> >  
> >> This adds the infrastructure for an execution context for GEM buffers
> >> which is similar to the existing TTMs execbuf util and intended to replace
> >> it in the long term.
> >>
> >> The basic functionality is that we abstracts the necessary loop to lock
> >> many different GEM buffers with automated deadlock and duplicate handling.  
> > As many other drivers do already, we are considering using drm_exec()
> > for our resv locking in the PowerVR driver, so we might have more
> > questions/comments in the coming days/weeks, but I already have a
> > couple right now (see below).
> >  
> >> v3: drop duplicate tracking, radeon is really the only one needing that  
> > I think we'd actually be interested in duplicate tracking. Is there any
> > way we can make it an optional feature through some extra helpers/flags?
> > Doesn't have to be done in this patch series, I'm just wondering if this
> > is something we can share as well.  
> 
> You can still capture the -EALREADY error and act appropriately in your 
> driver.
> 
> For radeon it just means ignoring the error code and going ahead, but 
> that behavior doesn't seem to be desired in most cases.
> 
> Initially I though we need to separately track how many and how often 
> BOs are duplicated, but there is simply no use for this.
> 
> >
> > [...]
> >  
> >> +/**
> >> + * DOC: Overview
> >> + *
> >> + * This component mainly abstracts the retry loop necessary for locking
> >> + * multiple GEM objects while preparing hardware operations (e.g. command
> >> + * submissions, page table updates etc..).
> >> + *
> >> + * If a contention is detected while locking a GEM object the cleanup procedure
> >> + * unlocks all previously locked GEM objects and locks the contended one first
> >> + * before locking any further objects.
> >> + *
> >> + * After an object is locked fences slots can optionally be reserved on the
> >> + * dma_resv object inside the GEM object.
> >> + *
> >> + * A typical usage pattern should look like this::
> >> + *
> >> + *	struct drm_gem_object *obj;
> >> + *	struct drm_exec exec;
> >> + *	unsigned long index;
> >> + *	int ret;
> >> + *
> >> + *	drm_exec_init(&exec, true);
> >> + *	drm_exec_while_not_all_locked(&exec) {
> >> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >> + *		drm_exec_continue_on_contention(&exec);
> >> + *		if (ret)
> >> + *			goto error;
> >> + *  
> > Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> > combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> >
> > #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> >          ({ \
> >                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> >                  if (unlikely(drm_exec_is_contended(exec))) \
> >                          continue; \
> >                  __ret; \
> >          })
> >
> > This way the following pattern
> >
> > 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > 		drm_exec_continue_on_contention(&exec);
> > 		if (ret)
> > 			goto error;
> >
> > can be turned into something more conventional:
> >
> > 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> > 		if (ret)
> > 			goto error;  
> 
> Yeah, I was considering that as well. But then abandoned it as to 
> complicated.
> 
> I really need to find some time to work on that anyway.
> 
> >
> > I guess we could even add static checks to make sure
> > drm_exec_try_prepare_obj() is called inside a
> > drm_exec_while_not_all_locked() loop.  
> 
> Interesting idea, but how would somebody do that?

There are probably better/cleaner ways, but the below diff
seems to catch cases where drm_exec_try_prepare_obj() is
called in a context where break/continue are allowed, but
that's not inside a drm_exec_while_not_all_locked() section.

What's missing though is a way to detect when it's called
from an inner loop.

---
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 7c7481ed088a..1f4e0e1a7783 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -69,8 +69,10 @@ struct drm_exec {
  *
  * At the beginning of the loop it is guaranteed that no GEM object is locked.
  */
+#define __in_drm_exec_while_not_all_locked false
 #define drm_exec_while_not_all_locked(exec)    \
-       while (drm_exec_cleanup(exec))
+       for (const bool __in_drm_exec_while_not_all_locked = true; \
+            drm_exec_cleanup(exec); )
 
 /**
  * drm_exec_continue_on_contention - continue the loop when we need to cleanup
@@ -83,6 +85,25 @@ struct drm_exec {
        if (unlikely(drm_exec_is_contended(exec)))      \
                continue
 
+/**
+ * drm_exec_try_prepare_obj - Try prepare an object and retry on contention
+ * @exec: drm_exec object
+ * @obj: GEM object to prepare
+ * @num_fence: number of fence slots to reserve
+ *
+ * Wrapper around drm_exec_prepare_obj() that automatically retries on
+ * contention by going back to the head of the drm_exec_while_not_all_locked()
+ * loop.
+ */
+#define drm_exec_try_prepare_obj(exec, obj, num_fences) \
+       ({ \
+               int __ret = drm_exec_prepare_obj(exec, obj, num_fences); \
+               static_assert(__in_drm_exec_while_not_all_locked == true); \
+               if (unlikely(drm_exec_is_contended(exec))) \
+                       continue; \
+               __ret; \
+       })
+
 /**
  * drm_exec_break_on_contention - break a subordinal loop on contention
  * @exec: drm_exec object

Boris Brezillon June 17, 2023, 11:54 a.m. UTC | #6

+Matthew who's been using drm_exec in Xe if I'm correct.

Hello Christian,

On Wed, 14 Jun 2023 15:02:52 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Wed, 14 Jun 2023 14:30:53 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 14.06.23 um 14:23 schrieb Boris Brezillon:  
> > > Hi Christian,
> > >
> > > On Thu,  4 May 2023 13:51:47 +0200
> > > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> > >    
> > >> This adds the infrastructure for an execution context for GEM buffers
> > >> which is similar to the existing TTMs execbuf util and intended to replace
> > >> it in the long term.
> > >>
> > >> The basic functionality is that we abstracts the necessary loop to lock
> > >> many different GEM buffers with automated deadlock and duplicate handling.    
> > > As many other drivers do already, we are considering using drm_exec()
> > > for our resv locking in the PowerVR driver, so we might have more
> > > questions/comments in the coming days/weeks, but I already have a
> > > couple right now (see below).
> > >    
> > >> v3: drop duplicate tracking, radeon is really the only one needing that    
> > > I think we'd actually be interested in duplicate tracking. Is there any
> > > way we can make it an optional feature through some extra helpers/flags?
> > > Doesn't have to be done in this patch series, I'm just wondering if this
> > > is something we can share as well.    
> > 
> > You can still capture the -EALREADY error and act appropriately in your 
> > driver.
> > 
> > For radeon it just means ignoring the error code and going ahead, but 
> > that behavior doesn't seem to be desired in most cases.
> > 
> > Initially I though we need to separately track how many and how often 
> > BOs are duplicated, but there is simply no use for this.
> >   
> > >
> > > [...]
> > >    
> > >> +/**
> > >> + * DOC: Overview
> > >> + *
> > >> + * This component mainly abstracts the retry loop necessary for locking
> > >> + * multiple GEM objects while preparing hardware operations (e.g. command
> > >> + * submissions, page table updates etc..).
> > >> + *
> > >> + * If a contention is detected while locking a GEM object the cleanup procedure
> > >> + * unlocks all previously locked GEM objects and locks the contended one first
> > >> + * before locking any further objects.
> > >> + *
> > >> + * After an object is locked fences slots can optionally be reserved on the
> > >> + * dma_resv object inside the GEM object.
> > >> + *
> > >> + * A typical usage pattern should look like this::
> > >> + *
> > >> + *	struct drm_gem_object *obj;
> > >> + *	struct drm_exec exec;
> > >> + *	unsigned long index;
> > >> + *	int ret;
> > >> + *
> > >> + *	drm_exec_init(&exec, true);
> > >> + *	drm_exec_while_not_all_locked(&exec) {
> > >> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > >> + *		drm_exec_continue_on_contention(&exec);
> > >> + *		if (ret)
> > >> + *			goto error;
> > >> + *    
> > > Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> > > combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> > >
> > > #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> > >          ({ \
> > >                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> > >                  if (unlikely(drm_exec_is_contended(exec))) \
> > >                          continue; \
> > >                  __ret; \
> > >          })
> > >
> > > This way the following pattern
> > >
> > > 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > > 		drm_exec_continue_on_contention(&exec);
> > > 		if (ret)
> > > 			goto error;
> > >
> > > can be turned into something more conventional:
> > >
> > > 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> > > 		if (ret)
> > > 			goto error;    
> > 
> > Yeah, I was considering that as well. But then abandoned it as to 
> > complicated.
> > 
> > I really need to find some time to work on that anyway.

I've been playing with drm_exec for a couple weeks now, and I wanted
to share something I hacked to try and make the API simpler and
more robust against misuse (see the below diff, which is a slightly
adjusted version of your work).

In this version, the user is no longer in control of the retry
loop. Instead, it provides an expression (a call to a
sub-function) to be re-evaluated each time a contention is
detected. IMHO, this makes the 'prepare-objs' functions easier to
apprehend, and avoids any mistake like calling
drm_exec_continue_on_contention() in an inner loop, or breaking
out of the drm_exec_while_all_locked() loop unintentionally.

It also makes the internal management a bit simpler, since we
no longer call drm_exec_cleanup() on the first attempt, and can
thus get rid of the DRM_EXEC_DUMMY trick.

In the below diff, I also re-introduced native support for
duplicates as an opt-in, so we don't have to do things like:

	ret = drm_exec_prepare_obj(exec, obj, num_fences);
	if (ret == -EALREADY)
		ret = dma_resv_reserve_fences(obj->resv, num_fences);
	if (ret)
		return ret;

and can just do:

	ret = drm_exec_prepare_obj(exec, obj, num_fences);
	if (ret)
		return;

Of course drivers can open-code a wrapper doing the same thing, but
given at least pvr and radeon need this, it'd be nice if the core
could support it natively.

That's mostly it. Just wanted to share what I had in case you're
interested. If not, that's fine too.

Regards,

Boris
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   6 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 130 +++++++++++++++++
 5 files changed, 424 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index fe40ee686f6e..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -524,6 +524,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 76991720637c..01a38fcdb1c4 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -194,6 +194,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 1873f64db171..18a02eaf2d49 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..e0ad1a3e1610
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc..).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked fences slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ * int prepare_objs_func(struct drm_exec *exec, ...)
+ * {
+ *	struct drm_gem_object *boA, *boB;
+ * 	int ret;
+ *
+ *	<retrieve boA and boB here>
+ *
+ *	ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *	if (ret)
+ *		return ret;
+ *
+ *	ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *	if (ret)
+ *		return ret;
+ *
+ * 	return 0;
+ * }
+ *
+ * int some_func()
+ * {
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	ret = drm_exec_until_all_locked(&exec, prepare_objs_func(&exec, ...));
+ *	if (ret)
+ *		goto error;
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct dma_exec for more details.
+ */
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+
+	drm_gem_object_put(exec->prelocked);
+	exec->prelocked = NULL;
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked objects.
+ */
+void drm_exec_init(struct drm_exec *exec, u32 flags)
+{
+	exec->flags = flags;
+	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
+	exec->num_objects = 0;
+	exec->contended = NULL;
+	exec->prelocked = NULL;
+	ww_acquire_init(&exec->ticket, &reservation_ww_class);
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	kvfree(exec->objects);
+	if (exec->contended)
+		drm_gem_object_put(exec->contended);
+	ww_acquire_fini(&exec->ticket);
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_reset - reset a drm_exec object after a contention
+ * @exec: the drm_exec object to reset
+ *
+ * Unlock all locked objects and resets the number of objects locked.
+ */
+void drm_exec_reset(struct drm_exec *exec)
+{
+	WARN_ON(!exec->contended);
+	drm_exec_unlock_all(exec);
+	exec->num_objects = 0;
+}
+EXPORT_SYMBOL(drm_exec_reset);
+
+/* Track the locked object in the array */
+static int drm_exec_obj_locked(struct drm_exec *exec,
+			       struct drm_gem_object *obj)
+{
+	if (unlikely(exec->num_objects == exec->max_objects)) {
+		size_t size = exec->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		exec->objects = tmp;
+		exec->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	exec->objects[exec->num_objects++] = obj;
+
+	return 0;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (unlikely(ret)) {
+		dma_resv_unlock(obj->resv);
+		goto error_dropref;
+	}
+
+	swap(exec->prelocked, obj);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
+ * already locked, -ENOMEM when memory allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->prelocked == obj) {
+		drm_gem_object_put(exec->prelocked);
+		exec->prelocked = NULL;
+
+		return dma_resv_reserve_fences(obj->resv, num_fences);
+	}
+
+	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY &&
+	    (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
+		goto reserve_fences;
+
+	if (unlikely(ret))
+		return ret;
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (ret)
+		goto error_unlock;
+
+reserve_fences:
+	/* Keep locked when reserving fences fails */
+	return dma_resv_reserve_fences(obj->resv, num_fences);
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+/**
+ * drm_exec_prepare_array - helper to prepare an array of objects
+ * @exec: the drm_exec object with the state
+ * @objects: array of GEM object to prepare
+ * @num_objects: number of GEM objects in the array
+ * @num_fences: number of fences to reserve on each GEM object
+ *
+ * Prepares all GEM objects in an array, handles contention but aports on first
+ * error otherwise. Reserves @num_fences on each GEM object after locking it.
+ *
+ * Returns: -EALREADY when object is already locked, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences)
+{
+	int ret;
+
+	for (unsigned int i = 0; i < num_objects; ++i) {
+		ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_exec_prepare_array);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..b1a5da0509c1
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * enum drm_exec_flags - Execution context flags
+ */
+enum drm_exec_flags {
+	/**
+	 * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible locking
+	 * functions.
+	 */
+	DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
+
+	/**
+	 * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY errors.
+	 */
+	DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
+};
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @flags: Combinations of DRM_EXEC_FLAG_* flags.
+	 */
+	u32 flags;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @num_objects: number of objects locked
+	 */
+	unsigned int		num_objects;
+
+	/**
+	 * @max_objects: maximum objects in array
+	 */
+	unsigned int		max_objects;
+
+	/**
+	 * @objects: array of the locked objects
+	 */
+	struct drm_gem_object	**objects;
+
+	/**
+	 * @contended: contended GEM object we backed off for
+	 */
+	struct drm_gem_object	*contended;
+
+	/**
+	 * @prelocked: already locked GEM object due to contention
+	 */
+	struct drm_gem_object *prelocked;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	for (index = 0, obj = (exec)->objects[0];		\
+	     index < (exec)->num_objects;			\
+	     ++index, obj = (exec)->objects[index])
+
+/**
+ * drm_exec_until_all_locked - retry objects preparation until all objects
+ * are locked
+ * @exec: drm_exec object
+ * @expr: expression to be evaluated on each attempt
+ *
+ * This helper tries to prepare objects and if a deadlock is detected,
+ * rollbacks and retries.
+ *
+ * @expr is typically a function that tries to prepare objects using
+ * drm_exec_prepare_obj().
+ *
+ * If we take drm_exec_prepare_array() as an example, you should do:
+ *
+ *	ret = drm_exec_until_all_locked(exec,
+ *					drm_exec_prepare_array(exec,
+ *							       objs,
+ *							       num_objs,
+ *							       num_fences));
+ *	if (ret)
+ *		goto error_path;
+ *
+ *	...
+ *
+ * Returns: 0 on success, a negative error code on failure.
+ */
+#define drm_exec_until_all_locked(exec, expr)		\
+	({						\
+		__label__ retry;			\
+		int __ret;				\
+retry:							\
+		__ret = expr;				\
+		if ((exec)->contended) {		\
+			WARN_ON(__ret != -EDEADLK);	\
+			drm_exec_reset(exec);		\
+			goto retry;			\
+		}					\
+		ww_acquire_done(&(exec)->ticket);	\
+		__ret;					\
+	})
+
+void drm_exec_init(struct drm_exec *exec, u32 flags);
+void drm_exec_fini(struct drm_exec *exec);
+void drm_exec_reset(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences);
+
+#endif

Thomas Hellström (Intel) June 19, 2023, 8:59 a.m. UTC | #7

On 6/17/23 13:54, Boris Brezillon wrote:
> +Matthew who's been using drm_exec in Xe if I'm correct.
>
> Hello Christian,
>
> On Wed, 14 Jun 2023 15:02:52 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
>> On Wed, 14 Jun 2023 14:30:53 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 14.06.23 um 14:23 schrieb Boris Brezillon:
>>>> Hi Christian,
>>>>
>>>> On Thu,  4 May 2023 13:51:47 +0200
>>>> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>     
>>>>> This adds the infrastructure for an execution context for GEM buffers
>>>>> which is similar to the existing TTMs execbuf util and intended to replace
>>>>> it in the long term.
>>>>>
>>>>> The basic functionality is that we abstracts the necessary loop to lock
>>>>> many different GEM buffers with automated deadlock and duplicate handling.
>>>> As many other drivers do already, we are considering using drm_exec()
>>>> for our resv locking in the PowerVR driver, so we might have more
>>>> questions/comments in the coming days/weeks, but I already have a
>>>> couple right now (see below).
>>>>     
>>>>> v3: drop duplicate tracking, radeon is really the only one needing that
>>>> I think we'd actually be interested in duplicate tracking. Is there any
>>>> way we can make it an optional feature through some extra helpers/flags?
>>>> Doesn't have to be done in this patch series, I'm just wondering if this
>>>> is something we can share as well.
>>> You can still capture the -EALREADY error and act appropriately in your
>>> driver.
>>>
>>> For radeon it just means ignoring the error code and going ahead, but
>>> that behavior doesn't seem to be desired in most cases.
>>>
>>> Initially I though we need to separately track how many and how often
>>> BOs are duplicated, but there is simply no use for this.
>>>    
>>>> [...]
>>>>     
>>>>> +/**
>>>>> + * DOC: Overview
>>>>> + *
>>>>> + * This component mainly abstracts the retry loop necessary for locking
>>>>> + * multiple GEM objects while preparing hardware operations (e.g. command
>>>>> + * submissions, page table updates etc..).
>>>>> + *
>>>>> + * If a contention is detected while locking a GEM object the cleanup procedure
>>>>> + * unlocks all previously locked GEM objects and locks the contended one first
>>>>> + * before locking any further objects.
>>>>> + *
>>>>> + * After an object is locked fences slots can optionally be reserved on the
>>>>> + * dma_resv object inside the GEM object.
>>>>> + *
>>>>> + * A typical usage pattern should look like this::
>>>>> + *
>>>>> + *	struct drm_gem_object *obj;
>>>>> + *	struct drm_exec exec;
>>>>> + *	unsigned long index;
>>>>> + *	int ret;
>>>>> + *
>>>>> + *	drm_exec_init(&exec, true);
>>>>> + *	drm_exec_while_not_all_locked(&exec) {
>>>>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
>>>>> + *		drm_exec_continue_on_contention(&exec);
>>>>> + *		if (ret)
>>>>> + *			goto error;
>>>>> + *
>>>> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
>>>> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
>>>>
>>>> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
>>>>           ({ \
>>>>                   int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
>>>>                   if (unlikely(drm_exec_is_contended(exec))) \
>>>>                           continue; \
>>>>                   __ret; \
>>>>           })
>>>>
>>>> This way the following pattern
>>>>
>>>> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
>>>> 		drm_exec_continue_on_contention(&exec);
>>>> 		if (ret)
>>>> 			goto error;
>>>>
>>>> can be turned into something more conventional:
>>>>
>>>> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
>>>> 		if (ret)
>>>> 			goto error;
>>> Yeah, I was considering that as well. But then abandoned it as to
>>> complicated.
>>>
>>> I really need to find some time to work on that anyway.
> I've been playing with drm_exec for a couple weeks now, and I wanted
> to share something I hacked to try and make the API simpler and
> more robust against misuse (see the below diff, which is a slightly
> adjusted version of your work).

It would be good if we could have someone taking charge of this series 
and address all review comments, I see some of my comments getting lost, 
we have multiple submitters and I can't find a dri-devel patchwork entry 
for this. Anyway some comments below.

>
> In this version, the user is no longer in control of the retry
> loop. Instead, it provides an expression (a call to a
> sub-function) to be re-evaluated each time a contention is
> detected. IMHO, this makes the 'prepare-objs' functions easier to
> apprehend, and avoids any mistake like calling
> drm_exec_continue_on_contention() in an inner loop, or breaking
> out of the drm_exec_while_all_locked() loop unintentionally.

In i915 we've had a very similar helper to this, and while I agree this 
newer version would probably help make code cleaner, but OTOH there also 
are some places where the short drm_exec_while_all_locked() -likeblock 
don't really motivate a separate function. Porting i915 to the current 
version will take some work, For  the xe driver both versions would work 
fine.

Some additional review comments not related to the interface change below:

>
> It also makes the internal management a bit simpler, since we
> no longer call drm_exec_cleanup() on the first attempt, and can
> thus get rid of the DRM_EXEC_DUMMY trick.
>
> In the below diff, I also re-introduced native support for
> duplicates as an opt-in, so we don't have to do things like:
>
> 	ret = drm_exec_prepare_obj(exec, obj, num_fences);
> 	if (ret == -EALREADY)
> 		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> 	if (ret)
> 		return ret;
>
> and can just do:
>
> 	ret = drm_exec_prepare_obj(exec, obj, num_fences);
> 	if (ret)
> 		return;
>
> Of course drivers can open-code a wrapper doing the same thing, but
> given at least pvr and radeon need this, it'd be nice if the core
> could support it natively.
>
> That's mostly it. Just wanted to share what I had in case you're
> interested. If not, that's fine too.
>
> Regards,
>
> Boris
> ---
>   Documentation/gpu/drm-mm.rst |  12 ++
>   drivers/gpu/drm/Kconfig      |   6 +
>   drivers/gpu/drm/Makefile     |   2 +
>   drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>   5 files changed, 424 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index fe40ee686f6e..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -524,6 +524,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 76991720637c..01a38fcdb1c4 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -194,6 +194,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 1873f64db171..18a02eaf2d49 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..e0ad1a3e1610
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,274 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + * int prepare_objs_func(struct drm_exec *exec, ...)
> + * {
> + *	struct drm_gem_object *boA, *boB;
> + * 	int ret;
> + *
> + *	<retrieve boA and boB here>
> + *
> + *	ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *	if (ret)
> + *		return ret;
> + *
> + *	ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *	if (ret)
> + *		return ret;
> + *
> + * 	return 0;
> + * }
> + *
> + * int some_func()
> + * {
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	ret = drm_exec_until_all_locked(&exec, prepare_objs_func(&exec, ...));
> + *	if (ret)
> + *		goto error;
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +
> +	drm_gem_object_put(exec->prelocked);
> +	exec->prelocked = NULL;
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, u32 flags)
> +{
> +	exec->flags = flags;
> +	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
> +	exec->num_objects = 0;
> +	exec->contended = NULL;
> +	exec->prelocked = NULL;
> +	ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	kvfree(exec->objects);
> +	if (exec->contended)
> +		drm_gem_object_put(exec->contended);
> +	ww_acquire_fini(&exec->ticket);
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_reset - reset a drm_exec object after a contention
> + * @exec: the drm_exec object to reset
> + *
> + * Unlock all locked objects and resets the number of objects locked.
> + */
> +void drm_exec_reset(struct drm_exec *exec)
> +{
> +	WARN_ON(!exec->contended);
> +	drm_exec_unlock_all(exec);
> +	exec->num_objects = 0;
> +}
> +EXPORT_SYMBOL(drm_exec_reset);
> +
> +/* Track the locked object in the array */
> +static int drm_exec_obj_locked(struct drm_exec *exec,
> +			       struct drm_gem_object *obj)
> +{
> +	if (unlikely(exec->num_objects == exec->max_objects)) {
> +		size_t size = exec->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;

Sometimes you need to just temporarily lock an object and then unlock it 
again if it goes out of scope before reaching the end of 
_until_all_locked(). In that case you might need to remove a lock from 
the array. I *think* for all use-cases in i915 it would suffice to take 
a snapshot of num_objects, and unlock everything above that, having 
exec->objects behave like a stack, but was ever a list considered 
instead of a realloced array?

> +
> +		exec->objects = tmp;
> +		exec->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	exec->objects[exec->num_objects++] = obj;
> +
> +	return 0;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +

Sometimes you want to just drop the contended lock after the above 
relaxation. (Eviction would be one example), and not add as prelocked, 
if the contended object goes out of scope. Eviction would in some 
situations be one such example, -EDEADLOCK leading to an error path 
where the object should otherwise be freed is another. Perhaps we could 
add an argument to prepare_obj() as to whether the object should be 
immediately put after relaxation.

> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (unlikely(ret)) {
> +		dma_resv_unlock(obj->resv);
> +		goto error_dropref;
> +	}
> +
> +	swap(exec->prelocked, obj);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
> + * already locked, -ENOMEM when memory allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->prelocked == obj) {
> +		drm_gem_object_put(exec->prelocked);
> +		exec->prelocked = NULL;
> +
> +		return dma_resv_reserve_fences(obj->resv, num_fences);
> +	}
> +
> +	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY &&
> +	    (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
> +		goto reserve_fences;
> +
> +	if (unlikely(ret))
> +		return ret;
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (ret)
> +		goto error_unlock;
> +
> +reserve_fences:
> +	/* Keep locked when reserving fences fails */
> +	return dma_resv_reserve_fences(obj->resv, num_fences);

Ugh, what is the use-case for keeping things locked here? How would a 
caller tell the difference between an error where everything is locked 
and nothing is locked? IMO, we should unlock on error here. If there 
indeed is a use-case we should add a separate function for reserving 
fences for all locked objects, rather than returning sometimes locked on 
error sometime not.

Thanks,

Thomas


> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aports on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	for (unsigned int i = 0; i < num_objects; ++i) {
> +		ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_array);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..b1a5da0509c1
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * enum drm_exec_flags - Execution context flags
> + */
> +enum drm_exec_flags {
> +	/**
> +	 * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible locking
> +	 * functions.
> +	 */
> +	DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
> +
> +	/**
> +	 * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY errors.
> +	 */
> +	DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
> +};
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @flags: Combinations of DRM_EXEC_FLAG_* flags.
> +	 */
> +	u32 flags;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for
> +	 */
> +	struct drm_gem_object	*contended;
> +
> +	/**
> +	 * @prelocked: already locked GEM object due to contention
> +	 */
> +	struct drm_gem_object *prelocked;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	for (index = 0, obj = (exec)->objects[0];		\
> +	     index < (exec)->num_objects;			\
> +	     ++index, obj = (exec)->objects[index])
> +
> +/**
> + * drm_exec_until_all_locked - retry objects preparation until all objects
> + * are locked
> + * @exec: drm_exec object
> + * @expr: expression to be evaluated on each attempt
> + *
> + * This helper tries to prepare objects and if a deadlock is detected,
> + * rollbacks and retries.
> + *
> + * @expr is typically a function that tries to prepare objects using
> + * drm_exec_prepare_obj().
> + *
> + * If we take drm_exec_prepare_array() as an example, you should do:
> + *
> + *	ret = drm_exec_until_all_locked(exec,
> + *					drm_exec_prepare_array(exec,
> + *							       objs,
> + *							       num_objs,
> + *							       num_fences));
> + *	if (ret)
> + *		goto error_path;
> + *
> + *	...
> + *
> + * Returns: 0 on success, a negative error code on failure.
> + */
> +#define drm_exec_until_all_locked(exec, expr)		\
> +	({						\
> +		__label__ retry;			\
> +		int __ret;				\
> +retry:							\
> +		__ret = expr;				\
> +		if ((exec)->contended) {		\
> +			WARN_ON(__ret != -EDEADLK);	\
> +			drm_exec_reset(exec);		\
> +			goto retry;			\
> +		}					\
> +		ww_acquire_done(&(exec)->ticket);	\
> +		__ret;					\
> +	})
> +
> +void drm_exec_init(struct drm_exec *exec, u32 flags);
> +void drm_exec_fini(struct drm_exec *exec);
> +void drm_exec_reset(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences);
> +
> +#endif

Christian König June 19, 2023, 9:20 a.m. UTC | #8

Hi guys,

Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
> [SNIP]
>>>>
>>>> I really need to find some time to work on that anyway.
>> I've been playing with drm_exec for a couple weeks now, and I wanted
>> to share something I hacked to try and make the API simpler and
>> more robust against misuse (see the below diff, which is a slightly
>> adjusted version of your work).
>
> It would be good if we could have someone taking charge of this series 
> and address all review comments, I see some of my comments getting 
> lost, we have multiple submitters and I can't find a dri-devel 
> patchwork entry for this. Anyway some comments below.

I can try to find some time for the series this week (As long as nobody 
comes along and has any burning roof).

>
>>
>> In this version, the user is no longer in control of the retry
>> loop. Instead, it provides an expression (a call to a
>> sub-function) to be re-evaluated each time a contention is
>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>> apprehend, and avoids any mistake like calling
>> drm_exec_continue_on_contention() in an inner loop, or breaking
>> out of the drm_exec_while_all_locked() loop unintentionally.
>
> In i915 we've had a very similar helper to this, and while I agree 
> this newer version would probably help make code cleaner, but OTOH 
> there also are some places where the short drm_exec_while_all_locked() 
> -likeblock don't really motivate a separate function. Porting i915 to 
> the current version will take some work, For  the xe driver both 
> versions would work fine.

Yeah, this is actually what my first version of this looked like. But I 
abandoned that approach because we have a lot of cases were we just 
quickly want to lock a few GEM objects and don't want the extra overhead 
of putting all the state into some bag to forward it to a function.

>
> Some additional review comments not related to the interface change 
> below:
>
>>
>> It also makes the internal management a bit simpler, since we
>> no longer call drm_exec_cleanup() on the first attempt, and can
>> thus get rid of the DRM_EXEC_DUMMY trick.
>>
>> In the below diff, I also re-introduced native support for
>> duplicates as an opt-in, so we don't have to do things like:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret == -EALREADY)
>>         ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>     if (ret)
>>         return ret;
>>
>> and can just do:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret)
>>         return;
>>
>> Of course drivers can open-code a wrapper doing the same thing, but
>> given at least pvr and radeon need this, it'd be nice if the core
>> could support it natively.
>>
>> That's mostly it. Just wanted to share what I had in case you're
>> interested. If not, that's fine too.
>>
>> Regards,
>>
>> Boris
>> ---
>>   Documentation/gpu/drm-mm.rst |  12 ++
>>   drivers/gpu/drm/Kconfig      |   6 +
>>   drivers/gpu/drm/Makefile     |   2 +
>>   drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
>>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>>   5 files changed, 424 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 include/drm/drm_exec.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index fe40ee686f6e..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>      :export:
>>   +DRM Execution context
>> +=====================
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :doc: Overview
>> +
>> +.. kernel-doc:: include/drm/drm_exec.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :export:
>> +
>>   GPU Scheduler
>>   =============
>>   diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index 76991720637c..01a38fcdb1c4 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -194,6 +194,12 @@ config DRM_TTM
>>         GPU memory types. Will be enabled automatically if a device 
>> driver
>>         uses it.
>>   +config DRM_EXEC
>> +    tristate
>> +    depends on DRM
>> +    help
>> +      Execution context for command submissions
>> +
>>   config DRM_BUDDY
>>       tristate
>>       depends on DRM
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 1873f64db171..18a02eaf2d49 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>> drm_panel_orientation_quirks.o
>>   #
>>   # Memory-management helpers
>>   #
>> +#
>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>     obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>   diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>> new file mode 100644
>> index 000000000000..e0ad1a3e1610
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_exec.c
>> @@ -0,0 +1,274 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/drm_gem.h>
>> +#include <linux/dma-resv.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. 
>> command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the 
>> cleanup procedure
>> + * unlocks all previously locked GEM objects and locks the contended 
>> one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved 
>> on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>> + * {
>> + *    struct drm_gem_object *boA, *boB;
>> + *     int ret;
>> + *
>> + *    <retrieve boA and boB here>
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boB, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *     return 0;
>> + * }
>> + *
>> + * int some_func()
>> + * {
>> + *    struct drm_exec exec;
>> + *    unsigned long index;
>> + *    int ret;
>> + *
>> + *    drm_exec_init(&exec, true);
>> + *    ret = drm_exec_until_all_locked(&exec, 
>> prepare_objs_func(&exec, ...));
>> + *    if (ret)
>> + *        goto error;
>> + *
>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *        ...
>> + *    }
>> + *    drm_exec_fini(&exec);
>> + *
>> + * See struct dma_exec for more details.
>> + */
>> +
>> +/* Unlock all objects and drop references */
>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        dma_resv_unlock(obj->resv);
>> +        drm_gem_object_put(obj);
>> +    }
>> +
>> +    drm_gem_object_put(exec->prelocked);
>> +    exec->prelocked = NULL;
>> +}
>> +
>> +/**
>> + * drm_exec_init - initialize a drm_exec object
>> + * @exec: the drm_exec object to initialize
>> + * @interruptible: if locks should be acquired interruptible
>> + *
>> + * Initialize the object and make sure that we can track locked 
>> objects.
>> + */
>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>> +{
>> +    exec->flags = flags;
>> +    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +    /* If allocation here fails, just delay that till the first use */
>> +    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
>> +    exec->num_objects = 0;
>> +    exec->contended = NULL;
>> +    exec->prelocked = NULL;
>> +    ww_acquire_init(&exec->ticket, &reservation_ww_class);
>> +}
>> +EXPORT_SYMBOL(drm_exec_init);
>> +
>> +/**
>> + * drm_exec_fini - finalize a drm_exec object
>> + * @exec: the drm_exec object to finalize
>> + *
>> + * Unlock all locked objects, drop the references to objects and 
>> free all memory
>> + * used for tracking the state.
>> + */
>> +void drm_exec_fini(struct drm_exec *exec)
>> +{
>> +    drm_exec_unlock_all(exec);
>> +    kvfree(exec->objects);
>> +    if (exec->contended)
>> +        drm_gem_object_put(exec->contended);
>> +    ww_acquire_fini(&exec->ticket);
>> +}
>> +EXPORT_SYMBOL(drm_exec_fini);
>> +
>> +/**
>> + * drm_exec_reset - reset a drm_exec object after a contention
>> + * @exec: the drm_exec object to reset
>> + *
>> + * Unlock all locked objects and resets the number of objects locked.
>> + */
>> +void drm_exec_reset(struct drm_exec *exec)
>> +{
>> +    WARN_ON(!exec->contended);
>> +    drm_exec_unlock_all(exec);
>> +    exec->num_objects = 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_reset);
>> +
>> +/* Track the locked object in the array */
>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>> +                   struct drm_gem_object *obj)
>> +{
>> +    if (unlikely(exec->num_objects == exec->max_objects)) {
>> +        size_t size = exec->max_objects * sizeof(void *);
>> +        void *tmp;
>> +
>> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>> +                GFP_KERNEL);
>> +        if (!tmp)
>> +            return -ENOMEM;
>
> Sometimes you need to just temporarily lock an object and then unlock 
> it again if it goes out of scope before reaching the end of 
> _until_all_locked(). In that case you might need to remove a lock from 
> the array. I *think* for all use-cases in i915 it would suffice to 
> take a snapshot of num_objects, and unlock everything above that, 
> having exec->objects behave like a stack, but was ever a list 
> considered instead of a realloced array?

Yes, the problem is that linked lists really suck regarding their cache 
line locality. That's why I've came up with this approach here.

What we could maybe do is to allow unlocking objects, but with the cost 
of linear backward searching for them in the array.

>
>> +
>> +        exec->objects = tmp;
>> +        exec->max_objects += PAGE_SIZE / sizeof(void *);
>> +    }
>> +    drm_gem_object_get(obj);
>> +    exec->objects[exec->num_objects++] = obj;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Make sure the contended object is locked first */
>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj = exec->contended;
>> +    int ret;
>> +
>> +    if (likely(!obj))
>> +        return 0;
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>> +                               &exec->ticket);
>> +        if (unlikely(ret))
>> +            goto error_dropref;
>> +    } else {
>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>> +    }
>> +
>
> Sometimes you want to just drop the contended lock after the above 
> relaxation. (Eviction would be one example), and not add as prelocked, 
> if the contended object goes out of scope. Eviction would in some 
> situations be one such example, -EDEADLOCK leading to an error path 
> where the object should otherwise be freed is another. Perhaps we 
> could add an argument to prepare_obj() as to whether the object should 
> be immediately put after relaxation.

I was considering a try_prepare version as well, that should cover this 
use case.

>
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (unlikely(ret)) {
>> +        dma_resv_unlock(obj->resv);
>> +        goto error_dropref;
>> +    }
>> +
>> +    swap(exec->prelocked, obj);
>> +
>> +error_dropref:
>> +    /* Always cleanup the contention so that error handling can kick 
>> in */
>> +    drm_gem_object_put(obj);
>> +    exec->contended = NULL;
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_exec_prepare_obj - prepare a GEM object for use
>> + * @exec: the drm_exec object with the state
>> + * @obj: the GEM object to prepare
>> + * @num_fences: how many fences to reserve
>> + *
>> + * Prepare a GEM object for use by locking it and reserving fence 
>> slots. All
>> + * successfully locked objects are put into the locked container.
>> + *
>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>> object is
>> + * already locked, -ENOMEM when memory allocation failed and zero 
>> for success.
>> + */
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    ret = drm_exec_lock_contended(exec);
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    if (exec->prelocked == obj) {
>> +        drm_gem_object_put(exec->prelocked);
>> +        exec->prelocked = NULL;
>> +
>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>> +    }
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>> +    else
>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>> +
>> +    if (unlikely(ret == -EDEADLK)) {
>> +        drm_gem_object_get(obj);
>> +        exec->contended = obj;
>> +        return -EDEADLK;
>> +    }
>> +
>> +    if (unlikely(ret == -EALREADY &&
>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>> +        goto reserve_fences;
>> +
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (ret)
>> +        goto error_unlock;
>> +
>> +reserve_fences:
>> +    /* Keep locked when reserving fences fails */
>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>
> Ugh, what is the use-case for keeping things locked here? How would a 
> caller tell the difference between an error where everything is locked 
> and nothing is locked? IMO, we should unlock on error here. If there 
> indeed is a use-case we should add a separate function for reserving 
> fences for all locked objects, rather than returning sometimes locked 
> on error sometime not.

We return the object locked here because it was to much churn to remove 
it again from the array and we are getting fully cleaned up at the end 
anyway.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>
>> +
>> +error_unlock:
>> +    dma_resv_unlock(obj->resv);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>> +
>> +/**
>> + * drm_exec_prepare_array - helper to prepare an array of objects
>> + * @exec: the drm_exec object with the state
>> + * @objects: array of GEM object to prepare
>> + * @num_objects: number of GEM objects in the array
>> + * @num_fences: number of fences to reserve on each GEM object
>> + *
>> + * Prepares all GEM objects in an array, handles contention but 
>> aports on first
>> + * error otherwise. Reserves @num_fences on each GEM object after 
>> locking it.
>> + *
>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>> memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>> +
>> +MODULE_DESCRIPTION("DRM execution context");
>> +MODULE_LICENSE("Dual MIT/GPL");
>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>> new file mode 100644
>> index 000000000000..b1a5da0509c1
>> --- /dev/null
>> +++ b/include/drm/drm_exec.h
>> @@ -0,0 +1,130 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#ifndef __DRM_EXEC_H__
>> +#define __DRM_EXEC_H__
>> +
>> +#include <linux/ww_mutex.h>
>> +
>> +struct drm_gem_object;
>> +
>> +/**
>> + * enum drm_exec_flags - Execution context flags
>> + */
>> +enum drm_exec_flags {
>> +    /**
>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible 
>> locking
>> +     * functions.
>> +     */
>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>> +
>> +    /**
>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY 
>> errors.
>> +     */
>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>> +};
>> +
>> +/**
>> + * struct drm_exec - Execution context
>> + */
>> +struct drm_exec {
>> +    /**
>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>> +     */
>> +    u32 flags;
>> +
>> +    /**
>> +     * @ticket: WW ticket used for acquiring locks
>> +     */
>> +    struct ww_acquire_ctx    ticket;
>> +
>> +    /**
>> +     * @num_objects: number of objects locked
>> +     */
>> +    unsigned int        num_objects;
>> +
>> +    /**
>> +     * @max_objects: maximum objects in array
>> +     */
>> +    unsigned int        max_objects;
>> +
>> +    /**
>> +     * @objects: array of the locked objects
>> +     */
>> +    struct drm_gem_object    **objects;
>> +
>> +    /**
>> +     * @contended: contended GEM object we backed off for
>> +     */
>> +    struct drm_gem_object    *contended;
>> +
>> +    /**
>> +     * @prelocked: already locked GEM object due to contention
>> +     */
>> +    struct drm_gem_object *prelocked;
>> +};
>> +
>> +/**
>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>> objects
>> + * @exec: drm_exec object
>> + * @index: unsigned long index for the iteration
>> + * @obj: the current GEM object
>> + *
>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>> + */
>> +#define drm_exec_for_each_locked_object(exec, index, obj)    \
>> +    for (index = 0, obj = (exec)->objects[0];        \
>> +         index < (exec)->num_objects;            \
>> +         ++index, obj = (exec)->objects[index])
>> +
>> +/**
>> + * drm_exec_until_all_locked - retry objects preparation until all 
>> objects
>> + * are locked
>> + * @exec: drm_exec object
>> + * @expr: expression to be evaluated on each attempt
>> + *
>> + * This helper tries to prepare objects and if a deadlock is detected,
>> + * rollbacks and retries.
>> + *
>> + * @expr is typically a function that tries to prepare objects using
>> + * drm_exec_prepare_obj().
>> + *
>> + * If we take drm_exec_prepare_array() as an example, you should do:
>> + *
>> + *    ret = drm_exec_until_all_locked(exec,
>> + *                    drm_exec_prepare_array(exec,
>> + *                                   objs,
>> + *                                   num_objs,
>> + *                                   num_fences));
>> + *    if (ret)
>> + *        goto error_path;
>> + *
>> + *    ...
>> + *
>> + * Returns: 0 on success, a negative error code on failure.
>> + */
>> +#define drm_exec_until_all_locked(exec, expr)        \
>> +    ({                        \
>> +        __label__ retry;            \
>> +        int __ret;                \
>> +retry:                            \
>> +        __ret = expr;                \
>> +        if ((exec)->contended) {        \
>> +            WARN_ON(__ret != -EDEADLK);    \
>> +            drm_exec_reset(exec);        \
>> +            goto retry;            \
>> +        }                    \
>> +        ww_acquire_done(&(exec)->ticket);    \
>> +        __ret;                    \
>> +    })
>> +
>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>> +void drm_exec_fini(struct drm_exec *exec);
>> +void drm_exec_reset(struct drm_exec *exec);
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences);
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences);
>> +
>> +#endif

Thomas Hellström (Intel) June 19, 2023, 9:33 a.m. UTC | #9

Hi!

On 6/19/23 11:20, Christian König wrote:
> Hi guys,
>
> Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>>>
>>>>> I really need to find some time to work on that anyway.
>>> I've been playing with drm_exec for a couple weeks now, and I wanted
>>> to share something I hacked to try and make the API simpler and
>>> more robust against misuse (see the below diff, which is a slightly
>>> adjusted version of your work).
>>
>> It would be good if we could have someone taking charge of this 
>> series and address all review comments, I see some of my comments 
>> getting lost, we have multiple submitters and I can't find a 
>> dri-devel patchwork entry for this. Anyway some comments below.
>
> I can try to find some time for the series this week (As long as 
> nobody comes along and has any burning roof).
>
>>
>>>
>>> In this version, the user is no longer in control of the retry
>>> loop. Instead, it provides an expression (a call to a
>>> sub-function) to be re-evaluated each time a contention is
>>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>>> apprehend, and avoids any mistake like calling
>>> drm_exec_continue_on_contention() in an inner loop, or breaking
>>> out of the drm_exec_while_all_locked() loop unintentionally.
>>
>> In i915 we've had a very similar helper to this, and while I agree 
>> this newer version would probably help make code cleaner, but OTOH 
>> there also are some places where the short 
>> drm_exec_while_all_locked() -likeblock don't really motivate a 
>> separate function. Porting i915 to the current version will take some 
>> work, For  the xe driver both versions would work fine.
>
> Yeah, this is actually what my first version of this looked like. But 
> I abandoned that approach because we have a lot of cases were we just 
> quickly want to lock a few GEM objects and don't want the extra 
> overhead of putting all the state into some bag to forward it to a 
> function.
>
>>
>> Some additional review comments not related to the interface change 
>> below:
>>
>>>
>>> It also makes the internal management a bit simpler, since we
>>> no longer call drm_exec_cleanup() on the first attempt, and can
>>> thus get rid of the DRM_EXEC_DUMMY trick.
>>>
>>> In the below diff, I also re-introduced native support for
>>> duplicates as an opt-in, so we don't have to do things like:
>>>
>>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>     if (ret == -EALREADY)
>>>         ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>>     if (ret)
>>>         return ret;
>>>
>>> and can just do:
>>>
>>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>     if (ret)
>>>         return;
>>>
>>> Of course drivers can open-code a wrapper doing the same thing, but
>>> given at least pvr and radeon need this, it'd be nice if the core
>>> could support it natively.
>>>
>>> That's mostly it. Just wanted to share what I had in case you're
>>> interested. If not, that's fine too.
>>>
>>> Regards,
>>>
>>> Boris
>>> ---
>>>   Documentation/gpu/drm-mm.rst |  12 ++
>>>   drivers/gpu/drm/Kconfig      |   6 +
>>>   drivers/gpu/drm/Makefile     |   2 +
>>>   drivers/gpu/drm/drm_exec.c   | 274 
>>> +++++++++++++++++++++++++++++++++++
>>>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>>>   5 files changed, 424 insertions(+)
>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>   create mode 100644 include/drm/drm_exec.h
>>>
>>> diff --git a/Documentation/gpu/drm-mm.rst 
>>> b/Documentation/gpu/drm-mm.rst
>>> index fe40ee686f6e..c9f120cfe730 100644
>>> --- a/Documentation/gpu/drm-mm.rst
>>> +++ b/Documentation/gpu/drm-mm.rst
>>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>>      :export:
>>>   +DRM Execution context
>>> +=====================
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> +   :doc: Overview
>>> +
>>> +.. kernel-doc:: include/drm/drm_exec.h
>>> +   :internal:
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> +   :export:
>>> +
>>>   GPU Scheduler
>>>   =============
>>>   diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>>> index 76991720637c..01a38fcdb1c4 100644
>>> --- a/drivers/gpu/drm/Kconfig
>>> +++ b/drivers/gpu/drm/Kconfig
>>> @@ -194,6 +194,12 @@ config DRM_TTM
>>>         GPU memory types. Will be enabled automatically if a device 
>>> driver
>>>         uses it.
>>>   +config DRM_EXEC
>>> +    tristate
>>> +    depends on DRM
>>> +    help
>>> +      Execution context for command submissions
>>> +
>>>   config DRM_BUDDY
>>>       tristate
>>>       depends on DRM
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 1873f64db171..18a02eaf2d49 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>>> drm_panel_orientation_quirks.o
>>>   #
>>>   # Memory-management helpers
>>>   #
>>> +#
>>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>>     obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>>   diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>>> new file mode 100644
>>> index 000000000000..e0ad1a3e1610
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/drm_exec.c
>>> @@ -0,0 +1,274 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#include <drm/drm_exec.h>
>>> +#include <drm/drm_gem.h>
>>> +#include <linux/dma-resv.h>
>>> +
>>> +/**
>>> + * DOC: Overview
>>> + *
>>> + * This component mainly abstracts the retry loop necessary for 
>>> locking
>>> + * multiple GEM objects while preparing hardware operations (e.g. 
>>> command
>>> + * submissions, page table updates etc..).
>>> + *
>>> + * If a contention is detected while locking a GEM object the 
>>> cleanup procedure
>>> + * unlocks all previously locked GEM objects and locks the 
>>> contended one first
>>> + * before locking any further objects.
>>> + *
>>> + * After an object is locked fences slots can optionally be 
>>> reserved on the
>>> + * dma_resv object inside the GEM object.
>>> + *
>>> + * A typical usage pattern should look like this::
>>> + *
>>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>>> + * {
>>> + *    struct drm_gem_object *boA, *boB;
>>> + *     int ret;
>>> + *
>>> + *    <retrieve boA and boB here>
>>> + *
>>> + *    ret = drm_exec_prepare_obj(&exec, boA, 1);
>>> + *    if (ret)
>>> + *        return ret;
>>> + *
>>> + *    ret = drm_exec_prepare_obj(&exec, boB, 1);
>>> + *    if (ret)
>>> + *        return ret;
>>> + *
>>> + *     return 0;
>>> + * }
>>> + *
>>> + * int some_func()
>>> + * {
>>> + *    struct drm_exec exec;
>>> + *    unsigned long index;
>>> + *    int ret;
>>> + *
>>> + *    drm_exec_init(&exec, true);
>>> + *    ret = drm_exec_until_all_locked(&exec, 
>>> prepare_objs_func(&exec, ...));
>>> + *    if (ret)
>>> + *        goto error;
>>> + *
>>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>>> + *        ...
>>> + *    }
>>> + *    drm_exec_fini(&exec);
>>> + *
>>> + * See struct dma_exec for more details.
>>> + */
>>> +
>>> +/* Unlock all objects and drop references */
>>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>>> +{
>>> +    struct drm_gem_object *obj;
>>> +    unsigned long index;
>>> +
>>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>>> +        dma_resv_unlock(obj->resv);
>>> +        drm_gem_object_put(obj);
>>> +    }
>>> +
>>> +    drm_gem_object_put(exec->prelocked);
>>> +    exec->prelocked = NULL;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_init - initialize a drm_exec object
>>> + * @exec: the drm_exec object to initialize
>>> + * @interruptible: if locks should be acquired interruptible
>>> + *
>>> + * Initialize the object and make sure that we can track locked 
>>> objects.
>>> + */
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>>> +{
>>> +    exec->flags = flags;
>>> +    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>> +
>>> +    /* If allocation here fails, just delay that till the first use */
>>> +    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) 
>>> : 0;
>>> +    exec->num_objects = 0;
>>> +    exec->contended = NULL;
>>> +    exec->prelocked = NULL;
>>> +    ww_acquire_init(&exec->ticket, &reservation_ww_class);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_init);
>>> +
>>> +/**
>>> + * drm_exec_fini - finalize a drm_exec object
>>> + * @exec: the drm_exec object to finalize
>>> + *
>>> + * Unlock all locked objects, drop the references to objects and 
>>> free all memory
>>> + * used for tracking the state.
>>> + */
>>> +void drm_exec_fini(struct drm_exec *exec)
>>> +{
>>> +    drm_exec_unlock_all(exec);
>>> +    kvfree(exec->objects);
>>> +    if (exec->contended)
>>> +        drm_gem_object_put(exec->contended);
>>> +    ww_acquire_fini(&exec->ticket);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_fini);
>>> +
>>> +/**
>>> + * drm_exec_reset - reset a drm_exec object after a contention
>>> + * @exec: the drm_exec object to reset
>>> + *
>>> + * Unlock all locked objects and resets the number of objects locked.
>>> + */
>>> +void drm_exec_reset(struct drm_exec *exec)
>>> +{
>>> +    WARN_ON(!exec->contended);
>>> +    drm_exec_unlock_all(exec);
>>> +    exec->num_objects = 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_reset);
>>> +
>>> +/* Track the locked object in the array */
>>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>>> +                   struct drm_gem_object *obj)
>>> +{
>>> +    if (unlikely(exec->num_objects == exec->max_objects)) {
>>> +        size_t size = exec->max_objects * sizeof(void *);
>>> +        void *tmp;
>>> +
>>> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>>> +                GFP_KERNEL);
>>> +        if (!tmp)
>>> +            return -ENOMEM;
>>
>> Sometimes you need to just temporarily lock an object and then unlock 
>> it again if it goes out of scope before reaching the end of 
>> _until_all_locked(). In that case you might need to remove a lock 
>> from the array. I *think* for all use-cases in i915 it would suffice 
>> to take a snapshot of num_objects, and unlock everything above that, 
>> having exec->objects behave like a stack, but was ever a list 
>> considered instead of a realloced array?
>
> Yes, the problem is that linked lists really suck regarding their 
> cache line locality. That's why I've came up with this approach here.
>
> What we could maybe do is to allow unlocking objects, but with the 
> cost of linear backward searching for them in the array.
>
>>
>>> +
>>> +        exec->objects = tmp;
>>> +        exec->max_objects += PAGE_SIZE / sizeof(void *);
>>> +    }
>>> +    drm_gem_object_get(obj);
>>> +    exec->objects[exec->num_objects++] = obj;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/* Make sure the contended object is locked first */
>>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>>> +{
>>> +    struct drm_gem_object *obj = exec->contended;
>>> +    int ret;
>>> +
>>> +    if (likely(!obj))
>>> +        return 0;
>>> +
>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>>> +                               &exec->ticket);
>>> +        if (unlikely(ret))
>>> +            goto error_dropref;
>>> +    } else {
>>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>>> +    }
>>> +
>>
>> Sometimes you want to just drop the contended lock after the above 
>> relaxation. (Eviction would be one example), and not add as 
>> prelocked, if the contended object goes out of scope. Eviction would 
>> in some situations be one such example, -EDEADLOCK leading to an 
>> error path where the object should otherwise be freed is another. 
>> Perhaps we could add an argument to prepare_obj() as to whether the 
>> object should be immediately put after relaxation.
>
> I was considering a try_prepare version as well, that should cover 
> this use case.

That sounds a bit different from this use-case. The use-case above 
would, on -EDEADLOCK actually unlock everything, then lock-slow the 
contending lock and then immediately unlock it and drop. It sounds like 
try_prepare would just skip locking and continue with everything locked 
so far still locked?

>
>>
>>> +    ret = drm_exec_obj_locked(exec, obj);
>>> +    if (unlikely(ret)) {
>>> +        dma_resv_unlock(obj->resv);
>>> +        goto error_dropref;
>>> +    }
>>> +
>>> +    swap(exec->prelocked, obj);
>>> +
>>> +error_dropref:
>>> +    /* Always cleanup the contention so that error handling can 
>>> kick in */
>>> +    drm_gem_object_put(obj);
>>> +    exec->contended = NULL;
>>> +    return ret;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>> + * @exec: the drm_exec object with the state
>>> + * @obj: the GEM object to prepare
>>> + * @num_fences: how many fences to reserve
>>> + *
>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>> slots. All
>>> + * successfully locked objects are put into the locked container.
>>> + *
>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>> object is
>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>> for success.
>>> + */
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>> drm_gem_object *obj,
>>> +             unsigned int num_fences)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = drm_exec_lock_contended(exec);
>>> +    if (unlikely(ret))
>>> +        return ret;
>>> +
>>> +    if (exec->prelocked == obj) {
>>> +        drm_gem_object_put(exec->prelocked);
>>> +        exec->prelocked = NULL;
>>> +
>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>> +    }
>>> +
>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>> +    else
>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>> +
>>> +    if (unlikely(ret == -EDEADLK)) {
>>> +        drm_gem_object_get(obj);
>>> +        exec->contended = obj;
>>> +        return -EDEADLK;
>>> +    }
>>> +
>>> +    if (unlikely(ret == -EALREADY &&
>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>> +        goto reserve_fences;
>>> +
>>> +    if (unlikely(ret))
>>> +        return ret;
>>> +
>>> +    ret = drm_exec_obj_locked(exec, obj);
>>> +    if (ret)
>>> +        goto error_unlock;
>>> +
>>> +reserve_fences:
>>> +    /* Keep locked when reserving fences fails */
>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>
>> Ugh, what is the use-case for keeping things locked here? How would a 
>> caller tell the difference between an error where everything is 
>> locked and nothing is locked? IMO, we should unlock on error here. If 
>> there indeed is a use-case we should add a separate function for 
>> reserving fences for all locked objects, rather than returning 
>> sometimes locked on error sometime not.
>
> We return the object locked here because it was to much churn to 
> remove it again from the array and we are getting fully cleaned up at 
> the end anyway.

OK, so if we add an unlock functionality, we could just have a 
consistent locking state on error return?

Thanks,
Thomas

>
> Regards,
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>> +
>>> +error_unlock:
>>> +    dma_resv_unlock(obj->resv);
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>> +
>>> +/**
>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>> + * @exec: the drm_exec object with the state
>>> + * @objects: array of GEM object to prepare
>>> + * @num_objects: number of GEM objects in the array
>>> + * @num_fences: number of fences to reserve on each GEM object
>>> + *
>>> + * Prepares all GEM objects in an array, handles contention but 
>>> aports on first
>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>> locking it.
>>> + *
>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>> memory
>>> + * allocation failed and zero for success.
>>> + */
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> +               struct drm_gem_object **objects,
>>> +               unsigned int num_objects,
>>> +               unsigned int num_fences)
>>> +{
>>> +    int ret;
>>> +
>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>> +        if (ret)
>>> +            return ret;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>> +
>>> +MODULE_DESCRIPTION("DRM execution context");
>>> +MODULE_LICENSE("Dual MIT/GPL");
>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>> new file mode 100644
>>> index 000000000000..b1a5da0509c1
>>> --- /dev/null
>>> +++ b/include/drm/drm_exec.h
>>> @@ -0,0 +1,130 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#ifndef __DRM_EXEC_H__
>>> +#define __DRM_EXEC_H__
>>> +
>>> +#include <linux/ww_mutex.h>
>>> +
>>> +struct drm_gem_object;
>>> +
>>> +/**
>>> + * enum drm_exec_flags - Execution context flags
>>> + */
>>> +enum drm_exec_flags {
>>> +    /**
>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>> interruptible locking
>>> +     * functions.
>>> +     */
>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>> +
>>> +    /**
>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>> EALREADY errors.
>>> +     */
>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>> +};
>>> +
>>> +/**
>>> + * struct drm_exec - Execution context
>>> + */
>>> +struct drm_exec {
>>> +    /**
>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>> +     */
>>> +    u32 flags;
>>> +
>>> +    /**
>>> +     * @ticket: WW ticket used for acquiring locks
>>> +     */
>>> +    struct ww_acquire_ctx    ticket;
>>> +
>>> +    /**
>>> +     * @num_objects: number of objects locked
>>> +     */
>>> +    unsigned int        num_objects;
>>> +
>>> +    /**
>>> +     * @max_objects: maximum objects in array
>>> +     */
>>> +    unsigned int        max_objects;
>>> +
>>> +    /**
>>> +     * @objects: array of the locked objects
>>> +     */
>>> +    struct drm_gem_object    **objects;
>>> +
>>> +    /**
>>> +     * @contended: contended GEM object we backed off for
>>> +     */
>>> +    struct drm_gem_object    *contended;
>>> +
>>> +    /**
>>> +     * @prelocked: already locked GEM object due to contention
>>> +     */
>>> +    struct drm_gem_object *prelocked;
>>> +};
>>> +
>>> +/**
>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>> objects
>>> + * @exec: drm_exec object
>>> + * @index: unsigned long index for the iteration
>>> + * @obj: the current GEM object
>>> + *
>>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>>> + */
>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>> +         index < (exec)->num_objects;            \
>>> +         ++index, obj = (exec)->objects[index])
>>> +
>>> +/**
>>> + * drm_exec_until_all_locked - retry objects preparation until all 
>>> objects
>>> + * are locked
>>> + * @exec: drm_exec object
>>> + * @expr: expression to be evaluated on each attempt
>>> + *
>>> + * This helper tries to prepare objects and if a deadlock is detected,
>>> + * rollbacks and retries.
>>> + *
>>> + * @expr is typically a function that tries to prepare objects using
>>> + * drm_exec_prepare_obj().
>>> + *
>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>> + *
>>> + *    ret = drm_exec_until_all_locked(exec,
>>> + *                    drm_exec_prepare_array(exec,
>>> + *                                   objs,
>>> + *                                   num_objs,
>>> + *                                   num_fences));
>>> + *    if (ret)
>>> + *        goto error_path;
>>> + *
>>> + *    ...
>>> + *
>>> + * Returns: 0 on success, a negative error code on failure.
>>> + */
>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>> +    ({                        \
>>> +        __label__ retry;            \
>>> +        int __ret;                \
>>> +retry:                            \
>>> +        __ret = expr;                \
>>> +        if ((exec)->contended) {        \
>>> +            WARN_ON(__ret != -EDEADLK);    \
>>> +            drm_exec_reset(exec);        \
>>> +            goto retry;            \
>>> +        }                    \
>>> +        ww_acquire_done(&(exec)->ticket);    \
>>> +        __ret;                    \
>>> +    })
>>> +
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>> +void drm_exec_fini(struct drm_exec *exec);
>>> +void drm_exec_reset(struct drm_exec *exec);
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>> drm_gem_object *obj,
>>> +             unsigned int num_fences);
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> +               struct drm_gem_object **objects,
>>> +               unsigned int num_objects,
>>> +               unsigned int num_fences);
>>> +
>>> +#endif

Christian König June 19, 2023, 9:48 a.m. UTC | #10

Hi,

Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel):
> [SNIP]
>>> Sometimes you want to just drop the contended lock after the above 
>>> relaxation. (Eviction would be one example), and not add as 
>>> prelocked, if the contended object goes out of scope. Eviction would 
>>> in some situations be one such example, -EDEADLOCK leading to an 
>>> error path where the object should otherwise be freed is another. 
>>> Perhaps we could add an argument to prepare_obj() as to whether the 
>>> object should be immediately put after relaxation.
>>
>> I was considering a try_prepare version as well, that should cover 
>> this use case.
>
> That sounds a bit different from this use-case. The use-case above 
> would, on -EDEADLOCK actually unlock everything, then lock-slow the 
> contending lock and then immediately unlock it and drop.

Hui? What would that be good for?

> It sounds like try_prepare would just skip locking and continue with 
> everything locked so far still locked?

Correct.

>
>>
>>>
>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>> +    if (unlikely(ret)) {
>>>> +        dma_resv_unlock(obj->resv);
>>>> +        goto error_dropref;
>>>> +    }
>>>> +
>>>> +    swap(exec->prelocked, obj);
>>>> +
>>>> +error_dropref:
>>>> +    /* Always cleanup the contention so that error handling can 
>>>> kick in */
>>>> +    drm_gem_object_put(obj);
>>>> +    exec->contended = NULL;
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>> + * @exec: the drm_exec object with the state
>>>> + * @obj: the GEM object to prepare
>>>> + * @num_fences: how many fences to reserve
>>>> + *
>>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>>> slots. All
>>>> + * successfully locked objects are put into the locked container.
>>>> + *
>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>> object is
>>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>>> for success.
>>>> + */
>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>> drm_gem_object *obj,
>>>> +             unsigned int num_fences)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    ret = drm_exec_lock_contended(exec);
>>>> +    if (unlikely(ret))
>>>> +        return ret;
>>>> +
>>>> +    if (exec->prelocked == obj) {
>>>> +        drm_gem_object_put(exec->prelocked);
>>>> +        exec->prelocked = NULL;
>>>> +
>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>> +    }
>>>> +
>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>>> +    else
>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>> +
>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>> +        drm_gem_object_get(obj);
>>>> +        exec->contended = obj;
>>>> +        return -EDEADLK;
>>>> +    }
>>>> +
>>>> +    if (unlikely(ret == -EALREADY &&
>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>> +        goto reserve_fences;
>>>> +
>>>> +    if (unlikely(ret))
>>>> +        return ret;
>>>> +
>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>> +    if (ret)
>>>> +        goto error_unlock;
>>>> +
>>>> +reserve_fences:
>>>> +    /* Keep locked when reserving fences fails */
>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>
>>> Ugh, what is the use-case for keeping things locked here? How would 
>>> a caller tell the difference between an error where everything is 
>>> locked and nothing is locked? IMO, we should unlock on error here. 
>>> If there indeed is a use-case we should add a separate function for 
>>> reserving fences for all locked objects, rather than returning 
>>> sometimes locked on error sometime not.
>>
>> We return the object locked here because it was to much churn to 
>> remove it again from the array and we are getting fully cleaned up at 
>> the end anyway.
>
> OK, so if we add an unlock functionality, we could just have a 
> consistent locking state on error return?

Yeah, that should work. Going to work on this.

Regards,
Christian.

>
> Thanks,
> Thomas
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>> +
>>>> +error_unlock:
>>>> +    dma_resv_unlock(obj->resv);
>>>> +    return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>> +
>>>> +/**
>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>> + * @exec: the drm_exec object with the state
>>>> + * @objects: array of GEM object to prepare
>>>> + * @num_objects: number of GEM objects in the array
>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>> + *
>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>> aports on first
>>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>>> locking it.
>>>> + *
>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>>> memory
>>>> + * allocation failed and zero for success.
>>>> + */
>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>> +               struct drm_gem_object **objects,
>>>> +               unsigned int num_objects,
>>>> +               unsigned int num_fences)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>> +
>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>> new file mode 100644
>>>> index 000000000000..b1a5da0509c1
>>>> --- /dev/null
>>>> +++ b/include/drm/drm_exec.h
>>>> @@ -0,0 +1,130 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>> +
>>>> +#ifndef __DRM_EXEC_H__
>>>> +#define __DRM_EXEC_H__
>>>> +
>>>> +#include <linux/ww_mutex.h>
>>>> +
>>>> +struct drm_gem_object;
>>>> +
>>>> +/**
>>>> + * enum drm_exec_flags - Execution context flags
>>>> + */
>>>> +enum drm_exec_flags {
>>>> +    /**
>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>> interruptible locking
>>>> +     * functions.
>>>> +     */
>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>> +
>>>> +    /**
>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>> EALREADY errors.
>>>> +     */
>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_exec - Execution context
>>>> + */
>>>> +struct drm_exec {
>>>> +    /**
>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>> +     */
>>>> +    u32 flags;
>>>> +
>>>> +    /**
>>>> +     * @ticket: WW ticket used for acquiring locks
>>>> +     */
>>>> +    struct ww_acquire_ctx    ticket;
>>>> +
>>>> +    /**
>>>> +     * @num_objects: number of objects locked
>>>> +     */
>>>> +    unsigned int        num_objects;
>>>> +
>>>> +    /**
>>>> +     * @max_objects: maximum objects in array
>>>> +     */
>>>> +    unsigned int        max_objects;
>>>> +
>>>> +    /**
>>>> +     * @objects: array of the locked objects
>>>> +     */
>>>> +    struct drm_gem_object    **objects;
>>>> +
>>>> +    /**
>>>> +     * @contended: contended GEM object we backed off for
>>>> +     */
>>>> +    struct drm_gem_object    *contended;
>>>> +
>>>> +    /**
>>>> +     * @prelocked: already locked GEM object due to contention
>>>> +     */
>>>> +    struct drm_gem_object *prelocked;
>>>> +};
>>>> +
>>>> +/**
>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>> objects
>>>> + * @exec: drm_exec object
>>>> + * @index: unsigned long index for the iteration
>>>> + * @obj: the current GEM object
>>>> + *
>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>> object.
>>>> + */
>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>>> +         index < (exec)->num_objects;            \
>>>> +         ++index, obj = (exec)->objects[index])
>>>> +
>>>> +/**
>>>> + * drm_exec_until_all_locked - retry objects preparation until all 
>>>> objects
>>>> + * are locked
>>>> + * @exec: drm_exec object
>>>> + * @expr: expression to be evaluated on each attempt
>>>> + *
>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>> detected,
>>>> + * rollbacks and retries.
>>>> + *
>>>> + * @expr is typically a function that tries to prepare objects using
>>>> + * drm_exec_prepare_obj().
>>>> + *
>>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>>> + *
>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>> + *                    drm_exec_prepare_array(exec,
>>>> + *                                   objs,
>>>> + *                                   num_objs,
>>>> + *                                   num_fences));
>>>> + *    if (ret)
>>>> + *        goto error_path;
>>>> + *
>>>> + *    ...
>>>> + *
>>>> + * Returns: 0 on success, a negative error code on failure.
>>>> + */
>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>> +    ({                        \
>>>> +        __label__ retry;            \
>>>> +        int __ret;                \
>>>> +retry:                            \
>>>> +        __ret = expr;                \
>>>> +        if ((exec)->contended) {        \
>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>> +            drm_exec_reset(exec);        \
>>>> +            goto retry;            \
>>>> +        }                    \
>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>> +        __ret;                    \
>>>> +    })
>>>> +
>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>> drm_gem_object *obj,
>>>> +             unsigned int num_fences);
>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>> +               struct drm_gem_object **objects,
>>>> +               unsigned int num_objects,
>>>> +               unsigned int num_fences);
>>>> +
>>>> +#endif

Boris Brezillon June 19, 2023, 10:12 a.m. UTC | #11

Hello Thomas,

On Mon, 19 Jun 2023 10:59:16 +0200
Thomas Hellström (Intel) <thomas_os@shipmail.org> wrote:

> >>>>       
> >>>>> +/**
> >>>>> + * DOC: Overview
> >>>>> + *
> >>>>> + * This component mainly abstracts the retry loop necessary for locking
> >>>>> + * multiple GEM objects while preparing hardware operations (e.g. command
> >>>>> + * submissions, page table updates etc..).
> >>>>> + *
> >>>>> + * If a contention is detected while locking a GEM object the cleanup procedure
> >>>>> + * unlocks all previously locked GEM objects and locks the contended one first
> >>>>> + * before locking any further objects.
> >>>>> + *
> >>>>> + * After an object is locked fences slots can optionally be reserved on the
> >>>>> + * dma_resv object inside the GEM object.
> >>>>> + *
> >>>>> + * A typical usage pattern should look like this::
> >>>>> + *
> >>>>> + *	struct drm_gem_object *obj;
> >>>>> + *	struct drm_exec exec;
> >>>>> + *	unsigned long index;
> >>>>> + *	int ret;
> >>>>> + *
> >>>>> + *	drm_exec_init(&exec, true);
> >>>>> + *	drm_exec_while_not_all_locked(&exec) {
> >>>>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >>>>> + *		drm_exec_continue_on_contention(&exec);
> >>>>> + *		if (ret)
> >>>>> + *			goto error;
> >>>>> + *  
> >>>> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> >>>> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> >>>>
> >>>> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> >>>>           ({ \
> >>>>                   int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> >>>>                   if (unlikely(drm_exec_is_contended(exec))) \
> >>>>                           continue; \
> >>>>                   __ret; \
> >>>>           })
> >>>>
> >>>> This way the following pattern
> >>>>
> >>>> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >>>> 		drm_exec_continue_on_contention(&exec);
> >>>> 		if (ret)
> >>>> 			goto error;
> >>>>
> >>>> can be turned into something more conventional:
> >>>>
> >>>> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> >>>> 		if (ret)
> >>>> 			goto error;  
> >>> Yeah, I was considering that as well. But then abandoned it as to
> >>> complicated.
> >>>
> >>> I really need to find some time to work on that anyway.  
> > I've been playing with drm_exec for a couple weeks now, and I wanted
> > to share something I hacked to try and make the API simpler and
> > more robust against misuse (see the below diff, which is a slightly
> > adjusted version of your work).  
> 
> It would be good if we could have someone taking charge of this series 
> and address all review comments, I see some of my comments getting lost, 
> we have multiple submitters and I can't find a dri-devel patchwork entry 
> for this.

My bad, I wasn't intending to submit a new version. I just added a
diff to show what I had in mind. This being said, it'd be great if we
could make some progress on this series, because we have quite a few
drivers depending on it now.

> 
> >
> > In this version, the user is no longer in control of the retry
> > loop. Instead, it provides an expression (a call to a
> > sub-function) to be re-evaluated each time a contention is
> > detected. IMHO, this makes the 'prepare-objs' functions easier to
> > apprehend, and avoids any mistake like calling
> > drm_exec_continue_on_contention() in an inner loop, or breaking
> > out of the drm_exec_while_all_locked() loop unintentionally.  
> 
> In i915 we've had a very similar helper to this, and while I agree this 
> newer version would probably help make code cleaner, but OTOH there also 
> are some places where the short drm_exec_while_all_locked() -likeblock 
> don't really motivate a separate function. Porting i915 to the current 
> version will take some work, For  the xe driver both versions would work 
> fine.

Note that the drm_exec_until_all_locked() helper I introduced is taking
an expression, so in theory, you don't have to define a separate
function.

	drm_exec_until_all_locked(&exec, {
		/* inlined-code */
		int ret;

		ret = blabla()
		if (ret)
			goto error;

		...

error:
		/* return value. */
		ret;
	});

This being said, as soon as you have several failure paths,
it makes things a lot easier/controllable if you make it a function,
and I honestly don't think the readability would suffer from having a
function defined just above the user. My main concern with the original
approach was the risk of calling continue/break_if_contended() in the
wrong place, and also the fact you can't really externalize things to
a function if you're looking for a cleaner split. At least with
drm_exec_until_all_locked() you can do both.

Regards,

Boris

Boris Brezillon June 19, 2023, 10:23 a.m. UTC | #12

On Mon, 19 Jun 2023 11:20:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Hi guys,
> 
> Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
> > [SNIP]  
> >>>>
> >>>> I really need to find some time to work on that anyway.  
> >> I've been playing with drm_exec for a couple weeks now, and I wanted
> >> to share something I hacked to try and make the API simpler and
> >> more robust against misuse (see the below diff, which is a slightly
> >> adjusted version of your work).  
> >
> > It would be good if we could have someone taking charge of this series 
> > and address all review comments, I see some of my comments getting 
> > lost, we have multiple submitters and I can't find a dri-devel 
> > patchwork entry for this. Anyway some comments below.  
> 
> I can try to find some time for the series this week (As long as nobody 
> comes along and has any burning roof).

That's great news!

> 
> >  
> >>
> >> In this version, the user is no longer in control of the retry
> >> loop. Instead, it provides an expression (a call to a
> >> sub-function) to be re-evaluated each time a contention is
> >> detected. IMHO, this makes the 'prepare-objs' functions easier to
> >> apprehend, and avoids any mistake like calling
> >> drm_exec_continue_on_contention() in an inner loop, or breaking
> >> out of the drm_exec_while_all_locked() loop unintentionally.  
> >
> > In i915 we've had a very similar helper to this, and while I agree 
> > this newer version would probably help make code cleaner, but OTOH 
> > there also are some places where the short drm_exec_while_all_locked() 
> > -likeblock don't really motivate a separate function. Porting i915 to 
> > the current version will take some work, For  the xe driver both 
> > versions would work fine.  
> 
> Yeah, this is actually what my first version of this looked like. But I 
> abandoned that approach because we have a lot of cases were we just 
> quickly want to lock a few GEM objects and don't want the extra overhead 
> of putting all the state into some bag to forward it to a function.

If you're talking about verbosity, it might be the case, though I guess
it mostly a matter of taste (I do like when things are well isolated).
As for runtime overhead, I'd expect the compiler to inline the function
anyway, so it's unlikely to change anything.

> >> +/* Track the locked object in the array */
> >> +static int drm_exec_obj_locked(struct drm_exec *exec,
> >> +                   struct drm_gem_object *obj)
> >> +{
> >> +    if (unlikely(exec->num_objects == exec->max_objects)) {
> >> +        size_t size = exec->max_objects * sizeof(void *);
> >> +        void *tmp;
> >> +
> >> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> >> +                GFP_KERNEL);
> >> +        if (!tmp)
> >> +            return -ENOMEM;  
> >
> > Sometimes you need to just temporarily lock an object and then unlock 
> > it again if it goes out of scope before reaching the end of 
> > _until_all_locked(). In that case you might need to remove a lock from 
> > the array. I *think* for all use-cases in i915 it would suffice to 
> > take a snapshot of num_objects, and unlock everything above that, 
> > having exec->objects behave like a stack, but was ever a list 
> > considered instead of a realloced array?  
> 
> Yes, the problem is that linked lists really suck regarding their cache 
> line locality. That's why I've came up with this approach here.

Hm, maybe I'm missing something, but if you place the list_head obj you
use to stack the locked objects close enough to the resv pointer, and
aligned on cache line, it shouldn't really be a problem, given you have
to dereference the GEM object to retrieve its resv anyway.

Christian König June 19, 2023, 10:44 a.m. UTC | #13

Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> [SNIP]
> Note that the drm_exec_until_all_locked() helper I introduced is taking
> an expression, so in theory, you don't have to define a separate
> function.
>
> 	drm_exec_until_all_locked(&exec, {
> 		/* inlined-code */
> 		int ret;
>
> 		ret = blabla()
> 		if (ret)
> 			goto error;
>
> 		...
>
> error:
> 		/* return value. */
> 		ret;
> 	});
>
> This being said, as soon as you have several failure paths,
> it makes things a lot easier/controllable if you make it a function,
> and I honestly don't think the readability would suffer from having a
> function defined just above the user. My main concern with the original
> approach was the risk of calling continue/break_if_contended() in the
> wrong place, and also the fact you can't really externalize things to
> a function if you're looking for a cleaner split. At least with
> drm_exec_until_all_locked() you can do both.

Yeah, but that means that you can't use return inside your code block 
and instead has to define an error label for handling "normal" 
contention which is what I'm trying to avoid here.

How about:

#define drm_exec_until_all_locked(exec)    \
         __drm_exec_retry: if (drm_exec_cleanup(exec))


#define drm_exec_retry_on_contention(exec)              \
         if (unlikely(drm_exec_is_contended(exec)))      \
                 goto __drm_exec_retry


And then use it like:

drm_exec_until_all_locked(exec)
{
     ret = drm_exec_prepare_obj(exec, obj);
     drm_exec_retry_on_contention(exec);
}

The only problem I can see with this is that the __drm_exec_retry label 
would be function local.

Regards,
Christian.

>
> Regards,
>
> Boris

Boris Brezillon June 19, 2023, 11:05 a.m. UTC | #14

On Mon, 19 Jun 2023 12:44:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> > [SNIP]
> > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > an expression, so in theory, you don't have to define a separate
> > function.
> >
> > 	drm_exec_until_all_locked(&exec, {
> > 		/* inlined-code */
> > 		int ret;
> >
> > 		ret = blabla()
> > 		if (ret)
> > 			goto error;
> >
> > 		...
> >
> > error:
> > 		/* return value. */
> > 		ret;
> > 	});
> >
> > This being said, as soon as you have several failure paths,
> > it makes things a lot easier/controllable if you make it a function,
> > and I honestly don't think the readability would suffer from having a
> > function defined just above the user. My main concern with the original
> > approach was the risk of calling continue/break_if_contended() in the
> > wrong place, and also the fact you can't really externalize things to
> > a function if you're looking for a cleaner split. At least with
> > drm_exec_until_all_locked() you can do both.  
> 
> Yeah, but that means that you can't use return inside your code block 
> and instead has to define an error label for handling "normal" 
> contention which is what I'm trying to avoid here.
> 
> How about:
> 
> #define drm_exec_until_all_locked(exec)    \
>          __drm_exec_retry: if (drm_exec_cleanup(exec))
> 
> 
> #define drm_exec_retry_on_contention(exec)              \
>          if (unlikely(drm_exec_is_contended(exec)))      \
>                  goto __drm_exec_retry
> 
> 
> And then use it like:
> 
> drm_exec_until_all_locked(exec)
> {
>      ret = drm_exec_prepare_obj(exec, obj);
>      drm_exec_retry_on_contention(exec);
> }

That would work, and I was about to suggest extending my proposal with
a drm_exec_retry_on_contention() to support both use cases. The only
downside is the fact you might be able to break out of a loop that has
local variables, which will leak stack space.

> 
> The only problem I can see with this is that the __drm_exec_retry label 
> would be function local.

You can use local labels [1] to make it local to a block (see my
version, just need to rename the retry label into __drm_exec_retry). I
checked, and this is used elsewhere in the kernel (like in
linux/wait.h, which is a core feature), so it should be safe to use.

[1]https://gcc.gnu.org/onlinedocs/gcc/Local-Labels.html

Thomas Hellström (Intel) June 19, 2023, 11:06 a.m. UTC | #15

On 6/19/23 11:48, Christian König wrote:
> Hi,
>
> Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>> Sometimes you want to just drop the contended lock after the above 
>>>> relaxation. (Eviction would be one example), and not add as 
>>>> prelocked, if the contended object goes out of scope. Eviction 
>>>> would in some situations be one such example, -EDEADLOCK leading to 
>>>> an error path where the object should otherwise be freed is 
>>>> another. Perhaps we could add an argument to prepare_obj() as to 
>>>> whether the object should be immediately put after relaxation.
>>>
>>> I was considering a try_prepare version as well, that should cover 
>>> this use case.
>>
>> That sounds a bit different from this use-case. The use-case above 
>> would, on -EDEADLOCK actually unlock everything, then lock-slow the 
>> contending lock and then immediately unlock it and drop.
>
> Hui? What would that be good for?

It's for the case where you have nested locking, the contended lock has 
gone out-of-scope and you're probably not going to need it on the next 
attempt. I think the refcounted "prelocked" object that is lacking in 
the i915 variant will resolve all correctness / uaf issues, but still 
the object might be needlessly carried around for yet another locking round.


>
>> It sounds like try_prepare would just skip locking and continue with 
>> everything locked so far still locked?
>
> Correct.
>
>>
>>>
>>>>
>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>> +    if (unlikely(ret)) {
>>>>> +        dma_resv_unlock(obj->resv);
>>>>> +        goto error_dropref;
>>>>> +    }
>>>>> +
>>>>> +    swap(exec->prelocked, obj);
>>>>> +
>>>>> +error_dropref:
>>>>> +    /* Always cleanup the contention so that error handling can 
>>>>> kick in */
>>>>> +    drm_gem_object_put(obj);
>>>>> +    exec->contended = NULL;
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>>> + * @exec: the drm_exec object with the state
>>>>> + * @obj: the GEM object to prepare
>>>>> + * @num_fences: how many fences to reserve
>>>>> + *
>>>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>>>> slots. All
>>>>> + * successfully locked objects are put into the locked container.
>>>>> + *
>>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>>> object is
>>>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>>>> for success.
>>>>> + */
>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>> drm_gem_object *obj,
>>>>> +             unsigned int num_fences)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = drm_exec_lock_contended(exec);
>>>>> +    if (unlikely(ret))
>>>>> +        return ret;
>>>>> +
>>>>> +    if (exec->prelocked == obj) {
>>>>> +        drm_gem_object_put(exec->prelocked);
>>>>> +        exec->prelocked = NULL;
>>>>> +
>>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>> +    }
>>>>> +
>>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>>>> +    else
>>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>>> +
>>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>>> +        drm_gem_object_get(obj);
>>>>> +        exec->contended = obj;
>>>>> +        return -EDEADLK;
>>>>> +    }
>>>>> +
>>>>> +    if (unlikely(ret == -EALREADY &&
>>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>>> +        goto reserve_fences;
>>>>> +
>>>>> +    if (unlikely(ret))
>>>>> +        return ret;
>>>>> +
>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>> +    if (ret)
>>>>> +        goto error_unlock;
>>>>> +
>>>>> +reserve_fences:
>>>>> +    /* Keep locked when reserving fences fails */
>>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>
>>>> Ugh, what is the use-case for keeping things locked here? How would 
>>>> a caller tell the difference between an error where everything is 
>>>> locked and nothing is locked? IMO, we should unlock on error here. 
>>>> If there indeed is a use-case we should add a separate function for 
>>>> reserving fences for all locked objects, rather than returning 
>>>> sometimes locked on error sometime not.
>>>
>>> We return the object locked here because it was to much churn to 
>>> remove it again from the array and we are getting fully cleaned up 
>>> at the end anyway.
>>
>> OK, so if we add an unlock functionality, we could just have a 
>> consistent locking state on error return?
>
> Yeah, that should work. Going to work on this.

Great.

Thanks,

Thomas


>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Thomas
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>>>
>>>>> +
>>>>> +error_unlock:
>>>>> +    dma_resv_unlock(obj->resv);
>>>>> +    return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>>> + * @exec: the drm_exec object with the state
>>>>> + * @objects: array of GEM object to prepare
>>>>> + * @num_objects: number of GEM objects in the array
>>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>>> + *
>>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>>> aports on first
>>>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>>>> locking it.
>>>>> + *
>>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>>>> memory
>>>>> + * allocation failed and zero for success.
>>>>> + */
>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>> +               struct drm_gem_object **objects,
>>>>> +               unsigned int num_objects,
>>>>> +               unsigned int num_fences)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>>> +        if (ret)
>>>>> +            return ret;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>>> +
>>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>>> new file mode 100644
>>>>> index 000000000000..b1a5da0509c1
>>>>> --- /dev/null
>>>>> +++ b/include/drm/drm_exec.h
>>>>> @@ -0,0 +1,130 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>>> +
>>>>> +#ifndef __DRM_EXEC_H__
>>>>> +#define __DRM_EXEC_H__
>>>>> +
>>>>> +#include <linux/ww_mutex.h>
>>>>> +
>>>>> +struct drm_gem_object;
>>>>> +
>>>>> +/**
>>>>> + * enum drm_exec_flags - Execution context flags
>>>>> + */
>>>>> +enum drm_exec_flags {
>>>>> +    /**
>>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>>> interruptible locking
>>>>> +     * functions.
>>>>> +     */
>>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>>> +
>>>>> +    /**
>>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>>> EALREADY errors.
>>>>> +     */
>>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_exec - Execution context
>>>>> + */
>>>>> +struct drm_exec {
>>>>> +    /**
>>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>>> +     */
>>>>> +    u32 flags;
>>>>> +
>>>>> +    /**
>>>>> +     * @ticket: WW ticket used for acquiring locks
>>>>> +     */
>>>>> +    struct ww_acquire_ctx    ticket;
>>>>> +
>>>>> +    /**
>>>>> +     * @num_objects: number of objects locked
>>>>> +     */
>>>>> +    unsigned int        num_objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @max_objects: maximum objects in array
>>>>> +     */
>>>>> +    unsigned int        max_objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @objects: array of the locked objects
>>>>> +     */
>>>>> +    struct drm_gem_object    **objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @contended: contended GEM object we backed off for
>>>>> +     */
>>>>> +    struct drm_gem_object    *contended;
>>>>> +
>>>>> +    /**
>>>>> +     * @prelocked: already locked GEM object due to contention
>>>>> +     */
>>>>> +    struct drm_gem_object *prelocked;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>>> objects
>>>>> + * @exec: drm_exec object
>>>>> + * @index: unsigned long index for the iteration
>>>>> + * @obj: the current GEM object
>>>>> + *
>>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>>> object.
>>>>> + */
>>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>>>> +         index < (exec)->num_objects;            \
>>>>> +         ++index, obj = (exec)->objects[index])
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_until_all_locked - retry objects preparation until 
>>>>> all objects
>>>>> + * are locked
>>>>> + * @exec: drm_exec object
>>>>> + * @expr: expression to be evaluated on each attempt
>>>>> + *
>>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>>> detected,
>>>>> + * rollbacks and retries.
>>>>> + *
>>>>> + * @expr is typically a function that tries to prepare objects using
>>>>> + * drm_exec_prepare_obj().
>>>>> + *
>>>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>>>> + *
>>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>>> + *                    drm_exec_prepare_array(exec,
>>>>> + *                                   objs,
>>>>> + *                                   num_objs,
>>>>> + *                                   num_fences));
>>>>> + *    if (ret)
>>>>> + *        goto error_path;
>>>>> + *
>>>>> + *    ...
>>>>> + *
>>>>> + * Returns: 0 on success, a negative error code on failure.
>>>>> + */
>>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>>> +    ({                        \
>>>>> +        __label__ retry;            \
>>>>> +        int __ret;                \
>>>>> +retry:                            \
>>>>> +        __ret = expr;                \
>>>>> +        if ((exec)->contended) {        \
>>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>>> +            drm_exec_reset(exec);        \
>>>>> +            goto retry;            \
>>>>> +        }                    \
>>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>>> +        __ret;                    \
>>>>> +    })
>>>>> +
>>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>> drm_gem_object *obj,
>>>>> +             unsigned int num_fences);
>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>> +               struct drm_gem_object **objects,
>>>>> +               unsigned int num_objects,
>>>>> +               unsigned int num_fences);
>>>>> +
>>>>> +#endif

Boris Brezillon June 19, 2023, 12:01 p.m. UTC | #16

On Mon, 19 Jun 2023 13:05:02 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Mon, 19 Jun 2023 12:44:06 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 19.06.23 um 12:12 schrieb Boris Brezillon:  
> > > [SNIP]
> > > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > > an expression, so in theory, you don't have to define a separate
> > > function.
> > >
> > > 	drm_exec_until_all_locked(&exec, {
> > > 		/* inlined-code */
> > > 		int ret;
> > >
> > > 		ret = blabla()
> > > 		if (ret)
> > > 			goto error;
> > >
> > > 		...
> > >
> > > error:
> > > 		/* return value. */
> > > 		ret;
> > > 	});
> > >
> > > This being said, as soon as you have several failure paths,
> > > it makes things a lot easier/controllable if you make it a function,
> > > and I honestly don't think the readability would suffer from having a
> > > function defined just above the user. My main concern with the original
> > > approach was the risk of calling continue/break_if_contended() in the
> > > wrong place, and also the fact you can't really externalize things to
> > > a function if you're looking for a cleaner split. At least with
> > > drm_exec_until_all_locked() you can do both.    
> > 
> > Yeah, but that means that you can't use return inside your code block 
> > and instead has to define an error label for handling "normal" 
> > contention which is what I'm trying to avoid here.
> > 
> > How about:
> > 
> > #define drm_exec_until_all_locked(exec)    \
> >          __drm_exec_retry: if (drm_exec_cleanup(exec))
> > 
> > 
> > #define drm_exec_retry_on_contention(exec)              \
> >          if (unlikely(drm_exec_is_contended(exec)))      \
> >                  goto __drm_exec_retry
> > 
> > 
> > And then use it like:
> > 
> > drm_exec_until_all_locked(exec)
> > {
> >      ret = drm_exec_prepare_obj(exec, obj);
> >      drm_exec_retry_on_contention(exec);
> > }  
> 
> That would work, and I was about to suggest extending my proposal with
> a drm_exec_retry_on_contention() to support both use cases. The only
> downside is the fact you might be able to break out of a loop that has
> local variables, which will leak stack space.

Nevermind, brain fart on my end. It shouldn't leak any stack space, so
yeah, I think that's a good compromise.

Boris Brezillon June 19, 2023, 12:29 p.m. UTC | #17

On Mon, 19 Jun 2023 12:44:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> > [SNIP]
> > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > an expression, so in theory, you don't have to define a separate
> > function.
> >
> > 	drm_exec_until_all_locked(&exec, {
> > 		/* inlined-code */
> > 		int ret;
> >
> > 		ret = blabla()
> > 		if (ret)
> > 			goto error;
> >
> > 		...
> >
> > error:
> > 		/* return value. */
> > 		ret;
> > 	});
> >
> > This being said, as soon as you have several failure paths,
> > it makes things a lot easier/controllable if you make it a function,
> > and I honestly don't think the readability would suffer from having a
> > function defined just above the user. My main concern with the original
> > approach was the risk of calling continue/break_if_contended() in the
> > wrong place, and also the fact you can't really externalize things to
> > a function if you're looking for a cleaner split. At least with
> > drm_exec_until_all_locked() you can do both.  
> 
> Yeah, but that means that you can't use return inside your code block 
> and instead has to define an error label for handling "normal" 
> contention which is what I'm trying to avoid here.

Sorry, didn't pay attention to this particular concern. Indeed, if you
want to return inside the expression, that's a problem.

> 
> How about:
> 
> #define drm_exec_until_all_locked(exec)    \
>          __drm_exec_retry: if (drm_exec_cleanup(exec))
> 
> 
> #define drm_exec_retry_on_contention(exec)              \
>          if (unlikely(drm_exec_is_contended(exec)))      \
>                  goto __drm_exec_retry
> 
> 
> And then use it like:
> 
> drm_exec_until_all_locked(exec)
> {
>      ret = drm_exec_prepare_obj(exec, obj);
>      drm_exec_retry_on_contention(exec);
> }
> 
> The only problem I can see with this is that the __drm_exec_retry label 
> would be function local.

Yeah, I'm not sure it's safe to use non-local labels for that, because,
as soon as you have more than one drm_exec_until_all_locked() call in a
given function it won't work, which is why I placed things in a block
with local labels, which in turn means you can't return directly,
unfortunately.

Boris Brezillon June 20, 2023, 6:47 a.m. UTC | #18

On Mon, 19 Jun 2023 14:29:23 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Mon, 19 Jun 2023 12:44:06 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 19.06.23 um 12:12 schrieb Boris Brezillon:  
> > > [SNIP]
> > > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > > an expression, so in theory, you don't have to define a separate
> > > function.
> > >
> > > 	drm_exec_until_all_locked(&exec, {
> > > 		/* inlined-code */
> > > 		int ret;
> > >
> > > 		ret = blabla()
> > > 		if (ret)
> > > 			goto error;
> > >
> > > 		...
> > >
> > > error:
> > > 		/* return value. */
> > > 		ret;
> > > 	});
> > >
> > > This being said, as soon as you have several failure paths,
> > > it makes things a lot easier/controllable if you make it a function,
> > > and I honestly don't think the readability would suffer from having a
> > > function defined just above the user. My main concern with the original
> > > approach was the risk of calling continue/break_if_contended() in the
> > > wrong place, and also the fact you can't really externalize things to
> > > a function if you're looking for a cleaner split. At least with
> > > drm_exec_until_all_locked() you can do both.    
> > 
> > Yeah, but that means that you can't use return inside your code block 
> > and instead has to define an error label for handling "normal" 
> > contention which is what I'm trying to avoid here.  
> 
> Sorry, didn't pay attention to this particular concern. Indeed, if you
> want to return inside the expression, that's a problem.

Sorry, that's wrong again. Had trouble focusing yesterday...

So, returning directly from the expression block should be perfectly
fine. The only problem is breaking out of the retry loop early and
propagating the error, but that's no more or less problematic than it
was before. We just need the drm_exec_retry_on_contention() helper you
suggested, and a drm_exec_stop() that would go to some local
__drm_exec_stop label.

	int ret = 0;

	ret = drm_exec_until_all_locked(exec, ({
		...
		ret = drm_exec_prepare_obj(exec, objA, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			drm_exec_stop(exec, ret);
		...

		ret = drm_exec_prepare_obj(exec, objB, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			drm_exec_stop(exec, ret);

		0;
	}));

Which is pretty close to the syntax you defined initially, except for
the '0;' oddity at the end, which is ugly, I admit.

Christian König June 20, 2023, 7:28 a.m. UTC | #19

Am 20.06.23 um 08:47 schrieb Boris Brezillon:
> On Mon, 19 Jun 2023 14:29:23 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
>> On Mon, 19 Jun 2023 12:44:06 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
>>>> [SNIP]
>>>> Note that the drm_exec_until_all_locked() helper I introduced is taking
>>>> an expression, so in theory, you don't have to define a separate
>>>> function.
>>>>
>>>> 	drm_exec_until_all_locked(&exec, {
>>>> 		/* inlined-code */
>>>> 		int ret;
>>>>
>>>> 		ret = blabla()
>>>> 		if (ret)
>>>> 			goto error;
>>>>
>>>> 		...
>>>>
>>>> error:
>>>> 		/* return value. */
>>>> 		ret;
>>>> 	});
>>>>
>>>> This being said, as soon as you have several failure paths,
>>>> it makes things a lot easier/controllable if you make it a function,
>>>> and I honestly don't think the readability would suffer from having a
>>>> function defined just above the user. My main concern with the original
>>>> approach was the risk of calling continue/break_if_contended() in the
>>>> wrong place, and also the fact you can't really externalize things to
>>>> a function if you're looking for a cleaner split. At least with
>>>> drm_exec_until_all_locked() you can do both.
>>> Yeah, but that means that you can't use return inside your code block
>>> and instead has to define an error label for handling "normal"
>>> contention which is what I'm trying to avoid here.
>> Sorry, didn't pay attention to this particular concern. Indeed, if you
>> want to return inside the expression, that's a problem.
> Sorry, that's wrong again. Had trouble focusing yesterday...
>
> So, returning directly from the expression block should be perfectly
> fine. The only problem is breaking out of the retry loop early and
> propagating the error, but that's no more or less problematic than it
> was before. We just need the drm_exec_retry_on_contention() helper you
> suggested, and a drm_exec_stop() that would go to some local
> __drm_exec_stop label.
>
> 	int ret = 0;
>
> 	ret = drm_exec_until_all_locked(exec, ({
> 		...
> 		ret = drm_exec_prepare_obj(exec, objA, 1);
> 		drm_exec_retry_on_contention(exec);
> 		if (ret)
> 			drm_exec_stop(exec, ret);
> 		...
>
> 		ret = drm_exec_prepare_obj(exec, objB, 1);
> 		drm_exec_retry_on_contention(exec);
> 		if (ret)
> 			drm_exec_stop(exec, ret);
>
> 		0;
> 	}));
>
> Which is pretty close to the syntax you defined initially, except for
> the '0;' oddity at the end, which is ugly, I admit.

Yeah and it looks like giving code blocks as macro argument is usually 
also not a good idea.

How about this:

#define drm_exec_until_all_locked(exec)                         \
         for (void *__drm_exec_retry_ptr; ({                     \
                 __label__ __drm_exec_retry;                     \
__drm_exec_retry:                                               \
                 __drm_exec_retry_ptr = &&__drm_exec_retry;      \
                 drm_exec_cleanup(exec);                         \
         });)

#define drm_exec_retry_on_contention(exec)              \
         if (unlikely(drm_exec_is_contended(exec)))      \
                 goto *__drm_exec_retry_ptr


The problem is that you can't declare a label so that it dominates the 
body of the loop.

But what you can do is to declare a void* which dominates the loop, then 
assign the address of a label to it and when you need it use goto*.

The goto* syntax is a gcc extension, but BPF is already using that in 
the upstream kernel.

It's quite a hack and I need to extend my testing a bit, but as far as I 
can see this actually works.

Regards,
Christian.

Christian König June 21, 2023, 1:35 p.m. UTC | #20

Am 19.06.23 um 13:06 schrieb Thomas Hellström (Intel):
>
> On 6/19/23 11:48, Christian König wrote:
>> Hi,
>>
>> Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel):
>>> [SNIP]
>>>>> Sometimes you want to just drop the contended lock after the above 
>>>>> relaxation. (Eviction would be one example), and not add as 
>>>>> prelocked, if the contended object goes out of scope. Eviction 
>>>>> would in some situations be one such example, -EDEADLOCK leading 
>>>>> to an error path where the object should otherwise be freed is 
>>>>> another. Perhaps we could add an argument to prepare_obj() as to 
>>>>> whether the object should be immediately put after relaxation.
>>>>
>>>> I was considering a try_prepare version as well, that should cover 
>>>> this use case.
>>>
>>> That sounds a bit different from this use-case. The use-case above 
>>> would, on -EDEADLOCK actually unlock everything, then lock-slow the 
>>> contending lock and then immediately unlock it and drop.
>>
>> Hui? What would that be good for?
>
> It's for the case where you have nested locking, the contended lock 
> has gone out-of-scope and you're probably not going to need it on the 
> next attempt. I think the refcounted "prelocked" object that is 
> lacking in the i915 variant will resolve all correctness / uaf issues, 
> but still the object might be needlessly carried around for yet 
> another locking round.

Yeah, but that case is so rare that we probably don't need to care about it.

I've changed the implementation so that it should now match the other 
requirements.

Going to send that out now, please double check.

Thanks,
Christian.


>
>
>
>>
>>> It sounds like try_prepare would just skip locking and continue with 
>>> everything locked so far still locked?
>>
>> Correct.
>>
>>>
>>>>
>>>>>
>>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>>> +    if (unlikely(ret)) {
>>>>>> +        dma_resv_unlock(obj->resv);
>>>>>> +        goto error_dropref;
>>>>>> +    }
>>>>>> +
>>>>>> +    swap(exec->prelocked, obj);
>>>>>> +
>>>>>> +error_dropref:
>>>>>> +    /* Always cleanup the contention so that error handling can 
>>>>>> kick in */
>>>>>> +    drm_gem_object_put(obj);
>>>>>> +    exec->contended = NULL;
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>>>> + * @exec: the drm_exec object with the state
>>>>>> + * @obj: the GEM object to prepare
>>>>>> + * @num_fences: how many fences to reserve
>>>>>> + *
>>>>>> + * Prepare a GEM object for use by locking it and reserving 
>>>>>> fence slots. All
>>>>>> + * successfully locked objects are put into the locked container.
>>>>>> + *
>>>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>>>> object is
>>>>>> + * already locked, -ENOMEM when memory allocation failed and 
>>>>>> zero for success.
>>>>>> + */
>>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>>> drm_gem_object *obj,
>>>>>> +             unsigned int num_fences)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    ret = drm_exec_lock_contended(exec);
>>>>>> +    if (unlikely(ret))
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    if (exec->prelocked == obj) {
>>>>>> +        drm_gem_object_put(exec->prelocked);
>>>>>> +        exec->prelocked = NULL;
>>>>>> +
>>>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>>>> +        ret = dma_resv_lock_interruptible(obj->resv, 
>>>>>> &exec->ticket);
>>>>>> +    else
>>>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>>>> +
>>>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>>>> +        drm_gem_object_get(obj);
>>>>>> +        exec->contended = obj;
>>>>>> +        return -EDEADLK;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (unlikely(ret == -EALREADY &&
>>>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>>>> +        goto reserve_fences;
>>>>>> +
>>>>>> +    if (unlikely(ret))
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>>> +    if (ret)
>>>>>> +        goto error_unlock;
>>>>>> +
>>>>>> +reserve_fences:
>>>>>> +    /* Keep locked when reserving fences fails */
>>>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>>
>>>>> Ugh, what is the use-case for keeping things locked here? How 
>>>>> would a caller tell the difference between an error where 
>>>>> everything is locked and nothing is locked? IMO, we should unlock 
>>>>> on error here. If there indeed is a use-case we should add a 
>>>>> separate function for reserving fences for all locked objects, 
>>>>> rather than returning sometimes locked on error sometime not.
>>>>
>>>> We return the object locked here because it was to much churn to 
>>>> remove it again from the array and we are getting fully cleaned up 
>>>> at the end anyway.
>>>
>>> OK, so if we add an unlock functionality, we could just have a 
>>> consistent locking state on error return?
>>
>> Yeah, that should work. Going to work on this.
>
> Great.
>
> Thanks,
>
> Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Thomas
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>> +
>>>>>> +error_unlock:
>>>>>> +    dma_resv_unlock(obj->resv);
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>>>> + * @exec: the drm_exec object with the state
>>>>>> + * @objects: array of GEM object to prepare
>>>>>> + * @num_objects: number of GEM objects in the array
>>>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>>>> + *
>>>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>>>> aports on first
>>>>>> + * error otherwise. Reserves @num_fences on each GEM object 
>>>>>> after locking it.
>>>>>> + *
>>>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM 
>>>>>> when memory
>>>>>> + * allocation failed and zero for success.
>>>>>> + */
>>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>>> +               struct drm_gem_object **objects,
>>>>>> +               unsigned int num_objects,
>>>>>> +               unsigned int num_fences)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>>>> +        if (ret)
>>>>>> +            return ret;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>>>> +
>>>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..b1a5da0509c1
>>>>>> --- /dev/null
>>>>>> +++ b/include/drm/drm_exec.h
>>>>>> @@ -0,0 +1,130 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>>>> +
>>>>>> +#ifndef __DRM_EXEC_H__
>>>>>> +#define __DRM_EXEC_H__
>>>>>> +
>>>>>> +#include <linux/ww_mutex.h>
>>>>>> +
>>>>>> +struct drm_gem_object;
>>>>>> +
>>>>>> +/**
>>>>>> + * enum drm_exec_flags - Execution context flags
>>>>>> + */
>>>>>> +enum drm_exec_flags {
>>>>>> +    /**
>>>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>>>> interruptible locking
>>>>>> +     * functions.
>>>>>> +     */
>>>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>>>> +
>>>>>> +    /**
>>>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>>>> EALREADY errors.
>>>>>> +     */
>>>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_exec - Execution context
>>>>>> + */
>>>>>> +struct drm_exec {
>>>>>> +    /**
>>>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>>>> +     */
>>>>>> +    u32 flags;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @ticket: WW ticket used for acquiring locks
>>>>>> +     */
>>>>>> +    struct ww_acquire_ctx    ticket;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @num_objects: number of objects locked
>>>>>> +     */
>>>>>> +    unsigned int        num_objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @max_objects: maximum objects in array
>>>>>> +     */
>>>>>> +    unsigned int        max_objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @objects: array of the locked objects
>>>>>> +     */
>>>>>> +    struct drm_gem_object    **objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @contended: contended GEM object we backed off for
>>>>>> +     */
>>>>>> +    struct drm_gem_object    *contended;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @prelocked: already locked GEM object due to contention
>>>>>> +     */
>>>>>> +    struct drm_gem_object *prelocked;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>>>> objects
>>>>>> + * @exec: drm_exec object
>>>>>> + * @index: unsigned long index for the iteration
>>>>>> + * @obj: the current GEM object
>>>>>> + *
>>>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>>>> object.
>>>>>> + */
>>>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>>>> +    for (index = 0, obj = (exec)->objects[0]; \
>>>>>> +         index < (exec)->num_objects; \
>>>>>> +         ++index, obj = (exec)->objects[index])
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_until_all_locked - retry objects preparation until 
>>>>>> all objects
>>>>>> + * are locked
>>>>>> + * @exec: drm_exec object
>>>>>> + * @expr: expression to be evaluated on each attempt
>>>>>> + *
>>>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>>>> detected,
>>>>>> + * rollbacks and retries.
>>>>>> + *
>>>>>> + * @expr is typically a function that tries to prepare objects 
>>>>>> using
>>>>>> + * drm_exec_prepare_obj().
>>>>>> + *
>>>>>> + * If we take drm_exec_prepare_array() as an example, you should 
>>>>>> do:
>>>>>> + *
>>>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>>>> + *                    drm_exec_prepare_array(exec,
>>>>>> + *                                   objs,
>>>>>> + *                                   num_objs,
>>>>>> + *                                   num_fences));
>>>>>> + *    if (ret)
>>>>>> + *        goto error_path;
>>>>>> + *
>>>>>> + *    ...
>>>>>> + *
>>>>>> + * Returns: 0 on success, a negative error code on failure.
>>>>>> + */
>>>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>>>> +    ({                        \
>>>>>> +        __label__ retry;            \
>>>>>> +        int __ret;                \
>>>>>> +retry:                            \
>>>>>> +        __ret = expr;                \
>>>>>> +        if ((exec)->contended) {        \
>>>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>>>> +            drm_exec_reset(exec);        \
>>>>>> +            goto retry;            \
>>>>>> +        }                    \
>>>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>>>> +        __ret;                    \
>>>>>> +    })
>>>>>> +
>>>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>>> drm_gem_object *obj,
>>>>>> +             unsigned int num_fences);
>>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>>> +               struct drm_gem_object **objects,
>>>>>> +               unsigned int num_objects,
>>>>>> +               unsigned int num_fences);
>>>>>> +
>>>>>> +#endif

[01/13] drm: execution context for GEM buffers v4

Commit Message

Comments

Patch