diff mbox series

[RFC,03/11] drm: introduce drm evictable LRU

Message ID 20231102043306.2931989-4-oak.zeng@intel.com (mailing list archive)
State New, archived
Headers show
Series Introduce drm evictable lru | expand

Commit Message

Oak Zeng Nov. 2, 2023, 4:32 a.m. UTC
drm LRU manager is introuced for resource eviction purpose. It maintains
a LRU list per resource type. It provides functions to add or remove
resource to or from the list. It also provides function to retrieve the
first entity on the LRU list.

drm LRU manager also provides functions for bulk moving resources
on the LRU lists.

drm LRU manager also does very basic memory accounting function, i.e.,
LRU manager keeps a size of this resource type and a usage member
for how much of resource has been added to this LRU manager's LRU
list. TTM resource manager memory accounting functoins such as
struct ttm_resource_manager::size and struct ttm_resource_manger::usage
are still kept. In the future, when SVM codes are in the picture,
those memory accounting functions need some rework to consider
the memory used by both TTM and SVM.

For one device, a global drm LRU manager per resource type should be
created/initialized at device initialization time. Drm LRU manager
instances are embedded in struct drm_device.

It is pretty much moving some of the ttm resource manager functions
to the drm layer. The reason of this code refactory is, we want to
create a single LRU list for memory allocated from BO(buffer object)
based driver and hmm/svm(shared virtual memory) based driver, thus BO
driver and svm driver can evict memory from each other.

Previously the LRU list in TTM resource manager (lru field in struct
ttm_reource_manager) is coupled with ttm_buffer_object concept, i.e.,
each ttm resource is backed by a ttm_buffer_object and the LRU list
is essentially a list of ttm_buffer_object. Due to this behavior, the
TTM resource manager can't be used by hmm/svm driver as we don't plan
to have the BO concept for the hmm/svm implemenation. So we decouple
the evictable LRU list from the BO concept in this series.

The design goal of drm lru manager is to make it as lean as possible.
So each lru entity only has a list node member used to link this entity
to the evictable LRU list, and the basic resource size/type/priority
of this entity. It doesn't have any driver specify information. A lru
entity also has a function pointer of evict function. This is used to
implement ttm or svm specific eviction function. A lru entity is supposed
to be embedded in a driver specific structure such as struct
ttm_resource, see the usage in the next patch of this series.

The ttm resource manager, and some of the ttm_bo functions such as
ttm_mem_evict_first will be rewriten using the new drm lru manager
library, see the next patch in this series.

The future hmm/svm implemenation will call lru manager function to add
hmm/svm allocations to the shared evictable lru list.

Lock design: previously ttm_resource LRU list is protected by a device
global ttm_device::lru_lock (bdev->lru_lock in codes). This lock also
protects ttm_buffer_object::pin_count, ttm_resource_manager::usage,
ttm_resource::bo, ttm_device::pinned list etc. With this refactory,
lru_lock is moved out of ttm_device and is added to struct drm_deive, so
it can be shared b/t ttm code and svm code.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/Makefile            |   1 +
 drivers/gpu/drm/drm_evictable_lru.c | 232 ++++++++++++++++++++++++++++
 include/drm/drm_device.h            |   7 +
 include/drm/drm_evictable_lru.h     | 188 ++++++++++++++++++++++
 4 files changed, 428 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
 create mode 100644 include/drm/drm_evictable_lru.h

Comments

Christian König Nov. 2, 2023, 1:23 p.m. UTC | #1
Am 02.11.23 um 05:32 schrieb Oak Zeng:
> drm LRU manager is introuced for resource eviction purpose. It maintains
> a LRU list per resource type.

Shouldn't we first add the possible resource types in a separate patch?

>   It provides functions to add or remove
> resource to or from the list. It also provides function to retrieve the
> first entity on the LRU list.

+ functions to iterate over them.

>
> drm LRU manager also provides functions for bulk moving resources
> on the LRU lists.
>
> drm LRU manager also does very basic memory accounting function, i.e.,
> LRU manager keeps a size of this resource type and a usage member
> for how much of resource has been added to this LRU manager's LRU
> list. TTM resource manager memory accounting functoins such as
> struct ttm_resource_manager::size and struct ttm_resource_manger::usage
> are still kept. In the future, when SVM codes are in the picture,
> those memory accounting functions need some rework to consider
> the memory used by both TTM and SVM.

Please keep in mind that this structure needs to extremely small to be 
usable for SVM. E.g. struct page size small :)

At least HMM based implementations ideally wants to have one for each 
page or something like that.

> For one device, a global drm LRU manager per resource type should be
> created/initialized at device initialization time. Drm LRU manager
> instances are embedded in struct drm_device.
>
> It is pretty much moving some of the ttm resource manager functions
> to the drm layer. The reason of this code refactory is, we want to
> create a single LRU list for memory allocated from BO(buffer object)
> based driver and hmm/svm(shared virtual memory) based driver, thus BO
> driver and svm driver can evict memory from each other.
>
> Previously the LRU list in TTM resource manager (lru field in struct
> ttm_reource_manager) is coupled with ttm_buffer_object concept, i.e.,
> each ttm resource is backed by a ttm_buffer_object and the LRU list
> is essentially a list of ttm_buffer_object.

Actually it's the other way around. The resource provides the backing of 
the BO.

And when a BO moves around it can temporary be that multiple resource 
point to the same BO.

I also want to have a more advanced iterator at some point where we grab 
the BO lock for keeping a reference into the LRU list. Not sure how to 
do this if we don't have the BO here any more.

Need to think about that further,
Christian.

>   Due to this behavior, the
> TTM resource manager can't be used by hmm/svm driver as we don't plan
> to have the BO concept for the hmm/svm implemenation. So we decouple
> the evictable LRU list from the BO concept in this series.
>
> The design goal of drm lru manager is to make it as lean as possible.
> So each lru entity only has a list node member used to link this entity
> to the evictable LRU list, and the basic resource size/type/priority
> of this entity. It doesn't have any driver specify information. A lru
> entity also has a function pointer of evict function. This is used to
> implement ttm or svm specific eviction function. A lru entity is supposed
> to be embedded in a driver specific structure such as struct
> ttm_resource, see the usage in the next patch of this series.
>
> The ttm resource manager, and some of the ttm_bo functions such as
> ttm_mem_evict_first will be rewriten using the new drm lru manager
> library, see the next patch in this series.
>
> The future hmm/svm implemenation will call lru manager function to add
> hmm/svm allocations to the shared evictable lru list.
>
> Lock design: previously ttm_resource LRU list is protected by a device
> global ttm_device::lru_lock (bdev->lru_lock in codes). This lock also
> protects ttm_buffer_object::pin_count, ttm_resource_manager::usage,
> ttm_resource::bo, ttm_device::pinned list etc. With this refactory,
> lru_lock is moved out of ttm_device and is added to struct drm_deive, so
> it can be shared b/t ttm code and svm code.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
>   drivers/gpu/drm/Makefile            |   1 +
>   drivers/gpu/drm/drm_evictable_lru.c | 232 ++++++++++++++++++++++++++++
>   include/drm/drm_device.h            |   7 +
>   include/drm/drm_evictable_lru.h     | 188 ++++++++++++++++++++++
>   4 files changed, 428 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
>   create mode 100644 include/drm/drm_evictable_lru.h
>
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 1ad88efb1752..13953b0d271b 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -46,6 +46,7 @@ drm-y := \
>   	drm_vblank_work.o \
>   	drm_vma_manager.o \
>   	drm_gpuva_mgr.o \
> +	drm_evictable_lru.o \
>   	drm_writeback.o
>   drm-$(CONFIG_DRM_LEGACY) += \
>   	drm_agpsupport.o \
> diff --git a/drivers/gpu/drm/drm_evictable_lru.c b/drivers/gpu/drm/drm_evictable_lru.c
> new file mode 100644
> index 000000000000..2ba9105cca03
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_evictable_lru.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#include <linux/lockdep.h>
> +#include <linux/container_of.h>
> +#include <drm/drm_evictable_lru.h>
> +#include <drm/drm_device.h>
> +
> +static inline struct drm_lru_mgr *entity_to_mgr(struct drm_lru_entity *entity)
> +{
> +	struct drm_lru_mgr *mgr;
> +
> +	mgr = &entity->drm->lru_mgr[entity->mem_type];
> +	BUG_ON(!mgr->used);
> +
> +	return mgr;
> +}
> +
> +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> +			uint32_t mem_type, uint64_t size, uint32_t priority)
> +{
> +	entity->drm = drm;
> +	entity->mem_type = mem_type;
> +	entity->size = size;
> +	entity->priority = priority;
> +	INIT_LIST_HEAD(&entity->lru);
> +}
> +
> +/**
> + * drm_lru_mgr_init
> + *
> + * @mgr: drm lru manager to init
> + * @size: size of the resource managed by this manager
> + * @lock: pointer of the global lru_lock
> + *
> + * Initialize a drm lru manager
> + */
> +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size, spinlock_t *lock)
> +{
> +	unsigned j;
> +
> +	mgr->used = true;
> +	mgr->size = size;
> +	mgr->usage = 0;
> +	mgr->lru_lock = lock;
> +
> +	for(j = 0; j < DRM_MAX_LRU_PRIORITY; j++)
> +		INIT_LIST_HEAD(&mgr->lru[j]);
> +}
> +
> +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move)
> +{
> +	memset(bulk_move, 0, sizeof(*bulk_move));
> +}
> +
> +/**
> + * drm_lru_first
> + *
> + * @mgr: drm lru manager to iterate over
> + * @cursor: cursor of the current position
> + *
> + * Returns the first entity in drm lru manager
> + */
> +struct drm_lru_entity *
> +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor)
> +{
> +	struct drm_lru_entity *entity;
> +
> +	lockdep_assert_held(mgr->lru_lock);
> +
> +	for(cursor->priority = 0; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
> +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> +			return entity;
> +
> +	return NULL;
> +}
> +
> +/**
> + * drm_lru_next
> + *
> + * @mgr: drm lru manager to iterate over
> + * @cursor: cursor of the current position
> + * @entity: the current lru entity pointer
> + *
> + * Returns the next entity from drm lru manager
> + */
> +struct drm_lru_entity *
> +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> +		struct drm_lru_entity *entity)
> +{
> +	lockdep_assert_held(mgr->lru_lock);
> +
> +	list_for_each_entry_continue(entity, &mgr->lru[cursor->priority], lru)
> +		return entity;
> +
> +	for(++cursor->priority; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
> +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> +			return entity;
> +
> +	return NULL;
> +}
> +
> +/**
> + * drm_lru_move_to_tail
> + *
> + * @entity: the lru entity to move to lru tail
> + *
> + * Move a lru entity to lru tail
> + */
> +void drm_lru_move_to_tail(struct drm_lru_entity * entity)
> +{
> +	struct list_head *lru;
> +	struct drm_lru_mgr *mgr;
> +
> +	mgr = entity_to_mgr(entity);
> +	lockdep_assert_held(mgr->lru_lock);
> +	lru = &mgr->lru[entity->priority];
> +	list_move_tail(&entity->lru, lru);
> +}
> +
> +/**
> + * drm_lru_bulk_move_range_tail
> + *
> + * @range: bulk move range
> + * @entity: lru_entity to move
> + *
> + * Move a lru_entity to the tail of a bulk move range
> + */
> +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> +									struct drm_lru_entity *entity)
> +{
> +	if (entity == range->last)
> +		return;
> +
> +	if (entity == range->first)
> +		range->first = container_of(entity->lru.next, struct drm_lru_entity, lru);
> +
> +	if (range->last)
> +		list_move(&entity->lru, &range->last->lru);
> +
> +	range->last = entity;
> +}
> +EXPORT_SYMBOL(drm_lru_bulk_move_range_tail);
> +
> +/**
> + * drm_lru_bulk_move_tail - bulk move range of entities to the LRU tail.
> + *
> + * @bulk: bulk_move structure
> + *
> + * Bulk move entities to the LRU tail, only valid to use when driver makes sure that
> + * resource order never changes.
> + */
> +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk)
> +{
> +
> +	unsigned i, j;
> +
> +	for (i = 0; i < DRM_NUM_MEM_TYPES; ++i) {
> +		for (j = 0; j < DRM_MAX_LRU_PRIORITY; ++j) {
> +			struct drm_lru_bulk_move_range *range = &bulk->range[i][j];
> +			struct drm_lru_mgr *mgr;
> +
> +			if (!range->first)
> +				continue;
> +
> +			mgr = entity_to_mgr(range->first);
> +			lockdep_assert_held(mgr->lru_lock);
> +			list_bulk_move_tail(&mgr->lru[range->first->priority], &range->first->lru,
> +					&range->last->lru);
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(drm_lru_bulk_move_tail);
> +
> +/**
> + * drm_lru_add_bulk_move
> + *
> + * @entity: the lru entity to add to the bulk move range
> + * @bulk_move: the bulk move ranges to add the entity
> + *
> + * Add a lru entity to the tail of a bulk move range
> + */
> +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> +						struct drm_lru_bulk_move *bulk_move)
> +{
> +	struct drm_lru_bulk_move_range *range;
> +
> +	range = &bulk_move->range[entity->mem_type][entity->priority];
> +
> +	if (!range->first) {
> +		range->first = entity;
> +		range->last = entity;
> +		return;
> +	}
> +
> +	drm_lru_bulk_move_range_tail(range, entity);
> +}
> +
> +EXPORT_SYMBOL(drm_lru_add_bulk_move);
> +/**
> + * drm_lru_del_bulk_move
> + *
> + * @entity: the lru entity to move from the bulk move range
> + * @bulk_move: the bulk move ranges to move the entity out of
> + *
> + * Move a lru entity out of bulk move range. This doesn't
> + * delete entity from lru manager's lru list.
> + */
> +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> +					struct drm_lru_bulk_move *bulk_move)
> +{
> +	struct drm_lru_bulk_move_range *range;
> +
> +	range = &bulk_move->range[entity->mem_type][entity->priority];
> +
> +	if (unlikely(WARN_ON(!range->first || !range->last) ||
> +			(range->first == entity && range->last == entity))) {
> +		range->first = NULL;
> +		range->last = NULL;
> +	} else if (range->first == entity) {
> +		range->first = container_of(entity->lru.next,
> +				struct drm_lru_entity, lru);
> +	} else if (range->last == entity) {
> +		range->last = container_of(entity->lru.prev,
> +				struct drm_lru_entity, lru);
> +	} else {
> +		list_move(&entity->lru, &range->last->lru);
> +	}
> +}
> +EXPORT_SYMBOL(drm_lru_del_bulk_move);
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index d0b5f42786be..1bdcd34d3f6b 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -8,6 +8,7 @@
>   
>   #include <drm/drm_legacy.h>
>   #include <drm/drm_mode_config.h>
> +#include <drm/drm_evictable_lru.h>
>   
>   struct drm_driver;
>   struct drm_minor;
> @@ -331,6 +332,12 @@ struct drm_device {
>   	 */
>   	spinlock_t lru_lock;
>   
> +	/**
> +	 * @lru_mgr: Device global lru managers per memory type or memory
> +	 * region. Each lru manager manages a lru list of this memory type.
> +	 */
> +	struct drm_lru_mgr lru_mgr[DRM_NUM_MEM_TYPES];
> +
>   	/* Everything below here is for legacy driver, never use! */
>   	/* private: */
>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> diff --git a/include/drm/drm_evictable_lru.h b/include/drm/drm_evictable_lru.h
> new file mode 100644
> index 000000000000..3fd6bd2475d9
> --- /dev/null
> +++ b/include/drm/drm_evictable_lru.h
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#ifndef _DRM_EVICTABLE_LRU_H_
> +#define _DRM_EVICTABLE_LRU_H_
> +
> +#include <linux/list.h>
> +#include <linux/spinlock_types.h>
> +#include <linux/spinlock.h>
> +
> +struct drm_device;
> +
> +#define DRM_MAX_LRU_PRIORITY 4
> +#define DRM_NUM_MEM_TYPES 8
> +
> +/**
> + * struct drm_lru_entity
> + *
> + * @drm: drm device that this entity belongs to
> + * @mem_type: The memory type that this entity belongs to
> + * @size: resource size of this entity
> + * @priority: The priority of this entity
> + * @lru: least recent used list node, see &drm_lru_mgr.lru
> + *
> + * This structure represents an entity in drm_lru_mgr's
> + * list. This structure is supposed to be embedded in
> + * user's data structure.
> + */
> +struct drm_lru_entity {
> +	struct drm_device *drm;
> +	uint32_t mem_type;
> +	uint64_t size;
> +	uint32_t priority;
> +	struct list_head lru;
> +};
> +
> +/**
> + * struct drm_lru_mgr
> + *
> + * @used: whether this lru manager is used or not
> + * @size: size of the resource
> + * @usage: how much resource has been used
> + * @lru_lock: a weak reference to the global lru_lock
> + * @lru: least recent used list, per priority
> + *
> + * This structure maintains all the buffer allocations
> + * in a least recent used list, so a victim for eviction
> + * can be easily found.
> + */
> +struct drm_lru_mgr {
> +	bool used;
> +	uint64_t size;
> +	uint64_t usage;
> +	spinlock_t *lru_lock;
> +	struct list_head lru[DRM_MAX_LRU_PRIORITY];
> +};
> +
> +/**
> + * struct drm_lru_cursor
> + *
> + * @priority: the current priority
> + *
> + * Cursor to iterate over all entities in lru manager.
> + */
> +struct drm_lru_cursor {
> +	unsigned priority;
> +};
> +
> +/**
> + * struct drm_lru_bulk_move_range
> + *
> + * @first: the first entity in the range
> + * @last: the last entity in the range
> + *
> + * Range of entities on a lru list.
> + */
> +struct drm_lru_bulk_move_range
> +{
> +	struct drm_lru_entity *first;
> +	struct drm_lru_entity *last;
> +};
> +
> +/**
> + * struct drm_lru_bulk_move
> + *
> + * @range: An array of bulk move range, each corelates to the drm_lru_mgr's
> + * lru list of the same memory type and same priority.
> + *
> + * A collection of bulk_move range which can be used to move drm_lru_entity
> + * on the lru list in a bulk way. It should be initialized through
> + * drm_lru_bulk_move_init. Add/delete a drm_lru_entity to bulk move should call
> + * drm_lru_add_bulk_move/drm_lru_del_bulk_move.
> + */
> +struct drm_lru_bulk_move {
> +	struct drm_lru_bulk_move_range range[DRM_NUM_MEM_TYPES][DRM_MAX_LRU_PRIORITY];
> +};
> +
> +
> +
> +/**
> + * drm_lru_add_entity
> + *
> + * @entity: the lru entity to add
> + * @mgr: the drm lru manager
> + * @priority: specify which priority list to add
> + *
> + * Add an entity to lru list
> + */
> +static inline void drm_lru_add_entity(struct drm_lru_entity *entity,
> +		struct drm_lru_mgr *mgr, unsigned priority)
> +{
> +	lockdep_assert_held(mgr->lru_lock);
> +	list_add_tail(&entity->lru, &mgr->lru[priority]);
> +	mgr->usage += entity->size;
> +}
> +
> +/**
> + * drm_lru_remove_entity
> + *
> + * @entity: the lru entity to remove
> + * @mgr: the drm lru manager
> + *
> + * Remove an entity from lru list
> + */
> +static inline void drm_lru_remove_entity(struct drm_lru_entity *entity,
> +		struct drm_lru_mgr *mgr)
> +{
> +	lockdep_assert_held(mgr->lru_lock);
> +	list_del_init(&entity->lru);
> +	mgr->usage -= entity->size;
> +}
> +
> +/**
> + * drm_lru_mgr_fini
> + *
> + * @mgr: the drm lru manager
> + *
> + * de-initialize a lru manager
> + */
> +static inline void drm_lru_mgr_fini(struct drm_lru_mgr *mgr)
> +{
> +	mgr->used = false;
> +}
> +
> +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> +			uint32_t mem_type, uint64_t size, uint32_t priority);
> +
> +struct drm_lru_entity *
> +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor);
> +
> +struct drm_lru_entity *
> +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> +		struct drm_lru_entity *entity);
> +
> +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size,
> +		spinlock_t *lru_lock);
> +
> +void drm_lru_move_to_tail(struct drm_lru_entity * entity);
> +
> +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move);
> +
> +
> +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk);
> +
> +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> +		struct drm_lru_entity *entity);
> +
> +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> +		struct drm_lru_bulk_move *bulk_move);
> +
> +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> +		struct drm_lru_bulk_move *bulk_move);
> +/**
> + * drm_lru_for_each_entity
> + *
> + * @mgr: the drm lru manager
> + * @cursor: cursor for the current position
> + * @entity: the current drm_lru_entity
> + *
> + * Iterate over all entities in drm lru manager
> + */
> +#define drm_lru_for_each_entity(mgr, cursor, entity)		\
> +	for (entity = drm_lru_first(mgr, cursor); entity;	\
> +	     entity = drm_lru_next(mgr, cursor, entity))
> +
> +#endif
Oak Zeng Nov. 3, 2023, 4:04 a.m. UTC | #2
> -----Original Message-----
> From: Christian König <christian.koenig@amd.com>
> Sent: Thursday, November 2, 2023 9:24 AM
> To: Zeng, Oak <oak.zeng@intel.com>; dri-devel@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Cc: Thomas.Hellstrom@linux.intel.com; felix.kuehling@amd.com;
> airlied@gmail.com; Welty, Brian <brian.welty@intel.com>
> Subject: Re: [RFC 03/11] drm: introduce drm evictable LRU
> 
> Am 02.11.23 um 05:32 schrieb Oak Zeng:
> > drm LRU manager is introuced for resource eviction purpose. It maintains
> > a LRU list per resource type.
> 
> Shouldn't we first add the possible resource types in a separate patch?

Resource type in my description message is not a good name. for resource type we have:
System memory
Gpu stolen memory
Normal gpu vram

Some device such as Intel's pvc has sub-device concept (it is called tile in xe driver). For such device, we create multiple vram type ttm resource manager and multiple lru manager, one for each sub-device...So currently we only defined a DRM_NUM_MEM_TYPES in lru manager, but not defining each resource (memory) type. 
> 
> >   It provides functions to add or remove
> > resource to or from the list. It also provides function to retrieve the
> > first entity on the LRU list.
> 
> + functions to iterate over them.

Yes basic iterate functions are implemented in this patch. Will add it to description message.
> 
> >
> > drm LRU manager also provides functions for bulk moving resources
> > on the LRU lists.
> >
> > drm LRU manager also does very basic memory accounting function, i.e.,
> > LRU manager keeps a size of this resource type and a usage member
> > for how much of resource has been added to this LRU manager's LRU
> > list. TTM resource manager memory accounting functoins such as
> > struct ttm_resource_manager::size and struct ttm_resource_manger::usage
> > are still kept. In the future, when SVM codes are in the picture,
> > those memory accounting functions need some rework to consider
> > the memory used by both TTM and SVM.
> 
> Please keep in mind that this structure needs to extremely small to be
> usable for SVM. E.g. struct page size small :)
> 
> At least HMM based implementations ideally wants to have one for each
> page or something like that.

Very good point. List node and eviction function pointer are necessary for drm_lru_entity. I will look whether we can remove other members. At least we can remove the drm_device pointer if we make drm_device base class of ttm_device as you suggested in previous patch comment.

And mem_type and priority can use bitfield, so a dword is enough.


> 
> > For one device, a global drm LRU manager per resource type should be
> > created/initialized at device initialization time. Drm LRU manager
> > instances are embedded in struct drm_device.
> >
> > It is pretty much moving some of the ttm resource manager functions
> > to the drm layer. The reason of this code refactory is, we want to
> > create a single LRU list for memory allocated from BO(buffer object)
> > based driver and hmm/svm(shared virtual memory) based driver, thus BO
> > driver and svm driver can evict memory from each other.
> >
> > Previously the LRU list in TTM resource manager (lru field in struct
> > ttm_reource_manager) is coupled with ttm_buffer_object concept, i.e.,
> > each ttm resource is backed by a ttm_buffer_object and the LRU list
> > is essentially a list of ttm_buffer_object.
> 
> Actually it's the other way around. The resource provides the backing of
> the BO.

You are right. Will fix this description.
> 
> And when a BO moves around it can temporary be that multiple resource
> point to the same BO.
> 
> I also want to have a more advanced iterator at some point where we grab
> the BO lock for keeping a reference into the LRU list. Not sure how to
> do this if we don't have the BO here any more.
> 
> Need to think about that further,

Don't quite get the what you want to do with the advanced iterator. But with this work, the lru entity is a base class of ttm_resource or any other resource struct in hmm/svm. Lru is decoupled from bo concept - this is why this lru can be shared with svm code which is bo-less.

Oak 

> Christian.
> 
> >   Due to this behavior, the
> > TTM resource manager can't be used by hmm/svm driver as we don't plan
> > to have the BO concept for the hmm/svm implemenation. So we decouple
> > the evictable LRU list from the BO concept in this series.
> >
> > The design goal of drm lru manager is to make it as lean as possible.
> > So each lru entity only has a list node member used to link this entity
> > to the evictable LRU list, and the basic resource size/type/priority
> > of this entity. It doesn't have any driver specify information. A lru
> > entity also has a function pointer of evict function. This is used to
> > implement ttm or svm specific eviction function. A lru entity is supposed
> > to be embedded in a driver specific structure such as struct
> > ttm_resource, see the usage in the next patch of this series.
> >
> > The ttm resource manager, and some of the ttm_bo functions such as
> > ttm_mem_evict_first will be rewriten using the new drm lru manager
> > library, see the next patch in this series.
> >
> > The future hmm/svm implemenation will call lru manager function to add
> > hmm/svm allocations to the shared evictable lru list.
> >
> > Lock design: previously ttm_resource LRU list is protected by a device
> > global ttm_device::lru_lock (bdev->lru_lock in codes). This lock also
> > protects ttm_buffer_object::pin_count, ttm_resource_manager::usage,
> > ttm_resource::bo, ttm_device::pinned list etc. With this refactory,
> > lru_lock is moved out of ttm_device and is added to struct drm_deive, so
> > it can be shared b/t ttm code and svm code.
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > ---
> >   drivers/gpu/drm/Makefile            |   1 +
> >   drivers/gpu/drm/drm_evictable_lru.c | 232 ++++++++++++++++++++++++++++
> >   include/drm/drm_device.h            |   7 +
> >   include/drm/drm_evictable_lru.h     | 188 ++++++++++++++++++++++
> >   4 files changed, 428 insertions(+)
> >   create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
> >   create mode 100644 include/drm/drm_evictable_lru.h
> >
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index 1ad88efb1752..13953b0d271b 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -46,6 +46,7 @@ drm-y := \
> >   	drm_vblank_work.o \
> >   	drm_vma_manager.o \
> >   	drm_gpuva_mgr.o \
> > +	drm_evictable_lru.o \
> >   	drm_writeback.o
> >   drm-$(CONFIG_DRM_LEGACY) += \
> >   	drm_agpsupport.o \
> > diff --git a/drivers/gpu/drm/drm_evictable_lru.c
> b/drivers/gpu/drm/drm_evictable_lru.c
> > new file mode 100644
> > index 000000000000..2ba9105cca03
> > --- /dev/null
> > +++ b/drivers/gpu/drm/drm_evictable_lru.c
> > @@ -0,0 +1,232 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2023 Intel Corporation
> > + */
> > +
> > +#include <linux/lockdep.h>
> > +#include <linux/container_of.h>
> > +#include <drm/drm_evictable_lru.h>
> > +#include <drm/drm_device.h>
> > +
> > +static inline struct drm_lru_mgr *entity_to_mgr(struct drm_lru_entity *entity)
> > +{
> > +	struct drm_lru_mgr *mgr;
> > +
> > +	mgr = &entity->drm->lru_mgr[entity->mem_type];
> > +	BUG_ON(!mgr->used);
> > +
> > +	return mgr;
> > +}
> > +
> > +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> > +			uint32_t mem_type, uint64_t size, uint32_t priority)
> > +{
> > +	entity->drm = drm;
> > +	entity->mem_type = mem_type;
> > +	entity->size = size;
> > +	entity->priority = priority;
> > +	INIT_LIST_HEAD(&entity->lru);
> > +}
> > +
> > +/**
> > + * drm_lru_mgr_init
> > + *
> > + * @mgr: drm lru manager to init
> > + * @size: size of the resource managed by this manager
> > + * @lock: pointer of the global lru_lock
> > + *
> > + * Initialize a drm lru manager
> > + */
> > +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size, spinlock_t *lock)
> > +{
> > +	unsigned j;
> > +
> > +	mgr->used = true;
> > +	mgr->size = size;
> > +	mgr->usage = 0;
> > +	mgr->lru_lock = lock;
> > +
> > +	for(j = 0; j < DRM_MAX_LRU_PRIORITY; j++)
> > +		INIT_LIST_HEAD(&mgr->lru[j]);
> > +}
> > +
> > +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move)
> > +{
> > +	memset(bulk_move, 0, sizeof(*bulk_move));
> > +}
> > +
> > +/**
> > + * drm_lru_first
> > + *
> > + * @mgr: drm lru manager to iterate over
> > + * @cursor: cursor of the current position
> > + *
> > + * Returns the first entity in drm lru manager
> > + */
> > +struct drm_lru_entity *
> > +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor)
> > +{
> > +	struct drm_lru_entity *entity;
> > +
> > +	lockdep_assert_held(mgr->lru_lock);
> > +
> > +	for(cursor->priority = 0; cursor->priority < DRM_MAX_LRU_PRIORITY;
> ++cursor->priority)
> > +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> > +			return entity;
> > +
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * drm_lru_next
> > + *
> > + * @mgr: drm lru manager to iterate over
> > + * @cursor: cursor of the current position
> > + * @entity: the current lru entity pointer
> > + *
> > + * Returns the next entity from drm lru manager
> > + */
> > +struct drm_lru_entity *
> > +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> > +		struct drm_lru_entity *entity)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +
> > +	list_for_each_entry_continue(entity, &mgr->lru[cursor->priority], lru)
> > +		return entity;
> > +
> > +	for(++cursor->priority; cursor->priority < DRM_MAX_LRU_PRIORITY;
> ++cursor->priority)
> > +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> > +			return entity;
> > +
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * drm_lru_move_to_tail
> > + *
> > + * @entity: the lru entity to move to lru tail
> > + *
> > + * Move a lru entity to lru tail
> > + */
> > +void drm_lru_move_to_tail(struct drm_lru_entity * entity)
> > +{
> > +	struct list_head *lru;
> > +	struct drm_lru_mgr *mgr;
> > +
> > +	mgr = entity_to_mgr(entity);
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	lru = &mgr->lru[entity->priority];
> > +	list_move_tail(&entity->lru, lru);
> > +}
> > +
> > +/**
> > + * drm_lru_bulk_move_range_tail
> > + *
> > + * @range: bulk move range
> > + * @entity: lru_entity to move
> > + *
> > + * Move a lru_entity to the tail of a bulk move range
> > + */
> > +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> > +									struct
> drm_lru_entity *entity)
> > +{
> > +	if (entity == range->last)
> > +		return;
> > +
> > +	if (entity == range->first)
> > +		range->first = container_of(entity->lru.next, struct drm_lru_entity,
> lru);
> > +
> > +	if (range->last)
> > +		list_move(&entity->lru, &range->last->lru);
> > +
> > +	range->last = entity;
> > +}
> > +EXPORT_SYMBOL(drm_lru_bulk_move_range_tail);
> > +
> > +/**
> > + * drm_lru_bulk_move_tail - bulk move range of entities to the LRU tail.
> > + *
> > + * @bulk: bulk_move structure
> > + *
> > + * Bulk move entities to the LRU tail, only valid to use when driver makes sure that
> > + * resource order never changes.
> > + */
> > +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk)
> > +{
> > +
> > +	unsigned i, j;
> > +
> > +	for (i = 0; i < DRM_NUM_MEM_TYPES; ++i) {
> > +		for (j = 0; j < DRM_MAX_LRU_PRIORITY; ++j) {
> > +			struct drm_lru_bulk_move_range *range = &bulk-
> >range[i][j];
> > +			struct drm_lru_mgr *mgr;
> > +
> > +			if (!range->first)
> > +				continue;
> > +
> > +			mgr = entity_to_mgr(range->first);
> > +			lockdep_assert_held(mgr->lru_lock);
> > +			list_bulk_move_tail(&mgr->lru[range->first->priority],
> &range->first->lru,
> > +					&range->last->lru);
> > +		}
> > +	}
> > +}
> > +EXPORT_SYMBOL(drm_lru_bulk_move_tail);
> > +
> > +/**
> > + * drm_lru_add_bulk_move
> > + *
> > + * @entity: the lru entity to add to the bulk move range
> > + * @bulk_move: the bulk move ranges to add the entity
> > + *
> > + * Add a lru entity to the tail of a bulk move range
> > + */
> > +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> > +						struct drm_lru_bulk_move
> *bulk_move)
> > +{
> > +	struct drm_lru_bulk_move_range *range;
> > +
> > +	range = &bulk_move->range[entity->mem_type][entity->priority];
> > +
> > +	if (!range->first) {
> > +		range->first = entity;
> > +		range->last = entity;
> > +		return;
> > +	}
> > +
> > +	drm_lru_bulk_move_range_tail(range, entity);
> > +}
> > +
> > +EXPORT_SYMBOL(drm_lru_add_bulk_move);
> > +/**
> > + * drm_lru_del_bulk_move
> > + *
> > + * @entity: the lru entity to move from the bulk move range
> > + * @bulk_move: the bulk move ranges to move the entity out of
> > + *
> > + * Move a lru entity out of bulk move range. This doesn't
> > + * delete entity from lru manager's lru list.
> > + */
> > +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> > +					struct drm_lru_bulk_move *bulk_move)
> > +{
> > +	struct drm_lru_bulk_move_range *range;
> > +
> > +	range = &bulk_move->range[entity->mem_type][entity->priority];
> > +
> > +	if (unlikely(WARN_ON(!range->first || !range->last) ||
> > +			(range->first == entity && range->last == entity))) {
> > +		range->first = NULL;
> > +		range->last = NULL;
> > +	} else if (range->first == entity) {
> > +		range->first = container_of(entity->lru.next,
> > +				struct drm_lru_entity, lru);
> > +	} else if (range->last == entity) {
> > +		range->last = container_of(entity->lru.prev,
> > +				struct drm_lru_entity, lru);
> > +	} else {
> > +		list_move(&entity->lru, &range->last->lru);
> > +	}
> > +}
> > +EXPORT_SYMBOL(drm_lru_del_bulk_move);
> > diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> > index d0b5f42786be..1bdcd34d3f6b 100644
> > --- a/include/drm/drm_device.h
> > +++ b/include/drm/drm_device.h
> > @@ -8,6 +8,7 @@
> >
> >   #include <drm/drm_legacy.h>
> >   #include <drm/drm_mode_config.h>
> > +#include <drm/drm_evictable_lru.h>
> >
> >   struct drm_driver;
> >   struct drm_minor;
> > @@ -331,6 +332,12 @@ struct drm_device {
> >   	 */
> >   	spinlock_t lru_lock;
> >
> > +	/**
> > +	 * @lru_mgr: Device global lru managers per memory type or memory
> > +	 * region. Each lru manager manages a lru list of this memory type.
> > +	 */
> > +	struct drm_lru_mgr lru_mgr[DRM_NUM_MEM_TYPES];
> > +
> >   	/* Everything below here is for legacy driver, never use! */
> >   	/* private: */
> >   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > diff --git a/include/drm/drm_evictable_lru.h b/include/drm/drm_evictable_lru.h
> > new file mode 100644
> > index 000000000000..3fd6bd2475d9
> > --- /dev/null
> > +++ b/include/drm/drm_evictable_lru.h
> > @@ -0,0 +1,188 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2023 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_EVICTABLE_LRU_H_
> > +#define _DRM_EVICTABLE_LRU_H_
> > +
> > +#include <linux/list.h>
> > +#include <linux/spinlock_types.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct drm_device;
> > +
> > +#define DRM_MAX_LRU_PRIORITY 4
> > +#define DRM_NUM_MEM_TYPES 8
> > +
> > +/**
> > + * struct drm_lru_entity
> > + *
> > + * @drm: drm device that this entity belongs to
> > + * @mem_type: The memory type that this entity belongs to
> > + * @size: resource size of this entity
> > + * @priority: The priority of this entity
> > + * @lru: least recent used list node, see &drm_lru_mgr.lru
> > + *
> > + * This structure represents an entity in drm_lru_mgr's
> > + * list. This structure is supposed to be embedded in
> > + * user's data structure.
> > + */
> > +struct drm_lru_entity {
> > +	struct drm_device *drm;
> > +	uint32_t mem_type;
> > +	uint64_t size;
> > +	uint32_t priority;
> > +	struct list_head lru;
> > +};
> > +
> > +/**
> > + * struct drm_lru_mgr
> > + *
> > + * @used: whether this lru manager is used or not
> > + * @size: size of the resource
> > + * @usage: how much resource has been used
> > + * @lru_lock: a weak reference to the global lru_lock
> > + * @lru: least recent used list, per priority
> > + *
> > + * This structure maintains all the buffer allocations
> > + * in a least recent used list, so a victim for eviction
> > + * can be easily found.
> > + */
> > +struct drm_lru_mgr {
> > +	bool used;
> > +	uint64_t size;
> > +	uint64_t usage;
> > +	spinlock_t *lru_lock;
> > +	struct list_head lru[DRM_MAX_LRU_PRIORITY];
> > +};
> > +
> > +/**
> > + * struct drm_lru_cursor
> > + *
> > + * @priority: the current priority
> > + *
> > + * Cursor to iterate over all entities in lru manager.
> > + */
> > +struct drm_lru_cursor {
> > +	unsigned priority;
> > +};
> > +
> > +/**
> > + * struct drm_lru_bulk_move_range
> > + *
> > + * @first: the first entity in the range
> > + * @last: the last entity in the range
> > + *
> > + * Range of entities on a lru list.
> > + */
> > +struct drm_lru_bulk_move_range
> > +{
> > +	struct drm_lru_entity *first;
> > +	struct drm_lru_entity *last;
> > +};
> > +
> > +/**
> > + * struct drm_lru_bulk_move
> > + *
> > + * @range: An array of bulk move range, each corelates to the drm_lru_mgr's
> > + * lru list of the same memory type and same priority.
> > + *
> > + * A collection of bulk_move range which can be used to move drm_lru_entity
> > + * on the lru list in a bulk way. It should be initialized through
> > + * drm_lru_bulk_move_init. Add/delete a drm_lru_entity to bulk move should call
> > + * drm_lru_add_bulk_move/drm_lru_del_bulk_move.
> > + */
> > +struct drm_lru_bulk_move {
> > +	struct drm_lru_bulk_move_range
> range[DRM_NUM_MEM_TYPES][DRM_MAX_LRU_PRIORITY];
> > +};
> > +
> > +
> > +
> > +/**
> > + * drm_lru_add_entity
> > + *
> > + * @entity: the lru entity to add
> > + * @mgr: the drm lru manager
> > + * @priority: specify which priority list to add
> > + *
> > + * Add an entity to lru list
> > + */
> > +static inline void drm_lru_add_entity(struct drm_lru_entity *entity,
> > +		struct drm_lru_mgr *mgr, unsigned priority)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	list_add_tail(&entity->lru, &mgr->lru[priority]);
> > +	mgr->usage += entity->size;
> > +}
> > +
> > +/**
> > + * drm_lru_remove_entity
> > + *
> > + * @entity: the lru entity to remove
> > + * @mgr: the drm lru manager
> > + *
> > + * Remove an entity from lru list
> > + */
> > +static inline void drm_lru_remove_entity(struct drm_lru_entity *entity,
> > +		struct drm_lru_mgr *mgr)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	list_del_init(&entity->lru);
> > +	mgr->usage -= entity->size;
> > +}
> > +
> > +/**
> > + * drm_lru_mgr_fini
> > + *
> > + * @mgr: the drm lru manager
> > + *
> > + * de-initialize a lru manager
> > + */
> > +static inline void drm_lru_mgr_fini(struct drm_lru_mgr *mgr)
> > +{
> > +	mgr->used = false;
> > +}
> > +
> > +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> > +			uint32_t mem_type, uint64_t size, uint32_t priority);
> > +
> > +struct drm_lru_entity *
> > +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor);
> > +
> > +struct drm_lru_entity *
> > +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> > +		struct drm_lru_entity *entity);
> > +
> > +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size,
> > +		spinlock_t *lru_lock);
> > +
> > +void drm_lru_move_to_tail(struct drm_lru_entity * entity);
> > +
> > +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move);
> > +
> > +
> > +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk);
> > +
> > +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> > +		struct drm_lru_entity *entity);
> > +
> > +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> > +		struct drm_lru_bulk_move *bulk_move);
> > +
> > +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> > +		struct drm_lru_bulk_move *bulk_move);
> > +/**
> > + * drm_lru_for_each_entity
> > + *
> > + * @mgr: the drm lru manager
> > + * @cursor: cursor for the current position
> > + * @entity: the current drm_lru_entity
> > + *
> > + * Iterate over all entities in drm lru manager
> > + */
> > +#define drm_lru_for_each_entity(mgr, cursor, entity)		\
> > +	for (entity = drm_lru_first(mgr, cursor); entity;	\
> > +	     entity = drm_lru_next(mgr, cursor, entity))
> > +
> > +#endif
Christian König Nov. 3, 2023, 9:36 a.m. UTC | #3
Am 03.11.23 um 05:04 schrieb Zeng, Oak:[SNIP]
> I also want to have a more advanced iterator at some point where we grab
> the BO lock for keeping a reference into the LRU list. Not sure how to
> do this if we don't have the BO here any more.
>
> Need to think about that further,
> Don't quite get the what you want to do with the advanced iterator. But with this work, the lru entity is a base class of ttm_resource or any other resource struct in hmm/svm. Lru is decoupled from bo concept - this is why this lru can be shared with svm code which is bo-less.

This is just a crazy idea I had because TTM tends to perform bad on 
certain tasks.

When we start to evict something we use a callback which indicates if an 
eviction is valuable or not. So it can happen that we have to skip quite 
a bunch of BOs on the LRU until we found one which is worth evicting.

Not it can be that the first eviction doesn't make enough room to 
fulfill the allocation requirement, in this case we currently start over 
at the beginning searching for some BO to evict.

I want to avoid this by being able to have cursors into the LRU, e.g. 
the next BO which can't move until we have evicted the current one.

BTW: How do you handle eviction here? I mean we can't call the evict 
callback with the spinlock held easily?

Christian.

>
> Oak
>
Oak Zeng Nov. 3, 2023, 2:36 p.m. UTC | #4
From: Christian König <christian.koenig@amd.com>
Sent: Friday, November 3, 2023 5:36 AM
To: Zeng, Oak <oak.zeng@intel.com>; dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org
Cc: Thomas.Hellstrom@linux.intel.com; felix.kuehling@amd.com; airlied@gmail.com; Welty, Brian <brian.welty@intel.com>
Subject: Re: [RFC 03/11] drm: introduce drm evictable LRU

Am 03.11.23 um 05:04 schrieb Zeng, Oak:[SNIP]



I also want to have a more advanced iterator at some point where we grab

the BO lock for keeping a reference into the LRU list. Not sure how to

do this if we don't have the BO here any more.



Need to think about that further,



Don't quite get the what you want to do with the advanced iterator. But with this work, the lru entity is a base class of ttm_resource or any other resource struct in hmm/svm. Lru is decoupled from bo concept - this is why this lru can be shared with svm code which is bo-less.

This is just a crazy idea I had because TTM tends to perform bad on certain tasks.

When we start to evict something we use a callback which indicates if an eviction is valuable or not. So it can happen that we have to skip quite a bunch of BOs on the LRU until we found one which is worth evicting.

Not it can be that the first eviction doesn't make enough room to fulfill the allocation requirement, in this case we currently start over at the beginning searching for some BO to evict.

I want to avoid this by being able to have cursors into the LRU, e.g. the next BO which can't move until we have evicted the current one.


Got you now. I didn’t know this problem so I didn’t try to fix this efficiency problem in this series. Theoretically I think we can fix this issue this way: change ttm_mem_evict_first to ttm_mem_evict_first_n and add a parameter to this function to specify how much room we want to yield; then we evict the first n objects to make enough room before return, or fail if we can’t make enough room. This scheme would need the caller of ttm_mem_evict_first to tell how much room he need – I think reasonable.


BTW: How do you handle eviction here? I mean we can't call the evict callback with the spinlock held easily?

I was actually struggling when I refactored ttm_mem_evict_first function. I moved this function to lru manager and abstracted 3 callback functions (evict_allowable/valuable, evict_entity, evict_busy_entity) – those need to relook when hmm/svm codes come in picture. I tried not to change any logic of this function – I know people worked on this function in the past 15 years so better to be very careful.

So in my current implementation, spinlock is held calling the evict_entity callback. Spinlock is unlocked before calling ttm_bo_evict in the evict_entity callback and re-held if we need to move entity in lru list. See details in patch 4 and patch 10. So it keeps exactly the original call sequence but does look awkward.

But I think you are right. We can release the spinlock in the drm_lru_evict_first function before calling evict callback.

Oak


Christian.






Oak
diff mbox series

Patch

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 1ad88efb1752..13953b0d271b 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -46,6 +46,7 @@  drm-y := \
 	drm_vblank_work.o \
 	drm_vma_manager.o \
 	drm_gpuva_mgr.o \
+	drm_evictable_lru.o \
 	drm_writeback.o
 drm-$(CONFIG_DRM_LEGACY) += \
 	drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_evictable_lru.c b/drivers/gpu/drm/drm_evictable_lru.c
new file mode 100644
index 000000000000..2ba9105cca03
--- /dev/null
+++ b/drivers/gpu/drm/drm_evictable_lru.c
@@ -0,0 +1,232 @@ 
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/lockdep.h>
+#include <linux/container_of.h>
+#include <drm/drm_evictable_lru.h>
+#include <drm/drm_device.h>
+
+static inline struct drm_lru_mgr *entity_to_mgr(struct drm_lru_entity *entity)
+{
+	struct drm_lru_mgr *mgr;
+
+	mgr = &entity->drm->lru_mgr[entity->mem_type];
+	BUG_ON(!mgr->used);
+
+	return mgr;
+}
+
+void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
+			uint32_t mem_type, uint64_t size, uint32_t priority)
+{
+	entity->drm = drm;
+	entity->mem_type = mem_type;
+	entity->size = size;
+	entity->priority = priority;
+	INIT_LIST_HEAD(&entity->lru);
+}
+
+/**
+ * drm_lru_mgr_init
+ *
+ * @mgr: drm lru manager to init
+ * @size: size of the resource managed by this manager
+ * @lock: pointer of the global lru_lock
+ *
+ * Initialize a drm lru manager
+ */
+void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size, spinlock_t *lock)
+{
+	unsigned j;
+
+	mgr->used = true;
+	mgr->size = size;
+	mgr->usage = 0;
+	mgr->lru_lock = lock;
+
+	for(j = 0; j < DRM_MAX_LRU_PRIORITY; j++)
+		INIT_LIST_HEAD(&mgr->lru[j]);
+}
+
+void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move)
+{
+	memset(bulk_move, 0, sizeof(*bulk_move));
+}
+
+/**
+ * drm_lru_first
+ *
+ * @mgr: drm lru manager to iterate over
+ * @cursor: cursor of the current position
+ *
+ * Returns the first entity in drm lru manager
+ */
+struct drm_lru_entity *
+drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor)
+{
+	struct drm_lru_entity *entity;
+
+	lockdep_assert_held(mgr->lru_lock);
+
+	for(cursor->priority = 0; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
+		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
+			return entity;
+
+	return NULL;
+}
+
+/**
+ * drm_lru_next
+ *
+ * @mgr: drm lru manager to iterate over
+ * @cursor: cursor of the current position
+ * @entity: the current lru entity pointer
+ *
+ * Returns the next entity from drm lru manager
+ */
+struct drm_lru_entity *
+drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
+		struct drm_lru_entity *entity)
+{
+	lockdep_assert_held(mgr->lru_lock);
+
+	list_for_each_entry_continue(entity, &mgr->lru[cursor->priority], lru)
+		return entity;
+
+	for(++cursor->priority; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
+		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
+			return entity;
+
+	return NULL;
+}
+
+/**
+ * drm_lru_move_to_tail
+ *
+ * @entity: the lru entity to move to lru tail
+ *
+ * Move a lru entity to lru tail
+ */
+void drm_lru_move_to_tail(struct drm_lru_entity * entity)
+{
+	struct list_head *lru;
+	struct drm_lru_mgr *mgr;
+
+	mgr = entity_to_mgr(entity);
+	lockdep_assert_held(mgr->lru_lock);
+	lru = &mgr->lru[entity->priority];
+	list_move_tail(&entity->lru, lru);
+}
+
+/**
+ * drm_lru_bulk_move_range_tail
+ *
+ * @range: bulk move range
+ * @entity: lru_entity to move
+ *
+ * Move a lru_entity to the tail of a bulk move range
+ */
+void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
+									struct drm_lru_entity *entity)
+{
+	if (entity == range->last)
+		return;
+
+	if (entity == range->first)
+		range->first = container_of(entity->lru.next, struct drm_lru_entity, lru);
+
+	if (range->last)
+		list_move(&entity->lru, &range->last->lru);
+
+	range->last = entity;
+}
+EXPORT_SYMBOL(drm_lru_bulk_move_range_tail);
+
+/**
+ * drm_lru_bulk_move_tail - bulk move range of entities to the LRU tail.
+ *
+ * @bulk: bulk_move structure
+ *
+ * Bulk move entities to the LRU tail, only valid to use when driver makes sure that
+ * resource order never changes.
+ */
+void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk)
+{
+
+	unsigned i, j;
+
+	for (i = 0; i < DRM_NUM_MEM_TYPES; ++i) {
+		for (j = 0; j < DRM_MAX_LRU_PRIORITY; ++j) {
+			struct drm_lru_bulk_move_range *range = &bulk->range[i][j];
+			struct drm_lru_mgr *mgr;
+
+			if (!range->first)
+				continue;
+
+			mgr = entity_to_mgr(range->first);
+			lockdep_assert_held(mgr->lru_lock);
+			list_bulk_move_tail(&mgr->lru[range->first->priority], &range->first->lru,
+					&range->last->lru);
+		}
+	}
+}
+EXPORT_SYMBOL(drm_lru_bulk_move_tail);
+
+/**
+ * drm_lru_add_bulk_move
+ *
+ * @entity: the lru entity to add to the bulk move range
+ * @bulk_move: the bulk move ranges to add the entity
+ *
+ * Add a lru entity to the tail of a bulk move range
+ */
+void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
+						struct drm_lru_bulk_move *bulk_move)
+{
+	struct drm_lru_bulk_move_range *range;
+
+	range = &bulk_move->range[entity->mem_type][entity->priority];
+
+	if (!range->first) {
+		range->first = entity;
+		range->last = entity;
+		return;
+	}
+
+	drm_lru_bulk_move_range_tail(range, entity);
+}
+
+EXPORT_SYMBOL(drm_lru_add_bulk_move);
+/**
+ * drm_lru_del_bulk_move
+ *
+ * @entity: the lru entity to move from the bulk move range
+ * @bulk_move: the bulk move ranges to move the entity out of
+ *
+ * Move a lru entity out of bulk move range. This doesn't
+ * delete entity from lru manager's lru list.
+ */
+void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
+					struct drm_lru_bulk_move *bulk_move)
+{
+	struct drm_lru_bulk_move_range *range;
+
+	range = &bulk_move->range[entity->mem_type][entity->priority];
+
+	if (unlikely(WARN_ON(!range->first || !range->last) ||
+			(range->first == entity && range->last == entity))) {
+		range->first = NULL;
+		range->last = NULL;
+	} else if (range->first == entity) {
+		range->first = container_of(entity->lru.next,
+				struct drm_lru_entity, lru);
+	} else if (range->last == entity) {
+		range->last = container_of(entity->lru.prev,
+				struct drm_lru_entity, lru);
+	} else {
+		list_move(&entity->lru, &range->last->lru);
+	}
+}
+EXPORT_SYMBOL(drm_lru_del_bulk_move);
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index d0b5f42786be..1bdcd34d3f6b 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@ 
 
 #include <drm/drm_legacy.h>
 #include <drm/drm_mode_config.h>
+#include <drm/drm_evictable_lru.h>
 
 struct drm_driver;
 struct drm_minor;
@@ -331,6 +332,12 @@  struct drm_device {
 	 */
 	spinlock_t lru_lock;
 
+	/**
+	 * @lru_mgr: Device global lru managers per memory type or memory
+	 * region. Each lru manager manages a lru list of this memory type.
+	 */
+	struct drm_lru_mgr lru_mgr[DRM_NUM_MEM_TYPES];
+
 	/* Everything below here is for legacy driver, never use! */
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_evictable_lru.h b/include/drm/drm_evictable_lru.h
new file mode 100644
index 000000000000..3fd6bd2475d9
--- /dev/null
+++ b/include/drm/drm_evictable_lru.h
@@ -0,0 +1,188 @@ 
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _DRM_EVICTABLE_LRU_H_
+#define _DRM_EVICTABLE_LRU_H_
+
+#include <linux/list.h>
+#include <linux/spinlock_types.h>
+#include <linux/spinlock.h>
+
+struct drm_device;
+
+#define DRM_MAX_LRU_PRIORITY 4
+#define DRM_NUM_MEM_TYPES 8
+
+/**
+ * struct drm_lru_entity
+ *
+ * @drm: drm device that this entity belongs to
+ * @mem_type: The memory type that this entity belongs to
+ * @size: resource size of this entity
+ * @priority: The priority of this entity
+ * @lru: least recent used list node, see &drm_lru_mgr.lru
+ *
+ * This structure represents an entity in drm_lru_mgr's
+ * list. This structure is supposed to be embedded in
+ * user's data structure.
+ */
+struct drm_lru_entity {
+	struct drm_device *drm;
+	uint32_t mem_type;
+	uint64_t size;
+	uint32_t priority;
+	struct list_head lru;
+};
+
+/**
+ * struct drm_lru_mgr
+ *
+ * @used: whether this lru manager is used or not
+ * @size: size of the resource
+ * @usage: how much resource has been used
+ * @lru_lock: a weak reference to the global lru_lock
+ * @lru: least recent used list, per priority
+ *
+ * This structure maintains all the buffer allocations
+ * in a least recent used list, so a victim for eviction
+ * can be easily found.
+ */
+struct drm_lru_mgr {
+	bool used;
+	uint64_t size;
+	uint64_t usage;
+	spinlock_t *lru_lock;
+	struct list_head lru[DRM_MAX_LRU_PRIORITY];
+};
+
+/**
+ * struct drm_lru_cursor
+ *
+ * @priority: the current priority
+ *
+ * Cursor to iterate over all entities in lru manager.
+ */
+struct drm_lru_cursor {
+	unsigned priority;
+};
+
+/**
+ * struct drm_lru_bulk_move_range
+ *
+ * @first: the first entity in the range
+ * @last: the last entity in the range
+ *
+ * Range of entities on a lru list.
+ */
+struct drm_lru_bulk_move_range
+{
+	struct drm_lru_entity *first;
+	struct drm_lru_entity *last;
+};
+
+/**
+ * struct drm_lru_bulk_move
+ *
+ * @range: An array of bulk move range, each corelates to the drm_lru_mgr's
+ * lru list of the same memory type and same priority.
+ *
+ * A collection of bulk_move range which can be used to move drm_lru_entity
+ * on the lru list in a bulk way. It should be initialized through
+ * drm_lru_bulk_move_init. Add/delete a drm_lru_entity to bulk move should call
+ * drm_lru_add_bulk_move/drm_lru_del_bulk_move.
+ */
+struct drm_lru_bulk_move {
+	struct drm_lru_bulk_move_range range[DRM_NUM_MEM_TYPES][DRM_MAX_LRU_PRIORITY];
+};
+
+
+
+/**
+ * drm_lru_add_entity
+ *
+ * @entity: the lru entity to add
+ * @mgr: the drm lru manager
+ * @priority: specify which priority list to add
+ *
+ * Add an entity to lru list
+ */
+static inline void drm_lru_add_entity(struct drm_lru_entity *entity,
+		struct drm_lru_mgr *mgr, unsigned priority)
+{
+	lockdep_assert_held(mgr->lru_lock);
+	list_add_tail(&entity->lru, &mgr->lru[priority]);
+	mgr->usage += entity->size;
+}
+
+/**
+ * drm_lru_remove_entity
+ *
+ * @entity: the lru entity to remove
+ * @mgr: the drm lru manager
+ *
+ * Remove an entity from lru list
+ */
+static inline void drm_lru_remove_entity(struct drm_lru_entity *entity,
+		struct drm_lru_mgr *mgr)
+{
+	lockdep_assert_held(mgr->lru_lock);
+	list_del_init(&entity->lru);
+	mgr->usage -= entity->size;
+}
+
+/**
+ * drm_lru_mgr_fini
+ *
+ * @mgr: the drm lru manager
+ *
+ * de-initialize a lru manager
+ */
+static inline void drm_lru_mgr_fini(struct drm_lru_mgr *mgr)
+{
+	mgr->used = false;
+}
+
+void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
+			uint32_t mem_type, uint64_t size, uint32_t priority);
+
+struct drm_lru_entity *
+drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor);
+
+struct drm_lru_entity *
+drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
+		struct drm_lru_entity *entity);
+
+void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size,
+		spinlock_t *lru_lock);
+
+void drm_lru_move_to_tail(struct drm_lru_entity * entity);
+
+void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move);
+
+
+void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk);
+
+void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
+		struct drm_lru_entity *entity);
+
+void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
+		struct drm_lru_bulk_move *bulk_move);
+
+void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
+		struct drm_lru_bulk_move *bulk_move);
+/**
+ * drm_lru_for_each_entity
+ *
+ * @mgr: the drm lru manager
+ * @cursor: cursor for the current position
+ * @entity: the current drm_lru_entity
+ *
+ * Iterate over all entities in drm lru manager
+ */
+#define drm_lru_for_each_entity(mgr, cursor, entity)		\
+	for (entity = drm_lru_first(mgr, cursor); entity;	\
+	     entity = drm_lru_next(mgr, cursor, entity))
+
+#endif